This is an introductory tutorial on the gem5 simulation framework.


This is an introductory tutorial on the gem5 Minor CPU model.

Computer architects often find it difficult to get started with event-driven simulators. This document targets that audience and provides an overview of the Minor CPU model in gem5, which implements an in-order pipelined processor. If you have never worked with event-driven simulators and don’t know what they are, there is a cool video here. This tutorial will help the reader understand how the event-driven Minor CPU model is implemented in gem5; it will not go into much detail on how to compile and build gem5, how to add tracing, or what ports are and how they work. That information can be found in Learning gem5. OK! So let’s get started.


This is a tutorial on how to add an instruction to the RISC-V ISA and how to write a program that uses the new instruction. I will also talk about how to add the new instruction to the RISC-V assembler and how to execute it on gem5.


This is a tutorial on how to add statistics in gem5.


This is an introductory tutorial on different types of simulation techniques, with a brief overview of event-driven simulators.
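
For readers new to the idea, here is a minimal sketch of an event-driven simulation loop. It is not gem5's API: the `EventQueue` class, the `fetch`/`execute` callbacks, and the tick values are all hypothetical names chosen for illustration. The point it shows is that simulated time jumps straight to the next scheduled event instead of advancing one cycle at a time.

```python
import heapq


class EventQueue:
    """Toy discrete-event scheduler (hypothetical; not gem5's EventQueue)."""

    def __init__(self):
        self._heap = []   # entries are (tick, sequence number, callback)
        self._seq = 0     # tie-breaker so same-tick events run in FIFO order
        self.now = 0      # current simulated time, in ticks

    def schedule(self, delay, callback):
        """Schedule callback to run `delay` ticks after the current time."""
        heapq.heappush(self._heap, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        """Pop events in time order, jumping `now` to each event's tick."""
        while self._heap:
            tick, _, callback = heapq.heappop(self._heap)
            self.now = tick
            callback()


log = []
eq = EventQueue()


def fetch():
    log.append((eq.now, "fetch"))
    eq.schedule(2, execute)   # assume execute happens 2 ticks after fetch


def execute():
    log.append((eq.now, "execute"))


eq.schedule(1, fetch)
eq.schedule(10, lambda: log.append((eq.now, "idle check")))
eq.run()
print(log)
```

Nothing happens between ticks 3 and 10, so the loop does no work for those ticks; a cycle-by-cycle simulator would have stepped through each of them.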


Here I will talk about miscellaneous things that you can use with gem5.



Modern high-level synthesis (HLS) tools greatly reduce the turnaround time of designing and implementing complex FPGA-based accelerators. They also expose various optimization opportunities, which cannot be easily explored at the register-transfer level. With the increasing adoption of the HLS design methodology and continued advances of synthesis optimization, there is a growing need for realistic benchmarks to (1) facilitate comparisons between tools, (2) evaluate and stress-test new synthesis techniques, and (3) establish meaningful performance baselines to track progress of the HLS technology. While several HLS benchmark suites already exist, they are primarily comprised of small textbook-style function kernels, instead of complete and complex applications. To address this limitation, we introduce Rosetta, a realistic benchmark suite for software programmable FPGAs. In this paper we describe the characteristics of our benchmarks and the optimization techniques applied to them.
In FPGA, 2018

High-level synthesis (HLS) enables designing at a higher level of abstraction to effectively cope with the design complexity of emerging applications on modern programmable systems-on-chip (SoCs). While HLS continues to evolve with a growing set of algorithms, methodologies, and tools to efficiently map software designs onto optimized hardware architectures, there is still a lack of realistic benchmark applications with sufficient complexity and enforceable constraints. In this work we present a case study of accelerating face detection based on the Viola-Jones algorithm on a programmable SoC using a C-based HLS flow. Our design achieves a frame rate of 30 frames per second, which is suitable for real-time applications.
In FPGA, 2017

For multi-beam broadband satellites operating at frequencies of 10 GHz and above, rain attenuation is the dominant impairment factor. Using a stochastic model for rain attenuation prediction and a greedy approach, dynamic power allocation has recently been shown to serve more users than the static technique. This letter proposes a new dynamic power allocation algorithm whose novelty lies in treating users with similar power requirements as a group, instead of as individuals. Thus, without resorting to exhaustive search, we are able to serve more users than the existing technique.
In IEEE Comm Letters, 2013


Pointer-Chase Prefetcher

Pointer-Chase Prefetcher for Linked Data Structures


I have been a TA for the following courses at Cornell University:

  • CS3420: Embedded Systems