In Int’l Symp. of Field-Programmable Custom Computing Machines (FCCM), 2019

In FPGA, 2018

In FPGA, 2017

In IEEE Comm Letters, 2013


T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations

We present a language and compilation framework for productively generating high-performance systolic arrays for dense tensor kernels on spatial architectures, including FPGAs and CGRAs. It decouples a functional specification from a spatial mapping, allowing programmers to quickly explore various spatial optimizations for the same function. The actual implementation of these optimizations is left to a compiler. Thus, productivity and performance are achieved at the same time.


Pointer-Chase Prefetcher

Pointer-Chase Prefetcher for Linked Data Structures


I have been TA for following courses at Cornell University:

  • CS3420: Embedded Systems