This is a list of academic papers that cover all sorts of FPGA-related topics, mostly from a system researcher's point of view.
Not all the listed papers are good; some of them just repeat history. That's why I started this list: to get a thorough understanding of this area first. If you see any papers missing, please comment below and I will add them accordingly.
Website version: link.
Questions to cover:
- Can we partition an FPGA bitstream into multiple small ones? If so, how?
- Are there any tools that could automatically partition an FPGA bitstream into multiple small ones without developer intervention?
- Sure, partitioning adds extra communication overhead and potentially lowers overall performance. But partitioning should greatly increase scheduling flexibility in terms of area allocation.
- How to do hardware context-switch?
- How can we preemptively de-schedule an already deployed bitstream? What should be taken care of?
- How to deal with area fragmentation?
- FPGA area fragmentation differs from traditional memory fragmentation because a) FPGA area is two-dimensional, b) relocating a bitstream may not be possible, and c) each bitstream has a fixed area requirement. (A minimal 2D placement sketch follows this list.)
- How to allocate FPGA areas?
- How to relocate already deployed bitstreams to make room for a new one?
- What scheduling algorithms shall we use?
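To make the area-allocation and fragmentation questions above concrete, here is a minimal first-fit placement sketch of my own (not taken from any paper below), assuming the fabric is a uniform grid of tiles and each PR bitstream needs a fixed rectangle. A real allocator would also have to respect clock regions, heterogeneous columns (BRAM/DSP), and fixed IO locations, which this ignores.

```cpp
#include <vector>

// Occupancy grid of the reconfigurable fabric: grid[y][x] == true means the
// tile is already occupied by a deployed bitstream.
using Fabric = std::vector<std::vector<bool>>;

// Check whether a w x h rectangle fits with its top-left corner at (x, y).
static bool fits(const Fabric &grid, int x, int y, int w, int h) {
    if (y + h > (int)grid.size() || x + w > (int)grid[0].size()) return false;
    for (int dy = 0; dy < h; ++dy)
        for (int dx = 0; dx < w; ++dx)
            if (grid[y + dy][x + dx]) return false;
    return true;
}

// First-fit scan: find any free w x h rectangle and mark it allocated.
// On failure, the fabric may still have plenty of free tiles in total;
// they are just fragmented into shapes no rectangle of this size fits in.
bool place_bitstream(Fabric &grid, int w, int h, int &out_x, int &out_y) {
    for (int y = 0; y < (int)grid.size(); ++y)
        for (int x = 0; x < (int)grid[0].size(); ++x)
            if (fits(grid, x, y, w, h)) {
                for (int dy = 0; dy < h; ++dy)
                    for (int dx = 0; dx < w; ++dx)
                        grid[y + dy][x + dx] = true;
                out_x = x; out_y = y;
                return true;
            }
    return false;  // 2D fragmentation: free area exists, but no rectangle fits
}
```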
Offline:
- hthreads: A hardware/software co-designed multithreaded RTOS kernel, 2005
- hthreads: Enabling a Uniform Programming Model Across the Software/Hardware Boundary, FCCM'16
- Tartan: Evaluating Spatial Computation for Whole Program Execution, ASPLOS'06
Online:
- A virtual hardware operating system for the Xilinx XC6200, 1996
- The Swappable Logic Unit: a Paradigm for Virtual Hardware, FCCM'97
- Run-time management of dynamically reconfigurable designs, 1998
- All of the above are early works on FPGA scheduling.
- Worth a read, but don't adopt all of their assumptions; some have changed after SO many years.
- Multitasking on FPGA Coprocessors, 2000
- Be careful about concurrent DRAM accesses if you want to PR an existing bitstream!
- I think this rule applies to all kinds of IO communication: make sure to handle on-the-fly transactions.
- Preemptive multitasking on FPGAs, FCCM'00
- Very practical discussion of techniques for doing preemptive scheduling on FPGAs.
- The Development of an Operating System for Reconfigurable Computing, 2001
- Discusses area fragmentation.
- Its discussion of partitioning, placing, and routing FPGA bitstreams made me realize that FPGA online scheduling is just another level of P&R, at a coarser granularity (small PR bitstreams).
- Configuration Relocation and Defragmentation for Run-Time Reconfigurable Computing, 2002
- Proposed a NEW FPGA architecture to aid online relocation and defragmentation. Some designs are still valid, e.g., using virtual IO for each PR bitstream so it is not constrained by physical IO pin locations.
- I came across similar issues lately, only to find that people had already done this around 20 years ago.
- S1. Reconfigurable Hardware Operating Systems: From Design Concepts to Realizations, 2003
- S2. Operating Systems for Reconfigurable Embedded Platforms: Online Scheduling of Real-Time Tasks, 2004
- Very fruitful discussion. The paper schedules bitstreams inside the FPGA, following a real-time scheduling policy (deadline-driven).
- Unlike CPU scheduling, FPGA scheduling needs to consider "area". The chip is a rectangular box; allocating areas needs great care to avoid fragmentation!
- Context saving and restoring for multitasking in reconfigurable systems, FPL'05
- Optimizing deschedule performance.
- This paper discusses ways to save and restore the state of a hardware task. There are generally three approaches: a) adding indirection, i.e., letting the app use a system API to read/write its state (a minimal sketch of this approach follows this entry); b) a yield-type API; c) using the PR controller to read back the bitstream.
- This paper used ICAP to read the bitstream back and extract the necessary state information that must be present when the bitstream next resumes.
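A minimal sketch of approach (a), assuming the task exposes its live state through a small read/write register window that the OS drains before descheduling and replays before resuming; the class and function names are my own illustration, not the paper's API.

```cpp
#include <cstdint>
#include <vector>

// The task keeps its live state (counters, FSM state, partial results) in a
// small addressable register file instead of burying it in arbitrary
// flip-flops; in hardware this would be a memory-mapped window.
class HwTaskState {
public:
    explicit HwTaskState(size_t num_words) : words_(num_words, 0) {}
    uint64_t read(size_t idx) const { return words_.at(idx); }
    void     write(size_t idx, uint64_t v) { words_.at(idx) = v; }
    size_t   size() const { return words_.size(); }
private:
    std::vector<uint64_t> words_;
};

// OS-side context switch built purely on the read/write API: drain the
// outgoing task's state words, later replay them into a freshly loaded
// instance of the same bitstream before letting it run again.
std::vector<uint64_t> save_context(const HwTaskState &task) {
    std::vector<uint64_t> image(task.size());
    for (size_t i = 0; i < task.size(); ++i) image[i] = task.read(i);
    return image;
}

void restore_context(HwTaskState &task, const std::vector<uint64_t> &image) {
    for (size_t i = 0; i < image.size(); ++i) task.write(i, image[i]);
}
```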
- Scheduling intervals for reconfigurable computing, FCCM'08
- Block, drop or roll(back): Alternative preemption methods for RH multi-tasking, FCCM'09
- I like their argument for having an SoC: "To avoid stalling for RH kernel availability, our system requires applications to include software alternatives for each application kernel, which are used if the RH kernel is not immediately available."
- The system is simulated.
- ReconOS Cooperative multithreading in dynamically reconfigurable systems, FPL'09
- A middle ground between preemptive and non-preemptive scheduling: let FPGA apps call `yield()` (see the sketch after this entry).
- It only saves cost if the FPGA app is already doing the right thing, which is not something an OS should count on.
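A minimal sketch of what the cooperative `yield()` style could look like from the hardware task's side, assuming an HLS-like kernel that checks a scheduler-controlled flag between iterations and checkpoints its progress; the names (TaskState, yield_requested) are illustrative, not from ReconOS.

```cpp
#include <cstdint>

// Hypothetical checkpoint of a hardware task's progress.
struct TaskState {
    uint32_t next_i  = 0;   // loop index to resume from
    uint64_t partial = 0;   // partial result accumulated so far
};

// Cooperative kernel: processes n items but voluntarily returns when the
// scheduler raises yield_requested, persisting its progress in state.
// Returns true when the whole job is done, false if it yielded early.
bool vector_sum_task(const uint32_t *data, uint32_t n,
                     volatile const bool *yield_requested, TaskState *state) {
    uint64_t acc = state->partial;
    for (uint32_t i = state->next_i; i < n; ++i) {
        acc += data[i];
        if (*yield_requested) {      // the OS wants the region back
            state->next_i  = i + 1;  // checkpoint loop position
            state->partial = acc;    // checkpoint partial result
            return false;            // de-scheduled at a safe point
        }
    }
    state->partial = acc;
    state->next_i  = n;
    return true;                      // job complete
}
```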
- Hardware context-switch methodology for dynamically partially reconfigurable systems, 2010
- Online Scheduling for Multi-core Shared Reconfigurable Fabric, DATE'12
- More on policy.
- Multi-shape Tasks Scheduling for Online Multitasking on FPGAs, 2014
- Policy and mechanism.
- AmorphOS, OSDI'18
Network-on-Chip on FPGA.
- Interconnection Networks Enable Fine-Grain Dynamic Multi-Tasking on FPGAs, 2002
- I like the idea of separating computation from communication.
- Also a lot of discussion about possible NoC designs within an FPGA.
- LEAP Soft connections: Addressing the hardware-design modularity problem, DAC'09
- Virtual channel concept. Latency-insensitive.
- Leveraging Latency-Insensitivity to Ease Multiple FPGA Design, FPGA'12
- Your Programmable NIC Should be a Programmable Switch, HotNets'18
Papers dealing with BRAM, registers, on-board DRAM, and host DRAM.
- LEAP Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic, FPGA'11
- Main design hierarchy: use BRAM as an L1 cache, on-board DRAM as an L2 cache, and host memory as the backing store. Everything is abstracted away behind their interface (similar to load/store). Programming is pretty much the same as if you were writing for a CPU.
- According to Sec. 2.2.2, its scratchpad controller uses a simple segment-based mapping scheme, like the one in AmorphOS.
- LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories, FCCM'14
- Follow-up work on LEAP Scratchpads; extends it to provide cache coherence between multiple FPGAs.
- Coherent scratchpads with a MOSI protocol.
- MATCHUP: Memory Abstractions for Heap Manipulating Programs, FPGA'15
- CoRAM: An In-Fabric Memory Architecture for FPGA-Based Computing
- CoRAM provides an interface for managing the on- and off-chip memory resources of an FPGA. It uses "control threads" to exert low-level control over data movement.
- Seriously, CoRAM is just like a processor's L1-L3 caches.
- CoRAM Prototype and evaluation of the CoRAM memory architecture for FPGA-based computing, FPGA'12
- Prototype on FPGA.
- Sharing, Protection, and Compatibility for Reconfigurable Fabric with AMORPHOS, OSDI'18
- Hull: provides memory protection for on-board DRAM using segment-based address translation (a minimal sketch of the idea follows this entry).
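A minimal sketch of segment-based translation and protection of the kind Hull and the LEAP scratchpad controller appear to use, assuming each hardware app gets one contiguous window of on-board DRAM; the struct and function names are mine, not AmorphOS's.

```cpp
#include <cstdint>
#include <optional>

// One segment per hardware app: a contiguous window of on-board DRAM.
struct Segment {
    uint64_t base;   // physical base address in on-board DRAM
    uint64_t limit;  // size of the window in bytes
};

// Translate an app-local address to a physical DRAM address, faulting if the
// access would leave the app's own segment. In hardware this is just an
// adder plus a comparator on the address path.
std::optional<uint64_t> translate(const Segment &seg,
                                  uint64_t app_addr, uint64_t access_size) {
    if (access_size == 0 || app_addr + access_size > seg.limit)
        return std::nullopt;          // protection fault
    return seg.base + app_addr;       // relocated, in-bounds access
}
```

The attraction is the tiny hardware cost per access; the downside is the same as classic segmentation: allocations must be contiguous, which brings back external fragmentation.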
- Virtualized Execution Runtime for FPGA Accelerators in the Cloud, IEEE Access'17
- `malloc()` and `free()` for FPGA on-board DRAM.
- A High-Performance Memory Allocator for Object-Oriented Systems, IEEE'96
- SysAlloc: A Hardware Manager for Dynamic Memory Allocation in Heterogeneous Systems, FPL'15
- Hi-DMM: High-Performance Dynamic Memory Management in High-Level Synthesis, IEEE'18
Papers dealing with the OS virtual memory system (VMS). Note that all these papers introduce some form of MMU into the FPGA so the FPGA can work with the host VMS. This added MMU is similar to a CPU MMU or an RDMA NIC's internal cache. Note that the VMS itself still runs inside Linux (including page faults, swapping, TLB shootdowns, and so on). What could really stand out is implementing the VMS inside the FPGA.
- Virtual Memory Window for Application-Specific Reconfigurable Coprocessors, DAC'04
- Early work that adds a new MMU to the FPGA to let FPGA logic access on-chip DRAM. Note that this is not the system main memory, so the translation page table is different.
- Has some insights on prefetching and MMU CAM design.
- Seamless Hardware Software Integration in Reconfigurable Computing Systems, 2005
- Follow-up summary of the previous DAC'04 Virtual Memory Window work.
- A Reconfigurable Hardware Interface for a Modern Computing System, FCCM'07
- This work adds a new MMU with a 16-entry TLB to the FPGA. The FPGA and CPU share the same user virtual address space and the same physical memory, at cacheline granularity; the FPGA is just another core in this sense. Upon a TLB miss in the FPGA MMU, the FPGA interrupts the CPU and lets software handle the miss. Software-managed TLB misses are not efficient, but they make cache coherence between the FPGA and CPU easy. (A small TLB-lookup sketch follows this entry.)
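To make the mechanism concrete, here is a small sketch of a fully associative TLB with a software-handled miss path, roughly in the spirit of the FCCM'07 design; the 4 KB page size, round-robin replacement, and stubbed miss handler are my assumptions for illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstddef>

// Hypothetical 16-entry, fully associative TLB as an FCCM'07-style FPGA MMU
// might keep it; entry layout, fill policy, and the miss path are illustrative.
struct TlbEntry {
    uint64_t vpn   = 0;      // virtual page number
    uint64_t pfn   = 0;      // physical frame number
    bool     valid = false;
};

class FpgaTlb {
public:
    // Translate a virtual address. On a hit, concatenate the cached PFN with
    // the page offset. On a miss, fall back to the (software) miss handler,
    // then install the mapping with simple round-robin replacement.
    uint64_t translate(uint64_t vaddr) {
        const uint64_t vpn = vaddr >> kPageShift;
        for (const auto &e : entries_)
            if (e.valid && e.vpn == vpn)
                return (e.pfn << kPageShift) | (vaddr & kPageMask);
        const uint64_t pfn = miss_handler(vpn);
        entries_[next_victim_] = {vpn, pfn, true};
        next_victim_ = (next_victim_ + 1) % entries_.size();
        return (pfn << kPageShift) | (vaddr & kPageMask);
    }

private:
    static constexpr uint64_t kPageShift = 12;                 // 4 KB pages
    static constexpr uint64_t kPageMask  = (1ULL << kPageShift) - 1;

    // In the real design this raises an interrupt and waits for the CPU to
    // walk the page table; here it is stubbed as an identity mapping.
    uint64_t miss_handler(uint64_t vpn) { return vpn; }

    std::array<TlbEntry, 16> entries_{};
    size_t next_victim_ = 0;
};
```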
- Low-Latency High-Bandwidth HW/SW Communication in a Virtual Memory Environment, FPL'08
- This work actually adds a new MMU to the FPGA, which works just like a CPU MMU. It's similar to an IOMMU, in some sense.
- But I think they missed one important aspect: cache coherence between the CPU and FPGA. There is not much information about this in the paper; it seems they do not have a cache on the FPGA. Anyhow, this is why CCIX and OpenCAPI were recently proposed.
- Memory Virtualization for Multithreaded Reconfigurable Hardware, FPL'11
- Part of the ReconOS project
- They implemented a simple MMU inside the FPGA that includes a TLB. On protection violations or invalid-page accesses, their MMU just hands over to the CPU page-fault routines. How is this different from the FPL'08 one? Actually, IMO, they are the same.
- S4 Virtualized Execution Runtime for FPGA Accelerators in the Cloud, IEEE Access'17
- This paper also implements a hardware MMU, but the virtual memory system still runs on Linux.
- Also listed in the Cloud Infrastructure part.
- Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs, 2015
- Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs, IEEE'17
- Part of the PULP project.
- Essentially a software-managed IOMMU. The control path runs as a Linux kernel module; the datapath is a lightweight AXI transaction translation.
- A Virtual Hardware Operating System for the Xilinx XC6200, FPL'96
- Operating systems for reconfigurable embedded platforms: online scheduling of real-time tasks, IEEE'04
- hthreads: a hardware/software co-designed multithreaded RTOS kernel, 2005
- Reconfigurable computing: architectures and design methods, IEE'05
- BORPH: An Operating System for FPGA-Based Reconfigurable Computers. PhD Thesis.
- FUSE: Front-end user framework for O/S abstraction of hardware accelerators, FCCM'11
- ReconOS – an Operating System Approach for Reconfigurable Computing, IEEE Micro'14
- Invoke the kernel from the FPGA. They built a shell in the FPGA and delegation threads on the CPU to achieve this.
- They implemented their own MMU (using pre-established page tables) to let FPGA logic access system memory. Ref.
- Read the "Operating Systems for Reconfigurable Computing" sidebar, nice summary.
- LEAP Soft connections: Addressing the hardware-design modularity problem, DAC'09
- Channel concept. Good.
- LEAP Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic, FPGA'11
- BRAM/on-board DRAM/host DRAM layering. Caching.
- LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories
- Add cache-coherence on top of previous work.
- Also check out my note on Cache Coherence.
- LEAP FPGA Operating System, FPL'14.
Summary of the current FPGA virtualization status. Prior art mainly focuses on: 1) how to virtualize on-chip BRAM (e.g., CoRAM, LEAP Scratchpads); 2) how to work with the host, specifically how to use host DRAM and host virtual memory; 3) how to schedule bitstreams inside an FPGA chip; 4) how to provide certain services to make FPGA programming easier (mostly in cooperation with the host OS).
Innovations in the toolchain space.
- Design Patterns for Code Reuse in HLS Packet Processing Pipelines, FCCM'19
- A very good HLS library from Mellanox folks.
- Templatised Soft Floating-Point for High-Level Synthesis, FCCM'19
- ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA, FCCM'18
- Separation Logic-Assisted Code Transformations for Efficient High-Level Synthesis, FCCM'14
- An HLS design aid that analyzes the original program at compile time and performs automated code transformations. The tool analyzes pointer-manipulating programs and automatically splits heap-allocated data structures into disjoint, independent regions.
- The tool targets C++ heap operations.
- To put it another way: the tool looks at your BRAM usage, finds false dependencies, and creates multiple independent regions, which improves your II. (A hand-written illustration of this kind of transformation follows this entry.)
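A hand-written illustration (not the tool's actual output) of the kind of transformation described above: when two logically disjoint regions live in one heap array, the HLS scheduler must assume aliasing and serialize accesses; splitting them into separate arrays lets each map to its own BRAM and improves the II.

```cpp
// Before: both buffers are carved out of one heap array, so an HLS compiler
// must conservatively assume out[i] may alias in[j] and serialize the loop.
void scale_shared(int *heap, int n) {
    int *in  = heap;       // region 1
    int *out = heap + n;   // region 2 (disjoint, but hard to prove in general)
    for (int i = 0; i < n; ++i)
        out[i] = 2 * in[i];
}

// After: the disjoint regions become separate top-level arrays, each of which
// can be mapped to its own BRAM; the false dependency is gone and the loop
// can be pipelined with a lower initiation interval.
void scale_split(const int *in, int *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = 2 * in[i];
}
```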
- MATCHUP: Memory Abstractions for Heap Manipulating Programs, FPGA'15
- This is an HLS toolchain aid.
- Follow-up work on the above FCCM'14 paper. This time they use LEAP scratchpads as the underlying caching block.
- Just-in-Time Compilation for Verilog, ASPLOS'19
- Chisel: Constructing Hardware in a Scala Embedded Language, DAC'12
- Chisel is being actively improved and used by UCB folks.
- Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs, FPGA'18
- From JVM to FPGA: Bridging Abstraction Hierarchy via Optimized Deep Pipelining, HotCloud'18
- HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing, FPGA'19
- LINQits: Big Data on Little Clients, ISCA'13
- From Microsoft; used to express SQL-like functions (thus big data) and runs on ZYNQ (thus little clients).
- You write C#, LINQits translates it to Verilog, and the whole thing runs on a ZYNQ (ARM+FPGA) board.
- Lime: a Java-Compatible and Synthesizable Language for Heterogeneous Architectures, OOPSLA'10
- Lime is a Java-based programming model and runtime from IBM which aims to provide a single unified language to program heterogeneous architectures, from FPGAs to conventional CPUs.
- A line of work from Stanford:
- Generating configurable hardware from parallel patterns, ASPLOS'16
- Plasticine: A Reconfigurable Architecture For Parallel Patterns, ISCA'17
- Spatial: A Language and Compiler for Application Accelerators, PLDI'18
- Spatial generates Chisel code along with C++ code which can be used on a host CPU to control the execution of the accelerator on the target FPGA.
- This kind of academic paper must have a lot of good ideas. But the truth is, the tooling may not be reliable because it comes from academic labs.
- Map-reduce as a Programming Model for Custom Computing Machines, FCCM'08
- This paper proposes a model to translate MapReduce code written in C into code that can run on FPGAs and GPUs. Many details are omitted, and they don't really have the compiler.
- Single-host framework, everything is in FPGA and GPU.
- Axel: A Heterogeneous Cluster with FPGAs and GPUs, FPGA'10
- A distributed MapReduce framework targeting clusters with CPUs, GPUs, and FPGAs. Mainly about the idea of scheduling FPGA/GPU jobs.
- Distributed Framework.
- FPMR: MapReduce Framework on FPGA, FPGA'10
- A MapReduce framework on a single host's FPGA. You need to write Verilog/HLS processing logic to hook into their framework. The framework mainly includes a data transfer controller and a simple scheduler that enables certain blocks at certain times.
- Single-host framework, everything is in FPGA.
- Melia: A MapReduce Framework on OpenCL-Based FPGAs, IEEE'16
- Another framework, written in OpenCL; users can program in OpenCL as well. Similar to previous work, it's more about the framework design than specific algorithms on the FPGA.
- Single-host framework, everything is in FPGA. But they include a discussion of running on multiple FPGAs.
- Four MapReduce-on-FPGA papers are listed here; I believe there are more. The marriage between MapReduce and FPGAs is not hard to understand: an FPGA can be viewed as another core with different capabilities. The real question is, given the FPGA's reprogramming time and limited on-board memory, how to design a good scheduling algorithm and data movement/caching mechanisms. These papers give some hints on this.
- UCLA: When Apache Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration, HotCloud'16
- UCLA: Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale, SoCC'16
- A system that hooks FPGA with Spark.
- There is a line of work that hooks FPGAs into big-data processing frameworks (Spark), so the FPGA implementation and the scale-out software can be separated. Spark can schedule FPGA jobs to different machines and take care of scale-out, failure handling, etc. But I personally think this line of work is really just an extension of the ReconOS/FUSE/BORPH line: both try to integrate jobs running on the CPU with jobs running on the FPGA, so the CPU and FPGA have an easier way to talk, or put another way, a better division of labor. Whether it's single-machine (like ReconOS, Melia) or distributed (like Blaze, Axel), they are essentially the same.
- UCLA: Heterogeneous Datacenters: Options and Opportunities, DAC'16
- Follow-up work on Blaze. Nice comparison of big and wimpy cores.
- Huawei: FPGA as a Service in the Cloud
- UCLA: Customizable Computing: From Single Chip to Datacenters, IEEE'18
- UCLA: Accelerator-Rich Architectures: Opportunities and Progresses, DAC'14
- Reminds me of OmniX. Disaggregation at a different scale.
- This paper actually targets the single-machine case, but it can reflect a distributed setting.
- Enabling FPGAs in the Cloud, CF'14
- The paper raises four important aspects of enabling FPGAs in the cloud: abstraction, sharing, compatibility, and security. The FPGA itself requires a shell (the paper calls it service logic) and is partitioned into multiple slots. The things discussed in the paper are straightforward but worth reading. They did not solve the FPGA sharing issue, which is solved by AmorphOS.
- FPGAs in the Cloud: Booting Virtualized Hardware Accelerators with OpenStack, FCCM'14
- Uses OpenStack to manage FPGA resources. The FPGA is partitioned into multiple regions, each of which can use PR. The FPGA shell includes: 1) a basic MAC and packet dispatcher; 2) a memory controller with a segment-based partition scheme; 3) a soft processor used for runtime PR control. One very important aspect of this project: they envision input to the FPGA coming from Ethernet, which is very true nowadays, and which also makes the project quite similar to Catapult. It's a very solid paper, though the evaluation is a little weak. What could be added: migration and differently-sized regions.
- The above CF and FCCM papers are similar in that both build a SW framework and a HW shell to provide a unified cloud management system. They differ in their shell design: the CF one takes inputs from a DMA engine (i.e., local system DRAM), while the FCCM one takes inputs from Ethernet. The parts after the DMA or MAC are essentially similar.
- It seems all of them use simple segment-based memory partitioning for user FPGA logic. What are the pros and cons of using paging here?
- S1 DyRACT: A partial reconfiguration enabled accelerator and test platform, FPL'14
- S2 Virtualized FPGA Accelerators for Efficient Cloud Computing, CloudCom'15
- S3 Designing a Virtual Runtime for FPGA Accelerators in the Cloud, FPL'16
- S4 Virtualized Execution Runtime for FPGA Accelerators in the Cloud, IEEE Access'17
- The above four papers come from the same group. S1 developed a framework that uses PCIe to do PR, okay. S2 is a follow-up on S1; read S2's Chapter IV on hardware architecture for many implementation details, like the internal FPGA switch and AXI-stream interface, but there is no memory virtualization discussion. S3 is a two-page short paper. S4 is the realization of S3. I was particularly interested in whether S4 implemented its own virtual memory management. The answer is NO. S4 leverages on-chip Linux; they just build a customized MMU (using BRAM to store page tables; this approach is similar to the papers listed in the Integrate with Virtual Memory part). Many things discussed in S4 have been proposed multiple times in previous cloud FPGA papers since 2014.
- MS: A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services, ISCA'14
- MS: A Cloud-Scale Acceleration Architecture, Micro'16
- Catapult is unique in its shell, which includes the Lightweight Transport Layer (LTL) and the Elastic Router (ER). The cloud management part, which the papers only briefly mention, should actually include everything the above CF'14 and FCCM'14 papers have. The LTL has congestion control, packet-loss detection/retransmission, and ACK/NACK. The ER is a crossbar switch used by FPGA-internal modules, which is essential for connecting the shell and the roles.
- These two Catapult papers are simply a must read.
- MS: A Configurable Cloud-Scale DNN Processor for Real-Time AI, Micro'18
- MS: Azure Accelerated Networking: SmartNICs in the Public Cloud, NSDI'18
- MS: Direct Universal Access: Making Data Center Resources Available to FPGA, NSDI'19
- Catapult is just sweet, isn't it?
- ASIC Clouds: Specializing the Datacenter, ISCA'16
- MS: ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware, SIGCOMM'16
- MS: Multi-Path Transport for RDMA in Datacenters, NSDI'18
- MS: Azure Accelerated Networking: SmartNICs in the Public Cloud, NSDI'18
- Mellanox. NICA: An Infrastructure for Inline Acceleration of Network Applications, ATC'19
- The Case For In-Network Computing On Demand, EuroSys'19
- Fast, Scalable, and Programmable Packet Scheduler in Hardware, SIGCOMM'19
- HPCC: high precision congestion control, SIGCOMM'19
- Offloading Distributed Applications onto SmartNICs using iPipe, SIGCOMM'19
- Not necessarily FPGAs, but SmartNICs. The actor programming model seems a good fit. There is another paper from ATC'19 that optimizes a distributed actor runtime.
- Cognitive SSD: A Deep Learning Engine for In-Storage Data Retrieval, ATC'19
- INSIDER: Designing In-Storage Computing System for Emerging High-Performance Drive, ATC'19
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA'15
- From High-Level Deep Neural Models to FPGAs, ISCA'16
- Deep Learning on FPGAs: Past, Present, and Future, arXiv'16
- Accelerating binarized neural networks: Comparison of FPGA, CPU, GPU, and ASIC, FPT'16
- FINN: A Framework for Fast, Scalable Binarized Neural Network Inference, FPGA'17
- In-Datacenter Performance Analysis of a Tensor Processing Unit, ISCA'17
- Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs, FPGA'17
- A Configurable Cloud-Scale DNN Processor for Real-Time AI, ISCA'18
- A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks, MICRO'18
- DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs, ICCAD'18
- FA3C: FPGA-Accelerated Deep Reinforcement Learning, ASPLOS'19
- Cognitive SSD: A Deep Learning Engine for In-Storage Data Retrieval, ATC'19
- A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing, ISCA'15
- Energy Efficient Architecture for Graph Analytics Accelerators, ISCA'16
- Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search, FPGA'17
- FPGA-Accelerated Transactional Execution of Graph Workloads, FPGA'17
- An FPGA Framework for Edge-Centric Graph Processing, CF'18
- Achieving 10Gbps line-rate key-value stores with FPGAs, HotCloud'13
- Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached, ISCA'13
- An FPGA Memcached Appliance, FPGA'13
- Scaling out to a Single-Node 80Gbps Memcached Server with 40Terabytes of Memory, HotStorage'15
- KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC, SOSP'17
- This link is also useful for better understanding: Morning Paper.
- Ultra-Low-Latency and Flexible In-Memory Key-Value Store System Design on CPU-FPGA, FPT'18
- When Apache Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration, HotCloud'16
- FPGA Accelerated INDEL Realignment in the Cloud, HPCA'19
- Consensus in a Box: Inexpensive Coordination in Hardware, NSDI'16
- TODO
- TODO
- TODO
- FPGA and CPLD architectures: a tutorial, 1996
- Reconfigurable computing: a survey of systems and software, 2002
- Reconfigurable computing: architectures and design methods
- FPGA Architecture: Survey and Challenges, 2007
- Read the first two paragraphs of each section and then come back to read all of that if needed.
- RAMP: Research Accelerator For Multiple Processors, 2007
- Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology, IEEE'15
- FPGA Dynamic and Partial Reconfiguration: A Survey of Architectures, Methods, and Applications, CSUR'18
- Must read.
- DyRACT: A partial reconfiguration enabled accelerator and test platform, FPL'14
- A high speed open source controller for FPGA partial reconfiguration
- Hardware context-switch methodology for dynamically partially reconfigurable systems, 2010
- FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs, 1994
- Combinational Logic Synthesis for LUT Based Field Programmable Gate Arrays, 1996
- DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs, 2004