FPGA Programming for the Masses
When looking at how hardware influences computing performance, we have general-purpose processors (GPPs) on one end of the spectrum and application-specific integrated circuits (ASICs) on the other. Processors are highly programmable but often inefficient in terms of power and performance. ASICs implement a dedicated and fixed function and provide the best power and performance characteristics, but any functional change requires a complete (and extremely expensive) re-spinning of the circuits.
Fortunately, several architectures exist between these two extremes. Programmable logic devices (PLDs) are one such example, providing the best of both worlds. They are closer to the hardware and can be reprogrammed.
The most prominent example of a PLD is a field programmable gate array (FPGA). It consists of look-up tables (LUTs), which are used to implement combinational logic; and flip-flops (FFs), which are used to implement sequential logic. Apart from the homogeneous array of logic cells, an FPGA also contains discrete components such as BRAMs (block RAMs), digital signal processing (DSP) slices, processor cores, and various communication cores (for example, Ethernet MAC and PCIe).
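To make that mapping concrete, the following minimal Verilog sketch (module and signal names are illustrative, not from the article) shows both kinds of logic side by side: a synthesis tool implements the combinational XOR with LUTs and the clocked assignment with flip-flops.

    // Combinational logic (maps to LUTs) feeding sequential logic (maps to FFs).
    module lut_ff_demo (
        input  wire       clk,
        input  wire [3:0] a, b,
        output reg  [3:0] q
    );
        wire [3:0] sum = a ^ b;   // combinational: synthesized into LUTs

        always @(posedge clk)     // sequential: synthesized into flip-flops
            q <= sum;
    endmodule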
BRAMs, which are specialized memory structures distributed throughout the FPGA fabric in columns, are of particular importance. Each BRAM can hold up to 36Kbits of data. BRAMs can be configured in various width and depth combinations and can be cascaded to form larger logical memory structures. Because of this distributed organization, BRAMs can collectively provide terabytes per second of bandwidth for memory bandwidth-intensive applications. Xilinx and Altera dominate the PLD market, collectively holding more than an 85% share.
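As a hedged illustration of how designers use BRAMs, the Verilog sketch below describes a simple synchronous memory of 1,024 words of 36 bits (exactly 36Kbits); synthesis tools typically infer a single BRAM from this pattern and cascade several blocks when a larger logical memory is requested. The module name and port widths are assumptions for the example.

    // A 1,024 x 36-bit synchronous RAM; FPGA tools typically map this
    // onto one 36Kbit BRAM. Larger arrays are built by cascading blocks.
    module bram_sketch (
        input  wire        clk,
        input  wire        we,
        input  wire [9:0]  waddr, raddr,
        input  wire [35:0] wdata,
        output reg  [35:0] rdata
    );
        reg [35:0] mem [0:1023];          // 1,024 x 36 bits = 36Kbits

        always @(posedge clk) begin
            if (we)
                mem[waddr] <= wdata;      // synchronous write port
            rdata <= mem[raddr];          // synchronous read port
        end
    endmodule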
FPGAs were long considered low-volume, low-density ASIC replacements. Following Moore's Law, however, FPGAs are getting denser and faster. Modern-day FPGAs can have up to two million logic cells, 68Mbits of BRAM, more than 3,000 DSP slices, and up to 96 transceivers for implementing multigigabit communication channels.23 The latest FPGA families from Xilinx and Altera are more like a system-on-chip (SoC), mixing dual-core ARM processors with programmable logic on the same fabric. Coupled with higher device density and performance, FPGAs are quickly replacing ASICs and application-specific standard products (ASSPs) for implementing fixed-function logic. Analysts expect the programmable integrated circuit (IC) market to reach the $10 billion mark by 2016.
The most perplexing fact is how an FPGA running at a clock frequency an order of magnitude lower than that of CPUs and GPUs (graphics processing units) is able to outperform them. In several classes of applications, especially floating-point-based ones, GPU performance is either slightly better than or very close to that of an FPGA. When it comes to power efficiency (performance per watt), however, both CPUs and GPUs significantly lag behind FPGAs, as shown in the available literature comparing the performance of CPUs, GPUs, and FPGAs for different classes of applications.6,18,19
The contrast in performance between processors and FPGAs lies in the architecture itself. Processors rely on the von Neumann paradigm, in which an application is compiled and stored in instruction and data memory. They typically work on a fetch-decode-execute-store pipeline, meaning both instructions and data have to be fetched from external memory into the processor pipeline. Although caches alleviate the cost of expensive fetch operations from external memory, each cache miss incurs a severe penalty. The bandwidth between processor and memory is thus often the critical factor in determining overall performance, a phenomenon known as "hitting the memory wall."
FPGAs have programmable logic cells that can be used to implement an arbitrary logic function both spatially and temporally. An FPGA design implements the data path and control path directly, thereby eliminating the fetch and decode stages altogether. The distributed on-chip memory provides much-needed bandwidth to satisfy the demands of the concurrent logic. The inherent fine-grained architecture of FPGAs is very well suited to exploiting the various forms of parallelism present in an application, ranging from bit-level to task-level parallelism. Apart from the conventional reconfiguration capability, where the entire FPGA fabric is programmed with an image before execution, FPGAs are also capable of partial dynamic reconfiguration: part of the FPGA can be loaded with a new image while the rest of the FPGA remains functional. This is similar in spirit to paging and virtual memory in the processor world.
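The spatial style described above can be illustrated with a small, deliberately simplified Verilog sketch: a pipelined multiply-accumulate in which each operation is a dedicated piece of hardware, so a new operand pair enters every cycle with no instruction fetch or decode. The multiplier would typically map onto a DSP slice; the module name and bit widths are illustrative assumptions.

    // A 3-stage multiply-accumulate pipeline: the "program" is the wiring.
    module mac_pipeline (
        input  wire        clk, rst,
        input  wire [15:0] a, b,
        output reg  [35:0] acc
    );
        reg [15:0] a_r, b_r;   // stage 1: operand registers
        reg [31:0] prod;       // stage 2: product register (maps to a DSP slice)

        always @(posedge clk) begin
            if (rst) begin
                a_r <= 0; b_r <= 0; prod <= 0; acc <= 0;
            end else begin
                a_r  <= a;             // stage 1: capture operands
                b_r  <= b;
                prod <= a_r * b_r;     // stage 2: multiply
                acc  <= acc + prod;    // stage 3: accumulate
            end
        end
    endmodule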
Various kinds of FPGA-based systems are available today. They range from heterogeneous systems targeted at high-performance computing that tightly couple FPGAs with conventional CPUs (for example, Convey Computers), to midrange commercial-off-the-shelf workstations that use PCIe-attached FPGAs, to low-end embedded systems that integrate embedded processors directly into the FPGA fabric or on the same chip.
Programming Challenges
Despite the advantages offered by FPGAs and their rapid growth, use of FPGA technology is restricted to a narrow segment of hardware programmers. The larger community of software programmers has stayed away from this technology, largely because of the challenges experienced by beginners trying to learn and use FPGAs.
Abstraction. FPGAs are predominantly programmed using hardware description languages (HDLs) such as Verilog and VHDL. These languages, which date back to the 1980s and have seen few revisions, are very low level in terms of the abstraction offered to the user. A hardware designer thinks about the design in terms of low-level building blocks such as gates, registers, and multiplexors. VHDL and Verilog are well suited for describing a design at that level of abstraction.
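A small hedged example makes the point: even the trivial behavior "each cycle, x becomes a or b depending on sel" forces the Verilog author to spell out the multiplexor and the register explicitly. The module name and widths here are illustrative.

    // The designer describes the hardware, not the computation:
    // an explicit 2:1 multiplexor feeding an explicit 8-bit register.
    module mux_reg (
        input  wire       clk, rst, sel,
        input  wire [7:0] a, b,
        output reg  [7:0] x
    );
        wire [7:0] next_x = sel ? a : b;   // 2:1 multiplexor

        always @(posedge clk) begin        // 8-bit register with reset
            if (rst)
                x <= 8'd0;
            else
                x <= next_x;
        end
    endmodule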