From d432b96f958db9145ffb5a8d80ecc2bd84d96f47 Mon Sep 17 00:00:00 2001 From: Ibrahim Abu Kharmeh Date: Mon, 14 Feb 2022 16:56:26 +0000 Subject: [PATCH] Convert documentation to CV32E41P (#19) * Convert documentation to CV32E41P * Update links and Verible version * Add comments about the RTL status Co-authored-by: Tariq Kurd --- README.md | 24 +-- docs/source/apu.rst | 2 +- docs/source/control_status_registers.rst | 49 +------ docs/source/core_versions.rst | 50 ++----- docs/source/fpu.rst | 2 +- docs/source/integration.rst | 18 ++- docs/source/intro.rst | 177 ++--------------------- docs/source/load_store_unit.rst | 15 +- docs/source/pipeline.rst | 8 +- docs/source/register_file.rst | 6 +- 10 files changed, 64 insertions(+), 287 deletions(-) diff --git a/README.md b/README.md index 6ee334e..6724af5 100644 --- a/README.md +++ b/README.md @@ -5,9 +5,7 @@ CV32E41P is a small and efficient, 32-bit, in-order RISC-V core with a 4-stage pipeline that implements the RV32IM\[F,Zfinx\]C\[Zce\] instruction set architecture, and the Xpulp custom extensions for achieving higher code density, performance, and energy efficiency \[[1](https://doi.org/10.1109/TVLSI.2017.2654506)\], \[[2](https://doi.org/10.1109/PATMOS.2017.8106976)\]. -It started its life as a fork of the CV32E40P core to implement the official RISC-V [Zfinx](https://github.com/riscv/riscv-zfinx/blob/main/zfinx-spec-20210511-0.41.pdf) and [Zce](https://github.com/riscv/riscv-code-size-reduction/blob/master/ISA%20proposals/Huawei/Zce_spec.adoc) ISA extensions. - -A first implementation of the Zce ISA extensions has been explored in \[[3](https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/461404/1/CARRV2020_paper_12_Perotti.pdf)\] to investigate code reduction benefits. 
+It started its life as a fork of the CV32E40P core to implement the official RISC-V [Zfinx](https://github.com/riscv/riscv-zfinx/blob/main/zfinx-spec-20210511-0.41.pdf) and [Zce](https://github.com/riscv/riscv-code-size-reduction/releases/tag/V0.50.1-TOOLCHAIN-DEV) ISA extensions. ## Documentation @@ -46,27 +44,13 @@ When contributing SystemVerilog source code, please try to be consistent and adh coding style guide](https://github.com/lowRISC/style-guides/blob/master/VerilogCodingStyle.md). To get started, please check out the ["Good First Issue" - list](https://github.com/openhwgroup/cv32e40p/issues?q=is%3Aissue+is%3Aopen+-label%3Astatus%3Aresolved+label%3A%22good+first+issue%22). + list](https://github.com/openhwgroup/cv32e41p/issues?q=is%3Aissue+is%3Aopen+-label%3Astatus%3Aresolved+label%3A%22good+first+issue%22). -The RTL code has been formatted with ["Verible"](https://github.com/google/verible) v0.0-1149-g7eae750. +The RTL code has been formatted with ["Verible"](https://github.com/google/verible) v0.0-1824-ga3b5bedf. ## Issues and Troubleshooting If you find any problems or issues with CV32E41P or the documentation, please check out the [issue - tracker](https://github.com/openhwgroup/cv32e40p/issues) and create a new issue if your problem is + tracker](https://github.com/openhwgroup/cv32e41p/issues) and create a new issue if your problem is not yet tracked. -## References - -1. [Gautschi, Michael, et al. "Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices." - in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 10, pp. 2700-2713, Oct. 2017](https://doi.org/10.1109/TVLSI.2017.2654506) - -2. [Schiavone, Pasquale Davide, et al. "Slow and steady wins the race? A comparison of - ultra-low-power RISC-V cores for Internet-of-Things applications." - _27th International Symposium on Power and Timing Modeling, Optimization and Simulation - (PATMOS 2017)_](https://doi.org/10.1109/PATMOS.2017.8106976) - -3. 
[Perotti, Matteo, et al. "HW/SW approaches for RISC-V code size reduction." - Workshop on Computer Architecture Research with RISC-V (CARRV 2020). 2020.](https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/461404/1/CARRV2020_paper_12_Perotti.pdf) - - diff --git a/docs/source/apu.rst b/docs/source/apu.rst index 66d26dc..b9caeb7 100644 --- a/docs/source/apu.rst +++ b/docs/source/apu.rst @@ -63,7 +63,7 @@ The CV32E41P apu interface can cause up to two outstanding transactions. Connection with the FPU ----------------------- -The CV32E41P sends FP operands over the ``apu_operands_o`` bus; the decoded RV32F operation as ADD, SUB, MUL, etc through the ``apu_op_o`` bus; the cast, destination and source formats as well as rounding mode through the ``apu_flags_o`` bus. The respose is the FPU result and relative output flags as Overflow, Underflow, etc. +The CV32E41P sends FP operands over the ``apu_operands_o`` bus; the decoded RV32F operation as ADD, SUB, MUL, etc through the ``apu_op_o`` bus; the cast, destination and source formats as well as rounding mode through the ``apu_flags_o`` bus. The response is the FPU result and relative output flags as Overflow, Underflow, etc. APU Tracer diff --git a/docs/source/control_status_registers.rst b/docs/source/control_status_registers.rst index 9883db5..2e37d55 100644 --- a/docs/source/control_status_registers.rst +++ b/docs/source/control_status_registers.rst @@ -442,71 +442,30 @@ Detailed: +=============+============+========================================================================+ | 31:30 | RO (0x1) | **MXL** (Machine XLEN). | +-------------+------------+------------------------------------------------------------------------+ -| 29:26 | RO (0x0) | (Reserved). | -+-------------+------------+------------------------------------------------------------------------+ -| 25 | RO (0x0) | **Z** (Reserved). Read-only; writes are ignored. 
| -+-------------+------------+------------------------------------------------------------------------+ -| 24 | RO (0x0) | **Y** (Reserved). | -+-------------+------------+------------------------------------------------------------------------+ | 23 | RO | **X** (Non-standard extensions present). | +-------------+------------+------------------------------------------------------------------------+ -| 22 | RO (0x0) | **W** (Reserved). | -+-------------+------------+------------------------------------------------------------------------+ -| 21 | RO (0x0) | **V** (Tentatively reserved for Vector extension). | -+-------------+------------+------------------------------------------------------------------------+ -| 20 | RO (0x0) | **U** (User mode implemented). | -+-------------+------------+------------------------------------------------------------------------+ -| 19 | RO (0x0) | **T** (Tentatively reserved for Transactional Memory extension). | -+-------------+------------+------------------------------------------------------------------------+ -| 18 | RO (0x0) | **S** (Supervisor mode implemented). | -+-------------+------------+------------------------------------------------------------------------+ -| 17 | RO (0x0) | **R** (Reserved). | -+-------------+------------+------------------------------------------------------------------------+ -| 16 | RO (0x0) | **Q** (Quad-precision floating-point extension). | -+-------------+------------+------------------------------------------------------------------------+ -| 15 | RO (0x0) | **P** (Tentatively reserved for Packed-SIMD extension). | -+-------------+------------+------------------------------------------------------------------------+ -| 14 | RO (0x0) | **O** (Reserved). | -+-------------+------------+------------------------------------------------------------------------+ -| 13 | RO (0x0) | **N** (User-level interrupts supported). 
| -+-------------+------------+------------------------------------------------------------------------+ | 12 | RO (0x1) | **M** (Integer Multiply/Divide extension). | +-------------+------------+------------------------------------------------------------------------+ -| 11 | RO (0x0) | **L** (Tentatively reserved for Decimal Floating-Point extension). | -+-------------+------------+------------------------------------------------------------------------+ -| 10 | RO (0x0) | **K** (Reserved). | -+-------------+------------+------------------------------------------------------------------------+ -| 9 | RO (0x0) | **J** (Tentatively reserved for Dynamically Translated Languages | -| | | extension). | -+-------------+------------+------------------------------------------------------------------------+ | 8 | RO (0x1) | **I** (RV32I/64I/128I base ISA). | +-------------+------------+------------------------------------------------------------------------+ -| 7 | RO (0x0) | **H** (Hypervisor extension). | -+-------------+------------+------------------------------------------------------------------------+ -| 6 | RO (0x0) | **G** (Additional standard extensions present). | -+-------------+------------+------------------------------------------------------------------------+ | 5 | RO | **F** (Single-precision floating-point extension). | +-------------+------------+------------------------------------------------------------------------+ -| 4 | RO (0x0) | **E** (RV32E base ISA). | -+-------------+------------+------------------------------------------------------------------------+ -| 3 | RO (0x0) | **D** (Double-precision floating-point extension). | -+-------------+------------+------------------------------------------------------------------------+ | 2 | RO (0x1) | **C** (Compressed extension). 
| +-------------+------------+------------------------------------------------------------------------+ -| 1 | RO (0x0) | **B** (Tentatively reserved for Bit-Manipulation extension). | -+-------------+------------+------------------------------------------------------------------------+ -| 0 | RO (0x0) | **A** (Atomic extension). | +| others | RO (0x0) | All other fields read as zero | +-------------+------------+------------------------------------------------------------------------+ All bitfields in the ``misa`` CSR read as 0 except for the following: * **C** = 1 -* **F** = 1 if ``FPU`` = 1 +* **F** = 1 if ``FPU`` = 1 and ``ZFINX`` = 0 * **I** = 1 * **M** = 1 * **X** = 1 if ``PULP_XPULP`` = 1 or ``PULP_CLUSTER`` = 1 * **MXL** = 1 (i.e. XLEN = 32) +The bit positions are shown in the table above. + Machine Interrupt Enable Register (``mie``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/core_versions.rst b/docs/source/core_versions.rst index 0bafb52..b99dadb 100644 --- a/docs/source/core_versions.rst +++ b/docs/source/core_versions.rst @@ -23,9 +23,7 @@ The tuple identify which sets of parameters have been verified by OpenHW Group, and once RTL Freeze is achieved, no further non-logically equivalent changes are allowed on that set of parameters. -The RTL Freeze version of the core is indentified by a GitHub -tag with the format cv32e41p_vMAJOR.MINOR.PATCH (e.g. cv32e41p_v1.0.0). -In addition, the release date is reported in the documentation. +The core RTL is not yet frozen, but it's kept sequentially equivalent to CV32E40P for the RV32IMC subset except for don't care states. What happens after RTL Freeze? ------------------------------ @@ -40,13 +38,13 @@ value and the bug and the fix must be documented. These changes are visible by software as the ``mimpid`` has a different value. Every bug or set of bugs found must be followed by another RTL Freeze release and a new GitHub tag. 
-RTL changes on non-verified yet parameters -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +RTL changes on unverified parameters +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If changes affecting the core on a non-frozen parameter set are required, as for example, to fix bugs found in the communication to the FPU (e.g., affecting the core only if ``FPU=1``), or to change the ISA Extensions decoding of PULP instructions (e.g., affecting the core only if ``PULP_XPULP=1``), -then such changes must remain logically equivalent for the already frozen set of parameters (except for the required mimpid update), and they must be applied on a different ``mimpid`` value. They can be non-logically equivalent to a non-frozen set of parameters. +then such changes must remain logically equivalent for the already frozen set of parameters (except for the required ``mimpid`` update), and they must be applied on a different ``mimpid`` value. They can be non-logically equivalent to a non-frozen set of parameters. These changes are visible by software as the ``mimpid`` has a different value. Once the new set of parameters is verified and achieved the sign-off for RTL freeze, a new GitHub tag and version of the core is released. @@ -60,21 +58,11 @@ If PPA optimizations are logically-equivalent instead, they can be applied witho changing the ``mimpid`` value (as such changes are not visible in software). However, a new GitHub tag should be release and changes documented. -:numref:`rtl_freeze_rules` shows the aforementioned rules. - -.. figure:: ../images/rtl_freeze_rules.png - :name: rtl_freeze_rules - :align: center - :alt: - - Versions control of CV32E41P - - Released core versions ---------------------- The verified parameter sets of the core, their implementation version, GitHub tags, -and dates are reported here. +and dates will be reported here. 
``mimpid=0`` ------------ @@ -84,27 +72,21 @@ The ``mimpid=0`` refers to the CV32E41P core verified with the following paramet +---------------------------+-------+ | Name | Value | +===========================+=======+ -| ``FPU`` | 0 | -+---------------------------+-------+ | ``NUM_MHPMCOUNTERS`` | 1 | +---------------------------+-------+ | ``PULP_CLUSTER`` | 0 | +---------------------------+-------+ | ``PULP_XPULP`` | 0 | +---------------------------+-------+ -| ``PULP_ZFINX`` | 0 | +| ``FPU`` | 0 | ++---------------------------+-------+ +| ``ZFINX`` | 0 | ++---------------------------+-------+ +| ``Zcea`` | 0 | ++---------------------------+-------+ +| ``Zceb`` | 0 | ++---------------------------+-------+ +| ``Zcec`` | 0 | ++---------------------------+-------+ +| ``Zcee`` | 0 | +---------------------------+-------+ - -Following, all the GitHub tags related to ``mimpid=0``. - -+--------------------+-------------------+------------+--------------------+---------+ -| Git Tag | Tagged By | Date | Reason for Release | Comment | -+====================+===================+============+====================+=========+ -| cv32e41p_v1.0.0 | Arjan Bink | 2020-12-10 | RTL Freeze | | -+--------------------+-------------------+------------+--------------------+---------+ - -The list of open (waived) issues at the time of applying the cv32e41p_v1.0.0 tag can be found at: - -* https://github.com/openhwgroup/core-v-docs/blob/master/program/milestones/CV32E41P/RTL_Freeze_v1.0.0/Design_openissues.md -* https://github.com/openhwgroup/core-v-docs/blob/master/program/milestones/CV32E41P/RTL_Freeze_v1.0.0/Verification_openissues.md -* https://github.com/openhwgroup/core-v-docs/blob/master/program/milestones/CV32E41P/RTL_Freeze_v1.0.0/Documentation_openissues.md diff --git a/docs/source/fpu.rst b/docs/source/fpu.rst index f39c2d1..0945f11 100644 --- a/docs/source/fpu.rst +++ b/docs/source/fpu.rst @@ -31,7 +31,7 @@ In the core repository, a wrapper showing how the FPU is connected to 
the core is available at ``example_tb/core/cv32e41p_fp_wrapper.sv``. By default a dedicated register file consisting of 32 floating-point registers, ``f0``-``f31``, is instantiated. This default behavior -can be overruled by setting the parameter **PULP_ZFINX** of the toplevel +can be overruled by setting the parameter **ZFINX** of the toplevel file ``cv32e41p_core.sv`` to 1, in which case the dedicated register file is not included and the general purpose register file is used instead to host the floating-point operands. diff --git a/docs/source/integration.rst b/docs/source/integration.rst index ffae639..0cc62fb 100644 --- a/docs/source/integration.rst +++ b/docs/source/integration.rst @@ -33,7 +33,11 @@ Instantiation Template .NUM_MHPMCOUNTERS ( 1 ), .PULP_CLUSTER ( 0 ), .PULP_XPULP ( 0 ), - .PULP_ZFINX ( 0 ) + .ZFINX ( 0 ), + .Zcea ( 0 ), //FIXME these will change names + .Zceb ( 0 ), //when moving to v0.70 + .Zcec ( 0 ), + .Zcee ( 0 ) ) u_core ( // Clock and reset .clk_i (), @@ -95,7 +99,7 @@ Parameters ---------- .. note:: - The non-default (i.e. non-zero) settings of ``FPU``, ``PULP_CLUSTER``, ``PULP_XPULP`` and ``PULP_ZFINX`` have not + The non-default (i.e. non-zero) settings of ``FPU``, ``PULP_CLUSTER``, ``PULP_XPULP`` and ``ZFINX`` have not been verified yet. The default parameter value for ``PULP_XPULP`` will be changed to 1 once it has been verified. The default configuration reflected below is currently under verification and this verification effort will be completed first. @@ -124,11 +128,19 @@ Parameters | | | | (see :ref:`corev_hardware_loop`). 
| | | | | | +------------------------------+-------------+------------+------------------------------------------------------------------+ -| ``PULP_ZFINX`` | bit | 0 | Enable Floating Point instructions to use the General Purpose | +| ``ZFINX`` | bit | 0 | Enable Floating Point instructions to use the General Purpose | | | | | register file instead of requiring a dedicated Floating Point | | | | | register file, see :ref:`fpu`. Only allowed to be set to 1 | | | | | if ``FPU`` = 1 | +------------------------------+-------------+------------+------------------------------------------------------------------+ +| ``Zcea`` | bit | 0 | Enable all Zcea instructions from Zce v0.50.1 | ++------------------------------+-------------+------------+------------------------------------------------------------------+ +| ``Zceb`` | bit | 0 | Enable all Zceb instructions from Zce v0.50.1 | ++------------------------------+-------------+------------+------------------------------------------------------------------+ +| ``Zcec`` | bit | 0 | Enable all Zcec instructions from Zce v0.50.1 | ++------------------------------+-------------+------------+------------------------------------------------------------------+ +| ``Zcee`` | bit | 0 | Enable all Zcee instructions from Zce v0.50.1 | ++------------------------------+-------------+------------+------------------------------------------------------------------+ Interfaces ---------- diff --git a/docs/source/intro.rst b/docs/source/intro.rst index 742b930..dd5359a 100644 --- a/docs/source/intro.rst +++ b/docs/source/intro.rst @@ -59,7 +59,7 @@ It follows these specifications: CV32E41P implements the Machine ISA version 1.11. * `RISC-V External Debug Support, version 0.13.2 `_ -Many features in the RISC-V specification are optional, and CV32E41P can be parametrized to enable or disable some of them. +Many features in the RISC-V specification are optional, and CV32E41P can be parameterized to enable or disable some of them. 
CV32E41P supports the following base instruction set. @@ -94,9 +94,13 @@ In addition, the following standard instruction set extensions are available. - 2.0 - always enabled - * - **F**: Single-Precision Floating-Point + * - **F**: Single-Precision Floating-Point using F registers - 2.2 - - optionally enabled based on ``FPU`` parameter + - optionally enabled with the ``FPU`` parameter + + * - **Zfinx**: Single-Precision Floating-Point using X registers + - 1.0 + - optionally enabled with the ``ZFINX`` parameter (also requires the ``FPU`` parameter) The following custom instruction set extensions are available. @@ -109,22 +113,18 @@ The following custom instruction set extensions are available. * - **Xcorev**: CORE-V ISA Extensions (excluding **cv.elw**) - 1.0 - - optionally enabled based on ``PULP_XPULP`` parameter + - optionally enabled with the ``PULP_XPULP`` parameter * - **Xpulpcluster**: PULP Cluster Extension - 1.0 - - optionally enabled based on ``PULP_CLUSTER`` parameter - - * - **Xpulpzfinx**: PULP Share Integer (X) Registers with Floating Point (F) Register Extension - - 1.0 - - optionally enabled based on ``PULP_ZFINX`` parameter + - optionally enabled with the ``PULP_CLUSTER`` parameter Most content of the RISC-V privileged specification is optional. CV32E41P currently supports the following features according to the RISC-V Privileged Specification, version 1.11. * M-Mode * All CSRs listed in :ref:`cs-registers` -* Hardware Performance Counters as described in :ref:`performance-counters` based on ``NUM_MHPMCOUNTERS`` parameter +* Hardware Performance Counters as described in :ref:`performance-counters` controlled by the ``NUM_MHPMCOUNTERS`` parameter * Trap handling supporting direct mode or vectored mode as described at :ref:`exceptions-interrupts` @@ -159,9 +159,10 @@ be provided. FPGA Synthesis ^^^^^^^^^^^^^^^ -FPGA synthesis is supported for CV32E41P when the flip-flop based register -file is used. 
Since latches are not well supported on FPGAs, it is -crucial to select the flip-flop based register file. The user needs to provide +FPGA synthesis is only supported for CV32E41P when the flip-flop based register +file is used as latches are not well supported on FPGAs. + +The user needs to provide a technology specific implementation of a clock gating cell as described in :ref:`clock-gating-cell`. @@ -173,70 +174,6 @@ core can be found at `core-v-verif `_. -In early 2021 the CV32E41P achieved Functional RTL Freeze, meaning that is has -been fully verified as per its -`Verification Plan `_. -The top-level `README in core-v-verif `_ -has a link to the final functional, code and test coverage reports. - -The unofficial start date for the CV32E41P verification effort is 2020-02-27, -which is the date the core-v-verif environment "went live". Between then and -RTL Freeze, a total of 47 RTL issues and 38 User Manual issues were identified -and resolved [1]_. A breakdown of the RTL issues is as follows: - -.. 
table:: How RTL Issues Were Found - :name: How RTL Issues Were Found - - +---------------------+-------+----------------------------------------------------+ - | "Found By" | Count | Note | - +=====================+=======+====================================================+ - | Simulation | 18 | See classification below | - +---------------------+-------+----------------------------------------------------+ - | Inspection | 13 | Human review of the RTL | - +---------------------+-------+----------------------------------------------------+ - | Formal Verification | 13 | This includes both Designer and Verifier use of FV | - +---------------------+-------+----------------------------------------------------+ - | Lint | 2 | | - +---------------------+-------+----------------------------------------------------+ - | Unknown | 1 | | - +---------------------+-------+----------------------------------------------------+ - -A classification of the simulation issues by method used to identify them is informative: - -.. 
table:: Breakdown of Issues found by Simulation - :name: Breakdown of Issues found by Simulation - - +------------------------------+-------+----------------------------------------------------------------------------------------+ - | Simulation Method | Count | Note | - +==============================+=======+========================================================================================+ - | Directed, self-checking test | 10 | Many test supplied by Design team and a couple from the Open Source Community at large | - +------------------------------+-------+----------------------------------------------------------------------------------------+ - | Step & Compare | 6 | Issues directly attributed to S&C against ISS | - +------------------------------+-------+----------------------------------------------------------------------------------------+ - | Constrained-Random | 2 | Test generated by corev-dv (extension of riscv-dv) | - +------------------------------+-------+----------------------------------------------------------------------------------------+ - -A classification of the issues themselves: - -.. table:: Issue Classification - :name: Issue Classification - - +------------------------------+-------+----------------------------------------------------------------------------------------+ - | Issue Type | Count | Note | - +==============================+=======+========================================================================================+ - | RTL Functional | 40 | A bug! | - +------------------------------+-------+----------------------------------------------------------------------------------------+ - | RTL coding style | 4 | Linter issues, removing TODOs, removing `ifdefs, etc. 
| - +------------------------------+-------+----------------------------------------------------------------------------------------+ - | Non-RTL functional | 1 | Issue related to behavioral tracer (not part of the core) | - +------------------------------+-------+----------------------------------------------------------------------------------------+ - | Unreproducible | 1 | | - +------------------------------+-------+----------------------------------------------------------------------------------------+ - | Invalid | 1 | | - +------------------------------+-------+----------------------------------------------------------------------------------------+ - -Additional details are available as part of the `CV32E41P v1.0.0 Report `_. - Contents -------- @@ -257,89 +194,3 @@ Contents * :ref:`custom-isa-extensions` describes the custom instruction set extensions. * :ref:`glossary` provides definitions of used terminology. -History -------- - -CV32E41P started its life as a fork of the OR10N CPU core based on the OpenRISC ISA. Then, under the name of RI5CY, it became a RISC-V core (2016), and it has been maintained by the PULP platform team until February 2020, when it has been contributed to OpenHW Group https://www.openhwgroup.org>. - -As RI5CY has been used in several projects, a list of all the changes made by OpenHW Group since February 2020 follows: - -Memory-Protocol -^^^^^^^^^^^^^^^ - -The Instruction and Data memory interfaces are now compliant with the OBI protocol (see https://github.com/openhwgroup/core-v-docs/blob/master/cores/obi/OBI-v1.2.pdf). -Such memory interface is slightly different from the one used by RI5CY as: the grant signal can now be kept high by the bus even without the core raising a request; and the request signal does not depend anymore on the rvalid signal (no combinatorial dependency). The OBI is easier to be interfaced to the AMBA AXI and AHB protocols and improves timing as it removes rvalid->req dependency. 
Also, the protocol forces the address stability. Thus, the core can not retract memory requests once issued, nor can it change the issued address (as was the case for the RI5CY instruction memory interface). - -RV32F Extensions -^^^^^^^^^^^^^^^^ - -The FPU is not instantiated in the core EX stage anymore, and it must be attached to the APU interface. -Previously, RI5CY could select with a parameter whether the FPU was instantiated inside the EX stage or via the APU interface. - -RV32A Extensions, Security and Memory Protection -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -CV32E41P core does not support the RV32A (atomic) extensions, the U-mode, and the PMP anymore. -Most of the previous RTL descriptions of these features have been kept but not maintained. The RTL code has been partially kept to allow previous users of these features to develop their own by reusing previously developed RI5CY modules. - -CSR Address Re-Mapping -^^^^^^^^^^^^^^^^^^^^^^ - -CV32E41P is fully compliant with RISC-V. -RI5CY used to have custom performance counters 32b wide (not compliant with RISC-V) in the CSR address space -{0x7A0, 0x7A1, 0x780-0x79F}. CV32E41P is fully compliant with the RISC-V spec. -The custom PULP HWLoop CSRs moved from the 0x7C* to RISC-V user custom space 0x80* address space. - -Interrupts -^^^^^^^^^^ - -RI5CY used to have a req plus a 5bits ID interrupt interface, supporting up to 32 interrupt requests (only one active at a time), with the priority defined outside in an interrupt controller. CV32E41P is now compliant with the CLINT RISC-V spec, extended with 16 custom interrupts lines called fast, for a total of 19 interrupt lines. They can be all active simultaneously, and priority and per-request interrupt enable bit is controlled by the core CLINT definition. - -PULP HWLoop Spec -^^^^^^^^^^^^^^^^ - -RI5CY supported two nested HWLoops. Every loop had a minimum of two instructions. 
The start and end of the loop addresses -could be misaligned, and the instructions in the loop body could be of any kind. CV32E41P has a more restricted spec for the -HWLoop (see :ref:`hwloop-specs`). - -Compliancy, bug fixing, code clean-up, and documentation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The CV32E41P has been verified. It is fully compliant with RISC-V (RI5CY was partially compliant). Many bugs have been fixed, and the RTL code cleaned-up. The documentation has been formatted with reStructuredText and has been developed following at industrial quality level. - - - -References ----------- - -1. `Gautschi, Michael, et al. "Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices." in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 10, pp. 2700-2713, Oct. 2017 `_ - -2. `Schiavone, Pasquale Davide, et al. "Slow and steady wins the race? A comparison of ultra-low-power RISC-V cores for Internet-of-Things applications." 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS 2017) `_ - -Contributors ------------- - -| Andreas Traber - (`*atraber@iis.ee.ethz.ch* `__) - -Michael Gautschi -(`*gautschi@iis.ee.ethz.ch* `__) - -Pasquale Davide Schiavone -(`*pschiavo@iis.ee.ethz.ch* `__) - -Arjan Bink (`*arjan.bink@silabs.com* `__) - -Paul Zavalney (`*paul.zavalney@silabs.com* `__) - -| Micrel Lab and Multitherman Lab -| University of Bologna, Italy - -| Integrated Systems Lab -| ETH Zürich, Switzerland - - -.. [1] - It is a testament on the quality of the work done by the PULP platform team - that it took a team of professonal verification engineers more than 9 months - to find all these issues. 
diff --git a/docs/source/load_store_unit.rst b/docs/source/load_store_unit.rst index e65ed59..6e22e7a 100644 --- a/docs/source/load_store_unit.rst +++ b/docs/source/load_store_unit.rst @@ -127,6 +127,8 @@ one ``data_rvalid_i`` will be signalled for each of them, in the order they were Post-Incrementing Load and Store Instructions --------------------------------------------- +This section applies only when ``PULP_XPULP`` = 1. + Post-incrementing load and store instructions perform a load/store operation from/to the data memory while at the same time increasing the base address by the specified offset. For the memory access, the base @@ -139,16 +141,3 @@ instructions allow the address increment to be embedded in the memory access instructions and get rid of separate instructions to handle pointers. Coupled with hardware loop extension, these instructions allow to reduce the loop overhead significantly. - -.. only:: PMP - - Physical Memory Protection (PMP) Unit - ------------------------------------- - - The CV32E41P core has a PMP module which can be enabled by setting the - parameter PULP_SECURE=1 which also enabled the core to possibly run in - USER MODE. Such unit has a configurable number of entries (up to 16) and - supports all the modes as TOR, NAPOT and NA4. Every fetch, load and - store access executed in USER MODE are first filtered by the PMP unit - which can possibly generated exceptions. For the moment, the MPRV bit in - MSTATUS as well as the LOCK mechanism in the PMP are not supported. diff --git a/docs/source/pipeline.rst b/docs/source/pipeline.rst index 8249b92..8aecaa1 100644 --- a/docs/source/pipeline.rst +++ b/docs/source/pipeline.rst @@ -29,10 +29,10 @@ Pipeline Details CV32E41P has a 4-stage in-order completion pipeline, the 4 stages are: Instruction Fetch (IF) - Fetches instructions from memory via an aligning prefetch buffer, capable of fetching 1 instruction per cycle if the instruction side memory system allows. 
The IF stage also pre-decodes RVC instructions into RV32I base instructions. See :ref:`instruction-fetch` for details. + Fetches instructions from memory via an aligning prefetch buffer, capable of fetching 1 instruction per cycle if the instruction side memory system allows. See :ref:`instruction-fetch` for details. Instruction Decode (ID) - Decodes fetched instruction and performs required registerfile reads. Jumps are taken from the ID stage. + Decodes the fetched instruction and performs the required register file reads. Jumps are taken from the ID stage. Execute (EX) Executes the instructions. The EX stage contains the ALU, Multiplier and Divider. Branches (with their condition met) are taken from the EX stage. Multi-cycle instructions will stall this stage until they are complete. The ALU, Multiplier and Divider instructions write back their result to the register file from the EX stage. The address generation part of the load-store-unit (LSU) is contained in EX as well. @@ -43,8 +43,8 @@ Writeback (WB) Multi- and Single-Cycle Instructions ------------------------------------ -:numref:`Cycle counts per instruction type` shows the cycle count per instruction type. Some instructions have a variable time, this is indicated as a range e.g. 1..32 means -that the instruction takes a minimum of 1 cycle and a maximum of 32 cycles. The cycle counts assume zero stall on the instruction-side interface +:numref:`Cycle counts per instruction type` shows the cycle count per instruction type. Some instructions take a variable number of cycles; this is indicated as a range, e.g. 3..35 means +that the instruction takes a minimum of 3 cycles and a maximum of 35 cycles. The cycle counts assume zero stall on the instruction-side interface and zero stall on the data-side memory interface. .. 
table:: Cycle counts per instruction type diff --git a/docs/source/register_file.rst b/docs/source/register_file.rst index 2cd1201..5e87f2b 100644 --- a/docs/source/register_file.rst +++ b/docs/source/register_file.rst @@ -60,7 +60,7 @@ Simulation of the latch-based register file is possible using commercial tools. .. note:: The latch-based register file cannot be simulated using Verilator. -The latch-based register file can also be used for FPGA synthesis, but this is not recommended as FPGAs usually do not well support latches. +The latch-based register file can also be used for FPGA synthesis, but this is not recommended as FPGAs may not support latches. To select the latch-based register file, make sure to use the source file ``cv32e41p_register_file_latch.sv`` in your project. In addition, a technology-specific clock gating cell must be provided to keep the clock inactive when the latches are not written. @@ -70,12 +70,12 @@ For more information regarding the clock gating cell, checkout :ref:`getting-sta FPU Register File ----------------- -In case the optional FPU is instantiated, the register file is extended +If the optional FPU is instantiated and ``ZFINX`` is not set, the register file is extended with an additional register bank of 32 registers ``f0``-``f31``. These registers are stacked on top of the existing register file and can be accessed concurrently with the limitation that a maximum of three operands per cycle can be read. Each of the three operands addresses is extended with -an fp_reg_sel signal which is generated in the instruction decoder +a register file select signal which is generated in the instruction decoder when a FP instruction is decoded. This additional signals determines if the operand is located in the integer or the floating point register file.
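
Reviewer note (not part of the patch): the ``misa`` rules in docs/source/control_status_registers.rst and the ``ZFINX``/``FPU`` constraint in docs/source/integration.rst can be cross-checked with a small model. The Python sketch below is only an illustration built from the bit positions and conditions stated in those hunks; the function name ``misa_value`` and its keyword arguments are hypothetical and do not correspond to anything in the RTL.

```python
# Illustrative model of the documented misa encoding (not taken from the RTL).
# Bit positions follow the misa table in control_status_registers.rst.
MXL_RV32 = 1 << 30  # MXL field (bits 31:30) = 1, i.e. XLEN = 32
BIT_X = 1 << 23     # non-standard extensions present
BIT_M = 1 << 12     # integer multiply/divide extension
BIT_I = 1 << 8      # RV32I base ISA
BIT_F = 1 << 5      # single-precision floating point using F registers
BIT_C = 1 << 2      # compressed extension


def misa_value(fpu=0, zfinx=0, pulp_xpulp=0, pulp_cluster=0):
    """Return the misa value implied by the documented parameter rules.

    The function and argument names are hypothetical; only the rules
    themselves come from the documentation.
    """
    # integration.rst: ZFINX is only allowed to be set to 1 if FPU = 1.
    if zfinx and not fpu:
        raise ValueError("ZFINX = 1 is only allowed when FPU = 1")

    value = MXL_RV32 | BIT_M | BIT_I | BIT_C  # fields that are always set
    if fpu and not zfinx:                     # F = 1 if FPU = 1 and ZFINX = 0
        value |= BIT_F
    if pulp_xpulp or pulp_cluster:            # X = 1 if either PULP option is set
        value |= BIT_X
    return value


if __name__ == "__main__":
    print(hex(misa_value()))                # default configuration
    print(hex(misa_value(fpu=1)))           # FPU with dedicated F registers
    print(hex(misa_value(fpu=1, zfinx=1)))  # Zfinx: F bit stays clear
```

Under these rules, a ``ZFINX`` build reports the same ``misa`` value as a build without an FPU, so software cannot detect Zfinx from the F bit alone.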