rule4ml is a tool designed for pre-synthesis estimation of FPGA resource utilization and inference latency for machine learning models.

rule4ml releases are uploaded to the Python Package Index for easy installation via pip:
pip install rule4ml
This installs only the base package and its dependencies for resource and latency prediction. The data_gen scripts and the Jupyter notebooks must be cloned from the repository if needed; the data generation dependencies are listed separately in data_gen/requirements.txt.
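If you need the data generation scripts or the notebooks, a typical setup looks like this (the repository URL below is a placeholder, not the actual project URL):

git clone https://github.com/<org-or-user>/rule4ml.git  # placeholder URL, substitute the actual rule4ml repository
cd rule4ml
pip install -r data_gen/requirements.txt  # data generation dependencies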
To get started with rule4ml, please refer to the detailed Jupyter Notebook tutorial. This tutorial covers:
- Using pre-trained estimators for resource and latency predictions.
- Generating synthetic datasets.
- Training and testing your own predictors.
Here's a quick example of how to use rule4ml to estimate resources and latency for a given model:
import keras
from keras.layers import Input, Dense, Activation
from rule4ml.models.estimators import MultiModelEstimator
# Example of a simple keras Model
input_size = 16
inputs = Input(shape=(input_size,))
x = Dense(32, activation="relu")(inputs)
x = Dense(32, activation="relu")(x)
x = Dense(32, activation="relu")(x)
outputs = Dense(5, activation="softmax")(x)
model_to_predict = keras.Model(inputs=inputs, outputs=outputs, name="Jet Classifier")
model_to_predict.build((None, input_size)) # building keras models is required
# Loading default predictors
estimator = MultiModelEstimator()
estimator.load_default_models()
# MultiModelEstimator predictions are formatted as a pandas DataFrame
prediction_df = estimator.predict(model_to_predict)
# Further formatting can be applied to organize the DataFrame
if not prediction_df.empty:
    prediction_df = prediction_df.groupby(
        ["Model", "Board", "Strategy", "Precision", "Reuse Factor"], observed=True
    ).mean()  # each group holds a single row; mean() only converts the DataFrameGroupBy back to a DataFrame
# Outside of Jupyter notebooks, we recommend saving the DataFrame as HTML for better readability
prediction_df.to_html("keras_example.html")
keras_example.html (truncated)

| Model | Board | Strategy | Precision | Reuse Factor | BRAM (%) | DSP (%) | FF (%) | LUT (%) | CYCLES |
|---|---|---|---|---|---|---|---|---|---|
| Jet Classifier | pynq-z2 | Latency | ap_fixed<2, 1> | 1 | 2.77 | 0.89 | 2.63 | 30.02 | 54.68 |
| | | | | 2 | 2.75 | 0.86 | 2.62 | 29.91 | 55.84 |
| | | | | 4 | 2.70 | 0.79 | 2.58 | 29.80 | 55.78 |
| | | | | 8 | 2.97 | 0.67 | 2.49 | 29.79 | 68.84 |
| | | | | 16 | 2.97 | 0.63 | 2.50 | 30.24 | 75.38 |
| | | | | 32 | 2.26 | 0.74 | 2.43 | 30.90 | 76.19 |
| | | | | 64 | 0.83 | 0.47 | 2.19 | 32.89 | 112.04 |
| | | | ap_fixed<8, 3> | 1 | 2.63 | 1.58 | 13.91 | 115.89 | 53.96 |
| | | | | 2 | 2.63 | 1.50 | 13.63 | 111.75 | 54.70 |
| | | | | 4 | 2.59 | 1.25 | 13.07 | 108.52 | 56.16 |
| | | | | 8 | 2.76 | 1.41 | 12.22 | 108.01 | 53.07 |
| | | | | 16 | 3.42 | 1.96 | 11.98 | 104.58 | 64.71 |
| | | | | 32 | 2.99 | 1.93 | 12.74 | 94.71 | 83.06 |
| | | | | 64 | 0.56 | 1.70 | 14.74 | 92.78 | 104.88 |
| | | | ap_fixed<16, 6> | 1 | 1.78 | 199.86 | 45.96 | 184.86 | 66.59 |
| | | | | 2 | 2.30 | 198.30 | 45.71 | 190.51 | 68.14 |
| | | | | 4 | 2.38 | 198.50 | 45.95 | 195.05 | 73.15 |
| | | | | 8 | 1.48 | 175.18 | 46.42 | 188.65 | 95.70 |
| | | | | 16 | 2.90 | 83.85 | 48.13 | 184.96 | 101.44 |
| | | | | 32 | 4.43 | 51.04 | 51.83 | 193.38 | 141.07 |
| | | | | 64 | 0.75 | 30.32 | 55.36 | 193.26 | 229.37 |
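Since the output is a regular pandas DataFrame, standard pandas operations apply. As a small sketch (assuming the column names shown above), the predictions for a single precision can be isolated and sorted by estimated latency:

# Keep only the ap_fixed<16, 6> rows and sort them by estimated clock cycles
subset = prediction_df.reset_index()
subset = subset[subset["Precision"] == "ap_fixed<16, 6>"].sort_values("CYCLES")
print(subset[["Reuse Factor", "DSP (%)", "LUT (%)", "CYCLES"]])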
Training accurate predictors requires large datasets of synthesized neural networks. We used hls4ml to synthesize neural networks generated with parameters randomly sampled from predefined ranges (defaults of data classes in the code). Our models' training data is publicly available at https://borealisdata.ca/dataverse/rule4ml.
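As an illustration of that sampling step, the sketch below draws hypothetical architecture and synthesis parameters at random; the parameter names and ranges here are assumptions for illustration, not the actual defaults from the data_gen code:

import random

# Hypothetical sampling ranges, for illustration only; the real defaults
# live in the data classes under data_gen/.
rng = random.Random(42)

def sample_network_config():
    num_layers = rng.randint(2, 6)
    return {
        "layer_sizes": [rng.choice([8, 16, 32, 64, 128]) for _ in range(num_layers)],
        "activation": rng.choice(["relu", "tanh", "sigmoid"]),
        "precision": rng.choice(["ap_fixed<2, 1>", "ap_fixed<8, 3>", "ap_fixed<16, 6>"]),
        "reuse_factor": rng.choice([1, 2, 4, 8, 16, 32, 64]),
        "strategy": rng.choice(["Latency", "Resource"]),
    }

# Each sampled configuration is turned into a model and synthesized with
# hls4ml, and the resulting report becomes one row of training data.
configs = [sample_network_config() for _ in range(1000)]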
In their current iteration, the predictors can process Keras or PyTorch models to generate FPGA resource (BRAM, DSP, FF, LUT) and latency (clock cycles) estimates for various synthesis configurations. However, the models used for training are limited to specific layers: Dense/Linear, ReLU, Tanh, Sigmoid, Softmax, BatchNorm, Add, Concatenate, and Dropout. They are also constrained by synthesis parameters, notably clock_period (10 ns) and io_type (io_parallel). Inputs outside these configurations may result in inaccurate predictions.
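For reference, a minimal PyTorch sketch built only from supported layers could look like the following. Passing a torch.nn.Module directly to MultiModelEstimator.predict() is an assumption here; check the tutorial notebook for the exact call signature:

import torch.nn as nn

from rule4ml.models.estimators import MultiModelEstimator

# A simple PyTorch model using only supported layer types
torch_model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 32),
    nn.ReLU(),
    nn.Linear(32, 5),
    nn.Softmax(dim=-1),
)

estimator = MultiModelEstimator()
estimator.load_default_models()

# Assumption: predict() accepts a torch.nn.Module the same way it accepts
# a built Keras model; see the tutorial notebook for the supported signature.
prediction_df = estimator.predict(torch_model)
print(prediction_df.head())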
This project is licensed under the GPL-3.0 License. See the LICENSE file for details.