Writing Architecture Files

wav2letter++ provides a simple way to create an fl::Sequential module for the acoustic model from text files. These files are specified using the gflags -arch and -archdir.

Example architecture file:

# Comments like this are ignored
# the output tensor will have the shape (Time, 1, NFEAT, Batch)
V -1 1 NFEAT 0
C2 NFEAT 300 48 1 2 1 -1 -1
C2 300 300 32 1 1 1
RO 2 0 3 1
# the output should be with the shape (NLABEL, Time, Batch, 1)

While parsing, we ignore lines starting with # as comments. We also replace the following tokens: NFEAT = input feature size (e.g. number of frequency bins), NLABEL = output size (e.g. number of grapheme tokens).
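The preprocessing described above can be sketched in Python as follows. This is only an illustration, not the wav2letter++ parser: the helper name `preprocess_arch`, the sample values for NFEAT and NLABEL, and the choice to substitute whole tokens are all assumptions made for this example.

```python
def preprocess_arch(text, nfeat, nlabel):
    """Return one token list per layer line of an architecture file.

    Hypothetical helper: skips blank lines and lines starting with '#',
    then replaces the NFEAT and NLABEL placeholders with concrete sizes.
    """
    substitutions = {"NFEAT": str(nfeat), "NLABEL": str(nlabel)}
    layers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        layers.append([substitutions.get(tok, tok) for tok in line.split()])
    return layers


arch = """\
# Comments like this are ignored
V -1 1 NFEAT 0
C2 NFEAT 300 48 1 2 1 -1 -1
C2 300 300 32 1 1 1
RO 2 0 3 1
"""

# The first token names the module, the rest are its parameters.
for tokens in preprocess_arch(arch, nfeat=40, nlabel=30):
    print(tokens[0], tokens[1:])
```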

The first token in each line represents a specific flashlight/wav2letter module followed by the specification of its parameters.

Here, we describe how to specify different flashlight/wav2letter modules in the architecture files.

fl::Conv2D C2 [inputChannels] [outputChannels] [xFilterSz] [yFilterSz] [xStride] [yStride] [xPadding <OPTIONAL>] [yPadding <OPTIONAL>] [xDilation <OPTIONAL>] [yDilation <OPTIONAL>]

(Use padding = -1 for fl::PaddingMode::SAME)
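As a rough illustration of how the positional C2 parameters line up with their names, here is a small Python sketch. The function `parse_c2` is hypothetical and is not part of wav2letter++. Because the defaults for the optional fields are not stated here, omitted fields are reported as None rather than guessed. The sketch assumes NFEAT has already been replaced by a number.

```python
C2_FIELDS = [
    "inputChannels", "outputChannels", "xFilterSz", "yFilterSz",
    "xStride", "yStride", "xPadding", "yPadding", "xDilation", "yDilation",
]
C2_REQUIRED = 6  # everything up to and including yStride is mandatory


def parse_c2(line):
    """Map a 'C2 ...' line to a dict of named parameters (hypothetical helper)."""
    tokens = line.split()
    if not tokens or tokens[0] != "C2":
        raise ValueError("not a C2 line: %r" % line)
    values = [int(tok) for tok in tokens[1:]]
    if not C2_REQUIRED <= len(values) <= len(C2_FIELDS):
        raise ValueError("C2 expects 6 to 10 parameters, got %d" % len(values))
    spec = dict.fromkeys(C2_FIELDS)      # omitted optional fields stay None
    spec.update(zip(C2_FIELDS, values))
    for key in ("xPadding", "yPadding"):
        if spec[key] == -1:              # -1 selects fl::PaddingMode::SAME
            spec[key] = "SAME"
    return spec


print(parse_c2("C2 40 300 48 1 2 1 -1 -1"))
print(parse_c2("C2 300 300 32 1 1 1"))
```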

fl::Linear L [inputChannels] [outputChannels]

fl::BatchNorm BN [totalFeatSize] [firstDim] [secondDim <OPTIONAL>] [thirdDim <OPTIONAL>]

fl::LayerNorm LN [firstDim] [secondDim <OPTIONAL>] [thirdDim <OPTIONAL>]

fl::WeightNorm WN [normDim] [Layer]

fl::Dropout DO [dropProb]


fl::Pool2D

  1. Average : A [xFilterSz] [yFilterSz] [xStride] [yStride] [xPadding] [yPadding]
  2. Max : M [xFilterSz] [yFilterSz] [xStride] [yStride] [xPadding] [yPadding]

(Use padding = -1 for fl::PaddingMode::SAME)

fl::View V [firstDim] [secondDim] [thirdDim] [fourthDim]

(Use -1 to infer dimension, only one param can be a -1. Use 0 to use the corresponding input dimension.)
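The -1 and 0 rules can be made concrete with a small shape calculation. The Python sketch below is not flashlight code: `resolve_view_shape` is a hypothetical helper, and the input shape (Time=100, NFEAT=40, 1, Batch=8) is an assumed example chosen to match the `V -1 1 NFEAT 0` line from the sample file.

```python
from math import prod


def resolve_view_shape(input_shape, spec):
    """Resolve a View spec against an input shape (hypothetical helper).

    0  -> copy the input dimension at the same position
    -1 -> infer this dimension from the total element count (at most one)
    """
    if len(spec) != len(input_shape):
        raise ValueError("spec and input must have the same number of dims")
    if list(spec).count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    shape = [input_shape[i] if d == 0 else d for i, d in enumerate(spec)]
    total = prod(input_shape)
    if -1 in shape:
        known = prod(d for d in shape if d != -1)
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        shape[shape.index(-1)] = total // known
    elif prod(shape) != total:
        raise ValueError("view changes the number of elements")
    return tuple(shape)


# V -1 1 NFEAT 0 with NFEAT = 40 on an assumed (Time, NFEAT, 1, Batch) input
print(resolve_view_shape((100, 40, 1, 8), (-1, 1, 40, 0)))  # (100, 1, 40, 8)
```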

fl::Reorder RO [firstDim] [secondDim] [thirdDim] [fourthDim]
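The effect of a Reorder line on tensor shape can be sketched as a permutation. The convention used here (output dimension i takes input dimension order[i]) is inferred from the comments in the example file, where `RO 2 0 3 1` turns (Time, 1, NLABEL, Batch) into (NLABEL, Time, Batch, 1). The helper `reorder_shape` and the sizes used are assumptions for illustration only.

```python
def reorder_shape(shape, order):
    """Return the shape after permuting dimensions (hypothetical helper).

    Output dimension i is taken from input dimension order[i].
    """
    if sorted(order) != list(range(len(shape))):
        raise ValueError("order must be a permutation of the dimension indices")
    return tuple(shape[i] for i in order)


# (Time=100, 1, NLABEL=30, Batch=8) reordered by RO 2 0 3 1
print(reorder_shape((100, 1, 30, 8), (2, 0, 3, 1)))  # (30, 100, 8, 1)
```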


fl::ReLU R

fl::PReLU PR [numElements <OPTIONAL>] [initValue <OPTIONAL>]

fl::Log LG

fl::HardTanh HT

fl::Tanh T

fl::GatedLinearUnit GLU [sliceDim]

fl::LogSoftmax LSM [normDim]


fl::RNN

  1. RNN : RNN [inputSize] [outputSize] [numLayers] [isBidirectional] [dropProb]
  2. GRU : GRU [inputSize] [outputSize] [numLayers] [isBidirectional] [dropProb]
  3. LSTM : LSTM [inputSize] [outputSize] [numLayers] [isBidirectional] [dropProb]

fl::Embedding E [embeddingSize] [nTokens]

fl::AsymmetricConv1D AC [inputChannels] [outputChannels] [xFilterSz] [xStride] [xPadding <OPTIONAL>] [xFuturePart <OPTIONAL>] [xDilation <OPTIONAL>]


RES [numLayers (N)] [numResSkipConnections (K)] [numBlocks <OPTIONAL>]

Residual skip connections between layers can only be added if these layers have already been added. There are two ways to define a residual skip connection:

  • standard
SKIP [fromLayerInd] [toLayerInd] [scale <OPTIONAL, DEFAULT=1>]
  • with a sequence of projection layers, used when the number of channels in the output of fromLayer differs from the number of channels expected in the input of toLayer (or when some other transformation needs to be applied):
SKIPL [fromLayerInd] [toLayerInd] [nLayersInProjection (M)] [scale <OPTIONAL, DEFAULT=1>]

where scale is the value by which the final output is multiplied ((x + f(x)) * scale). scale must be the same for all residual skip connections that share the same toLayer. (Use fromLayerInd = 0 for a skip connection from input, toLayerInd = N+1 for a residual skip connection to output, and fromLayerInd/toLayerInd = K for a residual skip connection from/to LayerK.)
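To illustrate the SKIP syntax and the scaling rule, here is a minimal Python sketch. Both `parse_skip` and `apply_skip` are hypothetical helpers written for this example, not wav2letter++ functions, and plain Python lists stand in for tensors.

```python
def parse_skip(line):
    """Parse 'SKIP from to [scale]' into (from, to, scale); scale defaults to 1."""
    tokens = line.split()
    if len(tokens) not in (3, 4) or tokens[0] != "SKIP":
        raise ValueError("expected: SKIP fromLayerInd toLayerInd [scale]")
    from_ind, to_ind = int(tokens[1]), int(tokens[2])
    scale = float(tokens[3]) if len(tokens) == 4 else 1.0
    return from_ind, to_ind, scale


def apply_skip(x, fx, scale=1.0):
    """Compute (x + f(x)) * scale element-wise on two equal-length lists."""
    if len(x) != len(fx):
        raise ValueError("x and f(x) must have the same length")
    return [(a + b) * scale for a, b in zip(x, fx)]


print(parse_skip("SKIP 0 2"))        # skip from the input to layer 2, scale 1
print(parse_skip("SKIP 1 3 0.5"))    # skip from layer 1 to layer 3, scale 0.5
print(apply_skip([1.0, 2.0], [3.0, 4.0], scale=0.5))  # [2.0, 3.0]
```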


TDS [kernel width] [input width] [channels] [drop prob]