Is the model a neural network with one hidden layer with no activation function (the "embeddings" layer) and one output layer with a softmax activation function?
Afterwards, are the weights of the "embeddings" and "probs" layers stored and used to make predictions on new data?
The last layer of the model can use a softmax function in order to produce outputs that sum to one, which ensures that the outputs can be interpreted as probabilities. While the term softmax is commonly used in the deep learning literature, it is really just the generalization of logistic regression to more than 2 classes. You can refer to Section 4.4 of *The Elements of Statistical Learning* for an in-depth explanation of logistic regression.
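To make the softmax point concrete, here is a minimal sketch in plain Python (no framework assumed; the logit values are made up for illustration):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)   # the largest logit gets the largest probability
print(sum(probs))  # the outputs sum to one, so they can be read as probabilities
```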
Note that you don't need to output the normalized probabilities if all you are doing is ranking the classes for a particular record; you can use the logits (normalized or not) for that.
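Because softmax is a monotonic transformation, ranking classes by raw logits gives exactly the same order as ranking by the normalized probabilities. A small self-contained check (example logit values are hypothetical):

```python
import math

def softmax(logits):
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def rank(scores):
    # Class indices sorted from highest to lowest score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

logits = [2.0, -1.0, 0.5]
# Same ordering whether or not you normalize, so ranking can skip the softmax.
print(rank(logits) == rank(softmax(logits)))  # True
```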
The model should always have an embedding layer so that the character and word n-grams are represented numerically.
Once the n-grams are represented numerically, other layers can transform these numerical representations before the final layer. These layers can use activation functions or can just be linear transformations, depending on what model you specify.
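The structure described above can be sketched as follows. This is a hypothetical NumPy mock-up, not the project's actual code: the layer names, sizes, and random weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 1000 n-gram ids, 16-dim embeddings, 5 classes.
vocab_size, embed_dim, n_classes = 1000, 16, 5
embeddings = rng.normal(size=(vocab_size, embed_dim))  # "embeddings" layer weights
probs_w = rng.normal(size=(embed_dim, n_classes))      # "probs" (output) layer weights
probs_b = np.zeros(n_classes)

def predict(ngram_ids):
    # Look up each n-gram's embedding and average them (a purely linear step,
    # no activation function), then apply the output layer with a softmax.
    x = embeddings[ngram_ids].mean(axis=0)
    logits = x @ probs_w + probs_b
    e = np.exp(logits - logits.max())
    return e / e.sum()

p = predict([3, 17, 256])
print(p.shape, p.sum())  # a length-5 probability vector summing to 1
```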
My experience was that very simple models worked best. This was partially due to the relatively small number of training examples, as well as the simplicity of the text provided by the supermarket retailers. If you now have much more training data or are working with other types of data, I encourage you to experiment and figure out what works best. That's certainly what I did 🥳!
You definitely need to store the weights of the model, as well as some metadata that specifies the structure of the model.
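One simple way to do this (a sketch, not the project's actual persistence format; the file names, array names, and sizes are assumptions) is to save the weight matrices in an `.npz` archive alongside a small JSON file describing the architecture:

```python
import json
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 16))  # hypothetical "embeddings" weights
probs_w = rng.normal(size=(16, 5))        # hypothetical "probs" layer weights

# Store the weight matrices plus metadata describing the model structure.
np.savez("model_weights.npz", embeddings=embeddings, probs=probs_w)
with open("model_meta.json", "w") as f:
    json.dump({"vocab_size": 1000, "embed_dim": 16, "n_classes": 5}, f)

# Reload both to rebuild the model and predict on new data.
weights = np.load("model_weights.npz")
with open("model_meta.json") as f:
    meta = json.load(f)
print(weights["embeddings"].shape == (meta["vocab_size"], meta["embed_dim"]))  # True
```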
Based on the `model_summary` results: