Neural Network with one hidden layer with no activation function #1

Open

manolo20 opened this issue Feb 8, 2019 · 1 comment

manolo20 commented Feb 8, 2019

Based on the model_summary results:

Is the model a neural network with one hidden layer with no activation function (the "embeddings" layer) and one output layer with a softmax activation function?

Afterwards, are the weights of the "embeddings" and the "probs" layers stored and used to make predictions on new data?

rossbm commented Mar 22, 2019

Hi Manolo:

  • The last layer of the model can use a softmax function so that the outputs sum to one, which ensures they can be interpreted as probabilities. While the term softmax is commonly used in the deep learning literature, it is really just the generalization of logistic regression to more than two classes. You can refer to Section 4.4 of Elements of Statistical Learning for an in-depth explanation of logistic regression.
    • Note that you don't need to output the normalized probabilities if all you are doing is ranking the classes for a particular record. You can use the logits (normalized or not) for that; see the first sketch after this list.
  • The model should always have an embedding layer so that the character and word n-grams are represented numerically.
    • Once the n-grams are represented numerically, other layers can transform these numerical representations before the final layer.
  • These layers can use activation functions or can just be linear transformations, depending on what model you specify (see the model sketch after this list).
  • My experience was that very simple models worked best. This was partly due to the relatively small number of training examples as well as the simplicity of the text provided by the supermarket retailers. If you now have much more training data or are working with other types of data, I encourage you to experiment and figure out what works best. That's certainly what I did 🥳!
  • You definitely need to store the weights of the model as well as some metadata that specifies the structure of the model (see the saving sketch after this list).
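To illustrate the softmax/logits point, here is a minimal numeric sketch (the class scores are made up): softmax just rescales the logits into probabilities that sum to one, and since it is a monotonic transformation, ranking classes by logits gives the same order as ranking by probabilities.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the probabilities are unchanged.
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

logits = np.array([2.0, -1.0, 0.5, 3.2])  # hypothetical raw scores for 4 classes
probs = softmax(logits)

print(probs, probs.sum())    # probabilities that sum to 1.0
# The class ranking is identical whether you sort by logits or by probabilities.
print(np.argsort(-logits))   # e.g. [3 0 2 1]
print(np.argsort(-probs))    # same order
```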
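And here is a rough sketch of the kind of architecture being discussed, written with Keras purely for illustration; the vocabulary size, embedding dimension, number of classes, and pooling choice are placeholders, not the project's actual configuration. The layer names "embeddings" and "probs" simply mirror the names from the question: the embedding lookup is a linear transformation with no activation, and the final dense layer applies the softmax.

```python
import tensorflow as tf

VOCAB_SIZE = 10_000   # placeholder: number of distinct character/word n-grams
EMBED_DIM = 32        # placeholder: embedding dimension
NUM_CLASSES = 50      # placeholder: number of product categories

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None,), dtype="int32"),  # sequence of n-gram ids
    # Linear lookup from n-gram ids to dense vectors (no activation function).
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, name="embeddings"),
    # Average the n-gram vectors into one record-level vector.
    tf.keras.layers.GlobalAveragePooling1D(),
    # Output layer: a linear transformation followed by softmax, so outputs sum to one.
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax", name="probs"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```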
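Finally, one way to store the weights plus the structural metadata and reuse them for prediction, continuing the sketch above (file names are placeholders; the project may persist things differently):

```python
# Save the structure (metadata) and the learned weights separately.
with open("model_structure.json", "w") as f:
    f.write(model.to_json())
model.save_weights("model.weights.h5")

# Later, rebuild the model and score new records.
with open("model_structure.json") as f:
    restored = tf.keras.models.model_from_json(f.read())
restored.load_weights("model.weights.h5")
# predictions = restored.predict(new_ngram_ids)
```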

Hope that helps!
