Many variants and approximations of Support Vector Machines are implemented and analysed for the task of hate-speech detection in a text sentence.

piyushsinghpasi/SVM-based-Hate-speech-detection

SVM based Hate Speech Detection

This project uses support vector machines (SVMs) to classify text as either toxic or non-toxic. Several SVM variants and approximations are implemented and compared on the binary classification task of hate-speech detection.
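As a rough illustration of the basic setup, the sketch below trains a linear SVM on TF-IDF features with scikit-learn. It is not taken from the notebooks: the toy sentences, the labels, and the choice of `TfidfVectorizer` with `LinearSVC` are assumptions made for illustration only, and the notebooks may use different features and SVM variants.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up sentences (assumption); the notebooks use the Jigsaw dataset.
texts = [
    "you are a wonderful person",
    "thanks for the helpful answer",
    "have a great day everyone",
    "I really enjoyed this article",
    "you are a stupid idiot",
    "shut up you worthless idiot",
    "I hate you, stupid loser",
    "what a pathetic worthless loser",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = non-toxic, 1 = toxic

# TF-IDF turns each sentence into a sparse vector; LinearSVC fits a linear SVM on it.
model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
model.fit(texts, labels)

# Predictions on the training sentences, only to show the call pattern.
train_predictions = model.predict(texts)
print(train_predictions)
```

A real experiment would evaluate on a held-out split rather than on the training sentences.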

Resources and Libraries

Platform: Python, run in Jupyter, Google Colab, or VS Code.
The code was run in VS Code (version 1.59.0).

numpy (version 1.19.5)
pickle (version 4.0)
regex (version 2019.12.20)
sklearn (version 0.24.2)
chars2vec (version 0.1.7)
keras (version 2.7.0)
matplotlib (version 3.4.3)
nltk (version 3.6.5)
pandas (version 1.3.4)
sentence-transformers (version 2.1.0)
tensorflow (version 2.7.0)
The datasets library is also used.

If you run into any other dependency issue, install the missing package from within the notebook using '!pip install package_name'.
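To see which installed packages differ from the versions listed above, a small check along these lines may help. It is only a sketch: the helper `check_versions` is hypothetical, and the distribution names are assumptions (for example, sklearn is published as `scikit-learn`). `pickle` is part of the standard library, so it is left out.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
EXPECTED = {
    "numpy": "1.19.5",
    "regex": "2019.12.20",
    "scikit-learn": "0.24.2",
    "chars2vec": "0.1.7",
    "keras": "2.7.0",
    "matplotlib": "3.4.3",
    "nltk": "3.6.5",
    "pandas": "1.3.4",
    "sentence-transformers": "2.1.0",
    "tensorflow": "2.7.0",
}


def check_versions(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in check_versions(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the notebooks will fail; it only shows where the environment differs from the one used here.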

Dataset

The Jigsaw Toxic Comment Challenge dataset is used: link

Running notebooks

Before running the notebooks, download the folder containing all models and data from the Google Drive link below.

GDrive link

Once downloaded, adjust the ingestion path in the notebooks so that it points to the location of the downloaded folder.
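One way to keep the path adjustment in a single place is sketched below. The folder location `DOWNLOAD_DIR` and the helper `load_pickle` are hypothetical names introduced for illustration; the actual file names and loading code are whatever the notebooks define.

```python
import pickle
from pathlib import Path

# Hypothetical location of the downloaded folder; change it to match your machine.
DOWNLOAD_DIR = Path.home() / "Downloads" / "svm_hate_speech"


def load_pickle(folder, filename):
    """Load a pickled object from `folder`, failing with a clear message if it is missing."""
    path = Path(folder) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Expected {path}; check the ingestion path")
    with path.open("rb") as handle:
        return pickle.load(handle)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.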

For each notebook in the Code folder:
Run all cells to train and test the SVM model.

Caution:

Run all cells so that every function is defined before it is used. Notes and cautions are also given in the notebooks.

References

https://www.kaggle.com/eikedehling/feature-engineering
https://www.analyticsvidhya.com/blog/2021/04/a-guide-to-feature-engineering-in-nlp/
https://keras.io/getting_started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer-feature-extraction
https://www.kaggle.com/jpmiller/augmenting-the-data
https://github.com/keunwoochoi/transfer_learning_music
keras-team/keras#2588
keras-team/keras#6090
https://github.com/UKPLab/sentence-transformers
https://scikit-learn.org/stable/modules/kernel_approximation.html
https://stackoverflow.com/questions/23056460/does-the-svm-in-sklearn-support-incremental-online-learning
https://towardsdatascience.com/how-to-make-sgd-classifier-perform-as-well-as-logistic-regression-using-parfit-cc10bca2d3c4
https://stackoverflow.com/questions/51495819/how-to-plot-svm-decision-boundary-in-sklearn-python
https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/
https://www.datacamp.com/community/tutorials/svm-classification-scikit-learn-python
