hatErase

Real-time Twitter hate speech detection

This is the master branch of the repo. It contains the codebase for the machine learning approach we followed to train our model.

The other branch is webapp, which mainly contains the codebase for deployment.

Dataset Collection

We combined several open-source datasets from different hackathons and competitions into one big dataset that contains a variety of tweets. The dataset focuses mainly on the English language. The Dataset Exploration notebook (Dataset_Exploration.ipynb) has the code for exploring the datasets and for how we concatenated them.
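As an illustration of the concatenation step, here is a minimal sketch using pandas. The two source frames, their column names, and the shared (text, label) schema are made up for the example; the actual datasets and the steps in the notebook may differ.

```python
import pandas as pd

# Two tiny made-up sources with different column names (hypothetical schemas).
source_a = pd.DataFrame(
    {"tweet": ["have a nice day", "I hate you all"], "label": [0, 1]}
)
source_b = pd.DataFrame(
    {"text": ["I hate you all", "what a great match"], "class": [1, 0]}
)

# Map each source onto a shared (text, label) schema before concatenating.
source_a = source_a.rename(columns={"tweet": "text"})
source_b = source_b.rename(columns={"class": "label"})

# Stack the sources and drop tweets that appear in more than one of them.
combined = (
    pd.concat([source_a, source_b], ignore_index=True)
    .drop_duplicates(subset="text")
    .reset_index(drop=True)
)

# Check the class balance of the merged dataset.
print(combined["label"].value_counts())
```

Checking the class balance after merging is worthwhile because hate speech datasets are often heavily skewed towards the non-hateful class.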

Dataset Preprocessing

The Dataset Preprocessing notebook (dataset_preprocess.ipynb) contains the code we used to clean the dataset, since raw tweets cannot be fed directly to machine learning models. It also shows the techniques we used to extract useful features from the text, such as hashtags and user mentions.
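A minimal sketch of this kind of cleaning, using only the standard library. The function name `preprocess_tweet`, the regular expressions, and the specific cleaning rules are assumptions chosen for illustration, not the exact steps in the notebook.

```python
import re

# Hypothetical patterns; real tweets may need more careful handling.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")


def preprocess_tweet(tweet):
    """Return (clean_text, hashtags, mentions) for one raw tweet."""
    text = URL_RE.sub(" ", tweet)  # drop links first
    hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
    mentions = [m.lower() for m in MENTION_RE.findall(text)]
    text = MENTION_RE.sub(" ", text)  # remove user mentions from the text
    text = HASHTAG_RE.sub(r"\1", text)  # keep the hashtag word, drop '#'
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep letters only
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text, hashtags, mentions


clean, tags, users = preprocess_tweet(
    "@User1 this is SO bad!! #NoHate https://t.co/abc"
)
print(clean, tags, users)
```

Returning the hashtags and mentions separately keeps them available as features even after they have been stripped from the cleaned text.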

Machine Learning Models

We trained our model using two prominent ML algorithms for binary classification: Multinomial Naive Bayes (MNB) and Logistic Regression (LR).

The final saved model is an LR classifier trained with n-grams of range (1,3) as lexical features.
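A setup of this kind can be sketched with scikit-learn as follows. The toy tweets and labels are made up, and the choice of `CountVectorizer` with word-level n-grams and `max_iter=1000` is an assumption; the actual vectorizer and hyperparameters are in the training notebooks.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up toy data: 1 = hateful, 0 = not hateful.
texts = [
    "i hate these people so much",
    "those people are the worst",
    "what a lovely sunny day",
    "great match and a great team",
]
labels = [1, 1, 0, 0]

# Word n-grams from unigrams up to trigrams feed a logistic regression.
model = Pipeline(
    [
        ("ngrams", CountVectorizer(ngram_range=(1, 3))),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)
model.fit(texts, labels)

# Probability of the positive class for an unseen tweet.
proba = model.predict_proba(["people like that are the worst"])[0, 1]
print(f"positive-class probability: {proba:.3f}")
```

Wrapping the vectorizer and classifier in one `Pipeline` means the same n-gram vocabulary is applied at training and prediction time, and the whole object can be saved as a single model.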

The training set classification report was:

The test set classification report was:

The AUC-ROC curve for the test set was:

Hate Score prediction

documentation goes here

Contributed By

Nitin Chauhan and Srijan Singh

