
Jigsaw-Toxic-Comment-Classification

Project Overview

The Toxic Comment Classification project is an application that uses machine learning to identify toxic comments. It trains on a labeled dataset of comments from Wikipedia talk pages (see the Dataset section below) to build a model that can detect toxicity. The goal of the project is a model that accurately classifies toxic comments and helps moderators filter out comments that violate community guidelines.

Website

(Screenshot of the web application.)

File structure

.
├── app_exception           # Custom exceptions
├── application_logging     # Logging
├── data_given              # Given data
├── data                    # Raw / processed / transformed data
├── saved_models            # Classification model
├── report                  # Model parameter and pipeline reports
├── src                     # Source files for the project implementation
├── webapp                  # ML web application
├── dvc.yaml                # Data version control (DVC) pipeline
├── app.py                  # Gradio app (see the sketch below)
├── param.yaml              # Parameters
├── requirements.txt        # Dependencies for the project
└── README.md
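
For orientation, a minimal Gradio entry point for app.py might look like the following sketch. This is illustrative only: the predict_toxicity helper and the idea of loading a trained model from saved_models/ are assumptions, not the repository's actual code.

# Hypothetical sketch of a Gradio app for this project (not the real app.py).
import gradio as gr

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def predict_toxicity(comment: str) -> dict:
    # Placeholder: a real app would load the trained model from saved_models/
    # and return one probability per toxicity label for the given comment.
    probs = [0.0] * len(LABELS)
    return dict(zip(LABELS, probs))

demo = gr.Interface(
    fn=predict_toxicity,
    inputs="text",
    outputs="label",
    title="Jigsaw Toxic Comment Classification",
)

if __name__ == "__main__":
    demo.launch()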

Dataset

The dataset used in this project is the Toxic Comment Classification Challenge from Kaggle. It contains approximately 159,000 comments from Wikipedia talk pages that human annotators have labeled for six types of toxicity: toxic, severe toxic, obscene, threat, insult, and identity hate; a comment may carry any combination of these labels. The data is split into a training set and a testing set, with approximately 80% of the comments in the training set and 20% in the testing set.
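
As a rough illustration, loading and splitting the data could look like the sketch below. The file path data_given/train.csv is an assumption about where the Kaggle CSV lives in this repository; the column names match the official competition file, and the 80/20 ratio follows the description above.

# Minimal sketch: load the Kaggle CSV and split it 80/20.
import pandas as pd
from sklearn.model_selection import train_test_split

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

df = pd.read_csv("data_given/train.csv")  # assumed location of the raw Kaggle file

X = df["comment_text"]
y = df[LABELS]  # six binary labels per comment (multi-label)

# 80% train / 20% test, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)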

Model Information

Experiments:

RNN:

A baseline was created using an RNN. An embedding layer of size 64 was used, and the model was trained with the Adam optimizer at a learning rate of 0.001 for 10 epochs, yielding an accuracy of 83.68% and a ROC-AUC score of 52.03%.
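
A hedged Keras sketch of such a baseline follows. Only the embedding size (64), the optimizer (Adam, learning rate 0.001), the epoch count (10), and the six sigmoid outputs come from the description above; the vocabulary size, sequence length, and SimpleRNN width are illustrative assumptions.

# Sketch of the RNN baseline; hyperparameters marked "assumption" are not from the README.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20_000  # assumption: tokenizer vocabulary size
RNN_UNITS = 64       # assumption: width of the recurrent layer

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),          # embedding layer of size 64
    layers.SimpleRNN(RNN_UNITS),               # simple recurrent baseline
    layers.Dense(6, activation="sigmoid"),     # one independent output per toxicity label
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",  # multi-label task, so per-label binary loss
    metrics=["accuracy", tf.keras.metrics.AUC(name="roc_auc")],
)

# Training would then run for 10 epochs on padded token sequences, e.g.:
# model.fit(X_train_padded, y_train, epochs=10, validation_split=0.1)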

Contributions

Contributions to this project are welcome! Please follow the standard GitHub fork-and-pull-request workflow.

Contact Information

If you have any questions or comments about this project, feel free to contact the project maintainer via Gmail.

License

MIT License
