The Toxic Comment Classification project is a machine learning application that identifies toxic comments. The model is trained on labeled comments from Wikipedia talk pages (the Kaggle Toxic Comment Classification Challenge dataset) and aims to classify toxic comments accurately so moderators can filter out content that violates community guidelines.
.
├── app_exception        # custom exception handling
├── application_logging  # logging
├── data_given           # original input data
├── data                 # raw / processed / transformed data
├── saved_models         # trained classification model
├── report               # model parameter and pipeline reports
├── src                  # source files for project implementation
├── webapp               # ML web application
├── dvc.yaml             # data version control (DVC) pipeline
├── app.py               # Gradio app (see the sketch below)
├── param.yaml           # parameters
├── requirements.txt     # dependencies for the project
└── README.md
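The tree above lists a Gradio app (`app.py`) that serves the trained classifier. The sketch below shows one minimal way such an app could wrap a saved model; the artifact path `saved_models/model.joblib`, the `classify` helper, and a model exposing `predict_proba` with one probability per label are all assumptions for illustration, not the repository's confirmed interface.

```python
# Hypothetical sketch of a Gradio app serving the toxicity classifier.
import gradio as gr
import joblib

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Assumed artifact name; the real saved model may use a different file/format.
model = joblib.load("saved_models/model.joblib")

def classify(comment: str) -> dict:
    # Assumes the saved model returns one probability per toxicity label.
    probs = model.predict_proba([comment])[0]
    return dict(zip(LABELS, map(float, probs)))

# A text box in, a label/confidence panel out.
gr.Interface(fn=classify, inputs="text", outputs="label").launch()
```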
The dataset used in this project is the Toxic Comment Classification Challenge from Kaggle. It contains approximately 159,000 comments from Wikipedia talk pages, each labeled by human annotators for six types of toxicity: toxic, severe toxic, obscene, threat, insult, and identity hate. The dataset is split into a training set and a testing set, with approximately 80% of the comments in the training set and 20% in the testing set.
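The snippet below is a minimal sketch of loading the data and reproducing the 80/20 split. The path `data_given/train.csv`, the `comment_text` column, and the `random_state` value are assumptions based on the standard Kaggle file layout, not confirmed paths in this repository.

```python
# Sketch: load the Kaggle data and split it 80/20, assuming the standard layout.
import pandas as pd
from sklearn.model_selection import train_test_split

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

df = pd.read_csv("data_given/train.csv")   # ~159k Wikipedia talk-page comments (assumed path)
X, y = df["comment_text"], df[LABELS]

# 80% training / 20% testing, as described above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)
```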
A baseline was created using an RNN model with an embedding layer of size 64. Training with the Adam optimizer (learning rate 0.001) for 10 epochs yielded an accuracy of 83.68% and an ROC-AUC score of 52.03%.
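The following is a minimal sketch of such a baseline, matching the stated embedding size, optimizer, and learning rate. The vocabulary size, sequence length, and LSTM width are illustrative assumptions; the repository's actual architecture may differ.

```python
# Hedged sketch of the RNN baseline: embedding dim 64, Adam(lr=0.001), 10 epochs.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, EMBED_DIM, NUM_LABELS = 20000, 64, 6   # vocab size is an assumption

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),        # 64-dimensional embedding layer
    layers.LSTM(64),                                 # recurrent encoder (width assumed)
    layers.Dense(NUM_LABELS, activation="sigmoid"),  # one sigmoid per toxicity label
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="roc_auc")],
)
# Hypothetical training call on padded token sequences:
# model.fit(X_train_padded, y_train, epochs=10, validation_data=(X_test_padded, y_test))
```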
Contributions to this project are welcome! To contribute, please follow the standard GitHub workflow for pull requests.
If you have any questions or comments about this project, feel free to contact the project maintainer via Gmail.
MIT License