This is an example project for sentiment analysis using TensorFlow and PyTorch. Sentiment analysis is the process of determining the emotional tone of a text: whether it is positive, negative, neutral, or irrelevant.
In this project, we use TensorFlow to build a machine learning model that classifies text by sentiment. We train the model on available datasets and evaluate its performance on new data. We then compare the results with PyTorch, logistic regression, and CatBoost models.
The goal of this project is to provide a structured example of sentiment analysis on Twitter data, especially for newcomers to the field. By sharing the code, data, and documentation, we aim to help beginners understand how to build and train a sentiment analysis model using TensorFlow. We also want to foster collaboration and learning within the NLP community, and to compare different methods on a single case so that they can be presented side by side.
The models' performance results are presented in ml_logs.txt.
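As a rough illustration of how a tab-separated metrics file such as ml_logs.txt could be read, here is a minimal sketch using only the standard library. The column names (`model`, `accuracy`, `f1`) and the sample values are hypothetical placeholders, not the actual contents of the file.

```python
import csv
import io

# Hypothetical sample standing in for ml_logs.txt; the real column names
# and values may differ, so treat these rows as placeholders.
SAMPLE_TSV = "model\taccuracy\tf1\nmodel_a\t0.50\t0.40\nmodel_b\t0.75\t0.70\n"


def read_metrics(handle):
    """Parse a tab-separated metrics file into a list of dicts, one per row."""
    return list(csv.DictReader(handle, delimiter="\t"))


def best_by(rows, metric):
    """Return the row with the highest value in the given metric column."""
    return max(rows, key=lambda row: float(row[metric]))


rows = read_metrics(io.StringIO(SAMPLE_TSV))
best = best_by(rows, "accuracy")
print(best["model"])
```

To use it on the real file, pass an open file handle for ml_logs.txt to `read_metrics` instead of the in-memory sample, and replace `"accuracy"` with a column that actually exists in the file.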
Please read MAQSAT to learn the author's vision of this project.
The data used for training and evaluating the sentiment analysis model is sourced from the following Kaggle dataset:
- Twitter Entity Sentiment Analysis
Please refer to the dataset for more details about its contents and format. Make sure to comply with the dataset's license and terms of use when using the data.
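The following sketch shows one possible way to load such data with the standard library. It assumes the CSV files have no header row and four columns in the order tweet ID, entity, sentiment, text; please verify this against the dataset page. The sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Assumed column order; the files are assumed to have no header row.
COLUMNS = ["tweet_id", "entity", "sentiment", "text"]

# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = (
    '1,ExampleGame,Positive,"I love this game, honestly"\n'
    "2,ExampleGame,Negative,The servers are down again\n"
    "3,ExampleBrand,Neutral,New update is out today\n"
    "4,ExampleBrand,Irrelevant,Good morning everyone\n"
    "5,ExampleBrand,Positive,\n"
)


def load_rows(handle):
    """Read headerless CSV rows into dicts, skipping rows with empty text."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS)
    return [row for row in reader if row["text"] and row["text"].strip()]


rows = load_rows(io.StringIO(SAMPLE_CSV))
label_counts = Counter(row["sentiment"] for row in rows)
print(len(rows), dict(label_counts))
```

Counting labels before training is a quick way to spot class imbalance. For a real file, replace the in-memory sample with `open(path, newline="", encoding="utf-8")`, where `path` points to a CSV file in the data/ directory.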
The project has the following structure:
- bin/
  - ready to go model
- build/
- configs/
  - model YAML configs
  - ml_logs.txt (.tsv file with different models metrics)
  - labels.json (JSON file with label->number and number->label)
- data/
  - our csv files
- docs/
  - CODE_OF_CONDUCT.md
  - MAQSAT.md
  - TASKS.md
- jupyter_nbs/
  - (Jupyter notebooks go here)
- source/
  - (ready-to-use scripts go here)
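To illustrate the label->number and number->label mappings that labels.json is described as holding, here is a minimal sketch. The class names come from the project description, while the numeric order and the JSON key names (`label_to_id`, `id_to_label`) are assumptions; the real file may be laid out differently.

```python
import json

# Class names from the project description; the numeric order is an assumption.
LABELS = ["negative", "neutral", "positive", "irrelevant"]


def build_label_maps(labels):
    """Build label->number and number->label dictionaries."""
    label_to_id = {label: index for index, label in enumerate(labels)}
    id_to_label = {index: label for label, index in label_to_id.items()}
    return label_to_id, id_to_label


label_to_id, id_to_label = build_label_maps(LABELS)

# Hypothetical key names; the real labels.json may use different ones.
payload = json.dumps({"label_to_id": label_to_id, "id_to_label": id_to_label})

# JSON object keys are always strings, so number->label keys are converted back to int.
loaded = json.loads(payload)
restored = {int(key): value for key, value in loaded["id_to_label"].items()}
print(restored[label_to_id["positive"]])
```

Keeping both directions in one file means the training code and the inference scripts decode predictions the same way.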
- Clone the repository to your local machine:
git clone https://github.com/AigozhiyevB/twitter-nlp.git
- Ensure that TensorFlow is installed. You can install it using pip:
pip install tensorflow
- Navigate to the project directory:
cd twitter-nlp
- Set up a virtual environment and activate it (on Linux or macOS):
python -m venv *venv_name*
source *venv_name*/bin/activate
- Install the required dependencies from the requirements.txt file:
pip install -r requirements.txt
- Prepare the data for training the model. Place your data in the data/ directory.
- Run the Jupyter notebook to train the model based on the provided data. You will find the corresponding notebook in the jupyter_nbs/ directory.
- Execute the script from the source/ directory to apply the trained model to new data and perform sentiment analysis.
If you would like to contribute to the project, you can do the following:
- Fork the repository.
- Make the necessary changes or add new features.
- Submit a pull request to have your changes merged into the main project branch.
More detailed instructions are presented in CONTRIBUTING.
There are plenty of tasks available in TASKS.
Short summaries of the models and frameworks used:
- TensorFlow: Open-source ML framework by Google that supports CPUs, GPUs, and specialized hardware. Provides both high-level APIs and low-level operations.
- Logistic regression: Statistical classification model based on the logistic function. Assumes a linear relationship between the features and the log-odds of the class. It is binary in its basic form and extends to several classes.
- CatBoost: Gradient boosting library by Yandex. Uses ordered boosting and handles categorical features natively. Supports classification, regression, and ranking tasks, with GPU acceleration.
- PyTorch: Open-source deep learning framework by Facebook's AI Research lab (FAIR). PyTorch is known for its flexibility and dynamic computation graph, which makes it popular for research and production.
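To make the logistic regression summary above more concrete, here is a minimal sketch of the binary case in plain Python: a linear score passed through the logistic function. The weights and features are hypothetical values chosen only to illustrate the computation; this is not the model trained in this project, which works with four sentiment classes and would need a multi-class extension.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    # Two branches avoid overflow in math.exp for large negative or positive z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and features, chosen only to illustrate the computation.
weights = [1.5, -2.0]
bias = 0.5
features = [1.0, 1.0]  # linear score: 0.5 + 1.5 - 2.0 = 0.0

probability = predict_proba(weights, bias, features)
print(probability)  # 0.5, since the linear score is exactly zero
```

A linear score of zero sits on the decision boundary, so the predicted probability is 0.5. Positive scores push the probability toward 1 and negative scores toward 0.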
The project is licensed under the MIT License.
I am an open-source developer licensed by JetBrains. This project is supported by JetBrains' Open Source Developer License. You can learn more about JetBrains and their open-source support here.
If you have any questions or suggestions, please contact me at [email protected].
This project is developed in the Republic of Kazakhstan.