By Camila Andrea Cardona Alzate
This workshop predicts the happiness score using five CSV files of country rankings from 2015 to 2019. It covers an exploratory data analysis, the transformations needed to concatenate the five files into a single dataset, streaming the data with Kafka, and evaluating the prediction accuracy of the model.
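The concatenation step mentioned above can be sketched with pandas. Note that the column names below (e.g. "Happiness Score") and the helper function are illustrative assumptions — each yearly file in the real dataset uses its own headers:

```python
import pandas as pd

# Illustrative sketch: rename each year's target column to a shared name,
# tag every row with its year, and stack the five frames into one dataset.
# "Happiness Score" is an assumed header, not necessarily the real one.
def combine_years(yearly_frames):
    frames = []
    for year, df in yearly_frames.items():
        df = df.rename(columns={"Happiness Score": "score"})
        df["year"] = year
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```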
- Pandas, to manipulate the data.
- PostgreSQL, to create the happiness index table and insert the data with the final predictions.
- Docker, to run Kafka and create a topic.
- Kafka, to stream the data and make predictions.
- Clone this repository:
git clone https://github.com/Camiau20/workshop_3.git
- Create a virtual environment and activate it:
python -m venv env
source env/bin/activate
- Install the dependencies:
pip install -r requirements.txt
- Go to the docker folder and run Docker Compose:
docker-compose up -d
- Enter the Kafka container's bash and create a new topic:
docker exec -it kafka-test bash
kafka-topics --bootstrap-server kafka-test:9092 --create --topic topic_name
NOTE: The topic_name you choose must also be changed in producer_features.py and consumer_predictions.py.
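For reference, a producer typically serializes each record as UTF-8 JSON before sending it to the topic. The snippet below is a sketch under that assumption; the topic name is hypothetical, and the kafka-python calls shown in comments are not taken from this repository:

```python
import json

# Hypothetical topic name -- whatever you choose, it must match the one
# used in producer_features.py and consumer_predictions.py.
TOPIC = "happiness_topic"

def serialize_row(row):
    """Encode one record as UTF-8 JSON, a typical Kafka message payload."""
    return json.dumps(row).encode("utf-8")

# With kafka-python (assumed, not confirmed by this README), sending a row
# would look roughly like:
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   producer.send(TOPIC, serialize_row(row))
```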
- Create a database credentials file 'config_db.json' with the following structure, and fill it with your own credentials:
{ "host": "", "database": "", "user": "", "password": "" }
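A script can then read this file with the standard library and build a connection string. The helper below is a sketch: the function name is ours, and the URL format is the one accepted by common PostgreSQL drivers such as psycopg2/SQLAlchemy:

```python
import json

def build_db_url(path="config_db.json"):
    """Read the credentials file described above and return a
    PostgreSQL connection URL (illustrative helper, not from the repo)."""
    with open(path) as f:
        cfg = json.load(f)
    return (f"postgresql://{cfg['user']}:{cfg['password']}"
            f"@{cfg['host']}/{cfg['database']}")
```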
- Once your data is in the database, run metrics.py to compute the model's accuracy metrics.
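The exact metrics computed by metrics.py are not shown here; for a regression target like the happiness score, a minimal pure-Python sketch of two common choices (MAE and R²) would look like:

```python
def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination (R^2): 1 minus the ratio of residual
    error to the variance of the true values."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```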