GitHub - hassanteymoori/Income-classification: an archive repository for the Big Data Computing project.

Income classification

This project is the final project for the Big Data Computing course (2020-2021) at Sapienza University.

Project Status: [Done]

Project Intro/Objective

The objective of this project is a binary classification task: predicting whether a person earns more than 50K a year.
I trained several supervised ML algorithms and compared their performance. The project required PySpark with MLlib instead of plain Python with scikit-learn. Based on this comparison, I performed model selection to choose the best classifier for predicting whether a person earns more than 50K a year.

Methods Used

Machine Learning
Data Visualization
Predictive Modeling
MLlib
PySpark

Technologies

Python
Pandas, Jupyter
Numpy
PySpark

Project Description and dataset

I used the Income classification dataset, which is publicly available on Kaggle. It contains more than 40k entries and 15 columns. Preprocessing involved cleaning, imputation, encoding, class balancing and scaling. Because the dataset has many categorical features, encoding increased the number of features to more than 100, so feature engineering was applied to improve performance.

Needs of this project

data exploration/descriptive statistics
data processing/cleaning
statistical modeling
writeup/reporting
machine learning with MLlib
PySpark workflow
big-data concepts

Outcome

The results and performance metrics can be found in the outcome directory.

Folders and files

dataset
outcome
.gitignore
README.md
pyspark_notebook.ipynb
skewness.jpg
