Telco Customer Churn Prediction

The goal of this project is to predict customer churn for a telecommunications company. The dataset contains information about customers, including their demographics, services they subscribe to, account information, and whether they churned or not. The project involves exploratory data analysis (EDA), data visualization, feature engineering, and data preprocessing to prepare the data for modeling.

The expected outcome of this process is a clean, well-understood dataset ready for feature engineering and model development. All steps are documented and explained in a Jupyter notebook. The work includes identifying and handling missing values, encoding categorical variables, scaling numerical features, and splitting the data into training, validation, and test sets. The prepared data is saved to an .npz file, which can then be loaded to train a machine learning model.

After the data is cleaned and prepared, a machine learning model is trained, tuned, and deployed on Google Cloud Vertex AI to predict customer churn. The model is evaluated using metrics such as accuracy, precision, recall, and F1 score. The project also identifies the features that contribute most to customer churn and provides recommendations to reduce the churn rate.

Exploratory Data Analysis (EDA)

Data Visualization

Feature Engineering

AverageMonthlyCharges: It's common for customers to have variations in their charges throughout their tenure. This feature represents the average spend per month.
TenureGroups: Grouping tenure into categorical bins could reveal patterns related to customer loyalty and churn rate.
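
The two engineered features above could be derived roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes AverageMonthlyCharges is computed as TotalCharges divided by tenure (falling back to MonthlyCharges when tenure is 0), and the bin edges and labels for TenureGroups are hypothetical.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with AverageMonthlyCharges and TenureGroups added."""
    out = df.copy()

    # Assumption: average monthly spend = TotalCharges / tenure.
    # Rows with tenure 0 would divide by zero, so they fall back to MonthlyCharges.
    positive_tenure = out["tenure"].where(out["tenure"] > 0)
    out["AverageMonthlyCharges"] = (out["TotalCharges"] / positive_tenure).fillna(
        out["MonthlyCharges"]
    )

    # Assumption: these bin edges (in months) and labels are illustrative only.
    out["TenureGroups"] = pd.cut(
        out["tenure"],
        bins=[0, 12, 24, 48, 72],
        labels=["0-12", "13-24", "25-48", "49-72"],
        include_lowest=True,
    )
    return out


sample = pd.DataFrame(
    {
        "tenure": [0, 6, 30, 72],
        "MonthlyCharges": [20.0, 50.0, 80.0, 100.0],
        "TotalCharges": [0.0, 330.0, 2400.0, 7200.0],
    }
)
features = add_engineered_features(sample)
print(features[["tenure", "AverageMonthlyCharges", "TenureGroups"]])
```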

Identify Outliers

IQR Method
Z-score Method
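
Both methods can be sketched in a few lines of NumPy. The function names, the 1.5 IQR multiplier, and the z-score threshold of 3 are conventional defaults chosen for illustration, not values confirmed by the notebook, and the sample charges are made up.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # A constant column has no outliers by this definition.
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold


# Made-up monthly charges with one extreme value at the end.
charges = np.array([20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 500], dtype=float)
iqr_mask = iqr_outliers(charges)
z_mask = zscore_outliers(charges)
print(charges[iqr_mask], charges[z_mask])
```

The IQR rule is less sensitive to the outliers themselves than the z-score, because a single extreme value inflates the mean and standard deviation that the z-score depends on.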

Encode Categorical Variables

Encode binary variables (gender, Partner, Dependents, PhoneService, PaperlessBilling, Churn) with 0 and 1.
Use one-hot encoding for nominal variables with more than two categories (MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, Contract, PaymentMethod, TenureGroups) to prepare them for modeling.
Scale Numerical Features: Standardize or normalize AverageMonthlyCharges, tenure, MonthlyCharges, and TotalCharges.
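
The encoding and scaling steps above might look like this on a small frame. This is a sketch under stated assumptions: the helper encode_and_scale is hypothetical, the binary columns are assumed to hold "Yes"/"No" (or "Male"/"Female" for gender), the choice of which label maps to 1 is arbitrary, and StandardScaler is used as one of the two scaling options the text mentions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumption: binary columns use these string labels; the 0/1 assignment is arbitrary.
BINARY_MAP = {"No": 0, "Yes": 1, "Female": 0, "Male": 1}


def encode_and_scale(df, binary_cols, nominal_cols, numeric_cols):
    """Map binary columns to 0/1, one-hot encode nominal columns, standardize numerics."""
    out = df.copy()
    for col in binary_cols:
        out[col] = out[col].map(BINARY_MAP)

    # One indicator column per category, named "<column>_<category>".
    out = pd.get_dummies(out, columns=nominal_cols, dtype=int)

    # Standardize to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(out[numeric_cols].astype(float))
    for i, col in enumerate(numeric_cols):
        out[col] = scaled[:, i]
    return out


raw = pd.DataFrame(
    {
        "gender": ["Female", "Male", "Male", "Female"],
        "Partner": ["Yes", "No", "No", "Yes"],
        "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
        "tenure": [1, 12, 48, 72],
        "MonthlyCharges": [29.85, 56.95, 42.30, 89.10],
        "Churn": ["Yes", "No", "No", "No"],
    }
)
encoded = encode_and_scale(
    raw,
    binary_cols=["gender", "Partner", "Churn"],
    nominal_cols=["Contract"],
    numeric_cols=["tenure", "MonthlyCharges"],
)
print(encoded.head())
```

For brevity the sketch fits the scaler on the full frame. To avoid leaking validation and test statistics into training, the scaler would normally be fitted on the training split only and then applied to the other splits.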

Training Data Preparation

Split the data into feature (X) and label (y) arrays.
Use train_test_split twice to create a training set (60% of the data), a validation set (20%), and a test set (20%).
Save the training, validation, and test sets to an .npz file, which can then be loaded for training.
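
The 60/20/20 split described above can be obtained by holding out 20% for testing and then taking 25% of the remaining 80% for validation. The sketch below uses random stand-in data; the function name, the file name, the array keys, the fixed seed, and the use of stratification are assumptions rather than details taken from the notebook.

```python
import os
import tempfile

import numpy as np
from sklearn.model_selection import train_test_split


def split_and_save(X, y, path, seed=42):
    """Split into 60/20/20 train/validation/test sets and save them to one .npz file."""
    # First split: hold out 20% of all rows as the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, random_state=seed, stratify=y
    )
    # Second split: 25% of the remaining 80% is 20% of the original data.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, random_state=seed, stratify=y_rest
    )
    np.savez(
        path,
        X_train=X_train, y_train=y_train,
        X_val=X_val, y_val=y_val,
        X_test=X_test, y_test=y_test,
    )
    return X_train, X_val, X_test


# Stand-in data: 100 rows, 3 features, 30% positive labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 30 + [0] * 70)

path = os.path.join(tempfile.mkdtemp(), "telco_churn_splits.npz")
X_train, X_val, X_test = split_and_save(X, y, path)

# Reload the file the way a training script would.
with np.load(path) as data:
    loaded_shapes = {name: data[name].shape for name in data.files}
print(loaded_shapes)
```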

Machine Learning Model Development

The goal is to select and prototype suitable machine learning algorithms for predicting customer churn for a subscription-based telco service. This involves evaluating various models to identify the most effective approach for this specific churn prediction task.

Initial Model Prototyping

Several models were prototyped to assess their suitability and performance for the churn prediction task. These models can be built with standard libraries with minimal effort. If the dataset and preprocessing requirements varied significantly from one model to another, making each prototype costly to train, the selection would instead have to rest on theoretical considerations: choose a few ML algorithms well suited to the task and limit the number of models tried. In this case, the following models were prototyped:

Logistic Regression Model Prototyping
Random Forest Model Prototyping
XGBoost Model Prototyping
DNN Model Prototyping
CNN for Tabular Data Prototyping
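
As an illustration of how little code such a prototype needs, the sketch below fits two of the listed model families with scikit-learn and compares their validation accuracy. It runs on synthetic data from make_classification rather than the Telco dataset, the class ratio and hyperparameters are arbitrary, and it is not the notebook's actual prototyping code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary classification data standing in for the real features.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.73, 0.27], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical candidate set; hyperparameters are library defaults or arbitrary.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

val_accuracy = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy on the given data for classifiers.
    val_accuracy[name] = model.score(X_val, y_val)

for name, acc in val_accuracy.items():
    print(f"{name}: validation accuracy = {acc:.3f}")
```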

Evaluation Metrics

For each prototyped model, several key metrics were considered to evaluate performance, including accuracy, precision, recall, and the confusion matrix. These metrics provide a comprehensive view of each model's strengths and weaknesses in predicting customer churn. Based on those metrics, the best models for Vertex AI Vizier hyperparameter tuning will be selected.
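
These metrics can be computed with scikit-learn as sketched below. The helper name evaluate and the toy labels are made up for illustration; F1 is included because it is listed among the project's evaluation metrics in the introduction.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)


def evaluate(y_true, y_pred):
    """Collect the churn evaluation metrics in one dictionary (churn = positive class 1)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    }


# Toy labels: 6 non-churners and 4 churners, with one false positive and one false negative.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
report = evaluate(y_true, y_pred)
print(report)
```

Because churners are usually the minority class, accuracy alone can look good for a model that rarely predicts churn, which is why precision, recall, and the confusion matrix are inspected alongside it.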

Vertex AI Training, Tuning, and Deployment

Training XGBoost Model on Vertex AI: Train the XGBoost Model on Vertex AI as a custom training job.
Training DNN Model on Vertex AI: Train the DNN Model on Vertex AI as a custom training job.
Tuning XGBoost Model on Vertex AI: Tune the XGBoost Model on Vertex AI using Vizier hyperparameter tuning.
Tuning DNN Model on Vertex AI: Tune the DNN Model on Vertex AI as part of a custom training job.
Deployment: Deploy the model as Vertex AI model endpoints for predictions.

VLTSankalpa/TelcoChurnPrediction-VertexAI
