# Bengaluru Restaurant Cost Prediction

This Jupyter Notebook tackles the problem of predicting restaurant costs in Bengaluru using a dataset sourced from Zomato. The dataset includes features such as location and rating, and the target variable is the average cost of a meal for two people at a restaurant.
## Table of Contents

- Introduction
- Data Loading
- Data Exploration
- Data Preprocessing
- Feature Engineering
- Model Training
- Evaluation
- Submission
- Requirements
- How to Run
- Acknowledgements
## Introduction

The notebook outlines an end-to-end workflow for predicting a restaurant's average meal cost from its features: data loading, exploration, preprocessing, feature engineering, model training, evaluation, and generating a submission file.
## Data Loading

The dataset is loaded into a pandas DataFrame from a CSV file, making the data available for exploration and preprocessing.
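As a sketch, the loading step looks like the following; the real notebook reads the Zomato CSV from disk (the file name would be project-specific), while this example uses an inline CSV so it is self-contained:

```python
import io

import pandas as pd

# In the notebook this would be something like:
#   df = pd.read_csv("train.csv")   # file name is an assumption
# Here an inline CSV stands in for the file so the example runs as-is.
csv_data = io.StringIO(
    "location,rate,votes,cost\n"
    "Banashankari,4.1,775,800\n"
    "Basavanagudi,3.8,918,800\n"
    "Jayanagar,4.4,4884,1100\n"
)
df = pd.read_csv(csv_data)
print(df.shape)  # (3, 4)
```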
## Data Exploration

Initial exploration includes displaying the first few rows of the dataset, checking for missing values, and performing basic statistical analysis. This helps in understanding the structure and content of the data.
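These checks can be sketched on a toy frame (the column names are assumptions, not the notebook's exact schema):

```python
import pandas as pd

# Toy frame standing in for the Zomato data.
df = pd.DataFrame({
    "location": ["Banashankari", "Jayanagar", None],
    "rate": [4.1, 4.4, 3.8],
    "cost": [800, 1100, None],
})

print(df.head())          # first rows
print(df.isnull().sum())  # missing values per column
print(df.describe())      # summary statistics for numeric columns
```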
## Data Preprocessing

Several preprocessing steps are applied to the data:
- Handling missing values through methods like forward filling.
- Encoding categorical features such as location and cuisine types using techniques like one-hot encoding.
- Normalizing numerical features like ratings and votes to ensure they are on a similar scale.
- Splitting the dataset into training and testing sets to evaluate model performance effectively.
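The steps above can be sketched end to end; the toy data and column names are assumptions, and the notebook's exact choices may differ:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "location": ["Banashankari", "Jayanagar", "Jayanagar",
                 "Indiranagar", "Banashankari", "Indiranagar"],
    "rate": [4.1, 4.4, None, 3.9, 4.0, 4.2],
    "votes": [775, 4884, 918, 120, 300, 2500],
    "cost": [800, 1100, 900, 600, 750, 1200],
})

# 1) Handle missing values (forward fill).
df["rate"] = df["rate"].ffill()

# 2) One-hot encode categorical features.
df = pd.get_dummies(df, columns=["location"])

# 3) Split before scaling so the scaler only sees training data.
X = df.drop(columns="cost")
y = df["cost"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
X_train = X_train.copy()
X_test = X_test.copy()

# 4) Normalize numeric features to a similar scale.
scaler = StandardScaler()
num_cols = ["rate", "votes"]
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])
```

Fitting the scaler on the training split only avoids leaking test-set statistics into training.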
## Feature Engineering

New features are created, or existing ones transformed, to enhance model performance. This can include interaction terms or polynomial features, aggregations, and other domain-specific transformations.
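A hedged sketch of such transformations, with illustrative feature names rather than the notebook's actual ones:

```python
import pandas as pd

df = pd.DataFrame({
    "rate": [4.1, 4.4, 3.8],
    "votes": [775, 4884, 918],
})

# Interaction term: highly rated *and* heavily voted restaurants may
# behave differently from those that are only one of the two.
df["rate_x_votes"] = df["rate"] * df["votes"]

# Polynomial feature.
df["rate_sq"] = df["rate"] ** 2

# Aggregate-style feature: votes relative to the dataset average.
df["votes_ratio"] = df["votes"] / df["votes"].mean()
```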
## Model Training

Several machine learning models are trained on the preprocessed data, ranging from linear regression and decision trees to random forests and more complex algorithms such as gradient boosting or neural networks. Training involves fitting each model to the training data and tuning hyperparameters for optimal performance.
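One way to sketch this step is a random-forest grid search on synthetic stand-in data; the model choice and parameter grid are illustrative, not the notebook's exact setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed features and target.
rng = np.random.default_rng(0)
X = rng.random((60, 4))
y = 1000 * X[:, 0] + 200 * X[:, 1] + rng.normal(0, 10, 60)

# Fit one candidate model and tune its hyperparameters by
# cross-validated grid search.
grid = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
model = grid.best_estimator_
```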
## Evaluation

Trained models are evaluated using metrics such as mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R-squared. Predictions on the test set are compared against the actual values to assess model accuracy, accompanied by visualizations and a detailed performance analysis.
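These metrics can be computed with scikit-learn; the toy predictions below stand in for real test-set results:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy actuals vs. predictions.
y_true = np.array([800, 1100, 900, 600])
y_pred = np.array([820, 1050, 950, 630])

mse = mean_squared_error(y_true, y_pred)   # 1575.0
rmse = np.sqrt(mse)                        # RMSE via sqrt(MSE)
mae = mean_absolute_error(y_true, y_pred)  # 37.5
r2 = r2_score(y_true, y_pred)
```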
## Submission

A submission file is generated from the predictions of the best-performing model and saved as a CSV for upload to a competition platform or further analysis.
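A minimal sketch of the submission step; the column names (`id`, `cost`) are assumptions and should be adapted to the platform's required schema:

```python
import pandas as pd

# Stand-in identifiers and model predictions for the test set.
test_ids = [101, 102, 103]
preds = [820.0, 1050.5, 630.0]

# Write a two-column submission file without the index column.
submission = pd.DataFrame({"id": test_ids, "cost": preds})
submission.to_csv("submission.csv", index=False)
```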
## Requirements

- Python 3.10 or later
- Jupyter Notebook
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
## How to Run

- Clone this repository.
- Install the required packages.
- Open the notebook in Jupyter.
- Run the notebook cells sequentially to execute the entire workflow.
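A minimal setup sketch, assuming the packages from the Requirements list and a standard Jupyter install (the repository URL is not specified here, so cloning is omitted):

```shell
# Install the required packages.
pip install pandas numpy scikit-learn matplotlib seaborn

# Launch Jupyter, then open the notebook and run the cells in order.
jupyter notebook
```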
## Acknowledgements

Special thanks to the dataset providers. The dataset can be found here.