This project analyzes and predicts apartment rental prices in Manhattan using machine learning techniques. The dataset is sourced from StreetEasy and contains various features about rental listings, such as the number of bedrooms, bathrooms, square footage, amenities, and proximity to the subway.
- Data Loading & Exploration
  - The dataset is loaded with pandas, and basic exploration is performed to understand its structure and contents.
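A minimal sketch of the loading and exploration step. The file name `manhattan.csv`, the column names, and the three sample rows are assumptions made for illustration; an in-memory CSV stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny stand-in for the real dataset; the values are made up for illustration.
csv_text = """rent,bedrooms,bathrooms,size_sqft,min_to_subway
2550,0,1,480,9
11500,2,2,2000,4
4500,1,1,916,2
"""

# With the real data this would be pd.read_csv("manhattan.csv") (hypothetical path).
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)       # number of rows and columns
df.info()             # column dtypes and non-null counts
print(df.head())      # first few listings
print(df.describe())  # summary statistics for numeric columns
```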
- Feature Selection & Correlation Analysis
  - Correlation between features and the target variable (`rent`) is analyzed.
  - Highly correlated features are visualized using a heatmap, and multicollinearity is addressed by removing redundant features.
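The correlation step could look roughly like the sketch below. It uses synthetic data, and the column names, the 0.8 threshold, and the rule "drop the feature of a correlated pair that is less correlated with `rent`" are all assumptions, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd

# Synthetic listings (illustrative only): bedrooms is built to track size_sqft.
rng = np.random.default_rng(0)
n = 200
size_sqft = rng.uniform(300, 2500, n)
bedrooms = np.round(size_sqft / 600 + rng.normal(0, 0.3, n)).clip(0, 5)
min_to_subway = rng.integers(1, 20, n).astype(float)
rent = 3.5 * size_sqft + 200 * bedrooms - 20 * min_to_subway + rng.normal(0, 300, n)
df = pd.DataFrame({"rent": rent, "size_sqft": size_sqft,
                   "bedrooms": bedrooms, "min_to_subway": min_to_subway})

corr = df.corr()  # this matrix is what a heatmap would display

# Rank features by absolute correlation with the target.
target_corr = corr["rent"].drop("rent").abs().sort_values(ascending=False)
print(target_corr)

# For each strongly correlated feature pair, drop the one weaker against rent.
threshold = 0.8  # assumed cut-off
feature_corr = corr.drop(index="rent", columns="rent").abs()
cols = list(feature_corr.columns)
to_drop = set()
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if a in to_drop or b in to_drop:
            continue
        if feature_corr.loc[a, b] > threshold:
            weaker = a if target_corr[a] < target_corr[b] else b
            to_drop.add(weaker)

reduced = df.drop(columns=sorted(to_drop))
print("Dropped:", sorted(to_drop))
```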
- Data Visualization
  - Scatter plots and boxplots are used to visualize relationships between features and the target, and to detect outliers.
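As a numeric companion to the boxplots, the sketch below flags outliers with the conventional 1.5 × IQR whisker rule that boxplots draw by default. The rent values are made up, and whether the notebook applies this exact rule is an assumption.

```python
import pandas as pd

# Made-up monthly rents with one obvious extreme value.
rent = pd.Series([2500, 2700, 3000, 3200, 3500, 3800, 4000, 20000], name="rent")

q1 = rent.quantile(0.25)
q3 = rent.quantile(0.75)
iqr = q3 - q1

# Points beyond the whiskers are the ones a boxplot marks individually.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = rent[(rent < lower) | (rent > upper)]

print(f"Whisker range: {lower} to {upper}")
print(outliers)
```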
- Missing Value Analysis
  - Null values are visualized and counted to ensure data quality.
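Counting nulls per column can be sketched as follows. The small frame and its gaps are invented for illustration; the boolean mask `df.isnull()` is also the kind of input a heatmap of missing values would take.

```python
import numpy as np
import pandas as pd

# Invented sample with a few deliberately missing entries.
df = pd.DataFrame({
    "rent": [2550, 4500, np.nan, 3800],
    "bedrooms": [0, 1, 2, 1],
    "size_sqft": [480, np.nan, np.nan, 700],
})

null_counts = df.isnull().sum()   # missing entries per column
null_share = df.isnull().mean()   # fraction of rows missing per column

print(null_counts)
print(null_share)
```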
- Model Selection & Cross-Validation
  - Multiple regression models are evaluated using cross-validation:
    - Linear Regression
    - Ridge Regression
    - Lasso Regression
    - Random Forest Regressor
    - Support Vector Regressor (SVR)
  - The best-performing model is selected based on cross-validation scores.
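The comparison of the five models might be sketched like this. Synthetic data from `make_regression` stands in for the rental features, and the 5-fold split, the R² metric, the hyperparameters, and the scaling step in front of SVR are assumptions rather than the notebook's confirmed settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the feature matrix and rent target.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=0.1),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=0),
    # SVR is sensitive to feature scale, so it is wrapped with a scaler here.
    "Support Vector Regressor": make_pipeline(StandardScaler(), SVR()),
}

# Mean R^2 over 5 folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean R^2 = {score:.3f}")

best_name = max(scores, key=scores.get)
print("Best model:", best_name)
```

On real rental data the ranking can differ considerably from this synthetic case, which is linear by construction and therefore favors the linear models.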
- Model Training & Prediction
  - The selected model is trained on the full dataset.
  - Predictions are made for new data samples.
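A minimal sketch of the final step, assuming `new_data` is a one-row DataFrame with the same feature columns used for training. `LinearRegression` is only a stand-in for whichever model was selected, and the training rows are generated from a made-up formula, not taken from the StreetEasy data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Illustrative training rows (rent = 500 + 300*bedrooms + 200*bathrooms + 4*size_sqft).
train = pd.DataFrame({
    "bedrooms": [0, 1, 1, 2, 2, 3],
    "bathrooms": [1, 1, 1, 2, 2, 2],
    "size_sqft": [450, 650, 700, 1000, 1100, 1500],
    "rent": [2500, 3600, 3800, 5500, 5900, 7800],
})
features = ["bedrooms", "bathrooms", "size_sqft"]

# Fit on every available row, as described above.
model = LinearRegression()
model.fit(train[features], train["rent"])

# Describe a custom apartment; the keys must match the training features.
new_data = pd.DataFrame([{"bedrooms": 1, "bathrooms": 1, "size_sqft": 800}])
predicted_rent = model.predict(new_data[features])[0]

print(f"Predicted rent: ${predicted_rent:,.0f}")
```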

Requirements:

- Python 3.x
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn

Usage:

- Clone this repository.
- Install the required dependencies (for example, `pip install pandas numpy matplotlib seaborn scikit-learn`).
- Open the Jupyter Notebook and run the cells sequentially.
- Modify the `new_data` variable to predict rent for custom apartment features.