Skip to content

Mushroom-Wang/kaggle-airbnb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Classification on Airbnb Availability Data

Purpose

This project aims to predict whether an Airbnb is avaliable based on a bunch of features.

Data

The data comes from the Inside Airbnb data platform, specifically the Airbnb listings in the region of Asheville, NC. It is cleaned up for this in-class competition. The attributes and the labels are created by some heuristic.

Pipeline

First, an exploratory analysis is presented, which contains how I preprocess data and search for the correlation between target variable and features. I also present results on the feature importance and feature selection. Second, I introduce the evaluated models and the motivations of selecting them. One novel part I want to highlight is that I employ a powerful hyperparameter auto-tuner and provide a reusable and easy-to-config implementation. Finally, I report the prediction results.

Table of contents

Project Report

Check out my project report

Installation

(Back to top)

To use this project, first clone the repo on your device using the command below:

git init

git clone https://github.com/Mushroom-Wang/kaggle-airbnb.git

Development

(Back to top)

Exploratory Analysis

I draw the scatter plots of the target variable vs. each exploratory variable. Figure 1: Correlation Matrix

Feature Selection

First, I use the Random Forest algorithm to choose the features. Second, I compare the accuracy of the training data when changing the number of features (from 2 to 20) to select. According to Figure 1, choosing more features yields a higher training accuracy. Since selecting 14 to 19 features yields similar classification accuracy, further analysis needed to determine which features are more important. Figure 2: Box Plot of RFE Number of Selected Features vs. Classification Accuracy

Feature Importance

The feature importance provides some guidelines for me to drop several features when run the model later. Figure 3: Bar Chart of DecisionTreeRegressor Feature Importance Scores

Footer

(Back to top)

Leave a star in GitHub if you found this helpful.

About

Predict whether an Airbnb is available

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published