Commit

initial draft of new readme.md
Ed Landamore committed Dec 10, 2020
1 parent d835af3 commit f84a674
Showing 4 changed files with 416 additions and 2,516 deletions.
28 changes: 0 additions & 28 deletions .github/workflows/pythonpackage.yml
@@ -1,29 +1 @@
name: Python package

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.7]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v1
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
    - name: Install package and test
      run: |
        make install test clean
102 changes: 48 additions & 54 deletions README.md
@@ -1,70 +1,64 @@
# Data analysis
- Document here the project: 5-star
- Description: Project Description
- Data Source:
- Type of analysis:
# Project Overview

Please document the project as best you can.
The goal of the project was to explore Airbnb listings in London from a host’s perspective and to predict guest review scores from a property’s attributes.

# Startup the project
Ultimately, this has the potential to become a one-stop tool for Airbnb hosts managing and optimising their listings.

The initial setup.
Data source: Inside Airbnb

Create virtualenv and install the project:
```bash
$ sudo apt-get install virtualenv python-pip python-dev
$ deactivate; virtualenv ~/venv ; source ~/venv/bin/activate ;\
pip install pip -U; pip install -r requirements.txt
```
Status - completed (version 1)

Unittest test:
```bash
$ make clean install test
```
## Team
Miles Tomlinson - [GH profile](https://github.com/milestommo)<br>
Ed Landamore - [GH profile](https://github.com/OrthoLoess)<br>
Elsa Lebrun-Grandie - [GH profile](https://github.com/ElsaLGF)<br>
Leone Cavicchia - [GH profile](https://github.com/leoncav)

Check for 5-star in gitlab.com/{group}.
If your project is not set please add it:
## Methods
Data exploration<br>
Inferential statistics<br>
Data visualisation<br>
Machine learning/predictive modelling<br>
Natural Language Processing<br>
App user interface design

- Create a new project on `gitlab.com/{group}/5-star`
- Then populate it:
## Tech
SQL<br>
Python (Jupyter)<br>
Pandas<br>
NumPy<br>
Matplotlib<br>
Seaborn<br>
scikit-learn<br>
NLTK<br>
Miro Scratchpad<br>
Streamlit / HTML

```bash
$ ## e.g. if group is "{group}" and project_name is "5-star"
$ git remote add origin [email protected]:{group}/5-star.git
$ git push -u origin master
$ git push -u origin --tags
```
# Project Description
- Inspired by the wealth of data provided by Inside Airbnb, we chose to explore a listing’s review score and its relationship to the features that a property offers its guests
- The early stage of the project focused on what the end product would look like and how it could offer real value to hosts who wanted more insight into which features to address or install in order to improve a guest’s experience
- Miro was used to design the app wireframes and visualise the user flow
- The next stage centred on data understanding and exploration. With so much data collated for each listing (c. 90 dataframe columns), the challenge was to shortlist the most potentially influential features for the predictive model through multiple phases of feature prioritisation
- Pandas and Matplotlib were used to understand and visualise the make-up of each feature, while a dummy model regressor was used to highlight the more influential features in relation to the review score
- In the modelling phase, we used K-Means clustering along with manual grouping in Pandas to identify groups of listings that share a common set of fixed attributes that a host wouldn’t necessarily be able to change (e.g. borough location, number of bedrooms, property type, etc.). This allowed the app to let a host compare their listing against other hosts with similar properties
- The offering to the host was enhanced by applying NLP methods to the verbatim review feedback left by guests, with a focus on the top-rated listings in each group, allowing the host to leverage the qualitative insights available to them
- The final linear regression model used a set of features chosen to minimise multicollinearity. It used L2 regularisation to help control overfitting on the training set.
- Finally, all of this led to the creation of an interactive front end that would provide information about a host’s listing, the group a host belonged to and how the most influential features could be dialled up or down to positively, or negatively, affect the review score.
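
To illustrate the L2-regularised linear regression mentioned in the list above, here is a minimal sketch that uses only the Python standard library. It is not the project's code: the function names (`fit_ridge`, `predict`, `solve_linear_system`), the two toy features and the review scores are all hypothetical, and a real pipeline would more likely rely on a library estimator such as scikit-learn's `Ridge`. The sketch solves the ridge normal equations directly, so it only suits a small number of features.

```python
"""Ridge (L2-regularised) linear regression, standard library only."""


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def fit_ridge(rows, targets, alpha):
    """Return (intercept, coefficients) minimising squared error + alpha * ||w||^2.

    Features and targets are centred first so the intercept is not penalised.
    """
    n_samples, n_features = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n_samples for j in range(n_features)]
    y_mean = sum(targets) / n_samples
    xc = [[r[j] - means[j] for j in range(n_features)] for r in rows]
    yc = [t - y_mean for t in targets]
    # Normal equations: (Xc^T Xc + alpha * I) w = Xc^T yc
    gram = [[sum(x[i] * x[j] for x in xc) + (alpha if i == j else 0.0)
             for j in range(n_features)] for i in range(n_features)]
    moment = [sum(x[i] * t for x, t in zip(xc, yc)) for i in range(n_features)]
    coefs = solve_linear_system(gram, moment)
    intercept = y_mean - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs


def predict(intercept, coefs, row):
    """Predict a score for one listing from its feature values."""
    return intercept + sum(c * v for c, v in zip(coefs, row))


if __name__ == "__main__":
    # Hypothetical toy data: two made-up listing features and review scores.
    features = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
    scores = [90.0, 93.0, 92.0, 95.0, 95.0]
    for alpha in (0.0, 1.0, 10.0):
        intercept, coefs = fit_ridge(features, scores, alpha)
        print(alpha, round(intercept, 3), [round(c, 3) for c in coefs])
```

Running the script prints the fitted intercept and coefficients for increasing `alpha`. Larger values pull the coefficients towards zero, which is the mechanism that helps limit overfitting when features are correlated.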

Functional test with a script:
```bash
$ cd /tmp
$ 5-star-run
```
# Install
Go to `gitlab.com/{group}/5-star` to see the project, manage issues,
set up your SSH public key, ...

Create a python3 virtualenv and activate it:

# Startup the project

The initial setup.

Create virtualenv and install the project:
```bash
$ sudo apt-get install virtualenv python-pip python-dev
$ deactivate; virtualenv -ppython3 ~/venv ; source ~/venv/bin/activate
$ deactivate; virtualenv ~/venv ; source ~/venv/bin/activate ;\
pip install pip -U; pip install -r requirements.txt
```

Clone the project and install it:
Run the Streamlit server with:
```bash
$ git clone gitlab.com/{group}/5-star
$ cd 5-star
$ pip install -r requirements.txt
$ make clean install test # install and test
$ streamlit run fivestar/five-star.py
```
Functional test with a script:
```bash
$ cd /tmp
$ 5-star-run
```

# Continuous integration
## Github
Every push of `master` branch will execute `.github/workflows/pythonpackage.yml` docker jobs.
## Gitlab
Every push of `master` branch will execute `.gitlab-ci.yml` docker jobs.