rodrigovdf/real-estate-scrapper

 
 


CDIA 2021-2

Real Estate Scraper

Framework for rental advertisement collection and analysis

Summary
  1. About
  2. Installation
  3. How to use

About

This is a second-semester college project by students in the Data Science and Artificial Intelligence course at the Pontifical Catholic University of São Paulo (PUC-SP).

Motivation

Many of us, despite studying at PUC, live far from São Paulo. With the return of in-person activities, we all faced the same big problem: finding a good place to live in São Paulo.

Many websites advertise properties for rent. To find a good offer, the user has to browse several sites and also weigh several variables, such as the distance from the desired location and the total price of the property (rent plus the condominium fee, if any, plus taxes and fees). This is laborious work that can be automated.

This gave us the opportunity to apply the programming and statistical knowledge acquired during the course to provide the user with tools that make this process easier, and also to contribute to the community, since few projects address this problem in Brazil.

Goals

Our goal was initially to extract rental advertisements from 3 websites: trovit, zapimoveis, vivareal. These ads are scraped and saved to a database, but can also be saved to a local CSV file.

After scraping, each scraper returns its ads in a standard output format, following the model below:

[
  {"preço": 123, "endereço": "abc", "vagas": 123, "área": 123, "quartos": 123, "banheiros": 123, "link": "VivaReal", "img1": "imagem"},
  {"preço": 123, "endereço": "abc", "vagas": 123, "área": 123, "quartos": 123, "banheiros": 123, "link": "Trovit", "img1": "imagem"},
]
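As a rough illustration of how records in this format could be written to CSV, here is a minimal sketch using only the Python standard library. The function name `ads_to_csv` and the sample values are hypothetical and are not part of the `real_estate` package; the project's own saving logic may differ.

```python
import csv
import io

# Field names taken from the standard output model above
FIELDS = ["preço", "endereço", "vagas", "área", "quartos", "banheiros", "link", "img1"]

# Made-up sample records, for illustration only
ads = [
    {"preço": 1500, "endereço": "abc", "vagas": 1, "área": 45,
     "quartos": 2, "banheiros": 1, "link": "VivaReal", "img1": "imagem"},
    {"preço": 2500, "endereço": "abc", "vagas": 0, "área": 60,
     "quartos": 3, "banheiros": 2, "link": "Trovit", "img1": "imagem"},
]


def ads_to_csv(records, fields=FIELDS):
    """Serialize a list of ad dicts to CSV text, one row per ad (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = ads_to_csv(ads)
print(csv_text)
```

To write a file instead of a string, the same `csv.DictWriter` can be pointed at a file opened with `open(path, "w", newline="", encoding="utf-8")`.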

With this data in hand, we planned to provide the user with the following features.

First, a module called RentStats, which gives access to basic statistics about the data without requiring the user to know Pandas.
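The kind of basic statistics meant here can be sketched in plain Python. This is not the RentStats API, which is not documented in this README; `price_summary` and the sample prices are hypothetical and only show the idea of summarizing one numeric field across ads.

```python
from statistics import mean, median


def price_summary(records, key="preço"):
    """Return simple descriptive statistics for one numeric field (hypothetical helper)."""
    values = [r[key] for r in records if r.get(key) is not None]
    if not values:
        raise ValueError(f"no values found for field {key!r}")
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


# Made-up sample records, for illustration only
sample_ads = [
    {"preço": 1500, "área": 45},
    {"preço": 2500, "área": 60},
    {"preço": 2000, "área": 50},
]

summary = price_summary(sample_ads)
print(summary)
```

The same helper works for other numeric fields in the standard output, for example `price_summary(sample_ads, key="área")`.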

Second, a form of spatial visualization. We did this by providing two classes: LocationsMap and HeatMap.

LocationsMap HeatMap

In LocationsMap, each dot represents a rental ad, and the color of each marker shows how expensive that rental is relative to all other rentals in the view.
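One way to derive a relative color from a price is min-max scaling into a small palette. The sketch below is an assumption for illustration: the actual coloring scheme used by LocationsMap is not described here, and `relative_price_colors` and the three-color palette are hypothetical.

```python
def relative_price_colors(prices, palette=("green", "orange", "red")):
    """Assign each price a palette color based on its position between
    the cheapest and the most expensive price in the list (hypothetical helper)."""
    if not prices:
        return []
    low, high = min(prices), max(prices)
    span = high - low
    colors = []
    for price in prices:
        # Position of this price within the visible range, from 0.0 to 1.0;
        # when all prices are equal, every ad falls in the first bucket
        position = (price - low) / span if span else 0.0
        # Map the position to a palette bucket, clamping the maximum price
        index = min(int(position * len(palette)), len(palette) - 1)
        colors.append(palette[index])
    return colors


# Made-up prices, for illustration only
colors = relative_price_colors([1500, 2000, 2500, 4000])
print(colors)
```

Because the scaling uses only the prices passed in, the same ad can change color when the set of ads in the view changes, which matches the "relative to all other rentals in the view" behavior described above.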

In HeatMap, the relative prices are represented as a heat map instead of individual markers.

Installation

# Requires Python >= 3.8
pip install real_estate

How to Use

# Import the Collector and the map class
from real_estate import Collector, LocationsMap

# Collect the data
extractor = Collector('/path/to/chromedriver', "Perdizes, São Paulo")  # Example address
extractor.collect_data()

# Access the collected ads as a DataFrame
df = extractor.data

# Plot the LocationsMap
map1 = LocationsMap(df)
map1.print()
# The map is shown in the output
