IDP Project Repository

Project Description

Analyze the personality of a person from scraping result of their social media account.

Order of the code:

Initial Data Exploration:

Open the python notebook file Data_exploration.ipynb on Jupyter Notebook, Google Colab or VS Code.
Adjust the path of the input file in cell 4.
Run.
Since it's a Jupyter Notebook, the outputs from the latest execution of the file is visible on github upon opening it.

Twitter Scraping:

install snscrape using the following command: pip3 install --upgrade git+https://github.com/JustAnotherArchivist/snscrape.git
Run twitter.py to scrape twitter. Adjust line 102 to set the directory of the initail founders dataset.
The scrapper requires internet connection.
The scrapping sometimes randomly shuts down due to network issues. As a result please keep track of until which index twitter data has been scrapped.
Then rerun the twitter.py by adjusting the value of starting index at line 105.

LinkedIn Scraping:

It needs selenium to be installed, pip install selenium
run linkedin.py main.
filepath is hardcoded as csv_file_path, if column names differ modify csv_file_content inputs.
if you like to iterate more than 3576 rows (it was our sample size) modify for loop interval.
The scrapper requires internet connection.
LinkedIn frequently changes its site structure for preventing scraping, check the structure of the profiles page.

Processing the text prior to usage of Receptiviti API:

process_text.py contains necessary fucntions to remove urls, symbols etc processing which must be done before feeding the texts to receptiviti API.
It takes the given txt files from defined directory and controls if the txt files contains required number of words or less
If the file contains more than 500 words it will take only first 500, if less then it will take all

Reciptiviti API:

Define input and output directories as; data_directory, json_directory respectively.
Define starting and ending index numbers, it's dependent on alphabetical order.
receptiviti_send() method contains api_key and api_secret, these must be changed if it will be used with a different account
Returned Json files will be saved in the defined directory.

Combine generated personality scores with founders data:

Open Combine_dataset.ipynb on Jupyter Notebook, Google Colab or VS Code.
Adjust the path to the initial dataset on cell 3.
Adjust the path to the directory containing all JSON files in which personality scores of the founders are stored.
Run
Since it's a Jupyter Notebook, the outputs from the latest execution of the file is visible on github upon opening it.

Generate boxplots of personality score distribution:

Open boxplot.ipynb on Jupyter Notebook, Google Colab or VS Code.
Set the path conraining all JSON files containing personality of founders in cell 3.
Run
Since it's a Jupyter Notebook, the outputs from the latest execution of the file is visible on github upon opening it.

Correlation Analysis Numerical:

Run Correlation2.ipynb with the merged_dataset.csv in the same directory as the python notebook.
Since it's a Jupyter Notebook, the outputs from the latest execution of the file is visible on github upon opening it.

Company level Statistical Analysis:

Run company_analysis.ipynb with the merged_dataset.csv in the same directory as the python notebook.
Since it's a Jupyter Notebook, the outputs from the latest execution of the file is visible on github upon opening it.

Name		Name	Last commit message	Last commit date
Latest commit History 69 Commits
assets		assets
figures		figures
.gitignore		.gitignore
Combine_dataset.ipynb		Combine_dataset.ipynb
Correlation2.ipynb		Correlation2.ipynb
Data_exploration.ipynb		Data_exploration.ipynb
LICENSE		LICENSE
README.md		README.md
app.py		app.py
boxplot.ipynb		boxplot.ipynb
callbacks.py		callbacks.py
company_analysis.ipynb		company_analysis.ipynb
correlation.py		correlation.py
framework.md		framework.md
layout.py		layout.py
linkedin.py		linkedin.py
merge_csv.py		merge_csv.py
merged_dataframe.csv		merged_dataframe.csv
plot.py		plot.py
process_text.py		process_text.py
receptiviti.py		receptiviti.py
style.py		style.py
twitter.py		twitter.py
utils.py		utils.py
visualize.py		visualize.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

IDP Project Repository

Project Description

Order of the code:

Initial Data Exploration:

Twitter Scraping:

LinkedIn Scraping:

Processing the text prior to usage of Receptiviti API:

Reciptiviti API:

Combine generated personality scores with founders data:

Generate boxplots of personality score distribution:

Correlation Analysis Numerical:

Company level Statistical Analysis:

About

Releases

Packages

Contributors 3

Languages

License

lptl/idp

Folders and files

Latest commit

History

Repository files navigation

IDP Project Repository

Project Description

Order of the code:

Initial Data Exploration:

Twitter Scraping:

LinkedIn Scraping:

Processing the text prior to usage of Receptiviti API:

Reciptiviti API:

Combine generated personality scores with founders data:

Generate boxplots of personality score distribution:

Correlation Analysis Numerical:

Company level Statistical Analysis:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages