This library is designed to scrape, store, and process bibliographic data from Google Scholar. It consists of two main components:
- data_scraper.py: Scrapes academic data and saves each entry as a pickle file.
- data_handler.py: Reads and processes the stored pickle files, extracting relevant metadata and generating structured outputs.
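The division of labor between the two scripts can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the helper names `save_entry` and `load_entries`, the `entries/` directory, the file naming scheme, and the fields of the sample entry are all assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path


def save_entry(entry: dict, out_dir: Path, index: int) -> Path:
    """Scraper side (hypothetical helper): write one entry to its own pickle file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"entry_{index:05d}.pkl"  # assumed naming scheme
    with path.open("wb") as fh:
        pickle.dump(entry, fh)
    return path


def load_entries(in_dir: Path) -> list[dict]:
    """Handler side (hypothetical helper): read every stored pickle back, in file-name order."""
    entries = []
    for path in sorted(in_dir.glob("*.pkl")):
        with path.open("rb") as fh:
            entries.append(pickle.load(fh))
    return entries


# Made-up sample entry; the real fields depend on what the scraper stores.
sample = {"title": "An Example Paper", "author": ["A. Author"], "pub_year": "2020"}

with tempfile.TemporaryDirectory() as tmp:
    entries_dir = Path(tmp) / "entries"
    save_entry(sample, entries_dir, 0)
    loaded = load_entries(entries_dir)

print(loaded[0]["title"])
```

Keeping one file per entry means an interrupted scrape loses at most the entry being written, and the processing step can be rerun on the stored files without querying Google Scholar again.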
Ensure you have Python 3 installed, then install the scholarly package (for example, with pip install scholarly).
- Modify config.json to add the desired queries before running the scraper. This file should contain the search terms or parameters you want to use when collecting data.
- Run the data_scraper.py script to collect and store academic data into pickle files:

      python data_scraper.py

- Run the data_handler.py script to read and process the stored entries:

      python data_handler.py

This will generate structured outputs in multiple formats:
- JSON (scholar_results.json)
- CSV (scholar_results.csv)
- BibTeX (scholar_results.bib)
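
As a rough illustration of how one set of records can be written in these three formats, the sketch below uses only the Python standard library. It is not the code in data_handler.py: the `write_outputs` and `to_bibtex` helpers, the record fields (title, author, year), the `@article` entry type, and the `entry0`, `entry1` key scheme are assumptions made for the example.

```python
import csv
import json
import tempfile
from pathlib import Path


def to_bibtex(record: dict, key: str) -> str:
    """Format one record as a minimal BibTeX @article entry (assumed entry type)."""
    fields = ",\n".join(f"  {name} = {{{value}}}" for name, value in record.items())
    return f"@article{{{key},\n{fields}\n}}\n"


def write_outputs(records: list[dict], out_dir: Path) -> None:
    """Write the records as JSON, CSV, and BibTeX. Assumes all records share the same keys."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # JSON: all records as a single array.
    with (out_dir / "scholar_results.json").open("w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2, ensure_ascii=False)

    # CSV: one row per record, columns taken from the first record.
    fieldnames = list(records[0].keys()) if records else []
    with (out_dir / "scholar_results.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)

    # BibTeX: one entry per record, with a generated citation key.
    with (out_dir / "scholar_results.bib").open("w", encoding="utf-8") as fh:
        for i, record in enumerate(records):
            fh.write(to_bibtex(record, key=f"entry{i}"))


# Made-up records standing in for the metadata extracted from the pickle files.
records = [
    {"title": "An Example Paper", "author": "A. Author", "year": "2020"},
    {"title": "Another Example", "author": "B. Writer", "year": "2021"},
]

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    write_outputs(records, out_dir)
    produced = sorted(p.name for p in out_dir.iterdir())
    json_back = json.loads((out_dir / "scholar_results.json").read_text(encoding="utf-8"))
    bib_text = (out_dir / "scholar_results.bib").read_text(encoding="utf-8")

print(produced)
```

The sketch does not escape special characters in BibTeX values, so titles containing braces or other TeX-sensitive characters would need extra handling.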