- The main goal of this project is to build a data crawler system to collect information about reputable information technology conferences worldwide.
- This information may include conference names, submission deadlines, event dates, locations, and topics, as well as details such as speaker lists, accepted papers, and registration information.
- Identify a list of websites or data sources from which you want to gather information.
- Design the structure of the database to store this information.
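A minimal sketch of such a schema, using SQLite for illustration; the table and column names here are assumptions, not a fixed specification:

```python
import sqlite3

# Illustrative schema for crawled conference records. Column names and
# types are assumptions; adapt them to the fields your sources provide.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conferences (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    deadline    TEXT,          -- paper submission deadline (ISO 8601)
    start_date  TEXT,          -- conference start date
    end_date    TEXT,
    location    TEXT,
    themes      TEXT,          -- comma-separated topic list
    source_url  TEXT UNIQUE    -- page the record was crawled from
);
"""

def init_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

conn = init_db()
conn.execute(
    "INSERT INTO conferences (name, deadline, location, source_url) "
    "VALUES (?, ?, ?, ?)",
    ("Example Conf 2025", "2025-01-15", "Hanoi, Vietnam",
     "https://example.org/conf"),
)
row = conn.execute("SELECT name, location FROM conferences").fetchone()
print(row)  # → ('Example Conf 2025', 'Hanoi, Vietnam')
```

The `UNIQUE` constraint on `source_url` gives later re-crawls a natural key for updating existing rows instead of inserting duplicates.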
- Build a program or script capable of automatically navigating websites, searching for information about conferences, and extracting data from these web pages.
- Suggested tools/libraries: Consider using tools like Scrapy or BeautifulSoup in Python, or any language suitable for your team's skills.
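In practice you would fetch pages with Scrapy or requests and parse them with BeautifulSoup; the sketch below uses only the standard library's `html.parser` on a hardcoded snippet (shaped like a hypothetical listing page) so it runs anywhere without network access:

```python
from html.parser import HTMLParser

# Stand-in for a real crawl: a hardcoded snippet shaped like a listing page.
HTML = """
<ul>
  <li class="conf"><a href="/icse">ICSE 2025</a></li>
  <li class="conf"><a href="/neurips">NeurIPS 2025</a></li>
</ul>
"""

class ConfExtractor(HTMLParser):
    """Collect the text of every <a> link (here, conference names)."""
    def __init__(self):
        super().__init__()
        self.in_link = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link and data.strip():
            self.names.append(data.strip())

parser = ConfExtractor()
parser.feed(HTML)
print(parser.names)  # → ['ICSE 2025', 'NeurIPS 2025']
```

With BeautifulSoup the same extraction collapses to roughly `[a.get_text() for a in soup.select("li.conf a")]`; the CSS selector here is an assumption about the page structure.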
- After collecting the data, clean and filter it to remove irrelevant or duplicate records and to ensure accuracy.
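One sketch of this cleaning step: normalize whitespace, drop unusable records, and deduplicate by a case-insensitive key. The field names are assumptions about the crawled record shape:

```python
def clean(records):
    """Normalize names, drop empty records, and deduplicate."""
    seen = set()
    out = []
    for r in records:
        name = " ".join(r.get("name", "").split())  # collapse whitespace
        if not name:
            continue  # drop records with no usable name
        key = name.lower()
        if key in seen:
            continue  # skip duplicates from overlapping sources
        seen.add(key)
        out.append({**r, "name": name})
    return out

raw = [
    {"name": "  ICSE  2025 ", "location": "Ottawa"},
    {"name": "ICSE 2025", "location": "Ottawa"},  # duplicate
    {"name": "", "location": "?"},                # unusable
]
cleaned = clean(raw)
print(cleaned)  # → [{'name': 'ICSE 2025', 'location': 'Ottawa'}]
```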
- Store the collected data in a database or file for future use.
- If necessary, build a user interface to interact with the collected data.
- Some websites have anti-crawling mechanisms or rate limits on how fast they may be accessed.
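A common way to cope with rate limits is polite retrying with exponential backoff between attempts. In this sketch, `fetch` is any callable that raises on failure (e.g. on an HTTP 429 response); the simulated `flaky` fetcher is purely illustrative:

```python
import time

def fetch_with_backoff(fetch, retries=3, base_delay=0.01):
    """Call fetch(), waiting exponentially longer after each failure."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...

# Simulated fetcher that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated 429 Too Many Requests")
    return "page content"

result = fetch_with_backoff(flaky)
print(result)  # → page content
```

In a real crawler you would also add a fixed delay between all requests (Scrapy's `DOWNLOAD_DELAY` setting serves this purpose) and honor `robots.txt`.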
- Data on websites may change frequently, requiring regular updates.
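Regular re-crawls should update existing rows rather than insert duplicates. One way to sketch this is an SQLite upsert keyed on the source URL (assumed here to uniquely identify a conference page):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE conferences (
    source_url TEXT PRIMARY KEY, name TEXT, deadline TEXT)""")

def upsert(conn, url, name, deadline):
    """Insert a record, or refresh it if the URL was seen before."""
    conn.execute(
        """INSERT INTO conferences (source_url, name, deadline)
           VALUES (?, ?, ?)
           ON CONFLICT(source_url) DO UPDATE SET
               name = excluded.name, deadline = excluded.deadline""",
        (url, name, deadline),
    )

upsert(conn, "https://example.org/conf", "Example Conf", "2025-01-15")
# Second crawl: the deadline was extended, so the row is updated in place.
upsert(conn, "https://example.org/conf", "Example Conf", "2025-02-01")
rows = conn.execute(
    "SELECT COUNT(*), MAX(deadline) FROM conferences").fetchone()
print(rows)  # → (1, '2025-02-01')
```

`ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer, which ships with recent Python versions.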
- Data may not be presented in an easily parsed format, e.g. dates written as free text or layouts that differ from site to site.
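Free-text dates are a typical example: each source writes them differently, so the crawler has to normalize them to one format. A minimal sketch, trying a list of known patterns (the pattern list is an assumption to be extended as new source formats appear):

```python
from datetime import datetime

PATTERNS = ["%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%d/%m/%Y"]

def normalize_date(text):
    """Return an ISO 8601 date string, or None if no pattern matches."""
    text = text.strip()
    for pat in PATTERNS:
        try:
            return datetime.strptime(text, pat).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable; flag the record for manual review

print(normalize_date("15 January 2025"))   # → 2025-01-15
print(normalize_date("January 15, 2025"))  # → 2025-01-15
print(normalize_date("soon"))              # → None
```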
- This project can be beneficial for academic researchers, conference organizing entities, or individuals interested in tracking global conference events.
Note:
- Data collection from websites should adhere to copyright regulations and each website's policies (including its terms of service and robots.txt). Use of the collected information may also need to comply with legal regulations and the specific policies of each website.
References:
- CCFDDL
- LIX Polytechnique
- Link to a crawler built with Scrapy: go here