A perfect tool for crawling Baidu
Search engines are very powerful tools. If other tools could incorporate most of a search engine's features, they would become even more powerful. However, I have not found any web spider that extracts search results accurately. With that goal in mind, I developed this project to crawl Baidu: BaiduSpider.
Here's why:
- Reduces the time spent extracting data, which speeds up the development of projects such as deep-learning applications.
- Extracts data accurately, without ads.
- Provides detailed search results and supports multiple search types and return models.
Of course, nothing is perfect, including this project. Any open-source project needs the community's help. You can help BaiduSpider by opening an issue or submitting a PR! 😄
Helpful documentation and tools are listed in the acknowledgements.
Some open-source packages used in BaiduSpider.
Please follow the steps below in order to install BaiduSpider.
Before installing BaiduSpider, please make sure you have Python 3.6+ installed:
$ python --version
If the version is lower than 3.6.0, please go to python.org to download and install a higher version of Python.
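The same check can be done from inside Python, which is handy in a setup script. This is only a minimal sketch: the helper name `python_is_supported` and the constant `MIN_VERSION` are made up for illustration and are not part of BaiduSpider.

```python
import sys

# Minimum version stated in this README (assumed to apply to the current release).
MIN_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if python_is_supported():
    print("Python version is supported.")
else:
    print("Please install Python 3.6 or newer from python.org.")
```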
Please enter the following command in the terminal to install from PyPI:
$ pip install baiduspider
Alternatively, install from source:
$ git clone git@github.com:BaiduSpider/BaiduSpider.git
# ...
$ python setup.py install
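To confirm that the installation worked, one option is to check whether the `baiduspider` module can be located by the interpreter. The helper `is_installed` below is a hypothetical name used only for this sketch; it relies on the standard library's `importlib.util.find_spec` and does not import or run BaiduSpider itself.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


# The import name "baiduspider" is taken from the usage example in this README.
print("baiduspider installed:", is_installed("baiduspider"))
```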
You can get search results with just a few lines of code using BaiduSpider:
# Import BaiduSpider
from baiduspider import BaiduSpider
from pprint import pprint
# Generate the BaiduSpider object
spider = BaiduSpider()
# Search the web
pprint(spider.search_web(query='Python'))
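When running several queries in a row, it may be sensible to pause between requests so that the crawl stays small. The sketch below is not part of BaiduSpider: `search_many`, `delay_seconds`, and the `sleep` parameter are hypothetical names, and the 2-second default is an arbitrary assumption. It accepts any callable that takes a `query` keyword argument, such as `spider.search_web` from the example above.

```python
import time


def search_many(search, queries, delay_seconds=2.0, sleep=time.sleep):
    """Call `search(query=...)` for each query, pausing between calls.

    Returns a dict mapping each query to whatever `search` returned.
    """
    results = {}
    for index, query in enumerate(queries):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results[query] = search(query=query)
    return results


# Possible usage with BaiduSpider (requires network access):
# from baiduspider import BaiduSpider
# results = search_many(BaiduSpider().search_web, ["Python", "BaiduSpider"])
```

Passing `sleep` as a parameter keeps the helper easy to try out with a stand-in function instead of real waiting.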
For more examples and configurations, please refer to the documentation.
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b NewFeatures)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin username/BaiduSpider)
- Open a Pull Request
Distributed under the GPL-V3 License. See LICENSE for more information.
samzhangjy - @samzhangjy
Project Link: https://github.com/BaiduSpider/BaiduSpider
This project may only be used for learning purposes. It may not be used in commercial projects or to crawl large amounts of data. Also, BaiduSpider is distributed under the GPL-V3 license, meaning any project using BaiduSpider must be open-source and link to this project. The author of this project does not bear any legal risk arising from its use. It is hereby stated that offenders are responsible for the consequences.