Commit 7e6e986 (1 parent: d0efeef). Showing 5 changed files with 28 additions and 27 deletions.
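@@ -1,48 +1,53 @@
#Beanbun
Introduction
----
Beanbun is a simple, extensible crawler framework that supports daemon mode and normal mode. Daemon mode is built on Workerman, and the downloader is built on Guzzle.
Beanbun is a simple, extensible crawler framework that supports distributed crawling as well as daemon mode and normal mode. Daemon mode is built on [Workerman](http://www.workerman.net), and the downloader is built on [Guzzle](http://guzzle.org).

Features
----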
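- Supports two modes: daemon and normal
- Supports two modes: daemon and normal (daemon mode is supported only on Linux servers)
- Uses guzzle for crawling by default
- Supports distributed crawling
- Supports multiple queue backends, such as in-memory and Redis
- Supports custom URI filtering
- Supports both breadth-first and depth-first crawling
- Follows the PSR-4 standard
- Crawling a page is split into multiple steps, each of which supports custom actions (such as adding a proxy or changing the user-agent)
- Flexible extension mechanism that makes it easy to build plugins for the framework: custom queues, custom crawling strategies...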
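
The breadth-first and depth-first options, together with URI filtering, come down to how the crawl frontier is consumed. The following is a minimal sketch in plain PHP, not Beanbun's own implementation: the `$links` graph, the `crawlOrder` function, and both filter closures are hypothetical names introduced only for illustration.

```php
<?php
// Hypothetical in-memory link graph standing in for real pages.
$links = [
    '/'    => ['/a', '/b'],
    '/a'   => ['/a/1', '/a/2'],
    '/b'   => ['/b/1', '/a'],
    '/a/1' => [],
    '/a/2' => [],
    '/b/1' => [],
];

/**
 * Walk the graph from $seed and return URLs in visit order.
 * $depthFirst = false: the frontier is consumed as a FIFO queue (breadth-first).
 * $depthFirst = true:  the frontier is consumed as a LIFO stack (depth-first).
 * $filter decides whether a newly discovered URL may be queued at all.
 */
function crawlOrder(array $links, string $seed, bool $depthFirst, callable $filter): array
{
    $frontier = [$seed];
    $seen = [$seed => true];
    $order = [];
    while ($frontier) {
        $url = $depthFirst ? array_pop($frontier) : array_shift($frontier);
        $order[] = $url;
        foreach ($links[$url] ?? [] as $next) {
            // Skip URLs that were already queued or that the filter rejects.
            if (isset($seen[$next]) || !$filter($next)) {
                continue;
            }
            $seen[$next] = true;
            $frontier[] = $next;
        }
    }
    return $order;
}

$allowAll = function (string $url): bool { return true; };
$skipB = function (string $url): bool { return strpos($url, '/b') !== 0; };

$bfs = crawlOrder($links, '/', false, $allowAll);
$dfs = crawlOrder($links, '/', true, $allowAll);
$filtered = crawlOrder($links, '/', false, $skipB);
```

Only the end of the frontier that is read changes between the two strategies; the filter runs before a URL is queued, so rejected URLs are never visited or expanded.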

Installation
----

Beanbun can be installed via composer.

```
$ composer require kiddyu/beanbun
```

Example
Quick Start
----
Create a file named start.php with the following content:
``` php
<?php
// Load Composer's autoloader (assumes start.php sits next to the vendor/ directory).
require_once __DIR__ . '/vendor/autoload.php';

use Beanbun\Beanbun;

$beanbun = new Beanbun;
$beanbun->name = '950d';
$beanbun->count = 5;
$beanbun->seed = 'http://www.950d.com/';
$beanbun->max = 100;
$beanbun->logFile = __DIR__ . '/950d_access.log';
// seed is assigned a second time here; this array replaces the single URL above.
$beanbun->seed = [
    'http://www.950d.com/',
    'http://www.950d.com/list-1.html',
    'http://www.950d.com/list-2.html',
];
// Save each downloaded page to a file named after the MD5 hash of its URL.
$beanbun->afterDownloadPage = function($beanbun) {
    file_put_contents(__DIR__ . '/' . md5($beanbun->url), $beanbun->page);
};
$beanbun->start();
```
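
To make the hook mechanism in the snippet above easier to follow without installing anything, here is a rough stand-in sketch. `MiniCrawler` is a hypothetical class written for this illustration and is not the Beanbun class; its stub downloader returns a fixed string instead of fetching over the network, and the `$saved` array replaces the `file_put_contents` call. It only assumes what the README snippet shows: a `seed` property that may be a string or an array, and an `afterDownloadPage` closure that receives the crawler and reads its `url` and `page` properties.

```php
<?php
// MiniCrawler is a hypothetical stand-in, NOT the Beanbun class.
class MiniCrawler
{
    public $seed = [];
    public $url = '';
    public $page = '';
    public $afterDownloadPage = null;
    private $download;

    public function __construct(callable $download)
    {
        $this->download = $download;
    }

    public function start(): void
    {
        // Casting to array accepts both a single seed URL and a list of URLs.
        foreach ((array) $this->seed as $url) {
            $this->url = $url;
            $this->page = ($this->download)($url);
            // Invoke the user-supplied hook, if any, with the crawler itself.
            if (is_callable($this->afterDownloadPage)) {
                ($this->afterDownloadPage)($this);
            }
        }
    }
}

// Stub downloader: returns a placeholder body rather than making an HTTP request.
$crawler = new MiniCrawler(function (string $url): string {
    return "<html>stub body for $url</html>";
});
$crawler->seed = ['http://www.950d.com/', 'http://www.950d.com/list-1.html'];

// Collect pages in memory, keyed by the MD5 hash of the URL, as the README hook does for file names.
$saved = [];
$crawler->afterDownloadPage = function ($c) use (&$saved) {
    $saved[md5($c->url)] = $c->page;
};
$crawler->start();
```

Keying by `md5($c->url)` gives each URL a fixed-length, filesystem-safe name, which is presumably why the README hook uses it for file names.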
Run start.php from the command line:
```
$ php start.php start
$ php start.php
```

For more details, see the [documentation http://www.beanbun.org](http://www.beanbun.org)
You will then see the crawl log.
For more details, see the [documentation](http://www.beanbun.org)