Commit
Showing 12 changed files with 3,191 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
#Beanbun | ||
Introduction | ||
---- | ||
Beanbun is a simple, extensible crawler framework built on [Workerman](http://www.workerman.net). | ||
|
||
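Features | ||
---- | ||
- Supports two run modes: daemon and xx | ||
- Uses Guzzle for fetching by default | ||
- Supports distributed crawling | ||
- Supports multiple queue backends, such as in-memory and Redis | ||
- Supports custom URI filtering | ||
- Supports both breadth-first and depth-first crawling | ||
- Follows the PSR-4 standard | ||
- Crawling a page is split into several steps, and each step supports custom actions (such as adding a proxy or changing the user-agent) | ||
- Flexible extension mechanism that makes it easy to write plugins for the framework: custom queues, custom crawling methods... | ||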
|
||
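The difference between the breadth-first and depth-first modes listed above can be sketched in plain PHP. This is a generic illustration and not Beanbun's own queue code: `crawlOrder()` and the `$links` map are hypothetical names made up for this sketch, and no network requests are made.

```php
<?php
// Generic illustration only: this is not Beanbun's queue implementation.
// crawlOrder() and $links are hypothetical names introduced for this sketch.

/**
 * Return the order in which pages would be visited.
 *
 * @param array  $links      map of URL => list of URLs found on that page
 * @param string $seed       starting URL
 * @param bool   $depthFirst true: take the newest pending URL (stack);
 *                           false: take the oldest pending URL (queue)
 */
function crawlOrder(array $links, $seed, $depthFirst)
{
    $pending = [$seed];
    $seen = [$seed => true];
    $order = [];

    while ($pending) {
        $url = $depthFirst ? array_pop($pending) : array_shift($pending);
        $order[] = $url;

        $found = isset($links[$url]) ? $links[$url] : [];
        foreach ($found as $next) {
            // Skip URLs that were already scheduled once.
            if (!isset($seen[$next])) {
                $seen[$next] = true;
                $pending[] = $next;
            }
        }
    }

    return $order;
}

$links = [
    '/'  => ['/a', '/b'],
    '/a' => ['/a/1'],
    '/b' => ['/b/1'],
];

echo implode(' ', crawlOrder($links, '/', false)), "\n"; // / /a /b /a/1 /b/1
echo implode(' ', crawlOrder($links, '/', true)), "\n";  // / /b /b/1 /a /a/1
```

The only difference between the two modes in this sketch is which end of the pending list the next URL is taken from.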
Installation | ||
---- | ||
``` | ||
$ composer require kiddyuchina/beanbun | ||
``` | ||
|
||
Example | ||
---- | ||
Create a file named start.php with the following contents: | ||
``` php | ||
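<?php | ||
// Load Composer's autoloader (assumes start.php sits next to the vendor/ directory). | ||
require_once __DIR__ . '/vendor/autoload.php'; | ||
 | ||
use Beanbun\Beanbun; | ||
 | ||
$beanbun = new Beanbun; | ||
$beanbun->name = '950d'; | ||
$beanbun->count = 5; | ||
$beanbun->seed = 'http://www.950d.com/'; | ||
$beanbun->max = 100; | ||
$beanbun->logFile = __DIR__ . '/950d_access.log'; | ||
// After each page is downloaded, save its content to a file named after the MD5 hash of its URL. | ||
$beanbun->afterDownloadPage = function($beanbun) { | ||
file_put_contents(__DIR__ . '/' . md5($beanbun->url), $beanbun->page); | ||
}; | ||
$beanbun->start(); | ||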
``` | ||
Then run it from the command line: | ||
``` | ||
$ php start.php start | ||
``` | ||
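The `afterDownloadPage` callback in the example names each saved file with `md5($beanbun->url)`. The sketch below isolates that naming rule so its behaviour is easier to see. `pagePath()` and the `/tmp/pages` directory are hypothetical and exist only for this illustration; nothing here calls Beanbun.

```php
<?php
// pagePath() is a hypothetical helper that mirrors the expression used in
// the example callback: directory + '/' + md5(url).
function pagePath($dir, $url)
{
    return $dir . '/' . md5($url);
}

$first  = pagePath('/tmp/pages', 'http://www.950d.com/');
$second = pagePath('/tmp/pages', 'http://www.950d.com/');

// md5() always yields 32 hexadecimal characters, whatever the URL length.
echo strlen(basename($first)), "\n";          // 32
// The same URL always maps to the same file, so a re-crawl overwrites it.
var_dump($first === $second);                 // bool(true)
```

Because the hash cannot be turned back into the URL, a real crawler would probably want to record the URL somewhere as well, for example in the log file or inside the saved content.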
|
||
For more details, see the [documentation at http://www.beanbun.org](http://www.beanbun.org) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
{ | ||
"name": "kiddyuchina/beanbun", | ||
"description" : "Beanbun is a multi-process web crawler framework written in PHP, designed to be open and highly extensible", | ||
"type": "application", | ||
"keywords": ["spider", "crawler", "scraper"], | ||
"license": "MIT", | ||
"authors": [ | ||
{ | ||
"name": "Kidd Yu", | ||
"email": "[email protected]" | ||
} | ||
], | ||
"require": { | ||
"php": ">=5.5.0", | ||
"workerman/workerman": "^3.3", | ||
"imangazaliev/didom": "^1.8", | ||
"guzzlehttp/guzzle": "^6.0" | ||
}, | ||
"autoload": { | ||
"psr-4": { | ||
"Beanbun\\": "src/" | ||
} | ||
} | ||
} |
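The `autoload` section above maps the `Beanbun\` namespace prefix to the `src/` directory under PSR-4. A minimal sketch of that mapping rule follows. `psr4Path()` is a hypothetical helper written for this illustration, and `Beanbun\Example\Thing` is a made-up class name; Composer's real autoloader does more than this (multiple prefixes, fallback directories, class maps).

```php
<?php
// Illustration of PSR-4 path resolution; Composer's real autoloader does more.
// psr4Path() is a hypothetical helper, not part of Composer or Beanbun.
function psr4Path($class, $prefix, $baseDir)
{
    // Only classes inside the namespace prefix are handled.
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    // Strip the prefix, then turn namespace separators into directory separators.
    $relative = substr($class, strlen($prefix));

    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

echo psr4Path('Beanbun\\Beanbun', 'Beanbun\\', 'src/'), "\n";        // src/Beanbun.php
echo psr4Path('Beanbun\\Example\\Thing', 'Beanbun\\', 'src/'), "\n"; // src/Example/Thing.php
```

This is why `use Beanbun\Beanbun;` in start.php resolves to a file under `src/` once Composer's autoloader has been loaded.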