Commit

init
kiddyuchina authored Apr 13, 2017
1 parent 75d3a99 commit e1320bf
Showing 12 changed files with 3,191 additions and 0 deletions.
48 changes: 48 additions & 0 deletions README.md
@@ -0,0 +1,48 @@
# Beanbun
Introduction
----
Beanbun is a simple, extensible web crawler framework built on [Workerman](http://www.workerman.net).

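Features
----
- Supports two run modes: daemon and xx
- Uses Guzzle for fetching by default
- Supports distributed crawling
- Supports multiple queue backends, such as in-memory and Redis
- Supports custom URI filtering
- Supports both breadth-first and depth-first crawling
- Follows the PSR-4 standard
- Crawling a page is split into several steps, and each step supports custom actions (such as adding a proxy or changing the user-agent)
- A flexible extension mechanism makes it easy to build plugins for the framework: custom queues, custom crawling methods...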
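
The breadth-first and depth-first options listed above differ only in which end of the pending-URL list is read next, and a URI filter decides whether a discovered link is enqueued at all. The sketch below illustrates that idea and is not Beanbun's implementation: `crawlOrder`, the in-memory `$links` graph, and the `$skipLogin` filter are hypothetical names invented for this example.

```php
<?php
/**
 * Return the order in which pages would be visited.
 * Hypothetical helper for illustration; not part of Beanbun.
 */
function crawlOrder(array $links, $seed, $breadthFirst, callable $filter)
{
    $pending = [$seed];
    $seen = [$seed => true];
    $visited = [];

    while ($pending) {
        // Reading from the front (FIFO) gives breadth-first order;
        // reading from the back (LIFO) gives depth-first order.
        $url = $breadthFirst ? array_shift($pending) : array_pop($pending);
        $visited[] = $url;

        $children = isset($links[$url]) ? $links[$url] : [];
        foreach ($children as $next) {
            // Skip URLs already queued and URLs rejected by the filter.
            if (isset($seen[$next]) || !$filter($next)) {
                continue;
            }
            $seen[$next] = true;
            $pending[] = $next;
        }
    }
    return $visited;
}

// A tiny made-up site: each key lists the links found on that page.
$links = [
    '/'  => ['/a', '/b'],
    '/a' => ['/a/1', '/login'],
    '/b' => ['/b/1'],
];
$skipLogin = function ($url) {
    return $url !== '/login';
};

$bfs = crawlOrder($links, '/', true, $skipLogin);
$dfs = crawlOrder($links, '/', false, $skipLogin);
echo implode(' ', $bfs), PHP_EOL; // / /a /b /a/1 /b/1
echo implode(' ', $dfs), PHP_EOL; // / /b /b/1 /a /a/1
```

Because the traversal order depends only on how the pending list is read, the same idea carries over to any queue backend that can be read from either end.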

Installation
----
```
$ composer require kiddyuchina/beanbun
```

Example
----
Create a file named start.php with the following content:
``` php
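<?php
// Load Composer's autoloader (assumes the package was installed with Composer).
require_once __DIR__ . '/vendor/autoload.php';

use Beanbun\Beanbun;

$beanbun = new Beanbun;
$beanbun->name = '950d';
$beanbun->count = 5;
$beanbun->seed = 'http://www.950d.com/';
$beanbun->max = 100;
$beanbun->logFile = __DIR__ . '/950d_access.log';
// Save each downloaded page to a file named after the MD5 hash of its URL.
$beanbun->afterDownloadPage = function($beanbun) {
    file_put_contents(__DIR__ . '/' . md5($beanbun->url), $beanbun->page);
};
$beanbun->start();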
```
Then run it from the command line:
```
$ php start.php start
```
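
The `afterDownloadPage` callback in start.php can be tried without starting a crawl. The sketch below is an assumption-laden stand-in: a plain `stdClass` replaces the Beanbun instance and models only the `url` and `page` properties the example reads, and the `$outDir` and `$savePage` names are hypothetical.

```php
<?php
// Stand-in for the object passed to the callback (not a real Beanbun instance).
$fake = new stdClass();
$fake->url = 'http://www.950d.com/';
$fake->page = '<html><body>hello</body></html>';

// Write into the system temp directory instead of the script directory.
$outDir = sys_get_temp_dir();

// Same logic as the afterDownloadPage closure, with the target directory made explicit.
$savePage = function ($beanbun) use ($outDir) {
    $path = $outDir . '/' . md5($beanbun->url);
    file_put_contents($path, $beanbun->page);
    return $path;
};

$path = $savePage($fake);
echo basename($path), PHP_EOL;           // 32 hexadecimal characters
echo file_get_contents($path), PHP_EOL;  // the saved page body
```

Hashing the URL yields a fixed-length name with no characters that are unsafe in file names, at the cost of the file name no longer being human-readable.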

For more details, see the [documentation at http://www.beanbun.org](http://www.beanbun.org).


24 changes: 24 additions & 0 deletions composer.json
@@ -0,0 +1,24 @@
{
    "name": "kiddyuchina/beanbun",
    "description" : "Beanbun is a multi-process web crawler framework written in PHP, designed to be open and highly extensible",
    "type": "application",
    "keywords": ["spider", "crawler", "scraper"],
    "license": "MIT",
    "authors": [
        {
            "name": "Kidd Yu",
            "email": "[email protected]"
        }
    ],
    "require": {
        "php": ">=5.5.0",
        "workerman/workerman": "^3.3",
        "imangazaliev/didom": "^1.8",
        "guzzlehttp/guzzle": "^6.0"
    },
    "autoload": {
        "psr-4": {
            "Beanbun\\": "src/"
        }
    }
}
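
The `autoload` section above maps the `Beanbun\` namespace prefix to the `src/` directory under PSR-4. As a rough illustration of how a class name turns into a file path under that rule, here is a minimal sketch; `psr4Path` is a hypothetical helper (Composer generates its own autoloader), and the class names passed to it are used only as sample input.

```php
<?php
/**
 * Map a fully qualified class name to a file path under one PSR-4 prefix.
 * Hypothetical helper for illustration only.
 */
function psr4Path($class, $prefix, $baseDir)
{
    // Classes outside the prefix are not handled by this mapping.
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    // Drop the prefix, then turn namespace separators into directory separators.
    $relative = substr($class, strlen($prefix));
    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

$beanbunPath = psr4Path('Beanbun\\Beanbun', 'Beanbun\\', 'src/');
$nestedPath  = psr4Path('Beanbun\\Queue\\MemoryQueue', 'Beanbun\\', 'src/');
$otherPath   = psr4Path('GuzzleHttp\\Client', 'Beanbun\\', 'src/');

echo $beanbunPath, PHP_EOL; // src/Beanbun.php
echo $nestedPath, PHP_EOL;  // src/Queue/MemoryQueue.php
var_dump($otherPath);       // NULL
```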