Update version number and README.md file
kiddyuchina committed Apr 16, 2017
1 parent d0efeef commit 7e6e986
Showing 5 changed files with 28 additions and 27 deletions.
37 changes: 21 additions & 16 deletions README.md
@@ -1,48 +1,53 @@
#Beanbun
Introduction
----
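Beanbun is a simple, extensible crawler framework that supports daemon mode and normal mode. Daemon mode is based on Workerman, and the downloader is based on Guzzle.
Beanbun is a simple, extensible crawler framework that supports distributed crawling as well as daemon mode and normal mode. Daemon mode is based on [Workerman](http://www.workerman.net), and the downloader is based on [Guzzle](http://guzzle.org).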

Features
----
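- Supports both daemon and normal modes
- Uses guzzle for crawling by default
- Supports both daemon and normal modes (daemon mode is supported only on Linux servers)
- Uses guzzle for crawling by default
- Supports distributed crawling
- Supports multiple queue backends, such as in-memory and Redis
- Supports multiple queue backends, such as in-memory and Redis
- Supports custom URI filtering
- Supports both breadth-first and depth-first crawling
- Follows the PSR-4 standard
- Crawling a page is split into multiple steps, each of which supports custom actions (such as adding a proxy or changing the user-agent)
- Follows the PSR-4 standard
- Crawling a page is split into multiple steps, each of which supports custom actions (such as adding a proxy or changing the user-agent)
- Flexible extension mechanism that makes it easy to write plugins for the framework: custom queues, custom crawl strategies...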

Installation
----

Beanbun can be installed via composer.

```
$ composer require kiddyu/beanbun
```

Example
Quick Start
----
Create a file named start.php with the following content

Create a file named start.php with the following content

``` php
<?php
use Beanbun\Beanbun;
$beanbun = new Beanbun;
$beanbun->name = '950d';
$beanbun->count = 5;
$beanbun->seed = 'http://www.950d.com/';
$beanbun->max = 100;
$beanbun->logFile = __DIR__ . '/950d_access.log';
$beanbun->seed = [
'http://www.950d.com/',
'http://www.950d.com/list-1.html',
'http://www.950d.com/list-2.html',
];
$beanbun->afterDownloadPage = function($beanbun) {
file_put_contents(__DIR__ . '/' . md5($beanbun->url), $beanbun->page);
};
$beanbun->start();
```
Run it from the command line
```
$ php start.php start
$ php start.php
```

For more details, see the [documentation http://www.beanbun.org](http://www.beanbun.org)
You can then see the crawl log.
For more details, see the [documentation](http://www.beanbun.org)


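The README's feature list mentions breadth-first and depth-first crawling without showing how the two differ. The sketch below is not Beanbun's implementation; it only illustrates the general idea that the same frontier yields either order depending on whether pages are taken from the front (FIFO) or the back (LIFO). The link graph, the `crawlOrder` function, and its parameters are all hypothetical names introduced for illustration.

```php
<?php
// Hypothetical link graph: page => links found on that page.
$links = [
    'A' => ['B', 'C'],
    'B' => ['D'],
    'C' => ['E'],
    'D' => [],
    'E' => [],
];

// Returns the order in which pages would be visited, starting from $seed.
// This is an illustrative sketch, not part of the Beanbun API.
function crawlOrder(array $links, string $seed, bool $depthFirst): array
{
    $frontier = [$seed];       // URLs waiting to be crawled
    $seen = [$seed => true];   // URLs already queued, to avoid revisits
    $order = [];

    while ($frontier) {
        // Breadth-first takes the oldest entry, depth-first the newest.
        $page = $depthFirst ? array_pop($frontier) : array_shift($frontier);
        $order[] = $page;

        foreach ($links[$page] ?? [] as $next) {
            if (!isset($seen[$next])) {
                $seen[$next] = true;
                $frontier[] = $next;
            }
        }
    }

    return $order;
}

$bfs = crawlOrder($links, 'A', false); // ['A', 'B', 'C', 'D', 'E']
$dfs = crawlOrder($links, 'A', true);  // ['A', 'C', 'E', 'B', 'D']
```

Breadth-first finishes every page at one link distance before going deeper, which tends to suit listing pages; depth-first follows one chain of links to its end first.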
2 changes: 1 addition & 1 deletion src/Beanbun.php
@@ -10,7 +10,7 @@

class Beanbun
{
const VERSION = '1.0.0';
const VERSION = '1.0.1';

public $id = null;
public $name = null;
12 changes: 4 additions & 8 deletions src/Lib/Helper.php
@@ -1,12 +1,8 @@
<?php
namespace Beanbun\Lib;

Class Helper {

public static $data = '';
public static $curl = '';
public static $page = '';

Class Helper
{
public static $userAgent = [
'pc' => [
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
@@ -41,11 +37,11 @@ public static function getUrlbyHtml($html, $url)
$pattern = "'<\s*a\s.*?href\s*=\s*([\"\'])?(?(1) (.*?)\\1 | ([^\s\>]+))'isx";
preg_match_all($pattern, $html, $match);
$match = array_merge($match[2], $match[3]);
$hrefs = array_flip(array_flip($match));
$hrefs = array_flip(array_flip(array_filter($match)));
foreach ($hrefs as $key => $href) {
$hrefs[$key] = self::formatUrl($href, $url);
}
return $hrefs;
return array_flip(array_flip($hrefs));
}

public static function formatUrl($l1, $l2)
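The `getUrlbyHtml` hunk above makes two changes: it runs `array_filter` before de-duplicating, and it de-duplicates a second time after URL formatting. The following is a minimal, self-contained sketch of why both steps can matter. The sample data and the `toAbsolute` function are hypothetical; `toAbsolute` is a simplified stand-in for `Helper::formatUrl`, whose body is not shown in this diff.

```php
<?php
// Hypothetical sample mimicking the merged capture groups: the regex
// alternative that did not match contributes an empty string.
$match = ['/a.html', '', '/b.html', '/a.html', ''];

// Before the change: double array_flip removes duplicates,
// but one empty string survives as a "link".
$old = array_flip(array_flip($match));

// After the change: array_filter drops empty (falsy) entries first.
$new = array_flip(array_flip(array_filter($match)));

// Simplified stand-in for Helper::formatUrl (an assumption, not the real code):
// leave absolute URLs alone, prefix everything else with the base URL.
function toAbsolute(string $href, string $base): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

// Distinct raw hrefs can collapse to the same absolute URL,
// which is why a second de-duplication after formatting is useful.
$hrefs = ['/a.html', 'a.html', 'http://example.com/a.html'];
foreach ($hrefs as $key => $href) {
    $hrefs[$key] = toAbsolute($href, 'http://example.com/');
}
$unique = array_flip(array_flip($hrefs));
```

Note that `array_filter` without a callback removes every falsy value, so an href of `"0"` would also be dropped; whether that matters depends on the pages being crawled.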
2 changes: 1 addition & 1 deletion src/Queue/MemoryQueue.php
@@ -8,7 +8,7 @@
class MemoryQueue implements QueueInterface
{
public $globalData = null;
public $maxQueueSize = 0;
public $maxQueueSize = 10000;
public $maxQueuedCount = 0;

protected static $server = [];
2 changes: 1 addition & 1 deletion src/Queue/RedisQueue.php
@@ -5,7 +5,7 @@ class RedisQueue implements QueueInterface
{
public $redis = null;
public $config = [];
public $maxQueueSize = 0;
public $maxQueueSize = 10000;
public $maxQueuedCount = 0;

protected $name = '';