All notable changes to this project will be documented in this file. See standard-version for commit guidelines.
0.1.6 (2019-11-18)
- counter: init config after load (656c5a7)
- crawler: meta merge (07af099)
- item: family argument (85a9ff6)
- chain: default & kwargs (8629dcf)
- item: meta considered as xpath vars (a34750b)
- item: ParselxItem (a3ef679)
- task: pass meta between parent and child task (3ae6c46) (see the sketch below)
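The two meta changes above (a34750b, 3ae6c46) combine naturally. A minimal sketch, assuming Request takes a meta dict, that child tasks inherit it, and that meta keys surface as parsel-style $variables in xpath rules; the exact signatures may differ from the released API:

```python
from acrawler import Crawler, Request, ParselItem

class DetailItem(ParselItem):
    # hypothetical rule: with meta treated as xpath vars, the meta key
    # `category` is available as the parsel variable $category
    xpath_rules_first = {"title": "//h1[@data-cat=$category]/text()"}

class MyCrawler(Crawler):
    start_urls = ["http://example.com/list"]

    def parse(self, response):
        # meta set on the parent task should reach the child task
        yield Request(
            "http://example.com/detail/1",
            meta={"category": "books"},
            callback=self.parse_detail,
        )

    def parse_detail(self, response):
        # hypothetical constructor; meta flows on into the item's xpath rules
        yield DetailItem(response.sel, meta=response.meta)
```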
0.1.5 (2019-11-07)
- chain: change naming (62995e2)
- chain: spawn crawler during run() (6ccee3a)
- crawler: wrong default values (2ec846f)
- examples: custom process change (b926c78)
- exceptions: dumping error (4157c3b)
- http: default callback condition check (0d40b67)
- http: FileRequest filename (76bc483)
- http: pass meta args (6c2b64b)
- item: log as class attr (66222c1)
- item: log, store as attributes (12d5a33)
- log: remove handlers at beginning (a143405)
- middleware: change default priority; add name (e5f1573)
- middleware: check if the handler already exists (69e6c1e)
- web: change naming; expose routes (c077f01)
- web: default action return items (6981883)
- default values; naming (dee18c5)
- acrawler: add_and_wait (256cbb4)
- chain: add multiple tasks (0b6d14a)
- chain: ChainCrawler use(); typo (71dbf2d)
- chain: implement ChainCrawler ChainRequest (0d59537)
- chain: implement ChainItem (9807b9c)
- chain: pretty debug (b7f3995)
- chain: spawn by xpath rule (4bb2271)
- chain: status control (6e2c8f6)
- chain: support web service (b9f7c32)
- expiredwatcher: delay and retry (fd1f135)
- http: FileRequest use args to determine fdir (f0aff89)
- middleware: @register accepts more types (dbf28ad)
- middleware: @register supports generator func (f83b11d)
- parser: accepts callbacks (4ea5b8e)
- processors: processors of string type (31fa2c3)
- processors: re_groups (157e481)
- processors: register & use (e46b7b7) (see the sketch after this list)
- processors: support drop_item (44efee0)
- processors: to_date enhanced (9c1fa22)
- web: expose web.routes (94a8c30)
- x: processors as a module (7032286)
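The processors entries in this release outline a small field-processing pipeline. A rough illustration, assuming field_processors maps field names to callables; the names here may not match the released module:

```python
from acrawler import ParselItem

def to_price(value: str) -> float:
    # "$ 19.99 " -> 19.99
    return float(value.strip().lstrip("$"))

class ProductItem(ParselItem):
    css_rules_first = {"price": ".price::text"}
    # hypothetical wiring; per the entries above, built-in processors can also
    # be named as strings (31fa2c3) and custom ones registered (e46b7b7)
    field_processors = {"price": to_price}
```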
0.1.4 (2019-09-17)
- crawler: remove signal handlers after shutdown (2cf1bff)
- handlers: ToMongo won't create index (73cc390)
- log: change logger formatter (30e5068)
- crawler: middleware with priority True (b30f836)
- crawler: use dill for pickling (b3228ad) (see the note below)
- response: paginate accepts keyword arguments (fcd59cc)
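The switch to dill (b3228ad) matters because stdlib pickle cannot serialize lambdas or locally defined callbacks, which crawler tasks routinely carry. A standalone illustration:

```python
import pickle
import dill

callback = lambda x: x + 1  # typical inline callback attached to a task

try:
    pickle.dumps(callback)
except pickle.PicklingError as err:
    print("stdlib pickle failed:", err)

data = dill.dumps(callback)   # dill handles lambdas and closures
print(dill.loads(data)(41))   # -> 42
```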
0.1.3 (2019-08-24)
0.1.1 (2019-08-23)
- style (9f91351)
0.1.0 (2019-08-23)
- examples: update for change of item parsing rules (4eea6ac)
- handler: event as instance attribute (6d3164d)
- item: clean processors (a90470a)
- item: clean rules (fe48ef8)
- task: optional ancestor (0a2df10)
- counter: accept flag param (adfeb10)
- counter: count requests in progress (3ae9753)
- counter: delay in counter (cd734ec)
- crawler: check import (ab300b2)
- crawler: name attr (35d62a2)
- crawler: pickle all types of task (53e087f)
- http: picklable response (93c74de)
- item: picklable (fc16e41)
- item: support dropField (49f1d2c)
- item: support inline rule (e864a4a)
- response: paginate & follow (c66d0d3)
0.0.9 (2019-07-28)
- max_requests & download_delay bug (0681ad9)
- http: correct fingerprint for request (ba2aba1)
- utils: make_links_absolute fix dot_all (c5ff219)
- bugs (459bef3)
- exceptions: reschedule now release req limit (c648215)
- http: req hosts limits (ec4593d)
- item: default processor strip (7562bb8)
- handlers: implement ExpiredWatcher (3cb2e8d)
- http: open method for response (659544a)
- item: add bind meth for processors (61bc0fa)
- item: add store attr (023b796)
- item: new Field parsing (501164f)
- item: processors map & filter (969762d)
- utils: sync_coroutine (80853bb)
0.0.8 (2019-07-02)
- crawler: add dict new_task (039ff93)
- crawler: simplify logging (c5bf20a)
- examples: update wh-crawler for new site (cdfafe3)
- handlers: tomongo supports index (d0051c1)
- http: BrowserRequest's exception catching (5049a3e)
- http: correct absolute links (ed66892)
- http: not catching json error anymore (06086a1)
- scheduler: transfer waiting queue using zrangebyscore (af3574d)
- crawler: now task can be yielded from handler (b235261)
- examples: crawl pythonclock.org (javascript) (17334ca)
- http: add PyQuery support (4137cbf)
- http: implement BrowserRequest (58aa947), closes #9
- http: special delay and random delay (76eae8f)
- http: support absolute links (d34f08e)
- http: support DISABLE_COOKIES (0123526) (see the sketch after this list)
- parser: support add_meta (dca8505)
- web: implement web_action_after_query (2481e8d)
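For the new settings in this release, a hypothetical configuration sketch; the `config` dict pattern and exact key names are assumptions drawn from the entries above:

```python
from acrawler import Crawler

class QuietCrawler(Crawler):
    config = {
        "DISABLE_COOKIES": True,  # 0123526: do not keep session cookies
        # 76eae8f adds special and random delays; their setting keys are not
        # recorded in these entries, so they are omitted here
    }
```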
0.0.7 (2019-06-04)
- crawler: cancel task correctly (803160f)
- crawler: crawler's method can be pickled now (480b8f0)
- crawler: dynamically configure & style fix (03d975f)
- crawler: enable attributes are properties now (b8d67c0)
- crawler: import user setting (8630714)
- crawler: shutdown correctly to wait for non-request and non-waiting tasks (badf859)
- crawler: start_requests() supports NonRequest Task (cbc5271)
- crawler: start, finish are methods now (5872fc4)
- handler: check before add callbacks to response (0c4061f)
- http: initialize Response directly (bc6aafc)
- setting: clear relations of config redis/web/lock (aaae1d9)
- task: better retry/recrawl & ignore_exception (cd4ff49)
- task: default fingerprint is hash() (b53e6d4)
- counter: enable persistence (ed1fbbe)
- counter: implement counter in diff module (7f11017)
- counter: implement Redis Counter (2a8d9ba)
- counter: support join_by_ancestor (e59587c)
- crawler: support custom web_add_task_query() (926d411)
- crawler: support LOCK_ALWAYS without Counter (3acb588)
- crawler: support ReScheduleError (b797272)
- crawler: support web service (add_task) (0face33)
- http: response now has property: json (8da841c)
- http: support MAX_REQUESTS_PER_HOST (23dbc85)
- http: support MAX_REQUESTS_PER_HOST / SPECIAL_HOST using Counter (feb7139)
- task: support ignore_exception and exceptions (1a188b0)
- task: support SkipTaskError (0edcb87)
- settings: max_requests not accepted as Crawler's attribute (dc36db9)
- two errors (ac47cc1)
- settings: You must use config to assign MAX_REQUESTS (see the sketch below)
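The breaking note above in concrete form; a minimal sketch, assuming settings live in a `config` dict on the Crawler subclass:

```python
from acrawler import Crawler

class MyCrawler(Crawler):
    # max_requests = 4            # dc36db9: no longer accepted as an attribute
    config = {"MAX_REQUESTS": 4}  # the required form per the note above
```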
0.0.6 (2019-05-27)
- examples: imdb popular movies (1791d82)
- handlers: ItemToMongo supports update operation (557c0b5)
- item: ParselItem supports default processors (strip by default) (0826548)
- parse: urljoin now also accepts list (463ab93)
0.0.5 (2019-05-21)
- crawler: use parsers rather than Parsers (b01ffdc)
- item: better log & not support custom_parse (1db20b7)
- examples: scrape Bilibili video info (e81d5d2)
- parse: Request from start_request() has default callback parse() (9c061fc)
0.0.4 (2019-05-20)
- crawler: shutdown after non-request tasks finish (dc5404c)
- examples: update WALLHAVEN with callback (93c6dae)
- http: fix pickle for Request (6b83d36)
- crawler: catch system's signals gracefully (f5f595c), closes #3
- crawler: support persistent crawling (d55cfce), closes #4
- handler: support mongodb & fix bugs (8956a31)
- http: support status_allowed parameter and setting (dc86fdf)
- item: custom_process allows async/asyncgenerator (3a6f64b)
- parse: ParselItem supports rules_first (e6cd5c3)
- task: support recrawl, exetime (77c59d7), closes #5 (see the sketch below)
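For recrawl/exetime (77c59d7), a sketch of the intended usage; treating recrawl as a repeat interval in seconds and exetime as a unix timestamp is an assumption, not a documented contract:

```python
import time

from acrawler import Request

req = Request(
    "http://example.com/feed",
    recrawl=300,                # hypothetical: re-execute every 300 seconds
    exetime=time.time() + 60,   # hypothetical: first execution in one minute
)
```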
0.0.3 (2019-05-18)
- examples: update due to breaking change (1839f95)
- global: not contain self.logger anymore (0691204)
- http: correct request's encoding (64129a2)
- http: detach Response from aiohttp.ClientResponse (3b168d3)
- http: Request/Response now only accept one callback function (2e7171b)
- http: Response dynamically decode body & slim codes (c782e5c)
- http: use aiofiles to write files (7f22025)
- examples: crawl v2ex hot page (314c3a7)
- allow yielding dictionaries directly; no PyQuery dependency (9c9d3bd)
- examples: provide Redis-based quotes crawler (39357bd)
- examples: provide WALLHAVEN downloader (a40b9e5)
- handlers: check Response's status < 400 (8743537)
- http: add pickle support for Request class (a4c98de)
- http: implement FileRequest to download/save file (e034360)
- http: use List to store multiple callback functions (1a753ac)
- middleware: multiple families for Task and only one for Handler (878d89d)
- middleware: use decorator to append Handler (20453cd), closes #1
- parse: implement Response.urljoin (22329d2)
- parse: support callback() decorator (19fe8c4)
- utils: better support for aioredis (9920405)
- AsyncPQ & RedisPQ (1183892)
- middleware: other methods of adding a Handler are no longer recommended; use @middleware.register(...) instead (see the sketch below)
- global: classes no longer contain a logger member; use acrawler.get_logger() to get a logger
- item: change rule naming to rules (3173077)
- item: support custom parsel's parse (c0714eb)
- dynamically import aioredis (7071ba4)
- http: add sel / callback to response; add url_str & logger (84a412b)
- middleware: handlers support priority (f1106b2)
- request and fingerprint (d8112e2)
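The recommended decorator from the breaking note above, sketched; the family and priority keywords are guesses based on f1106b2 ("handlers support priority") and may not match the real signature:

```python
from acrawler import middleware

@middleware.register(family="Request", priority=100)
def add_header(task):
    # hypothetical handler: runs for Request-family tasks before download
    task.headers["User-Agent"] = "acrawler-demo"
```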