This is a crawler program for pixiv, written in Java.
Support crawling daily leaderboards, male/female hot leaderboards, newcomer leaderboards, original work leaderboards and their sub-categories, including R18 mode.
- Leaderboards crawl support.
- Proxy support.
- YAML format.
- SQLite database record the pictures that have been crawled to ensure that the pictures are not downloaded repeatedly.
Prepare: a Java runtime environment, a browser that can view cookies(Chrome as a example), a network environment that can access pixiv or a proxy that can access pixiv, a pixiv account.
- open Chrome.
- access www.pixiv.net, and log in.
- press F12 to bring up the DevTools. Find and select Application in the upper right corner(might hide in '>>').
- select Cookies-https://www.pixiv.net on the left and you will see a form, this is cookies.
- copy the value of PHPSESSID.
- run the program once, the config will be automatically generated, open it.
- set host and port in proxy(unless you can access directly).
- paste the value you copy on step 5.
- set a leaderboards page's link you like as the page to start crawling under startpage.
- run the program through the command or the attached startup script (run.bat / run.sh).
#if you don't need proxy, just set ''.
proxy:
host: '127.0.0.1'
port: '10809'
#find value of PHPSESSID in the cookies of pixiv.
cookie: 'YOUR-COOKIE-HERE'
#this page link will be the start page when crawlering.
#this link will be changed when program running, and will stop on the last page you had crawlerd.
startpage: 'https://www.pixiv.net/ranking.php?mode=male'
#picture file storage path, %HERE% is the folder of program's jar.
imagesavepath: '%HERE%/Crawled'