07. Considerations
An archived website only gets bigger over time, and it can grow to millions of files. Certain aspects must therefore be taken into consideration.
It is always advisable to limit what each session downloads by using the filtering options, including, but not limited to:
- Filter by timestamp with the `-f` or `-t` option
- Filter by specific files with the `-O` option
- Exclude what you don't need with the `-X` option
- Keep the number of simultaneous downloads low by passing a small number to the `-c` option
It is good etiquette to crawl politely.
Avoid mass-scraping: sending too many requests for too many large files overloads the server and can hurt it.
If this occurs too often, the operators might take measures to block downloader tools such as this one, and in the long run it might lead to anti-scraping legal action.
So download wisely.
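
If you script your own follow-up downloads (for example, fetching files by hand from the log), the same politeness applies. The following is a minimal sketch of spacing requests out by a fixed minimum interval. It is not part of this tool; the class name `PoliteThrottle` and its parameters are hypothetical, chosen only for illustration.

```python
import time


class PoliteThrottle:
    """Hypothetical helper: enforce a minimum pause between requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between two requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until min_interval has passed since the previous call.

        Returns the number of seconds actually slept.
        """
        now = self._clock()
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last = now
        return waited
```

Call `wait()` immediately before each request. The first call returns at once, and later calls sleep only for whatever part of the interval has not already elapsed.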
Windows limits a directory path to 248 characters, while a URL has no such limit. A long URL can therefore cause an error, and the affected files are not downloaded. In this case, you can examine the log file and download the files manually from the source URL provided.
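
To spot URLs that are likely to hit this limit before downloading, you can estimate the length of the local directory path. The sketch below assumes a simple layout of base directory, then host name, then the URL's directory segments. The tool's real layout may differ, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, unquote

MAX_DIR_PATH = 248  # directory path limit on Windows, as stated above


def local_directory_for(url, base_dir):
    """Assumed mapping: base_dir \\ host \\ directory segments of the URL."""
    parts = urlsplit(url)
    segments = [s for s in unquote(parts.path).split("/") if s]
    # Treat the last segment as the file name unless the path ends with "/".
    if segments and not parts.path.endswith("/"):
        segments = segments[:-1]
    return "\\".join([base_dir.rstrip("\\"), parts.netloc, *segments])


def exceeds_limit(url, base_dir, limit=MAX_DIR_PATH):
    """True if the estimated directory path is longer than the limit."""
    return len(local_directory_for(url, base_dir)) > limit
```

URLs flagged by `exceeds_limit` are candidates for a shorter base directory or for a manual download.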