A web crawler that prints a website to .pdf format
🌐🕸️ ⏩ 📂📜
✔️ 🐍 Python 3.x environment
✔️ 📁 wkHTMLtoPDF installed on the system
✔️ 🐍 pdfkit PyPI library. pdfkit is a Python wrapper for wkHTMLtoPDF.
✔️ 🐍 BeautifulSoup 4 PyPI library
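A quick way to confirm the requirements listed above before running the crawler is sketched below. This is only an illustration: the helper name `check_requirements` is hypothetical and not part of this repository, and it assumes the wkHTMLtoPDF executable is available on the PATH as `wkhtmltopdf`.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Return a dict mapping each prerequisite to True/False (hypothetical helper)."""
    return {
        "python3": sys.version_info.major == 3,
        # Assumption: the wkHTMLtoPDF executable is on PATH as "wkhtmltopdf".
        "wkhtmltopdf": shutil.which("wkhtmltopdf") is not None,
        "pdfkit": importlib.util.find_spec("pdfkit") is not None,
        # BeautifulSoup 4 is imported under the module name "bs4".
        "bs4": importlib.util.find_spec("bs4") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

Any entry reported as MISSING would need to be installed first, for example the two libraries through pip and wkHTMLtoPDF through the system package manager.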
Populate the urls_to_parse list with all URLs to save to .pdf format.
urls_to_parse = ["<URL_1>", "<URL_2>", ..., "<URL_N>"] # Where URL_n is your desired URL.
The list can be collected by either:
get_url_list_from_site( <MY SITE eg. http://example.com> )
or
get_url_list_from_file( <MY FILE | DEFAULT = input/urls.txt> )
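The two collection strategies above might look roughly like the sketch below. These are not the repository's implementations of get_url_list_from_site or get_url_list_from_file; the names `read_urls_from_file` and `extract_links` are hypothetical stand-ins. The sketch assumes the input file holds one URL per line and that only links on the same host as the start page are wanted.

```python
from urllib.parse import urljoin, urlparse


def read_urls_from_file(path="input/urls.txt"):
    """Read one URL per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def extract_links(html, base_url):
    """Collect same-host links from an HTML string (hypothetical helper)."""
    # Imported lazily so the file-based helper works without bs4 installed.
    from bs4 import BeautifulSoup

    host = urlparse(base_url).netloc
    links = []
    for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        # Resolve relative links and drop any #fragment part.
        url = urljoin(base_url, anchor["href"]).split("#")[0]
        if urlparse(url).netloc == host and url not in links:
            links.append(url)
    return links
```

In this sketch `extract_links` works on HTML that has already been downloaded, so fetching the page (for instance with `urllib.request.urlopen`) is left to the caller.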
The generated .pdf files are saved to the output/ directory, relative to the source file website-to-pdf.py.
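Saving each URL into the output/ directory could be sketched as follows. The file-naming scheme and the helper names `pdf_path_for` and `save_urls_as_pdf` are assumptions made for illustration, not taken from website-to-pdf.py. The only pdfkit call used is `pdfkit.from_url(url, output_path)`. Note that the default directory here is resolved against the current working directory, which may differ from what the script does.

```python
import os
import re
from urllib.parse import urlparse


def pdf_path_for(url, out_dir="output"):
    """Map a URL to a file name inside out_dir (naming scheme is an assumption)."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Replace every run of characters other than letters, digits, dots, and dashes.
    name = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_") or "index"
    return os.path.join(out_dir, name + ".pdf")


def save_urls_as_pdf(urls_to_parse, out_dir="output"):
    """Render each URL to a PDF with pdfkit (requires wkHTMLtoPDF)."""
    import pdfkit  # imported here so pdf_path_for works without pdfkit

    os.makedirs(out_dir, exist_ok=True)
    for url in urls_to_parse:
        pdfkit.from_url(url, pdf_path_for(url, out_dir))
```

With this naming scheme, two URLs that differ only in their query string would map to the same file, so a real run might need a more careful mapping.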
MIT license compliant. Software provided as is. All content is free to use and modify.
Footnotes
GitHub shields provided by Shields.io