AndrewKhassapov/website-to-pdf

Website to PDF

A web crawler that saves the pages of a website in .pdf format.

🌐🕸️ ⏩ 📂📜

Requirements:

✔️ 🐍 Python 3.x environment

✔️ 📁 wkhtmltopdf installed on the system

✔️ 🐍 pdfkit PyPI library. pdfkit is a Python wrapper for wkhtmltopdf.

✔️ 🐍 BeautifulSoup 4 PyPI library
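
The requirements above can be checked before running the script. The following is a minimal sketch, not part of the repository: the function name missing_requirements is hypothetical, and it assumes wkhtmltopdf is reachable through PATH and that the two libraries are importable as pdfkit and bs4.

```python
import importlib.util
import shutil


def missing_requirements():
    """Return the names of requirements that could not be found."""
    missing = []
    # wkhtmltopdf is a system binary, so look for it on PATH.
    # Assumption: an install outside PATH will be reported as missing.
    if shutil.which("wkhtmltopdf") is None:
        missing.append("wkhtmltopdf")
    # pdfkit and bs4 are the import names of the two PyPI libraries.
    for module in ("pdfkit", "bs4"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All requirements found.")
```

If wkhtmltopdf is installed in a location that is not on PATH, this check reports it as missing even though pdfkit can still be pointed at the binary explicitly.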

How to use:

▶️ Set the list urls_to_parse to all URLs you want to save in .pdf format.

urls_to_parse = ["<URL_1>", "<URL_2>", ..., "<URL_N>"] # Where URL_n is your desired URL.

The list can be collected by either:

🅰️ ➡️ Using the return value of get_url_list_from_site( <MY SITE eg. http://example.com> )

or

🅱️ ➡️ Using the return value of get_url_list_from_file( <MY FILE | DEFAULT = input/urls.txt> )
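
To illustrate the two ways of collecting the list, here is a rough sketch of how such helpers could work. It is not the code of get_url_list_from_site or get_url_list_from_file: the names read_urls_from_file, same_site_links and links_from_page are hypothetical, the file is assumed to hold one URL per line (with '#' lines treated as comments), and the site helper is assumed to keep only links on the same host as the starting page.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def read_urls_from_file(path="input/urls.txt"):
    """Read one URL per line, skipping blank lines and '#' comments."""
    with open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


def same_site_links(base_url, hrefs):
    """Resolve hrefs against base_url; keep unique http(s) links on the same host."""
    host = urlparse(base_url).netloc
    seen, result = set(), []
    for href in hrefs:
        # Make the link absolute and drop any '#fragment' part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if absolute not in seen:
                seen.add(absolute)
                result.append(absolute)
    return result


def links_from_page(base_url):
    """Fetch one page and return its same-site links (needs network and bs4)."""
    from urllib.request import urlopen
    from bs4 import BeautifulSoup  # imported lazily: requires BeautifulSoup 4

    with urlopen(base_url) as response:
        soup = BeautifulSoup(response.read(), "html.parser")
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    return same_site_links(base_url, hrefs)
```

links_from_page looks at a single page only. A full crawl would call it repeatedly on newly found links while tracking which pages were already visited.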

▶️ Run website-to-pdf.py

▶️ Each URL is saved as a .pdf file in the output/ directory, located next to website-to-pdf.py
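
The saving step can be sketched with pdfkit.from_url, which renders a URL to a file through wkhtmltopdf. This is an illustration under assumptions, not the script's actual logic: the names pdf_name_for and save_urls_as_pdf are hypothetical, the file-naming scheme is invented for the example, and output_dir is resolved against the current working directory.

```python
import os
import re
from urllib.parse import urlparse


def pdf_name_for(url):
    """Derive a filesystem-safe .pdf file name from a URL (hypothetical scheme)."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Replace every run of characters outside the safe set with one underscore.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return (safe or "index") + ".pdf"


def save_urls_as_pdf(urls, output_dir="output"):
    """Render each URL into output_dir with pdfkit; return the URLs that failed."""
    import pdfkit  # imported lazily: requires pdfkit and wkhtmltopdf

    os.makedirs(output_dir, exist_ok=True)
    failed = []
    for url in urls:
        target = os.path.join(output_dir, pdf_name_for(url))
        try:
            pdfkit.from_url(url, target)
        except OSError:
            # pdfkit typically reports a missing or failing wkhtmltopdf as OSError.
            failed.append(url)
    return failed
```

Because pdf_name_for ignores the query string, two URLs that differ only in their query would map to the same file; a real naming scheme would need to account for that.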

License:

MIT license compliant. Software provided as is. All content is free to use and modify.
