Yet another scraping tool.
```shell
$ pip install snagit
```

Or, to use lxml as your primary parser:

```shell
$ pip install "snagit[lxml]"
```
snagit allows you to scrape multiple pages or documents, either by running
script files or by using an interactive REPL. For instance:
```shell
$ snagit
Type "help" for more information. Ctrl+c to exit
> load http://httpbin.org/links/3/{} range=0-2
> print
<html><head><title>Links</title></head><body>0 <a href='/links/3/1'>1</a> <a href='/links/3/2'>2</a> </body></html>
<html><head><title>Links</title></head><body><a href='/links/3/0'>0</a> 1 <a href='/links/3/2'>2</a> </body></html>
<html><head><title>Links</title></head><body><a href='/links/3/0'>0</a> <a href='/links/3/1'>1</a> 2 </body></html>
> strain a
> print
<a href="/links/3/1">1</a>
<a href="/links/3/2">2</a>
<a href="/links/3/0">0</a>
<a href="/links/3/2">2</a>
<a href="/links/3/0">0</a>
<a href="/links/3/1">1</a>
> unwrap a attr=href
> print
/links/3/1
/links/3/2
/links/3/0
/links/3/2
/links/3/0
/links/3/1
> list
load http://httpbin.org/links/3/{} range=0-2
print
strain a
print
unwrap a attr=href
print
```
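For comparison, the `strain` and `unwrap` steps in the session above correspond roughly to the following BeautifulSoup snippet. This is only a sketch of the equivalent manual work, not snagit's actual implementation:

```python
from bs4 import BeautifulSoup

# One of the pages fetched by the `load` command above
html = ("<html><head><title>Links</title></head><body>"
        "0 <a href='/links/3/1'>1</a> <a href='/links/3/2'>2</a> </body></html>")

soup = BeautifulSoup(html, "html.parser")

# `strain a`: keep only the <a> elements
anchors = soup.find_all("a")

# `unwrap a attr=href`: reduce each element to its href attribute
hrefs = [a["href"] for a in anchors]
print(hrefs)  # ['/links/3/1', '/links/3/2']
```

snagit simply applies these steps across every loaded document at once, which is what produces the six lines of output in the session above.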
- Process data as either a text block, lines of text, or HTML (using BeautifulSoup)
- Built-in scripting language
- REPL for command line interaction
- Python 3.11+
- bs4 (BeautifulSoup 4.13+)
- cachely 0.2+
Using the convenience script `run`:

```shell
$ git clone https://github.com/dakrauth/snagit.git
$ cd snagit
$ ./run init
$ ./run test
```
Or, to set up a virtual environment manually:

```shell
$ python -m venv --prompt snagit .venv
$ . .venv/bin/activate
```