0 parents
commit a61146c
Showing 46 changed files with 12,658 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
__pycache__/ | ||
*.pyc | ||
.pytest_cache/ | ||
dist/ | ||
build/ | ||
*.spec |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
MIT License | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
|
||
all: | ||
@echo Available targets: | ||
@echo | ||
@echo " test - run pytest" | ||
@echo " lint - run lint" | ||
@echo " clean - clean everything, including windows build" | ||
@echo " w - git commit" | ||
@echo " win32 - build win32 executable" | ||
@echo " build_samples - rebuild samples in samples/ directory (used for testing), make sure your .rdf output is correct!" | ||
|
||
test: | ||
py.test-2.7 | ||
#pytest | ||
|
||
lint: lint_test lint_util | ||
|
||
lint_test: | ||
python2 /usr/local/bin/pylint test_s2z.py | ||
|
||
lint_util: | ||
python2 /usr/local/bin/pylint scrapbook2zotero.py | ||
|
||
clean: | ||
rm -rf __pycache__ | ||
rm -f *.pyc | ||
rm -rf dist | ||
rm -rf build | ||
rm -f scrapbook2zotero.spec | ||
rm -rf .pytest_cache | ||
rm -f tmp/*.rdf | ||
|
||
w: | ||
git commit -a -m "working..." | ||
|
||
win32: | ||
wine pyinstaller --onefile scrapbook2zotero.py | ||
|
||
# Run this if you are sure that scrapbook2zotero output is correct | ||
# pytest tests depend on samples/*.rdf files | ||
build_samples: | ||
./scrapbook2zotero.py scrapbook_test_data samples/standard.rdf | ||
./scrapbook2zotero.py scrapbook_test_data samples/standard_1_4_excluded.rdf --exclude 1 4 | ||
./scrapbook2zotero.py scrapbook_test_data samples/standard-no-collections.rdf --nocoll | ||
./scrapbook2zotero.py scrapbook_test_data samples/standard-no-tags.rdf --notags |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
# Scrapbook/Scrapbook X to Zotero migration tool | ||
|
||
A Python script that migrates a Scrapbook repository to Zotero. Scrapbook / Scrapbook X is a Firefox plugin for taking notes and capturing web pages. Zotero is reference management software for bibliographic data and related research materials (such as PDF files, HTML files, etc.), or, simply put, Scrapbook on steroids. | ||
|
||
## Reasons | ||
|
||
Alas, Firefox is dying. Mozilla developers decided to scrap Firefox's powerful extension system and replace it with Google's lousy WebExtensions, throwing everybody out of the current ecosystem. Thus, all the excellent Firefox plugins are obsolete, including many important ones like Scrapbook. There is no way to re-implement Scrapbook under the new restrictions. Mozilla also pushes 'Pocket' as a cloud solution, but it is not a replacement for Scrapbook, since everything is kept in their cloud. If they choose to shut down the 'Pocket' product, you WILL lose all your data immediately, so don't use 'Pocket'. | ||
|
||
Zotero is a mature and stable stand-alone program, able to perform (almost) all Scrapbook tasks. You can install a Zotero browser plugin (currently available for Firefox, Chrome and Safari) and use it to capture pages into the Zotero database. | ||
|
||
Unfortunately, Zotero can't import Scrapbook data directly, so I wrote this script to export a Scrapbook repository into Zotero's RDF format. Zotero can import its own RDF files and thus all Scrapbook data, including complete saved HTML pages and PDFs. | ||
|
||
## Installing | ||
|
||
### Windows: | ||
|
||
Download scrapbook2zotero.exe, then either open a CMD shell in your download directory or place the .exe file somewhere in your PATH. | ||
|
||
### Linux: | ||
|
||
apt-get install python pip pytest | ||
pip install rdflib | ||
git clone https://github.com/burbilog/scrapbook2zotero.git | ||
cd scrapbook2zotero | ||
./scrapbook2zotero.py ... | ||
|
||
## Usage | ||
|
||
scrapbook2zotero.py [-h] [--debug] [--exclude EXCLUDE [EXCLUDE ...]] [--version] [--nocoll] [--notags] SCRAPBOOKDIR OUTPUT.RDF | ||
|
||
positional arguments: | ||
SCRAPBOOKDIR Source directory, usually somewhere inside mozilla profile | ||
OUTPUT.RDF Output RDF file name. Use '-' to specify standard output. | ||
|
||
optional arguments: | ||
-h, --help show this help message and exit | ||
--debug Print debug messages | ||
--exclude EXCLUDE [EXCLUDE ...] | ||
One or more record numbers to exclude | ||
--version show program's version number and exit | ||
--nocoll Disable export of collections | ||
--notags Disable export of tags | ||
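
The usage text above has the shape that Python's `argparse` prints, so the command line can be sketched as follows. This is an illustration only, not the script's own code: the attribute names `scrapbookdir` and `output`, the `type=int` conversion for record numbers, and the version string are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (a sketch, not the real script)."""
    parser = argparse.ArgumentParser(prog="scrapbook2zotero.py")
    parser.add_argument("--debug", action="store_true", help="Print debug messages")
    # nargs="+" produces the "EXCLUDE [EXCLUDE ...]" form shown in the usage text.
    # type=int is an assumption: the README only says "record numbers".
    parser.add_argument("--exclude", nargs="+", type=int, default=[],
                        help="One or more record numbers to exclude")
    # The real version string is not given in the README; this one is a placeholder.
    parser.add_argument("--version", action="version", version="%(prog)s (version unknown)")
    parser.add_argument("--nocoll", action="store_true", help="Disable export of collections")
    parser.add_argument("--notags", action="store_true", help="Disable export of tags")
    parser.add_argument("scrapbookdir", metavar="SCRAPBOOKDIR",
                        help="Source directory, usually somewhere inside mozilla profile")
    parser.add_argument("output", metavar="OUTPUT.RDF",
                        help="Output RDF file name. Use '-' to specify standard output.")
    return parser


if __name__ == "__main__":
    # Same argument order as the build_samples target in the Makefile.
    args = build_parser().parse_args(
        ["scrapbook_test_data", "samples/standard_1_4_excluded.rdf", "--exclude", "1", "4"]
    )
    print(args.scrapbookdir, args.output, args.exclude)
```

Because `--exclude` takes one or more values, it is safest to put it after the two positional arguments, as the Makefile does; otherwise the positional arguments would be swallowed as record numbers.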
|
||
The Scrapbook directory is usually something like `C:\Users\Your username\AppData\Roaming\Mozilla\Firefox\[Your Firefox profile]\Scrapbook` on Windows and `~/.mozilla/firefox/[Your Firefox profile]/Scrapbook` on Linux, unless you've changed that in the Scrapbook options. Since it's your data, I'd back it up before doing anything, just in case. | ||
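
If you are not sure which profile holds your data, a small helper can list the candidates. This is a minimal sketch, not part of scrapbook2zotero: the function name `find_scrapbook_dirs` is hypothetical, and it assumes the Linux profile root is `~/.mozilla/firefox` and that the data directory is named `Scrapbook` in some letter case.

```python
import os


def find_scrapbook_dirs(profiles_root):
    """Return directories named 'Scrapbook' (any letter case) one level below each profile."""
    found = []
    if not os.path.isdir(profiles_root):
        return found
    for profile in sorted(os.listdir(profiles_root)):
        profile_path = os.path.join(profiles_root, profile)
        if not os.path.isdir(profile_path):
            continue
        for entry in sorted(os.listdir(profile_path)):
            candidate = os.path.join(profile_path, entry)
            # Compare case-insensitively; the exact spelling may differ between versions.
            if entry.lower() == "scrapbook" and os.path.isdir(candidate):
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Assumed default profile root on Linux; pass another path on other systems.
    for path in find_scrapbook_dirs(os.path.expanduser("~/.mozilla/firefox")):
        print(path)
```

Each printed path is a candidate for the `SCRAPBOOKDIR` argument. If you moved the data elsewhere in the Scrapbook options, the helper will not find it.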
|
||
The output file is RDF data for Zotero import; give it a name like import.rdf. | ||
|
||
Generate the RDF file, then import it into Zotero. During the import, click `My Library` and watch the item counter increase until the import is done. Sometimes Zotero fails to import a saved web page and simply hangs: if the item counter does not increase for a minute or two, Zotero is stuck. If you have a large collection, you may well run into this problem. Note the number at which the import got stuck, delete the already imported data from Zotero, empty the trash, re-export the Scrapbook data with the --exclude option to skip the offending entry, then import everything again. | ||
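
The re-export step can be scripted if several entries turn out to be problematic. The sketch below only builds the command line; the helper name `build_command` and the record number 17 are hypothetical, and it assumes that the number at which the import stalled is the record number `--exclude` expects.

```python
def build_command(scrapbook_dir, output_rdf, exclude=()):
    """Return the argv list for one export run, skipping the given record numbers."""
    argv = ["./scrapbook2zotero.py", scrapbook_dir, output_rdf]
    if exclude:
        # --exclude accepts one or more numbers, so all of them go after a single flag.
        argv.append("--exclude")
        argv.extend(str(number) for number in exclude)
    return argv


if __name__ == "__main__":
    stuck = [17]  # hypothetical: record numbers where the Zotero import hung so far
    print(" ".join(build_command("Scrapbook", "import.rdf", stuck)))
```

Keep the list of stuck numbers between attempts and add to it each time the import hangs, so every re-export excludes all offending entries found so far.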
|
||
## Development information | ||
|
||
Prerequisites: python 2.7 for linux and windows, pytest, rdflib | ||
|
||
### Linux | ||
|
||
sudo apt-get install python pip | ||
sudo pip install rdflib | ||
git clone https://github.com/burbilog/scrapbook2zotero.git | ||
cd scrapbook2zotero | ||
./scrapbook2zotero.py ... | ||
|
||
### Building windows .exe under Linux | ||
|
||
Download the latest Python 2.7 for Windows from https://www.python.org/downloads/windows/ and then | ||
|
||
wine msiexec /i python-2.7.14.msi /L*v log.txt | ||
wine pip install rdflib pyinstaller | ||
make win32 | ||
|
||
### Running the tests | ||
|
||
make win32 | ||
make test | ||
|
||
## TODO | ||
|
||
Currently there is no export for 'note' or 'notex' item types. I never used notes, so my 1400+ scrapbook entries contain no notes and I can't debug them. If somebody needs export of their notes, please contact me. | ||
|
||
## Author | ||
|
||
* **Roman V. Isaev** - [burbilog](https://github.com/burbilog) | ||
|
||
## Acknowledgments | ||
|
||
Scrapbook2zotero is loosely based on [scraptools](https://bitbucket.org/himselfv/scraptools/src/default/). | ||
|
||
## License | ||
|
||
Licensed under the [MIT License](LICENSE.txt). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,155 @@ | ||
<rdf:RDF | ||
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" | ||
xmlns:z="http://www.zotero.org/namespaces/export#" | ||
xmlns:dcterms="http://purl.org/dc/terms/" | ||
xmlns:link="http://purl.org/rss/1.0/modules/link/" | ||
xmlns:dc="http://purl.org/dc/elements/1.1/" | ||
xmlns:bib="http://purl.org/net/biblio#"> | ||
<bib:Document rdf:about="http://www.popadancev.net/pulsejet/"> | ||
<z:itemType>webpage</z:itemType> | ||
<dcterms:isPartOf> | ||
<z:Website></z:Website> | ||
</dcterms:isPartOf> | ||
<link:link rdf:resource="#item_20180222112456"/> | ||
<dc:subject></dc:subject> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://www.popadancev.net/pulsejet/</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2018-02-22 11:24:56</dcterms:dateSubmitted> | ||
<dc:title>ПуВРД « Попаданцев.нет</dc:title> | ||
</bib:Document> | ||
<z:Attachment rdf:about="#item_20180222112456"> | ||
<z:itemType>attachment</z:itemType> | ||
<rdf:resource rdf:resource="scrapbook_test_data/data/20180222112456/index.html"/> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://www.popadancev.net/pulsejet/</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2018-02-22 11:24:56</dcterms:dateSubmitted> | ||
<dc:title>ПуВРД « Попаданцев.нет</dc:title> | ||
<z:linkMode>1</z:linkMode> | ||
<link:type>text/html</link:type> | ||
</z:Attachment> | ||
<bib:Document rdf:about="http://www.popadancev.net/laser/"> | ||
<z:itemType>webpage</z:itemType> | ||
<dcterms:isPartOf> | ||
<z:Website></z:Website> | ||
</dcterms:isPartOf> | ||
<link:link rdf:resource="#item_20180222115059"/> | ||
<dc:subject>Корневой каталог</dc:subject> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://www.popadancev.net/laser/</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2018-02-22 11:50:59</dcterms:dateSubmitted> | ||
<dc:title>Лазер « Попаданцев.нет</dc:title> | ||
</bib:Document> | ||
<z:Attachment rdf:about="#item_20180222115059"> | ||
<z:itemType>attachment</z:itemType> | ||
<rdf:resource rdf:resource="scrapbook_test_data/data/20180222115059/index.html"/> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://www.popadancev.net/laser/</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2018-02-22 11:50:59</dcterms:dateSubmitted> | ||
<dc:title>Лазер « Попаданцев.нет</dc:title> | ||
<z:linkMode>1</z:linkMode> | ||
<link:type>text/html</link:type> | ||
</z:Attachment> | ||
<bib:Document rdf:about="http://polit.ru/article/2010/07/01/zalizniak/"> | ||
<z:itemType>webpage</z:itemType> | ||
<dcterms:isPartOf> | ||
<z:Website></z:Website> | ||
</dcterms:isPartOf> | ||
<link:link rdf:resource="#item_20180222115534"/> | ||
<dc:subject>Второй корневой каталог/Подкаталогг</dc:subject> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://polit.ru/article/2010/07/01/zalizniak/</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2018-02-22 11:55:34</dcterms:dateSubmitted> | ||
<dc:title>Что такое любительская лингвистика - ПОЛИТ.РУ</dc:title> | ||
</bib:Document> | ||
<z:Attachment rdf:about="#item_20180222115534"> | ||
<z:itemType>attachment</z:itemType> | ||
<rdf:resource rdf:resource="scrapbook_test_data/data/20180222115534/index.html"/> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://polit.ru/article/2010/07/01/zalizniak/</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2018-02-22 11:55:34</dcterms:dateSubmitted> | ||
<dc:title>Что такое любительская лингвистика - ПОЛИТ.РУ</dc:title> | ||
<z:linkMode>1</z:linkMode> | ||
<link:type>text/html</link:type> | ||
</z:Attachment> | ||
<bib:Document rdf:about="http://linuxpitstop.com/install-seamonkey-on-ubuntu/"> | ||
<z:itemType>webpage</z:itemType> | ||
<dcterms:isPartOf> | ||
<z:Website></z:Website> | ||
</dcterms:isPartOf> | ||
<link:link rdf:resource="#item_20180222115430"/> | ||
<dc:subject>Второй корневой каталог</dc:subject> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://linuxpitstop.com/install-seamonkey-on-ubuntu/</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2018-02-22 11:54:30</dcterms:dateSubmitted> | ||
<dc:title>How to install SeaMonkey On Ubuntu Linux LinuxPitStop</dc:title> | ||
</bib:Document> | ||
<z:Attachment rdf:about="#item_20180222115430"> | ||
<z:itemType>attachment</z:itemType> | ||
<rdf:resource rdf:resource="scrapbook_test_data/data/20180222115430/index.html"/> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://linuxpitstop.com/install-seamonkey-on-ubuntu/</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2018-02-22 11:54:30</dcterms:dateSubmitted> | ||
<dc:title>How to install SeaMonkey On Ubuntu Linux LinuxPitStop</dc:title> | ||
<z:linkMode>1</z:linkMode> | ||
<link:type>text/html</link:type> | ||
</z:Attachment> | ||
<bib:Document rdf:about="http://www.lpi.usra.edu/opag/meetings/aug2015/presentations/day-2/11_beauchamp.pdf"> | ||
<z:itemType>webpage</z:itemType> | ||
<dcterms:isPartOf> | ||
<z:Website></z:Website> | ||
</dcterms:isPartOf> | ||
<link:link rdf:resource="#item_20170808125614"/> | ||
<link:link rdf:resource="#item_2017080812561401"/> | ||
<dc:subject></dc:subject> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://www.lpi.usra.edu/opag/meetings/aug2015/presentations/day-2/11_beauchamp.pdf</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2017-08-08 12:56:14</dcterms:dateSubmitted> | ||
<dc:title>Slide 1 - 11_beauchamp.pdf</dc:title> | ||
</bib:Document> | ||
<z:Attachment rdf:about="#item_20170808125614"> | ||
<z:itemType>attachment</z:itemType> | ||
<rdf:resource rdf:resource="scrapbook_test_data/data/20170808125614/index.html"/> | ||
<dc:identifier> | ||
<dcterms:URI> | ||
<rdf:value>http://www.lpi.usra.edu/opag/meetings/aug2015/presentations/day-2/11_beauchamp.pdf</rdf:value> | ||
</dcterms:URI> | ||
</dc:identifier> | ||
<dcterms:dateSubmitted>2017-08-08 12:56:14</dcterms:dateSubmitted> | ||
<dc:title>Slide 1 - 11_beauchamp.pdf</dc:title> | ||
<z:linkMode>1</z:linkMode> | ||
<link:type>text/html</link:type> | ||
</z:Attachment> | ||
<z:Attachment rdf:about="#item_2017080812561401"> | ||
<z:itemType>attachment</z:itemType> | ||
<rdf:resource rdf:resource="scrapbook_test_data/data/20170808125614/11_beauchamp.pdf"/> | ||
<dc:title>11_beauchamp.pdf</dc:title> | ||
<link:type>application/pdf</link:type> | ||
</z:Attachment> | ||
</rdf:RDF> |
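
In the sample above, each `bib:Document` points at its saved files through `link:link rdf:resource="#item_..."`, and each `z:Attachment` carries the matching `rdf:about` id plus the file path. A quick way to check an export before importing it is to resolve those links. This is a minimal sketch using only the standard library (the script itself depends on rdflib); the function name `summarize` and the shortened `SAMPLE` data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample file header.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "z": "http://www.zotero.org/namespaces/export#",
    "link": "http://purl.org/rss/1.0/modules/link/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "bib": "http://purl.org/net/biblio#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Shortened, made-up data in the same shape as the sample file.
SAMPLE = """<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:z="http://www.zotero.org/namespaces/export#"
 xmlns:link="http://purl.org/rss/1.0/modules/link/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:bib="http://purl.org/net/biblio#">
 <bib:Document rdf:about="http://example.org/page">
  <link:link rdf:resource="#item_1"/>
  <dc:title>Example page</dc:title>
 </bib:Document>
 <z:Attachment rdf:about="#item_1">
  <rdf:resource rdf:resource="data/1/index.html"/>
 </z:Attachment>
</rdf:RDF>"""


def summarize(rdf_text):
    """Return (url, title, [attachment paths]) for every bib:Document in the export."""
    root = ET.fromstring(rdf_text)
    # Map attachment ids ("#item_...") to the file path stored in rdf:resource.
    files = {}
    for attachment in root.findall("z:Attachment", NS):
        resource = attachment.find("rdf:resource", NS)
        path = resource.get(RDF_RESOURCE) if resource is not None else None
        files[attachment.get(RDF_ABOUT)] = path
    summary = []
    for document in root.findall("bib:Document", NS):
        title = document.findtext("dc:title", default="", namespaces=NS)
        # A link with no matching attachment shows up as None in the list.
        linked = [files.get(link.get(RDF_RESOURCE))
                  for link in document.findall("link:link", NS)]
        summary.append((document.get(RDF_ABOUT), title, linked))
    return summary


if __name__ == "__main__":
    for url, title, paths in summarize(SAMPLE):
        print(url, title, paths)
```

A `None` in the list of paths means a document links to an attachment id that is missing from the file, which is worth checking before a long import.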