- If using a file: imports a file containing a list of blacklisted names and cleans it against a noisefile of irrelevant names and words
- If using the EUROPA sanctions list: imports XML from the EUROPA sanctions website and processes it into a list
- Then runs a strict (exact) search for the query name
- If no results are found, falls back to a partial search
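The strict-then-partial flow above can be sketched roughly as follows. This is not the code in terrorist_finder.py: the function names `strict_find`, `partial_find` and `find` are hypothetical, and the certainty is assumed to be the share of query words that also appear in a listed name. That assumption happens to match the percentages in the sample session (100 % for "Robert Gabriel Mugabe", 50 % for "Grace Mugabe" when querying "Robert Mugabe"), but the real scoring may differ.

```python
def strict_find(names, query):
    """Return names equal to the query, ignoring case and surrounding whitespace."""
    q = query.strip().lower()
    return [n for n in names if n.strip().lower() == q]


def partial_find(names, query):
    """Return (name, certainty) pairs, where certainty is the percentage
    of query words that also occur as whole words in the name."""
    query_words = query.lower().split()
    results = []
    for name in names:
        name_words = set(name.lower().split())
        hits = sum(1 for w in query_words if w in name_words)
        if hits:  # skip names sharing no word with the query
            results.append((name, 100.0 * hits / len(query_words)))
    return results


def find(names, query):
    """Try a strict match first; fall back to partial matching if it finds nothing."""
    exact = strict_find(names, query)
    if exact:
        return [(n, 100.0) for n in exact]
    print("No strict match, looking for partial matches...")
    return partial_find(names, query)
```

For example, `find(["Robert Gabriel Mugabe", "Grace Mugabe", "Robert Konars", "John Smith"], "Robert Mugabe")` finds no strict match and returns the first three names with certainties 100.0, 50.0 and 50.0, in list order (the matches are not sorted).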
- scraper.py (1 method) - scrapes EUROPA XML data and processes it into a list
- importer.py (1 method) - imports a file
- processor.py (1 method) - processes a file into a list
- cleaner.py (1 method) - cleans a file against a noisefile of irrelevant words
- terrorist_finder.py (3 methods) - finds matches for the query in the file
- main.py - command-line program
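The XML-to-list step of scraper.py could look something like the sketch below, using the standard-library `xml.etree.ElementTree`. The tag name `WHOLENAME` and the sample document are hypothetical placeholders, not the actual EUROPA schema; check the real file's structure and pass the right tag. Downloading is left out (`urllib2.urlopen` in Python 2.7, `urllib.request.urlopen` in Python 3).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real EUROPA file has its own schema.
SAMPLE_XML = """<WHOLE>
  <ENTITY><NAME><WHOLENAME>Robert Gabriel Mugabe</WHOLENAME></NAME></ENTITY>
  <ENTITY><NAME><WHOLENAME>Grace Mugabe</WHOLENAME></NAME></ENTITY>
</WHOLE>"""


def xml_to_names(xml_text, name_tag="WHOLENAME"):
    """Collect the stripped text of every <name_tag> element anywhere in the tree.

    name_tag is an assumed placeholder, not a confirmed EUROPA tag."""
    root = ET.fromstring(xml_text)
    names = []
    for element in root.iter(name_tag):
        if element.text and element.text.strip():
            names.append(element.text.strip())
    return names
```

Calling `xml_to_names(SAMPLE_XML)` returns `["Robert Gabriel Mugabe", "Grace Mugabe"]`.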
- blacklist.tsv, blacklist.txt, ...
- noisefile.tsv, noisefile.txt, ...
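Cleaning the blacklist against a noisefile might work along these lines. The exact rule in cleaner.py is not described, so this sketch assumes that noise words are removed from each name (case-insensitively) and that entries left empty are dropped; the function name `clean` is hypothetical.

```python
def clean(names, noise_words):
    """Remove noise words (case-insensitive) from every name and
    drop entries that end up empty."""
    noise = set(w.lower() for w in noise_words)
    cleaned = []
    for name in names:
        kept = [w for w in name.split() if w.lower() not in noise]
        if kept:
            cleaned.append(" ".join(kept))
    return cleaned
```

For instance, `clean(["Mr Robert Gabriel Mugabe", "Grace Mugabe", "N/A"], ["mr", "n/a"])` gives `["Robert Gabriel Mugabe", "Grace Mugabe"]`.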
tanel@tanel:~/Documents/pyScript$ python main.py
Please enter name to search in the terrorist list
> Robert Mugabe
Do you want to import a file or use the EUROPA database? (Input 'file', otherwise EUROPA is used)
> europa
We will use the default sanctions list on ec.europa.eu and 8110 records (as at 27.12.2016)
Give it a few seconds...
------------------------------------------------------------------------------
IMPORTANT!
If this text is followed by an error
it is most likely you requested # names that's more than there are in the list
The list has 8110 existing names
------------------------------------------------------------------------------
No strict match, looking for partial matches...
TERRORIST MATCHED!
Certainty: 100.0 %
Name: Robert Gabriel Mugabe
TERRORIST MATCHED!
Certainty: 100.0 %
Name: Robert Gabriel Mugabe
TERRORIST MATCHED!
Certainty: 50.0 %
Name: Grace Mugabe
TERRORIST MATCHED!
Certainty: 50.0 %
Name: Grace Mugabe
TERRORIST MATCHED!
Certainty: 50.0 %
Name: Robert Konars
- The file has one name per row (no XML, JSON, etc. formatting)
- Common file types: txt, csv, tsv
- Python 2.7
- Lubuntu 16.04
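Given the one-name-per-row assumption, importing and processing a file reduces to reading lines and stripping them. The sketch below is illustrative, not importer.py or processor.py; the function names are hypothetical and UTF-8 encoding is assumed. Because every row holds only a name, the same code covers txt, csv and tsv files (a trailing tab or newline is removed by `strip()`).

```python
import io


def parse_names(lines):
    """Turn raw rows into a list of names: surrounding whitespace
    (including a trailing tab or newline) is stripped and blank rows are skipped."""
    return [line.strip() for line in lines if line.strip()]


def import_names(path):
    """Read a .txt/.csv/.tsv file holding one name per row (UTF-8 assumed)."""
    with io.open(path, encoding="utf-8") as handle:
        return parse_names(handle)
```

Splitting parsing from file access keeps `parse_names` easy to test with a plain list of strings.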
- Every time a name is queried, the data is reimported and reprocessed; in a real deployment, a cron job would refresh it at a fixed interval instead
- Partial matches are not ordered
- Raw user input is not cleaned
- Does not handle input typed on non-Latin keyboard layouts (e.g. Cyrillic)
- (levenshteinDistance.py): fuzzy search using Levenshtein distance when neither strict nor partial matching returns anything
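A Levenshtein-based fallback could be sketched as below. This is a generic dynamic-programming implementation, not the contents of levenshteinDistance.py; `fuzzy_find` and its `max_distance=3` cutoff are hypothetical choices. It also sorts results closest-first, which is one way to address the unordered partial matches noted above.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic
    programming, keeping only one previous row of the table."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def fuzzy_find(names, query, max_distance=3):
    """Return (name, distance) pairs within max_distance, closest first."""
    q = query.lower()
    scored = [(name, levenshtein(name.lower(), q)) for name in names]
    matches = [pair for pair in scored if pair[1] <= max_distance]
    return sorted(matches, key=lambda pair: pair[1])
```

For example, `fuzzy_find(["Robert Mugabee", "Robert Mugabe", "John Smith"], "Robert Mugabi")` returns `[("Robert Mugabe", 1), ("Robert Mugabee", 2)]`. A fixed cutoff is crude for names of very different lengths; normalising the distance by name length is a common alternative.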