Blacklist Name Matcher

Description:

If using a file: imports a file containing a list of blacklisted names and cleans it against a noise file of irrelevant names and words
If using the EUROPA sanctions list: imports XML from the EUROPA sanctions website and processes it into a list
The program then runs a strict search for the query name
If no results are found, a modified (partial) search is run instead
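
The strict-then-partial flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in terrorist_finder.py: the function names `find_strict`, `find_partial`, and `search` are hypothetical, and "partial" is assumed here to mean that a listed name shares at least one word with the query. It is written to run under Python 3, whereas the project itself targets Python 2.7.

```python
def find_strict(query, names):
    """Return names equal to the query, ignoring case and outer whitespace."""
    wanted = query.strip().lower()
    return [name for name in names if name.strip().lower() == wanted]


def find_partial(query, names):
    """Return names sharing at least one word with the query (assumed rule)."""
    query_words = set(query.lower().split())
    return [name for name in names if query_words & set(name.lower().split())]


def search(query, names):
    """Try a strict match first; fall back to a partial search if it finds nothing."""
    strict = find_strict(query, names)
    if strict:
        return "strict", strict
    return "partial", find_partial(query, names)


# Sample names taken from the example session in this readme.
names = ["Robert Gabriel Mugabe", "Grace Mugabe", "Robert Konars"]
kind, hits = search("Robert Mugabe", names)
```

Returning the kind of match alongside the hits lets the caller print a message such as "No strict match, looking for partial matches..." before listing results.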

Components:

scraper.py (1 method) - scrape EUROPA XML data and process to list type
importer.py (1 method) - import file
processor.py (1 method) - process file to list
cleaner.py (1 method) - clean file against a noisefile with irrelevant words
terrorist_finder.py (3 methods) - find matches in file against query
main.py - command line program
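
As a rough illustration of the cleaning step, the sketch below removes noise words from each blacklist entry and drops entries that end up empty. The function name `clean_names` and the sample data are hypothetical; the real cleaner.py may work differently.

```python
def clean_names(names, noise_words):
    """Strip noise words from each name; drop names that are left empty."""
    noise = set(word.lower() for word in noise_words)
    cleaned = []
    for name in names:
        kept = [word for word in name.split() if word.lower() not in noise]
        if kept:
            cleaned.append(" ".join(kept))
    return cleaned


# Hypothetical sample data, not taken from blacklist.txt or noisefile.txt.
blacklist = ["Mr Robert Gabriel Mugabe", "Dr", "Grace Mugabe"]
noise = ["mr", "dr"]
result = clean_names(blacklist, noise)
```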

Data:

blacklist.tsv, blacklist.txt, ...
noisefile.tsv, noisefile.txt, ...

Example command line process

tanel@tanel:~/Documents/pyScript$ python main.py
Please enter name to search in the terrorist list 
> Robert Mugabe
Do you want to import a file or use the EUROPA database? (Input 'file', otherwise EUROPA is used) 
> europa
We will use the default sanctions list on ec.europa.eu and 8110 records (as at 27.12.2016)

Give it a few seconds...

------------------------------------------------------------------------------
IMPORTANT!
If this text is followed by an error
it is most likely you requested # names that's more than there are in the list
The list has  8110  existing names
------------------------------------------------------------------------------
No strict match, looking for partial matches...
TERRORIST MATCHED!
Certainty:  100.0 %
Name:  Robert Gabriel Mugabe

TERRORIST MATCHED!
Certainty:  100.0 %
Name:  Robert Gabriel Mugabe

TERRORIST MATCHED!
Certainty:  50.0 %
Name:  Grace Mugabe

TERRORIST MATCHED!
Certainty:  50.0 %
Name:  Grace Mugabe

TERRORIST MATCHED!
Certainty:  50.0 %
Name:  Robert Konars
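
The certainty figures in this session are consistent with a simple rule: the share of words in the query that also appear in the listed name. The sketch below implements that rule as an assumption inferred from the output above; the function name `certainty` is hypothetical, and the actual calculation in terrorist_finder.py may differ.

```python
def certainty(query, candidate):
    """Percentage of query words that also occur in the candidate name."""
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    candidate_words = set(candidate.lower().split())
    found = sum(1 for word in query_words if word in candidate_words)
    return 100.0 * found / len(query_words)


# Names taken from the example session above.
candidates = ["Robert Gabriel Mugabe", "Grace Mugabe", "Robert Konars"]
scores = [(name, certainty("Robert Mugabe", name)) for name in candidates]
```

Under this rule, "Robert Gabriel Mugabe" contains both query words, while "Grace Mugabe" and "Robert Konars" each contain one of the two.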

Program logic

Program setup:

Details

The input file has one name per row (plain text; no XML, JSON, or other formatting)
Common file types: txt, csv, tsv
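
Because the input is one name per row, loading it can be as simple as the sketch below. The function name `load_names` is hypothetical and the sample rows are made up; this is not the code in importer.py or processor.py.

```python
import io


def load_names(lines):
    """Turn an iterable of text lines into a list of names, skipping blank rows."""
    names = []
    for line in lines:
        name = line.strip()
        if name:
            names.append(name)
    return names


# An in-memory stand-in for a file such as blacklist.txt.
sample_file = io.StringIO(u"Robert Gabriel Mugabe\n\nGrace Mugabe\n")
names = load_names(sample_file)

# With a real file, the same function could be used like this
# (UTF-8 is an assumption about the file's encoding):
# with io.open("blacklist.txt", encoding="utf-8") as handle:
#     names = load_names(handle)
```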

Tech used:

Python 2.7
Lubuntu 16.04

Current issues:

Every time a name is queried, the data is reimported and reprocessed
- In practice, a cron job would refresh the data every X amount of time
Partial matches are not ordered
Raw user input is not cleaned
Does not handle non-Latin keyboard input (e.g. Cyrillic)
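
Two of these issues, unordered partial matches and uncleaned input, could be approached along the lines of the sketch below. The function names `normalize_query` and `order_matches` are hypothetical and nothing like them is claimed to exist in the repository. The sketch only covers text that is already decoded as Unicode; under Python 2.7, raw_input returns bytes that would need decoding first.

```python
import unicodedata


def normalize_query(raw):
    """Normalize Unicode, collapse whitespace, and lowercase user input."""
    text = unicodedata.normalize("NFKC", raw)
    return " ".join(text.split()).lower()


def order_matches(matches):
    """Sort (name, certainty) pairs from highest to lowest certainty."""
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Cyrillic input with stray spaces and mixed case.
query = normalize_query(u"  Роберт   МУГАБЕ ")

# Certainty values as printed in the example session.
ranked = order_matches([("Grace Mugabe", 50.0), ("Robert Gabriel Mugabe", 100.0)])
```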

Content works, not yet implemented:

(levenshteinDistance.py): fuzzy search using Levenshtein distance, intended as a fallback when strict and partial matches return nothing
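
For reference, a fuzzy fallback of this kind might look like the sketch below. It uses the standard dynamic-programming form of Levenshtein distance; the function names `levenshtein` and `fuzzy_find` and the `max_distance` threshold of 2 are assumptions for illustration, not the contents of levenshteinDistance.py.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_find(query, names, max_distance=2):
    """Return (name, distance) pairs within max_distance, closest first."""
    scored = [(name, levenshtein(query.lower(), name.lower())) for name in names]
    close = [pair for pair in scored if pair[1] <= max_distance]
    return sorted(close, key=lambda pair: pair[1])


# A misspelled query still finds the intended name.
hits = fuzzy_find("Robert Mugabi", ["Robert Mugabe", "Grace Mugabe"])
```

Only two rows of the distance table are kept in memory at a time, which is enough because each cell depends only on the current and previous rows.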

Repository: tanel3203/blacklistNameMatcher
