Added SpamScope configuration file to Ansible. Fixed README
fedelemantuano committed Apr 2, 2018
1 parent 4026f93 commit 80a2c84
Showing 8 changed files with 286 additions and 8 deletions.
23 changes: 23 additions & 0 deletions ansible/01_spamscope_install.yml
@@ -50,6 +50,29 @@
    name: virtualenv
    state: latest

- name: Make SpamScope worker folders
  file:
    path: "{{ item }}"
    state: directory
  with_items:
    - "{{ spamscope_conf_path }}"
    - "/var/log/spamscope"
    - "/var/lib/spamscope/moved"
    - "/var/lib/spamscope/failed"
    - "/var/lib/spamscope/output"

- name: Copy SpamScope main configuration file
  template:
    src: templates/spamscope.yml.j2
    dest: "{{ spamscope_conf_path }}/spamscope.yml"

- name: Copy others SpamScope configuration files
  copy:
    src: "files/{{ item }}"
    dest: "{{ spamscope_conf_path }}/{{ item }}"
  with_items:
    - tika_whitelist.yml

- name: Download Apache Tika in local
  become: false
  get_url:
22 changes: 18 additions & 4 deletions ansible/README.md
@@ -59,12 +59,14 @@ rarlinux_url="https://www.rarlab.com/rar/{{ rarlinux_filename }}"
spamscope_version="develop"
spamscope_repo="https://github.com/SpamScope/spamscope.git"
spamscope_path="/opt/spamscope"
spamscope_conf_path="/etc/spamscope"
```

I won't explain all of them, only the important ones.
To upgrade Apache Storm, change the version in `distro_name`.
To give Apache Storm more memory, use `worker_heap_memory_mb`; `topology_worker_max_heap_size_mb` limits the heap space of the SpamScope topology.
`apache_tika_version` is the version of Apache Tika; change it if you want to upgrade this tool.
I won't explain all of them, only the important ones:
* `distro_name`: change the version here if you want to upgrade Apache Storm;
* `worker_heap_memory_mb`: gives Apache Storm more memory to run;
* `topology_worker_max_heap_size_mb`: limits the heap space of the SpamScope topology;
* `apache_tika_version`: the version of Apache Tika, useful if you want to upgrade this tool.

# Playbooks
There are two playbooks to install SpamScope:
@@ -102,6 +104,9 @@ The list of tasks is:
Install lein TAGS: []
Install all SpamScope system dependencies TAGS: []
Install virtualenv TAGS: []
Make SpamScope worker folders TAGS: []
Copy SpamScope main configuration file TAGS: []
Copy others SpamScope configuration files TAGS: []
Download Apache Tika in local TAGS: []
Copy Apache Tika on server TAGS: []
Clone Faup TAGS: [git]
@@ -115,8 +120,17 @@ The list of tasks is:
Install SpamScope TAGS: []
Install SpamScope requirements optional TAGS: []
Enable SpamScope environment variable TAGS: []
```

This playbook also installs a main SpamScope configuration file and creates the folders needed to test your installation:
* `/var/lib/spamscope/moved`: SpamScope moves the analyzed emails here
* `/var/lib/spamscope/failed`: SpamScope moves the emails that it couldn't analyze here
* `/var/lib/spamscope/output`: SpamScope saves the output here if you use the `spamscope_debug` topology, which writes the output to the filesystem
* `/var/log/spamscope`: SpamScope writes the Python logs here

The main configuration file in this installation enables only _Apache Tika_ and _SpamAssassin_.
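
As a quick sanity check after running the playbook, the folders listed above can be verified with a short script. This is only a minimal sketch and not part of SpamScope: the helper name `missing_dirs` is hypothetical, and the paths assume the default values shown in this README (including `/etc/spamscope` for `spamscope_conf_path`).

```python
import os

# Default folders the playbook is expected to create (assumes default variables).
EXPECTED_DIRS = [
    "/etc/spamscope",
    "/var/log/spamscope",
    "/var/lib/spamscope/moved",
    "/var/lib/spamscope/failed",
    "/var/lib/spamscope/output",
]


def missing_dirs(paths, root="/"):
    """Return the paths that do not exist as directories under root.

    The root argument is a hypothetical convenience so the check can be
    pointed at a chroot or a temporary directory.
    """
    missing = []
    for path in paths:
        full = os.path.join(root, path.lstrip("/"))
        if not os.path.isdir(full):
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_dirs(EXPECTED_DIRS):
        print("missing:", path)
```

Running the script on the target host prints one line per missing folder and nothing when every folder is in place.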

# Installation
You can use `ansible` folder to install SpamScope.

10 changes: 10 additions & 0 deletions ansible/files/tika_whitelist.yml
@@ -0,0 +1,10 @@
---
- application/msword
- application/octet-stream
- application/pdf
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/x-7z-compressed
- application/xml
- application/zip
- text/html
- text/plain
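
To illustrate what a whitelist like this is for, here is a minimal sketch of a content-type filter. It assumes the whitelist means "only attachments with these content types are sent to Apache Tika"; the function `should_send_to_tika` is hypothetical and is not SpamScope's actual implementation.

```python
# Content types copied from ansible/files/tika_whitelist.yml.
TIKA_WHITELIST = frozenset([
    "application/msword",
    "application/octet-stream",
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/x-7z-compressed",
    "application/xml",
    "application/zip",
    "text/html",
    "text/plain",
])


def should_send_to_tika(content_type, whitelist=TIKA_WHITELIST):
    """Return True when the attachment content type is whitelisted."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    base_type = content_type.split(";", 1)[0].strip().lower()
    return base_type in whitelist
```

For example, `should_send_to_tika("application/pdf")` is true, while an `image/png` attachment would be skipped.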
3 changes: 2 additions & 1 deletion ansible/hosts
@@ -37,4 +37,5 @@ rarlinux_url="https://www.rarlab.com/rar/{{ rarlinux_filename }}"
# SpamScope
spamscope_version="develop"
spamscope_repo="https://github.com/SpamScope/spamscope.git"
spamscope_path="/opt/spamscope"
spamscope_path="/opt/spamscope"
spamscope_conf_path="/etc/spamscope"
2 changes: 1 addition & 1 deletion ansible/templates/spamscope.sh.j2
@@ -1,2 +1,2 @@
export LEIN_ROOT="yes"
export SPAMSCOPE_CONF_FILE=/etc/spamscope/spamscope.yml
export SPAMSCOPE_CONF_FILE="{{ spamscope_conf_path }}/spamscope.yml"
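
On the consuming side, a process can locate the configuration through the `SPAMSCOPE_CONF_FILE` variable exported above. The sketch below shows one way to read it; the helper `conf_file_path` and the fallback to `/etc/spamscope/spamscope.yml` are assumptions for illustration, and SpamScope itself may treat a missing variable differently.

```python
import os

# Assumed fallback; matches the default spamscope_conf_path in this commit.
DEFAULT_CONF_FILE = "/etc/spamscope/spamscope.yml"


def conf_file_path(environ=None):
    """Return the configuration path from SPAMSCOPE_CONF_FILE, or the fallback."""
    environ = os.environ if environ is None else environ
    return environ.get("SPAMSCOPE_CONF_FILE", DEFAULT_CONF_FILE)


if __name__ == "__main__":
    print(conf_file_path())
```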
208 changes: 208 additions & 0 deletions ansible/templates/spamscope.yml.j2
@@ -0,0 +1,208 @@
---
# Spouts configurations
# Spout file on file system
files-mails:

  # Reload new mails after reload.mails analyzed
  reload.mails: 1000

  # Post processing
  post_processing:

    # move or remove mails?
    what: move

    # if move where
    where: /var/lib/spamscope/moved

    # if failed move in where.failed
    where.failed: /var/lib/spamscope/failed

  # Mailboxes
  mailboxes:
    unittest:
      mail_server: hostname
      # Trust string is used to get sender IP address from mail server.
      # More details:
      # https://github.com/SpamScope/mail-parser/blob/v0.4.6/mailparser/__init__.py#L221
      trust_string: "test_trust_string"
      files_pattern: "mail_*"
      priority: 1
      path_mails: {{ spamscope_path }}/tests/samples

      # This flag enables Outlook msg parsing for every mails in mailbox
      # Default value is false
      outlook: false


# Bolts configurations
# Phishing bolt configuration
phishing:
  lists:
    subjects:
      # Suspect subjects
      # Example in conf/keywords/subjects.example.yml
      # generic: /path/to/generic_subjects

    targets:
      # Keyword for every targets
      # Example in conf/keywords/targets.example.yml
      # generic: /path/to/generic_targets


tokenizer:
  # If true mails with same hash are filtered and not analyzed.
  # Only the body will not be saved
  filter_mails: true

  # Max number of hashes saved for filter function
  maxlen_mails: 1000000

  # If true attachments with same hash are filtered and not analyzed.
  # Only hashes will be saved
  filter_attachments: true

  # Max number of hashes saved for filter function
  maxlen_attachments: 1000000

  # If true the same ip address is filtered and not analyzed.
  filter_network: true

  # Max number of hashes saved for filter function
  maxlen_network: 1000000


# Network bolt configuration
network:
  # Shodan analysis https://www.shodan.io/
  shodan:
    enabled: false
    api_key: xxxxxxxxxxxxxxxxxxxxxxxxxx

  # VirusTotal analysis https://www.virustotal.com/
  virustotal:
    enabled: false
    api_key: xxxxxxxxxxxxxxxxxxxxxxxxxx


# RawMail bolt configuration
raw_mail:
  # SpamAssassin analysis: https://spamassassin.apache.org/
  spamassassin:
    enabled: true


# Attachments bolt configuration
attachments:
  # The lists of all components must be under lists keyword to load them
  # automatically
  commons:
    lists:
      blacklist_content_types:
        # All content types to remove from results
        # Example in content_types/blacklist/generic.example.yml
        # generic: /path/to/generic_content_types

      not_extract_content_types:
        # All content types that you don't want to extract from archives
        # Example: application/java-archive (jar), you can save the jar
        # but do not extract the class inside.
        # generic: /path/to/generic_content_types

  # Apache Tika analysis: https://tika.apache.org/
  tika:
    # Enable Tika but it's very slow:
    enabled: true

    path_jar: {{ install_path }}/tika-app-{{ apache_tika_version }}.jar

    # Like parameter -Xmx of java application
    memory_allocation:

    # All content types to extract details
    # Example in content_types/tika/generic.example.yml
    lists:
      whitelist_content_types:
        generic: {{ spamscope_conf_path }}/tika_whitelist.yml

  # VirusTotal analysis: https://www.virustotal.com/
  virustotal:
    enabled: false

    api_key: xxxxxxxxxxxxxxxxxxxxxxxxxx

    # All content types to analyze with virustotal
    # Example in content_types/virustotal/generic.example.yml
    # Now is not active
    lists:
      whitelist_content_types:
        generic: /path/to/generic_content_types
        custom: /path/to/custom_content_types

  # Thug analysis: https://github.com/buffer/thug
  thug:
    enabled: false

    # File extensions to submit to thug
    extensions:
      - .html
      - .js
      - .jse

    # More details:
    # http://buffer.github.io/thug/doc/usage.html#basic-usage
    #
    # list of user agents to use for analysis
    # SpamScope starts one analysis per user agent
    user_agents:
      - win7ie90

    referer: http://www.google.com/

    # Set the single analysis timeout in seconds
    # This value MUST be lower than supervisor.worker.timeout.secs
    # of SpamScope:
    # nr. user agents * timeout < supervisor.worker.timeout.secs
    timeout: 10

    # Set the connect timeout in seconds
    connect_timeout: 1

    disable_cert_logging: true
    disable_code_logging: true

    # Maximum pages to fetch
    # For SpamScope a good value is 1 to have short analysis
    threshold: 1

  # Zemana Antimalware analysis: https://www.zemana.com/
  # only premium users
  zemana:
    enabled: false

    PartnerId: xxxxx
    UserId: xxxxx
    ApiKey: xxxxx
    useragent: SpamScope

  # This plugin stores the samples on file system
  # in date format subfolders (YYYY-MM-DD)
  store_samples:
    enabled: false
    base_path: /tmp


# Urls
urls:
  whitelists:
    # Only second level domains to whitelisting
    # Example in conf/whitelists/generic.example.yml
    # alexa:
    #   path: /path/to/alexa
    #   expiry: 2016-06-28T12:33:00.000Z # date ISO 8601 only UTC


# Output debug bolt configuration
output-debug:
  json.indent: 4
  output.path: /var/lib/spamscope/output
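
The Thug section of the template states a constraint: the number of user agents multiplied by the per-analysis timeout must stay below `supervisor.worker.timeout.secs`. The sketch below checks that rule before deploying; the function name `thug_budget_ok` and the worker timeout of 600 seconds are hypothetical values chosen for illustration, while `user_agents` and `timeout` come from the template.

```python
def thug_budget_ok(user_agents, timeout, worker_timeout_secs):
    """Check: nr. user agents * timeout < supervisor.worker.timeout.secs."""
    return len(user_agents) * timeout < worker_timeout_secs


# Values from the template above: one user agent, 10 seconds per analysis.
user_agents = ["win7ie90"]
timeout = 10

# Hypothetical worker timeout; use the value your topology is submitted with.
worker_timeout_secs = 600

if not thug_budget_ok(user_agents, timeout, worker_timeout_secs):
    raise SystemExit("Thug analyses could exceed the worker timeout")
```

The comparison is strict, so a total that merely equals the worker timeout is treated as a failure.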
18 changes: 18 additions & 0 deletions config.json
@@ -20,6 +20,24 @@
"use_ssh_for_nimbus": false,
"virtualenv_root": ""
        },
        "prod_vm": {
            "user": "root",
            "nimbus": "localhost",
            "workers": [
                "localhost"
            ],
            "log": {
                "path": "/var/log/spamscope",
                "max_bytes": 5000000,
                "backup_count": 7,
                "level": "info"
            },
            "use_virtualenv": false,
            "install_virtualenv": false,
            "use_ssh_for_nimbus": false,
            "virtualenv_root": "",
            "virtualenv_name": ""
        },
"debug": {
"user": "fedelemantuano",
"nimbus": "localhost",
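
The `log` block of the new `prod_vm` environment reads like the settings of a size-based rotating log. As a rough sketch of what those numbers mean, the snippet below builds an equivalent handler with Python's standard library. Treating the block this way is an assumption about how the settings are consumed; the function `build_handler` and the file name `spamscope.log` are hypothetical.

```python
import logging
import logging.handlers
import os

# The "log" block from the prod_vm environment in config.json.
LOG_CONF = {
    "path": "/var/log/spamscope",
    "max_bytes": 5000000,
    "backup_count": 7,
    "level": "info",
}


def build_handler(log_conf, filename="spamscope.log"):
    """Build a rotating file handler from a config.json style log block.

    The log folder must already exist, because the handler opens the file
    as soon as it is created.
    """
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_conf["path"], filename),
        maxBytes=log_conf["max_bytes"],       # rotate after about 5 MB
        backupCount=log_conf["backup_count"],  # keep 7 rotated files
    )
    handler.setLevel(getattr(logging, log_conf["level"].upper()))
    return handler
```

With these values each log file would grow to roughly 5 MB before rotating, and seven older files would be kept alongside the active one.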
8 changes: 6 additions & 2 deletions src/cli/README.md
@@ -29,13 +29,16 @@ Submit options:
```
usage: spamscope-topology submit [-h]
[-g {spamscope_debug,spamscope_elasticsearch,spamscope_redis}]
[-w WORKERS] [-k TICK] [-p MAX_PENDING]
[-s SPOUT_SLEEP] [-t TIMEOUT]
[-e ENVIRONMENT] [-w WORKERS] [-k TICK]
[-p MAX_PENDING] [-s SPOUT_SLEEP]
[-t TIMEOUT]
optional arguments:
-h, --help show this help message and exit
-g {spamscope_debug,spamscope_elasticsearch,spamscope_redis}, --topology {spamscope_debug,spamscope_elasticsearch,spamscope_redis}
SpamScope topology.
-e ENVIRONMENT, --environment ENVIRONMENT
The environment to use for the command.
-w WORKERS, --workers WORKERS
Apache Storm workers for your topology.
-k TICK, --tick TICK Every tick seconds SpamScope configuration is
@@ -48,6 +51,7 @@ optional arguments:
-t TIMEOUT, --timeout TIMEOUT
How long (in s) between heartbeats until supervisor
considers that worker dead.
```
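
The help text above has the shape of standard `argparse` output. The sketch below rebuilds a small part of it to show how the new `-e/--environment` option parses. It is an illustration only, not the project's actual CLI code: the `type=int` on `--workers` is an assumption, defaults are deliberately left out, and `prod_vm` is used as the environment name on the assumption that it refers to the entry added to `config.json` in this commit.

```python
import argparse

TOPOLOGIES = ["spamscope_debug", "spamscope_elasticsearch", "spamscope_redis"]


def build_parser():
    """Build a reduced version of the submit command line shown above."""
    parser = argparse.ArgumentParser(prog="spamscope-topology")
    subparsers = parser.add_subparsers(dest="command")

    submit = subparsers.add_parser("submit")
    submit.add_argument(
        "-g", "--topology", choices=TOPOLOGIES, help="SpamScope topology.")
    submit.add_argument(
        "-e", "--environment", help="The environment to use for the command.")
    submit.add_argument(
        "-w", "--workers", type=int,  # int is an assumption
        help="Apache Storm workers for your topology.")
    return parser


# Parse an example command line that selects the prod_vm environment.
args = build_parser().parse_args(
    ["submit", "-g", "spamscope_debug", "-e", "prod_vm"])
print(args.environment)
```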

# spamscope-elasticsearch
