ElasticSearch Hackathon Material
All attendees:
- Git and git client (to download or share code)
- A GitHub account (to share your creations)
- Text editor or IDE of choice
- Either the native Java Client (see the provided skeleton Java ES Client project), or an ElasticSearch client for the language of your choice: http://www.elasticsearch.org/guide/clients/
It's recommended that you download and play with Elasticsearch locally, if only to get familiar with the basic commands:
http://www.elasticsearch.org/guide/reference/setup/installation/
- Loaded on the ElasticSearch cluster at cluster-7-slave-00.sl.hackreduce.net (a visual representation of the cluster can be seen through the ElasticSearch Head Plugin)
- Two indices are available:
- wikipedia: collection of English Wikipedia articles and tweets. About 13 million records. Mapping: https://gist.github.com/imotov/5169928
- enron: collection of emails from the Enron Email Dataset. About 0.5 million records. Mapping: https://gist.github.com/imotov/5169937
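As a quick way to poke at either index without installing a client library, the search REST endpoint can be called over plain HTTP. The sketch below is only an illustration, using the Python standard library. It assumes the cluster listens on Elasticsearch's default HTTP port 9200, which this page does not state. The function names and the sample query text are made up for the example.

```python
import json
import urllib.request

ES_HOST = "cluster-7-slave-00.sl.hackreduce.net"
ES_PORT = 9200  # assumption: Elasticsearch's default HTTP port


def build_search_request(index, text, size=5):
    """Build (but do not send) a query_string search against one index."""
    url = "http://%s:%d/%s/_search" % (ES_HOST, ES_PORT, index)
    body = {"query": {"query_string": {"query": text}}, "size": size}
    # Supplying a request body makes urllib issue a POST.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def search(index, text, size=5):
    """Send the request and return the list of matching hits."""
    request = build_search_request(index, text, size)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["hits"]["hits"]
```

For example, search("enron", "meeting") would return up to five matching emails, each with its fields under "_source", provided the cluster is reachable from your machine.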
This data is loaded in MongoDB so that you can re-index it into ES in any way you find interesting:
- Loaded on the Mongo instance at cluster-7-data-00.sl.hackreduce.net
- Mongo URI: mongodb://cluster-7-data-00.sl.hackreduce.net:28953
- Database name: traackr
- Two collections are available:
- posts: collection of articles and tweets. About 23 million records. JSON data structure: https://gist.github.com/gpstathis/5170137
- influencers: collection of authors corresponding to the articles in the “posts” collection. About 85K records. JSON data structure: https://gist.github.com/gpstathis/5170171
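One possible way to re-index these collections is to convert batches of Mongo documents into the newline-delimited body expected by Elasticsearch's bulk API, then POST that body to the cluster's /_bulk endpoint. The sketch below covers only the conversion step and is a rough starting point, not a finished loader. The index name "myposts" and type name "post" used in the usage note are hypothetical. The "_type" field in the action line is an assumption about the Elasticsearch version on the cluster, since recent versions no longer use it.

```python
import json


def to_bulk_payload(docs, index, doc_type):
    """Turn an iterable of Mongo-style dicts into a bulk API request body.

    Each document becomes two lines: an "index" action line, then the
    document source. Mongo's "_id" is moved into the action line so that
    re-running the load overwrites documents instead of duplicating them.
    """
    lines = []
    for doc in docs:
        source = dict(doc)  # copy, so the caller's document is not modified
        doc_id = str(source.pop("_id"))
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # default=str stringifies values json cannot encode natively,
        # such as ObjectId or datetime.
        lines.append(json.dumps(source, default=str))
    # The bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)
```

Calling to_bulk_payload(batch, "myposts", "post") on a few hundred documents at a time keeps each request small enough to retry cheaply if it fails.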
- Plugin Directory
- Native Script
- Analysis - https://github.com/elasticsearch/elasticsearch-analysis-icu, https://github.com/spinscale/elasticsearch-opennlp-plugin
- River
- REST API
- Script Facets
- CSV data loader (Ruby)
- JSON data loader (Ruby)
- CSV data loader (Perl)
- JSON data loader (Clojure)
- Enron data loader (Python)
- Two skeleton projects are available to get you up and running right away: Java or Python
- Using the Java driver
- Java driver example code
- Using the Python driver
- Python driver tutorial
- How to connect to the Hack/Reduce MongoDB Shell via local client:
- Install MongoDB in your local environment
- Ubuntu / Debian:
sudo apt-get update; sudo apt-get install mongodb
- Fedora / RedHat:
sudo yum install mongodb
- Test if installed successfully:
mongo --version
- Connect to Mongo instance on cluster-7-data-00.sl.hackreduce.net:
mongo cluster-7-data-00.sl.hackreduce.net:28953/traackr
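The same connection can be made from a script instead of the shell. The sketch below assumes the third-party pymongo driver is installed (for example via pip install pymongo); it is not one of the skeleton projects. The helper names are made up for illustration. The host, port, database, and collection names are the ones listed on this page.

```python
MONGO_HOST = "cluster-7-data-00.sl.hackreduce.net"
MONGO_PORT = 28953
DB_NAME = "traackr"


def mongo_uri(host=MONGO_HOST, port=MONGO_PORT):
    """Build the connection URI for the hackathon Mongo instance."""
    return "mongodb://%s:%d" % (host, port)


def sample_document(collection="posts"):
    """Fetch one document from a collection to inspect its structure."""
    # Imported here so the module still loads where pymongo is missing.
    from pymongo import MongoClient

    client = MongoClient(mongo_uri())
    return client[DB_NAME][collection].find_one()
```

Calling sample_document("influencers") returns a single author record as a Python dict, which is a convenient way to compare the live data against the JSON structure gists.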