This project provides a set of three APIs to store Twitter streaming data and retrieve it based on applied filters:
- API to trigger Twitter Stream
- API to filter/search stored tweets
- API to export filtered data in CSV
Technologies used:
- Python
- Flask framework
- ElasticSearch
- Twitter Streaming API
- Clone the project and cd into the project folder:
  `cd twitterapibackend`
- Create a virtual environment:
  `virtualenv venv`
- Activate the virtual environment:
  `source venv/bin/activate`
- With the virtual environment activated, install the requirements:
  `pip install -r requirements.txt`
- Change the Twitter Streaming API credentials in the `configure.py` file.
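The credentials live in `configure.py`; the variable names below are assumptions for illustration and should match whatever names the file actually defines:

```python
# configure.py -- hypothetical layout; use the names the project actually defines.
# The values come from a Twitter developer app (https://developer.twitter.com).
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
ACCESS_TOKEN = "your-access-token"
ACCESS_TOKEN_SECRET = "your-access-token-secret"
```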
To install and configure Elasticsearch, follow this guide, where all steps are clearly given:
https://www.tutorialspoint.com/articles/install-and-configure-elasticsearch-in-ubuntu-14-04-3
Run `python runserver.py`.
(Avoid running the server on port 9200, because Elasticsearch's default port is also 9200.)
## 1. API to trigger Twitter Stream (/api1)
This API triggers Twitter streaming and stores a curated version of the data returned by the Twitter Streaming API. Streaming is performed according to the given parameters.
API 1 : `http://0.0.0.0:8080/api1?keywords=modi,AbkiBarModiSarkar,ModiForPM`
(methods supported - GET, POST)
where `keywords` can be any comma-separated keywords for which streaming needs to be performed. Successful response:
```json
{
  "status": "success",
  "message": "Started streaming tweets with keywords [u'modi', u'AbkiBarModiSarkar', u'ModiForPM']"
}
```
*** Default time for streaming is 30 seconds.
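The endpoint can be triggered from Python with the stdlib `urllib`; the host, port, and keywords below are taken from the example above:

```python
import json
from urllib.request import urlopen

# Build the api1 URL from a list of keywords (comma-separated, as in the example).
keywords = ["modi", "AbkiBarModiSarkar", "ModiForPM"]
url = "http://0.0.0.0:8080/api1?keywords=" + ",".join(keywords)
print(url)

# With runserver.py running, the call below returns the JSON status message:
# with urlopen(url) as resp:
#     print(json.load(resp)["message"])
```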
## 2. API to filter/search stored tweets (/api2)
This API fetches the data stored by the first API, filtered by the provided search keywords and filters, and sorts the results as required.
Operators: the following operators are available to filter/query data/tweets:
- `equals` : exact match, or the `=` operator for numeric/datetime values.
- `contains` : full-text search.
- `wildcard` : pattern matching
  - startswith: `ind*` (starts with "ind")
  - endswith: `*ind` (ends with "ind")
  - anywhere: `*ind*` (matches "ind" anywhere in the string)
- `gte` : `>=` operator for numeric/datetime values.
- `gt` : `>` operator for numeric/datetime values.
- `lte` : `<=` operator for numeric/datetime values.
- `lt` : `<` operator for numeric/datetime values.
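These operator names map naturally onto Elasticsearch query clauses. The helper below is a hypothetical sketch of that mapping, not the project's actual implementation, and applies the clause to the first field only for brevity:

```python
def es_clause(fields, operator, query):
    """Translate one filter entry into a plausible Elasticsearch clause (sketch)."""
    field = fields[0]
    if operator == "equals":
        return {"term": {field: query}}            # exact match
    if operator == "contains":
        return {"match": {field: query}}           # full-text search
    if operator == "wildcard":
        return {"wildcard": {field: query}}        # e.g. "*ind*"
    if operator in ("gte", "gt", "lte", "lt"):
        return {"range": {field: {operator: query}}}
    raise ValueError(f"unsupported operator: {operator}")
```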
API 2 : `http://0.0.0.0:8080/api2?from=0&size=20`
Body in raw JSON form:
```json
{
  "sort": ["created_at"],
  "criteria": {
    "AND": [{
      "fields": ["created_at"],
      "operator": "gte",
      "query": "2017-12-17T14:18:13"
    }, {
      "fields": ["location"],
      "operator": "wildcard",
      "query": "*ndia*"
    }],
    "OR": [{
      "fields": ["hashtags"],
      "operator": "contains",
      "query": ""
    }, {
      "fields": ["hashtags"],
      "operator": "contains",
      "query": ""
    }],
    "NOT": [{
      "fields": ["source_device"],
      "operator": "equals",
      "query": "Twitter for Android"
    }]
  }
}
```
(The `query` values in the OR clauses can be any text that a hashtag contains, e.g. "Modi".)
You'll get the filtered tweets in the response.
*** AND represents must, OR represents should, and NOT represents must_not, matching the corresponding Elasticsearch query attributes. *** The response may be empty if no relevant results are found for the provided query.
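Since AND/OR/NOT map onto `must`/`should`/`must_not`, a criteria block translates to an Elasticsearch `bool` query roughly as sketched below. This is an illustration under that assumption, not the project's code; real code would also translate each inner filter entry into a concrete query clause:

```python
def to_bool_query(criteria):
    """Map the API's AND/OR/NOT keys onto Elasticsearch bool-query keys (sketch)."""
    key_map = {"AND": "must", "OR": "should", "NOT": "must_not"}
    return {"bool": {key_map[k]: v for k, v in criteria.items()}}

# Example: exclude tweets posted from the Android client.
criteria = {"NOT": [{"fields": ["source_device"],
                     "operator": "equals",
                     "query": "Twitter for Android"}]}
query = to_bool_query(criteria)
```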
Example body JSON:
```json
{
  "sort": ["created_at"],
  "criteria": {
    "OR": [{
      "fields": ["tweet_text"],
      "operator": "contains",
      "query": "modi"
    }]
  }
}
```
Response:
```json
{
  "count": {
    "total": 16,
    "fetched": 16
  },
  "results": [
    {
      "sort": [
        1520414046000
      ],
      "_type": "tweet",
      "_source": {
        "lang": "und",
        "is_retweeted": false,
        "retweet_count": 0,
        "screen_name": "jova_novic",
        "country": "",
        "created_at": "2018-03-07T09:14:06",
        "hashtags": [],
        "tweet_text": "@pravoverna Kamenjarke nisu u modi vise,sad su trotoarke i zovu sebe starletama!",
        "source_device": "Twitter for Android",
        "reply_count": 0,
        "location": "Kragujevac, Srbija",
        "country_code": "",
        "timestamp_ms": "1520414046040",
        "user_name": "Jova Novic",
        "favorite_count": 0
      },
      "_score": null,
      "_index": "tweets_index",
      "_id": "AWH_vTXoVKT_vQpI5__3"
    },
    {.....},
    {.....},
    {.....},
    {.....}
  ]
}
```
## 3. API to export filtered data in CSV (/api3)
API 3 : `http://0.0.0.0:8080/api3`
(methods supported - GET, POST)
This API returns the data in CSV format. Input should be given as a JSON payload in the same format as api2, but there is no need to provide pagination details.
The CSV file is downloaded when the request is made from a browser; when the request is posted via Postman, the CSV data is reflected in the response body and the file is available as an attachment in the response headers.
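An export like this can be produced with the stdlib `csv` module. The sketch below (a hypothetical helper, not the project's implementation) flattens the `_source` object of each hit from an api2-style response into one CSV row; the field names follow the example response above:

```python
import csv
import io

def results_to_csv(results, fields):
    """Write the _source dict of each hit as one CSV row (sketch)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for hit in results:
        writer.writerow(hit["_source"])
    return buf.getvalue()

# Minimal example using fields from the api2 sample response.
rows = [{"_source": {"screen_name": "jova_novic",
                     "created_at": "2018-03-07T09:14:06",
                     "source_device": "Twitter for Android"}}]
print(results_to_csv(rows, ["screen_name", "created_at", "source_device"]))
```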