Implemented reddit thread collection and actor network creation (#3)
* Added graph attribute for reddit
* Refactored reddit create network and updated package documentation
* Reimplemented reddit collection without oauth (future version)
* Added reddit Create function graph attributes and types
* Updated reddit function and parameter names
* Updated minimum package versions for imports
bryn-g authored Nov 26, 2018
1 parent c3f4df5 commit 5397dfd
Showing 25 changed files with 620 additions and 379 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -41,3 +41,4 @@ cred.R
*.rds
*.csv
*.graphml
.Rproj.user
50 changes: 32 additions & 18 deletions README.md
@@ -2,7 +2,7 @@

## What does this package do?

`vosonSML` is an R package that provides a suite of tools for collecting and constructing networks from social media data. It provides easy-to-use functions for collecting data across popular platforms (Instagram, Facebook, Twitter, and YouTube) and generating different types of networks for analysis.
`vosonSML` is an R package that provides a suite of tools for collecting and constructing networks from social media data. It provides easy-to-use functions for collecting data across popular platforms (Twitter, YouTube, Reddit, Instagram and Facebook) and generating different types of networks for analysis.

`vosonSML` is the `SocialMediaLab` package, renamed. We decided that `SocialMediaLab` was a bit too generic and also we wanted to indicate the connection to the [Virtual Observatory for the Study of Online Networks Lab](http://vosonlab.net), where this package was conceived and created.

@@ -18,15 +18,15 @@ If you are having trouble getting data from Facebook it is probably due to a num

#### Twitter

If you are getting the error `Error in check_twitter_oauth( )`, please find a [solution here](https://github.com/geoffjentry/twitteR/issues/90).
If you are getting the error `Error in check_twitter_oauth()`, please find a [solution here](https://github.com/geoffjentry/twitteR/issues/90).

#### Instagram

Instagram API access is severely limited if you do not have an authorised app, which is significantly harder to obtain nowadays.

### Special thanks

This package would not be possible without key packages by other authors in the R community, particularly: [igraph](https://github.com/igraph/rigraph), [Rfacebook](https://github.com/pablobarbera/Rfacebook), [instaR](https://github.com/pablobarbera/instaR), [twitteR](https://github.com/geoffjentry/twitteR), [data.table](https://github.com/Rdatatable/data.table), [tm](https://cran.r-project.org/web/packages/tm/index.html), and [httr](https://github.com/hadley/httr).
This package would not be possible without key packages by other authors in the R community, particularly: [igraph](https://github.com/igraph/rigraph), [twitteR](https://github.com/geoffjentry/twitteR), [RedditExtractoR](https://github.com/ivan-rivera/RedditExtractoR), [instaR](https://github.com/pablobarbera/instaR), [Rfacebook](https://github.com/pablobarbera/Rfacebook), [data.table](https://github.com/Rdatatable/data.table), [tm](https://cran.r-project.org/web/packages/tm/index.html), and [httr](https://github.com/hadley/httr).

## Getting started

@@ -36,23 +36,37 @@ The [vosonSML page on the VOSON website](http://vosonlab.net/vosonSML) also has

## Using Magrittr's pipe interface

The process of authentication, data collection and creating social network can be expressed with the 3 verb functions: *Authenticate*, *Collect* and *Create*. The following are some of the examples from the package documentation expressed with the pipe interface.
The process of authentication, data collection and creating a social network is now expressed with the three verb functions: *Authenticate*, *Collect* and *Create*. The following are some of the examples from the package documentation using the pipe interface.

```{r}
require(magrittr)
# Authenticate with youtube, Collect data from youtube and Create an actor network
Authenticate("youtube", apiKey= apiKey) %>% Collect(videoIDs = videoIDs) %>% Create("Actor")
# Authenticate with facebook, archive the API credential, Collect data about Starwars Page and Create a bimodal network
# You can use facebook, FaCebooK or Facebook in the datasource field
Authenticate("Facebook", appID = appID, appSecret = appSecret) %>% SaveCredential("FBCredential.RDS") %>% Collect(pageName="StarWars", rangeFrom="2015-05-01",rangeTo="2015-06-03") %>% Create("Bimodal")
```R
library(magrittr)
library(vosonSML)

# Authenticate with Twitter, Collect data about #auspol and Create a semantic network
Authenticate("twitter", apiKey=myapikey, apiSecret=myapisecret,accessToken=myaccesstoken, accessTokenSecret=myaccesstokensecret) %>% Collect(searchTerm="#auspol", numTweets=150) %>% Create("Semantic")
# Create Instagram Ego Network
myUsernames <-
Authenticate("instagram", appID = myAppId, appSecret = myAppSecret) %>% Collect(ego = TRUE, username = c("adam_kinzinger","senatorreid")) %>% Create
# Authenticate with youtube, Collect data from youtube and Create an actor network
actorNetwork <- Authenticate("youtube", apiKey = myYoutubeAPIKey) %>%
Collect(videoIDs = myYoutubeVideoIds) %>% Create("actor")

# Authenticate with twitter, Collect 150 tweets for the "#auspol" hashtag and Create a semantic network
semanticNetwork <- Authenticate("twitter", apiKey = myTwitAPIKey, apiSecret = myTwitAPISecret,
accessToken = myTwitAccessToken,
accessTokenSecret = myTwitAccessTokenSecret) %>%
Collect(searchTerm = "#auspol", numTweets = 150) %>% Create("semantic")

# Collect reddit threads and Create an actor network with comment text as edge attribute
actorCommentsNetwork <- Authenticate("reddit") %>%
Collect(threadUrls = myThreadUrls, waitTime = 5) %>%
Create("actor", includeTextData = TRUE)

# Authenticate with facebook, archive the API credential, Collect data about the "Starwars" Page and
# Create a bimodal network
bimodalNetwork <- Authenticate("facebook", appID = myFacebookAppId, appSecret = myFacebookAppSecret) %>%
SaveCredential("FBCredential.RDS") %>%
Collect(pageName = "StarWars", rangeFrom = "2015-05-01", rangeTo = "2015-06-03") %>%
Create("bimodal")

# Create an instagram ego network for provided users
egoNetwork <- Authenticate("instagram", appID = myInstaAppId, appSecret = myInstaAppSecret) %>%
Collect(ego = TRUE, username = c("adam_kinzinger", "senatorreid")) %>% Create()
```

## Example networks
3 changes: 0 additions & 3 deletions vosonSML/.lintr

This file was deleted.

20 changes: 13 additions & 7 deletions vosonSML/DESCRIPTION
@@ -1,12 +1,18 @@
Package: vosonSML
Version: 0.23.5
Date: 2018-11-01
Version: 0.24.0
Title: Tools for Collecting Social Media Data and Generating Networks for Analysis
Description: A suite of tools for collecting and constructing networks from social media data. Provides easy-to-use functions for collecting data across popular platforms (Instagram, Facebook, Twitter, and YouTube) and generating different types of networks for analysis.
Description: A suite of tools for collecting and constructing networks from social media data.
Provides easy-to-use functions for collecting data across popular platforms (Instagram,
Facebook, Twitter, YouTube and Reddit) and generating different types of networks for analysis.
Type: Package
Imports: tm, stringr, twitteR, RCurl, bitops, rjson, plyr, igraph, Rfacebook (>= 0.6.15), Hmisc, data.table, httpuv, instaR, methods, httr
Suggests: magrittr, testthat
Author: Timothy Graham & Robert Ackland with contributions from Bryan Gertzel & Chung-hong Chan
Imports: tm, stringr, twitteR, RCurl, bitops, rjson, plyr, igraph (>= 1.2.2), Rfacebook (>= 0.6.15),
Hmisc, data.table, httpuv, instaR, methods, httr, RedditExtractoR (>= 2.1.2), magrittr,
dplyr (>= 0.7.8), rlang (>= 0.3.0.1)
Depends: R (>= 3.2.0)
Suggests: testthat
Encoding: UTF-8
Author: Timothy Graham, Robert Ackland, Chung-hong Chan, Bryan Gertzel
Maintainer: Bryan Gertzel <[email protected]>
License: GPL (>= 2)
RoxygenNote: 6.1.0
RoxygenNote: 6.1.1
NeedsCompilation: no
18 changes: 18 additions & 0 deletions vosonSML/NAMESPACE
@@ -32,14 +32,30 @@ import(methods)
import(rjson)
import(tm)
importFrom(Hmisc,escapeRegex)
importFrom(RedditExtractoR,reddit_content)
importFrom(RedditExtractoR,user_network)
importFrom(Rfacebook,fbOAuth)
importFrom(Rfacebook,getPage)
importFrom(Rfacebook,getPost)
importFrom(Rfacebook,getUsers)
importFrom(dplyr,coalesce)
importFrom(dplyr,filter)
importFrom(dplyr,group_by)
importFrom(dplyr,left_join)
importFrom(dplyr,mutate)
importFrom(dplyr,rename)
importFrom(dplyr,row_number)
importFrom(dplyr,select)
importFrom(dplyr,summarise)
importFrom(dplyr,ungroup)
importFrom(igraph,'V<-')
importFrom(igraph,V)
importFrom(igraph,delete.vertices)
importFrom(igraph,delete_vertex_attr)
importFrom(igraph,graph.data.frame)
importFrom(igraph,graph_from_data_frame)
importFrom(igraph,set.graph.attribute)
importFrom(igraph,set_graph_attr)
importFrom(igraph,simplify)
importFrom(igraph,write.graph)
importFrom(instaR,getComments)
@@ -49,7 +65,9 @@ importFrom(instaR,getLikes)
importFrom(instaR,getUser)
importFrom(instaR,instaOAuth)
importFrom(instaR,searchInstagram)
importFrom(magrittr,'%>%')
importFrom(plyr,ldply)
importFrom(rlang,'.data')
importFrom(stats,'na.omit')
importFrom(stringr,str_extract)
importFrom(stringr,str_match_all)
63 changes: 35 additions & 28 deletions vosonSML/R/Authenticate.R
@@ -21,20 +21,21 @@
#' \code{Collect}, \code{Create} workflow.
#'
#' @param socialmedia character string, social media API to authenticate,
#' currently supports "facebook", "youtube", "twitter" and "instagram"
#' currently supports "facebook", "youtube", "twitter", "instagram" and "reddit"
#' @param ... additional parameters for authentication
#' \code{facebook}: appID, appSecret
#' \code{youtube}: apiKey
#' \code{twitter}: apiKey, apiSecret, accessToken, accessTokenSecret
#' \code{instagram}: appID, appSecret
#'
#' \code{reddit}: appName, appKey, appSecret, useTokenCache
#'
#' @return credential object with authentication information
#'
#'
#' @note Currently, \code{Authenticate} with socialmedia = "twitter" generates
#' oauth information to be used in the current active session only (i.e.
#' "side-effect") and no authentication-related information will be stored in
#' the returned \code{credential} object.
#'
#'
#' @author Chung-hong Chan <chainsawtiney@@gmail.com>
#' @seealso \code{\link{AuthenticateWithFacebookAPI}},
#' \code{\link{AuthenticateWithInstagramAPI}},
@@ -64,17 +65,18 @@
#' }
#' @export
Authenticate <- function(socialmedia, ...) {
authenticator <- switch(tolower(socialmedia),
facebook = facebookAuthenticator,
youtube = youtubeAuthenticator,
twitter = twitterAuthenticator,
instagram = instagramAuthenticator,
stop("Unknown socialmedia")
)
auth <- authenticator(...)
credential <- list(socialmedia = tolower(socialmedia), auth = auth)
class(credential) <- append(class(credential), "credential")
return(credential)
  authenticator <- switch(tolower(socialmedia),
                          facebook = facebookAuthenticator,
                          youtube = youtubeAuthenticator,
                          twitter = twitterAuthenticator,
                          instagram = instagramAuthenticator,
                          reddit = redditAuthenticator,
                          stop("Unknown socialmedia")
  )
  auth <- authenticator(...)
  credential <- list(socialmedia = tolower(socialmedia), auth = auth)
  class(credential) <- append(class(credential), "credential")
  return(credential)
}

### For the side effect of saving the credential into a file.
@@ -114,42 +116,47 @@ Authenticate <- function(socialmedia, ...) {
#' }
#' @export
SaveCredential <- function(credential, filename = "credential.RDS") {
if (credential$socialmedia == "twitter") {
warning("Credential created for Twitter will not be saved.")
} else {
saveRDS(credential, filename)
}
return(credential)
if (credential$socialmedia == "twitter") {
warning("Credential created for Twitter will not be saved.")
} else {
saveRDS(credential, filename)
}
return(credential)
}

#' @rdname SaveCredential
#' @export
LoadCredential <- function(filename = "credential.RDS") {
credential <- readRDS(filename)
return(credential)
credential <- readRDS(filename)
return(credential)
}

### *Authenticator functions should not be exported. They are helper functions that bridge the AuthenticateWith* functions with Authenticate(), with datasource as the first argument, and they always return an auth object.

### By convention, functions whose names start with a lower-case letter are not exported.

youtubeAuthenticator <- function(apiKey) {
return(authenticateWithYoutubeAPI(apiKey))
return(authenticateWithYoutubeAPI(apiKey))
}

### Currently, this Authenticator will return nothing, only for its side effect
### SAD!!!!!!!!!!!!!!!!!!
### i.e. cannot use SaveCredential and LoadCredential!

twitterAuthenticator <- function(apiKey, apiSecret, accessToken, accessTokenSecret, createToken) {
AuthenticateWithTwitterAPI(api_key = apiKey, api_secret = apiSecret, access_token = accessToken, access_token_secret = accessTokenSecret, createToken = createToken) # ah, only for its side effect, really bad design decision, twitteR!
return(NULL)
AuthenticateWithTwitterAPI(api_key = apiKey, api_secret = apiSecret, access_token = accessToken, access_token_secret = accessTokenSecret, createToken = createToken) # ah, only for its side effect, really bad design decision, twitteR!
return(NULL)
}

facebookAuthenticator <- function(appID, appSecret, extendedPermissions = FALSE) {
return(AuthenticateWithFacebookAPI(appID, appSecret, extended_permissions = extendedPermissions, useCachedToken = FALSE))
return(AuthenticateWithFacebookAPI(appID, appSecret, extended_permissions = extendedPermissions, useCachedToken = FALSE))
}

instagramAuthenticator <- function(appID, appSecret) {
return(AuthenticateWithInstagramAPI(appID, appSecret))
return(AuthenticateWithInstagramAPI(appID, appSecret))
}

redditAuthenticator <- function(appName, appKey, appSecret, useTokenCache) {
# return(AuthenticateWithRedditAPI(appName, appKey, appSecret, useTokenCache))
return(NULL)
}
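
Reviewer sketch: the dispatch in `Authenticate` can be exercised without any API access by swapping in stand-in authenticators. The two stand-ins below are hypothetical simplifications, not the package's real helpers; the `reddit` one mirrors the placeholder in this commit that returns `NULL`. This is only an illustration of the `switch` pattern, assuming the function body shown in the hunk above.

```r
# Stand-in authenticators (hypothetical); the real ones call the platform APIs.
youtubeAuthenticator <- function(apiKey) apiKey
redditAuthenticator <- function(...) NULL  # mirrors the NULL placeholder in this commit

Authenticate <- function(socialmedia, ...) {
  # switch() only evaluates the matching branch, so stop() runs
  # only when no platform name matches.
  authenticator <- switch(tolower(socialmedia),
                          youtube = youtubeAuthenticator,
                          reddit = redditAuthenticator,
                          stop("Unknown socialmedia"))
  auth <- authenticator(...)
  credential <- list(socialmedia = tolower(socialmedia), auth = auth)
  class(credential) <- append(class(credential), "credential")
  credential
}

yt <- Authenticate("YouTube", apiKey = "my-key")  # platform name is case-insensitive
rd <- Authenticate("reddit")                      # no credentials needed; auth is NULL
```

Because the reddit credential carries no token, anything downstream should rely on `credential$socialmedia` rather than `credential$auth` for that platform.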
50 changes: 50 additions & 0 deletions vosonSML/R/AuthenticateWithRedditAPI.R
@@ -0,0 +1,50 @@
#' Reddit API authentication.
#'
#' OAuth2 based authentication with the Reddit API that returns an authentication token.
#'
#' The httr package has a known OAuth2 issue with its "use_basic_auth" parameter: the default value is FALSE and
#' the parameter is not passed through, so it cannot be set to TRUE as reddit OAuth2 authentication requires.
#' The point patch devtools::install_github("r-lib/httr#485") fixes this issue.
#' Further information: https://github.com/r-lib/httr/issues/482
#'
#' Reddit OAuth tokens are only valid for one hour, so using a cached token after that time will produce 401 errors.
#'
#' @param appName character string containing the reddit app name associated with the API key.
#' @param appKey character string containing the app key.
#' @param appSecret character string containing the app secret.
#' @param useTokenCache logical. Use cached authentication token if found.
#'
#' @return a reddit authentication token
#'
AuthenticateWithRedditAPI <- function(appName, appKey, appSecret, useTokenCache) {

  if (missing(appName)) {
    appName <- "reddit"
  }

  # scalar condition, so use the short-circuit operator
  if (missing(appKey) || missing(appSecret)) {
    cat("Error. One or more API credentials are missing.\nPlease specify these.\n")
    return()
  }

  if (missing(useTokenCache)) {
    useTokenCache <- FALSE
  }

  # sets up oauth2 for reddit
  reddit_endpoint <- httr::oauth_endpoint(
    authorize = "https://www.reddit.com/api/v1/authorize",
    access = "https://www.reddit.com/api/v1/access_token"
  )

  reddit_app <- httr::oauth_app(appName, key = appKey, secret = appSecret)

  reddit_token <- httr::oauth2.0_token(reddit_endpoint, reddit_app,
                                       user_params = list(duration = "permanent"),
                                       scope = c("read"),
                                       use_basic_auth = TRUE,
                                       config_init = httr::user_agent("httr oauth"),
                                       cache = useTokenCache)

  return(reddit_token)
}
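
Reviewer sketch: `AuthenticateWithRedditAPI` reports missing credentials with `cat()` and then returns `NULL`, so a piped workflow would carry on with an empty token. One possible alternative, shown here only as an assumption about what might be preferable, is to signal an error. `checkRedditCredentials` is a hypothetical helper and is not part of vosonSML.

```r
# Hypothetical helper, not part of vosonSML: validate credentials and fail loudly.
checkRedditCredentials <- function(appKey, appSecret) {
  if (missing(appKey) || missing(appSecret)) {
    stop("One or more API credentials are missing. Please specify these.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# A caller can still recover from the error if it wants to.
result <- tryCatch(checkRedditCredentials(appKey = "key-only"),
                   error = function(e) conditionMessage(e))
```

With `stop()`, a chain such as `Authenticate(...) %>% Collect(...)` would halt at the authentication step instead of failing later with a less direct message.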
18 changes: 18 additions & 0 deletions vosonSML/R/AuthenticateWithYoutubeAPI.R
@@ -0,0 +1,18 @@
#' YouTube API Authentication
#'
#' OAuth based authentication with the Google API.
#'
#' In order to collect data from YouTube, the user must first authenticate with Google's Application Programming
#' Interface (API). Users can obtain a Google Developer API key at: https://console.developers.google.com.
#'
#' @param apiKeyYoutube character string specifying your Google Developer API key.
#'
#' @return the supplied API key, unchanged.
#'
#' @note In the future this function will enable users to save the API key in working directory, and the function will
#' automatically look for a locally stored key whenever it is called without apiKeyYoutube argument.
#'
#' @noRd
authenticateWithYoutubeAPI <- function(apiKeyYoutube) {
  return(apiKeyYoutube)
}
20 changes: 14 additions & 6 deletions vosonSML/R/Collect.R
@@ -17,18 +17,20 @@
#' \code{facebook}: pageName, rangeFrom, rangeTo, verbose, n, writeToFile, dynamic
#' \code{youtube}: videoIDs, verbose, writeToFile, maxComments
#' \code{twitter}: searchTerm, numTweets, verbose, writeToFile, language
#' \code{instagram}: credential, tag, n, lat, lng, distance, folder, mindate, maxdate, verbose, sleep, writeToFile,
#' \code{instagram}: credential, tag, n, lat, lng, distance, folder, mindate, maxdate, verbose, sleep, writeToFile,
#' waitForRateLimit
#'
#' \code{reddit}: threadUrls, waitTime, writeToFile
#'
#' \code{instagram} with \code{ego} = TRUE: username, userid, verbose,
#' degreeEgoNet, waitForRateLimit, getFollows
#' @return A data.frame object of class \code{dataSource.*} that can be used
#' with \code{Create}.
#' @author Chung-hong Chan <chainsawtiney@@gmail.com>
#' @seealso \code{CollectDataFromFacebook},
#' \code{CollectDataFromInstagram},
#' \code{CollectDatFromTwitter},
#' \code{CollectEgoInstagram}
#' @seealso \code{CollectDataFacebook},
#' \code{CollectDataInstagram},
#' \code{CollectDataTwitter},
#' \code{CollectEgoInstagram},
#' \code{CollectDataReddit}
#' @examples
#'
#' \dontrun{
@@ -50,6 +52,7 @@
#' Authenticate("youtube",
#' apiKey = my_apiKeyYoutube) %>% Collect(videoIDs = videoIDs) %>% Create('actor')
#' }
#'
#' @export
Collect <- function(credential, ego = FALSE, ...) {
if (ego) {
@@ -63,6 +66,7 @@ Collect <- function(credential, ego = FALSE, ...) {
youtube = youtubeCollector,
twitter = twitterCollector,
instagram = instagramCollector,
reddit = redditCollector,
stop("Unsupported socialmedia")
)
}
@@ -92,3 +96,7 @@ instagramCollector <- function(credential, tag, n, lat, lng, distance, folder, m
instagramEgo <- function(credential, username, userid, verbose, degreeEgoNet, waitForRateLimit, getFollows) {
return(CollectEgoInstagram(username, userid, verbose, degreeEgoNet, waitForRateLimit, getFollows, credential))
}

redditCollector <- function(credential, threadUrls, waitTime, writeToFile) {
return(CollectDataReddit(threadUrls, waitTime, writeToFile))
}
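
Reviewer sketch: `redditCollector` accepts the credential, as every collector does, but does not forward it, which fits the commit note that reddit collection was reimplemented without OAuth. The snippet below illustrates that hand-off under two assumptions: that `Collect` passes the credential as the first argument (as the wrapper signatures suggest), and a stand-in `CollectDataReddit` whose body is invented for the example. The thread URL is a placeholder.

```r
# Stand-in for the package's CollectDataReddit (hypothetical body, for illustration only).
CollectDataReddit <- function(threadUrls, waitTime, writeToFile) {
  data.frame(thread = threadUrls, stringsAsFactors = FALSE)
}

# Same shape as the wrapper in this commit: credential is accepted but unused.
redditCollector <- function(credential, threadUrls, waitTime, writeToFile) {
  CollectDataReddit(threadUrls, waitTime, writeToFile)
}

credential <- list(socialmedia = "reddit", auth = NULL)
collector <- switch(credential$socialmedia,
                    reddit = redditCollector,
                    stop("Unsupported socialmedia"))

threads <- collector(credential,
                     threadUrls = c("https://www.reddit.com/r/example/comments/abc123/"),
                     waitTime = 5, writeToFile = FALSE)
```

Keeping `credential` in the signature lets reddit share the same `Authenticate() %>% Collect()` pipeline as the other platforms even though no token is involved.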