Background pool #6

Merged
merged 8 commits into from
Oct 21, 2019
9 changes: 5 additions & 4 deletions DESCRIPTION
@@ -1,26 +1,27 @@
Package: AzureRMR
Title: Interface to 'Azure Resource Manager'
Version: 2.2.1
Version: 2.2.1.9000
Authors@R: c(
person("Hong", "Ooi", , "[email protected]", role = c("aut", "cre")),
person("Microsoft", role="cph")
)
Description: A lightweight but powerful R interface to the 'Azure Resource Manager' REST API. The package exposes classes and methods for 'OAuth' authentication and working with subscriptions and resource groups. It also provides functionality for creating and deleting 'Azure' resources and deploying templates. While 'AzureRMR' can be used to manage any 'Azure' service, it can also be extended by other packages to provide extra functionality for specific services. Part of the 'AzureR' family of packages.
Description: A lightweight but powerful R interface to the 'Azure Resource Manager' REST API. The package exposes a comprehensive class framework and related tools for creating, updating and deleting 'Azure' resource groups, resources and templates. While 'AzureRMR' can be used to manage any 'Azure' service, it can also be extended by other packages to provide extra functionality for specific services. Part of the 'AzureR' family of packages.
URL: https://github.com/Azure/AzureRMR https://github.com/Azure/AzureR
BugReports: https://github.com/Azure/AzureRMR/issues
License: MIT + file LICENSE
VignetteBuilder: knitr
Depends:
R (>= 3.3)
Imports:
Imports:
AzureGraph (>= 1.0.4),
AzureAuth (>= 1.2.1),
utils,
parallel,
httr (>= 1.3),
jsonlite,
R6,
uuid
Suggests:
AzureGraph,
knitr,
testthat,
httpuv
10 changes: 10 additions & 0 deletions NAMESPACE
@@ -18,10 +18,12 @@ export(clean_token_directory)
export(create_azure_login)
export(delete_azure_login)
export(delete_azure_token)
export(delete_pool)
export(format_public_fields)
export(format_public_methods)
export(get_azure_login)
export(get_azure_token)
export(init_pool)
export(is_azure_login)
export(is_azure_token)
export(is_azure_v1_token)
@@ -37,5 +39,13 @@ export(is_url)
export(list_azure_logins)
export(list_azure_tokens)
export(named_list)
export(pool_call)
export(pool_evalq)
export(pool_exists)
export(pool_export)
export(pool_lapply)
export(pool_map)
export(pool_sapply)
export(pool_size)
import(AzureAuth)
importFrom(utils,modifyList)
4 changes: 3 additions & 1 deletion NEWS.md
@@ -1,6 +1,8 @@
# AzureRMR 2.2.1
# AzureRMR 2.2.1.9000

- New in this version is a facility for parallelising connections to Azure, using a pool of background processes. Some operations, such as downloading many small files or interacting with a cluster of VMs, can be sped up significantly by carrying them out in parallel rather than sequentially. The code for this is currently duplicated in multiple packages including AzureStor and AzureVM; putting it in AzureRMR removes the duplication and also makes it available to other packages that may benefit. See `?pool` for more details.
- Expose `do_operation` methods for subscription and resource group objects, similar to that for resources. This allows arbitrary operations on a subscription or resource group.
- AzureRMR now directly imports AzureGraph.
- Update default Resource Manager API version to "2019-08-01".
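
As a rough illustration of the parallelisation note above, the following sketch compares a sequential loop with a small socket cluster on a latency-bound task. It uses only the base `parallel` package rather than the new pool functions; `slow_task` is a hypothetical stand-in for a single small request, and the 0.2 second delay is an arbitrary assumption.

```r
library(parallel)

# Hypothetical stand-in for one latency-bound request (e.g. a small download)
slow_task <- function(i) {
    Sys.sleep(0.2)
    i * 2
}

inputs <- 1:8

# Sequential baseline
seq_time <- system.time(seq_res <- lapply(inputs, slow_task))[["elapsed"]]

# Same work spread over 4 background R processes
cl <- makeCluster(4)
par_time <- system.time(par_res <- parLapply(cl, inputs, slow_task))[["elapsed"]]
stopCluster(cl)

# The results should match; only the elapsed time should differ
identical(seq_res, par_res)
c(sequential = seq_time, parallel = par_time)
```

The timing for the parallel run excludes the cost of starting the cluster, which is the cost a persistent pool is meant to pay only once per session.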

# AzureRMR 2.2.0
8 changes: 7 additions & 1 deletion R/AzureRMR.R
@@ -2,7 +2,7 @@
#' @importFrom utils modifyList
NULL

utils::globalVariables(c("self", "private"))
utils::globalVariables(c("self", "private", "pool"))

.onLoad <- function(libname, pkgname)
{
@@ -14,6 +14,12 @@ utils::globalVariables(c("self", "private"))
invisible(NULL)
}

.onUnload <- function(libpath)
{
if(exists("pool", envir=.AzureR))
try(parallel::stopCluster(.AzureR$pool), silent=TRUE)
}


# default authentication app ID: leverage the az CLI
.az_cli_app_id <- "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
1 change: 1 addition & 0 deletions R/call_azure_rm.R
@@ -113,6 +113,7 @@ error_message <- function(cont)
cont$error$message
else if(is.list(cont$odata.error)) # OData
cont$odata.error$message$value
else ""
}
else ""

2 changes: 1 addition & 1 deletion R/make_graph_login.R
@@ -1,6 +1,6 @@
make_graph_login_from_token <- function(token, aad_host, graph_host)
{
if(is_empty(graph_host) || !requireNamespace("AzureGraph", quietly=TRUE))
if(is_empty(graph_host))
return()

message("Also creating Microsoft Graph login for ", format_tenant(token$tenant))
158 changes: 158 additions & 0 deletions R/pool.R
@@ -0,0 +1,158 @@
#' Manage parallel Azure connections
#'
#' @param size For `init_pool`, the number of background R processes to create. Limit this if you are low on memory.
#' @param restart For `init_pool`, whether to terminate an already running pool first.
#' @param ... Other arguments passed on to functions in the parallel package. See below.
#'
#' @details
#' AzureRMR can parallelise communication with Azure by using a pool of background R processes. This often leads to major speedups in scenarios like downloading large numbers of small files, or working with a cluster of virtual machines. This functionality is intended for use by packages that extend AzureRMR (and was originally implemented as part of the AzureStor package), but can also be called directly by the end-user.
#'
#' A small API consisting of the following functions is currently provided for managing the pool. They pass their arguments down to the corresponding functions in the parallel package.
#' - `init_pool` initialises the pool, creating it if necessary. The pool is created by calling `parallel::makeCluster` with the pool size and any additional arguments. If `init_pool` is called and the current pool is smaller than `size`, the pool is deleted and recreated at the new size.
#' - `delete_pool` shuts down the background processes and deletes the pool.
#' - `pool_exists` checks for the existence of the pool, returning a TRUE/FALSE value.
#' - `pool_size` returns the size of the pool, or zero if the pool does not exist.
#' - `pool_export` exports variables to the pool nodes. It calls `parallel::clusterExport` with the given arguments.
#' - `pool_lapply`, `pool_sapply` and `pool_map` carry out work on the pool. They call `parallel::parLapply`, `parallel::parSapply` and `parallel::clusterMap` with the given arguments.
#' - `pool_call` and `pool_evalq` execute code on the pool nodes. They call `parallel::clusterCall` and `parallel::clusterEvalQ` with the given arguments.
#'
#' The pool is persistent for the session or until terminated by `delete_pool`. You should initialise the pool by calling `init_pool` before running any code on it. If the pool already exists and is large enough, this restores the original state of the pool nodes by removing any objects that may be in memory, and resetting the working directory to the master working directory.
#'
#' @seealso
#' [parallel::makeCluster], [parallel::clusterCall], [parallel::parLapply]
#' @examples
#' \dontrun{
#'
#' init_pool()
#'
#' pool_size()
#'
#' x <- 42
#' pool_export("x")
#' pool_sapply(1:5, function(i) i + x)
#'
#' init_pool()
#' # error: x no longer exists on nodes
#' try(pool_sapply(1:5, function(i) i + x))
#'
#' delete_pool()
#'
#' }
#' @rdname pool
#' @export
init_pool <- function(size=10, restart=FALSE, ...)
{
if(restart || !pool_exists() || pool_size() < size)
{
delete_pool()
message("Creating background pool")
.AzureR$pool <- parallel::makeCluster(size, ...)
pool_evalq(loadNamespace("AzureRMR"))
}
else
{
# restore original state, set working directory to master working directory
pool_call(function(wd)
{
setwd(wd)
rm(list=ls(envir=.GlobalEnv, all.names=TRUE), envir=.GlobalEnv)
}, wd=getwd())
}

invisible(NULL)
}


#' @rdname pool
#' @export
delete_pool <- function()
{
if(!pool_exists())
return(invisible(NULL))

message("Deleting background pool")
parallel::stopCluster(.AzureR$pool)
rm(pool, envir=.AzureR)
}


#' @rdname pool
#' @export
pool_exists <- function()
{
exists("pool", envir=.AzureR) && inherits(.AzureR$pool, "cluster")
}


#' @rdname pool
#' @export
pool_size <- function()
{
if(!pool_exists())
return(0)
length(.AzureR$pool)
}


#' @rdname pool
#' @export
pool_export <- function(...)
{
pool_check()
parallel::clusterExport(cl=.AzureR$pool, ...)
}


#' @rdname pool
#' @export
pool_lapply <- function(...)
{
pool_check()
parallel::parLapply(cl=.AzureR$pool, ...)
}


#' @rdname pool
#' @export
pool_sapply <- function(...)
{
pool_check()
parallel::parSapply(cl=.AzureR$pool, ...)
}


#' @rdname pool
#' @export
pool_map <- function(...)
{
pool_check()
parallel::clusterMap(cl=.AzureR$pool, ...)
}


#' @rdname pool
#' @export
pool_call <- function(...)
{
pool_check()
parallel::clusterCall(cl=.AzureR$pool, ...)
}


#' @rdname pool
#' @export
pool_evalq <- function(...)
{
pool_check()
parallel::clusterEvalQ(cl=.AzureR$pool, ...)
}


.AzureR <- new.env()


pool_check <- function()
{
if(!pool_exists())
stop("AzureR pool does not exist; call init_pool() to create it", call.=FALSE)
}
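
The lifecycle implemented in `R/pool.R` can be sketched with bare `parallel` calls, which may help when reading the wrappers. This is a minimal sketch, not the package code: `pool_env` is a hypothetical stand-in for the package's private `.AzureR` environment, and the pool size of 2 is chosen only to keep the demo light.

```r
library(parallel)

# Hypothetical stand-in for the package-private environment that holds the pool
pool_env <- new.env()

# Create a small cluster of background R processes
pool_env$pool <- makeCluster(2)

# Export a variable to every worker, then use it in parallel work
x <- 42
clusterExport(pool_env$pool, "x", envir = environment())
before <- parSapply(pool_env$pool, 1:3, function(i) i + x)

# Reset step, mirroring what init_pool() does when a large enough pool
# already exists: restore the working directory and clear the global env
invisible(clusterCall(pool_env$pool, function(wd) {
    setwd(wd)
    rm(list = ls(envir = .GlobalEnv, all.names = TRUE), envir = .GlobalEnv)
}, wd = getwd()))

# Check on each worker whether x survived the reset
after <- unlist(clusterCall(pool_env$pool, function() {
    exists("x", envir = .GlobalEnv, inherits = FALSE)
}))

# Shut down the workers and drop the reference, as delete_pool() does
stopCluster(pool_env$pool)
rm("pool", envir = pool_env)
```

Keeping the cluster in an environment rather than a global variable lets every wrapper find the same pool without the caller having to pass it around.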
83 changes: 83 additions & 0 deletions man/pool.Rd


36 changes: 36 additions & 0 deletions tests/testthat/test00_pool.R
@@ -0,0 +1,36 @@
context("pool")

skip_on_cran()

test_that("Background process pool works",
{
expect_false(pool_exists())
expect_error(AzureRMR:::pool_check())
expect_error(pool_sapply(1:5, function(x) x))

init_pool(5)
expect_true(pool_exists())
expect_identical(pool_size(), 5L)

res <- pool_sapply(1:5, function(x) x)
expect_identical(res, 1:5)

res2 <- pool_lapply(1:5, function(x) x)
expect_identical(res2, list(1L, 2L, 3L, 4L, 5L))

res3 <- pool_map(function(x, y) x + y, 1:5, 2)
expect_identical(res3, list(3, 4, 5, 6, 7))

y <- 42
pool_export("y", environment())
rm(y) # work around testthat environment shenanigans
res <- pool_sapply(1:5, function(x) y)
expect_identical(res, rep(42, 5))

init_pool(5)
expect_true(all(sapply(pool_evalq(ls()), is_empty)))
expect_error(pool_sapply(1:5, function(x) y))

delete_pool()
expect_false(pool_exists())
})
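
The expectations in the test above mix integer and double results, which follows from how the underlying `parallel` functions behave. The sketch below reproduces the three return shapes with the base `parallel` package directly, assuming a two-worker socket cluster can be started; it does not use the pool wrappers themselves.

```r
library(parallel)

cl <- makeCluster(2)

# parSapply simplifies to a vector; integer input stays integer
res_sapply <- parSapply(cl, 1:5, function(x) x)

# parLapply always returns a list with one element per input
res_lapply <- parLapply(cl, 1:5, function(x) x)

# clusterMap recycles the length-one argument 2 across 1:5; adding a double
# to an integer gives a double, so the elements are doubles, not integers
res_map <- clusterMap(cl, function(x, y) x + y, 1:5, 2)

stopCluster(cl)

is.integer(res_sapply)
is.double(res_map[[1]])
```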