Commit 7083ec0

viirya authored and HyukjinKwon committed
[SPARK-28215][SQL][R] as_tibble was removed from Arrow R API
## What changes were proposed in this pull request?

The new R API of Arrow removed `as_tibble` as of apache/arrow@2ef96c8. As a result, Arrow optimization for DataFrame in R no longer works.

This can be tested as below, after installing the latest Arrow:

```
./bin/sparkR --conf spark.sql.execution.arrow.sparkr.enabled=true
```

```
> collect(createDataFrame(mtcars))
```

Before this PR:

```
> collect(createDataFrame(mtcars))
Error in get("as_tibble", envir = asNamespace("arrow")) :
  object 'as_tibble' not found
```

After:

```
> collect(createDataFrame(mtcars))
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
...
```

## How was this patch tested?

Manual test.

Closes apache#25012 from viirya/SPARK-28215.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
1 parent bc4a676 commit 7083ec0

2 files changed: +18 -5 lines changed

R/pkg/R/DataFrame.R

+8 -2

@@ -1203,7 +1203,8 @@ setMethod("collect",
 requireNamespace1 <- requireNamespace
 if (requireNamespace1("arrow", quietly = TRUE)) {
   read_arrow <- get("read_arrow", envir = asNamespace("arrow"), inherits = FALSE)
-  as_tibble <- get("as_tibble", envir = asNamespace("arrow"))
+  # Arrow drops `as_tibble` since 0.14.0, see ARROW-5190.
+  useAsTibble <- exists("as_tibble", envir = asNamespace("arrow"))

   portAuth <- callJMethod(x@sdf, "collectAsArrowToR")
   port <- portAuth[[1]]
@@ -1213,7 +1214,12 @@ setMethod("collect",
   output <- tryCatch({
     doServerAuth(conn, authSecret)
     arrowTable <- read_arrow(readRaw(conn))
-    as.data.frame(as_tibble(arrowTable), stringsAsFactors = stringsAsFactors)
+    if (useAsTibble) {
+      as_tibble <- get("as_tibble", envir = asNamespace("arrow"))
+      as.data.frame(as_tibble(arrowTable), stringsAsFactors = stringsAsFactors)
+    } else {
+      as.data.frame(arrowTable, stringsAsFactors = stringsAsFactors)
+    }
   }, finally = {
     close(conn)
   })
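
The change above follows a feature-detection pattern: check once whether an optional function is visible from a package namespace, then branch at conversion time. A minimal sketch of that pattern in isolation, assuming nothing about Arrow itself; `convert`, its parameters, and the name `no_such_converter_xyz` are hypothetical and only illustrate the control flow:

```r
# Hypothetical helper mirroring the commit's pattern: test for an optional
# function in a namespace, use it when present, otherwise convert directly.
convert <- function(x, pkg, name) {
  ns <- asNamespace(pkg)
  useOptional <- exists(name, envir = ns)
  if (useOptional) {
    f <- get(name, envir = ns)
    as.data.frame(f(x))
  } else {
    as.data.frame(x)
  }
}

input <- list(a = 1:3)

# `identity` exists in base, so the optional step is applied first.
withHook <- convert(input, "base", "identity")

# This name is assumed not to exist anywhere, so the fallback branch runs.
withoutHook <- convert(input, "base", "no_such_converter_xyz")
```

Both calls should yield the same three-row data frame, which is the property the commit relies on: the result of `collect` is a `data.frame` whether or not the intermediate `as_tibble` step is available.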

R/pkg/R/deserialize.R

+10 -3

@@ -237,7 +237,9 @@ readDeserializeInArrow <- function(inputCon) {
   if (requireNamespace1("arrow", quietly = TRUE)) {
     RecordBatchStreamReader <- get(
       "RecordBatchStreamReader", envir = asNamespace("arrow"), inherits = FALSE)
-    as_tibble <- get("as_tibble", envir = asNamespace("arrow"))
+    # Arrow drops `as_tibble` since 0.14.0, see ARROW-5190.
+    useAsTibble <- exists("as_tibble", envir = asNamespace("arrow"))
+

     # Currently, there looks no way to read batch by batch by socket connection in R side,
     # See ARROW-4512. Therefore, it reads the whole Arrow streaming-formatted binary at once
@@ -246,8 +248,13 @@ readDeserializeInArrow <- function(inputCon) {
     arrowData <- readBin(inputCon, raw(), as.integer(dataLen), endian = "big")
     batches <- RecordBatchStreamReader(arrowData)$batches()

-    # Read all groupped batches. Tibble -> data.frame is cheap.
-    lapply(batches, function(batch) as.data.frame(as_tibble(batch)))
+    if (useAsTibble) {
+      as_tibble <- get("as_tibble", envir = asNamespace("arrow"))
+      # Read all groupped batches. Tibble -> data.frame is cheap.
+      lapply(batches, function(batch) as.data.frame(as_tibble(batch)))
+    } else {
+      lapply(batches, function(batch) as.data.frame(batch))
+    }
   } else {
     stop("'arrow' package should be installed.")
   }
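
One detail worth noting when reading both hunks: the `exists()` calls use the default `inherits = TRUE`, whereas the neighbouring `get()` calls for `read_arrow` and `RecordBatchStreamReader` pass `inherits = FALSE`. With inheritance enabled, the lookup continues from the namespace into its enclosing environments, so the check can succeed for a name the package does not define itself. The sketch below shows the difference using the `stats` namespace as a stand-in, since it needs no extra packages. Whether this matters in practice for SparkR is not established by the commit.

```r
ns <- asNamespace("stats")

# `median` is defined in stats itself, so a strict lookup finds it.
definedHere <- exists("median", envir = ns, inherits = FALSE)

# `paste` lives in base; the default lookup reaches it through the
# namespace's enclosing environments.
foundByInheritance <- exists("paste", envir = ns)

# A strict lookup reports that stats does not define `paste`.
foundStrictly <- exists("paste", envir = ns, inherits = FALSE)
```

Here `definedHere` and `foundByInheritance` are `TRUE`, while `foundStrictly` is `FALSE`.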
