Fix/type preservation empty dataframes #301
Could you also add a fix to this PR?
diff --git a/lib/red_amber/data_frame_variable_operation.rb b/lib/red_amber/data_frame_variable_operation.rb
index 7a5179e..62b0706 100755
--- a/lib/red_amber/data_frame_variable_operation.rb
+++ b/lib/red_amber/data_frame_variable_operation.rb
@@ -675,9 +675,18 @@ module RedAmber
raise DataFrameArgumentError, "Data size mismatch (#{data.size} != #{size})"
end
- a = Arrow::Array.new(data.is_a?(Vector) ? data.to_a : data)
+ if data.respond_to?(:to_arrow_chunked_array)
+ chunked_array = data.to_arrow_chunked_array
+ else
+ if data.respond_to?(:to_arrow_array)
+ a = data.to_arrow_array
+ else
+ a = Arrow::Array.new(data)
+ end
+ chunked_array = Arrow::ChunkedArray.new([a])
+ end
fields[i] = Arrow::Field.new(key, a.value_data_type)
- arrays[i] = Arrow::ChunkedArray.new([a])
+ arrays[i] = chunked_array
end
[fields, arrays]
end
diff --git a/lib/red_amber/vector.rb b/lib/red_amber/vector.rb
index 7237807..5267eb6 100644
--- a/lib/red_amber/vector.rb
+++ b/lib/red_amber/vector.rb
@@ -198,6 +198,22 @@ module RedAmber
alias_method :values, :to_ary
alias_method :entries, :to_ary
+ # Convert to an Arrow::Array.
+ #
+ # @return [Arrow::Array]
+ # Apache Arrow array representation.
+ def to_arrow_array
+ @data.to_arrow_array
+ end
+
+ # Convert to an Arrow::ChunkedArray.
+ #
+ # @return [Arrow::ChunkedArray]
+ # Apache Arrow chunked array representation.
+ def to_arrow_chunked_array
+ @data.to_arrow_chunked_array
+ end
+
# Indeces from 0 to size-1 by Array.
#
# @return [Array]
9bece65 to 7ce24d6
@kou I don't completely understand how chunked arrays are manipulated, but could we replace that line with
Ah, sorry. We can use
1b2364b to a232f14
@@ -250,7 +250,7 @@ class GroupTest < Test::Unit::TestCase
Vectors : 3 numeric
# key type level data_preview
0 :i uint8 4 [0, 1, 2, nil], 1 nil
- 1 :count uint8 3 [2, 1, 2, 0]
+ 1 :count int64 3 [2, 1, 2, 0]
@kou FYI I had to update this test after the change
I'll fix build failures on Linux in upstream. Please wait for a while...
It looks like manipulating a column in an empty data frame defaults the result to a type of :string.
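A minimal sketch of why this can happen, using plain Ruby stand-ins rather than RedAmber or Arrow: once a typed column is converted with `to_a`, an empty result carries no element type, so any builder that infers the type from values has nothing to go on. `TypedColumn`, `infer_type`, and `column_type` are hypothetical names for illustration only, and the `:string` fallback is an assumption chosen to mirror the behavior described above.

```ruby
# Hypothetical stand-in for a typed column; not RedAmber's or Arrow's API.
TypedColumn = Struct.new(:type, :values) do
  # Converting to a plain Array drops the declared type.
  def to_a
    values.dup
  end
end

# Infer a type from plain Ruby values, as a builder must when it only
# receives an Array. The :string fallback for empty input is an assumption
# that mirrors the behavior reported in this thread.
def infer_type(values)
  sample = values.compact.first
  case sample
  when Integer then :int64
  when Float then :double
  else :string
  end
end

# Prefer the type carried by the data itself; infer only for plain arrays.
def column_type(data)
  data.respond_to?(:type) ? data.type : infer_type(data.to_a)
end

empty = TypedColumn.new(:uint8, [])
lossy = infer_type(empty.to_a)  # the declared type is no longer available
preserved = column_type(empty)  # the declared type is used directly
```

The patch in this thread follows the same idea with `respond_to?(:to_arrow_chunked_array)` and `respond_to?(:to_arrow_array)`: the typed object is asked for its own Arrow representation before falling back to building an array from raw values.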