-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Show #6
base: main
Are you sure you want to change the base?
Show #6
Conversation
Thanks for breathing some new life into this project! I'm in favor of the changes you're proposing and look forward to testing them once they're ready for review. |
convert_r_to_python_types <- function(df) { | ||
# Get R types | ||
r_types <- sapply(df, class) | ||
|
||
# Define type mapping | ||
type_mapping <- list( | ||
"numeric" = "float", | ||
"integer" = "int", | ||
"character" = "str", | ||
"factor" = "str", | ||
"logical" = "bool", | ||
"Date" = "datetime.date", | ||
"POSIXct" = "datetime.datetime", | ||
"POSIXlt" = "datetime.datetime" | ||
) | ||
|
||
# Convert types | ||
python_types <- sapply(r_types, function(x) { | ||
if (x %in% names(type_mapping)) { | ||
type_mapping[[x]] | ||
} else { | ||
"object" # default type | ||
} | ||
}) | ||
|
||
return(reticulate::r_to_py(as.list(python_types))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised this type_mapping
list is needed. Shouldn't type conversion be handled implicitly by reticulate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It did the floats fine but defaults to "object" for the strings. I'll check into it some more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the info. I haven't had a chance to test your changes yet, but I'm hoping there's still a way that we can piggyback off reticulate's type conversion somehow.
The r_to_py.data.frame() function should be able to handle most R data types. Maybe we can pass df
to this function, instead of using sapply(df, class)
, and then extract the python types from the resulting data frame?
It's possible that reticulate isn't recognizing that pandas is available, which could be causing r_to_py_impl()
to be called instead. However, I'm noticing that uv support was added to reticulate in October, and the renv.lock
shows the latest version, so this shouldn't be an issue.
I'll try taking a closer look in the next couple of days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh r_to_py.data.frame was the function I was missing after rifling through the reticulate docs. Yeah that might do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaning on reticulate
more produces the below dictionary:
reticulate::r_to_py(df)
{'Sepal.Length': dtype('float64'), 'Sepal.Width': dtype('float64'), 'Petal.Length': dtype('float64'), 'Petal.Width': dtype('float64'), 'Species': CategoricalDtype(categories=['setosa', 'versicolor', 'virginica'], ordered=False, categories_dtype=object)}
With a little more work, we can extract the name of each data type from its dtype()
object:
convert_r_to_python_types <- function(df) {
df_pandas <- reticulate::r_to_py(df)
dtypes_dict <- df_pandas$dtypes$to_dict()
for (col_name in names(dtypes_dict)) {
dtypes_dict[[col_name]] <- as.character(dtypes_dict[[col_name]]$name)
}
return(dtypes_dict)
}
{'Sepal.Length': 'float64', 'Sepal.Width': 'float64', 'Petal.Length': 'float64', 'Petal.Width': 'float64', 'Species': 'category'}
This will materialize successfully, however, reticulate's float64
is not recognized as a Python float
by Dagster. Instead, it is interpreted as a VARCHAR
:
I'm not really sure if there's an implicit way to rely on reticulate here. The hard-coded mapping dictionary may be the only way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think relying on reticluate and then manually overriding a few might make the most sense (like float64 vs. float).
I'll take the code here and add it to my PR with an override
3. **Install Python Dependencies** | ||
# you'll need a version of python installed | ||
Using uv | ||
# install uv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing an opening ``` here?
README.md
Outdated
|
||
4. **Set RETICULATE_PYTHON environment variable** | ||
Determine the path to the python binary associated with this project's poetry environment. | ||
```bash | ||
poetry run | ||
# from your viritual environment | ||
which python | ||
# /home/user/.cache/pypoetry/virtualenvs/dagster-and-r-kS5e8P_l-py3.10/bin/python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to update my .Renviron
to avoid a cryptic error message when trying to materialize the iris_r
asset:
#.Renviron
RETICULATE_PYTHON=.venv/bin/python
We should probably update the README to include this new path after migrating to uv
.
I was able to materialize these assets on Ubuntu 22.04 after pulling your changes! I should be able to close #1 once this merges! |
addresses #2, #3, #4, and #5.
Intro
To be clear: this PR is not intended to merge as-is. I was working pretty quickly to test out and take this for a spin myself. I opened individual issues, and would pare these changes down into bite-sized chunks depending on your interest in accepting each.
I tried to group the commits based on the type of changes that these would make