Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show #6

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
Draft

Show #6

wants to merge 9 commits into from

Conversation

cnolanminich
Copy link

addresses #2, #3, #4, and #5.

Intro

To be clear: this PR is not intended to merge as-is. I was working pretty quickly to test out and take this for a spin myself. I opened individual issues, and would pare these changes down into bite-sized chunks depending on your interest in accepting each.

I tried to group the commits based on the type of changes that these would make

@philiporlando philiporlando added the enhancement New feature or request label Dec 13, 2024
@philiporlando
Copy link
Owner

Thanks for breathing some new life into this project! I'm in favor of the changes you're proposing and look forward to testing them once they're ready for review.

Comment on lines +12 to +38
convert_r_to_python_types <- function(df) {
# Get R types
r_types <- sapply(df, class)

# Define type mapping
type_mapping <- list(
"numeric" = "float",
"integer" = "int",
"character" = "str",
"factor" = "str",
"logical" = "bool",
"Date" = "datetime.date",
"POSIXct" = "datetime.datetime",
"POSIXlt" = "datetime.datetime"
)

# Convert types
python_types <- sapply(r_types, function(x) {
if (x %in% names(type_mapping)) {
type_mapping[[x]]
} else {
"object" # default type
}
})

return(reticulate::r_to_py(as.list(python_types)))
}
Copy link
Owner

@philiporlando philiporlando Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this type_mapping list is needed. Shouldn't type conversion be handled implicitly by reticulate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did the floats fine but defaults to "object" for the strings. I'll check into it some more

Copy link
Owner

@philiporlando philiporlando Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info. I haven't had a chance to test your changes yet, but I'm hoping there's still a way that we can piggyback off reticulate's type conversion somehow.

The r_to_py.data.frame() function should be able to handle most R data types. Maybe we can pass df to this function, instead of using sapply(df, class), and then extract the python types from the resulting data frame?

It's possible that reticulate isn't recognizing that pandas is available, which could be causing r_to_py_impl() to be called instead. However, I'm noticing that uv support was added to reticulate in October, and the renv.lock shows the latest version, so this shouldn't be an issue.

I'll try taking a closer look in the next couple of days.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh r_to_py.data.frame was the function I was missing after rifling through the reticulate docs. Yeah that might do it

Copy link
Owner

@philiporlando philiporlando Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaning on reticulate more produces the below dictionary:

reticulate::r_to_py(df)
{'Sepal.Length': dtype('float64'), 'Sepal.Width': dtype('float64'), 'Petal.Length': dtype('float64'), 'Petal.Width': dtype('float64'), 'Species': CategoricalDtype(categories=['setosa', 'versicolor', 'virginica'], ordered=False, categories_dtype=object)}

With a little more work, we can extract the name of each data type from its dtype() object:

convert_r_to_python_types <- function(df) {
    df_pandas <- reticulate::r_to_py(df)
    dtypes_dict <- df_pandas$dtypes$to_dict()
    for (col_name in names(dtypes_dict)) {
        dtypes_dict[[col_name]] <- as.character(dtypes_dict[[col_name]]$name)
    }
    return(dtypes_dict)
}

{'Sepal.Length': 'float64', 'Sepal.Width': 'float64', 'Petal.Length': 'float64', 'Petal.Width': 'float64', 'Species': 'category'}

This will materialize successfully, however, reticulate's float64 is not recognized as a Python float by Dagster. Instead, it is interpreted as a VARCHAR:

image

I'm not really sure if there's an implicit way to rely on reticulate here. The hard-coded mapping dictionary may be the only way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think relying on reticluate and then manually overriding a few might make the most sense (like float64 vs. float).

I'll take the code here and add it to my PR with an override

3. **Install Python Dependencies**
# you'll need a version of python installed
Using uv
# install uv
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing an opening ``` here?

README.md Outdated

4. **Set RETICULATE_PYTHON environment variable**
Determine the path to the python binary associated with this project's poetry environment.
```bash
poetry run
# from your viritual environment
which python
# /home/user/.cache/pypoetry/virtualenvs/dagster-and-r-kS5e8P_l-py3.10/bin/python
Copy link
Owner

@philiporlando philiporlando Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to update my .Renviron to avoid a cryptic error message when trying to materialize the iris_r asset:

#.Renviron
RETICULATE_PYTHON=.venv/bin/python

We should probably update the README to include this new path after migrating to uv.

@philiporlando
Copy link
Owner

philiporlando commented Dec 17, 2024

I was able to materialize these assets on Ubuntu 22.04 after pulling your changes! I should be able to close #1 once this merges!

image

@philiporlando philiporlando linked an issue Dec 18, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants