Implement the RoutesDataClass as a generic data format #204
…necessary. If bf not provided, use birdflow_crs in the environment. This will make data loading easier without loading the model.
…timestep 1 can cause inconsistency. So fix the random state.
… acquiring from BirdFlow models
Seems to be a floating-point precision error induced by
…but not for date; the route generated by `route` function should be sorted
…e duplicated but date should not.
@ethanplunkett Hi Ethan, do you want to review the current changes before I go on to edit other functions? In particular, I still implemented the transformation function
@chenyangkang yes. I want to review before it's merged. I have come down with something nasty, though, so I am unlikely to get to it for a few days. I'm currently running a 102 fever. Overall it's great to see you getting so much done! I do have a couple of thoughts based on what the code looked like Saturday morning when I last looked it over. Disregard anything that you've already addressed.
@ethanplunkett Sorry to hear that! And thanks for the quick feedback. I will check the first two points. Regarding the Hope you get better soon and we can discuss later!
Yes. It’s a good idea to make plot routes a method of the generic ‘plot’ now that we’ve formalized the class structure.
@chenyangkang I left lots of specific comments suggesting changes but overall this is great!
Some overarching comments:
- Stylistically I prefer more succinct names. I don't think that matters enough to change it for the private functions and internal code, but it is worth shortening the public facing function and argument names.
- I don't remember deciding whether we wanted to base these classes on data frames or lists. I do remember saying data frames might be cool. I like that they can then be treated like data frames but as you discovered manipulating them can lose the attributes, a problem people are less likely to bump up against with a list based object. Let's talk about this in person; I don't have a strong feeling either way but want to make sure we are making an informed decision.
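To make the trade-off concrete before we talk, here is a minimal sketch in base R. The object layouts, the `species` field, and the value `"example_species"` are assumptions for illustration only, not the classes in this PR.

```r
# Data-frame-based object: metadata stored as an attribute (hypothetical layout).
routes_df <- data.frame(route_id = c(1, 1, 2), x = c(0, 1, 5))
attr(routes_df, "species") <- "example_species"
class(routes_df) <- c("Routes", "data.frame")

# A routine base-R operation such as merge() builds a fresh data frame,
# so the custom attribute is not carried over to the result.
info <- data.frame(route_id = c(1, 2), route_type = c("tracking", "banding"))
merged <- merge(routes_df, info, by = "route_id")
df_species_kept <- !is.null(attr(merged, "species"))

# List-based object: the data frame is one element, metadata sit beside it.
routes_list <- structure(
  list(
    data = data.frame(route_id = c(1, 1, 2), x = c(0, 1, 5)),
    species = "example_species"
  ),
  class = "Routes"
)

# Replacing the data element leaves the sibling metadata untouched.
routes_list$data <- merge(routes_list$data, info, by = "route_id")
list_species_kept <- !is.null(routes_list$species)
```

The list version keeps its metadata at the cost of users having to reach into `$data` before treating the object as a data frame.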
NAMESPACE
export(sparsify)
export(species)
export(species_info)
export(truncate_birdflow)
export(validate_BirdFlow)
export(validate_BirdFlowIntervals_birdflow_intervals)
I think this should just be `validate_BirdFlowIntervals`, and the following two exports should be `validate_Routes` and `validate_BirdFlowRoutes`.
Oh... I see there's also a `validate_BirdFLowRoutes_geom` etc. I still think the public validation functions shouldn't have the almost repeated text (`_birdflow_route_df`, `route_df`).
R/RouteDataClass.R
#' @rdname RouteDataClass
#' @export
Routes <- function(route_df, species = NULL, metadata = NULL, source = NULL) {
Maybe change `route_df` to `data` or `route_data` or just `df`.
R/RouteDataFunction.R
routes |>
dplyr::group_by(route_type) |>
dplyr::summarize(
unique_route_count = dplyr::n_distinct(route_id),
To avoid the notes about undefined variables you need to use `.data$route_id` here.
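A minimal sketch of the `.data` pronoun, assuming a toy data frame with the same column names as the snippet under review. The wrapper name `summarize_routes()` is hypothetical. Inside a package you would also add `@importFrom rlang .data` so the pronoun itself is a known symbol.

```r
library(dplyr)

# Toy data for illustration only; not taken from the package.
routes <- data.frame(
  route_id   = c(1, 1, 2, 3, 3),
  route_type = c("tracking", "tracking", "banding", "banding", "banding")
)

# Referring to columns through .data makes it explicit that they live in the
# data frame, which is what silences the "no visible binding" notes in
# R CMD check.
summarize_routes <- function(routes) {
  routes |>
    dplyr::group_by(.data$route_type) |>
    dplyr::summarize(
      unique_route_count = dplyr::n_distinct(.data$route_id)
    )
}

route_summary <- summarize_routes(routes)
```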
R/index_conversions.R
@@ -60,7 +60,7 @@
#' space index corresponding to x and y coordinates or row and column indices.}
#' }
#'
#' \item{`latlon_to_xy(lat, lon, bf)`}{Returns a two column matrix of the x
This was right as it was: `bf` is an argument to `latlon_to_xy()`.
@@ -33,7 +33,7 @@ if (FALSE) {
#' @param radius A point is considered between two locations if it is within
#' `radius` meters (along a great circle) of the great circle line between the
#' locations. `radius` defaults to half the cell size (`mean(res(bf))/2`).
#' @param n_direction The number of (equally spaced) directional bins to
Thanks for fixing this!
R/route.R
points <- points |>
dplyr::group_by(route_id) |>
dplyr::mutate(
date = as.Date(unlist(purrr::accumulate(date, ~ ifelse(.y < .x, .y + lubridate::years(1), .y))))
Currently synthetic routes are allowed to be backwards in time. This assumes forward. We should either change `route()` to exclude backwards routes or clean things up here. `purrr` is iterating through each pair of adjacent rows, so it is likely to be slow. It might make sense to fix this in `format_trajectory()`.
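One possible vectorized alternative to the row-by-row iteration, as a sketch only. The helper name `roll_dates_forward()` is hypothetical, and it still assumes forward-in-time routes, so it addresses the speed concern but not the backwards-route question.

```r
# Shift dates forward by whole years wherever the sequence steps backwards.
# `date` is a Date vector for ONE route, already in route order.
roll_dates_forward <- function(date) {
  # Number of year boundaries crossed before each row: every negative step
  # between consecutive dates counts as one wrap into the next year.
  wraps <- c(0L, cumsum(diff(as.integer(date)) < 0))
  lt <- as.POSIXlt(date)
  lt$year <- lt$year + wraps
  as.Date(lt)
}

# Hypothetical route that starts in autumn and continues past New Year.
dates <- as.Date(c("2022-11-01", "2022-12-15", "2022-01-10", "2022-02-20"))
rolled <- roll_dates_forward(dates)
```

It could be applied per route inside the existing `dplyr::group_by(route_id) |> dplyr::mutate(date = ...)` call, which would also remove the need for `purrr` and `lubridate` in this step.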
R/validate_RouteDataClass.R
# check the features required by direct initiation of BirdFlowIntervals class
for (name in get_target_columns_BirdFlowIntervals(type='input')){
if (!name %in% colnames(birdflow_interval_df)){
stop(sprintf(glue::glue("'{name}' is not found in the input dataframe.")))
Drop dependency on glue and use `paste()` or `sprintf()`.
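For example, the same message can be built with `sprintf()` alone. This is a sketch; the helper name `check_required_columns()` and its arguments are hypothetical stand-ins for the loop in the validator.

```r
# Stop with an informative message if any required column is missing.
check_required_columns <- function(df, required) {
  for (name in required) {
    if (!name %in% colnames(df)) {
      # sprintf() fills in the column name, so glue is not needed.
      stop(sprintf("'%s' is not found in the input dataframe.", name))
    }
  }
  invisible(TRUE)
}

# Capture the error message for a data frame that lacks a `date` column.
msg <- tryCatch(
  check_required_columns(data.frame(route_id = 1), c("route_id", "date")),
  error = function(e) conditionMessage(e)
)
```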
#' @rdname target_columns
#' @export
get_target_columns_Routes <- function(type='input'){
These names feel clunky. I'd be tempted not to export them as I don't think they are for end users, but if we do want to export them I think they should be shorter. `get_routes_cols()`, `get_birdflowroutes_cols()`? There's probably something better.
@@ -13,7 +13,22 @@ test_that("preprocess_species runs on test dataset", {
expect_no_error(a <- preprocess_species("example_data", hdf5 = FALSE))
expect_no_error(validate_BirdFlow(a, allow_incomplete = TRUE))
expect_error(validate_BirdFlow(a))
expect_true(all((ext(a)[, ] %% xres(a)) < 1e-9)) # Test if origin is at 0, 0
This test is making sure the cells align with the origin and I think is important to keep. It has nothing to do with your code.
I have two thoughts:
- I could add code to preprocess species to check if the extent and cell size numbers are close to integers and round them if they are. It wouldn't help when the cell size is floating point (say some sort of spherical coordinates) but would help with all practical applications we've had to date.
- We could lower the tolerance on the check to allow the test to pass as is. I can't see the failed test output anymore but if I recall the difference was very small but not quite as small as this test requires.
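The first idea could look roughly like the sketch below. The helper name `snap_to_integer()` and the default tolerance are assumptions for illustration, not existing package code.

```r
# Round values that are within `tol` of an integer; leave the rest untouched.
snap_to_integer <- function(x, tol = 1e-6) {
  near <- abs(x - round(x)) < tol
  x[near] <- round(x[near])
  x
}

# Hypothetical extent values: the first carries floating-point noise,
# the second is a genuine non-integer and is left as it is.
snapped <- snap_to_integer(c(1000.0000000001, 26665.5))
```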
I think I will leave the current modification as it is, to suppress the error. I'm not deleting the test, just using different logic to validate.
Sorry I missed the added lines!
Yours is definitely better.
Your test passes if you are within the tolerance either above or below the exact number, whereas mine only passes if you are above but within the tolerance.
What's weird is when it was failing before it was for a value that was slightly too big given the tolerance so, I think, should have failed either test. Definitely leave yours though, it's both theoretically and empirically working better.
Thanks!
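To spell out the one-sided versus two-sided difference, here is a small sketch. The numbers and both helper functions are hypothetical and are not the actual test code from either version.

```r
res <- 1000                 # hypothetical cell size
tol <- 1e-9
below <- 30000 - 1e-10      # just under an exact multiple of res
above <- 30000 + 1e-10      # just over an exact multiple of res

# One-sided: the remainder must be tiny, so a value slightly BELOW a
# multiple fails because its remainder is close to `res`, not to 0.
one_sided <- function(x, res, tol) (x %% res) < tol

# Two-sided: accept a remainder close to 0 or close to `res`.
two_sided <- function(x, res, tol) {
  r <- x %% res
  pmin(r, res - r) < tol
}

results <- c(
  one_below = one_sided(below, res, tol),
  two_below = two_sided(below, res, tol),
  one_above = one_sided(above, res, tol),
  two_above = two_sided(above, res, tol)
)
```

Only the one-sided check rejects the value that sits just below the grid line.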
(About the failed test due to slight discrepancies in cell alignment, that is unrelated to this PR.)
RE: |
Oh I see, the `elapsed_stay` is calculated by `row_number` in the `plot_routes` function. I will change it. It should represent 'days' now.
…d_stay_id_with_varied_intervals; remove `glue` package; update some namings
…rrr dependency; fix notes
Looks like you are close! This is great. I have one more comment. Can you add to the top of the news the way this breaks existing code. Something along the lines of: Breaking Changes:
@ethanplunkett Done! Could you please help review that next week at your convenience? Ready to be merged. Still need to do several things:
These will go to other PRs. I think the current changes are all for this PR.
- `Routes`, `BirdFlowRoutes`, `BirdFlowIntervals`
- Make a unified data format (data class) for tracking & banding data #202
- `print`
- `plot` or `plot_routes`?
- `route` in `route.R`
- `format_trajectory` in `route.R`
- `calc_predictive_distance_metric.R`
- `interval_log_likelihood.R`
- `make_tracks` in `function.R` of `BirdFlowPipeline` package.
- `preprocess_with_recovery`, `preprocess_sort_band_date`, `preprocess_calc_distance_days` in `function.R` of `BirdFlowPipeline` package.
- `tracking_data.R` in `BirdFlowPipeline` package.
- `amwo_track_info_new.R` in `BirdFlowPipeline`
Minor changes: