Make a unified data format (data class) for tracking & banding data #202
Dave and I talked about this a bunch. There's a related route object generated by the route() function that maybe should also be covered by this class. Also, my preference would be to stick with S3 classes unless there is a really compelling reason to use R6. |
Note also that the interval log likelihood function, interval_log_likelihood(), has observations and intervals as inputs, which I think correspond somewhat to the two types of objects you are thinking about. However, for that function they are just data frames, and the intervals reference, but don't contain, the observations. |
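To make that distinction concrete, here is a minimal sketch of the two-table layout: one row per observation, plus intervals that point at observations by ID instead of copying them, with a small helper that resolves the references when the full start and end information is needed. All column and function names here (obs_id, from, to, resolve_intervals) are hypothetical and are not the actual inputs of interval_log_likelihood().

```r
# Hypothetical observations table: one row per location fix or encounter
observations <- data.frame(
  obs_id = 1:4,
  route_id = c("A", "A", "A", "B"),
  date = as.Date(c("2022-03-01", "2022-03-15", "2022-04-02", "2022-05-10")),
  lon = c(-72.5, -73.1, -74.0, -80.2),
  lat = c(42.4, 41.8, 40.9, 35.6)
)

# Hypothetical intervals table: each interval references two observations by ID
intervals <- data.frame(
  interval_id = 1:2,
  from = c(1L, 2L),
  to = c(2L, 3L)
)

# Look up the referenced observations and attach their date and location
resolve_intervals <- function(intervals, observations) {
  cols <- c("date", "lon", "lat")
  start <- observations[match(intervals$from, observations$obs_id), cols]
  end <- observations[match(intervals$to, observations$obs_id), cols]
  names(start) <- paste0("start_", cols)
  names(end) <- paste0("end_", cols)
  out <- cbind(intervals, start, end)
  rownames(out) <- NULL
  out
}

resolved <- resolve_intervals(intervals, observations)
# Interval length in days, computed from the resolved dates
resolved$days <- as.numeric(resolved$end_date - resolved$start_date)
```

Keeping the references rather than copies avoids duplicating location data, at the cost of a lookup whenever an interval's endpoints are needed.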
Thanks for the suggestions. I will cover the route() function, and I will try to use S3 for consistency. I was thinking about R6 because it's the most similar to the OOP system that I know (and to how I think). And yes, I saw the track_info object. I think it would be beneficial to make it a class shared across BirdFlowR and BirdFlowPipeline. We only need to use the tracks in "BirdFlow grids", and all the preprocessing of banding and tracking data should happen when the object is initialized, so we can reduce the redundant preprocessing code distributed across multiple scripts. My idea now is to use only one class, not two. This class will store preprocessed tracks, and a function "process_to_pairs" will add paired data to it. That is a little redundant in terms of data storage, but this one class can be passed into all functions. |
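A minimal S3 sketch of the single-class idea, assuming the raw input is a data frame with route_id, date, lon, and lat columns: the constructor does the shared preprocessing once (check columns, drop incomplete rows, sort), and process_to_pairs() adds consecutive observations within each route as pairs on the same object. The class name track_collection, the column names, and the pairs component are hypothetical placeholders, not existing BirdFlowR code.

```r
# Hypothetical constructor: preprocessing happens once, at creation time
new_track_collection <- function(df) {
  required <- c("route_id", "date", "lon", "lat")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Keep only the required columns and rows with no missing values
  df <- df[stats::complete.cases(df[required]), required]
  df$date <- as.Date(df$date)
  df <- df[order(df$route_id, df$date), ]
  rownames(df) <- NULL
  structure(list(data = df, pairs = NULL), class = "track_collection")
}

# Hypothetical helper: store consecutive observations within each route as pairs
process_to_pairs <- function(x) {
  stopifnot(inherits(x, "track_collection"))
  d <- x$data
  n <- nrow(d)
  # Row k and row k + 1 form a pair only if they belong to the same route
  i <- if (n > 1) which(d$route_id[-1] == d$route_id[-n]) else integer(0)
  x$pairs <- data.frame(
    route_id = d$route_id[i],
    date1 = d$date[i], lon1 = d$lon[i], lat1 = d$lat[i],
    date2 = d$date[i + 1], lon2 = d$lon[i + 1], lat2 = d$lat[i + 1]
  )
  x
}

raw <- data.frame(
  route_id = c("b", "a", "a", "a", "b"),
  date = c("2022-05-01", "2022-03-15", "2022-03-01", NA, "2022-05-20"),
  lon = c(-80, -73.1, -72.5, -74, -81),
  lat = c(35, 41.8, 42.4, 40, 36)
)
tc <- process_to_pairs(new_track_collection(raw))
```

Storing pairs next to the cleaned data duplicates some information, as noted above, but keeps one object that every downstream function can accept.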
R6 handles object-oriented programming in a way that feels very foreign to R. Hadley Wickham's take on it in his introduction to a chapter on R6 is relevant here:
S3 does support inheritance, so you could still have a second class that inherits from the first, produced by your standalone …
I'm not sure what "BirdFlow grids" refers to in your comment above, but be aware that aligning tracks with a BirdFlow model is model-specific. Even if two models have the same extent, CRS, and cell size, the location indices won't be consistent because the models will have different masks. Thus location indices, row and column indices, the CRS (and projected x and y coordinates), and even timesteps (if it's not a full-year model) can all differ between models. Additionally, depending on the eBird year the model is trained on, the week of the year that a date aligns with may change. If we want to include any space or time information other than latitude, longitude, and a date-time, we should also be sure to include the spatial and possibly temporal references: the …
Before you implement the class, I think we should hammer out its structure and make sure it's flexible enough to handle banding, tracking, Motus, and synthetic tracks (from …).
For reference, the …
An example of what that looks like:
Note that each timestep has a row even if the bird didn't move. Also, a mistake I made is that the date is always the nominal date from the model, even though the route itself can cross into a second year, so the transition from December to January goes backwards in time:
I regret this and would like to fix it by incrementing the year in this situation. That would simplify the plotting code and make it work better with non-synthetic routes. The … |
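The year fix described above could look roughly like this, assuming one route's dates are already in timestep order and share a single nominal year: wherever a date is earlier than the one before it, the route has wrapped from December into January, so that date and every later one get one more year. The function name is hypothetical; for a data frame holding several routes it would be applied to each route separately (for example via split()).

```r
# Hypothetical helper: make nominal-year route dates increase across New Year
increment_wrapped_years <- function(dates) {
  dates <- as.Date(dates)
  # TRUE where a date is earlier than its predecessor (a Dec -> Jan wrap)
  wrapped <- c(FALSE, as.numeric(diff(dates)) < 0)
  # Each wrap adds one year to that date and to every date after it
  years_to_add <- cumsum(wrapped)
  lt <- as.POSIXlt(dates)
  lt$year <- lt$year + years_to_add
  # Leap-day edge cases (Feb 29 moved into a non-leap year) are not handled specially
  as.Date(lt)
}

route_dates <- as.Date(c("2022-12-14", "2022-12-21", "2022-12-28",
                         "2022-01-04", "2022-01-11"))
fixed_dates <- increment_wrapped_years(route_dates)
```

With monotonic dates, plotting code can treat route time as a simple increasing axis, and synthetic routes line up with real tracks that genuinely span two calendar years.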
Related existing data formats in BirdFlowR are the inputs to interval_log_likelihood() and …
A possible structure for this object is a list with components:
We could optionally include the …
I've written a lot for one comment, but I realize that we probably also want to allow an aligned version of the object that has been snapped to a BirdFlow model's cells and timesteps; in this case we'd definitely want to add the … |
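One way S3 inheritance could cover the aligned version: a base class built only from latitude, longitude, and date, and a subclass that adds the model-specific references, so aligned indices never travel without the model they refer to. The Track/BirdFlowTrack names echo the naming used later in this thread, but the fields (crs, model_id), the constructors, and the summary() methods are a hypothetical sketch, not BirdFlowR code.

```r
# Hypothetical base class: model-independent locations only
new_track <- function(df) {
  stopifnot(all(c("route_id", "date", "lon", "lat") %in% names(df)))
  structure(list(data = df), class = "Track")
}

# Hypothetical subclass: the same data plus model-specific references
new_birdflow_track <- function(track, crs, model_id) {
  stopifnot(inherits(track, "Track"))
  track$crs <- crs           # spatial reference for any projected x/y columns
  track$model_id <- model_id # which model's mask and timesteps indices refer to
  class(track) <- c("BirdFlowTrack", class(track))
  track
}

# S3 methods: the subclass method reuses the parent method via NextMethod()
summary.Track <- function(object, ...) {
  cat("Track with", nrow(object$data), "observations\n")
  invisible(object)
}

summary.BirdFlowTrack <- function(object, ...) {
  NextMethod()
  cat("Aligned to model ", object$model_id, "\n", sep = "")
  invisible(object)
}

df <- data.frame(route_id = "A", date = as.Date("2022-03-01"),
                 lon = -72.5, lat = 42.4)
# "EPSG:4326" and "example_model" are placeholder values for illustration
bt <- new_birdflow_track(new_track(df), crs = "EPSG:4326",
                         model_id = "example_model")
summary(bt)
```

Because BirdFlowTrack objects still inherit from Track, any function written for unaligned tracks keeps working on aligned ones.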
Issue #158 is related to this and will be resolved if we include aligned routes. |
Thanks for the clarification!
I understand, but will at least the CRS be consistent across models? @ethanplunkett |
No, the CRS isn't fixed. It's an input to preprocess_species(), which by default uses a different custom projection for each species. In our big runs we've used that argument to input a common projection that works well for the Americas. |
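Because the CRS and grid differ from model to model, snapping can only happen after coordinates are projected into a specific model's CRS. As a rough illustration of the snapping step itself, here is a minimal sketch that converts projected x and y coordinates to row and column indices on a north-up grid, assuming the grid is described by its top-left corner (xmin, ymax), cell size, and dimensions. The grid list and xy_to_rowcol() are hypothetical, not BirdFlowR's geometry object or API; a real alignment would still need the model's mask to turn rows and columns into location indices.

```r
# Hypothetical grid description in projected units (e.g. meters)
grid <- list(xmin = 0, ymax = 300000, res = 100000, nrow = 3, ncol = 4)

# Convert projected coordinates to 1-based row/column indices on that grid
xy_to_rowcol <- function(x, y, grid) {
  col <- floor((x - grid$xmin) / grid$res) + 1
  row <- floor((grid$ymax - y) / grid$res) + 1
  # Points outside the grid get NA rather than an out-of-range index
  outside <- col < 1 | col > grid$ncol | row < 1 | row > grid$nrow
  col[outside] <- NA
  row[outside] <- NA
  data.frame(row = row, col = col)
}

cells <- xy_to_rowcol(x = c(50000, 250000, 500000),
                      y = c(250000, 50000, 100000), grid)
```

The same longitude and latitude will land in different rows and columns under a different CRS or extent, which is why an aligned object has to carry its spatial reference.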
@ethanplunkett Hey Ethan, based on the needs and existing structures, I propose the following structure. Nothing is settled, and I'm happy to discuss.
Possible workflow:
The goal of these workflows is to make sure that every function in BirdFlowR and BirdFlowPipeline receives a suitable data class as input, so functions no longer need to check the data internally, only the class. For interval_log_likelihood(), I think using a … |
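In S3, "class check instead of data check" maps naturally onto the constructor/validator pattern described in the S3 chapter of Advanced R: all data checks run once when the object is built, and downstream functions only test inherits(). The class name BirdFlowIntervals comes from this thread; the validator, its required columns, and count_intervals() (a toy stand-in for a real consumer such as interval_log_likelihood()) are hypothetical.

```r
# Hypothetical validator: all data checks live here
validate_birdflow_intervals <- function(x) {
  df <- x$data
  required <- c("route_id", "date1", "date2", "lon1", "lat1", "lon2", "lat2")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  if (any(df$date2 <= df$date1, na.rm = TRUE)) {
    stop("Each interval must end after it starts.", call. = FALSE)
  }
  x
}

# Hypothetical constructor: build the object, then validate it once
new_birdflow_intervals <- function(df) {
  x <- structure(list(data = df), class = "BirdFlowIntervals")
  validate_birdflow_intervals(x)
}

# Hypothetical consumer: only checks the class and trusts the validator
count_intervals <- function(intervals) {
  if (!inherits(intervals, "BirdFlowIntervals")) {
    stop("`intervals` must be a BirdFlowIntervals object.", call. = FALSE)
  }
  nrow(intervals$data)
}

df <- data.frame(route_id = "A",
                 date1 = as.Date("2022-03-01"), date2 = as.Date("2022-03-15"),
                 lon1 = -72.5, lat1 = 42.4, lon2 = -73.1, lat2 = 41.8)
iv <- new_birdflow_intervals(df)
```

A plain data frame passed to count_intervals() is rejected by the class check alone, without re-running any column or date checks.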
Some initial reactions.
|
@ethanplunkett Thanks! I will incorporate your suggestions. For the Tracks/Routes naming, I'm happy to change it any time; let's discuss at the next meeting. I will look into the …
The updated plan:
Possible workflow:
Another related issue is how we should sample intervals from tracking data; we might need to reduce the temporal correlation between samples. |
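One simple way to reduce temporal correlation, offered only as an illustration and not as the scheme this thread settles on, is greedy thinning: walk each track in time order and keep a fix only if it is at least a minimum number of days after the last kept fix on the same route. The function name thin_track() and the min_gap_days parameter are hypothetical.

```r
# Hypothetical greedy thinning: keep a fix only if it is at least
# min_gap_days after the previously kept fix on the same route
thin_track <- function(df, min_gap_days = 7) {
  df <- df[order(df$route_id, df$date), ]
  keep <- logical(nrow(df))
  last_kept <- as.Date(NA)
  last_route <- NA
  for (i in seq_len(nrow(df))) {
    # The first fix of every route is always kept
    new_route <- is.na(last_route) || df$route_id[i] != last_route
    if (new_route || as.numeric(df$date[i] - last_kept) >= min_gap_days) {
      keep[i] <- TRUE
      last_kept <- df$date[i]
      last_route <- df$route_id[i]
    }
  }
  out <- df[keep, ]
  rownames(out) <- NULL
  out
}

fixes <- data.frame(
  route_id = c("A", "A", "A", "A", "B"),
  date = as.Date(c("2022-03-01", "2022-03-03", "2022-03-09",
                   "2022-03-12", "2022-03-02")),
  lon = c(-72.5, -72.6, -73.0, -73.2, -80.0),
  lat = c(42.4, 42.3, 41.9, 41.7, 35.0)
)
thinned <- thin_track(fixes, min_gap_days = 7)
```

Greedy thinning is deterministic and easy to explain, but it always favors the earliest fixes; random subsampling is an alternative if that bias matters.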
I think we are close, especially on the overall structure.
Naming
I have a bunch of naming thoughts, though. It's worth reading this: https://adv-r.hadley.nz/s3.html#s3, especially the section on classes. Note that this is the second edition of the book, which has expanded this chapter quite a bit, but search engines often seem to serve the first edition. Based on that, a few more name revisions:
I think we should also drop the "track_" prefix from the data frame columns, for succinctness and to keep with BirdFlowR's naming conventions.
Conversion to BirdFlowTracks (aligning with model)
The conversion from Track to BirdFlowTrack will likely wrap …
Note that BirdFlowPipeline::tracks_to_rts() is a function Dave wrote to convert tracking data to the existing "BirdFlowRoutes" class. It's redundant with …
Converting to BirdFlowIntervals
There are a ton of choices for how to sample intervals from tracks, especially if they are high-frequency GPS tracks. One big benefit of this reorganization is consolidating that code in one place. Some existing code:
I suspect this conversion will take more thought and will likely evolve as we encounter different types of data and use cases. Happily, I don't think we need to anticipate all the options now, as we can add to the function as needed without changing the structure of the output object. We might want to include some metadata about what parameters were used when picking the intervals. |
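Recording the sampling parameters can be as simple as storing them in the output object next to the intervals. A minimal sketch, assuming intervals are drawn at random from all pairs of fixes within a route whose duration falls in a chosen range; sample_intervals(), its arguments (min_days, max_days, n_per_route, seed), and the params component are hypothetical placeholders for whatever the conversion function ends up using.

```r
# Hypothetical sampler: draw up to n_per_route intervals per route from all
# pairs of fixes lasting between min_days and max_days, and keep the
# parameters (including the seed) with the result
sample_intervals <- function(df, min_days = 7, max_days = 60,
                             n_per_route = 2, seed = 1) {
  set.seed(seed)  # note: this resets the global random number generator state
  pieces <- lapply(split(df, df$route_id), function(r) {
    r <- r[order(r$date), ]
    n <- nrow(r)
    # Every (start, end) index pair where start comes before end in time order
    idx <- which(upper.tri(matrix(FALSE, n, n)), arr.ind = TRUE)
    start <- idx[, "row"]
    end <- idx[, "col"]
    days <- as.numeric(r$date[end] - r$date[start])
    ok <- which(days >= min_days & days <= max_days)
    # sample.int() avoids sample()'s surprising behavior on length-one input
    chosen <- ok[sample.int(length(ok), min(n_per_route, length(ok)))]
    s <- start[chosen]
    e <- end[chosen]
    data.frame(
      route_id = r$route_id[s],
      date1 = r$date[s], lon1 = r$lon[s], lat1 = r$lat[s],
      date2 = r$date[e], lon2 = r$lon[e], lat2 = r$lat[e],
      days = days[chosen]
    )
  })
  intervals <- do.call(rbind, pieces)
  rownames(intervals) <- NULL
  structure(
    list(
      data = intervals,
      params = list(min_days = min_days, max_days = max_days,
                    n_per_route = n_per_route, seed = seed)
    ),
    class = "BirdFlowIntervals"
  )
}

fixes <- data.frame(
  route_id = c("A", "A", "A", "A", "B"),
  date = as.Date(c("2022-03-01", "2022-03-03", "2022-03-09",
                   "2022-03-12", "2022-03-02")),
  lon = c(-72.5, -72.6, -73.0, -73.2, -80.0),
  lat = c(42.4, 42.3, 41.9, 41.7, 35.0)
)
iv <- sample_intervals(fixes, min_days = 7, max_days = 60,
                       n_per_route = 2, seed = 42)
iv$params
```

Because the parameters and seed travel with the object, a given set of intervals can be regenerated or compared against intervals drawn with different settings, and new options can be added later without changing the shape of the output.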
Great discussion -- thanks Yangkang and Ethan. |
@ethanplunkett Thanks for pointing all of these out. Super helpful.
Naming
Conversion to BirdFlowTracks
Converting to BirdFlowIntervals
|
@chenyangkang this is great! One, maybe last, thought is that we should keep …
Two possible approaches:
Your call on which you prefer. |
Considering using R6 to make a data class.
Two data classes:
- TrackCollection stores each track (whether tracking data or a capture-recovery pair) as an object (say, Track); it can be used for general storage or plotting.
- PairedTrackCollection inherits from the first one, but has a process function that is called on initialization, which processes the data into pairs (say, PairedTrack); this object can be used for evaluation.
Implementing it in BirdFlowR instead of BirdFlowPipeline can help end users test the models with custom data.