Use `date` column to calculate `stay_len` and `stay_id` #207

chenyangkang · 2025-01-19T22:49:35Z

The current add_stay_id function uses the number of rows as the stay length. This is valid only if the timesteps are sampled at a constant frequency.

BirdFlowR/R/route.R

Lines 176 to 184 in cef92ce

    
    add_stay_id <- function(df) {
      # Benjamin's function
      df |>
        dplyr::mutate(stay_id = cumsum(c(1, as.numeric(diff(.data$i)) != 0)),
                      stay_len = rep(rle(.data$stay_id)$lengths,
                                     times = rle(.data$stay_id)$lengths))
    }
    points <- points |> dplyr::group_by(.data$route_id) |> add_stay_id()

However, custom data (e.g., tracking, Motus, banding) seldom contains equally spaced timepoints. I therefore propose computing stay_len from the actual time differences in the date column, with a default unit of days.
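To illustrate the difference, here is a minimal base-R sketch on a made-up track. The data frame `track` and the names `len_rows` and `len_days` are hypothetical and only serve the example; this is not the package code.

```r
# Hypothetical track for one route: `i` is the location index,
# `date` the observation date (irregularly sampled).
track <- data.frame(
  i = c(10, 10, 10, 25, 25),
  date = as.Date(c("2024-03-01", "2024-03-02", "2024-03-09",
                   "2024-03-10", "2024-03-20"))
)

# A new stay starts whenever the location index changes.
stay_id <- cumsum(c(TRUE, diff(track$i) != 0))

# Row-count stay length: number of rows in each stay.
len_rows <- ave(seq_along(stay_id), stay_id, FUN = length)

# Date-based stay length in days, counting both the first and last date.
len_days <- ave(as.numeric(track$date), stay_id,
                FUN = function(d) max(d) - min(d) + 1)

print(data.frame(track, stay_id, len_rows, len_days))
```

With this input the first stay has 3 rows but spans 9 days, and the second has 2 rows but spans 11 days, so the two definitions only agree when sampling is regular.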

add_stay_id, or a similar transformation, will be applied by default for BirdFlowRoutes.

I removed add_stay_id from the route function:

BirdFlowR/R/route.R

Line 187 in 83eaf15

# add_stay_id <- function(df) {

And I added the function add_stay_id_with_varied_intervals:

BirdFlowR/R/RouteDataFunction.R

Lines 395 to 415 in 83eaf15

    
    add_stay_id_with_varied_intervals <- function(df, timestep_col = "date", timediff_unit = "days", time_threshold = Inf) {
      # Ensure the data is sorted by timestep
      df <- df |> dplyr::arrange(.data[[timestep_col]])
      new_df <- df |>
        dplyr::mutate(
          timestep_diff = c(1, as.numeric(diff(.data[[timestep_col]]), units = timediff_unit)),  # Time differences
          i_change = c(1, as.numeric(diff(.data$i)) != 0),    # Changes in 'i'
          stay_id = cumsum(i_change | (timestep_diff > time_threshold))
        ) |>
        # Now the stay_id is assigned, calculate the duration (time difference) of each stay
        dplyr::group_by(route_id, stay_id) |>
        dplyr::mutate(
          stay_len = as.numeric(max(.data[[timestep_col]]) - min(.data[[timestep_col]]) + 1, units = timediff_unit)
        ) |>
        dplyr::select(-timestep_diff, -i_change)
      return(new_df)
    }
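As a rough illustration of the `time_threshold` argument, the sketch below mirrors the stay_id rule in base R for a single route. `stay_id_sketch` is a hypothetical helper written for this example, not a BirdFlowR function, and it uses 0 rather than 1 as the first gap, which does not change the result because the first row always starts a stay.

```r
# Hypothetical base-R version of the stay_id rule for one route.
# A new stay starts when the location index `i` changes, or when the
# gap to the previous observation exceeds `time_threshold` (in days).
stay_id_sketch <- function(i, date, time_threshold = Inf) {
  gap_days <- c(0, as.numeric(diff(date), units = "days"))
  i_change <- c(TRUE, diff(i) != 0)
  cumsum(i_change | gap_days > time_threshold)
}

i <- c(5, 5, 5, 8)
date <- as.Date(c("2024-05-01", "2024-05-02", "2024-06-15", "2024-06-16"))

# Default threshold (Inf): only a change in `i` starts a new stay.
ids_default <- stay_id_sketch(i, date)

# With a 30-day threshold, the 44-day gap splits the first stay in two.
ids_split <- stay_id_sketch(i, date, time_threshold = 30)

print(ids_default)
print(ids_split)
```

With the default `time_threshold = Inf` the three observations at location 5 form one stay; with a 30-day threshold the observation after the long gap is treated as a separate stay at the same location.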

This function is applied when a new BirdFlowRoutes object is created:

BirdFlowR/R/RouteDataClass.R

Lines 198 to 209 in 83eaf15

    
    ## Add stay id
    birdflow_route_df <- birdflow_route_df |>
      sort_by_id_and_dates() |>
      dplyr::group_by(.data$route_id) |>
      add_stay_id_with_varied_intervals(timestep_col = timestep_col, timediff_unit = timediff_unit) |>
      # Here, using add_stay_id_with_varied_intervals, rather than add_stay_id.
      # It takes 'timestep' as input so account for varying intervals,
      # if the data is not sampled in a frequency.
      dplyr::ungroup() |>
      as.data.frame() |>
      preserve_s3_attributes(original = birdflow_route_df)

Also, the synthetic routes generated by the route function will no longer have circular dates, so dates continue across the year boundary while the timestep wraps back to 1. We should therefore calculate stays from the date rather than from the timestep.
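A small sketch of why the wrap matters, assuming a weekly timestep index that restarts at 1 after 52. The dates and timestep values below are made up for illustration and do not come from a real model.

```r
# Hypothetical weekly route crossing the year boundary.
date <- seq(as.Date("2024-12-16"), by = "week", length.out = 4)
timestep <- c(51, 52, 1, 2)  # assumed index that wraps after 52

# The wrap appears as a large negative step in the timestep ...
step_timestep <- diff(timestep)

# ... while the dates keep increasing by 7 days.
step_days <- as.numeric(diff(date), units = "days")

# If the bird stays at one location for all four weeks, a span computed
# from the timestep is wrong, while the date-based span is as expected.
span_timestep <- max(timestep) - min(timestep) + 1
span_days <- as.numeric(max(date) - min(date), units = "days") + 1

print(step_timestep)
print(step_days)
print(c(span_timestep = span_timestep, span_days = span_days))
```

Here the timestep-based span comes out as 52 steps, while the date-based span is 22 days.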

This change will be included in the next merge if nobody objects.


ethanplunkett · 2025-01-20T01:14:03Z

Make sure you do some testing with plot_routes(). It will likely need some updating for the new format. I've wanted to drop the circular dates, along with a bunch of hackish stuff I did to deal with them, so this is an overdue change. Let me know if you want me to make the changes in the plotting.
