Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the exposure func to handle filtering modelled datetimes and combining filters to customise exposure analysis #50

Closed
wants to merge 4 commits into from

Conversation

erialC-P
Copy link
Collaborator

@erialC-P erialC-P commented Jan 23, 2024

This PR:

  • adds extensive functionality to the exposure script.
  • The Intertidal_workflow.ipynb is included and demo's calculation of the full timeseries for exposure, as previously.

This PR does not:

  • update calls to the exposure function from other notebooks, namely the CLI_notebook.

Notes on the structure of the new exposure function:

  • Filter inputs include:
    • temporal filters such as day, night, month, season
    • spatial filters including hightides, lowtides, neap and spring high/low tides
    • combinations of temporal and spatial filters such as wet season lowtides.
  • Known issues include consistency in the capitalisation of filter names and the length of the script.

Future work will address the known issues and add a notebook dedicated to showcasing exposure custom options

@erialC-P erialC-P changed the title Update the exposure func to handle filtering the datetimes and combining filters to custom exposure analysis Update the exposure func to handle filtering modelled datetimes and combining filters to customise exposure analysis Jan 23, 2024
@erialC-P erialC-P assigned erialC-P and unassigned erialC-P Jan 23, 2024
@erialC-P erialC-P requested review from robbibt and vnewey January 23, 2024 22:56
Copy link
Member

@robbibt robbibt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Claire, sorry for the super long delay in reviewing this! This new functionality is awesome - the range of different filtering methods is fantastic, and will let our users really narrow in on and answer some very interesting ecological questions relating to specific life history traits/environmental processes.

I think there's a few things we need to change before merging it into main to improve performance and make the code easier to maintain in the future. There's three key changes I'd recommend:

Reduce duplication by packing up analysis into discrete functions that re-use code wherever possible

The exposure function has grown pretty long, and includes quite a few sections of code that are repeated with only small differences each time (e.g. significant overlaps between the neap low/neap high, the spring low/spring high, and the low tide/high tide calculations). My feeling is that we should aim to keep exposure as minimal as possible by packing all those discrete pieces up as re-usable functions instead. That way we can reduce duplication/reuse logic, make it much more difficult for errors and inconsistencies to creep in, and make it easier to read and maintain. These look like priorities for more discrete functions:

  • A combined temporal_filter function handling all the basic temporal filtering options, which itself calls another lower-level "daynight_filter" function that performs the day/night filtering
  • A combined spring_tides function that supports both "low" and "high" functionality
  • A combined neap_tides function that supports both "low" and "high" functionality
  • A combined lowhigh_tides function that supports both "low" and "high" tide functionality

Make sure expensive analyses like tide modelling are only performed once in the workflow

Some functionality like the tide modelling is really expensive, but is now run multiple times throughout the workflow (I suspect this might be the cause of some of the Sandbox issues you've been running into). We'll need to ensure that the tide modelling code is only run once in the entire Exposure workflow, likely by moving it up higher and re-using it in other functions below.

Consider shifting spatial filters to run on the low-res tide modelling data instead of the high-res data

I suspect the spatial filter functions are currently slow mainly as they seem to be applied to every high-res pixel. This will scale exponentially as our study area increases, and likely won't be practical for anything larger than a small section of a beach. A more efficient approach would be to run these on the low-res output of the tide modelling functions instead. Combining this with the suggestion above, we could do something like:

  1. Calling the tide modelling functions with resample=False and calculate_quantiles=None once at the very top of the workflow, returning a low-res model output
  2. Apply spatial filters to the low-res tide modelling output
  3. Calculate exposure quantiles and resample back to high-resolution, borrowing code from inside pixel_tides​ here

(That said: one thing to consider could be whether these filters actually need to be run at a pixel level at all. If we switched to simply running them on a 1D timeseries instead (e.g. using average tide height per timestep), we could probably greatly simplify a bunch of code: e.g. all the temporal + spatial filters could simply return equivalent lists of filtered timesteps, rather than the mix of different outputs we currently have.)


Given our updated Notebooks plan, some of this work can probably be pushed back until after the April release. A short-term starting point could be to raise a second smaller PR to implement the breaking API changes in exposure (e.g. time_range > start_date/ end_date/freq etc) in the existing version of the function, so that we can start using that syntax now before the larger changes above?

Comment on lines +287 to +322
## Note: day and night times will be calculated once per area-of-interest(ds)
## for the median latitude and longitude position of the aoi
tidepost_lat_3577 = dem.x.median(dim='x').values
tidepost_lon_3577 = dem.y.median(dim='y').values

## Set the Datacube native crs (GDA/Aus Albers (meters))
crs_3577 = CRS.from_epsg(3577)

## Translate the crs of the tidepost to determine (1) local timezone
## and (2) the local sunrise and sunset times:

## (1) Create a transform to convert default epsg3577 coords to epsg4283 to compare
## against state/territory boundary polygons and assign a timezone

## GDA94 CRS (degrees)
crs_4283 = CRS.from_epsg(4283)
## Transfer coords from/to
transformer_4283 = Transformer.from_crs(crs_3577, crs_4283)
## Translate tidepost coords
tidepost_lat_4283, tidepost_lon_4283 = transformer_4283.transform(tidepost_lat_3577,
tidepost_lon_3577)
## Coordinate point to test for timezone
point_4283 = Point(tidepost_lon_4283, tidepost_lat_4283)

## (2) Create a transform to convert default epsg3577 coords to epsg4326 for use in
## sunise/sunset library

## World WGS84 (degrees)
crs_4326 = CRS.from_epsg(4326)
## Transfer coords from/to
transformer_4326 = Transformer.from_crs(crs_3577, crs_4326)
## Translate the tidepost coords
tidepost_lat_4326, tidepost_lon_4326 = transformer_4326.transform(tidepost_lat_3577,
tidepost_lon_3577)
## Coordinate point to locate the sunriset calculation
point_4326 = Point(tidepost_lon_4326, tidepost_lat_4326)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block of coordinate conversion code above can probably be simplified by getting the coord directly from the dataset's GeoBox and converting it to EPSG:4326 there:

tidepost_lon_4326, tidepost_lat_4326 = dem.odc.geobox.extent.centroid.to_crs("EPSG:4326").coords[0]

Comment on lines +259 to +285
timezones = {'wa':'../../gdata1/data/boundaries/GEODATA_COAST_100K/western_australia/cstwacd_r.shp',
'nt':'../../gdata1/data/boundaries/GEODATA_COAST_100K/northern_territory/cstntcd_r.shp',
'sa':'../../gdata1/data/boundaries/GEODATA_COAST_100K/south_australia/cstsacd_r.shp',
'qld':'../../gdata1/data/boundaries/GEODATA_COAST_100K/queensland/cstqldmd_r.shp',
'nsw':'../../gdata1/data/boundaries/GEODATA_COAST_100K/new_south_wales/cstnswcd_r.shp',
'vic':'../../gdata1/data/boundaries/GEODATA_COAST_100K/victoria/cstviccd_r.shp',
'tas':'../../gdata1/data/boundaries/GEODATA_COAST_100K/tasmania/csttascd_r.shp'
}

## Determine the timezone of the pixels
## Bring in the state polygons (note: native crs = epsg:4283)
wa = gpd.read_file(timezones['wa'])
nt = gpd.read_file(timezones['nt'])
sa = gpd.read_file(timezones['sa'])
qld = gpd.read_file(timezones['qld'])
nsw = gpd.read_file(timezones['nsw'])
vic = gpd.read_file(timezones['vic'])
tas = gpd.read_file(timezones['tas'])

# Merge the polygons to create single state/territory boundaries
wa = gpd.GeoSeries(unary_union(wa.geometry))
nt = gpd.GeoSeries(unary_union(nt.geometry))
sa = gpd.GeoSeries(unary_union(sa.geometry))
qld = gpd.GeoSeries(unary_union(qld.geometry))
nsw = gpd.GeoSeries(unary_union(nsw.geometry))
vic = gpd.GeoSeries(unary_union(vic.geometry))
tas = gpd.GeoSeries(unary_union(tas.geometry))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to do local timezone conversion here - sunset and sunrise should be able to be calculated directly in UTC. I think if we pass in local_tz = 0 to sunriset, we should be able to do the enture calculation in UTC and avoid having to load and manipulate custom timezone polygons.

intertidal/exposure.py Show resolved Hide resolved
Comment on lines +324 to +395
## Set the local timezone for the analysis area of interest
if wa.contains(point_4283)[0] == True:
timezone = 'Australia/West'
local_tz = 8

elif nt.contains(point_4283)[0] == True:
timezone = 'Australia/North'
local_tz = 9.5

elif sa.contains(point_4283)[0] == True:
timezone = 'Australia/South'
local_tz = 9.5

elif qld.contains(point_4283)[0] == True:
timezone = 'Australia/Queensland'
local_tz = 10

elif nsw.contains(point_4283)[0] == True:
timezone = 'Australia/NSW'
local_tz = 10

elif vic.contains(point_4283)[0] == True:
timezone = 'Australia/Victoria'
local_tz = 10

elif tas.contains(point_4283)[0] == True:
timezone = 'Australia/Tasmania'
local_tz = 10

## Calculate the local sunrise and sunset times
# Place start and end dates in correct format
start = time_range[0]
end = time_range[-1]
startdate = datetime.date(pd.to_datetime(start).year,
pd.to_datetime(start).month,
pd.to_datetime(start).day)

# Make 'all_timerange' time-zone aware
localtides = time_range.tz_localize(tz=pytz.UTC).tz_convert(timezone)

# Replace the UTC datetimes from all_timerange with local times
modelledtides = pd.DataFrame(index = localtides)

# Return the difference in years for the time-period.
# Round up to ensure all modelledtide datetimes are captured in the solar model
diff = pd.to_datetime(end) - pd.to_datetime(start)
diff = int(ceil(diff.days/365))

## Model sunrise and sunset
sun_df = sunriset.to_pandas(startdate, tidepost_lat_4326, tidepost_lon_4326, local_tz, diff)

## Set the index as a datetimeindex to match the modelledtide df
sun_df = sun_df.set_index(pd.DatetimeIndex(sun_df.index))

## Append the date to each Sunrise and Sunset time
sun_df['Sunrise dt'] = sun_df.index + sun_df['Sunrise']
sun_df['Sunset dt'] = sun_df.index + (sun_df['Sunset'])

## Create new dataframes where daytime and nightime datetimes are recorded, then merged
## on a new `Sunlight` column
daytime=pd.DataFrame(data = 'Sunrise', index=sun_df['Sunrise dt'], columns=['Sunlight'])
nighttime=pd.DataFrame(data = 'Sunset', index=sun_df['Sunset dt'], columns=['Sunlight'])
DayNight = pd.concat([daytime, nighttime], join='outer')
DayNight.sort_index(inplace=True)
DayNight.index.rename('Datetime', inplace=True)

## Create an xarray object from the merged day/night dataframe
day_night = xr.Dataset.from_dataframe(DayNight)

## Remove local timezone timestamp column in modelledtides dataframe. Xarray doesn't handle
## timezone aware datetimeindexes 'from_dataframe' very well.
modelledtides.index = modelledtides.index.tz_localize(tz=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above - avoid local timezones/timezone conversion and just do everything in UTC / local_tz=0.

intertidal/exposure.py Show resolved Hide resolved
@erialC-P erialC-P closed this Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants