Update the exposure func to handle filtering modelled datetimes and combining filters to customise exposure analysis #50
Conversation
Hey Claire, sorry for the super long delay in reviewing this! This new functionality is awesome - the range of different filtering methods is fantastic, and will let our users really narrow in on and answer some very interesting ecological questions relating to specific life history traits/environmental processes.
I think there are a few things we need to change before merging this into `main` to improve performance and make the code easier to maintain in the future. There are three key changes I'd recommend:
Reduce duplication by packing up analysis into discrete functions that re-use code wherever possible
The `exposure` function has grown pretty long, and includes quite a few sections of code that are repeated with only small differences each time (e.g. significant overlaps between the `neap low`/`neap high`, the `spring low`/`spring high`, and the `low tide`/`high tide` calculations). My feeling is that we should aim to keep `exposure` as minimal as possible by packing all those discrete pieces up as re-usable functions instead. That way we can reduce duplication and re-use logic, make it much more difficult for errors and inconsistencies to creep in, and make the code easier to read and maintain. These look like priorities for more discrete functions (a rough sketch follows the list below):
- A combined `temporal_filter` function handling all the basic temporal filtering options, which itself calls another lower-level `daynight_filter` function that performs the day/night filtering
- A combined `spring_tides` function that supports both "low" and "high" functionality
- A combined `neap_tides` function that supports both "low" and "high" functionality
- A combined `lowhigh_tides` function that supports both "low" and "high" tide functionality
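To make the idea concrete, here's a minimal sketch of what a combined low/high helper could look like, assuming a simple quantile-based threshold on a 1D tide series (the function name matches the suggestion above, but the signature, the `quantile` parameter and the toy data are all illustrative, not the existing API):

```python
import numpy as np
import pandas as pd

def lowhigh_tides(tides, mode="low", quantile=0.25):
    """Hypothetical combined helper: keep the lowest (or highest) fraction
    of tide heights, so the 'low' and 'high' code paths share one function."""
    if mode == "low":
        return tides[tides <= tides.quantile(quantile)]
    return tides[tides >= tides.quantile(1 - quantile)]

# Toy usage on a synthetic hourly tide series
times = pd.date_range("2022-01-01", periods=24 * 30, freq="h")
tides = pd.Series(np.sin(np.arange(len(times)) * 2 * np.pi / 12.42), index=times)
low_only = lowhigh_tides(tides, mode="low")
high_only = lowhigh_tides(tides, mode="high")
```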
Make sure expensive analyses like tide modelling are only performed once in the workflow
Some functionality like the tide modelling is really expensive, but is now run multiple times throughout the workflow (I suspect this might be the cause of some of the Sandbox issues you've been running into). We'll need to ensure that the tide modelling code is only run once in the entire Exposure workflow, likely by moving it up higher and re-using it in other functions below.
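As a hedged illustration of the "model once, pass down" pattern (everything here is a toy stand-in: the sine-curve tide model, the `exposure_toy` helper and the elevations are placeholders, not the real workflow):

```python
import numpy as np
import pandas as pd

def model_tides_toy(times):
    """Toy stand-in for the expensive tide-modelling step."""
    hours = (times - times[0]) / pd.Timedelta("1h")
    return pd.Series(np.sin(2 * np.pi * hours / 12.42), index=times)

def exposure_toy(elevations, modelled_tides):
    """Accepts precomputed tides, so the model is never re-run internally.
    Returns the percentage of timesteps where the tide sits at or below
    each elevation (i.e. the surface is exposed)."""
    return {z: float((modelled_tides <= z).mean() * 100) for z in elevations}

times = pd.date_range("2022-01-01", periods=24 * 30, freq="h")
tides = model_tides_toy(times)  # expensive step runs exactly once...

# ...and every downstream variant re-uses the same result
exposure_all = exposure_toy([0.0, 0.5], tides)
exposure_daytime = exposure_toy([0.0, 0.5], tides.between_time("06:00", "18:00"))
```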
Consider shifting spatial filters to run on the low-res tide modelling data instead of the high-res data
I suspect the spatial filter functions are currently slow mainly because they seem to be applied to every high-res pixel. This will scale rapidly with the number of pixels as our study area increases, and likely won't be practical for anything larger than a small section of a beach. A more efficient approach would be to run these on the low-res output of the tide modelling functions instead. Combining this with the suggestion above, we could do something like:
- Call the tide modelling functions with `resample=False` and `calculate_quantiles=None` once at the very top of the workflow, returning a low-res model output
- Apply spatial filters to the low-res tide modelling output
- Calculate exposure quantiles and resample back to high resolution, borrowing code from inside `pixel_tides` here
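A rough sketch of those three steps on toy data (the coarse/fine grids, the filter and the use of `interp_like` as the final upsampling step are all stand-ins; the real resampling would borrow the reprojection logic from `pixel_tides` as noted above):

```python
import numpy as np
import xarray as xr

# Toy low-res tide model output (time, y, x) on a coarse 5 x 5 grid
rng = np.random.default_rng(0)
tides_lowres = xr.DataArray(
    rng.normal(0, 1, (200, 5, 5)),
    dims=("time", "y", "x"),
    coords={"time": np.arange(200), "y": np.linspace(0, 100, 5), "x": np.linspace(0, 100, 5)},
)

# 1. Apply spatial/temporal filters on the LOW-RES data (cheap: 25 pixels)
filtered = tides_lowres.where(tides_lowres >= tides_lowres.quantile(0.75, dim="time"))

# 2. Calculate exposure quantiles from the filtered low-res data
quantiles_lowres = filtered.quantile([0.1, 0.5, 0.9], dim="time")

# 3. Resample back to the high-res grid (crude stand-in for pixel_tides' reprojection)
highres_grid = xr.Dataset(coords={"y": np.linspace(0, 100, 50), "x": np.linspace(0, 100, 50)})
quantiles_highres = quantiles_lowres.interp_like(highres_grid, method="linear")
```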
(That said: one thing to consider could be whether these filters actually need to be run at a pixel level at all. If we switched to simply running them on a 1D timeseries instead (e.g. using average tide height per timestep), we could probably greatly simplify a bunch of code: e.g. all the temporal + spatial filters could simply return equivalent lists of filtered timesteps, rather than the mix of different outputs we currently have.)
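And a minimal sketch of that 1D idea, where every filter (temporal or tide-based) returns the same kind of output - a boolean mask over timesteps - so arbitrary filters can be combined with plain logical operators (the daylight-hours rule and the 25% low-tide threshold here are purely illustrative):

```python
import numpy as np
import pandas as pd

# Toy 1D tide timeseries, e.g. mean modelled tide height per timestep
times = pd.date_range("2022-01-01", periods=24 * 14, freq="h")
tide_1d = pd.Series(np.sin(np.arange(len(times)) * 2 * np.pi / 12.42), index=times)

# Each filter returns a boolean mask over the same timesteps
daylight_mask = (times.hour >= 6) & (times.hour < 18)           # temporal filter
low_tide_mask = (tide_1d <= tide_1d.quantile(0.25)).to_numpy()  # tide-height filter

# Combining filters is then just boolean logic
combined = daylight_mask & low_tide_mask
filtered_timesteps = times[combined]
```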
Given our updated Notebooks plan, some of this work can probably be pushed back until after the April release. A short-term starting point could be to raise a second, smaller PR to implement the breaking API changes in `exposure` (e.g. `time_range` > `start_date`/`end_date`/`freq` etc.) in the existing version of the function, so that we can start using that syntax now before the larger changes above?
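For the API change itself, something like the following could work, assuming `freq` feeds straight into `pandas.date_range` (the helper name and defaults below are hypothetical):

```python
import pandas as pd

def build_time_range(start_date, end_date, freq="30min"):
    """Hypothetical replacement for the old `time_range` argument:
    construct the modelling timesteps from start/end dates and a frequency."""
    return pd.date_range(start=start_date, end=end_date, freq=freq)

# e.g. exposure(..., start_date="2021-01-01", end_date="2022-01-01", freq="1h")
time_range = build_time_range("2021-01-01", "2022-01-01", freq="1h")
```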
## Note: day and night times will be calculated once per area-of-interest (ds)
## for the median latitude and longitude position of the aoi
tidepost_lat_3577 = dem.x.median(dim='x').values
tidepost_lon_3577 = dem.y.median(dim='y').values

## Set the Datacube native crs (GDA/Aus Albers (meters))
crs_3577 = CRS.from_epsg(3577)

## Translate the crs of the tidepost to determine (1) local timezone
## and (2) the local sunrise and sunset times:

## (1) Create a transform to convert default epsg3577 coords to epsg4283 to compare
## against state/territory boundary polygons and assign a timezone

## GDA94 CRS (degrees)
crs_4283 = CRS.from_epsg(4283)
## Transfer coords from/to
transformer_4283 = Transformer.from_crs(crs_3577, crs_4283)
## Translate tidepost coords
tidepost_lat_4283, tidepost_lon_4283 = transformer_4283.transform(tidepost_lat_3577,
                                                                  tidepost_lon_3577)
## Coordinate point to test for timezone
point_4283 = Point(tidepost_lon_4283, tidepost_lat_4283)

## (2) Create a transform to convert default epsg3577 coords to epsg4326 for use in
## the sunrise/sunset library

## World WGS84 (degrees)
crs_4326 = CRS.from_epsg(4326)
## Transfer coords from/to
transformer_4326 = Transformer.from_crs(crs_3577, crs_4326)
## Translate the tidepost coords
tidepost_lat_4326, tidepost_lon_4326 = transformer_4326.transform(tidepost_lat_3577,
                                                                  tidepost_lon_3577)
## Coordinate point to locate the sunriset calculation
point_4326 = Point(tidepost_lon_4326, tidepost_lat_4326)
The block of coordinate conversion code above can probably be simplified by getting the coord directly from the dataset's GeoBox and converting it to EPSG:4326 there:
tidepost_lon_4326, tidepost_lat_4326 = dem.odc.geobox.extent.centroid.to_crs("EPSG:4326").coords[0]
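For reference, a slightly expanded version of that one-liner, assuming the `odc-geo` xarray accessor is available (the `import odc.geo.xr` line is the usual way to register the `.odc` accessor, but worth checking in the Sandbox environment):

```python
import odc.geo.xr  # noqa: F401 - registers the `.odc` accessor on xarray objects

# Centroid of the dataset's footprint, reprojected straight to EPSG:4326
tidepost_lon_4326, tidepost_lat_4326 = (
    dem.odc.geobox.extent.centroid.to_crs("EPSG:4326").coords[0]
)
```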
timezones = {'wa':'../../gdata1/data/boundaries/GEODATA_COAST_100K/western_australia/cstwacd_r.shp',
             'nt':'../../gdata1/data/boundaries/GEODATA_COAST_100K/northern_territory/cstntcd_r.shp',
             'sa':'../../gdata1/data/boundaries/GEODATA_COAST_100K/south_australia/cstsacd_r.shp',
             'qld':'../../gdata1/data/boundaries/GEODATA_COAST_100K/queensland/cstqldmd_r.shp',
             'nsw':'../../gdata1/data/boundaries/GEODATA_COAST_100K/new_south_wales/cstnswcd_r.shp',
             'vic':'../../gdata1/data/boundaries/GEODATA_COAST_100K/victoria/cstviccd_r.shp',
             'tas':'../../gdata1/data/boundaries/GEODATA_COAST_100K/tasmania/csttascd_r.shp'
             }

## Determine the timezone of the pixels
## Bring in the state polygons (note: native crs = epsg:4283)
wa = gpd.read_file(timezones['wa'])
nt = gpd.read_file(timezones['nt'])
sa = gpd.read_file(timezones['sa'])
qld = gpd.read_file(timezones['qld'])
nsw = gpd.read_file(timezones['nsw'])
vic = gpd.read_file(timezones['vic'])
tas = gpd.read_file(timezones['tas'])

# Merge the polygons to create single state/territory boundaries
wa = gpd.GeoSeries(unary_union(wa.geometry))
nt = gpd.GeoSeries(unary_union(nt.geometry))
sa = gpd.GeoSeries(unary_union(sa.geometry))
qld = gpd.GeoSeries(unary_union(qld.geometry))
nsw = gpd.GeoSeries(unary_union(nsw.geometry))
vic = gpd.GeoSeries(unary_union(vic.geometry))
tas = gpd.GeoSeries(unary_union(tas.geometry))
I don't think we need to do local timezone conversion here - sunrise and sunset should be able to be calculated directly in UTC. I think if we pass `local_tz = 0` in to `sunriset`, we should be able to do the entire calculation in UTC and avoid having to load and manipulate custom timezone polygons.
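Concretely, that would just mean hard-coding `local_tz = 0` and keeping the existing `sunriset.to_pandas` call as-is (the sketch below re-uses the `startdate`, tidepost coordinates and `diff` variables already defined further down in this function):

```python
## Keep everything in UTC: no state/territory polygons or timezone lookup needed
local_tz = 0
sun_df = sunriset.to_pandas(startdate, tidepost_lat_4326, tidepost_lon_4326, local_tz, diff)
```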
## Set the local timezone for the analysis area of interest
if wa.contains(point_4283)[0] == True:
    timezone = 'Australia/West'
    local_tz = 8

elif nt.contains(point_4283)[0] == True:
    timezone = 'Australia/North'
    local_tz = 9.5

elif sa.contains(point_4283)[0] == True:
    timezone = 'Australia/South'
    local_tz = 9.5

elif qld.contains(point_4283)[0] == True:
    timezone = 'Australia/Queensland'
    local_tz = 10

elif nsw.contains(point_4283)[0] == True:
    timezone = 'Australia/NSW'
    local_tz = 10

elif vic.contains(point_4283)[0] == True:
    timezone = 'Australia/Victoria'
    local_tz = 10

elif tas.contains(point_4283)[0] == True:
    timezone = 'Australia/Tasmania'
    local_tz = 10

## Calculate the local sunrise and sunset times
# Place start and end dates in correct format
start = time_range[0]
end = time_range[-1]
startdate = datetime.date(pd.to_datetime(start).year,
                          pd.to_datetime(start).month,
                          pd.to_datetime(start).day)

# Make 'all_timerange' time-zone aware
localtides = time_range.tz_localize(tz=pytz.UTC).tz_convert(timezone)

# Replace the UTC datetimes from all_timerange with local times
modelledtides = pd.DataFrame(index=localtides)

# Return the difference in years for the time-period.
# Round up to ensure all modelledtide datetimes are captured in the solar model
diff = pd.to_datetime(end) - pd.to_datetime(start)
diff = int(ceil(diff.days/365))

## Model sunrise and sunset
sun_df = sunriset.to_pandas(startdate, tidepost_lat_4326, tidepost_lon_4326, local_tz, diff)

## Set the index as a datetimeindex to match the modelledtide df
sun_df = sun_df.set_index(pd.DatetimeIndex(sun_df.index))

## Append the date to each Sunrise and Sunset time
sun_df['Sunrise dt'] = sun_df.index + sun_df['Sunrise']
sun_df['Sunset dt'] = sun_df.index + (sun_df['Sunset'])

## Create new dataframes where daytime and nighttime datetimes are recorded, then merged
## on a new `Sunlight` column
daytime = pd.DataFrame(data='Sunrise', index=sun_df['Sunrise dt'], columns=['Sunlight'])
nighttime = pd.DataFrame(data='Sunset', index=sun_df['Sunset dt'], columns=['Sunlight'])
DayNight = pd.concat([daytime, nighttime], join='outer')
DayNight.sort_index(inplace=True)
DayNight.index.rename('Datetime', inplace=True)

## Create an xarray object from the merged day/night dataframe
day_night = xr.Dataset.from_dataframe(DayNight)

## Remove the local timezone from the modelledtides dataframe index. Xarray doesn't handle
## timezone-aware datetimeindexes in 'from_dataframe' very well.
modelledtides.index = modelledtides.index.tz_localize(tz=None)
Same comment as above - avoid local timezones/timezone conversion and just do everything in UTC / `local_tz=0`.
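In practice that likely means the modelled tide index can stay as naive UTC the whole way through, so the localize/convert round-trip and the final `tz_localize(None)` step all drop out, e.g.:

```python
# Sketch: keep the modelled tide datetimes in (naive) UTC throughout
modelledtides = pd.DataFrame(index=time_range)
```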
This PR:
- `Intertidal_workflow.ipynb` is included and demos calculation of the full timeseries for exposure, as previously.

This PR does not:

Notes on the structure of the new exposure function:

Future work will address the known issues and add a notebook dedicated to showcasing the exposure custom options.