DAS-2232 -small functions added to support the main solution in the t… #16
This might be OK in context, but as a stand-alone function it is a bit odd - taking in a variable_name, and then only working if that variable is either a latitude or longitude variable. In fact, the result is not really dependent upon the variable_name passed in, other than to use its path. It might be that it has the side effect of verifying the variable_name as a latitude or longitude, but that is because it calls upon the is_latitude and is_longitude VarInfo methods, which can be called directly. It raises the MissingCoordinateVariable exception if the variable passed in is not a latitude or longitude but is a known variable, i.e., not intrinsically a missing-coordinate use case.
It has the feel of a method extracted as frequently used code, in which case the name could be check_coord_var_and_get_std_dimensions. Alternatively, if there is no need to verify that the variable is a coordinate, the code could be simplified to just return the standard dimensions within a shared path - a get_std_projected_dimension_reference. If the only use is in the following method, that would be my recommendation.
As is, the name could also be get_std_projected_coordinate_dimensions, but I wonder about the need to check for is_latitude or is_longitude.
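A minimal sketch of the simplified get_std_projected_dimension_reference suggested above, assuming the standard dimension names are projected_y and projected_x and that the shared path is the group of the fully qualified variable name (the helper name comes from the suggestion; everything else is illustrative):

```python
def get_std_projected_dimension_reference(variable_name: str) -> list:
    """Return the standard projected dimension names within the variable's group path."""
    group_path = variable_name.rsplit('/', 1)[0]
    return [f'{group_path}/projected_y', f'{group_path}/projected_x']


# e.g. get_std_projected_dimension_reference('/Grid/latitude')
# -> ['/Grid/projected_y', '/Grid/projected_x']
```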
Will use your new suggested names.
I originally had is_latitude map to projected_y and is_longitude map to projected_x, but I removed that.
But we do need to validate that they are coordinates - we are using that path for the projected dimensions.
Maybe it does not have to be within this method.
Made the updates - removed "override" and renamed the methods. Removed the lat/lon check in the get_projected_dimension_names function.
commit - ca881d1
This choice of dimension names is better than before, but it's still likely to cause problems down the road. This function currently relies on the assumption that coordinate variables for different grids will be in different groups, which is not guaranteed.
(I think we see this with some of the gridded ICESat-2 ATLAS collections, for example)
I guess I could just have group_path/coordinate_name_projected_x or projected_y.
The coordinate names would be unique: it would be latitude_projected_x or latitude_projected_y. It would not matter; it would be unique.
One approach (that would make this function more complicated 😞) could be to use the coordinates metadata attribute of the VariableFromDmr instance (what the get_coordinate_variables function looks up), and derive the dimension names from that string, ending in .../projected_x and .../projected_y (like you're already doing with the group path). I took some inspiration from MaskFill, to try something like the sketch below.
The string will look a bit ugly (e.g., 'a089b9ebff6935f6c8332710de2ee3b351bd47c1fb807b22765969617027e8d2'), but it will be unique and reproducible.
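The 64-character string above looks like a SHA-256 hex digest, so a minimal sketch of the idea might be the following (the helper name is hypothetical, and the coordinates string is assumed to be the variable's coordinates metadata attribute):

```python
import hashlib


def get_projected_dimension_name(coordinates_attribute: str, suffix: str) -> str:
    """Hash the coordinates attribute (e.g. 'latitude, longitude') into a
    unique, reproducible cache key, then append the projected suffix.
    """
    cache_key = hashlib.sha256(coordinates_attribute.encode('utf-8')).hexdigest()
    return f'/{cache_key}/{suffix}'


# e.g. get_projected_dimension_name('latitude, longitude', 'projected_x')
# -> '/<64-character hex digest>/projected_x'
```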
I'm not sure if this is expected to be updated when we do use a configuration file, but in the meantime, perhaps get_std_projected_dimensions is a better name. Also, such a revised name suggests a method that is well defined and ready to merge, vs. getting the projected dimensions, possibly with transposed names, which is TBD.
ok
Used override because we are overriding the coordinates with projected dimensions.
I'm going to sound like a broken record, but I do not think we should assume that leaning on a configuration file is the best implementation. There are two relevant tenets I try to generally adhere to in our service code:
We have the information we need to determine dimension ordering while not needing configuration file-based information. And, if we write the code in a general way, this could open HOSS up to a large number of collections that are HDF-5 format but without proper 1-D dimension variables. That could be a massive win! But it would be stymied by data providers needing to add information to the configuration file. To date, no-one outside of DAS has added any configuration file information to any of our services in the last 3 years. That's pretty telling.
And, yup, I acknowledge that this monologue is not strictly relevant to this PR.
It feels like the better output for this function (definitely out-of-scope for this PR) would be a list with the number of elements matching the number of dimensions of the variable. Maybe the additional, non-spatial, dimensions could have None for their dimensions. Just to write out what that might look like:
[None, '/unique_cache_key_one', '/unique_cache_key_two']
[None, '/unique_cache_key_two', '/unique_cache_key_one']
['/unique_cache_key_one', '/unique_cache_key_two', None]
['/unique_cache_key_one', '/unique_cache_key_two']
['/unique_cache_key_one']
['/unique_cache_key_two']
[None]
[None]
If this same function was called in add_index_range, that would mean that you'd have a correctly ordered list, and would end up doing something like index_ranges.get(None), and so end up with [] in the right places for dimensions you don't have ranges for. Something to maybe discuss during the next iteration?
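A minimal sketch of how add_index_range might consume such a list, under the assumptions that index_ranges maps dimension names to (start, stop) tuples, that missing entries (including the None placeholders) fall back to an empty range, and that '[]' requests the whole dimension (the function and variable names here are illustrative):

```python
def get_range_strings(dimension_names: list, index_ranges: dict) -> list:
    """Build one index-range string per dimension, leaving '[]' (the whole
    dimension) wherever no range was calculated.
    """
    range_strings = []
    for dimension in dimension_names:
        indices = index_ranges.get(dimension)
        if indices is not None:
            range_strings.append(f'[{indices[0]}:{indices[1]}]')
        else:
            range_strings.append('[]')
    return range_strings


# e.g. get_range_strings([None, '/unique_cache_key_one', '/unique_cache_key_two'],
#                        {'/unique_cache_key_one': (10, 20),
#                         '/unique_cache_key_two': (0, 5)})
# -> ['[]', '[10:20]', '[0:5]']
```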
Here is where we need the additional check:
(excuse the pidgin Python)
Maybe I will include it as a comment till we make the configuration change?
I strongly prefer what we have now. Also, the function contents exactly match the function name.
Also, the snippet supplied above is not correct Python code, so it's hard to know for sure what you are trying to achieve. Trying to decompose that snippet: it seems to be checking that the VarInfoFromDmr instance does not have any of the listed dimensions as variables, and the trailing and not [] is a no-op. It will always evaluate to True, because it is being evaluated in isolation, and you are asking if an empty list is "falsy", which it is. While I don't like this approach, I think what you are trying to suggest would be more like the check sketched below.
If this was to be augmented in such a way, I would recommend breaking this check out into its own function, because the set comprehension will become very hard to read.
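If it were broken out, that check might look something like this (hypothetical function and argument names; get_variable is assumed to return None for names that are not variables in the VarInfoFromDmr instance):

```python
def any_dimensions_are_variables(varinfo, dimension_names) -> bool:
    """Return True if any of the named dimensions exist as actual variables
    in the VarInfoFromDmr instance (i.e. they are not "empty" dimensions).
    """
    return any(
        varinfo.get_variable(dimension) is not None
        for dimension in dimension_names
    )
```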
I can see splitting out the function to clarify the code and document the comprehension.
I'm less clear on forcing the upstream code to use this code without the additional check, and then adding the call to the new function in every usage. That additional check is now essential to ensure that the case of OPeNDAP creating "empty" dimensions does not allow this check, by itself, to succeed. And of course, OPeNDAP's "empty" dimensions are pretty much always going to be the case.
function updated and unit tests added - d972777
I think I asked this on the other PR - should this check be mutually exclusive to the other checks in the longitude or latitude blocks, or should it be done in addition to those checks? Right now, if you have a fill value, you are only checking for where the coordinate is not fill, and not considering your other checks. (I tend to prefer this first check, but wanted to confirm what the logic was intended to be)
Right. If we have a fill value, we use that to check the validity of the data, and if that is not available we check the geo extent range. I guess we can check for it even if the fill value is provided - in case the coordinate data itself is bad data.
If we check the lat/lon valid range first, the check for fill does become redundant; the fill value would definitely be outside that range.
I guess @autydp can weigh in.
I would check the coordinates regardless, and let the fill-value be outside that range. It simplifies the code and those checks need to happen.
updated - 51b110c
Thanks for reworking this so that the fill value check and the latitude/longitude range checks can both happen.
I think the coordinate.is_latitude() and coordinate.is_longitude() checks could benefit from some numpy magic, rather than looping individually through each element. I think what you could use is the element-wise AND, which can be written either as & or np.logical_and. You could do something like the sketch below.
Note, in that sketch, I've also changed the first check from if coordinate_fill to if coordinate_fill is not None. That's pretty important, as zero could be a valid fill value, but if coordinate_fill = 0, then this check will evaluate to False.
Ultimately, I think the conditions you have now are equivalent to this, just maybe not as efficient. So the only thing I'd definitely like to see changed is that first if coordinate_fill condition, so that it's not considering a fill value of 0 to be non-fill.
Some collections aren't normalised to just -180 ≤ longitude (degrees east) ≤ 180; some are 0 ≤ longitude (degrees east) ≤ 360. Maybe this check should account for both conventions - see the sketch after this comment.
Also, please use and instead of &, which is a bitwise-and operator, and not the logical AND operator. (Same in the condition for the latitude checks below.)
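A sketch of a check that widens the bounds to accept either convention (scalar form; the function name is illustrative):

```python
def longitude_in_valid_range(longitude: float) -> bool:
    # Accept both -180..180 and 0..360 degrees east by widening the bounds.
    return longitude >= -180 and longitude <= 360
```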
Oh sorry, I knew that. Not sure why I used &. Will fix.
I think changing to and gave a different error: Simplify chained comparison between the operands (chained-comparison).
I have to change that check to be simpler.
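The chained-comparison form pylint is asking for is equivalent (scalar values only; numpy arrays would still need element-wise operators):

```python
def longitude_in_valid_range(longitude: float) -> bool:
    # Same bounds as above, written as the chained comparison pylint prefers.
    return -180 <= longitude <= 360
```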
I just wanted to come back to this thread - I was wrong about the &. But I think I prefer the equivalent np.logical_and, just because I'm generally too verbose.
Do we really need this function? It's just a wrapper for VariableFromDmr.get_attribute_value('_FillValue').
The only difference, really, is that the value is always being cast as a float. But some variables aren't floats, so this seems like an iffy thing to do.
This is not the fill value for any variable; the fill value for the coordinates has to be converted to a float to compare, right?
I could remove the function... but if there are cases where the fill value will not be a float, it is good to have this method to handle the different cases.
I could remove the method and add it as part of get_valid_indices
Python plays a little fast and loose with types sometimes. If you have an integer and you are trying to do something with it and a float, then Python tries to be clever and treat that integer as a float.
For example:
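A minimal illustration of that promotion, assuming an integer _FillValue compared against float coordinate data:

```python
fill_value = 0            # integer metadata attribute
coordinate_value = 0.0    # float data value

print(coordinate_value == fill_value)  # True: the int is promoted for the comparison
print(type(fill_value + 0.5))          # <class 'float'>
```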
So, I think we're probably okay in the instance that the fill value metadata attribute is an integer. (But I totally acknowledge that there is somewhat sneaky stuff Python is doing in the background here)