Min/Max value check does not handle ASCII_Time_* Strings #1135
Comments
I do not think an alphabetic comparison would work. The time regular expressions allow parts of the date-time field to be missing. While all of the examples are consistent enough for an alphabetic comparison, I think Validate would have to translate the values to DateTime objects and then compare them to cover all allowable cases.
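To illustrate the parse-then-compare idea, here is a minimal Java sketch. The class and method names are hypothetical and are not Validate's API. It assumes a day-of-year layout `yyyy[-DDD[Thh[:mm[:ss[.fraction]]]]]` with an optional trailing `Z` (ignored, since all values are taken as UTC), and that missing trailing elements default to the start of the period (day 001, 00:00:00).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

/** Hypothetical helper: parse DOY date-time strings whose trailing elements may be missing. */
public class DoyTimeCompare {

    // Assumed layout: yyyy[-DDD[Thh[:mm[:ss[.fraction]]]]][Z]; each nested optional section may be absent.
    private static final DateTimeFormatter DOY = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)
            .optionalStart()
                .appendLiteral('-')
                .appendValue(ChronoField.DAY_OF_YEAR, 3)
                .optionalStart()
                    .appendLiteral('T')
                    .appendValue(ChronoField.HOUR_OF_DAY, 2)
                    .optionalStart()
                        .appendLiteral(':')
                        .appendValue(ChronoField.MINUTE_OF_HOUR, 2)
                        .optionalStart()
                            .appendLiteral(':')
                            .appendValue(ChronoField.SECOND_OF_MINUTE, 2)
                            .optionalStart()
                                .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
                            .optionalEnd()
                        .optionalEnd()
                    .optionalEnd()
                .optionalEnd()
            .optionalEnd()
            .optionalStart()
                .appendLiteral('Z') // assumption: trailing Z accepted and ignored (UTC throughout)
            .optionalEnd()
            // Missing elements default to the start of the period (an assumption, not a PDS rule).
            .parseDefaulting(ChronoField.DAY_OF_YEAR, 1)
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
            .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
            .parseDefaulting(ChronoField.NANO_OF_SECOND, 0)
            .toFormatter();

    /** Parses a possibly truncated DOY string into a LocalDateTime. */
    public static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text.trim(), DOY);
    }

    /** Chronological comparison: negative, zero, or positive, like Comparator.compare. */
    public static int compare(String a, String b) {
        return parse(a).compareTo(parse(b));
    }

    /** True when value lies within [min, max] chronologically (inclusive bounds assumed). */
    public static boolean inRange(String value, String min, String max) {
        LocalDateTime v = parse(value);
        return !v.isBefore(parse(min)) && !v.isAfter(parse(max));
    }

    public static void main(String[] args) {
        String max = "2004-183T04";
        String value = "2004-183T04:00:00";
        // The shorter string sorts first lexicographically, so the value looks larger than max...
        System.out.println("lexicographic: " + Integer.signum(value.compareTo(max)));
        // ...but both denote the same instant once parsed.
        System.out.println("chronological: " + Integer.signum(compare(value, max)));
    }
}
```

With a maximum of `2004-183T04` and a value of `2004-183T04:00:00`, `String.compareTo` ranks the value above the maximum, while the parsed comparison treats them as equal. Whether a truncated maximum should instead mean the end of the period is a policy question this sketch does not settle.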
@jmafi If I understand the definitions of min/max, can we assume that the datetime format used in the min/max values needs to match the format in the data? For instance, if min is …
@jordanpadams: I believe that the specific case you gave would not be allowed, but only on a technicality. The only requirement is that the min/max values must be the same data type as the data. In the case you gave …

@al-niessner: You are undoubtedly right. I just offered up an alphabetic comparison because I guessed that it would be a much less costly approach. Would it be possible to "zero pad" the min/max values to match the format of the data?
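The "zero pad" suggestion can be sketched without any date parsing. This hypothetical Java helper pads every value, data as well as min/max, to one full-precision template, since data values can be truncated too. It assumes the inputs are already well-formed truncations of a DOY date-time string (a malformed value such as `2004-18` would be padded into a wrong date), at most six fractional-second digits, and UTC throughout.

```java
/** Hypothetical sketch of the "zero pad" idea for DOY date-time strings. */
public class ZeroPadCompare {

    // Assumed full-precision template; its length fixes the kept fractional digits (6 here).
    private static final String TEMPLATE = "0000-001T00:00:00.000000";

    /** Appends the template's trailing characters so every value ends up the same width. */
    public static String pad(String value) {
        String v = value.trim();
        if (v.endsWith("Z")) {
            v = v.substring(0, v.length() - 1); // assumption: all values are UTC, so Z adds nothing
        }
        if (v.length() > TEMPLATE.length()) {
            throw new IllegalArgumentException("more precision than the template supports: " + value);
        }
        // Input must already be a well-formed truncation; padding does not validate it.
        return v + TEMPLATE.substring(v.length());
    }

    /** Once padded to a common width, plain string order matches chronological order. */
    public static int compare(String a, String b) {
        return pad(a).compareTo(pad(b));
    }

    /** Inclusive range check on padded strings. */
    public static boolean inRange(String value, String min, String max) {
        String v = pad(value);
        return v.compareTo(pad(min)) >= 0 && v.compareTo(pad(max)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(pad("2004-183T04"));
        System.out.println(inRange("2004-183T04:11:26.809", "2004-183", "2004-183T05"));
    }
}
```

The appeal is cost: padding is a substring append per value, and the comparison stays a string comparison. The trade-off is that it only works for fixed-width, zero-filled fields in a single layout (DOY here); a YMD type would need its own template.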
@jmafi we can probably make this happen, but as @al-niessner mentioned, it will be a costly operation from an algorithmic perspective. To confirm, is the datetime format described in the table description? That would help a little, compared with having to guess the format for every column.
@jordanpadams Typically, the only required description of the datetime format in the table description is the …
To be workable, the format for the field must be given in one of those magic ASCII keywords. For instance, if the field was …

I do not know the PDS rules at all. I just fix the code. So, here are some questions that would need to be settled:
If 1 and 2 are true, then the solution is straightforward. If 1 is true but 2 is false, then we can do it if 1 is provided; otherwise, what is …

If 1 is false, then I have a host of other questions relating to …
Not sure what this is asking. Apparently field_format can be a string even if the data_type is a datetime type, which doesn't make any sense.
Yes.
As Joe mentioned, the data_type should key us in on the format of the column, e.g. …
The possible values for datetime strings are listed in the IM, starting here: https://pds.nasa.gov/datastandards/documents/im/v1/index_1N00.html#19.5%C2%A0%C2%A0class_pds_ascii_date
My question (1) was: are …
If they are, then we can ignore …

From your answers (both true), this looks pretty straightforward.

What to do if the special constants do not match the …

Special constants are getting complicated. It might take a trick to add them without breaking everything, but the actual addition should be straightforward.
I'm sorry if my previous posts were misleading. Yes, the Field_* class includes both data_type and field_format attributes: data_type is required, field_format is optional. For …
data_type provides a hint as to the actual format of the time string, but only a hint, since all time elements after the year are optional. In other words, all of the following would be valid ASCII_Date_Time_DOY_UTC values:
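Because every element after the year is optional, a checker has to accept each coarser form of a full value. This hypothetical Java sketch generates those forms by cutting before each separator and tests them against a deliberately simplified pattern. The pattern is an assumption, not the IM's official regular expression: it does not range-check digits, limits fractional seconds to 1 to 6 digits, and treats a trailing `Z` as optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical, simplified shape check for DOY date-time strings with optional trailing elements. */
public class DoyShape {

    // Simplified assumption, NOT the IM pattern: yyyy[-DDD[Thh[:mm[:ss[.f{1,6}]]]]][Z]
    private static final Pattern DOY_SHAPE = Pattern.compile(
            "\\d{4}(-\\d{3}(T\\d{2}(:\\d{2}(:\\d{2}(\\.\\d{1,6})?)?)?)?)?Z?");

    /** True when the whole string matches the simplified DOY shape. */
    public static boolean isWellFormed(String value) {
        return DOY_SHAPE.matcher(value.trim()).matches();
    }

    /** Lists the coarser forms of a full DOY value, obtained by dropping trailing elements. */
    public static List<String> truncations(String full) {
        List<String> forms = new ArrayList<>();
        // Each separator introduces an optional element, so the text before it is itself a valid form.
        for (int i = 1; i < full.length(); i++) {
            char c = full.charAt(i);
            if (c == '-' || c == 'T' || c == ':' || c == '.') {
                forms.add(full.substring(0, i));
            }
        }
        forms.add(full);
        return forms;
    }

    public static void main(String[] args) {
        for (String form : truncations("2004-183T04:11:26.809")) {
            System.out.println(form + " -> " + isWellFormed(form));
        }
    }
}
```

A check like this could run before any padding or comparison, so that a malformed value is reported as a format error rather than compared incorrectly.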
First off, any ASCII data field can accurately be described as an ASCII string (%s); it's just not particularly useful. That said, ISO time strings may contain a number of non-numeric characters ("T" and ":", plus "-" in weird places). An alternative to the "%23s" field_format value would have been something like:
However, the pattern defined for field_format (…)
We had some previous discussions about handling datetime strings (and their formatting and precision issues) in this issue: … I'm mostly pointing it out so that whatever solutions we come up with are consistent.
Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
Validate generates the following error when run on data files containing an ASCII time string:
ERROR [error.table.field_value_not_a_number] data object 2, record 1, field 1: Cannot cast field value '2004-183T04:11:26.809' to a Number data type to validate against the min/max values defined in the label.
The error message suggests that, when checking data values against the File_Area_Observational.Table_.Field_.Special_Constants.valid_minimum/maximum values, the Validate tool assumes that the valid_minimum/maximum values are numeric. In this example, the data values are time strings.
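One way to avoid the cast is to branch on data_type before any numeric conversion. This Java sketch is hypothetical and is not Validate's actual code. It assumes date/time types can be recognized by the `ASCII_Date` and `ASCII_Time` name prefixes (as in `ASCII_Date_Time_DOY_UTC`), and it shows that the numeric path throws `NumberFormatException` on the reported value, analogous to the `field_value_not_a_number` error.

```java
/** Hypothetical sketch: choose numeric or time-based min/max checking from the field's data_type. */
public class MinMaxRouting {

    public enum Strategy { NUMERIC, TIME }

    /**
     * Assumption: date/time types are recognized by name prefix; check the IM's list of
     * ASCII date/time types before relying on this rule.
     */
    public static Strategy strategyFor(String dataType) {
        boolean isTime = dataType.startsWith("ASCII_Date") || dataType.startsWith("ASCII_Time");
        return isTime ? Strategy.TIME : Strategy.NUMERIC;
    }

    /** Numeric path only: a time string reaching here fails the way the reported error describes. */
    public static boolean numericInRange(String value, String min, String max) {
        double v = Double.parseDouble(value.trim()); // throws NumberFormatException for "2004-183T04:11:26.809"
        return v >= Double.parseDouble(min.trim()) && v <= Double.parseDouble(max.trim());
    }

    public static void main(String[] args) {
        String dataType = "ASCII_Date_Time_DOY_UTC";
        String value = "2004-183T04:11:26.809";
        if (strategyFor(dataType) == Strategy.TIME) {
            System.out.println("route " + value + " to a time comparison");
        } else {
            System.out.println("numeric check: " + numericInRange(value, "0", "1"));
        }
    }
}
```

The TIME branch would then need either a parse-based or a zero-padded comparison, which is the choice the comments above are debating.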
🕵️ Expected behavior
In the case of time string data values, Validate should use an alphabetic rather than a numeric comparison to determine whether the data fall within the range defined by the valid_minimum/maximum attributes.
📜 To Reproduce
validate -t xmlfile -r logfile
🖥 Environment Info
📚 Version of Software Used
No response
🩺 Test Data / Additional context
The attached files include Table_Binary, Table_Character, and Table_Delimited examples, all of which produce similar results.
min_max_value_time_strings-20250212.zip
🦄 Related requirements
🦄 #xyz
Acceptance Criteria
Given
When I perform
Then I expect
⚙️ Engineering Details
No response
🎉 Integration & Test
No response