
Inconsistent Application of QARTOD QC to data sets #268

Closed
kerfoot opened this issue Aug 25, 2023 · 7 comments

@kerfoot
Contributor

kerfoot commented Aug 25, 2023

Per an inquiry email from Mike Crowley (Rutgers University Coastal Ocean Observation Lab):

"It is my understanding that the required and strongly recommended QC tests in the IOOS glider QARTOD manual (page 18) are not being applied to the real-time and delayed mode data streams coming into the DAC".

Following up with Mike, it is clear that there is concern among the data provider community and end users with respect to the application of QC to both real-time and delayed mode data sets. The following concerns and questions, all directly related to this issue, were highlighted:

  1. QARTOD algorithms are not being applied consistently to real-time and delayed mode data sets.
  2. When applied, the results of the QC tests are not propagated to all ERDDAP data set endpoints.
  3. When the results are stored, some results variables do not contain any values (all _FillValues), indicating that the test was not run. At a minimum, each record in a qartod_ variable should contain a flag value of 2 (NOT EVALUATED); see the audit sketch after this list.
  4. Old, out-of-date QC flag variables are still visible in all data sets and do not contain any results (all _FillValues), regardless of whether any test was actually performed:
    • density_qc
    • depth_qc
    • conductivity_qc
    • lat_qc
    • lon_qc
    • latitude_qc
    • longitude_qc
    • precise_lat_qc
    • precise_lon_qc
    • precise_time_qc
    • pressure_qc
    • salinity_qc
    • temperature_qc
    • time_qc
    • time_uv_qc
  5. Can the DAC provide documentation detailing the end-to-end QC process as it is currently implemented for both real-time and delayed mode data sets:
    • How often does the process run?
    • How are files that need to be QC'd identified?
    • How often/when is the ERDDAP data set description (<dataset /> XML) updated to reflect the inclusion of the corresponding qartod_ variables?
    • If a data set is marked as complete by the data provider and archived by NCEI, but the provider later uploads a new set of files, are the QC algorithms applied to the new files, are the results propagated to ERDDAP, and is a new NCEI archival package containing the updated data set created?
  6. Are the results of the applied QC flag values evaluated by the DAC to determine their accuracy and/or effectiveness in identifying suspect profiles?
  7. How are the results of the QC algorithms evaluated for efficacy by the DAC?
  8. Are the applied QC algorithms identifying suspect profiles, and does the DAC endorse the results?
  9. How are the results of the applied QC flags affecting data set integrity and accuracy?
  10. Are the qartod_ QC flags considered by NDBC when releasing the observations to the GTS?
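
To make items 3 and 4 concrete, here is a minimal audit sketch in Python. It assumes profile files have been downloaded locally from ERDDAP as netCDF; the file name is a hypothetical placeholder, and the `_qc`/`qartod_` naming conventions are taken from the variable list above. Flag meanings follow the standard QARTOD scheme.

```python
# Minimal audit sketch: report QC variables whose records are all _FillValue,
# i.e. tests that apparently never ran. The file name is a placeholder.
import numpy as np
from netCDF4 import Dataset

# Standard QARTOD flag scheme
QARTOD_GOOD = 1
QARTOD_NOT_EVALUATED = 2   # the minimum value every qartod_ record should carry (item 3)
QARTOD_SUSPECT = 3
QARTOD_FAIL = 4
QARTOD_MISSING = 9


def audit_qc_variables(nc_path):
    """Print, for each QC variable, whether it holds any flag values at all."""
    with Dataset(nc_path) as nc:
        for name, var in nc.variables.items():
            if not (name.endswith('_qc') or name.startswith('qartod_')):
                continue
            values = var[:]  # netCDF4 masks _FillValue records automatically
            if np.ma.count(values) == 0:
                print(f'{name}: ALL _FillValue -- test apparently never ran')
            else:
                flags, counts = np.unique(np.ma.compressed(values), return_counts=True)
                print(f'{name}: {dict(zip(flags.tolist(), counts.tolist()))}')


# Per item 3, a new qartod_ variable should default to NOT_EVALUATED rather
# than _FillValue before any test runs:
default_flags = np.full(100, QARTOD_NOT_EVALUATED, dtype=np.int8)

audit_qc_variables('glider_profile.nc')  # hypothetical local file
```

Running something like this across a sample of real-time and delayed mode data sets would quantify exactly how widespread items 3 and 4 are.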

This issue has been forwarded to Mike in order to keep him in the loop on status and/or progress.

@Acolohan

This is an issue I would prefer to discuss as a team on Wednesday. The primary concern is that there are multiple issues bundled into this single issue, plus duplication with other (earlier) open GitHub issues. Let's think about how we can identify the tasks needed first and then open new GitHub issues to track those tasks.

@kerfoot
Contributor Author

kerfoot commented Aug 28, 2023

The issue Mike filed is the most comprehensive with respect to QC. The answers to all of these questions should ultimately form the DAC's overall approach to QC going forward.

My recommendation, for whatever it's worth, is:

  1. Delete the following old, QC-focused issues, as they are all essentially duplicates of points/questions listed in the user's issue:

  2. If the desire is to break the user's issue into individual issues, we can do this. However, we need to make sure that all 10 separate items are considered in what should be a big-picture approach to QC going forward.

  3. Create a #qc tag and tag all of the individual issues in order to make sure they are considered collectively when being addressed (a scripted sketch using the GitHub REST API follows below).
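
For what it's worth, the tagging in item 3 could be scripted against the GitHub REST API. The sketch below is a hypothetical illustration: the repo slug, token, and the issue list are placeholders, not the DAC's actual values.

```python
# Hypothetical sketch: create a "qc" label and apply it to a set of issues via
# the GitHub REST API. REPO, TOKEN, and the issue list are placeholders.
import requests

REPO = 'ioos/glider-dac'   # placeholder repo slug
TOKEN = 'ghp_...'          # personal access token with repo scope
HEADERS = {
    'Authorization': f'token {TOKEN}',
    'Accept': 'application/vnd.github+json',
}

# Create the label (POST /repos/{owner}/{repo}/labels)
requests.post(
    f'https://api.github.com/repos/{REPO}/labels',
    headers=HEADERS,
    json={'name': 'qc', 'color': '0e8a16', 'description': 'QARTOD / QC process issues'},
)

# Apply it to each QC-related issue (POST /repos/{owner}/{repo}/issues/{n}/labels)
for issue_number in (268,):  # extend with the other QC issue numbers
    requests.post(
        f'https://api.github.com/repos/{REPO}/issues/{issue_number}/labels',
        headers=HEADERS,
        json={'labels': ['qc']},
    )
```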

Happy to discuss on Wednesday's call.

@Acolohan

Aye, we can talk about this Wed.

@dpsnowden
Contributor

@kerfoot re: your item 2. Yes, the goal of our exercise is to break issues into small enough chunks that they can be assigned to a single person and are relatively standalone. As written, the first issue description of this thread has ten items that may be related but are much bigger than any one person can tackle. Here are a few articles that describe various techniques for managing issues. I think this guide is an excellent summary of the what and the why behind writing good issues and how they make managing work across distributed teams more efficient. I particularly like the suggestion to create three different issue templates for the various types of issues we might encounter (bug, feature request, regular issue, aka unstructured).

I agree with the notion that we should review old issues for redundancy or lack of clarity, but I don't agree with deleting independent issues in favor of one large roll-up issue. I don't think you're suggesting that.

Labels could be a good solution for organizing similar concepts, but I also caution against going overboard and creating so many that they become chaotic. Here's a good default list to start with. Labels can be good for grouping common themes, but we shouldn't use them to track work; that's what milestones are for. @DonaldMoretti and @Acolohan are trying to devise the milestone or release plan for the next few months. Issues will be attached to milestones, which have dates. This is why it's so important to have granular issues that can be assigned to dates and people.

Thanks for leading the charge on QC; it's an important topic!

@kerfoot
Contributor Author

kerfoot commented Aug 30, 2023

@dpsnowden re: "I don't agree with deleting independent issues in favor of one large roll up issue. I don't think you're suggesting that".

Nope, I am not suggesting that. On our weekly check-in call this morning, we discussed and agreed that the issues noted above should be closed as OBE (overtaken by events), since they are all encapsulated within the larger 10-item issue I filed after in-person discussions with the clients, in which they voiced their desire for a clearer picture from the DAC of how QC is applied. I do not believe that we have a cohesive plan for applying reliable QC that we can defend in a public forum, or at least I'm not aware of one.

We also agreed to schedule an additional weekly check-in for technical discussions of how we will approach the issues that IOOS allows us to work on. This will provide some context on how big a lift the solution is, while also freeing up more time to discuss high-level goals and objectives on the weekly all-hands check-in. This call is scheduled for Thursdays at 10am, the day after the weekly all-hands check-in, so it allows us to attack fresh issues while providing the maximum amount of time before the next weekly all-hands check-in.

On tomorrow's technical call, we will further discuss closing the 5 issues as OBE and then reframing the larger 10-item issue into smaller, more manageable tasks, and we will provide some realistic estimates for accomplishing them. Then IOOS can decide if they want us to work on them. Once (if) they have been okayed, Don can assign the appropriate people and move them into their corresponding milestones. Then they get worked and documented.

@sarinamann-noaa

Leila will review this issue.
