Duplicate traces #27

Open
vpet98 opened this issue Dec 8, 2023 · 6 comments
@vpet98

vpet98 commented Dec 8, 2023

In some cases the webservice appears to return duplicate rows.

[Image: Screenshot from 2023-12-08 17-51-52]

In the above screenshot there are two identical rows for each channel for 2023-12-02. The WFCatalog database has only one entry for each of these channels for this day.

@jbienkowski
Member

Hi @vpet98,

Can you try querying the WFCatalog segments/daily streams related to these entries directly in the WFCat MongoDB to see whether the data is correct there?

Thanks,
J

@vpet98
Author

vpet98 commented Dec 12, 2023

The daily_streams entries of 2023-12-02 for this station look normal, the same as on other dates. Should I look for something specific?

In c_segments I am not sure what to query. If I am supposed to query by the _id value that I see in daily_streams, that returns nothing for the aforementioned daily_streams entries.

Also, there is an availability collection that does have 2 duplicate entries (with different _id) for this station on this date.

Not sure if these help. Please tell me if I need to search something more specific.

Thank you

@jbienkowski
Member

jbienkowski commented Dec 13, 2023

Hi @vpet98,

If you look at wfrepo, the WFCatalog c_segments and daily_streams collections should be consistent with the availability view. This means that the _id of any WFCatalog object should appear only once in the availability view:

  • For documents from daily_streams with avail >= 100 there should be 1 document in the availability view with the same ObjectId:

    wfrepo> db.daily_streams.findOne({avail: {$gte: 100}})
    {
      _id: ObjectId("575ec657e5f2c10393444345"),
      ...
    }

    wfrepo> db.availability.findOne({_id: ObjectId("575ec657e5f2c10393444345")})
    {
      _id: ObjectId("575ec657e5f2c10393444345"),
      ...
    }
  • If a document in daily_streams has avail < 100, the view script looks up all documents in c_segments that reference it and creates one corresponding document in the availability view:

    wfrepo> db.daily_streams.findOne({avail: {$lt: 100}})
    {
      _id: ObjectId("62a9bc82af4b924a293edce9"),
      ...
    }

    wfrepo> db.c_segments.findOne({streamId: ObjectId("62a9bc82af4b924a293edce9")})
    {
      _id: ObjectId("62a9bc82af4b924a293edcea"),
      ...
    }

    wfrepo> db.availability.findOne({_id: ObjectId("62a9bc82af4b924a293edcea")})
    {
      _id: ObjectId("62a9bc82af4b924a293edcea"),
      ...
    }

When the view is created we do a merge on _id ($merge: { into: "availability", on: "_id", whenMatched: "replace" }), so the same source object from WFCatalog should never have duplicates in the availability view.
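
To illustrate the merge semantics described above, here is a minimal in-memory sketch. It is not the actual views/main.js code: `mergeOnId` is a hypothetical helper that only models `$merge` with `on: "_id"`, `whenMatched: "replace"` and the default `whenNotMatched: "insert"`, and the ids and field names are made up. It suggests that reprocessing the same source document keeps one row, while a source document that comes back under a new _id would leave the old row behind.

```javascript
// In-memory model of $merge { on: "_id", whenMatched: "replace" } with the
// default whenNotMatched: "insert". Hypothetical helper, not views/main.js.
function mergeOnId(target, incoming) {
  const byId = new Map(target.map((doc) => [doc._id, doc]));
  for (const doc of incoming) {
    byId.set(doc._id, doc); // replace on match, insert otherwise
  }
  return [...byId.values()];
}

// First run: one source document for the day (ids and fields are made up).
let availabilityRows = mergeOnId([], [{ _id: "a1", cha: "HHZ", day: "2023-12-02" }]);

// Reprocessing the same source document: replaced in place, still one row.
availabilityRows = mergeOnId(availabilityRows, [{ _id: "a1", cha: "HHZ", day: "2023-12-02" }]);
const afterSameId = availabilityRows.length;

// Source document removed and re-added under a new _id: the old row stays.
availabilityRows = mergeOnId(availabilityRows, [{ _id: "b2", cha: "HHZ", day: "2023-12-02" }]);
const afterNewId = availabilityRows.length;

console.log(afterSameId, afterNewId); // prints: 1 2
```

Nothing in the merge step itself deletes the stale "a1" row, which would be consistent with duplicates appearing only after reprocessing.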

Could it be that those duplicate rows are the result of archive backfilling and WFCat reprocessing? Now that I think about it, it would make sense to improve the views/main.js script so that it removes the availability documents matching the request criteria before reprocessing.
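
As a rough sketch of that idea, the script could build a delete filter from the request criteria and apply it before the `$merge` step. Everything here is an assumption for illustration: `buildPurgeFilter` is a hypothetical helper, the field names `net`, `sta`, `ts` and `te` may not match the real availability schema, and "XX"/"STA1" are placeholder codes.

```javascript
// Build a filter for availability documents that overlap a request window.
// Field names (net, sta, ts, te) are assumptions; adjust to the real schema.
function buildPurgeFilter({ net, sta, start, end }) {
  const filter = {
    ts: { $lt: end },   // document starts before the window ends
    te: { $gt: start }, // and ends after the window starts
  };
  if (net) filter.net = net; // restrict by network only when requested
  if (sta) filter.sta = sta; // restrict by station only when requested
  return filter;
}

const purgeFilter = buildPurgeFilter({
  net: "XX",
  sta: "STA1",
  start: new Date("2023-12-02T00:00:00Z"),
  end: new Date("2023-12-03T00:00:00Z"),
});
console.log(JSON.stringify(purgeFilter));

// In mongosh the filter could then be applied before rebuilding the view:
// db.availability.deleteMany(purgeFilter)
```

Deleting by time window rather than by _id is the point of the sketch: stale documents with unknown ids would be removed along with the ones that are about to be regenerated.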

Let me know what you think,
Cheers,
Jarek

@vpet98
Author

vpet98 commented Dec 13, 2023

Hello,
Thanks for explaining this.

Ok, so if there is an _id in availability that does not exist in either of the other collections, it should probably be removed, right?
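
A minimal sketch of how such orphaned ids could be found, assuming the three id lists fit in memory. `findOrphanIds` is a hypothetical helper and the third id below is invented for the example. In mongosh the inputs could come from calls such as `db.availability.distinct("_id")`, though that may not scale to a large collection.

```javascript
// Return availability ids that match no daily_streams or c_segments id.
// Ids are compared as strings, so ObjectId values can be passed in as well.
function findOrphanIds(availabilityIds, dailyStreamIds, segmentIds) {
  const known = new Set([...dailyStreamIds, ...segmentIds].map(String));
  return availabilityIds.map(String).filter((id) => !known.has(id));
}

// The last availability id is made up and has no source document.
const orphanIds = findOrphanIds(
  ["575ec657e5f2c10393444345", "62a9bc82af4b924a293edcea", "000000000000000000000001"],
  ["575ec657e5f2c10393444345", "62a9bc82af4b924a293edce9"],
  ["62a9bc82af4b924a293edcea"]
);
console.log(orphanIds); // only the made-up third id is reported
```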

And I suppose this could have happened after archive backfilling and WFCatalog reprocessing, as you mentioned. Another possibility is that, for some reason, a daily_streams document was removed and later re-added with a different _id.

If this is the case, removing such availability documents before reprocessing seems like a very good way to avoid duplicate rows in the results.

@jbienkowski
Member

Nice, let me know if you want to improve the view creation script. Otherwise I'll do it, but it will probably be next year. :-)

@vpet98
Author

vpet98 commented Dec 14, 2023

Ok, if I come up with something at some point I will make a PR.
