-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid Generating Extra Gaps #102
base: develop
Are you sure you want to change the base?
Conversation
* update regions * remove bad eten peru anchorages * remove pipe-tools dependencies * reduce changes relative to master * switch to using Dockerfile-scheduler in Dockerfile compose * fix requirements and other docker infrastructure to run name_anchorages using Docker on Dataflow. * update numpy due to security vulnerability Co-authored-by: Nate Miller <[email protected]>
…ery and adjust the format string exp according to the amount of args
…lass in py8, shard port_state by last_timestamp
…ot be created directly
…ks superficially plausible
…am[gcp] to authenticate and remove numpy reference that comes already in the DF SDK 2.40.0
Improve memory efficiency
Moves to py3.8
…sting overrides and those Hannah commited to Master
…nd labes of output tables
Several changes to update the read/write transfors and descriptions and labels
@@ -68,7 +81,7 @@ def thin(self, grouped_records, anchorage_map): | |||
and rcd.timestamp - last_rcd.timestamp >= self.min_gap | |||
): | |||
active.add(last_rcd) | |||
active.add(rcd) | |||
active.add(rcd._replace(is_possible_gap_end=True)) | |||
|
|||
if self.transition_map[(last_state, state)]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought is that at this point the record is still being dropped due to thinning (if both points - before and after gap - are IN_PORT
) , no?
edit: the result would be that we'd get rid of gaps altogether - in theory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Chr96er , if there is a gap seen during the thinning it should keep the start and end of the gap in the thinned points, and (this is the key part) the end of the gap gets tagged with is_possible_gap_end
. Then in creare_in_out_events
, points with that tag, but no other points can emit gaps. So we should still get gaps when there's a legitimate gap.
It might make sense to only emit this pair of points when the current point is in port (since that's the only time we'd generate a gap), but I'm not sure it makes a difference one way or the other.
… which means not using it for last state. Also fix bug where we forgot to inherit from NamedTuple
…l to only emit one item for each message
…rect seg_id. Also clean up section that wasn't causing an issue, but was confusing and would likely lead to some bugs later
This PR attempts to reduce the number of extra gaps being generated by tagging messages during the thinning phase
with
is_possible_gap_end
. Only tagged messages that have this set to true can generate a gap event.Caution
This has not been tested. Do not merge without testing!
Changes
message_schema.py
: addis_possible_gap_end
column. Remove unuseddestination
column.smart_thin_records.py
: Convert VesselLocationRecords to VisitLocationRecords on input. These have the added fieldis_possible_gap_end
, which is initially set to False. When emitting points due to a gap, setis_possible_gap_end
to True for the gap end.port_visits_pipeline.py
: load messages as VisitLocationRecord rather the VesselLocationRecord to accommodateis_possible_gap_end
field.create_in_out_events.py
: Only generate gap events if the end point hasis_possible_gap_end
set.