All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- PIPELINE-1875: Ensure all fields whose mode is None are set as NULLABLE.
- PIPELINE-1875: Ensure destination tables of pipe_segment steps are created even without data.
- PIPELINE-1319: Adds support for running the code with python-3.8 and updates the Google SDK from 2.40.0 to 2.49.0.
- PIPELINE-946: Changes how the segment_identity_daily_ table is saved. The issue was detected when running in backfill mode: only one shard was saved per backfill. The fix saves one shard using summary_timestamp instead of the from/to args.
- PIPELINE-914: Changes version of Apache Beam from 2.35.0 to 2.40.0.
- PIPELINE-869: Removes deprecated code and the type check parameter in segment_identity_daily that allowed reading in EXPORT mode.
- PIPELINE-807: Changes to support Beam 2.35.0. Points to the last shipdataprocess SHA commit that supports python3. Separates the scheduler and worker Dockerfiles and creates separate images.
- PIPELINE-431: Removes Travis and its references and uses cloudbuild instead to run the tests. Uses gfw-pipeline as Docker base image. Updates pipe-tools to include the updated Beam reference used when reading the schema from JSON. Removes 4 warnings from tests.
- Data Pipeline/PIPELINE-84: Adds support of Apache Beam 2.28.0. Increments Google SDK version to 338.0.0. Fixes tests after update of Beam.
- Data Pipeline/PIPELINE-155: Adds a new data quality step for segment identity metrics.
- Data Pipeline/PIPELINE-129: Adds segmenter_params to the segment Airflow step and max_gap_size_value to the Airflow variable.
- Data Pipeline/PIPELINE-144: Changes gpsdio-segment version to use the latest fixed version, 0.20.2.
- Data Pipeline/PIPELINE-139: Changes the gpsdio-segment version to use the latest fixed version, 0.20.1.
- GlobalFishingWatch/gfw-eng-task#129: Adds a flag to enable or disable the run of the aggregation tables segment_info, segment_vessel, and vessel_info.
- GlobalFishingWatch/gfw-eng-task#56: Changes the use of the Airflow Variable PIPELINE_START_DATE to the value that is stored in defaults_args as start_date.
- GlobalFishingWatch/gfw-eng-task#111: Changes the version of pipe-tools to v3.1.2.
- the version of the
- GlobalFishingWatch/gfw-eng-task#48: Changes Bash Operator to flexible operator. Changes the version to gpsdio-segment:0.20.
- GlobalFishingWatch/gfw-eng-tasks#49: Changes the pinned version of gpsdio-segment to the v0.19 non-dev version.
- GlobalFishingWatch/pipe-segment/pull/101: Adds
  - Support for the new version of gpsdio-segment, while continuing to emit old-style segments for backwards compatibility. See the PR for details.
  - Significantly improved memory usage by cogrouping messages rather than passing them as side arguments. Noise segments are also filtered out before grouping, and more temporary shards are used on output.
  - Update to pipe-tools 3.1.1 and support for Python 3.
  - Upgrade of the Google SDK from 232 to 268.
  - Removal of the pin that fixed pip to version 9.
- GlobalFishingWatch/GFW-Tasks#1030: Changes the way we pass the machine type to Dataflow so that we can again send a custom machine type.
- GlobalFishingWatch/GFW-Tasks#1015: Changes the version of gpsdio-segment to include the fix for A vs B messages.
- GlobalFishingWatch/GFW-Tasks#1000: Changes the serializing and deserializing of segmenter state each day to force ordering, so that the segmenter state timestamp is correctly calculated.
- GlobalFishingWatch/GFW-Tasks#991: Adds version 2.0.0 of pipe-tools, which splits Airflow dependencies from Dataflow dependencies. Check the repo.
- GlobalFishingWatch/GFW-Tasks#992: Adds a fix for an issue with a dependency in gpsdio-segment.
- #80 and #77 Take into account whether the message is of type A or B when generating the segment. Uses the change made in GPSDIO version 0.12.
- #83 Add vessel_id field to segment_info table
- #87 Increase the noise threshold for determination of spoofing, and parameterize it.
- GlobalFishingWatch/GFW-Tasks#982 Include width and length of vessels in the segment_info, vessel_info, vessel_identity_daily and segment_identity_daily tables
- GlobalFishingWatch/GFW-Tasks#979 Include the Yearly run mode.
- DEPRECATED segment_identity and identity_messages_monthly.
- #66 Refactor Segment Identity
- #71 Add param MOST_COMMON_MIN_FREQ, which is used to filter noise values when determining the most commonly occurring identity value used to assign vessel_id
- #76 Ranked vessel_id per segment in segment_vessel table
- #61 Include additional noise and message count fields in segment_info table
- #68 Bump version of pipe-tools to 0.1.7
- #53 Improved Vessel ID creation scheme vessel_info table
- #50 Force SSVID to string before segmenting
- #44 pin pip version to 9.0.3
- #45 Change dataflow machine type to increase memory
- #47 Update to pipe-tools v0.1.6
- #35 Importable Dags. Update to pipe-tools v0.1.4
- Initial release.