This repository has been archived.
Repo for exploring the use of Apache Beam as the orchestrator for OGDC recipes.
The repo currently focuses on following along with the Beam "getting started" materials: https://beam.apache.org/get-started/
To start, run the built-in copy of the word-count example to make sure that Apache Beam is correctly installed:
python -m apache_beam.examples.wordcount_minimal \
--input data/words.txt \
--output data/wordcounts_official_example.txt
This outputs a file wordcounts_official_example.txt-00000-of-00001. Why doesn't it match the requested output file name?
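A likely answer: the example passes --output to Beam's WriteToText, which treats the value as a file path prefix and appends a shard suffix from its default shard_name_template, '-SSSSS-of-NNNNN'. The sketch below is not Beam's code; it is a stdlib-only illustration of how such a template expands, and the helper name expand_shard_template is made up for this README.

```python
import re


def expand_shard_template(prefix, shard_num, num_shards, template="-SSSSS-of-NNNNN"):
    """Illustrative only: mimic how a sharded sink could derive an output file name.

    `expand_shard_template` is a hypothetical helper, not part of Apache Beam.
    """
    # Replace each run of 'S' with the zero-padded shard index.
    name = re.sub(r"S+", lambda m: str(shard_num).zfill(len(m.group())), template)
    # Replace each run of 'N' with the zero-padded total shard count.
    name = re.sub(r"N+", lambda m: str(num_shards).zfill(len(m.group())), name)
    return prefix + name


# One shard (index 0) out of one total, as in the local run above.
print(expand_shard_template("data/wordcounts_official_example.txt", 0, 1))
```

If an exact file name is wanted, WriteToText accepts shard_name_template='' (which writes a single file with no suffix), though that presumably gives up parallel writes on larger runs.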
Next, run this repo's own version of the word-count example:
python -m wordcount_example \
--input data/words.txt \
--output data/wordcounts_our_example.txt
The output file looks the same as the output file from the above example. There is significantly less log output, however. Why is that?
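One possible explanation, stated as an assumption rather than something verified here: the official example appears to raise the root logger level to INFO in its __main__ block, while our copy may leave Python's default of WARNING in place, so INFO records are dropped. The stdlib-only sketch below shows the effect of the level on an INFO message; run_with_level and the logger names are hypothetical and unrelated to Beam.

```python
import io
import logging


def run_with_level(level):
    """Log one INFO record at the given logger level and return the captured text."""
    stream = io.StringIO()
    # Hypothetical demo logger, isolated from the root logger.
    logger = logging.getLogger(f"demo_{level}")
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(level)
    logger.info("pipeline step finished")
    return stream.getvalue()


# At WARNING (Python's default for the root logger) the INFO record is dropped.
print(repr(run_with_level(logging.WARNING)))
# At INFO the same record is emitted.
print(repr(run_with_level(logging.INFO)))
```

If this is the cause, adding logging.getLogger().setLevel(logging.INFO) before the pipeline runs in wordcount_example should bring back the missing output.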
python -m seal_csv_to_gpkg