Fix typos, format and grammar #21

4 changes: 2 additions & 2 deletions _config.yml
@@ -2,10 +2,10 @@ title: Metafacture Tutorial
description: This is a tutorial to Metafacture.
theme: just-the-docs

url: https://metafacture.github.io/metafacture-documentation
url: https://metafacture.github.io/metafacture-tutorial

aux_links:
Metafacture Documentation on Github: https://github.com/metafacture/metafacture-tutorial
Metafacture Tutorial on Github: https://github.com/metafacture/metafacture-tutorial

# External navigation links
nav_external_links:
47 changes: 23 additions & 24 deletions docs/02_Introduction_into_Metafacture-Flux.md
@@ -38,7 +38,7 @@ See the result below? It is `Hello, friend. I'am Metafacture!`.
But what have we done here?
We have a short text string `"Hello, friend. I'am Metafacture"`. That is printed with the module `print`.
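
Written out as a Flux workflow, this minimal example is simply:

```text
"Hello, friend. I'am Metafacture!"
| print
;
```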

A Metafacture Workflow is nothing else than an incoming text string that is manipulated by one or multiple moduls that do something with the incoming string.
A Metafacture Workflow is nothing else than an incoming text string that is manipulated by one or multiple modules that do something with the incoming string.
However, the workflow does not have to start with a literal text string; it can also start with a variable that stands for the text string and is defined before the workflow. Like this:

```text
inputFile
| open-file
| as-lines
| print
;
```

The inputFile is opened as a file (`open-file`) and then processed line by line (`as-lines`).
You can see that in this [sample](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cprint%0A%3B&data=Hello%2C+friend.+I%27am+Metafacture%21).
https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+as-lines%0A%7C+print%0A%3B&data=Hello%2C+friend.+I%27am+Metafacture%21
Have a look at this [sample](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cprint%0A%3B&data=Hello%2C+friend.+I%27am+Metafacture%21).

We usually do not start with random text strings but with data. So let's play around with some data.

@@ -108,7 +107,7 @@ You will see data that look like this:

This is data in JSON format. But it does not seem very readable.

But all these fields tell something about a publication, a book, with 268 pages and title Ordinary Vices by Judith N. Shklar.
All these fields tell us something about a publication, a book, with 268 pages and title "Ordinary Vices" by Judith N. Shklar.

Let's copy the JSON data into our `ìnputFile-content` field. [And run it again](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cprint%0A%3B&data=%7B%22publishers%22%3A+%5B%22Belknap+Press+of+Harvard+University+Press%22%5D%2C+%22identifiers%22%3A+%7B%22librarything%22%3A+%5B%22321843%22%5D%2C+%22goodreads%22%3A+%5B%222439014%22%5D%7D%2C+%22covers%22%3A+%5B413726%5D%2C+%22local_id%22%3A+%5B%22urn%3Atrent%3A0116301499939%22%2C+%22urn%3Asfpl%3A31223009984353%22%2C+%22urn%3Asfpl%3A31223011345064%22%2C+%22urn%3Acst%3A10017055762%22%5D%2C+%22lc_classifications%22%3A+%5B%22JA79+.S44+1984%22%2C+%22HM216+.S44%22%2C+%22JA79.S44+1984%22%5D%2C+%22key%22%3A+%22/books/OL2838758M%22%2C+%22authors%22%3A+%5B%7B%22key%22%3A+%22/authors/OL381196A%22%7D%5D%2C+%22ocaid%22%3A+%22ordinaryvices0000shkl%22%2C+%22publish_places%22%3A+%5B%22Cambridge%2C+Mass%22%5D%2C+%22subjects%22%3A+%5B%22Political+ethics.%22%2C+%22Liberalism.%22%2C+%22Vices.%22%5D%2C+%22pagination%22%3A+%22268+p.+%3B%22%2C+%22source_records%22%3A+%5B%22marc%3AOpenLibraries-Trent-MARCs/tier5.mrc%3A4020092%3A744%22%2C+%22marc%3Amarc_openlibraries_sanfranciscopubliclibrary/sfpl_chq_2018_12_24_run01.mrc%3A195791766%3A1651%22%2C+%22ia%3Aordinaryvices0000shkl%22%2C+%22marc%3Amarc_claremont_school_theology/CSTMARC1_barcode.mrc%3A137174387%3A3955%22%2C+%22bwb%3A9780674641754%22%2C+%22marc%3Amarc_loc_2016/BooksAll.2016.part15.utf8%3A115755952%3A680%22%2C+%22marc%3Amarc_claremont_school_theology/CSTMARC1_multibarcode.mrc%3A137367696%3A3955%22%2C+%22ia%3Aordinaryvices0000shkl_a5g0%22%2C+%22marc%3Amarc_columbia/Columbia-extract-20221130-001.mrc%3A328870555%3A1311%22%2C+%22marc%3Aharvard_bibliographic_metadata/ab.bib.01.20150123.full.mrc%3A156768969%3A815%22%5D%2C+%22title%22%3A+%22Ordinary+vices%22%2C+%22dewey_decimal_class%22%3A+%5B%22172%22%5D%2C+%22notes%22%3A+%7B%22type%22%3A+%22/type/text%22%2C+%22value%22%3A+%22Bibliography%3A+p.+251-260.\nIncludes+index.%22%7D%2C+%22number_of_pages%22%3A+268%2C+%22languages%22%3A+%5B%7B%22key%22%3A+%22/languages/eng%22%7D%5D%2C+%22lccn%22%3A+%5B%2284000531%22%5D%2C+%22isbn_10%22%3A+%5B%220674641752%22%5D%2C+%22publish_date%22%3A+%221984%22%2C+%22publish_country%22%3A+%22mau%22%2C+%22by_statement%22%3A+%22Judith+N.+Shklar.%22%2C+%22works%22%3A+%5B%7B%22key%22%3A+%22/works/OL2617047W%22%7D%5D%2C+%22type%22%3A+%7B%22key%22%3A+%22/type/edition%22%7D%2C+%22oclc_numbers%22%3A+%5B%2210348450%22%5D%2C+%22latest_revision%22%3A+16%2C+%22revision%22%3A+16%2C+%22created%22%3A+%7B%22type%22%3A+%22/type/datetime%22%2C+%22value%22%3A+%222008-04-01T03%3A28%3A50.625462%22%7D%2C+%22last_modified%22%3A+%7B%22type%22%3A+%22/type/datetime%22%2C+%22value%22%3A+%222024-12-27T16%3A46%3A50.181109%22%7D%7D).

@@ -117,14 +116,12 @@ The output in result is the same as the input and it is still not very readable.
Let's turn the single line of JSON data into YAML. YAML is another format for structured information which is a bit easier to read for human eyes.
In order to change the serialization of the data we need to decode the data and then encode the data.

Metafacture has lots of decoder and encoder modules for specific data formats that can be used in an Flux workflow.
Metafacture has lots of decoder and encoder modules for specific data formats that can be used in a Flux workflow.

Let's try this out. Add the modules `decode-json` and `encode-yaml` to your Flux workflow.

The Flux should now look like this:

Flux:

```text
inputFile
| open-file
| as-lines
| decode-json
| encode-yaml
| print
;
```
@@ -217,7 +214,7 @@ Luckily, we cannot only open the data we have in our `inputFile-content` field,

Clear your playground and copy the following Flux workflow:

```
```text
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
| encode-yaml
| print
;
```

The [result in the playground](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+encode-yaml%0A%7C+print%0A%3B) should be the same as before without having to paste anything into the text field. We just used the module `open-http` and directly retrieved the data from the URL.
The [result in the playground](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+encode-yaml%0A%7C+print%0A%3B) should be the same as before without having to paste anything into the text field. We just used the module `open-http` to directly retrieve the data from the URL.

Let's take a look what a Flux workflow does. The Flux workflow is combination of different moduls to process incoming structured data. In our example we have different things that we do with these modules:
Let's take a look at what a Flux workflow does. The Flux workflow is a combination of different modules to process incoming structured data. In our example we have different things that we do with these modules:

1. We have a URL as input. The URL locates the data on the web.
2. We tell Metafacture to request the stated url using `open-http`.
2. We tell Metafacture to request the stated URL using `open-http`.
3. Then we define how to handle the incoming data: since the JSON is written in one line, we tell Metafacture to regard every new line as a new record with `as-lines`.
4. Afterwards we tell Metafacture to `decode-json` in order to translate the incoming data as json to the generic internal data model that is called metadata events
4. Afterwards we tell Metafacture to `decode-json` in order to translate the incoming JSON into the generic internal data model that is called metadata events.
5. Then we instruct Metafacture to serialize the metadata events as YAML with `encode-yaml`.
6. Finally, we tell Metafacture to `print` everything.

So let's have a small recap of what we done and learned so far: * We played around with the Metafacture Playground.
* We learned that a Metafacture Flux workflow is a combination of modules with an inital text string or an variable.
So let's have a small recap of what we've done and learned so far:

* We've played around with the Metafacture Playground.
* We've learned that a Metafacture Flux workflow is a combination of modules with an initial text string or a variable.
* We got to know different modules like `open-http`, `as-lines`, `decode-json`, `encode-yaml` and `print`.

More modules can be found in the [documentation of available flux commands](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.html).
More modules can be found in the [documentation of available flux commands](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html).

Now take some time and play around a little bit more and use some other modules.

@@ -268,16 +267,16 @@ Now take some time and play around a little bit more and use some other modules.
What you see with the modules `encode-formeta` and `write` is that modules can take further specifications in brackets.
These can either be a string in `"..."` or attributes that define options, as with `style=`.
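
As a sketch of how such options look in practice (the style value and the output target here are assumptions, not taken from the exercise above):

```text
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
| encode-formeta(style="multiline")
| write("stdout")
;
```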

One last thing you should learn on an abstract level is to grasp the general idea of Metafacture Flux workflows is that they have many different moduls through which the data is flowing.
The most abstract and most common process resemble the following steps:
One last thing you should learn on an abstract level to grasp the general idea of Metafacture Flux workflows is that they have many different modules through which the data is flowing.
The most abstract and most common process resembles the following steps:

**→ read → decode → transform → encode → write →**

This process is one that transforms incoming data in a way that is changed at the end.
This process chain transforms incoming data in distinct steps.
Each step can be done by one or a combination of multiple modules.
Modules are small tools that do parts of the complete task we want to do.
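
Mapped onto the workflow from above, the steps line up roughly like this (the `//` comments are only annotations; the transform step is skipped for now):

```text
"https://openlibrary.org/books/OL2838758M.json" // input
| open-http   // read
| as-lines    // split the stream into records, one per line
| decode-json // decode into metadata events
| encode-yaml // encode
| print       // write
;
```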

Each modul demands a certain input and give a certain output. This is called signature.
Each module demands a certain input and gives a certain output. This is called a signature.
e.g.:

The first module `open-file` expects a string and provides read data (called a reader).
@@ -286,12 +285,12 @@ This reader data can be passed on to a modul that accepts reader data e.g. in ou

If you have a look at the Flux module/command documentation, you will see under `signature` which data a module expects and which data it outputs.
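
For orientation, the signatures of the modules used so far look roughly like this (paraphrased; check the linked documentation for the authoritative form):

```text
open-file   : String -> Reader
as-lines    : Reader -> String
decode-json : String -> StreamReceiver
```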

The combination of moduls is a Flux workflow.
The combination of modules is called a "Flux workflow".

Each module is separated by a `|` and every workflow ends with a `;`.
Comments can be added with `//`.

See:
For example:

```
//input string:
@@ -319,7 +318,7 @@ Add the option: <code>prettyPrinting="true"</code> to the <code>encode-json</cod



2) Have a look at documentation of [`decode-xml`](https://metafacture.org/metafacture-documentation/docs/flux/flux-commands.html#decode-xml) what is different to `decode-json`? And what input does it expect and what output does it create (Hint: signature)?
2) Have a look at the documentation of [`decode-xml`](https://metafacture.org/metafacture-documentation/docs/flux/flux-commands.html#decode-xml). How is it different from `decode-json`? And what input does it expect and what output does it create (hint: signature)?

<details>
<summary>Answer</summary>
@@ -329,7 +328,7 @@ The signature of <code>decode-xml</code> and <code>decode-json</code> is quiet d
<code>decode-json</code>: signature: String -> StreamReceiver

Explanation:
<code>decode-xml</code> expects data from Reader output of <code>open-file</code> or <code>open-http</code>, and creates output that can be transformed by a specific xml <code>handler</code>. The xml parser of <code>decode-xml</code> works straight with read content of a file or a url.
<code>decode-xml</code> expects data from the Reader output of <code>open-file</code> or <code>open-http</code>, and creates output that can be transformed by a specific XML <code>handler</code>. The XML parser of <code>decode-xml</code> works directly on the content read from a file or a URL.

<code>decode-json</code> expects string data from modules like <code>as-lines</code> or <code>as-records</code> and creates output that can be transformed by <code>fix</code> or encoded with a module like <code>encode-xml</code>. For most decoding you have to specify how the incoming data is read (<code>as-lines</code> or <code>as-records</code>).
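
A minimal sketch of how <code>decode-xml</code> is typically wired up (the URL is a placeholder and the handler depends on the XML flavor, here MARCXML):

```text
"https://example.org/records.xml"
| open-http
| decode-xml
| handle-marcxml
| encode-yaml
| print
;
```
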
</details>
@@ -354,7 +353,7 @@ Explanation:

As you surely already saw, I mentioned transform as one step in a Metafacture workflow.

But aside from changing the serialisation we did not play around with transformations yet.
But aside from changing the serialization we did not play around with transformations yet.
This will be the theme of the next session.

---------------
28 changes: 14 additions & 14 deletions docs/03_Introduction_into_Metafacture-Fix.md
@@ -7,15 +7,15 @@ parent: Tutorial

# Lesson 3: Introduction into Metafacture Fix

In the last session we learned about Flux moduls.
Flux moduls can do a lot of things. They configure the "high-level" transformation pipeline.
In the last session we learned about Flux modules.
Flux modules can do a lot of things. They configure the "high-level" transformation pipeline.

But the main transformation of incoming data at record, elemenet and value level is usually done by the transformation moduls [Fix](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#fix) or [Morph](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#morph) as one step in the pipeline.
But the main transformation of incoming data at record, element and value level is usually done by the transformation modules [Fix](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#fix) or [Morph](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#morph) as one step in the pipeline.

By transformation we mean things like:

* Manipulating element names and element values
* Change hierachies and structures of records
* Changing hierarchies and structures of records
* Looking up values in concordance lists

But not changing the serialization; that is part of encoding and decoding.
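
To make these three kinds of transformation concrete, here is a small, hypothetical Fix sketch (the field names and the concordance file are made up): `upcase` manipulates a value, `move_field` changes the structure, and `lookup` replaces values via a concordance list.

```perl
upcase("title")
move_field("type.key", "pub_type")
lookup("publish_country", "country-codes.tsv")
```
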
@@ -47,10 +47,10 @@ You should end up with something like:
title: "Ordinary vices"
```

The Fix module, called by `fix`, in Metafacture is used to manipulate the input data filtering fields we would like to see. Only one fix-function was used: `retain`, which throws away all the data from the input except the stated `"title"` field. Normally all incoming data is passed through, unless it is somehow manipulated or a `retain` function is used.
The Fix module, called by `fix`, is used to manipulate the input data, filtering the fields we would like to see. Only one Fix function was used: `retain`, which throws away all the data from the input except the stated `"title"` field. Normally all incoming data is passed through, unless it is somehow manipulated or a `retain` function is used.

HINT: As long as you embedd the fix functions in the Flux Workflow, you have to use double quotes to fence the fix functions,
and single quotes in the fix functions. As we did here: `fix ("retain('title')")`
HINT: As long as you embed the Fix functions in the Flux workflow, you have to use double quotes to fence the Fix functions
and single quotes inside the Fix functions, as we did here: `fix ("retain('title')")`
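
Putting it together, the complete workflow at this point looks something like this (a sketch that assumes the Open Library example from the previous lesson):

```text
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
| fix("retain('title')")
| encode-yaml
| print
;
```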

Now let us additionally keep the info that is given in the element `"publish_date"` and the subfield `"key"` in `'type'` by adding `'publish_date', 'type.key'` to `retain`:
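
In the embedded Fix this would look roughly like:

```perl
retain("title", "publish_date", "type.key")
```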

@@ -76,9 +76,9 @@ notes:

```

When manipulating data you often need to create many fixes to process a data file in the format and structure you need. With a text editor you can write all fix functions in a singe separate Fix file.
When manipulating data you often need to create many Fixes to process a data file in the format and structure you need. With a text editor you can write all Fix functions in a single separate Fix file.

The playground has an transformationFile-content area that can be used as if the Fix is in a separate file.
The playground has a transformationFile-content area that can be used as if the Fix is in a separate file.
In the playground we use the variable `transformationFile` to address the Fix file.

Like this:
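
The Flux side then references the Fix file through that variable, as in this sketch (matching the playground links used later in this lesson):

```text
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
| fix(transformationFile)
| encode-yaml
| print
;
```
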
@@ -93,16 +93,16 @@ retain("title", "publish_date", "notes.value", "type.key")

Using a separate Fix file is recommended if you need to write many Fix functions. It will keep the Flux workflow clear and legible.

To add more fixes we can again edit the Fix file.
To add more Fixes we can again edit the Fix file.
Let's add this line in front of the `retain` function:

```
```perl
move_field("type.key", "pub_type")
```

Also change the `retain` function so that you keep the new element `"pub_type"` instead of the nested `"key"` element, which no longer exists.

```
```perl
move_field("type.key","pub_type")
retain("title", "publish_date", "notes.value", "pub_type")
```
Expand All @@ -121,7 +121,7 @@ notes:
With `move_field` we moved and renamed an existing element.
As a next step, add the following function before the `retain` function.

```
```perl
replace_all("pub_type","/type/","")
```
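
Applied to the record above, `replace_all` strips the `/type/` prefix from the value, roughly:

```text
# before
pub_type: "/type/edition"
# after
pub_type: "edition"
```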

@@ -169,7 +169,7 @@ retain("title", "publish_date", "pub_type")

2) [Add a field with todays date called `"map_date"`.](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+fix+%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22type.key%22%2C%22pub_type%22%29%0Areplace_all%28%22pub_type%22%2C%22/type/%22%2C%22%22%29%0A...%28%22mape_date%22%2C%22...%22%29%0Aretain%28%22title%22%2C+%22publish_date%22%2C+%22by_statement%22%2C+%22pub_type%22%29)

Have a look at the fix functions: https://metafacture.org/metafacture-documentation/docs/fix/Fix-functions.html (Hint: you could use `add_field` or `timestamp`. And don't forget to add the new element to `retain`)
Have a look at the [Fix functions](https://metafacture.org/metafacture-documentation/docs/fix/Fix-functions.html). (Hint: you could use `add_field` or `timestamp`. And don't forget to add the new element to `retain`)


<details>