Only some minor formatting changes. NSD submission version.
nheeren committed Sep 27, 2018
1 parent bc72310 commit 42ace37
Showing 3 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -14,7 +14,7 @@ In order to make a (data) contribution to this repository please consider the fo
4. Adhere to the given CSV [data format](#Data Format)
5. Create a [pull request](https://help.github.com/articles/creating-a-pull-request/). It is also possible to [edit files online on github](https://help.github.com/articles/editing-files-in-another-user-s-repository/), which may be inconvenient for larger changes.

We encourage the use of the [liberated_data project](https://github.com/nheeren/liberated_data) if you are extracting data from non-portable, such as figures or tables in PDF files.
Data can be converted or aggregated. Any alteration to the original data must be documented in one of the designated comment columns. We encourage the use of the [liberated_data project](https://github.com/nheeren/liberated_data) if you are extracting data from non-portable, such as figures or tables in PDF files.

## Data format

4 changes: 2 additions & 2 deletions codebook.md
@@ -20,7 +20,7 @@ Please also consider the technical file formatting guidelines described in the [

## Special values

This section explains the logic for resolving missing or ambiguous values. It is differentiated between *no observation* and *no information*. Therefore, missing values have different notation with `NULL` referring to the former and `NA` to the latter. The following figure illustrates the decision process. `NULL` is the default value for empty or missing values. Only if the data contributor is certain that the data source does not contain a value the value `NA` is used. In case the data source contains a references to the parameter in question, but does not provide a numerical value the value `unspecified` is used. Otherwise the numerical value is used. The number `0`must only be used if it is specified or implied as such.
This section explains the logic for resolving missing or ambiguous values. It is differentiated between *no observation* and *no information*. Therefore, missing values have different notation with `NULL` referring to the former and `NA` to the latter. The following figure illustrates the decision process. `NULL` is the default value for empty or missing values. Only if the data contributor is certain that the data source does not contain a value the value `NA` is used. In case the data source contains a reference to the parameter in question, but does not provide a numerical value the value `unspecified` is used. Otherwise the numerical value is used. The number `0`must only be used if it is specified or implied as such.

![special_values](doc/figures/special_values.png)

@@ -29,7 +29,7 @@ In summary:
- `empty value`: The database must not contain any empty values.
- `NULL`: Missing value that has *no observation*. This is the default empty value of cells in a new column or row. That means, the parameter was not evaluated by the person providing the data. For example, this is the default value if a new column is added to the database. Without revisiting the studies it is not possible to make a judgement on the values and all rows would therefore be NULL. The same applies if a data contributor decides not to provide the (optional) secondary data attributes – they need to be NULL. Ideally there should be no NULL valued cells in the database and contributors are encouraged to resolve NULL values.
- `NA`: Missing value that has *no information*. That means no data was provided, is not applicable, or could not be attributed. This implies that the data contributor looked for the data in the source, but no (suitable) value was found. An example: If a study on buildings reported only steel in reinforced concrete buildings, then the 'concrete' column will be 'NA', since no value for concrete is present. It is at the contributor's discretion to calculate the concrete from the available numbers and mention the calculation steps in the comment column.
- `unspecified`: The data source contains an explicit unspecified value, , such as "unspecified", "not available", "-", "unknown", "unclear", "trace amounts", "some", etc. This means that the data creators considered this attribute but have not provided a numerical value (zero or non-zero number). An example: In a study on a building the data creators state that copper content is known to be part of the building in an unknown amount shall have 'unspecified' in the corresponding column.
- `unspecified`: The data source contains an explicit unspecified value, such as "unspecified", "not available", "-", "unknown", "unclear", "trace amounts", "some", etc. This means that the data creators considered this attribute but have not provided a numerical value (zero or non-zero number). An example: In a study on a building the data creators state that copper content is known to be part of the building in an unknown amount shall have 'unspecified' in the corresponding column.
- `0`: A zero value is simply maintained as the number zero (0). However, it must only be used if the number has been measured and provided in the data source. It must not be used as a placeholder for missing values.

The release 1.0 version database contains a number of NULL values as the authors added these attributes at a later stage of the project.
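
The decision process described in the codebook.md hunk above can be sketched as a small function. This is an illustration only and not part of the commit or the repository: the function name `resolve_special_value` and its parameters are hypothetical, and the sketch assumes the contributor can answer three questions per cell (was the parameter evaluated, does the source mention it, and does the source give a number).

```python
from typing import Optional, Union


def resolve_special_value(
    evaluated: bool,
    source_mentions_parameter: bool = False,
    numeric_value: Optional[float] = None,
) -> Union[str, float]:
    """Hypothetical helper mirroring the special-values rules of the codebook.

    evaluated: the contributor actually looked for this parameter in the source.
    source_mentions_parameter: the source refers to the parameter in some way.
    numeric_value: the number stated in the source, if any (0 is a valid number).
    """
    if not evaluated:
        # Default for cells nobody has checked yet: no observation.
        return "NULL"
    if not source_mentions_parameter:
        # The contributor looked and found nothing suitable: no information.
        return "NA"
    if numeric_value is None:
        # The source mentions the parameter but gives no number.
        return "unspecified"
    # A stated number, including an explicitly reported zero, is kept as-is.
    return numeric_value
```

Note that `0` is returned only when it is passed in as `numeric_value`; the sketch never substitutes zero for a missing value, in line with the rule stated for `0`.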
4 changes: 2 additions & 2 deletions doc/figures/special_values.md
@@ -4,14 +4,14 @@ Hopefully Github will enable [rendering of graphs](https://github.com/github/mar

## special_values figure

This is the code for the [mermaid graph]( https://mermaidjs.github.io) in the [codebook.md](../codebook.md) file.
This is the code for the [mermaid graph]( https://mermaidjs.github.io) in the [codebook.md](../../codebook.md) file.



```mermaid
graph TD
%% Items
graph TD
NULL["#quot;NULL#quot;<br/>(no observation)"]
has_info[<i>Does the data source<br/>contain any information?</i>]
NA["#quot;NA#quot;</br>(no information)"]
