README + process.md: Update directory name and parameters to current + fix Markdown warnings #830

Open: wants to merge 7 commits into base `main`
10 changes: 5 additions & 5 deletions CHANGELOG.md
@@ -141,12 +141,12 @@ Starting a Changelog.

* Dropped Python 2 support. Python >= 3.6 is now required.
* Added `pyspdxtools_convertor` and `pyspdxtools_parser` CLI scripts. See [the readme](README.md) for usage instructions.
* Updated the tools to support SPDX versions up to 2.3 and to conform with the specification. Apart from many bugfixes
* Updated the tools to support SPDX versions up to 2.3 and to conform with the specification. Apart from many bugfixes
and new properties, some of the more significant changes include:
* Support for multiple packages per document
* Support for multiple checksums for packages and files
* Support for files outside a package
* **Note**: Validation was updated to follow the 2.3 specification. Since there is currently no support for
* Support for multiple packages per document
* Support for multiple checksums for packages and files
* Support for files outside a package
* **Note**: Validation was updated to follow the 2.3 specification. Since there is currently no support for
version-specific handling, some details may be handled incorrectly for documents using lower
versions. The changes are mostly restricted to properties becoming optional and new property values becoming
available, and should be of limited impact. See https://spdx.github.io/spdx-spec/v2.3/diffs-from-previous-editions/
31 changes: 21 additions & 10 deletions CONTRIBUTING.md
@@ -15,7 +15,7 @@ intention prior to creating a patch.

## Development process

We use the GitHub flow that is described here: https://guides.github.com/introduction/flow/
We use the GitHub flow that is described here: <https://guides.github.com/introduction/flow/>

Here's the process to make changes to the codebase:

@@ -30,15 +30,19 @@ Here's the process to make changes to the codebase:
and optionally follow the further steps described to sync your fork and the original repository.

4. Create a new branch in your fork and set up environment:

```sh
git checkout -b fix-or-improve-something
python -m venv ./venv
source ./venv/bin/activate
pip install -e ".[development]"
```
Note: By using the group `[development]` for the installation, all dependencies (including optional ones) will be
installed. This way we make sure that all tests are executed.

Note: By using the group `[development]` for the installation, all dependencies (including optional ones) will be
installed. This way we make sure that all tests are executed.

5. Make some changes and commit them to the branch:

```sh
git commit --signoff -m 'description of my changes'
```
@@ -49,46 +53,53 @@ Here's the process to make changes to the codebase:
of [the Developer Certificate of Origin](https://developercertificate.org/). Git has utilities for signing off on
commits: `git commit -s` or `--signoff` signs a current commit, and `git rebase --signoff <revision-range>`
retroactively signs a range of past commits.

6. Test your changes:

```sh
pytest -vvs # in the repo root
```

7. Check your code style. When opening a pull request, your changes will automatically be checked with `isort`, `black`
and `flake8` to make sure your changes fit with the rest of the code style.
7. Check your code style. When opening a pull request, your changes will automatically be checked with `isort`, `black`
and `flake8` to make sure your changes fit with the rest of the code style.

```sh
# run the following commands in the repo root
isort src tests
isort src tests
black src tests
flake8 src tests
flake8 src tests
```
`black` and `isort` will automatically format the code and sort the imports. The configuration for these linters

`black` and `isort` will automatically format the code and sort the imports. The configuration for these linters
can be found in the `pyproject.toml`. `flake8` lists all problems found which then need to be resolved manually.
The configuration for the linter can be found in the `.flake8` file.

8. Push the branch to your fork on GitHub:

```sh
git push origin fix-or-improve-something
```

9. Make a pull request on GitHub.
10. Continue making more changes and commits on the branch, with `git commit --signoff` and `git push`.
11. When done, write a comment on the PR asking for a code review.
12. Some other developer will review your changes and accept your PR. The merge should be done with `rebase`, if
possible, or with `squash`.
13. The temporary branch on GitHub should be deleted (there is a button for deleting it).
14. Delete the local branch as well:

```sh
git checkout master
git pull -p
git branch -a
git branch -d fix-or-improve-something
```

# How to run tests
## How to run tests

The test framework is pytest:

```
```sh
pip install pytest
pytest -vvs
```
26 changes: 21 additions & 5 deletions DOCUMENTATION.md
@@ -1,22 +1,27 @@
# Code architecture documentation

## Package Overview

Beneath the top-level package `spdx_tools` you will find three sub-packages:

- `spdx`, which contains the code to create, parse, write and validate SPDX documents of versions 2.2 and 2.3
- `spdx3`, which will contain the same feature set for versions 3.x once they are released
- `spdx3`, which will contain the same feature set for versions 3.x
- `common`, which contains code that is shared between the different versions, such as type-checking and `spdx_licensing`.

## `spdx`

The `spdx` package contains the code dealing with SPDX-2 documents.
The subpackages divide the code into logically independent chunks. Shared code can be found in the top-level modules here.
`model`, `parser`, `validation` and `writer` constitute the four main components of this library and are further described below.
`clitools` serves as the entrypoint for the command `pyspdxtools`.
`jsonschema` and `rdfschema` contain code specific to the corresponding serialization format.

### `model`

The internal data model closely follows the [official SPDX-2.3 specification](https://spdx.github.io/spdx-spec/v2.3/).

Entrypoint to the model is the `Document` class, which has the following attributes:

- `creation_info`: a single instance of the `CreationInfo` class
- `packages`: a list of `Package` objects
- `files`: a list of `File` objects
@@ -35,6 +40,7 @@ A custom extension of the `@dataclass` annotation is used that is called `@datac
Apart from all the usual `dataclass` functionality, this implements fields of a class as properties with their own getter and setter methods.
This is used in particular to implement type checking when properties are set.
Source of truth for these checks are the attribute definitions at the start of the respective class that must specify the correct type hint.

The `beartype` library is used to check type conformity (`typeguard` was used in the past but has been replaced since due to performance issues).
In case of a type mismatch a `TypeError` is raised. To ensure that all possible type errors are found during the construction of an object,
a custom `__init__()` that calls `check_types_and_set_values()` is part of every class.
@@ -43,26 +49,31 @@ This function tries to set all values provided by the constructor and collects a
For the SPDX values `NONE` and `NOASSERTION` the classes `SpdxNone` and `SpdxNoAssertion` are used, respectively. Both can be instantiated without any arguments.
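
The type-checking behaviour described above (check on every assignment, and report all mismatches together at construction time) can be illustrated with a minimal sketch. It is not the library's implementation: it swaps the property-based approach and `beartype` for a plain `__setattr__` hook with `isinstance`, and the names `TypeCheckedBase`, `Checksum`, `_field_types` and `collect_type_errors` are hypothetical.

```python
from typing import Any, Dict, List


class TypeCheckedBase:
    """Hypothetical stand-in for a type-checked class; not the library's code."""

    # Subclasses map attribute names to the plain type each value must have.
    _field_types: Dict[str, type] = {}

    def __init__(self, **values: Any) -> None:
        errors: List[str] = []
        for name, value in values.items():
            try:
                setattr(self, name, value)  # routed through __setattr__ below
            except TypeError as err:
                errors.append(str(err))
        if errors:
            # Report every mismatch at once instead of stopping at the first.
            raise TypeError("; ".join(errors))

    def __setattr__(self, name: str, value: Any) -> None:
        expected = self._field_types.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
        super().__setattr__(name, value)


class Checksum(TypeCheckedBase):
    _field_types = {"algorithm": str, "value": str}


def collect_type_errors(**values: Any) -> str:
    """Return the combined error text, or an empty string if construction succeeds."""
    try:
        Checksum(**values)
    except TypeError as err:
        return str(err)
    return ""
```

Because the check sits on assignment, a later `checksum.value = 5` fails in the same way as a wrong constructor argument.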

### `parser`

The parsing and writing modules are split into subpackages according to the serialization formats: `json`, `yaml`, `xml`, `tagvalue` and `rdf`.
As the first three share the same tree structure that can be parsed into a dictionary, their shared logic is contained in the `jsonlikedict` package.
One overarching concept of all parsers is the goal of dealing with parsing errors (like faulty types or missing mandatory fields) as long as possible before failing.
Thus, the `SPDXParsingError` that is finally raised collects as much information as possible about all parsing errors that occurred.
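
The collect-then-raise idea can be sketched as follows. This is an illustration under assumptions, not the library's parser: `ParsingError` and `parse_package` are hypothetical names, and the checked fields are a small subset chosen for the example.

```python
from typing import Any, Dict, List


class ParsingError(Exception):
    """Hypothetical error type that carries every message gathered while parsing."""

    def __init__(self, messages: List[str]) -> None:
        super().__init__("; ".join(messages))
        self.messages = messages


def parse_package(data: Dict[str, Any]) -> Dict[str, Any]:
    """Parse a package-like dictionary, reporting all problems at once."""
    messages: List[str] = []
    parsed: Dict[str, Any] = {}

    # Mandatory string fields: record the problem and keep going.
    for key in ("SPDXID", "name"):
        value = data.get(key)
        if value is None:
            messages.append(f"missing mandatory field: {key}")
        elif not isinstance(value, str):
            messages.append(f"{key} must be a string, got {type(value).__name__}")
        else:
            parsed[key] = value

    # Optional boolean field with a default.
    files_analyzed = data.get("filesAnalyzed", True)
    if not isinstance(files_analyzed, bool):
        messages.append("filesAnalyzed must be a boolean")
    else:
        parsed["filesAnalyzed"] = files_analyzed

    # Fail only after every field has been inspected.
    if messages:
        raise ParsingError(messages)
    return parsed
```

A caller catching `ParsingError` can then show the full list in `messages` rather than fixing one error per run.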

#### `tagvalue`

Since Tag-Value is an SPDX-specific format, there exist no readily available parsers for it.
This library implements its own deserialization code using the `ply` library's `lex` module for lexing and the `yacc` module for parsing.
This library implements its own deserialization code using the `ply` library's `lex` module for lexing and the `yacc` module for parsing.

#### `rdf`

The `rdflib` library is used to deserialize RDF graphs from XML format.
The graph is then being parsed and translated into the internal data model.
The graph is then being parsed and translated into the internal data model.

#### `json`, `yaml`, `xml`

In a first step, all three of JSON, YAML and XML formats are deserialized into a dictionary representing their tree structure.
This is achieved via the `json`, `yaml` and `xmltodict` packages, respectively.
Special care is needed in the XML case, as XML does not support lists and numbers.
The logic concerning the translation from these dicts to the internal data model can be found in the `jsonlikedict` package.

### `writer`

For serialization purposes, only non-null fields are written out.
All writers expect a valid SPDX document from the internal model as input.
To ensure this is actually the case, the standard behaviour of every writer function is to call validation before the writing process.
@@ -71,18 +82,21 @@ Also by default, all list properties in the model are scanned for duplicates whi
This can be disabled by setting the `drop_duplicates` boolean to false.

#### `tagvalue`

The ordering of the tags follows the [example in the official specification](https://github.com/spdx/spdx-spec/blob/development/v2.3.1/examples/SPDXTagExample-v2.3.spdx).

#### `rdf`

The RDF graph is constructed from the internal data model and serialized to XML format afterward, using the `rdflib` library.

#### `json`, `yaml`, `xml`

As all three of JSON, YAML and XML formats share the same tree structure, the first step is to generate the dictionary representing that tree.
This is achieved by the `DocumentConverter` class in the `jsonschema` package.
Subsequently, the dictionary is serialized using the `json`, `yaml` and `xmltodict` packages, respectively.
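
The two-step approach (model to dictionary, then dictionary to text) might look roughly like the sketch below, which also drops null fields as described at the start of this section. It does not use `DocumentConverter`. The `Checksum` and `File` classes, the `to_dict` helper and the key names are simplified assumptions for illustration.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, Optional


@dataclass
class Checksum:
    algorithm: str
    value: str


@dataclass
class File:
    name: str
    spdx_id: str
    checksums: List[Checksum]
    comment: Optional[str] = None


def to_dict(obj: Any) -> Any:
    """Recursively convert dataclass instances to dictionaries, skipping None fields."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return {
            f.name: to_dict(getattr(obj, f.name))
            for f in fields(obj)
            if getattr(obj, f.name) is not None
        }
    if isinstance(obj, list):
        return [to_dict(item) for item in obj]
    return obj


demo_file = File("./demo.py", "SPDXRef-File", [Checksum("SHA1", "abc123")])
# Step 2: hand the intermediate dictionary to a format-specific serializer.
demo_json = json.dumps(to_dict(demo_file), indent=2)
```

Keeping the dictionary as an intermediate form means only the last line would change for another tree-shaped format.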


### `validation`

The `validation` package takes care of all nonconformities with the SPDX specification that are not due to incorrect typing.
This mainly includes checks for correctly formatted strings or the actual existence of referenced SPDXIDs.
Entrypoint is the `document_validator` module with the `validate_full_spdx_document()` function.
@@ -93,6 +107,7 @@ Validation and reference checking of SPDXIDs (and possibly external document ref
For the validation of license expressions we utilise the `license-expression` library's `validate` and `parse` functions, which take care of checking license symbols against the [SPDX license list](https://spdx.org/licenses/).

Invalidities are captured in instances of a custom `ValidationMessage` class. This has two attributes:

- `validation_message` is a string that describes the actual problem
- `validation_context` is a `ValidationContext` object that helps to pinpoint the source of the problem by providing the faulty element's SPDXID (if it has one), the parent SPDXID (if that is known), the element's type and finally the full element itself.
It is left open to the implementer which of this information to use in the following evaluation of the validation process.
@@ -101,7 +116,8 @@ Every validation function returns a list of `ValidationMessage` objects, which a
That is, if an empty list is returned, the document is valid.
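
The "list of messages, empty means valid" convention can be sketched with stand-in classes. The two attribute names on `ValidationMessage` follow the description above. The field names of `ValidationContext`, the functions `validate_spdx_id` and `validate_ids`, and the simplified ID pattern are assumptions made for this example, not the library's definitions.

```python
import re
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ValidationContext:
    # Field names are assumed for illustration.
    spdx_id: Optional[str] = None
    parent_id: Optional[str] = None
    element_type: Optional[str] = None
    full_element: Any = None


@dataclass
class ValidationMessage:
    validation_message: str
    validation_context: ValidationContext


# Simplified pattern: "SPDXRef-" followed by letters, digits, "." or "-".
SPDX_ID_PATTERN = re.compile(r"SPDXRef-[0-9a-zA-Z.-]+")


def validate_spdx_id(spdx_id: str, parent_id: Optional[str] = None) -> List[ValidationMessage]:
    """Return an empty list for a well-formed ID, otherwise one message."""
    if SPDX_ID_PATTERN.fullmatch(spdx_id):
        return []
    context = ValidationContext(
        spdx_id=spdx_id, parent_id=parent_id, element_type="File", full_element=spdx_id
    )
    return [ValidationMessage(f"malformed SPDXID: {spdx_id}", context)]


def validate_ids(spdx_ids: List[str], parent_id: Optional[str] = None) -> List[ValidationMessage]:
    """Concatenate the messages of all element checks into one list."""
    messages: List[ValidationMessage] = []
    for spdx_id in spdx_ids:
        messages.extend(validate_spdx_id(spdx_id, parent_id))
    return messages
```

Since every check returns a list, results compose by simple concatenation, and callers decide how much of the context to display.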

## `spdx3`
Due to the SPDX-3 model still being in development, this package is still a work in progress.

This package is still a work in progress.
However, as the basic building blocks of parsing, writing, creation and validation are still important in the new version,
the `spdx3` package is planned to be structured similarly to the `spdx` package.
