CONTRIBUTING.md

How to Contribute

This guide describes how to contribute to Temporian, and will help you set up your environment and create your first submission.

Contributor License Agreement

Contributions to this project must be accompanied by a Contributor License Agreement.

You (or your employer) retain the copyright to your contribution; the agreement simply gives us permission to use and redistribute your contributions as part of the project. Head over to https://cla.developers.google.com/ to see your current agreements on file or sign a new one.

You generally only need to submit a CLA once, so if you've already submitted one (even if it was for a different project), you probably won't need to do it again.

Code reviews

All submissions, including submissions by project members, require review. We use GitHub Pull Requests for this purpose. Consult GitHub Help for more information on using pull requests.

All new contributions must pass all the tests and checks performed by GitHub Actions, and any changes to docstrings must respect the docstring guidelines.

Development

Environment Setup

After cloning the repository, please manually install the git hooks:

git clone git@github.com:google/temporian.git
cd temporian
cp .git-hooks/* .git/hooks

Install Poetry, which we use to manage Python dependencies and virtual environments.

Temporian requires Python 3.9.0 or greater. We recommend using PyEnv to install and manage multiple Python versions. Once PyEnv is available, install a supported Python version (e.g. 3.9.6) by running:

pyenv install 3.9.6
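
To confirm that the interpreter you ended up with satisfies the requirement above, a quick check along the following lines may help. This is only a sketch: the `MIN_PYTHON` constant and the `python_is_supported` helper are hypothetical names introduced for illustration and are not part of Temporian.

```python
import sys

# Minimum version stated in this guide (hypothetical constant name).
MIN_PYTHON = (3, 9, 0)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version tuple is at least MIN_PYTHON."""
    return tuple(version_info[:3]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old for Temporian"
    print(f"Python {current}: {status}")
```

Run it with the interpreter that pyenv selected, for example via `python check_version.py` (the file name is arbitrary).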

After both Poetry and an adequate Python version have been installed, you can proceed to install the virtual environment and the required dependencies.

Configure Poetry to create the virtual environment in the project's root directory (some VS Code settings depend on this) by executing:

poetry config virtualenvs.in-project true

Before installing the package, you need to install Bazel (on macOS, we recommend installing Bazelisk with Homebrew):

brew install bazelisk

Navigate to the project's root and run:

pyenv which python | xargs poetry env use
poetry install

Finally, activate the virtual environment by executing:

poetry shell

Testing

Run all tests with Bazel:

bazel test //...:all

You can use the Bazel test flag --test_output=streamed to see the test logs in real time.

If developing and testing C++ code, the --compilation_mode=dbg flag enables additional assertions that are otherwise disabled.

Note that these tests also run the docstring examples, using Python's built-in doctest module. See the Adding code examples section for more information.

Benchmarking and profiling

Benchmarking and profiling of pre-configured scripts are available as follows:

Time and memory profiling

bazel run -c opt //benchmark:profile_time -- [name]
bazel run -c opt //benchmark:profile_memory -- [name] [-p]

where [name] is the name of one of the Python scripts in benchmark/scripts, e.g. bazel run -c opt //benchmark:profile_time -- basic.

The -p flag displays a plot of memory usage over time instead of line-by-line memory consumption.

Time benchmarking

bazel run -c opt //benchmark:benchmark_time

Running docs server

Preview your local changes to the documentation live with:

mkdocs serve -f docs/mkdocs.yml

Adding code examples

Any doctest code examples in temporian/*.py or docs/*.md will be executed and tested using Python's built-in doctest module.

For example, the following piece of code would be executed, and the outputs must match the expected result indicated:

>>> evset = tp.event_set(
...     timestamps=["2020-01-01", "2020-02-02"],
... )
>>> print(evset)
indexes: []
features: []
events:
     (2 events):
        timestamps: [...]
...

Note from this example:

  • If the >>> indicator is not present, the code will not be run or tested.
  • Multi-line statements need a preceding ... instead of >>>.
  • All lines immediately following a >>> or ... line, up to the next blank line, are the expected output.
  • You should always leave a blank line before closing the code block, to indicate the end of the test.
  • The ... inside the expected result is used to match anything. Here, the exact timestamps and the last line (which includes memory usage information) don't need to match exactly.
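
The wildcard behaviour described in the notes above can be tried in isolation with the standard library alone. The sketch below assumes the project's test runner enables doctest's `ELLIPSIS` option (plain doctest does not enable it by default); the `run_example` helper and the example text are made up for illustration and do not use Temporian.

```python
import doctest

# The "..." in the expected output is a wildcard; the "... " prefixes
# in the source lines mark the continuation of a multi-line statement.
EXAMPLE = """
>>> values = list(range(100))
>>> print(values)
[0, 1, 2, ..., 99]
>>> print(
...     len(values)
... )
100
"""


def run_example(source, optionflags=0):
    """Parse and run one doctest string, returning (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(source, {}, "demo", "<demo>", 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None)


with_ellipsis = run_example(EXAMPLE, doctest.ELLIPSIS)
without_ellipsis = run_example(EXAMPLE)

print("failures with ELLIPSIS:", with_ellipsis.failed)
print("failures without ELLIPSIS:", without_ellipsis.failed)
```

With the option enabled every example passes; without it, the `print(values)` example fails because `...` is then compared literally.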

You cannot use ... on the first line of the expected output to ignore the whole output, because doctest would read it as the continuation prompt of a multi-line statement. In that case, you may use the SKIP flag as follows:

>>> print("hello")  # doctest:+SKIP
This result doesn't need to match
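
To see that a skipped example is not executed at all (rather than executed with its output ignored), the following standard-library sketch records side effects in a list. The `calls` list and the example text are hypothetical and only serve the demonstration.

```python
import doctest

# The second statement carries the SKIP directive, so it must never run.
EXAMPLE = """
>>> calls.append("first")
>>> calls.append("second")  # doctest: +SKIP
This result doesn't need to match
>>> calls
['first']
"""

# The list is shared with the examples through their globals.
calls = []
test = doctest.DocTestParser().get_doctest(
    EXAMPLE, {"calls": calls}, "skip_demo", "<demo>", 0
)
results = doctest.DocTestRunner(verbose=False).run(test)

print("failures:", results.failed)
print("recorded calls:", calls)
```

Because the skipped statement never runs, keep in mind that any variable it would have defined is unavailable to later examples in the same block.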

Exceptions can also be expected, but it's better to avoid being too specific with the expected result:

>>> node["f1"] + node["f2"]
Traceback (most recent call last):
    ...
ValueError: ... corresponding features should have the same dtype. ...
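
The same pattern can be reproduced without Temporian. In the sketch below, `add_features` is a hypothetical stand-in that raises a `ValueError` with a long message, and the expected output only pins down the part of the message that is unlikely to change. It assumes the `ELLIPSIS` option is enabled, as the example above implies.

```python
import doctest


def add_features(dtype_a, dtype_b):
    """Hypothetical stand-in for an operation that validates dtypes."""
    if dtype_a != dtype_b:
        raise ValueError(
            "Cannot add features: corresponding features should have the "
            f"same dtype. Got {dtype_a} and {dtype_b}."
        )
    return dtype_a


# The traceback body is replaced by "..." and the message is matched loosely.
EXAMPLE = """
>>> add_features("int64", "float64")
Traceback (most recent call last):
    ...
ValueError: ... should have the same dtype. ...
"""

test = doctest.DocTestParser().get_doctest(
    EXAMPLE, {"add_features": add_features}, "exception_demo", "<demo>", 0
)
runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
results = runner.run(test)

print("failures:", results.failed)
```

Matching only a stable fragment means the example keeps passing if the surrounding wording or the reported dtypes change.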

Finally, note that globals like tp, pd and np are always included in the execution context, so there is no need to import them.
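
How a runner can make such names available is sketched below using only the standard library. This is not Temporian's actual test setup, which this guide does not show: the `shared_globals` dictionary is a hypothetical name, and `math` under the alias `m` stands in for `tp`, `pd` and `np`.

```python
import doctest
import math

# The example uses "m" without importing it.
EXAMPLE = """
>>> m.sqrt(16)
4.0
"""

# Names injected into every example's execution context.
shared_globals = {"m": math}

test = doctest.DocTestParser().get_doctest(
    EXAMPLE, shared_globals, "globals_demo", "<demo>", 0
)
results = doctest.DocTestRunner(verbose=False).run(test)

print("failures:", results.failed)
```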

To check if your examples are correct, you may run:

# Test anything in temporian/*.py and docs/*.md
bazel test //temporian/test:doc_test --test_output=streamed

In case of unexpected outputs, the result is printed and compared to the expected values, so that they can be fixed.
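
For reference, the sketch below triggers a deliberate mismatch and captures the report that Python's doctest module produces. The exact formatting shown by the Bazel target may differ; the example and the `report` list are illustrative only.

```python
import doctest

# The expected output is deliberately wrong.
EXAMPLE = """
>>> print(1 + 1)
3
"""

# Collect the failure report instead of writing it to stdout.
report = []
test = doctest.DocTestParser().get_doctest(
    EXAMPLE, {}, "mismatch_demo", "<demo>", 0
)
results = doctest.DocTestRunner(verbose=False).run(test, out=report.append)

report_text = "".join(report)
print(report_text)
```

The report contains an `Expected:` section followed by a `Got:` section, which is usually enough to decide whether the code or the documented output needs fixing.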