This guide describes how to contribute to Temporian, and will help you set up your environment and create your first submission.
Contributions to this project must be accompanied by a Contributor License Agreement.
You (or your employer) retain the copyright to your contribution; the agreement simply gives us permission to use and redistribute your contributions as part of the project. Head over to https://cla.developers.google.com/ to see your current agreements on file or sign a new one.
You generally only need to submit a CLA once, so if you've already submitted one (even if it was for a different project), you probably won't need to do it again.
All submissions, including submissions by project members, require review. We use GitHub Pull Requests for this purpose. Consult GitHub Help for more information on using pull requests.
All new contributions must pass all the tests and checks performed by GitHub Actions, and any changes to docstrings must respect the docstring guidelines.
After cloning the repository, please manually install the git hooks:
git clone git@github.com:google/temporian.git
cd temporian
cp .git-hooks/* .git/hooks
Install Poetry, which we use to manage Python dependencies and virtual environments.
Temporian requires Python 3.9.0 or greater. We recommend using PyEnv to install and manage multiple Python versions. Once PyEnv is available, install a supported Python version (e.g. 3.9.6) by running:
pyenv install 3.9.6
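To confirm that the interpreter you are about to use meets the version requirement above, a quick check along these lines may help. This is only an illustrative sketch: the names MIN_PYTHON and meets_minimum are made up for this example and are not part of Temporian.

```python
import sys

# Minimum version stated in this guide. MIN_PYTHON and meets_minimum are
# hypothetical names used for illustration only; they are not part of Temporian.
MIN_PYTHON = (3, 9, 0)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor, micro); tuples compare element by element.
    return tuple(version[:3]) >= tuple(minimum)


current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if meets_minimum() else "too old"
print(f"Python {current}: {status}")
```

Run it with the interpreter selected by PyEnv; a "too old" result suggests the wrong Python version is active.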
After both Poetry and an adequate Python version have been installed, you can proceed to install the virtual environment and the required dependencies.
Configure Poetry to create the virtual environment in the project's root directory (some VS Code settings depend on this) by executing:
poetry config virtualenvs.in-project true
Before installing the package, you need to install Bazel (on macOS, we recommend installing Bazelisk with brew):
brew install bazelisk
Navigate to the project's root and run:
pyenv which python | xargs poetry env use
poetry install
Finally, activate the virtual environment by executing:
poetry shell
Run all tests with bazel:
bazel test //...:all
You can use the Bazel test flag --test_output=streamed to see the test logs in real time.
If developing and testing C++ code, the --compilation_mode=dbg flag enables additional assertions that are otherwise disabled.
Note that these tests also include docstring examples, using the built-in doctest module.
See the Adding code examples section for more information.
Benchmarking and profiling of pre-configured scripts is available as follows:
bazel run -c opt //benchmark:profile_time -- [name]
bazel run -c opt //benchmark:profile_memory -- [name] [-p]
where [name] is the name of one of the Python scripts in benchmark/scripts, e.g. bazel run -c opt //benchmark:profile_time -- basic. The -p flag displays a plot of memory over time instead of line-by-line memory consumption.
bazel run -c opt //benchmark:benchmark_time
Live preview your local changes to the documentation with
mkdocs serve -f docs/mkdocs.yml
Any doctest code examples in temporian/*.py or docs/*.md will be executed and tested using Python's built-in doctest module.
For example, the following piece of code would be executed, and the outputs must match the expected result indicated:
>>> evset = tp.event_set(
... timestamps=["2020-01-01", "2020-02-02"],
... )
>>> print(evset)
indexes: []
features: []
events:
(2 events):
timestamps: [...]
...
Note from this example:
- If the >>> indicator is not present, the code will not be run or tested.
- Multi-line statements need a preceding ... instead of >>>.
- All the lines immediately following >>> or ... and before a blank line are the expected outputs.
- You should always leave a blank line before closing the code block, to indicate the end of the test.
- The ... inside the expected result is used to match anything. Here, the exact timestamps and the last line (which includes memory usage information) don't need to match exactly.
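The rules above can be tried outside the repository with the standard doctest module. The sketch below is an assumption-laden illustration, not Temporian's actual test setup: summary is a hypothetical helper, and doctest.ELLIPSIS is assumed to be the option that makes ... match anything.

```python
import doctest


def summary(values):
    """Hypothetical helper used only for this illustration."""
    return f"count: {len(values)}\nvalues: {values}"


# The first example uses "[...]" in its expected output to match any list.
# The second statement spans two lines, so its continuation starts with "...".
# The last line has no ">>>" prompt, so it is plain text and never executed.
EXAMPLE = """
>>> print(summary([1, 2, 3]))
count: 3
values: [...]

>>> total = sum(
...     [1, 2, 3])
>>> total
6

print("no prompt, so this line is never executed")
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(EXAMPLE, {"summary": summary}, "ellipsis_example", None, 0)

# ELLIPSIS is assumed here to be the flag that enables "..." matching.
runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
results = runner.run(test)
print(results)
```

The returned results object exposes failed and attempted counts; only the three prompted statements are counted as attempted.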
You cannot use ... in the first matching line to ignore the whole output (it's ambiguous with multi-line statements). In that case, you may use the SKIP flag as follows:
>>> print("hello") # doctest:+SKIP
This result doesn't need to match
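To see the effect of the SKIP directive in isolation, the following sketch runs a skipped example next to a normal one using only the standard doctest module. The example names are hypothetical, and the way Temporian's own test target invokes doctest may differ.

```python
import doctest

# The first example is skipped, so its deliberately wrong expected output is
# never compared. The second example runs and is checked normally.
EXAMPLE = """
>>> print("hello")  # doctest: +SKIP
This result doesn't need to match

>>> print("world")
world
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(EXAMPLE, {}, "skip_example", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

A skipped example is not executed at all, so it should be reserved for cases where the output truly cannot be matched.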
Exceptions can also be expected, but it's better to avoid being too specific with the expected result:
>>> node["f1"] + node["f2"]
Traceback (most recent call last):
...
ValueError: ... corresponding features should have the same dtype. ...
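The same pattern can be reproduced with a self-contained sketch. Here add_features is a hypothetical stand-in for an operation that rejects mismatched dtypes (it is not a Temporian function), and doctest.ELLIPSIS is assumed to be enabled so that ... in the message matches the variable parts.

```python
import doctest


def add_features(dtype_a, dtype_b):
    """Hypothetical stand-in for an operation that rejects mismatched dtypes."""
    if dtype_a != dtype_b:
        raise ValueError(
            f"Got {dtype_a} and {dtype_b}, but corresponding features "
            "should have the same dtype. Cast one of them first."
        )
    return dtype_a


# Only the stable middle part of the message is spelled out; the "..." on
# either side absorbs details that may change between versions.
EXAMPLE = """
>>> add_features("int64", "float64")
Traceback (most recent call last):
    ...
ValueError: ... should have the same dtype. ...
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(EXAMPLE, {"add_features": add_features}, "exception_example", None, 0)
runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
results = runner.run(test)
print(results)
```

Keeping the expected message loose in this way means the example keeps passing if the surrounding wording of the error changes.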
Finally, note that globals like tp, pd and np are always included in the execution context, so there is no need to import them.
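One way such pre-populated globals can work is by passing a namespace to doctest when the example is built. The sketch below illustrates the general mechanism with the standard math module standing in for tp, pd and np; it is an assumption about how this could be done, not a description of Temporian's test runner.

```python
import doctest
import math

# The example itself contains no import statement.
EXAMPLE = """
>>> math.sqrt(16)
4.0
"""

parser = doctest.DocTestParser()
# `math` is injected through the globals dictionary, so the example can use it
# directly, in the same spirit as tp, pd and np in this guide.
test = parser.get_doctest(EXAMPLE, {"math": math}, "globals_example", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```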
To check if your examples are correct, you may run:
# Test anything in temporian/*.py and docs/*.md
bazel test //temporian/test:doc_test --test_output=streamed
In case of unexpected outputs, the result is printed and compared to the expected values, so that they can be fixed.