diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 40da48ccdf..d4dbf8c451 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -17,20 +17,13 @@ The Kedro team pledges to foster and maintain a friendly community. We enforce a You can find the Kedro community on our [Slack organisation](https://slack.kedro.org/), which is where we share news and announcements, and general chat. You're also welcome to post links here to any articles or videos about Kedro that you create, or find, such as how-tos, showcases, demos, blog posts or tutorials. -We also curate a [GitHub repo that lists content created by the Kedro community](https://github.com/kedro-org/awesome-kedro). +We also curate a [GitHub repo that lists content created by the Kedro community](https://github.com/kedro-org/awesome-kedro). If you've made something with Kedro, simply add it to the list with a PR! ## Contribute to the project -There are quite a few ways to contribute to the project, find inspiration from the table below. +There are quite a few ways to contribute to Kedro, such as answering questions about Kedro to help others, fixing a typo in the documentation, reporting a bug, reviewing pull requests or adding a feature. -|Activity|Description| -|-|-| -|Community Q&A|We encourage you to ask and answer technical questions on [GitHub discussions](https://github.com/kedro-org/kedro/discussions) or [Slack](https://slack.kedro.org/), but the former is often preferable since it will be picked up by search engines.| -|Report bugs and security vulnerabilities |We use [GitHub issues](https://github.com/kedro-org/kedro/issues) to keep track of known bugs and security vulnerabilities. We keep a close eye on them and update them when we have an internal fix in progress. Before you report a new issue, do your best to ensure your problem hasn't already been reported. If it has, just leave a comment on the existing issue, rather than create a new one.
If you have already checked the existing [GitHub issues](https://github.com/kedro-org/kedro/issues) and are still convinced that you have found odd or erroneous behaviour then please file a new one.| -|Propose a new feature|If you have new ideas for Kedro functionality then please open a [GitHub issue](https://github.com/kedro-org/kedro/issues) and describe the feature you would like to see, why you need it, and how it should work.| -|Review pull requests|Check the [Kedro repo to find open pull requests](https://github.com/kedro-org/kedro/pulls) and contribute a review!| -|Contribute a fix or feature|If you're interested in contributing fixes to code or documentation, first read our [guidelines for contributing developers](https://docs.kedro.org/en/stable/contribution/developer_contributor_guidelines.html) for an explanation of how to get set up and the process you'll follow. Once you are ready to contribute, a good place to start is to take a look at the `good first issues` and `help wanted issues` on [GitHub](https://github.com/kedro-org/kedro/issues).| -|Contribute to the documentation|You can help us improve the [Kedro documentation online](https://docs.kedro.org/en/stable/). Send us feedback as a [GitHub issue](https://github.com/kedro-org/kedro/issues) or start a documentation discussion on [GitHub](https://github.com/kedro-org/kedro/discussions).You are also welcome to make a raise a PR with a bug fix or addition to the documentation. First read the guide [Contribute to the Kedro documentation](https://docs.kedro.org/en/stable/contribution/documentation_contributor_guidelines.html). +Take a look at some of our [contribution suggestions on the Kedro GitHub Wiki](https://github.com/kedro-org/kedro/wiki/Contribute-to-Kedro)! 
## Join our Technical Steering Committee diff --git a/README.md b/README.md index aed1d6894c..f329a8331f 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,7 @@ [![Conda version](https://img.shields.io/conda/vn/conda-forge/kedro.svg)](https://anaconda.org/conda-forge/kedro) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/kedro-org/kedro/blob/main/LICENSE.md) [![Slack Organisation](https://img.shields.io/badge/slack-chat-blueviolet.svg?label=Kedro%20Slack&logo=slack)](https://slack.kedro.org) +[![Slack Archive](https://img.shields.io/badge/slack-archive-blueviolet.svg?label=Kedro%20Slack%20)](https://linen-slack.kedro.org/) ![CircleCI - Main Branch](https://img.shields.io/circleci/build/github/kedro-org/kedro/main?label=main) ![Develop Branch Build](https://img.shields.io/circleci/build/github/kedro-org/kedro/develop?label=develop) [![Documentation](https://readthedocs.org/projects/kedro/badge/?version=stable)](https://docs.kedro.org/) @@ -14,7 +15,7 @@ ## What is Kedro? -Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular. +Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular. You can find out more at [kedro.org](https://kedro.org). Kedro is an open-source Python framework hosted by the [LF AI & Data Foundation](https://lfaidata.foundation/). @@ -51,12 +52,10 @@ _A pipeline visualisation generated using [Kedro-Viz](https://github.com/kedro-o The [Kedro documentation](https://docs.kedro.org/en/stable/) first explains [how to install Kedro](https://docs.kedro.org/en/stable/get_started/install.html) and then introduces [key Kedro concepts](https://docs.kedro.org/en/stable/get_started/kedro_concepts.html). 
-- The first example illustrates the [basics of a Kedro project](https://docs.kedro.org/en/stable/get_started/new_project.html) using the Iris dataset -- You can then review the [spaceflights tutorial](https://docs.kedro.org/en/stable/tutorial/tutorial_template.html) to build a Kedro project for hands-on experience +You can then review the [spaceflights tutorial](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) to build a Kedro project for hands-on experience. -For new and intermediate Kedro users, there's a comprehensive section on [how to visualise Kedro projects using Kedro-Viz](https://docs.kedro.org/en/stable/visualisation/kedro-viz_visualisation.html) and [how to work with Kedro and Jupyter notebooks](https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_and_notebooks). +For new and intermediate Kedro users, there's a comprehensive section on [how to visualise Kedro projects using Kedro-Viz](https://docs.kedro.org/en/stable/visualisation/index.html) and [how to work with Kedro and Jupyter notebooks](https://docs.kedro.org/en/stable/notebooks_and_ipython/index.html). We also recommend the [API reference documentation](/kedro) for additional information. -Further documentation is available for more advanced Kedro usage and deployment. We also recommend the [glossary](https://docs.kedro.org/en/stable/resources/glossary.html) and the [API reference documentation](/kedro) for additional information. ## Why does Kedro exist? @@ -68,64 +67,21 @@ Kedro is built upon our collective best-practice (and mistakes) trying to delive - To increase efficiency, because applied concepts like modularity and separation of concerns inspire the creation of **reusable analytics code** +Find out more about how Kedro can answer your use cases from the [product FAQs on the Kedro website](https://kedro.org/#faq).
+ ## The humans behind Kedro The [Kedro product team](https://docs.kedro.org/en/stable/contribution/technical_steering_committee.html#kedro-maintainers) and a number of [open source contributors from across the world](https://github.com/kedro-org/kedro/releases) maintain Kedro. ## Can I contribute? -Yes! Want to help build Kedro? Check out our [guide to contributing to Kedro](https://github.com/kedro-org/kedro/blob/main/CONTRIBUTING.md). +Yes! We welcome all kinds of contributions. Check out our [guide to contributing to Kedro](https://github.com/kedro-org/kedro/wiki/Contribute-to-Kedro). ## Where can I learn more? -There is a growing community around Kedro. Have a look at the [Kedro FAQs](https://docs.kedro.org/en/stable/faq/faq.html#how-can-i-find-out-more-about-kedro) to find projects using Kedro and links to articles, podcasts and talks. - -## Who likes Kedro? - -There are Kedro users across the world, who work at start-ups, major enterprises and academic institutions like [Absa](https://www.absa.co.za/), -[Acensi](https://acensi.eu/page/home), -[Advanced Programming Solutions SL](https://www.linkedin.com/feed/update/urn:li:activity:6863494681372721152/), -[AI Singapore](https://makerspace.aisingapore.org/2020/08/leveraging-kedro-in-100e/), -[AMAI GmbH](https://www.am.ai/), -[Augment Partners](https://www.linkedin.com/posts/augment-partners_kedro-cheat-sheet-by-augment-activity-6858927624631283712-Ivqk), -[AXA UK](https://www.axa.co.uk/), -[Belfius](https://www.linkedin.com/posts/vangansen_mlops-machinelearning-kedro-activity-6772379995953238016-JUmo), -[Beamery](https://medium.com/hacking-talent/production-code-for-data-science-and-our-experience-with-kedro-60bb69934d1f), -[Caterpillar](https://www.caterpillar.com/), -[CRIM](https://www.crim.ca/en/), -[Dendra Systems](https://www.dendra.io/), -[Element AI](https://www.elementai.com/), -[GetInData](https://getindata.com/blog/running-machine-learning-pipelines-kedro-kubeflow-airflow), 
-[GMO](https://recruit.gmo.jp/engineer/jisedai/engineer/jisedai/engineer/jisedai/engineer/jisedai/engineer/jisedai/blog/kedro_and_mlflow_tracking/), -[Indicium](https://medium.com/indiciumtech/how-to-build-models-as-products-using-mlops-part-2-machine-learning-pipelines-with-kedro-10337c48de92), -[Imperial College London](https://github.com/dssg/barefoot-winnie-public), -[ING](https://www.ing.com), -[Jungle Scout](https://junglescouteng.medium.com/jungle-scout-case-study-kedro-airflow-and-mlflow-use-on-production-code-150d7231d42e), -[Helvetas](https://www.linkedin.com/posts/lionel-trebuchon_mlflow-kedro-ml-ugcPost-6747074322164154368-umKw), -[Leapfrog](https://www.lftechnology.com/blog/ai-pipeline-kedro/), -[McKinsey & Company](https://www.mckinsey.com/alumni/news-and-insights/global-news/firm-news/kedro-from-proprietary-to-open-source), -[Mercado Libre Argentina](https://www.mercadolibre.com.ar), -[Modec](https://www.modec.com/), -[Mosaic Data Science](https://www.youtube.com/watch?v=fCWGevB366g), -[NaranjaX](https://www.youtube.com/watch?v=_0kMmRfltEQ), -[NASA](https://github.com/nasa/ML-airport-taxi-out), -[NHS AI Lab](https://nhsx.github.io/skunkworks/synthetic-data-pipeline), -[Open Data Science LatAm](https://www.odesla.org/), -[Prediqt](https://prediqt.co/), -[QuantumBlack](https://medium.com/quantumblack/introducing-kedro-the-open-source-library-for-production-ready-machine-learning-code-d1c6d26ce2cf), -[ReSpo.Vision](https://neptune.ai/customers/respo-vision), -[Retrieva](https://tech.retrieva.jp/entry/2020/07/28/181414), -[Roche](https://www.roche.com/), -[Sber](https://www.linkedin.com/posts/seleznev-artem_welcome-to-kedros-documentation-kedro-activity-6767523561109385216-woTt), -[Société Générale](https://www.societegenerale.com/en), -[Telkomsel](https://medium.com/life-at-telkomsel/how-we-build-a-production-grade-data-pipeline-7004e56c8c98), -[Universidad Rey Juan 
Carlos](https://github.com/vchaparro/MasterThesis-wind-power-forecasting/blob/master/thesis.pdf), -[UrbanLogiq](https://urbanlogiq.com/), -[Wildlife Studios](https://wildlifestudios.com), -[WovenLight](https://www.wovenlight.com/) and -[XP](https://youtu.be/wgnGOVNkXqU?t=2210). - -Kedro won [Best Technical Tool or Framework for AI](https://awards.ai/the-awards/previous-awards/the-4th-ai-award-winners/) in the 2019 Awards AI competition and a merit award for the 2020 [UK Technical Communication Awards](https://uktcawards.com/announcing-the-award-winners-for-2020/). It is listed on the 2020 [ThoughtWorks Technology Radar](https://www.thoughtworks.com/radar/languages-and-frameworks/kedro) and the 2020 [Data & AI Landscape](https://mattturck.com/data2020/). Kedro has received an [honorable mention in the User Experience category in Fast Company’s 2022 Innovation by Design Awards](https://www.fastcompany.com/90772252/user-experience-innovation-by-design-2022). +There is a growing community around Kedro. We encourage you to ask and answer technical questions on [Slack](https://slack.kedro.org/) and bookmark the [Linen archive of past discussions](https://linen-slack.kedro.org/). + +We keep a list of [technical FAQs in the Kedro documentation](https://docs.kedro.org/en/stable/faq/faq.html) and you can find a growing list of blog posts, videos and projects that use Kedro over on the [`awesome-kedro` GitHub repository](https://github.com/kedro-org/awesome-kedro). If you have created anything with Kedro we'd love to include it on the list. Just make a PR to add it! ## How can I cite Kedro? diff --git a/docs/source/faq/faq.md b/docs/source/faq/faq.md index 9087f29def..75790690a9 100644 --- a/docs/source/faq/faq.md +++ b/docs/source/faq/faq.md @@ -1,4 +1,6 @@ -# Frequently asked questions +# FAQs + +This is a growing set of technical FAQs. 
The [product FAQs on the Kedro website](https://kedro.org/#faq) explain how Kedro can answer the typical use cases and requirements of data scientists, data engineers, machine learning engineers and product owners. ## Visualisation @@ -46,3 +48,25 @@ * [How do I create a modular pipeline](../nodes_and_pipelines/modular_pipelines.md#how-do-i-create-a-modular-pipeline)? * [Can I use generator functions in a node](../nodes_and_pipelines/nodes.md#how-to-use-generator-functions-in-a-node)? + +## What is data engineering convention? + +[Bruce Philp](https://github.com/bruceaphilp) and [Guilherme Braccialli](https://github.com/gbraccialli-qb) are the +brains behind a layered data-engineering convention as a model of managing data. You can find an [in-depth walk through of their convention](https://towardsdatascience.com/the-importance-of-layered-thinking-in-data-engineering-a09f685edc71) as a blog post on Medium. + +Refer to the following table below for a high level guide to each layer's purpose + +> **Note**:The data layers don’t have to exist locally in the `data` folder within your project, but we recommend that you structure your S3 buckets or other data stores in a similar way. + +![data_engineering_convention](../meta/images/data_layers.png) + +| Folder in data | Description | +| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Raw | Initial start of the pipeline, containing the sourced data model(s) that should never be changed, it forms your single source of truth to work from. These data models are typically un-typed in most cases e.g. 
csv, but this will vary from case to case | +| Intermediate | Optional data model(s), which are introduced to type your `raw` data model(s), e.g. converting string-based values into their current typed representation | +| Primary | Domain specific data model(s) containing cleansed, transformed and wrangled data from either `raw` or `intermediate`, which forms your layer that you input into your feature engineering | +| Feature | Analytics specific data model(s) containing a set of features defined against the `primary` data, which are grouped by feature area of analysis and stored against a common dimension | +| Model input | Analytics specific data model(s) containing all `feature` data against a common dimension and in the case of live projects against an analytics run date to ensure that you track the historical changes of the features over time | +| Models | Stored, serialised pre-trained machine learning models | +| Model output | Analytics specific data model(s) containing the results generated by the model based on the `model input` data | +| Reporting | Reporting data model(s) that are used to combine a set of `primary`, `feature`, `model input` and `model output` data used to drive the dashboard and the views constructed. It encapsulates and removes the need to define any blending or joining of data, improves performance, and allows replacement of the presentation layer without having to redefine the data models | diff --git a/docs/source/get_started/install.md b/docs/source/get_started/install.md index 0ce17301c5..8afea95a57 100644 --- a/docs/source/get_started/install.md +++ b/docs/source/get_started/install.md @@ -134,7 +134,7 @@ You should see an ASCII art graphic and the Kedro version number.
For example: ![](../meta/images/kedro_graphic.png) -If you do not see the graphic displayed, or have any issues with your installation, check out the [searchable archive of Slack discussions](https://www.linen.dev/s/kedro), or post a new query on the [Slack organisation](https://slack.kedro.org). +If you do not see the graphic displayed, or have any issues with your installation, check out the [searchable archive of Slack discussions](https://linen-slack.kedro.org/), or post a new query on the [Slack organisation](https://slack.kedro.org). ## How to upgrade Kedro @@ -187,4 +187,4 @@ pip install kedro * Installation prerequisites include a virtual environment manager like `conda`, Python 3.7+, and `git`. * You should install Kedro using `pip install kedro`. -If you encounter any problems as you set up Kedro, ask for help on Kedro's [Slack organisation](https://slack.kedro.org) or review the [searchable archive of Slack discussions](https://www.linen.dev/s/kedro). +If you encounter any problems as you set up Kedro, ask for help on Kedro's [Slack organisation](https://slack.kedro.org) or review the [searchable archive of Slack discussions](https://linen-slack.kedro.org/). diff --git a/docs/source/index.rst b/docs/source/index.rst index ac106f9c48..5850f15f76 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -43,8 +43,8 @@ Welcome to Kedro's documentation! :target: https://slack.kedro.org :alt: Kedro's Slack organisation -.. image:: https://img.shields.io/badge/slack-archive-blue.svg?label=Kedro%20Slack%20 - :target: https://www.linen.dev/s/kedro +.. image:: https://img.shields.io/badge/slack-archive-blueviolet.svg?label=Kedro%20Slack%20 + :target: https://linen-slack.kedro.org/ :alt: Kedro's Slack archive .. 
image:: https://img.shields.io/badge/code%20style-black-black.svg diff --git a/docs/source/meta/images/data_layers.png b/docs/source/meta/images/data_layers.png new file mode 100644 index 0000000000..fd3798310a Binary files /dev/null and b/docs/source/meta/images/data_layers.png differ diff --git a/docs/source/resources/index.md b/docs/source/resources/index.md index 72493f112e..ce24876a0a 100644 --- a/docs/source/resources/index.md +++ b/docs/source/resources/index.md @@ -1,4 +1,4 @@ -# Resources +# FAQs and resources ```{toctree} :maxdepth: 1 diff --git a/docs/source/tutorial/spaceflights_tutorial.md b/docs/source/tutorial/spaceflights_tutorial.md index 0a65d0369b..da58578174 100644 --- a/docs/source/tutorial/spaceflights_tutorial.md +++ b/docs/source/tutorial/spaceflights_tutorial.md @@ -31,7 +31,7 @@ If you hit an issue with the tutorial: * Check the [spaceflights tutorial FAQ](spaceflights_tutorial_faqs.md) to see if we have answered the question already. * Use [Kedro-Viz](../visualisation/kedro-viz_visualisation) to visualise your project to better understand how the datasets, nodes and pipelines fit together. * Use the [#questions channel](https://slack.kedro.org/) on our Slack channel to ask the community for help. -* Search the [searchable archive of Slack discussions](https://www.linen.dev/s/kedro). +* Search the [searchable archive of Slack discussions](https://linen-slack.kedro.org/). ## Terminology diff --git a/docs/source/visualisation/kedro-viz_visualisation.md b/docs/source/visualisation/kedro-viz_visualisation.md index 0f6e207508..d546681106 100644 --- a/docs/source/visualisation/kedro-viz_visualisation.md +++ b/docs/source/visualisation/kedro-viz_visualisation.md @@ -42,7 +42,7 @@ You should see the following: If a visualisation panel opens up and a pipeline is not visible, refresh the view, and check that your tutorial project code is complete if you've not generated it from the starter template. 
If you still don't see the visualisation, the Kedro community can help: * use the [#questions channel](https://slack.kedro.org/) on our Slack channel to ask the community for help -* search the [searchable archive of Slack discussions](https://www.linen.dev/s/kedro) +* search the [searchable archive of Slack discussions](https://linen-slack.kedro.org/) To exit the visualisation, close the browser tab. To regain control of the terminal, enter `^+c` on Mac or `Ctrl+c` on Windows or Linux machines.