improve concept docs structure #1305

Open
wants to merge 3 commits into
base: master
2 changes: 1 addition & 1 deletion docs/StardustDocs/c.list
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,5 @@
<!DOCTYPE categories
SYSTEM "https://resources.jetbrains.com/writerside/1.0/categories.dtd">
<categories>

<category id="related" name="Related topics" order="1"/>
</categories>
26 changes: 12 additions & 14 deletions docs/StardustDocs/d.tree
Original file line number Diff line number Diff line change
Expand Up @@ -21,12 +21,16 @@
<toc-element topic="gettingStartedDatalore.md"/>
<toc-element topic="gettingStartedGradleAdvanced.md"/>
</toc-element>
<toc-element topic="overview.md">
Review comment (Collaborator): Can we put a redirect from overview -> concepts in Writerside?

<toc-element topic="apiLevels.md">
<toc-element topic="stringApi.md"/>
<toc-element topic="extensionPropertiesApi.md"/>
<toc-element topic="concepts.md" accepts-web-file-names="overview.html">
<toc-element topic="apiLevels.md"/>
<toc-element topic="types.md">
<toc-element topic="DataFrame.md"/>
<toc-element topic="DataColumn.md"/>
<toc-element topic="DataRow.md"/>
</toc-element>
<toc-element topic="hierarchical.md"/>
<toc-element topic="nanAndNa.md"/>
<toc-element topic="numberUnification.md"/>
<toc-element topic="schemas.md">
<toc-element topic="schemasGradle.md"/>
<toc-element topic="schemasJupyter.md"/>
Expand All @@ -36,20 +40,16 @@
<toc-element topic="schemasImportSqlGradle.md"/>
<toc-element topic="schemasImportOpenApiGradle.md"/>
<toc-element topic="schemasImportOpenApiJupyter.md"/>
<toc-element topic="DataSchemaGenerationGradle.md"/>
</toc-element>
</toc-element>
<toc-element topic="types.md">
<toc-element topic="DataFrame.md"/>
<toc-element topic="DataColumn.md"/>
<toc-element topic="DataRow.md"/>
</toc-element>
<toc-element topic="extensionPropertiesApi.md"/>
<toc-element topic="DataSchemaGenerationMethods.md"/>
<toc-element topic="Compiler-Plugin.md">
<toc-element topic="staticInterpretation.md"/>
<toc-element topic="dataSchema.md"/>
<toc-element topic="compilerPluginExamples.md"/>
</toc-element>
<toc-element topic="nanAndNa.md"/>
<toc-element topic="numberUnification.md"/>
<toc-element topic="operations.md">
<toc-element topic="create.md">
<toc-element topic="createColumn.md" toc-title="DataColumn"/>
Expand Down Expand Up @@ -199,8 +199,6 @@
<toc-element topic="jupyterRendering.md"/>
</toc-element>
</toc-element>
<toc-element topic="DataSchema-Data-Classes-Generation.md">
<toc-element topic="gradleReference.md"/>
</toc-element>
<toc-element topic="_shadow_resources.md" hidden="true"/>
<toc-element topic="Support.md"/>
</instance-profile>
1 change: 1 addition & 0 deletions docs/StardustDocs/images/github-mark.svg
8 changes: 4 additions & 4 deletions docs/StardustDocs/topics/ColumnSelectors.md
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,7 @@ For instance, excluding `userData.age`:
`df.select { colsAtAnyDepth { "a" in it.name() } except userData.age }`

Note that the selection of columns to exclude from column sets is always done relative to the outer scope.
Use the [Extension Properties API](extensionPropertiesApi.md) to prevent scoping issues if possible.
Use the [Extension Properties API](concepts/extensionPropertiesApi.md) to prevent scoping issues if possible.

> Special case: If a column that needs to be removed appears multiple times in the [`ColumnSet`](#column-resolvers),
> it is excepted each time it is encountered (including inside [Column Groups](DataColumn.md#columngroup)).
Expand Down Expand Up @@ -392,19 +392,19 @@ This function behaves the same as [`cols {}` and `[{}]`](ColumnSelectors.md#cols
Creates a [`ColumnSet`](#column-resolvers) containing the columns from both the left and right side of the function. This allows
you to combine selections or simply select multiple columns at once.

Any combination of [AccessApi](apiLevels.md) can be used on either side of the `and` operator.
Any combination of [AccessApi](concepts/apiLevels.md) can be used on either side of the `and` operator.

Note, while you can write `col1 and col2 and col3...`, it may be more concise to use
[`cols(col1, col2, col3...)`](ColumnSelectors.md#cols) instead. The only downside is that you can't mix
[Access APIs](apiLevels.md) with that notation.
[Access APIs](concepts/apiLevels.md) with that notation.
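
The difference between the two notations can be sketched as follows. This is a minimal illustration, assuming a dataframe with `name` and `age` columns; the function name `selectTwoWays` is hypothetical and only serves the example.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: selects the same two columns in two ways.
fun selectTwoWays(df: DataFrame<*>): Pair<DataFrame<*>, DataFrame<*>> {
    // Chained with the infix `and` operator.
    val viaAnd = df.select { "name" and "age" }
    // Listed in a single `cols(...)` call.
    val viaCols = df.select { cols("name", "age") }
    return viaAnd to viaCols
}

fun main() {
    val df = dataFrameOf("name", "age", "city")(
        "Alice", 15, "London",
        "Bob", 45, "Dubai",
    )
    val (viaAnd, viaCols) = selectTwoWays(df)
    viaAnd.print()
    viaCols.print()
}
```

Both calls are expected to yield a dataframe containing only the two selected columns.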

##### Rename {collapsible="true"}
`colA named "colB"`, `colA into namedColAccessor`

Renaming a column in the Columns Selection DSL is done by calling the infix functions
`named` or `into`.
They behave exactly the same, so it's up to contextual preference which one to use.
Any combination of [Access API](apiLevels.md) can be used to specify the column to rename
Any combination of [Access API](concepts/apiLevels.md) can be used to specify the column to rename
and which name should be used instead.

##### Expr (Column Expression) {collapsible="true"}
Expand Down
3 changes: 1 addition & 2 deletions docs/StardustDocs/topics/Home.topic
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
<primary>
<title>First steps</title>
<a href="gettingStartedKotlinNotebook.md"/>
<a href="overview.md"/>
<a href="concepts.md"/>
<a href="operations.md"/>
<a href="read.md">Reading from files: CSV, JSON, ApacheArrow</a>
</primary>
Expand All @@ -32,7 +32,6 @@
<a href="readSqlDatabases.md"/>
</secondary>


</section-starting-page>


Expand Down
14 changes: 14 additions & 0 deletions docs/StardustDocs/topics/Support.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# Support

* <img src="github-mark.svg" alt="GitHub logo" height="24"/>
[**Issue Tracker**](https://github.com/Kotlin/dataframe/issues)

If you find a bug or have an idea for a new feature,
file an issue in our [DataFrame GitHub repository](https://github.com/Kotlin/dataframe).

* <img src="https://kotlinlang.org/docs/images/slack.svg" alt="Slack logo" height="24"/>
[**Community**](https://github.com/Kotlin/dataframe/issues)

Peer-to-peer support is available on the Kotlin Slack
[#datascience](https://kotlinlang.slack.com/archives/C4W52CFEZ) channel
([request an invite](https://surveys.jetbrains.com/s3/kotlin-slack-sign-up?_gl=1*1ssyqy3*_gcl_au*MTk5NzUwODYzOS4xNzQ2NzkxMDMz*FPAU*MTk5NzUwODYzOS4xNzQ2NzkxMDMz*_ga*MTE0ODQ1MzY3OS4xNzM4OTY1NzM3*_ga_9J976DJZ68*czE3NTE1NDUxODUkbzIyNyRnMCR0MTc1MTU0NTE4NSRqNjAkbDAkaDA.)).
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
[//]: # (title: Gradle plugin reference)
[//]: # (title: Data Schemas Generation in Gradle)

This page describes the Gradle plugin that generates `@DataSchema` from data samples.
```Kotlin
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Data Schemas/Data Classes Generation
# Data Schemas Generation From Existing DataFrame

<web-summary>
Generate useful Kotlin definitions based on your DataFrame structure.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ In the Kotlin DataFrame library, we provide two different ways to access columns

Here's a list of all APIs in order of increasing safety.

* [**String API**](stringApi.md) <br/>
* **String API** <br/>
Columns are accessed by a `String` representing their name. Both type-checking and name-checking are done at runtime.

* [**Extension Properties API**](extensionPropertiesApi.md) <br/>
Expand Down Expand Up @@ -80,7 +80,7 @@ The `titanic.csv` file can be found [here](https://github.com/Kotlin/dataframe/b

# Comparing APIs

The [String API](stringApi.md) is the simplest and unsafest of them all. The main advantage of it is that it can be
The String API is the simplest and least safe of them all. Its main advantage is that it can be
used at any time, including when accessing new columns in chain calls. So we can write something like:

```kotlin
Expand Down
62 changes: 62 additions & 0 deletions docs/StardustDocs/topics/concepts/concepts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# Concepts And Principles

<web-summary>
Learn what Kotlin DataFrame is about — its core concepts, design principles, and usage philosophy.
</web-summary>

<card-summary>
Discover the fundamentals of the library —
understand key concepts, motivation, and the overall structure of the library.
</card-summary>

<link-summary>
Explore the fundamentals of Kotlin DataFrame —
understand key concepts, motivation, and the overall structure of the library.
</link-summary>


<show-structure depth="3"/>


## What is a dataframe

A *dataframe* is an abstraction for working with structured data.
Essentially, it’s a 2-dimensional table with labeled columns of potentially different types.
You can think of it like a spreadsheet or SQL table, or a dictionary of series objects.

The value of this abstraction lies not in the table itself but in the set of operations defined on it.
The Kotlin DataFrame library is an idiomatic Kotlin DSL that defines such operations.
Working with a dataframe is often called *data wrangling*:
transforming and mapping data from a "raw" form into a format
that is more appropriate for analytics and visualization.
The goal of data wrangling is to produce high-quality, useful data.
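
As a rough illustration of such operations, here is a minimal sketch using the String API. The sample data and the function names `samplePeople` and `adults` are assumptions made up for this example.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Build a small dataframe: column names first, then values row by row.
fun samplePeople(): DataFrame<*> =
    dataFrameOf("name", "age", "city")(
        "Alice", 15, "London",
        "Bob", 45, "Dubai",
        "Charlie", 20, "Moscow",
    )

// A tiny wrangling pipeline: keep adults, sort them by age,
// and keep only the two columns of interest.
fun adults(df: DataFrame<*>): DataFrame<*> =
    df.filter { "age"<Int>() >= 18 }
        .sortBy("age")
        .select("name", "age")

fun main() {
    adults(samplePeople()).print()
}
```

Each step takes a dataframe and returns a new one, so the steps can be chained freely.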

## Main Features and Concepts

* [**Hierarchical**](hierarchical.md) — the Kotlin DataFrame library can read and present data from different sources,
including not only plain **CSV** but also **JSON** and **[SQL databases](readSqlDatabases.md)**.
This is why it was designed to be hierarchical and allows nesting of columns and cells.
* **Functional** — the data processing pipeline is organized as a chain of [`DataFrame`](DataFrame.md) transformation operations.
* **Immutable** — every operation returns a new instance of [`DataFrame`](DataFrame.md), reusing the underlying storage wherever possible.
* **Readable** — data transformation operations are defined in a DSL close to natural language.
* **Practical** — provides simple solutions for common problems and the ability to perform complex tasks.
* **Minimalistic** — simple, yet powerful data model of three [column kinds](DataColumn.md#column-kinds).
* [**Interoperable**](collectionsInterop.md) — convertible to and from Kotlin data classes and collections.
This also means conversion to/from other libraries' data structures is usually quite straightforward!
See our [examples](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples)
for some conversions between DataFrame and [Apache Spark](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/spark), [Multik](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/multik), and [JetBrains Exposed](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/exposed).
* **Generic** — can store objects of any type, not only numbers or strings.
* **Typesafe** — the Kotlin DataFrame library provides a mechanism for on-the-fly [**generation of extension properties**](extensionPropertiesApi.md)
that correspond to the columns of a dataframe.
In interactive notebooks like Jupyter or Datalore, the generation runs after each cell execution.
In IntelliJ IDEA, a Gradle plugin generates properties based on a CSV or JSON file.
Also, we’re working on a compiler plugin that infers and transforms the [`DataFrame`](DataFrame.md) schema while you type.
You can now clone this [project with many examples](https://github.com/koperagen/df-plugin-demo) showcasing how it allows you to reliably use our most convenient extension properties API.
The generated properties ensure that you never misspell a column name or mix up its type, and nullability is preserved as well.
* [**Polymorphic**](schemas.md) —
if all columns of a [`DataFrame`](DataFrame.md) instance are present in another dataframe,
then the first one is treated as a superclass of the latter.
This means you can define a function on an interface with some set of columns
and then execute it safely on any [`DataFrame`](DataFrame.md) which contains this same set of columns.
In notebooks, this works out-of-the-box.
In ordinary projects, this requires casting (for now).
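
Some of the properties listed above (functional, immutable, hierarchical) can be sketched with the String API. This is a minimal illustration under assumed sample data; the function names `withAdultFlag` and `nested` and the column names `isAdult` and `info` are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Functional and immutable: `add` returns a new dataframe
// and leaves the one it was called on untouched.
fun withAdultFlag(df: DataFrame<*>): DataFrame<*> =
    df.add("isAdult") { "age"<Int>() >= 18 }

// Hierarchical: move two columns under a single column group named "info".
fun nested(df: DataFrame<*>): DataFrame<*> =
    df.group("age", "isAdult").into("info")

fun main() {
    val original = dataFrameOf("name", "age")("Alice", 15, "Bob", 45)
    val extended = withAdultFlag(original)

    println(original.columnNames()) // still only the original columns
    println(extended.columnNames()) // original columns plus "isAdult"
    nested(extended).print()        // "age" and "isAdult" nested under "info"
}
```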
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ dataframes {
}
```

See [reference](gradleReference.md) and [examples](gradleReference.md#examples) for more details.
See [reference](DataSchemaGenerationGradle.md) and [examples](DataSchemaGenerationGradle.md#examples) for more details.

</tab>
</tabs>
Expand Down
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/extensionPropertiesApi.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,7 +104,7 @@ See the [](quickstart.md) in Kotlin Notebook with basic Extension Properties API
<tab title="Compiler Plugin">

For now, if you read [`DataFrame`](DataFrame.md) from a file or URL, you need to define its schema manually.
You can do it quickly with [`generate..()` methods](DataSchema-Data-Classes-Generation.md).
You can do it quickly with [`generate..()` methods](DataSchemaGenerationMethods.md).

Define schemas:
```kotlin
Expand Down
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/guides/quickstart.md
Original file line number Diff line number Diff line change
Expand Up @@ -349,7 +349,7 @@ Ready to go deeper? Check out what’s next:

- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.

- 🧠 **Understand the design** and core concepts in the [library overview](overview.md).
- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).

- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
and make working with your data both convenient and type-safe.
Expand Down