Commit cbc064b

Update README.md
1 parent 12c2306 commit cbc064b

1 file changed: +3 -82 lines changed

README.md

Lines changed: 3 additions & 82 deletions
@@ -30,6 +30,7 @@ You could find the following articles there:
 
 * [Get started with Kotlin DataFrame](https://kotlin.github.io/dataframe/gettingstarted.html)
 * [Working with Data Schemas](https://kotlin.github.io/dataframe/schemas.html)
+* [Setup compiler plugin in Gradle project](https://kotlin.github.io/dataframe/compiler-plugin.html)
 * [Full list of all supported operations](https://kotlin.github.io/dataframe/operations.html)
 * [Reading from SQL databases](https://kotlin.github.io/dataframe/readsqldatabases.html)
 * [Reading/writing from/to different file formats like JSON, CSV, Apache Arrow](https://kotlin.github.io/dataframe/read.html)
@@ -52,7 +53,7 @@ implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta2")
 Check out the [custom setup page](https://kotlin.github.io/dataframe/gettingstartedgradleadvanced.html) if you don't need some of the formats as dependencies,
 for Groovy, and for configurations specific to Android projects.
 
-## Getting started
+## Code example
 
 ```kotlin
 import org.jetbrains.kotlinx.dataframe.*
@@ -78,87 +79,7 @@ Follow this [guide](https://kotlin.github.io/dataframe/gettingstartedkotlinnoteb
 * `ColumnGroup` — contains columns
 * `FrameColumn` — contains dataframes
 
-## Syntax example
-
-Let us show you how data cleaning and aggregation pipelines could look like with DataFrame.
-
-**Create:**
-```kotlin
-// create columns
-val fromTo by columnOf("LoNDon_paris", "MAdrid_miLAN", "londON_StockhOlm", "Budapest_PaRis", "Brussels_londOn")
-val flightNumber by columnOf(10045.0, Double.NaN, 10065.0, Double.NaN, 10085.0)
-val recentDelays by columnOf("23,47", null, "24, 43, 87", "13", "67, 32")
-val airline by columnOf("KLM(!)", "{Air France} (12)", "(British Airways. )", "12. Air France", "'Swiss Air'")
-
-// create dataframe
-val df = dataFrameOf(fromTo, flightNumber, recentDelays, airline)
-
-// print dataframe
-df.print()
-```
-
-**Clean:**
-```kotlin
-// typed accessors for columns
-// that will appear during
-// dataframe transformation
-val origin by column<String>()
-val destination by column<String>()
-
-val clean = df
-// fill missing flight numbers
-.fillNA { flightNumber }.with { prev()!!.flightNumber + 10 }
-
-// convert flight numbers to int
-.convert { flightNumber }.toInt()
-
-// clean 'airline' column
-.update { airline }.with { "([a-zA-Z\\s]+)".toRegex().find(it)?.value ?: "" }
-
-// split 'fromTo' column into 'origin' and 'destination'
-.split { fromTo }.by("_").into(origin, destination)
-
-// clean 'origin' and 'destination' columns
-.update { origin and destination }.with { it.lowercase().replaceFirstChar(Char::uppercase) }
-
-// split lists of delays in 'recentDelays' into separate columns
-// 'delay1', 'delay2'... and nest them inside original column `recentDelays`
-.split { recentDelays }.inward { "delay$it" }
-
-// convert string values in `delay1`, `delay2` into ints
-.parse { recentDelays }
-```
-
-**Aggregate:**
-```kotlin
-clean
-// group by the flight origin renamed into "from"
-.groupBy { origin named "from" }.aggregate {
-// we are in the context of a single data group
-
-// total number of flights from origin
-count() into "count"
-
-// list of flight numbers
-flightNumber into "flight numbers"
-
-// counts of flights per airline
-airline.valueCounts() into "airlines"
-
-// max delay across all delays in `delay1` and `delay2`
-recentDelays.maxOrNull { delay1 and delay2 } into "major delay"
-
-// separate lists of recent delays for `delay1`, `delay2` and `delay3`
-recentDelays.implode(dropNA = true) into "recent delays"
-
-// total delay per destination
-pivot { destination }.sum { recentDelays.colsOf<Int?>() } into "total delays to"
-}
-```
-
-Check it out on [**Datalore**](https://datalore.jetbrains.com/view/notebook/vq5j45KWkYiSQnACA2Ymij) to get a better visual impression of what happens and what the hierarchical dataframe structure looks like.
-
-Explore [**more examples here**](examples).
+Explore [**more examples here**](https://kotlin.github.io/dataframe/guides-and-examples.html).
 
 ## Kotlin, Kotlin Jupyter, Arrow, and JDK versions
