-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathautobin.malloynb
49 lines (44 loc) · 1.55 KB
/
autobin.malloynb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
>>>markdown
# Automatically Binning Data
By examining the range of values over a dataset, we can compute the appropriate histogram bin size, while capturing the data at the same time. We can then pipe the output to another query to display a histogram.
>>>malloy
source: airports is duckdb.table('../data/airports.parquet') extend {
measure: airport_count is count()
# bar_chart
view: by_elevation is {
aggregate: bin_size is (max(elevation) - min(elevation)) / 30
nest: data is {
group_by: elevation
aggregate: row_count is count()
}
} -> {
group_by: elevation is
floor(data.elevation / bin_size) * nullif(bin_size, 0) + bin_size / 2
aggregate: airport_count is data.row_count.sum()
order_by: elevation
}
}
>>>markdown
## Overall elevation distribution
The query can be used to show the overall distribution of the data.
We are showing the bin_size in this exmaple for clarity.
>>>malloy
#(docs) limit=5000 size=large
run: airports -> by_elevation
>>>markdown
## Distribution Adapts Automatically
Notice that when we look at Florida, the bin-width is different.
>>>malloy
#(docs) limit=5000 size=large
run: airports extend { where: state = 'FL' } -> by_elevation
>>>markdown
## Elevation within states
This binning even adapts when the queries are nested.
Notice that all the binning is local to the individual states. For example the bottom bin in Colorado starts at an elevation of 3000ish feet.
>>>malloy
#(docs) size=large
run: airports -> {
group_by: state is state
aggregate: airport_count
nest: by_elevation
}