patterns/autobin.malloynb

>>>markdown
# Automatically Binning Data
By examining the range of values over a dataset, we can compute the appropriate histogram bin size, while capturing the data at the same time.  We can then pipe the output to another query to display a histogram.
>>>malloy
source: airports is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()
  # bar_chart
  view: by_elevation is {
    aggregate: bin_size is (max(elevation) - min(elevation)) / 30
    nest: data is  {
      group_by: elevation
      aggregate: row_count is count()
    }
  } -> {
    group_by: elevation is 
      floor(data.elevation / bin_size) * nullif(bin_size, 0) + bin_size / 2
    aggregate: airport_count is data.row_count.sum()
    order_by: elevation
  }
}
>>>markdown
## Overall elevation distribution

The query can be used to show the overall distribution of the data.

We are showing the bin_size in this exmaple for clarity.
>>>malloy
#(docs) limit=5000 size=large
run: airports -> by_elevation
>>>markdown

## Distribution Adapts Automatically
Notice that when we look at Florida, the bin-width is different.
>>>malloy
#(docs) limit=5000 size=large
run: airports extend { where: state = 'FL' } -> by_elevation
>>>markdown

## Elevation within states
This binning even adapts when the queries are nested.

Notice that all the binning is local to the individual states.  For example the bottom bin in Colorado starts at an elevation of 3000ish feet.
>>>malloy
#(docs) size=large
run: airports -> {
  group_by: state is state
  aggregate: airport_count
  nest: by_elevation
}