Add slidingWindowSum #99

treeowl · 2020-06-05T19:40:15Z

slidingWindow typically helps write only functions with
complexity around O(n*k), where n is the number of elements
in the stream and k is the size of the window. In many cases,
this can be reduced to O(n) by looking not at the window itself
but instead the sum of that window in some Semigroup. This can
be used, for example, to implement moving averages such as
arithmetic, geometric, or harmonic means.

treeowl · 2020-06-05T19:44:58Z

A couple concerns:

My biggest concern is that for a number of purposes, the size of the window is quite important. If the stream proves to be smaller than the window, we need to know that. For example, if we're taking a moving average, and the stream comes up very short, then we'd naively end up with a total garbage result. Should we have variants that add a return value indicating something about this?

Another concern: it seems rather useful to use this with newtypes from Data.Semigroup. Mapping over the stream once to apply the newtype and again to remove it seems potentially pricy. Do we want a version that maps over its input and its output?

treeowl · 2020-06-05T19:45:42Z

Oh, one more thing: the name kind of sucks. Should it be slidingWindowSconcat or something?

treeowl · 2020-06-05T19:52:48Z

Yet another question: do we want a By version that lets the user provide the semigroup operation directly?

src/Data/AnnotatedQueue.hs

treeowl · 2020-06-05T20:25:59Z

The CI failure is some GHC HEAD nonsense.

treeowl · 2020-06-06T07:12:44Z

@andrewthad I know you've stepped away from this package, but I'd very much appreciate your thoughts on this one.

src/Data/AnnotatedQueue.hs

`slidingWindow` typically helps write only functions with complexity around `O(n*k)`, where `n` is the number of elements in the stream and `k` is the size of the window. In many cases, this can be reduced to `O(n)` by looking not at the window itself but instead the sum of that window in some `Semigroup`. This can be used, for example, to implement moving averages such as arithmetic, geometric, or harmonic means.

The queues seem more useful than the sliding window function itself.

danidiaz · 2020-06-11T11:42:28Z

My biggest concern is that for a number of purposes, the size of the window is quite important. If the stream proves to be smaller than the window, we need to know that. For example, if we're taking a moving average, and the stream comes up very short, then we'd naively end up with a total garbage result. Should we have variants that add a return value indicating something about this?

Instead of a return value, perhaps members of the original stream up to the first full window could be re-yielded before starting with the windows themselves. Something like:

slidingWindow :: Monad m => Int -> Stream (Of a) m b -> Stream (Of a) m (Stream (Of (Seq a)) m b)

For cases in which there aren't enough elements to form a window, the isolated elements would be re-yielded but the stream with the window results would be empty.

Also, this way we don't need to keep whole windows in memory in order to report "incomplete" windows.

andrewthad · 2020-06-11T13:02:43Z

Pardon my ignorance in this area, but here are the things that confused me. This internal data structure looks a lot like a finger tree, but it differs from Patterson's finger tree:

The left digit and right digit have different size restrictions from one another.
The original finger trees have digits max out at size 4. This implementation uses 2 and 1.
The tree itself has an extra data constructor.

But I guess that's not what it's actually based on. The documentation says that this is "an implementation of Okasaki's implicit queues holding elements of some semigroup". It would be kind of nice to have a link to a reference implementation of Okasaki's implicit queues in Haskell so that someone could look at it and see how the data structure differs, but I don't know if there is a good library where this has already been done.

After figuring out what was going on with the data structure, I think this seems fine though, and it's the ideal "purely functional" solution to the problem. It's a little sad that the Semimeasurable abstraction is not exposed to the user in any way, but I think it would only be appropriate to do that if Semimeasurable were provided by a different library. I guess there's the Measured class from the fingertree library, but it's really not what you want here. It would be cool if there were a way to factor those out and unify them, but that's a complication for another way.

treeowl · 2020-06-11T13:59:05Z

@andrewthad, I really just mashed up Okasaki's structure with Hinze and Paterson's and checked that it was still okay. Yes, the structures are quite closely related, with Okasaki's simpler and less flexible. Which is faster for the purpose? Dunno. It does rather feel like it belongs in Some Other Library. semimeasure is really a convenient hack, and I wouldn't want to expose it as is. Better to expose an idea like one of these for general use:

semimeasure' :: a -> Maybe s
semimeasure'' :: r -> (s -> r) -> a -> r

I just went with the bare minimum I needed here.

@danidiaz, I really don't know enough about what people need to feel like I can evaluate whether your idea is the right approach.

treeowl requested review from chessai and Gabriella439 June 5, 2020 19:45

treeowl added the enhancement label Jun 5, 2020

treeowl force-pushed the sliding-window-sum branch 2 times, most recently from 84d7e83 to 8ec3763 Compare June 5, 2020 19:50

treeowl commented Jun 5, 2020

View reviewed changes

src/Data/AnnotatedQueue.hs Outdated Show resolved Hide resolved

treeowl commented Jun 5, 2020

View reviewed changes

src/Data/AnnotatedQueue.hs Show resolved Hide resolved

treeowl force-pushed the sliding-window-sum branch 2 times, most recently from a3c5a40 to 0bffbf9 Compare June 8, 2020 00:04

treeowl commented Jun 8, 2020

View reviewed changes

src/Data/AnnotatedQueue.hs Outdated Show resolved Hide resolved

treeowl added 2 commits June 8, 2020 00:31

Expose the queues

6e8a5b1

The queues seem more useful than the sliding window function itself.

treeowl force-pushed the sliding-window-sum branch from 0bffbf9 to 6e8a5b1 Compare June 8, 2020 04:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add slidingWindowSum #99

Add slidingWindowSum #99

treeowl commented Jun 5, 2020

treeowl commented Jun 5, 2020

treeowl commented Jun 5, 2020

treeowl commented Jun 5, 2020

treeowl commented Jun 5, 2020

treeowl commented Jun 6, 2020

danidiaz commented Jun 11, 2020

andrewthad commented Jun 11, 2020

treeowl commented Jun 11, 2020

Add slidingWindowSum #99

Are you sure you want to change the base?

Add slidingWindowSum #99

Conversation

treeowl commented Jun 5, 2020

treeowl commented Jun 5, 2020

treeowl commented Jun 5, 2020

treeowl commented Jun 5, 2020

treeowl commented Jun 5, 2020

treeowl commented Jun 6, 2020

danidiaz commented Jun 11, 2020

andrewthad commented Jun 11, 2020

treeowl commented Jun 11, 2020