WIP: Implementation of window functions. #70

Closed
wants to merge 4 commits into from

Sean1708
Contributor

As per @spencerlyon2's request in #66, I implemented some basic windowing functionality. I aimed for generality and memory efficiency (so that you can iterate through a large table without loading the whole thing into memory), but I could really use some feedback on what kind of API and functionality is actually needed.

Essentially, you provide a callback that takes three arguments: an array of `Any` arrays (the rows of the current window), the range that was used to create the window, and an arbitrary third value that is handed to every call (used to pass data through the function). The callback should return a value for that window, and the per-window results are collected into a single array.

See below for a rather contrived example.

julia> using SQLite

julia> db = SQLiteDB()
SQLite.SQLiteDB{UTF8String}("",Ptr{Void} @0x00007fd694397f70,0)

julia> table = Array(Int, (0, 2))
0x2 Array{Int64,2}

julia> for i in 1:10
        for j in 1:3
         table = vcat(table, [i ((i-1)*10 + j)])
        end
       end

julia> table
30x2 Array{Int64,2}:
  1   1
  1   2
  1   3
  2  11
  2  12
  2  13
  3  21
  3  22
  3  23
  4  31
  4  32
  4  33
  5  41
  5  42
  5  43
  6  51
  6  52
  6  53
  7  61
  7  62
  7  63
  8  71
  8  72
  8  73
  9  81
  9  82
  9  83
 10  91
 10  92
 10  93

julia> create(db, "test", table, ["a", "b"])
1x1 SQLite.ResultSet
| Row | "Rows Affected" |
|-----|-----------------|
| 1   | 0               |

julia> function average(win, rng, data)
        println("Called $(data[1]) times.")
        data[1] += 1
        results = zeros(Int, size(win[1]))
        for row in win
            for (i, v) in enumerate(row)
                results[i] += v
            end
        end
        [col/length(rng) for col in results]
       end
average (generic function with 1 method)

julia> window(db, average, 1:3, "test", ["a", "b"], [1])
Called 1 times.
Called 2 times.
Called 3 times.
Called 4 times.
Called 5 times.
Called 6 times.
Called 7 times.
Called 8 times.
Called 9 times.
Called 10 times.
Called 11 times.
Called 12 times.
Called 13 times.
Called 14 times.
Called 15 times.
Called 16 times.
Called 17 times.
Called 18 times.
Called 19 times.
Called 20 times.
Called 21 times.
Called 22 times.
Called 23 times.
Called 24 times.
Called 25 times.
Called 26 times.
Called 27 times.
Called 28 times.
28-element Array{Any,1}:
 [1.0,2.0]
 [1.3333333333333333,5.333333333333333]
 [1.6666666666666667,8.666666666666666]
 [2.0,12.0]
 [2.3333333333333335,15.333333333333334]
 [2.6666666666666665,18.666666666666668]
 [3.0,22.0]
 [3.3333333333333335,25.333333333333332]
 [3.6666666666666665,28.666666666666668]
 [4.0,32.0]
 [4.333333333333333,35.333333333333336]
 [4.666666666666667,38.666666666666664]
 [5.0,42.0]
 [5.333333333333333,45.333333333333336]
 [5.666666666666667,48.666666666666664]
 [6.0,52.0]
 [6.333333333333333,55.333333333333336]
 [6.666666666666667,58.666666666666664]
 [7.0,62.0]
 [7.333333333333333,65.33333333333333]
 [7.666666666666667,68.66666666666667]
 [8.0,72.0]
 [8.333333333333334,75.33333333333333]
 [8.666666666666666,78.66666666666667]
 [9.0,82.0]
 [9.333333333333334,85.33333333333333]
 [9.666666666666666,88.66666666666667]
 [10.0,92.0]

julia> window(db, average, 1:3:7, "test", ["a", "b"], [1])
Called 1 times.
Called 2 times.
Called 3 times.
Called 4 times.
Called 5 times.
Called 6 times.
Called 7 times.
Called 8 times.
Called 9 times.
Called 10 times.
Called 11 times.
Called 12 times.
Called 13 times.
Called 14 times.
Called 15 times.
Called 16 times.
Called 17 times.
Called 18 times.
Called 19 times.
Called 20 times.
Called 21 times.
Called 22 times.
Called 23 times.
Called 24 times.
24-element Array{Any,1}:
 [2.0,11.0]
 [2.0,12.0]
 [2.0,13.0]
 [3.0,21.0]
 [3.0,22.0]
 [3.0,23.0]
 [4.0,31.0]
 [4.0,32.0]
 [4.0,33.0]
 [5.0,41.0]
 [5.0,42.0]
 [5.0,43.0]
 [6.0,51.0]
 [6.0,52.0]
 [6.0,53.0]
 [7.0,61.0]
 [7.0,62.0]
 [7.0,63.0]
 [8.0,71.0]
 [8.0,72.0]
 [8.0,73.0]
 [9.0,81.0]
 [9.0,82.0]
 [9.0,83.0]
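
To make the row-selection rule in the two calls above easier to see, here is a minimal in-memory sketch in current Julia (1.x). It is not the PR's implementation. It assumes that the range lists the row indices of the first window and that each later window shifts those indices down by one row, which is consistent with the 28 and 24 results shown above. `window_rows` is a hypothetical name, and nothing here touches SQLite.

```julia
# Hypothetical in-memory stand-in for the PR's `window`; NOT the SQLite.jl API.
# Assumption: `rng` holds the row indices of the first window, and every
# following window uses the same indices shifted down by one row.
function window_rows(callback, rows::Vector, rng::AbstractRange, data)
    results = Any[]
    offset = 0
    # Slide the range one row at a time until it would run past the last row.
    while last(rng) + offset <= length(rows)
        idx = rng .+ offset
        win = [rows[i] for i in idx]          # current window: an array of rows
        push!(results, callback(win, idx, data))
        offset += 1
    end
    return results
end

# Column-wise mean of a window, mirroring `average` from the session above
# (without the printing).
function average(win, rng, data)
    data[1] += 1                              # state carried between calls
    sums = zeros(Int, length(win[1]))
    for row in win
        for (i, v) in enumerate(row)
            sums[i] += v
        end
    end
    return [s / length(rng) for s in sums]
end

# The same 30 rows as `table` above, as a vector of two-element rows.
rows = [[i, (i - 1) * 10 + j] for i in 1:10 for j in 1:3]

counter = [0]
dense = window_rows(average, rows, 1:3, counter)     # rows k, k+1, k+2
strided = window_rows(average, rows, 1:3:7, [0])     # rows k, k+3, k+6
```

With these 30 rows, `dense` has 28 entries starting with `[1.0, 2.0]`, `strided` has 24 entries starting with `[2.0, 11.0]`, and `counter[1]` ends at 28, one increment per callback invocation.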

Once a row has been the first row of a window it is never used again, so it can
be `shift!`ed off the front and left for the GC to reclaim.
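
The buffering idea in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `stream_windows` and `report_range` are hypothetical names, rows come from an arbitrary iterator rather than a SQLite cursor, and the range is assumed to be ascending with a first index of at least 1. It is written for current Julia (1.x), where `shift!` has been renamed `popfirst!`.

```julia
# Hypothetical streaming sketch; NOT the PR's implementation.
# Rows are pulled one at a time and the buffer keeps only rows that a later
# window may still need.
function stream_windows(callback, row_iter, rng::AbstractRange, data)
    rel = rng .- (first(rng) - 1)   # window positions relative to the buffer start
    span = last(rel)                # rows needed before a window is complete
    start = first(rng)              # absolute row index of the buffer's first row
    buffer = Any[]
    results = Any[]
    peak = 0                        # largest buffer size seen, for illustration
    # Rows before the first window are never used, so skip them.
    for row in Iterators.drop(row_iter, first(rng) - 1)
        push!(buffer, row)
        peak = max(peak, length(buffer))
        if length(buffer) == span
            win = [buffer[i] for i in rel]
            push!(results, callback(win, rel .+ (start - 1), data))
            # The first row of this window cannot appear in any later window.
            popfirst!(buffer)
            start += 1
        end
    end
    return results, peak
end

# Lazy source of the same 30 rows used in the example session.
row_source() = ([i, (i - 1) * 10 + j] for i in 1:10 for j in 1:3)

# Callback that just reports which absolute rows each window covered.
report_range(win, rng, data) = rng

dense_ranges, dense_peak = stream_windows(report_range, row_source(), 1:3, nothing)
strided_ranges, strided_peak = stream_windows(report_range, row_source(), 1:3:7, nothing)
```

In this sketch the buffer never holds more rows than the span of the range: 3 rows for `1:3` and 7 rows for `1:3:7`, regardless of how many rows the source yields.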
Forcing the user to structure the data variable themselves means they know how
to unpack it in the callback.
@quinnj closed this Nov 15, 2019