-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Interval bounds field #217
base: v2-DEV
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #217 +/- ##
===========================================
- Coverage 84.31% 49.77% -34.55%
===========================================
Files 12 12
Lines 848 870 +22
===========================================
- Hits 715 433 -282
- Misses 133 437 +304
|
We originally used something very similar before Intervals 1.3 but decided in #102 to switch over to encoding the bounds as parametric type information to reduce memory (the main idea being that a vector of intervals with the same bounds do not need to encode this information for each instance). I think probably the right approach is to introduce the changes you added here as new interval types (we can possibly rename the original ones too) so end-users can switch between these different types based upon their particular use case to maximize performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of initial thoughts
bounds_int(l::Type{Open}, r::Type{Open}) = 0x00 | ||
bounds_int(l::Type{Closed}, r::Type{Open}) = 0x01 | ||
bounds_int(l::Type{Open}, r::Type{Closed}) = 0x02 | ||
bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x03 | ||
|
||
bounds_types(x::Integer) = bounds_types(Val(UInt8(x))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed opportunities to use bit masks:
bounds_int(l::Type{Open}, r::Type{Open}) = 0x00 | |
bounds_int(l::Type{Closed}, r::Type{Open}) = 0x01 | |
bounds_int(l::Type{Open}, r::Type{Closed}) = 0x02 | |
bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x03 | |
bounds_types(x::Integer) = bounds_types(Val(UInt8(x))) | |
bounds_int(l::Type{Open}, r::Type{Open}) = 0x00 | 0x00 | |
bounds_int(l::Type{Closed}, r::Type{Open}) = 0x10 | 0x00 | |
bounds_int(l::Type{Open}, r::Type{Closed}) = 0x00 | 0x01 | |
bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x10 | 0x01 | |
function bounds_types(x::UInt8) | |
L = if x & 0x20 == 0x20 | |
Unbounded | |
elseif x & 0x10 == 0x10 | |
Closed | |
else | |
Open | |
end | |
R = if x & 0x02 == 0x02 | |
Unbounded | |
elseif x & 0x01 == 0x01 | |
Closed | |
else | |
Open | |
end | |
return (L, R) | |
end |
Can make this logic even cleaner with something like: https://github.com/JuliaTime/TimeZones.jl/blob/master/src/class.jl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually my initial pass did this, but I concluded that the value dispatch was more declarative / readable and I didn't notice a performance difference either way... though maybe there's a specific case you're thinking of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just from a reasoning standpoint it seems sensible to denote each of the 8-bits for a specific purpose. Your values from 0x00
- 0x03
follow that logic but the unbounded values do not. This makes it difficult to reason into what each bit does and use bitwise operators to determine information about bounds.
I'm fine with using Val
dispatch if there isn't a performance issue.
abstract type AbstractInterval{T, L <: Bound, R <: Bound} end | ||
# Methods for convert between int and tuple representations for space efficiency | ||
# (Open, Closed) is 16 bytes while the integer represenation is only 1 byte. | ||
# TODO: Convert types to symbols when we switch to extending IntervalSets.jl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been catching up with the motivation behind these changes. Is there rational why using symbols is a better approach than using types? Should IntervalSets.jl be updated to use types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel like it matters either way. My preference is more focused on making things consistent between the two, so Intervals.jl can extend IntervalSets.jl. As it stands, there doesn't seem to be much advantage to having them be types. We don't leverage the type hierarchy much and we don't store any information in them, so it's just extra names/types to remember. The practical benefit of switching to symbols is that it means we may be able to have Intervals.jl extend IntervalSets.jl with only one breaking release in Intervals.jl. Going the other way around would require a breaking release to IntervalSets.jl and a lot more updates in the ecosystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the counter argument to using symbols is that by using types you can use the standard type inspection functions to determine what the names are used by this package. Dispatch on these types isn't used commonly outside of this package but I do think it's extremely useful for the internal logic in endpoint.jl
.
bounds_types(::Val{0x03}) = (Closed, Closed) | ||
|
||
# Extension for backwards support of Unbounded endpoints to avoid changing existing logic | ||
# TODO: Drop these in favour of the approach in IntervalSets.jl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like an approach has yet to be worked out: JuliaMath/IntervalSets.jl#123
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question is whether it should be the responsibility of either IntervalSets.jl or Intervals.jl to handle this with yet another custom type. AFAICT, bounded-ness is just a question of whether an endpoint value isfinite
which we already have an API for. My preference would be for that API to be handle independently of intervals like in Infinity.jl or Infinities.jl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that using isfinite
on endpoints does seem like a better interface. From what I recall from the Infinity.jl work the parts where this breaks down is that:
- Defining
Infinite{T}
seems like a great idea for making any type into something that supports infinity but you lose the proper type hierarchy which makes this break down in multiple dispatch. Ideally we'd want to defineInfinite{T} <: supertype(T)
to make this work. - With 1 being infeasible we're left with making specific types to properly define supertypes. This results in not all types having an alternative infinite type.
This is where it started to make more sense to have the intervals to support infinity themselves. For example Interval{Char,Closed,Unbounded}
.
An alternative strategy could be something like:
struct Interval{T}
first::Union{T,NegInf}
last::Union{T,PosInf}
end
but I doubt that would be an improvement over the v1 setup.
struct Interval{T} <: AbstractInterval{T} | ||
bounds::UInt8 # bounds comes first to allow incomplete construction for unbounded intervals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious if the intention is to have Intervals.jl use the IntervalSets.jl interface isn't this moving in the wrong direction? Or is the plan to update IntervalSets.jl in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the plan is to:
- Subtype IntervalSets.AbstractInterval
- Provide conversion and promotion rules for different interval types
- Provide documentation on when to use each interval type and why
- More clearly identify what should live in each package (ie: core API in IntervalSets.jl and collections of intervals in Intervals.jl)
In the end, most packages can just depend on a common API in IntervalSets.jl and just use duck-typing.
Yes, IntervalSets.jl may require some updates, but I'd like to narrow down what those should actually be. AFAIK, we didn't end up using a lot of the flexibility included with this package... maybe Beacon has used more of it? Can you clarify why you mean by "moving in the wrong direction"?
The argument for reverting back to storing bounds in a field rather than parameters is to differentiate the Intervals.Interval from the IntervalSets.Interval types. If you're just working with individual intervals or all intervals have the same bounds then use IntervalSets.Interval. If you're working with large collections of Intervals where you might not have consistent bounds then use the Interval.jl type. It's the same reasoning as using AnchoredInterval if you know that all intervals have the same span.
Again, the goal is to have the two main packages complement rather than compete with each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify why you mean by "moving in the wrong direction"?
What I meant by this is that IntervalSets.jl encodes the bound type information as type information which was closer to what was done before this PR than with the bit mask. Now that I understand your plan more with supporting both types this makes more sense.
Yes, IntervalSets.jl may require some updates, but I'd like to narrow down what those should actually be. AFAIK, we didn't end up using a lot of the flexibility included with this package.
I think one of those changes is to have IntervalSets.jl to move from Interval{L,R,T}
to Interval{T,L,R}
. If we're going to have multiple interval types the element type is going to come first for other implementations like you have here.
The argument for reverting back to storing bounds in a field rather than parameters is to differentiate the Intervals.Interval from the IntervalSets.Interval types. If you're just working with individual intervals or all intervals have the same bounds then use IntervalSets.Interval. If you're working with large collections of Intervals where you might not have consistent bounds then use the Interval.jl type. It's the same reasoning as using AnchoredInterval if you know that all intervals have the same span.
I can get behind having both types of concrete interval exist for different use cases. Where I'm less sure is how these packages should be combined (like you mentioned)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I've tried to provide a bit more context behind my motivation for the changes. In general the goal is to reduce maintenance requirements and complement IntervalSets.jl rather than compete with it which means:
- I don't really care about types vs symbols.
- I'm open to renaming things if folks want.
- Unifying the APIs should ideally require the fewest breaking changes for the ecosystem
- There should be a reason to use one type vs another. It feels like
IntervalSets.Interval
andIntervals.Interval
are competing with each other.
bounds_int(l::Type{Open}, r::Type{Open}) = 0x00 | ||
bounds_int(l::Type{Closed}, r::Type{Open}) = 0x01 | ||
bounds_int(l::Type{Open}, r::Type{Closed}) = 0x02 | ||
bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x03 | ||
|
||
bounds_types(x::Integer) = bounds_types(Val(UInt8(x))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually my initial pass did this, but I concluded that the value dispatch was more declarative / readable and I didn't notice a performance difference either way... though maybe there's a specific case you're thinking of?
abstract type AbstractInterval{T, L <: Bound, R <: Bound} end | ||
# Methods for convert between int and tuple representations for space efficiency | ||
# (Open, Closed) is 16 bytes while the integer represenation is only 1 byte. | ||
# TODO: Convert types to symbols when we switch to extending IntervalSets.jl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel like it matters either way. My preference is more focused on making things consistent between the two, so Intervals.jl can extend IntervalSets.jl. As it stands, there doesn't seem to be much advantage to having them be types. We don't leverage the type hierarchy much and we don't store any information in them, so it's just extra names/types to remember. The practical benefit of switching to symbols is that it means we may be able to have Intervals.jl extend IntervalSets.jl with only one breaking release in Intervals.jl. Going the other way around would require a breaking release to IntervalSets.jl and a lot more updates in the ecosystem.
bounds_types(::Val{0x03}) = (Closed, Closed) | ||
|
||
# Extension for backwards support of Unbounded endpoints to avoid changing existing logic | ||
# TODO: Drop these in favour of the approach in IntervalSets.jl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question is whether it should be the responsibility of either IntervalSets.jl or Intervals.jl to handle this with yet another custom type. AFAICT, bounded-ness is just a question of whether an endpoint value isfinite
which we already have an API for. My preference would be for that API to be handle independently of intervals like in Infinity.jl or Infinities.jl.
struct Interval{T} <: AbstractInterval{T} | ||
bounds::UInt8 # bounds comes first to allow incomplete construction for unbounded intervals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the plan is to:
- Subtype IntervalSets.AbstractInterval
- Provide conversion and promotion rules for different interval types
- Provide documentation on when to use each interval type and why
- More clearly identify what should live in each package (ie: core API in IntervalSets.jl and collections of intervals in Intervals.jl)
In the end, most packages can just depend on a common API in IntervalSets.jl and just use duck-typing.
Yes, IntervalSets.jl may require some updates, but I'd like to narrow down what those should actually be. AFAIK, we didn't end up using a lot of the flexibility included with this package... maybe Beacon has used more of it? Can you clarify why you mean by "moving in the wrong direction"?
The argument for reverting back to storing bounds in a field rather than parameters is to differentiate the Intervals.Interval from the IntervalSets.Interval types. If you're just working with individual intervals or all intervals have the same bounds then use IntervalSets.Interval. If you're working with large collections of Intervals where you might not have consistent bounds then use the Interval.jl type. It's the same reasoning as using AnchoredInterval if you know that all intervals have the same span.
Again, the goal is to have the two main packages complement rather than compete with each other.
Closes #208 and depends on #214
Intervals.Interval
now serves a unique purpose overIntervalSets.Interval
which we can now document:IntervalSets.Interval
IntervalSets.Interval
Intervals.Interval
TODO:
Interval
bounds as an integer field rather than type parameters.AnchoredIntervals
Benchmarks:
In the case where they're all the same, it's order of magnitudes faster to use the parameterized version in IntervalSets.jl
However, once the intervals have been allocated many operations have similar performance.
And once we need to deal with potentially mixed bounds the Intervals.Interval performs better... as you'd expect.
And because we don't need to box the elements operations over all elements are faster.