Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordering of genomic intervals only works as expected when parameterised by the same type #44

Open
owensnick opened this issue Nov 12, 2021 · 2 comments

Comments

@owensnick
Copy link

Comparison of intervals isless, isordered, precedes of intervals is defined for intervals with identical type parameters for their metadata, for example:

Base.isless(a::Interval{T}, b::Interval{T}, seqname_isless::Function=isless) where T

This causes some surprises when comparing intervals with different parametric types as it falls back on basic_isless defined in IntervalTrees.jl and does not compare seqname:

## Comparing Interval{String} with Interval{String} 
julia> Interval("chr2", 1, 10, '.', "") < Interval("chr1", 11, 20, '.', "")
false  # expected
julia> @which Interval("chr2", 1, 10, '.', "") < Interval("chr1", 11, 20, '.', "")
isless(a::Interval{T}, b::Interval{T}) where T in GenomicFeatures at ... .julia/packages/GenomicFeatures/bVCwr/src/interval.jl:82

## Comparing Interval{Int64} with Interval{String} 
julia> Interval("chr2", 1, 10, '.',  1) < Interval("chr1", 11, 20, '.', "")
true # not expected, only compares first and last
julia> @which isless(Interval("chr2", 1, 10, '.', 1), Interval("chr1", 11, 20, '.', ""))
isless(u::IntervalTrees.AbstractInterval, v::IntervalTrees.AbstractInterval) in IntervalTrees at  ... .julia/packages/IntervalTrees/wh2ex/src/IntervalTrees.jl:30

It would be super useful to be able to order intervals when their metadata was of different types. The most obvious change would be to allow different type parameters:
Base.isless(a::Interval{T}, b::Interval{V}, seqname_isless::Function=isless) where {T, V}

However, could this cause issues breaking ordering equality conditions, as it makes sense to define isequal for two intervals that have equal metadata. So for two intervals x and y with identical coordinates but different metadata you would have all of isless(x, y), isless(y, x) and isequal(x, y) being false.

My use case is that I want to calculate distances between intervals with different metadata and using functions like sort and searchsorted are useful in doing this.

Would be happy if isordered was changed to isordered(a::Interval{T}, b::Interval{V}, seqname_isless::Function=isless) where {T, V}, and then use: searchsorted(intervals, iv, lt=GenomicFeatures.isordered), unless anyone sees any issues in doing so?

@CiaranOMara
Copy link
Member

CiaranOMara commented Nov 12, 2021

I think I've got upstream code that addresses your use case. But let me know if it doesn't and I'll have another look at your suggestion. I've linked the relevant code below.

function Base.isless(a::AbstractGenomicInterval, b::AbstractGenomicInterval, seqname_isless::Function=isless)
if seqname(a) != seqname(b)
return seqname_isless(seqname(a), seqname(b))::Bool
end
if leftposition(a) != leftposition(b)
return leftposition(a) < leftposition(b)
end
if rightposition(a) != rightposition(b)
return rightposition(a) < rightposition(b)
end
return false
end
function Base.isless(a::GenomicInterval, b::GenomicInterval, seqname_isless::Function=isless)
if seqname(a) != seqname(b)
return seqname_isless(seqname(a), seqname(b))::Bool
end
if leftposition(a) != leftposition(b)
return leftposition(a) < leftposition(b)
end
if rightposition(a) != rightposition(b)
return rightposition(a) < rightposition(b)
end
if strand(a) != strand(b)
return strand(a) < strand(b)
end
return false
end
"""
Check if two intervals are well ordered.
`AbstractGenomicInterval` are considered well ordered if seqname(a) <= seqname(b) and leftposition(a) <= leftposition(b).
"""
function isordered(a::AbstractGenomicInterval, b::AbstractGenomicInterval, seqname_isless::Function=isless)
if seqname(a) != seqname(b)
return seqname_isless(seqname(a), seqname(b))::Bool
end
if leftposition(a) != leftposition(b)
return leftposition(a) < leftposition(b)
end
return true
end

If you want to test the code, add the following packages.

add https://github.com/CiaranOMara/IntervalTrees.jl#pu/v1.1.0
add GenomicFeatuers#pu/v3.0.0

The main changes are that the Interval type is renamed GenomicInterval and now subtypes AbstractGenomicInterval; and IntervalCollection is now GenomicIntervalCollection and the element type of the collection is now the AbstractGenomicInterval instead of the interval's metadata type. The rest should be familiar.

These branches are fairly stable as I'm using them for my own work. The reason they haven't been merged into master is that it doesn't seem correct/reasonable to run ahead with GenomicFeatures v3 before bring all of the other packages up to date with GenomicFeatures v2.

@CiaranOMara
Copy link
Member

I've moved the relevant code down to the head of the develop branch. So the functionality you require without the new types is now also an option for you with dev GenomicFeatures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants