This is just a living issue for now, with more to say later. We've got a pretty nice BioSequences package with well-optimised code. Given the number of kmers a program will process for even a modestly sized genome, we should aim to micro-optimise the hell out of this package too, experiment with different kmer sizes and types, and explore the performance profiles.
As an example, I think that for low K (and consequently low N, where N is the number of UInt64s backing the kmer), the canonical method as currently implemented in Kmers.jl performs better than the more generic BioSequences version. At higher K and N it does not, and the generic BioSequences version seems preferable.
Pluto notebook benchmarking canonical: https://gist.github.com/SabrinaJaye/4e3d3fbe5d90ec275e3c591bff89dec8
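
To make the N = 1 case concrete, here is a minimal sketch of what a specialised canonical could look like when the whole kmer fits in one UInt64. It is not the Kmers.jl or BioSequences implementation. The names `encode`, `reverse_2bit`, `revcomp`, `revcomp_loop` and `canonical` are hypothetical, and the sketch assumes a 2-bit encoding (A=00, C=01, G=10, T=11) with the kmer right-aligned in the low 2K bits.

```julia
# Hypothetical helpers for illustration only; not the Kmers.jl or BioSequences API.
# Assumption: 2-bit encoding (A=00, C=01, G=10, T=11), kmer stored right-aligned
# in the low 2K bits of a single UInt64 (so N = 1 and 1 <= K <= 32).

# Pack a string of A/C/G/T into a UInt64; the first base ends up in the highest used bits.
function encode(s::AbstractString)
    x = UInt64(0)
    for c in s
        code = c == 'A' ? 0x00 :
               c == 'C' ? 0x01 :
               c == 'G' ? 0x02 :
               c == 'T' ? 0x03 : error("invalid base $c")
        x = (x << 2) | UInt64(code)
    end
    return x
end

# Reverse the order of the 32 two-bit fields in a UInt64 without a per-base loop.
function reverse_2bit(x::UInt64)
    # Swap the two 2-bit fields inside every nibble.
    x = ((x >> 2) & 0x3333333333333333) | ((x & 0x3333333333333333) << 2)
    # Swap the two nibbles inside every byte.
    x = ((x >> 4) & 0x0f0f0f0f0f0f0f0f) | ((x & 0x0f0f0f0f0f0f0f0f) << 4)
    # Reverse the byte order.
    return bswap(x)
end

# Branch-free reverse complement: in this encoding the complement is a bitwise NOT.
# The final shift moves the kmer back to the low bits and drops the unused fields.
revcomp(x::UInt64, k::Integer) = reverse_2bit(~x) >> (64 - 2k)

# Simple per-base loop, kept only as a reference to check and time against.
function revcomp_loop(x::UInt64, k::Integer)
    y = UInt64(0)
    for _ in 1:k
        y = (y << 2) | (~x & 0x03)  # complement the lowest base and append it
        x >>= 2
    end
    return y
end

# The canonical kmer is the smaller of the kmer and its reverse complement.
canonical(x::UInt64, k::Integer) = min(x, revcomp(x, k))
```

With this layout, `canonical(encode("CGTT"), 4) == encode("AACG")` should hold, since AACG is the reverse complement of CGTT and sorts first. Timing `revcomp` against `revcomp_loop` across a range of K (for example with BenchmarkTools' `@btime`) is one way to start mapping out where a specialised version stops paying off. The multi-word case (N > 1) is not covered here.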