UniquifyAllKmers does not find kmers at end of the sequence #95

Open
gcorsi opened this issue Jan 10, 2025 · 1 comment

gcorsi commented Jan 10, 2025

The global_evaluation of the UniquifyAllKmers Specification does not detect a repeated kmer when one of its occurrences is the last kmer of the sequence.

For instance, the following code results in a score of 0 (no kmers of size 10 repeated):

from dnachisel import DnaOptimizationProblem, UniquifyAllKmers

# The first and last 10 nucleotides are both AGTTCCCGGT.
sequence = 'AGTTCCCGGTCACCTGAGCTCCGGGTGACGCGGCTGCGGTAGCATGGCGTCCCTCTTCAGTTCCCGGT'
problem = DnaOptimizationProblem(sequence=sequence, constraints=[UniquifyAllKmers(k=10)])
for evaluation in problem.constraints_evaluations():
    print(evaluation.score)

However, the kmer AGTTCCCGGT is repeated, at the beginning and end of the sequence.

Suggested solution:

for i in range(start, end - self.k):

change to: for i in range(start, end - self.k + 1):

and (self.location.start <= start_ < end_ < self.location.end)

change to: and (self.location.start <= start_ < end_ <= self.location.end)
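The two off-by-one errors can be illustrated without the library. The following is a minimal standalone sketch, not the library's actual implementation: the helper names `kmer_positions` and `inside_location` and the `inclusive_end` flag are hypothetical and exist only to contrast the current and proposed bounds on the sequence from this report.

```python
from collections import defaultdict


def kmer_positions(sequence, k, inclusive_end):
    """Map each k-mer to the start indices where it occurs.

    inclusive_end=False mimics range(start, end - k);
    inclusive_end=True mimics the proposed range(start, end - k + 1).
    """
    start, end = 0, len(sequence)
    stop = end - k + 1 if inclusive_end else end - k
    positions = defaultdict(list)
    for i in range(start, stop):
        positions[sequence[i:i + k]].append(i)
    return positions


def inside_location(loc_start, loc_end, start_, end_, inclusive_end):
    """Check whether the half-open span [start_, end_) lies in the location."""
    if inclusive_end:
        return loc_start <= start_ < end_ <= loc_end
    return loc_start <= start_ < end_ < loc_end


sequence = "AGTTCCCGGTCACCTGAGCTCCGGGTGACGCGGCTGCGGTAGCATGGCGTCCCTCTTCAGTTCCCGGT"
k = 10

buggy = kmer_positions(sequence, k, inclusive_end=False)
fixed = kmer_positions(sequence, k, inclusive_end=True)

print(buggy["AGTTCCCGGT"])  # [0]
print(fixed["AGTTCCCGGT"])  # [0, 58]

# A k-mer ending exactly at the location end is rejected by the strict "<".
print(inside_location(0, len(sequence), 58, 68, inclusive_end=False))  # False
print(inside_location(0, len(sequence), 58, 68, inclusive_end=True))   # True
```

The sequence has 68 nucleotides, so the last valid kmer start is index 58. `range(start, end - k)` stops at 57, and the strict `end_ < self.location.end` comparison would exclude a span ending at 68 even if the loop reached it, which is why both changes are needed.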

veghp added a commit that referenced this issue Jan 13, 2025

veghp (Member) commented Jan 13, 2025

Many thanks for bringing attention to this and for the proposed solution. I managed to reproduce it and also implemented the change.
