Inconsistent indels generated by Indelgen when using REVERSE sequences #8

Open
cseale opened this issue Feb 2, 2021 · 1 comment
cseale commented Feb 2, 2021

Hi @felicityallen,

There seems to be an issue with indelgen: the potential indel profiles it generates for a sequence are not consistent with those it generates for that sequence's reverse complement.

Consider the following example:

>Oligo2_GGTTGAAAGTCTATAGTGGT 49 REVERSE
AAATGCGTAACAAAAACAAGCTCGGTATCTGGCCTATTCCACGCCACCCACCACTATAGACTTTCAACATTGTATTGTC

When we run indelgen on this sequence, we get the following possible inserts:

I1_L-1C2R2	
I1_L0R1	
I2_L-1C1R1	
I2_L-1C2R2	
I2_L-3C3R1	
I2_L0C1R2	
I2_L0C3R4	
I2_L0R1	

If we take the reverse complement of Oligo2 and set the PAM location to 79 (sequence length) - 49 = 30, as below,

>Oligo2_GGTTGAAAGTCTATAGTGGT_FORWARD 30 FORWARD
GACAATACAATGTTGAAAGTCTATAGTGGTGGGTGGCGTGGAATAGGCCAGATACCGAGCTTGTTTTTGTTACGCATTT

we get a different set of indels:

I1_L-1C2R2	
I1_L-1R0	2	
I1_L-2C1R0	
I2_L-1C1R1	
I2_L-1C2R2	
I2_L-1R0	9	
I2_L-2C1R0	
I2_L-3C3R1	

I have tried this with multiple examples, and I would expect the set of returned indels to be the same in both cases. Do you know what the issue might be? I am not familiar with C++ myself, so it will take me a while to learn enough to debug this problem.
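To make the mismatch concrete, here is a small sketch that compares the two outputs listed above. It only looks at the first tab-separated field of each line and compares labels as literal strings, which assumes (as this report does) that both orientations should yield identical labels. The helper name `labels` is made up for illustration and is not part of indelgen.

```python
# Outputs copied from the two runs above; "\t" marks the extra column
# that appears on some lines of the FORWARD run.
reverse_out = """I1_L-1C2R2
I1_L0R1
I2_L-1C1R1
I2_L-1C2R2
I2_L-3C3R1
I2_L0C1R2
I2_L0C3R4
I2_L0R1"""

forward_out = """I1_L-1C2R2
I1_L-1R0\t2
I1_L-2C1R0
I2_L-1C1R1
I2_L-1C2R2
I2_L-1R0\t9
I2_L-2C1R0
I2_L-3C3R1"""


def labels(text):
    """Return the set of indel labels (first tab-separated field per line)."""
    return {line.split("\t")[0].strip() for line in text.splitlines() if line.strip()}


rev = labels(reverse_out)
fwd = labels(forward_out)

shared = rev & fwd          # labels reported by both runs
only_reverse = rev - fwd    # labels reported only for the REVERSE input
only_forward = fwd - rev    # labels reported only for the FORWARD input

print("shared:      ", sorted(shared))
print("only REVERSE:", sorted(only_reverse))
print("only FORWARD:", sorted(only_forward))
```

For this example, four labels are shared and four are unique to each run: the REVERSE-only labels all start at `L0`, while the FORWARD-only labels all end at `R0`.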

@felicityallen
Owner

Hi @cseale,

Apologies, I think there is probably an off-by-one bug somewhere in the reverse code for indelgen. I never used it in that direction and didn't see a need to, so I never debugged it, and unfortunately I don't have time to at the minute. The reverse part was necessary for the read mapping in our experiment, but not for the indel generation used for making predictions, where I always took the reverse complement of the sequence first.

I will add an error warning against using REVERSE with indelgen. If you want to generate indels for a reverse strand, just take the reverse complement of the sequence and use FORWARD.
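A minimal sketch of that workaround, for anyone who wants to script it. The function names `reverse_complement` and `to_forward` are hypothetical helpers, not part of indelgen, and the PAM conversion (sequence length minus the REVERSE PAM location) is simply the one used in the report above, not something confirmed against the indelgen source.

```python
# Hypothetical helpers; not part of indelgen.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_forward(seq, pam_location):
    """Convert a REVERSE record into its FORWARD equivalent.

    Assumption: the FORWARD PAM location is len(seq) - pam_location,
    as in the example from this issue.
    """
    return reverse_complement(seq), len(seq) - pam_location


reverse_seq = (
    "AAATGCGTAACAAAAACAAGCTCGGTATCTGGCCTATTCCACGCCACC"
    "CACCACTATAGACTTTCAACATTGTATTGTC"
)
forward_seq, forward_pam = to_forward(reverse_seq, 49)

# Print the record in the same layout as the example input above.
print(f">Oligo2_GGTTGAAAGTCTATAGTGGT_FORWARD {forward_pam} FORWARD")
print(forward_seq)
```

Run on the Oligo2 record from the report, this reproduces the FORWARD sequence and PAM location of 30 shown there.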

Sorry for the confusion!
