Thank you again for a very important tool to annotate CAZymes and identify CGCs in the microbial genomes of interest.
I am interested in examining how complete the CGCs in my microbial genome of interest are.
For example, suppose dbCAN3 identifies 5 CGCs in my genome. To estimate the %completeness of these CGCs, I extract the nucleotide sequences spanning the start and end coordinates of the CGCs, as well as those of the PULs from the dbCAN-PUL database. Then I run a BLASTn search of the 5 CGC sequences against the complete dbCAN-PUL database to get %similarity and %coverage.
Is that a correct approach?
My goal is to bioinformatically say that we found 5 CGCs in the microbial genome, which are XYZ % similar to known PULs and have ABC % of completeness so we can speculate that these CGCs would be functional. But if the similarity and coverage are less than ~40% (arbitrary cutoff) then it's either a novel CGC or a non-functional CGC.
Looking forward to your suggestions and reply!
Regards,
Jigyasa
The short answer is yes. We used a similar strategy in dbCAN3 when predicting substrates for CGCs by BLAST search against dbCAN-PULs, although our parsing thresholds are more relaxed (minimum identity of 20% and at least 2 CAZyme matches to call a CGC-PUL pair). However, I should mention that the boundaries of CGCs (which affect CGC length) have never been rigorously evaluated. PUL boundaries are often experimentally determined (e.g., through RNA-seq differential expression), but CGC boundaries are set by our CGC prediction criteria, which are somewhat arbitrary (default: at least one CAZyme and one transporter, with fewer than 2 inserted non-signature genes; users can customize this). Therefore, in many cases, the %coverage or completeness cutoff you mentioned is difficult to determine.
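To make the two measures discussed above concrete, here is a minimal Python sketch. It is not the dbCAN3 code, and the real pipeline may apply further filters (e.g., on e-value or alignment length). The function names, the input tuple layouts, and the example gene and PUL identifiers are all assumptions made for illustration. The first helper computes the fraction of a CGC sequence covered by the union of its BLAST alignments to one PUL; the second applies the relaxed rule mentioned above (identity of at least 20% and at least 2 matched CAZymes) to call CGC-PUL pairs.

```python
from collections import defaultdict


def merged_coverage(intervals, query_length):
    """Fraction of the query covered by the union of alignment intervals.

    intervals: iterable of (start, end) query coordinates, 1-based and
    inclusive, as reported in BLAST tabular output (assumed layout).
    Overlapping alignments are counted only once.
    """
    covered = 0
    current_end = 0
    # Normalise each interval so start <= end, then sweep left to right.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if start > current_end:
            covered += end - start + 1      # new, non-overlapping block
            current_end = end
        elif end > current_end:
            covered += end - current_end    # only the part past the overlap
            current_end = end
    return covered / query_length


def call_cgc_pul_pairs(hits, cazyme_genes, min_identity=20.0,
                       min_cazyme_matches=2):
    """Call CGC-PUL pairs supported by enough CAZyme matches.

    hits: iterable of (cgc_id, cgc_gene, pul_id, percent_identity),
    a hypothetical pre-parsed form of the BLAST results.
    cazyme_genes: set of CGC gene IDs annotated as CAZymes.
    """
    matched = defaultdict(set)
    for cgc_id, cgc_gene, pul_id, identity in hits:
        if identity >= min_identity and cgc_gene in cazyme_genes:
            matched[(cgc_id, pul_id)].add(cgc_gene)
    return sorted(pair for pair, genes in matched.items()
                  if len(genes) >= min_cazyme_matches)


# Illustrative, made-up input (not real results).
example_hits = [
    ("CGC1", "g1", "PUL0001", 35.0),
    ("CGC1", "g2", "PUL0001", 22.5),
    ("CGC1", "g3", "PUL0002", 18.0),   # below the 20% identity threshold
    ("CGC1", "g1", "PUL0002", 40.0),
    ("CGC2", "g7", "PUL0001", 90.0),   # only one CAZyme match
]
example_cazymes = {"g1", "g2", "g3", "g7"}

pairs = call_cgc_pul_pairs(example_hits, example_cazymes)
coverage = merged_coverage([(1, 100), (51, 150), (301, 400)], 1000)
```

Because CGC boundaries are themselves somewhat arbitrary, a coverage value computed this way is best read as a rough descriptor and not as a strict completeness score.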