Suggestion: add design method like d-tailor #31

Closed
Lix1993 opened this issue Dec 23, 2019 · 7 comments

Comments

@Lix1993
Contributor

Lix1993 commented Dec 23, 2019

It would be useful for designing sequences for experiments.
Besides, I think it could solve #27.

@Lix1993
Contributor Author

Lix1993 commented Dec 23, 2019

As described in the D-Tailor tutorial:

In D-Tailor, a class defining a design objective extends the abstract class Design and there are already four predefined methods:
• Optimization: only one specific combination of property scores is desired. For example, to increase the expression of a given gene, we may want to design a sequence with high CAI, strong binding between SD and the 16S rRNA and weak mRNA secondary structure around the initiation region.
• FullFactorial: all possible combinations between the levels of the different properties are generated. This methodology is appropriate to systematically vary the multiple properties and quantify their effect on the observed phenotype.
• CustomDesign: this is a more flexible design where the user can indicate each combination of property scores that he/she wants to design for.
• RandomSampling: this method does not enforce any particular combination of properties a priori. It can be used to generate a predetermined number of new sequence variants and observe how they scatter across the property space.
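
To make the FullFactorial idea concrete, here is a minimal sketch of how the target combinations could be enumerated. It is not D-Tailor's code: the property names, the number of levels, and the `full_factorial` helper are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical properties and discrete levels (1 = low, higher = stronger).
# These names and level counts are illustrative, not taken from D-Tailor.
levels = {
    "cai": [1, 2, 3],
    "sd_binding": [1, 2, 3],
    "structure": [1, 2],
}


def full_factorial(levels):
    """Return every combination of property levels as a list of dicts."""
    names = list(levels)
    combos = product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


designs = full_factorial(levels)
print(len(designs))  # 3 * 3 * 2 combinations
print(designs[0])
```

Each dict is one cell of the design space that a sequence would then have to be found for. A CustomDesign would simply be a hand-picked subset of this list.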

@Zulko
Member

Zulko commented Dec 23, 2019

Thanks for the suggestion, here are my thoughts on this so far:

DNA Chisel and D-tailor solve different problems: Chisel is about quickly converging to an optimal solution, even under complex specifications, and Tailor is about creating sets of sequences with different fitness with respect to objectives. As a consequence, the frameworks work differently. For your problem, I can see several approaches (here ranked from least work on DNA Chisel to most work):

  1. Define specifications with a "target score" which you can tune, as I discuss in #27 ("can I generate a sequence with a sequence with 'medium' score?"). For instance, instead of MaximizeCAI, you would have an objective TuneCAI(target=some_score). With this you could obtain a sequence with the fitness you want. However, you would need to run one full optimization for every sequence you want to obtain, so if your goal is to generate hundreds of sequences, this could be much slower than Tailor.
  2. To just create (unguided) variability between sequences, you can generate sequences iteratively and make sure that each sequence is very different from any previously generated sequence, like in this example, where a collection of different primers is generated from the same specifications. This is admittedly a naive solution, but it could work in your case.
  3. Find a way to convert the DNA Chisel specifications you need into D-tailor specifications, and use D-Tailor directly. This would be logical, since Tailor specializes in this kind of problem. I am not sure how well it would work.
  4. Port the method used in Tailor to DNA Chisel. This could be done by adding a new method find_multiobjective_variants to DnaOptimizationProblem, or (simpler as a first step) by writing an extension of DNA Chisel (e.g. a Design class like you suggest, and maybe a Solver class) that implements the new search methods. You could still take advantage of Chisel's methods for constraining and generating mutations, checking hard constraints, or suggesting suboptimal regions, but the rest of the algorithm would be different.
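
Approaches 1 and 2 can be sketched in plain Python, without DNA Chisel, to show the logic. This is only an illustration under stated assumptions: GC content stands in for CAI as the tuned property, the search is a simple hill climb rather than Chisel's solver, and every function name below (`tune_gc`, `generate_diverse`, and so on) is hypothetical.

```python
import random


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def target_score(seq, target):
    """Approach 1: score is 0 at the target and negative elsewhere."""
    return -abs(gc_content(seq) - target)


def tune_gc(seq, target, n_iter=2000, seed=0):
    """Hill-climb single-base mutations toward a target GC content."""
    rng = random.Random(seed)
    best, best_score = seq, target_score(seq, target)
    for _ in range(n_iter):
        i = rng.randrange(len(best))
        candidate = best[:i] + rng.choice("ACGT") + best[i + 1:]
        score = target_score(candidate, target)
        if score >= best_score:  # accept improvements and neutral moves
            best, best_score = candidate, score
    return best


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def generate_diverse(n, length, min_distance, seed=0, max_tries=10000):
    """Approach 2: keep a random candidate only if it is far from all kept ones."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_tries):
        if len(kept) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, s) >= min_distance for s in kept):
            kept.append(candidate)
    return kept


tuned = tune_gc("A" * 100, target=0.5)
diverse = generate_diverse(n=5, length=20, min_distance=8)
```

Note that `tune_gc` performs one complete search per target value, which is the cost mentioned in approach 1: obtaining hundreds of sequences at different fitness levels means hundreds of separate runs.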

There is also the problem of defining which score is "best", "good", "average", or "bad". In D-tailor, this is done by taking a big dataset of real-life sequences (e.g. all genes in E. coli) and looking at the distribution of the scores in the sequences. This would be doable with DNA Chisel too. However, in DNA Chisel many scores depend on sequence length (for instance, longer genes would most probably get worse scores, as they would have more suboptimal regions). This is on purpose, to guide global and local optimizations, but it makes the specifications less suited to looking at score distributions in a set of sequences, as Tailor does. You would need to redefine these DNA Chisel specifications a bit so that they would be sequence-length-independent.
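
As a rough sketch of what this could look like: divide each raw score by the sequence length, then derive level boundaries from the quantiles of a reference set. Dividing by length is only one possible normalization and is an assumption here, as are the helper names; neither DNA Chisel nor D-Tailor is being quoted.

```python
from bisect import bisect_right


def per_base_score(raw_score, seq_length):
    """Length-normalized score (assumes the raw score scales with length)."""
    return raw_score / seq_length


def level_boundaries(reference_scores, n_levels):
    """Cut points that split the sorted reference scores into n_levels bins."""
    ordered = sorted(reference_scores)
    return [ordered[(len(ordered) * k) // n_levels] for k in range(1, n_levels)]


def assign_level(score, boundaries):
    """1-based level of a score given the cut points (1 = lowest bin)."""
    return bisect_right(boundaries, score) + 1


# Toy reference distribution standing in for scores over a real gene set.
reference = list(range(10))
boundaries = level_boundaries(reference, n_levels=3)
print(boundaries)
print(assign_level(2, boundaries), assign_level(9, boundaries))
```

With real data, `reference` would hold the normalized scores of all sequences in the dataset, and the resulting levels would play the role of "bad", "average", and "good".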

I'm tagging @jcg (who developed D-tailor) on this issue for awareness and possible suggestions.

@Lix1993
Contributor Author

Lix1993 commented Dec 24, 2019

Since D-Tailor is written in Python 2 and has not been updated since 2013, adding an objectiveDesignMixin may be a better way to solve this. I'll work on this.

@Lix1993
Contributor Author

Lix1993 commented Jan 3, 2020

I wrote a prototype for this at https://github.com/Lix1993/DnaChisel/tree/design
It should be reorganized, but it works for now.

I'm looking for a job now, so I may improve it in the future.

@Lix1993 Lix1993 closed this as completed Jan 3, 2020
@Zulko
Member

Zulko commented Jan 4, 2020

That sounds great, it looks like you are following the D-Tailor naming and methods, let us know how it works in real life! As there are many files in the module, it could become a library of its own, so users could get D-tailor features from DnaChisel-compatible constraints and objectives.

Did I understand correctly that you are looking for a job? Is it in the computational biology area?

@Lix1993
Contributor Author

Lix1993 commented Jan 6, 2020

Yes, in the bioinformatics or computational biology area.

@Lix1993
Contributor Author

Lix1993 commented Jan 6, 2020

Here is a short resume of mine.

Do you have any suggestions?

@Zulko Zulko mentioned this issue Jan 13, 2020