Suggestion: add design method like d-tailor #31

Lix1993 · 2019-12-23T02:10:17Z

It would be useful for designing sequences for experiments.
Besides, I think it could solve #27.

Lix1993 · 2019-12-23T05:24:35Z

As described in the D-Tailor tutorial:

In D-Tailor, a class defining a design objective extends the abstract class Design and there are already four predefined methods:
• Optimization: only one specific combination of property scores is desired. For example, to increase the expression of a given gene, we may want to design a sequence with high CAI, strong binding between SD and the 16S rRNA and weak mRNA secondary structure around the initiation region.
• FullFactorial: all possible combinations between the levels of the different properties are generated. This methodology is appropriate to systematically vary the multiple properties and quantify their effect on the observed phenotype.
• CustomDesign: this is a more flexible design where the user can indicate each combination of property scores that he/she wants to design for.
• RandomSampling: this method does not enforce any particular combination of properties a priori. It can be used to generate a predetermined number of new sequence variants and observe how they scatter across the property space.
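
To make the FullFactorial idea concrete, here is a minimal sketch in plain Python. It is not D-Tailor's or DNA Chisel's API: the property names, the level labels, and the `full_factorial` helper are all hypothetical and only show how the set of target combinations would be enumerated.

```python
from itertools import product

# Hypothetical properties and levels, chosen only for illustration.
levels = {
    "cai": ["low", "medium", "high"],
    "sd_binding": ["weak", "strong"],
    "mrna_structure": ["weak", "strong"],
}

def full_factorial(levels):
    """Return every combination of property levels as a list of dicts."""
    names = list(levels)
    combos = product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]

designs = full_factorial(levels)
print(len(designs))  # 3 * 2 * 2 = 12 target combinations
```

A CustomDesign would then correspond to a hand-picked subset of `designs`, and Optimization to a single entry.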

Zulko · 2019-12-23T10:50:54Z

Thanks for the suggestion, here are my thoughts on this so far:

DNA Chisel and D-tailor solve different problems: Chisel is about quickly converging to an optimal solution, even under complex specifications, and Tailor is about creating sets of sequences with different fitness with respect to objectives. As a consequence, the frameworks work differently. For your problem, I can see several approaches (here ranked from least work on DNA Chisel to most work):

1. Define specifications with a "target score" which you can tune, as I discuss in "can I generate a sequence with a sequence with 'medium' score?" (#27). For instance, instead of MaximizeCAI, you would have an objective TuneCAI(target=some_score). With this you could obtain a sequence with the fitness you want. However, you would need to run one full optimization for every sequence you want to obtain, so if your goal is to generate hundreds of sequences, this could be much slower than Tailor.
2. To just create (unguided) variability between sequences, you can generate sequences iteratively and make sure that each sequence is very different from any previously generated sequence. Like in this example, where a collection of different primers is generated from the same specifications. This is admittedly a naive solution, but it could work in your case.
3. Find a way to convert the DNA Chisel specifications you need into D-Tailor specifications, and use D-Tailor directly. This would be logical, since Tailor specializes in this kind of problem. I am not sure how well it would work.
4. Port the method used in Tailor to DNA Chisel. This could be done by adding a new method find_multiobjective_variants to DnaOptimizationProblem, or (simpler as a first step) by writing an extension of DNA Chisel (e.g. a Design class like you suggest, and maybe a Solver class) which implements the new search methods. You could still take advantage of Chisel's methods for constraining and generating mutations, checking hard constraints, or suggesting suboptimal regions, but the rest of the algorithm would be different.
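
The "target score" idea from the first approach can be sketched without any library. The snippet below is a toy illustration, not DNA Chisel code: GC content stands in for a real property score such as CAI, and `gc_content` and `tune_to_target` are hypothetical names. It hill-climbs over single-base mutations until the score reaches the requested target.

```python
import random

def gc_content(seq):
    """Fraction of G/C bases; a stand-in for any property score (e.g. CAI)."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def tune_to_target(seq, target, score=gc_content, n_iter=2000, seed=0):
    """Hill-climb single-base mutations so that score(seq) approaches target."""
    rng = random.Random(seed)
    best = seq
    best_err = abs(score(best) - target)
    for _ in range(n_iter):
        if best_err == 0:
            break  # target reached exactly
        i = rng.randrange(len(best))
        candidate = best[:i] + rng.choice("ACGT") + best[i + 1:]
        err = abs(score(candidate) - target)
        if err <= best_err:  # keep improving or neutral mutations only
            best, best_err = candidate, err
    return best

start = "ATATATATATATATATATAT"  # GC content of 0.0
tuned = tune_to_target(start, target=0.5)

# One full search per desired score: this is the cost mentioned above.
variants = [tune_to_target(start, t) for t in (0.25, 0.5, 0.75)]
```

In a real setting the objective would be a DNA Chisel specification evaluated under hard constraints, which this sketch leaves out entirely.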

There is also the problem of defining which score is "best", "good", "average", or "bad". In D-Tailor, this is done by taking a big dataset of real-life sequences (e.g. all genes in E. coli) and looking at the distribution of the scores in the sequences. This would be doable with DNA Chisel too. However, in DNA Chisel many scores depend on sequence length (for instance, longer genes would most probably get worse scores as they would have more suboptimal regions). This is on purpose, to guide global and local optimizations, but it makes the specifications less suited to looking at score distributions in a set of sequences, as Tailor does. You would need to redefine these DNA Chisel specifications a bit so they would be sequence-length-independent.
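
As a rough illustration of that last point, the sketch below normalizes a length-dependent score per base and then derives "bad / average / good" levels from the quantiles of a reference set. Everything here is an assumption made for illustration: the helper names are hypothetical, the reference data is made up, and dividing by length is only one possible normalization.

```python
from statistics import quantiles

def per_base_score(raw_score, seq_length):
    """Normalize a length-dependent score so sequences of different sizes compare."""
    return raw_score / seq_length

def level_thresholds(reference_scores, n_levels=3):
    """Cut points splitting the reference score distribution into n_levels bins."""
    return quantiles(reference_scores, n=n_levels)

def score_level(score, thresholds):
    """Index of the bin (0 = lowest) that a score falls into."""
    return sum(score > t for t in thresholds)

# Made-up (raw_score, length) pairs standing in for a reference gene set.
reference = [(-30, 300), (-90, 600), (-45, 900),
             (-10, 1000), (-240, 1200), (-60, 1500)]
normalized = [per_base_score(s, n) for s, n in reference]
thresholds = level_thresholds(normalized)
levels = [score_level(s, thresholds) for s in normalized]
```

With three levels, `levels` holds 0 for the lowest-scoring third of the reference set and 2 for the highest, regardless of how long each sequence is.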

I'm tagging @jcg (who developed D-tailor) on this issue for awareness and possible suggestions.

Lix1993 · 2019-12-24T01:57:45Z

Since D-Tailor is written in Python 2 and has not been updated since 2013, adding an objectiveDesignMixin may be a better way to solve this. I'll work on this.

Lix1993 · 2020-01-03T10:39:17Z

I wrote a prototype for this at https://github.com/Lix1993/DnaChisel/tree/design
It needs to be reorganized, but it works for now.

I'm looking for a job now, so I may improve it in the future.

Zulko · 2020-01-04T11:22:11Z

That sounds great, it looks like you are following the D-Tailor naming and methods, let us know how it works in real life! As there are many files in the module, it could become a library of its own, so users could get D-tailor features from DnaChisel-compatible constraints and objectives.

Did I understand correctly that you are looking for a job? Is it in the computational biology area?

Lix1993 · 2020-01-06T01:24:43Z

Yes, in the bioinformatics or computational biology area.

Lix1993 · 2020-01-06T02:30:19Z

Here is my brief resume.

Do you have any suggestions?

Lix1993 closed this as completed Jan 3, 2020

Zulko mentioned this issue Jan 13, 2020

Design mixin #34

Closed
