Replies: 14 comments 1 reply
-
We analyzed 100 ng of Pierce HeLa digest using the Evosep One (40 SPD method), so very similar to your manuscript.
-
Hi, the two big caveats are that we don't have automatic parameter optimization or multi-step searches yet. To get the same results you will need to set the following parameters in the first search:
Then perform a second search without library prediction and select the
This should give you very similar performance :)
-
😟 Now I am really surprised. Currently there is no parameter optimization and no second search with the optimized library? Automating the second search with a script is easy to do, but how should I guess the optimal parameters without having seen the data a priori?
-
Hi, it's not as bad as it sounds. 😄 Lots of parameters are already being optimized automatically, but there are still three major ones: MS1 tolerance, MS2 tolerance and RT tolerance. MS1 and MS2 are self-explanatory; RT tolerance is the largest expected retention-time error after calibration. We usually set this to 200-300 s for the 60 and 40 SPD methods. The reason is that we want to offer the best performance across methods and gradients, so we are currently compiling a test suite with different methods and setups.
-
Ok, let's get practical. I have a set of DIA raw files from the Astral, acquired with lock-mass correction. So we basically know it is high-mass-accuracy data (lock-mass correction can be checked easily without doing a database search first). You are suggesting default values for all OT scans (MS1); here you used 4 ppm. Why exactly 4? And a default value for Astral scans (MS2); again, why exactly 7? And your iRT matching tolerance is simply based on historical knowledge (you know your column setup and the Evosep performance, so you go for ±2 min)? Why 2?
-
Yes, this is similar to how 5 ppm and 10 ppm have been the defaults for 70k and 35k resolution on Orbitrap instruments, or 15 ppm for default timsTOF DIA settings. We have seen that Astral data works well with 6-7 ppm. Regarding the RT tolerance, you should aim for 30% of the total gradient (21 min × 0.3 = 6.3 min; ±189 s) for the first search and 15% for the second search. Otherwise I would recommend looking at the methods section of the manuscript for inspiration. That being said, I absolutely agree that this is not practical, and we are working on a prototype to solve it. The reason is really that we had to set priorities, and since we have a very well-controlled and standardized instrument setup with Evosep, IonOpticks and the Astrals, this was fine for getting good performance while we focused on establishing confident FDR, speed, quantification etc. I'm curious to hear how alphaDIA performs on your data with the updated parameters! I can update you when we have the first release to test. As you are using the developer version anyway, any feedback on the automated optimization would be appreciated once it's part of a release candidate.
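The rule of thumb above can be turned into a one-liner; a minimal sketch, assuming the tolerance is reported as ± half of the stated gradient fraction (matching the 21 min × 0.3 → ±189 s example):

```shell
# RT tolerance rule of thumb from this thread: ~30% of the total gradient
# for the first search, ~15% for the second, expressed as +/- half of
# that window (21 min * 0.3 = 378 s total -> +/- 189 s).
gradient_min=21
first=$(awk -v g="$gradient_min" 'BEGIN { printf "%d", g * 60 * 0.30 / 2 }')
second=$(awk -v g="$gradient_min" 'BEGIN { printf "%d", g * 60 * 0.15 / 2 }')
echo "first search:  +/- ${first} s"
echo "second search: +/- ${second} s"
```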
-
As you predicted, using the 2-pass search with the above parameters gives:
I implemented this using a very basic slurm batch script
But I think you should really change the manuscript in this respect: "With these state-of-the art predicted libraries, we devised a two-step search workflow in alphaDIA consisting of library refinement and quantification (Fig. 5 a)." This sounds as if alphaDIA actively manages both steps as part of an integrated workflow. Are the precursors now counted at 1% FDR? Why is the MS1 accuracy zero? Is it possible to add the MS2 accuracy? Best,
-
Ah, that's good to see! I would assume there is some additional performance to gain with parameter optimization. I use a very similar SLURM script for my two-step searches and hope to include it in the docs soon. We wrote it like this in the manuscript because alphaDIA was designed for multi-step searches and therefore builds the MBR library etc. This should allow testing and benchmarking, and from a scientific point of view this was the priority. The FDR handling is therefore not arbitrary either but was made to work in this context. It's actually surprisingly hard to get all of this somewhat right :D Of course it's important for adoption to make it easily accessible in the GUI and to allow multi-step searches. At the same time, we don't just want a single checkbox that gives you some multi-step search; we want this in a configurable, transparent fashion, designed with good software engineering in mind. It's really cool to see that you have already explored alphaDIA a bit. If you have time I would be happy to schedule a Zoom call and hear about your experience from the perspective of a very technical user. This is meant to be an open-source community project, so all contributions are welcome. You can reach me at (lastname)@biochem.mpg.de 😀. Regarding the FDR, it's always controlled on a local precursor and global protein level, so it will be controlled for the first search, the MBR library and the second-search results. Something I would recommend is to set the
-
Ok! Will try. I was already guessing that you use alphaDIA within Slurm. I think this is in general something of interest for the community. Are there things one should be aware of when running alphaDIA by
-
Yes, we have some pipelines and do most processing on Slurm. I have some templates for transfer learning and two-step searches that I will share. We use sbatch with conda, similar to your script. I would generally recommend using only a single socket at a time: only use as many threads as there are cores on a single socket, and allow two tasks per node on two-socket machines. For simple searches 64 GB to 128 GB should be sufficient; for more complicated searches 256 GB is better. We haven't really optimized for memory yet. Another trick is to use the --config-dict argument in the CLI: https://alphadia.readthedocs.io/en/latest/methods/command-line.html
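A sketch of how those recommendations could look in an sbatch header. The resource numbers and the alphaDIA invocation are assumptions to adapt, not a tested template:

```shell
#!/bin/bash
# Hypothetical sbatch header following the advice above: one task per
# socket, threads capped at one socket's core count, ~128 GB per task.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2        # two tasks on a dual-socket node
#SBATCH --cpus-per-task=32         # adjust to the core count of one socket
#SBATCH --mem=256G                 # ~128 GB per task for a simple search

# Bind each task to a socket so threads don't migrate across sockets.
# The alphadia call and its flags are illustrative, not verified.
srun --cpu-bind=sockets alphadia --config first_search.yaml
```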
-
I am currently testing on a single cluster node like:
If I understand the above correctly, we have 128 CPUs, each sitting in a separate socket with a single core, and each core is set for a single thread. So your recommendation would mean using a single thread? 😄 So far I have been running 32- or 64-thread alphadia commands without setting anything in particular; the details at https://slurm.schedmd.com/cpu_management.html are not all perfectly clear to me.
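To untangle how many sockets and cores a node really has, it can help to print the topology before picking thread counts; a small sketch (`lscpu` output format varies by distribution):

```shell
# Count logical CPUs visible to this node/allocation, then show the
# socket / core / thread breakdown that the Slurm docs talk about.
logical_cpus=$(getconf _NPROCESSORS_ONLN)
echo "logical CPUs: ${logical_cpus}"
# Sockets x cores-per-socket x threads-per-core (Linux only):
lscpu 2>/dev/null | grep -E '^(Socket|Core|Thread|CPU\(s\))' || true
```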
-
I am constantly modifying the Slurm batch script, but currently the 2-pass search jobs look like:
-
The slurm batch script for the 2-pass search
not quite sure if the |
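The attached script itself isn't reproduced in the thread; purely to illustrate the two-pass structure being discussed, a minimal sketch in which all paths, flag names and the library filename are assumptions:

```shell
#!/bin/bash
#SBATCH --job-name=alphadia-2pass
#SBATCH --cpus-per-task=32
#SBATCH --mem=128G

# Hypothetical two-pass alphaDIA run: pass 1 searches with a predicted
# library and builds the MBR library, pass 2 re-searches against it.
# Raw file locations are assumed to live in the config files.
set -euo pipefail
OUT1=out_first_pass
OUT2=out_second_pass

# Pass 1: library prediction, wider tolerances.
alphadia --config first_search.yaml --output "$OUT1"

# Pass 2: no prediction, tighter RT tolerance, library from pass 1.
# The library filename below is a guess, not a documented output name.
alphadia --config second_search.yaml \
         --library "$OUT1/speclib.mbr.hdf" \
         --output "$OUT2"
```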
-
Hi, this looks good, although your script is already much more sophisticated than mine :D In your case, the config dict is JSON.
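Since the config dict is passed as inline JSON, one way to avoid quoting mistakes is to build and validate the string before submitting the job; a sketch in which the key names are illustrative placeholders, not alphaDIA's documented schema:

```shell
# Build an inline JSON config; the keys below are illustrative only.
config='{"search": {"target_ms1_tolerance": 4, "target_ms2_tolerance": 7}}'

# Validate up front -- a malformed dict fails fast here instead of
# inside the queued Slurm job.
echo "$config" | python3 -m json.tool > /dev/null && echo "config OK"
# alphadia --config-dict "$config" ...
```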
-
Describe the bug
The number of precursors reported by alphaDIA in stat.tsv deviates strongly from the precursor counts reported by DIA-NN (library-free mode) on the same files.
According to Fig. 5 of your manuscript, one would expect similar precursor and protein counts on Astral data (in-house generated using 2 Da fixed-window DIA). How do you filter the precursor data for the stat.tsv file?
Logs
attached
log.txt
Version (please complete the following information):