Request merge of BfG Spectra to MassBank dev #245
Trying to update the dev to the current version
Hi Kevin, |
Thanks for the corrections! I corrected the script on points 1 to 3 and I will update my branch. Regarding point 4, I will wait for René's opinion. It is okay to change it, but if it doesn't matter we would prefer dl-de/by-2-0. |
Is this it? https://www.govdata.de/dl-de/by-2-0 |
Yes, that's the one. Sure, how do I turn it into a hyperlink? Should it look like this?: |
Hi, and now a somewhat more general comment about licensing and copyright on data. Our legal experts in NFDI4Chem tell us that it is not possible to hold copyright on measurement data at all. Only if the data is arranged or evaluated can a copyright be held, and then on that process. So I expect that none of the data in MassBank can be subject to anyone's copyright. Nevertheless, we still have this LICENSE field in our data format. It is there because it has been for a long time; the inventors of the format were probably not aware that data is not copyrightable, and we don't push to remove it. It also forces contributors to think about the fact that their data might get compiled and reused in a different place. We think that "reuse" is also a purpose of this collection of data. But if contributors provide their data under an open license with the properties "ShareAlike" and "Attribution", and a consumer uses the data without following these rules, it will probably be impossible to enforce them. So, as a bottom line: the LICENSE tag might be superfluous and is a bit misleading, but there has been no initiative to remove it. It has no legal consequences for any consumer of the data and can only be considered a wish. |
@meier-rene Cool, I'm happy that the licensing will not be in the way of this! I think then you'll have to fix this in |
I took the time to look a bit more closely at the data. I was not aware of the size of the contribution. Thank you for your effort! But: all records are missing the SMILES, although you provide the InChI for nearly all records, so the information is there. Nevertheless, we require the SMILES in the record. My question now is: do you want to fix your pipeline because you want to contribute more data in the future? Or do you just want to get the data deposited as fast as possible and never touch it again? 😉 If you want a working pipeline, you need to make sure that the SMILES are also in the record file. Otherwise you can ask me to try to add that line from the given InChI. |
Adding SMILES from InChI is non-ideal, since it makes some tautomer assumptions that are not always true to the original SMILES. If they are not added, this will break our MassBank / PubChem integration... |
But I see SMILES in BFG000001? |
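For anyone who wants to check this locally rather than by eye, here is a minimal sketch in Python that reports whether a record carries a usable SMILES line and, if not, whether an InChI is available to fall back on. It assumes the structure fields appear under the `CH$SMILES` and `CH$IUPAC` tags and that a missing value is either absent, empty, or written as `N/A`; the function names and the sample record text are hypothetical and not taken from this pull request.

```python
import re

# Structure tags of a MassBank record; treating "N/A" as "missing" is an assumption.
FIELD_RE = re.compile(r"^(CH\$SMILES|CH\$IUPAC):\s*(.*)$")


def structure_fields(record_text):
    """Collect the CH$SMILES and CH$IUPAC values found in one record."""
    fields = {}
    for line in record_text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def is_missing(value):
    """True if the field is absent, empty, or the placeholder N/A."""
    return value is None or value == "" or value.upper() == "N/A"


def smiles_status(record_text):
    """Classify a record by which structure information it carries."""
    fields = structure_fields(record_text)
    if not is_missing(fields.get("CH$SMILES")):
        return "ok"
    if not is_missing(fields.get("CH$IUPAC")):
        return "missing SMILES, InChI present"
    return "missing SMILES and InChI"


# Hypothetical record fragment (ethanol), for illustration only.
example = "\n".join([
    "CH$NAME: Ethanol",
    "CH$SMILES: N/A",
    "CH$IUPAC: InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
])
print(smiles_status(example))
```

Run over every record file in the branch, this separates records that are fine from those where the SMILES would have to come from the pipeline or be derived from the InChI. It deliberately does not perform that derivation, given the tautomer concern raised above.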
Thanks for all the help! I'll update the branch now. |
These are all current spectra from the BfG in the current library. The spectra were processed with RMassBank (without mass recalibration, since this did not increase accuracy) and then loaded into an internal SQLite DB, where they are further curated. They are then exported from the internal SQLite DB using the Spectra package and MsBackendMassBank.
Naming: The accession number is made up of the date of export and the ID from the internal SQLite DB (to avoid any duplicates). CSL stands for collective spectral library, the name of the internal SQLite DB. In this commit, however, I only included the BfG spectra.
I also changed the list of contributors to include the BfG.