Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request merge of BfG Spectra to MassBank dev #245

Closed
wants to merge 8 commits into from

Conversation

ksjewell
Copy link

These are all current spectra from the BfG in the current library. The spectra were processed with RMassBank (without mass recalibration, since this did not increase accuracy) and then loaded to an internal SQLite DB. Here they are further curated. They are then exported from the internal SQLite DB using the Spectra package and MsBackendMassBank.

Naming: The assension number is made up of the date of export and the ID from the internal SQLite DB (to avoid any duplications). CSL stands for collective spectral library, the name of the internal SQLite DB. But in this commit I only included the BfG spectra.

I also changed the list of contributors to include the BfG.

@sneumann
Copy link
Member

Hi Kevin,
thanks for the PR, we'll have a look at it and come back to you. Yours, Steffen

@meowcat
Copy link

meowcat commented Oct 18, 2023

Hi Kevin,
0) this is not official feedback since I am not René or Steffen

  1. thanks for this awesome block of data
  2. on a first glance, I see COMMENT: CONFINDENCE (instead of COMMENT: CONFIDENCE). Now this is not a strictly regulated subtag but it would be nice to have it lined up with existing data (and it's even in the record format)
  3. There's an extra space after AUTHORS
  4. Is it possible to use CC licenses instead of whatever "dl-de" is? That's what's currently required on massbank.eu, afaik. https://github.com/MassBank/MassBank-web/blob/dev/Documentation/MassBankRecordFormat.md#2.1.5 @meier-rene opinion?

@ksjewell
Copy link
Author

Thanks for the corrections! I corrected the script on points 1 to 3 and I will update my branch.

Regarding point 4 I will wait for René's opinion. It is okay to change it, but if it doesn't matter we would prefer dl-de/by-2-0.

@schymane
Copy link
Member

schymane commented Oct 19, 2023

Is this it? https://www.govdata.de/dl-de/by-2-0
If we go with this, is it possible to hyperlink to the license, since not everyone may be familiar with it?

@ksjewell
Copy link
Author

Yes, that's the one. Sure, how do I turn it into a hyperlink? Should it look like this?:
LICENSE: (dl-de/by-2-0)[www.govdata.de/dl-de/by-2-0]

@meier-rene
Copy link
Collaborator

meier-rene commented Oct 23, 2023

Hi,
just a little comment on the license issue: As already pointed out, we encourage people to use a open license from the CC. We were not aware of the dl-de/by-2-0. This one looks also very liberal and is of course ok. You don't need to do anything about the link to the license. Now that I know about it, I will add that link to the web app and it will add that link to the html representation of your records. Please just stick with "LICENSE: dl-de/by-2-0". All you need to do, @ksjewell, is to solve the other issues mentioned in comment 3 and push it. I will than happily merge your data.

And now a little bit more general comment about licensing and acquiring a copyright to data. Our law experts in the NFDI4Chem tell us, that it is not possible to hold the copyright on measurement data at all. Only if its arranged or evaluated a copyright can be hold on the process. So I expect that all data an MassBank can not be subject to a copyright of anyone. Nevertheless we still have this LICENSE field in our data format. Its there because it has been for a long time, the inventors of the format were propably not aware of the fact that data is not copyrightable and we don't push to remove that. It also forces contributors to think about the fact, that their data might get compiled and reused in a different place. We think the "reuse" is also a purpose of this collection of data. But if contributor provide their data with a open license with the properties " ShareAlike" and " Attribution" and a consumer uses the data and does not follow this rules it will be probably impossible to enforce that.

So as a bottom line: The LICENSE tag might be superfluous and is a bit misleading but there was no initiative to remove that. It has no legal consequences for any consumer of the data and can only be considered as a wish.

@meowcat
Copy link

meowcat commented Oct 23, 2023

@meier-rene
Copy link
Collaborator

I took the time to look a bit closer to the data. I was not aware of the size of the contribution. Thank you for your effort!

But: All records miss the SMILES, but you provide the InChi for nearly all records. So the information is there. Nevertheless we require the SMILES in the record.

My question is now: Do you want to fix your pipeline, because you want to contribute more data in the future? Or do you just want to get the data deposited as fast as possible and never want to touch it again? 😉 If you want a working pipeline you need to make sure that the SMILES are also in the record file. Otherwise you can ask me to try to add that line from the given InChI.

@schymane
Copy link
Member

Adding SMILES from InChI is non-ideal, since it makes some tautomer assumptions that are not always true to the original SMILES. If they are not added, this will break our MassBank / PubChem integration...

@schymane
Copy link
Member

But I see SMILES in BFG000001?

@ksjewell
Copy link
Author

Thanks for all the help! I'll update the branch now.
@meier-rene, I have the SMILES in the original database, so I will just copy them from there (these were the original values used in RMassbank).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants