feat(parser): extract authors from CFF files #115
…output RDF of authors
Adds a citation file (CFF) to the gimie repository for documentation and testing purposes.
…ure. Update pyshacl, comment out useless pyshacl test throwing errors
@cmdoret I'm especially interested in the tests: how do you generally decide how "deep" to test? Looking at the other tests, I saw the level of detail was not super high, so I tried to match that in my test, but I'm happy to hear if there is a better way of deciding that. Also, I think the failing container registry action is related to a version number? Is that something I should add to my PR somehow? It seems related to the version of git, rather than any code I added :/
duplicate check
Nice work! I have a bunch of suggestions to improve code readability.
About tests: I would just add a test for behaviour on a cff with a broken orcid (not a uri) and authors without doi, making sure they don't appear in the output.
Note: you don't need to create cff files for tests, since the functions take bytes, you can write plain yaml in the test, e.g.:
data = b"""
<yaml-content>
"""
get_cff_authors(data)
I'm not sure there's an absolute rule about test depths, but in general, it's good to cover the obvious edge cases, and have them run on important code paths.
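As a rough illustration of the suggestion above, a test can build the CFF content inline as bytes and assert that an author with a broken orcid, or with no orcid at all, is dropped. Everything in this sketch is an assumption made for illustration: `extract_orcids` is a hypothetical stdlib-only stand-in, not gimie's `get_cff_authors`, and it scans `orcid:` lines with a regex instead of using a real YAML parser.

```python
import re

# Assumed pattern for an ORCID URL: four dash-separated groups, last char digit or X.
ORCID_URL = re.compile(r"^https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def extract_orcids(data: bytes) -> list[str]:
    """Hypothetical stand-in: return the valid ORCID URLs found in CFF bytes."""
    orcids = []
    for line in data.decode("utf-8").splitlines():
        # Drop indentation and a leading list dash, then split on the first colon.
        key, sep, value = line.strip().lstrip("- ").partition(":")
        if sep and key.strip() == "orcid":
            value = value.strip().strip("'\"")
            if ORCID_URL.match(value):
                orcids.append(value)
    return orcids


# Plain YAML written directly in the test, no fixture file needed.
data = b"""
cff-version: 1.2.0
authors:
  - family-names: Doe
    given-names: Jane
    orcid: https://orcid.org/0000-0002-1825-0097
  - family-names: Smith
    given-names: John
    orcid: not-a-uri
  - family-names: Roe
    given-names: Rick
"""


def test_broken_or_missing_orcid_is_dropped():
    # Only the author with a well-formed ORCID URL should survive.
    assert extract_orcids(data) == ["https://orcid.org/0000-0002-1825-0097"]
```

With pytest, the same shape would apply to the real parser: swap the stand-in for the function under test and keep the inline bytes.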
Co-authored-by: Cyril Matthey-Doret <[email protected]>
About the container building issue: it seems that the newer versions of pydriller need a more recent version of git. Changing the base-layer of our docker image from …
Since I was making an orcid matcher, I thought I would also move the doi matching logic to utils. Does that make sense?
Nice, this is starting to look really good! I have a few suggestions to improve things but most of them are optional.
The only really important one is the orcid url check in CffParser.
note: bumped the dockerfile base layer so that ci succeeds
LGTM 🚀 well done!
This branch makes it possible to retrieve "author" objects from a CFF file and write them to turtle. It also slightly refactors the generic parse function to output a graph structure (triples) instead of property/value pairs ("doubles?").
Additionally, minor fixes in poetry dependencies.
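
To make the difference between property/value pairs and triples concrete, here is a minimal sketch using plain tuples. It is not the code from this branch: the `Triple` type, the `author_triples` helper, and the choice of `schema:author` / `schema:Person` as predicates are all assumptions made for illustration.

```python
from typing import Iterator, NamedTuple

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    """One statement in a graph: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def author_triples(repo_url: str, orcids: list[str]) -> Iterator[Triple]:
    """Hypothetical helper: emit triples linking a repository to its authors."""
    for orcid in orcids:
        # The repository is the subject of the authorship statement...
        yield Triple(repo_url, SCHEMA + "author", orcid)
        # ...and the author is the subject of its own type statement.
        yield Triple(orcid, RDF_TYPE, SCHEMA + "Person")


triples = list(
    author_triples(
        "https://example.org/repo",  # placeholder repository URL
        ["https://orcid.org/0000-0002-1825-0097"],
    )
)
for t in triples:
    print(t.subject, t.predicate, t.obj)
```

A property/value pair leaves the subject implicit, so it can only describe the repository itself. A triple names its subject, which is what lets an author node carry statements of its own. Serializing to turtle would be left to an RDF library.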