Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(parser): extract authors from CFF files #115

Merged
merged 39 commits into from
Nov 28, 2024
Merged

feat(parser): extract authors from CFF files #115

merged 39 commits into from
Nov 28, 2024

Conversation

rmfranken
Copy link
Member

This branch makes it possible to retrieve "author" objects from a CFF file and write them to turtle. It slightly refactored the generic parse function to output a graph structure (triples) instead of properties and values ("doubles?").

Additionally, minor fixes in poetry dependencies.

@rmfranken
Copy link
Member Author

@cmdoret I'm especially interested in the tests - how do you generally decide how "deep" to test? Looking at the other tests I saw the level detail was not super high, so I tried to match that in my test- but happy to hear if there is a better way of deciding that.

Also, I think the container registry action that is failing seems to be related to a version number? Is that something I should add to my PR somehow? It seems related to the version of git, rather than any code I added :/

duplicate check
@rmfranken rmfranken requested a review from cmdoret November 18, 2024 16:31
Copy link
Member

@cmdoret cmdoret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I have a bunch of suggestions to improve code readability.

About tests: I would just add a test for behaviour on a cff with broken orcid (not a uri) and authors without doi. And making sure they don't appear in the output.

Note: you don't need to create cff files for tests, since the functions take bytes, you can write plain yaml in the test, e.g.:

data = b"""
<yaml-content>
"""
get_cff_authors(data)

I'm not sure there's an absolute rule about test depths, but in general, it's good to cover the obvious edge cases, and have them run on important code paths.

gimie/parsers/__init__.py Outdated Show resolved Hide resolved
gimie/parsers/__init__.py Outdated Show resolved Hide resolved
gimie/parsers/cff.py Outdated Show resolved Hide resolved
gimie/parsers/cff.py Outdated Show resolved Hide resolved
gimie/parsers/cff.py Outdated Show resolved Hide resolved
gimie/parsers/cff.py Outdated Show resolved Hide resolved
gimie/parsers/license/__init__.py Outdated Show resolved Hide resolved
tests/test_output.py Outdated Show resolved Hide resolved
tests/test_parsers.py Outdated Show resolved Hide resolved
gimie/parsers/abstract.py Show resolved Hide resolved
Co-authored-by: Cyril Matthey-Doret <[email protected]>
@cmdoret
Copy link
Member

cmdoret commented Nov 20, 2024

About the container building issue: it seems that the newer versions of pydriller need a more recent git.

Changing the base-layer of our docker image from python:3.10-slim-bullseye to python:3.13-slim-bookworm should do it, it seems the latter comes with git 21.39.5

@rmfranken
Copy link
Member Author

Since I was making a orcid matcher, I thought I would also move the doi matching logic to utils. Does that make sense?

@rmfranken rmfranken requested a review from cmdoret November 22, 2024 10:15
@cmdoret cmdoret changed the title feat: Expand CFF parser to retrieve author information feat(parser): parse authors from CFF files Nov 26, 2024
@cmdoret cmdoret changed the title feat(parser): parse authors from CFF files feat(parser): extract authors from CFF files Nov 26, 2024
Copy link
Member

@cmdoret cmdoret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is starting to look really good! I have a few suggestions to improve things but most of them are optional.

The only really important one is the orcid url check in CffParser.

gimie/parsers/abstract.py Outdated Show resolved Hide resolved
gimie/parsers/cff.py Outdated Show resolved Hide resolved
gimie/parsers/cff.py Outdated Show resolved Hide resolved
gimie/parsers/license/__init__.py Outdated Show resolved Hide resolved
gimie/parsers/license/__init__.py Outdated Show resolved Hide resolved
tests/test_cff.py Outdated Show resolved Hide resolved
tests/test_cff.py Outdated Show resolved Hide resolved
tests/test_cff.py Outdated Show resolved Hide resolved
@cmdoret
Copy link
Member

cmdoret commented Nov 28, 2024

note: bumped the dockerfile base layer so that ci succeeds

@rmfranken rmfranken requested a review from cmdoret November 28, 2024 14:00
Copy link
Member

@cmdoret cmdoret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀 well done!

@rmfranken rmfranken merged commit 56eabde into main Nov 28, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants