feat(parser): extract authors from CFF files #115

rmfranken · 2024-11-18T16:23:07Z

This branch makes it possible to retrieve "author" objects from a CFF file and write them to Turtle. It also slightly refactors the generic parse function to output a graph structure (triples) instead of property-value pairs ("doubles").

Additionally, it includes minor fixes to the Poetry dependencies.
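To illustrate why the move from property-value pairs to triples matters for authors, here is a minimal sketch in plain Python tuples. It is not the gimie implementation: the function names, the example identifiers, and the schema.org predicates are assumptions chosen for illustration. The point is that a pair can only describe the repository itself, while a triple can also carry statements about a nested object such as an author.

```python
from typing import List, Tuple

Pair = Tuple[str, str]         # (property, value): the subject is implicit
Triple = Tuple[str, str, str]  # (subject, predicate, object)

SCHEMA = "http://schema.org/"  # assumed vocabulary, for illustration only


def pairs_to_triples(subject: str, pairs: List[Pair]) -> List[Triple]:
    """Attach an explicit subject to each (property, value) pair."""
    return [(subject, prop, value) for prop, value in pairs]


def author_triples(repo: str, author_id: str, name: str) -> List[Triple]:
    """Link the repository to an author, then describe the author itself.

    The second triple has the author as its subject, which a flat
    (property, value) pair about the repository cannot express.
    """
    return [
        (repo, SCHEMA + "author", author_id),
        (author_id, SCHEMA + "name", name),
    ]


if __name__ == "__main__":
    repo = "https://example.org/repo"  # hypothetical repository URI
    for triple in author_triples(repo, "https://example.org/jane", "Jane Doe"):
        print(triple)
```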


rmfranken · 2024-11-18T16:28:15Z

@cmdoret I'm especially interested in the tests: how do you generally decide how "deep" to test? Looking at the other tests, I saw that the level of detail was not very high, so I tried to match that in my test, but I'm happy to hear if there is a better way of deciding that.

Also, the failing container registry action seems to be related to a version number. Is that something I should address in my PR somehow? It appears to be related to the version of git rather than to any code I added :/


cmdoret

Nice work! I have a bunch of suggestions to improve code readability.

About tests: I would just add a test for the behaviour on a CFF with a broken orcid (not a URI) and authors without doi, making sure they don't appear in the output.

Note: you don't need to create CFF files for tests. Since the functions take bytes, you can write plain YAML directly in the test, e.g.:

data = b"""
<yaml-content>
"""
get_cff_authors(data)

I'm not sure there's an absolute rule about test depth, but in general it's good to cover the obvious edge cases and to make sure the tests exercise the important code paths.
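A rough sketch of what such an edge-case test could look like, with the YAML written inline as bytes. The function `find_orcid_uris` is a hypothetical stand-in, not gimie's `get_cff_authors`: it scans for `orcid:` keys with a regular expression instead of parsing YAML so that the example only needs the standard library. The author entries are made up; the ORCID shown is the commonly used example iD.

```python
import re
from typing import List
from urllib.parse import urlparse


def find_orcid_uris(data: bytes) -> List[str]:
    """Hypothetical stand-in for the parser under test.

    Returns the values of `orcid:` keys that are http(s) URIs and
    silently skips anything else (e.g. a bare identifier).
    """
    values = re.findall(rb"^\s*(?:-\s*)?orcid:\s*(\S+)", data, flags=re.MULTILINE)
    uris = []
    for raw in values:
        value = raw.decode().strip("\"'")  # tolerate quoted YAML scalars
        parsed = urlparse(value)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            uris.append(value)
    return uris


def test_broken_orcid_is_skipped():
    # Three authors: a valid orcid URI, a bare identifier, and no orcid.
    data = b"""
cff-version: 1.2.0
authors:
  - family-names: Doe
    given-names: Jane
    orcid: https://orcid.org/0000-0002-1825-0097
  - family-names: Roe
    given-names: Richard
    orcid: 0000-0002-1825-0097
  - family-names: Poe
    given-names: Pat
"""
    assert find_orcid_uris(data) == ["https://orcid.org/0000-0002-1825-0097"]
```

The same pattern (inline bytes in, assertion on what is absent from the output) should carry over to the real parser functions, assuming they accept bytes as described above.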

gimie/parsers/__init__.py

gimie/parsers/cff.py

gimie/parsers/license/__init__.py

tests/test_output.py

tests/test_parsers.py

gimie/parsers/abstract.py


cmdoret · 2024-11-20T09:05:25Z

About the container building issue: it seems that the newer versions of pydriller need a more recent git.

Changing the base layer of our Docker image from python:3.10-slim-bullseye to python:3.13-slim-bookworm should do it. It seems the latter comes with git 2.39.5.


rmfranken · 2024-11-22T10:15:12Z

Since I was making an ORCID matcher, I thought I would also move the DOI matching logic to utils. Does that make sense?
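For readers who want a picture of what two such matchers living side by side in a utils module might look like, here is a minimal sketch. The names `is_valid_orcid` and `extract_doi` and both regular expressions are assumptions for illustration, not the code merged in this PR. The ORCID pattern expects a full orcid.org URL with four groups of four characters (the last character may be the checksum `X`); the DOI pattern covers the common modern shape of `10.`, a numeric registrant prefix, a slash, and a suffix.

```python
import re
from typing import Optional

# Full ORCID URL: four dash-separated groups, last character a digit or "X".
ORCID_URL = re.compile(r"^https?://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

# Common modern DOI shape; deliberately permissive about the suffix.
DOI = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def is_valid_orcid(url: str) -> bool:
    """Return True if `url` looks like a complete orcid.org URL."""
    return ORCID_URL.match(url) is not None


def extract_doi(text: str) -> Optional[str]:
    """Return the first DOI-shaped substring in `text`, or None."""
    match = DOI.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(is_valid_orcid("https://orcid.org/0000-0002-1825-0097"))
    print(extract_doi("https://doi.org/10.5281/zenodo.1234567"))
```

Note that this only checks the shape of the identifiers; it does not verify the ORCID checksum or that the DOI resolves.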

cmdoret

Nice, this is starting to look really good! I have a few suggestions to improve things but most of them are optional.

The only really important one is the orcid url check in CffParser.

gimie/parsers/abstract.py

gimie/parsers/cff.py

gimie/parsers/license/__init__.py

tests/test_cff.py

cmdoret · 2024-11-28T12:23:15Z

Note: I bumped the Dockerfile base layer so that CI succeeds.


cmdoret

LGTM 🚀 well done!

rmfranken and others added 14 commits November 14, 2024 11:09

feat:write goal of cff parser

c102e28

feat: add function body and placeholder

5ae7f8c

refactor: do list(dict) formatting within the function

1206d30

fix: forgot arrow in type annotation

751918a

chore: black reformat

9edcd6f

feat: update parsers to output graphs instead of property "doubles", …

68f48a2

…output RDF of authors

chore: update docstrings

84db808

fix: remove python <= 3.12 req

b43f448

feat: add "affiliation" of authors to output

5fc788a

docs: add CFF file (#111)

2dd85dd

Adds citation file (cff) to gimie repository for documentation and testing purposes.

feat: add test for cff author parsing

f17dd9f

fix: python dependencies in poetry

93d1286

fix: adapt parser tests to graph structure instead of property struct…

15dab2d

…ure. Update pyshacl, comment out useless pyshacl test throwing errors

fix: remove faulty cff function example description

fa53024

Update test_cff.py

a536511

duplicate check

rmfranken requested a review from cmdoret November 18, 2024 16:31

rmfranken added 2 commits November 18, 2024 17:32

Merge branch 'main' into cff_parsing

33e407d

Merge branch 'main' into cff_parsing

d9fe975

cmdoret requested changes Nov 19, 2024


fix: unused import

a91b59c

Co-authored-by: Cyril Matthey-Doret <[email protected]>

rmfranken and others added 8 commits November 20, 2024 10:15

fix: typo

b088a45

Co-authored-by: Cyril Matthey-Doret <[email protected]>

refactor: rename variable

d3eb1f4

docs: add docstring parameter for parser class

b81b5b6

refactor: rename variable

5238b64

feat: check if orcid is valid before writing

5645f6d

refactor: rename variable

6a25951

chore: remove pyshacl

f7a1165

fix: typo

5aba09d

rmfranken and others added 6 commits November 20, 2024 11:47

fix: remove unused imports

aef27dd

fix: tests for cff, add test for doi, move doi and orcid matchers to …

ee9238e

…utils

docs:fix docs of valid_doi_extractor

195c778

Merge branch 'main' into cff_parsing

1ba0ee7

refactor: doi re matcher

03adbf0

chore: remove unneccessary comment

9673aac

rmfranken requested a review from cmdoret November 22, 2024 10:15

cmdoret changed the title ~~feat: Expand CFF parser to retrieve author information~~ feat(parser): parse authors from CFF files Nov 26, 2024

cmdoret changed the title ~~feat(parser): parse authors from CFF files~~ feat(parser): extract authors from CFF files Nov 26, 2024

cmdoret reviewed Nov 27, 2024


chore(docker): bump base layer to python 3.13

420252e

rmfranken and others added 6 commits November 28, 2024 14:14

Update gimie/parsers/abstract.py

2a6272a

Co-authored-by: Cyril Matthey-Doret <[email protected]>

Update gimie/parsers/cff.py

a331a63

Co-authored-by: Cyril Matthey-Doret <[email protected]>

chore(docker): use python 3.12 base

9d69267

fix: improve tests, rename some variables

b91df2a

Merge branch 'cff_parsing' of github.com:sdsc-ordes/gimie into cff_pa…

8f61fe2

…rsing

fix:rename the example in extract_doi_march

477ab88

rmfranken requested a review from cmdoret November 28, 2024 14:00

cmdoret approved these changes Nov 28, 2024


rmfranken merged commit 56eabde into main Nov 28, 2024
8 checks passed
