WIP SRUopener #682

Draft: wants to merge 13 commits into base: master

Conversation

dr0i
Member

@dr0i dr0i commented Mar 28, 2025

This is a draft and WIP.
@TobiasNx you can use it for functional testing.

Resolves #510.

@dr0i dr0i requested a review from TobiasNx March 28, 2025 12:18
@dr0i dr0i changed the title WIP SRUopener (#510) WIP SRUopener Mar 28, 2025
@dr0i dr0i moved this to Review in Metafacture Mar 28, 2025
@TobiasNx
Contributor

Nice, seems to work. +1
The printed logs are a little bit esoteric:

, startRecord=1, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=1001
, startRecord=1001, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=2001
, startRecord=2001, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=3001
, startRecord=3001, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=4001
, startRecord=4001, maximumRecords=1000, istream.length=437
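
The logged URLs show `startRecord` advancing by `maximumRecords` on each request. A minimal sketch of that paging arithmetic, not the opener's actual code: the class and method names (`SruPaging`, `pageUrl`, `startRecordOfPage`) are hypothetical, and the query is assumed to be URL-encoded already, as in the logs.

```java
/** Hypothetical helper; only the parameter names and URL layout come from the logs above. */
final class SruPaging {
    private SruPaging() {}

    /** Builds the request URL for one page; the parameter order mirrors the logged URLs. */
    static String pageUrl(String baseUrl, String encodedQuery, String recordSchema,
                          String version, int maximumRecords, int startRecord) {
        return baseUrl
            + "?query=" + encodedQuery
            + "&operation=searchRetrieve"
            + "&recordSchema=" + recordSchema
            + "&version=" + version
            + "&maximumRecords=" + maximumRecords
            + "&startRecord=" + startRecord;
    }

    /** SRU record positions are 1-based, so page n (0-based) starts at n * maximumRecords + 1. */
    static int startRecordOfPage(int pageIndex, int maximumRecords) {
        return pageIndex * maximumRecords + 1;
    }
}
```

With `maximumRecords=1000`, pages 0 to 4 start at 1, 1001, 2001, 3001 and 4001, which matches the `startRecord` values printed in the log.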

@TobiasNx
Contributor

@dr0i is this still in review?

@dr0i
Member Author

dr0i commented Apr 10, 2025

As we found out in #510, this PR needs a complete redesign.

@dr0i dr0i force-pushed the 510-addSruOpener branch from ecd9c8c to c3f3ad6 Compare April 10, 2025 13:32
@dr0i dr0i force-pushed the 510-addSruOpener branch from 84d6845 to 3dc0416 Compare June 2, 2025 14:24
@dr0i
Member Author

dr0i commented Jun 2, 2025

@TobiasNx can you do functional tests before I go on here? Have a look at the @Description to see how it works (hint: it is "stream" based, i.e. it works differently from how the OAI-PMH opener works atm).
I've added the class to flux-commands.
[edit]: Please ignore the failing editorconfigChecker for now.

@dr0i dr0i requested a review from TobiasNx June 2, 2025 14:38
@TobiasNx
Contributor

TobiasNx commented Jun 4, 2025

@dr0i I tried to install the dist as described at https://metafacture.github.io/metafacture-documentation/docs/flux/Flux-User-Guide.html#build-from-local-distribution so that I could use the runner for functional testing,

but it runs into errors:

$ ./gradlew installDist

> Configure project :
HEAD has no annotated tags
No SCM tag found. Making a snapshot build
Feature branch found
Version is feature-510-addSruOpener-SNAPSHOT

[Incubating] Problems report is available at: file:///home/user/git/metafacture-core/build/reports/problems/problems-report.html

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.13/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

When I run flux.sh, it outputs the following:

$ /home/user/git/metafacture-core/metafacture-runner/build/install/metafacture-core/flux.sh
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.metafacture.runner.Flux.main(Flux.java:62)
Caused by: org.metafacture.commons.reflection.ReflectionException: Class not found: org.metafacture.io.
        at org.metafacture.commons.reflection.ReflectionUtil.loadClass(ReflectionUtil.java:70)
        at org.metafacture.commons.reflection.ObjectFactory.loadClassesFromMap(ObjectFactory.java:57)
        at org.metafacture.flux.parser.FluxProgramm.<clinit>(FluxProgramm.java:54)
        ... 1 more
Caused by: java.lang.ClassNotFoundException: org.metafacture.io.
        at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
        at org.metafacture.commons.reflection.ReflectionUtil.loadClass(ReflectionUtil.java:67)
        ... 3 more

Can you help? (For comparison, I tested the current master: there, both $ ./gradlew installDist and $ /home/user/git/metafacture-core/metafacture-runner/build/install/metafacture-core/flux.sh work.)

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 4, 2025
@dr0i
Member Author

dr0i commented Jun 5, 2025

Ah, I accidentally removed the TarReader.
Try again please.
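
The stack trace shows `ObjectFactory.loadClassesFromMap` failing on the incomplete class name `org.metafacture.io.`. One plausible reading is that a command-map entry lost its class name. A minimal sketch of a standalone check for such entries, assuming the map can be read as a `java.util.Properties` file: the class `CommandMapCheck`, the method `unloadableCommands` and the command names in the usage line are hypothetical and not part of Metafacture.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical check: lists command names whose mapped class cannot be loaded. */
final class CommandMapCheck {
    private CommandMapCheck() {}

    static List<String> unloadableCommands(String propertiesText) {
        Properties commands = new Properties();
        try {
            commands.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> broken = new ArrayList<>();
        // TreeSet gives a stable, alphabetical order for the report.
        for (String command : new TreeSet<>(commands.stringPropertyNames())) {
            String className = commands.getProperty(command).trim();
            try {
                // Load without initializing; we only care whether the class exists.
                Class.forName(className, false, CommandMapCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                broken.add(command);
            }
        }
        return broken;
    }
}
```

For example, `CommandMapCheck.unloadableCommands("good-command=java.lang.String\nbroken-command=org.metafacture.io.\n")` reports only `broken-command`, because a name ending in a dot cannot be resolved to a class.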

@dr0i dr0i removed their assignment Jun 5, 2025
@dr0i dr0i force-pushed the 510-addSruOpener branch from 4a00838 to ac80718 Compare June 10, 2025 13:02
@TobiasNx
Contributor

TobiasNx commented Jun 18, 2025

The current version gets stuck in an endless SRU request loop: after all requests have finished, it starts again at record 1. This happens whether or not a total number of records is given:

e.g.

"https://services.dnb.de/sru/authorities"
| open-sru(recordSchema="MARC21plus-xml", query="WOE%3Dsozialistenkongress%20and%20COD%3Ds",version="1.1",maximumRecords="10")
| object-batch-log(batchsize="10")
| as-records
| write(FLUX_DIR + "result.txt")
;
"https://services.dnb.de/sru/authorities"
| open-sru(recordSchema="MARC21plus-xml", query="WOE%3Dsozialistenkongress%20and%20COD%3Ds",version="1.1",maximumRecords="10",total="10")
| object-batch-log(batchsize="10")
| as-records
| write(FLUX_DIR + "result.txt")
;

Both produce the following output. Note that recordPosition 1 turns up again after the expected last recordPosition 8:

<?xml version="1.0" encoding="UTF-8"?><searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"><version>1.1</version><numberOfRecords>8</numberOfRecords><records><record><recordSchema>MARC21plus-xml</recordSchema><recordPacking>xml</recordPacking><recordData><collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Authority">
    <leader>00000nz  a2200000nc 4500</leader>
    <controlfield tag="001">042278333</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20110429135047.0</controlfield>
    <controlfield tag="008">900305n||azznnaabn           | ana    |c</controlfield>
    <datafield ind1="7" ind2=" " tag="024">
      <subfield code="a">4227833-8</subfield>
      <subfield code="0">http://d-nb.info/gnd/4227833-8</subfield>
      <subfield code="2">gnd</subfield>
    </datafield>
    <datafield ind1=" " ind2=" " tag="035">
      <subfield code="a">(DE-101)042278333</subfield>
...
    <datafield ind1=" " ind2=" " tag="913">
      <subfield code="S">swd</subfield>
      <subfield code="i">k</subfield>
      <subfield code="a">Internationaler Sozialistenkongress</subfield>
      <subfield code="0">(DE-588c)4021089-3</subfield>
    </datafield>
  </record>
</collection></recordData><recordPosition>8</recordPosition></record></records><echoedSearchRetrieveRequest><version>1.1</version><query>WOE=sozialistenkongress and COD=s</query><xQuery xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/><startRecord>6</startRecord><maximumRecords>5</maximumRecords><recordSchema>MARC21plus-xml</recordSchema></echoedSearchRetrieveRequest></searchRetrieveResponse>
<?xml version="1.0" encoding="UTF-8"?><searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"><version>1.1</version><numberOfRecords>8</numberOfRecords><records><record><recordSchema>MARC21plus-xml</recordSchema><recordPacking>xml</recordPacking><recordData><collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Authority">
    <leader>00000nz  a2200000nc 4500</leader>
    <controlfield tag="001">042278333</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20110429135047.0</controlfield>
    <controlfield tag="008">900305n||azznnaabn           | ana    |c</controlfield>
    <datafield ind1="7" ind2=" " tag="024">
      <subfield code="a">4227833-8</subfield>
      <subfield code="0">http://d-nb.info/gnd/4227833-8</subfield>
      <subfield code="2">gnd</subfield>
...
    </datafield>
    <datafield ind1=" " ind2=" " tag="913">
      <subfield code="S">swd</subfield>
      <subfield code="i">c</subfield>
      <subfield code="a">Bern / Internationaler Sozialistenkongress &lt;1919&gt;</subfield>
      <subfield code="0">(DE-588c)4227833-8</subfield>
    </datafield>
  </record>
</collection></recordData><recordPosition>1</recordPosition></record><record><recordSchema>MARC21plus-xml</recordSchema><recordPacking>xml</recordPacking><recordData><collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Authority">
    <leader>00000nz  a2200000nc 4500</leader>
    <controlfield tag="001">1267605979</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20230329111229.0</controlfield>
    <controlfield tag="008">220908n||azznnaabn           | ana    |c</controlfield>
    <datafield ind1="7" ind2=" " tag="024">
      <subfield code="a">1267605979</subfield>
      <subfield code="0">http://d-nb.info/gnd/1267605979</subfield>
      <subfield code="2">gnd</subfield>
...
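
The responses above report `<numberOfRecords>8</numberOfRecords>`, so one way to end the paging is to stop once the next `startRecord` would exceed that count. A minimal sketch of such a guard, not the fix that went into the PR: the names `SruLoopGuard`, `numberOfRecords` and `nextStartRecord` are hypothetical, and the regular expression is a simplification that assumes an unprefixed element name as in the sample (a real implementation would more likely use an XML parser).

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical termination guard for SRU paging. */
final class SruLoopGuard {
    // Simplification: matches the unprefixed element exactly as it appears in the sample response.
    private static final Pattern NUMBER_OF_RECORDS =
        Pattern.compile("<numberOfRecords>(\\d+)</numberOfRecords>");

    private SruLoopGuard() {}

    /** Extracts the total hit count from a searchRetrieve response, if present. */
    static OptionalInt numberOfRecords(String responseXml) {
        Matcher matcher = NUMBER_OF_RECORDS.matcher(responseXml);
        return matcher.find()
            ? OptionalInt.of(Integer.parseInt(matcher.group(1)))
            : OptionalInt.empty();
    }

    /** Returns the next 1-based start position, or empty when all records have been requested. */
    static OptionalInt nextStartRecord(int startRecord, int maximumRecords, int numberOfRecords) {
        int next = startRecord + maximumRecords;
        return next <= numberOfRecords ? OptionalInt.of(next) : OptionalInt.empty();
    }
}
```

With the values echoed in the sample (`startRecord` 6, `maximumRecords` 5, 8 records in total), `nextStartRecord(6, 5, 8)` is empty, so no further request would be sent instead of starting over at position 1.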

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 18, 2025
Contributor

@TobiasNx TobiasNx left a comment

It seems that the SRU opener gets stuck in an infinite loop. See: #682 (comment)

@dr0i
Member Author

dr0i commented Jun 20, 2025

The infinite loop should be fixed with b92238b, please try again @TobiasNx.

@dr0i dr0i assigned TobiasNx and unassigned dr0i Jun 20, 2025
@TobiasNx TobiasNx self-requested a review June 20, 2025 11:37
Contributor

@TobiasNx TobiasNx left a comment

Nice, looks good! :) +1

@TobiasNx
Contributor

I retested the two Flux examples from my earlier comment about the endless SRU request loop (with and without total="10"). Both now log only one object batch, and only 8 records are fetched. This is good!

@TobiasNx
Contributor

One thing that just came to my mind is that the SRU opener needs to be able to provide a user agent.
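
For illustration, a user agent can be set on a standard `java.net.URLConnection` via `setRequestProperty` before the request is sent. This is a minimal sketch, assuming the opener uses the JDK's URL connection classes: the class `SruUserAgent`, the method `prepare` and the agent string in the usage line are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

/** Hypothetical helper: prepares a connection that identifies itself with a User-Agent header. */
final class SruUserAgent {
    private SruUserAgent() {}

    static URLConnection prepare(String url, String userAgent) {
        try {
            // openConnection() only creates the connection object; no request is sent yet.
            URLConnection connection = URI.create(url).toURL().openConnection();
            connection.setRequestProperty("User-Agent", userAgent);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `SruUserAgent.prepare("https://services.dnb.de/sru/authorities", "example-harvester/0.1")` returns a connection whose request will carry that header once `getInputStream()` is called.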

@dr0i dr0i assigned dr0i and unassigned TobiasNx Jun 23, 2025
@dr0i dr0i moved this from Review to Working in Metafacture Jun 23, 2025
@dr0i
Member Author

dr0i commented Jun 23, 2025

Code review: @fsteeg

Projects
Status: Working
Development

Successfully merging this pull request may close these issues.

Add SRU opener / open-sru
2 participants