Create Solr Query API #11

hoodriverheather · 2024-11-06T20:44:14Z

@nutjob4life please create a new Solr Query API to allow users to query Solr metadata. For example, show me all EventIDs for LTP2 Site weRc6TUHvOru6A, or return all EventIDs by BlindedSiteID for LTP2.

nutjob4life · 2024-11-11T17:52:28Z

Hi @hoodriverheather

The Solr query API for EDRN LabCAS is available.

The URLs are:

For collections: https://edrn-labcas.jpl.nasa.gov/data-access-api/collections/select
For datasets: https://edrn-labcas.jpl.nasa.gov/data-access-api/datasets/select
For files: https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select

Please use HTTP Basic Authentication with your EDRN username and password. For example, this curl command queries for all files that have an eventID of 8300386 and returns the collection name, organ, and event ID in JSON format:

curl --silent --user 'kelly:REDACTED' 'https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select?fl=CollectionName,Organ,eventID&indent=on&q=eventID:8300386&wt=json'

The various APIs all accept Solr query parameters documented at https://solr.apache.org/guide/6_6/the-standard-query-parser.html
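For developers who prefer a script to curl, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not the project's official client: the helper names `build_select_url` and `fetch_json` are made up for illustration, and the placeholder credentials must be replaced with a real EDRN username and password before the commented-out call will work.

```python
import base64
import json
import urllib.parse
import urllib.request

# Base URL taken from the endpoints listed above.
BASE_URL = "https://edrn-labcas.jpl.nasa.gov/data-access-api"


def build_select_url(core, query, fields=None, rows=None, wt="json"):
    """Return a percent-encoded select URL for a core (collections, datasets, files)."""
    params = {"q": query, "wt": wt}
    if fields:
        params["fl"] = ",".join(fields)
    if rows is not None:
        params["rows"] = str(rows)
    # urlencode percent-encodes quotes, colons, commas, and spaces for us.
    return f"{BASE_URL}/{core}/select?{urllib.parse.urlencode(params)}"


def fetch_json(url, username, password):
    """GET the URL with an HTTP Basic Authentication header and parse the JSON body."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


url = build_select_url("files", "eventID:8300386", ["CollectionName", "Organ", "eventID"])
print(url)

# Hypothetical credentials; uncomment and substitute real ones to run the query.
# result = fetch_json(url, "your-username", "your-password")
# print(result["response"]["numFound"])
```

Building the query string with `urlencode` rather than by hand means special characters in the query never reach the server unencoded.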

More examples:

All details of all collections with SpecimenType of Serum in XML format: https://edrn-labcas.jpl.nasa.gov/data-access-api/collections/select?indent=on&q=SpecimenType:Serum&wt=xml
Top 10 LeadPI names of all datasets with the CollectionName of Lung Team Project 2 Images in JSON format: https://edrn-labcas.jpl.nasa.gov/data-access-api/datasets/select?fl=LeadPI&indent=on&q=CollectionName:%22Lung%20Team%20Project%202%20Images%22&wt=json
The ID, data custodian, and data custodian email of the top 100 files with City_of_Hope in their IDs in CSV format: https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select?fl=id,DataCustodian,DataCustodianEmail&indent=on&q=id:*City_of_Hope*&rows=100&wt=csv

hoodriverheather · 2024-11-15T18:39:32Z

@nutjob4life This is cool! I got your example to work. Could you write a query for me that would return the eventIDs for a BlindedSiteID? I tried this:
curl --silent --user 'kincaid:YourPassword' 'https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select?fl=eventID&indent=on&q=BlindedSiteID:"NVRiRYzqspbvMw"&wt=json'

it didn't work. I get this error:

<title>400 Unknown Reason</title>

Unknown Reason

Your browser sent a request that this server could not understand.

It would be even more helpful if it could write the output to a .csv file :)
Thanks!

nutjob4life · 2024-11-15T19:54:05Z

@hoodriverheather the issue is that there are quotation marks in your URL (above) which need to be encoded as %22.

Here's the URL that worked for me

https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select?fl=eventID&indent=on&q=BlindedSiteID:NVRiRYzqspbvMw&wt=json

$ curl --user 'kelly:REDACTED' --silent 'https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select?fl=eventID&indent=on&q=BlindedSiteID:NVRiRYzqspbvMw&wt=json'
{
  "response":{"numFound":114558,"start":0,"docs":[
      {
        "eventID":["8291042"]},
      {
        "eventID":["8291042"]},
      {
        "eventID":["8291042"]},
      {
        "eventID":["8291042"]},
      {
        "eventID":["8291042"]},
      {
        "eventID":["8291042"]},
      {
        "eventID":["8291042"]},
      {
        "eventID":["8291042"]},
      {
        "eventID":["8143000"]},
      {
        "eventID":["8143000"]}]
  }}
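The response above repeats the same eventID once per matching file, and only the first 10 of 114558 matches are shown because no `rows` parameter was given. A small Python sketch like the following could collapse the duplicates after the JSON is parsed. The function name `unique_event_ids` is hypothetical, and the sample dictionary is an abbreviated copy of the output above rather than a live result.

```python
def unique_event_ids(solr_response):
    """Return the distinct eventID values from a Solr JSON response, in first-seen order."""
    unique = {}
    for doc in solr_response["response"]["docs"]:
        # eventID is multi-valued in the output above, so each doc carries a list.
        for event_id in doc.get("eventID", []):
            unique.setdefault(event_id, None)
    return list(unique)


# Abbreviated copy of the response shown above.
sample = {
    "response": {
        "numFound": 114558,
        "start": 0,
        "docs": [
            {"eventID": ["8291042"]},
            {"eventID": ["8291042"]},
            {"eventID": ["8143000"]},
            {"eventID": ["8143000"]},
        ],
    }
}

print(unique_event_ids(sample))
```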

Here's a tip: I use ChatGPT to write the curl commands:

Write a Solr query for "curl" that will return all eventIDs for a query where the BlindedSiteID is NVRiRYzqspbvMw

Also, there's probably no need to quote NVRiRYzqspbvMw anyway as it doesn't contain spaces or special characters
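When a value does contain spaces, the quotation marks are required and must be percent-encoded. One way to avoid encoding by hand is to let Python's `urllib.parse.quote` do it, as in this sketch. The helper name `phrase_query` is an assumption made for illustration; the field names and values come from the examples in this thread.

```python
import urllib.parse


def phrase_query(field, value):
    """Build field:"value", escaping backslashes and double quotes inside the value."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


query = phrase_query("CollectionName", "Lung Team Project 2 Images")
# Keep the colon readable; quotes become %22 and spaces become %20.
encoded = urllib.parse.quote(query, safe=":")
print(encoded)

# The query that produced the 400 error, once encoded:
print(urllib.parse.quote('BlindedSiteID:"NVRiRYzqspbvMw"', safe=":"))
```

The first printed value matches the `q` parameter used in the "Lung Team Project 2 Images" example URL earlier in the thread.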

hoodriverheather · 2024-11-19T01:00:35Z

@nutjob4life I still can't get this to work. :( Do you have a few minutes to help me tomorrow?

Also, can you give DMCC instructions on how to do this using our API? I don't think this is a user friendly way for them to get updates. Thanks!!

nutjob4life · 2024-11-19T01:57:31Z

@hoodriverheather sure, I can help on the 19th.

No, it's not user friendly. But it is developer friendly, and quite powerful—and APIs are meant for developers, not users 😉

Although a lot of developers are familiar with curl, some might prefer to construct queries with a programming-language client such as pysolr for Python, or with Postman. Does the DMCC have developers familiar with Python or another programming language? I think they know Postman. Which one would you recommend I gear a guide toward?

EDIT: I searched email and saw that [email protected] was indeed using Postman, so I'll write up instructions specifically for that. You might find Postman easier to use, too, than curl.

nutjob4life · 2024-11-19T17:36:34Z

@hoodriverheather could you come up with some example queries that developers at the DMCC might like to perform? I have a few examples in this comment but those are just "toy" examples I came up with. We can include these in the document I'm writing.

You know better the kinds of questions they'd like to ask of the metadata 🎓

hoodriverheather · 2024-11-19T18:52:44Z

@nutjob4life I think the primary query would be the following:
Return all eventIDs by BlindedSiteID for CollectionName="Lung Team Project 2 Images"

Similar for PMRI data collection.

OR
For CollectionName="Combined Imaging and Blood Biomarkers for Breast Cancer Diagnosis" somehow return the list of image sets grouped by Training and Validation. I'm not sure what the image set is labeled as in Solr. This would look like this:
Validation:

2614
2624
2639
...
(see screenshot)

Do those make sense?
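As a rough sketch of the first query above, the Solr parameters for "all eventIDs with their BlindedSiteID for one collection" could be assembled as follows. It assumes the files endpoint carries `CollectionName`, `BlindedSiteID`, and `eventID` (the earlier examples in this thread query all three there). The page size of 1000 and the function names are arbitrary choices for illustration; `start` and `rows` are standard Solr paging parameters.

```python
import urllib.parse


def collection_events_params(collection_name, start=0, rows=1000):
    """Solr parameters selecting BlindedSiteID and eventID for one collection."""
    return {
        "q": f'CollectionName:"{collection_name}"',
        "fl": "BlindedSiteID,eventID",
        "start": str(start),  # offset of the first document in this page
        "rows": str(rows),    # page size; Solr returns only 10 rows by default
        "wt": "json",
    }


def page_starts(num_found, rows):
    """Offsets needed to walk through num_found documents, rows at a time."""
    return list(range(0, num_found, rows))


params = collection_events_params("Lung Team Project 2 Images")
print(urllib.parse.urlencode(params))
print(page_starts(2500, 1000))
```

Each offset from `page_starts` would be passed as `start` in a separate request, using the `numFound` value from the first response to know when to stop.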

nutjob4life · 2024-11-19T21:45:05Z

@hoodriverheather that first query can be done and I will include it in the documentation. That second one, though, will require programming. That goes beyond the scope of the documentation I'm writing. (I trust a developer like [email protected] to handle it.)

I'll mention it, though, in an "Advanced Topics" section 😉

nutjob4life · 2024-11-20T00:03:21Z

@hoodriverheather I wrote some docs that should satisfy many developers

You're welcome to give it a try—or we can try it together. My schedule on 11-20 is open except from 7am to 8am.

hoodriverheather · 2024-11-20T18:10:24Z

@nutjob4life It would be great to get a quick walk through or at least show me how to query so that I can get a list by Collection of eventIDs by BlindedSiteID. Even better would be to output that list into a csv or allow me to copy into a spreadsheet. :)
Question - can i also run this on Dev?

nutjob4life · 2024-11-20T18:44:04Z

@hoodriverheather you can't get eventIDs by BlindedSiteID without programming; Solr doesn't support sorting by that field. You can get CSV output with the wt parameter (wt=csv).
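The programming involved can be fairly small. The sketch below groups already-fetched Solr documents by BlindedSiteID and writes a two-column CSV. It is an illustration only, not the report generator used for this issue: the function names are hypothetical, the sample documents are invented placeholders, and it assumes each document exposes `BlindedSiteID` and `eventID` as either a single string or a list of strings.

```python
import csv
import io
from collections import defaultdict


def as_list(value):
    """Normalize a Solr field that may be missing, single-valued, or multi-valued."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def group_events_by_site(docs):
    """Map each BlindedSiteID to the set of eventIDs seen with it."""
    grouped = defaultdict(set)
    for doc in docs:
        for site in as_list(doc.get("BlindedSiteID")):
            grouped[site].update(as_list(doc.get("eventID")))
    return grouped


def to_csv(grouped):
    """Render the grouping as CSV text, sorted by site and then by event ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["BlindedSiteID", "eventID"])
    for site in sorted(grouped):
        for event_id in sorted(grouped[site]):
            writer.writerow([site, event_id])
    return buffer.getvalue()


# Invented placeholder documents, not real LabCAS data.
docs = [
    {"BlindedSiteID": ["siteA"], "eventID": ["1"]},
    {"BlindedSiteID": ["siteA"], "eventID": ["2"]},
    {"BlindedSiteID": ["siteA"], "eventID": ["1"]},
    {"BlindedSiteID": "siteB", "eventID": "3"},
]
print(to_csv(group_events_by_site(docs)))
```

The resulting text can be saved to a `.csv` file and opened directly in a spreadsheet.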

As for dev: yes, it is supported there.

Got time for a quick call?

hoodriverheather · 2024-11-20T18:46:24Z

@nutjob4life yes, i have time for a call. i'll send you a Teams meeting.

nutjob4life · 2024-11-20T22:46:02Z

@hoodriverheather here's that first report we discussed:

events-by-blinds.csv

hoodriverheather · 2024-11-21T23:37:54Z

@nutjob4life This is awesome! It will help both me and Jackie! Would it be possible to also add the CollectionID?

nutjob4life · 2024-11-22T00:35:04Z

@hoodriverheather I gotta stop reading email at night; let me restart VPN and modify the report generator 😁

nutjob4life · 2024-11-22T00:41:58Z

@hoodriverheather here you go!

events-by-blinds.csv

hoodriverheather · 2024-12-06T22:09:32Z

@nutjob4life Thanks! I’ll pass this along to Jackie.

Could you help clarify the plan for providing access to the DMCC programmer? I’m sorry—I know you’ve explained this before, but I don’t remember enough details to draft an email about it.

From what I recall, the DMCC programmer will be able to run a Solr query now. Is there any documentation available for this? Additionally, is there API access or another plan for providing broader access?

If so, we could close this issue and create a new one to track any future updates.

Thanks again for your help!

nutjob4life · 2024-12-06T23:07:17Z

@hoodriverheather did we conclude that the Solr API—despite being an industry standard—was too advanced for the DMCC to handle? I thought we talked about creating specific APIs to handle specific questions the DMCC would like to pose rather than use the generic Solr API.

Should we talk about this at the 12-10 staff meeting?

Regardless, yes I did write documentation for the Solr API … and you're welcome! 😇

hoodriverheather · 2024-12-09T21:03:16Z

@nutjob4life That documentation looks great. We can discuss on tomorrow's call.

nutjob4life · 2024-12-13T23:56:14Z

In our staff meeting on 2024-12-10, @dcrichto1 recommended we develop some example programs that use the LabCAS Solr API that can serve as additional supporting material for the existing documentation.

@hoodriverheather I've completed these example programs.

The documentation page has been updated to refer to these example programs.

Please review. You can try running the example programs; however, they're meant for programmers, so you can skip that 😇

hoodriverheather assigned nutjob4life Nov 6, 2024

nutjob4life assigned hoodriverheather and unassigned nutjob4life Nov 11, 2024
