How to connect with minio-driver? #199

Open
MBueschelberger opened this issue Nov 8, 2023 · 9 comments

Comments

@MBueschelberger

MBueschelberger commented Nov 8, 2023

I am currently running into the issue that I cannot connect dlite to a MinIO instance through the oteapi plugin, although the connection works when I use dlite directly.

Here is my docker-compose:

version: "3.7"

services:

  ote_api:
    image: ghcr.io/emmc-asbl/oteapi:latest
    environment:
      OTEAPI_REDIS_DB: 2
      OTEAPI_REDIS_TYPE: redis
      OTEAPI_REDIS_HOST: redis
      OTEAPI_REDIS_PORT: 6379
      OTEAPI_prefix: "${OTEAPI_prefix:-/api/v1}"
      OTEAPI_PLUGIN_PACKAGES: "oteapi-dlite==v0.1.4"
    depends_on:
      - redis
    entrypoint: "./entrypoint.sh --reload --log-level debug"
    networks:
      - otenet

  minio:
    image: quay.io/minio/minio
    ports:
      - 9000
    environment:
      MINIO_ROOT_USER: otadmin
      MINIO_ROOT_PASSWORD: otadmin123
    volumes:
      - ./minio-data:/data
    command: server /data
    networks:
      - otenet

  redis:
    image: redis:latest
    networks:
      - otenet

networks:
  otenet:
    driver: bridge

This is my test-instance:

{ "d71531a5-a6a4-4252-abef-8c2fdf89c416": {
    "meta": "http://www.ontotrans.eu/0.1/inputEntity",
    "dimensions": {},
    "properties": {
      "blooming_duration": 150,
      "blooming_total_passes": 10,
      "c": 0.05,
      "flitzer_duration": 70,
      "mn": 0.8,
      "pyro_bp2_temp_max": 1150,
      "qst_duration": 60,
      "qst_entry_temp_estimate": 900,
      "qst_flow_rate_total": 700,
      "qst_water_temp_entry": 30,
      "tandem_duration": 300,
      "v": 0.01
    }
  }
}

First of all, I tried to store the instance with the driver directly:

import dlite
from pathlib import Path

thisdir = Path(__file__).resolve().parent

dlite.storage_path.append(thisdir)

access_key = "otadmin"
secret_key = "otadmin123"
instance_key = "d71531a5-a6a4-4252-abef-8c2fdf89c416"

url = f"minio://minio:9000?access_key={access_key};secret_key={secret_key};secure=False"

instance = dlite.get_instance(instance_key)

with dlite.Storage(url) as s:
    s.save(instance.meta)
    s.save(instance)

This was obviously successful, since I can fetch the instance again with this id:

import dlite

access_key = "otadmin"
secret_key = "otadmin123"
instance_key = "d71531a5-a6a4-4252-abef-8c2fdf89c416"

url = f"minio://minio:9000?access_key={access_key};secret_key={secret_key};secure=False"

dlite.storage_path.append(url)
instance = dlite.get_instance(instance_key)

print(instance)

The printed instance looks like this (as expected):

{
  "d71531a5-a6a4-4252-abef-8c2fdf89c416":  {
    "meta": "http://www.ontotrans.eu/0.1/inputEntity",
    "dimensions": {
    },
    "properties": {
      "blooming_duration": 150,
      "blooming_total_passes": 10,
      "c": 0.05,
      "flitzer_duration": 70,
      "mn": 0.8,
      "pyro_bp2_temp_max": 1150,
      "qst_duration": 60,
      "qst_entry_temp_estimate": 900,
      "qst_flow_rate_total": 700,
      "qst_water_temp_entry": 30,
      "tandem_duration": 300,
      "v": 0.01
    }
  }
}

However, when I try to fetch the instance through oteapi-dlite with the following config...

from oteapi_dlite.strategies.parse import DLiteParseStrategy, DLiteParseResourceConfig

access_key = "otadmin"
secret_key = "otadmin123"
instance_key = "d71531a5-a6a4-4252-abef-8c2fdf89c416"

session = {}
config = {
    "downloadUrl": "http://minio:9000",  # does not seem to be needed here?
    "mediaType": "minio",  # also does not seem to be explicitly needed here?
    "configuration": {
        "driver": "minio",
        "location": "minio:9000",
        "id": instance_key,
        "options": f"access_key={access_key};secret_key={secret_key};secure=False",
        "label": "input_entity_definition"
    }
}

config = DLiteParseResourceConfig(**config)

parse = DLiteParseStrategy(parse_config=config)
session = parse.get(session)
print(session)

... I get the following traceback:

** DLiteOtherError: cannot find metadata 'http://www.ontotrans.eu/0.1/inputEntity' when loading 'd71531a5-a6a4-4252-abef-8c2fdf89c416' - please add the right storage to DLITE_STORAGES and try again
** DLiteOtherError: cannot find metadata 'http://www.ontotrans.eu/0.1/inputEntity' when loading 'd71531a5-a6a4-4252-abef-8c2fdf89c416' - please add the right storage to DLITE_STORAGES and try again
 - DLiteOtherError: calling load() in Python plugin 'minio'
   To see error messages from Python storages, please rerun with the
   DLITE_PYDEBUG environment variable set.
   For example: `export DLITE_PYDEBUG=`

: DLiteUnknownError: DLiteOtherError: cannot find metadata 'http://www.ontotrans.eu/0.1/inputEntity' when loading 'd71531a5-a6a4-4252-abef-8c2fdf89c416' - please add the right storage to DLITE_STORAGES and try again
Traceback (most recent call last):
  File "/shared/store-minio-ote.py", line 26, in <module>
    session = parse.get(session)
  File "/usr/local/lib/python3.9/site-packages/oteapi_dlite/strategies/parse.py", line 107, in get
    inst = dlite.Instance.from_location(
  File "/usr/local/lib/python3.9/site-packages/dlite/dlite.py", line 1370, in from_location
    return Instance(
  File "/usr/local/lib/python3.9/site-packages/dlite/dlite.py", line 1282, in __init__
    raise _dlite.DLiteError(f"cannot initiate dlite.Instance")
dlite.DLiteError: cannot initiate dlite.Instance

Since I actually saved the metadata in the script shown previously, I am not sure why the plugin is not able to fetch the instance with the given configuration.

Note that the arguments in the options string of the driver config need to be ;-separated, not ,-separated as stated in the attribute description.
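
To avoid separator mistakes, the options string can be built programmatically. The following is only a minimal sketch, assuming the driver expects key=value pairs joined by ";" as observed above. build_options is a hypothetical helper and not part of dlite or oteapi-dlite:

```python
def build_options(**options) -> str:
    """Join driver options as ';'-separated key=value pairs.

    Hypothetical helper for illustration; not part of dlite or oteapi-dlite.
    """
    return ";".join(f"{key}={value}" for key, value in options.items())


options = build_options(access_key="otadmin", secret_key="otadmin123", secure=False)
print(options)
```

The resulting string can then be passed as the "options" value of the configuration.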

Additionally, it would be great if the plugin used the user/password attributes of the ResourceConfig (inherited from the SecretConfig) in oteapi and passed them on as access_key and secret_key in the options string of the driver.

@jesper-friis
Contributor

When you are running with oteapi, it cannot find the 'http://www.ontotrans.eu/0.1/inputEntity' entity. Have you appended the search path to dlite.storage_path like you do when you call directly from dlite?

@jesper-friis
Contributor

> Additionally, it would be great if the plugin would use the user/password attributes of the ResourceConfig (inherited from the SecretConfig) in the oteapi for later passing it to the access_key and secret_key in the options-string of the driver.

This is a very good point, but I am not sure how it should be implemented. The oteapi-dlite parse strategy just calls the underlying dlite storage plugin with the options provided. Most storage plugins will complain and fail if they are given an unknown option.

I guess that we should first establish a standard vocabulary shared by dlite plugins, such that the identifiers secret_key and access_key are used consistently across oteapi strategies and dlite storage plugins. The oteapi-dlite parse strategy should then query each dlite storage plugin for the options it supports. If an option from the standard vocabulary is supported and also exists in the session, it is passed to the storage plugin.
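
A rough sketch of the selection step described above, in plain Python. Everything here is an assumption for illustration: STANDARD_VOCABULARY, select_options and the set of supported options are hypothetical, and no existing dlite call for querying a plugin's supported options is implied:

```python
# Hypothetical shared vocabulary of credential-like option names.
STANDARD_VOCABULARY = {"access_key", "secret_key"}


def select_options(supported: set, session: dict) -> dict:
    """Return the standard options that the plugin supports and the session provides."""
    return {
        name: session[name]
        for name in sorted(STANDARD_VOCABULARY & supported)
        if name in session
    }


# Assumed answer from a storage plugin when asked which options it supports.
minio_supported = {"access_key", "secret_key", "secure"}
session = {"access_key": "otadmin", "secret_key": "otadmin123", "unrelated": 1}
print(select_options(minio_supported, session))
```

Options outside the shared vocabulary, and options the plugin does not declare, are never forwarded, so a plugin would not fail on an unknown option.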

Any other suggestions for how to implement this?

@MBueschelberger
Author

Hi @jesper-friis, you are correct, the http://www.ontotrans.eu/0.1/inputEntity entity was indeed missing:

{
    "uuid": "a7a9b39e-a38d-5446-83c3-2fdf45546795",
    "uri": "http://www.ontotrans.eu/0.1/inputEntity",
    "meta": "http://onto-ns.com/meta/0.3/EntitySchema",
    "description": "",
    "dimensions": {},
    "properties": {
        "blooming_duration": {
            "type": "float32",
            "description": "Time in seconds spent in the blooming mill"
        },
        "blooming_total_passes": {
            "type": "float32",
            "description": "Total number of passes in the blooming stand, should always be an even number"
        },
        "c": {
            "type": "float32",
            "description": "Carbon content in the tundish"
        },
        "flitzer_duration": {
            "type": "float32",
            "description": "Time in seconds spent in the Flitzer"
        },
        "mn": {
            "type": "float32",
            "description": "Manganese content in the tundish"
        },
        "pyro_bp2_temp_max": {
            "type": "float32",
            "description": "Maximum temperature measured at pyrometer BP2 after rolling in the blooming stand has finished"
        },
        "qst_duration": {
            "type": "float32",
            "description": "Time in seconds spent in the QST"
        },
        "qst_entry_temp_estimate": {
            "type": "float32",
            "description": "Estimated entry temperature in the QST"
        },
        "qst_flow_rate_total": {
            "type": "float32",
            "description": "Total water flow"
        },
        "qst_water_temp_entry": {
            "type": "float32",
            "description": "Water temperature"
        },
        "tandem_duration": {
            "type": "float32",
            "description": "Time in seconds spent in the tandem mill"
        },
        "v": {
            "type": "float32",
            "description": "Vanadium content in the tundish"
        }
    }
}

When I change the code to the following pattern, the instance is loaded from MinIO without problems:

from oteapi_dlite.strategies.parse import DLiteParseStrategy, DLiteParseResourceConfig

access_key = "otadmin"
secret_key = "otadmin123"
ids = {"input_entity_definition": "a7a9b39e-a38d-5446-83c3-2fdf45546795", "dataset": "d71531a5-a6a4-4252-abef-8c2fdf89c416"}

session = {}
for label, instance_key in ids.items():
    
    config = {
        "downloadUrl": "http://minio:9000",  # does not seem to be needed here?
        "mediaType": "minio",  # also does not seem to be explicitly needed here?
        "configuration": {
            "driver": "minio",
            "location": "minio:9000",
            "id": instance_key,
            "options": f"access_key={access_key};secret_key={secret_key};secure=False",
            "label": label,
        },
    }

    config = DLiteParseResourceConfig(**config)

    parse = DLiteParseStrategy(parse_config=config)
    session = parse.get(session)
    print(session)

Is this because the EntitySchema is loaded into the collection in the first iteration and the actual instance is then loaded in the second iteration? I am asking because if I change the ids variable back to {"dataset": "d71531a5-a6a4-4252-abef-8c2fdf89c416"}, without referencing the uuid of the EntitySchema, I get the same error again.

I would like to avoid using dlite.storage_path, since users might come up with other input schemas for their own use cases and OSPs. Such a user might upload a custom schema directly to MinIO before running the OTE pipelines for the transformation.

@MBueschelberger
Author

> > Additionally, it would be great if the plugin would use the user/password attributes of the ResourceConfig (inherited from the SecretConfig) in the oteapi for later passing it to the access_key and secret_key in the options-string of the driver.
>
> This is a very good point, but I am not sure how it should be implemented. The oteapi-dlite parse strategy just calls the underlying dlite storage plugin with the options provided. Most storage plugins will complain and fail if they are given an unknown option.
>
> I guess that we should first establish a standard vocabulary shared by dlite plugins, such that the identifiers secret_key and access_key are used consistently across oteapi strategies and dlite storage plugins. The oteapi-dlite parse strategy should then query each dlite storage plugin for the options it supports. If an option from the standard vocabulary is supported and also exists in the session, it is passed to the storage plugin.
>
> Any other suggestions for how to implement this?

This is a good question indeed. Maybe it makes sense to map the attributes of the SecretConfig to the individual driver options in dedicated Enum classes for now? This is not really sustainable once new drivers become available in dlite, but it might deliver a working solution for now.
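
Such a mapping might look roughly like the sketch below. It assumes the secret attributes are named user and password, as mentioned above for the ResourceConfig. MinioOption and secrets_to_options are hypothetical names and do not exist in oteapi-dlite:

```python
from enum import Enum


class MinioOption(Enum):
    """Hypothetical mapping: secret attribute name -> minio driver option name."""

    user = "access_key"
    password = "secret_key"


def secrets_to_options(secrets: dict, mapping=MinioOption) -> str:
    """Translate secret attributes into a ';'-separated driver options string."""
    pairs = [
        f"{member.value}={secrets[member.name]}"
        for member in mapping  # Enum iteration follows definition order
        if secrets.get(member.name) is not None
    ]
    return ";".join(pairs)


print(secrets_to_options({"user": "otadmin", "password": "otadmin123"}))
```

Each new driver would need its own Enum class, which is exactly the scalability limitation mentioned above.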

@jesper-friis
Contributor

> Is this due to the case that the EntitySchema is loaded into the collection at the first iteration and then the actual Instance is loaded in the second iteration?

No. The EntitySchema is hard-coded into DLite at the C level (as are the BasicMetadataSchema and the CollectionEntity).

@jesper-friis
Contributor

> if I change back the ids-variable to {"dataset": "d71531a5-a6a4-4252-abef-8c2fdf89c416"} without referencing the uuid of the EntitySchema, I get the same error again.

Sorry, I don't understand the change. But maybe it is related to inconsistent uuids? In DLite, if an instance has a URI (which is the case for all metadata), its uuid is calculated as a hash of this URI. You can use the dlite-getuuid tool to get the corresponding uuid.
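
For a quick check without the command-line tool, the hash can be approximated in plain Python. This sketch assumes that DLite derives the uuid as a version 5 (SHA-1) UUID in the DNS namespace; the version digit of the uuid posted above is consistent with version 5, but the namespace is an assumption. uuid_from_uri is a hypothetical helper:

```python
import uuid


def uuid_from_uri(uri: str) -> uuid.UUID:
    """Derive a name-based UUID from a URI.

    Assumption: version 5 (SHA-1) UUID in the DNS namespace.
    """
    return uuid.uuid5(uuid.NAMESPACE_DNS, uri)


print(uuid_from_uri("http://www.ontotrans.eu/0.1/inputEntity"))
```

If the printed value differs from the "uuid" field of the stored entity or from the dlite-getuuid output, the namespace assumption does not hold and dlite-getuuid should be treated as authoritative.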

@jesper-friis
Contributor

> Maybe it makes sense to map the attributes of the SecretConfig to the individual driver options in dedicated Enum classes for now? It is not really sustainable if new drivers will be available in dlite, but it might deliver a working solution for now.

Yes, this would be a quick solution that could be applied until we have something more scalable. We could add a PR on this, with a reference to my above suggestion for a more scalable solution.

@jesper-friis
Contributor

jesper-friis commented Nov 9, 2023

> I would like to avoid using the dlite.storage_path...

I understand this wish. I added a suggestion (issue #148) to otelib to address this. For now, I think the easiest solution is that you set DLITE_STORAGES in the docker compose file that runs the oteapi services.

@ajeklund
Contributor

> Is this due to the case that the EntitySchema is loaded into the collection at the first iteration and then the actual Instance is loaded in the second iteration? I am asking this because if I change back the ids-variable to {"dataset": "d71531a5-a6a4-4252-abef-8c2fdf89c416"} without referencing the uuid of the EntitySchema, I get the same error again.

@jesper-friis, to me it seems that what @MBueschelberger intends to point out is that he has to manually load the Entity (although he referred to it in his comment as EntitySchema) before he can load an instance of it. Is this the behavior that you would expect? I.e., shouldn't the Entity be loaded automatically when we try to load one of its instances?

3 participants