Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor McM REST client. #23

Closed
8 tasks done
ggonzr opened this issue May 30, 2024 · 8 comments
Closed
8 tasks done

Refactor McM REST client. #23

ggonzr opened this issue May 30, 2024 · 8 comments
Assignees

Comments

@ggonzr
Copy link
Contributor

ggonzr commented May 30, 2024

Perform several updates to enhance the code quality for this module and its re-usability in more contexts than requests for McM. Split the responsibility related to authenticating an HTTP client for performing requests from the per-application operations to consume a required resource. Also, allow extensions for other endpoints related to other PdmV applications and improve the performance handling requests.

Is your feature related to a problem?

Because of the McM REST client module tying HTTP methods to application-specific operations, users who require a fast way to authenticate other kinds of requests end up consuming mangled methods to achieve this behavior. This is not expected as it creates a problem in performing internal updates on modules without breaking existing client implementations.

Also, several modules in PdmV's integrations use this implementation for McM and change some internals to perform requests to other applications, bypassing the original purpose of the module and creating issues to maintain it in the future. To conclude, the current implementation doesn’t allow to handle requests concurrently and this creates performance issues related to bulk big processing loads.

Describe the solution you'd like

  • Use the requests library as the HTTP client instead of default implementations with core modules to improve flexibility, maintenance, and performance.
  • Split the responsibility for configuring an HTTP client session (request.Session) in another class. Include hooks to retrieve or request tokens and session cookies to authenticate operations and refresh the credentials when required. Store ID/OAuth tokens and refresh them if possible to improve UX and avoid extra human interventions.
  • Refactor the McM module to use the new HTTP client class created before and set operations only related to this application.
  • Refactor PdmV integrations to use this new implementation and the examples too.
  • Organize the McM modules into a subfolder only dedicated to this application.
  • Create new modules for other applications like Stats2 and RelVal.
  • Rename the repository to describe an intent wider than just McM scripts.
  • Drop Python 2 support

Expected behavior

Users have an HTTP client they can use to perform authenticated HTTP requests to CERN internal applications and, related to our context, perform operations to PdmV applications in an easier and high-performance way.

@ggonzr ggonzr mentioned this issue Jun 6, 2024
@ggonzr ggonzr linked a pull request Jun 20, 2024 that will close this issue
@ggonzr
Copy link
Contributor Author

ggonzr commented Jun 20, 2024

Hi @tvami, @efeyazgan, @lmoureaux, @DickyChant, @sihyunjeon

I am working on a refactor for the McM REST client to give more flexibility on the components it uses for performing authenticated HTTP requests to CERN web services mainly focusing on:

  1. Modularize the HTTP client so that it is easier to reuse the same component to consume HTTP resources for any CERN web application without breaking class encapsulation.
  2. Store the credentials in the filesystem and refresh them without always requiring human intervention (one alternative to simplify the issue described on: How to best support 2FA in user scripts? cmsPdmV#1127).
  3. Rearrange the modules into a package instead of a group of standalone scripts.
  4. Install this package just by using pip without embedding code.
  5. Updating the main Python version (to 3.9), dropping support for deprecated versions (Python 2).
  6. Attempting to provide backward compatibility to the user’s code.
  7. Providing a stronger test suite.

I would like to know your opinion about how smooth would it be to integrate this new version into some of the projects/scripts you work with, like:

  1. CMS GEN request fragment check: McM client used here
  2. CMS EXO MC sample request: McM client used here

This update only requires handling isolated venv environments to install the package from the source (details here) before executing the main entry point.

I. Do you think it would be easy to set up the runtime environment (venv) required to use this new version for your projects or your stakeholder’s projects so that we can decommission the current bundle serviced on AFS?

II. Is there something else you think should be added in this new version to make users’ lives simpler by interacting with the application to operate a big set of requests programmatically (without directly using the UI)?

III. Do you know other projects that heavily rely on this package? If so, and their source is public, could you include a URL to them below?

Thanks,
Geovanny

@efeyazgan
Copy link

Hi @ggonzr

I will try to test this tomorrow but will the access be the same, e.g. mcm._McM__get('public/restapi/requests/get/%s' % (prepid)) ?

Thanks,
Efe

@DickyChant
Copy link

This is really great news!

Meanwhile performing test as Efe did, I think I can quickly share some pieces of experience from my end in response to your questions.

  • In addition to the refactoring the APIs, should we also consider the range of APIs that we want to expose as public? This would help externals to retrieve information e.g. campaign details, as done this pull request Adding MCM parser script cms-cat/order#1 that @sihyunjeon and I were chatting earlier today.

And for your questions

I. Do you think it would be easy to set up the runtime environment (venv) required to use this new version for your projects or your stakeholder’s projects so that we can decommission the current bundle serviced on AFS?

Not an issue for us since ours are just meant for contacts, but for gen_check_script is different. But gen_check_script is used for McM validation majorly, so the worst solution would be install it to pdmvserv user pip area...

II. Is there something else you think should be added in this new version to make users’ lives simpler by interacting with the application to operate a big set of requests programmatically (without directly using the UI)?

Good point and apart from the tagging list, I believe it is also beneficial to have PdmV side who operates the other side of McM to share their thoughts too. We could provide feedback in the means of issue or even PR gradually.

III. Do you know other projects that heavily rely on this package? If so, and their source is public, could you include a URL to them below?

Speaking of using McM scripts, I think many MC contacts are doing so, I think in HIG MC area there is a set of notebooks which should more or less alike to your script examples, while many PWGs are working with EXO solution somehow. Also the PR I just sent!

@ggonzr
Copy link
Contributor Author

ggonzr commented Jun 21, 2024

Hi @ggonzr

I will try to test this tomorrow but will the access be the same, e.g. mcm._McM__get('public/restapi/requests/get/%s' % (prepid)) ?

Thanks, Efe

Yes it is currently compatible but we may update in the future the code to remove mangled methods (like __get(...)) and just rename it with an underscore (_get(...))

$ git clone --depth 1 https://github.com/cms-sw/genproductions.git
Cloning into 'genproductions'...
remote: Enumerating objects: 23625, done.
.......

$ python3 -V && python3 -m venv venv && source ./venv/bin/activate && pip install git+https://github.com/cms-PdmV/mcm_scripts.git@refactor/rest-client
Python 3.9.18
Collecting git+https://github.com/cms-PdmV/mcm_scripts.git@refactor/rest-client
......
Resolved https://github.com/cms-PdmV/mcm_scripts.git to commit e92c1ef2aacc9a7a5d3d63b731c64e6a5b282917
  Installing build dependencies ... done
......

(venv)$ cd ./genproductions/bin/utils/
.....

(venv)$ python3 test_request_fragment_check.py
[test_request_fragment_check][2024-06-21 10:31:23,897][INFO] Using Python version: 3.9.18 (main, Jan 24 2024, 00:00:00) 
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)
......
[2024-06-21 10:31:24 +0200][WARNING][http_client.mcm]: Using McM client without providing authentication
Running on McM DEV!
......
***********************************************************************************
Number of warnings = 1
WARNINGS:
----------
1 :  No parton shower weights configuration in the fragment. Since the Fall18 campaign, we recommend to include Parton Shower weights 

----------
Number of errors = 0

.
----------------------------------------------------------------------
Ran 1 test in 1.814s

OK

@efeyazgan
Copy link

Thanks! I had checked with some other bunch of requests as well. It seems all OK, so I assume no changes needed in the request checking script itself.

@efeyazgan
Copy link

And yes it would be nice to rename the "mangled methods". Thanks.

@ggonzr
Copy link
Contributor Author

ggonzr commented Jun 21, 2024

This is really great news!

Meanwhile performing test as Efe did, I think I can quickly share some pieces of experience from my end in response to your questions.

* In addition to the refactoring the APIs, should we also consider the range of APIs that we want to expose as public? This would help externals to retrieve information e.g. campaign details, as done this pull request [Adding MCM parser script cms-cat/order#1](https://github.com/cms-cat/order/pull/1) that @sihyunjeon and I were chatting earlier today.

And for your questions

I. Do you think it would be easy to set up the runtime environment (venv) required to use this new version for your projects or your stakeholder’s projects so that we can decommission the current bundle serviced on AFS?

Not an issue for us since ours are just meant for contacts, but for gen_check_script is different. But gen_check_script is used for McM validation majorly, so the worst solution would be install it to pdmvserv user pip area...

II. Is there something else you think should be added in this new version to make users’ lives simpler by interacting with the application to operate a big set of requests programmatically (without directly using the UI)?

Good point and apart from the tagging list, I believe it is also beneficial to have PdmV side who operates the other side of McM to share their thoughts too. We could provide feedback in the means of issue or even PR gradually.

III. Do you know other projects that heavily rely on this package? If so, and their source is public, could you include a URL to them below?

Speaking of using McM scripts, I think many MC contacts are doing so, I think in HIG MC area there is a set of notebooks which should more or less alike to your script examples, while many PWGs are working with EXO solution somehow. Also the PR I just sent!

Thanks for the feedback, related to your questions:

In addition to the refactoring the APIs, should we also consider the range of APIs that we want to expose as public?

Not at this time, modifications to the web application to expose more endpoints as public are outside the scope of this update. Ideally, in the medium to long term, we would also like to remove any public endpoints that the McM application currently has, even if only used for read-only purposes.

Instead, if it is only needed to perform read-only operations accessible to the base user (role with fewer permissions on the application), the following solution would work:

import os
from json import dumps
from pathlib import Path

import requests
from rest import McM
from rest.client.session import SessionFactory


def set_up_client_credentials() -> requests.Session:
    client_id = os.getenv("ACCESS_TOKEN_CLIENT_ID", "")
    client_secret = os.getenv("ACCESS_TOKEN_CLIENT_SECRET", "")
    url = "https://cms-pdmv-dev.web.cern.ch/mcm/"
    credential_path = Path.home() / Path("test-credential")
    target_application = "cms-ppd-pdmv-dev"

    return SessionFactory.configure_by_access_token(
        url=url,
        credential_path=credential_path,
        client_id=client_id,
        client_secret=client_secret,
        target_application=target_application,
    )

# Create an instance without setting an authentication method
# Then replace the session.
mcm = McM(id=None)
mcm.session = set_up_client_credentials()

# Consume resources using the private API
# Example took from: https://github.com/cms-PdmV/mcm_scripts/blob/master/get_requests.py
requests_query = """
    B2G-Fall13-00001
    B2G-Fall13-00005 -> B2G-Fall13-00015
"""
range_of_requests = mcm.get_range_of_requests(requests_query)
print('Found %s requests' % (len(range_of_requests)))
for request in range_of_requests:
    print(request['prepid'])

For registering a client application, it is required to follow the instructions available here.

This example allows users' code to access all private endpoints and query the required information. If this proposal fulfills the requirement, please let us know so that we can update the McM client to offer this kind of authentication too.

Not an issue for us since ours are just meant for contacts, but for gen_check_script is different. But gen_check_script is used for McM validation majorly, so the worst solution would be install it to pdmvserv user pip area...

Yes, it would require encapsulating the GEN script execution into an EL9 container in a similar way cms-sw executions currently work. For instance, related to the Get test command for PPD-TestForGENScriptDoNotOperateThis-00001, it would be something like:

# Execute some internals with newer Python versions.
cat <<'EndOfTestFile' > PPD-TestForGENScriptDoNotOperateThis-00001_pre.sh
#!/bin/bash

# Python interpreter version to use.
PYTHON_VERSION='python3.9'

# Install PdmV HTTP Package
if ! eval "${PYTHON_VERSION} -V"; then
  echo "Python version: ${PYTHON_VERSION} is not available..."
  exit 1
fi

# Install the package and an isolated venv
PYTHON_VENV_NAME="isolated-venv-${PYTHON_VERSION}"
${PYTHON_VERSION} -m venv ${PYTHON_VENV_NAME}
source "./${PYTHON_VENV_NAME}/bin/activate"
pip install git+https://github.com/cms-PdmV/mcm_scripts.git@refactor/rest-client

echo "Environment packages...\n"
echo $($PYTHON_VERSION -V)
pip freeze

echo 'Starting execution...\n\n'

# GEN Script begin
rm -f request_fragment_check.py
wget -q https://raw.githubusercontent.com/cms-sw/genproductions/master/bin/utils/request_fragment_check.py

# Remove all instructions that import remote modules 
sed -i '/sys.path.append(/d' request_fragment_check.py

chmod +x request_fragment_check.py
./request_fragment_check.py --bypass_status --prepid PPD-TestForGENScriptDoNotOperateThis-00001 --dev --develop

GEN_ERR=$?
if [ $GEN_ERR -ne 0 ]; then
  echo "GEN Checking Script returned exit code $GEN_ERR which means there are $GEN_ERR errors"
  echo "Validation WILL NOT RUN"
  echo "Please correct errors in the request and run validation again"
  exit $GEN_ERR
fi
echo "Running VALIDATION. GEN Request Checking Script returned no errors"
# GEN Script end

# Download fragment from McM
curl -s -k https://cms-pdmv-dev.web.cern.ch/mcm/public/restapi/requests/get_fragment/PPD-TestForGENScriptDoNotOperateThis-00001 --retry 3 --create-dirs -o Configuration/GenProduction/python/PPD-TestForGENScriptDoNotOperateThis-00001-fragment.py
[ -s Configuration/GenProduction/python/PPD-TestForGENScriptDoNotOperateThis-00001-fragment.py ] || exit $?;

# Check if fragment contais gridpack path and that it is in cvmfs
if grep -q "gridpacks" Configuration/GenProduction/python/PPD-TestForGENScriptDoNotOperateThis-00001-fragment.py; then
  if ! grep -q "/cvmfs/cms.cern.ch/phys_generator/gridpacks" Configuration/GenProduction/python/PPD-TestForGENScriptDoNotOperateThis-00001-fragment.py; then
    echo "Gridpack inside fragment is not in cvmfs."
    exit -1
  fi
fi
# End of PPD-TestForGENScriptDoNotOperateThis-00001_pre.sh file
EndOfTestFile

chmod +x PPD-TestForGENScriptDoNotOperateThis-00001_pre.sh
singularity run -B /afs -B /eos -B /cvmfs -B /etc/grid-security -B /etc/pki/ca-trust --home $PWD:$PWD /cvmfs/unpacked.cern.ch/registry.hub.docker.com/cmssw/el9:x84_64 $(echo "$(pwd)/PPD-TestForGENScriptDoNotOperateThis-00001_pre.sh")

# `cms-sw` execution works as normally expected
.......

Related to your lastest two items, sure I will forward this issue to more colleagues for asking their opinion about this refactor.

@ggonzr
Copy link
Contributor Author

ggonzr commented Sep 17, 2024

Updates to include this version for the McM automatic scripts will be deferred and added as part of cms-PdmV/cmsPdmV#1124 instead. The last is because these automatic scripts also use sections of the web application code that are currently being migrated from Python 2 to Python 3. The new version of the McM REST client is only compatible with Python 3. Therefore, apart from this, all other tasks are completed.

@ggonzr ggonzr closed this as completed Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants