Consider addition of endpoint to get direct upload URLs #28

Open
thclark opened this issue Apr 10, 2023 · 0 comments
Labels: decision needed (a decision is required, e.g. on UX or company policy), feature (a new feature of the app)


thclark commented Apr 10, 2023

Feature request

Current state

When using BlobField to upload blobs to GCS, the upload is first made to a temporary blob with a fixed content type (application/octet-stream). Then, on successful commit of the transaction (i.e. once the corresponding row is saved in the database), the temporary blob is assigned its metadata and moved to its final destination.
This is good because:

  • The naming callback can access any and all model fields
  • The naming callback can be deterministic, because uploading a file and then failing the database transaction does not leave an orphaned file at the final destination
  • Any orphaned files end up in the _tmp/ directory (or another bucket entirely), so they are easy to clean up later.

However, this mechanism limits you to uploading files using BlobField.
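
The temporary-then-move flow described above can be modelled with a toy, in-memory sketch. Everything here (FakeBucket, upload_then_commit, the save_row callable) is hypothetical and only illustrates the ordering of steps; it is not how django-gcp is implemented.

```python
import uuid


class FakeBucket:
    """In-memory stand-in for a GCS bucket (hypothetical, for illustration only)."""

    def __init__(self):
        self.blobs = {}

    def upload(self, name, data, content_type="application/octet-stream"):
        # Temporary uploads always use the fixed content type
        self.blobs[name] = {"data": data, "content_type": content_type}

    def move(self, src, dst, content_type):
        # Assign the final metadata, then relocate the blob
        blob = self.blobs.pop(src)
        blob["content_type"] = content_type
        self.blobs[dst] = blob


def upload_then_commit(bucket, data, content_type, naming_callback, save_row):
    """Upload to a temporary name, then move only once the row has been saved."""
    tmp_name = f"_tmp/{uuid.uuid4().hex}"
    bucket.upload(tmp_name, data)
    row = save_row()  # may raise, in which case only the temporary blob remains
    final_name = naming_callback(row)  # the callback can see every saved field
    bucket.move(tmp_name, final_name, content_type)
    return final_name
```

If save_row raises, the only leftover is a blob under _tmp/, which matches the clean-up property listed above.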

Use Case

I want to upload files from another service directly to GCS, using django-gcp as the permissions manager to sign URLs, but without registering the files in BlobField.

Proposed Solution

Create an endpoint, accessible by the frontend, that signs URLs given a signing token. The frontend can then request signed upload URLs on demand.

Add a view like the following to storage/views.py:

import datetime
import json
import random
import string
import time
import django.core.signing
from django.http import HttpResponse, HttpResponseBadRequest
from django.utils import baseconv, timezone
from django.views.decorators.http import require_POST
from google.cloud.storage import Blob, Bucket

from .bucket_registry import _bucket_registry


URLSAFE_CHARACTERS = string.ascii_letters + string.digits + "-._~"
REQUIRED_PARAMS = ["token", "filename", "content_type"]

signer = django.core.signing.Signer()


@require_POST
def get_direct_upload_url(request):
    """Responds with a pre-signed URL enabling the client to upload an object to the bucket"""

    for p in REQUIRED_PARAMS:
        if not request.POST.get(p):
            return HttpResponseBadRequest(f"'{p}' is a required parameter.")
    try:
        token: str = signer.unsign(request.POST["token"])
    except django.core.signing.BadSignature:
        return HttpResponseBadRequest("Invalid token.")

    bucket_and_path, include_timestamp_indicator, exptime = token.rsplit(":", 2)
    if time.time() > baseconv.base62.decode(exptime):
        return HttpResponseBadRequest("Timeout expired.")

    bucketname, path_prefix = bucket_and_path[5:].split("/", 1)
    bucket: Bucket = _bucket_registry.get("gs://" + bucketname)
    if not bucket:
        return HttpResponseBadRequest(f"Unknown bucket identifier 'gs://{bucketname}'.")

    filename: str = request.POST["filename"]
    content_type: str = request.POST["content_type"]

    timestring: str = f"{timezone.now():%Y-%m-%d_%H-%M-%S/}" if include_timestamp_indicator == "1" else ""
    randomstring: str = "".join(random.choices(URLSAFE_CHARACTERS, k=24))
    path: str = f"{path_prefix}{timestring}{randomstring}/{filename}"
    blob: Blob = bucket.blob(path)

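    # Sign a PUT URL valid for 60 minutes; the client must send the same Content-Type header
    return HttpResponse(
        json.dumps(
            {
                "url": blob.generate_signed_url(
                    expiration=timezone.now() + datetime.timedelta(minutes=60),
                    method="PUT",
                    content_type=content_type,
                ),
                "path": path,
            }
        ),
        content_type="application/json",
    )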
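
The object path assembled in the view could be factored into a pure function, which makes the naming scheme easy to test in isolation. This is a sketch under assumptions: build_object_path is a hypothetical helper name, and it swaps random.choices for secrets.choice so the random segment is not predictable.

```python
import datetime
import secrets
import string

URLSAFE_CHARACTERS = string.ascii_letters + string.digits + "-._~"


def build_object_path(path_prefix, filename, include_timestamp, now=None):
    """Return '<prefix>[<timestamp>/]<24 random chars>/<filename>'."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    # Optional timestamp directory, same format as in the view
    timestring = f"{now:%Y-%m-%d_%H-%M-%S}/" if include_timestamp else ""
    # Random directory so that repeated uploads of one filename never collide
    randomstring = "".join(secrets.choice(URLSAFE_CHARACTERS) for _ in range(24))
    return f"{path_prefix}{timestring}{randomstring}/{filename}"
```

Note that filename comes straight from the request, so a real endpoint would probably want to validate or sanitise it before building the path.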

Then use this code snippet to generate the token and URL enabling the frontend to call the signing endpoint (in storage/utils.py):

import time

from django.core.signing import Signer
from django.urls import reverse
from django.utils import baseconv


signer = Signer()


def get_signing_token_and_url(bucket_name, path_prefix, include_timestamp, submit_timeout):
    """Return a signed token and the URL of the endpoint that exchanges it for an upload URL.

    include_timestamp and submit_timeout (in seconds) were attributes of `self` in the
    DDCU library this was taken from; here they are passed in explicitly.
    """
    bucket_identifier = f"gs://{bucket_name}"

    # Get signing url and a token to pass to it, allows the frontend to sign on demand
    include_timestamp_indicator = "1" if include_timestamp else "0"
    valid_until = baseconv.base62.encode(int(time.time()) + submit_timeout)
    # Joined with "/" explicitly so the result does not depend on the host OS path separator
    signing_path = f"{bucket_identifier}/{path_prefix}"
    to_sign = f"{signing_path}:{include_timestamp_indicator}:{valid_until}"

    signing_token = signer.sign(to_sign)
    signing_url = reverse("gcp-storage-get-direct-upload-url")
    return signing_token, signing_url
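
The token payload has the form gs://<bucket>/<prefix>:<indicator>:<expiry>, and the view recovers the fields with rsplit(":", 2) so that the colon in "gs://" is left alone. The sketch below shows that round trip using only the standard library. The helper names are hypothetical, and the base62 functions are a stand-in written for this example because django.utils.baseconv was deprecated in Django 4.0 and removed in Django 5.0; the digit order (digits, uppercase, lowercase) is assumed to match it.

```python
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number):
    """Encode a non-negative integer as a base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text):
    """Decode a base62 string back to an integer."""
    value = 0
    for character in text:
        value = value * 62 + BASE62_ALPHABET.index(character)
    return value


def build_payload(bucket_name, path_prefix, include_timestamp, valid_until):
    """Assemble the unsigned payload (signing happens separately)."""
    indicator = "1" if include_timestamp else "0"
    return f"gs://{bucket_name}/{path_prefix}:{indicator}:{base62_encode(valid_until)}"


def parse_payload(payload):
    """Split from the right so the colon in 'gs://' is never treated as a separator."""
    bucket_and_path, indicator, exptime = payload.rsplit(":", 2)
    bucket_name, path_prefix = bucket_and_path[len("gs://"):].split("/", 1)
    return bucket_name, path_prefix, indicator == "1", base62_decode(exptime)
```

An alternative worth considering is Django's TimestampSigner, whose unsign method accepts a max_age argument and would remove the need to encode the expiry by hand.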

Finally, add the corresponding URL (urls.py):

from django.urls import path

from django_gcp.storage.views import get_direct_upload_url
# ...

urlpatterns = [
    # ...
    path(r"storage/get-direct-upload-url", get_direct_upload_url, name="gcp-storage-get-direct-upload-url"),
]
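
With the view, token helper and URL pattern in place, a client would make two requests: a POST to the signing endpoint, then a PUT of the file to the returned URL. The following is a rough sketch of how another service might build those requests with the standard library. The function names and the example host are hypothetical, and nothing is sent until the requests are passed to urllib.request.urlopen.

```python
import urllib.parse
import urllib.request


def build_signing_request(signing_url, token, filename, content_type):
    """POST the three required form parameters to the signing endpoint."""
    body = urllib.parse.urlencode(
        {"token": token, "filename": filename, "content_type": content_type}
    ).encode()
    return urllib.request.Request(signing_url, data=body, method="POST")


def build_upload_request(signed_url, data, content_type):
    """PUT the file bytes to the signed URL returned by the endpoint."""
    # The Content-Type header must equal the content_type that was signed
    return urllib.request.Request(
        signed_url, data=data, method="PUT", headers={"Content-Type": content_type}
    )


# Usage (performs network calls, so shown as comments only):
# response = urllib.request.urlopen(build_signing_request(url, token, "a.txt", "text/plain"))
# signed = json.load(response)  # expected keys: "url" and "path"
# urllib.request.urlopen(build_upload_request(signed["url"], b"hello", "text/plain"))
```

Because the view is a plain POST view, Django's CSRF protection would apply to the first request unless the endpoint is exempted, which is a decision the proposal leaves open.
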
@thclark thclark moved this to Priority 1 (Low) in Octue Board Apr 10, 2023
@thclark thclark added the feature A new feature of the app label Apr 10, 2023
@thclark thclark self-assigned this Apr 10, 2023
@thclark thclark added the decision needed A decision is required (e.g. on UX or company policy) label Oct 13, 2023
Projects
Status: Priority 1 (Low)