Move to a faster base64 implementation #19984
```diff
@@ -1,10 +1,10 @@
 # SPDX-License-Identifier: Apache-2.0
 # SPDX-FileCopyrightText: Copyright contributors to the vLLM project

-import base64
 from io import BytesIO
 from pathlib import Path

+import pybase64
 import torch
 from PIL import Image

@@ -55,7 +55,7 @@ def load_bytes(self, data: bytes) -> Image.Image:
         return convert_image_mode(image, self.image_mode)

     def load_base64(self, media_type: str, data: str) -> Image.Image:
-        return self.load_bytes(base64.b64decode(data))
+        return self.load_bytes(pybase64.b64decode(data, validate=True))

     def load_file(self, filepath: Path) -> Image.Image:
         image = Image.open(filepath)
@@ -75,7 +75,7 @@ def encode_base64(
             image.save(buffer, image_format)
             data = buffer.getvalue()

-        return base64.b64encode(data).decode('utf-8')
+        return pybase64.b64encode(data).decode('utf-8')


 class ImageEmbeddingMediaIO(MediaIO[torch.Tensor]):
@@ -88,10 +88,10 @@ def load_bytes(self, data: bytes) -> torch.Tensor:
         return torch.load(buffer, weights_only=True)

     def load_base64(self, media_type: str, data: str) -> torch.Tensor:
-        return self.load_bytes(base64.b64decode(data))
+        return self.load_bytes(pybase64.b64decode(data, validate=True))
```
Review comment: Similar to the change in …, this is a good enhancement for robustness. Confirming this is the intended behavior.
```diff

     def load_file(self, filepath: Path) -> torch.Tensor:
         return torch.load(filepath, weights_only=True)

     def encode_base64(self, media: torch.Tensor) -> str:
-        return base64.b64encode(media.numpy()).decode('utf-8')
+        return pybase64.b64encode(media.numpy()).decode('utf-8')
```
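The diff swaps the module while leaving every call site's shape unchanged, because pybase64 mirrors the standard library's `b64encode`/`b64decode` signatures. The sketch below reproduces the two patterns on raw bytes. The helper names `encode_payload`/`decode_payload` and the `ImportError` fallback to the standard `base64` module are hypothetical additions for illustration; the PR itself imports `pybase64` unconditionally.

```python
import binascii

try:
    import pybase64 as b64  # accelerated implementation adopted by the PR
except ImportError:
    import base64 as b64  # hypothetical fallback; the PR has no such fallback


def encode_payload(data: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string, like encode_base64 in the diff."""
    return b64.b64encode(data).decode("utf-8")


def decode_payload(data: str) -> bytes:
    """Strictly decode base64 text, like load_base64 in the diff.

    Raises binascii.Error on non-alphabet characters or bad padding.
    """
    return b64.b64decode(data, validate=True)


# Round trip: strict decoding accepts anything b64encode produced.
raw = b"\x89PNG\r\n\x1a\n not a real image"
assert decode_payload(encode_payload(raw)) == raw
```

Because `validate=True` only rejects input that is not canonical base64, payloads produced by a standard encoder keep working; only malformed or whitespace-wrapped input changes behavior.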
Review comment: By switching to `pybase64.b64decode` and setting `validate=True`, you've made the base64 decoding stricter. The previous implementation, `base64.b64decode` with its default settings, silently discards non-base64 characters in the input string; the new implementation raises a `binascii.Error` instead. This is a positive change for input validation and data integrity. I just want to confirm this change in behavior is intended. If so, this is a great improvement!
Reply: `validate=True` is actually faster: with `validate=False`, pybase64 proactively filters the input for illegitimate characters, which entails some performance overhead.
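The speed claim is easy to check locally. The following is a rough `timeit` sketch, not a rigorous benchmark; the payload size, repeat counts, and the `time_decode` helper are arbitrary, hypothetical choices. It assumes `pybase64` is installed; if it is not, the sketch falls back to the standard `base64` module, and the numbers then describe the standard library rather than pybase64.

```python
import base64
import os
import timeit
from typing import Callable

try:
    import pybase64 as b64  # library the PR adopts
except ImportError:
    b64 = base64  # fallback: timings then describe the stdlib, not pybase64


def time_decode(
    decode: Callable[..., bytes], encoded: bytes, validate: bool, number: int = 10
) -> float:
    """Best of 3 runs, in seconds, for `number` calls of decode(encoded, validate=...)."""
    runs = timeit.repeat(
        lambda: decode(encoded, validate=validate), number=number, repeat=3
    )
    return min(runs)


# Roughly 1.4 MB of base64 text built from 1 MiB of random bytes.
payload = base64.b64encode(os.urandom(1024 * 1024))
for validate in (False, True):
    seconds = time_decode(b64.b64decode, payload, validate)
    print(f"{b64.__name__}.b64decode(validate={validate}): {seconds:.4f} s")
```

Taking the minimum of several repeats reduces noise from other processes; results will vary with CPU, payload size, and which SIMD code path pybase64 selects.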