A Pulp plugin for managing Hugging Face Hub content with pull-through caching support.
- Pull-through caching: Automatically fetch and cache content from Hugging Face Hub on first access
- Support for all Hugging Face content types: Models, datasets, and spaces
- Authentication support: Use Hugging Face tokens for private repositories
- API proxying: Forward API requests to Hugging Face Hub for metadata operations
- File downloads: Cache and serve model files, configuration files, and other artifacts
pip install pulp-hugging-face
First, create a Hugging Face remote that points to the Hugging Face Hub using the REST API:
# Using curl (REST API)
curl -X POST https://your-pulp-instance.com/api/pulp/public-domain-name/api/v3/remotes/hugging_face/hugging-face/ \
-H "Content-Type: application/json" \
-u admin:password \
-d '{
"name": "hf-remote",
"url": "https://huggingface.co",
"policy": "on_demand"
}'
For private repositories, include your Hugging Face token:
# Using curl with authentication token
curl -X POST https://your-pulp-instance.com/api/pulp/public-domain-name/api/v3/remotes/hugging_face/hugging-face/ \
-H "Content-Type: application/json" \
-u admin:password \
-d '{
"name": "hf-private",
"url": "https://huggingface.co",
"policy": "on_demand",
"hf_token": "YOUR_HF_TOKEN"
}'
Create a distribution that uses the remote for pull-through caching:
# First get the remote href
REMOTE_HREF=$(curl -s https://your-pulp-instance.com/api/pulp/public-domain-name/api/v3/remotes/hugging_face/hugging-face/ -u admin:password | jq -r '.results[] | select(.name=="hf-remote") | .pulp_href')
# Create distribution
curl -X POST https://your-pulp-instance.com/api/pulp/public-domain-name/api/v3/distributions/hugging_face/hugging-face/ \
-H "Content-Type: application/json" \
-u admin:password \
-d "{
\"name\": \"hf-proxy\",
\"base_path\": \"huggingface\",
\"remote\": \"$REMOTE_HREF\"
}"
https://cert.console.redhat.com/api/pulp-content/public-ytrahnov/huggingface/
Note: CLI support (
pulp hugging-face
commands) is planned but not yet implemented. Currently, you need to use the REST API directly or create a simple script for automation.
Once configured, you can access Hugging Face content through your Pulp instance:
# Download a model file
curl http://your-pulp-instance/api/pulp-content/public-domain-name/huggingface/microsoft/DialoGPT-medium/resolve/main/config.json
# Access API endpoints
curl http://your-pulp-instance/api/pulp-content/public-domain-name/huggingface/api/models/microsoft/DialoGPT-medium
# List repository files
curl http://your-pulp-instance/api/pulp-content/public-domain-name/huggingface/models/microsoft/DialoGPT-medium/tree/main
#If you want to use it with the huggingface-cli
export HF_ENDPOINT="http://your-pulp-instance/api/pulp-content/public-domain-name/huggingface"
huggingface-cli download hf-internal-testing/tiny-random-bert
- First request: When content is requested but not available locally, Pulp fetches it from Hugging Face Hub
- Caching: The content is stored locally and associated with the appropriate metadata
- Subsequent requests: Future requests for the same content are served from the local cache
- API forwarding: API requests are forwarded to Hugging Face Hub for real-time metadata
The plugin supports the standard Hugging Face Hub URL patterns:
- File downloads:
/{repo_id}/resolve/{revision}/{filename}
- API endpoints:
/api/models/{repo_id}
,/api/datasets/{repo_id}
,/api/spaces/{repo_id}
- Repository trees:
/api/{repo_type}s/{repo_id}/tree/{revision}
- Git LFS:
/api/{repo_type}s/{repo_id}/git/lfs/*
hf_hub_url
: Base URL for Hugging Face Hub (default: https://huggingface.co)hf_token
: Authentication token for private repositoriespolicy
: Set toon_demand
to enable pull-through caching
The plugin handles various Hugging Face content types:
- Models: PyTorch models, TensorFlow models, configuration files
- Datasets: Training data, evaluation data, data descriptions
- Spaces: Gradio apps, Streamlit apps, static sites
git clone https://github.com/pulp/pulp_hugging_face.git
cd pulp_hugging_face
pip install -e .
CLI support for this plugin is planned but not yet implemented. The plugin currently supports:
- ✅ REST API: Full functionality via
/pulp/api/v3/
- ❌ CLI Commands:
pulp hugging-face
commands not yet available
To add CLI support, the following would need to be implemented:
- CLI command definitions in a
cli/
directory - Client library generation
- Integration with
pulp-cli
package
For now, use the REST API directly or the provided example script for automation.
File through this project's GitHub issues and appropriate labels.
WARNING Is this security related? If so, please follow the Security Disclosures procedure.
Contributions are welcome! Please read our contributing guidelines and submit pull requests to our GitHub repository.
This project is licensed under the GNU General Public License v2.0 or later.