Guardrails implementation #504

Open
wants to merge 9 commits into
base: main

Conversation

gbayomi
Contributor

@gbayomi gbayomi commented Aug 9, 2025

🛡️ Overview

This PR introduces a comprehensive guardrails system for Openlayer's tracing functionality, enabling automatic content filtering and protection for AI/LLM applications.

✨ Key Features

🏗️ Flexible Architecture

  • Base guardrail system with extensible BaseGuardrail abstract class
  • Multiple action types: ALLOW, BLOCK, MODIFY with configurable strategies
  • Block strategies: Graceful handling (return empty/error) vs exceptions
  • Rich metadata for monitoring, filtering, and analysis
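A minimal sketch of how these pieces could fit together. The names `BaseGuardrail`, `GuardrailAction`, `BlockStrategy`, and `GuardrailResult` come from the commit messages in this PR; the method names (`check_input`, `check_output`, `handle_block`), the enum values, `GuardrailBlockedError`, and `MaxLengthGuardrail` are assumptions made for illustration and may not match the actual implementation. The "skip function" strategy is left out for brevity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class GuardrailAction(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    MODIFY = "modify"


class BlockStrategy(Enum):
    RAISE_EXCEPTION = "raise_exception"
    RETURN_EMPTY = "return_empty"
    RETURN_ERROR_MESSAGE = "return_error_message"


class GuardrailBlockedError(Exception):
    """Hypothetical exception used by the RAISE_EXCEPTION strategy."""


@dataclass
class GuardrailResult:
    action: GuardrailAction
    data: Any = None  # payload to pass on (possibly modified)
    reason: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


class BaseGuardrail(ABC):
    def __init__(self, name: str,
                 block_strategy: BlockStrategy = BlockStrategy.RETURN_EMPTY) -> None:
        self.name = name
        self.block_strategy = block_strategy

    @abstractmethod
    def check_input(self, inputs: Dict[str, Any]) -> GuardrailResult:
        """Inspect function inputs before the traced function runs."""

    @abstractmethod
    def check_output(self, output: Any) -> GuardrailResult:
        """Inspect the function output before it is returned."""

    def handle_block(self, result: GuardrailResult) -> str:
        """Turn a BLOCK result into an exception or a graceful return value."""
        if self.block_strategy is BlockStrategy.RAISE_EXCEPTION:
            raise GuardrailBlockedError(result.reason)
        if self.block_strategy is BlockStrategy.RETURN_ERROR_MESSAGE:
            return f"Blocked by {self.name}: {result.reason}"
        return ""  # RETURN_EMPTY


class MaxLengthGuardrail(BaseGuardrail):
    """Toy custom guardrail: blocks long inputs, truncates long outputs."""

    def __init__(self, max_chars: int, **kwargs: Any) -> None:
        super().__init__(name="max_length", **kwargs)
        self.max_chars = max_chars

    def check_input(self, inputs: Dict[str, Any]) -> GuardrailResult:
        too_long = [k for k, v in inputs.items()
                    if isinstance(v, str) and len(v) > self.max_chars]
        if too_long:
            return GuardrailResult(GuardrailAction.BLOCK,
                                   reason=f"Inputs too long: {', '.join(too_long)}")
        return GuardrailResult(GuardrailAction.ALLOW, data=inputs)

    def check_output(self, output: Any) -> GuardrailResult:
        if isinstance(output, str) and len(output) > self.max_chars:
            return GuardrailResult(GuardrailAction.MODIFY,
                                   data=output[: self.max_chars],
                                   reason="Output truncated")
        return GuardrailResult(GuardrailAction.ALLOW, data=output)
```

Keeping the block handling on the base class means a custom guardrail only has to decide on ALLOW, BLOCK, or MODIFY, while the caller picks graceful handling or an exception through `block_strategy`.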

🔒 PII Protection

  • PIIGuardrail implementation using Microsoft Presidio
  • Configurable entities: Block SSNs, redact phone numbers, etc.
  • Confidence thresholds for fine-tuned detection
  • Multiple handling strategies for different use cases
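To illustrate the "block some entities, redact others" idea without requiring Presidio, here is a rough sketch that swaps in two simplified regular expressions as the detector. This is a stand-in, not how `PIIGuardrail` works: the function name `scan_pii`, the patterns, and the returned tuple shape are all assumptions, and confidence thresholds are not modeled because regex matches carry no score.

```python
import re
from typing import List, Tuple

# Simplified illustrative patterns; a real detector such as Presidio is far more robust.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scan_pii(text: str,
             block_entities: List[str],
             redact_entities: List[str]) -> Tuple[str, str, List[str]]:
    """Return (action, resulting_text, entities_found).

    Blocking takes precedence over redaction: if any blocked entity is
    present, nothing is passed through.
    """
    found_blocked = [e for e in block_entities if PATTERNS[e].search(text)]
    if found_blocked:
        return "block", "", found_blocked

    redacted, hits = text, []
    for entity in redact_entities:
        # Replace each match with a placeholder such as <PHONE_NUMBER>.
        redacted, count = PATTERNS[entity].subn(f"<{entity}>", redacted)
        if count:
            hits.append(entity)
    if hits:
        return "modify", redacted, hits
    return "allow", text, []
```

For example, `scan_pii("Call 555-123-4567", ["US_SSN"], ["PHONE_NUMBER"])` redacts the phone number, while any text containing an SSN-shaped string is blocked outright.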

🎯 Comprehensive Integration

@trace() Decorator Support

@tracer.trace(guardrails=[pii_guardrail])
def process_user_input(user_data: str) -> str:
    return f"Processed: {user_data}"

Helper Functions Support

# Global configuration - applies to ALL helper functions
tracer.configure(guardrails=[pii_guardrail])
traced_client = tracer.trace_openai(openai.OpenAI())

Per-Call Overrides

tracer.add_chat_completion_step_to_trace(
    guardrails=[custom_guardrail],  # Override global settings
    inputs={"prompt": "sensitive data"},
    output="protected response"
)
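One way the global-versus-per-call precedence described above might be resolved is sketched below. The helper `resolve_guardrails` and the module-level `_global_guardrails` list are hypothetical, and treating an explicit empty list as "disable guardrails for this call" is an assumption, not something this PR states.

```python
from typing import List, Optional

# Hypothetical module-level state set by a configure() call.
_global_guardrails: List[object] = []


def configure(guardrails: Optional[List[object]] = None) -> None:
    """Store guardrails that apply to every call unless overridden."""
    _global_guardrails.clear()
    _global_guardrails.extend(guardrails or [])


def resolve_guardrails(per_call: Optional[List[object]] = None) -> List[object]:
    """Per-call guardrails win; None means 'fall back to the global list'."""
    if per_call is not None:
        return list(per_call)
    return list(_global_guardrails)
```

Distinguishing `None` from `[]` lets a caller opt out of the global guardrails for a single step without touching the global configuration.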

📊 Metadata & Analytics

Each trace step includes comprehensive guardrail metadata:

{
  "guardrails": {
    "input_pii_protection": {
      "action": "redacted",
      "reason": "Redacted PII entities: PHONE_NUMBER",
      "block_strategy": "return_error_message"
    }
  },
  "has_guardrails": true,
  "guardrail_blocked": false,
  "guardrail_modified": true,
  "guardrail_allowed": false
}
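The summary flags in the payload above could plausibly be derived from the per-guardrail entries, as in the sketch below. The function name `summarize_guardrails` is hypothetical, and apart from `"redacted"` (which appears in the example above) the action strings `"blocked"`, `"modified"`, and `"allowed"` are assumed values.

```python
from typing import Any, Dict


def summarize_guardrails(results: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Attach step-level flags to a mapping of guardrail name -> result dict."""
    actions = {r.get("action") for r in results.values()}
    return {
        "guardrails": results,
        "has_guardrails": bool(results),
        "guardrail_blocked": "blocked" in actions,
        "guardrail_modified": bool(actions & {"redacted", "modified"}),
        # Allowed only if every guardrail that ran let the content through untouched.
        "guardrail_allowed": bool(results) and actions <= {"allowed"},
    }
```

Flat boolean flags like these make it cheap to filter trace steps (for example, "show every step where a guardrail modified content") without parsing each nested result.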

🔧 Usage Examples

@trace() Decorator Example

See: examples/tracing/trace_decorator_with_guardrails.py

  • Function-level PII protection
  • Multiple guardrails with different strategies
  • Custom guardrail implementations
  • Role-based conditional protection

trace_openai() Helper Example

See: examples/tracing/trace_openai_with_guardrails.py

  • Global guardrails for all LLM calls
  • RAG pipeline protection
  • Application-specific configurations
  • Multi-model setups with different protection levels

Backward Compatibility

  • 100% backward compatible - existing code works unchanged
  • Optional guardrails - only applied when explicitly configured
  • Graceful degradation - missing dependencies don't break functionality
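The graceful-degradation point could be implemented with an availability check before any optional import, roughly as sketched here. The helper names `modules_available` and `enabled_guardrail_names` are hypothetical; only the package names `presidio_analyzer` and `presidio_anonymizer` follow from the dependencies this PR lists.

```python
import importlib.util
import logging
from typing import Dict, List, Tuple

logger = logging.getLogger(__name__)


def modules_available(*module_names: str) -> bool:
    """True if every top-level module can be located, without importing it."""
    return all(importlib.util.find_spec(name) is not None for name in module_names)


# Hypothetical mapping of guardrail name -> top-level modules it needs.
REQUIREMENTS: Dict[str, Tuple[str, ...]] = {
    "pii": ("presidio_analyzer", "presidio_anonymizer"),
}


def enabled_guardrail_names(requested: List[str]) -> List[str]:
    """Drop guardrails whose optional dependencies are missing, with a warning."""
    enabled = []
    for name in requested:
        if modules_available(*REQUIREMENTS.get(name, ())):
            enabled.append(name)
        else:
            logger.warning("Guardrail %r disabled: optional dependency missing", name)
    return enabled
```

With this shape, code that never requests the PII guardrail is unaffected by whether Presidio is installed.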

🧪 Testing

Comprehensive test suite included:

  • Unit tests for all guardrail components
  • Integration tests for decorator and helper functions
  • Block strategy validation
  • Metadata structure verification

📦 Dependencies

Optional (for PII guardrails):

pip install presidio-analyzer presidio-anonymizer

Future extensibility ready for:

  • LLM-Guard integration
  • Custom content filters
  • Third-party security tools

🎯 Use Cases

  • PII Protection: Automatic detection and redaction of sensitive data
  • Content Filtering: Block inappropriate or harmful content
  • Compliance: Meet regulatory requirements (GDPR, HIPAA, etc.)
  • Security: Prevent data leaks in AI applications
  • Monitoring: Track and analyze content filtering actions

Impact: This system provides enterprise-grade content protection for AI applications with minimal integration overhead and maximum flexibility.

Contributor

@gustavocidornelas gustavocidornelas left a comment

Overall, the functionality is interesting and could be a good addition to the SDK. Below are my comments:

  • The examples under /examples/tracing don't make a lot of sense because:

    • They do not follow the convention we use (of organizing examples under providers).
    • The file names indicate they are "drafts" and not informative examples for users. E.g., simple_guardrails_test.py, improved_guardrails_test.py, etc.

Overall, I'd completely remove these examples, as they are bloating the PR. Examples should probably go into a guardrails folder inside examples/tracing. I also believe they should be Jupyter notebooks. From what's described in the README.md inside /guardrails, a single example is enough. Maybe two: one showing something simple and another showing a custom guardrail.

  • It doesn't make sense to have a README.md inside src/openlayer/lib/guardrails. The content in it should probably go into a documentation page.
  • The changes in tracer.py conflict with what we have in main. I'd rebase and fix the conflicts first. Eyeballing the code, though, I can see that the changes are pretty intrusive. There should be a less intrusive way to incorporate guardrails with tracing: it's just a matter of adding a processing step to the inputs/outputs of traced functions, and the base guardrail has a good interface for that. These big changes are hard to review (GitHub is not even rendering the diffs because there are so many), and since the code in tracer.py is part of many users' production code, reviewing it carefully is important.
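As a rough illustration of the "processing step on inputs/outputs" idea, a wrapper along these lines would keep guardrail logic outside the tracing internals. Everything here is hypothetical (`with_guardrails`, the `Guardrail` protocol, `MaskDigits`), and the sketch only handles functions whose parameters can all be passed by keyword.

```python
import functools
import inspect
import re
from typing import Any, Callable, Dict, List, Protocol


class Guardrail(Protocol):
    def check_input(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...
    def check_output(self, output: Any) -> Any: ...


def with_guardrails(guardrails: List[Guardrail]) -> Callable:
    """Run each guardrail over the inputs before the call and the output after."""
    def decorator(func: Callable) -> Callable:
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Normalize positional and keyword arguments into one dict.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            inputs = dict(bound.arguments)
            for guardrail in guardrails:
                inputs = guardrail.check_input(inputs)
            output = func(**inputs)
            for guardrail in guardrails:
                output = guardrail.check_output(output)
            return output
        return wrapper
    return decorator


class MaskDigits:
    """Toy guardrail: masks digits in string inputs, leaves outputs alone."""

    def check_input(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {k: re.sub(r"\d", "#", v) if isinstance(v, str) else v
                for k, v in inputs.items()}

    def check_output(self, output: Any) -> Any:
        return output


@with_guardrails([MaskDigits()])
def process_user_input(user_data: str) -> str:
    return f"Processed: {user_data}"
```

Because the wrapper only touches the arguments and return value, it could be layered around an existing traced function without changing how steps are created.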

@gustavocidornelas gustavocidornelas force-pushed the feature/guardrails-integration branch from c090f16 to f595c2e Compare August 14, 2025 13:42
Gabriel Bayomi Tinoco Kalejaiye and others added 7 commits August 14, 2025 11:01
- Add base guardrail architecture with GuardrailAction, BlockStrategy enums
- Implement GuardrailResult dataclass for structured guardrail responses
- Create BaseGuardrail abstract class for extensible guardrail implementations
- Add PIIGuardrail using Microsoft Presidio for PII detection and redaction
- Support multiple block strategies: raise exception, return empty, return error message, skip function
- Include GuardrailRegistry for managing guardrail instances
- Add comprehensive error handling and logging

This foundational system enables flexible content filtering and protection
for AI/LLM applications with configurable actions and strategies.

- Add guardrails parameter to @trace() and @trace_async() decorators
- Enhance create_step() context manager to support guardrails
- Update add_chat_completion_step_to_trace() for helper function integration
- Add global guardrails configuration via tracer.configure()
- Implement input and output guardrail processing with graceful error handling
- Add comprehensive guardrail metadata to trace steps including action flags
- Support multiple block strategies: graceful handling vs exceptions
- Enable per-call guardrail overrides and global configuration
- Maintain 100% backward compatibility with existing trace decorators

This integration enables automatic protection for both @trace() decorated
functions and LLM helper functions like trace_openai() with rich metadata
for monitoring and analysis.

- Add trace_decorator_with_guardrails.py demonstrating @trace() with guardrails
  * Basic PII protection with different block strategies
  * Multiple guardrails with layered protection
  * Custom guardrail implementations
  * Role-based conditional guardrails

- Add trace_openai_with_guardrails.py demonstrating helper function integration
  * Global guardrails configuration for all LLM calls
  * RAG pipeline protection with automatic guardrail application
  * Application-specific guardrail configurations
  * Multi-model setups with different protection levels
  * Monitoring and analytics with comprehensive metadata

These examples provide clear guidance for implementing guardrails in
production AI applications with both decorator and helper function patterns.

- Add comprehensive test suite for guardrails functionality
- Include tests for different block strategies and error handling
- Add helper functions integration tests
- Provide additional usage examples and demonstrations

These tests ensure the guardrails system works correctly across all
integration points and usage patterns.

- Remove complex guardrail logic from create_step and add_chat_completion_step_to_trace
- Simplify @trace() decorator integration to be much less intrusive
- Remove global guardrails configuration (not needed for basic functionality)
- Clean up examples to focus on core @trace() decorator usage
- Remove unnecessary test files and complex examples
- Keep only essential guardrails functionality in tracer.py

This makes the integration much cleaner and easier to review while maintaining
the core guardrails functionality for @trace() decorated functions.
@gustavocidornelas gustavocidornelas force-pushed the feature/guardrails-integration branch from dac9bd4 to f349cc5 Compare August 14, 2025 14:12
2 participants