Prompt Leakage Probing

Overview

The Prompt Leakage Probing project is a framework for testing Large Language Model (LLM) agents to assess their susceptibility to system prompt leaks by calculating the advantage. It is built using AG2.

What This Project Does

Tests two agents one with original prompt and one with sanitized prompt to asses the judge advantage for ech security level (higher advantage is weaker security to prompt leakage).
Uses a "judge" and "analyzer" framework to determine which agent the judge is prompting.
Saves the results to a CSV file for further analysis.

Step-by-Step Instructions

Prerequisites

Install Python 3.10 or higher. Check your Python version:
```
python --version
```
Set up your OpenAI API Key. Export the key as an environment variable:
```
export OPENAI_API_KEY="your_openai_api_key"
```

1. Clone the Repository and Navigate to It

git clone https://github.com/sternakt/prompt-leakage-probing.git
cd prompt-leakage-probing

2. Install Dependencies

pip install -r requirements.txt

3. Understand the Notebooks

The project includes three Jupyter notebooks. Each tests a specific level of prompt leakage protection:

Low Protection
Medium Protection
High Protection

Run the notebooks:

Launch Jupyter Notebook:
```
jupyter notebook
```
Open the desired notebook (e.g., low_protection.ipynb) and run all cells.
Last cell will calculate the advantage for that security level

We have already uploaded the results (probe_results_low.csv, probe_results_medium.csv, probe_results_high.csv) from 40 runs so you can skip the generation and go straight to result analysis and advantage calculation.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
README.md		README.md
probe_results_high.csv		probe_results_high.csv
probe_results_low.csv		probe_results_low.csv
probe_results_medium.csv		probe_results_medium.csv
prompt_leakage_probing_high.ipynb		prompt_leakage_probing_high.ipynb
prompt_leakage_probing_low.ipynb		prompt_leakage_probing_low.ipynb
prompt_leakage_probing_medium.ipynb		prompt_leakage_probing_medium.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Prompt Leakage Probing

Overview

What This Project Does

Step-by-Step Instructions

Prerequisites

1. Clone the Repository and Navigate to It

2. Install Dependencies

3. Understand the Notebooks

Run the notebooks:

About

Releases

Packages

Languages

sternakt/prompt-leakage-probing

Folders and files

Latest commit

History

Repository files navigation

Prompt Leakage Probing

Overview

What This Project Does

Step-by-Step Instructions

Prerequisites

1. Clone the Repository and Navigate to It

2. Install Dependencies

3. Understand the Notebooks

Run the notebooks:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages