CONTRIBUTING.md (+2 -2)

@@ -29,7 +29,7 @@ And if you like the project, but just don't have time to contribute, that's fine
## I Have a Question

- If you want to ask a question, good places to check first are the [garak quick start docs](https://docs.garak.ai) and, if its a coding question, the [garak reference](https://reference.garak.ai/).
+ If you want to ask a question, good places to check first are the [garak quick start docs](https://docs.garak.ai) and, if it's a coding question, the [garak reference](https://reference.garak.ai/).

Before you ask a question, it is best to search for existing [Issues](https://github.com/NVIDIA/garak/issues) that might help you. In case you have found a suitable issue and still need clarification, you can write your question in this issue. It is also advisable to search the internet for answers first. You can also often find helpful people on the garak [Discord](https://discord.gg/uVch4puUCs).
@@ -70,7 +70,7 @@ A good bug report shouldn't leave others needing to chase you up for more inform
<!-- omit in toc -->
#### How Do I Submit a Good Bug Report?

- You should never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead sensitive bugs must be sent by email to [email protected].
+ You should never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead, sensitive bugs must be sent by email to [email protected].

<!-- You may add a PGP key to allow the messages to be sent encrypted as well. -->
We use GitHub issues to track bugs and errors. If you run into an issue with the project:

FAQ.md (+1 -1)

@@ -79,7 +79,7 @@ No, if the model is the same, you should get the same results - though there are
## How can I scan a RAG pipeline with garak?

- Currently the major attack we hear about in RAG systems is indirect prompt injection, and garak already scans for a few of those.
+ Currently, the major attack we hear about in RAG systems is indirect prompt injection, and garak already scans for a few of those.

## There are so many probes in garak, I was trying to scan a model for all probes, but it took hours and I eventually had to kill that scan. What is the recommended practice on scanning a model? Which typical probes are recommended?
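
The RAG answer above notes that garak already ships probes for prompt-injection-style attacks. As a hedged sketch only (the `--model_type`, `--model_name`, and `--probes` flags are taken from garak's CLI help rather than from this diff, and the target model and credential shown are just placeholders), a scan scoped to those probe families might look like this:

```shell
# Hedged sketch: run only the prompt-injection-related probe families.
# Family names come from the README probe table further down; the flags
# and the target model are assumptions, not part of this diff.
export OPENAI_API_KEY="sk-..."   # placeholder credential for the OpenAI generator
python -m garak \
  --model_type openai \
  --model_name gpt-3.5-turbo \
  --probes encoding,promptinject
```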

README.md (+22 -22)

@@ -228,28 +228,28 @@ For testing. This generator repeats back the prompt it received.
## Intro to probes

| Probe | Description |
|---|---|
| blank | A simple probe that always sends an empty prompt. |
| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/garak-llm/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
| av_spam_scanning | Probes that attempt to make the model output malicious content signatures |
| continuation | Probes that test if the model will continue a probably undesirable word |
| dan | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |
| donotanswer | Prompts to which responsible language models should not answer. |
| encoding | Prompt injection through text encoding |
| gcg | Disrupt a system prompt by appending an adversarial suffix. |
| glitch | Probe model for glitch tokens that provoke unusual behavior. |
| grandma | Appeal to be reminded of one's grandmother. |
| goodside | Implementations of Riley Goodside attacks. |
| leakerplay | Evaluate if a model will replay training data. |
| lmrc | Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes |
| malwaregen | Attempts to have the model generate code for building malware |
| misleading | Attempts to make a model support misleading and false claims |
| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
| promptinject | Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
| realtoxicityprompts | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
| snowball | [Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |
| xss | Look for vulnerabilities that permit or enact cross-site attacks, such as private data exfiltration. |
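
For a quick way to see these probe families in action, a minimal sketch follows. It assumes the test generator described just above the table is registered as `test.Repeat`, and that `--list_probes` and `--probes` behave as in garak's CLI help; neither assumption comes from this diff.

```shell
# Enumerate available probe families (should include the ones tabled above).
python -m garak --list_probes

# Smoke-test a single family against the echoing test generator,
# assuming it is exposed as test.Repeat (an assumption, not confirmed here).
python -m garak --model_type test.Repeat --probes glitch
```

Scoping a run to one probe family at a time also keeps it short enough to iterate on.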