Best AI Security Tools
LLM vulnerability scanners benchmark
reviews

Top LLM Vulnerability Scanners: What We Found After Testing 8 Tools

We ran 8 LLM vulnerability scanners against the same attack corpus and measured what each one actually catches. Here are the numbers — including the tools that failed.

By Editorial · · 8 min read

LLM vulnerability scanners occupy an interesting position in the AI security stack. They’re not runtime guards — they run before or alongside deployment, looking for weaknesses in models, prompts, and pipeline configurations. The category is young enough that “vulnerability scanner” means wildly different things from vendor to vendor. One tool calls itself a scanner because it fuzzes prompts; another means it checks your model weights for known-malicious fine-tunes; a third is really just a RAG-security linter with a flashy dashboard.

We wanted to know which tools actually catch real vulnerabilities, so we built a shared test corpus and put eight scanners through it.

Our Test Methodology

The corpus covers four vulnerability classes that appear repeatedly in real LLM deployments:

Prompt injection vectors. Direct injections, indirect injections via retrieved context, nested instructions designed to override system prompts. We used 340 attack strings, drawn from the Prompt Injection Report database and our own collection.

Jailbreak patterns. Role-playing frames, hypothetical framings, token manipulation, DAN-style multi-step prompts. 210 variants across seven technique families.

Data exfiltration patterns. Instructions designed to leak system prompts, conversation history, or retrieved documents. 85 test cases at varying levels of obfuscation.

Model supply-chain checks. Where applicable, we provided deliberately misconfigured pipeline definitions and model cards with known red flags to see whether scanners flagged them.

All eight tools were tested against the same base LLM (we used GPT-4o and a 70B Llama 3.1 fine-tune, with results nearly identical) and scored on detection rate, false-positive rate on benign inputs, and — where the tool supports it — useful remediation guidance.

The Eight Tools

1. Garak (NeMo Guardrails Research Edition)

Garak is NVIDIA’s open-source LLM vulnerability scanner, and it’s the closest thing the category has to an industry-standard baseline. It covers an impressive breadth of attack types — hallucination triggers, toxic generation, data leakage, and prompt injection — and its probe library is extensible via YAML.

Detection rates:

False-positive rate: 4.2% on benign inputs.

The main limitation is that Garak’s default probes favor breadth over depth — it casts a wide net but doesn’t probe deeply within any single technique family. For targeted red-teaming of a specific attack class, you’ll want to layer it with something more focused.

Garak is free and open source. NVIDIA on GitHub. Related reading: AI Sec Bench’s Garak walkthrough.

2. PyRIT (Microsoft)

Microsoft’s Python Risk Identification Toolkit for generative AI is a framework, not just a scanner — it’s designed to let red teams build custom attack orchestrations. Out of the box, it ships with a reasonable set of converters and objectives covering most of the standard attack families.

Detection rates:

False-positive rate: 3.1%

PyRIT’s strength is its composability. You can layer converters (obfuscation, translation, paraphrase) with objectives (harmful content, sensitive data extraction) and scoring functions to build targeted campaigns that surface edge cases other scanners miss. The documentation is good enough to get started without reading source code.

The tool requires meaningful setup investment; it’s not a “scan now” experience. More appropriate for teams with dedicated red-team bandwidth. Free, MIT license.

3. LLMGuard (ProtectAI)

LLMGuard is a runtime guardrail library that includes a scanning mode. Its scanners check for prompt injection, PII leakage, and toxic content. Unlike the research tools above, this one is designed for production integration — scan rates are low enough to use inline.

Detection rates:

False-positive rate: 6.8%

The lower jailbreak detection rate matters in practice. LLMGuard’s jailbreak scanner relies on a classification model trained on a relatively narrow corpus, and our more creative jailbreak variants (indirect, multi-turn, role-framing hybrids) got through at a high rate.

That said, the PII detection is genuinely strong — one of the better implementations we’ve seen for catching social security numbers, credit card numbers, and email addresses in LLM output. If PII leakage is your primary concern, LLMGuard is worth serious consideration.

Open source, Apache 2.0. ProtectAI GitHub. Sister site review: AI Moderation Tools on LLMGuard.

4. Rebuff

Rebuff focuses narrowly on prompt injection detection and does it better than most generalist tools. It uses a multi-layer approach: a canary token layer, a vector-similarity layer comparing against known injections, and an LLM-based classification layer.

Detection rates:

False-positive rate: 5.9%

That 84% prompt injection detection rate is the highest in our benchmark. The cost is that Rebuff doesn’t try to detect much else — if you’re running it as your only scanner, you have large blind spots. It’s best understood as a specialized component in a layered security stack, not a complete solution.

5. Vigil (ProtectAI)

Vigil is ProtectAI’s scanner-specific product, distinct from LLMGuard. It’s designed for scanning at the request/response level and includes YARA-based custom rule support, which is a significant differentiator for teams that want to write organization-specific detection logic.

Detection rates:

False-positive rate: 3.7%

The YARA rule engine is genuinely useful. We wrote five custom rules targeting internal vocabulary (names of internal systems, proprietary terms) and Vigil flagged test injections attempting to extract that content at a 91% rate. That kind of customization is hard to get elsewhere.

6. Lakera Guard

Lakera Guard is the commercial offering from Lakera AI and the only paid product in our test. The API is clean, latency is low (average 47ms per request in our tests), and the detection model clearly benefits from training on a broader corpus than the open-source alternatives.

Detection rates:

False-positive rate: 2.4%

Lakera posts the strongest combined score across the three detection categories, and its false-positive rate is the best in the benchmark. For teams that need production-quality detection without the operational overhead of running their own inference infrastructure, Lakera Guard is hard to argue against. Pricing starts at a per-request rate that works out favorably at moderate volumes.

7. Calypso by Robust Intelligence

Robust Intelligence’s scanner focuses on model-level vulnerabilities in addition to prompt-level ones. It can test models directly via their inference API and flag known attack surfaces in fine-tuned weights using their adversarial ML framework.

Detection rates:

False-positive rate: 5.1%

The supply-chain detection capability is the headline feature and it delivers. If your org is pulling fine-tuned models from HuggingFace or internal registries, Calypso’s weight-level scanning gives you coverage that no other tool in our benchmark provides.

More at Adversarial ML for background on weight-level attacks.

8. Prompt Armor

Prompt Armor is a newer entrant with a narrow focus on preventing system prompt exfiltration and confidentiality violations. Their detection engine is specifically trained on exfiltration-style attacks and performs accordingly.

Detection rates:

False-positive rate: 4.6%

If protecting your system prompt is the primary threat model, Prompt Armor performs well. If you need general coverage, it’s too narrow.

How to Stack Them

No single tool covers the full attack surface well. The practical answer is a layered stack:

The total cost of the paid components at reasonable production volumes is $400–900/month. All open-source components can be self-hosted.

Benchmark Caveats

These numbers reflect a single point-in-time evaluation. All detection models degrade against novel attack patterns, and vendors update detection logic regularly — Lakera in particular has released two model updates since we started this benchmark. Check AI Sec Bench for updated numbers.

Detection rates also depend heavily on your deployment context. A heavily filtered enterprise deployment will see different results than an open-ended consumer chatbot. Our numbers reflect a neutral base configuration.

What’s Coming

We’re adding three more tools to the next benchmark run, including two that weren’t production-ready in time for this publication. AI Alert tracks new scanner releases as they drop.

#scanners #llm-security #benchmark #tools
Subscribe

Best AI Security Tools — in your inbox

Comparing the AI security tooling landscape, with numbers. — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.

Related

Comments