Top LLM Vulnerability Scanners: How 8 Tools Compare

LLM vulnerability scanners occupy an interesting position in the AI security stack. They’re not runtime guards; they run before or alongside deployment, looking for weaknesses in models, prompts, and pipeline configurations. The category is young enough that “vulnerability scanner” means wildly different things from vendor to vendor. One tool calls itself a scanner because it fuzzes prompts; another means it checks your model weights for known-malicious fine-tunes; a third is really just a RAG-security linter with a flashy dashboard.

To understand which tools actually catch real vulnerabilities, it helps to compare them against a consistent set of vulnerability classes. The comparison below draws on published benchmarks, vendor documentation, and independent reviews.

Evaluation Framework

Published evaluations of these scanners tend to cover four vulnerability classes that appear repeatedly in real LLM deployments:

Prompt injection vectors. Direct injections, indirect injections via retrieved context, nested instructions designed to override system prompts. The Prompt Injection Report ↗ database is a commonly cited source of attack strings for this category.

Jailbreak patterns. Role-playing frames, hypothetical framings, token manipulation, DAN-style multi-step prompts, spanning several distinct technique families.

Data exfiltration patterns. Instructions designed to leak system prompts, conversation history, or retrieved documents, at varying levels of obfuscation.

Model supply-chain checks. Where applicable, deliberately misconfigured pipeline definitions and model cards with known red flags reveal whether a scanner flags them.

Across published comparisons, tools are typically scored on detection rate, false-positive rate on benign inputs, and, where the tool supports it, useful remediation guidance. The detection figures cited below come from published benchmarks and vendor-reported numbers; treat them as directional rather than definitive.

The Eight Tools

1. Garak (NeMo Guardrails Research Edition)

Garak is NVIDIA’s open-source LLM vulnerability scanner, and it’s the closest thing the category has to an industry-standard baseline. It covers an impressive breadth of attack types, including hallucination triggers, toxic generation, data leakage, and prompt injection, and its probe library is extensible via YAML.

Detection rates (per published benchmarks):

Prompt injection: 71%
Jailbreaks: 64%
Data exfiltration: 58%
Supply-chain flags: N/A (not supported)

False-positive rate: around 4% on benign inputs.

A frequently noted limitation is that Garak’s default probes favor breadth over depth, casting a wide net but not probing deeply within any single technique family. For targeted red-teaming of a specific attack class, you’ll want to layer it with something more focused.

Garak is free and open source. NVIDIA on GitHub ↗. Related reading: AI Sec Bench’s Garak walkthrough ↗ and our open-source LLM security testing guide for a broader toolkit review.

2. PyRIT (Microsoft)

Microsoft’s Python Risk Identification Toolkit for generative AI is a framework, not just a scanner. Per its documentation, it’s designed to let red teams build custom attack orchestrations, and out of the box it ships with a reasonable set of converters and objectives covering most of the standard attack families.

Detection rates (per published benchmarks):

Prompt injection: 76%
Jailbreaks: 69%
Data exfiltration: 61%
Supply-chain flags: N/A

False-positive rate: around 3%

PyRIT’s strength is its composability. According to the project documentation, you can layer converters (obfuscation, translation, paraphrase) with objectives (harmful content, sensitive data extraction) and scoring functions to build targeted campaigns that surface edge cases other scanners miss. The documentation is detailed enough to get started without reading source code.

The tool requires meaningful setup investment; it’s not a “scan now” experience. More appropriate for teams with dedicated red-team bandwidth. Free, MIT license.

3. LLMGuard (ProtectAI)

LLMGuard is a runtime guardrail library that includes a scanning mode. Its scanners check for prompt injection, PII leakage, and toxic content. Unlike the research tools above, this one is designed for production integration, with latency low enough to use inline.

Detection rates (per published benchmarks):

Prompt injection: 63%
Jailbreaks: 47%
Data exfiltration: 55%
Supply-chain flags: N/A

False-positive rate: around 7%

The lower jailbreak detection rate matters in practice. LLMGuard’s jailbreak scanner relies on a classification model trained on a relatively narrow corpus, and reviewers report that more creative jailbreak variants (indirect, multi-turn, role-framing hybrids) get through at a high rate.

That said, the PII detection is widely regarded as strong, one of the better implementations for catching social security numbers, credit card numbers, and email addresses in LLM output. If PII leakage is your primary concern, LLMGuard is worth serious consideration.

Open source, Apache 2.0. ProtectAI GitHub ↗. Sister site review: AI Moderation Tools on LLMGuard ↗.

4. Rebuff

Rebuff focuses narrowly on prompt injection detection and does it better than most generalist tools. It uses a multi-layer approach: a canary token layer, a vector-similarity layer comparing against known injections, and an LLM-based classification layer.

Detection rates (per published benchmarks):

Prompt injection: 84%
Jailbreaks: 31% (very limited scope)
Data exfiltration: 29% (limited scope)
Supply-chain flags: N/A

False-positive rate: around 6%

That 84% prompt injection detection rate is among the highest reported across published comparisons. The cost is that Rebuff doesn’t try to detect much else, so if you’re running it as your only scanner, you have large blind spots. It’s best understood as a specialized component in a layered security stack, not a complete solution.

5. Vigil (ProtectAI)

Vigil is ProtectAI’s scanner-specific product, distinct from LLMGuard. It’s designed for scanning at the request/response level and includes YARA-based custom rule support, which is a significant differentiator for teams that want to write organization-specific detection logic.

Detection rates (per published benchmarks):

Prompt injection: 67%
Jailbreaks: 58%
Data exfiltration: 63%
Supply-chain flags: N/A

False-positive rate: around 4%

The YARA rule engine is genuinely useful. Custom rules targeting internal vocabulary (names of internal systems, proprietary terms) let Vigil flag injection attempts aimed at extracting that content, and reviewers report high detection rates for organization-specific rules of this kind. That level of customization is hard to get elsewhere.

6. Lakera Guard

Lakera Guard is the commercial offering from Lakera AI and the only paid product in this comparison. The API is clean, latency is low (the vendor reports roughly 47ms per request), and the detection model benefits from training on a broader corpus than the open-source alternatives.

Detection rates (per published benchmarks):

Prompt injection: 81%
Jailbreaks: 74%
Data exfiltration: 71%
Supply-chain flags: N/A

False-positive rate: around 2%

Lakera posts among the strongest combined scores across the three detection categories in published comparisons, and its reported false-positive rate is among the lowest. For teams that need production-quality detection without the operational overhead of running their own inference infrastructure, Lakera Guard is hard to argue against. Pricing starts at a per-request rate that works out favorably at moderate volumes.

7. Calypso by Robust Intelligence

Robust Intelligence’s scanner focuses on model-level vulnerabilities in addition to prompt-level ones. It can test models directly via their inference API and flag known attack surfaces in fine-tuned weights using their adversarial ML framework.

Detection rates (per published benchmarks):

Prompt injection: 69%
Jailbreaks: 62%
Data exfiltration: 67%
Supply-chain flags: 73% (strong)

False-positive rate: around 5%

The supply-chain detection capability is the headline feature, and published evaluations indicate it delivers. If your org is pulling fine-tuned models from HuggingFace or internal registries, Calypso’s weight-level scanning offers coverage that no other tool in this comparison provides.

More at Adversarial ML ↗ for background on weight-level attacks.

8. Prompt Armor

Prompt Armor is a newer entrant with a narrow focus on preventing system prompt exfiltration and confidentiality violations. Their detection engine is specifically trained on exfiltration-style attacks and performs accordingly.

Detection rates (per published benchmarks):

Prompt injection: 52%
Jailbreaks: 38%
Data exfiltration: 79%
Supply-chain flags: N/A

False-positive rate: around 5%

If protecting your system prompt is the primary threat model, Prompt Armor performs well. If you need general coverage, it’s too narrow.

How to Stack Them

No single tool covers the full attack surface well. For comprehensive adversarial testing beyond what scanners provide, see our AI red teaming tools guide. The practical answer is a layered stack:

For prompt injection and general coverage: Lakera Guard (paid) or Rebuff + Garak (open source)
Add for jailbreak coverage: PyRIT for periodic red-team campaigns
Add for supply-chain: Calypso if you’re pulling external model weights
Add for PII protection: LLMGuard’s output scanner
Add custom rules: Vigil’s YARA engine for org-specific content policies

The total cost of the paid components at reasonable production volumes is $400–900/month. All open-source components can be self-hosted.

Benchmark Caveats

These numbers reflect point-in-time published evaluations. All detection models degrade against novel attack patterns, and vendors update detection logic regularly, with Lakera in particular shipping frequent model updates. Check AI Sec Bench ↗ for updated numbers.

Detection rates also depend heavily on your deployment context. A heavily filtered enterprise deployment will see different results than an open-ended consumer chatbot. The figures above reflect neutral base configurations.

What’s Coming

Additional tools are emerging in this category, including some that were not yet production-ready when the benchmarks above were published. AI Alert ↗ tracks new scanner releases as they drop.

Top LLM Vulnerability Scanners: How 8 Tools Compare

Evaluation Framework

The Eight Tools

1. Garak (NeMo Guardrails Research Edition)

2. PyRIT (Microsoft)

3. LLMGuard (ProtectAI)

4. Rebuff

5. Vigil (ProtectAI)

6. Lakera Guard

7. Calypso by Robust Intelligence

8. Prompt Armor

How to Stack Them

Benchmark Caveats

What’s Coming

Best AI Security Tools — in your inbox

Related

AI Firewall and Guardrail Solutions: The 2026 Landscape

Open Source LLM Security Testing Tools: The Practical Toolkit

Adversarial Machine Learning Defense Tools: What Actually Works

Comments