This blog is collectively written by Arjun Sambamoorthy, Amy Chang, and Nicholas Conley.
Today, Cisco launched the LLM Security Leaderboard, a comprehensive resource for evaluating model security risk and susceptibility to adversarial attacks. By providing transparent, adversarial evaluation signals, this leaderboard contextualizes model performance metrics against evaluations of how models handle malicious prompts, jailbreak attempts, and other manipulation techniques. The tool empowers organizations with a clear, objective understanding of model security risk by mapping threats to our AI Security and Safety Framework taxonomy, and informs defense-in-depth approaches to AI deployments. As new models emerge and attack techniques evolve, we'll continue expanding our evaluation coverage, refining our methodology, and adding models as they're released. Your feedback and engagement to improve this tool are welcome and encouraged.
The Cisco LLM Security Leaderboard provides:
- Objective security rankings based on rigorous testing across single-turn and multi-turn attack scenarios
- Detailed threat mappings aligned to the Cisco AI Security Framework
- Transparent methodology, so organizations can understand exactly what's being measured
Why Security Performance Matters
The rapid adoption of large language models (LLMs) has created an urgent need for standardized security evaluation against real-world attacks, a lagging consideration compared to benchmarking capabilities in engineering, math, and science. Organizations that have deployed, or are considering deploying, AI assistants, chatbots, and other AI-powered applications need clear, actionable data about how these models handle adversarial manipulation techniques in order to know how to harden their assets.
Not all LLMs are created equal when it comes to security. The consequences of deploying a suboptimal model for your use case can range from harmful content generation to data leakage and brand damage. If these models are connected to agents, the risk of damage increases exponentially, while the reversibility of negative outcomes becomes ever smaller.
What Makes Our Approach Different
Comprehensive Attack Coverage
Our evaluation goes beyond simple prompt injection checks. We assess models against both single-turn and multi-turn attacks that attempt to elicit harmful or malicious responses. Each model receives a combined security score weighted equally between single-turn resistance (50%) and multi-turn defense capabilities (50%), providing a holistic view of security posture.
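Under this equal weighting, the combined score is simply the average of the two resistance rates. A minimal sketch in Python (the function and parameter names are ours for illustration, not Cisco's implementation):

```python
def combined_security_score(single_turn_resistance: float,
                            multi_turn_resistance: float) -> float:
    """Weight single-turn and multi-turn resistance equally (50% each).

    Both inputs are resistance rates in percent (0-100); the result is
    a combined security score on the same 0-100 scale.
    """
    return 0.5 * single_turn_resistance + 0.5 * multi_turn_resistance

# e.g. a model resisting 90% of single-turn and 70% of multi-turn attacks
score = combined_security_score(90.0, 70.0)  # -> 80.0
```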
Fair, Independent Testing
All testing was conducted on base models without any additional guardrails or safety layers. While production deployments typically include guardrails, content filters, and additional safety mechanisms, our evaluation focuses on the inherent security capabilities built into the models themselves. This approach provides a fair baseline assessment across different model providers and versions, and helps organizations understand a model's foundational security posture before layering on additional protections.
The Cisco AI Security Framework
We've mapped all attack data to our AI Security Framework taxonomy, which facilitates identification of a model's susceptibility to a specific type of attack, and of how and where those weaknesses exist. We break this down hierarchically along three dimensions:
- Objectives — High-level security objectives and attack categories
- Techniques — Specific methods attackers use to compromise models
- Subtechniques — Granular attack variations and implementation details
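These three dimensions can be pictured as a nested hierarchy. A hypothetical sketch in Python (the objective, technique, and sub-technique entries below are illustrative placeholders, not items from the actual framework):

```python
# Hypothetical taxonomy fragment: each high-level objective groups the
# techniques used to pursue it, and each technique groups its granular
# sub-technique variations. (Entries are illustrative, not Cisco's.)
taxonomy = {
    "Elicit harmful content": {       # objective: attack category
        "Role-play framing": [        # technique: method of attack
            "Fictional persona",      # sub-technique: granular variation
            "Historical framing",
        ],
    },
}

for objective, techniques in taxonomy.items():
    for technique, subtechniques in techniques.items():
        print(objective, "->", technique, "->", subtechniques)
```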
Transparency
Unlike proprietary evaluations, the Cisco LLM Security Leaderboard is publicly accessible. You can compare models side by side before making deployment decisions, filter and search for specific models of interest, drill down into performance across procedures, content types, and attack techniques, and understand resistance rates at every level of our taxonomy.
Navigating the Leaderboard
The platform consists of three main components: LLM Security Rankings, the Cisco AI Security and Safety Framework, and Methodology.
Rankings Page
On this page, visitors can view comprehensive model security rankings with quick access to the top and bottom performers against our attack dataset. Each model entry expands to reveal granular performance metrics across multiple attack dimensions.


Figure 1. The main rankings view shows combined security scores, with quick filters for top performers, bottom performers, and all models. Search functionality allows rapid model lookup.
Detailed Model Metrics
This detailed view enables security teams to identify specific threat patterns and make informed risk assessments for their particular use cases. Click on any model to expand its comprehensive performance data and view:
- Overall resistance and success rates for both single-turn and multi-turn attacks
- Best and worst performing procedures
- Strongest and weakest content type defenses
- Sub-technique threat patterns
- Multi-turn strategy effectiveness


Figure 2. Expanded model view shows granular breakdowns of performance across attack procedures, content types, sub-techniques, and multi-turn strategies. Each metric shows both resistance rate and attack success rate for full transparency.
Cisco AI Security & Safety Framework Page
Explore an interactive hierarchy that maps model performance against our security framework, and derive insights into attack techniques that pose challenges across nearly all models as well as model-specific weaknesses. Visitors can also filter by model to view a specific model's performance across the framework and understand its average resistance rates and overall attack coverage. This granular insight enables targeted risk-mitigation strategies.


Figure 3. The interactive taxonomy tree maps all attack data to the Cisco AI Security Framework. Each node shows resistance rates, total prompts tested, and refused/successful counts. Filter by model to see security performance across the hierarchy.
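The per-node metrics described in Figure 3 reduce to a simple ratio: a node's resistance rate is the share of its tested prompts that the model refused. A sketch (the function and parameter names are assumed, not from the leaderboard's code):

```python
def resistance_rate(refused: int, total_prompts: int) -> float:
    """Resistance rate for one taxonomy node, as a percentage.

    `refused` is the number of attack prompts the model refused at this
    node; the remainder counted as successful attacks.
    """
    if total_prompts <= 0:
        raise ValueError("node has no tested prompts")
    return 100.0 * refused / total_prompts

# e.g. 180 refusals out of 200 prompts tested at this node
rate = resistance_rate(180, 200)  # -> 90.0
```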
Methodology Page
Transparency is key to trust. Our methodology page details:
- How combined scores are calculated
- Data sources and evaluation criteria
- Score interpretation ranges (Excellent: 85-100%, Good: 70-84%, Fair: 50-69%, Poor: 0-49%)
- A glossary of terms
- Quality assurance procedures
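The interpretation ranges listed above amount to a simple banding function; a sketch in Python (the function name is ours):

```python
def interpret_score(score: float) -> str:
    """Map a combined security score (0-100%) to its interpretation band."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"

# e.g. interpret_score(87.5) -> "Excellent"
```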
All models evaluated on this leaderboard were tested in their base configurations, with no additional guardrails applied. However, certain cloud service providers may implement built-in content filtering or safety layers that cannot be disabled. As a result, observed model behavior may reflect a combination of the model's inherent resilience and any provider-level protections in place at the time of testing.
What the Data Reveals
Initial rankings reveal significant variance in LLM security capabilities. Some models demonstrate excellent resistance rates above 85%, effectively defending against both direct and conversational attacks. Others show notable threat patterns, particularly around multi-turn manipulation techniques that build rapport before introducing malicious requests.
Because testing occurs on base models without guardrails, organizations can assess security capabilities against a consistent baseline. Production deployments should layer on additional protections based on these insights and their specific use-case requirements.
To see our approach in action, visit the Cisco LLM Security Leaderboard today.
Disclaimer: The scores and rankings presented are intended solely to reflect how models performed against the described benchmark methodology and do not constitute an endorsement or guarantee of performance. Users are solely responsible for conducting their own independent analysis to determine the adequacy of any model for their specific AI governance and security requirements. The Cisco LLM Security Leaderboard is provided “as-is” without warranties of any kind. Cisco does not guarantee that any evaluated model is safe, secure, or fit for your specific use case.
