How Nethermind Security Uses AuditAgent Alongside Manual Audits
Last Updated: 20 September 2025
Introduction
Smart contract vulnerabilities had caused more than $11.8 billion in losses as of July 2025. With the stakes so high, Nethermind Security explored how AI could complement traditional audits. This case study shows how their security research team tested AuditAgent, an internally developed AI smart contract audit tool, and what they learned from applying it to real audits.
Much of this analysis was first presented by Dr. Luciana Silva, a security researcher at Nethermind Security, during her talk at Google’s Web3: Zero Knowledge (ZK) & AI Summit.
Approach
Nethermind Security integrated AuditAgent into their workflow by running it after completing manual reviews. The objective was to check whether any potential issues had been overlooked. Each audit followed this process:
Manual review by auditors
Run AuditAgent
Inspect the tool’s findings
Findings Across 29 Audits
Nethermind applied AuditAgent across 29 audits; the audited projects averaged 11.6 contracts and 725 lines of code.
In 62% of projects, AuditAgent detected valid issues.
30% of all auditor findings were also identified by the tool.
AuditAgent flagged issues across all severities, including Critical and High vulnerabilities.
Detection rates were strongest in the Critical (42%) and High (43%) severity categories.
Real-World Relevance: The ResupplyFi Hack
On June 27, 2025, ResupplyFi lost $9.8 million in a hack. The vulnerability stemmed from a miscalculation in the protocol's exchange rate logic. When AuditAgent was later run against the contract (July 16, 2025), it flagged this exact issue, suggesting the exploit could have been prevented had the tool been applied earlier.
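The exact ResupplyFi code is not reproduced here, but the bug class is well documented: integer division in an exchange rate calculation can truncate to zero when a vault's share price is artificially inflated, which in turn lets a solvency check pass trivially. Below is a minimal Python model of that pattern, with `//` standing in for Solidity's truncating division; all names and numbers are illustrative rather than the protocol's actual code.

```python
# Simplified model of a truncating exchange-rate bug (illustrative only).
# Python's // mimics Solidity's integer division, which rounds toward zero.

PRECISION = 10**36

def exchange_rate(share_price: int) -> int:
    # Mirrors an expression such as `rate = 1e36 / price` in Solidity.
    # Once share_price exceeds PRECISION, the result truncates to 0.
    return PRECISION // share_price

def is_solvent(collateral: int, debt: int, rate: int, max_ltv_bps: int = 9_500) -> bool:
    # Loan-to-value in basis points. With rate == 0, the computed debt
    # value is 0, so any position looks fully collateralized.
    debt_value = debt * rate // PRECISION
    ltv_bps = debt_value * 10_000 // collateral if collateral else 0
    return ltv_bps <= max_ltv_bps

# An attacker inflates the share price (for example, by donating assets to a
# near-empty vault) so the rate truncates to 0 and a large debt passes the check.
rate = exchange_rate(2 * PRECISION)                      # -> 0
print(is_solvent(collateral=1, debt=10**24, rate=rate))  # -> True
```

Arithmetic edge cases like this are easy to overlook in a manual review but mechanical enough for a tool to flag.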
Challenges and Areas for Improvement
While AuditAgent proved valuable, Nethermind identified key areas that need further work:
Business Logic Flaws: The tool still struggles with context-specific vulnerabilities whose detection depends on understanding the system's business logic.
Recall Limitations: On average, AuditAgent achieved 30% recall, detecting roughly three in ten of the issues auditors found, though some projects reached 50% (a sketch of how recall is computed follows this list).
False Positives/Negatives: Like most AI tools, AuditAgent still produces noise that requires expert filtering.
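To make the recall figure concrete, here is a minimal sketch of how recall can be computed when triaging the tool's output against the manual review; the finding identifiers are hypothetical placeholders.

```python
# Recall = share of manually confirmed issues that the tool also flagged.

def recall(auditor_findings: set[str], tool_findings: set[str]) -> float:
    if not auditor_findings:
        return 0.0
    return len(auditor_findings & tool_findings) / len(auditor_findings)

manual = {"reentrancy-in-withdraw", "exchange-rate-rounding", "missing-access-control"}
tool   = {"exchange-rate-rounding", "unused-state-variable"}  # one true hit, one extra

print(f"recall = {recall(manual, tool):.0%}")  # recall = 33%
```

Tool findings that auditors do not confirm, like the extra item above, are the false positives that still require expert filtering.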
These findings highlight that while AuditAgent strengthens audits, it is most effective as a pair auditor rather than a standalone solution.
Key Takeaways
Pair auditing works: AuditAgent is most effective as a second layer of review alongside human expertise.
High-impact detection: The tool was able to catch High and Critical vulnerabilities across multiple projects.
Consistent contribution: AuditAgent accounted for 30% of all findings, strengthening overall audit coverage.