From Snapshots to Living Intelligence: AI-Driven Threat Modeling in the World of Cyber-centric Frontier Models

By Bill Tamerlane and Sanket Naik
June 10, 2026

Machine-scale incident response to defend against machine-scale vulnerabilities: that is what the AI era now demands of every security team. Cyber-centric frontier reasoning models and modern codebases move faster than traditional threat modeling can track, and they outpace the response capabilities of most security organizations. Claude Mythos (by Anthropic) and Daybreak (by OpenAI) are competing frontier AI-powered cybersecurity initiatives that can autonomously discover and chain software vulnerabilities at a speed and scale no human reviewer can match. These models have been a game changer for the security industry in 2026, and they require a faster, more focused approach from product security teams.

This post describes how our team built an AI-driven threat modeling framework, grounded in live source code rather than documentation, that finds kill chain escalation scenarios, supports engineering teams with actionable summaries, and, in one of its most unexpected applications, cuts incident response analysis time to roughly twenty minutes.

We have already used this AI-driven model of our products in real-world incident response (IR) fire drills to accelerate our research dramatically and to pinpoint the products in scope for remediation or risk mitigation.

The following table describes the main benefits of AI-driven threat modeling, from Incident Response speed improvement to the improved accuracy and visibility provided when models are generated from source code and design documents. 

The Old Manual Way — and Why It Doesn’t Work in an AI Context

For developers, security threat modeling traditionally meant updating documentation to reflect the current system state, filling out a security intake form, scheduling a briefing with expert reviewers, working through a threat modeling exercise on whiteboards, and then going through several rounds of follow-up communication for clarification. The process was resource-intensive by design: you needed specialists who could hold a complex architecture in their heads and reason through adversarial scenarios.

Product security has traditionally suffered from asymmetric resourcing, with a single product security engineer supporting several development teams and more than 100 developers. The other unavoidable constraint was scale. Even with a dedicated security review team, the sheer volume of modern codebases meant that no one could read everything. Reviewers made judgment calls about what to prioritize, and entire subsystems could go unexamined for months. Threat models were snapshots: accurate at a point in time, but increasingly stale with each sprint that passed. Periodic reassessments were possible in principle, but in practice they competed with every other demand on a small pool of expert time.

The result was a process that worked reasonably well for high-profile systems reviewed at launch, and much less well for everything else.

A Different Starting Point: Code as the Source of Truth

The AI-driven threat modeling framework we’ve been building takes a fundamentally different approach. Instead of starting with design documentation, it starts with source code — current, live, version-controlled source code pulled directly from GitHub, Gerrit, and Sourcegraph. Documentation is still consumed and adds useful context, but it’s always subordinate to what the code actually does. When there’s a conflict between a design document and an implementation, the implementation wins.

This matters more than it might seem. A lot of security issues live in that gap — in the place where what was planned diverges from what was built. Docs describe intent; code describes reality.
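The precedence rule can be illustrated with a minimal sketch. It assumes, purely for illustration, that facts extracted from documentation and from code are reduced to simple key/value pairs; the function name and the sample facts are hypothetical and are not taken from our framework.

```python
def merge_facts(doc_facts, code_facts):
    """Merge architecture facts, letting code-derived values override docs.

    Both inputs are hypothetical {fact_name: value} dictionaries.
    Returns the merged view plus every doc/code disagreement, since those
    gaps are exactly where a reviewer should look first.
    """
    merged = dict(doc_facts)  # documentation provides the baseline context
    conflicts = []
    for key, code_value in code_facts.items():
        if key in doc_facts and doc_facts[key] != code_value:
            conflicts.append((key, doc_facts[key], code_value))
        merged[key] = code_value  # the implementation always wins
    return merged, conflicts


# Illustrative values only.
doc_facts = {"auth": "mTLS between services", "storage": "encrypted at rest"}
code_facts = {"auth": "shared bearer token", "queue": "unauthenticated"}
merged, conflicts = merge_facts(doc_facts, code_facts)
for name, documented, implemented in conflicts:
    print(f"{name}: docs say {documented!r}, code does {implemented!r}")
```

Keeping the list of conflicts, rather than silently overwriting, preserves the intent-versus-reality gap as a finding in its own right.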

Our agent can also read code comments and commit history to understand why certain architectural or implementation decisions were made. This turns historical context into an asset. You’re not just seeing what the code does today; you’re seeing the reasoning thread preceding it.

The skill guiding our agent is distributed as a self-contained package and installed globally to ~/.cursor/skills/, making it available across all projects on any machine. No per-project configuration needed.

The Pipeline in Detail

When the security team invokes the AI agent skill for an existing product, the agent runs a structured pipeline automatically. Understanding these steps is important for appreciating where quality controls are built in, not bolted on afterward.

The threat model itself uses the AWS Threat Composer JSON format. Because the model is plain-text JSON, it can be versioned, diffed, and reviewed in Git like any other source file.

The pipeline begins with a pre-flight check: the agent inspects existing Jira tickets from any prior threat modeling run, verifies which items are open versus resolved, and archives the previous threat model before proceeding. This ensures continuity — no finding gets silently dropped between threat model versions.
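A rough sketch of such a pre-flight step follows. The ticket fields, the set of "resolved" status names, and the archive naming scheme are all assumptions made for illustration; a real run would pull tickets from Jira rather than from an in-memory list.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Assumption: which workflow statuses count as resolved varies per Jira project.
RESOLVED_STATUSES = {"done", "resolved", "closed"}


def split_tickets(tickets):
    """Partition prior-run tickets (hypothetical dicts) into open and resolved."""
    open_items, resolved_items = [], []
    for ticket in tickets:
        if ticket["status"].lower() in RESOLVED_STATUSES:
            resolved_items.append(ticket)
        else:
            open_items.append(ticket)
    return open_items, resolved_items


def archive_model(model_path, archive_dir, now=None):
    """Copy the previous threat model into archive_dir with a UTC timestamp.

    Returns the archive path, or None when there is no earlier model.
    """
    model_path = Path(model_path)
    if not model_path.exists():
        return None
    now = now or datetime.now(timezone.utc)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = archive_dir / f"{model_path.stem}-{stamp}{model_path.suffix}"
    shutil.copy2(model_path, target)
    return target
```

Open tickets from the earlier run can then be carried into the new model, which is one way to make sure no finding is dropped between versions.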

The generation phase is where the code-level analysis happens: the agent typically reads source code through Sourcegraph and GitHub, traces data flows, identifies trust boundaries, and produces threat entries in AWS Threat Composer JSON format. All threat statements follow a structured Threat Composer template, enforced by the Skill, so that each one is expressed in consistent, reviewable language. Data Flow Diagrams (DFDs) are created in Mermaid format.
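To make the two outputs concrete, here is a minimal sketch that renders a templated threat statement and a Mermaid flowchart. The field names (threatSource, prerequisites, threatAction, threatImpact, impactedAssets) and the sentence pattern are assumptions loosely modeled on Threat Composer's threat grammar and should be checked against the actual schema; the sample threat and data flows are invented.

```python
import uuid


def render_statement(threat):
    """Build one sentence from structured fields so wording stays consistent."""
    parts = [f"A {threat['threatSource']}"]
    if threat.get("prerequisites"):
        parts.append(threat["prerequisites"])
    parts.append(f"can {threat['threatAction']}")
    statement = " ".join(parts)
    if threat.get("threatImpact"):
        statement += f", which leads to {threat['threatImpact']}"
    if threat.get("impactedAssets"):
        statement += f", negatively impacting {' and '.join(threat['impactedAssets'])}"
    return statement + "."


def new_threat(**fields):
    """Create a threat entry with a fresh UUID and a rendered statement."""
    entry = {"id": str(uuid.uuid4()), **fields}
    entry["statement"] = render_statement(entry)
    return entry


def dfd_to_mermaid(flows):
    """Render (source, destination, label) tuples as a Mermaid flowchart."""
    lines = ["flowchart LR"]
    for source, destination, label in flows:
        lines.append(f"    {source} -->|{label}| {destination}")
    return "\n".join(lines)


# Hypothetical example data.
threat = new_threat(
    threatSource="malicious tenant user",
    prerequisites="with a valid API token",
    threatAction="read records belonging to other tenants",
    threatImpact="disclosure of customer data",
    impactedAssets=["the tenant database"],
)
print(threat["statement"])
print(dfd_to_mermaid([("client", "api", "HTTPS"), ("api", "db", "SQL")]))
```

Generating the sentence from fields, instead of letting each run phrase it freely, is what keeps statements comparable across products and reviews.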

After generation, the validation script runs automatically: it checks JSON schema conformance, verifies that all threat IDs are valid UUIDs, and reconciles threats against mitigations to flag obviously incomplete models. Git hooks enforce this at commit time, so a model that fails validation cannot be committed to the threat modeling repository.
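The checks described above might look roughly like the following. This is a simplified stand-in, not our validation script: a required-key check replaces full JSON Schema validation, and the key names (threats, mitigations, mitigationLinks, mitigationId, linkedId) are assumptions about the Threat Composer layout that should be verified against the real schema.

```python
import uuid

# Assumption: top-level keys expected in a model file.
REQUIRED_TOP_LEVEL = ("threats", "mitigations", "mitigationLinks")


def is_uuid(value):
    """Return True when value parses as a UUID."""
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False


def validate_model(model):
    """Return a list of human-readable problems; an empty list means pass."""
    errors = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in model]
    threats = model.get("threats", [])
    mitigations = model.get("mitigations", [])

    for kind, items in (("threat", threats), ("mitigation", mitigations)):
        for item in items:
            if not is_uuid(item.get("id")):
                errors.append(f"{kind} has invalid UUID: {item.get('id')!r}")

    # Reconcile: every link must point at real entries, and every threat
    # should be covered by at least one valid link.
    threat_ids = {t.get("id") for t in threats}
    mitigation_ids = {m.get("id") for m in mitigations}
    covered = set()
    for link in model.get("mitigationLinks", []):
        threat_ok = link.get("linkedId") in threat_ids
        mitigation_ok = link.get("mitigationId") in mitigation_ids
        if threat_ok and mitigation_ok:
            covered.add(link["linkedId"])
        else:
            errors.append(f"dangling mitigation link: {link!r}")
    for threat in threats:
        if threat.get("id") not in covered:
            errors.append(f"threat has no linked mitigation: {threat.get('id')!r}")
    return errors
```

A pre-commit hook could call validate_model on each staged model file and exit with a non-zero status whenever the returned list is non-empty, which is what blocks the commit.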

The pipeline then generates an executive summary and updates the central catalog, which consists of human-readable Markdown indexes and a machine-readable JSON index covering all products. This catalog is what makes the 20-minute IR analysis possible: a current, validated threat model for every product is ready well in advance of an incident. Instead of spending hours on research and cross-team communication, responders start from a well-informed position.
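As an illustration of the catalog step, the sketch below derives both indexes from the same list of models. The entry fields (product, model_path, threat_count) and the table layout are hypothetical; the actual catalog may carry more metadata.

```python
import json


def build_catalog(models):
    """Return (json_index, markdown_index) for a list of per-product models.

    Each model is a hypothetical dict with "product", "model_path", and
    "threats" keys.
    """
    entries = sorted(
        (
            {
                "product": model["product"],
                "model_path": model["model_path"],
                "threat_count": len(model["threats"]),
            }
            for model in models
        ),
        key=lambda entry: entry["product"],
    )
    lines = ["| Product | Threats | Model |", "| --- | --- | --- |"]
    for entry in entries:
        lines.append(
            f"| {entry['product']} | {entry['threat_count']} | {entry['model_path']} |"
        )
    return {"products": entries}, "\n".join(lines)


# Invented sample data.
models = [
    {"product": "storage-service", "model_path": "models/storage-service.tc.json",
     "threats": [{}, {}]},
    {"product": "auth-gateway", "model_path": "models/auth-gateway.tc.json",
     "threats": [{}]},
]
index, markdown = build_catalog(models)
print(json.dumps(index, indent=2))
print(markdown)
```

Producing both views from one pass keeps the human-readable index and the machine-readable one from drifting apart.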

Finding What Matters — Including What You Didn’t Know to Look For

The framework’s primary function is security threat modeling, and it takes that seriously. It identifies vulnerabilities, traces their potential exploitation paths, and can construct kill chain escalation scenarios that show how a lower-severity issue can become a high-severity one when combined with other weaknesses. The kill chain analysis is driven by a custom Skill, which can be used either during the threat modeling phase or as an important Incident Response step.

This is where the AI-driven approach genuinely outperforms the traditional model. A human reviewer reading a CVE advisory might note that a particular vulnerability has a CVSS score of 5.4 and move on. The framework instead asks: given the actual code in production, given the privilege levels and trust boundaries in this system, given what else is present — can an adversary use this as a stepping stone? Sometimes the answer reveals an attack path that no one had mapped. 
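One way to picture the stepping-stone question is as a path search over attacker positions. The sketch below is an illustration under stated assumptions, not the Skill's actual logic: each weakness is modeled as an edge that moves an attacker from one position to another, and a breadth-first search looks for the shortest chain to a high-value position. All weakness names and scores are invented.

```python
from collections import deque


def find_kill_chain(weaknesses, start, goal):
    """Return the shortest list of weaknesses leading from start to goal.

    Returns [] when start is already the goal and None when no chain exists.
    """
    graph = {}
    for weakness in weaknesses:
        graph.setdefault(weakness["from"], []).append(weakness)

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        position, chain = queue.popleft()
        if position == goal:
            return chain
        for weakness in graph.get(position, []):
            if weakness["to"] not in seen:
                seen.add(weakness["to"])
                queue.append((weakness["to"], chain + [weakness]))
    return None


# Hypothetical findings: none is rated high on its own.
weaknesses = [
    {"id": "ssrf-in-import-api", "cvss": 5.4,
     "from": "internet", "to": "internal network"},
    {"id": "metadata-credentials-readable", "cvss": 4.3,
     "from": "internal network", "to": "service account"},
    {"id": "over-broad-service-role", "cvss": 6.1,
     "from": "service account", "to": "cluster admin"},
    {"id": "verbose-error-pages", "cvss": 3.1,
     "from": "internet", "to": "internet"},
]
chain = find_kill_chain(weaknesses, "internet", "cluster admin")
if chain:
    print(" -> ".join(w["id"] for w in chain))
    print("highest individual score:", max(w["cvss"] for w in chain))
```

In this invented example every individual score is medium, yet the chain ends at cluster admin, which is the kind of escalation a per-CVE review tends to miss.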

Proactive kill chain analysis also anticipates the findings of a cyber-centric model like Mythos, which means that: (1) foundational weaknesses impacting multiple vulnerabilities and kill chains can be fixed proactively, and (2) when a zero day report comes in, it can be analyzed very quickly against existing, up-to-date models to determine impact and the most vulnerable points of attack. Both the IR team and engineering teams are spared the usual needle-in-a-haystack panic that would otherwise ensue.

Beyond the Finding: Closing the Loop

In threat modeling, identifying a security issue is only the first step. Our AI-driven framework is designed to complete the workflow from discovery through resolution tracking.

When new findings are generated, the agent produces canvas summaries — structured, readable snapshots of the analysis that can be shared with engineering teams, who don’t need to dig into the full threat model. These summaries present the relevant findings, their severity context, proposed mitigations, and longer-term remediation paths, all in a format that a non-security engineer can act on without a briefing session.

For gaps that can’t be closed immediately, the framework opens Jira tickets automatically, ensuring that findings don’t evaporate into a security team’s backlog but land directly in the queues of the teams who can address them. And because the framework has access to Sourcegraph and GitHub, it can also check whether a reported issue has already been fixed — preventing duplicate work and giving teams credit for remediation that predates the finding.
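The decision of whether to open a ticket can be sketched as a small triage function. The finding and ticket shapes, the action labels, and the is_fixed callback are hypothetical; in practice the fix check would be a code search against Sourcegraph or GitHub rather than a lookup in a set.

```python
def plan_ticket_actions(findings, tickets, is_fixed):
    """Decide, per finding, whether a new ticket is needed.

    findings and tickets are hypothetical dicts keyed by "threat_id";
    is_fixed is a callable standing in for a source code check.
    """
    tracked = {ticket["threat_id"] for ticket in tickets}
    plan = {}
    for finding in findings:
        threat_id = finding["threat_id"]
        if is_fixed(finding):
            plan[threat_id] = "already_fixed"    # credit the team, no ticket
        elif threat_id in tracked:
            plan[threat_id] = "already_tracked"  # avoid a duplicate ticket
        else:
            plan[threat_id] = "create_ticket"
    return plan


# Invented sample data.
findings = [{"threat_id": "t-1"}, {"threat_id": "t-2"}, {"threat_id": "t-3"}]
tickets = [{"key": "SEC-10", "threat_id": "t-2"}]
fixed_in_code = {"t-1"}
plan = plan_ticket_actions(findings, tickets, lambda f: f["threat_id"] in fixed_in_code)
print(plan)
```

Checking for an existing fix before checking for an existing ticket means a remediated issue is reported as fixed even if a stale ticket is still open.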

The Unexpected Benefit: Incident Response

Here’s where the story gets interesting. The framework was built for proactive threat modeling, but it has turned out to be remarkably useful in reactive contexts — specifically, incident response.

Kernel vulnerabilities like those that have surfaced in recent cycles (the kinds that affect memory integrity and page fault handling in ways that are subtle and hard to exploit but not impossible) create a particular kind of IR challenge. The vulnerability is public. The question is: are we affected, and if so, how badly? Answering that question thoroughly used to mean pulling in multiple engineering teams, reviewing deployment configurations, and reasoning through exploitation requirements by hand.

With the threat model framework in place, that analysis now takes roughly twenty minutes. Because the models are already built on current source code, the agent can immediately answer which services are in scope, whether the conditions for exploitation exist in the actual codebase, whether a viable kill chain runs from the vulnerability to a meaningful impact, and what the realistic exposure is. The only engineering teams that need to be involved are the ones with actual exposure — everyone else stays out of the incident bridge.
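The scoping question can be sketched as a lookup against a pre-built catalog. The catalog shape, the component names, and the owner field are assumptions for illustration only; the real analysis also weighs exploitation conditions and kill chain viability, which this sketch leaves out.

```python
def products_in_scope(catalog, advisory):
    """Split cataloged products by whether they use an affected component.

    catalog and advisory are hypothetical dicts; matching is by exact
    component name, which is a deliberate simplification.
    """
    affected = set(advisory["affected_components"])
    in_scope, out_of_scope = [], []
    for product in catalog["products"]:
        matched = sorted(affected & set(product["components"]))
        if matched:
            in_scope.append(
                {"product": product["product"], "owner": product["owner"],
                 "matched": matched}
            )
        else:
            out_of_scope.append(product["product"])
    return in_scope, out_of_scope


# Invented catalog and advisory.
catalog = {
    "products": [
        {"product": "hypervisor", "owner": "team-virt",
         "components": ["linux-kernel", "qemu"]},
        {"product": "admin-console", "owner": "team-ui",
         "components": ["nginx", "react"]},
        {"product": "backup-agent", "owner": "team-data",
         "components": ["linux-kernel", "openssl"]},
    ]
}
advisory = {"id": "EXAMPLE-0001", "affected_components": ["linux-kernel"]}
in_scope, out_of_scope = products_in_scope(catalog, advisory)
print("page:", [p["owner"] for p in in_scope])
print("not needed on the bridge:", out_of_scope)
```

The out-of-scope list matters as much as the in-scope one, because it is what lets teams without exposure stay off the incident bridge.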

Sharing Analysis Where It Needs to Go

One of the practical improvements the framework has enabled is in how IR findings reach the people who need them. Not everyone on an incident response group chat has a background in threat modeling or wants to navigate a full threat model document under time pressure.

Canvas summaries solve this. When an IR event is underway, the security team lead drops a canvas summary directly into the group chat — a concise, structured view of the relevant analysis, the services in scope, the proposed mitigations, and the longer-term fixes. People who have never read a threat model can immediately understand what’s relevant to them and what action to take.

This also means the analysis isn’t locked in the head of the person who ran it. It’s documented, shareable, and reviewable — which matters when IR events have post-mortems and when regulators or auditors want to understand how a decision was made.

What This Looks Like in Practice

The Security Review team’s approach is to share the Cursor environment setup and prompts with IR team members directly, so that anyone who needs to run this kind of analysis can do so without depending on a single point of expertise. The skill installs from a single zip file and is immediately available across all projects. The framework is not a black box that requires a specialist to operate; it’s a structured capability that scales across the team.

The twenty-minute IR analysis figure is worth sitting with. That includes the time to pull current source code context through Sourcegraph, assess kill chain viability, determine which services are in scope, evaluate available mitigations, and produce a shareable summary. It is not an optimistic estimate made in ideal conditions — it reflects actual IR events.

That efficiency comes from having continuously maintained, code-grounded threat models across all products. The investment is in the ongoing maintenance of the framework; the payoff is that any individual analysis — whether proactive or reactive — draws on a rich, current knowledge base rather than starting from zero.

Machine Scale Incident Response to Defend Against Machine Scale Vulnerabilities

In 2026, vulnerability detection is operating at machine speed and scale; incident response also needs to take advantage of AI-driven analysis to pinpoint scope and focus the response effort.

The framework we’ve described here isn’t hypothetical. It’s running in production, producing findings, supporting IR events, and generating the kind of continuous security visibility that the old security review model couldn’t offer at scale.

For the Purple Book community, the broader lesson is about the relationship between depth and scale in security practice. Traditional threat modeling was deep but narrow — it required expert time and was therefore rationed. The AI-driven approach doesn’t eliminate the need for expert judgment, but it eliminates the bottleneck. The experts aren’t replaced; they’re amplified. The things that required a specialist’s full attention for days now take twenty minutes and produce richer, schema-validated, cataloged output.

The source code has always been where the truth lives. We’ve finally built tooling that can read it at the speed the modern threat landscape demands.

Framework Capabilities at a Glance

Threat Discovery

  • Continuous analysis of live source code via GitHub, Gerrit, and Sourcegraph (and can be extended to other sources)
  • Kill chain escalation mapping across CVEs and coding defects
  • Historical context from code comments and commit history

Remediation Workflow

  • Canvas summaries shared directly with engineering teams
  • Automatic Jira ticket creation for tracked gaps
  • Source code fix verification via Sourcegraph and GitHub

Incident Response

  • Rapid scope determination for kernel and system vulnerabilities
  • Kill chain viability assessment against actual production code
  • Canvas summaries in IR group chats for non-specialist stakeholders

Scale & Accessibility

  • Cursor IDE environment and prompts shareable across IR team
  • No specialist bottleneck — any trained analyst can run analysis
  • Covers all products simultaneously with up-to-date models

Lessons Learned

  • Security design reviews and threat modeling are becoming essential to prevent architectural issues from creeping into AI-generated code and accumulating as technical debt
  • With an engineering mindset, security teams can scale threat modeling for all products and create net new capabilities like faster incident response
  • While human-in-the-loop review does not scale at machine speed, it’s important to continuously test the quality of the threat models with automated testing and some human review so that engineering teams don’t lose trust in the system

Where Do We Go From Here?

  • Threat and mitigation packs for common application patterns that can be shared with the community
  • Extending threat coverage to privacy with LINDDUN and to agentic AI with MAESTRO
  • Mapping to attack catalogs like MITRE ATT&CK and MITRE ATLAS
  • Mapping to compliance frameworks to automate risk register tracking and reporting

This post follows Chatham House norms. Findings, methods, and tooling approaches are shared openly; organizational details and specific incident attribution are not. Every organization is different and has its own set of requirements. Readers should evaluate any approach in the context of their own architecture, risk tolerance, regulatory environment, and organizational maturity. No performance guarantees, timelines, or outcomes should be assumed or inferred from this post.

About the Purple Book Community

The Purple Book Community (PBC) is a community of 1,200+ security leaders, including CISOs, VPs of Security, Directors of AppSec and Product Security, and Security Architects. Members come together to advance practitioner-led thinking across AI Security, Application Security, Software Supply Chain Security, and Product Security, sharing real-world challenges, case studies, and best practices in a space built on trust and peer exchange.

PBC offers a range of ways to engage: virtual discussions, in-person events, expert panels, working groups, and Centers of Excellence, all designed to help security leaders learn from each other, challenge assumptions, and stay ahead of the problems that matter most to the field.

Learn more at thepurplebook.club

Senior Director of Product Security, Nutanix
Founder and CEO, Palosade