THE ALIGNMENT-PRODUCTION CONTINUUM
The full diagnostic framework.
How alignment research findings become production engineering problems, mapped using a structure borrowed from medical diagnostics. 16 observable symptoms, 18 underlying mechanisms, 11 adversarial attack categories. Every governance decision we make traces back to this framework.
THE CONTINUUM
Research and production are converging.
Alignment research and production AI engineering used to be separate worlds. Researchers studied theoretical risks. Engineers shipped systems. The gap between them was wide enough that each could ignore the other.
That gap is closing. Failure modes that were lab curiosities twelve months ago are now production incidents. Sycophancy compounds over multi-turn deployments. Reward hacking in training generalizes to unrelated misaligned behaviors. Models fake alignment during evaluation. Agents take harmful actions under goal pressure. Each capability advance moves more theoretical risks into the production column.
A consultancy that ignores the research will ship systems that fail in ways the research community could have warned about six months earlier. A research lab that ignores deployment loses access to the empirical reality that makes its work matter. AE operates across the full continuum.
THE DIAGNOSTIC FRAMEWORK
Symptoms and mechanisms.
We map AI failure using a structure borrowed from medical diagnostics. An AI system fails in observable ways (symptoms) that are produced by underlying causal processes (mechanisms). Some mechanisms produce many symptoms, and some symptoms can be produced by many mechanisms. The diagnostic value lies in identifying the correct mechanism, because the mechanism determines the right response: a fix, a partial mitigation, containment, or monitoring. Many mechanisms do not have reliable fixes yet. Knowing that is itself critical information for deciding how much autonomy to grant. We also map the adversarial attack surface: how external actors can exploit these mechanisms deliberately.
16 symptoms
Observable behaviors. Each traces to one or more mechanisms.
18 mechanisms
Underlying causes. Tagged by treatment status.
Most mechanisms without a root-cause fix still have practical responses: containment, architectural isolation, monitoring, least-privilege constraints. Only two (deceptive alignment and mesa-optimization) are genuine open problems with no operational mitigation. The governance architecture applies the right response to each mechanism and supports informed decisions about how much autonomy to grant, based on what is treatable today.
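The symptom-to-mechanism structure described above can be sketched as a many-to-many lookup. This is a minimal illustration, not the framework's actual catalogue. The symptom and mechanism names come from this page, but the links between them and every treatment tag except the two named open problems are assumptions made for the example. `Treatment`, `Mechanism`, and `candidate_mechanisms` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    FIXABLE = "root-cause fix available"
    CONTAINABLE = "no root-cause fix; contain, isolate, monitor, or restrict"
    OPEN_PROBLEM = "no operational mitigation yet"


@dataclass(frozen=True)
class Mechanism:
    name: str
    treatment: Treatment
    symptoms: frozenset  # symptoms this mechanism can produce


# Illustrative entries only. The symptom links and the FIXABLE/CONTAINABLE
# tags are assumptions; only the two OPEN_PROBLEM tags come from the text.
MECHANISMS = [
    Mechanism("over-broad permissions", Treatment.FIXABLE,
              frozenset({"scope violation"})),
    Mechanism("architectural conflation", Treatment.CONTAINABLE,
              frozenset({"injection", "scope violation"})),
    Mechanism("deceptive alignment", Treatment.OPEN_PROBLEM,
              frozenset({"selective disclosure", "sycophancy"})),
    Mechanism("mesa-optimization", Treatment.OPEN_PROBLEM,
              frozenset({"scope violation"})),
]


def candidate_mechanisms(observed):
    """Return mechanisms that could explain at least one observed symptom,
    ranked by how many of the observed symptoms each one accounts for."""
    observed = set(observed)
    hits = [(len(m.symptoms & observed), m) for m in MECHANISMS]
    hits = [(n, m) for n, m in hits if n > 0]
    # Most symptoms explained first; ties broken alphabetically by name.
    hits.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [m for _, m in hits]
```

Calling `candidate_mechanisms({"injection", "scope violation"})` returns every mechanism in the toy catalogue that could explain either symptom, which mirrors the point that one symptom rarely identifies one mechanism. The `treatment` field on each candidate then indicates which kind of response is on the table.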
BY SYSTEM TYPE
Different systems, different risk surfaces.
The governance architecture scales with the system's complexity and autonomy. A simple assistant needs different controls than a multi-agent strategic system.
Fabrication, injection, sycophancy. Standard engineering mitigations. Well-understood.
Add scope violation. Least-privilege scoping is the highest-leverage intervention.
Add context degradation and selective disclosure. Monitor for correlated symptoms that suggest a shared mechanism.
Add inter-agent coordination risks. Competitive dynamics create alignment degradation even with explicit honesty instructions.
The full symptom surface is relevant. This is where the alignment-production continuum is tightest and governance matters most.
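The cumulative pattern above, where each system type inherits the risks of the simpler ones and adds its own, can be sketched as a running set union. This is only an illustration. The tier numbers are placeholders, the names `TIER_ADDITIONS` and `risk_surface` are hypothetical, and the final tier takes the full symptom list as an argument because it is not enumerated here.

```python
# Risks each tier adds on top of the tier below it, as listed above.
# Tier numbers are placeholders for the system types.
TIER_ADDITIONS = [
    {"fabrication", "injection", "sycophancy"},
    {"scope violation"},
    {"context degradation", "selective disclosure"},
    {"inter-agent coordination risk"},
]


def risk_surface(tier, full_surface=None):
    """Return the cumulative set of risks relevant at a 1-indexed tier."""
    top = len(TIER_ADDITIONS) + 1  # the final tier covers the full surface
    if not 1 <= tier <= top:
        raise ValueError(f"tier must be between 1 and {top}")
    if tier == top:
        if full_surface is None:
            raise ValueError("the top tier needs the full symptom list")
        return set(full_surface)
    surface = set()
    for added in TIER_ADDITIONS[:tier]:
        surface |= added
    return surface
```

For example, `risk_surface(2)` returns the three first-tier risks plus scope violation, so controls chosen for a simpler system carry over and only the newly added risks need fresh attention.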
ATTACK SURFACE
How adversaries exploit these mechanisms.
The same mechanisms that produce accidental failures can be exploited deliberately. We map 11 attack categories across three layers: model-level (exploiting how the model processes input), infrastructure-level (exploiting deployment configuration), and operations-level (exploiting how the system is managed).
Key finding: Two mechanisms (architectural conflation and over-broad permissions) are the proximate enablers of 9 of the 11 attack categories. Defending against exploitation of these two mechanisms therefore covers the majority of the adversarial surface. Multi-turn attacks are the most impactful single category: every frontier model tested is vulnerable, and current safety evaluation infrastructure is structurally calibrated for single-turn testing.
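The coverage arithmetic behind the key finding can be sketched as follows. The catalogue here is a placeholder: the 11 categories are unnamed, the split of the 9 between the two mechanisms is invented, and counting a category as covered when any one of its enablers is defended is a simplifying assumption. `defended_share` and `ENABLERS` are hypothetical names.

```python
def defended_share(enablers, defended):
    """Fraction of attack categories whose proximate enabling mechanisms
    include at least one of the defended mechanisms."""
    if not enablers:
        return 0.0
    defended = set(defended)
    covered = [cat for cat, mechs in enablers.items() if set(mechs) & defended]
    return len(covered) / len(enablers)


# Placeholder catalogue: 11 unnamed categories, 9 of them enabled by one of
# the two mechanisms named above. The 5/4 split between them is invented.
ENABLERS = {f"category_{i:02d}": {"architectural conflation"} for i in range(1, 6)}
ENABLERS.update({f"category_{i:02d}": {"over-broad permissions"} for i in range(6, 10)})
ENABLERS.update({f"category_{i:02d}": {"other mechanism"} for i in range(10, 12)})
```

With both mechanisms passed as `defended`, the function returns 9/11, roughly 0.82. Replacing the placeholder catalogue with a real mapping would show how much of the surface each individual mechanism accounts for.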
This framework informs every governance engagement we deliver.