Module 9 · Expert Track · 26 min read · AI Safety and Alignment

Catastrophic Risk Scenarios

The field of AI safety exists in large part because of a specific claim: AI development, if pursued without adequate precautions, could lead to outcomes that are not merely bad but catastrophic — outcomes that are severe, large-scale, and potentially irreversible. Understanding what those scenarios actually are, how they work mechanistically, and how likely they are requires moving beyond vague gestures toward specific, concrete analysis. This module does that analysis.

A taxonomy of catastrophic AI risks

Catastrophic AI risks fall into three structurally distinct categories that are worth keeping separate, because they have different causes, different probability structures, and different mitigations.

Misuse risks

AI systems are used by humans — including malicious actors — to cause harm at scales or with effectiveness that would not have been possible without AI assistance. The AI system itself may be functioning exactly as designed; the risk comes from who is using it and for what. Bioweapons uplift, cyberweapons generation, and influence operations at scale are the primary current misuse concerns.

Misalignment risks

AI systems pursue goals that diverge from what their operators intend or what is good for humanity, potentially causing harm as a byproduct of or in service of those misaligned goals. The AI is not being misused by humans; it is itself acting in ways that are contrary to human interests. This category includes scenarios sometimes described as "AI takeover" but covers a much broader range of failure modes.

Systemic and structural risks

AI deployment changes the structure of power, institutions, or social systems in ways that concentrate control, undermine accountability mechanisms, or create systemic fragilities. No single AI system is the cause; the risk emerges from the cumulative effect of AI deployment across society. AI-enabled authoritarianism and economic concentration are the primary examples.

These categories can interact. A misaligned AI system might be misused by a small group to concentrate power (combining misalignment and structural risk). AI-enabled bioweapon development might create a catastrophic event that in turn enables structural changes to governance. Understanding the categories helps; the scenarios that actually matter often involve multiple risk types simultaneously.
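One way to keep the categories separate while still allowing a scenario to belong to several of them is to treat the categories as combinable flags. The sketch below is illustrative only: the scenario names and their tags are hypothetical labels drawn from the examples in this section, not an established classification scheme.

```python
from enum import Flag, auto


class RiskType(Flag):
    """The three categories from the taxonomy in this section."""
    MISUSE = auto()
    MISALIGNMENT = auto()
    STRUCTURAL = auto()


# Hypothetical tags, chosen only to illustrate that scenarios can overlap.
SCENARIOS = {
    "bioweapons uplift": RiskType.MISUSE,
    "AI-enabled authoritarianism": RiskType.STRUCTURAL,
    "misaligned system used to concentrate power":
        RiskType.MISALIGNMENT | RiskType.STRUCTURAL,
}


def involves(scenario: str, risk: RiskType) -> bool:
    """Return True if the scenario's tag includes the given risk type."""
    return bool(SCENARIOS[scenario] & risk)


def is_compound(scenario: str) -> bool:
    """Return True if the scenario combines more than one risk type."""
    # Each category occupies one bit, so count the bits that are set.
    return bin(SCENARIOS[scenario].value).count("1") > 1
```

Using flags rather than a single label per scenario reflects the point made above: the taxonomy is a tool for analysis, and a mitigation plan for a compound scenario has to address every category it touches.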

Bioweapons uplift and the dual-use research problem

Of the near-term misuse risks, AI-assisted bioweapon development is consistently rated by frontier AI researchers as the most concerning. The reasoning is specific: the main barrier to developing a pathogen capable of causing mass casualties has historically been technical knowledge and skill, not materials. Unlike nuclear weapons, which require rare physical materials (highly enriched uranium, plutonium) that are extremely difficult to obtain, biological agents can in principle be synthesized by anyone with sufficient knowledge of pathogen biology and genetic engineering and with access to standard laboratory equipment.

AI systems capable of providing detailed technical guidance in molecular biology, virology, and synthetic biology could lower the knowledge barrier dramatically. The concern is not that AI would make bioweapon development trivially easy (significant practical obstacles remain) but that it could expand the set of actors technically capable of serious attempts. A state actor with modest resources, a non-state group with access to a well-equipped biology laboratory, or even a lone actor with a chemistry background might gain, through AI uplift, meaningful capability that currently lies beyond their reach.

What "uplift" means technically

Uplift is a specific technical term in this context: it refers to the degree to which an AI system increases a would-be actor's probability of successfully developing a dangerous capability relative to their baseline probability without AI assistance. An AI system provides significant uplift if it can explain synthesis routes, troubleshoot failed attempts, suggest modifications to increase transmissibility or lethality, or help acquire necessary precursors while avoiding detection — in short, if it functions as an expert collaborator on the development project.

Current frontier models are evaluated for biosecurity risk using structured elicitation tasks developed by RAND, the Center for Health Security at Johns Hopkins, and other organizations. The evaluations test whether models provide information beyond what is available from open-source literature, help non-experts understand and operationalize technical information, or provide assistance that is specifically useful for weapons development rather than legitimate research.
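The definition of uplift given in this subsection can be written down as a simple comparison of two success probabilities. The sketch below is a minimal illustration, not the method used by any particular evaluator: the source text says only that uplift is measured "relative to" a baseline, so reporting it both as a difference and as a ratio is an assumption, as are the function names.

```python
def success_rate(successes: int, trials: int) -> float:
    """Estimate a success probability from counts in a study arm."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    return successes / trials


def absolute_uplift(p_baseline: float, p_with_ai: float) -> float:
    """Increase in success probability attributable to AI assistance."""
    return p_with_ai - p_baseline


def relative_uplift(p_baseline: float, p_with_ai: float) -> float:
    """Ratio of assisted to baseline success probability."""
    if p_baseline == 0:
        # A ratio is undefined when the baseline group never succeeds.
        raise ValueError("relative uplift is undefined for a zero baseline")
    return p_with_ai / p_baseline
```

With purely hypothetical counts of 2 successes in 50 trials without AI assistance and 5 in 50 with it, the absolute uplift is about 0.06 and the relative uplift is about 2.5. The two figures answer different questions: a large ratio on a tiny baseline can still mean a small absolute change, which is why it helps to report both.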

The dual-use research problem

Biology presents a fundamental dual-use problem that makes AI governance in this domain exceptionally difficult. The same knowledge that allows researchers to understand and defend against dangerous pathogens also allows bad actors to create them. Gain-of-function research — experiments that increase the transmissibility or virulence of pathogens — is conducted by legitimate scientists to understand pandemic risk, yet the same research could provide a blueprint for weaponization. AI systems that can assist legitimate biosecurity research can also assist hostile actors pursuing the same technical goals.

This creates a genuine dilemma for AI labs. Refusing to engage with any biology question is too restrictive — it would prevent legitimate researchers from using AI for valuable work. Engaging with all biology questions is too permissive — it provides uplift to hostile actors. The current approach, implemented across major frontier labs, is to train models to refuse specific categories of requests (detailed synthesis routes for dangerous pathogens, enhancement of transmissibility, acquisition strategies that circumvent safety regulations) while engaging with general biology education and research assistance. The challenge is that the boundary between permissible and impermissible is blurry, context-dependent, and easily obscured by adversarial framings.

The Specificity Problem

AI biosecurity evaluations consistently find that the most dangerous assistance is not general information (which is widely available) but specific, operationally relevant guidance — troubleshooting specific synthesis failures, suggesting modifications targeted to particular desired properties, or helping integrate information from disparate sources into a coherent development plan. A model that refuses to discuss pathogens in the abstract but engages with highly specific technical questions framed as "research" may provide significant uplift while passing surface-level safety evaluations. The specificity problem is why biosecurity evaluations must go beyond checking whether a model will discuss dangerous topics and must assess whether it provides meaningful operational assistance.

Cyberweapons and critical infrastructure

AI-assisted cyberweapon development is a second major misuse concern, and in some ways a more tractable one because the attack/defense dynamics are more symmetrical and the harm ceiling, while potentially very high, is generally lower than for biological threats.

The concern is that AI systems capable of sophisticated code generation and vulnerability analysis could significantly lower the barrier to developing novel cyberweapons — specifically, to discovering zero-day vulnerabilities in critical infrastructure systems and developing exploit code that takes advantage of them. Currently, finding novel vulnerabilities in well-defended systems requires a combination of deep technical knowledge, creativity, and extensive time. AI that can assist with vulnerability research at scale could expand this capability to a much larger set of actors and accelerate the speed at which new attack vectors are discovered.

The critical infrastructure dimension is particularly concerning. Power grids, water treatment systems, financial infrastructure, and hospital networks are interconnected in ways that make cascading failures possible. A sophisticated cyberattack on a power grid during winter, for example, could produce civilian casualties through secondary effects (loss of heating, failure of medical equipment) that dwarf the direct damage from the attack itself. AI-assisted attack development that makes such attacks accessible to less sophisticated actors is a meaningful escalation of this risk.
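The cascading-failure mechanism can be illustrated with a toy dependency graph. This is a sketch under strong simplifying assumptions: the dependency map is hypothetical, and a system is assumed to fail as soon as any one of its dependencies fails, which ignores backup power, redundancy, and partial degradation in real infrastructure.

```python
from collections import deque

# Hypothetical map: each system lists the systems it depends on.
DEPENDS_ON = {
    "power_grid": [],
    "water_treatment": ["power_grid"],
    "hospital_network": ["power_grid", "water_treatment"],
    "financial_infrastructure": ["power_grid"],
}


def cascade(depends_on: dict, initial_failure: str) -> set:
    """Return every system that fails after one initial failure."""
    # Invert the map so we can look up what sits downstream of a system.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for downstream in dependents[current]:
            if downstream not in failed:
                failed.add(downstream)
                queue.append(downstream)
    return failed
```

In this toy map, a failure of `power_grid` reaches all four systems, while a failure of `hospital_network` stays contained. The point of the sketch is the asymmetry: the damage from an attack depends less on the target itself than on how much else depends on it.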

The asymmetric defense problem

In cybersecurity, AI uplift is available to both attackers and defenders, and there is a reasonable case that AI benefits defenders more than attackers in many contexts: AI can monitor network traffic at scales no human team can, identify anomalous patterns, and respond to threats faster than human incident response teams. But the advantage does not run in only one direction: AI can also help attackers generate novel attack variations faster than defenders can update their defenses, and automate the scanning and exploitation work that currently requires skilled human operators.

Loss of human oversight at scale

A third area of catastrophic risk, distinct from the two misuse cases above, is the structural loss of meaningful human oversight over consequential decisions as AI systems are deployed at scale and with increasing autonomy. In the taxonomy introduced earlier it belongs to the systemic and structural category: the risk is not primarily that a single AI system makes a catastrophically bad decision but that human control is cumulatively eroded in domains where human judgment is essential.

Consider the trajectory of AI deployment in high-stakes domains: AI-assisted medical diagnosis, AI-assisted legal research, AI-assisted financial decision-making, AI-assisted military targeting systems. In each domain, the economic incentives favor increasing automation, increasing speed, and reducing the number of human decision-makers in the loop. Each individual step may be defensible — AI diagnostics that are more accurate than average radiologists improve patient outcomes, AI financial systems that react faster than humans reduce arbitrage opportunities. But the cumulative effect may be a world in which consequential decisions across many domains are made by AI systems that humans cannot meaningfully audit, override, or correct.

Autonomy and Accountability

A specific version of this concern applies to autonomous weapon systems. As of 2024, significant AI integration in military targeting systems exists across multiple major powers, including systems that can identify and track targets autonomously. The question of when and whether autonomous systems can select and engage targets without direct human authorization is the subject of ongoing international debate under the UN Convention on Certain Conventional Weapons. The concern from an AI safety perspective is not just ethical (the question of whether autonomous killing is acceptable) but structural: military decision-making systems that exclude meaningful human oversight are systems where errors — including catastrophic errors — cannot be corrected in real time. The speed advantage of autonomous systems comes precisely from removing the latency introduced by human judgment, and that latency is also the window in which errors can be caught.

Concentration of power through AI

Perhaps the most structurally significant catastrophic risk from AI is not any single failure mode but a class of scenarios in which AI dramatically accelerates the concentration of economic, political, or military power in a small number of entities — whether states, corporations, or individuals — in ways that undermine the competitive and institutional checks that currently prevent any single actor from dominating human civilization.

The mechanism is not mysterious. AI dramatically amplifies the productivity of already-capable actors. A company with access to frontier AI systems can potentially do the work of a much larger organization. A government that comprehensively deploys AI in intelligence, economic planning, and military systems gains advantages over adversaries that compound over time. An individual with access to the most capable AI systems can exercise influence at scales previously unavailable to individuals. If AI capability is highly concentrated, the productivity amplification it provides flows primarily to those who already have concentrated power, accelerating inequality and concentration rather than distributing the benefits of the technology.
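The compounding claim can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each actor's capability grows at a constant rate per period; real growth is unlikely to be this regular, and the function name and parameters are hypothetical.

```python
def capability_ratio(initial_ratio: float, leader_growth: float,
                     follower_growth: float, periods: int) -> float:
    """Leader-to-follower capability ratio after compounding growth.

    Growth rates are fractions per period (0.30 means 30% growth).
    """
    if initial_ratio <= 0:
        raise ValueError("initial_ratio must be positive")
    if leader_growth <= -1 or follower_growth <= -1:
        raise ValueError("growth rates must be greater than -1")
    if periods < 0:
        raise ValueError("periods must be non-negative")
    # The ratio is multiplied by the same factor every period.
    per_period = (1 + leader_growth) / (1 + follower_growth)
    return initial_ratio * per_period ** periods
```

Starting from parity, a leader growing at a hypothetical 30% per period against a follower growing at 20% ends up with more than twice the follower's capability after ten periods. A modest but sustained edge in growth rate is enough to produce a large gap, which is the sense in which the advantage compounds.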

Economic concentration

The economic dimension of power concentration is already visible. AI development is capital-intensive, dominated by a small number of frontier labs with access to very large compute budgets, enormous proprietary datasets, and deep talent pipelines. Network effects in AI — the value of models trained on more data, with more compute, using more researchers — favor scale in ways that tend toward natural monopoly dynamics. If a small number of companies develop AI systems substantially more capable than anything available to competitors, and if those systems provide decisive productivity advantages in many sectors, the economic concentration could be severe.

Political and military concentration

The political and military dimensions are potentially more acute. A government that develops substantially more capable AI than its geopolitical rivals gains surveillance and intelligence advantages, logistical and planning advantages, and potentially decisive military advantages. The Cold War nuclear deterrence framework was premised on roughly symmetrical destructive capability — Mutually Assured Destruction worked because neither side could strike without facing equal retaliation. If AI capability becomes highly asymmetric, this equilibrium breaks down. A sufficiently capable AI system deployed for military applications could provide one state with decisive tactical and strategic advantages that no amount of human ingenuity by the opposing side could overcome.

The Lock-In Problem

What makes power concentration through AI a potentially catastrophic rather than merely bad outcome is the possibility of lock-in — a scenario in which whoever reaches a decisive advantage in AI capability uses that advantage to cement their position in ways that cannot be reversed. This could occur through AI-assisted surveillance and control of domestic populations, AI-assisted suppression of political opposition, AI-assisted economic dominance that makes competitors structurally dependent, or AI-assisted military superiority that makes armed resistance impossible. The historical record shows that very extreme concentrations of power can persist for decades or generations once established. An AI-enabled lock-in of authoritarian control — at the national or global level — would represent a catastrophe that, unlike many other risks, might not self-correct over time.

AI-enabled mass surveillance and authoritarianism

A specific version of the power concentration concern focuses on AI's potential to dramatically improve the capability of surveillance and social control systems. The ingredients of effective authoritarian surveillance are: the ability to observe a population's movements and communications at scale, the ability to identify individuals who express dissent or engage in organized opposition, and the ability to respond quickly enough to suppress coordination before it becomes self-sustaining. AI improves all three.

Large-scale facial recognition in public spaces, combined with gait analysis and behavioral pattern recognition, makes it possible to track individuals' movements across a city with minimal human labor. Natural language processing applied to communications surveillance can flag sentiment and detect coordinated opposition. Social graph analysis can identify the most influential nodes in opposition networks and prioritize suppression efforts. Predictive systems can identify individuals likely to engage in protest or organization before they do so.

These capabilities are not theoretical. China's deployment of AI-assisted surveillance in Xinjiang — including facial recognition at scale, mandatory phone data extraction, behavioral profiling, and predictive detention — has been documented in detail by researchers, journalists, and government reports from multiple countries. The infrastructure developed there has been exported to other governments. The question for AI safety is not whether AI surveillance systems can enable authoritarian control — they demonstrably can — but whether the development and export of these capabilities can be governed, and what the implications are for the long-term trajectory of human governance.

Catastrophic versus existential risk

AI safety discourse often conflates "catastrophic" and "existential" risk, but the distinction matters both analytically and practically. A catastrophic outcome is one that causes enormous harm: deaths at scale, severe degradation of human welfare, long-lasting damage to institutions. An existential risk is a catastrophic outcome that also destroys or permanently curtails humanity's long-term potential, either by causing human extinction or by permanently limiting what humanity could otherwise achieve.

Most of the scenarios discussed in this module are catastrophic without being existential in the strict sense. An AI-assisted bioweapon causing a severe pandemic would be a catastrophe comparable to or exceeding historical pandemics; it would not by itself end human civilization. An AI-enabled authoritarian regime would impose enormous suffering; humanity has survived previous authoritarian regimes. The scenarios that rise to the level of existential risk involve either very high casualties (a pandemic killing most of humanity, not just a substantial fraction) or mechanisms for permanent lock-in (a global AI-enabled authoritarianism with no viable path to overthrow).

Why Both Matter

The distinction between catastrophic and existential risk is analytically important but should not become a reason to neglect the non-existential catastrophic risks. A pandemic that kills 10% of humanity is not existential, but it is an overwhelming moral catastrophe. An authoritarian lock-in that affects a single country of a billion people is not a threat to humanity's long-term potential but involves staggering human suffering. AI safety work that focuses exclusively on extinction-level risks and neglects the more probable and near-term catastrophic risks is both practically and morally incomplete. The field has increasingly recognized this, with organizations including the Center for AI Safety, GovAI, and (until its closure in 2024) the Future of Humanity Institute maintaining research portfolios that address near-term catastrophic risks alongside longer-term existential scenarios.

Near-term versus long-term risk timelines

A persistent and sometimes acrimonious debate in AI safety concerns the relative priority of near-term versus long-term risks. Near-term risks include AI-assisted bioweapons, cyberweapons, surveillance, discrimination, labor displacement, and economic concentration — these are risks that current or near-current AI systems plausibly contribute to. Long-term risks include scenarios involving much more capable future AI systems that might develop misaligned goals or enable catastrophic power concentration in ways that current systems cannot.

The debate is not just about timelines but about prioritization. Research attention, funding, and policy focus are limited resources. Prioritizing long-term existential risks means potentially underinvesting in near-term catastrophic risks that are both more probable in the short run and affect many people who are alive today. Prioritizing near-term risks means potentially underinvesting in the technical safety research that may be necessary to prevent long-term catastrophic risks from materializing as AI systems become more capable.

Why both timelines deserve serious attention

The strongest case for taking both timelines seriously is that they are not independent. Near-term AI-enabled catastrophes — a pandemic, a severe cyberattack on critical infrastructure, significant erosion of democratic institutions — would damage the social, institutional, and political infrastructure that governance of more powerful future AI systems depends on. If AI development undermines the institutions and international cooperation that future AI governance requires, the probability of catastrophic outcomes from more capable future systems increases. Near-term safety is a prerequisite for the conditions in which long-term safety can be achieved.

Similarly, the technical research needed to address long-term risks often has near-term applications. Interpretability research that helps identify when models are hiding information from evaluators is directly applicable to near-term deployment safety, not just to distant hypothetical superintelligent AI. Responsible scaling policies that mandate capability evaluations before each scale increment address both near-term risk (ensuring models are not deployed with currently dangerous capabilities) and long-term risk (building the evaluation infrastructure and organizational culture needed for safe development of future, more capable systems).
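The gating logic behind a responsible scaling policy can be sketched in a few lines. This is a hypothetical illustration of the idea described above, not any lab's actual policy: the capability names, the 0 to 1 score scale, and the thresholds are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    capability: str   # hypothetical label, e.g. "biosecurity" or "cyber"
    score: float      # assumed scale: 0.0 (no capability) to 1.0 (severe)
    threshold: float  # assumed level at which scaling must pause


def may_scale(results: list[EvalResult]) -> tuple[bool, list[str]]:
    """Decide whether the next scale increment may proceed.

    Returns (allowed, blocking), where blocking lists the capabilities
    whose scores reached or exceeded their thresholds.
    """
    if not results:
        # The policy mandates evaluations before each increment,
        # so the absence of results is treated as an error, not a pass.
        raise ValueError("evaluations must be run before each scale increment")
    blocking = [r.capability for r in results if r.score >= r.threshold]
    return (not blocking, blocking)
```

For example, `may_scale([EvalResult("biosecurity", 0.6, 0.5), EvalResult("cyber", 0.1, 0.5)])` returns `(False, ["biosecurity"])`. Treating a missing evaluation as an error rather than a pass is the design choice that matters most here: the gate is only meaningful if skipping the evaluation cannot open it.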

Convergence, Not Competition

The framing of near-term versus long-term risk as a zero-sum competition for resources and attention obscures more than it reveals. The interventions most robustly beneficial across both timelines — building the technical infrastructure for AI capability evaluation, establishing international governance norms, developing interpretability tools, fostering a culture of safety in AI development — are the same. A research program focused on those convergent interventions avoids the worst of the tradeoff. The disagreement that matters most in AI safety is not between those who care about near-term risks and those who care about long-term risks, but between those who believe the problem is tractable and worth investing in now, and those who believe current concern is premature or misallocated.