Module 6 · 24 min read · AI in Governance

AI in Criminal Justice and Law Enforcement

Few government domains carry higher stakes than criminal justice. Decisions about who is investigated, arrested, detained, or sentenced determine whether people live free or imprisoned — and errors compound through a system where early disadvantage propagates forward. AI has entered this system at nearly every stage, from patrol routing to parole decisions, raising fundamental questions about fairness, due process, and the relationship between algorithmic probability and human dignity.

The landscape of AI in criminal justice

Artificial intelligence is now embedded across the criminal justice pipeline, though the depth and visibility of that embedding varies considerably. Some applications are widely known and debated; others operate quietly in the background of daily policing and court administration. Understanding the full landscape is essential context for evaluating the reforms and regulations that are beginning to emerge.

Predictive policing systems use historical crime data, demographic information, and sometimes social media activity to forecast where crimes are likely to occur or which individuals are likely to commit them. Place-based systems, such as PredPol (now Geolitica), generate maps showing areas of elevated predicted risk, guiding patrol deployment. Person-based systems attempt to identify individuals who are at elevated risk of either committing or becoming victims of violence.

Risk assessment instruments are used at multiple points in the justice process — pretrial detention decisions, sentencing recommendations, probation conditions, and parole hearings — to estimate the likelihood that an individual will reoffend, fail to appear in court, or engage in violence. These instruments synthesize dozens of factors about an individual's history and circumstances into a numerical score or categorical risk level.

Facial recognition technology has been deployed by police agencies to identify suspects from surveillance footage, social media images, and other photographs. Unlike most investigative tools, facial recognition can search millions of database entries in seconds, creating a qualitatively new kind of identification capacity.

License plate readers mounted on patrol cars or fixed infrastructure automatically record the plates of passing vehicles, building databases of vehicle locations over time. Analytics layers on top of this data can identify vehicles associated with individuals under investigation, flag stolen plates in real time, and reconstruct the movements of specific vehicles across jurisdictions.

Gunshot detection systems, of which ShotSpotter is the most widely deployed example, use acoustic sensors to identify the location of gunfire and alert police without requiring a civilian call. These systems operate continuously and passively, generating alerts that may lead to police presence and contact in communities where they are deployed.

Body camera AI analytics apply computer vision and natural language processing to footage captured by officers, potentially enabling automatic detection of use-of-force incidents, policy violations, or patterns of conduct across an officer's history.

COMPAS and the ProPublica investigation

No single episode has done more to crystallize public debate about algorithmic risk assessment than the 2016 ProPublica investigation into COMPAS — Correctional Offender Management Profiling for Alternative Sanctions. COMPAS is a risk assessment instrument developed by Equivant (then Northpointe) and used in jurisdictions across the United States to generate risk scores that inform pretrial, sentencing, and parole decisions.

ProPublica journalists obtained COMPAS scores for more than 7,000 defendants in Broward County, Florida, and followed their outcomes for two years. Their analysis found that Black defendants were nearly twice as likely as white defendants to be incorrectly flagged as high risk for future offending — a false positive rate of 44.9 percent for Black defendants versus 23.5 percent for white defendants. Conversely, white defendants were more likely to be incorrectly labeled low risk and go on to commit new offenses — a false negative rate of 47.7 percent for white defendants versus 28 percent for Black defendants.
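The two error rates quoted above are computed separately for each group from that group's confusion matrix. The following is a minimal sketch of the calculation, not ProPublica's actual analysis: the counts are invented for illustration, and the names `Confusion` and `error_rates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Outcome counts for one group; 'positive' means flagged high risk."""
    true_pos: int   # flagged high risk, did reoffend
    false_pos: int  # flagged high risk, did not reoffend
    true_neg: int   # flagged low risk, did not reoffend
    false_neg: int  # flagged low risk, did reoffend


def error_rates(c: Confusion) -> dict[str, float]:
    """Return the false positive and false negative rates for one group."""
    # FPR: share of people who did NOT reoffend but were flagged high risk.
    fpr = c.false_pos / (c.false_pos + c.true_neg)
    # FNR: share of people who DID reoffend but were flagged low risk.
    fnr = c.false_neg / (c.false_neg + c.true_pos)
    return {"fpr": fpr, "fnr": fnr}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
groups = {
    "group_a": Confusion(true_pos=300, false_pos=200, true_neg=300, false_neg=200),
    "group_b": Confusion(true_pos=150, false_pos=100, true_neg=400, false_neg=150),
}

rates = {name: error_rates(c) for name, c in groups.items()}
for name, r in rates.items():
    print(f"{name}: FPR={r['fpr']:.1%}  FNR={r['fnr']:.1%}")
```

Note that the denominators differ: the false positive rate is taken over people who did not reoffend, the false negative rate over people who did. That is why a tool can look balanced on one measure and unbalanced on another.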

The fairness impossibility theorem

The debate that followed the ProPublica investigation revealed a fundamental mathematical constraint: when recidivism base rates differ across demographic groups, no imperfect classifier can simultaneously equalize false positive rates, false negative rates, and positive predictive value. Satisfying one fairness criterion therefore means violating another, and each choice shifts the burden of error between groups. This is not a flaw in COMPAS specifically; it is an inherent property of probabilistic classification across groups with different base rates. The policy question is which kind of error, committed against which group, is most acceptable.
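One way to see where the constraint comes from is an identity that follows directly from the definitions of the confusion-matrix rates. The notation is introduced here for illustration and is not taken from the original debate: p is a group's base rate of reoffending, PPV is the share of high-risk flags that turn out to be correct, and FPR and FNR are the false positive and false negative rates.

```latex
\mathrm{FPR} \;=\; \frac{p}{1-p}\,\cdot\,\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\,\cdot\,\bigl(1-\mathrm{FNR}\bigr)
```

If two groups have the same PPV and the same FNR but different base rates p, the right-hand side takes different values, so their false positive rates cannot be equal. The only escape is a perfect predictor, for which the error terms vanish.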

Northpointe disputed ProPublica's analysis, noting that COMPAS was equally calibrated across racial groups — defendants with the same score had similar recidivism rates regardless of race. Both claims are mathematically correct. They reflect different fairness criteria, not a disagreement about facts.
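A small numeric sketch can make the coexistence of both claims concrete. It treats equal positive predictive value as a simplified stand-in for calibration, and it relies on the confusion-matrix identity FPR = p/(1-p) × (1-PPV)/PPV × (1-FNR), where p is a group's base rate. All numbers are hypothetical and the function name `implied_fpr` is an assumption made for this example.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by a given base rate, PPV, and FNR.

    Uses the identity FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR), which follows
    from the definitions of the rates and holds for any binary classifier.
    """
    odds = base_rate / (1.0 - base_rate)
    return odds * (1.0 - ppv) / ppv * (1.0 - fnr)


# Hypothetical values: the tool is equally "accurate when it flags someone"
# (same PPV) and misses reoffenders equally often (same FNR) in both groups.
PPV, FNR = 0.6, 0.35

fpr_higher_base = implied_fpr(base_rate=0.5, ppv=PPV, fnr=FNR)
fpr_lower_base = implied_fpr(base_rate=0.4, ppv=PPV, fnr=FNR)

print(f"base rate 50%: FPR = {fpr_higher_base:.1%}")
print(f"base rate 40%: FPR = {fpr_lower_base:.1%}")
```

With these inputs, the group with the higher base rate ends up with a false positive rate of roughly 43 percent against roughly 29 percent for the other group, even though the tool treats a high-risk flag as equally reliable in both. Nothing in the inputs refers to group membership; the gap comes entirely from the difference in base rates.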

The COMPAS controversy also exposed a structural due process problem: defendants subject to risk assessment scores often cannot meaningfully challenge them. In State v. Loomis (2016), the Wisconsin Supreme Court upheld the use of COMPAS in sentencing while acknowledging that the proprietary nature of the algorithm prevented defendants from scrutinizing how their score was calculated. The court reasoned that the score was not the sole basis for the sentence and that judges were free to discount it — a rationale that critics found insufficient given the practical weight such scores carry in overburdened courts.

Facial recognition: accuracy, disparities, and bans

The National Institute of Standards and Technology (NIST) has conducted the most rigorous independent evaluations of facial recognition accuracy, publishing findings through its Face Recognition Vendor Test (FRVT) program. The results have been unambiguous on one point: most facial recognition algorithms perform significantly worse on images of darker-skinned individuals, women, and older adults than on images of lighter-skinned men in their thirties and forties.

The best-documented finding is the disparity in false match rates — the probability that the algorithm incorrectly identifies a non-target person as a match. NIST found that for one-to-one verification tasks (confirming that a photo matches an enrolled identity), some algorithms had false match rates for Black women that were ten to one hundred times higher than for white men. For the one-to-many identification tasks that are most relevant to law enforcement use — searching a database to find who a suspect is — the disparities are similarly pronounced.
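To see why a per-comparison false match rate matters more in database searches, consider how it compounds across a gallery. The sketch below is a back-of-the-envelope model, not how NIST measures one-to-many accuracy: it assumes every comparison is independent and ignores score thresholds and ranked candidate lists. The rates and gallery sizes are hypothetical, and `prob_any_false_match` is a name invented for this example.

```python
def prob_any_false_match(fmr: float, gallery_size: int) -> float:
    """Chance of at least one false match when one probe image is compared
    against `gallery_size` non-matching identities.

    Assumes independent comparisons, each with false match rate `fmr`.
    """
    if not 0.0 <= fmr <= 1.0:
        raise ValueError("fmr must be a probability between 0 and 1")
    if gallery_size < 0:
        raise ValueError("gallery_size must be non-negative")
    return 1.0 - (1.0 - fmr) ** gallery_size


# Hypothetical per-comparison rates: a baseline, and one ten times higher.
BASELINE_FMR = 1e-6
ELEVATED_FMR = 1e-5

for gallery_size in (10_000, 1_000_000):
    low = prob_any_false_match(BASELINE_FMR, gallery_size)
    high = prob_any_false_match(ELEVATED_FMR, gallery_size)
    print(f"gallery={gallery_size:>9,}: baseline={low:.4f}  elevated={high:.4f}")
```

Under these assumptions, a rate that looks negligible for a single comparison becomes a substantial chance of a spurious candidate once the gallery holds millions of faces, and a tenfold difference between demographic groups in the per-comparison rate carries through to how often members of each group surface as false leads.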

These accuracy disparities have direct consequences when facial recognition is used to generate investigative leads. A false positive identification does not typically result in immediate arrest, but it initiates investigation, including possible lineup appearances and witness identification processes that can go badly wrong. By the end of 2020, three cases had been publicly reported in which Black men were wrongly arrested on the basis of facial recognition misidentifications; in each case, the false match triggered an investigation that human investigators failed to critically scrutinize.

Minneapolis and the federal response

In 2021, Minneapolis formally prohibited police use of facial recognition technology, following intense scrutiny of the technology's role in the aftermath of George Floyd's murder and growing evidence of accuracy disparities. It joined a group of cities that had already enacted similar prohibitions, including San Francisco (the first, in 2019), Boston, and Portland.

At the federal level, the response has been more muted. The FBI and Department of Homeland Security have expanded facial recognition use; Congress has held hearings but had not enacted comprehensive legislation as of 2024. The patchwork of local prohibitions and the absence of federal standards create a situation in which a technology may be banned in one jurisdiction but used routinely in the jurisdiction across the county line.

Risk assessment in bail and sentencing

Risk assessment instruments are now used in pretrial detention decisions across much of the United States, often in the context of bail reform efforts that sought to replace cash bail with evidence-based risk evaluation. The Public Safety Assessment (PSA), developed by the Arnold Foundation (now Arnold Ventures), is among the most widely adopted — it uses nine factors from criminal history to generate scores predicting failure to appear and new criminal activity, and has been adopted in jurisdictions ranging from New Jersey to Kentucky.

The political trajectory of risk assessment instruments in bail reform illustrates the complexity of algorithmic governance debates. Advocates from across the political spectrum initially supported replacing cash bail — which effectively imprisons people for being poor — with risk assessment. Subsequent critiques from civil rights organizations pointed out that many risk factors correlate with race and socioeconomic status, and that algorithmic pretrial detention could perpetuate or amplify existing disparities while providing the appearance of objectivity.

Illinois enacted the Pretrial Fairness Act, which took effect in September 2023 after a court challenge delayed its original January 2023 start date. The law abolished cash bail statewide, making Illinois the first state to do so, and significantly restricted the use of risk assessment instruments: it prohibits their use as the sole basis for detention and requires courts to consider the pretrial services available to individuals rather than relying on algorithmic risk scores. It is among the most significant statutory reforms of pretrial practice in the United States, and its experience will provide evidence about whether bail reform can be accomplished without heavy reliance on algorithmic tools.

Risk assessment instruments are applied at four main decision points:

Pretrial risk assessment: Estimating the likelihood that an individual will fail to appear for trial or engage in new criminal activity while awaiting trial, used to inform detention decisions and bail conditions.

Sentencing risk assessment: Informing judicial discretion at sentencing by estimating recidivism likelihood. Some jurisdictions treat the score as a structured factor judges must consider; others treat it as optional guidance.

Parole risk assessment: Evaluating individuals eligible for release from incarceration by estimating the risk of reoffending in the community, often with significant weight given to the score in board decisions.

Probation and supervision: Determining the level of supervision and the conditions imposed on individuals under community supervision, with more intensive monitoring for higher-scored individuals.

Due process and the right to challenge algorithms

A core principle of due process in criminal proceedings is that defendants have the right to confront and challenge the evidence used against them. When that evidence includes an algorithmic risk score generated by a proprietary system, meaningful challenge may be impossible. Defense counsel cannot cross-examine an algorithm; courts cannot verify that inputs were entered correctly or that the model is performing as claimed without access to the underlying system.

Courts have reached inconsistent conclusions about whether this poses constitutional problems. Some have found that algorithmic risk scores are simply tools to structure discretion, not determinative evidence, and that existing procedural protections are adequate. Others have expressed concern that the practical weight of algorithmic scores in overburdened systems makes this theoretical distinction hollow. Academic legal commentators have widely argued that due process requires, at a minimum, that defendants have access to the factors used to generate their score, the results of the studies used to validate the instrument, and any documented disparate impact, even if the underlying model weights remain proprietary.

Human oversight in life-affecting decisions

Perhaps the clearest principle to emerge from the accumulating evidence on criminal justice AI is that consequential decisions about individuals must involve meaningful human judgment, not just algorithmic output ratified by a human signature. This is distinct from formal human review — a judge who receives a risk score report and incorporates it uncritically has exercised nominal human oversight, not genuine human judgment.

Meaningful human oversight requires that decision-makers understand what the algorithm is and is not measuring, know the algorithm's validated accuracy and its documented disparities, receive training on how to weigh algorithmic input against other evidence, and document their reasoning when they accept or depart from algorithmic recommendations. Jurisdictions that deploy AI without providing this infrastructure to decision-makers are not achieving human oversight — they are creating a paper trail that provides cover for algorithmic authority.

Body camera AI analytics, gunshot detection, and license plate reader systems raise somewhat different oversight questions. These tools generate investigative leads rather than making formal decisions. But the decision-making they trigger — who gets stopped, who gets questioned, who gets searched — has profound consequences for individuals and communities. Oversight requirements for investigative AI must grapple with the cumulative effect of many low-stakes individual decisions, not just the formal decisions at the end of the process.

Reform frameworks taking shape

Several jurisdictions are developing more comprehensive frameworks for AI in criminal justice. New York City's Local Law 144 required bias audits for automated employment decision tools; similar legislation has been proposed for criminal justice AI. The Algorithmic Accountability Act, introduced in Congress in multiple sessions, would require impact assessments for consequential automated systems. At the state level, Colorado enacted the Artificial Intelligence Act in 2024 requiring developers and deployers of high-risk AI systems — including those used in criminal justice — to exercise reasonable care to protect against algorithmic discrimination.

The central challenge in AI and criminal justice is that the systems operate at the intersection of two powerful tendencies: the genuine desire to make criminal justice more consistent and evidence-based, and the structural reality that historical criminal justice data encodes decades of discriminatory policing and prosecution. AI trained on that data can reproduce and legitimize historical discrimination while presenting it as objective prediction. Addressing this requires not just better algorithms but better governance — transparency, accountability, human oversight, and willingness to ask whether algorithmic tools are serving justice or obscuring its absence.