Where bias enters AI systems
The word "bias" in the context of AI systems covers several distinct phenomena that are frequently conflated. Understanding them separately is important because they arise from different sources and require different remedies.
Training data bias occurs when the dataset used to train an algorithm does not adequately represent the population to which the system will be applied. Facial recognition systems trained predominantly on images of white men within a narrow age range will predictably perform less accurately when asked to match the faces of women, older people, or people with darker skin. This is usually not a deliberate design choice by the people building the system; it is a consequence of training data that was not sufficiently representative, which often reflects who was most commonly photographed and whose images were most readily available in usable form when the training datasets were assembled.
Historical data bias is distinct and in some ways more intractable. Risk assessment tools trained on criminal justice records — arrest data, conviction data, reoffending data — are trained on outcomes that reflect decades of policing decisions as much as they reflect underlying patterns of behaviour. If a particular community has historically been subject to more intensive stop-and-search, higher rates of arrest for low-level offending, and less favourable treatment at every subsequent stage of the criminal justice process, all of that inequality is encoded in the historical data. A model trained on that data will learn to predict outcomes that reflect the existing system, and will tend to produce scores that recommend continuing the same patterns.
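The self-reinforcing character of historical data bias can be illustrated with a deliberately simplified toy model. The sketch below is an assumption-laden illustration, not a description of any real tool: the function name `simulate`, the area labels, and every figure are invented. Two areas have the same underlying offence rate, patrols are allocated in proportion to past recorded arrests, and new arrests are recorded in proportion to patrol presence.

```python
def simulate(recorded, true_rate, rounds, patrols_per_round=100):
    """Toy feedback loop (hypothetical model, invented numbers).

    recorded:  {area: arrests recorded so far}
    true_rate: {area: arrests recorded per patrol}, the underlying rate
    Returns each area's share of all recorded arrests after `rounds`.
    """
    recorded = dict(recorded)  # work on a copy
    for _ in range(rounds):
        total = sum(recorded.values())
        # Patrols follow the historical record, not the underlying rate.
        patrols = {a: patrols_per_round * n / total for a, n in recorded.items()}
        # More patrols in an area means more arrests recorded there.
        for area in recorded:
            recorded[area] += patrols[area] * true_rate[area]
    total = sum(recorded.values())
    return {a: n / total for a, n in recorded.items()}

# Both areas have the SAME underlying rate; only the starting record differs.
shares = simulate(
    recorded={"area_1": 60, "area_2": 40},
    true_rate={"area_1": 0.1, "area_2": 0.1},
    rounds=10,
)
print({area: round(share, 3) for area, share in shares.items()})
```

In this toy model the initial 60/40 split in the record never corrects itself, even though the two areas are identical in underlying behaviour, because the record determines where the next data is collected.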
Proxy variable bias occurs when a variable that is used as a model input correlates strongly with a protected characteristic, effectively introducing that characteristic into the model's predictions without naming it. Postcode is the clearest example in UK policing: it correlates with ethnicity, socioeconomic deprivation, and historical policing intensity in ways that mean including postcode in a risk model can produce outputs that effectively discriminate on grounds of race or class even though the model contains no direct reference to either.
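One rough way to probe for a proxy is to ask how well the candidate input, on its own, predicts the protected characteristic. The sketch below uses entirely invented records; the function name `proxy_strength`, the postcode districts, and the group labels are all hypothetical. It compares the accuracy of guessing a person's group from their postcode with the accuracy of always guessing the most common group.

```python
from collections import Counter, defaultdict

def proxy_strength(records):
    """Return (baseline, by_proxy) accuracy for guessing group from a proxy.

    records:  list of (proxy_value, protected_group) pairs
    baseline: accuracy of always guessing the overall majority group
    by_proxy: accuracy of guessing the majority group within each proxy value
    """
    overall = Counter(group for _, group in records)
    baseline = max(overall.values()) / len(records)

    per_value = defaultdict(Counter)
    for value, group in records:
        per_value[value][group] += 1
    correct = sum(max(counts.values()) for counts in per_value.values())
    return baseline, correct / len(records)

# Invented data: (postcode district, group label).
records = (
    [("AA1", "A")] * 40 + [("AA1", "B")] * 10
    + [("BB2", "A")] * 5 + [("BB2", "B")] * 45
)
baseline, by_proxy = proxy_strength(records)
print(f"baseline={baseline:.2f} by_postcode={by_proxy:.2f}")
```

A large gap between the two figures suggests that the input carries much of the information the protected characteristic would, so leaving the characteristic itself out of the model offers little protection.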
NIST and the evidence on facial recognition accuracy
The most authoritative body of evidence on demographic disparities in facial recognition accuracy comes from the Face Recognition Vendor Test (FRVT) programme run by the US National Institute of Standards and Technology. NIST has evaluated hundreds of algorithms submitted by commercial vendors and academic groups since 2019, testing them against large datasets across a range of demographic groups, image types, and operational scenarios.
The headline findings have been consistent and stark. The majority of the algorithms tested — though not all — produce higher false match rates for Black and Asian faces than for white faces, sometimes by an order of magnitude. The disparity is particularly pronounced for images of Black women, who in some algorithms face false positive rates ten to a hundred times higher than those for white men. False non-match rates — where a genuine match is not found — also vary by demographic group, with some algorithms performing substantially worse on older adults and children.
NIST has been careful to note that the tested algorithms vary enormously in their accuracy and their demographic disparities, and that the best performers have substantially smaller differentials across groups than the worst. This means that a police force choosing an algorithm on grounds of headline accuracy without looking at demographic breakdown data may be unknowingly selecting a system with significantly unequal performance across the communities it will be used to police. The NIST programme does not include operational evaluation — it does not test how algorithms perform in real policing conditions, with the image quality and variability that characterises real CCTV and body camera footage — which means even the best-performing algorithms under test conditions may perform differently in deployment.
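The gap between headline accuracy and the demographic breakdown can be made concrete with a small calculation. The counts below are invented for two hypothetical algorithms (`algo_x`, `algo_y`) and two hypothetical groups; they are not NIST figures. The sketch computes the overall false match rate for each algorithm alongside the ratio between its worst and best group.

```python
def false_match_rate(false_matches, impostor_comparisons):
    """Share of comparisons between different people wrongly declared a match."""
    return false_matches / impostor_comparisons

def summarise(per_group_counts):
    """Return (overall_fmr, worst_to_best_ratio) for one algorithm.

    per_group_counts: {group: (false_matches, impostor_comparisons)}
    Assumes every group has at least one false match, so the ratio is defined.
    """
    rates = {
        group: false_match_rate(fm, n)
        for group, (fm, n) in per_group_counts.items()
    }
    total_fm = sum(fm for fm, _ in per_group_counts.values())
    total_n = sum(n for _, n in per_group_counts.values())
    return total_fm / total_n, max(rates.values()) / min(rates.values())

# Invented test results: most test comparisons involve group_1.
algo_x = {"group_1": (10, 900_000), "group_2": (20, 100_000)}
algo_y = {"group_1": (36, 900_000), "group_2": (4, 100_000)}

for name, counts in (("algo_x", algo_x), ("algo_y", algo_y)):
    overall, ratio = summarise(counts)
    print(f"{name}: overall FMR={overall:.6f}, worst/best ratio={ratio:.1f}")
```

With these invented counts, `algo_x` has the lower overall false match rate but an eighteen-fold gap between groups, while `algo_y` has a slightly higher overall rate and no gap at all. A buyer comparing only the headline figure would pick `algo_x`.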
The UK stop and search context
The deployment of AI tools in UK policing does not take place in a neutral environment. Section 95 of the Criminal Justice Act 1991 requires the Secretary of State to publish information that enables criminal justice agencies to identify and respond to any discriminatory differences in the operation of the criminal justice system. The statistics published under Section 95 have consistently shown significant racial disproportionality in stop and search: in 2022/23, Black people in England and Wales were stopped and searched at approximately six times the rate of white people. For stop and search under Section 60 of the Criminal Justice and Public Order Act 1994 — which does not require reasonable grounds and is used in response to anticipated violence — the disparity was even more pronounced.
This background matters for AI because predictive policing and facial recognition tools deployed in this context will be working within, and potentially amplifying, a system in which racial disproportionality is already well-documented. A facial recognition system deployed in an area where Black residents are already subject to disproportionate policing activity will encounter a higher proportion of Black faces in its operational environment; if it also produces higher error rates for Black faces, the combination creates a compounding disadvantage. The same logic applies to risk assessment tools used in custody or sentencing: a tool trained on data from a system characterised by racial disproportionality will tend to replicate that disproportionality in its outputs.
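The compounding effect is multiplicative, which a short piece of arithmetic makes clear. The figures below are invented purely for illustration, and the sketch assumes that being scanned and being falsely matched are independent events. If one group is three times as likely to be scanned and the system's false match rate for that group is ten times higher, the per-resident chance of a false match is thirty times higher.

```python
def per_resident_false_match_risk(exposure_rate, false_match_rate):
    """Chance that a resident is scanned AND falsely matched.

    Assumes the two events are independent (a simplifying assumption).
    """
    return exposure_rate * false_match_rate

# Invented figures for two hypothetical groups.
exposure = {"group_1": 0.02, "group_2": 0.06}   # share of residents scanned
fmr = {"group_1": 0.0001, "group_2": 0.001}     # false match rate when scanned

risk = {
    group: per_resident_false_match_risk(exposure[group], fmr[group])
    for group in exposure
}
ratio = risk["group_2"] / risk["group_1"]
print(f"risk ratio (group_2 vs group_1): {ratio:.0f}x")
```

Neither disparity on its own produces the thirty-fold figure; it arises only when unequal exposure and unequal error rates fall on the same group.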
The Casey Review and structural racism
The Baroness Casey Review into the standards of behaviour and internal culture of the Metropolitan Police, published in March 2023, is the most significant recent official assessment of bias and discrimination in UK policing. The review found institutional racism, misogyny, and homophobia within the Metropolitan Police, and documented a pattern in which officers from marginalised groups faced discrimination from within the force while members of the public from those same groups faced discriminatory treatment from officers.
The review's findings are directly relevant to the AI debate because they establish that bias in policing is not solely a function of algorithms: it is also a function of organisational culture, decision-making processes, and the human judgements that both act on AI outputs and generate the data those systems learn from. An algorithmic risk score reviewed by an officer operating within an institutionally biased culture may reinforce rather than correct for that bias, regardless of the score's technical characteristics. Conversely, a well-designed AI system with genuinely equitable performance characteristics could still produce discriminatory outcomes if it is deployed and acted upon in a discriminatory way.
This point has broader implications: the framing of the AI bias debate that focuses exclusively on technical properties of algorithms — accuracy gaps, fairness metrics, training data composition — can obscure the extent to which algorithmic tools are embedded in, and interact with, human systems that have their own forms of bias. The Casey Review suggests that improving the technical fairness of AI tools used by the Metropolitan Police is necessary but not sufficient for addressing discriminatory outcomes in practice.
The Equality Act and the Public Sector Equality Duty
Section 149 of the Equality Act 2010 imposes the Public Sector Equality Duty on police forces, requiring them to have due regard to the need to eliminate unlawful discrimination, advance equality of opportunity between groups with different protected characteristics, and foster good relations between different groups. The duty applies to the adoption and use of AI tools as much as to any other aspect of policing, and in principle requires forces to assess and address any discriminatory differential in the performance or impact of the tools they deploy.
The Court of Appeal's decision in Bridges v Chief Constable of South Wales Police (2020) gave practical content to this requirement in the facial recognition context. The court found that South Wales Police had failed to comply with the Public Sector Equality Duty because the force had not taken reasonable steps to satisfy itself that the algorithm it used had been tested for bias across relevant demographic groups before operational deployment. The court separately found that the force's Data Protection Impact Assessment was deficient. The judgment made clear that an assessment of potential differential impact on people with different protected characteristics is needed before any future deployment, and in doing so established a practical minimum standard that other forces seeking to deploy similar technology must meet.
The duty does not require perfection — it does not mean that a force can only deploy an AI tool if it performs identically across all demographic groups — but it does require genuine engagement with the evidence about differential performance, a documented decision about whether any differential is acceptable and proportionate to the aim pursued, and ongoing monitoring to detect disparities that emerge in operational conditions that may not have been anticipated in pre-deployment testing.
Gender and disability bias
The AI bias debate in a policing context has focused primarily on race, and for good reason: the evidence is most extensive and the consequences most serious there. But the NIST findings also document significant performance differentials across gender and age, and there is a growing literature on disability bias in AI systems used in contexts that include policing.
Women are generally less well-represented in the training datasets of systems developed in male-dominated fields, and facial recognition systems designed or evaluated primarily on male faces tend to perform less accurately on women. Older adults face accuracy challenges in several biometric and recognition systems because the physical characteristics used by algorithms change with age in ways that training data from younger populations does not capture well. People with certain physical or cognitive disabilities may interact with police AI systems in ways that produce unexpected or adverse outcomes: a person with a movement disorder may interact with a behavioural analytics system in ways that trigger false alerts; a person with a learning disability may not understand the implications of having their data processed by an automated system and may be unable to exercise meaningful consent or challenge.
What forces are doing — and what they are not
The response of UK police forces to the AI bias question has been uneven. The Metropolitan Police, as the most visible deployer of live facial recognition in the UK, has been subject to the most external scrutiny and has published more information about its facial recognition deployments than most other forces, though civil liberties groups and academics have consistently argued that the information published falls short of what genuine accountability requires. The force has stated that it only uses facial recognition algorithms that have been tested for bias across demographic groups, though the details of that testing and its outcomes have not been fully disclosed.
The Algorithmic Transparency Recording Standard, a framework developed within central government by the Central Digital and Data Office and the Centre for Data Ethics and Innovation and introduced in 2021, asks public bodies to publish structured information about algorithmic tools used in decision-making. It has seen very limited uptake among police forces. As of the mid-2020s, only a small number of forces had published records under the standard, and the records that had been published varied significantly in their completeness and the depth of information they provided about bias testing and mitigation. The standard is not mandatory for police forces, which removes much of its force as a transparency mechanism. Proposals to make it mandatory have been floated in policy discussions without resulting in legislative action.
Follow the coverage
PoliceAI News tracks AI bias and discrimination stories continuously — new research, legal challenges, force audit results, equality findings, and policy developments as they are reported.
You can also browse the full archive on this topic or explore related subjects: facial recognition, predictive policing algorithms, and data protection and GDPR.