Anti-racism, algorithmic bias, and policing: a brief introduction
This post originally appeared on Medium.
Recently I’ve been interested in various questions relating to anti-racism, algorithmic bias, and policing.
What does anti-racist policing look like?
What do we mean by algorithmic bias and algorithmic fairness?
How can data science and machine learning practitioners ensure they are being anti-racist in their work?
Traditionally the purpose of policing has been to ensure the everyday safety of the general public. Often this has involved police forces responding to reports of suspected criminal activity. However, we may be entering a new age of policing. New technologies, including traditional data analysis as well as what might be called machine learning or AI, allow police forces to make predictions about suspected criminal activity that have not been possible until now.
We may be in a period where new technological developments have advanced faster than the regulation needed to ensure these technologies are used safely. I think of this as the ‘safety gap’ or the ‘accountability gap’.
Using a few recent examples, I hope to answer these questions relating to anti-racism, algorithmic bias, and policing, and introduce you to thinking about the related issues of safety and accountability.
In July, MIT Technology Review published an article titled “Predictive policing algorithms are racist. They need to be dismantled.”
This article tells the story of an activist turned founder called Yeshimabeit Milner, who co-founded Data for Black Lives in 2017 to fight back against bias in the criminal justice system, and to dismantle the so-called school-to-prison pipeline.
Milner’s focus is on predictive policing tools and abuse of data by police forces.
According to the article, there are two broad types of predictive policing algorithm.
Location-based algorithms use connections between places, events, and historical crime rates (and even weather conditions) to create a crime ‘weather forecast’; PredPol, used by dozens of city police forces in the US, is one example.
Person-based algorithms use factors such as age, gender, marital status, history of substance abuse, and criminal record to predict whether a person has a high chance of being involved in future criminal activity; an example is COMPAS, a tool used by jurisdictions to help make decisions about pretrial release and sentencing, which issues a statistical score between 1 and 10 to quantify how likely a person is to be rearrested if released.
There are a number of general problems with predictive algorithms that these tools have to try to overcome. For example, naive predictive algorithms are easily skewed by arrest rates.
If a social group, for example young Black men in the US, has systematically higher arrest rates, and those arrests are themselves the product of biased policing, then using that data to train a predictive model ‘bakes’ the bias into future predictions.
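To make that feedback loop concrete, here is a minimal, entirely hypothetical simulation in Python. The numbers and the allocation rule are invented for illustration and do not describe any real system: two areas have identical underlying offence rates, but one starts out more heavily patrolled, and because arrests are only recorded where officers are deployed, and next year’s deployments follow this year’s arrests, the initial skew compounds.

```python
import random

random.seed(0)

# Two areas with the SAME underlying offence rate, but area A starts
# out more heavily patrolled (the initial bias).
true_offence_rate = {"A": 0.05, "B": 0.05}
patrols = {"A": 70, "B": 30}
recorded_arrests = {"A": 0, "B": 0}

for year in range(10):
    # Arrests depend on patrol presence as well as offending:
    # crime is only recorded where someone is looking for it.
    for area in ("A", "B"):
        for _ in range(patrols[area]):
            if random.random() < true_offence_rate[area]:
                recorded_arrests[area] += 1

    # A naive 'predictive' step: allocate next year's patrols in
    # proportion to the arrests recorded so far.
    total = recorded_arrests["A"] + recorded_arrests["B"]
    if total > 0:
        patrols["A"] = round(100 * recorded_arrests["A"] / total)
        patrols["B"] = 100 - patrols["A"]

print(recorded_arrests)
print(patrols)
```

After a few iterations, the more heavily patrolled area typically accounts for most of the recorded arrests and most of the patrols, even though nothing about the underlying behaviour in the two areas differs.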
From the article:
Though by law the algorithms do not use race as a predictor, other variables, such as socioeconomic background, education, and zip code, act as proxies. Even without explicitly considering race, these tools are racist.
Another problem is the training data: some models were trained on non-representative samples of the population, for example, white-majority areas in Canada. Applying inferences learned from these samples to the general population can be problematic.
From the article:
Static 99, a tool designed to predict recidivism among sex offenders, was trained in Canada, where only around 3% of the population is Black compared with 12% in the US. Several other tools used in the US were developed in Europe, where 2% of the population is Black. Because of the differences in socioeconomic conditions between countries and populations, the tools are likely to be less accurate in places where they were not trained.
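As a rough illustration of why that matters, here is a toy Python sketch. The score-to-probability mapping and every number in it are invented; the only point is the general one that a risk score calibrated against the base rate of the population a tool was built on will systematically misstate risk in a population with a different base rate.

```python
# Toy illustration only: a hypothetical 1-10 risk score mapped to a
# probability using the base rate of the population the tool was
# calibrated on. All numbers and the mapping itself are invented.
train_base_rate = 0.20   # outcome rate where the tool was built
deploy_base_rate = 0.35  # outcome rate where it is actually used

def score_to_probability(score: int, base_rate: float) -> float:
    """Hypothetical calibration: a score of 5 means 'average risk'."""
    return min(1.0, base_rate * score / 5.0)

for score in (2, 5, 8):
    stated = score_to_probability(score, train_base_rate)
    local = score_to_probability(score, deploy_base_rate)
    print(f"score {score}: tool reports {stated:.0%}, "
          f"local base rate implies roughly {local:.0%}")
```

None of this says anything about any particular tool; it is just the general statistical reason why tools are likely to be less accurate in places where they were not trained.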
Why is there a push towards the use of these tools?
There are many possible reasons, including budget cuts, and the belief that they are more objective than humans at predicting future criminal activity.
For decades, risk assessments have been used in policing and presented as a way to reduce bias, and only in the last few years has this strong claim come under greater scrutiny.
Another problem is the use of ‘calls to police’, rather than arrests or convictions, as training data. Call data is arguably more likely to be biased, as it is generated earlier in the process and depends more on the subjective judgement of whoever made the call.
It’s also often not clear what tools are being used.
“We don’t know how many police departments have used, or are currently using, predictive policing,” says Richardson.
For example, the fact that police in New Orleans were using a predictive tool developed by secretive data-mining firm Palantir came to light only after an investigation by The Verge. And public records show that the New York Police Department has paid $2.5 million to Palantir but isn’t saying what for.
It is no great surprise, with so many salient issues around predictive policing systems, that they have attracted so much attention.
In June, Nature published an article titled “Mathematicians urge colleagues to boycott police work in wake of killings”.
Nature reported that, as of June, more than 1400 researchers had signed a letter calling on mathematicians to stop working on predictive policing algorithms and other policing models.
You can read the letter for yourself here.
In light of the extrajudicial murders by police of George Floyd, Breonna Taylor, Tony McDade and numerous others before them, and the subsequent brutality of the police response to protests, we call on the mathematics community to boycott working with police departments.
In places it focuses on PredPol, linking to articles from The Verge, Vice, MIT Technology Review, and the New York Times.
Given the structural racism and brutality in US policing, we do not believe that mathematicians should be collaborating with police departments in this manner. It is simply too easy to create a “scientific” veneer for racism. Please join us in committing to not collaborating with police. It is, at this moment, the very least we can do as a community.
We demand that any algorithm with potential high impact face a public audit. For those who’d like to do more, participating in this audit process is potentially a proactive way to use mathematical expertise to prevent abuses of power. We also encourage mathematicians to work with community groups, oversight boards, and other organizations dedicated to developing alternatives to oppressive and racist practices. Examples of data science organizations to work with include Data 4 Black Lives (http://d4bl.org/) and Black in AI (https://blackinai.github.io/).
As well as urging the community to work together on these issues, the letter recommends a public audit of any algorithm with “potential high impact”.
Nature’s discussion is useful because it brings in responses from the PredPol chief executive as well as those familiar with the letter.
This includes the remarkable claim that there is “no risk” that historical biases reflected in crime statistics would affect predictions(!)
MacDonald argues, however, that PredPol uses only crimes reported by victims, such as burglaries and robberies, to inform its software. “We never do predictions for crime types that have the possibility of officer-initiated bias, such as drug crimes or prostitution,” he says.
Meanwhile, academics interested in assessing the effectiveness of PredPol at achieving its purported goals have found mixed evidence.
Last year, an external review that looked at eight years of PredPol use by the Los Angeles Police Department in California concluded that it was “difficult to draw conclusions about the effectiveness of the system in reducing vehicle or other crime”. A 2015 study published in the Journal of the American Statistical Association and co-authored by the company’s founders looked at two cities that had deployed its software, and showed that the algorithms were able to predict the locations of crimes better than a human analyst could.
However, the article goes on to report that a separate study by some of the same authors found no significant statistical effect.
In the UK, while things are a little different, we still appear to be following the US trend towards greater use of technology in police forces, if somewhat further behind.
In September 2019, the Royal United Services Institute (a leading Whitehall think tank focused on the defence and security sector, including the armed forces) published a report titled “Data Analytics and Algorithmic Bias in Policing”. It is an independent report commissioned by the UK government’s policy unit for data ethics, the Centre for Data Ethics and Innovation.
The report had a few key findings.
Multiple types of potential bias can occur: unwanted discrimination; real or apparent skewing of decision-making; and outcomes and processes that are “systematically less fair to individuals within a particular group”.
Algorithmic fairness is not just about data: it is important to consider the wider operational, organisational, and legal context.
A lack of guidance: there are no clear organisational guidelines or processes for scrutiny, regulation, and enforcement of police use of data analytics.
In statistics, predictions about groups are generally more valid than predictions about individuals. With a good data set, you can generally make authoritative statements about some phenomenon in the aggregate, even if not for individual members of the statistical population.
It can be quite risky to use non-representative data sets to make inferences about individuals. Presumably, this risk is even higher when the algorithms used (e.g. black-box ML algorithms) and their causal inference mechanisms are not well understood.
One of the things that machine learning is really terrible at is predicting rare and infrequent events, especially when you don’t have loads of data. With this in mind, the more infrequent the event the tool is trying to predict, the less accurate it is likely to be. Furthermore, accuracy is often difficult to calculate, because when an individual is judged to pose a risk of offending, an intervention is typically delivered which prevents the predicted outcome from happening. Authorities cannot know what may have happened had they not intervened, and therefore there is no way to test the accuracy (or otherwise) of the prediction.
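The rare-events half of that problem is easy to demonstrate with a few lines of Python. This is a deliberately trivial example, not a model anyone would deploy: when the outcome is rare, a ‘model’ that never predicts it scores very high accuracy while being completely useless, which is one reason headline accuracy figures deserve scepticism.

```python
# A minimal illustration of the base-rate problem with rare events.
n_people = 10_000
event_rate = 0.01  # only 1 in 100 people go on to offend

actual = [i < n_people * event_rate for i in range(n_people)]
always_no = [False] * n_people  # trivial model: never predict the event

accuracy = sum(pred == truth for pred, truth in zip(always_no, actual)) / n_people
print(f"accuracy of the do-nothing model: {accuracy:.1%}")  # 99.0%
print(f"offenders correctly identified: 0 of {sum(actual)}")
```

The other half of the problem, that an intervention prevents the very outcome the tool predicted, cannot be fixed with more code at all; it is a limitation of the evaluation itself.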
In England and Wales, a small number of police forces use ML algorithms to assess reoffending risk, to inform prioritisation and assist decision-making.
These include Durham, Avon and Somerset (i.e. Bristol), West Midlands (i.e. Birmingham), and Hampshire. These might be the most technologically advanced police forces, the ones with the most budget, or something else.
Interviewees for the report raised similar concerns as found in the other article, namely that if the training data is police interactions as opposed to criminal convictions, then “the effects of a biased sample could be amplified by algorithmic predictions via a feedback loop”.
The report is not shy when it comes to pointing out potential weaknesses in the entire predictive policing approach.
There is a risk that focusing on ‘fixing’ bias as a matter of ‘data’ may distract from wider questions of whether a predictive algorithmic system should be used at all in a particular policing context.
Concerns are also raised about human rights, though not examined in detail. It was considered relevant, but outside the scope of the report, to assess the legal basis for the use of these tools in relation to Article 2 of the European Convention on Human Rights (the “right to life”). Presumably, any continued use of technologies that breached international law would carry a legal risk for the operators of those technologies, including governments.
Most government reports end with recommendations for better cooperation, or better regulation, or something of that kind, just as most academic articles indicate a need for further research.
It is not surprising then that the recommendation of the report is a new code of practice for algorithmic tools in policing, specifying clear roles and responsibilities for scrutiny, regulation, and enforcement. There is a call to establish standard processes for independent ethical review and oversight to ensure transparency and accountability.
This recommendation is similar to the demands made by the writers of the letter. We need public auditing of ML algorithms, especially when they are likely to have an impact on people’s lives.
I originally wrote this as a talk, delivered as part of a session run by my employer on anti-racist approaches to design. The talk finished here, but in the time since preparing the slides I have come across further news stories that illustrate what a fast-moving space this is.
In August, BBC News published “Home Office drops ‘racist’ algorithm from visa decisions”.
Around the same time, the Joint Council for the Welfare of Immigrants published “We won! Home Office to stop using racist visa algorithm”, telling the same story of a visa processing algorithm used by the Home Office.
I would recommend reading both stories in full for yourself. A green-amber-red ‘traffic light’ system was used to categorise visa applicants by level of risk. This risk model included nationality, and Foxglove (a tech justice organisation) alleges that the Home Office kept a list of ‘suspect nationalities’ which would automatically be given a red rating.
It was legally argued that this process amounted to racial discrimination under the Equality Act.
Home Secretary Priti Patel announced that, from Friday 7 August (the date of writing), the Home Office will suspend the ‘visa streaming’ algorithm “pending a redesign of the process”, which will consider “issues around unconscious bias and the use of nationality” in automated visa applications.
Without revealing too much, my work in public sector tech means that I know a few colleagues involved in projects close to this one, even if not quite the same thing.
I think it’s relevant to keep in mind how much public sector technology, particularly in defence and security, is contracted out to external suppliers. As we saw earlier, many leaders of police departments in the US do not even know for sure what technologies they use, because the details of contracting arrangements so often pass under most people’s radar.
However, I don’t think claiming ignorance is a defence if these algorithms really are having a harmful impact on people’s lives, as found in a legal case heard in court. And even before a case reaches a court, it is incumbent on the operators of these technologies to use them responsibly. Public audit of the kind already discussed would surely help towards that goal.
I can now return to the questions I started with.
What does anti-racist policing look like?
I think it looks like a police force that is committed to public safety and wellbeing, especially in relation to the issues raised by the Black Lives Matter movement.
What do we mean by algorithmic bias and algorithmic fairness?
Algorithmic bias, or bias in data analysis and machine learning, can arise from a number of sources, including non-inclusive data sets or problems in the analytical or statistical process itself. Algorithmic fairness can be achieved when data tools can be audited in public for how they contribute to the fairness of society. With a model of justice as fairness, this means that bringing about algorithmic fairness can also contribute to greater social justice.
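To give a concrete, if simplified, sense of what ‘audited in public’ could look like, here is a minimal Python sketch that compares false positive rates across two groups for a hypothetical risk tool. The records are invented for illustration; a real audit would use the tool’s actual decisions and observed outcomes, and would look at more than one fairness metric.

```python
from collections import defaultdict

# Invented records for a hypothetical risk tool:
# (group, flagged_high_risk, reoffended)
records = [
    ("group_a", True, False), ("group_a", True, True),
    ("group_a", False, False), ("group_a", False, False),
    ("group_b", True, False), ("group_b", False, False),
    ("group_b", True, True), ("group_b", False, True),
]

false_positives = defaultdict(int)  # flagged but did not reoffend
negatives = defaultdict(int)        # everyone who did not reoffend

for group, flagged, reoffended in records:
    if not reoffended:
        negatives[group] += 1
        if flagged:
            false_positives[group] += 1

for group in sorted(negatives):
    fpr = false_positives[group] / negatives[group]
    print(f"{group}: false positive rate = {fpr:.0%}")
```

Large, unexplained gaps between groups on checks like this are exactly the kind of concrete, reproducible evidence that public auditing can surface.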
How can data science and machine learning practitioners ensure they are being anti-racist in their work?
While practitioners might think they operate in a space that exists independently of decision-makers or policymakers, I don’t think this is the case. Even technical specialists have a voice to use. Pledging to not work on damaging tech projects, or with organisations with a bad track record of damaging others, might be a good way forward.
These organisations are all doing great work on predictive policing and technology ethics; I would recommend staying up to date with their work if you have an interest in this area.
Data for Black Lives (Instagram, Twitter)
AI Now Institute at New York University
Partnership on AI — 100+ partners across academia, nonprofits, business including Amazon, Apple, Facebook, Google, Microsoft
Further reading
Centre for Data Ethics and Innovation. (2020). What next for police technology and ethics?
Department for Digital, Culture, Media & Sport. (2018). Data Ethics Framework.
Heilweil, Rebecca. (2020). Why algorithms can be racist and sexist. Recode.
National Police Chiefs’ Council. (2020). Digital policing.
Partnership on AI. (2019). Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System.
Richardson, R., Schultz, J. M., & Crawford, K. (2019). Dirty data, bad predictions: How civil rights violations impact police data, predictive policing systems, and justice. NYUL Rev. Online, 94, 15.
Vincent, James. (2020). AI experts say research into algorithms that claim to predict criminality must end. The Verge.
West, S. M., Whittaker, M., & Crawford, K. (2019). Discriminating systems.