The Dangers of Automating Police Investigations
Human reasoning can be flawed, and anything built on top of that reasoning will, by definition, also be flawed. As we head full speed into an AI / Machine Learning future – ready or not – I couldn’t help but notice a recent article about Patternizr, a pattern-recognition tool created by the NYPD to help with investigative legwork. The goal of the program is to help the NYPD cross-reference crimes across precincts, look for patterns in crimes, and dramatically speed up the process of finding connections. The data set apparently goes back ten years.
It makes complete sense, provided that the underlying historical data being used to “train” the model(s) is as free of bias as possible. In addition, the code of Patternizr itself must be free of bias. This is a very real concern: as someone who has had to review other people’s source code, I can tell you that a code review is always necessary. In fact, Microsoft has a dedicated team that goes through the source code for its various products to rename and remove offensive variable names and comments before that source code is released.
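To make the “garbage in, garbage out” worry concrete, here is a minimal sketch – with entirely hypothetical data, and no relation to Patternizr’s actual method – of how skew in historical records propagates into a model trained on them:

```python
# Hypothetical ten-year arrest log: precinct "A" was patrolled twice as
# heavily, so its crimes appear twice as often in the records regardless
# of the underlying crime rates.
from collections import Counter

historical_arrests = ["A"] * 200 + ["B"] * 100

counts = Counter(historical_arrests)
total = sum(counts.values())

def precinct_prior(precinct):
    # A naive model scores a new, unsolved crime by how often its precinct
    # appears in the training data -- so enforcement bias in the records
    # is silently converted into "pattern" signal.
    return counts[precinct] / total

print(precinct_prior("A"))  # 2/3: double the weight, purely from patrol bias
print(precinct_prior("B"))  # 1/3
```

Nothing in the scoring function is malicious; the bias lives entirely in the data it was handed, which is exactly why the provenance of the training set matters as much as the code.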
We’ve read the stories of how flawed the AI / Machine Learning algorithms from Google and others have been. While those are serious enough issues on their own, the problem takes on a whole new urgency when law enforcement uses such tools to partially automate the investigative process, because that information will then be handed to prosecutors.
I have concerns and a few questions:
Who has vetted the source code for Patternizr?
Who has vetted the underlying datasets? Remember, garbage in, garbage out.
How closely will the system’s results be monitored?
Does the system take into account court rulings that have, in effect, negated previous investigations as they were found to be unconstitutional or in some other way incorrect?
Does the system make available, upon request and instantaneously, the sources from which it makes its assumptions?
If, at year two, some of the source material is found to be incorrect or inappropriate for the system’s learning process, will the results from year one be automatically reviewed and shared with all relevant parties – including anyone currently serving a custodial sentence based upon the flawed information?
Will this system make detectives less capable and overly reliant on automation?
Has the system been pentested to check for vulnerabilities?
Is Patternizr tied into multiple NYPD / Federal data sources?
If so, what is the likelihood of a catastrophic data or systems breach via Patternizr?
Is the system accessible from the Internet?
Don’t get me wrong: I totally understand the need for this type of system. However, the lack of transparency from the NYPD is more than a little disturbing. More communication from the NYPD and City Hall would go a long way toward making the public feel more at ease with this system.