What we learned by analyzing 42 million Illinois traffic stop records


One of the more polarizing debates in America is whether police treat Black people fairly. 

In a Pew Research survey this year, 42% of U.S. adults said police do a good or excellent job “treating racial or ethnic groups equally.” Among Black respondents, the number dropped to 12%. 

A Gallup survey last year showed 50% of U.S. adults believed “major changes are needed to make policing better”; 72% of Black respondents agreed.

Illinois offers a wealth of information on this topic, due to a 2003 law that mandates data collection on traffic stops. The sponsor of that legislation, then-Illinois State Sen. Barack Obama, hoped the program would create transparency and repair trust in police.

On the 20th anniversary of the law’s passage, WBEZ, in partnership with the Investigative Project on Race and Equity, decided to see how things were working out. Our analysis found growing racial disparities in traffic stops, inconsistent compliance with the law by police and lax oversight by the state, among other findings.

To come up with these findings, we built a comprehensive, first-of-its-kind database of Illinois traffic stops that allowed us to analyze more than 42 million records from 19 years of data that had been submitted by roughly 1,100 police departments across the state.

That was just the beginning. To identify trends across the state, we had to overcome many obstacles: inconsistent record-keeping, missing data and changes in what the state included in the data files from year to year. 

Once we cleared those technical hurdles, we used the data to guide our reporting, to identify areas of high traffic stop activity and to tell the stories of the people most impacted by our findings.

Here’s how we did it.

It started with a simple FOIA

The Illinois Department of Transportation (IDOT) is responsible for keeping the state’s traffic stop records. WBEZ was able to acquire the entire available history of traffic stop data with a simple three-sentence Freedom of Information Act request.

That was the easy part. From that point on, things got a lot more complicated.

IDOT complied with the FOIA request, providing download links to a couple dozen annual data files, many of which topped 2 million rows of data each. 

More data, more problems

Most files included the entire state’s collection of traffic stops for a given year – but in several years, the Chicago Police Department filed its data separately, adding more files to the inventory. In 2007, for example, CPD scattered its data across three different files using different field names in each case. 

The traffic stop law took effect in 2004, but significant revisions in 2007 and 2012 changed what was included in the data. We had to account for those changes when combining the files so we could see broader historical trends over time. 
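Combining files like these means mapping each one's column names onto a single shared layout before stacking them. The sketch below shows one way to do that with Python's standard library. It is not the team's actual code: the column names in `COLUMN_MAP` are hypothetical placeholders, not IDOT's or CPD's real field names, and the files are passed as text strings rather than read from disk.

```python
import csv
import io

# Hypothetical mapping from each source file's column names to one shared schema.
# These names are placeholders; they are not IDOT's or CPD's actual field names.
COLUMN_MAP = {
    "DATEOFSTOP": "stop_date",
    "STOP_DATE": "stop_date",
    "RACE": "driver_race",
    "DRIVER_RACE": "driver_race",
    "AGENCYNAME": "agency",
    "AGENCY_NAME": "agency",
}

SHARED_FIELDS = ["stop_date", "driver_race", "agency"]


def harmonize(rows):
    """Rename each row's keys to the shared schema, dropping unmapped columns."""
    for row in rows:
        renamed = {COLUMN_MAP[name]: value for name, value in row.items() if name in COLUMN_MAP}
        # Fill any shared field this file lacks, so every output row has the same keys.
        yield {field: renamed.get(field) for field in SHARED_FIELDS}


def combine(csv_texts):
    """Read several CSV files (given as text) with differing headers into one list."""
    combined = []
    for text in csv_texts:
        combined.extend(harmonize(csv.DictReader(io.StringIO(text))))
    return combined


file_a = "DATEOFSTOP,RACE,AGENCYNAME\n2007-01-05,2,CPD\n"
file_b = "STOP_DATE,DRIVER_RACE,AGENCY_NAME,EXTRA_FIELD\n2007-02-10,1,CPD,x\n"
file_c = "STOP_DATE,AGENCY_NAME\n2007-03-15,CPD\n"  # this file has no race column
rows = combine([file_a, file_b, file_c])
print(len(rows))  # 3
```

Dropping unmapped columns and filling absent ones with `None` keeps every combined row the same shape, so a field that one revision added or removed shows up as a gap rather than an error.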

IDOT maintains three “data dictionaries” – one for each major revision in the law – to explain the names of the data fields and the coded values associated with them.
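Because a code's meaning can depend on which version of the law was in force, one way to decode values is to pick the dictionary by year first, then look up the code. In this sketch, the era boundaries assume each revision applies from the start of 2007 and 2012, and every code and label in `RACE_CODES` is a placeholder invented for illustration, not a value from IDOT's dictionaries.

```python
# Hypothetical lookup tables keyed by data-dictionary era. The codes and labels
# are placeholders invented for illustration, not IDOT's actual values.
RACE_CODES = {
    "2004-2006": {"1": "White", "2": "Black", "3": "Hispanic"},
    "2007-2011": {"1": "White", "2": "Black", "3": "Hispanic", "4": "Asian"},
    "2012-present": {"W": "White", "B": "Black", "H": "Hispanic", "A": "Asian"},
}


def era_for_year(year: int) -> str:
    """Choose a dictionary era, assuming each revision applies from Jan. 1 of its year."""
    if year < 2004:
        raise ValueError("the traffic stop law took effect in 2004")
    if year < 2007:
        return "2004-2006"
    if year < 2012:
        return "2007-2011"
    return "2012-present"


def decode_race(year: int, code: str) -> str:
    """Translate a raw race code into a label, flagging codes the era's dictionary lacks."""
    return RACE_CODES[era_for_year(year)].get(code.strip(), "UNKNOWN CODE")


print(decode_race(2005, "2"))  # Black
print(decode_race(2015, "B"))  # Black
print(decode_race(2015, "2"))  # UNKNOWN CODE
```

Returning an explicit `"UNKNOWN CODE"` instead of raising makes stray or era-mismatched codes easy to count and investigate after a full load.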

Each year, IDOT publishes an executive summary explaining the research process, along with detailed tables covering statewide and agency-level data points. We referred to these documents when trying to understand the state’s methodology and terminology.

We also read up on the findings of the Illinois Criminal Justice Information Authority’s Traffic and Pedestrian Stop Data Use and Collection Task Force.

The right tools for the job

As data journalists, we appreciate spreadsheets, but managing 42.5 million records pushed us way past the limits of Microsoft Excel, which holds at most 1,048,576 rows per worksheet. We needed database management software capable of not only loading but also processing high volumes of information.

We settled on SQLite. It’s fast, simple and can churn through millions of rows of data in seconds. But SQL isn’t well suited to complicated tasks, like cleaning up messy data or analyzing changes over time. For that, we relied on a general-purpose programming language called Python.
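Loading annual files into SQLite might look something like this minimal sketch, which uses Python's built-in `sqlite3` and `csv` modules. It is not the team's actual loader, and the table and column names are hypothetical. Reading in fixed-size batches keeps memory use flat, and committing once instead of after every row is much of what makes bulk inserts into SQLite fast.

```python
import csv
import io
import sqlite3
from itertools import islice

# Hypothetical, simplified schema; a real table would have many more columns.
SCHEMA = "CREATE TABLE IF NOT EXISTS stops (agency TEXT, year INTEGER, driver_race TEXT)"
INSERT = "INSERT INTO stops (agency, year, driver_race) VALUES (:agency, :year, :driver_race)"


def load_csv(conn: sqlite3.Connection, csv_file, batch_size: int = 50_000) -> int:
    """Stream a CSV into SQLite in batches and commit once at the end."""
    conn.execute(SCHEMA)
    reader = csv.DictReader(csv_file)
    total = 0
    while True:
        batch = list(islice(reader, batch_size))  # hold at most batch_size rows in memory
        if not batch:
            break
        conn.executemany(INSERT, batch)  # named placeholders are filled from each row dict
        total += len(batch)
    conn.commit()  # one commit avoids per-row transaction overhead
    return total


conn = sqlite3.connect(":memory:")
sample = io.StringIO("agency,year,driver_race\nAgency A,2022,Black\nAgency B,2022,White\n")
loaded = load_csv(conn, sample, batch_size=1)
print(loaded, conn.execute("SELECT COUNT(*) FROM stops").fetchone()[0])  # 2 2
```

The `year` values arrive from the CSV as text; SQLite's `INTEGER` column affinity converts them to integers on insert.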

We used a software-development framework called Django that combines the power of SQL with the flexibility of Python. Django has proven to be a reliable platform for many data journalism projects over the years, including The Washington Post’s Fatal Force project and The Chicago Reporter’s Settling for Misconduct.

We backed up copies of the database regularly to a file repository. This allowed members of the reporting team to collaborate efficiently, knowing we all had access to the latest revisions as the database evolved. We analyzed the data and shared the findings in Google Docs, showing our work with code snippets written in SQL, the structured query language that SQLite is named for.
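Regular backups of a SQLite file can be scripted with the online backup API that Python's `sqlite3` module exposes as `Connection.backup`. This is a hedged sketch: the timestamped file names and the local `backups` folder are assumptions standing in for whatever shared repository a team actually uses.

```python
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_database(source_path, backup_dir) -> Path:
    """Copy a SQLite database to a timestamped file using SQLite's online backup API."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"traffic_stops_{stamp}.sqlite"  # hypothetical naming scheme
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(target)
    try:
        with dst:
            src.backup(dst)  # copies the database even while it is open elsewhere
    finally:
        dst.close()
        src.close()
    return target


# Demo with a throwaway database in a temporary directory.
workdir = Path(tempfile.mkdtemp())
db_path = workdir / "stops.sqlite"
with sqlite3.connect(db_path) as conn:  # the with block commits; it does not close
    conn.execute("CREATE TABLE stops (agency TEXT, year INTEGER)")
    conn.execute("INSERT INTO stops VALUES ('Agency A', 2022)")
conn.close()

backup_path = backup_database(db_path, workdir / "backups")
with sqlite3.connect(backup_path) as check:
    restored = check.execute("SELECT agency, year FROM stops").fetchone()
check.close()
print(restored)  # ('Agency A', 2022)
```

Using the backup API rather than copying the file byte for byte avoids capturing a database midway through a write.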

For more complicated analyses, we wrote Python code, using Django’s object-relational mapper to automate large numbers of SQL queries. We stored the most critical code, used to build the database and come up with our findings, in a GitHub repository so we had a clear record of how we got every data-driven finding in our stories.
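Automating large numbers of queries often amounts to running one parameterized query per agency and year in a loop. The team did this through Django's ORM; the sketch below uses plain `sqlite3` so it runs without a Django project. The table, its columns, the function names and the `Stop` model mentioned in the comment are all hypothetical.

```python
import sqlite3


def stops_by_race(conn, agency, year):
    """One parameterized query: stop counts by driver race for a single agency-year."""
    # A roughly equivalent Django ORM call on a hypothetical Stop model would be:
    # Stop.objects.filter(agency=agency, year=year).values("driver_race").annotate(n=Count("id"))
    rows = conn.execute(
        "SELECT driver_race, COUNT(*) FROM stops "
        "WHERE agency = ? AND year = ? GROUP BY driver_race ORDER BY driver_race",
        (agency, year),
    ).fetchall()
    return dict(rows)


def build_report(conn):
    """Run the per-agency query for every agency-year present in the table."""
    pairs = conn.execute(
        "SELECT DISTINCT agency, year FROM stops ORDER BY agency, year"
    ).fetchall()
    return {(agency, year): stops_by_race(conn, agency, year) for agency, year in pairs}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stops (agency TEXT, year INTEGER, driver_race TEXT)")
conn.executemany(
    "INSERT INTO stops VALUES (?, ?, ?)",
    [("Agency A", 2022, "Black"), ("Agency A", 2022, "Black"),
     ("Agency A", 2022, "White"), ("Agency B", 2021, "White")],
)
report = build_report(conn)
print(report[("Agency A", 2022)])  # {'Black': 2, 'White': 1}
```

A single `GROUP BY agency, year, driver_race` query would be faster; the per-pair loop trades speed for a reusable function that can be checked one agency at a time.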

Missing info

A 42-million-record database may appear to be comprehensive, but we found plenty of holes in the information IDOT keeps on file. Here are some highlights:

Noncompliant agencies: The share of police departments that don’t report traffic stops to the state has grown over time. Last year, the data show, 1 in 5 law enforcement agencies failed to submit their traffic stop data as required by the law. Some missed the state’s deadline or submitted incomplete data.

Incomplete data: The Cook County Sheriff, for example, has partial years of data on file for 2018 and 2021, and unusually low stop counts in other years. The sheriff’s office said it submitted data to the state for 2021 but that it wasn’t processed or included in IDOT’s annual traffic stop report. We couldn’t find it.

Another example: The Chicago Office of Emergency Management and Communications logs more than 100,000 more Chicago police traffic stops than CPD reports to the state, according to an investigation by Injustice Watch and Block Club Chicago earlier this year. WBEZ verified that the Chicago police data sent to the state for 2022 has nearly 150,000 fewer records than OEMC’s log of traffic stops.

One traffic stop: West suburban La Grange submitted a single traffic stop to the state for last year. In 2021, it submitted no data at all.

Village officials told us that staff turnover and computer problems caused the oversight, but neither La Grange nor IDOT officials could explain how the police department managed to submit a single traffic stop, which technically made the village compliant with the state study.

In a statement, an IDOT spokesperson described La Grange’s situation as “an oversight” and added, “We are not aware of any other agencies that submitted just one report.” But WBEZ identified 11 agencies that reported just one traffic stop in 2022, and 155 instances since the program started of an agency reporting just one stop in a year.
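Checks like these come down to counting records per agency and year. Here is a hedged sketch of two of them: finding agency-years with exactly one reported stop, and finding agencies that submitted nothing for a given year. The table and agency names are hypothetical. The roster is also an assumption: the data files only contain agencies that did report, so spotting nonreporters requires a separate list of agencies expected to report, which this example invents.

```python
import sqlite3


def one_stop_agency_years(conn):
    """Agency-years in which exactly one traffic stop was reported."""
    return conn.execute(
        "SELECT agency, year FROM stops GROUP BY agency, year "
        "HAVING COUNT(*) = 1 ORDER BY year, agency"
    ).fetchall()


def missing_agencies(conn, roster, year):
    """Agencies on a roster that reported no stops at all for the given year."""
    reported = {
        agency
        for (agency,) in conn.execute("SELECT DISTINCT agency FROM stops WHERE year = ?", (year,))
    }
    return sorted(set(roster) - reported)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stops (agency TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO stops VALUES (?, ?)",
    [("Agency A", 2022), ("Agency B", 2022), ("Agency B", 2022), ("Agency C", 2021)],
)
roster = ["Agency A", "Agency B", "Agency C"]  # hypothetical list of agencies required to report

print(one_stop_agency_years(conn))  # [('Agency C', 2021), ('Agency A', 2022)]
missing = missing_agencies(conn, roster, 2022)
print(missing, len(missing) / len(roster))  # ['Agency C'] 0.3333333333333333
```

Neither check catches partial-year submissions like the Cook County Sheriff's; those would need stop counts broken down by month or compared against an agency's own history.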

Estimates and reality checks

IDOT hires a consultant to estimate demographics of driving populations for each police jurisdiction in Illinois. The consultant, Seattle-based Mountain-Whisper-Light, provides a detailed methodology for its statistical analysis – but the results seem far out of line with U.S. Census data.

For example, the consultant estimated the driving population of Illinois is 21% Black, even though the adult population is less than 14% Black.

These demographic estimates may not get as much attention as the actual traffic stop data, but they serve an important purpose. To the extent state officials are analyzing police agency traffic stop data, they are supposed to use the consultant’s estimates for assessing whether racial disparities exist.

In other words, the larger the estimated number of Black drivers on the road, the less significant any increase in stops of Black drivers appears.
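A simple way to see the effect is to divide a group's share of stops by its share of the benchmark population, where 1.0 means stops are proportional. This ratio is a common, basic disparity measure, not necessarily the calculation IDOT or its consultant uses. The sketch plugs in figures reported in this story (Black drivers in 30.5% of stops, a 13.6% Black adult population, a 21% consultant estimate) purely for illustration; those numbers come from different years and sources, so the output is not a finding.

```python
def disparity_ratio(stop_share: float, benchmark_share: float) -> float:
    """Group's share of stops divided by its share of a benchmark population.

    1.0 means stops are proportional to the benchmark; 2.0 means twice the expected share.
    """
    if not 0 < benchmark_share <= 1:
        raise ValueError("benchmark_share must be a proportion between 0 and 1")
    return stop_share / benchmark_share


stop_share = 0.305  # Black drivers' share of statewide stops in recent years
census_ratio = disparity_ratio(stop_share, 0.136)     # Black share of the adult population
consultant_ratio = disparity_ratio(stop_share, 0.21)  # consultant's driving-population estimate
print(f"{census_ratio:.2f} {consultant_ratio:.2f}")   # 2.24 1.45
```

Swapping the benchmark alone shrinks the apparent overrepresentation from more than twice the expected share to under one and a half times, with no change in who was actually stopped.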

As recently as 2020, Mountain-Whisper-Light estimated just under 14% of Illinois drivers were Black, which closely matches U.S. Census data. It added half a million Black drivers to its 2021 estimates. In the 2023 report, the consultant noted that it changed its methodology in the last couple of years to include a statistical analysis of not-at-fault drivers in vehicle traffic crash reports, saying that it provided a more accurate representation of the driving-age population. 

WBEZ requested the estimated driving populations for each jurisdiction in spreadsheet form, which IDOT denied, claiming repeatedly that the “data requested is not compiled in a spreadsheet.” Months later, IDOT acknowledged that the data did in fact exist and provided several years’ worth of estimates in spreadsheet files.

The big-picture findings

Ultimately, we were able to compile a reliable database, given all the constraints mentioned above. As we explored the complete archive of records, we found the racial gap has been widening. In the last two years, stops involving Black drivers have topped 30.5% of all traffic stops statewide, up from 17.5% in 2004, the first year data was released. The state’s adult population is 13.6% Black.

Our analysis also revealed that the problem is statewide. In Chicago, stops of Black drivers in 2022 were more than four times those of white drivers, even though the city’s white adult population is larger than its Black adult population. In the rest of the state, Black drivers make up 9.5% of the adult population but 21.5% of all traffic stops outside of Chicago.
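Shares like these are a grouped average of a yes/no flag. The sketch below assumes a simplified `stops` table with hypothetical `agency`, `year` and `driver_race` columns, and treats stops by a single hypothetical Chicago agency label as "Chicago" and everything else as the rest of the state, which may simplify how the story actually drew that line.

```python
import sqlite3

CHICAGO = "Chicago Police"  # hypothetical agency label used in this example


def black_share_by_year(conn, in_chicago: bool):
    """{year: share of stops involving Black drivers} for Chicago or the rest of the state."""
    rows = conn.execute(
        # (agency = ?) evaluates to 1 or 0 in SQLite, so it can be compared to a flag.
        "SELECT year, AVG(CASE WHEN driver_race = 'Black' THEN 1.0 ELSE 0.0 END) "
        "FROM stops WHERE (agency = ?) = ? GROUP BY year ORDER BY year",
        (CHICAGO, 1 if in_chicago else 0),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stops (agency TEXT, year INTEGER, driver_race TEXT)")
conn.executemany(
    "INSERT INTO stops VALUES (?, ?, ?)",
    [(CHICAGO, 2022, "Black")] * 3 + [(CHICAGO, 2022, "White")]
    + [("Agency B", 2022, "Black")] + [("Agency B", 2022, "White")] * 3,
)
chicago = black_share_by_year(conn, in_chicago=True)
elsewhere = black_share_by_year(conn, in_chicago=False)
print(chicago, elsewhere)  # {2022: 0.75} {2022: 0.25}
```

Averaging a 1.0/0.0 flag gives the share directly and avoids dividing two separate counts.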

On-the-ground reporting

At a certain point, you get as far as the data can take you. Then you need to take your reporting into the real world and talk with the people most impacted.

The data pointed us in the right direction. We searched for the Chicago police beat with the most stops since CPD began shifting its strategy toward traffic stops in 2016.

That’s how we found District 15, Beat 33 in the South Austin neighborhood. On a Saturday morning in September, my colleague Michael Liptrot and I parked near the intersection of Madison Street and Laramie Avenue and observed five traffic stops in 25 minutes. A few took several minutes, but most were over as soon as the officer checked the drivers’ identification.
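Finding the busiest beat is a ranking query over stops since a cutoff year. This sketch assumes a hypothetical `cpd_stops` table with a `beat` column; the story doesn't say which file supplied beat-level locations, so the table, columns and beat labels are placeholders.

```python
import sqlite3


def busiest_beat(conn, since_year: int = 2016):
    """Beat with the most recorded stops from since_year onward, as (beat, count)."""
    return conn.execute(
        "SELECT beat, COUNT(*) AS n FROM cpd_stops WHERE year >= ? "
        "GROUP BY beat ORDER BY n DESC, beat LIMIT 1",  # beat breaks ties deterministically
        (since_year,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpd_stops (beat TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO cpd_stops VALUES (?, ?)",
    [("A", 2014), ("A", 2015), ("A", 2015), ("A", 2017), ("B", 2017), ("B", 2018)],
)
print(busiest_beat(conn))                   # ('B', 2)
print(busiest_beat(conn, since_year=2000))  # ('A', 4)
```

The cutoff matters: in this toy data, the busiest beat across all years is not the busiest after 2016.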

“Immediate fear”

We asked Black drivers what it’s like to get pulled over in one of the city’s most heavily patrolled neighborhoods.

Going into those interviews, we knew from the data that in 2022, for the first time on record, more than half of Black drivers statewide were stopped for non-moving violations, like talking on the phone, not wearing a seatbelt or expired tags. 

Legal experts say these encounters are potentially “pretextual stops,” where low-level traffic violations are used as an excuse to make contact with drivers. The fivefold increase in the number of Black drivers stopped for non-moving violations and let go with a warning only raised more questions about the intent of the stops. 
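A claim like "for the first time on record, more than half" can be checked by computing the share per year and scanning for the first year strictly above 50%. The column names and the `'non-moving'` label below are hypothetical; the state's data dictionaries define how violations are actually coded.

```python
import sqlite3


def nonmoving_share_by_year(conn, race: str):
    """{year: share of a group's stops that were for non-moving violations}."""
    rows = conn.execute(
        "SELECT year, AVG(CASE WHEN violation_type = 'non-moving' THEN 1.0 ELSE 0.0 END) "
        "FROM stops WHERE driver_race = ? GROUP BY year ORDER BY year",
        (race,),
    ).fetchall()
    return dict(rows)


def first_year_above(shares: dict, threshold: float = 0.5):
    """Earliest year whose share strictly exceeds the threshold, or None."""
    for year in sorted(shares):
        if shares[year] > threshold:
            return year
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stops (driver_race TEXT, year INTEGER, violation_type TEXT)")
conn.executemany(
    "INSERT INTO stops VALUES (?, ?, ?)",
    [("Black", 2021, "moving"), ("Black", 2021, "non-moving"),
     ("Black", 2022, "non-moving"), ("Black", 2022, "non-moving"),
     ("Black", 2022, "moving"), ("White", 2022, "non-moving")],
)
shares = nonmoving_share_by_year(conn, "Black")
print(first_year_above(shares))  # 2022
```

The strict `>` matters: a year at exactly 50% is not "more than half," so 2021 in this toy data does not qualify.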

While millions of the encounters went nowhere, drivers said that they still cause harm.

“For lack of a better word, it’s fear,” Edward Robinson said while wiping down his car at a South Cicero Avenue car wash. “Immediate fear. Because you just don’t know what the outcome gonna be, no matter how straight and narrow you are.” 

He explained how he had been stopped and searched, with his son in the back seat, just the day before.

“It’s this color, it’s because I’m this color,” Robinson added. “And the history behind that. I always got that feeling when I’m pulled over.”

Working in the open

In addition to our stories reporting on the findings of our investigation, WBEZ and the Investigative Project are working to make our data and research as accessible as possible.

To start, we’re publishing an interactive database that provides a view of traffic stop patterns in each of the 1,000 Illinois law enforcement agencies. Readers can see at a local level how the number of stops – and racial breakdowns – have changed over the past two decades.

We also have a survey for readers to share their experiences with traffic stops, which will help our future reporting.

And we’ve released the entire 42-million-row database on the Big Local News platform at Stanford University. This means journalists, researchers, policymakers and community groups will have access to this first-of-its-kind resource. 

We plan to keep these resources updated when the state releases new data annually.

Along with the data, we’re open-sourcing the code we used to build and analyze it. This level of transparency makes the information more accessible and helps ensure accuracy.

And we’re planning to roll out some training materials and events that will help make sense of the data for those who are interested. We hope to serve a wide range of technical expertise.

Credits

This project relied on the support of many professionals working behind the scenes, including: Alden Loury, Amy Qin, Andjela Padejski, Angela Caputo, Claire Kurgan, Dillon Kelley, Jessica Alvardo-Gamez, Jim Ylisela, Justine Tobiasz, Manuel Martinez, Matt Kiefer, Leslie Hurtado, Michael Liptrot, Noah Jennings, Ola Giwa, Pat Nabong, Patrick Smith, Saman Creel, Taylor Moore, Tenaysia Fox, Tyler Pasciak LaRiviere and Zahid Khalil. 

Matt Kiefer