Mapping Oakland Police Use of Force

I was looking for some easy data to try out some new map visualization techniques and came upon the City of Oakland’s website, which includes a portal to Oakland Police data including crime statistics, 911 response data, and use of force. Taking a stab at the last of these, I found a year-by-year repository of data covering what the OPD defines as “physical or mechanical intervention used by officers to defend against, control, overpower, restrain, or overcome the resistance of an individual”.

They break down these “Interventions” into four main tiers, each described on the data download page. I was curious what the data would reveal and how it might lend itself to mapping and other visualization methods, so I downloaded the 2024 dataset and began poking around.

What does the 2024 Oakland Police Department Use of Force data include?

I uploaded the data to a Google Sheet for initial analysis and found almost 5,000 rows of incidents, with columns covering:

Where the incident took place (police beat, police area, street and block).
When the use of force (UoF) occurred.
Force level, type and description.
The “subject’s” race, gender and age.
The officer’s race, age, gender and years in service.
The “Presence reason”, or why the police were there in the first place.
The actions preceding the use of force.
Whether the subject was arrested, interviewed, required medical attention or hospitalization, or died, and whether there were visible signs of injury.

Much of the data relies on OPD self-reporting, for example, whether the use of force was reasonable (spoiler: the vast majority were deemed reasonable), or whether de-escalation or verbal persuasion was involved, which may or may not have been the case.

Processing the data in Kaggle

Next, I imported the data into Kaggle to start cleaning it and visualizing some initial trends using pandas, matplotlib and seaborn. The final result is in that Kaggle notebook, but getting there took a bit of wrangling and figuring out how best to show the basics of the data: which police beats see the most use-of-force incidents.
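A first pass at that question can be as simple as counting rows per beat. The snippet below is a minimal sketch on a toy table, not the notebook's actual code: the beat labels are invented, and only the Beat column name is taken from the real dataset.

```python
import pandas as pd

# Toy stand-in for the 2024 Use of Force table; the beat labels are invented.
uof_df = pd.DataFrame({"Beat": ["19X", "04X", "19X", "26Y", "19X", "04X"]})

# Count rows per beat; value_counts sorts from most to fewest incidents.
incidents_per_beat = uof_df["Beat"].value_counts()

# Label of the beat with the highest incident count.
top_beat = incidents_per_beat.idxmax()

print(incidents_per_beat.head(10))
print("Most active beat:", top_beat)
```

A series like this can then be handed to matplotlib or seaborn for a quick bar chart.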

It became clear that beat 19x was by far the most “Active” in terms of police violence, justified or not. But where was this? I realized I needed a map of police beats that I could match with the Use of Force data, and after some searching, I found a source I could be reasonably confident in: the Community Policing Beats page on the City of Oakland’s data portal, with a downloadable dataset in .csv format. The page says it was updated in 2024, which is good enough for me, since that is the year my data covers.

I uploaded it to Kepler.gl to see if the boundary data was valid and ready to map. Kepler is a lightweight service for mapping datasets that doesn’t require a sign-on, account creation or fees. That is not so good if you want to build a library of maps hosted with them, but great if you want something that is ready to go with many customizations and a decent number of features. You can export your map as JSON, or share it as an HTML page or image. With one quick upload, it was all there and ready to hover. As an Oakland resident, I don’t know if the beat borders match the real ones, but they did match, at least on visual inspection, what the City of Oakland had on its official website.

Map: Kepler.gl rendition of the beat boundaries

Combining Beat Boundary Data with the Use of Force pandas DataFrame

The beat boundary data was in CSV format, with one column (the_geom) containing WKT (Well-Known Text) representations of polygons. The Python Shapely library has a module called wkt, which I learned could handle the job of turning the MULTIPOLYGON strings into true geometry objects that GeoPandas could recognize.

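import geopandas as gpd
from shapely import wkt

# Parse each WKT string into a Shapely geometry object
beat_df['geometry'] = beat_df['the_geom'].apply(wkt.loads)
# Wrap the table in a GeoDataFrame and declare the coordinates as EPSG:4326
gdf = gpd.GeoDataFrame(beat_df, geometry='geometry', crs="EPSG:4326")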

The original geometry (the_geom) was stored as strings, so I added a column with the parsed geometry and created a new GeoDataFrame that could then be combined with the Use of Force dataframe. Setting the coordinate reference system (CRS) to EPSG:4326 declares the coordinates as WGS 84 longitude and latitude, which is what web-based mapping tools expect, so the data plots in the right place.
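To see what wkt.loads does in isolation, here is a minimal sketch using a single toy WKT string. The coordinates are invented (only roughly in the Oakland area) and are not taken from the city's dataset.

```python
from shapely import wkt

# Toy WKT string; the coordinates are invented, not a real Oakland beat.
toy_wkt = (
    "MULTIPOLYGON (((-122.20 37.75, -122.19 37.75, "
    "-122.19 37.76, -122.20 37.76, -122.20 37.75)))"
)

# Before parsing, the value is plain text; afterwards it is a geometry object.
geom = wkt.loads(toy_wkt)

print(type(toy_wkt).__name__, "->", geom.geom_type)
print("valid:", geom.is_valid)
print("bounds (minx, miny, maxx, maxy):", geom.bounds)
```

Checking is_valid on each parsed geometry is a cheap way to catch malformed polygons before mapping.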

The Use of Force dataframe has multiple incidents associated with each beat, so I knew I had to calculate per-beat totals for the number of incidents, hospitalizations, racial breakdown and more, in order to show them in a tooltip on a map.

# force_level_count, top_force_type and top_race_groups are helper functions (not shown here)
summary = uof_df.groupby('Beat').agg(
    TotalIncidents=('UoFNumber', 'count'),
    ForceLevel=('ForceLevel', force_level_count),
    TopForceType=('ForceType', top_force_type),
    TopRaceGroups=('SubjRace', top_race_groups),
    MedicalReq=('SubjMedicalReq', lambda x: (x == 'Yes').sum()),
    Hospitalized=('SubjHospitalized', lambda x: (x == 'Yes').sum()),
    Deaths=('SubjDied', lambda x: (x == 'Yes').sum())
).reset_index()
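The three helper functions passed to agg are not shown in this post. The sketch below is one plausible way they could be written, purely as an assumption: the function bodies, the output string formats, and all toy values (force levels, force types, race codes) are hypothetical.

```python
import pandas as pd

def force_level_count(levels: pd.Series) -> str:
    """Hypothetical helper: summarize how many incidents fall in each level."""
    counts = levels.value_counts().sort_index()
    return ", ".join(f"{level}: {n}" for level, n in counts.items())

def top_force_type(types: pd.Series) -> str:
    """Hypothetical helper: most frequent force type in the group."""
    modes = types.mode()
    return modes.iloc[0] if not modes.empty else "Unknown"

def top_race_groups(races: pd.Series, n: int = 3) -> str:
    """Hypothetical helper: the n most frequent race values with counts."""
    counts = races.value_counts().head(n)
    return ", ".join(f"{race} ({count})" for race, count in counts.items())

# Toy rows with invented values; only the column names follow the dataset.
uof_df = pd.DataFrame({
    "Beat": ["19X", "19X", "19X", "04X"],
    "UoFNumber": [1, 2, 3, 4],
    "ForceLevel": ["Level 4", "Level 4", "Level 3", "Level 4"],
    "ForceType": ["Takedown", "Takedown", "Taser", "Weaponless"],
    "SubjRace": ["B", "H", "B", "W"],
})

summary = uof_df.groupby("Beat").agg(
    TotalIncidents=("UoFNumber", "count"),
    ForceLevel=("ForceLevel", force_level_count),
    TopForceType=("ForceType", top_force_type),
    TopRaceGroups=("SubjRace", top_race_groups),
).reset_index()

print(summary)
```

Returning short strings from each helper keeps every beat on a single row, which is convenient when the values are destined for a tooltip.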

These totals were grouped by beat then merged with the geo dataframe.

gdf = gdf.merge(summary, on='Beat', how='left')
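With how='left', any beat that has no matching rows in the summary ends up with missing values. The following is a small sketch of how that could be checked, using plain pandas DataFrames as stand-ins for the GeoDataFrame and the summary table; the beat labels and counts are invented.

```python
import pandas as pd

# Plain DataFrames standing in for the beat GeoDataFrame and the summary table.
beats = pd.DataFrame({"Beat": ["04X", "19X", "35Y"]})
summary = pd.DataFrame({"Beat": ["19X", "04X"], "TotalIncidents": [3, 1]})

# indicator=True adds a _merge column showing where each row found a match.
merged = beats.merge(summary, on="Beat", how="left", indicator=True)

# Beats present in the boundary data but absent from the summary.
unmatched = merged.loc[merged["_merge"] == "left_only", "Beat"].tolist()

# Treat beats without any recorded incident as zero rather than NaN.
merged["TotalIncidents"] = merged["TotalIncidents"].fillna(0).astype(int)
merged = merged.drop(columns="_merge")

print("Beats with no incidents:", unmatched)
print(merged)
```

A long list of unmatched beats usually points to a labeling mismatch between the two sources (for example, differences in case or whitespace) rather than genuinely quiet beats.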

I then removed the string version of the geometry (the_geom) and had a version I could download as GeoJSON and use elsewhere.
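That cleanup and export step might look roughly like the sketch below. It uses a one-row toy GeoDataFrame with invented coordinates and counts, and the output filename in the comment is hypothetical.

```python
import json

import geopandas as gpd
from shapely import wkt

# One-row toy GeoDataFrame; coordinates and counts are invented.
toy_wkt = (
    "MULTIPOLYGON (((-122.20 37.75, -122.19 37.75, "
    "-122.19 37.76, -122.20 37.76, -122.20 37.75)))"
)
gdf = gpd.GeoDataFrame(
    {"Beat": ["19X"], "the_geom": [toy_wkt], "TotalIncidents": [3]},
    geometry=[wkt.loads(toy_wkt)],
    crs="EPSG:4326",
)

# Drop the redundant WKT string column; the parsed geometry column remains.
gdf = gdf.drop(columns=["the_geom"])

# Serialize to a GeoJSON FeatureCollection string.
geojson_text = gdf.to_json()
feature_collection = json.loads(geojson_text)
print(feature_collection["features"][0]["properties"])

# To write a file instead (hypothetical filename):
# gdf.to_file("beats_uof_2024.geojson", driver="GeoJSON")
```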

Below is the map made in Kepler.gl, with color fills tied to the number of incidents. I uploaded it to an AWS S3 bucket to share and embed, and yes, beat 19x stands out in stark contrast.

Trying out Folium for the first time

Next, I wanted to build a Mapbox version of my map with this data and discovered the Folium library, which thankfully comes preinstalled in Kaggle. Folium uses Leaflet.js under the hood, but lets you work in Python pretty seamlessly.

This Medium article is a good way to get started, and you can do it inside a Jupyter notebook of your choice. For this project, I barely scratched the surface of what you can do with it, but as long as you have a GeoJSON object, you can create live interactive choropleth maps directly inside a notebook.

One thing I had trouble with at first was formatting the border width and color. Adding a style_function lambda was the suggested solution (thanks, ChatGPT).

I wanted to include racial breakdowns for each beat, but the default tooltip did not align my values with their labels in a way that was clear, so the solution was to build a custom HTML tooltip for each row, and add that to the GeoPandas dataframe as a new column.
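Building that per-row HTML string could look something like the sketch below. The build_tooltip function, the tooltip_html column name and the layout are hypothetical, and a plain pandas DataFrame with invented values stands in for the GeoDataFrame.

```python
import html

import pandas as pd

def build_tooltip(row: pd.Series) -> str:
    """Hypothetical helper: format one beat's totals as a small HTML snippet."""
    return (
        f"<b>Beat {html.escape(str(row['Beat']))}</b><br>"
        f"Total incidents: {int(row['TotalIncidents'])}<br>"
        f"Top race groups: {html.escape(str(row['TopRaceGroups']))}"
    )

# Toy per-beat summary with invented values.
summary = pd.DataFrame({
    "Beat": ["19X", "04X"],
    "TotalIncidents": [3, 1],
    "TopRaceGroups": ["B (2), H (1)", "W (1)"],
})

# One HTML string per row, stored as a new column.
summary["tooltip_html"] = summary.apply(build_tooltip, axis=1)

print(summary.loc[0, "tooltip_html"])
```

Each string can then be attached to its polygon, for example by wrapping it in folium.Tooltip when adding that feature to the map.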

The end result is at the bottom of the Kaggle notebook, but I’ve also uploaded it to AWS to share below. This version does not have a title, legend, other labels or custom attributions, so I’ll add them here:

Oakland Police Department Use of Force by Community Policing Beats, 2024.

Source: City of Oakland

Next steps

Happy with these results, I decided to stop there, but clearly there is much more in the dataset that could be turned into an interactive map. The current map shows the number of incidents, but it could also highlight incidents that resulted in hospitalizations, or show incident counts for Levels 1 and 2, the most severe. It could also include all the years for which data is available.
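As one possible starting point for the hospitalization idea, the sketch below derives a per-beat hospitalization share from the summary columns. The HospitalizedShare column name and the toy numbers are assumptions for illustration.

```python
import pandas as pd

# Toy per-beat summary with invented counts.
summary = pd.DataFrame({
    "Beat": ["19X", "04X"],
    "TotalIncidents": [4, 2],
    "Hospitalized": [1, 0],
})

# Share of incidents in each beat that ended in hospitalization.
summary["HospitalizedShare"] = summary["Hospitalized"] / summary["TotalIncidents"]

print(summary)
```

A share like this could drive the fill color instead of the raw incident count, so that beats with few but severe incidents also stand out.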

In the meantime, I will keep Folium as one of my go-to tools for geo-based visualizations.