Labor Day is coming up, and here at MIOsoft, we have Accident Explorer on the brain again. If we’re not driving for the holiday weekend ourselves, we know someone who is.
In our last post, we looked at 15 potentially significant clusters of fatal vehicle accidents throughout the US, using all of the NHTSA’s Fatality Analysis Reporting System data from 2008-2014.
We tried to distribute our picks throughout the country... but with only 15 picks over 3.8 million square miles, there were huge areas of the US where we didn’t identify a (relatively) nearby 2008-2014 cluster.
Since the 2008-2014 Accident Explorer version isn’t publicly available, we wanted to wrap up this chapter of the Accident Explorer updates by identifying a potentially significant accident cluster, and therefore a potentially dangerous place to drive, in every state.
Check it out below: just click on a state to view that state's accident cluster.
You’ll probably notice that some states’ featured clusters are smaller than others.
We strongly suspect that this is due to population density factors, and that it would be the case even if our road name data (and all the other data) were perfect.
But one key difference in how we approached this post is that we tweaked the clustering algorithm before we looked for these clusters.
In the live Accident Explorer, accidents with different road names can be part of the same cluster if they’re within 0.125 mi of another accident in the cluster, while accidents with the same road name can be up to 0.25 mi apart. Read more details about the algorithm here. As of May of this year, clusters are created only over three-year intervals.
For our previous post, we used the same algorithm as the live Accident Explorer, minus the time restriction. Read more here.
But for this post, we created a version of Accident Explorer where the clustering algorithm uses only one ε, 0.125 mi, and all accidents in a cluster must have the same road name.
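If you like to see rules as code, here’s a rough Python sketch of what this post’s single-ε variant boils down to. It’s an illustration only, not our actual implementation: the field names (lat, lon, road) are hypothetical, the haversine distance and the “keep groups of two or more” cutoff are assumptions we made for the sketch, and it groups accidents transitively into connected components, which is one possible reading of the rule.

```python
import math
from collections import defaultdict

EPSILON_MILES = 0.125          # the single epsilon used in this post's variant
EARTH_RADIUS_MILES = 3958.8    # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def cluster_accidents(accidents):
    """Group accidents into clusters under the single-epsilon rule.

    `accidents` is a list of dicts with 'lat', 'lon', and 'road' keys
    (hypothetical field names). Two accidents are neighbors if they share
    a road name AND are within EPSILON_MILES of each other; a cluster is
    a connected component of that neighbor graph.
    """
    n = len(accidents)
    # Build the neighbor graph (pairwise check is fine for a state-sized
    # extract; a spatial index would be the obvious optimization).
    neighbors = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = accidents[i], accidents[j]
            if a["road"] != b["road"]:
                continue
            if haversine_miles(a["lat"], a["lon"], b["lat"], b["lon"]) <= EPSILON_MILES:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Connected components via a simple stack-based traversal.
    clusters, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            cur = stack.pop()
            component.append(accidents[cur])
            for nxt in neighbors[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) > 1:   # assumption: singletons don't count as clusters
            clusters.append(component)
    return clusters
```

The neighbor test is where all the differences from the live algorithm live: one distance threshold instead of two, and no mixing of road names within a cluster.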
So how did we make a pick for each state?
To start, we decided not to automate it: all our picks were a judgment call. We think we had good reasoning, but understanding the limitations of your process is key to understanding any data-driven project.
We looked at the cluster data (not the map) for each state, then picked a cluster whose combination of size, recency, and longevity suggested to us that it was significant in some way.
We did try not to duplicate our picks from the previous post, although for Maine and South Carolina, we ended up making the same choice.
But you’ll notice that because of our change to the algorithm, the Maine and South Carolina clusters each have one fewer accident than they did in our previous post.
That’s because the Maine accident on April 17, 2010, and the South Carolina accident on June 23, 2014, are both more than 0.125 miles from any other accident in the cluster.*
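In terms of the sketch above (same hypothetical field names and haversine helper), the single-ε rule reduces to a check like this, and an accident with no same-road neighbor within 0.125 mi simply falls out of the cluster:

```python
def qualifies_for_cluster(candidate, cluster, epsilon=EPSILON_MILES):
    """True if `candidate` has at least one same-road cluster member
    within `epsilon` miles (field names and helper from the sketch above)."""
    return any(
        candidate["road"] == member["road"]
        and haversine_miles(candidate["lat"], candidate["lon"],
                            member["lat"], member["lon"]) <= epsilon
        for member in cluster
    )
```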
Finally, if you’ve been following Accident Explorer closely, you might notice that there’s one other cluster in this list that we’ve (sort of) encountered before: the cluster in Elizabeth, NJ.
This area is where, in the very first Accident Explorer post, we saw the effects that different names for the same road had on clustering.
In that post, a group of accidents that were in physical proximity weren’t clustered in the Accident Explorer app due to road name differences (US-1 vs Spring).
For the first post, we only had 2010-2013 data. Since then, we’ve added data for 2008, 2009, and 2014. As a result, the live Accident Explorer now shows a cluster in Elizabeth for both the 2011-2013 and 2012-2014 intervals.
But here’s the data for the cluster that we showed in the map above:
If you’ve really been following closely, you might notice that one of the US-1 accidents from that first post isn’t there:
This is because, once again, that accident is too far away to be in the cluster, according to the new algorithm.
So there are three examples—from Maine, South Carolina, and New Jersey—of how a relatively minor change to our Accident Explorer algorithm can change a cluster!
For these three clusters, it’s a fairly minor change. But imagine how the main Accident Explorer app might change—especially the smaller clusters—if we implemented the new algorithm there.
(We’re not going to do that… at least not in the near future.)
This is a great example of why, for any data-driven project that you’ll use to make decisions, it’s so important to understand what your project is doing and how it works.
Imagine if you were using Accident Explorer to drive decisions for your state’s DOT,** but you didn’t know how we’d defined clusters. Or about the potential for the differing road names in the FARS data to affect the clustering. Or whether transitive matching was being used, or not.
You’d have data to inform your decision, sure. But in a way, you’d still be flying blind. You wouldn’t know the factors that went into clustering the data, so you couldn’t frame your consideration of the results to take those factors into account.
If you wanted to make changes to the algorithm, you wouldn’t even know what kinds of changes to ask for.
That’s why we’re not fans of the black-box approach to data.
For more insight into Accident Explorer, our clustering algorithm, and Accident Explorer’s data, see our previous posts:
15 of the worst places to drive in the United States
Accident Explorer gets a turbo-boost
Accident Explorer: Machine learning with traffic accident data
*We also discovered that we’d accidentally cropped the April 17, 2010 accident out of the image for Maine in the previous post, so if you’re comparing the screenshots, that’s why you won’t see the missing accident in the earlier image either.
**Just to be clear: Accident Explorer is a demo for our technology. It is not a decision-making tool. Accident Explorer and its information are provided “as is” without warranty of any kind, express or implied, including but not limited to any warranty of fitness for a particular purpose. If you're with a DOT or any other organization and you want an Accident-Explorer-like project to make decisions with, don't use the Accident Explorer demo; email us instead!