Oleh Sekirkin

Business Professional & Data Enthusiast

Philadelphia, Pennsylvania, United States

Motor Vehicle Collisions

Python Folium Statistical Testing Tools Data Analysis

In this article, we'll explore an extensive Exploratory Data Analysis (EDA) project focused on Motor Vehicle Collisions. The project aims to uncover patterns, trends, and insights from crash data, which can be invaluable for improving road safety and informing policy decisions. We'll walk through each step of the analysis, explaining the methods used and the findings uncovered, before concluding with key insights and recommendations.

    1. Data Understanding and Preparation
    2. Descriptive Statistics
    3. Geospatial Analysis
    4. Injury and Fatality Analysis
    5. Contributing Factors Analysis
    6. Vehicle Type Analysis
    7. Conclusion, Key Insights and Recommendations

1. Data Understanding and Preparation

The first step in any data analysis project is to understand and prepare the data. This phase involved loading the dataset using pandas and examining its structure, checking for missing values and deciding on appropriate handling methods, verifying data types and converting them if necessary, identifying and handling outliers or anomalies, and creating derived features (e.g., day of week, month, and year from the crash date). Missing numerical values were filled with the median, and missing categorical values were filled with the mode.
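
As a minimal sketch of the preparation steps described above, the snippet below fills missing values and derives calendar features. The column names (`CRASH DATE`, `BOROUGH`, `NUMBER OF PERSONS INJURED`) and the helper `prepare_crashes` are assumptions made for illustration, not necessarily the ones used in the project.

```python
import pandas as pd


def prepare_crashes(df: pd.DataFrame, date_col: str = "CRASH DATE") -> pd.DataFrame:
    """Fill missing values and add calendar features (illustrative sketch)."""
    out = df.copy()

    # Parse the crash date; unparseable values become NaT instead of raising.
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")

    # Numerical columns: fill gaps with the column median.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())

    # Remaining (categorical) columns: fill gaps with the most frequent value.
    cat_cols = out.select_dtypes(exclude=["number", "datetime", "bool"]).columns
    for col in cat_cols:
        mode = out[col].mode(dropna=True)
        if not mode.empty:
            out[col] = out[col].fillna(mode.iloc[0])

    # Derived calendar features.
    out["year"] = out[date_col].dt.year
    out["month"] = out[date_col].dt.month
    out["day_of_week"] = out[date_col].dt.day_name()
    return out


# Hypothetical three-row sample with one missing value per column type.
sample = pd.DataFrame({
    "CRASH DATE": ["01/01/2024", "01/02/2024", "01/03/2024"],
    "BOROUGH": ["BROOKLYN", None, "BROOKLYN"],
    "NUMBER OF PERSONS INJURED": [0.0, None, 2.0],
})
prepared = prepare_crashes(sample)
print(prepared)
```

Median and mode imputation keep every row in the dataset, at the cost of pulling imputed values toward the centre of each distribution.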

2. Descriptive Statistics

The next phase involved calculating summary statistics and generating visualizations to understand the basic characteristics of the data. This included calculating summary statistics for numerical columns, generating frequency distributions for categorical variables, and analyzing temporal patterns (daily, weekly, monthly, and yearly trends).
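
The kind of summaries described here can be sketched with a few pandas calls. The sample rows and the column names (`crash_datetime`, `borough`, `persons_injured`) are hypothetical and only illustrate the shape of the calculations.

```python
import pandas as pd

# Tiny hypothetical sample; real column names may differ.
crashes = pd.DataFrame({
    "crash_datetime": pd.to_datetime([
        "2023-01-02 08:15", "2023-01-02 17:40", "2023-01-06 17:05",
        "2023-02-11 23:30", "2023-02-13 08:50",
    ]),
    "borough": ["BROOKLYN", "QUEENS", "BROOKLYN", "BRONX", "BROOKLYN"],
    "persons_injured": [0, 1, 2, 0, 1],
})

# Summary statistics for a numerical column.
injury_summary = crashes["persons_injured"].describe()

# Frequency distribution for a categorical column.
borough_counts = crashes["borough"].value_counts()

# Temporal patterns: crashes per weekday and per month.
days = crashes["crash_datetime"].dt.day_name()
hours = crashes["crash_datetime"].dt.hour
weekday_counts = days.value_counts()
monthly_counts = crashes["crash_datetime"].dt.month.value_counts().sort_index()

# Day-by-hour table, the shape a day/hour heatmap could be drawn from.
day_hour = pd.crosstab(days, hours, rownames=["day"], colnames=["hour"])

print(injury_summary, borough_counts, weekday_counts, monthly_counts, day_hour, sep="\n\n")
```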

[Figure: On which day do people have the most accidents?]
[Figure: In which month?]
[Figure: And in which year? (2024 data still ongoing)]
[Figure: Heatmap of Crashes by Day/Hour]

3. Geospatial Analysis

Given the geographical nature of the data, a geospatial analysis was conducted. It involved creating a heatmap of crash locations using latitude and longitude data, analyzing the crash distribution by borough and zip code, and identifying high-risk areas or intersections.

Here I used Folium to create interactive maps, including a heatmap layer to visualize crash density and identify areas with high crash concentrations, which could be prioritized for safety interventions. Here is what the map looks like around Central Park.

[Figure: Crash Density Heatmap - Central Park Area]

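
A heatmap layer like the one described can be sketched with Folium's `HeatMap` plugin. The column names (`LATITUDE`, `LONGITUDE`), the helper names, the map centre, and the radius are assumptions chosen for illustration; the zero-coordinate filter is a precaution, not something the source data is known to need.

```python
import pandas as pd


def heat_points(df, lat_col="LATITUDE", lon_col="LONGITUDE"):
    """Return [[lat, lon], ...] with missing or zero coordinates removed."""
    coords = df[[lat_col, lon_col]].dropna()
    # Placeholder (0, 0) coordinates would distort the map, so drop them too.
    coords = coords[(coords[lat_col] != 0) & (coords[lon_col] != 0)]
    return coords.values.tolist()


def build_heatmap(points, center=(40.7812, -73.9665), zoom=13):
    """Build a Folium map with a heatmap layer centred near Central Park."""
    # Imported here so heat_points() stays usable without Folium installed.
    import folium
    from folium.plugins import HeatMap

    crash_map = folium.Map(location=list(center), zoom_start=zoom)
    HeatMap(points, radius=10).add_to(crash_map)
    return crash_map


# Hypothetical sample: two valid points, one missing latitude, one (0, 0) row.
sample = pd.DataFrame({
    "LATITUDE": [40.7829, 40.7681, None, 0.0],
    "LONGITUDE": [-73.9654, -73.9819, -73.9712, 0.0],
})
points = heat_points(sample)
print(points)
# build_heatmap(points).save("crash_heatmap.html")  # writes an interactive map
```

Opening the saved HTML file in a browser gives a pannable, zoomable map, which is what makes hotspot inspection practical.
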
4. Injury and Fatality Analysis

A critical aspect of crash analysis is understanding the severity of incidents. This section focused on comparing injuries and fatalities across different categories (pedestrians, cyclists, motorists) and analyzing the relationship between injury severity and other factors (time of day, location, contributing factors). The analysis included creating severity categories and visualizing their distribution across various factors. Statistical tests, such as chi-square and ANOVA, were used to validate the significance of observed differences.
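
One way severity categories could be derived is from the injury and fatality counts of each crash. The three-level scheme below ("Fatal", "Injury", "No injury"), the function name, and the column names are assumptions for illustration; the project may have used a different definition.

```python
import pandas as pd


def severity_category(injured: int, killed: int) -> str:
    """Map casualty counts to a coarse severity label (assumed scheme)."""
    if killed > 0:
        return "Fatal"
    if injured > 0:
        return "Injury"
    return "No injury"


# Hypothetical sample rows.
crashes = pd.DataFrame({
    "borough": ["BROOKLYN", "BROOKLYN", "QUEENS", "QUEENS"],
    "persons_injured": [0, 2, 1, 0],
    "persons_killed": [0, 0, 1, 0],
})
crashes["severity"] = [
    severity_category(i, k)
    for i, k in zip(crashes["persons_injured"], crashes["persons_killed"])
]

# Distribution of severity categories across boroughs.
severity_by_borough = pd.crosstab(crashes["borough"], crashes["severity"])
print(severity_by_borough)
```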

Chi-square test for Severity vs. Borough: p-value = 2.5002712420019383e-61

ANOVA test for Severity vs. Hour of Day: p-value = 0.001110825445747302

This extremely small p-value from the chi-square test, far below the common significance level of 0.05, provides strong evidence of a statistically significant association between the severity of incidents and the borough where they occur. In other words, severity is not distributed independently of borough: severity patterns differ among boroughs.

For the ANOVA test, the p-value is also smaller than the typical significance level of 0.05, indicating a statistically significant relationship between the severity of incidents and the hour of the day when they occur. This suggests that the severity of incidents varies with the time of day.
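
Both tests are available in SciPy. The sketch below runs them on a small made-up sample, so its p-values are unrelated to the ones reported above. It assumes the ANOVA compared crash hour across severity groups, which is one plausible reading of "Severity vs. Hour of Day"; the column names are hypothetical.

```python
import pandas as pd
from scipy import stats

# Small synthetic sample, for illustration only.
crashes = pd.DataFrame({
    "borough": ["BROOKLYN"] * 6 + ["QUEENS"] * 6,
    "severity": ["No injury"] * 4 + ["Injury"] * 2
                + ["No injury"] * 2 + ["Injury"] * 4,
    "hour": [8, 9, 14, 10, 22, 23, 12, 15, 1, 2, 23, 0],
})

# Chi-square test of independence: severity vs. borough.
contingency = pd.crosstab(crashes["borough"], crashes["severity"])
chi2, chi_p, dof, expected = stats.chi2_contingency(contingency)

# One-way ANOVA: does the mean crash hour differ between severity groups?
hour_groups = [g["hour"].to_numpy() for _, g in crashes.groupby("severity")]
f_stat, anova_p = stats.f_oneway(*hour_groups)

print(f"Chi-square p-value: {chi_p:.4f} (dof={dof})")
print(f"ANOVA p-value: {anova_p:.4f}")
```

Note that hour of day is circular (23:00 is close to 00:00), so an ANOVA on raw hours treats late-night and early-morning crashes as far apart; binning hours into periods is a common alternative.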

[Figure: Which borough sees the most accidents?]
[Figure: Which types of accidents occur at each hour of the day?]
[Figure: Which types of accidents occur in each borough?]

In both cases, we reject the null hypothesis of no association. These results imply:

1) The borough where an incident occurs is related to its severity. Some boroughs may have more severe accidents than others.
2) The hour of the day when an incident occurs is related to its severity. Certain times of day may be associated with more severe incidents.

These findings could have important implications for resource allocation, policy-making and targeted interventions in different boroughs and at different times of the day.

[Figure: Who gets into more, and worse, accidents?]

5. Contributing Factors Analysis

Understanding what contributes to crashes is crucial for prevention. For this part of the data analysis, I identified the most common contributing factors, analyzed how contributing factors vary by location, time, or injury severity, and examined interactions between multiple contributing factors.
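
A rough sketch of these three steps with pandas follows. It assumes each crash records up to two factors in columns named `factor_1` and `factor_2`, and that "Unspecified" marks an unknown factor; the column names and sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows.
crashes = pd.DataFrame({
    "borough": ["BROOKLYN", "QUEENS", "BROOKLYN", "BRONX", "QUEENS"],
    "factor_1": [
        "Driver Inattention/Distraction", "Unspecified",
        "Failure to Yield Right-of-Way", "Driver Inattention/Distraction",
        "Unsafe Speed",
    ],
    "factor_2": [
        "Unspecified", None, "Driver Inattention/Distraction",
        "Unsafe Speed", None,
    ],
})

# Stack the per-vehicle factor columns into one long column.
long = crashes.melt(
    id_vars="borough", value_vars=["factor_1", "factor_2"], value_name="factor"
).dropna(subset=["factor"])
long = long[long["factor"] != "Unspecified"]

# Most common contributing factors overall, and their split by borough.
top_factors = long["factor"].value_counts()
factor_by_borough = pd.crosstab(long["factor"], long["borough"])

# Interactions: how often two known factors are recorded on the same crash.
both = crashes[["factor_1", "factor_2"]].dropna()
both = both[(both != "Unspecified").all(axis=1)]
pair_counts = both.apply(lambda row: " + ".join(sorted(row)), axis=1).value_counts()

print(top_factors, factor_by_borough, pair_counts, sep="\n\n")
```

Sorting each pair before joining means "A + B" and "B + A" are counted as the same combination.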

[Figure: Why are people having accidents?]
[Figure: The why and where of the accidents]
[Figure: The why and when of the accidents]
[Figure: The why and how severe of the accidents]

6. Vehicle Type Analysis

Different vehicle types may have different crash patterns and severity outcomes. This section explored the crash patterns by vehicle type, and whether certain vehicle types are associated with higher injury or fatality rates.

The analysis included visualizing the distribution of vehicle types involved in crashes and examining how crash patterns varied by time of day for different vehicle types. Statistical tests were performed to determine if differences in injury and fatality rates across vehicle types were significant.
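
The article does not name the tests used here, so the sketch below picks a chi-square test of independence as one reasonable option for comparing injury rates across vehicle types. The vehicle labels, column names, and sample values are made up for illustration.

```python
import pandas as pd
from scipy import stats

# Small synthetic sample, for illustration only.
crashes = pd.DataFrame({
    "vehicle_type": ["Sedan"] * 5 + ["Taxi"] * 4 + ["Bike"] * 3,
    "persons_injured": [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0],
    "hour": [8, 9, 17, 18, 23, 1, 2, 22, 23, 8, 17, 18],
})
crashes["any_injury"] = crashes["persons_injured"] > 0

# Share of crashes with at least one injury, per vehicle type.
injury_rate = crashes.groupby("vehicle_type")["any_injury"].mean()

# Chi-square test: is injury occurrence independent of vehicle type?
table = pd.crosstab(crashes["vehicle_type"], crashes["any_injury"])
chi2, p_value, dof, expected = stats.chi2_contingency(table)

# Crash counts by vehicle type and hour, for time-of-day patterns.
type_by_hour = pd.crosstab(crashes["vehicle_type"], crashes["hour"])

print(injury_rate)
print(f"Chi-square p-value: {p_value:.4f} (dof={dof})")
```

With samples this small the expected cell counts are too low for the chi-square approximation to be reliable; on the full dataset that concern largely disappears.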

[Figure: Vehicle Type Distribution]
[Figure: Vehicle Crash Patterns]
[Figure: Vehicle Type Impact Analysis]

7. Conclusion, Key Insights and Recommendations

Based on the data analysis conducted, several key insights emerge:

    1) Temporal patterns: crashes show distinct patterns by time of day, day of week, and season. Understanding these patterns can help target safety measures more effectively.

    2) Geographic hotspots: the geospatial analysis revealed areas with high crash concentrations. These hotspots should be prioritized for safety interventions.

    3) Contributing factors: certain factors consistently contribute to crashes across different contexts. Addressing these common factors could have a significant impact on overall crash reduction.

    4) Vehicle type impact: different vehicle types show varying patterns in terms of crash frequency and severity. This suggests that safety strategies may need to be tailored for different vehicle types.

    5) Severity variations: crash severity varies with factors like time of day and location. Understanding these variations can help emergency services prepare for potentially severe crashes.

Based on the insights gained from the analysis, here are some recommendations for improving road safety in New York City:

    1) Targeted interventions: focus safety measures on identified hotspots and during peak crash times. This could include increased traffic enforcement or improved signage and road design in these areas and times.

    2) Education campaigns: develop targeted education campaigns addressing the most common contributing factors. These campaigns should be tailored to different road user groups (drivers, cyclists, pedestrians) and vehicle types.

    3) Vehicle-specific measures: implement safety measures specific to vehicle types associated with higher crash rates or severities. This could include stricter licensing requirements or additional safety features.

    4) Time-based strategies: adjust traffic management strategies based on the time of day, day of week, and seasonal patterns identified. This could involve dynamic speed limits or lane allocations.

    5) Data-driven policy: use the insights from this analysis to inform policy decisions. Regular updates to this analysis can help track the effectiveness of implemented measures over time.

    6) Emergency response optimization: use the severity analysis to optimize the allocation of emergency response resources, ensuring faster response times for potentially severe crashes.

    7) Further research: conduct more in-depth studies on the specific factors contributing to crashes in high-risk areas or involving high-risk vehicle types.

By implementing these recommendations and continuing to analyze crash data, we can work towards significantly reducing the frequency and severity of motor vehicle collisions, ultimately saving lives and improving road safety for all users.