- Project Overview
- Business Question
- Data Source
- Key Visualizations
- Data Cleaning & Transformation
- Methodology
- Insights and Recommendations
- How to Run the Code
- Project Structure
- Future Work
- Contributors
I aim to analyze aviation accident data to pinpoint the safest aircraft models for a company venturing into aviation. By examining historical safety data, I’ll recommend aircraft that are ideal for commercial and private operations, focusing on key safety metrics like injury severity, accident trends, and flight phase risks. This analysis is designed to provide actionable insights for business decision-making.
"Which aircraft are the lowest risk for the company to start this new business endeavor?"
To answer this, I evaluate:
- Survival rates and injury distributions by aircraft make and model.
- High-risk flight phases with significant injuries.
- Industry-wide safety trends over time.
The analysis is based on the NTSB Aviation Accident Database, a comprehensive dataset available on Kaggle. This dataset includes detailed records of aviation accidents, with essential fields such as aircraft make/model, accident dates, injury counts, and flight phases.
-
Raw Dataset (AviationData.csv)
- Contains detailed aviation accident records with 31 columns and 88,889 rows.
- Includes fields such as aircraft make/model, accident dates, injury counts, and flight phases.
- Not cleaned; requires pre-processing for analysis.
-
Cleaned Dataset (aviation_data_clean.csv)
- A streamlined version of the raw dataset with 18 essential columns and 48,090 rows.
- Contains cleaned and standardized fields for easier analysis.
Field Name | Description |
---|---|
Aircraft_Make | Manufacturer of the aircraft. |
Aircraft_Model | Specific model of the aircraft. |
Total_Uninjured | Total number of people uninjured in the accident. |
Total_Fatal_Injuries | Total number of fatalities in the accident. |
Total_Serious_Injuries | Total number of serious injuries in the accident. |
Total_Minor_Injuries | Total number of minor injuries in the accident. |
Flight_Phase | Phase of flight during the accident (e.g., Takeoff, Cruise). |
Event_Date | Date when the event occurred. |
Purpose_of_Flight | Purpose of the flight (e.g., Business, Personal). |
To tackle the business question and uncover insights, I’ve created these graphs:
-
Box Plot: Total Uninjured by Aircraft Make and Model
- Visualizes survival rates by aircraft type.
- Insight: Identifies aircraft with high survival rates for low-risk operations.
-
Bar Plot: Fatal and Serious Injuries by Flight Phase
- Highlights the riskiest phases of flight.
- Insight: Pinpoints critical phases needing enhanced safety measures.
- Box Plot: Total Uninjured by Aircraft Make and Model
- Bubble Chart: Proportion of Fatalities by Flight Phase
- Line Chart: Number of People Involved in Accidents Per Year
- Interactive Features
- Filters for accident years and flight purpose.
- Ability to visualize data based on different injury types and flight phases.
A custom function was applied to classify accident severity based on injury columns:
def classify_severity(row):
if row['Total_Fatal_Injuries'] > 0:
return 'Fatal'
elif row['Total_Serious_Injuries'] > 0:
return 'Serious'
elif row['Total_Minor_Injuries'] > 0:
return 'Minor'
return 'No Injuries'
aviation_df['Accident_Severity'] = aviation_df.apply(classify_severity, axis=1)
This creates a new column Accident_Severity based on the injury counts.
Missing values in Total_Fatal_Injuries
were filled using Fatal_Injuries
as a reference:
aviation_df['Total_Fatal_Injuries'] = aviation_df['Total_Fatal_Injuries'].fillna(aviation_df['Fatal_Injuries'])
This ensures that we don't lose valuable data due to missing injury values.
-
Data Cleaning:
- Removed missing values from key columns.
- Transformed dates and numerical fields for analysis.
-
Exploratory Data Analysis (EDA):
- Investigated injury severity, accident causes, and trends.
- Grouped data by aircraft make/model and flight phase.
-
Visualization:
- Used Python libraries like Matplotlib and Seaborn for compelling graphs.
- Built Tableau dashboards for interactive exploration.
-
Low-Risk Aircraft:
- Aircraft with high survival rates and low injury counts are ideal choices for acquisition.
-
High-Risk Flight Phases:
- Most fatal injuries occur during Takeoff and Approach. These phases require additional safety investments.
-
Industry Safety Trends:
- Accidents and fatalities show a declining trend, reflecting improved safety technologies.
- Prioritize aircraft models with outstanding safety records (identified in the box plot).
- Invest in training and technology for takeoff and approach phases.
- Collaborate with manufacturers demonstrating consistent safety performance.
-
Prerequisites:
- Python 3.9 or higher.
- Required libraries:
pandas
,matplotlib
,seaborn
.
-
Steps:
- Clone this repository.
- Place the aviation dataset (
AviationData.csv
) in the project folder. - Run the Jupyter Notebook to generate the analysis and visualizations.
-
Interactive Dashboard:
- Open the Tableau dashboard file for deeper insights.
.
├── AviationData.csv # Dataset
├── README.md # Project documentation
├── aviation_analysis.ipynb # Jupyter Notebook with analysis
├── tableau_dashboard.twbx # Tableau workbook for interactive analysis
- Expand the analysis to include environmental and mechanical failure data.
- Develop predictive models for aviation risk evaluation.
- Integrate operational cost analysis to refine recommendations.
Feel free to contact me for questions or collaboration opportunities!