To set up the notebook, make sure that the data folder in included in the directory. Each cell can be ran separately, or ran all together.
In this problem, we are trying to figure out the cause behind the disappearance of passengers on a spaceship through a spacetime anomaly. To do this, we employed various machine learning techniques. The first step was to fill missing values in the dataset and then create new features while dropping some unnecessary ones. Categorical columns were encoded by creating dummy variables for each unique category and normalizing the dataset using the Normalizer() function. Finally, we trained four different machine learning classifiers: KNN, Decision Tree, Gradient Boosting, and Random Forest. The results showed that the Gradient Boosting classifier had the highest classification accuracy of .8076, indicating that it was the best model for this problem. This project demonstrated the effectiveness of using machine learning in solving complex problems by applying data preprocessing techniques and trying different models to identify the most suitable one.