Predicting Titanic Survival and Crop Yield

  • Category: AI
  • Technologies: AWS SageMaker, Scikit-learn, PyCaret

Assignment

Our team was assigned a machine learning project covering two datasets: the classic Titanic dataset and a more complex crop dataset. For each dataset, the work included exploratory data analysis (EDA), data preparation, model building with several techniques, and model evaluation to identify the most effective solution.

Approach

Working as a team of three, we started with the Titanic dataset, using the Python libraries Pandas, Matplotlib, NumPy, Scikit-learn, xlrd, and Seaborn. We calculated descriptive statistics, examined data types, and created charts to visualize patterns and relationships in the data. With that understanding, we cleaned and preprocessed the dataset to prepare it for model training. We then used AWS SageMaker, Scikit-learn, and PyCaret to build and evaluate multiple predictive models. We repeated the same process for the crop dataset, which kept the work consistent and made the results comparable across the two datasets.
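The steps in this approach can be illustrated with a minimal sketch. It is not the project's actual notebook: the small inline table is a hypothetical stand-in for the Titanic data, the column names follow the commonly distributed version of that dataset, and logistic regression is used only as a simple Scikit-learn baseline. Plotting is left out so the example runs without a display.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in rows; the real dataset has far more rows and columns.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 3, 1, 2, 2, 3, 1, 2],
    "Sex":      ["male", "female", "female", "female", "male", "male",
                 "male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, 26.0, 35.0, 35.0, None,
                 54.0, 27.0, 14.0, None, 58.0, 34.0],
    "Fare":     [7.25, 71.28, 7.93, 53.10, 8.05, 8.46,
                 51.86, 11.13, 30.07, 7.23, 26.55, 13.00],
})

# EDA: descriptive statistics, data types, and missing-value counts.
summary = df.describe()
dtypes = df.dtypes
missing = df.isna().sum()

# Preparation: separate features from the target and encode the categorical column.
X = df.drop(columns="Survived")
X["Sex"] = X["Sex"].map({"male": 0, "female": 1})
y = df["Survived"]

# Hold out a stratified test split before imputing, so the test rows
# do not influence the value used to fill missing ages.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
age_median = X_train["Age"].median()
X_train = X_train.assign(Age=X_train["Age"].fillna(age_median))
X_test = X_test.assign(Age=X_test["Age"].fillna(age_median))

# Baseline model and evaluation on the held-out rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Hold-out accuracy: {accuracy:.2f}")
```

With so few rows the printed accuracy is not meaningful; the point is the order of the steps: inspect, clean, split, train, evaluate.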

Result

Our EDA produced clear visualizations for both datasets, and the data preparation allowed us to train accurate predictive models. For the Titanic dataset, AWS SageMaker with XGBoost was the top performer, achieving 96% accuracy. The same combination also performed strongly on the crop dataset, reaching 98% accuracy.

My Input

Throughout the project, I worked with my teammates on the EDA and the visualizations. We divided the project into three parts, and I took full responsibility for the PyCaret model development process, from initial experimentation to final evaluation.
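For readers unfamiliar with PyCaret, a workflow from experimentation to evaluation might look roughly like the sketch below. It assumes a classification task and PyCaret's functional API (`setup`, `compare_models`, `predict_model`, `pull`); the function name `run_pycaret_experiment` and its parameters are hypothetical and are not taken from the project code.

```python
def run_pycaret_experiment(df, target, seed=42):
    """Compare candidate classifiers with PyCaret and score the best one.

    `df` is a pandas DataFrame that includes the target column named by `target`.
    This helper is a hypothetical illustration, not the project's code.
    """
    # Imported lazily because PyCaret is a heavy optional dependency.
    from pycaret.classification import compare_models, predict_model, pull, setup

    # Initialise the experiment: PyCaret infers column types, applies its
    # default preprocessing, and sets aside a hold-out split.
    setup(data=df, target=target, session_id=seed, verbose=False)

    # Experimentation: cross-validate a library of models and keep the best.
    best_model = compare_models()
    leaderboard = pull()  # score grid produced by compare_models

    # Final evaluation: score the chosen model on the hold-out split.
    predict_model(best_model)
    holdout_metrics = pull()  # metrics table produced by predict_model

    return best_model, leaderboard, holdout_metrics
```

A call such as `run_pycaret_experiment(titanic_df, target="Survived")` would return the selected model together with the comparison table and hold-out metrics, where `titanic_df` is an assumed, already cleaned DataFrame.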

What I Learned

The project reinforced how important EDA is to success in machine learning. It also gave me exposure to AutoML tools and cloud-based platforms, and showed me the value of teamwork.