Credit Card Fraud Detection (AWS SageMaker)
Complete end-to-end ML pipeline for credit card fraud detection on AWS SageMaker, with a focus on handling class imbalance, threshold optimisation for business decisions, and the development of reusable, domain-agnostic ML frameworks.
Technologies
Problem
Fraud cases represent only 0.17% of transactions. A naive model reaches 99.83% accuracy by always predicting "legitimate", which makes accuracy alone a misleading metric. The real challenge is finding the right precision-recall tradeoff based on business cost assumptions.
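The arithmetic behind that claim can be checked in a few lines. This is a minimal sketch: the 100,000-transaction total is an assumed round number chosen for illustration, and only the 0.17% fraud rate comes from the text above.

```python
# Assumed illustrative volume; only the 0.17% fraud rate is from the project.
n_total = 100_000
n_fraud = 170  # 0.17% of n_total
n_legit = n_total - n_fraud

# Confusion-matrix counts for a "model" that always predicts legitimate.
true_positives = 0
false_positives = 0
false_negatives = n_fraud  # every fraud case is missed
true_negatives = n_legit

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.4f}")  # 0.9983
print(f"recall   = {recall:.4f}")    # 0.0000
```

The model scores 99.83% accuracy while catching no fraud at all, which is why recall and precision have to be read alongside it.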
Approach
7-phase ML workflow, with phases including: Initial Data Assessment → Feature Engineering → EDA & Visualisation → Baseline Modelling (class weights, SMOTE, undersampling) → Threshold Optimisation. Each phase has a parameterised Python script and a domain-agnostic Claude prompt.
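Two of the imbalance strategies named in the Baseline Modelling phase are simple enough to sketch with the standard library. This is an illustration under stated assumptions, not the project's actual scripts: the function names and toy data are hypothetical, the weight formula is the common "balanced" heuristic n_samples / (n_classes * class_count), and SMOTE is left out because it needs a dedicated library.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_undersample(rows, labels, seed=0):
    """Keep all minority rows plus an equally sized random sample of each other class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_size = min(counts.values())
    kept = []
    for cls in counts:
        indices = [i for i, y in enumerate(labels) if y == cls]
        kept.extend(rng.sample(indices, minority_size))
    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (hypothetical): 17 fraud cases in 10,000 transactions, i.e. 0.17%.
labels = [1] * 17 + [0] * 9_983
rows = list(range(len(labels)))  # stand-in for real feature rows

weights = balanced_class_weights(labels)
sampled_rows, sampled_labels = random_undersample(rows, labels)

print(weights)                  # fraud weight is roughly 294, legitimate roughly 0.5
print(Counter(sampled_labels))  # 17 rows of each class
```

Class weights keep every row and change the loss, while undersampling discards most legitimate rows; comparing both against a common validation set is one way to decide between them.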
Result
Fraud detection model with an optimised classification threshold and a fully documented, reusable ML workflow framework applicable to any classification task.
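One way to arrive at such a threshold is to scan candidate cut-offs and pick the one with the lowest total cost. The sketch below assumes a fixed cost per missed fraud and per false alarm; the function names, scores, and cost figures are hypothetical and are not taken from the project.

```python
def total_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Cost of flagging every transaction with score >= threshold as fraud."""
    cost = 0.0
    for y, s in zip(y_true, scores):
        predicted_fraud = s >= threshold
        if y == 1 and not predicted_fraud:
            cost += cost_fn  # missed fraud (false negative)
        elif y == 0 and predicted_fraud:
            cost += cost_fp  # false alarm (false positive)
    return cost


def best_threshold(y_true, scores, cost_fn, cost_fp):
    """Return (threshold, cost) for the cheapest candidate threshold."""
    # Every distinct score is a candidate; infinity means "flag nothing".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        ((t, total_cost(y_true, scores, t, cost_fn, cost_fp)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Hypothetical validation labels and model scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.95]

# Missed fraud is far more expensive than a false alarm.
print(best_threshold(y_true, scores, cost_fn=100, cost_fp=1))  # (0.4, 1.0)

# False alarms are the expensive error instead.
print(best_threshold(y_true, scores, cost_fn=1, cost_fp=5))    # (0.8, 1.0)
```

The same scores yield different thresholds under the two cost assumptions, so the chosen value depends on the business inputs rather than on the model alone.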
Learnings
AUC-PR is more informative than accuracy for imbalanced datasets; threshold selection is a business decision, not a model parameter; generalising frameworks from domain-specific to domain-agnostic increases long-term reuse value.
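The first learning can be illustrated by computing average precision, a step-wise estimate of the area under the precision-recall curve, by hand. This is a minimal sketch that assumes no tied scores; the two sets of scores are invented for illustration.

```python
def average_precision(y_true, scores):
    """Mean of the precision values at each rank where a positive is found."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_positive


def accuracy_at(y_true, scores, threshold):
    """Fraction of correct predictions when score >= threshold means fraud."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(y_true, scores))
    return correct / len(y_true)


# Hypothetical data: 2 fraud cases among 10 transactions.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores_good = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]  # fraud ranked 1st and 3rd
scores_poor = [0.5, 0.9, 0.05, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # fraud ranked 5th and 10th

ap_good = average_precision(y_true, scores_good)
ap_poor = average_precision(y_true, scores_poor)

print(f"AP good = {ap_good:.3f}, AP poor = {ap_poor:.3f}")  # 0.833 vs 0.200

# With a threshold high enough that nothing is flagged, accuracy cannot tell them apart.
print(accuracy_at(y_true, scores_good, 0.95), accuracy_at(y_true, scores_poor, 0.95))  # 0.8 0.8
```

Average precision rewards ranking fraud cases near the top, which is the property that matters when positives are rare.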
Relevance
Demonstrates ML engineering, AWS SageMaker integration, AI-assisted development workflows, reusable framework design, and end-to-end thinking from raw data to business decision.