Credit Card Fraud Detection (AWS SageMaker)

Status: Complete

End-to-end ML pipeline for credit card fraud detection on AWS SageMaker, with a focus on handling class imbalance, threshold optimisation for business decisions, and the development of reusable, domain-agnostic ML frameworks.

Technologies

AWS SageMaker, AWS S3, AWS Athena, Python, XGBoost, scikit-learn, SHAP, imbalanced-learn, Jupyter

Problem

Fraud cases represent only 0.17% of transactions. A naive model reaches 99.83% accuracy by always predicting "legitimate" while catching no fraud at all, so accuracy alone is a misleading metric. The real challenge is choosing the precision-recall tradeoff that fits the business cost assumptions.
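The arithmetic behind that claim can be checked in a few lines. This is a minimal sketch, not the project's code: the sample size of 100,000 transactions is a hypothetical round number chosen so that the 0.17% fraud rate gives whole counts.

```python
# Hypothetical sample: 100,000 transactions at the 0.17% fraud rate quoted above.
n_total = 100_000
n_fraud = 170
n_legit = n_total - n_fraud

# A model that always predicts "legitimate" never flags a fraud case,
# so every legitimate transaction is a true negative and every fraud a miss.
true_positives = 0
true_negatives = n_legit
false_negatives = n_fraud

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.2%}")
print(f"recall   = {recall:.2%}")
```

Accuracy is high only because legitimate transactions dominate the count; recall on the fraud class shows that the model is useless for the actual task.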

Approach

A 7-phase ML workflow whose phases include Initial Data Assessment → Feature Engineering → EDA & Visualisation → Baseline Modelling (class weights, SMOTE, undersampling) → Threshold Optimisation. Each phase comes with parameterised Python scripts and domain-agnostic Claude prompts.
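The Threshold Optimisation phase can be illustrated with a small cost-based sweep. This is a sketch under stated assumptions rather than the project's implementation: the function names, the toy labels and scores, and the `cost_fn` / `cost_fp` values are all hypothetical, and the real cost figures would come from the business.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def total_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Cost of missed frauds plus cost of false alerts at a given threshold."""
    _, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    return fn * cost_fn + fp * cost_fp


def best_threshold(y_true, scores, cost_fn, cost_fp, candidates):
    """Pick the candidate threshold with the lowest total cost (first on ties)."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, t, cost_fn, cost_fp),
    )


# Toy validation set (hypothetical): 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.02, 0.04, 0.11, 0.17, 0.31, 0.36, 0.42, 0.57, 0.72, 0.91]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

# Assumed cost scenarios: missed fraud is expensive vs. false alerts are expensive.
costly_misses = best_threshold(y_true, scores, cost_fn=100.0, cost_fp=1.0,
                               candidates=candidates)
costly_alerts = best_threshold(y_true, scores, cost_fn=1.0, cost_fp=10.0,
                               candidates=candidates)

print("threshold when misses are costly:", costly_misses)
print("threshold when alerts are costly:", costly_alerts)
```

The model scores are identical in both scenarios; only the cost assumptions change, and the chosen threshold moves with them. When missed fraud is expensive the sweep settles on a lower threshold than when false alerts are expensive.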

Result

Fraud detection model with an optimised classification threshold and a fully documented, reusable ML workflow framework applicable to any classification task.

Learnings

AUC-PR is more informative than accuracy for imbalanced datasets; threshold selection is a business decision, not a model parameter; generalising frameworks from domain-specific to domain-agnostic increases long-term reuse value.
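The first learning can be made concrete with two hypothetical models that have identical accuracy but very different ranking quality. The sketch below uses a hand-rolled average precision (a common summary of the precision-recall curve) to stay dependency-free; the data and function names are illustrative assumptions, and scikit-learn's `average_precision_score` would be the usual choice in practice.

```python
def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k where a true positive appears.

    Assumes no tied scores between positives and negatives.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def accuracy_at(y_true, scores, threshold=0.5):
    """Fraction of correct predictions when scores >= threshold mean fraud."""
    correct = sum(
        int((score >= threshold) == (label == 1))
        for label, score in zip(y_true, scores)
    )
    return correct / len(y_true)


# Hypothetical data: 19 legitimate transactions followed by 1 fraud.
y_true = [0] * 19 + [1]
legit_scores = [i / 100 for i in range(1, 20)]  # 0.01 ... 0.19

# Model A ranks the fraud above every legitimate transaction.
scores_a = legit_scores + [0.40]
# Model B ranks the fraud below every legitimate transaction.
scores_b = legit_scores + [0.005]

acc_a, acc_b = accuracy_at(y_true, scores_a), accuracy_at(y_true, scores_b)
ap_a, ap_b = average_precision(y_true, scores_a), average_precision(y_true, scores_b)

print(f"accuracy: A={acc_a:.2f}  B={acc_b:.2f}")
print(f"avg precision: A={ap_a:.2f}  B={ap_b:.2f}")
```

Both models score below 0.5 on every transaction, so accuracy at the default threshold cannot tell them apart. Average precision separates them because it depends on where the fraud case sits in the ranking, which is also what makes a later threshold choice possible.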

Relevance

Demonstrates ML engineering, AWS SageMaker integration, AI-assisted development workflows, reusable framework design, and end-to-end thinking from raw data to business decision.

Architecture
