
Batch Energy Analytics Pipeline (AWS)

Complete

Serverless batch pipeline that ingests monthly electricity generation data from the Ember API for 8 European countries, transforms nested JSON into Parquet format via AWS Glue, and visualises energy production trends through an Amazon QuickSight dashboard.

Technologies

AWS S3, AWS Glue ETL, AWS Glue Crawler, Amazon Athena, Amazon QuickSight, Python, boto3

Problem

Raw energy data from the Ember API arrives as nested JSON that is not directly queryable. No unified view of European electricity generation trends across countries and energy sources existed.

Approach

Python ingestion script → S3 raw layer (Hive-style date partitioning) → AWS Glue ETL (Explode + Flatten transforms) → S3 processed layer (Parquet/Snappy) → Glue Crawler → Athena validation → QuickSight dashboard.
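As a rough illustration of the first step, the sketch below builds a Hive-style partitioned object key for the raw layer and writes one payload to S3. The `raw/ember` prefix, the year/month/day granularity, the file naming, and both function names are assumptions made for this example, not details taken from the project; `put_object` is the standard boto3 S3 client call, and the client is passed in so the key logic can be checked without AWS credentials.

```python
import json
from datetime import date


def build_raw_key(country_code, ingest_date, prefix="raw/ember"):
    """Return a Hive-style key such as raw/ember/year=2024/month=03/day=05/DE.json.

    The prefix and the year/month/day layout are assumptions for illustration.
    """
    return (
        f"{prefix}/year={ingest_date:%Y}/month={ingest_date:%m}/"
        f"day={ingest_date:%d}/{country_code}.json"
    )


def upload_raw(s3_client, bucket, country_code, ingest_date, payload):
    """Serialise one API payload and store it under its partitioned key.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    key = build_raw_key(country_code, ingest_date)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )
    return key


key = build_raw_key("DE", date(2024, 3, 5))
print(key)
```

Keeping the `name=value` segments in the key lets a Glue Crawler register `year`, `month`, and `day` as partition columns, so Athena can prune partitions when a query filters on them.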

Result

Interactive dashboard showing electricity production trends across 8 European countries from 2020 to 2026, built and operated within a monthly budget cap of USD 50.

Learnings

AWS Glue requires explicit Explode + Flatten transforms for nested JSON APIs; QuickSight date columns must be typed as Date (not string) for time-axis charts; QuickSight permissions are configured separately from AWS IAM.
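To make the first two learnings concrete, here is a plain-Python sketch of what an explode-then-flatten step does, with the month string parsed into a real date so that a downstream tool can treat it as a time axis. It is not the Glue job itself: the payload shape, the field names, and the sample values are invented for illustration and do not reflect the actual Ember API response.

```python
from datetime import date, datetime


def explode_and_flatten(payload):
    """Turn one nested payload into flat rows, one per (month, source) pair."""
    rows = []
    for month in payload["months"]:              # explode the outer array
        # Parse "YYYY-MM" into a date (first of the month), not a string.
        month_start = datetime.strptime(month["date"], "%Y-%m").date()
        for item in month["generation"]:         # explode the inner array
            rows.append({                        # flatten into scalar columns
                "country": payload["country"],
                "month_start": month_start,
                "source": item["source"],
                "generation_twh": float(item["twh"]),
            })
    return rows


# Hypothetical nested payload; values are made up for the example.
sample = {
    "country": "DE",
    "months": [
        {"date": "2024-01", "generation": [
            {"source": "wind", "twh": 15.2},
            {"source": "solar", "twh": 1.9},
        ]},
        {"date": "2024-02", "generation": [
            {"source": "wind", "twh": 13.0},
        ]},
    ],
}

rows = explode_and_flatten(sample)
for row in rows:
    print(row)
```

Each output row holds only scalar values, which is the shape a columnar format such as Parquet and a SQL engine such as Athena expect.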

Relevance

Cloud-native end-to-end AWS data pipeline from API ingestion to business visualisation; demonstrates serverless ETL architecture and AWS analytics service integration.