Airbnb Listings Pipeline
Completed batch data pipeline that automates the collection of Airbnb listing snapshots from the Inside Airbnb dataset, stores them in MinIO, and loads them into PostgreSQL for historical trend analysis of the London short-term rental market.
Technologies
Apache Airflow (Astro SDK), MinIO, PostgreSQL, Docker
Problem
Manually collecting and loading Airbnb listing data for trend analysis is error-prone and not reproducible at scale.
Approach
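An HTTP sensor polls the source for new snapshots with mode="reschedule", which releases the worker slot between checks instead of holding it while waiting. The Astro SDK's native PostgreSQL loader handles bulk ingestion, and MinIO serves as a local S3-compatible object store.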
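The availability check behind such a sensor can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names, the User-Agent string, and the example.com URL are all hypothetical, and the check assumes that a missing snapshot answers with HTTP 404.

```python
import urllib.error
import urllib.request

# Illustrative browser-like header; the exact string is an assumption.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def build_head_request(url: str) -> urllib.request.Request:
    """Build a HEAD request so the check never downloads the file body."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="HEAD")


def snapshot_available(url: str, opener=urllib.request.urlopen, timeout: int = 30) -> bool:
    """Return True once the snapshot URL answers with HTTP 200.

    A 404 is treated as "not published yet" so the sensor keeps waiting;
    any other HTTP error is re-raised so the task fails visibly.
    """
    try:
        with opener(build_head_request(url), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


if __name__ == "__main__":
    # Hypothetical URL, used only to show the call shape.
    print(snapshot_available("https://example.com/listings.csv.gz"))
```

In Airflow, a check like this would run as the sensor's poke step. With mode="reschedule" the task gives up its worker slot after each unsuccessful poke and is scheduled again once the poke interval has elapsed.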
Result
Automated, reproducible pipeline for historical Airbnb trend analysis in London with a fully local Docker-based development stack.
Learnings
Requests need a browser-like User-Agent header to avoid HTTP 403 responses; XCom should carry object keys, not raw bytes; reschedule mode frees the worker slot between poke intervals; a native database COPY is often orders of magnitude faster than row-by-row inserts for bulk loads.
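The XCom point can be illustrated with a small sketch. It assumes an upload task that returns only the object key and a load task that resolves that key again. The dict stands in for a MinIO bucket, and the key layout, function names, and sample date are hypothetical.

```python
from typing import Dict

# In-memory stand-in for a MinIO bucket (assumption: the real pipeline
# would talk to an S3-compatible client instead of a dict).
FakeBucket = Dict[str, bytes]


def store_snapshot(bucket: FakeBucket, snapshot_date: str, payload: bytes) -> str:
    """Upload task: write the bytes to object storage, return only the key."""
    key = f"listings/{snapshot_date}/listings.csv.gz"  # hypothetical layout
    bucket[key] = payload
    return key  # a short string is cheap to pass between tasks


def load_snapshot(bucket: FakeBucket, key: str) -> int:
    """Load task: resolve the key back to bytes and report their size."""
    payload = bucket[key]
    return len(payload)


if __name__ == "__main__":
    bucket: FakeBucket = {}
    key = store_snapshot(bucket, "2024-01-01", b"id,price\n1,100\n")
    print(key, load_snapshot(bucket, key))
```

Returning the key keeps the value passed between tasks small. Airflow stores XCom values in its metadata database by default, so pushing the file contents themselves would bloat that database.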
Relevance
Demonstrates clean ETL design, sensor architecture, data warehouse loading patterns, and reproducible local development infrastructure.