A fully automated, cloud-native ETL pipeline and web application built to securely ingest, process, and serve historical sports data. This document outlines the architectural choices, tradeoffs, and DevOps practices used to build a resilient, production-grade system on Google Cloud Platform (GCP).
The platform is designed around a decoupled, pull-based architecture separating heavy data ingestion from the lightweight API serving layer.
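The pull-based decoupling can be illustrated with a minimal in-process sketch: the ingestion side pushes work onto a queue and a worker pulls it at its own pace, so neither side blocks the other. Here `queue.Queue` stands in for whatever managed queue or scheduler the real pipeline uses, and all names (`jobs`, `worker`, `game_id`) are illustrative assumptions:

```python
import queue
import threading

# In-process stand-in for the real message queue: the ingestion side
# pushes work items, and the worker *pulls* them at its own pace.
jobs: "queue.Queue" = queue.Queue()
results: list = []

def worker() -> None:
    """Pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        # Heavy processing would happen here (parse, transform, load).
        results.append({"game_id": job["game_id"], "status": "processed"})

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    jobs.put({"game_id": i})
jobs.put(None)  # sentinel: no more work
t.join()
print(len(results))  # → 3
```

The same shape scales out in the real system: the API layer never waits on ingestion, it only reads whatever the workers have already landed.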
Engineering is about choosing the right compromises; the key decisions made during the design of this system are outlined below.

Writes into PostgreSQL use `ON CONFLICT` constraints to guarantee idempotency: re-running an ingestion job updates existing rows rather than duplicating them.
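The `ON CONFLICT` upsert pattern can be sketched as follows. This demo uses `sqlite3` (which supports the same `ON CONFLICT ... DO UPDATE` syntax) so it runs standalone; the schema and table name are illustrative assumptions, and production would target PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real PostgreSQL instance
conn.execute("CREATE TABLE games (game_id INTEGER PRIMARY KEY, home_score INTEGER)")

upsert = """
INSERT INTO games (game_id, home_score) VALUES (?, ?)
ON CONFLICT (game_id) DO UPDATE SET home_score = excluded.home_score
"""

# Running the same batch twice updates in place instead of duplicating rows.
for _ in range(2):
    conn.executemany(upsert, [(1, 95), (2, 101)])

rows = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
print(rows)  # → 2, not 4: the re-run was idempotent
```

Because the upsert is idempotent, a failed or retried ETL run can simply be re-executed from the top with no cleanup step.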
By deploying each new revision behind a `dev` traffic tag, new code can be validated in production on a private URL. Once verified, traffic is seamlessly shifted to the new revision, guaranteeing zero downtime and protecting the end-user experience.

The foundation of this project is completely codified using Terraform, with state and resources strictly managed via code.
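As a sketch of how the two ideas combine, a Terraform-managed Cloud Run service can pin production traffic to a stable revision while exposing a candidate revision on a tagged private URL. All names, the region, and the image path below are illustrative assumptions, not the project's actual configuration:

```hcl
resource "google_cloud_run_service" "api" {
  name     = "sports-api"      # illustrative service name
  location = "us-central1"

  template {
    spec {
      containers {
        image = "gcr.io/example-project/sports-api:latest"
      }
    }
  }

  traffic {
    percent       = 100
    revision_name = "sports-api-stable"     # current production revision
  }

  traffic {
    percent       = 0
    revision_name = "sports-api-candidate"
    tag           = "dev"  # exposes the revision on a private tagged URL
  }
}
```

Promotion is then a one-line change: move `percent = 100` to the candidate revision and apply.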
Modern infrastructure requires deep visibility. To move beyond standard monitoring, this application is heavily instrumented to provide actionable insights during traffic spikes.
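One way to picture the instrumentation is a latency-recording decorator around each endpoint. This is a minimal in-process sketch with an illustrative metrics store; a real deployment would export these samples to a monitoring backend rather than hold them in memory:

```python
import functools
import time
from collections import defaultdict

# Minimal in-process metrics store (illustrative only).
latency_samples = defaultdict(list)

def instrumented(endpoint: str):
    """Record wall-clock latency for each call, keyed by endpoint."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                latency_samples[endpoint].append(time.perf_counter() - start)
        return wrapper
    return decorator

@instrumented("/games")
def list_games():
    return ["game-1", "game-2"]

list_games()
print(len(latency_samples["/games"]))  # → 1
```

With latency recorded per endpoint, a traffic spike shows up as a shift in the sample distribution rather than just a raw error count.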
Load tests were run with k6 to simulate concurrent user spikes. Distributed tracing (Tempo waterfalls) was leveraged to identify and isolate PostgreSQL query latency under heavy load, ensuring the database connection pool remained stable.
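The shape of such a spike test can be sketched in plain Python: fire a burst of concurrent "virtual users" and compute a tail-latency percentile from the observed timings. This is not a replacement for k6, only an illustration of the pattern, with a synthetic workload standing in for real HTTP round trips:

```python
import concurrent.futures
import statistics
import time

def handle_request(request_id: int) -> float:
    """Stand-in for one API call; returns observed latency in seconds."""
    start = time.perf_counter()
    sum(range(10_000))  # simulated work instead of a real HTTP round trip
    return time.perf_counter() - start

# Fire 50 virtual users at once, roughly what a k6 spike stage does.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    latencies = list(pool.map(handle_request, range(50)))

# p95 is the usual headline number for tail latency under load.
p95 = statistics.quantiles(latencies, n=100)[94]
print(len(latencies))  # → 50
```

In the real test, the k6 output plus a Tempo trace for the slowest requests is what pinpoints whether the tail latency lives in the app, the query, or the pool.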