Automated Sports Data Lake

A scalable, cloud-native ETL pipeline and web application built to securely ingest, process, and serve historical sports data.

System Architecture, Engineering, & DevOps Principles

Idempotent Data Pipelines

Engineered robust ETL scripts handling historical and live sports data. By leveraging PostgreSQL ON CONFLICT upserts and deterministic object-path structuring in Google Cloud Storage (GCS), the pipeline guarantees safe, repeatable executions without duplicating data or corrupting state.
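The two idempotency mechanisms can be sketched in a few lines. This is a minimal, self-contained illustration, not the project's actual code: the `game_stats` table, its columns, and the path layout are hypothetical, and SQLite (3.24+) is used here only because it accepts the same ON CONFLICT upsert syntax as the PostgreSQL Cloud SQL instance the pipeline actually targets.

```python
import sqlite3

# Hypothetical schema; the real pipeline targets PostgreSQL on Cloud SQL,
# but SQLite (3.24+) shares the ON CONFLICT upsert syntax, keeping this
# sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE game_stats (
        game_id    TEXT PRIMARY KEY,
        home_score INTEGER,
        away_score INTEGER
    )
    """
)

def upsert_game(conn, game_id, home_score, away_score):
    # ON CONFLICT makes the write idempotent: re-running the ETL job
    # updates the existing row instead of raising a duplicate-key error.
    conn.execute(
        """
        INSERT INTO game_stats (game_id, home_score, away_score)
        VALUES (?, ?, ?)
        ON CONFLICT (game_id) DO UPDATE SET
            home_score = excluded.home_score,
            away_score = excluded.away_score
        """,
        (game_id, home_score, away_score),
    )

def object_path(league, season, week, game_id):
    # Deterministic GCS object key (layout is illustrative): re-ingesting
    # the same game always writes to the same path, so re-runs overwrite
    # rather than duplicate.
    return f"raw/{league}/{season}/week={week}/{game_id}.json"

# Running the same load twice leaves exactly one row per game.
upsert_game(conn, "2023-W01-KC-DET", 20, 21)
upsert_game(conn, "2023-W01-KC-DET", 20, 21)
count = conn.execute("SELECT COUNT(*) FROM game_stats").fetchone()[0]
print(count)  # 1
print(object_path("nfl", 2023, 1, "2023-W01-KC-DET"))
```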

Infrastructure as Code

Fully bootstrapped the GCP environment using Terraform. Every resource—from Cloud SQL instances and GCS buckets to Cloud Run deployments—is codified. This ensures the entire infrastructure can be safely destroyed to manage costs and reliably redeployed from scratch in minutes.
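The destroy-and-redeploy pattern hinges on a couple of Terraform settings. The sketch below uses hypothetical resource names and the smallest Cloud SQL tier; it illustrates the pattern, not the project's actual configuration.

```hcl
# Hypothetical names; a minimal sketch of the destroy/redeploy pattern.
resource "google_storage_bucket" "raw_data" {
  name          = "sports-data-lake-raw"
  location      = "US"
  force_destroy = true # lets `terraform destroy` reclaim costs even if objects exist
}

resource "google_sql_database_instance" "main" {
  name                = "sports-db"
  database_version    = "POSTGRES_15"
  deletion_protection = false # intentional: the environment is rebuilt from scratch

  settings {
    tier = "db-f1-micro" # smallest tier; sized for a portfolio workload
  }
}
```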

Zero-Trust Security & CI/CD

Emphasized a security-first approach by eliminating long-lived service account keys. Integrated Workload Identity Federation (WIF) to securely authenticate GitHub Actions deployments to GCP, strictly enforcing least-privilege IAM roles across all service accounts and cloud resources.
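The keyless pattern boils down to a short workflow fragment. The project number, pool, provider, and service-account names below are placeholders; only the `google-github-actions/auth` action and its inputs are standard.

```yaml
# Sketch of keyless GitHub Actions -> GCP authentication via WIF.
permissions:
  id-token: write   # allow the job to mint an OIDC token
  contents: read

steps:
  - uses: actions/checkout@v4
  - id: auth
    uses: google-github-actions/auth@v2
    with:
      workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github/providers/my-repo
      service_account: deployer@my-project.iam.gserviceaccount.com
```

Because the job exchanges a short-lived OIDC token for GCP credentials, there is no service-account key to leak, rotate, or store as a repository secret.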

Zero-Downtime Releases

The CI/CD pipeline is designed for enterprise reliability. Using GitHub Actions and Workload Identity Federation, new code is deployed to Google Cloud Run as an isolated revision receiving 0% of public traffic. This Blue/Green deployment strategy allows new features to be validated in production on a private URL before live traffic is seamlessly shifted, ensuring zero downtime for end users.
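On Cloud Run this pattern is two commands. Service and tag names here are illustrative, and the tagged-revision URL shape may vary by region:

```shell
# Deploy a new revision that receives no public traffic, reachable
# only via its tag-specific URL for validation.
gcloud run deploy api --image "$IMAGE" --no-traffic --tag candidate
# Validate at e.g. https://candidate---api-<hash>-<region>.a.run.app, then:
gcloud run services update-traffic api --to-latest
```

If validation fails, the candidate revision simply never receives traffic, so rollback is a no-op.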

Deep Observability & Load Resilience

Beyond standard monitoring, the API is heavily instrumented with OpenTelemetry to generate distributed traces, custom metrics, and application logs. Using a local, industry-standard Grafana LGTM-style stack (Loki, Grafana, Tempo, with Prometheus for metrics), the database connection pools and endpoint latencies were rigorously load-tested with k6 and tuned to handle high-concurrency traffic spikes.
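A k6 run is typically gated on latency percentiles, e.g. a threshold like `http_req_duration: p(95)<300`. The check it performs amounts to the stdlib sketch below; the sample latencies and the 300 ms budget are illustrative, not measured values from this system.

```python
import statistics

# Illustrative request latencies (ms), as a load test might record them.
latencies_ms = [112, 95, 130, 101, 88, 240, 119, 105, 97, 280,
                93, 107, 121, 99, 110, 102, 260, 91, 115, 98]

# statistics.quantiles with n=100 yields the 1st..99th percentile cut
# points; index 94 is the 95th percentile, the metric behind a k6
# threshold such as "http_req_duration: p(95)<300".
p95 = statistics.quantiles(latencies_ms, n=100)[94]
print(f"p95 = {p95:.1f} ms -> {'PASS' if p95 < 300 else 'FAIL'}")
```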

Core Technology Stack

Cloud & Runtime: Google Cloud Platform (GCP), Google Cloud Run, Cloud Storage (GCS), PostgreSQL (Cloud SQL)
Application: Python 3 & FastAPI
Infrastructure & Delivery: Terraform, Docker / Containerization, Linux / Bash, GitHub Actions (CI/CD), Blue/Green CI/CD, Workload Identity Federation, IAM (Least Privilege)
Observability & Testing: OpenTelemetry, Grafana (LGTM), Prometheus, k6 Load Testing

About the Engineer

Keegan Davis

I am a Cloud & Reliability Engineer with a foundational background in Data Analytics. By bridging the gap between complex data pipelines and infrastructure-as-code, I build the resilient platforms that keep applications running gracefully under pressure. I focus on automating secure CI/CD workflows, deep system observability, and engineering zero-downtime deployments for production environments.

This work bridges a background in data analytics with modern DevOps, Site Reliability Engineering (SRE), and Cloud Architecture. The platform evolved from a robust ETL data pipeline into a comprehensive showcase of production-grade DevOps practices: secure, keyless CI/CD patterns, declarative infrastructure (Terraform), and zero-downtime Blue/Green deployments. Furthermore, the system is deeply instrumented with OpenTelemetry, demonstrating its architectural resilience and data integrity under load.

Ready to interact with the API?

Explore the NFL Database