// distributed systems & high availability architecture.
About
dedicated to designing high-performance, fault-tolerant distributed systems. focusing on scalability, data integrity, and the beauty of minimalist technical architecture. exploring the intersections of data engineering, cloud-native infrastructure, and elegant code.
Experience
Stealth
- building high-scale data infrastructure and distributed execution engines.
- focusing on fault tolerance, automated scaling, and low-latency data processing.
Decentro (YC S20)
- engineered multi-terabyte data archival pipelines.
- optimized query performance and reduced cloud infrastructure costs by 60%.
Academic
University
Selected Builds
[ distributed systems / infrastructure / tools ]
Data File Viewer
VS Code extension to view and explore binary data files directly in the editor. Supports formats including pkl, h5, parquet, feather, joblib, npy, npz, msgpack, arrow, avro, nc, and mat files. Implemented a Python backend with isolated virtual environments for safe, on-demand data parsing. Optimized file loading to handle large datasets without editor freezes.
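As a rough illustration of on-demand parsing, a backend could pick a parser from the file suffix and import it only when that format is opened. The sketch below is not the extension's code: `PARSER_BY_SUFFIX` and `parser_for` are hypothetical names, and the package chosen for each format is an assumption.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the Python package a backend
# might import lazily to parse that format (assumed, not from the project).
PARSER_BY_SUFFIX = {
    ".pkl": "pickle",
    ".npy": "numpy",
    ".npz": "numpy",
    ".h5": "h5py",
    ".parquet": "pyarrow",
    ".feather": "pyarrow",
    ".arrow": "pyarrow",
    ".joblib": "joblib",
    ".msgpack": "msgpack",
    ".avro": "fastavro",
    ".nc": "netCDF4",
    ".mat": "scipy",
}


def parser_for(path: str) -> str:
    """Return the name of the package assumed to parse `path`."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the suffix
    if suffix not in PARSER_BY_SUFFIX:
        raise ValueError(f"unsupported file type: {path!r}")
    return PARSER_BY_SUFFIX[suffix]
```

Keeping the mapping in one table means a new format only needs one new entry, and nothing heavy is imported until a file of that type is actually opened.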
AWS Terraform Multi-Environment Template
Production-ready Terraform template supporting dev, staging, and prod environments. Modular IaC architecture with reusable components for VPC, ECS, RDS, ALB, ECR, Route53, and remote state management. Implements multi-environment patterns using for_each and conditional expressions keyed on the environment.
Parallelization Engine
Distributed parallelization engine using Docker, Celery, and RabbitMQ for scalable task execution. Enables dynamic worker scaling across multiple nodes for compute-intensive workloads. Focused on fault tolerance, task retries, and throughput optimization for real-world data pipelines.
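The retry behavior mentioned above can be illustrated with a minimal, standard-library sketch of exponential backoff with jitter. It is not the engine's code: `run_with_retries`, `max_retries`, and `base_delay` are hypothetical names, and in a Celery deployment the same effect would normally come from task options such as `autoretry_for` and `retry_backoff` rather than a hand-written loop.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying failures with exponential backoff plus jitter."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            # Delay doubles per attempt: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from workers that failed together.
            sleep(delay + random.uniform(0, delay / 2))
            attempt += 1
```

The `sleep` parameter is injectable so the backoff schedule can be inspected in tests without actually waiting.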
Motor Vehicle Collision Analysis Pipeline
End-to-end ETL pipeline that processes traffic accident data to identify patterns and insights. Built with Apache Airflow for orchestration and Spark for large-scale data processing. Includes data visualization dashboards for exploring collision trends.
X-Purge
Chrome extension for smart X (Twitter) unfollowing. Mimics human behavior with randomized delays and daily caps. Implements advanced relationship, activity, and profile quality filters to fix follow-to-follower ratios without API dependency.
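The pacing idea (randomized delays under a daily cap) can be sketched as follows. This is an illustrative Python sketch, not the extension's own code, which runs in the browser; `plan_delays`, `daily_cap`, `min_delay`, and `max_delay` are hypothetical names with made-up default values.

```python
import random
from typing import List, Optional


def plan_delays(
    pending: int,
    done_today: int,
    daily_cap: int = 50,
    min_delay: float = 20.0,
    max_delay: float = 90.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one randomized wait (in seconds) per action still allowed today."""
    rng = rng or random.Random()
    # Never exceed the cap, even if earlier runs already used part of it.
    remaining = max(0, daily_cap - done_today)
    allowed = min(pending, remaining)
    # Each action gets its own uniformly random delay within the bounds.
    return [rng.uniform(min_delay, max_delay) for _ in range(allowed)]
```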
Multi-Node Airflow Cluster
Multi-node Apache Airflow cluster with distributed schedulers, metadata DB replication using Patroni, self-healing capabilities, and Prometheus-Grafana monitoring. Designed for high availability and fault tolerance. (Not publicly available)
Data Archival/Deletion Pipeline
Large-scale archival and deletion pipelines for a multi-product Cassandra database. Migrated archived data to Amazon S3 in Hive format and configured Amazon Athena to query it, reducing query costs by 60%. Ensured data governance compliance throughout the archival process. (Not publicly available)
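Assuming "Hive format" here refers to Hive-style partitioned object keys (`name=value` folders), the layout can be sketched as below. The function name, the table name, and the choice of year/month/day partition columns are hypothetical and not taken from the actual pipeline.

```python
from datetime import date


def hive_partition_key(table: str, day: date, filename: str) -> str:
    """Build an S3 object key using Hive-style `name=value` partition folders."""
    return (
        f"{table}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )
```

For example, `hive_partition_key("payments", date(2023, 1, 5), "part-0000.parquet")` yields `payments/year=2023/month=01/day=05/part-0000.parquet`. Athena bills by the amount of data scanned, so a layout like this lets queries that filter on the partition columns skip unrelated folders.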
High Availability Infrastructure
Highly available APIs, databases, and Airflow services using Keepalived (VIPs), Patroni (PostgreSQL HA), and shared storage via GlusterFS/NFS. Nginx load balancing with Route53 and Azure DNS for global distribution. (Not publicly available)