Project 1: Network Usage Analytics on a Big Data Stack
I worked on a big data analytics solution for a telecom giant to analyze terabytes of CDR (Call Detail Record) data daily. I used Apache Sqoop and Kafka to ingest structured and semi-structured data into HDFS and Hive tables.
I developed PySpark jobs to analyze call durations, dropped calls, roaming patterns, and bandwidth consumption.
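As a rough sketch of what one of those PySpark jobs looked like (the table and column names such as call_duration_sec and call_status are illustrative, not the production schema):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdr-analytics").getOrCreate()

# Hypothetical Hive table populated by the Sqoop/Kafka ingestion layer
cdrs = spark.table("telecom.cdr_daily")

# Per-cell aggregates: average call duration, dropped-call rate, bandwidth
metrics = (
    cdrs.groupBy("cell_id")
    .agg(
        F.avg("call_duration_sec").alias("avg_duration_sec"),
        F.avg((F.col("call_status") == "DROPPED").cast("int")).alias("drop_rate"),
        F.sum("bytes_used").alias("total_bytes"),
    )
)

metrics.write.mode("overwrite").saveAsTable("telecom.cell_metrics_daily")
```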
I also used Apache HBase to store real-time metrics and Apache Druid for fast OLAP queries. Data scientists used this clean, transformed data to run churn prediction models.
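Writing the real-time metrics into HBase can be sketched with the happybase client; the host, table, and column-family names below are placeholders:

```python
import happybase

# Placeholder host and table; the real cluster details differ
connection = happybase.Connection("hbase-master.internal")
table = connection.table("network_metrics")

# Row key combines cell id and timestamp so recent metrics scan efficiently
table.put(
    b"cell_042|20240101T1200",
    {
        b"m:avg_duration_sec": b"187.4",
        b"m:drop_rate": b"0.021",
    },
)
connection.close()
```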
I exposed the processed data through Superset dashboards and APIs built in Flask for internal telecom ops teams.
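The Flask layer was essentially a thin read path over the aggregated tables; a minimal sketch, with an in-memory dict standing in for the real metrics-store lookup:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Illustrative stand-in for the real metrics store
METRICS = {"cell_042": {"avg_duration_sec": 187.4, "drop_rate": 0.021}}

@app.route("/metrics/<cell_id>")
def cell_metrics(cell_id):
    metrics = METRICS.get(cell_id)
    if metrics is None:
        return jsonify({"error": "unknown cell"}), 404
    return jsonify(metrics)

if __name__ == "__main__":
    app.run(port=5000)
```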
We achieved near-real-time network monitoring, with Airflow DAGs driving hourly updates and Prometheus and Grafana alerting on anomalies.
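A minimal sketch of the hourly Airflow DAG; the spark-submit command and job path are assumptions:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hourly refresh of the network-usage aggregates
with DAG(
    dag_id="network_usage_hourly",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    refresh_metrics = BashOperator(
        task_id="refresh_metrics",
        bash_command="spark-submit /jobs/cdr_analytics.py",  # illustrative path
    )
```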
Project 2: 5G Rollout and User Location Data Platform
I developed a platform to collect and process geo-location data for 5G coverage optimization. We collected device signal data via mobile SDKs and sent it through Kafka topics for ingestion.
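On the ingestion side, the SDK backend published signal payloads to Kafka roughly like this, shown here with kafka-python; the broker address, topic name, and payload fields are illustrative:

```python
import json

from kafka import KafkaProducer

# Broker and topic are placeholders
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

signal = {
    "device_id": "abc123",
    "tower_id": "TWR-0042",
    "region": "north",
    "rssi": -87,
    "lat": 12.9716,
    "lon": 77.5946,
}
producer.send("device-signals", signal)
producer.flush()
```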
I used Apache Beam with Google Cloud Dataflow to process these signals and aggregate the data per tower and per region.
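A condensed Beam sketch of the per-tower aggregation; the real job consumed Kafka and ran on Dataflow, but a bounded text source keeps the example self-contained, and the paths and field names are assumptions:

```python
import json

import apache_beam as beam

def parse_signal(line):
    rec = json.loads(line)
    # Key by (region, tower) so the combiner aggregates per tower per region
    return ((rec["region"], rec["tower_id"]), rec["rssi"])

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://example-bucket/signals/*.json")
        | "Parse" >> beam.Map(parse_signal)
        | "AvgRSSI" >> beam.combiners.Mean.PerKey()
        | "Format" >> beam.MapTuple(lambda key, avg: f"{key[0]},{key[1]},{avg:.1f}")
        | "Write" >> beam.io.WriteToText("gs://example-bucket/agg/tower_rssi")
    )
```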
I used BigQuery for OLAP analysis and GeoJSON with PostGIS to map user density in real time.
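The density mapping boils down to a grid aggregation in PostGIS; a sketch via psycopg2, where the connection parameters, table, and grid size are all illustrative:

```python
import psycopg2

# Connection parameters are placeholders
conn = psycopg2.connect("dbname=geo user=analytics host=localhost")

# Snap device positions to a ~0.01 degree grid, count devices per cell,
# and return each cell as GeoJSON for the heatmap layer
DENSITY_SQL = """
SELECT ST_AsGeoJSON(ST_SnapToGrid(geom, 0.01)) AS cell,
       COUNT(*) AS device_count
FROM device_positions
GROUP BY ST_SnapToGrid(geom, 0.01)
ORDER BY device_count DESC;
"""

with conn, conn.cursor() as cur:
    cur.execute(DENSITY_SQL)
    for cell_geojson, device_count in cur.fetchall():
        print(device_count, cell_geojson)
conn.close()
```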
I used Airflow to orchestrate daily heatmap generation and Looker for dashboard visualization.
The processed output enabled the 5G planning team to optimize antenna placement and reduce dead zones by 23%.
I worked closely with the DevOps team to containerize the jobs with Docker and deploy them on Kubernetes (GKE).
Project 3: Telecom Billing System Modernization
I modernized a legacy telecom billing system by reengineering it on a distributed data engineering stack.
I migrated ETL processes from PL/SQL to Spark Structured Streaming to handle streaming events from customer calls, SMS, and data usage.
I used Apache Kafka for ingesting CDR events, Spark for real-time enrichment using reference data (customer plans, taxes), and Delta Lake to maintain billing state with full audit logs.
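The production transformation logic was in Scala (noted below), but the enrichment stage can be sketched in PySpark Structured Streaming; the topic, schema, and paths are placeholders, and the Kafka and Delta Lake packages must be on the Spark classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import LongType, StringType, StructType

spark = SparkSession.builder.appName("billing-enrichment").getOrCreate()

cdr_schema = (
    StructType()
    .add("customer_id", StringType())
    .add("event_type", StringType())  # CALL, SMS, DATA
    .add("units", LongType())         # seconds, messages, or bytes
    .add("event_ts", StringType())
)

# Streaming CDR events from Kafka (broker and topic are placeholders)
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "cdr-events")
    .load()
    .select(F.from_json(F.col("value").cast("string"), cdr_schema).alias("cdr"))
    .select("cdr.*")
)

# Static reference data: customer plans and tax rates
plans = spark.read.format("delta").load("/ref/customer_plans")

enriched = events.join(plans, "customer_id", "left")

# Append enriched events to the Delta billing table; the Delta transaction
# log doubles as the audit trail
(
    enriched.writeStream.format("delta")
    .option("checkpointLocation", "/chk/billing")
    .outputMode("append")
    .start("/delta/billing_events")
)
```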
I wrote transformation logic in Scala and orchestrated batch jobs using Airflow.
Processed bills were stored in Snowflake, and I built custom validation layers using dbt for downstream BI reports.
This pipeline reduced end-to-end billing latency by 60% and ensured accurate billing for over 10 million customers.