InfoDataWorx

Data Engineer Telecom Domain

Written by Vishwa Teja | Apr 7, 2025 8:15:04 PM

Telecom Domain

Project 1: Network Usage Analytics on Big Data Stack

I worked on a big data analytics solution for a major telecom operator to analyze terabytes of Call Detail Records (CDRs) daily. I used Apache Sqoop and Kafka to ingest structured and semi-structured data into HDFS and Hive tables.

I developed PySpark jobs to analyze call durations, dropped calls, roaming patterns, and bandwidth consumption.
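The core of those PySpark jobs was per-subscriber aggregation over CDR records. A minimal sketch of the same logic in plain Python (the field names here are illustrative; the actual jobs operated on Spark DataFrames read from Hive):

```python
from collections import defaultdict

# Illustrative CDR records; the real pipeline read these from Hive tables.
cdrs = [
    {"subscriber": "A", "duration_s": 120, "dropped": False, "roaming": False},
    {"subscriber": "A", "duration_s": 0,   "dropped": True,  "roaming": False},
    {"subscriber": "B", "duration_s": 300, "dropped": False, "roaming": True},
]

def aggregate_cdrs(records):
    """Per-subscriber call stats: total talk time, drop count, roaming calls."""
    stats = defaultdict(lambda: {"total_s": 0, "drops": 0, "roaming_calls": 0})
    for r in records:
        s = stats[r["subscriber"]]
        s["total_s"] += r["duration_s"]
        s["drops"] += int(r["dropped"])
        s["roaming_calls"] += int(r["roaming"])
    return dict(stats)

stats = aggregate_cdrs(cdrs)
```

On Spark this maps to a `groupBy("subscriber")` with `sum` and conditional-count aggregations, which distributes the same computation across the cluster.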

I also used Apache HBase to store real-time metrics and Apache Druid for fast OLAP queries. Data scientists used this clean, transformed data to run churn prediction models.

I exposed the processed data through Superset dashboards and APIs built in Flask for internal telecom ops teams.
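The Flask endpoints mostly shaped aggregated metrics into JSON for the ops teams. A hedged sketch of that response shaping using only the standard library (the field names and metric formulas are hypothetical, not the production schema):

```python
import json

def metrics_response(region, metrics):
    """Shape aggregated network metrics into the JSON payload an ops
    endpoint would return (illustrative fields)."""
    payload = {
        "region": region,
        "dropped_call_rate": round(metrics["drops"] / metrics["calls"], 4),
        "avg_duration_s": round(metrics["total_s"] / metrics["calls"], 1),
    }
    return json.dumps(payload)

body = metrics_response("north", {"calls": 200, "drops": 3, "total_s": 24000})
```

In the real service this function body would sit behind a Flask route (e.g. a `@app.route(...)` handler) that pulls the metrics from HBase or Druid before serializing.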

We achieved near real-time network monitoring using Airflow DAGs for hourly updates, with anomaly alerting through Prometheus and Grafana.
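The anomaly check behind those alerts can be sketched as a simple threshold rule (the threshold value and metric are illustrative; in production the rule was evaluated by Prometheus and surfaced in Grafana):

```python
def detect_anomalies(hourly_drop_rates, threshold=0.05):
    """Flag hours whose dropped-call rate exceeds the alert threshold."""
    return [hour for hour, rate in hourly_drop_rates.items() if rate > threshold]

# Hour 10:00 breaches the 5% dropped-call threshold and would trigger an alert.
alerts = detect_anomalies({"09:00": 0.01, "10:00": 0.08, "11:00": 0.02})
```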


Project 2: 5G Rollout and User Location Data Platform

I developed a platform to collect and process geo-location data for 5G coverage optimization. We collected device signal data via mobile SDKs and sent it through Kafka topics for ingestion.

I used Apache Beam on Google Cloud Dataflow to process these signals and aggregate the data per tower and per region.
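The per-tower, per-region aggregation amounts to a keyed group-and-average. A plain-Python sketch of that logic (field names such as `rsrp_dbm` are illustrative; the real pipeline expressed this as Beam `GroupByKey`/`CombinePerKey` transforms on Dataflow):

```python
from collections import defaultdict
from statistics import mean

# Illustrative device signal samples arriving from Kafka.
signals = [
    {"tower": "T1", "region": "east", "rsrp_dbm": -95},
    {"tower": "T1", "region": "east", "rsrp_dbm": -101},
    {"tower": "T2", "region": "west", "rsrp_dbm": -88},
]

def aggregate_signals(records):
    """Average signal strength and sample count per (region, tower) key."""
    grouped = defaultdict(list)
    for r in records:
        grouped[(r["region"], r["tower"])].append(r["rsrp_dbm"])
    return {key: {"avg_rsrp_dbm": mean(vals), "samples": len(vals)}
            for key, vals in grouped.items()}

per_tower = aggregate_signals(signals)
```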

I used BigQuery for OLAP analysis and GeoJSON with PostGIS to map user density in near real time.
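Feeding the density map means emitting GeoJSON features per tower. A minimal sketch of that serialization (tower IDs, coordinates, and property names are made up for illustration; the actual geometries lived in PostGIS):

```python
def density_geojson(towers):
    """Wrap per-tower user counts as GeoJSON Point features for map rendering."""
    features = [
        {
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [t["lon"], t["lat"]]},
            "properties": {"tower": t["id"], "users": t["users"]},
        }
        for t in towers
    ]
    return {"type": "FeatureCollection", "features": features}

fc = density_geojson([{"id": "T1", "lon": -96.8, "lat": 32.78, "users": 412}])
```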

I used Airflow to orchestrate daily heatmap generation and Looker for dashboard visualization.

The processed output enabled the 5G planning team to optimize antenna placement and reduce dead zones by 23%.

I worked closely with the DevOps team to containerize the jobs using Docker and deployed them using Kubernetes (GKE).


Project 3: Telecom Billing Pipeline Modernization

I modernized a legacy telecom billing system by reengineering it on a distributed data engineering stack.

I migrated ETL processes from PL/SQL to Spark Structured Streaming to handle streaming events from customer calls, SMS, and data usage.

I used Apache Kafka for ingesting CDR events, Spark for real-time enrichment using reference data (customer plans, taxes), and Delta Lake to maintain billing state with full audit logs.
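The enrichment step joins each raw CDR event against reference data before billing. A simplified sketch of that join in plain Python (the plan codes, rates, and tax table are invented for illustration; in the real job these lookups were broadcast to Spark executors):

```python
# Illustrative reference data; the real job broadcast these as Spark lookups.
plans = {"C100": {"plan": "metered", "rate_per_min": 0.02}}
tax_rates = {"TX": 0.0825}

def enrich_cdr(cdr):
    """Attach plan details and compute charge and tax for a raw CDR event."""
    plan = plans[cdr["customer_id"]]
    charge = round((cdr["duration_s"] / 60) * plan["rate_per_min"], 4)
    tax = charge * tax_rates[cdr["state"]]
    return {**cdr, **plan, "charge": charge, "tax": tax}

event = enrich_cdr({"customer_id": "C100", "duration_s": 180, "state": "TX"})
```

Writing the enriched result to Delta Lake then preserves every version of the billing state, which is what makes the full audit log possible.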

I wrote transformation logic in Scala and orchestrated batch jobs using Airflow.

Processed bills were stored in Snowflake, and I built custom validation layers using dbt for downstream BI reports.
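The validation layer enforced rules of the kind dbt tests encode (not-null, non-negative, reconciliation). A plain-Python sketch of two such checks (the rule set and field names are illustrative, not the production schema):

```python
def validate_bills(bills):
    """Return (bill_id, reason) violations, mirroring downstream dbt-style tests."""
    errors = []
    for b in bills:
        if b["amount"] < 0:
            errors.append((b["bill_id"], "negative amount"))
        if abs(b["amount"] - (b["charge"] + b["tax"])) > 0.01:
            errors.append((b["bill_id"], "amount != charge + tax"))
    return errors

errors = validate_bills([
    {"bill_id": 1, "amount": 10.83, "charge": 10.0, "tax": 0.83},
    {"bill_id": 2, "amount": -5.0,  "charge": 4.5,  "tax": 0.5},
])
```

In dbt the same rules would live as schema tests and singular SQL tests run against the Snowflake tables before BI reports pick them up.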

This pipeline reduced latency by 60% and ensured accurate billing for over 10 million customers.