Distributed Systems
Designing services that coordinate at scale — across process boundaries, network partitions, and eventual consistency constraints. CAP tradeoffs are engineering decisions, not theory.
Building enterprise-grade backend systems and distributed architectures that perform under real-world load — across healthcare platforms, media streaming, and high-availability infrastructure.
Over the course of my career, I have worked at the intersection of scale and reliability — building backend systems that process millions of events, serve global audiences, and recover gracefully under pressure. My experience spans healthcare and media: two domains where data integrity and uptime aren't aspirational — they're contractual.
At OSN, I contributed to platform infrastructure serving streaming audiences across MENA. At CureMD, I worked on enterprise healthcare software where correctness and compliance were non-negotiable constraints. These aren't environments that reward shortcuts. They reward engineers who think in systems.
I am drawn to problems that live at the boundary of performance and correctness: distributed coordination, event-driven data flows, service boundaries that hold under pressure. I work well in senior engineering teams — contributing architecture decisions, reviewing design tradeoffs, and mentoring engineers who are growing into that same systems-level thinking.
Production Java, Kotlin, and .NET services built for longevity. Clean APIs, disciplined dependency management, testable architectures — code meant to outlast the sprint.
Asynchronous pipelines, message brokers, and event sourcing patterns for systems that need to decouple without losing consistency guarantees. Kafka, queues, and beyond.
Azure-native deployments with Docker and Kubernetes orchestration. Infrastructure as a product — observable, reproducible, and cost-aware by design.
Profiling hot paths, eliminating contention, restructuring query plans, and tuning caching strategies. Latency is not a detail — it's user experience made measurable.
PostgreSQL at scale — indexing strategies, partitioning, query optimisation. Redis for caching and coordination. Data models that serve the system, not just the feature.
Specific details remain confidential. What follows is an accurate representation of the scale and character of the engineering challenges I've engaged with.
Problem: A core API serving downstream consumers was degrading under concurrent load, with rising latency triggering cascading timeouts across dependent services.
Approach: Profiled hot code paths, identified N+1 query patterns, introduced targeted caching at the service layer, and restructured connection pool configuration.
Outcome: Response time reduced by over 60% under peak load. Cascading timeout incidents eliminated. Downstream teams unblocked.
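The specifics are confidential, but the shape of the N+1 fix is common enough to sketch. The example below is illustrative, not the production code: a plain map stands in for the database, and the names (`BatchedLookup`, `fetchBatched`) are hypothetical. It contrasts a one-query-per-id loop with a single batched lookup backed by a service-layer cache.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Illustrative sketch of an N+1 fix: one batched lookup plus a
// service-layer cache. The "database" here is just a map, and dbCalls
// is instrumentation for the example, not production code.
public class BatchedLookup {
    private final Map<Long, String> customerTable;            // stand-in for a DB table
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    int dbCalls = 0;                                          // counts simulated round-trips

    public BatchedLookup(Map<Long, String> customerTable) {
        this.customerTable = customerTable;
    }

    // The N+1 shape: one round-trip per id.
    public List<String> fetchOneByOne(List<Long> ids) {
        List<String> out = new ArrayList<>();
        for (Long id : ids) {
            dbCalls++;                                        // one query per id
            out.add(customerTable.get(id));
        }
        return out;
    }

    // The batched shape: one IN (...)-style query for all uncached ids.
    public List<String> fetchBatched(List<Long> ids) {
        List<Long> misses = ids.stream()
                .filter(id -> !cache.containsKey(id))
                .distinct()
                .collect(Collectors.toList());
        if (!misses.isEmpty()) {
            dbCalls++;                                        // one query total
            for (Long id : misses) cache.put(id, customerTable.get(id));
        }
        return ids.stream().map(cache::get).collect(Collectors.toList());
    }
}
```

The same structural change, applied at an ORM or repository boundary, is what turns O(n) round-trips per request into O(1).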
Problem: Over-provisioned infrastructure supporting a legacy synchronous service architecture was generating significant cloud spend without proportional performance benefit.
Approach: Migrated synchronous processing pipelines to event-driven consumers, right-sized Kubernetes workloads, and eliminated redundant inter-service round-trips.
Outcome: Monthly infrastructure spend reduced materially while throughput capacity increased. System became easier to operate and reason about.
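The core of the migration above is decoupling: the caller enqueues an event and returns, and a consumer does the work on its own schedule. This is a minimal in-process sketch under stated assumptions: an in-memory `BlockingQueue` stands in for a broker such as Kafka, and the class and event names are invented for illustration.

```java
import java.util.List;
import java.util.concurrent.*;

// Illustrative sketch of a sync-to-async migration: publish() enqueues
// and returns immediately; a single consumer thread drains the queue.
// The in-memory queue stands in for a real broker.
public class AsyncPipeline implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> processed = new CopyOnWriteArrayList<>();
    private final ExecutorService consumer = Executors.newSingleThreadExecutor();

    public AsyncPipeline() {
        consumer.submit(() -> {
            try {
                while (true) {
                    String event = queue.take();         // block until work arrives
                    if (event.equals("__poison__")) return;
                    processed.add("handled:" + event);   // the former synchronous step
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public void publish(String event) { queue.add(event); }  // returns immediately

    public List<String> processed() { return processed; }

    @Override public void close() {
        queue.add("__poison__");                         // drain-then-stop signal
        consumer.shutdown();
        try { consumer.awaitTermination(5, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

With the work off the request path, consumers can be scaled and right-sized independently of the producers, which is where the cost savings come from.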
Problem: A complex onboarding flow involving multiple external integrations had high failure rates at tenant activation, requiring frequent manual intervention.
Approach: Re-architected the flow using a saga pattern with compensating transactions, introduced idempotency at each integration boundary, and added structured retry logic with exponential backoff.
Outcome: Onboarding success rate improved significantly. Manual intervention requirements dropped. Engineering team gained visibility into failure modes for the first time.
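Two of the building blocks named above, idempotency at each integration boundary and retries with exponential backoff, can be sketched compactly. This is a toy model with invented names (`ActivationStep`, `runOnce`, `withBackoff`), not the production saga: keys are held in memory rather than a durable store, and jitter is omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Illustrative sketch: an idempotency guard keyed per tenant/step, and
// a retry helper with exponential backoff. A real system would persist
// the keys and add jitter; both are omitted here for brevity.
public class ActivationStep {
    private final Set<String> applied = new HashSet<>();   // idempotency keys seen so far
    final Map<String, Long> delaysMs = new HashMap<>();    // recorded for the example

    // Runs the action at most once per key, even if the caller retries.
    public boolean runOnce(String idempotencyKey, Runnable action) {
        if (!applied.add(idempotencyKey)) return false;    // duplicate: skip safely
        action.run();
        return true;
    }

    // Retries a flaky call with delays of base, 2*base, 4*base, ...
    public <T> T withBackoff(Supplier<T> call, int maxAttempts, long baseMs) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                long delay = baseMs << attempt;            // exponential growth
                delaysMs.put("attempt-" + attempt, delay);
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(ie);
                }
            }
        }
        throw last;
    }
}
```

The saga layer sits above pieces like these: each integration step records its idempotency key, and a compensating transaction undoes completed steps when a later one exhausts its retries.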
Problem: A PostgreSQL schema had grown organically, producing degrading query performance as data volume scaled — particularly across multi-table joins with unindexed predicates.
Approach: Audited slow query logs, redesigned index strategy for high-cardinality access patterns, introduced table partitioning, and worked with product teams to rewrite the most expensive query paths.
Outcome: p99 query latency reduced substantially. Reporting workloads decoupled from OLTP paths, improving stability across both surfaces.
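The general shape of that kind of fix, with the real schema replaced by an invented `events` table, looks like this in PostgreSQL. Table, column, and index names are illustrative only.

```sql
-- Hypothetical shape of the fix: a composite index matching a
-- high-cardinality access pattern, built without blocking writes.
CREATE INDEX CONCURRENTLY idx_events_tenant_created
    ON events (tenant_id, created_at DESC);

-- Declarative range partitioning (PostgreSQL 10+), so historical rows
-- stop inflating scans on the hot OLTP path.
CREATE TABLE events_partitioned (
    id         bigint      NOT NULL,
    tenant_id  bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events_partitioned
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```

Partition pruning then lets the planner touch only the partitions a query's time predicate can reach, which is what decouples reporting scans from transactional latency.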
Contributed to the backend platform powering OSN's streaming product — one of MENA's leading entertainment networks. Worked on content delivery infrastructure, service resilience, and platform reliability initiatives serving millions of subscribers. Operated at the intersection of high-throughput data pipelines and strict availability requirements.
Developed and maintained backend systems for CureMD's enterprise healthcare platform — a complex multi-tenant environment serving clinical operations, billing workflows, and patient data management. Navigated compliance constraints and data integrity requirements that demanded precision in every design decision.
Reliability over cleverness.
Latency is user experience.
Simple systems scale better.
Observability is not optional.
Failure is a first-class concern.
Code is read more than written.
I'm open to senior engineering roles, distributed systems challenges, and engineering leadership conversations. Based in Dubai. Available for remote and on-site roles globally.