Job Detail

Senior DevOps Engineer

Posted on Jun 10, 2026
Location: Dubai, UAE
Industry: Others
Job Type: Full Time/Permanent
Experience: 3 Years
Salary: 4000 - 40000 UAE Dirhams (Monthly)

Job Description

Job Summary
The Senior DevOps Engineer serves as a critical execution layer for Mercans’ AI-native infrastructure strategy and broader product delivery and automation. Based either in Tartu, Estonia or remotely, the role reports directly to the CTO and collaborates with Product Managers, Engineering Managers, software engineers, data scientists, and SRE teams to operationalize the technical vision through hands-on automation, GitLab DevSecOps pipelines, and resilient platform operations.

The position focuses on building and maintaining a secure, cost-efficient private cloud environment capable of hyperscale payroll processing and proprietary AI model training, while providing deployment automation, feature flagging, and release orchestration to accelerate Product and Engineering team velocity.

Duties and Responsibilities:
Platform Automation &
Support Product and Engineering Teams by implementing and maintaining GitLab pipelines that enable rapid feature delivery, testing, feature flagging, and blue-green deployments across all product lines, enforcing architectural standards, security controls, and AI-first engineering patterns defined by the Enterprise Architecture Board.
Provide deployment automation for Product releases with GitLab pipelines including shift-left security (SAST, DAST, dependency scanning, container scanning, IaC scanning, secret detection) integrated into merge requests, environment promotion gates, and production deployment approvals.
Enable Engineering velocity through self-service deployment templates, environment provisioning APIs, and GitLab pipeline libraries that reduce cognitive load for application teams building payroll features.
Automate Product experimentation with GitLab Feature Flags, progressive delivery patterns, and canary releases to enable Product Teams to test hypotheses with minimal deployment risk.
Private Cloud Operations
Automate infrastructure provisioning for the private cloud (Kubernetes, HCI, GPU nodes, storage) using Infrastructure as Code in line with the AI Cloud reference architecture, scanning Terraform manifests for misconfigurations via GitLab.
Operate and optimize GPU-enabled Kubernetes clusters, including bin-packing, autoscaling, and fractional GPU scheduling to support AI training and inference workloads efficiently, with GitLab runtime security policies and container image scanning for CVEs.
Observability & Resilience
Implement observability (logging, metrics, tracing) and SRE practices to contribute toward the 99.999% availability target and active-active multi-datacenter strategy for core payroll and AI services, leveraging GitLab security dashboards for vulnerability tracking and remediation.
Identify operational issues, implement fixes and performance improvements, and contribute to chaos engineering and resilience drills to build an anti-fragile engineering culture, with GitLab conditional pipelines for secure testing and deployment.
Security & Compliance
Ensure systems are safe and secure against cybersecurity threats by embedding GitLab security policies into pipelines, managing secrets with detection scans, enforcing role-based access control (RBAC), and achieving policy compliance through MR approvals and dashboards.
Work closely with Product Managers, software engineers, data scientists, and MLOps teams to standardize release processes for AI models and product features, reduce lead time to production, and integrate with model registries, compliance checks, and feature management platforms using GitLab’s end-to-end DevSecOps workflows.
Documentation & Knowledge Transfer
Produce high-quality documentation for runbooks, deployment procedures, GitLab pipeline templates, and platform standards, and contribute to internal Centers of Excellence for SRE and AI Engineering, including GitLab security best practices training.
Skills and experience
4–6+ years of experience as a DevOps / SRE / Platform Engineer operating production‑grade Kubernetes‑based systems and pipelines.
Hands‑on experience with private cloud or on‑prem Kubernetes (e.g., CAPI‑based clusters, HCI) and automation tools (Terraform or equivalents).
Experience running containerized workloads with GPUs, including familiarity with scheduling, resource quotas, and performance tuning for workloads.
Strong automation skills and programming ability in at least one language (e.g., Python, Go, or similar) for scripting, integrations, and tooling.
Good understanding of observability stacks, incident management, and SRE practices (SLOs, error budgets, postmortems).
Knowledge of secure software delivery practices, secrets management, and compliance‑aware deployment in regulated or data‑sensitive environments.
Proficiency with GitLab DevSecOps: configuring .gitlab-ci.yml templates for SAST scanning, security dashboards, RBAC, policy enforcement, feature flags, and progressive delivery in pipelines.
Experience enabling Product teams with self-service deployment platforms, GitOps workflows, and golden deployment paths that balance velocity and safety.
Experience with Agile teams and collaborative ways of working across Product, development, architecture, and data functions.
Strong documentation, time‑management, and communication skills in English, with readiness to take initiative and shape DevOps practices from the ground up in alignment with architectural guidelines.
Performance Goals:
reliability and speed
Specific: Design and standardize GitLab pipelines for core payroll, AI services, and Product feature releases, including automated security testing (SAST, DAST, scanning) and deployment approvals.
Measurable: Achieve a pipeline success rate of at least 98% and reduce median lead time from commit to production to under 2 hours for target services and product features.
Achievable: Leverage GitLab security templates and collaborate with Product, development, and QA teams to streamline stages and remove manual bottlenecks.
Relevant: Directly supports AI‑first engineering standards and Product velocity for faster time‑to‑market for new features and AI models.
Time‑bound: Target achieved by end of Q***026.
Infrastructure cost and utilization optimization
Specific: Implement bin‑packing strategies, right‑size workloads, and refine Kubernetes scheduling for CPU, memory, and GPU resources in the private cloud.
Measurable: Contribute to a 15% reduction in infrastructure cost per payslip and increase average GPU and node utilization to at least 70% on production clusters.
Achievable: Use monitoring data and autoscaling capabilities; coordinate with architecture on capacity planning and hardware lifecycle.
Relevant: Supports broader COGS reduction and maximizes ROI on AI hardware investments.
Time‑bound: Target achieved by end of Q***026.
Platform resilience and incident reduction
Specific: Implement SRE practices, incident runbooks, and active‑active‑aware deployment patterns for critical payroll, AI, and Product services.
Measurable: Help reach 99.99%+ availability for owned services on the path to five nines for the core engine, and reduce high‑severity incidents (Sev‑1 and Sev‑2) by 30% year‑over‑year.
Achievable: Introduce improved alerting, standardized playbooks, and participate in chaos drills and postmortems to address systemic issues.
Relevant: Aligned with the strategic goal of enterprise‑grade resilience for Tier‑1 clients.
Time‑bound: Measured over the 12‑month period following the hire date.
MLOps and Product deployment velocity
Specific: Integrate GitLab pipelines with the model registry, compliance checks, and Product feature management, enabling automated deployment of AI models and product releases to the private cloud.
Measurable: Reduce the lead time for deploying updated AI models and Product features from weeks to less than 24 hours for prioritized use cases, with zero non‑approved deployments.
Achievable: Build pipeline templates for AI workloads and Product releases and collaborate with data science, Product, and GRC teams.
Relevant: Supports the organization’s target of advanced MLOps maturity and Product velocity with safe AI and feature adoption at scale.
Time‑bound: Initial target achieved by Q***026, with continuous improvement thereafter.
Operational excellence and knowledge sharing
Specific: Create and maintain platform documentation, runbooks, and internal knowledge sessions focused on private cloud, GitLab DevSecOps, Product deployment patterns, and AI infrastructure operations.
Measurable: Publish at least 10 high‑quality runbooks or platform guides and lead a minimum of 6 internal technical sessions or deep‑dives per year.
Achievable: Integrate documentation and knowledge‑sharing into incident resolution, new feature rollout, and architectural change activities.
Relevant: Strengthens internal Centers of Excellence and supports talent density and mentorship objectives.
Time‑bound: Targets measured on an annual basis, with the first cycle ending Q***026.

