Senior Software Engineer (AWS)
Sage
We’re looking for a Senior DataOps / DevOps Engineer to design, build, and operate the reliability layer underpinning Sage’s core data platforms, including large-scale batch and streaming data systems.
In this role, you’ll own the observability, monitoring, and operational resilience of cloud-native data infrastructure and streaming pipelines, ensuring that data flows—whether event-driven or batch—are performant, reliable, and predictable in production.
This is a hybrid position requiring 3 days per week in our Newcastle office.
First 90 Days
• 30 days: Get familiar with Sage’s data platform architecture, including batch and streaming pipelines, cloud infrastructure, and existing operational tooling. Understand current monitoring, alerting, logging, and incident response practices, along with data reliability SLAs, failure modes, and engineering standards.
• 60 days: Begin actively improving observability across key data systems, including dashboards, alerts, and pipeline health checks. Contribute to the operation and reliability of batch and streaming workloads, applying Infrastructure as Code, incident learnings, and DataOps best practices.
• 90 days: Own major aspects of the data platform’s operational reliability and observability strategy. Drive improvements in alert quality, system resilience, pipeline reliability, and operational maturity. Mentor team members on DataOps and DevOps practices, and help shape how data platforms are built and operated going forward.
Meet the Team
You’ll work alongside data engineers, AI specialists, product managers, and designers in a highly collaborative environment.
The team focuses on building scalable internal platforms that power data-driven decision making and AI-enabled products across Sage.
How success will be measured
• Delivery of reliable, scalable automation and operational capabilities across data ingestion, processing, and platform services.
• Measurable improvements in platform observability, including clear dashboards and actionable alerts tied to data SLAs such as freshness, latency, and availability.
• Reduction in operational toil through Infrastructure as Code, repeatable deployments, and improved self-service onboarding for engineering teams.
• Improved incident response outcomes, including faster detection, faster recovery, and fewer recurring issues through effective post-incident follow-ups.
• Strong operational quality across environments, with platforms operating securely, predictably, and in line with governance and compliance requirements.
• Increased visibility into system health across batch and streaming data pipelines.
Skills you’ll gain
• Deep expertise operating a modern Product Data Platform / Data Hub supporting both batch and streaming workloads.
• Hands-on experience with streaming and distributed data processing systems and their operational characteristics.
• Strong exposure to observability engineering for data systems, including metrics, logs, traces, and pipeline health monitoring.
• Experience shaping platform reliability standards, including alerting strategies, runbooks, and on-call readiness.
• Practical cloud infrastructure ownership across storage, compute, and analytics layers used by large-scale data platforms.