Newcastle, GBR
Senior Software Engineer (AWS)
Job Description:
We’re looking for a Senior DataOps / DevOps Engineer to design, build, and operate the reliability layer underpinning Sage’s core data platforms, including large-scale batch and streaming data systems. In this role, you’ll own the observability, monitoring, and operational resilience of cloud-native data infrastructure and streaming pipelines, ensuring that data flows, whether event-driven or batch, are performant, reliable, and predictable in production.

This is a hybrid position requiring 3 days per week in our Newcastle or Barcelona office.

First 90 Days
• 30 days: Get familiar with Sage’s data platform architecture, including batch and streaming pipelines, cloud infrastructure, and existing operational tooling. Understand current monitoring, alerting, logging, and incident response practices, along with data reliability SLAs, failure modes, and engineering standards.
• 60 days: Begin actively improving observability across key data systems, including dashboards, alerts, and pipeline health checks. Contribute to the operation and reliability of batch and streaming workloads, applying Infrastructure as Code, incident learnings, and DataOps best practices.
• 90 days: Own major aspects of the data platform’s operational reliability and observability strategy. Drive improvements in alert quality, system resilience, pipeline reliability, and operational maturity. Mentor team members on DataOps and DevOps practices, and help shape how data platforms are built and operated going forward.

Meet the Team
You’ll work alongside data engineers, AI specialists, product managers, and designers in a highly collaborative environment. The team focuses on building scalable internal platforms that power data-driven decision making and AI-enabled products across Sage.

How success will be measured
• Delivery of reliable, scalable automation and operational capabilities across data ingestion, processing, and platform services.
• Measurable improvements in platform observability, including clear dashboards and actionable alerts tied to data SLAs such as freshness, latency, and availability.
• Reduction in operational toil through Infrastructure as Code, repeatable deployments, and improved self-service onboarding for engineering teams.
• Improved incident response outcomes, including faster detection, faster recovery, and fewer recurring issues through effective post-incident follow-ups.
• Strong operational quality across environments, with platforms operating securely, predictably, and in line with governance and compliance requirements.
• Increased visibility into system health across batch and streaming data pipelines.

Skills you’ll gain
• Deep expertise operating a modern Product Data Platform / Data Hub supporting both batch and streaming workloads.
• Hands-on experience with streaming and distributed data processing systems and their operational characteristics.
• Strong exposure to observability engineering for data systems, including metrics, logs, traces, and pipeline health monitoring.
• Experience shaping platform reliability standards, including alerting strategies, runbooks, and on-call readiness.
• Practical cloud infrastructure ownership across storage, compute, and analytics layers used by large-scale data platforms.

Key Responsibilities: Snapshot of your day-to-day
• You’ll design and operate monitoring and alerting that provides real-time visibility into pipeline health, SLA breaches, and platform behaviour.
• You’ll improve the reliability of batch and streaming data ingestion and processing workloads, focusing on failure recovery and operational robustness.
• You’ll build and maintain cloud infrastructure and deployment automation to keep environments consistent, secure, and repeatable.
• You’ll work closely with data engineering and product teams to improve platform onboarding and reduce the effort required to adopt shared data capabilities.
• You’ll help strengthen governance, compliance, and auditability by improving observability, documentation, and operational controls across the platform.

Must have skills
• Strong experience as a DataOps, DevOps, or Platform Engineer supporting production data systems.
• Proven expertise in observability tooling, including monitoring, logging, alerting, dashboards, and operating distributed systems in production.
• Solid understanding of streaming and event-driven data pipelines and their common failure modes (e.g. lag, backpressure, replay).
• Strong cloud infrastructure experience (AWS preferred), including networking, compute, storage, and managed services.
• Hands-on experience with Infrastructure as Code and CI/CD practices for platform and data services.
• Ability to work across ingestion, processing, and storage layers while collaborating effectively with multiple engineering teams.
• Excellent communication and collaboration skills in English.

Nice to have skills
• Experience operating data platforms built on technologies such as Snowflake and S3-based data lake patterns.
• Familiarity with distributed processing and streaming ecosystems such as Kafka or Flink.
• Experience implementing data pipeline health monitoring beyond infrastructure metrics (e.g. freshness, completeness, anomaly detection).
• Experience supporting multi-team internal platforms with a “platform as a product” mindset.

At Sage, we offer you an environment where you can grow professionally without compromising your personal well-being. Our benefits package is designed to provide stability, flexibility, and balance:
Benefits video: https://youtu.be/TCMtTYUUiuU
• Comprehensive health, dental and vision coverage
• Work away scheme for up to 10 weeks a year
• Ongoing training and professional development
• 5 paid days yearly to volunteer through our Sage Foundation
• Flexible work patterns and hybrid working

#LI-AL2

Function: Service Fabric (Data and Developer Services)
Country: United Kingdom
Office Location: Newcastle
Work Place type: Hybrid

Advert
Working at Sage means you’re supporting millions of small and medium-sized businesses globally with technology to work faster and smarter. We leverage the future of AI, meaning business owners spend less time doing routine tasks, like entering invoices and generating reports, and more time pursuing their ambitions.

Our colleagues are the best of the best. It’s why we were awarded 2024 Best Places to Work by Glassdoor. Because to achieve extraordinary outcomes, we need extraordinary teams. This means infusing Sage with people who knock down barriers, continuously innovate, and want to experience their potential.

Learn more about working at Sage: sage.com/en-gb/company/careers/working-at-sage/
Watch a video about our culture: youtube.com/watch?v=qIoiCpZH-QE

We celebrate individuality and welcome you to join us if you embrace all backgrounds, identities, beliefs, and ways of working. If you need support applying, reach out at careers@sage.com.
Learn more about DEI at Sage: sage.com/en-gb/company/careers/diversity-equity-and-inclusion/

Equal Employment Opportunity (EEO)
Sage is committed to Equal Employment Opportunity and providing reasonable accommodations to applicants with physical and/or mental disabilities. In order to provide equal employment and advancement opportunities to all individuals, employment decisions at Sage will be based on merit, qualifications, and abilities. Sage does not discriminate in employment opportunities or practices on the basis of race, color, religion, sex, national origin, age, protected disability, veteran status, sexual orientation, gender identity, genetic information, or any other characteristic protected by applicable law.