Sr. TL SRE (C#, .NET, SRE principles, SLIs, SLOs, AWS, error budgets, observability, Debugging, Linux & Windows)
Vertafore
We are seeking a Senior Technical Lead Site Reliability Engineer to own the reliability, scalability, performance, and operational integrity of critical production services. This role is accountable for the full service lifecycle, from design and deployment readiness through production operations, incident response, and continuous improvement. Reliability is a core engineering responsibility here, requiring strong software engineering skills and the ability to operate autonomously across AWS, hybrid data centers, and customer-hosted environments.

Roles and Responsibilities
· Own production services end to end, with accountability for reliability, availability, scalability, performance, and operational health.
· Define and manage SLIs and SLOs, using error budgets to guide delivery decisions.
· Influence service and system design to improve fault tolerance, observability, and operational sustainability.
· Debug complex production issues across application code, services, and infrastructure using software engineering practices.
· Perform root cause analysis using logs, metrics, traces, and code-level investigation.
· Build automation and self-healing mechanisms to prevent repeat failures.
· Execute production changes (patching, certificate management, software releases) safely, with automation and observability.
· Design and operate production observability aligned to service health and customer impact.
· Lead and participate in incident response for high-severity events.
· Collaborate with engineering, product, architecture, and operations teams.
· Operate with autonomy and sound judgment in reliability decisions.