Hyderabad, IN
5 days ago
Sr. TL SRE (C#, .NET, SRE principles, SLIs, SLOs, AWS, error budgets, observability, debugging, Linux & Windows)
We are seeking a Senior Technical Lead Site Reliability Engineer to own the reliability, scalability, performance, and operational integrity of critical production services. This role is accountable for the full service lifecycle, from design and deployment readiness through production operations, incident response, and continuous improvement. Reliability is a core engineering responsibility, requiring strong software engineering skills and autonomous operation across AWS, hybrid data centers, and customer-hosted environments.

Roles and Responsibilities
· Own production services end to end, with accountability for reliability, availability, scalability, performance, and operational health.
· Define and manage SLIs and SLOs, using error budgets to guide delivery decisions.
· Influence service and system design to improve fault tolerance, observability, and operational sustainability.
· Debug complex production issues across application code, services, and infrastructure using software engineering practices.
· Perform root cause analysis using logs, metrics, traces, and code-level investigation.
· Build automation and self-healing mechanisms to prevent repeat failures.
· Execute production changes (patching, certificate management, software releases) with safety, automation, and observability.
· Design and operate production observability aligned to service health and customer impact.
· Lead and participate in incident response for high-severity events.
· Collaborate with engineering, product, architecture, and operations teams.
· Operate with autonomy and sound judgment in reliability decisions.