Bengaluru, Karnataka, India
21 days ago
Site Reliability Developer 3

As a Site Reliability Engineer, you will be responsible for defining, deploying, and operating key services with a strong emphasis on system architecture, production operations, capacity planning, performance optimization, deployment, and release engineering. You will help deliver exceptional experiences for our customers and partners while ensuring our services meet reliability, scalability, and performance standards. 

Responsibilities 

Own the architecture, design, implementation, and production operations of core system and platform services 

Improve system reliability through automation, self-healing mechanisms, and real-time monitoring and alerting 

Identify and respond to production issues, driving root-cause analysis and implementing preventative solutions 

Contribute to the design, development, and operation of platform services, including provisioning, configuration, deployment, and ongoing support 

Partner with a globally distributed team to prototype, evaluate, and roll out new platform capabilities 

Design, write, and deploy software to improve the availability, scalability, and operational efficiency of services 

Develop and evolve standards, architectures, and best practices for large-scale distributed systems 

Lead and support capacity planning, demand forecasting, performance analysis, and system tuning 

Stay current with emerging technologies and apply innovative approaches to solving complex infrastructure and cloud-service challenges 

Qualifications & Experience 

5-8 years of experience in Site Reliability Engineering, DevOps, or a closely related role 

Experience developing and/or operating large-scale, distributed systems and services 

Hands-on experience with containerized environments using Kubernetes, Docker, Mesos, or similar technologies 

Experience with infrastructure automation and Infrastructure-as-Code tools such as Terraform, Chef, Ansible, Puppet, or Packer 

Familiarity with cloud orchestration frameworks and experience supporting them in an SRE or production environment 

Experience building and maintaining CI/CD pipelines using tools such as Git (or other VCS), GitLab Runners, Jenkins, and Rundeck 

Experience supporting production, test, and development environments at medium to large scale 

Proficiency in scripting for automation and deployments using Bash, PowerShell, or similar 

Knowledge of cloud compute platforms, networking, monitoring, logging, and data processing/analytics 

Proficiency in at least one modern programming language such as Python, Go, or Java 

Experience operating fault-tolerant, highly available, high-throughput, and scalable systems 

Hands-on experience with at least one major cloud provider (AWS, OCI, GCP, or equivalent) 
