Mumbai, Maharashtra, India
7 hours ago
Lead Software Engineer - Site Reliability Engineer

As a Lead Software Engineer within Asset & Wealth Management Technology at JPMorgan Chase, you work with your fellow stakeholders to drive the adoption of Site Reliability Engineering (SRE) tools, practices, and culture. You partner with Product teams, other LOB SMEs, and leadership not only to help define the SRE objectives but also to lead the delivery of those objectives.

As part of that, you drive programs and initiatives that enable Product teams to define non-functional requirements (NFRs) and availability targets for the services in their respective application and product lines. You will ensure that those NFRs are accounted for in the products' design and test phases and that firm-wide SRE practices are integrated into the Product teams' software development life cycles (SDLCs).

 

Job responsibilities

Demonstrates site reliability principles and practices every day and champions the adoption of site reliability throughout your team
Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt
Creates high-quality designs, roadmaps, and program charters that are delivered by you or the engineers under your guidance
Ensures that systems not only follow the firm-wide standard resiliency patterns but are also tested for resiliency on a regular basis through wargames, failover exercises, and chaos experiments
Provides advice and mentoring to other engineers and acts as a key resource for technologists seeking advice on technical and business-related issues
Works toward becoming an expert on the applications and platforms in your remit while understanding their interdependencies and limitations
Evolves and debugs critical components of applications and platforms
Champions SRE culture throughout the organization through programs, initiatives, and innovation
Makes significant contributions to JPMorgan Chase’s site reliability community via internal forums, communities of practice, guilds, and conferences

 

Required qualifications, capabilities, and skills

10+ years of experience in software engineering
Experience in site reliability culture and principles, with demonstrated ability to implement site reliability within an application or platform
Knowledge of and experience in observability, such as white-box and black-box monitoring, service level objectives, alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and OpenTelemetry
Good understanding of cloud infrastructure and microservice-based design principles in order to apply SRE best practices across the application architecture
Ability to design and develop solutions for automation and toil reduction using a common programming language such as Python, Java, or C++
Recognized as an active contributor to the engineering community
Continues to expand network and leads evaluation sessions with vendors to see how offerings can fit into the firm’s strategy
Ability to anticipate, identify, and troubleshoot defects found during testing
Strong communication skills, with the ability to mentor and educate others on site reliability principles and practices