There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Commercial & Investment Bank, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Job responsibilities
Independently manage small to medium-sized projects with initial guidance, progressing to designing and delivering projects autonomously.
Utilize technology to address business challenges by developing high-quality, maintainable, and robust code in line with software engineering best practices.
Engage in triaging, analyzing, diagnosing, and resolving incidents, collaborating with others to address root causes.
Identify repetitive tasks within your role and proactively work to eliminate them through appropriate channels.
Comprehend observability patterns and strive to implement and enhance service level indicators, objectives, monitoring, and alerting solutions for optimal transparency and analysis.
Design, code, test, and deliver software solutions to automate manual operational tasks.
Troubleshoot high-priority incidents, facilitate blameless post-mortems, and ensure the permanent resolution of incidents.
Identify application patterns and analytics to support improved service level objectives. Implement necessary telemetry and observability to monitor and measure service quality in real time against established SLOs.
Maintain a strong focus on automation and processes, designing, implementing, improving, and utilizing key monitoring tools. Collaborate with SRE, Operations, and Development teams to balance manual operational work with engineering efforts.
Possess a strong understanding of Incident, Problem, and Change Management processes and tools. Participate in Support Rota coverage as needed. Effectively escalate issues and risks across the support framework when necessary.
Support the adoption of site reliability engineering best practices within your team.
Required qualifications, capabilities, and skills
Formal training or certification on SRE concepts and proficient applied experience.
Proficiency in one or more technology domains, with the ability to solve complex and mission-critical problems within a business or across the firm. Excellent debugging and troubleshooting skills.
Proficient in coding with at least one programming language and open to learning modern technologies, such as Python, Java, etc.
Extensive expertise in the instrumentation, customization, and use of modern monitoring tools like Dynatrace, Grafana, Splunk, AWS, Kubernetes, Geneos, Kafka, MQ, etc.
Hands-on experience with modern cloud technologies such as AWS, Gaia, etc. Expertise in at least one relational database (e.g., SQL Server, Oracle, DB2).
Skilled in performance monitoring and capacity management of large systems using various tools. Comfortable working in an Agile environment and proficient in Continuous Integration and Continuous Delivery practices.
Strong attention to detail and time-management skills. Proficient in Site Reliability Engineering (SRE) concepts, principles, and practices. Proficient with containers or common server operating systems such as Linux and Windows.
Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision.
Ability to proactively recognize roadblocks and demonstrate interest in learning technology that facilitates innovation.
Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team.
Ability to initiate and implement ideas to solve business problems.
Preferred qualifications, capabilities, and skills
Certification in programming languages and/or cloud technologies.
Experience in Custody, Securities, or Trading domains, including areas such as FX Cross Currency, High and Low Value Payments, SWIFT, Real-Time Payments, Trading, Corporate Actions, etc.
General knowledge of the financial services industry.