There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Enterprise Technology, Employee Platforms team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure, and you will independently decompose and iteratively improve on existing solutions. You will be a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Job responsibilities
- Guide and assist peers in creating designs and gaining consensus.
- Collaborate with teams to design and implement deployment approaches using automated CI/CD pipelines.
- Design, develop, test, and implement solutions for availability, reliability, and scalability.
- Implement infrastructure, configuration, and network as code.
- Collaborate with technical experts and stakeholders to resolve complex problems.
- Utilize service level indicators and objectives to proactively resolve issues before they impact customers.
- Support the adoption of site reliability engineering best practices within the team.
Required qualifications, capabilities, and skills
- Formal training or certification on Site Reliability Engineering concepts and 3+ years applied experience.
- Manage and optimize various types of databases, including relational, NoSQL, and columnar databases.
- Utilize programming languages such as Python, SQL, Spark, Ada, R, C/C++, Java, and JavaScript.
- Demonstrate experience with big data platforms like Databricks, Spark, Snowflake, and Hadoop.
- Apply knowledge of machine learning, deep learning, generative AI, and statistical analysis.
- Use containerization tools like Docker and orchestration platforms like Kubernetes.
- Apply site reliability engineering principles, including SLAs, SLOs, and error budgets.
- Understand networking fundamentals, including TCP/IP, DNS, and network protocols.
- Experience with cloud services like AWS, Azure, or Google Cloud.
- Familiarity with version control systems like Git.
- Thorough understanding of encryption, access controls, and secure data transmission techniques.
Preferred qualifications, capabilities, and skills
- Experience with data platforms like Splunk, Datadog, Dynatrace, and the Elastic Stack.
- Implement data ingestion techniques such as batch ingestion and streaming real-time ingestion (Kafka/Cribl).
- Utilize data visualization and analytics tools like Grafana, Splunk, Tableau, Power BI, and Graph Explorer.
- Perform data wrangling and cleansing, and manage data ETL processes (cleansing, transformation, integration/enrichment).
- Strong communicator with excellent problem-solving, critical thinking, and analytical reasoning skills, along with attention to detail and a passion for innovation.