We are seeking a highly skilled and motivated software engineer to join our Site Reliability Engineering team, which is dedicated to delivering high-quality products and services to our customers. You will help us maintain and improve the reliability, scalability, and performance of our systems.
Job description:
As a Software Engineer II (Site Reliability Engineer) at JPMorgan Chase within the Site Reliability Engineering team, you will play a critical role in ensuring the reliability and performance of our systems and services. You will work closely with development and operations teams to design, build, and maintain scalable infrastructure, automate processes, and implement best practices for system reliability. Your expertise will be essential in identifying and resolving issues, optimizing system performance, and ensuring the seamless operation of our services.
Job responsibilities
- Design, implement, and maintain scalable and reliable infrastructure to support our applications and services.
- Collaborate with development teams to ensure that new features and services are designed with reliability and scalability in mind.
- Automate operational processes to improve efficiency and reduce manual intervention.
- Monitor system performance and availability, proactively identifying and resolving issues.
- Implement and maintain monitoring, alerting, and logging systems to ensure visibility into system health.
- Conduct root cause analysis and post-mortem reviews to prevent future incidents.
- Develop and maintain documentation for system architecture, processes, and procedures.
- Participate in on-call rotations to provide support for critical incidents and outages.
- Continuously improve system reliability, performance, and scalability through innovative solutions and best practices.
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and 2+ years of applied experience.
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
- Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role.
- Strong knowledge of cloud platforms (e.g., AWS, Google Cloud, Azure) and containerization technologies (e.g., Docker, Kubernetes).
- Proficiency in scripting and automation languages (e.g., Python, Bash, Terraform).
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- Solid understanding of networking, security, and system architecture.
- Excellent problem-solving skills and the ability to work under pressure.
- Strong communication and collaboration skills.
Preferred qualifications, capabilities, and skills
- Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
- Familiarity with microservices architecture and related technologies.
- Knowledge of database management and optimization (e.g., MySQL, PostgreSQL, NoSQL).
- Experience with incident management and response.