Plano, TX, USA
6 days ago
Site Reliability Engineer

DESCRIPTION:

Duties: Lead initiatives to improve the reliability and stability of applications and platforms, using data-driven analytics to improve service levels. Collaborate with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers. Proactively identify and solve technology-related bottlenecks. Act as the main point of contact during major application incidents and solve issues quickly to avoid financial losses. Configure Grafana dashboards for observability. Create SLO (Service Level Objective) and SLI (Service Level Indicator) parameters for monitoring.

QUALIFICATIONS:

Minimum education and experience required: Master's Degree in Computer Science, Computer Engineering, or related field of study plus 5 years of experience in the job offered or as Site Reliability Engineer, Lead Developer, Developer, or related occupation. The employer will alternatively accept a Bachelor's degree in Computer Science, Computer Engineering, or related field of study plus 7 years of experience in the job offered or as Site Reliability Engineer, Lead Developer, Developer, or related occupation.

Skills Required: This position requires four (4) years of experience with the following: observability including white and black box monitoring, service level objective alerting, and telemetry collection using Grafana and Dynatrace. This position requires experience with the following: site reliability culture and principles including process for failure mode analysis, infrastructure reviews for resiliency and operability, and livesites active culture; implementing site reliability within an application or platform including FMEA and Root Cause Analysis Reviews; at least one of the following programming languages: Python, Java Spring Boot, or .NET; software application analysis, development and technical processes within a given technical discipline such as cloud computing, artificial intelligence, or Android; Prometheus; Datadog; Splunk; CloudWatch; creating synthetic transactions using Dynatrace; continuous integration and continuous delivery (CI/CD) tools including Jenkins, GitLab, and Terraform; container and container orchestration including ECS, Kubernetes, and Docker; troubleshooting common networking technologies and issues including issues caused by dropped packets and down circuits.

Job Location: 8181 Communications Pkwy, Plano, TX 75024.
