Site Reliability Engineer-Observability
Insight Global
Job Description
Insight Global is seeking a Site Reliability Engineer (Observability) to join a leading health solutions company that provides healthcare and retail pharmacy services. The Site Reliability Engineer will focus on observability, designing, implementing, and operating monitoring and alerting solutions for mission-critical enterprise applications. This role is responsible for building proactive, actionable observability across services, batch workloads, infrastructure, databases, and logs using tools such as Grafana, Prometheus, Loki, and Tempo. The ideal candidate is passionate about reliability engineering, signal-to-noise optimization, and enabling teams to detect and resolve issues before they impact customers. This position is hybrid, with preference given to candidates local to New England, Richardson, TX, Buffalo Grove, IL, or Phoenix, AZ. This position is a 6-month contract-to-hire.
Compensation: $50/hr. to $60/hr. Exact compensation may vary based on several factors, including skills, experience, and education.
Benefit packages for this role will start on day one of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401K retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Skills and Requirements
-At least 6 years of experience in Site Reliability Engineering, DevOps, or Production Operations.
-Experience implementing proactive monitoring and alerting for microservices and APIs, batch jobs, data pipelines, server and container health, and database health and performance.
-Hands-on expertise with Prometheus, Grafana, Loki, and Tempo in large-scale, production environments.
-Strong understanding of monitoring distributed systems spanning both on-premises and cloud environments (GCP, Azure).
-Experience defining SLOs/SLIs and building alerting strategies based on reliability engineering best practices.
-Exceptional attention to detail with the ability to think through complex systems end-to-end, anticipate edge cases, failure modes, and cascading impacts, and proactively design monitoring and alerting to cover both common and rare operational scenarios.