REMOTE WORK, VA, USA
12 hours ago
SRE/MLOps Engineer
**Description**

We are seeking a versatile **SRE/MLOps Engineer with DevSecOps expertise** to design, automate, and operate secure, scalable, and repeatable **model deployment workflows** across the AI/ML Common Services environment. This role bridges **infrastructure reliability, CI/CD automation, and model operations**, enabling IRS mission teams to move from experimentation to production with confidence.

The engineer will not only support **ML lifecycle operations** (Databricks, MLflow, AWS SageMaker/Bedrock) but also bring **DevSecOps rigor** to ensure compliance, monitoring, and infrastructure-as-code are embedded in every step. By partnering with Infrastructure, Security, and Architecture teams, this role ensures the AAP environment is **resilient, automated, and compliance-ready** at enterprise scale.

**Key Responsibilities:**

+ Enable **secure, scalable, and repeatable** deployment workflows for both ML models and supporting infrastructure.
+ Build and maintain **runtime environments, service accounts, and orchestration logic** for Databricks, MLflow, and AWS AI services.
+ Implement and maintain **CI/CD pipelines** (Bitbucket, Bamboo, Jenkins, or equivalent) for code, data, and model deployments.
+ Apply **DevSecOps practices** by integrating security scans, compliance checks, and audit logging into deployment pipelines.
+ Collaborate with the **Infrastructure DSO** and **Solutions Architect** to integrate Terraform-based IaC for consistent, automated provisioning.
+ Implement **observability, alerting, and logging** (CloudWatch, Datadog, Prometheus) to monitor both application and ML workloads.
+ Align infrastructure with ML lifecycle needs, including staging, promotion, rollback, retraining, and compliance-aware tracking.
+ Develop **automation templates, reusable workflows, and guardrails** to accelerate onboarding of mission team models while ensuring security.
+ Contribute to **incident response, performance tuning, and reliability engineering** across ML and non-ML workloads.

**Qualifications**

**Required Qualifications:**

+ Bachelor’s or master’s degree in Computer Science, Data Engineering, or a related technical discipline.
+ 5+ years of experience in **Site Reliability Engineering, DevOps, or MLOps** with production-grade systems.
+ Must be a U.S. Citizen with the ability to obtain and maintain a Public Trust security clearance.
+ Hands-on experience with **Databricks, MLflow, or AWS SageMaker/Bedrock** for ML model lifecycle operations.
+ Strong proficiency in **Terraform, CI/CD pipelines**, and container orchestration (Docker, Kubernetes).
+ Experience implementing **security automation** (e.g., IaC scanning, container security, SAST/DAST tools) within CI/CD workflows.
+ Solid understanding of **observability stacks** (logs, metrics, tracing) and operational best practices.

**Desired Skills:**

+ Active IRS clearance highly desired.
+ Experience in **federal or regulated environments** with security, audit, and compliance requirements (FedRAMP, NIST 800-53).
+ Knowledge of **Trustworthy AI monitoring** (bias detection, drift monitoring, explainability).
+ Familiarity with **Unity Catalog, Delta Lake, and data pipeline orchestration** in Databricks.
+ Hands-on experience with **Zero Trust security models** and secure boundary implementations.
+ Relevant certifications such as:
  + **Databricks Certified Machine Learning Professional**
  + **AWS DevOps Engineer – Professional**
  + **Certified Kubernetes Administrator (CKA)**
  + **Security+ or equivalent security cert**

Target salary range: $120,001 - $160,000. The estimate displayed represents the typical salary range for this position based on experience and other factors.
REQNUMBER: 2508971

SAIC is a premier technology integrator, solving our nation's most complex modernization and systems engineering challenges across the defense, space, federal civilian, and intelligence markets. Our robust portfolio of offerings includes high-end solutions in systems engineering and integration; enterprise IT, including cloud services; cyber; software; advanced analytics and simulation; and training. We are a team of 23,000 strong, driven by mission, united by purpose, and inspired by opportunity. Headquartered in Reston, Virginia, SAIC has annual revenues of approximately $6.5 billion. For more information, visit saic.com. For information on the benefits SAIC offers, see Working at SAIC.

EOE AA M/F/Vet/Disability