Own your opportunity to manage the network that makes mission success possible. Make an impact by using your skills to deliver “One GDIT Network” for our clients.
Job Description
Own the opportunity as a Cloud DevOps Engineer/ Senior Reliability Engineer and help ensure the mission is never interrupted. At GDIT, we deliver clarity with our cloud solutions and provide meaningful work. Your work will be an important part of transforming our clients for the modern age and helping them face any obstacle.
The Case Management Modernization (CMM) Program is an initiative to help the Administrative Office of the US Courts (AO) develop a modern cloud-based solution to support all 204+ federal courts across the United States. The Cloud DevOps Engineer/ Senior Reliability Engineer will work as part of an agile development team to build and support the modernization of enterprise-class software applications.
RESPONSIBILITIES:
Ensure operational stability, availability, performance, and scalability of cloud-hosted systems across production and development environments supporting multiple agile teams
Provide real-time monitoring, alerting, incident response, and health checks for infrastructure and applications across all cloud layers (OS, app, DB)
Implement and maintain dashboards, visualizations, and reports for system health, event management, and cost optimization using native CSP tools
Manage cloud resource thresholds and automate capacity planning, forecasting, and resource optimization strategies
Perform incident and event management (SIEM) operations, and support issue diagnosis, resolution, and reporting, including RCA documentation
Track, document, and report monthly issues, including system performance, stability, ticket volumes, and time-to-resolution metrics
Monitor resource utilization (CPU, memory, disk space) across all deployed VMs, containers, and PaaS components
Contribute to the implementation of the Enterprise FinOps framework, including forecasting, budget control, and right-sizing analysis
Support deployment automation and ensure systems are resilient, repeatable, and scalable via Infrastructure as Code (IaC)
Integrate operations with DevSecOps, MLOps, and CI/CD pipelines for seamless deployment and management
Execute system health checks daily or at an agreed frequency, and maintain operational runbooks and SOPs
REQUIRED EXPERIENCE & QUALIFICATIONS:
Technical training, certificate, or degree required. Bachelor's degree strongly preferred
5+ years of experience in IT system engineering, systems development, systems coding and programming
Deep expertise with AWS services, including monitoring, logging, compute, storage, and networking
Proficiency in Infrastructure as Code (IaC) tools such as Terraform and AWS CloudFormation
Hands-on experience with monitoring and APM tools such as CloudWatch, Datadog, Prometheus, Grafana, and New Relic
Solid understanding of incident response, change management, and ITIL-based operational support
Familiarity with CI/CD toolchains and automation platforms (Jenkins, GitHub Actions, GitLab, ArgoCD)
Strong scripting skills (Python, PowerShell, Bash) for automation and orchestration
Advanced experience in DevSecOps implementation using GitOps or similar tools
Experience developing, testing, and maintaining containerized applications
Expert knowledge of source version control, build/release tools and methodologies, CI/CD pipelines, and the software build process for large enterprises with a large number of complex applications
Experience with FinOps practices, cost modeling, forecasting, and optimization tools within cloud platforms
Understanding of federal compliance and security frameworks (e.g., FedRAMP, NIST, JISF Rev 5)
GDIT Is Your Place:
401K with company match
Comprehensive health and wellness packages
Internal mobility team dedicated to helping you own your career
Professional growth opportunities including paid education and certifications
Rest and recharge with paid vacation and holidays