Meet Fetch Engineering:
At Fetch, our engineering philosophy emphasizes innovation, adaptability, and informed decision-making. Our engineers thrive in complex environments, making decisions based on critical thinking and data, even in uncertain situations. We value proactive problem-solving and focus on driving impactful results while maintaining high technical standards. You will work alongside talented engineers who are dedicated to pushing the boundaries of technology and encouraging each other to excel. We understand that Fetch may not be the right fit for everyone, but if you're passionate about solving challenging problems and navigating intricate systems, Fetch could be a great place for you.
About the Role:
The next step in evolving Fetch’s engineering velocity requires a Senior Platform Engineer. In this role, you’ll own major pieces of Fetch’s platform strategy: building standardized, cloud-native platforms that enable developers to move faster while keeping our systems secure, scalable, and resilient.
You’ll lead by example: designing paved paths, mentoring engineers, and delivering automation that turns operational friction into developer velocity. You’ll play a critical role in advancing initiatives like multi-region resiliency, observability standardization, cost anomaly monitoring, and testing in production.
What you’ll do at Fetch (Role Responsibilities):
Own & Evolve Platforms: Design, build, and scale core platform components (EKS/ECS, GitHub-native CI/CD, OTEL observability, cost monitoring).
Modern IaC & App Delivery: Architect and enforce Terraform standards (module catalogs, policy-as-code with OPA/Conftest), manage Kubernetes apps via Helm (charts, repos), and implement GitOps (Argo CD/Flux) for progressive delivery and compliance.
Accelerate Developer Experience: Deliver self-service infrastructure, opinionated deployment patterns, and automation that enable teams to ship faster with confidence.
Advance Reliability & Resilience: Lead implementation of autoscaling, canary releases, anomaly detection, rollback automation, and disaster recovery patterns.
Cross-Functional Influence: Collaborate with Engineering, Product, Finance, and Security to align platform investments with organizational goals.
Mentorship & Leadership: Guide engineers in platform best practices, review designs, and raise the technical bar across the org.
AI Fluency: Integrate AI-assisted tooling into daily platform engineering workflows, using it to accelerate development, validate infrastructure code, and improve developer experience while ensuring outputs meet reliability, security, and scalability standards.
In your Toolbox (Minimum Requirements):
5+ years of experience in platform, DevOps, or SRE roles.
Strong proficiency in one or more programming languages (Python, Go, or Java).
Expert-level experience with AWS (multi-account, ECS/EKS, IAM, cloud networking).
Deep knowledge of containerization, orchestration, and Infrastructure as Code (Terraform, Ansible, CDK, or CloudFormation).
Track record of designing and delivering production-grade, scalable platform solutions.
Proven ability to collaborate across teams and influence technical direction.
Demonstrated ability to use AI-assisted tools to scaffold, validate, and extend infrastructure or platform code with a focus on reliability, security, and scalability.
Strong judgment in balancing AI-generated output with engineering best practices, including validation, troubleshooting, and clear communication of tradeoffs.
Nice to haves/Bonus Points (Preferred Requirements):
Experience standardizing developer platforms at scale.
Hands-on experience with observability (OTEL), cost anomaly monitoring, or incident automation.
Background in disaster recovery planning, multi-region architectures, and production safety (canaries, rollbacks).
History of mentorship, technical leadership, or cross-team platform initiatives.
Systematic problem-solving mindset and strong sense of ownership.
Experience driving adoption of AI-assisted workflows across engineering teams (e.g., CI/CD improvements, observability, infrastructure automation).
Ability to evaluate emerging AI tools for platform engineering, identifying opportunities, risks, and best practices for secure and effective use.
This is a full-time role that can be held from one of our US offices or remotely in the United States.