Doraville, GA, 30362, USA
3 days ago
Dev/ML Ops Engineer - Azure AI Infrastructure - INTL BRAZIL
Job Description

A client of Insight Global is looking to add a team member to their AI COE who will be the operational backbone of the AI Lab. Enable rapid AI deployment by transforming prototypes into production systems. Ensure observability, reliability, and cost optimization for LLM workloads at scale. Reduce infrastructure friction so AI Engineers focus on conversational AI and agent development. Establish MLOps capability with reusable patterns for LLM systems, vector databases, and agentic workflows. Protect business value through security, compliance, and cost control.

Role Overview

The AI Center of Excellence needs a Dev/ML Ops Engineer to operationalize AI/LLM solutions on Azure. This role transforms AI prototypes into reliable, secure, cost-optimized production systems. This is not traditional DevOps; it is an AI-native infrastructure role focused on LLM orchestration layers, vector databases, agentic workflows, and AI observability.

You will:
• Build Infrastructure as Code (Terraform/Bicep) for Azure AI services
• Deploy LLM-powered applications (Claude, Azure OpenAI, RAG systems)
• Create CI/CD pipelines supporting rapid AI iteration
• Implement observability for AI workload performance and costs
• Ensure enterprise security, compliance, and operational reliability

Infrastructure & Deployment

Design and maintain Azure cloud infrastructure using IaC (Terraform/Bicep). Provision Azure AI services: App Service, Container Apps, Azure OpenAI, AI Search (vector DB), Cosmos DB, PostgreSQL, Redis Cache. Configure secure networking (VNETs, Private Endpoints, NSGs, WAF). Implement Azure Key Vault for secrets management. Optimize costs through Azure Cost Management.

Build automated CI/CD pipelines (GitHub Actions or Azure DevOps) for backend APIs (Python/Flask, Node.js), frontends (React/Vue.js), database migrations, and infrastructure updates. Implement feature flags (Azure App Configuration) for gradual rollouts and kill switches.
Create self-service deployment workflows with rollback automation.

Observability & MLOps

Configure Azure Application Insights for distributed tracing, custom metrics, and log analytics. Build dashboards tracking AI response times (P50/P95/P99), LLM token usage/costs, vector DB performance, cache hit rates, and compliance violations. Implement alerting (Azure Monitor Action Groups) and cost tracking per project.

Support LLM deployment: configure Azure OpenAI with rate limiting and fallbacks, implement model versioning for prompt templates, build A/B testing infrastructure, manage vector DB operations (indexing, chunking, retrieval), optimize token consumption via caching, and support agentic workflows (multi-agent orchestration, tool calling, memory management).

Security & Compliance

Ensure enterprise-grade security: implement Azure Private Endpoints (zero public exposure), configure Key Vault access policies (least privilege), set up Azure AD authentication (Managed Identities, RBAC), ensure data encryption at rest and in transit, support SOC 2/ISO 27001 audits, manage API key rotation, and implement PII anonymization for GDPR compliance.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances.
If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Skills and Requirements

Must-Have Technical Skills

Azure Cloud Platform
3+ years with Azure: App Service, Container Apps, Functions, Cosmos DB, PostgreSQL, Redis Cache, VNETs, Private Endpoints, Azure AD, Managed Identities, RBAC, Azure OpenAI, AI Search, AI Document Intelligence. Azure Cost Management and FinOps experience.

Infrastructure as Code & Automation
Proficient in Terraform (preferred) or Bicep. Experience with IaC design, state management, GitOps. Strong scripting: Python (preferred), Bash, PowerShell.

CI/CD & Containerization
GitHub Actions or Azure DevOps Pipelines. Docker proficiency, Azure Container Apps, Kubernetes basics. Multi-environment deployment strategies (blue-green, canary, feature flags).

Observability & Monitoring
Azure Application Insights, Azure Monitor, Log Analytics. KQL (Kusto Query Language) proficiency. Dashboard building (Azure Workbooks, Grafana). Alerting and incident management (PagerDuty, Opsgenie).

Preferred Experience
• Deployed LLM applications (OpenAI, Anthropic Claude, Azure OpenAI) in production
• Vector databases (Azure AI Search, Pinecone, Weaviate, ChromaDB)
• LLM orchestration frameworks (LangChain, LlamaIndex, Semantic Kernel)
• RAG patterns: chunking, semantic search, hybrid retrieval, reranking
• A/B testing for ML/AI models and token optimization strategies
• Agentic workflows: multi-agent systems, tool calling, memory management
• Enterprise compliance (SOC 2, ISO 27001, GDPR)
• WAF configuration (Imperva, Azure Front Door)
• AI consultancy or fast-paced innovation teams