Raanana, Israel
15 hours ago
Principal Software Engineer, AIOps

NVIDIA is powering the world’s most advanced AI Factories. To ensure their seamless operation, we are building a mission-critical Observability and Prediction platform. The platform follows a dual-delivery model: it ships both as a high-scale SaaS solution and as a robust on-premises deployment for our largest enterprise customers.

We are looking for a Principal Engineer to lead the architectural vision of the platform’s core. In this role, you will be the internal technical authority responsible for building a unified, high-performance engine that processes massive telemetry streams and runs advanced predictive models, regardless of where the infrastructure resides.

 

What you’ll be doing:

Unified Architectural Vision: Lead the design of a flexible, high-scale architecture that supports both multi-tenant SaaS environments and complex on-premises deployments.
Operationalizing Predictive Models: Bridge the gap between AI research and production by architecting the framework that runs sophisticated predictive algorithms at scale, ensuring they are robust enough for mission-critical environments.
High-Scale Engineering: Design distributed systems to handle the extreme telemetry density of large-scale AI clusters, ensuring efficient data ingestion, processing, and real-time analysis.
Cross-Organizational Leadership: Collaborate with networking and infrastructure teams to define the technical standards that enable the AIOps platform to integrate seamlessly with global AI infrastructure.
Technical Excellence: Drive the engineering roadmap, mentor senior staff, and serve as the final authority on architectural decisions, ensuring the platform meets the highest standards of reliability and scalability.

 

What we need to see:

Education: B.Sc./M.Sc. in Computer Science, Computer Engineering, or a related technical field.
Experience: 12+ years of experience in software engineering, with a proven track record of architecting complex, high-scale products delivered via SaaS and/or on-premises enterprise models.
Architectural Sovereignty: Deep expertise in building environment-agnostic distributed systems, using technologies like Kubernetes to ensure portability across cloud and private data centers.
Core Systems Programming: Expert-level proficiency in languages such as Go, C++, or Rust, with a focus on high-performance, concurrent architectures.
Data Infrastructure: Extensive experience with high-throughput data processing (e.g., Apache Kafka) and managing large-scale telemetry or time-series data.

 

Ways to stand out from the crowd:

The "0 to 1" Mindset: A proven track record of taking a complex architectural concept from a whiteboard to a stabilized, production-grade platform.
A "Systems" Thinker: You don't just write software; you understand the full stack, from how data moves across the wire to how it’s processed in a distributed cluster.
Infrastructure Evangelist: Experience in leading large-scale technical migrations or introducing modern engineering paradigms (like Cloud-Native or GitOps) into complex, high-stakes environments.
Practical Innovation: The ability to simplify complex problems and build internal tools or frameworks that empower other engineering teams to move faster.

#LI-Hybrid
