Bengaluru, Karnataka, India
Principal Engineer - Data Layer
We are seeking a hands-on, highly technical data professional to lead the architecture, development, and rollout of a next-generation, AI-powered analytical data platform built on Lakehouse architecture. You will not just guide teams: you will roll up your sleeves, build proofs of concept, and demonstrate "do-and-show" leadership across engineering and product management to deliver robust, high-performance, secure, and scalable data platforms.

This role is ideal for a leader who thrives at the intersection of deep technical problem-solving, strategic vision, and team empowerment, and who can drive solutions from architecture to production with precision, quality, and speed.
Key Responsibilities:
- Architect, Design & Build: Define and implement enterprise-grade analytical data platforms following Lakehouse and Data-as-a-Service (DaaS) principles.
- Hands-On Engineering Leadership: Personally develop and validate key platform components and data flows. Create POCs and "show-by-doing" implementations to accelerate team understanding and delivery. Optimize complex data integrations, transformations, and performance bottlenecks. Lead by example in design, development, and debugging, serving as a "player-coach" for the team.
- Data Platform Expertise: Design data storage and retrieval layers using ClickHouse (preferred), Apache Druid, Apache Doris, and the like, alongside PostgreSQL/MS SQL, MongoDB, and Elasticsearch. Build and optimize data pipelines (Apache SeaTunnel, AWS Glue, etc.) for large-scale, high-volume data processing. Model analytical structures using OLAP principles, semantic layers, and data visualization frameworks such as Apache Superset (preferred) or similar open-source offerings.
- Scalability & Performance: Deliver reliable, performant data systems that handle diverse, massive, and complex datasets.
- Quality & Consistency: Set and enforce standards for data accuracy, governance, security, and performance.
- Execution Ownership: Manage sprints, perform SMART task breakdowns, and ensure on-time, high-quality team deliverables.
- ClickHouse: Manage and optimize this high-performance OLAP database for real-time analytics on massive datasets. ClickHouse experience is preferred; experience with other open-source OLAP databases such as Apache Druid or Apache Doris also applies. (A minimal sketch of this kind of work follows this list.)
- Data Integration: Use Apache SeaTunnel (or similar tools) for efficient data ingestion and synchronization across various sources (see the ingestion sketch below).
- AI/ML & Agentic AI: Lead the development and integration of AI models, algorithms, and agentic AI solutions to solve complex business problems and automate processes.
- Databases: Manage and optimize both PostgreSQL (relational) and NoSQL databases to support diverse data storage needs.
- Data Visualization/BI: Implement and manage data visualization and exploration tools such as Apache Superset to deliver actionable insights to stakeholders.
- Infrastructure: Oversee deployment and orchestration using technologies such as Docker and Kubernetes, and potentially specific environments such as MCP servers (Model Context Protocol), if applicable to the company's tech stack.
- Data Governance & Quality: Ensure robust data governance, integrity, privacy, and security standards across the platform (see the data-quality sketch below).
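A minimal sketch of the ClickHouse work described above, using the clickhouse-connect Python client; the host, table name, and schema here are hypothetical, not taken from this posting:

    # Illustrative only: real-time analytics on a ClickHouse MergeTree table.
    # Host, table name, and schema are hypothetical.
    import clickhouse_connect

    client = clickhouse_connect.get_client(host="localhost", port=8123)

    # MergeTree is ClickHouse's workhorse engine for large analytical tables;
    # ORDER BY (event_date, user_id) supports fast range scans and rollups.
    client.command("""
        CREATE TABLE IF NOT EXISTS events (
            event_date Date,
            user_id    UInt64,
            event_type LowCardinality(String),
            value      Float64
        )
        ENGINE = MergeTree
        ORDER BY (event_date, user_id)
    """)

    # A typical OLAP aggregation: daily counts and totals per event type.
    result = client.query("""
        SELECT event_date, event_type, count() AS events, sum(value) AS total
        FROM events
        GROUP BY event_date, event_type
        ORDER BY event_date
    """)
    for row in result.result_rows:
        print(row)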
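Apache SeaTunnel jobs are normally written in SeaTunnel's own job-config format rather than Python, so purely as an illustration of the ingestion-and-synchronization work the Data Integration bullet describes, here is a hand-rolled batch copy from PostgreSQL into ClickHouse; the DSN, table names, and columns are hypothetical:

    # Illustrative only: a batch PostgreSQL -> ClickHouse copy, standing in
    # for what a SeaTunnel job would express declaratively.
    # DSN, table names, and columns are hypothetical.
    import clickhouse_connect
    import psycopg2

    pg = psycopg2.connect("dbname=appdb user=etl password=secret host=localhost")
    ch = clickhouse_connect.get_client(host="localhost", port=8123)

    # A named (server-side) cursor streams rows instead of loading them all at once.
    with pg, pg.cursor(name="orders_cursor") as cur:
        cur.execute("SELECT order_date, customer_id, amount FROM orders")
        while True:
            batch = cur.fetchmany(50_000)  # move data in bounded batches
            if not batch:
                break
            ch.insert("orders", batch,
                      column_names=["order_date", "customer_id", "amount"])
    pg.close()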
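In the same illustrative spirit, a tiny data-quality gate of the kind the Data Governance & Quality bullet implies, runnable as a pipeline step; the rule names, table, and columns are hypothetical:

    # Illustrative only: a minimal data-quality gate over a ClickHouse table.
    # Rule names, table, and columns are hypothetical.
    import clickhouse_connect

    client = clickhouse_connect.get_client(host="localhost", port=8123)

    checks = {
        # rule name -> SQL returning the count of violating rows
        "no_zero_customer_ids": "SELECT count() FROM orders WHERE customer_id = 0",
        "no_negative_amounts":  "SELECT count() FROM orders WHERE amount < 0",
    }

    failures = []
    for name, sql in checks.items():
        violations = client.query(sql).result_rows[0][0]
        if violations:
            failures.append(f"{name}: {violations} violating rows")

    # Fail the pipeline step loudly if any rule is broken.
    if failures:
        raise RuntimeError("Data-quality checks failed: " + "; ".join(failures))
    print("All data-quality checks passed.")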