Should have hands-on experience in Snowflake, Snowpark, and PySpark.
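For context, below is a minimal Snowpark sketch of the kind of hands-on Snowflake work this role involves; the connection parameters and the ORDERS / CUSTOMER_ORDER_TOTALS table names are illustrative assumptions, not details from this posting.

```python
# Minimal Snowpark sketch; connection details and table names are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

connection_parameters = {
    "account": "<account>",      # placeholder values, replace with real settings
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

session = Session.builder.configs(connection_parameters).create()

# Aggregate shipped-order amounts per customer and write the result back to Snowflake.
orders = session.table("ORDERS")
totals = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by("CUSTOMER_ID")
          .agg(sum_("AMOUNT").alias("TOTAL_AMOUNT"))
)
totals.write.save_as_table("CUSTOMER_ORDER_TOTALS", mode="overwrite")
```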
Data Pipeline Development
Build and optimize ETL workflows for batch and real-time data ingestion.
Implement transformations using Spark, PySpark, and Spark SQL (see the brief sketch after this list).
Design orchestration flows using tools such as Azure Data Factory.
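A minimal PySpark batch-ETL sketch of the kind of ingestion and transformation work described above; the file paths and column names are assumed for illustration only.

```python
# Minimal batch-ETL sketch: read raw CSVs, clean them, write partitioned Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch_ingest").getOrCreate()

# Ingest raw landing-zone data (assumed path and schema).
raw = spark.read.option("header", True).csv("/landing/sales/*.csv")

# Apply basic transformations: type casting, deduplication, and filtering.
cleaned = (
    raw.withColumn("sale_date", F.to_date("sale_date", "yyyy-MM-dd"))
       .withColumn("amount", F.col("amount").cast("double"))
       .dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
)

cleaned.write.mode("overwrite").partitionBy("sale_date").parquet("/curated/sales/")
```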
Data Integration & Quality
Perform data profiling, mapping, and validation for ETL processes (see the sketch after this list).
Ensure compliance with data governance and quality standards.
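A small sketch of the profiling and validation checks this could involve, written in PySpark; the dataset path, columns, and rules are hypothetical.

```python
# Profiling and validation sketch over an assumed curated dataset.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()
df = spark.read.parquet("/curated/sales/")   # assumed curated dataset

# Profile: null counts per column.
profile = df.select(
    [F.count(F.when(F.col(c).isNull(), c)).alias(f"{c}_nulls") for c in df.columns]
)
profile.show()

# Validate: fail fast on duplicate business keys or negative amounts.
dup_keys = df.groupBy("order_id").count().filter("count > 1").count()
bad_amounts = df.filter(F.col("amount") < 0).count()
assert dup_keys == 0, f"{dup_keys} duplicate order_id values found"
assert bad_amounts == 0, f"{bad_amounts} rows with negative amount"
```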
Performance & Automation
Optimize SQL queries and Spark jobs for scalability and efficiency (illustrated in the sketch after this list).
Automate workflows using CI/CD pipelines and scripting.
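A brief sketch of common Spark tuning techniques relevant here (broadcast joins, caching reused datasets, controlling output partitions); the table layout and partition counts are assumptions for illustration.

```python
# Spark job tuning sketch: broadcast join, caching, and partition control.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("optimized_job").getOrCreate()

facts = spark.read.parquet("/curated/sales/")       # assumed large fact table
dims = spark.read.parquet("/curated/customers/")    # assumed small dimension table

# Broadcast the small dimension table to avoid a shuffle-heavy join.
joined = facts.join(F.broadcast(dims), "customer_id")

# Cache the joined set because it is reused by multiple aggregations below.
joined.cache()

daily = joined.groupBy("sale_date").agg(F.sum("amount").alias("daily_total"))
by_region = joined.groupBy("region").agg(F.sum("amount").alias("region_total"))

# Reduce output partitions before writing to avoid many small files.
daily.coalesce(8).write.mode("overwrite").parquet("/marts/daily_sales/")
by_region.coalesce(8).write.mode("overwrite").parquet("/marts/region_sales/")
```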
Collaboration & Support
Work with business analysts, architects, and global stakeholders to deliver data solutions.
Monitor data lake environments, troubleshooting and resolving issues proactively.
Required Skills
Technical Expertise:
Advanced SQL for data manipulation and analysis.
Strong experience with Apache Spark, PySpark, and distributed data processing.
ETL design principles and tools (Azure Data Factory, Databricks, Hadoop ecosystem).
Additional Knowledge:
Familiarity with cloud platforms (Azure/AWS), data warehousing (Snowflake, Teradata), and scripting languages (Python, Scala).
Soft Skills:
Analytical thinking, problem-solving, and effective communication.