Trivandrum
Lead I - Software Engineering

Summary
Seeking a Python developer with solid core-language fundamentals and strong hands-on PySpark expertise for building scalable data-processing pipelines on AWS.

Required Skills & Experience

- 4+ years of professional experience in software development with a strong focus on Python.
- Solid understanding of core Python concepts, data structures, algorithms, and design patterns.
- Proficiency in Python for scripting, automation, backend services, and data-processing workflows.
- Data modeling for analytics (medallion architecture: bronze/silver/gold); Parquet/Avro/JSON best practices.
- Hands-on expertise with PySpark, including:
  - Working with DataFrames/Datasets and Spark SQL
  - ETL/ELT pipeline development for large-scale batch and near-real-time workloads (a brief sketch follows this list)
  - Performance tuning and optimization
  - Spark Streaming
- Strong knowledge of lakehouse table formats: Delta Lake (preferred), Apache Hudi, or Apache Iceberg.
- Expertise in data quality and validation.
- Strong knowledge of Pandas.
- Hands-on AWS experience with a strong understanding of cloud principles, including:
  - AWS Glue (ETL jobs, Spark jobs, Glue Studio/Workflows, Glue Data Catalog) and AWS Lambda for serverless integrations
  - Amazon EMR (cluster sizing, autoscaling, cost optimization with Spot, versioned runtimes)
  - Amazon S3 (data lake layout, partitioning, lifecycle policies)
  - Orchestration & monitoring: AWS Step Functions, Amazon MWAA/Airflow, CloudWatch Logs/Metrics/Alarms
- Experience with Agile development methodologies.
- Familiarity with CI/CD concepts and tooling such as AWS CodePipeline/CodeBuild/CodeDeploy; infrastructure as code (CloudFormation/Terraform) is a plus.
- Testing & code quality: unit/integration testing for Spark (pytest, chispa), code reviews, PEP 8, type hints/mypy.
- Strong problem-solving, analytical, and communication skills.
- Ability to work independently and collaboratively in a team environment.
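For illustration only (not part of the requirements): a minimal sketch of the kind of bronze-to-silver PySpark ETL step described above, using the DataFrame API and Spark SQL. The S3 paths, the orders dataset, and the column names are hypothetical, and the Delta write assumes the delta-spark package is configured; swap the format for "parquet" otherwise.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-bronze-to-silver").getOrCreate()

# Read raw JSON landed in the bronze layer (hypothetical lake layout).
bronze = spark.read.json("s3://example-lake/bronze/orders/")

# Basic cleansing and validation: drop malformed rows, normalise types, deduplicate.
silver = (
    bronze
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .dropDuplicates(["order_id"])
)

# Spark SQL view for ad-hoc checks alongside the DataFrame API.
silver.createOrReplaceTempView("orders_silver")
spark.sql("SELECT COUNT(*) AS row_count FROM orders_silver").show()

# Write the curated layer partitioned by date (Delta Lake assumed available).
(
    silver
    .withColumn("order_date", F.to_date("order_ts"))
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("s3://example-lake/silver/orders/")
)
```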

Nice to Have

- Knowledge of Java and the Spring framework.
- Databricks on AWS: Jobs, clusters, notebooks, Repos, Delta Live Tables, Unity Catalog.
- Experience with catalog governance and row/column-level security.
- Exposure to cost/performance governance, e.g., file compaction, small-files mitigation, Z-Ordering for Delta (see the sketch after this list).
- Knowledge of REST API integration and message-based architectures.
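For illustration only: one common way the compaction and Z-Ordering point above is handled with the open-source Delta Lake Python API (recent delta-spark releases). The table path and the clustering column are hypothetical.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-maintenance").getOrCreate()

orders = DeltaTable.forPath(spark, "s3://example-lake/silver/orders/")

# Compact many small files into fewer, larger ones (small-files mitigation).
orders.optimize().executeCompaction()

# Cluster data files by a frequently filtered column to improve data skipping.
orders.optimize().executeZOrderBy("customer_id")
```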

Responsibilities

- Design, build, and optimize PySpark-based data pipelines (batch & streaming) on AWS.
- Tune Spark jobs for performance, reliability, and cost efficiency; monitor using the Spark UI and CloudWatch.
- Collaborate with platform, data, and application teams to integrate pipelines with Glue/EMR/Lambda/Step Functions.
- Establish CI/CD for data workflows and ensure test coverage and deployment automation (a unit-test sketch follows this list).
- Contribute to coding standards, documentation, and Agile ceremonies.
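For illustration only: a minimal pytest/chispa-style unit test of the kind such test coverage usually includes. The add_order_date transformation, dataset, and column names are hypothetical; chispa's assert_df_equality is assumed to be installed in the test environment.

```python
import datetime

import pytest
from chispa.dataframe_comparer import assert_df_equality
from pyspark.sql import SparkSession, functions as F


def add_order_date(df):
    # Hypothetical transformation under test: derive a date column from a timestamp.
    return df.withColumn("order_date", F.to_date("order_ts"))


@pytest.fixture(scope="session")
def spark():
    # A small local session is enough for unit-level Spark tests.
    return SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()


def test_add_order_date(spark):
    source = spark.createDataFrame(
        [("o-1", datetime.datetime(2024, 5, 1, 10, 15))],
        ["order_id", "order_ts"],
    )
    expected = spark.createDataFrame(
        [("o-1", datetime.datetime(2024, 5, 1, 10, 15), datetime.date(2024, 5, 1))],
        ["order_id", "order_ts", "order_date"],
    )
    assert_df_equality(add_order_date(source), expected, ignore_nullable=True)
```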
