Join us as we embark on a journey of collaboration and innovation, where your unique skills and talents will be valued and celebrated. Together we will create a brighter future and make a meaningful difference.
As a Lead Data Engineer - Python / Spark / Data Lake at JPMorgan Chase within the Consumer & Community Bank - Connected Commerce Technology, you play a crucial role in an agile team dedicated to improving, developing, and providing data collection, storage, access, and analytics solutions that are secure, stable, and scalable. As a key technical contributor, you are tasked with maintaining essential data pipelines and architectures across diverse technical domains within various business functions, all in support of the firm's business goals.
Job responsibilities
- Generates data models for their team using firmwide tooling, linear algebra, statistics, and geometrical algorithms
- Delivers data collection, storage, access, and analytics data platform solutions in a secure, stable, and scalable way
- Implements database back-up, recovery, and archiving strategy
- Evaluates and reports on access control processes to determine effectiveness of data asset security with minimal supervision
- Adds to team culture of diversity, opportunity, inclusion, and respect
- Develops data strategy and enterprise data models for applications
- Manages data infrastructure, including the design, construction, installation, and maintenance of large-scale processing systems and infrastructure
- Drives data quality and ensures data accessibility for analysts and data scientists
- Ensures compliance with data governance requirements and business alignment, including ensuring data engineering practices align with business goals
Required qualifications, capabilities, and skills
- Formal training or certification on data engineering concepts and 5+ years of applied experience
- Experience with both relational and NoSQL databases
- Experience and proficiency across the data lifecycle
- Experience with database back-up, recovery, and archiving strategy
- Proficient knowledge of linear algebra, statistics, and geometrical algorithms
- Advanced proficiency in at least one programming language such as Python, Java, or Scala
- Advanced proficiency in at least one cluster computing framework such as Spark, Flink, or Storm
- Advanced proficiency in at least one cloud data lakehouse platform such as AWS data lake services, Databricks, or Hadoop; at least one relational data store such as Postgres, Oracle, or similar; and at least one NoSQL data store such as Cassandra, Dynamo, MongoDB, or similar
- Advanced proficiency in at least one scheduling/orchestration tool such as Airflow, AWS Step Functions, or similar
- Proficiency in Unix scripting; data structures; data serialization formats such as JSON, AVRO, Protobuf, or similar; big-data storage formats such as Parquet, Iceberg, or similar; data processing methodologies such as batch, micro-batching, and streaming; one or more data modelling techniques such as Dimensional, Data Vault, Kimball, or Inmon; and Agile methodology, including developing PI plans and roadmaps, TDD or BDD, and CI/CD tools
- Able to coach team members in continuous improvement of the product and mentor team members on optimal design and development practices
Preferred qualifications, capabilities, and skills
- Proficiency in IaC tools such as Terraform or AWS CloudFormation
- Proficiency in cloud-based data pipeline technologies such as Fivetran, dbt, or Prophecy.io
- Proficiency in the Snowflake data platform
- Experience with budgeting, resource allocation, and vendor relationship management