New York, NJ, USA
Project Leader - Clinical Solutions CoE (10457)
Solution design & backbone engineering:

Own end-to-end GenAI pipelines for clinical study report (CSR) automation: ingestion → preprocessing/OCR → semantic chunking → embeddings/vector store → RAG/agents → evaluation & observability. Engineer agentic workflows with LangGraph / AutoGen / CrewAI / Agno to orchestrate multi-step reasoning, tool use, and human checkpoints. Modularize codebases to support pluggable models, prompts, retrievers, and knowledge graphs across content creation, summarization, and Q&A.

Data, retrieval, and knowledge graphs:

Connect to Databricks for data engineering and feature pipelines. Implement semantic chunking/tokenization strategies for clinical artifacts (TLFs, SAP, protocol, listings) and optimize retrievers for numerical fidelity. Stand up and tune vector databases; design knowledge graphs (e.g., Neo4j) and Graph RAG patterns for evidence and traceability. Establish ground-truth/eval sets and automatic scoring for accuracy, completeness, and faithfulness.

Modeling, prompts, and quality:

Select and integrate LLMs via Azure OpenAI / AWS Bedrock. Design prompt templates and function/tool schemas for consistent outputs. Build guardrails and evaluators to reduce hallucination.

Deployment & MLOps (good to have):

Containerize and deploy services with Docker on Azure/AWS; implement CI/CD, secrets management, model/version registries, and cost observability. Integrate low-code/no-code orchestrators (n8n) for human workflows and rapid business iteration. Maintain production reliability (latency, cost per page/section, eval scores) and security (PII handling, audit logs).

Required Skills and Qualifications:

Advanced degree in computer science or a quantitative research field, or equivalent experience in the AI/ML space. 5+ years of hands-on, end-to-end (development to deployment) AI/ML engineering experience, including 2+ years with generative AI in production. Strong Python skills; experience building back-end services and APIs is a bonus. Familiarity with OCR technologies and multi-modal data processing. Hands-on experience with agent frameworks (LangGraph / AutoGen / CrewAI / Agno) and RAG architectures. Expertise in semantic chunking, tokenization, vector DBs, and OCR pipelines. Cloud experience (Azure/AWS, Azure OpenAI / Bedrock) and containerization with Docker. Domain exposure to pharma / clinical R&D will be an added advantage.

Soft Skills:

Work closely with cross-functional teams, including clinical data engineers, clinical data modelers, front-end developers, QA/CSV specialists, clinical SMEs, project managers, and cloud/infrastructure professionals, to understand project requirements and deliver high-quality solutions. Maintain thorough documentation of the methodologies, processes, and results of the models. Prepare and present detailed reports on assigned projects to stakeholders, including client and internal leadership, illustrating the impact and effectiveness of data-driven strategies.