Santa Clara, CA, 95054, USA
2026 Summer Intern - ML modeling of DNA sequencing error, Roche Diagnostics
At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

**The Position**

We advance science so that we all have more time with the people we love.

**Department Summary**

About CSI (Computational Science & Informatics) at Roche Diagnostics: CSI is Roche Diagnostics’ hub for computational R&D, bringing together data science, machine learning, computational biology, and software engineering to build clinically impactful technologies. We work end-to-end, from research prototypes to production-grade systems, advancing diagnostics across modalities such as DNA sequencing, in-silico analysis, in-vivo sensing, and real-time monitoring (e.g., continuous glucose).

+ Mission and impact
    + Translate cutting-edge computation into safer, faster, and more accurate diagnostics used worldwide.
    + Partner closely with assay scientists, bioinformaticians, clinicians, regulatory experts, and product teams to turn models into validated medical solutions.
    + Focus on trustworthiness: reproducibility, robustness, uncertainty quantification, and clinically relevant performance metrics.
+ What we work on
    + Genomics and bioinformatics: basecalling, error modeling, variant calling, QC pipelines, multi-omics integration, sequence- and signal-level modeling.
    + ML/AI for healthcare: representation learning for biological sequences, sequence-to-sequence models, probabilistic modeling, causal inference, and real-time analytics for sensor data.
    + Scalable data & software: workflow orchestration, MLOps, cloud-native pipelines, and high-performance computing for large-scale datasets.

Our Bioinformatics group within CSI specializes in DNA sequencing and algorithm development, ideal for students with ML experience and a foundational understanding of genomics. The summer project on an ML-based prediction algorithm for sequencing error sits squarely in this environment: you’ll tackle a high-impact problem with access to domain experts, robust datasets, and the engineering support needed to move from idea to validated solution.

**This internship position is located in Santa Clara, CA - onsite.**

**The Opportunity**

+ Ramp-up, pipeline familiarization, and problem framing
    + Get familiar with the sequencing Simulation pipeline: architecture, data flow, interfaces, and evaluation metrics
    + Reproduce baseline runs and document the setup for reproducibility (env, data versions, configs)
    + Define targets (e.g., per-base/read-level error probabilities) and assemble training/validation datasets
    + Perform feature engineering, sanity checks, and data quality assessments; establish data splits and leakage controls
+ Model development and evaluation (primary focus)
    + Implement baseline models: gradient-boosted decision trees (e.g., XGBoost/LightGBM/CatBoost) and neural network regression for probability vector prediction
    + Train, tune, and validate models using robust protocols (cross-validation, early stopping, hyperparameter search)
    + Assess performance with appropriate metrics (e.g., Brier score, log-loss, RMSE; calibration curves and reliability diagrams; ROC/PR if framed as classification)
    + Analyze model behavior: feature importance, error stratification, ablation studies, and basic uncertainty estimates
+ Pipeline integration and efficiency
    + Integrate the best-performing model(s) into the Simulation pipeline
    + Profile and, if needed, improve pipeline efficiency (I/O, batching, parallelization); ensure reproducible workflows (containers, versioning)
+ Communication and documentation
    + Maintain clear experiment logs, notebooks, and code documentation
    + Share progress updates; prepare a concise final report and presentation
    + Draft a structured analysis write-up that could potentially serve as the basis for a future publication (post-internship)
+ Stretch (time-permitting; not required)
    + Explore feasibility of sequence-aware architectures (e.g., transformer-based models) for error prediction and document findings for future work

**Program Highlights**

+ **Intensive 12-week, full-time (40 hours per week) paid internship.**
+ **Program start dates are in May/June (Summer).**
+ **A stipend, based on location, will be provided to help alleviate costs associated with the internship.**
+ Ownership of challenging and impactful business-critical projects.
+ Work with some of the most talented people in the biotechnology industry.

**Who You Are (Required)**

**Required Education:** You meet the following criteria:

+ Must be pursuing a Master's or PhD degree.

**Required Majors:** Computer Science, Physics, Applied Mathematics/Engineering, Biology, Chemistry (or closely related engineering/science fields).

**Required Skills:**

+ Working knowledge of Probability, Statistics and Machine Learning fundamentals.
+ Solid understanding of Linear Algebra and Programming Methodology.
+ Proficiency in Python and at least one ML framework (PyTorch or TensorFlow).
+ Strong data structures and algorithms fundamentals; ability to write clean and efficient code.
+ Comfort with Linux command line and basic shell scripting.

**Preferred Knowledge, Skills, and Qualifications**

+ Biology/Chemistry background is a plus.
+ Excellent communication, collaboration, and interpersonal skills.
+ Complements our culture and the standards that guide our daily behavior & decisions: Integrity, Courage, and Passion.
**Relocation benefits are not available for this job posting.**

The expected salary range for this position based on the primary location of California is $50.00 per hour. Actual pay will be determined based on experience, qualifications, geographic location, and other job-related factors permitted by law. This position also qualifies for paid holiday time off benefits.

**Who we are**

A healthier future drives us to innovate. Together, more than 100’000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.

Let’s build a healthier future, together.

Roche is an equal opportunity employer. It is our policy and practice to employ, promote, and otherwise treat any and all employees and applicants on the basis of merit, qualifications, and competence. The company's policy prohibits unlawful discrimination, including but not limited to, discrimination on the basis of Protected Veteran status, individuals with disabilities status, and consistent with all federal, state, or local laws.

If you have a disability and need an accommodation in relation to the online application process, please contact us by completing this form: Accommodations for Applicants (https://docs.google.com/forms/d/e/1FAIpQLSdZWlsbfQOvFVIQgHE\_iDzWUTlhZvj6FytIzjS7xq6IGh1H5g/viewform).