San Francisco, CA, United States of America
Sr Data Scientist

Job Posting Title:

Sr Data Scientist

Req ID:

10142140

Job Description:

Our R&D teams at Lucasfilm and ILM are seeking a Sr Data Scientist to join a strategic R&D initiative focused on Generative AI. The goal of this project is to develop a robust data curation pipeline that can help identify and leverage our most useful assets and media for technical model training.

You will play a critical role in bridging the gap between raw visual data and advanced machine learning applications. You will be responsible for the statistical analysis, sampling strategies, and evaluation metrics required to ensure our training data is diverse, relevant, and optimized for next-generation image and video synthesis.

This role is considered Hybrid, which means the employee will work 2-3 days onsite at our San Francisco location and occasionally from home. This is a 6-month project position.

What you’ll do

Data Strategy & Diversity Analysis

Independently design and implement statistical methods to ensure curated datasets retain representative coverage across various visual attributes, stylistic choices, and subject matter.

Develop logic to identify and down-weight low-variance or repetitive data points to maximize training efficiency.

Collaborate with key stakeholders on algorithms for de-duplication to automatically eliminate redundant or near-identical assets from the training corpus.
 

Evaluation Metrics & Quality Assurance

Design and lead the implementation of automated metrics to assess the quality of generated images and videos.

Validate automated quantitative metrics by correlating them against qualitative feedback provided by senior creative stakeholders.

Establish success criteria for model fidelity, accuracy, and stylistic consistency.
 

Pipeline Integration

Work closely with the engineering team to integrate data cleaning, normalization, and sampling modules into a scalable automated pipeline.

Assist in defining taxonomy and metadata standards to systematically organize unstructured visual assets.

Project Focus & Timeline

This is a fast-paced, 6-month initiative. You will move through rapidly iterating phases:

Phase 1: defining data taxonomy and establishing baseline automated metrics.

Phase 2: refining metrics for temporal consistency and validating against initial model fine-tuning runs.

Phase 3: final validation of metrics and delivery of fully curated, optimized datasets for cold storage.
 

What We’re Looking For

5+ years of experience in a related field

Education: Bachelor’s degree in Data Science, Computer Science, or a related field of study, and/or equivalent work experience. Master’s degree preferred.

Experience: Proven background in Data Science with a strong emphasis on Computer Vision, Generative AI, or Deep Learning.

Technical Skills: Proficiency in statistical analysis and dataset curation (distribution analysis, sampling techniques). Experience working with large-scale unstructured media data is a plus.

Evaluation Expertise: Familiarity with standard and novel metrics for evaluating Generative Models (e.g., FID, FVD, or similar).

Communication: Ability to translate complex statistical insights for engineering partners and non-technical creative leads.

The hiring range for this position in San Francisco is $155,400-$208,400 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.

Job Posting Segment:

ILM San Francisco

Job Posting Primary Business:

ILM San Francisco

Primary Job Posting Category:

Data Science

Employment Type:

Full time

Primary City, State, Region, Postal Code:

San Francisco, CA, USA

Alternate City, State, Region, Postal Code:

Date Posted:

2026-02-03