Member of Technical Staff Intern
Location: Hybrid / San Francisco, CA
Contribute to model quality and product reliability through a
PhD-focused internship in failure-driven LLM research.
Role Context
This internship is designed for PhD candidates who want to move
quickly in a production AI environment. You will work with engineering
and research partners to identify failure patterns, run controlled
experiments, and ship measurable reliability improvements.
- Assist with evaluation datasets, labeling quality checks, and
  benchmark tracking.
- Prototype improvements in model behavior and claims workflow
  instrumentation.
- Communicate findings clearly in written summaries and team reviews.
What You Will Work On
We provide full support across mentorship, compute resources, data
access, and research guidance for each direction below.
We actively encourage interns to develop original research ideas and
publish papers from internship outcomes. Potential venues include
NeurIPS, ICML, ICLR, CHI, and journals such as Nature Computational
Science and Nature Communications.
- Build LLM verifiers: Improve model capability to
  automatically detect and surface failures across the end-to-end
  pipeline.
- Build failure analysis systems: Construct robust
  methods to capture and organize a diverse, raw corpus from
  real-world failure-claim submissions.
- Post-train models on failure data: Develop
  post-training approaches that learn from out-of-distribution failure
  data without contaminating baseline model performance.
- Multimodal failure analysis: Develop cross-fusion
  methods to analyze and connect failures across different modality
  types.
Qualifications
- Current PhD student in computer science, ML, statistics, or a
  related field.
- Able to commit to at least 3 months of full-time work; part-time
  continuation and conversion to a full-time (FTE) role are possible
  based on performance and team fit.
- Relocation bonus is available for qualified candidates.
- PhD candidates with relevant publications or prior hands-on
  experience are especially welcome.
- Strong coding skills in Python and comfort with data analysis
  workflows.
- Extensive hands-on experience with coding agents such as Claude Code,
  Cursor, and Codex is strongly preferred.
- Clear written communication and curiosity about debugging model
  behavior.
Application
Please apply through our standard application form and select the
intern role in your submission details.