Hug.claims

Member of Technical Staff, Intern

Location: Hybrid / San Francisco, CA

Contribute to model quality and product reliability through a PhD-focused internship in failure-driven LLM research.

Role Context

This internship is designed for PhD candidates who want to move quickly in a production AI environment. You will work with engineering and research partners to identify failure patterns, run controlled experiments, and ship measurable reliability improvements.

Assist with evaluation datasets, labeling quality checks, and benchmark tracking.
Prototype improvements in model behavior and claims workflow instrumentation.
Communicate findings clearly in written summaries and team reviews.

What You Will Work On

We provide full support across mentorship, compute resources, data access, and research guidance for each direction below.

We actively encourage interns to develop original research ideas and publish papers from internship outcomes. Potential venues include NeurIPS, ICML, ICLR, CHI, and journals such as Nature Computational Science and Nature Communications.

Build LLM verifiers: Improve model capability to automatically detect and surface failures across the end-to-end pipeline.
Build failure analysis systems: Construct robust methods to capture and organize a diverse, raw corpus from real-world failure-claim submissions.
Post-train models on failure data: Develop post-training approaches that learn from out-of-distribution failure data without degrading baseline model performance.
Multimodal failure analysis: Develop cross-modal fusion methods to analyze and connect failures across modalities.

Qualifications

Current PhD student in computer science, ML, statistics, or a related field.
Able to commit at least 3 months of full-time work; part-time continuation and FTE transfer/conversion are possible based on performance and team fit.
A relocation bonus is available for qualified candidates.
PhD candidates with relevant publications or prior hands-on experience are especially welcome.
Strong coding skills in Python and comfort with data analysis workflows.
Extensive hands-on experience with coding agents such as Claude Code, Cursor, and Codex is strongly preferred.
Clear written communication and curiosity about debugging model behavior.

Application

Please use our standard application form and select the intern role in your submission details.
