Member of Technical Staff - Multimodal
Location: Hybrid / San Francisco, CA
Focus on text-image-tool reasoning quality and multimodal failure
diagnosis.
Role Context
This role focuses on multimodal reliability in real user workflows.
You will diagnose where text, visual, and tool outputs diverge from
ground truth, then convert those findings into deployable evaluation
and mitigation plans.
- Own multimodal failure taxonomy and benchmark evolution.
- Design targeted tests for image reasoning, OCR-heavy tasks, and
tool-use chains.
- Partner with model and product teams to reduce cross-modal error
propagation.
First 90 Days
- Baseline multimodal error rates across priority user journeys.
- Deliver a failure analysis report with top mitigation opportunities.
- Ship an evaluation harness for at least one new multimodal risk
category.
Cross-Functional Partners
- Vision and multimodal model engineering teams.
- Tooling teams for retrieval, OCR, and external API integration.
- Product and design teams for end-user interaction quality.
Success Metrics
- Decrease in multimodal failure frequency on targeted tasks.
- Coverage depth of multimodal benchmark suites.
- Adoption rate of shipped mitigations by product teams.
Responsibilities
- Design evaluations and diagnostics for multimodal model outputs.
- Investigate failure modes across text, image, and tool-augmented
workflows.
- Prototype mitigations and translate findings into deployable
improvements.
- Collaborate with engineering and product teams in fast iteration
cycles.
Qualifications
- MS/PhD or equivalent experience in ML, computer vision, NLP, or
computer science.
- Experience building and evaluating multimodal systems in
production-like environments.
Compensation
Total compensation: USD $190,000 - $430,000 (base + equity + bonus,
location dependent).