English Language Audio Model Trainers

Nancy Assists

We are partnering with a world-leading AI research lab on a groundbreaking multimodal AI project, and we're looking for 2000+ contributors to join us.

About the Role

As an English Language Audio Model Trainer, you'll help train the next generation of AI systems by recording and evaluating short audio clips that describe visual content. Your voice and input will directly support the development of models capable of understanding the world across both visual and auditory domains.

Responsibilities

  • Watch short video clips and provide preference feedback
  • Record clear, high-quality audio clips (2–3 minutes each)
  • Follow specific linguistic and stylistic guidelines
  • Collaborate with research and QA teams to ensure dataset quality

Qualifications

  • Native or near-native fluency in English (other languages are a plus)
  • Excellent verbal communication and clear enunciation
  • Strong attention to detail and ability to follow instructions
  • Prior experience with voice recording or annotation is helpful but not required

Compensation: $20/hour

Location: 100% Remote

Schedule: Flexible

Why Join Us?

  • Contribute to cutting-edge AI research
  • Gain hands-on experience at the intersection of language, audio, and computer vision
  • Be part of a global, flexible, and remote-friendly project

Interview Process

  • Quick 15-minute AI interview
  • Short form about your availability
  • Fast turnaround: we aim to respond within 1 week
