Senior MLOps / ML Platform Engineer

Location: Remote (U.S.) | Preference for SF Bay Area

Type: Full-time, Permanent

Salary Range: $180,000 – $250,000 + Equity + Benefits

About the Opportunity

People in AI is working with a confidential, late-stage startup that is scaling one of the most advanced ML platforms in production. The company operates at enormous scale, supporting trillions of real-time and batch interactions across its data infrastructure, and it is hiring experienced engineers to help build the backbone of its machine learning practice.

You’ll join a high-impact ML Platform team that owns the infrastructure used by 20+ ML Engineers and Data Scientists, enabling faster experimentation, deployment, and monitoring of models in production.

What You’ll Work On

  • Design, build, and operate ML infrastructure for training, deployment, and inference
  • Scale and manage feature stores powering real-time and batch use cases
  • Develop high-throughput pipelines using Ray, Apache Spark, and Kafka
  • Improve latency and reliability of ML model serving (GPU + CPU)
  • Work with tools like MLflow, Argo, Terraform, and Kubernetes (EKS)
  • Build internal tooling and automation to improve ML developer workflows
  • Collaborate closely with cross-functional ML teams to enable experimentation at scale

Ideal Background

  • 5+ years in MLOps, ML Platform Engineering, Data Engineering, or Infrastructure
  • Strong experience with Apache Spark, Spark Structured Streaming, Kafka, Ray, or similar tools
  • Proven experience building or scaling feature stores (e.g. Tecton, Feast)
  • Deep understanding of online vs offline inference, and how to optimize for both
  • Hands-on experience with Kubernetes (EKS), Terraform, and cloud-native infra (AWS preferred)
  • Background in software engineering, with a strong focus on production-grade systems
  • Bonus: experience managing GPU compute environments or working with CI/CD for ML workflows

Tech Stack Highlights

  • Infra: Kubernetes (EKS), Terraform, Helm, Istio, Cloudflare
  • Pipelines: Spark, Ray, Kafka, Airflow
  • Languages: Python, Java, Scala
  • Serving & Orchestration: MLflow, Argo Workflows, Argo CD
  • Monitoring: Datadog, Prometheus
  • Modeling tools: Hugging Face 🤗, PyTorch, TensorFlow, Metaflow

Why Apply

  • Join at a pivotal time, with significant ownership and technical influence
  • Work on systems used by hundreds of millions of users
  • Competitive compensation + strong equity upside
  • Remote flexibility, with a preference for Bay Area engineers who can collaborate in person
