Senior MLOps / ML Platform Engineer at People in AI (Expired)

Senior MLOps / ML Platform Engineer

Location: Remote (U.S.) | Preference for SF Bay Area

Type: Full-time, Permanent

Salary Range: $180,000 – $250,000 + Equity + Benefits

About the Opportunity

People in AI is working with a confidential, late-stage startup that’s scaling one of the most advanced ML platforms in production. The company operates at enormous scale, supporting trillions of real-time and batch interactions across its data infrastructure, and it is hiring experienced engineers to help build the backbone of its machine learning practice.

You’ll join a high-impact ML Platform team that owns the infrastructure used by 20+ ML Engineers and Data Scientists — enabling faster experimentation, deployment, and monitoring of models in production.

What You’ll Work On

Design, build, and operate ML infrastructure for training, deployment, and inference
Scale and manage feature stores powering real-time and batch use cases
Develop high-throughput pipelines using Ray, Apache Spark, and Kafka
Improve latency and reliability of ML model serving (GPU + CPU)
Work with tools like MLflow, Argo, Terraform, and Kubernetes (EKS)
Build internal tooling and automation to improve ML developer workflows
Collaborate closely with cross-functional ML teams to enable experimentation at scale

Ideal Background

5+ years in MLOps, ML Platform Engineering, Data Engineering, or Infrastructure
Strong experience with Apache Spark, Spark Structured Streaming, Kafka, Ray, or similar tools
Proven experience building or scaling feature stores (e.g. Tecton, Feast)
Deep understanding of online vs offline inference, and how to optimize for both
Hands-on experience with Kubernetes (EKS), Terraform, and cloud-native infra (AWS preferred)
Background in software engineering, with a strong focus on production-grade systems
Bonus: experience managing GPU compute environments or working with CI/CD for ML workflows

Tech Stack Highlights

Infra: Kubernetes (EKS), Terraform, Helm, Istio, Cloudflare
Pipelines: Spark, Ray, Kafka, Airflow
Languages: Python, Java, Scala
Serving & Orchestration: MLflow, Argo Workflows, Argo CD
Monitoring: Datadog, Prometheus
Modeling tools: Hugging Face 🤗, PyTorch, TensorFlow, Metaflow

Why Apply

Join at a pivotal time — huge ownership and technical influence
Work on systems used by hundreds of millions of users
Competitive compensation + strong equity upside
Remote flexibility, with a preference for Bay Area engineers for in-person collaboration
