Production ML Engineer (LLMs, Image Gen, Personalization) - Contract to Hire

Note: You must be comfortable working on products that can involve spicy subject matter and mature themes.

About the Role

We’re looking for a production-minded ML Engineer to lead and own core AI/ML systems across LLMs, image generation, and personalization. This is a hands-on engineering role focused on shipping high-impact features quickly and reliably—not research for research’s sake.

What You’ll Work On

  • LLM systems: prompting/orchestration, chat memory, RAG, and personalization
  • Training & fine-tuning: LLMs / Diffusers / TTS with reproducible pipelines (LoRA/QLoRA, PEFT); see the LoRA sketch after this list
  • High-performance inference: real-time serving with vLLM / TGI, ONNX Runtime, TensorRT-LLM, Triton Inference Server, and Hugging Face Accelerate
  • GenAI features: image generation (SDXL/Diffusers), TTS/STT, and occasional video workflows
  • Reliability & insight: evaluation harnesses, observability/telemetry, and latency/throughput tuning
  • Ownership: model/version lifecycle, CI/CD, model registries, and catching performance regressions
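
To make the fine-tuning bullet concrete, here is a minimal sketch of attaching a LoRA adapter with Hugging Face PEFT. The base model name, rank, and target modules are illustrative assumptions, not our prescribed setup.

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT.
# Base model, rank, and target modules are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"  # hypothetical base checkpoint
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # LoRA scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # confirms only adapter weights train
```

From here the wrapped model trains with a standard Trainer loop; QLoRA follows the same pattern over a 4-bit quantized base.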

What You Bring

  • 4+ years of ML engineering with shipped production workloads
  • Strong LLM experience (fine-tuning, prompt strategies, RAG, evals, safety/guardrails)
  • Image gen experience (Diffusers, SDXL; ControlNet/IP-Adapter is a plus)
  • Proficiency with Python, Docker, and cloud deployment (AWS/GCP/Azure)
  • Inference optimization on GPUs (CUDA fundamentals, quantization, batching, KV-cache tricks); a quantization sketch follows this list
  • Startup mindset: iterative delivery, bias to action, crisp communication
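
Quantization is one of the levers named in the inference bullet above; here is a minimal sketch of a 4-bit (NF4) load via transformers + bitsandbytes, with an illustrative model name:

```python
# Minimal 4-bit (NF4) quantized model load via transformers + bitsandbytes.
# The checkpoint name is illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 weights
    bnb_4bit_compute_dtype=torch.bfloat16, # matmuls run in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",   # hypothetical checkpoint
    quantization_config=bnb,
    device_map="auto",           # spread layers across available GPUs
)
```

Loading this way roughly quarters weight memory versus fp16 at some quality cost; we expect candidates to reason about that trade-off with numbers.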

Bonus Points

  • TTS/STT (Whisper, VITS/FastPitch, NeMo; ElevenLabs API familiarity)
  • Personalization systems, chat memory stores, multi-modal pipelines
  • Distributed training (DeepSpeed, FSDP, Ray) and model versioning/registries (MLflow)
  • Vector search (pgvector, Milvus, Pinecone, Weaviate) and retrieval quality tuning; a pgvector sketch follows this list
  • Experience with evaluation frameworks (Ragas/DeepEval) and observability (OpenTelemetry, Langfuse, Prometheus/Grafana)
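
As a reference point for the vector-search bullet, a minimal pgvector retrieval sketch; the connection string, table, and embedding dimension are assumptions:

```python
# Minimal nearest-neighbor retrieval with Postgres + pgvector.
# Connection string, table name, and embedding dimension are assumptions.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

query_embedding = np.random.rand(768).astype(np.float32)  # stand-in for a real embedding

with psycopg.connect("postgresql://localhost/appdb") as conn:
    register_vector(conn)  # adapt numpy arrays to the vector column type
    rows = conn.execute(
        "SELECT id, content FROM documents "
        "ORDER BY embedding <=> %s LIMIT 5",  # <=> is cosine distance
        (query_embedding,),
    ).fetchall()
```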

Responsibilities (Contractor)

LLM Engineering

  • Build, refactor, and productionize LLM inference modules
  • Maintain and evolve API endpoints for AI services (see the sketch after this list)
  • Migrate/deploy models across cloud providers; manage scaling/rollbacks
  • Support training, memory systems, and semantic search integrations
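
For the endpoint work flagged above, a minimal sketch of a service route fronting an OpenAI-compatible vLLM server; the route, internal URL, and served-model name are illustrative:

```python
# Minimal AI-service endpoint: FastAPI fronting an OpenAI-compatible
# vLLM server. Route, internal URL, and model name are illustrative.
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
VLLM_URL = "http://vllm:8000/v1/completions"  # assumed internal serving host

class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 256

@app.post("/generate")
async def generate(req: GenerateRequest) -> dict:
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(VLLM_URL, json={
            "model": "served-model",  # placeholder served-model id
            "prompt": req.prompt,
            "max_tokens": req.max_tokens,
        })
    resp.raise_for_status()
    return resp.json()
```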

AI Systems & Infrastructure

  • Design and implement robust AI pipelines (evals, telemetry, fine-tuning, data curation)
  • Stand up end-to-end observability and evaluation with clear SLOs
  • Own performance: profiling, caching, batching, speculative decoding, paged attention
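
On the performance item, batching and paged attention come largely from the serving engine itself; a minimal vLLM sketch (model name illustrative):

```python
# Minimal batched generation with vLLM, which implements paged attention
# and continuous batching internally. The model name is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", gpu_memory_utilization=0.90)
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = ["Summarize RAG in one sentence.", "Why page the KV cache?"]
outputs = llm.generate(prompts, params)  # requests are scheduled and batched together
for out in outputs:
    print(out.outputs[0].text)
```

Profiling, caching, and speculative decoding then layer on top of a baseline like this.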

Why This Work Matters

Your work makes our AI features reliable, scalable, and measurable by:

  • Enabling multi-cloud deployment for flexibility and cost control
  • Improving output quality via guardrails, observability, and systematic evaluation
  • Powering personalization with solid training pipelines and prompt management
  • Providing business-critical APIs that unify AI/ML functionality across the product

Our Typical Stack

Python • PyTorch • Hugging Face (Transformers, Diffusers, Accelerate, PEFT) • vLLM or TGI • ONNX Runtime • TensorRT-LLM • Triton Inference Server • CUDA/FlashAttention • bitsandbytes/quantization • Ray/Prefect/Airflow • MLflow/Weights & Biases • Postgres + pgvector/Milvus/Pinecone • Redis • Kafka/PubSub • OpenTelemetry • Prometheus/Grafana • Langfuse

Engagement Details

  • Contract (hourly or milestone-based)
  • Remote; 3–4 hours overlap with US Eastern Time preferred
  • Start: immediate

How to Apply (please include)

1. Links to 1–3 shipped projects or repos showing production ML work (not just notebooks)

2. A short note on how you cut inference latency or scaled throughput—be specific (numbers, tools, changes)

3. Your experience with fine-tuning (methods, data prep, evals)

4. Your hourly rate and earliest start date

Skills/Tags

Machine Learning, Deep Learning, Large Language Models (LLMs), Generative AI, PyTorch, Hugging Face, Prompt Engineering, RAG, Computer Vision, Diffusers/SDXL, ONNX Runtime, TensorRT, Triton Inference Server, MLOps, Model Serving, CUDA, Python, Docker, Kubernetes, Observability, MLflow, Langfuse, Vector Databases, Ray, Airflow/Prefect
