
LLM Prompt Engineering & Cost Optimization Specialist (Production-Scale Systems)

Upwork

Background

We are building a GenAI-driven recommendation engine that generates structured recommendations by sending user context and prompts to LLMs and evaluating the outputs.

This system will run at a massive scale (millions of users) with strict cost constraints.

Our goal

  • ≤ $0.02 per user per year
  • ~25 LLM iterations per user
  • High consistency, predictable output, and strong evaluation hooks

This is not a prototype — this is a production cost-sensitive LLM system.
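To make the target concrete, a rough budget sketch (the token counts and per-token prices below are illustrative assumptions, not actual provider rates):

```python
# Rough per-call budget implied by the $0.02/user/year target at ~25
# iterations per user. All prices and token counts are assumptions.

ANNUAL_BUDGET_USD = 0.02
ITERATIONS_PER_USER = 25

per_call_budget = ANNUAL_BUDGET_USD / ITERATIONS_PER_USER  # $0.0008 per LLM call

# Hypothetical small-model pricing: $0.15 / 1M input tokens, $0.60 / 1M output.
INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 0.60 / 1_000_000

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single LLM call under the assumed pricing."""
    return input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN

# A 2,000-token prompt producing a 300-token structured response:
cost = call_cost(2_000, 300)    # $0.00048 under these assumed rates
fits = cost <= per_call_budget  # leaves headroom for retries
print(f"${cost:.5f} per call, budget ${per_call_budget:.5f}, fits={fits}")
```

Under these assumptions roughly 40% of the per-call budget remains for retries and evaluation calls, which is why token-efficient prompts and low retry rates matter at this scale.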

What You’ll Work On

  • Design token-efficient, high-quality prompts for recommendation generation
  • Build guardrails to ensure deterministic, structured outputs
  • Optimize prompt + model combinations to reduce cost without sacrificing quality
  • Define prompt templates, versioning strategies, and fallback logic
  • Work with LLM evaluations (LLM-as-judge / heuristic evals / scoring frameworks)
  • Reduce retries, hallucinations, and output variance
  • Advise on model selection (open-source vs proprietary) for cost/performance tradeoffs
  • Design prompts that reliably trigger tool / function calls and validate tool outputs
  • Tune multi-prompt and multi-agent LLM pipelines, where prompts must be optimized jointly, not in isolation
  • Perform comparative analysis of cost vs quality across different LLM providers, models, and prompt versions
  • Provide clear production recommendations on prompt–model combinations to achieve the sub-$0.02/user/year target
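The guardrail-plus-fallback pattern above can be sketched as validate-then-retry around a structured output. A minimal sketch, assuming a hypothetical recommendation schema and a stubbed `call_llm` (none of these names come from the actual system):

```python
import json

REQUIRED_FIELDS = {"item_id", "score", "reason"}  # hypothetical output schema

def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; returns raw model text."""
    return '{"item_id": "sku-42", "score": 0.91, "reason": "matches past purchases"}'

def validate(raw: str):
    """Parse and schema-check a model response; return a dict or None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_FIELDS <= data.keys():
        return None
    return data

def recommend(prompt: str, max_retries: int = 2):
    """Retry on malformed output, then fall back to a deterministic default
    so downstream consumers always receive a schema-valid object."""
    for _ in range(max_retries + 1):
        result = validate(call_llm(prompt))
        if result is not None:
            return result
    return {"item_id": None, "score": 0.0, "reason": "fallback"}
```

Bounding retries and returning a fixed fallback keeps both cost and output variance predictable, which is the point of the guardrail work described above.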

Must-Have Skills

  • Proven experience in Prompt Engineering for production systems
  • Strong understanding of token economics (input/output tokens, pricing models)
  • Experience optimizing LLM systems for cost at scale
  • Hands-on work with structured outputs (JSON, schema-based generation)
  • Experience designing guardrails (format enforcement, constraint prompting, validation)
  • Familiarity with LLM evaluation techniques
  • Ability to reason about latency, retries, and failure modes
  • Experience working with reasoning-capable LLMs, including controlling or suppressing unnecessary reasoning to balance quality, latency, and cost
  • Experience designing token-efficient input representations, including transforming raw user context into compact, loss-aware inputs that minimize tokens and reduce unnecessary model reasoning
  • Experience designing prompts that invoke multiple tools/functions with correct ordering and validation
  • Hands-on experience with multi-agent or chained prompt systems in production environments
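The token-efficient input representation skill above amounts to rendering verbose user context as terse keyed fields rather than raw JSON. A sketch, with illustrative field names and an assumed rendering scheme:

```python
def compact_context(user: dict) -> str:
    """Render user context as terse key:value fields instead of verbose JSON.
    Field names and the truncation rules are illustrative assumptions."""
    parts = []
    if user.get("recent_categories"):
        # Cap the list so a long history cannot blow up the token count.
        parts.append("cats:" + ",".join(user["recent_categories"][:5]))
    if user.get("price_band"):
        parts.append("price:" + user["price_band"])
    if user.get("last_purchase_days") is not None:
        parts.append(f"last_buy_d:{user['last_purchase_days']}")
    return ";".join(parts)

ctx = compact_context({
    "recent_categories": ["shoes", "outdoor", "running"],
    "price_band": "mid",
    "last_purchase_days": 12,
})
# → "cats:shoes,outdoor,running;price:mid;last_buy_d:12"
```

The "loss-aware" part is deciding which fields to drop or truncate: the representation deliberately discards low-signal context rather than paying tokens to include it.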

Good to Have

  • Experience with open-source LLMs (LLaMA, Mistral, Mixtral, etc.)
  • Knowledge of prompt compression, chaining, and distillation
  • Experience with batching, caching, or prompt reuse
  • Understanding of RAG systems (even if not the core task)
  • Prior work on recommendation systems or personalization engines

What This Role Is NOT

❌ Chatbot or demo-only work

❌ Generic “LLM app” development

❌ Research-only experimentation without cost accountability

Engagement

  • Short-term contract with potential long-term extension
  • Immediate start preferred
  • Must be comfortable working with an engineering-led team

Job Type
Contract

Location
United States
