LLM Prompt Engineering & Cost Optimization Specialist (Production-Scale Systems)
Upwork
Background
We are building a GenAI-driven recommendation engine that generates structured recommendations by passing user context + prompts to LLMs and evaluating the output.
This system will run at a massive scale (millions of users) with strict cost constraints.
Our goal
- ≤ $0.02 per user per year
- ~25 LLM iterations per user
- High consistency, predictable output, and strong evaluation hooks
This is not a prototype — this is a production cost-sensitive LLM system.
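To make the target concrete, here is a back-of-envelope sketch of what the budget implies per call. All prices are assumed placeholder values for a small model, not quotes from any provider, and the example token counts are hypothetical.

```python
# Cost target and call volume from the job description.
COST_TARGET_PER_USER_YEAR = 0.02   # dollars
CALLS_PER_USER = 25                # ~25 LLM iterations per user

budget_per_call = COST_TARGET_PER_USER_YEAR / CALLS_PER_USER  # $0.0008 per call

# ASSUMED small-model pricing, in dollars per 1M tokens.
PRICE_IN = 0.15
PRICE_OUT = 0.60

def call_cost(t_in: int, t_out: int) -> float:
    """Cost of one call using t_in input and t_out output tokens."""
    return t_in * PRICE_IN / 1e6 + t_out * PRICE_OUT / 1e6

# e.g. 2,000 input tokens + 500 output tokens per call:
cost = call_cost(2_000, 500)
print(f"per-call budget: ${budget_per_call:.4f}, example call: ${cost:.4f}")
```

Under these assumed prices a 2,000-in / 500-out call costs $0.0006, inside the $0.0008 per-call budget, so the headroom for retries and longer contexts is thin.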
What You’ll Work On
- Design token-efficient, high-quality prompts for recommendation generation
- Build guardrails to ensure deterministic, structured outputs
- Optimize prompt + model combinations to reduce cost without sacrificing quality
- Define prompt templates, versioning strategies, and fallback logic
- Work with LLM evaluations (LLM-as-judge / heuristic evals / scoring frameworks)
- Reduce retries, hallucinations, and output variance
- Advise on model selection (open-source vs proprietary) for cost/performance tradeoffs
- Design prompts that reliably trigger tool / function calls and validate tool outputs
- Tune multi-prompt and multi-agent LLM pipelines, where prompts must be optimized jointly, not in isolation
- Perform comparative analysis of cost vs quality across different LLM providers, models, and prompt versions
- Provide clear production recommendations on prompt–model combinations to achieve the sub-$0.02/user/year target
Must-Have Skills
- Proven experience in Prompt Engineering for production systems
- Strong understanding of token economics (input/output tokens, pricing models)
- Experience optimizing LLM systems for cost at scale
- Hands-on work with structured outputs (JSON, schema-based generation)
- Experience designing guardrails (format enforcement, constraint prompting, validation)
- Familiarity with LLM evaluation techniques
- Ability to reason about latency, retries, and failure modes
- Experience working with reasoning-capable LLMs, including controlling, constraining, or suppressing unnecessary reasoning to balance quality, latency, and cost
- Experience designing token-efficient input representations, including transforming raw user context into compact, loss-aware inputs that minimize tokens and reduce unnecessary model reasoning
- Experience designing prompts that invoke multiple tools/functions with correct ordering and validation
- Hands-on experience with multi-agent or chained prompt systems in production environments
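The "token-efficient input representations" requirement above can be sketched as follows: collapse a verbose user-context object into a compact delimited string before it enters the prompt. The fields and format are hypothetical; a real pipeline would choose fields by measured value to the model.

```python
# Compress raw user context into a compact, token-lean string.
def compact_context(user: dict, top_k: int = 5) -> str:
    # Keep only the top-k interests by weight; drop everything else.
    top = sorted(user.get("interests", {}).items(), key=lambda kv: -kv[1])[:top_k]
    interests = ",".join(name for name, _ in top)
    return f"age:{user.get('age', '?')}|region:{user.get('region', '?')}|likes:{interests}"

ctx = compact_context(
    {"age": 34, "region": "US", "interests": {"jazz": 0.9, "golf": 0.2, "sci-fi": 0.7}}
)
# ctx == "age:34|region:US|likes:jazz,sci-fi,golf"
```

A terse key:value encoding like this typically costs a fraction of the tokens of the raw JSON it replaces, and gives the model less irrelevant material to reason over.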
Good to Have
- Experience with open-source LLMs (LLaMA, Mistral, Mixtral, etc.)
- Knowledge of prompt compression, chaining, and distillation
- Experience with batching, caching, or prompt reuse
- Understanding of RAG systems (even if not the core task)
- Prior work on recommendation systems or personalization engines
What This Role Is NOT
❌ Chatbot or demo-only work
❌ Generic “LLM app” development
❌ Research-only experimentation without cost accountability
Engagement
- Short-term contract with potential long-term extension
- Immediate start preferred
- Must be comfortable working with an engineering-led team
Job Type
- Contract
Location
- United States