LLM Prompt Engineering & Cost Optimization Specialist (Production-Scale Systems)
Upwork
Background
We are building a GenAI-driven recommendation engine that generates structured recommendations by passing user context + prompts to LLMs and evaluating the output.
This system will run at a massive scale (millions of users) with strict cost constraints.
Our goal
- ≤ $0.02 per user per year
- ~25 LLM iterations per user
- High consistency, predictable output, and strong evaluation hooks
This is not a prototype — this is a production cost-sensitive LLM system.
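To make the target concrete, here is a back-of-envelope sketch of what the budget implies per call. All prices are assumed placeholder values for a small model, not quotes from any provider, and the example token counts are hypothetical.

```python
# Cost target and call volume from the job description.
COST_TARGET_PER_USER_YEAR = 0.02   # dollars
CALLS_PER_USER = 25                # ~25 LLM iterations per user

budget_per_call = COST_TARGET_PER_USER_YEAR / CALLS_PER_USER  # $0.0008 per call

# ASSUMED small-model pricing, in dollars per 1M tokens.
PRICE_IN = 0.15
PRICE_OUT = 0.60

def call_cost(t_in: int, t_out: int) -> float:
    """Cost of one call using t_in input and t_out output tokens."""
    return t_in * PRICE_IN / 1e6 + t_out * PRICE_OUT / 1e6

# e.g. 2,000 input tokens + 500 output tokens per call:
cost = call_cost(2_000, 500)
print(f"per-call budget: ${budget_per_call:.4f}, example call: ${cost:.4f}")
```

Under these assumed prices a 2,000-in / 500-out call costs $0.0006, inside the $0.0008 per-call budget, so the headroom for retries and longer contexts is thin.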
What You’ll Work On
- Design token-efficient, high-quality prompts for recommendation generation
- Build guardrails to ensure deterministic, structured outputs
- Optimize prompt + model combinations to reduce cost without sacrificing quality
- Define prompt templates, versioning strategies, and fallback logic
- Work with LLM evaluations (LLM-as-judge / heuristic evals / scoring frameworks)
- Reduce retries, hallucinations, and output variance
- Advise on model selection (open-source vs proprietary) for cost/performance tradeoffs
- Design prompts that reliably trigger tool / function calls and validate tool outputs
- Tune multi-prompt and multi-agent LLM pipelines, where prompts must be optimized jointly, not in isolation
- Perform comparative analysis of cost vs quality across different LLM providers, models, and prompt versions
- Provide clear production recommendations on prompt–model combinations to achieve the sub-$0.02/user/year target
Must-Have Skills
- Proven experience in Prompt Engineering for production systems
- Strong understanding of token economics (input/output tokens, pricing models)
- Experience optimizing LLM systems for cost at scale
- Hands-on work with structured outputs (JSON, schema-based generation)
- Experience designing guardrails (format enforcement, constraint prompting, validation)
- Familiarity with LLM evaluation techniques
- Ability to reason about latency, retries, and failure modes
- Experience working with reasoning-capable LLMs, including controlling, constraining, or suppressing unnecessary reasoning to balance quality, latency, and cost
- Experience designing token-efficient input representations, including transforming raw user context into compact, loss-aware inputs that minimize tokens and reduce unnecessary model reasoning
- Experience designing prompts that invoke multiple tools/functions with correct ordering and validation
- Hands-on experience with multi-agent or chained prompt systems in production environments
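The "token-efficient input representations" requirement above can be sketched as follows: collapse a verbose user-context object into a compact delimited string before it enters the prompt. The fields and format are hypothetical; a real pipeline would choose fields by measured value to the model.

```python
# Compress raw user context into a compact, token-lean string.
def compact_context(user: dict, top_k: int = 5) -> str:
    # Keep only the top-k interests by weight; drop everything else.
    top = sorted(user.get("interests", {}).items(), key=lambda kv: -kv[1])[:top_k]
    interests = ",".join(name for name, _ in top)
    return f"age:{user.get('age', '?')}|region:{user.get('region', '?')}|likes:{interests}"

ctx = compact_context(
    {"age": 34, "region": "US", "interests": {"jazz": 0.9, "golf": 0.2, "sci-fi": 0.7}}
)
# ctx == "age:34|region:US|likes:jazz,sci-fi,golf"
```

A terse key:value encoding like this typically costs a fraction of the tokens of the raw JSON it replaces, and gives the model less irrelevant material to reason over.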
Good to Have
- Experience with open-source LLMs (LLaMA, Mistral, Mixtral, etc.)
- Knowledge of prompt compression, chaining, and distillation
- Experience with batching, caching, or prompt reuse
- Understanding of RAG systems (even if not the core task)
- Prior work on recommendation systems or personalization engines
What This Role Is NOT
❌ Chatbot or demo-only work
❌ Generic “LLM app” development
❌ Research-only experimentation without cost accountability
Engagement
- Short-term contract with potential long-term extension
- Immediate start preferred
- Must be comfortable working with an engineering-led team
Job Type
- Contract
Location
- United States