LLM Ops Engineer

Brillio

Description

  • Design, implement, and maintain end-to-end pipelines for LLM training, fine-tuning, validation, and deployment
  • Build and optimize scalable infrastructure for large language model operations
  • Deploy LLMs to production environments with prompt management, observability, serverless deployment, monitoring, scaling, and performance optimization
  • Design, develop, and maintain RESTful API endpoints for LLM inference and model interactions
  • Ensure API reliability, performance optimization, rate limiting, authentication, and comprehensive documentation
  • Implement comprehensive monitoring solutions for model performance, drift detection, and system health metrics
  • Research and evaluate emerging LLMOps techniques, tools, and methodologies, and recommend technology and architecture choices
  • Establish and document best practices for LLM operations, deployment patterns, and governance frameworks
  • Develop prototypes and POCs to validate new approaches and technologies
  • Collaborate closely with data scientists, ML engineers, DevOps teams, and product managers
  • Create comprehensive documentation for systems, processes, and architectural decisions
  • Mentor team members and share expertise through technical presentations and training sessions
  • Optimize data preprocessing and feature engineering pipelines for LLM training and inference
  • Implement data validation, quality checks, and lineage tracking for model training datasets
  • Design efficient data storage and retrieval systems for large-scale model artifacts and training data
  • Implement model governance frameworks including audit trails, compliance monitoring, and approval workflows
  • Ensure secure model deployment practices, access controls, and data privacy measures
  • Identify and mitigate risks associated with LLM deployment and operations
  • Maintain development, staging, and production environments for LLM workflows

Requirements

  • Bachelor’s degree in Computer Science, Statistics, Engineering, or a related field (B.E/B.Tech/M.Tech), or equivalent
  • Prior experience as an LLMOps engineer with a software engineering background
  • 6 to 12 years of experience building production-quality software
  • At least 5 years of experience in Python
  • 6+ years of software development experience with strong programming skills in Python and SQL
  • 2+ years of hands-on experience in LLMOps
  • 1+ years of experience with machine learning operations, model deployment, and lifecycle management
  • Proficiency with at least one major cloud provider (AWS or GCP) and its ML services
  • Experience with Docker, Kubernetes, and container orchestration for ML workloads
  • Strong experience in designing, building, and maintaining production-grade APIs for ML services
  • Proficiency with Git, CI/CD pipelines, and DevOps practices
  • Understanding of LLM architectures, training methodologies, and fine-tuning techniques
  • Knowledge of ML pipeline design, model monitoring, and deployment strategies
  • Understanding of distributed systems, scalability patterns, and microservices architecture
  • Good to have: Experience with HuggingFace Transformers, PyTorch, TensorFlow, or similar frameworks
  • Good to have: Knowledge of prompt optimization and RAG (Retrieval-Augmented Generation) architectures
  • Good to have: Experience with vector search
  • Note: Exceptional candidates without advanced degrees will be considered
