Position Expired

This job is no longer accepting applications.

LLM Ops Engineer

Brillio

Description

Design, implement, and maintain end-to-end pipelines for LLM training, fine-tuning, validation, and deployment
Build and optimize scalable infrastructure for large language model operations
Deploy LLMs to production environments with prompt management, observability, serverless deployment, monitoring, scaling, and performance optimization
Design, develop, and maintain RESTful API endpoints for LLM inference and model interactions
Ensure API reliability, performance optimization, rate limiting, authentication, and comprehensive documentation
Implement comprehensive monitoring solutions for model performance, drift detection, and system health metrics
Research and evaluate emerging LLMOps techniques, tools, and methodologies, and provide recommendations on technology and architecture
Establish and document best practices for LLM operations, deployment patterns, and governance frameworks
Develop prototypes and POCs to validate new approaches and technologies
Collaborate closely with data scientists, ML engineers, DevOps teams, and product managers
Create comprehensive documentation for systems, processes, and architectural decisions
Mentor team members and share expertise through technical presentations and training sessions
Optimize data preprocessing and feature engineering pipelines for LLM training and inference
Implement data validation, quality checks, and lineage tracking for model training datasets
Design efficient data storage and retrieval systems for large-scale model artifacts and training data
Implement model governance frameworks including audit trails, compliance monitoring, and approval workflows
Ensure secure model deployment practices, access controls, and data privacy measures
Identify and mitigate risks associated with LLM deployment and operations
Maintain development, staging, and production environments for LLM workflows

Requirements

Bachelor’s degree in Computer Science, Statistics, Engineering, or a related field (B.E/B.Tech/M.Tech), or equivalent
LLMOps Engineer with software engineering experience
6-12 years of experience building production-quality software
At least 5 years of experience in Python
6+ years of software development experience with strong programming skills in Python and SQL
2+ years of hands-on experience in LLMOps
1+ years of experience with machine learning operations, model deployment, and lifecycle management
Proficiency with at least one major cloud provider (AWS or GCP) and their ML services
Experience with Docker, Kubernetes, and container orchestration for ML workloads
Strong experience in designing, building, and maintaining production-grade APIs for ML services
Proficiency with Git, CI/CD pipelines, and DevOps practices
Understanding of LLM architectures, training methodologies, and fine-tuning techniques
Knowledge of ML pipeline design, model monitoring, and deployment strategies
Understanding of distributed systems, scalability patterns, and microservices architecture
Good to have: Experience with HuggingFace Transformers, PyTorch, TensorFlow, or similar frameworks
Good to have: Knowledge of prompt optimization and RAG (Retrieval-Augmented Generation) architectures
Good to have: Experience with vector search
Note: Exceptional candidates without advanced degrees will be considered
