Position Expired
This job is no longer accepting applications.
MLOps / LLMOps Engineer - Remote (must be able to work PST hours)
Rootshell Enterprise Technologies, Inc.
Remote (must be able to work PST hours); candidates local to the Bay Area preferred.
MLOps/LLMOps Engineer
Operationalizing Large Language Models requires specialized expertise beyond traditional MLOps practices. LLMs present distinct operational challenges, including significantly larger computational requirements, complex data pipelines, specialized infrastructure needs, and demanding performance optimization. This role ensures GenAI solutions can scale effectively from proof of concept to enterprise-wide deployment in a utility environment.
- Ensures GenAI solutions move successfully from prototype to production with proper operational support
- Establishes specialized monitoring for model performance, inference latency, and data quality
- Enables efficient scaling of LLM solutions across multiple business units
- Creates high-performance deployment architectures that balance speed, cost, and reliability
- Develops operational data pipelines to continuously improve model performance with new utility-specific data
Key Responsibilities
- Design and implement LLM-specific deployment architectures with Docker containers for both batch and real-time inference
- Configure GPU infrastructure on-premises or in the cloud with appropriate CI/CD pipelines for model updates
- Build comprehensive monitoring and observability systems with appropriate logging, metrics, and alerts
- Implement load balancing and scaling solutions for LLM inference, including model sharding if necessary
- Create automated workflows for model retraining, versioning, and deployment
- Optimize infrastructure costs through intelligent resource allocation, spot instances, and efficient compute strategies
- Collaborate with PG&E's Cyber team on implementing appropriate security controls for GenAI applications
- Develop automated testing frameworks to ensure consistent output quality across model updates
Expected Skillset
- DevOps + ML: Expertise in Kubernetes, Docker, CI/CD tools, and MLflow or similar platforms
- Cloud & Infrastructure: Understanding of GPU instance options, cloud services (AWS/Azure/GCP), and optimization techniques
- Automation: Proficiency in Python, Bash, and infrastructure-as-code tools like Terraform or Ansible
- LLM-Specific Frameworks: Experience with tools such as TensorBoard, MLflow, or equivalent for tracking and scaling LLM workloads
- Performance Optimization: Knowledge of techniques to monitor and improve inference speed, throughput, and cost
- Collaboration: Ability to work effectively across technical teams while adhering to enterprise architecture standards