Position Expired
This job is no longer accepting applications.
MLOps / LLMOps Engineer - Remote (must be able to work PST hours)
Rootshell Enterprise Technologies, Inc.
Remote (must be able to work PST hours); candidates local to the Bay Area preferred.
MLOps/LLMOps Engineer
Operationalizing Large Language Models requires specialized expertise beyond traditional MLOps practices. LLMs present distinct operational challenges, including significantly larger computational requirements, complex data pipelines, specialized infrastructure needs, and demanding performance optimization. This role ensures GenAI solutions can scale effectively from proof of concept to enterprise-wide deployment in a utility environment.
- Ensures GenAI solutions move successfully from prototype to production with proper operational support
- Establishes specialized monitoring for model performance, inference latency, and data quality
- Enables efficient scaling of LLM solutions across multiple business units
- Creates high-performance deployment architectures that balance speed, cost, and reliability
- Develops operational data pipelines to continuously improve model performance with new utility-specific data
Key Responsibilities
- Design and implement LLM-specific deployment architectures with Docker containers for both batch and real-time inference
- Configure GPU infrastructure on-premises or in the cloud with appropriate CI/CD pipelines for model updates
- Build comprehensive monitoring and observability systems with appropriate logging, metrics, and alerts
- Implement load balancing and scaling solutions for LLM inference, including model sharding if necessary
- Create automated workflows for model retraining, versioning, and deployment
- Optimize infrastructure costs through intelligent resource allocation, spot instances, and efficient compute strategies
- Collaborate with PG&E's Cyber team on implementing appropriate security controls for GenAI applications
- Develop automated testing frameworks to ensure consistent output quality across model updates
Expected Skillset
- DevOps + ML: Expertise in Kubernetes, Docker, CI/CD tools, and MLflow or similar platforms
- Cloud & Infrastructure: Understanding of GPU instance options, cloud services (AWS/Azure/GCP), and optimization techniques
- Automation: Proficiency in Python, Bash, and infrastructure-as-code tools like Terraform or Ansible
- LLM-Specific Frameworks: Experience with tools such as TensorBoard, MLflow, or equivalent for tracking and scaling LLM workloads
- Performance Optimization: Knowledge of techniques to monitor and improve inference speed, throughput, and cost
- Collaboration: Ability to work effectively across technical teams while adhering to enterprise architecture standards