Staff Machine Learning Engineer, LLM Fine‑Tuning (Verilog/RTL Applications)

GTN Technical Staffing

HIGHLIGHTS

Location: San Jose, CA (Onsite/Hybrid)

Schedule: Full Time

Position Type: Contract

Hourly: BOE

Overview

Our client is building privacy‑preserving LLM capabilities that help hardware design teams reason over Verilog/SystemVerilog and RTL artifacts: code generation, refactoring, lint explanation, constraint translation, and spec‑to‑RTL assistance. They are looking for a Staff‑level engineer to technically lead a small, high‑leverage team that fine‑tunes and productizes LLMs for these workflows in a strict enterprise data‑privacy environment.

You don’t need to be a Verilog/RTL expert to start; curiosity, drive, and deep LLM craftsmanship matter most. Any HDL/EDA fluency is a strong plus.

What you’ll do (Responsibilities)

  • Own the technical roadmap for Verilog/RTL‑focused LLM capabilities—from model selection and adaptation to evaluation, deployment, and continuous improvement.
  • Lead a hands‑on team of applied scientists/engineers: set direction, unblock technically, review designs/code, and raise the bar on experimentation velocity and reliability.
  • Fine‑tune and customize models using state‑of‑the‑art techniques (LoRA/QLoRA, PEFT, instruction tuning, preference optimization/RLAIF) with robust HDL‑specific evals:
      • Compile‑/lint‑/simulate‑based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and “does‑it‑synthesize” checks.
  • Design privacy‑first ML pipelines on AWS:
      • Training/customization and hosting using Amazon Bedrock (including Anthropic models) where appropriate; SageMaker (or EKS + KServe/Triton/DJL) for bespoke training needs.
      • Artifacts in S3 with KMS CMKs; isolated VPC subnets & PrivateLink (including Bedrock VPC endpoints), IAM least‑privilege, CloudTrail auditing, and Secrets Manager for credentials.
      • Enforce encryption in transit/at rest, data minimization, no public egress for customer/RTL corpora.
  • Stand up dependable model serving: Bedrock model invocation where it fits, and/or low‑latency self‑hosted inference (vLLM/TensorRT‑LLM), autoscaling, and canary/blue‑green rollouts.
  • Build an evaluation culture: automatic regression suites that run HDL compilers/simulators, measure behavioral fidelity, and detect hallucinations/constraint violations; model cards and experiment tracking (MLflow/Weights & Biases).
  • Partner deeply with hardware design, CAD/EDA, Security, and Legal to source/prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.
  • Drive productization: integrate LLMs with internal developer tools (IDEs/plug‑ins, code review bots, CI), retrieval (RAG) over internal HDL repos/specs, and safe tool‑use/function‑calling.
  • Mentor & uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure‑by‑default systems.

What you’ll bring (Minimum qualifications)

  • 10+ years total engineering experience with 5+ years in ML/AI or large‑scale distributed systems; 3+ years working directly with transformers/LLMs.
  • Proven track record shipping LLM‑powered features in production and leading ambiguous, cross‑functional initiatives at Staff level.
  • Deep hands‑on skill with PyTorch, Hugging Face Transformers/PEFT/TRL, distributed training (DeepSpeed/FSDP), parameter‑efficient and quantized fine‑tuning (LoRA/QLoRA), and constrained/grammar‑guided decoding.
  • AWS expertise to design and defend secure enterprise deployments, including:
      • Amazon Bedrock (model selection, Anthropic model usage, model customization, Guardrails, Knowledge Bases, Bedrock runtime APIs, VPC endpoints)
      • SageMaker (Training, Inference, Pipelines), S3, EC2/EKS/ECR, VPC/Subnets/Security Groups, IAM, KMS, PrivateLink, CloudWatch/CloudTrail, Step Functions, Batch, Secrets Manager.
  • Strong software engineering fundamentals: testing, CI/CD, observability, performance tuning; Python a must (bonus for Go/Java/C++).
  • Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.

Nice to have (Preferred qualifications)

  • Familiarity with Verilog/SystemVerilog/RTL workflows: lint, synthesis, timing closure, simulation, formal, test benches, and EDA tools (Synopsys/Cadence/Mentor).
  • Experience integrating static analysis/AST‑aware tokenization for code models or grammar‑constrained decoding.
  • RAG at scale over code/specs (vector stores, chunking strategies), tool‑use/function‑calling for code transformation.
  • Inference optimization: TensorRT‑LLM, KV‑cache optimization, speculative decoding; throughput/latency trade‑offs at batch and token levels.
  • Model governance/safety in the enterprise: model cards, red‑teaming, secure eval data handling; exposure to SOC2/ISO 27001/NIST frameworks.
  • Data anonymization, DLP scanning, and code de‑identification to protect IP.

What success looks like

90 days

  • Baseline an HDL‑aware eval harness that compiles/simulates; establish secure AWS training & serving environments (VPC‑only, KMS‑backed, no public egress).
  • Ship an initial fine‑tuned/customized model with measurable gains vs. base (e.g., +X% compile‑pass rate, −Y% lint findings per KLOC generated).

180 days

  • Expand customization/training coverage (Bedrock for managed FMs including Anthropic; SageMaker/EKS for bespoke/open models).
  • Add constrained decoding + retrieval over internal design specs; productionize inference with SLOs (p95 latency, availability) and audited rollout to pilot hardware teams.

12 months

  • Demonstrably reduce review/iteration cycles for RTL tasks with clear metrics (defect reduction, time‑to‑lint‑clean, % auto‑fix suggestions accepted), and a stable MLOps path for continuous improvement.

Security & privacy by design

  • Customer and internal design data remain within private AWS VPCs; access via IAM roles and audited by CloudTrail; all artifacts encrypted with KMS.
  • No public internet calls for sensitive workloads; Bedrock access via VPC interface endpoints/PrivateLink with endpoint policies; SageMaker and/or EKS run in private subnets.
  • Data pipelines enforce minimization, tagging, retention windows, and reproducibility; DLP scanning and redaction are first‑class steps.
  • We produce model cards, data lineage, and evaluation artifacts for every release.

Tech you’ll touch

  • Modeling: PyTorch, HF Transformers/PEFT/TRL, DeepSpeed/FSDP, vLLM, TensorRT‑LLM
  • AWS & MLOps: Amazon Bedrock (Anthropic and other FMs, Guardrails, Knowledge Bases, Runtime APIs), SageMaker (Training/Inference/Pipelines), MLflow/W&B, ECR, EKS/KServe/Triton, Step Functions
  • Platform/Security: S3 + KMS, IAM, VPC/PrivateLink (incl. Bedrock), CloudWatch/CloudTrail, Secrets Manager

Tooling (nice to have)

  • HDL toolchains for compile/simulate/lint, vector stores (pgvector/OpenSearch), GitHub/GitLab CI

"We are GTN – The Go To Network"
