Deep Learning Software Engineer, FlashInfer
NVIDIA is actively seeking a Deep Learning Software Engineer, FlashInfer. For over 25 years, NVIDIA has been a leader in computer graphics, PC gaming, and accelerated computing, driven by continuous innovation and exceptional talent. The company is now leveraging the immense potential of AI to shape the next era of computing, where GPUs will power intelligent computers, robots, and self-driving vehicles capable of understanding the world. This pioneering work requires vision, innovation, and top global talent. As an NVIDIAN, you will join a diverse and supportive environment that encourages everyone to achieve their best, making a lasting impact on the world.
The role involves developing groundbreaking technologies within the inference systems software stack, focusing on innovative AI systems software to accelerate AI inference. As a team member, you will develop libraries, code generators, and GPU kernel technologies for NVIDIA's hardware architecture. This includes designing and building new abstractions, efficient attention kernel implementations, new LLM inference runtime components, and kernel code generators to accelerate large language models, agents, and other high-impact AI workloads.
Your responsibilities will include innovating and developing new AI systems technologies for efficient inference, designing, implementing, and optimizing kernels for high-impact AI workloads, and designing and implementing extensible abstractions for LLM serving engines. You will also build efficient just-in-time domain-specific compilers and runtimes. Collaboration with other NVIDIA engineers across deep learning frameworks, libraries, kernels, and GPU architecture teams is key, and you will contribute to open-source communities such as FlashInfer, vLLM, and SGLang.
Candidates should possess a Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience; a PhD is preferred. Strong experience in developing or using deep learning frameworks like PyTorch, JAX, TensorFlow, or ONNX is required, ideally coupled with familiarity with inference engines and runtimes such as vLLM, SGLang, and MLC. Excellent Python and C/C++ programming skills are essential.
To distinguish yourself, a background in domain-specific compiler and library solutions for LLM inference and training, like FlashInfer or Flash Attention, is highly beneficial. Expertise in inference engines such as vLLM and SGLang, as well as machine learning compilers like Apache TVM or MLIR, will be valued. Strong experience in GPU kernel development and performance optimizations, particularly using CUDA C/C++, cuTile, or Triton, is also a significant advantage. Finally, ownership or significant contributions to open-source projects are highly regarded.
The base salary for this position is determined by location, experience, and the compensation of similar roles. The base salary range is 108,000 USD to 178,250 USD for Level 1, and 124,000 USD to 195,500 USD for Level 2. Equity and benefits are also part of the compensation package. Applications will be accepted until at least February 28, 2026. This posting is for an existing vacancy, and NVIDIA utilizes AI tools in its recruiting processes. NVIDIA is an equal opportunity employer committed to fostering a diverse work environment, and does not discriminate based on protected characteristics in its hiring and promotion practices.
Job Type
- Full Time
Location
- Santa Clara, CA
