ML Research Engineer, Performance Optimization
Liquid AI
About the Role
Join our Research & Engineering team as an ML Research Engineer focused on performance optimization, where you will architect and implement cutting-edge solutions to redefine AI efficiency on various hardware platforms. This remote, full-time role offers the opportunity to make a significant impact on the frontier of intelligent systems, working across a global team.
Responsibilities
- Design and implement high-performance, custom GPU kernels for ML training and inference.
- Utilize low-level profiling tools to tune and optimize kernel performance.
- Integrate optimized GPU kernels into ML frameworks like PyTorch, bridging high-level models with low-level hardware.
- Apply a deep understanding of memory hierarchy to optimize compute- and memory-bound workloads.
- Develop fine-grained optimizations targeting specific hardware features, such as tensor cores.
Requirements
- Experience writing high-performance, custom GPU kernels for training or inference.
- Proficiency with low-level profiling tools and demonstrated ability to tune kernels effectively.
- Experience integrating GPU kernels into ML frameworks (e.g., PyTorch).
- Solid understanding of memory hierarchy and optimization techniques for compute- and memory-bound workloads.
- Demonstrated ability to implement fine-grained optimizations for target hardware, including tensor cores.
Role Impact & Work Environment
As part of the Research & Engineering team, your work will directly contribute to building efficient AI systems that operate on-device, at the edge, and under real-time constraints, pushing the boundaries of intelligent system architecture. This is a full-time, remote position open to candidates in the United States, Austria, Canada, France, Germany, Netherlands, Switzerland, and the UK.
About Liquid AI
An MIT spin-off, Liquid AI develops efficient general-purpose AI systems and foundation models, including "liquid neural networks," designed for adaptable machine learning with minimal processing power, optimized for edge devices.