Machine Learning Engineer, Ads Training Platform
Reddit
About the Role
Join Reddit's Ads Training Platform team as a Machine Learning Engineer to build and scale the critical infrastructure powering our Ads ML models. You will play a key role in enabling fast, reliable, and efficient model training, directly impacting ad targeting and advertiser value.
Responsibilities
- Design, build, and maintain large-scale distributed training infrastructure for Ads ML models.
- Develop tools and frameworks on top of the Ray platform.
- Build tools to debug, profile, and tune distributed training jobs for performance and reliability.
- Integrate with object storage systems and improve data access patterns.
- Collaborate with ML engineers to reduce model training time and GPU training costs and to improve training efficiency.
- Drive improvements in scheduling, state management, and fault tolerance within the training platform.
Requirements
- Deep experience in infrastructure, distributed systems, and ML platform operations.
About the Team & Work Environment
The Ads Training Platform pod is responsible for building and maintaining the distributed training and data processing infrastructure that powers Reddit’s Ads machine learning models. Our focus is on enabling fast, reliable, and scalable model training across large datasets, directly supporting Ads ML teams in improving ad targeting, conversion prediction, and advertiser value. This role is available remotely within the United States, giving you flexibility in where you work.
About Reddit
Reddit is an online platform that enables users to submit links, create content, and have discussions about the topics of their interest.