In this role, you will play a key part in the development, optimization, and operation of the Institute’s AI-optimized HPC cluster (the AI Foundry), which integrates 68 NVIDIA Hopper and 80 NVIDIA Blackwell AI accelerators plus 5 PB of fast storage.
You will contribute to the design and implementation of MLOps workflows and AI development pipelines, supporting our researchers in deploying innovative solutions across the AI lifecycle. You will work closely with other R&D units to optimize infrastructure, troubleshoot complex issues related to model training and deployment, and help ensure the reliability, scalability, and performance of our AI systems.