About the Role
<h5>The organization</h5><p>Our client is a rapidly growing organization at the forefront of the AI revolution, specializing in high-performance computing infrastructure for running large language models (LLMs) and AI products. They operate a global network of data centers with capacity purpose-built for extreme-scale computational workloads.</p><h5>The role</h5><p>We are seeking an experienced HPC Engineer to join a dedicated high-performance computing optimization team. This team sits at the intersection of R&amp;D, hardware engineering, and distributed systems, focusing on maximizing computational throughput and efficiency rather than traditional system administration. You'll work with cutting-edge HPC technology to optimize parallel computing environments, GPU clusters, and interconnect systems to meet the demanding requirements of AI and machine learning workloads.</p><p>You will focus on optimizing the performance of large-scale GPU clusters, targeting lower latency, higher computational efficiency, and better parallel processing. Working with InfiniBand networks and high-performance computing infrastructure, you'll collaborate with cross-functional teams to deliver scalable HPC solutions that meet client needs.</p><p>The role is split evenly between operational optimization and troubleshooting (50%) and HPC architecture design and performance tuning projects (50%).
You'll maintain and optimize distributed computing systems, managing over 100,000 GPUs across 10+ InfiniBand networks, while ensuring optimal performance of the global HPC infrastructure and driving continuous computational improvements.</p><h5>What we're looking for</h5><ul><li><p>5+ years of experience in HPC environments and parallel computing systems.</p></li><li><p>Strong proficiency in Linux kernel optimization for HPC workloads.</p></li><li><p>Proficiency with kernel-space profiling and tuning tools (for example perf, ftrace, eBPF).</p></li><li><p>Strong proficiency in C++ or C development for high-performance applications.</p></li><li><p>Experience with Golang and/or Python for HPC tooling and automation.</p></li><li><p>Experience with InfiniBand networking and high-speed interconnects.</p></li><li><p>Experience with distributed computing architectures and cluster management.</p></li></ul><h5>What's offered</h5><ul><li><p>Salary: up to 160k + 25% bonus (200k OTE).</p></li><li><p>Flexible working arrangements.</p></li><li><p>A dynamic and collaborative work environment that values initiative and innovation.</p></li><li><p>Location: Amsterdam or fully remote from anywhere within the EU/EEA.</p></li></ul>