About the Role
The Hybrid Space Liquid Cooling Subject Matter Expert (SME) serves as the technical authority and hands-on expert for liquid cooling systems supporting high-density compute environments within the High-Performance Compute Center (HPCC). Reporting to the Hybrid Space Liquid Cooling Operations Manager, this role is responsible for the day-to-day technical execution, troubleshooting, optimization, and validation of direct-to-chip liquid cooling systems, Coolant Distribution Units (CDUs), Rear Door Heat Exchangers (RDHXs), and associated control and water chemistry systems.
The SME partners closely with Operations, Mechanical, Electrical, and Controls teams to ensure reliable thermal performance, rapid fault isolation, and continuous optimization of the Hybrid Space cooling environment. The role is deeply technical and operational, serving as the escalation point for complex cooling issues, system anomalies, and performance deviations in a 24/7 mission-critical setting.
Responsibilities:
Serve as the primary technical expert for Hybrid Space liquid cooling systems, including CDUs, RDHXs, rack-level cooling loops, isolation valves, associated mechanical equipment, and controls.
Execute advanced troubleshooting, fault isolation, and corrective actions for cooling-related alarms, failures, and performance issues.
Support real-time monitoring and analysis of system performance, including temperatures, pressures, flows, and delta-T metrics.
Assist in optimizing cooling system efficiency, stability, and reliability through data-driven analysis and field validation.
Support commissioning, system modifications, and upgrades impacting the liquid cooling environment.
Monitor and maintain technical water chemistry to ensure corrosion control, biological stability, and compliance with manufacturer specifications.
Perform sampling, testing, and analysis of system fluids, including pH, conductivity, inhibitors, and biological indicators.
Coordinate with water treatment vendors on filtration, chemical dosing, and corrective actions.
Identify early indicators of fluid degradation or system risk and escalate proactively to the Operations Manager.
Act as the primary technical escalation resource for operators and third-party service providers.
Support the Operations Manager during incidents, root cause investigations, and post-event reviews.
Collaborate with Mechanical, Electrical, and Controls teams to ensure proper hydraulic balance, thermal integration, and system redundancy.
Participate in change management reviews for work affecting compute cooling systems.
Maintain accurate as-built documentation, operating procedures, and maintenance records for liquid cooling systems.
Provide technical input for the development of preventive maintenance tasks, procedures, and schedules.
Execute and support preventive maintenance activities for CDUs, RDHXs, isolation valves, and related components.
Provide technical guidance, training, and mentoring to operators and technicians.
Support performance trending, reporting, and identification of improvement opportunities.
Contribute technical findings and recommendations for long-term system enhancements.
Adhere to all EHS, Lockout/Tagout, confined space, and chemical handling policies.
Ensure proper PPE usage and safe work practices when performing work on liquid cooling systems.
Support compliance with applicable NFPA, ASHRAE, OSHA, and site-specific standards.
Requirements:
Bachelor’s degree in Mechanical Engineering, Facilities Engineering, or a related technical discipline, or equivalent hands-on experience.
6+ years of experience in mission-critical environments such as data centers, HPC facilities, or industrial cooling systems.
Direct, hands-on experience with liquid cooling technologies including CDUs, RDHXs, or direct-to-chip cooling systems.
Strong working knowledge of water chemistry management, corrosion control, and industrial cooling fluids.
Demonstrated ability to troubleshoot complex mechanical and thermal systems under operational conditions.
Familiarity with integrated mechanical, electrical, and control systems supporting high-density compute environments.
Knowledge of NFPA, ASHRAE, and OSHA standards relevant to mission-critical cooling infrastructure.
Tech Stack
liquid cooling systems, CDUs, RDHXs, water chemistry management, corrosion control, mechanical systems, thermal systems, NFPA, ASHRAE, OSHA