Nuvepro - Task Intelligence for the Enterprise
Mistral · Engineering & Infra · Paris

Datacenter Hardware Engineer, HPC

Classified Tasks (30)

Automate 0% · Augment 60% · Human-Only 40%

Augment (18)

AI assists, human decides

Maintain GPU and CPU clusters to ensure continuous operational availability.

operational

Scale GPU/CPU clusters safely and reliably to meet growing compute demand.

technical

Follow pre-work and post-work checklists for hardware maintenance and repairs.

operational

Triage hardware faults using LEDs, POST, beep codes, and basic diagnostic tests.

technical

Capture diagnostic evidence including photos, serial numbers, and test results.

administrative

Open, update, and close support tickets with clear, factual notes.

administrative

Provide feedback and propose improvements to proactive maintenance, monitoring, and targeted follow-ups on recurring anomalies.

analytical

Convert ad-hoc checks into standard operating procedures, alerts, and monitoring dashboards.

operational

Receive, inspect, and track incoming hardware parts and components.

administrative

Maintain accurate labeled inventory of spare parts and consumables.

administrative

Process and manage vendor RMAs for defective parts.

administrative

Coordinate part deliveries, returns, and technical support with vendors.

communication

Communicate status updates and next steps to hardware owners, datacenter operations, and stakeholders clearly and promptly.

communication

Update and maintain SOPs, checklists, and runbooks to reflect current procedures.

administrative

Record all hardware changes and maintain audit-ready documentation with no undocumented modifications.

administrative

Perform basic Linux boot checks and analyze system logs to support hardware diagnostics.

technical

Use or develop Python/Bash scripts to automate diagnostics and routine hardware tasks.

technical

Monitor cluster health and perform remediation actions to keep the GPU cluster operational.

operational

Human-Only (12)

Requires human judgment

Troubleshoot compute and storage hardware issues affecting CPUs, memory, drives, NICs, GPUs, and PSUs.

technical

Investigate and repair interconnect and networking faults involving switches, cables, transceivers, Ethernet, and InfiniBand.

technical

Perform power-off, lockout/tagout (LOTO), and ESD-protected interventions to replace, reseat, or recable components and restore service.

operational

Apply LOTO and ESD procedures during all hardware interventions.

operational

Maintain tidy, organized, and safe work areas in the datacenter.

operational

Escalate and collaborate with senior hardware and firmware owners on complex or multi-node incidents.

leadership

Install, reseat, and swap GPU and PCIe cards.

technical

Install, reseat, and swap network interface cards (NICs).

technical

Replace and service power supply units (PSUs) and storage drives.

technical

Install and organize servers and components in racks, including rails, cabling, and labeling.

technical

Manage rack cooling, PDU/power connections, network/storage cabling, and general cable management.

operational

Lift and mount equipment into racks following HSE and electrical safety procedures.

operational

Job description

About Mistral

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life. We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments. Our offerings include le Chat, the AI assistant for life and work.

We are a dynamic, collaborative team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation. Our teams are distributed between France, USA, UK, Germany and Singapore. We are creative, low-ego and team-spirited. Join us to be part of a pioneering company shaping the future of AI. Together, we can make a meaningful impact. See more about our culture on https://mistral.ai/careers.

Role Summary

Our compute footprint is growing fast to support our science and engineering teams. We’re hiring a Datacenter HW Engineer to maintain, troubleshoot, and scale our GPU/CPU clusters safely and reliably. You’ll execute hands-on hardware work in our Paris-area datacenter and partner with hardware owners, DC operations, and vendors to keep one of France’s largest GPU clusters healthy.

Location: Bruyères-le-Châtel (on-site, field role)
Reporting line: Hardware Ops

Impact

• Compute is a key lever for Mistral’s success and our largest spend item.
• Direct impact on scale: your work keeps one of France’s largest AI clusters healthy as we grow to unprecedented scale.
• Enable breakthrough AI: you unlock our science & engineering teams to deliver groundbreaking AI solutions.

What you will do

• Diagnose & operate core server/cluster components: Investigate and handle compute/storage hardware issues (CPU, memory, drives, NICs, GPUs, PSUs) and interconnect problems (switches, cables, transceivers; Ethernet/InfiniBand). Perform safe interventions (power-off/lockout, ESD) to replace, re-seat, or recable components and restore service.
• Safety & procedures: Apply lockout/tagout (LOTO) and ESD discipline; follow pre/post-work checklists; maintain tidy, safe work areas.
• First-line diagnostics: Triage using LEDs, POST, beep codes and basic tests; capture evidence (photos, serials, results); open/update/close tickets with clear notes.
• Preventive maintenance: Provide feedback and ideas to improve proactive activities, monitoring, and targeted follow-ups on recurring or specific anomalies; help turn ad-hoc checks into SOPs, alerts, and dashboards.
• Parts & logistics: Receive and track parts, keep labeled inventory accurate, manage simple RMAs, and coordinate with vendors.
• Collaboration & escalation: Partner with senior hardware/firmware owners on complex or multi-node issues; communicate status and next steps crisply.
• Documentation & quality: Keep SOPs/checklists current; ensure zero undocumented changes and consistent, audit-ready records.

About you

• Hands-on mindset in datacenters/server hardware: you can install/re-seat/swap GPU/PCIe cards, NICs, PSUs, drives, and work cleanly in racks (rails, cabling, labeling). We also welcome candidates with strong Linux fundamentals (boot/check, logs) and scripting (Python/Bash) who are eager to learn hardware; you’ll be trained and mentored by a senior hardware engineer.
• Disciplined and meticulous: follows checklists, ESD/LOTO; no rough handling; careful with all high-value server components.
• Practical electrical basics: power-off, PPE, short-circuit risk awareness.
• Comfortable in racks: cooling, network, storage, PDU, cable management; can lift/mount safely (within HSE limits).
• Clear communicator: short factual updates; reliable teammate; punctual and process-minded.
• Hardware-passionate, professionally grounded: strong curiosity and craft mindset.

Nice to have

• HPC/AI/Cloud at scale exp
Source: Mistral careers · scraped 2026-05-22