Location: Bissen, Luxembourg

Type: Permanent

 

HPC System Engineer – Computing Services (M/F)

 

Who we are:

LuxProvide is the national HPC organization in charge of the planning, installation and long-term operation of the national supercomputer. For our recently established organization and HPC infrastructure, we are seeking an HPC System Engineer to join the core team in charge of designing, implementing and operating this infrastructure. You will become part of a high-performance team of experts, with the unique opportunity to contribute to the initial planning and installation of the whole HPC infrastructure.

 

The tasks:

  • As part of the first team on the ground, you will design, install, configure, tune and maintain the HPC software and middleware infrastructure for computing and storage, from operating system installations to high-level services
  • Contribute to the design and planning of the integration of operational services
  • Install and manage software provisioning and configuration automation systems, virtualization solutions, monitoring and status reporting systems, deployment systems for bare-metal server and compute node installation, HPC workload management systems, license management solutions and DevOps platforms
  • Manage the lifecycle of servers and compute nodes, and their software environment
  • Perform system builds, deploy systems software, perform software integration
  • Plan and install software/firmware patches and upgrades to all components
  • Troubleshoot, debug and solve operational problems on the systems
  • Develop and apply policy rules within HPC software and middleware tools and implement user-facing software tools and APIs
  • Contribute to the porting and development of system and service management tools
  • Apply modern system administration and software development best practices
  • Select, configure, and install hardware (including lifting, racking and cabling)
  • Ensure the repair of hardware and subsystems, working with vendor field engineers to resolve hardware or subsystem problems in a timely manner
  • Set up and maintain documentation of the existing HPC operational software and hardware infrastructure
  • Lead, contribute to and participate in training sessions on the efficient and effective use of the HPC system software and the HPC hardware infrastructure
  • Participate in the development and implementation of security measures and disaster recovery procedures at all levels of the HPC software and hardware infrastructure
  • If needed, participate in rotating 24/7 on-call support shifts to resolve urgent software issues on mission-critical systems

 

Skills and Requirements:

  • A university degree in computer science, computer engineering, information technology or a closely related field is required
  • 5+ years of proven working experience in system administration of large-scale Linux-based environments, preferably in an HPC/supercomputing centre and in a Site Reliability Engineering role
  • Expert knowledge of large-scale systems management with modern administration practices
  • Excellent knowledge of scripting and programming languages
  • Fluency in English and strong verbal and written communication skills are required; German and/or French is a plus

 

Benefits:

  • Work on cutting-edge and exciting technologies within a team of highly motivated and passionate colleagues
  • Flat hierarchies and your own area of responsibility, with room for creativity and the possibility to grow within the role
  • Working from home is possible
  • An excellent working atmosphere and working conditions

Apply for the position:

Please send your application to hr@lxp.lu or use our contact form: