Company Description

At CERN, the European Organization for Nuclear Research, physicists and engineers are probing the fundamental structure of the universe. Using the world's largest and most complex scientific instruments, they study the basic constituents of matter - fundamental particles that are made to collide together at close to the speed of light. The process gives physicists clues about how particles interact, and provides insights into the fundamental laws of nature. Find out more on http://home.cern.

Job Description

Introduction

Are you a skilled and experienced Computing Engineer, interested in working in an exciting international environment at the forefront of modern computing? Then join CMS, one of the largest particle physics experiments in the world, and take part in its major upgrade activities to answer questions at the heart of particle physics!

The Experimental Physics (EP) Department carries out research in the field of experimental particle physics, supporting several experiments at the Large Hadron Collider (LHC) at CERN. CMS is a general-purpose particle physics experiment operated by an international collaboration. The CMS Data Acquisition and Trigger Group (CMD) has the major responsibility for developing, implementing, commissioning and operating the data acquisition (DAQ) and control systems of the CMS experiment. The group is also responsible for implementing, supporting and maintaining the experiment's on-line computer clusters, Ethernet-based networks and mass storage systems.

From a computing infrastructure perspective, this is a large and complex Ethernet-based distributed system with hundreds of sub-detector control PCs, hundreds of data acquisition PCs and high-performance data analysis PCs with GPUs, as well as high-availability servers, control room operator consoles and thousands of servers in a cloud system used for offline data analysis.

Our team helps make this a success by developing and evolving the computing infrastructure that is a core element of data taking for the CMS experiment.

Functions

As a Computing Infrastructure Engineer, you will join a passionate team and take an important role in the operation of the CMS computing infrastructure and participate in numerous strategic projects for the future in terms of operating systems, container technology and storage systems.

In particular you will:

· Perform system administration tasks for the LINUX computers connected to the CMS experiment network (control room consoles, data centre servers, and virtual machines).

· Communicate with the end-users to understand their needs and help translate these into appropriate solutions.

· Investigate, diagnose, and resolve operational problems in collaboration with end-users ranging from physicists to operations teams.

· Participate actively in the strategic design and evolution of the CMS on-line computing infrastructure including networks, on-line clusters, mass storage devices, virtual machines and containers. In particular:

    · Provide a container orchestration system suited to different teams (DAQ, sub-detectors, system administration).

    · Provide an evolving NAS solution for the primary storage (users' home directories, project areas, etc.).

    · Provide the underlying high-performance servers and storage for the bare metal, virtual machines and container orchestration system.

· Produce technical documentation on all the processes and systems in place.

· Take part in on-call duty to address urgent system administration issues, which might require intervention on site.

· You may have the opportunity to supervise students and/or graduates.

Qualifications

Master's degree or equivalent relevant experience in the field of Computer Science or a related field.

Experience:

The following are required for this post:

· Proven experience in the operation of medium to large-scale distributed computing infrastructures, including:

    · LINUX operating system (RedHat), and scripting languages such as Bash, Python.

    · Core services such as LDAP, Kerberos, DNS, DHCP, HAProxy, keepalived, MySQL or PostgreSQL.

    · High-performance and high-availability commercial server platforms, such as blade systems and high end servers with GPUs.

    · High end storage systems, including NetApp NAS.

    · Container technology (Docker, Podman) and orchestrators (Kubernetes).

    · Diagnostic and monitoring solutions for the aforementioned aspects, including Prometheus, ELK, Icinga, Grafana, and the Ceph File System.

    · Configuration management tools such as Puppet with Foreman, or similar, and a version control system such as Git.

    · Virtualisation platforms such as Openstack and oVirt.

· Proven experience and interest in software failure analysis, diagnostics, and validation in a Linux environment.

· Experience in managing the computing infrastructure of a large data acquisition system is definitely a plus.

Technical competencies:

· Knowledge of operating systems: Linux-based support and maintenance, including UNIX shell and Python scripting.

· Knowledge of system configuration tools: ideally Puppet and Foreman.

· Architecture and design of ICT systems: control & data acquisition systems, distributed applications and services.

· Knowledge of communication technologies and protocols: in particular Ethernet network architectures, configuration, maintenance and diagnostics.

· Knowledge of storage technologies: hardware and software RAID, SAN, and NAS.

Behavioural competencies:

· Working in Teams: working well in groups and readily fitting into a team; participating fully and taking an active role in team activities.

· Achieving Results: having a structured and organised approach towards work; being able to set priorities and plan tasks with results in mind; delivering prompt and efficient service taking into account customer needs.

· Learning and sharing knowledge: keeping up-to-date with developments in own field of expertise and readily absorbing new information.

· Demonstrating flexibility: adapting quickly and resourcefully to shifting priorities and requirements; actively participating in the implementation of new processes and technologies.

· Communicating effectively: demonstrating a pro-active approach to resolving differences; addressing issues of conflict constructively; expressing opinions, ideas and suggestions with conviction and in a logical/structured manner; keeping to the point.

Language skills:

Spoken and written English or French: ability to understand and speak the other language in professional contexts. Ability to draw up technical specifications and/or scientific reports and to make oral presentations in at least one of the two languages.

Additional Information

Eligibility and closing date:

Diversity has been an integral part of CERN's mission since its foundation and is an established value of the Organization. Employing a diverse workforce is central to our success. We welcome applications from all Member States and Associate Member States.

This vacancy will be filled as soon as possible, and applications should normally reach us no later than 21.01.2024.

Employment Conditions

Contract type: Limited duration contract (3 years). Subject to certain conditions, holders of limited-duration contracts may apply for an indefinite position.

These functions require:

· Participation in a regular stand-by duty, including nights, Sundays and official holidays.

· Interventions in underground installations.

· A valid driving licence.

Job grade: 6-7

Job reference: EP-CMD-2023-175-LD

Benchmark Job Title: Computing Engineer
