Data Handling System

For many years, ECMWF has operated a large-scale data handling system (DHS) in which all ECMWF users can store and retrieve the data needed for weather modelling, research in weather modelling and mining of weather data. ECMWF's meteorological archive contains hundreds of petabytes of operational and research data.

The user view

From the user's viewpoint, the DHS supports two main applications, both developed by ECMWF to hide the complexities of the underlying storage management:

  • MARS, the Meteorological Archival and Retrieval System, provides access to a powerful abstraction engine that allows staff and applications to access the meteorological data that has been collected or generated at ECMWF for more than 30 years. MARS stores GRIB and BUFR data, hiding from its users all details of the physical location and internal organisation of this data. It manages its own set of disk caches for staging data that has been recently acquired, generated or accessed. However, the bulk of its data is stored in HPSS (see below).

  • ECFS provides users with a logical view of a seemingly very large file system, and is used for data that is not suitable for storing in MARS. UNIX-like commands enable users to copy whole files to and from any of ECMWF's computing platforms. ECFS uses the storage hierarchy of disks and tapes within HPSS to store the files and their associated metadata (file ownership, directory structure, etc.).

Underlying storage management

Supporting ECFS and most of MARS is an underlying file archiving component, IBM's High Performance Storage System (HPSS), in which data is kept and managed. It keeps track of files that are stored, provides Hierarchical Storage Management (HSM) facilities when needed, and it manages activities related to disks, tapes, tape drives and automated tape libraries.
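
The hierarchical storage idea described above can be illustrated with a minimal sketch. This is not HPSS code and does not use any HPSS interface: the class ToyHSM, its methods and the least-recently-used purge policy are all assumptions made for illustration. It models a bounded disk cache in front of a tape tier that always holds a copy, so disk copies can be purged safely and recalled on demand.

```python
from collections import OrderedDict


class ToyHSM:
    """Toy two-tier store (hypothetical): bounded disk cache in front of tape."""

    def __init__(self, disk_capacity):
        self.disk_capacity = disk_capacity  # maximum number of bytes kept on disk
        self.disk = OrderedDict()           # name -> bytes, least recently used first
        self.tape = {}                      # name -> bytes, the permanent copy
        self.tape_recalls = 0               # reads that had to go back to tape

    def _disk_used(self):
        return sum(len(data) for data in self.disk.values())

    def _cache(self, name, data):
        """Place a copy on disk, purging least recently used files if needed."""
        if len(data) > self.disk_capacity:
            return  # too large to cache; it lives on tape only
        self.disk.pop(name, None)
        # Purging is safe because every file already has a tape copy.
        while self.disk and self._disk_used() + len(data) > self.disk_capacity:
            self.disk.popitem(last=False)
        self.disk[name] = data

    def put(self, name, data):
        # Assumption: migrate to tape immediately; a real system would
        # typically do this asynchronously according to policy.
        self.tape[name] = data
        self._cache(name, data)

    def get(self, name):
        if name in self.disk:
            self.disk.move_to_end(name)  # mark as recently used
            return self.disk[name]
        data = self.tape[name]           # recall (stage) from tape
        self.tape_recalls += 1
        self._cache(name, data)
        return data


hsm = ToyHSM(disk_capacity=10)
hsm.put("a", b"12345")
hsm.put("b", b"67890")
hsm.put("c", b"abc")       # disk is full, so "a" is purged from disk
recalled = hsm.get("a")    # served from tape and staged back to disk
```

The caller only ever uses put and get; whether a file is currently on disk or only on tape is invisible to it, which is the property that MARS and ECFS rely on.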

HPSS is based on version 5 of the IEEE Mass Storage Reference Model. It supports a variety of tape drives and automated tape libraries. Data is transferred from multiple storage devices via multiple data streams over multiple network paths; in this way high aggregate transfer rates are achieved. 'Data movers' (specialised software modules), which can execute on different server machines, send streams of data directly between those servers and the client machines requesting the data transfer. This distributed multi-processing nature of HPSS is one of the keys to its scalability.
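
As a rough illustration of why multiple data streams raise aggregate throughput, the sketch below splits one payload into contiguous stripes and lets several worker threads "transfer" them concurrently, reassembling the result by offset. The function names (split_into_stripes, mover_send, parallel_transfer) are hypothetical and the transfer is simulated in memory; this is not how HPSS movers are implemented, only the general striping pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_stripes(data, n_streams):
    """Split data into at most n_streams contiguous (offset, chunk) pieces."""
    stripe = max(1, -(-len(data) // n_streams))  # ceiling division
    return [(off, data[off:off + stripe]) for off in range(0, len(data), stripe)]


def mover_send(piece):
    """Hypothetical mover: 'transfers' one stripe and reports its offset."""
    offset, chunk = piece
    # A real mover would stream the chunk over its own network path here.
    return offset, bytes(chunk)


def parallel_transfer(data, n_streams=4):
    """Send all stripes concurrently and reassemble them on the client side."""
    pieces = split_into_stripes(data, n_streams)
    received = bytearray(len(data))
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        for offset, chunk in pool.map(mover_send, pieces):
            received[offset:offset + len(chunk)] = chunk
    return bytes(received)


payload = b"x" * 1000 + b"y" * 37
result = parallel_transfer(payload, n_streams=4)
```

Because each stripe carries its own offset, the order in which streams finish does not matter, so adding movers or network paths does not require coordination between them beyond the initial split.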

In turn, HPSS uses IBM's DB2, a high performance database management system with advanced transaction-processing techniques, to guarantee security, protection and integrity of data.

In addition, a secondary copy of the most important data is kept on tape cartridges in the Disaster Recovery System (DRS).

Hardware configuration

As of summer 2020, the DHS hardware includes the following:

  • Many x64 servers are used to execute the HPSS, MARS and ECFS applications. These run Red Hat Enterprise Linux or CentOS.

  • The bulk of the data is stored on tape cartridges. We are in the process of migrating from Oracle T10000C and T10000D tape drives, hosted in a set of four Oracle (Sun) SL8500 tape libraries, to an environment based on IBM TS1160 tape drives. The IBM tape drives are hosted in two IBM TS4500 tape libraries, and we plan to add more libraries in our new Data Centre.

  • Many IBM, DDN and Western Digital subsystems provide disk storage that is used to cache data being stored into, or retrieved from, the tape libraries, as well as the metadata needed by HPSS, MARS and ECFS.

The DHS servers are connected to each other, to the DHS clients (including the HPC), and to the Centre's general-purpose servers and desktops through the Centre's main 10-gigabit network. Some 1-gigabit networks provide additional internal control paths.

As with the data, some of this equipment is housed in the DRS.

Some figures (August 2020)

  • On an average day the system handles requests for more than 13,000 tape mounts.

  • On a typical day the archive grows by about 287 TB, and 215 TB is retrieved.

  • MARS data represents about 75% of the volume of data stored in the DHS, but only about 4% of the number of files. ECFS data represents almost all of the remaining 25% of the data, corresponding to 96% of the files.

  • The DHS provides access to over 360 PB of primary data (up from 125 PB in 2015). An additional 130 PB of backup copies of part of the primary data (up from 20 PB in 2015) are stored in the DRS. There are about 350 million files in ECFS (previously 204 million) and over 13 million in MARS (previously about 15 million).
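
The figures above imply very different average file sizes for MARS and ECFS. The short calculation below makes that explicit. It is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes), treats "over 360 PB" and "over 13 million" as exact values, and attributes the whole remaining 25% of the volume to ECFS. The variable names are introduced here for illustration.

```python
PB = 10**15  # assumption: decimal petabytes
TB = 10**12  # assumption: decimal terabytes

primary_bytes = 360 * PB   # "over 360 PB" taken as exactly 360 PB
mars_share = 0.75          # MARS share of the stored volume
mars_files = 13e6          # "over 13 million" taken as exactly 13 million
ecfs_files = 350e6         # about 350 million files in ECFS
daily_growth_tb = 287      # archive growth on a typical day

# Average file size per application.
mars_avg_gb = primary_bytes * mars_share / mars_files / 1e9
ecfs_avg_mb = primary_bytes * (1 - mars_share) / ecfs_files / 1e6

# Share of the total file count held by MARS.
mars_file_fraction = mars_files / (mars_files + ecfs_files)

# Growth over a year if every day were a typical day.
yearly_growth_pb = daily_growth_tb * TB * 365 / PB

print(f"MARS average file size: {mars_avg_gb:.1f} GB")
print(f"ECFS average file size: {ecfs_avg_mb:.0f} MB")
print(f"MARS share of files:    {mars_file_fraction:.1%}")
print(f"Growth per year:        {yearly_growth_pb:.0f} PB")
```

Under these assumptions an average MARS file is roughly 21 GB while an average ECFS file is roughly 257 MB, and MARS holds about 3.6% of the files, consistent with the "about 4%" quoted above. At 287 TB per day the archive would grow by about 105 PB in a year.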