In the current era of petascale supercomputers, when scientific instruments generate petabytes of data every year, handling fast-growing data sets has become a challenge for the scientific community. Scientists frequently use complex measuring instruments that produce huge amounts of data. Examples include the experiments at the Large Hadron Collider at CERN and LOFAR, a radio astronomy project that uses large arrays of antennas. Life-sciences research, such as DNA sequencing, also generates enormous data files.
The great challenge is to store these petabytes of data efficiently while keeping them easily accessible to researchers. To that end we have set up an infrastructure that combines very fast data connections with abundant storage capacity on tape and disk. More and more data storage and analysis involves complex and/or unstructured data, and for these types of data SURFsara also offers suitable infrastructure. One example is the Hadoop cluster, which can be used for fast analysis of extremely large data sets. The spectrum of applications is very broad and typically includes indexing, storage and search functions.
SURFsara’s Data Services group provides services that help the scientific user community cope with the scientific data explosion and with the long-term preservation of scientific data sets. We work closely with all groups within SURFsara, participate in a wide range of national, European and global scientific projects (e.g. PRACE, WLCG, LOFAR), and support scientific communities and individual scientists in meeting their growing data challenges.
To handle all this data, the Data Services group operates two facilities for long-term data preservation: