File systems on Lisa

Introduction

To use the Lisa system efficiently, it is important to know something about the file systems that are available, and how to use them.

Incorrect use of a file system will slow down your jobs and even the system as a whole.

The home file system

The home file system contains the files you normally use. This file system is available on the login nodes and on all batch nodes. This means that you have access to the home file system in your jobs.

As a consequence of being shared by all nodes, the home file system is not very efficient, especially when handling metadata: creating and deleting files, opening and closing files, making many small updates to files, and so on.

If your jobs use files in this way, it is better to copy the files the job needs to the scratch file system (see below) and to work only on files there. Copy output files back to the home file system at the end of the job.

Backup & restore

  • We do nightly incremental backups.
  • We can restore files and/or directories that you removed accidentally, up to 15 days back, provided they already existed at the time of the last successful backup.

The scratch file system

Every compute node in the Lisa system contains a local disk. These disks are much more efficient than the home file system, but each one is accessible only from within its own node.

The scratch file system is located on such a local disk.

Its performance is much better than that of the home file system, which makes it very suitable for I/O-intensive operations.

Note: do NOT use /tmp for temporary files. /tmp has a limited size and is to be used only by system processes.

You access the scratch file system through the environment variable $TMPDIR, which points to an existing directory on the local disk. For example, to create a directory 'work' on the scratch file system and copy a file from the home file system into it:

  mkdir "$TMPDIR"/work
  cp my-file "$TMPDIR"/work

Note: use "$TMPDIR" (with quotes), not $TMPDIR (without quotes). The reason is that $TMPDIR can contain meta-characters, notably [ and ]. The quotes ensure that the shell leaves those characters as-is.
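
The effect of the quotes can be seen in a small experiment. This is only a sketch: the directory names job[1] and job1 are made up for the demonstration and are not the names Lisa actually uses for $TMPDIR, and it assumes a POSIX-style shell such as sh or bash.

```shell
# Hypothetical stand-in for a scratch directory whose name contains [ and ].
base=$(mktemp -d)
mkdir "$base/job[1]" "$base/job1"
cd "$base" || exit 1
scratch='job[1]'

# Unquoted: the shell treats [1] as a glob pattern, which matches "job1".
unquoted=$(echo $scratch)
# Quoted: the name is passed through literally.
quoted=$(echo "$scratch")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Clean up the demonstration directories.
cd / && rm -rf "$base"
```

Without quotes the command silently operates on a different directory, which is why every use of $TMPDIR in a job script should be quoted.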

Copying files to the scratch file system

For jobs that use only one node, the cp command is a good way to copy files from the home file system to the scratch file system.

Examples

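#PBS -lnodes=1 -lwalltime=5:00:00
# Copy the input directories to the scratch file system.
cp -r "$HOME"/datadir/problem1 "$TMPDIR"
cp -r "$HOME"/datadir/problem2 "$TMPDIR"
# Run each problem in its own background subshell.
(
  cd "$TMPDIR"/problem1
  myprogram >out 2>err
  cp out err results "$HOME"/datadir/problem1
) &
(
  cd "$TMPDIR"/problem2
  # No '&' here: the results must be complete before they are copied back.
  myprogram >out 2>err
  cp out err results "$HOME"/datadir/problem2
) &
# Wait until both subshells have finished.
wait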

In the above example, two processes run in parallel. In general, you can extend this to more processes, depending on the number of cores available.
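
For more than a handful of problems, the same pattern can be written as a loop. The following is a minimal sketch, not a tool provided on Lisa: the function name run_all and its arguments are hypothetical, and it copies back only out and err, so add any result files your program writes.

```shell
# run_all SRC CMD DIR...
# Hypothetical helper: copy each DIR from SRC to the scratch file system,
# run CMD inside the copy in the background, then copy out and err back.
run_all() {
  src=$1
  cmd=$2
  shift 2
  for p in "$@"; do
    cp -r "$src/$p" "$TMPDIR" || continue
    (
      cd "$TMPDIR/$p" || exit 1
      "$cmd" >out 2>err
      cp out err "$src/$p"
    ) &
  done
  # Block until every background subshell has finished.
  wait
}
```

In a job script this could be called as run_all "$HOME"/datadir myprogram problem1 problem2 problem3 problem4. Keep the number of directories at or below the number of cores in the node.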

We assume that the 'problem' directories contain a few large files. If you have to work with a large number of small files, it is best to combine them first, for example:

tar zcvf data.tar.gz datafile.*

In a job the copying and untarring can be done in one step:

cd "$TMPDIR"
tar zxf $HOME/datadir/data.tar.gz

Also, when your program produces many files, it is best to tar them before copying to the home file system:

cd "$TMPDIR"
# ... do some work here ...
tar zcf $HOME/datadir/data.tar.gz data.*
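
The whole round trip (pack, unpack on scratch, work, pack the results) can be tried outside the batch system. This is a sketch under stated assumptions: the two temporary directories stand in for $HOME/datadir and "$TMPDIR", and wc is only a placeholder for the real program.

```shell
# Illustrative stand-ins: in a real job the batch system sets $TMPDIR
# and the data directory lives under $HOME.
datadir=$(mktemp -d)
scratch=$(mktemp -d)

# Before the job: create a few small input files and pack them once.
cd "$datadir" || exit 1
for i in 1 2 3; do
  echo "input $i" >"datafile.$i"
done
tar zcf data.tar.gz datafile.*

# Inside the job: unpack straight from the home file system into scratch.
cd "$scratch" || exit 1
tar zxf "$datadir/data.tar.gz"

# Placeholder for the real work: one small result file per input file.
for f in datafile.*; do
  wc -c <"$f" >"result.${f#datafile.}"
done

# Pack all results into a single archive on the home file system.
tar zcf "$datadir/results.tar.gz" result.*

# Count the archive members as a quick sanity check.
nresults=$(tar ztf "$datadir/results.tar.gz" | wc -l | tr -d ' ')
echo "packed $nresults result files"
```

Only two files cross the home file system in this scheme, however many small files the job reads or writes.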

For jobs that use more than one node (typically MPI programs), we developed a tool, mpicopy, that efficiently copies files to the scratch file systems of all nodes participating in the job.

Note: the scratch file system is cleaned at the end of a job to make room for the next job. Your job script should therefore copy any important output file created on the scratch file system to the home file system.

The archive file system

Find here a general introduction about the archive file system.

The following is Lisa-specific:

The archive file system is accessible only from the login nodes and is located in /archive. If an archived file is to be used in a job, first copy it from the archive file system to the home file system. /archive contains one subdirectory per user, named after the user's login. Example: user fred copies a file to the archive file system:

cp myfile /archive/fred

The archive file system is optimized for large files (larger than 10 Mbyte). So, it is best to tar and compress a directory before transferring it to the archive file system. Example: user fred tars and compresses the directory 'work' and copies that to the archive file system. Fred also checks the validity of the compressed tar file 'work.tgz' before deleting the 'work' directory and the compressed tar file in his home directory:

tar zcvf work.tgz work
cp work.tgz /archive/fred
tar ztf /archive/fred/work.tgz && rm work.tgz && rm -r work

Access Control Lists (ACLs)

With ACLs you define, on a per-user or per-group basis, who is allowed to access your files.