To use the Lisa system efficiently, it is important to know something about the file systems that are available, and how to use them.
Incorrect use of a file system will slow down your jobs and even the system as a whole.
The home file system contains the files you normally use. This file system is available on the login nodes and on all batch nodes. This means that you have access to the home file system in your jobs.
The home file system is shared over the network by all these nodes. As a consequence, it is not very efficient, especially at handling metadata: creating and deleting files, opening and closing files, performing many small updates to files, and so on.
If your jobs treat files in this way, it is better to copy the files needed by the job to the scratch file system (see below) and work only on files there. Copy output files back to the home file system at the end of the job.
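The copy-in / compute / copy-out pattern can be sketched as a small runnable script; here mktemp stands in for the home and scratch directories and tr stands in for the real program, so the sketch can be tried anywhere (all names are examples):

```shell
# Runnable sketch of the copy-in / compute / copy-out pattern.
# mktemp stands in for the home and scratch file systems, and tr
# stands in for the real program (all names are examples).
HOMEDIR=$(mktemp -d)                      # stand-in for the home file system
SCRATCH=$(mktemp -d)                      # stand-in for "$TMPDIR"

mkdir "$HOMEDIR"/problem
echo "input data" > "$HOMEDIR"/problem/data.txt

cp -r "$HOMEDIR"/problem "$SCRATCH"       # stage the input onto scratch
cd "$SCRATCH"/problem
tr a-z A-Z < data.txt > out 2> err        # all I/O now hits the fast local disk
cp out err "$HOMEDIR"/problem             # copy the results back at the end
```

In a real job, $HOME and "$TMPDIR" take the place of the two mktemp directories.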
Every compute node in the Lisa system contains a local disk. These disks are much faster than the home file system, but they are only accessible from within the node itself.
The scratch file system is located on such a local disk.
Its performance is much better than that of the home file system, which makes it very suitable for I/O-intensive work.
Note: do NOT use /tmp for temporary files; /tmp has a limited size and is reserved for system processes.
You access the scratch file system through the environment variable $TMPDIR, which points to an existing directory on the local disk. For example, to create a directory 'work' on the scratch file system and copy a file from the home file system into it:

mkdir "$TMPDIR"/work
cp my-file "$TMPDIR"/work
"$TMPDIR" (with quotes), not
$TMPDIR (without quotes). The reason is that
$TMPDIR can contain meta-characters. (Notably [ and ]). The quotes take care that the shell will leave those characters as-is.
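A quick runnable demonstration of why the quotes matter; the value assigned below is an example containing [ and ]:

```shell
# Demonstration: a value containing [ and ] is a glob pattern, and without
# quotes the shell expands it against existing paths (example value).
base=$(mktemp -d)
mkdir "$base/node0" "$base/node1" "$base/node7"
TMPDIR="$base/node[017]"

echo "$TMPDIR"     # quoted: the literal value, brackets left as-is
echo $TMPDIR       # unquoted: expands to node0, node1 and node7
```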
For jobs that use only one node, the cp command is excellent for copying files from the home file system to the scratch file system.
#PBS -lnodes=1 -lwalltime=5:00:00
cp -r $HOME/datadir/problem1 "$TMPDIR"
cp -r $HOME/datadir/problem2 "$TMPDIR"
(
  cd "$TMPDIR"/problem1
  myprogram >out 2>err
  cp out err results $HOME/datadir/problem1
) &
(
  cd "$TMPDIR"/problem2
  myprogram >out 2>err
  cp out err results $HOME/datadir/problem2
) &
wait
In the above example, two processes run in parallel; in general you can extend this to more processes, depending on the number of cores available.
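Extending the pattern to one subshell per problem directory can be sketched as a loop; the sketch below simulates the scratch and result directories with mktemp and uses rev as a stand-in program, so it runs anywhere (all names are examples):

```shell
# Runnable sketch: one subshell per problem directory, run in parallel.
# mktemp simulates the scratch and result directories, and rev stands
# in for the real program (all names are examples).
SCRATCH=$(mktemp -d)
RESULTS=$(mktemp -d)
for i in 1 2 3; do
  mkdir "$SCRATCH/problem$i" "$RESULTS/problem$i"
  echo "job $i" > "$SCRATCH/problem$i/data"
done

for p in "$SCRATCH"/problem*; do
  (
    cd "$p"
    rev < data > out 2> err                  # stand-in for the real program
    cp out err "$RESULTS/$(basename "$p")"   # copy results back per subjob
  ) &                                        # run the subjobs concurrently
done
wait                                         # wait until all subjobs finished
```

Keep the number of concurrent subjobs at or below the number of cores in the node.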
We assume that the 'problem' directories contain a few large files. If you have to work with a large number of small files, it is best to combine them first, for example:
tar zcvf data.tar.gz datafile.*
In a job the copying and untarring can be done in one step:
cd "$TMPDIR"
tar zxf $HOME/datadir/data.tar.gz
Similarly, when your program produces many files, it is best to tar them before copying them to the home file system:
cd "$TMPDIR"
do - some - work
tar zcf $HOME/datadir/data.tar.gz data.*
For jobs that use more than one node (typically MPI programs), we developed the tool mpicopy, which efficiently copies files to the scratch file systems of all nodes participating in the job.
NOTE: The scratch file system is cleaned at the end of a job to make room for the next job. Your job script must therefore copy any important output file written to the scratch file system back to the home file system.
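One way to make this copy-back robust is a shell trap on EXIT, so the results are copied to the home file system even when the program itself fails. A runnable sketch, with mktemp simulating the scratch and home directories and "false" simulating a failing program (all names are examples):

```shell
# Runnable sketch: a trap on EXIT copies the results back even when the
# program fails. mktemp simulates the scratch and home directories and
# "false" simulates a failing program (all names are examples).
SCRATCH=$(mktemp -d)
HOMEDIR=$(mktemp -d)
(
  cd "$SCRATCH"
  trap 'cp -r "$SCRATCH"/results "$HOMEDIR"' EXIT   # always copy back on exit
  mkdir results
  echo done > results/out
  false                                             # the "program" fails here
)
cat "$HOMEDIR"/results/out                          # the results survived
```

In a real job script the trap would name "$TMPDIR" and a directory under $HOME instead of the two mktemp stand-ins.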
The following is Lisa-specific:
The archive file system is only accessible from the login nodes and is located in /archive. If an archived file is to be used in a job, first copy it from the archive file system to the home file system. /archive contains subdirectories named after the logins. Example: user fred copies a file to the archive file system:
cp myfile /archive/fred
The archive file system is optimized for large files (larger than 10 MB), so it is best to tar and compress a directory before transferring it to the archive file system. Example: user fred tars and compresses the directory 'work' and copies the result to the archive file system. He also checks the validity of the compressed tar file 'work.tgz' before deleting the 'work' directory and the compressed tar file in his home directory:
tar zcvf work.tgz work
cp work.tgz /archive/fred
tar ztf /archive/fred/work.tgz && rm work.tgz && rm -r work
With ACLs (access control lists) you define, on a per-user or per-group basis, who is allowed to access your files.