We think the name 'Lisa' is appropriate for the system, because:
If one wants, 'Lisa' can stand for:
The first one honors the fact that large, essential portions of the software that make systems like Lisa possible come from the open source community: GNU. GNU stands for "GNU's Not Unix". The second honors the fact that the operating system on Lisa is Linux. Without the availability of free, open source operating systems like Linux, a cluster like Lisa would be nearly impossible.
This question is best answered here: Information for new users.
We would appreciate it if you included a text like this in publications about projects in which Lisa played a role:
We thank SURFsara (www.surfsara.nl) for the support in using the Lisa Compute Cluster.
See this page.
No, when you run a normal program on the cluster, it will not automatically run in parallel. Two things are required:
Requesting more than one node is only meaningful for a parallel job (see above). Serial programs do not benefit. On the contrary, they will use only one core on one node, thus wasting the rest of the cores. Here is an example of how to use all cores in a job.
A node on the Lisa cluster contains eight or more cores. It would have been possible to allocate single cores to jobs, so that eight serial jobs could run on the same node. We chose not to do so and instead allocate whole nodes to a job, for the following reasons:
Every user on the Lisa system has her own file system, normally 200 Gbyte in size. To get an impression of how much is used, issue the following command:
You will get an output like this:
    willem@login4: ~/:=-> quota -h
    Filesystem                  Size  Used Avail Use% Mounted on
    fs7:/lisapool/home/willem    75G   69G    6G  92% /home/willem
or, depending on the type of disk on which your home directory resides:
    willem@login4: ~/:=-> quota
    Disk quotas for user willem (uid 31009):
    Filesystem                  space   quota   limit  grace  files  quota  limit  grace
    fs8:/lisapool/home/willem  69011M  76800M  76800M         21973      0      0
This means that user 'willem' has a total disk space of 75 Gbyte, of which 69 Gbyte is used and 6 Gbyte is free; the usage percentage is 92.
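The arithmetic behind these figures can be checked with a small shell sketch. It parses the sample line from the first output shown above; the warning threshold of 90 percent is an assumption chosen for illustration, not a limit set by the system.

```shell
# Sample line copied from the 'quota -h' output shown above.
line='fs7:/lisapool/home/willem 75G 69G 6G 92% /home/willem'

# Extract the Size and Used columns and strip the trailing 'G'.
size_gb=$(echo "$line" | awk '{ sub(/G$/, "", $2); print $2 }')
used_gb=$(echo "$line" | awk '{ sub(/G$/, "", $3); print $3 }')

# Integer arithmetic: used space as a percentage of the total.
percent=$(( used_gb * 100 / size_gb ))
free_gb=$(( size_gb - used_gb ))

echo "used: ${percent}%, free: ${free_gb} Gbyte"

# Hypothetical threshold for a warning; choose your own value.
threshold=90
if [ "$percent" -ge "$threshold" ]; then
    echo "warning: home directory is nearly full"
fi
```

With the sample line, 69 out of 75 Gbyte gives 92 percent used and 6 Gbyte free, matching the explanation above.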
If you have really used up all of your disk quota, there is not much you can do; even an attempt to remove some files:
does not work, because 'rm' temporarily needs some file space.
What to do? Removing a file is not possible, but it is possible to change the size of a file to zero bytes. Here follows the very short command to change the size of 'myfile' to zero:
When you have resized a few large files this way, you can proceed to clean up your home directory using the 'rm' command again.
It is perfectly possible to use two or more cores in a job by starting two or more processes. The operating system will make sure that the processes are spread over the available cores. An example is here.
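The truncation step described above can be sketched as follows. This is a minimal illustration, assuming a POSIX shell; the redirection idiom used here is a common way to empty a file and is not necessarily the exact command the authors had in mind. The demo works in a temporary directory so that no real data is touched.

```shell
# Work in a scratch directory so the demo does not touch real data.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create a stand-in for a large file (1 MiB of zeros).
dd if=/dev/zero of=myfile bs=1024 count=1024 2>/dev/null
size_before=$(wc -c < myfile | tr -d ' ')

# Truncate the file to zero bytes without removing it.
: > myfile
size_after=$(wc -c < myfile | tr -d ' ')

echo "before: $size_before bytes, after: $size_after bytes"
```

The file still exists afterwards, but it no longer occupies quota.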
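As a rough sketch of this approach, the fragment below starts one background process per core and waits for all of them. The function `work` is a hypothetical stand-in for your own serial program, and the output directory is a temporary one created for the demo.

```shell
# Number of cores on this node.
ncores=$(getconf _NPROCESSORS_ONLN)
out_dir=$(mktemp -d)

# Hypothetical stand-in for a serial program; replace with your own executable.
work() {
    echo "process $1 finished" > "$out_dir/proc_$1.out"
}

# Start one process per core in the background.
i=1
while [ "$i" -le "$ncores" ]; do
    work "$i" &
    i=$((i + 1))
done

# Wait until all background processes have ended; without this the job
# script would finish while they are still running.
wait

nfiles=$(ls "$out_dir" | wc -l | tr -d ' ')
echo "$nfiles of $ncores processes wrote output"
```

The `wait` at the end matters: a job ends when its script ends, so processes still running in the background at that moment would be lost.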
Lisa's job scheduler uses a first-in first-out strategy, complemented with backfilling and a fair share algorithm. This ensures that:
To get your job running as soon as possible, specify a wall clock time that is as short as possible. The shorter the runtime of a job, the greater the chance that it is eligible for backfilling.
Serial and parallel jobs are defined as:
Note: it is possible to run a parallel program in a serial job: the processes will then all run on one and the same node.
The system consists of nodes with InfiniBand and nodes without InfiniBand. InfiniBand is a high-speed, low-latency network, intended primarily for parallel programs.
The system will schedule parallel jobs on InfiniBand nodes, and serial jobs on non-InfiniBand nodes. The decision is based on the number of nodes a job requests:
You can arrange that output for a specific file, or all open files, is flushed to disk in your program:
In C:

    #include <stdio.h>
    FILE *stream;
    ...
    fflush(stream);  // to flush file 'stream'
    fflush(NULL);    // to flush all open files

In Fortran:

    integer unit
    ...
    call flush(unit)  ! to flush file nr unit
    call flush(0)     ! to flush all open files
You can also arrange that output is flushed directly after each write, by switching off buffering for a stream:
    #include <stdio.h>
    ...
    setvbuf(stdout, NULL, _IONBF, 0);
If you specify a wall clock time that is too long, the system rejects your job. Often it is possible to checkpoint your programs just before the end of the allotted wall clock time, and restart them in another job. See our description of the DMTCP package.
To facilitate quick testing of short jobs, we dedicate a few 8-core nodes to jobs that request no more than 5 minutes of wall clock time. You can submit as many of these jobs as you want, but per user only one job will run at a time. Example test job (the most important part is the #PBS line):
    #PBS -lnodes=1:cores8:ppn=8 -lwalltime=5:00
    module load openmpi/gnu
    cd $HOME/workdir
    mpiexec ./my-mpi-program
If you submitted many jobs and they are running one by one, it could be that you specified a walltime less than or equal to 5 minutes. These very short jobs are submitted to a special queue and have a high chance of starting very quickly, but per user only one runs at a time. So, if you have many short jobs, specify a walltime larger than 5 minutes, for example:
    #PBS -lnodes=1:cores8:ppn=8 -lwalltime=6:00
    module load openmpi/gnu
    cd $HOME/workdir
    mpiexec ./my-mpi-program
As explained in the description of the file systems, we urge you to read and write files in $TMPDIR (or /scratch). Problems can arise if your job hits the walltime limit: how do you save the output files? The description of the module sara-batch-resources presents a solution.
A few times per year, you will see in the 'message of the day' (the message you get when you log in to Lisa) that maintenance is planned. During this period the system will be upgraded or adapted.
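One generic way to approach the walltime problem is a background timer that copies the scratch output to permanent storage shortly before the limit is reached. The sketch below illustrates that idea only; it is not the mechanism of the sara-batch-resources module. The walltime and margin values, the temporary directories and the stand-in program are all assumptions chosen so that the demo runs in a few seconds outside a batch job.

```shell
# Assumed values for the demo, in seconds. A real job would use its
# requested walltime (for example 3600) and a margin of a few minutes.
walltime=3
margin=2

scratch=$(mktemp -d)   # in a real job: "$TMPDIR"
dest=$(mktemp -d)      # in a real job: a directory under "$HOME"

# Background timer: shortly before the walltime expires, copy whatever
# is in scratch to permanent storage.
(
    sleep $((walltime - margin))
    cp -r "$scratch"/. "$dest"/
) &
saver_pid=$!

# Stand-in for the real program: it writes output and is still running
# when the timer fires.
echo "partial result" > "$scratch/output.txt"
sleep 2

# Check whether the timer has already saved the output.
if [ -f "$dest/output.txt" ]; then
    saved_by_timer=yes
else
    saved_by_timer=no
fi
echo "saved by timer: $saved_by_timer"

# The program ended in time: stop the timer if it is still pending and
# copy the final output.
kill "$saver_pid" 2>/dev/null || true
cp -r "$scratch"/. "$dest"/
```

The margin has to be large enough for the copy to complete; for large output files a few minutes is a safer choice than the short value used in this demo.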
Consequences for you:
No, you can't receive messages from outside the Lisa system. The batch nodes can send mail to your login, but in order to read it you have to forward the mail sent to your Lisa login.