M  O  S  I  X
Cluster and Multi-Cluster Management



MOSIX Frequently Asked Questions - Flat listing

Copyright © 1999 - 2010 A. Barak. All rights reserved.



General

Question:

What is MOSIX

Answer:

MOSIX is an operating-system-like management system targeted at High Performance Computing (HPC) on Linux clusters and multi-clusters.

Its main feature is to provide users and applications with the illusion of running on a single computer with multiple processors, without changing the interface and the run-time environment of their respective login nodes.

More information can be found in the About web page and "The MOSIX2 Management System for Linux Clusters and Multi-clusters" white paper.


Question:

Why this name

Answer:

MOSIX stands for a Multicomputer Operating System for UnIX.

MOSIX® is a registered trademark of Amnon Barak and Amnon Shiloh.


Question:

Who is it suitable for

Answer:

MOSIX is suited to running compute-intensive applications with moderate amounts of I/O over fast, secure networks, in a trusted environment (where all remote nodes are trusted), e.g., in private clusters and intra-organizational multi-clusters.

Question:

What are the main benefits of MOSIX

Answer:

Users can login on any node and do not need to know where their programs run.

In a MOSIX cluster/multi-cluster there is no need to modify or to link applications with any library, copy files or login to remote nodes, or even assign processes to different nodes, including nodes in different clusters - it is all done automatically.

The outcome is ease of use, better utilization of resources and near maximal performance.


Question:

How this is accomplished

Answer:

By a software layer that allows applications to run on remote computers as if they were running locally.

Users can run their regular sequential and parallel applications as if they use one computer (node), while MOSIX automatically (and transparently) seeks resources and migrates processes among nodes to improve the overall performance.

This is accomplished by on-line algorithms that monitor the state of the system-wide resources and the running processes, then, whenever appropriate, initiate process migration to:

  1. Balance the load;
  2. Move processes from slower to faster nodes;
  3. Move processes from nodes that run out of free memory;
  4. Preserve long-running guest processes when clusters are about to be disconnected from the multi-cluster.
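
The first three goals in the list above can be illustrated with a small sketch. This is not the MOSIX algorithm, only a hypothetical cost model: the structure fields, the cost formula (load divided by relative speed) and the function name are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of one node; the fields are illustrative only. */
struct node_state {
    double load;        /* number of running processes per core */
    double speed;       /* relative CPU speed, 1.0 = reference node */
    long   free_mem_mb; /* free memory in megabytes */
};

/*
 * Return the index of the node on which a process that needs need_mb of
 * memory should run. The process currently runs on node `current` and is
 * already counted in that node's load. It stays where it is unless another
 * node with enough free memory has a strictly lower estimated cost, or the
 * current node has run out of free memory.
 */
size_t pick_target(const struct node_state *nodes, size_t count,
                   size_t current, long need_mb)
{
    assert(nodes != NULL && current < count);
    assert(nodes[current].speed > 0.0);

    size_t best = current;
    int have_best = nodes[current].free_mem_mb > 0;
    double best_cost = nodes[current].load / nodes[current].speed;

    for (size_t i = 0; i < count; i++) {
        if (i == current || nodes[i].free_mem_mb < need_mb)
            continue;
        assert(nodes[i].speed > 0.0);
        /* Cost on a candidate node includes the newly arriving process. */
        double cost = (nodes[i].load + 1.0) / nodes[i].speed;
        if (!have_best || cost < best_cost) {
            best = i;
            best_cost = cost;
            have_best = 1;
        }
    }
    return best;
}
```

With this model a process leaves a loaded node for an idle one, prefers the faster of two idle nodes, and leaves a node that has no free memory even if the only alternative is busier.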


Question:

Which hardware platforms are supported

Answer:

The latest production distribution of MOSIX runs on all x86-compatible computers (both 32-bit and 64-bit architectures).

Question:

Which software platforms are supported

Answer:

The latest production distribution of MOSIX runs on Linux-2.6.

Distributions are provided as RPMs for openSUSE, for use in native Linux and as a pre-installed virtual-disk image that can be used to create a MOSIX virtual cluster on Windows and/or Linux computers.


Question:

Is MOSIX a cluster or a multi-cluster technology

Answer:

Both.

MOSIX version 2 (MOSIX2) for Linux-2.6 can manage x86-based clusters and multi-clusters.

MOSIX version 1 for Linux-2.4 can manage a single cluster.


Question:

Why all remote nodes must be trusted

Answer:

To ensure that migrated (guest) processes are not tampered with while running in remote clusters of a multi-cluster.

Note that guest processes run in a sandbox, which prevents such processes from accessing local resources in the hosting nodes.


Question:

History of MOSIX

Answer:

The History of MOSIX wiki page provides information about all the versions of MOSIX.

Question:

MOSIX related papers and reports

Answer:

Can be found in this wiki page.



MOSIX2 - conceptual

Question:

What are the main features of MOSIX

Answer:

The main features are listed in the MOSIX for Linux-2.6 web page.

Question:

What aspects of a single-system image are supported

Answer:

MOSIX provides users and applications with the run-time environment of the user's "login-node".

This means that:

  1. Users can login on any node and do not need to know where their programs run.
  2. No need to modify or link applications with special libraries.
  3. No need to copy files to remote nodes.
  4. Automatic resource discovery: whenever clusters or nodes join or disconnect, all the active nodes are updated.
  5. Automatic workload distribution by process migration, including load balancing, process migration from slower to faster nodes and from nodes that run out of free memory.


Question:

How MOSIX supports Virtual Organizations (VOs)

Answer:

A VO is a set of clusters (including servers and workstations) whose owners wish to share their computing resources from time to time in a flexible way.

MOSIX provides the following features to manage VOs:

  1. Support of disruptive configurations: clusters can join or leave the multi-cluster at any time.
  2. Clusters could be shared symmetrically or asymmetrically. For example, the owner of cluster A can allow processes originating from cluster B to move in but block processes originating from cluster C.
  3. A run-time priority for flexible use of nodes within and among groups. For example, to partition a cluster among different users.
  4. Each cluster owner can assign priorities to processes from other clusters. For example, the owner of cluster A can assign higher priority to processes from cluster B and lower priority to processes from cluster C. This way, when guest processes from cluster B wish to move to cluster A, they will push out guest processes from cluster C (if any).
  5. Local and higher priority processes force out lower priority processes.
  6. Migrated processes to/from a disconnecting cluster are moved out/back, so that long-running migrated processes are not killed.
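
The priority rules in items 4 and 5 can be sketched as follows. The structure, the function name and the convention that a smaller number means a higher priority (with 0 standing for local processes) are assumptions made for this illustration, not the MOSIX implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of a guest process running on the local cluster. */
struct guest {
    char origin;   /* label of the cluster the process came from */
    int  priority; /* assumed convention: smaller number = higher priority */
};

/*
 * Choose the guest to push out when a process with priority `incoming`
 * wants to move in. Returns the index of the running guest with the lowest
 * priority, provided that it is strictly lower than the incoming priority,
 * or -1 if no guest may be pushed out.
 */
int pick_victim(const struct guest *guests, int count, int incoming)
{
    assert(count == 0 || guests != NULL);

    int victim = -1;
    for (int i = 0; i < count; i++) {
        if (guests[i].priority <= incoming)
            continue; /* same or higher priority: may not be pushed out */
        if (victim < 0 || guests[i].priority > guests[victim].priority)
            victim = i;
    }
    return victim;
}
```

In the example of item 4, a process arriving from cluster B (higher priority) would displace a guest from cluster C (lower priority), while a process arriving from cluster C would displace nobody.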


Question:

What is the architecture of a MOSIX configuration (cluster, multi-cluster)

Answer:

The architecture of a MOSIX configuration is homogeneous: all nodes must be x86-based and run (nearly) the same version of MOSIX (see the question about mixing different versions of MOSIX).

However, individual nodes may have different numbers of processors (cores), different speeds, different memory sizes or different I/O devices.


Question:

Which types of processes are supported

Answer:

MOSIX recognizes two types of processes: Linux and MOSIX processes.

Linux processes are not affected by MOSIX - they run as they do on any Linux system and can not be migrated.

MOSIX processes run in an environment that allows them to migrate from one node to another.

Linux processes usually include administrative and other tasks that are not suitable for migration, whereas MOSIX processes are selected user-applications that are suitable and can benefit from migration.

Apart from process-migration that is available only to MOSIX processes, MOSIX includes batch mechanisms that can queue and assign new jobs to start on the best available nodes: these batch mechanisms are available for both Linux and MOSIX jobs.

MOSIX processes are invoked by the "mosrun" command. If you want to make use of the MOSIX batch mechanisms for Linux (non-migratable) processes, use the "mosrun -E" option.

This can be summarized in the following table:

Process type        Migratable (MOSIX)    Non-Migratable (Linux)
Batch               mosrun -M [-b]        mosrun -E [-b]
Fully-interactive   mosrun [-b]           (do not use "mosrun")

where "-b" selects the best location to run the job.


Question:

Does MOSIX support checkpoint/restart

Answer:

Yes, most CPU-intensive MOSIX processes can be checkpointed.

When a checkpoint is performed, the image of the process is saved to a file. The process can later recover itself from that file and continue to run from that point.

For successful checkpoint and recovery, a process must not depend heavily on its Linux environment. For example, for security reasons processes with setuid/setgid privileges or processes with open pipes or sockets can't be checkpointed.

Checkpoints can be triggered by a program, by a manual request and/or automatically - at regular time intervals, see the next question.


Question:

How to trigger a checkpoint

Answer:

Checkpoints can be triggered in 3 ways:

  1. By providing the "-C<file-name>" and "-A<integer-number>" flags to mosrun. This performs a periodic checkpoint every "integer-number" minutes, and the checkpoints are saved to the files "file-name.1", "file-name.2", etc. Read the mosrun manual for details.
  2. By using the "migrate <pid> checkpoint" command to perform a checkpoint at a specific time, externally to the program. Read the migrate command manual for details.
  3. From within the program, by using the MOSIX checkpoint interface.

The MOSIX checkpoint interface is documented in "man MOSIX". It contains the following files in the proc file system:

  /proc/self/checkpoint
  /proc/self/checkpointfile
  /proc/self/checkpointlimit
  /proc/self/checkpointinterval

These files are private to each process. They allow the process to modify its checkpoint parameters and to trigger a checkpoint operation.
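
Since checkpoints are saved to numbered files ("file-name.1", "file-name.2", etc.) and a process is restarted with "mosrun -R", a small helper can build the restart command for a given checkpoint. This is only a sketch: the function name is hypothetical, and it assumes the numbering simply follows the "file-name.N" pattern described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the command that restarts a process from its seq-th checkpoint
 * file ("base.seq") into buf. Returns 1 on success and 0 if buf is too
 * small to hold the whole command.
 */
int restart_command(char *buf, size_t size, const char *base, unsigned seq)
{
    assert(buf != NULL && base != NULL);
    assert(strlen(base) > 0 && seq >= 1);

    int n = snprintf(buf, size, "mosrun -R %s.%u", base, seq);
    return n >= 0 && (size_t)n < size;
}
```

For a checkpoint file named "ccc" and the first checkpoint, the helper produces the command "mosrun -R ccc.1".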


Question:

Example how to perform a checkpoint from within a program

Answer:

The following program performs 100 units of work and triggers a checkpoint right after the unit given by its second argument. Its first argument names the checkpoint file in which the image of the program is saved.

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

// Setting the checkpoint file from within the process.
// This can also be done via the -C argument to mosrun.
// Returns 1 on success, 0 on failure.
int setCheckpointFile(char *file) {
     int fd;

     // The file name is passed as the third argument of open()
     fd = open("/proc/self/checkpointfile", 1|O_CREAT, file);
     if (fd == -1) {
        return 0;
     }
     return 1;
}

// Triggering a checkpoint from within the process.
// Returns 1 on success, 0 on failure.
int triggerCheckpoint() {
     int fd;

     fd = open("/proc/self/checkpoint", 1|O_CREAT, 1);
     if (fd == -1) {
        fprintf(stderr, "Error doing self checkpoint \n");
        return 0;
     }
     printf("Checkpoint was done successfuly\n");
     return 1;
}

int main(int argc, char **argv) {
     int j, unit, t;
     char *checkpointFileName;
     int checkpointUnit = 0;

     if (argc < 3) {
        fprintf(stderr, "Usage %s <checkpoint-file> <unit>\n", argv[0]);
        exit(1);
     }

     checkpointFileName = strdup(argv[1]);
     checkpointUnit = atoi(argv[2]);
     // Units are numbered 0..99, so a checkpoint unit of 100 would never be reached
     if (checkpointUnit < 1 || checkpointUnit > 99) {
        fprintf(stderr, "Checkpoint unit should be between 1 and 99\n");
        exit(1);
     }

     printf("Checkpoint file: %s\n", checkpointFileName);
     printf("Checkpoint unit: %d\n", checkpointUnit);

     // Setting the checkpoint file from within the process (can also be done
     // using the -C argument of mosrun)
     if (!setCheckpointFile(checkpointFileName)) {
        fprintf(stderr, "Error setting the checkpoint filename from within the process\n");
        fprintf(stderr, "Make sure you are running this program via mosrun\n");
        return 1;
     }

     // Main loop ... running for 100 units. Change this loop if you wish
     // the program to do more loops
     for (unit = 0; unit < 100; unit++) {
        // Consuming some cpu time (simulating the run of the application)
        // Change the number below to cause each loop to consume more (or less) time
        for (t = 0, j = 0; j < 1000000 * 500; j++) {
           t = j + unit * 2;
        }
        printf("Unit %d done\n", unit);

        // Triggering a checkpoint request from within the process
        if (unit == checkpointUnit) {
           if (!triggerCheckpoint())
              return 1;
        }
     }
     return 0;
}

To compile: gcc -o checkpoint_demo checkpoint_demo.c
To run: mosrun ./checkpoint_demo <checkpoint-file> <unit>

A typical run:
> mosrun ./checkpoint_demo ccc 5
Checkpoint file: ccc
Checkpoint unit: 5
Unit 0 done
Unit 1 done
Unit 2 done
Unit 3 done
Unit 4 done
Unit 5 done
Checkpoint was done successfuly
Unit 6 done
Unit 7 done
Unit 8 done
^C

The program triggered a checkpoint after unit 5. The checkpointed file was saved in ccc.1.
After unit 8 the program was killed.

To restart:
> mosrun -R ccc.1
Checkpoint was done successfuly
Unit 6 done
Unit 7 done
Unit 8 done
Unit 9 done
Unit 10 done
...

The program was restarted from the point right after it was checkpointed.


Question:

What are the options of "live-queuing" in MOSIX

Answer:

MOSIX supports "live-queuing" that allows queued jobs to preserve their full connection with their Linux environment. This includes controlling terminal, parent-process, signals, pipes, sockets, shared file-descriptors, etc.

The queuing system includes tools for tracing queued jobs, setting and changing their priorities or the order of execution, and for running parallel jobs.


Question:

How the queuing system of MOSIX works

Answer:

In a MOSIX multi-cluster, each cluster has its own queue and this queue is shared by all the users of that cluster.

The number of jobs that can be placed in the queue is limited by the number of Linux processes (about 30000 for all users). To queue a larger number of jobs, there is an option to run multiple command-lines from a file, each with its own arguments. This option is commonly used to run the same program with many different sets of arguments. Another option sets an upper limit on the number of jobs that are allowed to run simultaneously. This option combines well with the queuing system, which runs jobs based on the availability of cluster/multi-cluster resources.

There is an argument to inform the queuing system that the job may split into a number of parallel processes, so that more resources are reserved for it. Another argument allows bundling for easy identification of several instances of a job by a single job-ID. Jobs can also be handled as a group and be killed collectively.
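
The idea of running a list of command-lines with an upper limit on the number of simultaneous jobs can be illustrated with plain POSIX calls. The sketch below does not use MOSIX and is not how the queuing system is implemented: the function name is hypothetical, and each command-line is simply passed to /bin/sh.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run each command-line through /bin/sh, keeping at most max_running
 * children alive at any time. Returns the number of commands that exited
 * with status 0, or -1 if fork() or wait() failed. The caller is assumed
 * to have no other child processes, since wait() collects any child.
 */
int run_limited(const char *const cmds[], size_t count, size_t max_running)
{
    assert(cmds != NULL && max_running > 0);

    size_t started = 0, running = 0;
    int succeeded = 0;

    while (started < count || running > 0) {
        /* Start new jobs while there is a free slot. */
        while (started < count && running < max_running) {
            pid_t pid = fork();
            if (pid < 0)
                return -1;
            if (pid == 0) {
                execl("/bin/sh", "sh", "-c", cmds[started], (char *)NULL);
                _exit(127); /* reached only if exec failed */
            }
            started++;
            running++;
        }
        /* Wait for one job to finish, which frees a slot. */
        int status;
        if (wait(&status) < 0)
            return -1;
        running--;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            succeeded++;
    }
    return succeeded;
}
```

Calling run_limited() with three command-lines and a limit of 2 starts the third job only after one of the first two has finished.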


Question:

How MOSIX manages batch jobs

Answer:

Batch jobs can be sent to any node in the local cluster (as opposed to non-batch jobs that require the specific environment of their dispatching node).

There are two types of batch jobs: Linux and MOSIX. Linux batch processes do not migrate, while MOSIX batch processes can migrate, but their home-node can be different than their dispatching node. MOSIX can assist both types by:

  1. Queuing the job until resources are available (using "mosrun -q", "mosrun -S" or both);
  2. Selecting the best initial assignment for the job.

Batch jobs are started from binaries in another node and preserve only some of the caller's environment: they receive the environment variables; they can read from their standard-input and write to their standard output and error, but not from/to other open files; they receive signals, but if they fork, signals are delivered to the whole process-group rather than just the parent; they can not communicate with other processes on the calling node using pipes and sockets (other than standard input/output/error), semaphores, messages, etc. and can only receive signals, but not send them to processes on the calling node.

The main advantage of batch jobs is that they save time by not needing to refer to the dispatching-node to perform system-calls, and that temporary files can be created on the node where they start, preventing the dispatching node from becoming a bottleneck. This approach is therefore recommended for programs that perform a significant amount of I/O.


Question:

How MOSIX handles temporary files

Answer:

To reduce the I/O overhead, MOSIX has an option to migrate (private) temporary files with the process.

Question:

Can MOSIX run in a Virtual Machine (VM)

Answer:

Yes.

MOSIX can run in a virtual machine on any platform that supports virtualization (including Windows).

The MOSIX web site provides a free evaluation copy of MOSIX on a pre-installed virtual-disk image that can be used to create a MOSIX virtual cluster on Linux and/or Windows computers.


Question:

Is it possible to install and run more than one VM with MOSIX on the same node

Answer:

Yes, this is especially useful on multi-core computers.

Note that the total number of processors used by the VMs should not exceed the number of physical processors.


Question:

Can MOSIX run on an unmodified Linux kernel

Answer:

Yes, within a Virtual Machine.

Question:

Why migrate processes when one can move a whole VM with a process inside

Answer:

Mainly because it is expensive, both in terms of time and the required memory, to create a VM for each process.

Specifically:

  1. Migrating a whole VM requires the transfer of much more memory. Even in the case of "live-migration" (that works for certain types of processes, not all), this can overload the network more.
  2. Once in a VM, a process that splits (using "fork") can not get independent resources for each split process: the original process with all its children will have to remain together on the same VM.
  3. Processes within a VM can not maintain most of their connections (pipes, signals, parents/children, IPC, etc.) with other processes, either on the generating host or in other VMs.
  4. Allocating a full virtual-disk image for each process can consume a large amount of disk space.
  5. Current VM technology doesn't support migration between different clusters that are on different switches.




MOSIX Reach the Clouds (MRC)

Question:

What is MRC

Answer:

MRC is a tool that allows applications to run on remote computers (including other clusters and commercial Clouds), without the need to pre-copy files to these clusters.

MRC can run on both Linux nodes and MOSIX clusters.

MRC applications run in a hybrid environment, where some of their files are on their launching (local) node and the rest are on target (remote) nodes.


Question:

What are the main features of MRC

Answer:

  1. The ability to run local applications on remote computers, such as Clouds.
  2. Applications running on Clouds can use both the files of the target computer and any desired subset of directories from the launching computer.

    With a proper choice of directories, this makes it possible to achieve:

    1. Remote file access;
    2. File sharing among different computers and users;
    3. Running applications that need both common and private data;
    4. Sharing files from three or more computers, by running MRC recursively.
  3. Standard-Input/Output/Error remain on the launching computer.
  4. "mosrun" and all its features can be used on remote clusters that run MOSIX.


Question:

How MRC works

Answer:

MRC consists of two parts, a launching program that can send jobs from the user's "head-node", e.g. workstation, to a designated target node, and a run-time environment that provides file services to running jobs on target computers.

If the target node is part of a MOSIX cluster, then MRC jobs can benefit from all the MOSIX features. If a target node runs Linux (but not MOSIX), then MRC jobs can only run there as native Linux jobs.


Question:

How MRC jobs are launched

Answer:

By the "mrc" command, run "man mrc" for details.

Question:

If I run several MRC jobs simultaneously, is their view of files consistent?

Answer:

Each MRC job has the option whether or not to maintain a cache on the target node. When a cache is maintained, changes made by one job may take time until seen by other jobs, but when caching is not used, there is full file-consistency. It is also possible to cache some directories and not others.

Question:

Which distributions of MOSIX support MRC

Answer:

A copy of MRC is included in MOSIX-2.26.0.0 for Linux-2.6.30 onwards.



MOSIX2 - technical

Question:

The latest release and change-log

Answer:

Are available here.

Question:

Technical support

Answer:

Technical support, including configuration and installation assistance as well as upgrades to new releases, is available for a fee here.

Question:

How to install

Answer:

An installation script and instructions are included in all the MOSIX distributions.

Question:

After installing MOSIX in one node, how do I install it on the other nodes

Answer:

The best way is to use a cluster installation package (such as OSCAR).

If you use a common NFS root directory for your cluster, you can install MOSIX in that directory.

Otherwise, on a small cluster, you can install MOSIX node by node.


Question:

Why did the installer fail to patch my kernel

Answer:

You probably specified a wrong version of the Linux kernel sources.

You must use only the official Linux kernel sources for your specific MOSIX distribution.

Note: do not use the kernel sources supplied with commercial Linux distributions - they were modified and could cause the MOSIX patch to fail.


Question:

Why did I get a kernel panic when trying to boot the MOSIX kernel

Answer:

This is not because of MOSIX, but simply because you have prepared your own Linux kernel, which is probably misconfigured (if you need to be convinced, try a plain, non-MOSIX, kernel from http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.x).

When you use a standard Linux package (such as RedHat, SuSe, or Debian), your kernel (and/or kernel modules) would already be configured by that package, but when you compile your own kernel - as you do when installing MOSIX, you need to make sure that the kernel configuration suits your hardware and contains all the necessary device-drivers and file-systems that you are using.

One tool that often helps in constructing the correct kernel configuration is to use the output of "gzip -cd < /proc/config.gz", produced on the originally-supplied kernel, as a basis for the new configuration (but note that not every Linux distribution has "/proc/config.gz"). This output may not be totally accurate because it comes from a different (usually older) Linux kernel-version, but is a good place to start: place it in the file ".config" of the kernel-source directory, then adjust it by running "make menuconfig".

Another tip that may help to configure the kernel correctly, is that unless you are a very experienced Linux system-administrator, you should probably avoid the "initrd" hassles and configure all the drivers and file-systems that you need in order to get the system to start within the kernel itself rather than as kernel modules.


Question:

After I installed MOSIX, "mosrun" produces "Not Super User" and exits

Answer:

The file "/bin/mosrun" (and a few others) must have setuid-root permissions. If for any reason it does not, then run:
> chown root /bin/mosrun /bin/mosq /bin/mosps
> chmod 4755 /bin/mosrun /bin/mosq /bin/mosps


Question:

May I mix different versions of MOSIX in the same cluster or multi-cluster

Answer:

The MOSIX version has 4 digits. It is OK to mix versions when only the last digit is different, but not otherwise.
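
The mixing rule can be expressed as a short check. This is a sketch with hypothetical function names: it assumes version strings of the form "a.b.c.d" (for example "2.26.0.0") and treats any other form as incompatible.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "a.b.c.d" into four numbers; returns 1 on success, 0 otherwise. */
static int parse_version(const char *s, unsigned v[4])
{
    char extra; /* receives any unexpected trailing character */
    return sscanf(s, "%u.%u.%u.%u%c",
                  &v[0], &v[1], &v[2], &v[3], &extra) == 4;
}

/*
 * Returns 1 if the two versions may be mixed in the same cluster or
 * multi-cluster, i.e. if they differ at most in their last component.
 */
int versions_can_mix(const char *a, const char *b)
{
    unsigned va[4], vb[4];

    assert(a != NULL && b != NULL);
    if (!parse_version(a, va) || !parse_version(b, vb))
        return 0;
    return va[0] == vb[0] && va[1] == vb[1] && va[2] == vb[2];
}
```

For example, "2.26.0.0" and "2.26.0.3" may be mixed, while "2.26.0.0" and "2.24.0.0" may not.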

Question:

How can I see the state of my cluster or multi-cluster

Answer:

Type "mon" (the MOSIX monitor). It can display the number of active nodes (type t), loads (l), size of total/used memory (m), dead nodes (d) and relative CPU speeds (s).

Question:

Is it necessary to restart MOSIX in order to change the configuration

Answer:

No.

Once you modify configuration files, the changes will take effect within a minute. After editing the list of nodes in your cluster ("/etc/mosix/mosix.map") you need to run "setpe", but if you are using "mosconf" to modify the local configuration, then there is no need to run "setpe".


Question:

How do I know that the process migration works

Answer:

Run "mon" in one screen. Then run several copies of a test (CPU bound) program, e.g.,

mosrun -e awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}'

First you should see an increase of the load in one node. After a few seconds, if the process migration works you will see how the load is spread among the nodes.

If your nodes are not of the same speed then more processes will run in the faster nodes.


Question:

What is the maximal number of multi-cores supported

Answer:

MOSIX supports whatever hardware is supported by the Linux kernel that it runs under, including multi-cores (dual, quad, 8-way, etc.) and SMPs.

Question:

Is Hyper-threading supported

Answer:

Yes.

Question:

What are the port numbers used by MOSIX

Answer:

TCP ports 249 - 253.

UDP ports 249 - 250.
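
When writing firewall rules or a monitoring script, the port ranges above can be captured in a small predicate. The enum and function names below are hypothetical; only the port numbers come from this answer.

```c
#include <assert.h>

enum transport { PROTO_TCP, PROTO_UDP };

/* Returns 1 if the given port is one of the ports used by MOSIX. */
int is_mosix_port(enum transport proto, int port)
{
    assert(proto == PROTO_TCP || proto == PROTO_UDP);

    if (proto == PROTO_TCP)
        return port >= 249 && port <= 253; /* TCP ports 249 - 253 */
    return port >= 249 && port <= 250;     /* UDP ports 249 - 250 */
}
```

Note that ports 251 - 253 are used over TCP only, so the predicate rejects them for UDP.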


Question:

What happens when a node crashes

Answer:

All processes that were running on or originated from that node are killed. To minimize the damage for long-running processes, it is recommended to use the MOSIX checkpoint facility.

Question:

Does the traffic among MOSIX nodes pass safely through the IPSec tunnels

Answer:

Yes. MOSIX works on top of TCP and UDP, obviously above IP.

Question:

Is it possible to run MOSIX over a WAN or the Internet

Answer:

Yes. However, opening a cluster over the Internet without a VPN is a security hazard.

Question:

How to run MOSIX processes in idle workstations

Answer:

MOSIX can take advantage of idle workstations (when no one is logged in), with the option that upon a login, all MOSIX processes are moved out and the MOSIX activities are stopped.
  1. In the login script add the commands:
    > mosctl block
    > mosctl expel &

    The "mosctl block" command prevents new remote processes from migrating to that workstation.
    The "mosctl expel &" command moves out MOSIX guest processes. Note that an & is used after the expel command, since expelling processes may take some time and we don't want the user login process to hang. The processes are expelled while the user logs in.

  2. On logout, run the command:
    > mosctl noblock

    This command allows remote processes to migrate to the workstation.

    On a Debian system using GDM, the appropriate file in which to add this command is /etc/gdm/PostSession/Default.

Note that when adding the mosctl commands to the GDM script you should not interfere with the correct operation of GDM.




32-bit and 64-bit applications

Question:

How do I inform MOSIX whether I use 32-bit or 64-bit systems

Answer:

No need to do so - the MOSIX installation script will automatically detect the type of system that you have and install the appropriate binaries.

Question:

Can I mix 32-bit and 64-bit nodes in the same cluster

Answer:

Yes you can, but performance can be better if you set the 32-bit and the 64-bit nodes as separate clusters in a multi-cluster configuration.

Question:

Can I run 32-bit programs on 64-bit nodes

Answer:

Yes, 32-bit programs can migrate to 64-bit nodes (and even start there), but the home-node of 32-bit programs must be a 32-bit node. Thus, if you want to run 32-bit programs on predominantly 64-bit cluster(s), you may consider leaving aside a few 32-bit nodes as part of your cluster and/or multi-cluster, from where you can start 32-bit programs.

Question:

Can I run 64-bit programs on 32-bit nodes

Answer:

No, the hardware does not support it (and even when it does, the 32-bit Linux kernel doesn't).

Question:

Can I have MOSIX running under a 64-bit kernel, but a 32-bit Linux installation, utilities and libraries (because it is so much easier to upgrade only the kernel)

Answer:

No, while Linux allows this combination, the current version of MOSIX (in both its 32-bit and 64-bit variants) does not yet support this option, so MOSIX will fail to start.

Question:

What happens if I attempt to run a 32-bit executable from a 64-bit node

Answer:

It will run correctly for the sake of transparency, but as a "native" Linux process, so the program will not be able to migrate or use special MOSIX features (not even its child processes and even if they later execute a 64-bit binary).
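
The rules given in the answers of this section can be summarized in a short sketch. The enum and function names are hypothetical, and the code only encodes the cases stated above (it assumes that a 64-bit program started on a 64-bit node is migratable, which the answers imply but do not state).

```c
#include <assert.h>

enum run_mode {
    RUN_IMPOSSIBLE,  /* the node can not execute the program at all */
    RUN_NATIVE_ONLY, /* runs as a "native" Linux process, no migration */
    RUN_MIGRATABLE   /* runs as a MOSIX process and may migrate */
};

/* How a 32-bit or 64-bit program runs when started on a given home-node. */
enum run_mode start_mode(int program_bits, int home_bits)
{
    assert(program_bits == 32 || program_bits == 64);
    assert(home_bits == 32 || home_bits == 64);

    if (program_bits == 64 && home_bits == 32)
        return RUN_IMPOSSIBLE;  /* 64-bit programs need a 64-bit node */
    if (program_bits == 32 && home_bits == 64)
        return RUN_NATIVE_ONLY; /* home-node of 32-bit programs must be 32-bit */
    return RUN_MIGRATABLE;
}

/* Returns 1 if the program may migrate to a node of target_bits. */
int may_migrate_to(int program_bits, int home_bits, int target_bits)
{
    assert(target_bits == 32 || target_bits == 64);

    if (start_mode(program_bits, home_bits) != RUN_MIGRATABLE)
        return 0;
    return target_bits >= program_bits;
}
```

For example, a 32-bit program whose home-node is a 32-bit node may migrate to a 64-bit node, while the same program started from a 64-bit node runs only as a native Linux process.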



Running applications

Question:

If a child process is spawned from a parent, must they migrate together

Answer:

No. Each process is managed independently.

Question:

Why shared-memory is not supported

Answer:

Because it is not scalable, i.e., it is impossible to change the contents of a memory in one node and expect that the same change will be reflected instantly in the memory of the remaining nodes (with which memory is shared), e.g., as in a multi-core.

Question:

How to run a threaded application

Answer:

Threaded applications are created by the "clone" system-call with the "CLONE_VM" flag, which makes the threads share memory, and thus are not suitable for distributed-memory architectures.

In MOSIX it is possible to run threaded applications as standard Linux processes. Such applications can not be migrated, but can still benefit from MOSIX features such as queuing and best initial-assignment.

To launch threaded applications use "mosrun -E".


Question:

How to run a script where one of the commands is a threaded application

Answer:

By using the "native" utility in your script:

> native {threaded_program} [program-args]...


Question:

Must all migratable executables be started under "mosrun"

Answer:

To be migratable, either the executables, or the shell (or other program) that called them must be run under "mosrun". Once a shell runs under "mosrun", all its descendants will also be under "mosrun" (but there is a way to request explicitly that a particular child will NOT run under "mosrun").

Question:

Are there any limitations on I/O that can be performed by migrated processes

Answer:

Usually, remote I/O done by migrated processes on remote nodes is performed via the respective home-node of each process. While this does not limit the allowed operations, it may slow down such processes. Thus, if the amount of I/O is significant, it will often cause the process to migrate back to its home-node.

Note that the amount and frequency of I/O is taken into account and weighted against other considerations in making such a decision.

The direct-communication (migratable socket) can reduce this slow-down effect for I/O between communicating processes.


Question:

Which IPC mechanism should be used between processes to get the best performance

Answer:

The most efficient mechanism is direct-communication, see the next questions.

Otherwise, MOSIX is not different from Linux: depending on the particular needs of the process, whatever approach (other than shared-memory) is best in Linux is also best on MOSIX. It could be pipes, SYSV-messages, UNIX-sockets, TCP-sockets or files.

Files can obviously be slow, since they usually require writing on a physically-moving surface and/or networking. On the other hand, Linux has very good caching mechanisms for local files.


Question:

Can MOSIX support migratable socket

Answer:

Yes, direct-communication provides an effective migratable socket between migrated processes.

Question:

How direct-communication can improve the performance of communicating processes

Answer:

Normally, MOSIX processes do all their I/O and (most) system-calls via their respective home-nodes. This can be slow because operations are limited by the network speed and latency.

Direct communication allows migrated processes to exchange messages directly with each other, bypassing their home-nodes.


Question:

How to run MATLAB Version 7.4 (or older) jobs in MOSIX

Answer:

Jobs running MATLAB Version 7.4 (or older) can automatically migrate among nodes of a cluster/multi-cluster.

First, tune MATLAB to MOSIX by the following 3 steps:

  1. Find where MATLAB is installed on your system by
    > which matlab
    /usr/local/bin/matlab
  2. Backup the matlab program to another location
    > cp /usr/local/bin/matlab /tmp/mos-matlab
  3. Comment-out the following 2 lines in the mos-matlab script:
    LD_ASSUME_KERNEL=2.4.1
    export LD_ASSUME_KERNEL

    the result should be :

    #LD_ASSUME_KERNEL=2.4.1
    #export LD_ASSUME_KERNEL

You can now run MATLAB jobs in a cluster/multi-cluster using mosrun.

Example: to run the following MATLAB test.m program:

a=randn(3000);
b=svd(a);

use:

> mosrun -e mos-matlab -nojvm -nodesktop -nodisplay < test.m


Question:

How to run MATLAB Version 7.5 (or newer) jobs

Answer:

MATLAB Version 7.5 (or newer) applications use a library which uses threads (the "clone" system-call with the "CLONE_VM" flag) incorrectly. To overcome this problem we added to mosrun the -i flag, which should be used with the -E flag. This means that MATLAB jobs can be queued and assigned by MOSIX to nodes as regular Linux processes, but they can't migrate afterwards.

The MOSIX version should be at least MOSIX-2.24.0.0 and jobs should be started by:

> mosrun -E -b -i matlab ....

MOSIX will assign each job to the best node in the local cluster.

Example: to run the following MATLAB test.m program:

a=randn(3000);
b=svd(a);

use:

> mosrun -E -b -i matlab -nojvm -nodesktop -nodisplay < test.m


Question:

How to run JAVA programs

Answer:

JAVA supports (shared-memory) threads (the "clone" system-call with the "CLONE_VM" flag), which are not suitable for distributed-memory architectures (clusters). This means that JAVA jobs can be queued and assigned by MOSIX to nodes only as regular Linux processes (with the "mosrun -E" flag).

A JAVA job should be started by:

> mosrun -E -b java job

MOSIX will assign each job to the best node in the local cluster.


Question:

Can MOSIX migrate MPI processes

Answer:

Yes.

MPI allocates processes to slave nodes of a cluster in a Round-Robin fashion, without checking the state of the resources, e.g. speed, current load and available memory.

Process migration can improve the performance by load-balancing, by migration of processes from slower to faster nodes and to nodes with sufficient free memory, as well as by migration of MPI processes to nodes in remote clusters, which are not part of the user's cluster.
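
The cost of Round-Robin allocation on nodes of unequal speed can be illustrated with a simple model. This sketch is an assumption-laden illustration, not part of MPI or MOSIX: the function names are hypothetical, all processes are assumed to need the same amount of work, and no process is ever moved.

```c
#include <assert.h>
#include <stddef.h>

/* Round-Robin placement ignores node state: rank r goes to node r % n. */
size_t round_robin_node(size_t rank, size_t node_count)
{
    assert(node_count > 0);
    return rank % node_count;
}

/*
 * Time until the last node finishes when `procs` equal processes, each
 * needing `work` time units on a node of speed 1.0, are placed Round-Robin
 * and never migrate. speeds[i] is the relative speed of node i.
 */
double round_robin_makespan(const double *speeds, size_t node_count,
                            size_t procs, double work)
{
    assert(speeds != NULL && node_count > 0);

    double worst = 0.0;
    for (size_t i = 0; i < node_count; i++) {
        assert(speeds[i] > 0.0);
        /* Number of ranks r < procs with r % node_count == i. */
        size_t mine = procs / node_count + (i < procs % node_count ? 1 : 0);
        double t = (double)mine * work / speeds[i];
        if (t > worst)
            worst = t;
    }
    return worst;
}
```

In this model, with two nodes where the second is twice as fast as the first, the fast node becomes idle halfway through the run while the slow node is still working; this is the kind of imbalance that process migration can reduce.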




HUGI

Question:

What is HUGI

Answer:

HUGI is a campus multi-cluster with 16 MOSIX clusters.

Most clusters are private. They are made of production servers that belong to research groups in various departments. Four clusters are made of workstations in student labs.

Processes of users are allowed to migrate to idle workstations and among nodes in private clusters, subject to the priorities among the different groups. For example, since the workstations belong to the CS department, processes that are started in a CS private cluster have a higher priority to move to a workstation than already running processes from the Chemistry cluster.

Due to the increased computing demands by our researchers, the amount of installed memory in the workstations was increased (beyond the needs of the students), to allow large guest processes from the private clusters to run in these workstations.


Question:

How HUGI is managed

Answer:

All the nodes in all the HUGI clusters are diskless: they do not rely on local disks for booting and running. Local disks may be used for temporary storage.

User-files and home directories are located on central NFS servers.


Question:

What are the rules and policies for running applications on HUGI

Answer:

Rules: Policies:

Question:

Who is responsible for allocating freeze space

Answer:

Cluster owners should designate sufficient freeze space for processes originating from their cluster.