MOSIX
Cluster and Multi-Cluster Grid Management



MOSIX Frequently Asked Questions - Flat listing

Copyright © 1999 - 2008 Amnon Barak. All rights reserved.



General

Question:

What is MOSIX

Answer:

MOSIX is a management system targeted for High Performance Computing (HPC) on clusters and organizational grids with multiple clusters.

MOSIX incorporates automatic resource discovery and dynamic workload distribution, commonly found on single computers with multiple processors.

More information can be found in the About web page and "The MOSIX2 Management System for Linux Clusters and Organizational Grids" white paper.


Question:

Why this name

Answer:

MOSIX stands for a Multicomputer Operating System for UnIX.

MOSIX® is a registered trademark of Amnon Barak and Amnon Shiloh.


Question:

Who is it suitable for

Answer:

MOSIX is suitable for running compute-intensive applications and applications with moderate amounts of I/O over fast, secure networks, in a trusted environment (where all remote nodes are trusted), e.g., in private clusters and organizational grids.

Question:

What are the main benefits of MOSIX

Answer:

Users can log in on any node and do not need to know where their programs run.

In a MOSIX cluster/grid there is no need to modify applications or link them with any library, to copy files or log in to remote nodes, or even to assign processes to different nodes, including nodes in different clusters - it is all done automatically.

The outcome is ease of use, better utilization of resources and near maximal performance.


Question:

How this is accomplished

Answer:

By a software layer that allows applications to run on remote computers as if they were running locally.

Users can run their regular sequential and parallel applications as if they use one computer (node), while MOSIX automatically (and transparently) seeks resources and migrates processes among nodes to improve the overall performance.

This is accomplished by on-line algorithms that monitor the state of the system-wide resources and the running processes, then, whenever appropriate, initiate process migration to:

  1. Balance the load;
  2. Move processes from slower to faster nodes;
  3. Move processes from nodes that run out of free memory;
  4. Preserve long-running guest processes when clusters are about to be disconnected from the grid.


Question:

Which hardware platforms are supported

Answer:

The latest production distribution of MOSIX runs on all x86-compatible computers (both 32-bit and 64-bit architectures).

Question:

Which software platforms are supported

Answer:

The latest production distribution of MOSIX runs on Linux-2.6.

MOSIX can also run in Virtual Machines over most operating systems, including Windows.


Question:

Is MOSIX a cluster or a grid technology

Answer:

Both.

MOSIX version 2 (MOSIX2) for Linux-2.6 can manage a cluster as well as a multi-cluster organizational grid with several homogeneous clusters.

MOSIX version 1 for Linux-2.4 can manage a single cluster.


Question:

Why all remote nodes must be trusted

Answer:

To ensure that migrated (guest) processes in a multi-cluster grid are not tampered with while running in remote (hosting) clusters.

Note that guest processes run in a sandbox, which prevents such processes from accessing local resources in the hosting nodes.


Question:

History of MOSIX

Answer:

The History of MOSIX web page provides information about all the versions of MOSIX.

Question:

MOSIX related papers and reports

Answer:

They can be found on the Papers web page.



MOSIX2 - conceptual

Question:

What are the main features of MOSIX

Answer:

The main features are listed in the MOSIX for Linux-2.6 and the MOSIX for Linux-2.4 web pages.

Question:

What aspects of a single-system image are supported

Answer:

The main aspects are:
  1. Users can log in on any node and do not need to know where their programs run.
  2. No need to modify or link applications with special libraries.
  3. No need to copy files to remote nodes.
  4. Automatic resource discovery: whenever clusters or nodes join (or disconnect), all the active nodes are updated.
  5. Automatic workload distribution by process migration, including load balancing, process migration from slower to faster nodes and from nodes that run out of free memory.
  6. Preservation of the user's "login-node" run-time environment.


Question:

How MOSIX supports Virtual Organizations (VOs)

Answer:

A VO is a set of clusters (servers and workstations) whose owners wish to share their computing resources from time to time in a flexible way.

MOSIX2 provides the following features to manage VOs:

  1. Support of disruptive configurations: clusters can join or leave the grid at any time.
  2. Clusters could be shared symmetrically or asymmetrically. For example, the owner of cluster A can allow processes originating from cluster B to move in but not processes originating from cluster C.
  3. A run-time priority for flexible use of nodes within and among groups. For example, to partition a cluster among different users.
  4. Each cluster owner can assign priorities to processes from other clusters. For example, the owner of cluster A can assign higher priority to processes from cluster B and lower priority to processes from cluster C. This way, when guest processes from cluster B wish to move to cluster A, they will push out guest processes from cluster C (if any).
  5. Local and higher priority processes force out lower priority processes.
  6. Migrated processes to/from a disconnecting cluster are moved out/back, so that long-running migrated processes are not killed.


Question:

What is the architecture of a MOSIX configuration (cluster, grid)

Answer:

The architecture of a MOSIX configuration is homogeneous: all nodes must be x86-based and run (nearly) the same version of MOSIX (see the question about mixing different versions of MOSIX).

However, individual nodes may have a different number of processors (cores), different speeds, different memory sizes or different I/O devices.


Question:

Which types of processes are available in MOSIX

Answer:

MOSIX2 recognizes two types of processes: Linux and MOSIX processes.

Linux processes are not affected by MOSIX2 - they run as they do on any Linux system, but cannot be migrated.

MOSIX processes are run in an environment that allows them to migrate from one node to another.

Linux processes usually include administrative and other tasks that are not suitable for migration, whereas MOSIX processes are selected user-applications that are suitable and can benefit from migration.

Apart from process-migration that is available only to MOSIX processes, MOSIX2 includes batch mechanisms that can queue and assign new jobs to begin on the best available node: these batch mechanisms are available for both Linux and MOSIX jobs.

Unlike MOSIX1, in MOSIX2 you need to invoke "mosrun" in order to use MOSIX - otherwise you run your programs on your standard Linux platform. If you want to make use of the MOSIX batch mechanisms for Linux (non-migratable) processes, use the "mosrun -E" option.

This can be summarized in the following table:

Process type        Migratable (MOSIX)    Non-Migratable (Linux)
Batch               mosrun -M [-b]        mosrun -E [-b]
Fully-interactive   mosrun [-b]           (do not use "mosrun")

where "-b" selects the best location to run the job.
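
The table can also be read as a small lookup. The sketch below is only an illustration: the function name "mosrun_prefix" and its two keyword arguments are hypothetical, and only the flags listed in the table are used. The optional "-b" is always included here.

```shell
# Hypothetical helper: print the command prefix from the table above.
#   $1 = batch | interactive
#   $2 = migratable | native   (native = non-migratable Linux process)
mosrun_prefix() {
    case "$1:$2" in
        batch:migratable)       echo "mosrun -M -b" ;;
        batch:native)           echo "mosrun -E -b" ;;
        interactive:migratable) echo "mosrun -b" ;;
        interactive:native)     echo "" ;;   # plain Linux: do not use mosrun
        *)
            echo "usage: mosrun_prefix batch|interactive migratable|native" >&2
            return 2
            ;;
    esac
}
```

For example, `$(mosrun_prefix batch native) ./my_program` would expand to `mosrun -E -b ./my_program`, where "./my_program" stands for any program of yours.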


Question:

Does MOSIX support checkpoint/restart

Answer:

Yes, most CPU-intensive MOSIX processes can be checkpointed.

When a checkpoint is performed, the image of the process is saved to a file. The process can later recover itself from that file and continue to run from that point.

For successful checkpoint and recovery, a process must not depend heavily on its Linux environment. For example, for security reasons processes with setuid/setgid privileges or processes with open pipes or sockets can't be checkpointed.

Checkpoints can be triggered by a program, by a manual request and/or automatically - at regular time intervals.


Question:

What are the options of "live-queuing" in MOSIX

Answer:

MOSIX2 supports "live-queuing" that allows queued jobs to preserve their full connection with their Linux environment. This includes controlling terminal, parent-process, signals, pipes, sockets, shared file-descriptors, etc. The queuing system includes tools for tracing queued jobs, setting and changing their priorities or the order of execution, and for running parallel jobs.

Question:

How the queuing system of MOSIX works

Answer:

In a MOSIX grid, each cluster has its own queue, which is shared by all the users of that cluster. The number of jobs that can be placed in the queue is limited by the number of Linux processes (about 30000 for all users). To queue a larger number of jobs, there is an option to run multiple command-lines from a file, each with its own arguments. This option is commonly used to run the same program with many different sets of arguments. Another option sets an upper limit on the number of jobs that are allowed to run simultaneously. This option combines well with the queuing system, which runs jobs based on the availability of grid/cluster resources.

There is an argument to inform the queuing system that the job may split into a number of parallel processes, so that more resources are reserved for it. Another argument allows bundling for easy identification of several instances of a job by a single job-ID. Jobs can also be handled as a group and be killed collectively.


Question:

How MOSIX manages batch jobs

Answer:

In MOSIX2 batch jobs can be sent to any node in the local cluster (as opposed to non-batch jobs that require the specific environment of their dispatching node).

There are two types of batch jobs: Linux and MOSIX. Linux batch processes do not migrate, while MOSIX batch processes can migrate, but their home-node can be different than their dispatching node. MOSIX can assist both types by:

  1. Queuing the job until resources are available (using "mosrun -q", "mosrun -S" or both);
  2. Selecting the best initial assignment for the job.

Batch jobs are started from binaries in another node and preserve only some of the caller's environment: they receive the environment variables; they can read from their standard-input and write to their standard output and error, but not from/to other open files; they receive signals, but if they fork, signals are delivered to the whole process-group rather than just the parent; they cannot communicate with other processes on the calling node using pipes and sockets (other than standard input/output/error), semaphores, messages, etc. and can only receive signals, but not send them to processes on the calling node.

The main advantage of batch jobs is that they save time by not needing to refer to the dispatching-node to perform system-calls, and that temporary files can be created on the node where they start, preventing the dispatching node from becoming a bottleneck. This approach is therefore recommended for programs that perform a significant amount of I/O.


Question:

How MOSIX handles temporary files

Answer:

To reduce the I/O overhead, MOSIX2 has an option to migrate (private) temporary files with the process.

Question:

Can MOSIX run in a Virtual Machine (VM)

Answer:

Yes.

MOSIX can run in a virtual machine on any platform that supports virtualization (including Windows).

The MOSIX web site provides a free evaluation copy of MOSIX2 on a pre-installed virtual-disk image that can be used to create a MOSIX virtual cluster on Linux and/or Windows computers.


Question:

Is it possible to install and run more than one VM with MOSIX on the same node

Answer:

Yes, this is especially useful on multi-core computers.

Note that the total number of processors used by the VMs should not exceed the number of physical processors.


Question:

Can MOSIX run on an unmodified Linux kernel

Answer:

Yes, within a Virtual Machine.

Question:

Why migrate processes when one can move a whole VM with a process inside

Answer:

Mainly because it is expensive, both in terms of time and the required memory, to create a VM for each process.

Specifically:

  1. Migrating a whole VM requires the transfer of much more memory. Even in the case of "live-migration" (that works for certain types of processes, not all), this can overload the network more.
  2. Once in a VM, a process that splits (using "fork") cannot get independent resources for each split process: the original process with all its children will have to remain together on the same VM.
  3. Processes within a VM cannot maintain most of their connections (pipes, signals, parents/children, IPC, etc.) with other processes, either on the generating host or in other VM's.
  4. Allocating a full virtual-disk image for each process can consume a large amount of disk space.
  5. Current VM technology doesn't support migration between different clusters that are on different switches.




MOSIX2 - technical

Question:

How to find the latest release and change-log of MOSIX

Answer:

The latest release of MOSIX and its change-log are available at this link.

Question:

Is technical support available

Answer:

Yes.

Technical support is available for a fee. It includes configuration and installation assistance as well as upgrades to new releases. For details please follow this link.


Question:

How to install MOSIX

Answer:

An installation script and instructions are included in all MOSIX distributions.

Question:

After installing MOSIX in one node, how do I install it on the other nodes

Answer:

The best way is to use a cluster installation package (such as OSCAR).

If you use a common NFS root directory for your cluster, you can install MOSIX in that directory.

Otherwise, on a small cluster, you can install MOSIX node by node.


Question:

Why did the installer fail to patch my kernel

Answer:

You probably specified a wrong version of the Linux kernel sources.

You must use only the official Linux kernel sources for your specific MOSIX distribution.

Note: do not use the kernel sources supplied with commercial Linux distributions - they were modified and could cause the MOSIX patch to fail.


Question:

Why did I get a kernel panic when trying to boot the MOSIX kernel

Answer:

This is not because of MOSIX, but simply because you have prepared your own Linux kernel, which is probably misconfigured (if you need to be convinced, try a plain, non-MOSIX, kernel from http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.x).

When you use a standard Linux package (such as RedHat, SuSe, or Debian), your kernel (and/or kernel modules) would already be configured by that package, but when you compile your own kernel - as you do when installing MOSIX, you need to make sure that the kernel configuration suits your hardware and contains all the necessary device-drivers and file-systems that you are using.

One tool that often helps in constructing the correct kernel configuration is to use the output of "gzip -cd < /proc/config.gz", produced on the originally-supplied kernel, as a basis for the new configuration (but note that not every Linux distribution has "/proc/config.gz"). This output may not be totally accurate because it comes from a different (usually older) Linux kernel-version, but is a good place to start: place it in the file ".config" of the kernel-source directory, then adjust it by running "make menuconfig".
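
The seeding step described above can be sketched as a small shell function. The function name and its optional second argument are hypothetical conveniences; the only commands it relies on are the "gzip -cd" invocation quoted above and ordinary shell redirection.

```shell
# Seed a kernel source tree's .config from an existing compressed configuration.
#   $1 = kernel source directory
#   $2 = compressed config to read (defaults to /proc/config.gz)
seed_kernel_config() {
    src_dir=$1
    cfg=${2:-/proc/config.gz}
    if [ ! -r "$cfg" ]; then
        # Not every Linux distribution provides /proc/config.gz.
        echo "cannot read $cfg" >&2
        return 1
    fi
    gzip -cd < "$cfg" > "$src_dir/.config"
}
```

After a successful call, change into the kernel source directory and run "make menuconfig" to adjust the result, since the seeded configuration usually comes from a different kernel version.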

Another tip that may help to configure the kernel correctly, is that unless you are a very experienced Linux system-administrator, you should probably avoid the "initrd" hassles and configure all the drivers and file-systems that you need in order to get the system to start within the kernel itself rather than as kernel modules.


Question:

After I installed MOSIX, "mosrun" produces "Not Super User" and exits

Answer:

The file "/bin/mosrun" (and a few others) must have setuid-root permissions. If for any reason it does not, then run:
> chown root /bin/mosrun /bin/mosq /bin/mosps
> chmod 4755 /bin/mosrun /bin/mosq /bin/mosps
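
Before changing anything, you may want to check which of these files actually lack the required permissions. The following is a minimal sketch; the function name is hypothetical, and it only reports what it finds without modifying any file.

```shell
# Return 0 (true) if the file lacks the setuid bit or is not owned by root.
needs_setuid_fix() {
    f=$1
    [ -u "$f" ] || return 0                    # setuid bit is not set
    owner=$(ls -ln "$f" | awk '{print $3}')    # numeric owner id
    [ "$owner" = 0 ] || return 0               # not owned by root
    return 1
}

# Report the files named in the answer above that need the chown/chmod fix.
for f in /bin/mosrun /bin/mosq /bin/mosps; do
    if [ -e "$f" ] && needs_setuid_fix "$f"; then
        echo "$f needs: chown root $f; chmod 4755 $f"
    fi
done
```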


Question:

May I mix different versions of MOSIX in the same cluster or grid

Answer:

The MOSIX version number has 4 components (for example, 2.24.0.0). It is OK to mix versions that differ only in the last component, but not otherwise.
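
The rule can be sketched as a shell check. This assumes that each "digit" is one dot-separated component of the version string, as in MOSIX-2.24.0.0; the function name and the other version numbers used with it are hypothetical.

```shell
# Return 0 if two four-part MOSIX versions differ at most in the last part,
# 1 if they differ earlier, and 2 if either argument does not have four parts.
mosix_versions_mixable() {
    for v in "$1" "$2"; do
        case "$v" in
            *.*.*.*.*) return 2 ;;   # more than four parts
            *.*.*.*)   ;;            # exactly four parts
            *)         return 2 ;;   # fewer than four parts
        esac
    done
    # Strip the last component from each version and compare the rest.
    [ "${1%.*}" = "${2%.*}" ]
}
```

For example, `mosix_versions_mixable 2.24.0.0 2.24.0.3` succeeds, while `mosix_versions_mixable 2.24.0.0 2.24.1.0` fails.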

Question:

How can I see the state of my cluster or grid

Answer:

Type "mon" (the MOSIX monitor). It can display the number of active nodes (type t), loads (l), size of total/used memory (m), dead nodes (d) and relative CPU speeds (s).

Question:

Is it necessary to restart MOSIX in order to change the configuration

Answer:

No.

Once you modify configuration files, the changes will take effect within a minute. After editing the list of nodes in your cluster ("/etc/mosix/mosix.map") you need to run "setpe", but if you are using "mosconf" to modify the local configuration, then there is no need to run "setpe".


Question:

How do I know that the process migration works

Answer:

Run "mon" in one screen. Then run several copies of a test (CPU bound) program, e.g.,

mosrun -e awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}'

First you should see an increase of the load in one node. After a few seconds, if the process migration works you will see how the load is spread among the nodes.

If your nodes are not of the same speed then more processes will run in the faster nodes.
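
To start several copies at once, the test command above can be wrapped in a loop. This is only a sketch: the function name and the DRY_RUN switch (which prints the commands instead of running them) are hypothetical conveniences, and the mosrun invocation is the one shown above.

```shell
# Start N copies (default 4) of the CPU-bound test program in the background.
# With DRY_RUN=1 the commands are printed instead of executed.
start_test_jobs() {
    n=${1:-4}
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "mosrun -e awk '<busy loop>'"
        else
            mosrun -e awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
        fi
        i=$((i + 1))
    done
}
```

Run "start_test_jobs 8" in one terminal while "mon" is running in another, and watch the load spread among the nodes.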


Question:

What is the maximal number of multi-cores supported

Answer:

MOSIX2 supports whatever hardware is supported by the Linux kernel that it runs under, including multi-cores (dual, quad, 8-way, etc.) and SMPs.

Question:

Is Hyper-threading supported

Answer:

Yes.

Question:

What are the port numbers used by MOSIX

Answer:

TCP ports 249 - 253.

UDP ports 249 - 250.
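
If the nodes sit behind a packet filter, these port ranges must be open between them. The sketch below assumes an iptables-based firewall, which this FAQ does not otherwise discuss; the function name and the example subnet are hypothetical. It only prints the rules so that they can be reviewed before being applied.

```shell
# Print (do not apply) iptables rules that admit MOSIX traffic from one subnet.
#   $1 = trusted cluster subnet, e.g. 192.168.1.0/24 (hypothetical value)
print_mosix_firewall_rules() {
    subnet=$1
    echo "iptables -A INPUT -s $subnet -p tcp --dport 249:253 -j ACCEPT"
    echo "iptables -A INPUT -s $subnet -p udp --dport 249:250 -j ACCEPT"
}
```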


Question:

What happens when a node crashes

Answer:

All processes that were running on or originated from that node are killed. To minimize the damage for long-running processes, it is recommended to use the MOSIX checkpoint facility.

Question:

Does the traffic among MOSIX nodes pass safely through the IPSec tunnels

Answer:

Yes. MOSIX works on top of TCP and UDP, obviously above IP.

Question:

Is it possible to run MOSIX over a WAN or the Internet

Answer:

Yes. However, opening the grid over the Internet without a VPN is a security hazard.

Question:

How to run MOSIX processes in idle workstations

Answer:

MOSIX can take advantage of idle workstations (when no one is logged in), with the option that upon a login, all MOSIX processes are moved out and the MOSIX activities are stopped.
  1. In the login script add the commands:
    > mosctl block
    > mosctl expel &

    The "mosctl block" command prevents new remote processes from migrating to that workstation.
    The "mosctl expel &" command moves out MOSIX guest processes. Note that an & is used after the expel command, since expelling processes may take some time and we don't want the user login process to hang. The processes are expelled while the user logs in.

  2. On logout, run the command:
    > mosctl noblock

    This command allows remote processes to migrate to the workstation.

    On a Debian system using GDM, the appropriate file to add this command to is /etc/gdm/PostSession/Default .

Note that when adding the mosctl commands to the GDM script, you must not interfere with the normal operation of GDM.
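
One way to keep the session scripts working even on a machine where MOSIX is missing is to wrap the commands in guarded functions. This is a sketch, not part of the MOSIX distribution: the two function names are hypothetical, and only the "mosctl" commands listed above are used.

```shell
# Call from the login script: stop accepting guests and send current ones away.
mosix_on_login() {
    command -v mosctl >/dev/null 2>&1 || return 0   # no mosctl: do nothing
    mosctl block
    mosctl expel &      # in the background, so the login does not wait
}

# Call on logout (e.g. from /etc/gdm/PostSession/Default on Debian with GDM).
mosix_on_logout() {
    command -v mosctl >/dev/null 2>&1 || return 0
    mosctl noblock
}
```

Because both functions return successfully when "mosctl" is not found, a script that calls them does not fail on a workstation without MOSIX.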




32-bit and 64-bit applications

Question:

How do I inform MOSIX whether I use 32-bit or 64-bit systems

Answer:

There is no need to do so - the MOSIX installation script will automatically detect the type of system that you have and install the appropriate binaries.

Question:

Can I mix 32-bit and 64-bit nodes in the same cluster

Answer:

Yes you can, but performance can be better if your situation allows you to set the 32-bit and 64-bit nodes as separate clusters within a multi-cluster grid.

Question:

Can I run 32-bit programs on 64-bit nodes

Answer:

Yes, 32-bit programs can migrate to 64-bit nodes (and even start there), but the home-node of 32-bit programs must be on a 32-bit computer. Thus, if you want to run 32-bit programs on predominantly 64-bit cluster(s), you may consider leaving aside a few 32-bit computers as part of your cluster and/or multi-cluster grid, from where you can start 32-bit programs.

Question:

Can I run 64-bit programs on 32-bit nodes

Answer:

No, the hardware does not support it (and even when it does, a 32-bit Linux kernel doesn't).

Question:

Can I have MOSIX running under a 64-bit kernel, but a 32-bit Linux installation, utilities and libraries (because it is so much easier to upgrade only the kernel)

Answer:

No. While Linux allows this combination, the current version of MOSIX (in both its 32-bit and 64-bit variants) does not yet support this option, so MOSIX will fail to start.

Question:

What happens if I attempt to run a 32-bit executable from a 64-bit node

Answer:

It will run correctly for the sake of transparency, but as a "native" Linux process, so the program will not be able to migrate or use special MOSIX features (not even its child processes, not even if they later execute a 64-bit binary).



Running applications

Question:

If a child process is spawned from a parent, must they migrate together

Answer:

No. Each process is managed independently.

Question:

Why shared-memory is not supported

Answer:

Because it is not scalable, i.e., it is impossible to change the contents of a memory in one node and expect that the same change will be reflected instantly in the memory of the remaining nodes (with which memory is shared), e.g., as in an SMP or a multi-core.

Question:

How to run a threaded application

Answer:

Threaded applications are created by the "clone" system-call with the "CLONE_VM" flag, which makes the threads share memory, and thus are not suitable for distributed-memory architectures.

In MOSIX it is possible to run threaded applications as standard Linux processes. Such applications cannot be migrated, but can still benefit from MOSIX features such as queuing and best initial-assignment.

To launch threaded applications use "mosrun -E".


Question:

How to run a script where one of the commands is a threaded application

Answer:

By using the "native" utility in your script:

> native {threaded_program} [program-args]...


Question:

Must all migratable executables be started under "mosrun"

Answer:

To be migratable, either the executables, or the shell (or other program) that called them must be run under "mosrun". Once a shell runs under "mosrun", all its descendants will also be under "mosrun" (but there is a way to request explicitly that a particular child will NOT run under "mosrun").

Question:

Are there any limitations on I/O that can be performed by migrated processes

Answer:

Usually, I/O done by migrated processes on remote nodes is performed via the respective home-node of each process. While this does not limit the allowed operations, it may slow down such processes. Thus, if the amount of I/O is significant, it will often cause the process to migrate back to its home-node.

Note that the amount and frequency of I/O is taken into account and weighted against other considerations in making such a decision.

Direct communication (migratable sockets) can remove this slow-down effect for I/O between communicating processes.


Question:

Which IPC mechanism should be used between processes to get the best performance

Answer:

The most efficient mechanism is direct communication; see the next questions.

Otherwise, MOSIX is not different from Linux: depending on the particular needs of the process, whatever approach (other than shared-memory) is best in Linux is also best on MOSIX. It could be pipes, SYSV-messages, UNIX-sockets, TCP-sockets or files.

Obviously, files can be slow, since they usually require writing to a physically-moving surface and/or networking. On the other hand, Linux has very good caching mechanisms for local files.


Question:

Can MOSIX support migratable sockets

Answer:

Yes, direct-communication provides an effective migratable socket between migrated processes.

Question:

How direct-communication can improve the performance of communicating processes

Answer:

Normally, MOSIX processes do all their I/O and (most) system-calls via their respective home-nodes. This can be slow because operations are limited by the network speed and latency.

Direct communication allows migrated processes to exchange messages directly with each other, bypassing their home-nodes.


Question:

How to run MATLAB Version 7.4 (or older) jobs in MOSIX

Answer:

Jobs running MATLAB Version 7.4 (or older) can automatically migrate among nodes of a cluster/multi-cluster.

First, tune MATLAB to MOSIX by the following 3 steps:

  1. Find where MATLAB is installed on your system by
    > which matlab
    /usr/local/bin/matlab
  2. Back up the matlab program to another location
    > cp /usr/local/bin/matlab /tmp/mos-matlab
  3. Comment-out the following 2 lines in the mos-matlab script:
    LD_ASSUME_KERNEL=2.4.1
    export LD_ASSUME_KERNEL

    The result should be:

    #LD_ASSUME_KERNEL=2.4.1
    #export LD_ASSUME_KERNEL
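
Steps 2 and 3 can also be done without an editor. The following sketch uses "sed" to write a copy of the script with the two lines commented out; the function name is hypothetical, and the paths in the usage note are the ones from the example above, so adjust them to wherever MATLAB is installed on your system.

```shell
# Print the given MATLAB start-up script with the two LD_ASSUME_KERNEL
# lines commented out. The original file is left unchanged.
comment_out_ld_assume_kernel() {
    sed -e 's/^\([[:space:]]*\)\(LD_ASSUME_KERNEL=\)/\1#\2/' \
        -e 's/^\([[:space:]]*\)\(export LD_ASSUME_KERNEL\)/\1#\2/' "$1"
}
```

For example, `comment_out_ld_assume_kernel /usr/local/bin/matlab > /tmp/mos-matlab` followed by `chmod +x /tmp/mos-matlab` produces the modified copy in one go.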

You can now run MATLAB jobs in a cluster/multi-cluster using mosrun.

Example: to run the following MATLAB test.m program:

a=randn(3000);
b=svd(a);

use:

> mosrun -e mos-matlab -nojvm -nodesktop -nodisplay < test.m


Question:

How to run MATLAB Version 7.5 (or newer) jobs

Answer:

MATLAB Version 7.5 (or newer) applications use a library which uses threads (the "clone" system-call with the "CLONE_VM" flag) incorrectly. To overcome this problem we added the -i flag to mosrun, which should be used together with the -E flag. This means that MATLAB jobs can be queued and assigned by MOSIX to nodes as regular Linux processes, but they can't migrate afterwards.

The MOSIX version should be at least MOSIX-2.24.0.0 and jobs should be started by:

> mosrun -E -b -i matlab ....

MOSIX will assign each job to the best node in the local cluster.

Example: to run the following MATLAB test.m program:

a=randn(3000);
b=svd(a);

use:

> mosrun -E -b -i matlab -nojvm -nodesktop -nodisplay < test.m


Question:

How to run JAVA programs

Answer:

JAVA uses (shared-memory) threads (the "clone" system-call with the "CLONE_VM" flag), which are not suitable for distributed-memory architectures (clusters). This means that JAVA jobs can be queued and assigned by MOSIX to nodes only as regular Linux processes (with the mosrun -E flag).

A JAVA job should be started by:

> mosrun -E -b java job

MOSIX will assign each job to the best node in the local cluster.


Question:

Can MOSIX migrate MPI processes

Answer:

Yes.

MPI allocates processes to slave nodes of a cluster in a Round-Robin fashion, without checking the state of the resources, e.g. speed, current load and available memory.

Process migration can improve the performance by load-balancing, by migration of processes from slower to faster nodes and to nodes with sufficient free memory, as well as by migration of MPI processes to grid nodes which are not part of the user's cluster.




HUGI

Question:

What is HUGI

Answer:

The Hebrew University Grid (HUGI) is a production organizational grid with 15 MOSIX clusters. Most clusters are private. They are made of production servers that belong to research groups in various departments. Four clusters are made of workstations in student labs.

Processes of users are allowed to migrate to idle workstations and among nodes in the private clusters, subject to the priorities among the different groups. For example, since the workstations belong to the CS department, processes that are started in a CS private cluster have a higher priority to move to a workstation than already running processes from the Chemistry cluster.

Due to the increased computing demands by our researchers, the amount of installed memory in the workstations was increased (beyond the needs of the students), to allow large guest processes from the private clusters to run in these workstations.


Question:

How HUGI is managed

Answer:

None of the nodes in the HUGI clusters relies on local disks for booting and running. Local disks may be used for temporary storage.

User-files and home directories are located on central NFS servers.


Question:

What are the rules and policies for running applications on HUGI

Answer:

Rules: Policies:

Question:

Who is responsible for allocating freeze space

Answer:

Cluster owners should designate sufficient freeze space for processes originating from their cluster.