MOSIX Frequently Asked Questions - Flat listing
Copyright © 1999 - 2010 A. Barak. All rights reserved.
Question:
What is MOSIX
Answer:
MOSIX is an operating-system-like management system targeted at
High Performance Computing (HPC) on Linux clusters and multi-clusters.
Its main feature is to provide users and applications with the
illusion of running on a single computer with multiple processors,
without changing the interface and the run-time environment of their
respective login nodes.
More information can be found in the
About web page and
"The MOSIX Management System for Linux Clusters and
Multi-clusters" white paper.
Question:
Why this name
Answer:
MOSIX stands for a Multicomputer Operating System for UnIX.
MOSIX® is a registered
trademark of Amnon Barak and Amnon Shiloh.
Question:
Who is it suitable for
Answer:
MOSIX is suited to running compute-intensive applications with
moderate amounts of I/O over fast, secure networks, in a trusted
environment (where all remote nodes are trusted),
e.g., as in private clusters and intra-organizational multi-clusters.
Question:
What are the main benefits of MOSIX
Answer:
Users can log in on any node and do not need to know where their
programs run.
In a MOSIX cluster/multi-cluster there is no need to modify or to link
applications with any library, copy files or log in to remote nodes,
or even assign processes to different nodes, including nodes in
different clusters - it is all done automatically.
The outcome is ease of use, better utilization of resources and
near maximal performance.
Question:
How is this accomplished
Answer:
By a software layer that allows applications to run in
remote computers as if they run locally.
Users can run their regular sequential and parallel applications
as if they use one computer (node), while MOSIX automatically
(and transparently) seeks resources and migrates processes among
nodes to improve the overall performance.
This is accomplished by on-line algorithms that monitor the state
of the system-wide resources and the running processes, then,
whenever appropriate, initiate process migration to:
- Balance the load;
- Move processes from slower to faster nodes;
- Move processes from nodes that run out of free memory;
- Preserve long-running guest processes when clusters are
about to be disconnected from the multi-cluster.
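As a rough illustration of the first three criteria, the sketch below picks a target node for one process by comparing load per unit of speed and skipping nodes that lack free memory. It is not the MOSIX algorithm: the structure, the field names and the cost formula are assumptions made for this example only.

```c
#include <stddef.h>

/* Hypothetical per-node snapshot; the field names are illustrative only. */
struct node_info {
    double load;        /* number of runnable processes */
    double speed;       /* relative CPU speed, 1.0 = reference node */
    long   free_mem_mb; /* free memory in megabytes */
};

/*
 * Return the index of the node on which a process needing mem_mb megabytes
 * would see the lowest load per unit of speed if it were added there,
 * or -1 if no node has enough free memory. Ties go to the lowest index.
 */
int pick_target_node(const struct node_info *nodes, size_t n, long mem_mb)
{
    int best = -1;
    double best_cost = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].free_mem_mb < mem_mb || nodes[i].speed <= 0.0)
            continue;   /* not enough memory, or an unusable entry */

        /* Load the node would carry with this process, scaled by speed. */
        double cost = (nodes[i].load + 1.0) / nodes[i].speed;
        if (best == -1 || cost < best_cost) {
            best = (int)i;
            best_cost = cost;
        }
    }
    return best;
}
```

With two equally loaded nodes, the faster one is chosen; a node with too little free memory is never chosen, however idle it is.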
Question:
Which hardware platforms are supported
Answer:
The latest production distribution of MOSIX runs on all x86-compatible
computers (both 32-bit and 64-bit architectures).
Question:
Which software platforms are supported
Answer:
The latest production distribution of MOSIX runs on Linux-2.6.
Distributions are provided as RPMs for openSUSE, for use in native Linux
and as a pre-installed virtual-disk image that can be used to create
a MOSIX virtual cluster on Windows and/or Linux computers.
Question:
Is MOSIX a cluster or a multi-cluster technology
Answer:
Both.
MOSIX version 2 (MOSIX2) for Linux-2.6 can manage x86-based
clusters and multi-clusters.
MOSIX version 1 for Linux-2.4 can manage a single cluster.
Question:
Why must all remote nodes be trusted
Answer:
To ensure that migrated (guest) processes are not tampered with
while running in remote clusters of a multi-cluster.
Note that guest processes run in a sandbox, which prevents such
processes from accessing local resources in the hosting nodes.
Question:
History of MOSIX
Answer:
The
History of MOSIX wiki page provides information about all the
versions of MOSIX.
Question:
MOSIX related papers and reports
Answer:
Can be found in this wiki
page.
Question:
What are the main features of MOSIX
Answer:
The main features are listed in the
MOSIX for Linux-2.6
web page.
Question:
What aspects of a single-system image are supported
Answer:
MOSIX provides users and applications with the run-time environment
of the user's "login-node".
This means that:
- Users can log in on any node and do not need to know where
their programs run.
- No need to modify or link applications with special libraries.
- No need to copy files to remote nodes.
- Automatic resource discovery: whenever clusters or nodes
join or disconnect, all the active nodes are updated.
- Automatic workload distribution by process migration,
including load balancing, process migration from slower
to faster nodes and from nodes that run out of free memory.
Question:
How does MOSIX support Virtual Organizations (VOs)
Answer:
A VO is a set of clusters (including servers and workstations) whose owners
wish to share their computing resources from time to time
in a flexible way.
MOSIX provides the following features to manage VOs:
- Support of disruptive configurations:
clusters can join or leave the multi-cluster at any time.
- Clusters can be shared symmetrically or asymmetrically. For
example, the owner of cluster A can allow processes originating from
cluster B to move in but block processes originating from cluster C.
- A run-time priority for flexible use of nodes within and among
groups. For example, to partition a cluster among different users.
- Each cluster owner can assign priorities to processes from other clusters.
For example, the owner of cluster A can assign higher priority
to processes from cluster B and lower priority to processes from
cluster C. This way, when guest processes from cluster B wish to
move to cluster A, they will push out guest processes from cluster C
(if any).
- Local and higher priority processes force out lower priority processes.
- Processes that migrated to or from a disconnecting cluster are moved
out or back, respectively, so that long-running migrated processes are
not killed.
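The priority rule can be sketched as follows. The sketch assumes that a smaller number means a higher priority; that convention, the structure and the function name are assumptions made for this example and are not taken from the MOSIX sources.

```c
#include <stddef.h>

/* Hypothetical record of a guest process hosted on a node. */
struct guest {
    int pid;
    int priority;   /* assumption: smaller value = higher priority */
};

/*
 * Pick the guest that should be pushed out to make room for an arriving
 * process with priority `incoming`. Returns the index of the lowest-priority
 * guest that ranks strictly below the newcomer, or -1 if none qualifies.
 */
int pick_guest_to_evict(const struct guest *guests, size_t n, int incoming)
{
    int victim = -1;

    for (size_t i = 0; i < n; i++) {
        if (guests[i].priority <= incoming)
            continue;   /* equal or higher priority: never evicted */
        if (victim == -1 || guests[i].priority > guests[victim].priority)
            victim = (int)i;
    }
    return victim;
}
```

In the example of clusters A, B and C, a process arriving from cluster B would be given a smaller number than the guests from cluster C, so one of the latter is selected.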
Question:
What is the architecture of a MOSIX configuration (cluster, multi-cluster)
Answer:
The architecture of a MOSIX configuration is homogeneous:
all nodes must be x86-based and run (nearly) the same version of MOSIX
(see the question about mixing different versions of MOSIX).
However, individual nodes may have a different number of processors (cores),
different speeds, memory sizes or I/O devices.
Question:
Which types of processes are supported
Answer:
MOSIX recognizes two types of processes: Linux and MOSIX processes.
Linux processes are not affected by MOSIX - they run as
they do on any Linux system and can not be migrated.
MOSIX processes run in an environment that allows them
to migrate from one node to another.
Linux processes usually include administrative and other tasks that
are not suitable for migration, whereas MOSIX processes are selected
user-applications that are suitable and can benefit from migration.
Apart from process-migration that is available only to MOSIX processes,
MOSIX includes batch mechanisms that can queue and assign new jobs
to start on the best available nodes: these batch mechanisms are available
for both Linux and MOSIX jobs.
MOSIX processes are invoked by the "mosrun" command.
If you want to make use of the MOSIX batch mechanisms for Linux
(non-migratable) processes, use the "mosrun -E" option.
This can be summarized in the following table:
| Process type      | Migratable (MOSIX) | Non-migratable (Linux) |
| Batch             | mosrun -M [-b]     | mosrun -E [-b]         |
| Fully-interactive | mosrun [-b]        | (do not use "mosrun")  |
where the "-b" flag selects the best location to run the job.
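The table can be read as a two-way lookup. The sketch below encodes it as a small C function; the function name is made up for this example, and the "-b" flag is left out because it applies equally to every "mosrun" entry.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Map a job description to the "mosrun" prefix given in the table.
 * Returns NULL for fully-interactive, non-migratable jobs, which should be
 * started without "mosrun".
 */
const char *mosrun_prefix(bool migratable, bool batch)
{
    if (batch)
        return migratable ? "mosrun -M" : "mosrun -E";
    return migratable ? "mosrun" : NULL;
}
```

A caller would prepend the returned string to the job's command line, or run the command directly when NULL is returned.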
Question:
Does MOSIX support checkpoint/restart
Answer:
Yes, most CPU-intensive MOSIX processes can be checkpointed.
When a checkpoint is performed, the image of the process is saved to a
file. The process can later recover itself from that file and continue to
run from that point.
For successful checkpoint and recovery, a process must not depend heavily
on its Linux environment. For example, for security reasons processes with
setuid/setgid privileges or processes with open pipes or sockets can't be
checkpointed.
Checkpoints can be triggered by a program, by a manual request
and/or automatically at regular time intervals (see the next question).
Question:
How to trigger a checkpoint
Answer:
Checkpoints can be triggered in 3 ways:
- By providing the "-C<file-name>" and "-A<integer-number>" flags to
mosrun.
This will perform a periodic checkpoint every "integer-number" minutes,
and the checkpoint images will be saved to the files "file-name.1",
"file-name.2", etc. Read the mosrun manual for details.
- By using the "migrate <pid> checkpoint" command to perform a
checkpoint at a specific time externally to the program.
Read the migrate command manual for details.
- From within the program, by using the MOSIX checkpoint interface.
The MOSIX checkpoint interface is documented in "man MOSIX".
It contains the following files in the proc file system:
/proc/self/checkpoint
/proc/self/checkpointfile
/proc/self/checkpointlimit
/proc/self/checkpointinterval
These files are private to each process. They allow the process to
modify its checkpoint parameters and to trigger a checkpoint operation.
Question:
Example of how to perform a checkpoint from within a program
Answer:
The following program performs 100 units of work and uses the
checkpoint-unit argument to trigger a checkpoint right after that unit.
The checkpoint-file argument names the file in which the checkpoint
images of the program are saved.
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

// Setting the checkpoint file from within the process.
// This can also be done via the -C argument to mosrun.
int setCheckpointFile(char *file) {
    int fd;

    fd = open("/proc/self/checkpointfile", 1|O_CREAT, file);
    if (fd == -1) {
        return 0;
    }
    return 1;
}

// Triggering a checkpoint from within the process
int triggerCheckpoint() {
    int fd;

    fd = open("/proc/self/checkpoint", 1|O_CREAT, 1);
    if (fd == -1) {
        fprintf(stderr, "Error doing self checkpoint\n");
        return 0;
    }
    printf("Checkpoint was done successfully\n");
    return 1;
}

int main(int argc, char **argv) {
    int j, unit, t;
    char *checkpointFileName;
    int checkpointUnit = 0;

    if (argc < 3) {
        fprintf(stderr, "Usage: %s <checkpoint-file> <unit>\n", argv[0]);
        exit(1);
    }
    checkpointFileName = strdup(argv[1]);
    checkpointUnit = atoi(argv[2]);
    // Units are numbered 0..99, so a unit of 100 would never be reached
    if (checkpointUnit < 1 || checkpointUnit > 99) {
        fprintf(stderr, "Checkpoint unit should be > 0 and < 100\n");
        exit(1);
    }
    printf("Checkpoint file: %s\n", checkpointFileName);
    printf("Checkpoint unit: %d\n", checkpointUnit);

    // Setting the checkpoint file from within the process (can also be done
    // using the -C argument of mosrun)
    if (!setCheckpointFile(checkpointFileName)) {
        fprintf(stderr, "Error setting the checkpoint filename from within the process\n");
        fprintf(stderr, "Make sure you are running this program via mosrun\n");
        return 1;
    }

    // Main loop ... running for 100 units. Change this loop if you wish
    // the program to run more loops
    for (unit = 0; unit < 100; unit++) {
        // Consuming some CPU time (simulating the run of the application).
        // Change the number below to make each loop consume more (or less) time
        for (t = 0, j = 0; j < 1000000 * 500; j++) {
            t = j + unit * 2;
        }
        printf("Unit %d done\n", unit);

        // Triggering a checkpoint request from within the process
        if (unit == checkpointUnit) {
            if (!triggerCheckpoint())
                return 1;
        }
    }
    // Exit status 0 reports success, non-zero reports failure
    return 0;
}
To compile: gcc -o checkpoint_demo checkpoint_demo.c
To run: mosrun ./checkpoint_demo <checkpoint-file> <unit>
A typical run:
> mosrun ./checkpoint_demo ccc 5
Checkpoint file: ccc
Checkpoint unit: 5
Unit 0 done
Unit 1 done
Unit 2 done
Unit 3 done
Unit 4 done
Unit 5 done
Checkpoint was done successfully
Unit 6 done
Unit 7 done
Unit 8 done
^C
The program triggered a checkpoint after unit 5.
The checkpointed file was saved in ccc.1.
After unit 8 the program was killed.
To restart:
> mosrun -R ccc.1
Checkpoint was done successfully
Unit 6 done
Unit 7 done
Unit 8 done
Unit 9 done
Unit 10 done
...
The program was restarted from the point right after it was checkpointed.
Question:
What are the options of "live-queuing" in MOSIX
Answer:
MOSIX supports "live-queuing" that allows queued jobs to preserve
their full connection with their Linux environment.
This includes controlling terminal, parent-process, signals, pipes,
sockets, shared file-descriptors, etc.
The queuing system includes tools for tracing queued jobs, setting
and changing their priorities or the order of execution, and for running
parallel jobs.
Question:
How does the queuing system of MOSIX work
Answer:
In a MOSIX multi-cluster, each cluster has its own queue and this queue
is shared by all the users of that cluster.
The number of jobs that can be placed in the queue is limited by the
number of Linux processes (about 30000 for all users). To queue a
larger number of jobs, there is an option to run multiple command-lines
from a file, each with its own arguments. This option is commonly used
to run the same program with many different sets of arguments.
Another option sets an upper limit on the number of jobs that are
allowed to run simultaneously. This option combines well
with the queuing system, which runs jobs based on the availability of
cluster/multi-cluster resources.
There is an argument to inform the queuing system that the job may
split into a number of parallel processes, so that more resources
are reserved for it. Another argument allows bundling for easy
identification of several instances of a job by a single job-ID.
Jobs can also be handled as a group and be killed collectively.
Question:
How does MOSIX manage batch jobs
Answer:
Batch jobs can be sent to any node in the local cluster
(as opposed to non-batch jobs that require the specific environment
of their dispatching node).
There are two types of batch jobs: Linux and MOSIX. Linux batch
processes do not migrate, while MOSIX batch processes can migrate,
but their home-node can be different from their dispatching node.
MOSIX can assist both types by:
- Queuing the job until resources are available
(using "mosrun -q", "mosrun -S" or both);
- Selecting the best initial assignment for the job.
Batch jobs are started from binaries in another node and preserve only
some of the caller's environment: they receive the environment variables;
they can read from their standard-input and write to their standard
output and error, but not from/to other open files; they receive signals,
but if they fork, signals are delivered to the whole process-group
rather than just the parent; they can not communicate with other processes
on the calling node using pipes and sockets (other than standard
input/output/error), semaphores, messages, etc. and can only receive
signals, but not send them to processes on the calling node.
The main advantage of batch jobs is that
they save time by not needing to refer to the dispatching-node to perform
system-calls, and that temporary files can be created on the node where
they start, preventing the dispatching node from becoming a bottleneck.
This approach is therefore recommended for programs that
perform a significant amount of I/O.
Question:
What is the MOSIX File-System (MFS)
Answer:
MFS was a file system that treated all files and directories
within a MOSIX cluster as a single file system.
It was available in MOSIX for Linux-kernel 2.4,
but is not available in MOSIX for Linux-kernel 2.6.
Question:
How does MOSIX handle temporary files
Answer:
To reduce the I/O overhead, MOSIX has an option to
migrate (private) temporary files with the process.
Question:
Can MOSIX run in a Virtual Machine (VM)
Answer:
Yes.
MOSIX can run in a virtual machine on any platform
that supports virtualization (including Windows).
The MOSIX web site provides a free evaluation copy of MOSIX on a
pre-installed virtual-disk image
that can be used to create a MOSIX virtual cluster
on Linux and/or Windows computers.
Question:
Is it possible to install and run more than one VM with MOSIX on the same node
Answer:
Yes, this is especially useful on multi-core computers.
Note that the total number of processors used by the VMs should not
exceed the number of physical processors.
Question:
Can MOSIX run on an unmodified Linux kernel
Answer:
Yes, within a Virtual Machine.
Question:
Why migrate processes when one can move a whole VM with a process inside
Answer:
Mainly because it is expensive, both in terms of time and the required
memory, to create a VM for each process.
Specifically:
- Migrating a whole VM requires the transfer of much more memory.
Even in the case of "live-migration" (which works for certain types
of processes, not all), this can overload the network more.
- Once in a VM, a process that splits (using "fork") cannot get
independent resources for each split process: the original process
with all its children will have to remain together on the same VM.
- Processes within a VM cannot maintain most of their connections
(pipes, signals, parents/children, IPC, etc.) with other processes,
either on the generating host or in other VMs.
- Allocating a full virtual-disk image for each process can consume
a large amount of disk space.
- Current VM technology doesn't support migration between different
clusters that are on different switches.
MOSIX Virtual OpenCL (VCL)
Question:
What is MOSIX-VCL
Answer:
MOSIX-VCL (VCL) is an OpenCL implementation that allows
one to use multiple GPUs in a cluster, as if they were all present
on the user's computer.
Question:
Why is VCL needed
Answer:
Currently, OpenCL applications can utilize only local devices.
VCL overcomes this limitation by allowing such applications to
transparently use cluster-wide devices.
Question:
What are the requirements of VCL
Answer:
- Linux 64-bit.
- TCP/IP connection, with port #255 reserved for VCL.
- OpenCL version 1.1 installed on all the computers with GPUs.
Question:
How does VCL work with MOSIX
Answer:
VCL can use the MOSIX dynamic cluster configuration and the
information that flows between MOSIX nodes
to determine the availability of GPUs.
Note that VCL works with MOSIX-2.28.1.0 or higher.
Question:
Can VCL work without MOSIX
Answer:
Yes.
In this case you need to manually configure the participating computers.
Question:
Are only GPUs supported
Answer:
No, CPUs can also be supported so long as an appropriate OpenCL
SDK that supports CPUs is installed.
Question:
Which version of OpenCL is supported
Answer:
Currently, VCL supports OpenCL version 1.0.
Support for OpenCL version 1.1 will be available.
Question:
Can VCL work along with other OpenCL implementations
Answer:
Currently, the "cl_khr_icd" OpenCL extension is not supported,
so each instance of an application can only use either VCL or other
implementations, but not both.
Question:
How can I obtain the best speedup from using many GPUs in parallel
Answer:
Use as much parallelism and as many asynchronous operations as you can in
your OpenCL program. Avoid enqueue operations on the same memory-object
(buffer/image) in queues that belong to different devices (use copies
if necessary).
MOSIX Reach the Clouds (MRC)
Question:
What is MRC
Answer:
MRC is a tool that allows applications to run on remote computers
(including other clusters and commercial Clouds), without the need
to pre-copy files to these clusters.
MRC can run on both Linux nodes and MOSIX clusters.
MRC applications run in a hybrid environment, where some of their
files are on their launching (local) node and the rest are on
target (remote) nodes.
Question:
What are the main features of MRC
Answer:
- The ability to run local applications on remote computers, such as
Clouds.
- Applications running on Clouds can use both the files of the target
computer and any desired subset of directories from the launching
computer.
With a proper choice of directories, this makes it possible to achieve:
  - Remote file access;
  - File sharing among different computers and users;
  - Running applications that need both common and private data.
- By running MRC recursively, files can be shared from three or
more computers.
- Standard-Input/Output/Error remain on the launching computer.
- "mosrun" and all its features can be used on remote clusters
that run MOSIX.
Question:
How does MRC work
Answer:
MRC consists of two parts, a launching program that can send jobs from
the user's "head-node", e.g. workstation, to a designated target node,
and a run-time environment that provides file services to running jobs
on target computers.
If the target node is part of a MOSIX cluster, then MRC jobs can benefit
from all the MOSIX features. If a target node runs Linux (but not MOSIX),
then MRC jobs can only run there as native Linux jobs.
Question:
How are MRC jobs launched
Answer:
By the "mrc" command; run "man mrc" for details.
Question:
If I run several MRC jobs simultaneously, is their view of files consistent?
Answer:
Each MRC job has the option whether or not to maintain a cache on the
target node. When a cache is maintained, changes made by one job may
take time until seen by other jobs, but when caching is not used, there
is full file-consistency. It is also possible to cache some directories
and not others.
Question:
Which distributions of MOSIX support MRC
Answer:
A copy of MRC is included in MOSIX-2.26.0.0 for Linux-2.6.30 onwards.
Question:
The latest release and change-log
Answer:
Are available here.
Question:
Technical support
Answer:
Technical support, including configuration and installation assistance
as well as upgrades to new releases are available for a fee
here.
Question:
How to install
Answer:
An installation script and instructions are included
in all the MOSIX distributions.
Question:
After installing MOSIX in one node, how do I install it on the other nodes
Answer:
The best way is to use a cluster installation package (such as OSCAR).
If you use a common NFS root directory for your cluster,
you can install MOSIX in that directory.
Otherwise, on a small cluster, you can install MOSIX node by node.
Question:
Why did the installer fail to patch my kernel
Answer:
You probably specified a wrong version of the Linux kernel sources.
You must use only the
official Linux kernel sources for your specific MOSIX distribution.
Note: do not use the kernel sources supplied with commercial Linux
distributions - they were modified and could cause the MOSIX patch
to fail.
Question:
Why did I get a kernel panic when trying to boot the MOSIX kernel
Answer:
This is not because of MOSIX, but simply because you have prepared
your own Linux kernel, which is probably misconfigured
(if you need to be convinced, try a plain, non-MOSIX, kernel from
http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.x).
When you use a standard Linux package (such as RedHat, SuSe, or Debian),
your kernel (and/or kernel modules) would already be configured by that
package, but when you compile your own kernel - as you do when installing
MOSIX, you need to make sure that the kernel configuration suits your
hardware and contains all the necessary device-drivers and file-systems
that you are using.
One technique that often helps in constructing the correct kernel configuration
is to use the output of "gzip -cd < /proc/config.gz", produced on the
originally-supplied kernel, as a basis for the new configuration
(but note that not every Linux distribution has "/proc/config.gz").
This output may not be totally accurate because it comes from a different
(usually older) Linux kernel-version, but is a good place to start:
place it in the file ".config" of the kernel-source directory,
then adjust it by running "make menuconfig".
Another tip that may help to configure the kernel correctly, is that
unless you are a very experienced Linux system-administrator, you should
probably avoid the "initrd" hassles and configure all the drivers and
file-systems that you need in order to get the system to start within
the kernel itself rather than as kernel modules.
Question:
After I installed MOSIX, "mosrun" produces "Not Super User" and exits
Answer:
The file "/bin/mosrun" (and a few others) must have setuid-root
permissions. If for any reason it does not, then run:
> chown root /bin/mosrun /bin/mosq /bin/mosps
> chmod 4755 /bin/mosrun /bin/mosq /bin/mosps
Question:
May I mix different versions of MOSIX in the same cluster or multi-cluster
Answer:
The MOSIX version has 4 digits. It is OK to mix versions when
only the last digit is different, but not otherwise.
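The rule can be checked mechanically. The sketch below assumes version strings of the form "a.b.c.d", as in "2.28.1.0"; the function name is made up for this example.

```c
#include <stdio.h>

/*
 * Return 1 if two MOSIX version strings of the form "a.b.c.d" may be mixed
 * (they differ at most in the last number), 0 if they may not, and -1 if
 * either string does not consist of exactly four dot-separated numbers.
 */
int mosix_versions_mixable(const char *v1, const char *v2)
{
    int a[4], b[4];
    char extra;

    /* The trailing %c only matches if unexpected text follows the version. */
    if (sscanf(v1, "%d.%d.%d.%d%c", &a[0], &a[1], &a[2], &a[3], &extra) != 4 ||
        sscanf(v2, "%d.%d.%d.%d%c", &b[0], &b[1], &b[2], &b[3], &extra) != 4)
        return -1;

    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
}
```

For example, "2.28.1.0" and "2.28.1.3" may be mixed, while "2.28.1.0" and "2.28.0.0" may not.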
Question:
How can I see the state of my cluster or multi-cluster
Answer:
Type "mon" (the MOSIX monitor).
It can display the number of active nodes (type t),
loads (l), size of total/used memory (m),
dead nodes (d) and relative CPU speeds (s).
Question:
Is it necessary to restart MOSIX in order to change the configuration
Answer:
No.
Once you modify configuration files, the changes will take effect
within a minute. After editing the list of nodes in your cluster
("/etc/mosix/mosix.map") you need to run "setpe", but if you are
using "mosconf" to modify the local configuration, then there is
no need to run "setpe".
Question:
How do I know that the process migration works
Answer:
Run "mon" in one screen.
Then run several copies of a test (CPU bound) program,
e.g.,
mosrun -e awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}'
First you should see an increase of the load in one node.
After a few seconds, if the process migration works you will
see how the load is spread among the nodes.
If your nodes are not of the same speed then more processes
will run in the faster nodes.
Question:
What is the maximal number of multi-cores supported
Answer:
MOSIX supports whatever hardware is supported by the Linux kernel
that it runs under, including multi-cores (dual, quad, 8-way, etc.)
and SMPs.
Question:
Is Hyper-threading supported
Answer:
Yes.
Question:
/proc/cpuinfo shows 8 CPUs, but MOSIX claims that there are only 4
Answer:
There are in fact only 4 real CPUs (or cores) per node -
the extra CPUs shown in /proc/cpuinfo reflect Hyper-Threading.
You can tell that Hyper-Threading is enabled by the "ht" flag
in the "flags" field of /proc/cpuinfo.
Hyper-Threading may help a few threaded applications, such as
browsers, but is useless and sometimes even detrimental for HPC
applications. It is a good idea to disable the Hyper-Threading
feature in the BIOS (some vendors name it "virtual processor").
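The flag test can also be done in a program. The sketch below looks for the exact token "ht" in a flag list; reading the "flags" line from /proc/cpuinfo is left to the caller, and the function name is made up for this example.

```c
#include <string.h>

/*
 * Return 1 if the whitespace-separated flag list `flags` (the text after
 * "flags :" in /proc/cpuinfo) contains the exact token "ht", else 0.
 */
int cpu_flags_have_ht(const char *flags)
{
    const char *p = flags;

    while (*p != '\0') {
        p += strspn(p, " \t\n");            /* skip separators */
        size_t len = strcspn(p, " \t\n");   /* length of the next token */
        if (len == 2 && strncmp(p, "ht", 2) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```

Matching whole tokens rather than substrings avoids false positives from other flags that merely contain the letters "ht".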
Question:
What are the port numbers used by MOSIX
Answer:
TCP ports 249 - 253.
UDP ports 249 - 250.
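When writing firewall rules or a configuration checker, the two ranges can be captured in a small helper. This is only a sketch; the enum and the function name are made up for this example.

```c
#include <stdbool.h>

enum transport { TRANSPORT_TCP, TRANSPORT_UDP };

/*
 * True if `port` lies in the range listed for the given transport:
 * TCP 249-253, UDP 249-250.
 */
bool is_mosix_port(enum transport proto, int port)
{
    if (proto == TRANSPORT_TCP)
        return port >= 249 && port <= 253;
    return port >= 249 && port <= 250;
}
```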
Question:
What happens when a node crashes
Answer:
All processes that were running on or originated from that node
are killed.
To minimize the damage for long-running processes, it is recommended
to use the MOSIX checkpoint facility.
Question:
Does the traffic among MOSIX nodes pass safely through IPsec tunnels
Answer:
Yes. MOSIX works on top of TCP and UDP, obviously above IP.
Question:
Is it possible to run MOSIX over a WAN or the Internet
Answer:
Yes.
However, opening a cluster over the Internet without a VPN is a
security hazard.
Question:
How to run MOSIX processes in idle workstations
Answer:
MOSIX can take advantage of idle workstations (when no one is logged in),
with the option that upon a login, all MOSIX processes are moved out and the
MOSIX activities are stopped.
- In the login script add the commands:
> mosctl block
> mosctl expel &
The "mosctl block" command prevents new remote processes from migrating
to that workstation.
The "mosctl expel &" command moves out MOSIX guest processes.
Note that an & is used after the expel command, since
expelling processes may take some time and we don't want the user login
process to hang. The processes are expelled while the user logs in.
- On logout, run the command:
> mosctl noblock
This command allows remote processes to migrate to the workstation.
On a Debian system using GDM, the appropriate file in which to add this
command is /etc/gdm/PostSession/Default .
Note that when adding the mosctl commands to the GDM script you shouldn't
interfere with the correct operation of GDM.
32-bit and 64-bit applications
Question:
How do I inform MOSIX whether I use 32-bit or 64-bit systems
Answer:
No need to do so - the MOSIX installation script will automatically
detect the type of system that you have and install the appropriate binaries.
Question:
Can I mix 32-bit and 64-bit nodes in the same cluster
Answer:
Yes you can, but performance can be better if you set the 32-bit and
the 64-bit nodes as separate clusters in a multi-cluster configuration.
Question:
Can I run 32-bit programs on 64-bit nodes
Answer:
Yes, 32-bit programs can migrate to 64-bit nodes (and even start there),
but the home-node of 32-bit programs must be on a 32-bit node.
Thus, if
you want to run 32-bit programs on predominantly 64-bit cluster(s), you
may consider leaving aside a few 32-bit nodes as part of your cluster
and/or multi-cluster, from where you can start 32-bit programs.
Question:
Can I run 64-bit programs on 32-bit nodes
Answer:
No, the hardware does not support it
(and even when it does, the 32-bit Linux kernel doesn't).
Question:
Can I have MOSIX running under a 64-bit kernel, but a 32-bit Linux installation, utilities and libraries (because it is so much easier to upgrade only the kernel)
Answer:
No. While Linux allows this combination, the current version of MOSIX
(in both its 32-bit and 64-bit variants) does not yet support this
option, so MOSIX will fail to start.
Question:
What happens if I attempt to run a 32-bit executable from a 64-bit node
Answer:
It will run correctly for the sake of transparency, but as a "native"
Linux process, so the program will not be able to migrate or use special
MOSIX features (and neither will its child processes, even if they later
execute a 64-bit binary).
Question:
If a child process is spawned from a parent, must they migrate together
Answer:
No. Each process is managed independently.
Question:
Why is shared-memory not supported
Answer:
Because it is not scalable,
i.e., it is impossible to change the contents of a memory in one
node and expect that the same change will be reflected instantly
in the memory of the remaining nodes (with which memory is shared),
e.g., as in a multi-core.
Question:
How to run a threaded application
Answer:
Threads are created by the "clone" system-call with the "CLONE_VM" flag,
which makes them share memory, and thus threaded applications are not
suitable for distributed-memory architectures.
In MOSIX it is possible to run threaded applications as standard Linux
processes. Such applications can not be migrated, but can still benefit
from MOSIX features such as queuing and best initial-assignment.
To launch threaded applications use "mosrun -E".
Question:
How to run a script where one of the commands is a threaded application
Answer:
By using the "native" utility in your script:
> native {threaded_program} [program-args]...
Question:
Must all migratable executables be started under "mosrun"
Answer:
To be migratable, either the executables, or the shell (or other program)
that called them must be run under "mosrun". Once a shell runs under
"mosrun", all its descendants will also be under "mosrun"
(but there is a way to request explicitly that a particular child
will NOT run under "mosrun").
Question:
Are there any limitations on I/O that can be performed by migrated processes
Answer:
Usually, remote I/O done by migrated processes on remote nodes
is performed via the respective home-node of each process.
While this does not limit the allowed operations, it may slow-down
such processes. Thus, if the amount of I/O is significant, it will
often cause the process to migrate back to its home-node.
Note that the amount and frequency of I/O is taken into account and
weighted against other considerations in making such a decision.
The direct-communication (migratable socket) feature can reduce this
slowdown effect for I/O between communicating processes.
Question:
Which IPC mechanism should be used between processes to get the best performance
Answer:
The most efficient mechanism is the direct-communication, see the next
questions.
Otherwise, MOSIX is not different from Linux:
depending on the particular needs of the process,
whatever approach (other than shared-memory) that is best in Linux
is best on MOSIX. It could be pipes, SYSV-messages, UNIX-sockets,
TCP-sockets or files.
Obviously, files can be slow, since they usually require writing to
a physically-moving surface and/or networking. On the other hand,
Linux has very good caching mechanisms for local files.
Question:
Does MOSIX support migratable sockets
Answer:
Yes, direct communication provides an effective migratable socket between
migrated processes.
Question:
How can direct communication improve the performance of communicating processes
Answer:
Normally, MOSIX processes do all their I/O and (most) system-calls via
their respective home-nodes.
This can be slow because operations are limited by the network speed and
latency.
Direct communication allows migrated processes to exchange messages
directly with each other, bypassing their home-nodes.
Question:
How to run MATLAB Version 7.4 (or older) jobs in MOSIX
Answer:
Jobs running MATLAB Version 7.4 (or older) can automatically migrate
among nodes of a cluster/multi-cluster.
First, tune MATLAB to MOSIX by the following 3 steps:
- Find where MATLAB is installed on your system by
> which matlab
/usr/local/bin/matlab
- Copy the matlab script to another location
> cp /usr/local/bin/matlab /tmp/mos-matlab
- Comment-out the following 2 lines in the mos-matlab script:
LD_ASSUME_KERNEL=2.4.1
export LD_ASSUME_KERNEL
The result should be:
#LD_ASSUME_KERNEL=2.4.1
#export LD_ASSUME_KERNEL
You can now run MATLAB jobs in a cluster/multi-cluster using mosrun.
Example: to run the following MATLAB test.m program:
a=randn(3000);
b=svd(a);
use:
> mosrun -e mos-matlab -nojvm -nodesktop -nodisplay < test.m
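The comment-out step above can also be scripted. The following is a minimal sketch, assuming the copied script contains the two LD_ASSUME_KERNEL lines in the form shown above (possibly indented). The function name comment_out_ld_assume_kernel is hypothetical and is not part of MOSIX or MATLAB.

```shell
# Hypothetical helper (not part of MOSIX): comment out the two
# LD_ASSUME_KERNEL lines in a copy of the matlab start-up script.
comment_out_ld_assume_kernel() {
    # $1 = path of the copied script, e.g. /tmp/mos-matlab
    # Prefix "#" to lines that assign or export LD_ASSUME_KERNEL.
    # Lines that are already commented out do not match and stay as they are.
    sed -e 's/^\([[:space:]]*\)\(LD_ASSUME_KERNEL=.*\)$/\1#\2/' \
        -e 's/^\([[:space:]]*\)\(export[[:space:]][[:space:]]*LD_ASSUME_KERNEL.*\)$/\1#\2/' \
        "$1" > "$1.tmp" &&
    # Write back with cat (not mv) so the script keeps its permissions.
    cat "$1.tmp" > "$1" &&
    rm -f "$1.tmp"
}
```

For example, "comment_out_ld_assume_kernel /tmp/mos-matlab" edits the copy made in the second step and leaves the original matlab script untouched.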
Question:
How to run MATLAB Version 7.5 (or newer) jobs
Answer:
MATLAB Version 7.5 (or newer) applications use a library which
uses threads (the "CLONE_VM" flag of the clone system-call) incorrectly.
To overcome this problem we added the -i flag to mosrun,
which should be used together with the -E flag.
This means that MATLAB jobs can be queued and assigned by MOSIX to nodes
as regular Linux processes, but they can't migrate afterwards.
The MOSIX version should be at least MOSIX-2.24.0.0
and jobs should be started by:
> mosrun -E -b -i matlab ....
MOSIX will assign each job to the best node in the local cluster.
Example: to run the following MATLAB test.m program:
a=randn(3000);
b=svd(a);
use:
> mosrun -E -b -i matlab -nojvm -nodesktop -nodisplay < test.m
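On sites with several MATLAB installations, the choice between the two launch styles can be made by version number. The following is a minimal sketch, assuming the version is given as "major.minor" (e.g. 7.4 or 7.10) and that the tuned copy for older versions is called mos-matlab, as in the previous answer. The function name matlab_mosrun_prefix is hypothetical.

```shell
# Hypothetical helper: print the mosrun prefix that matches a MATLAB version.
# Assumption: the version string has the form "major.minor[.more]".
matlab_mosrun_prefix() {
    major=${1%%.*}        # text before the first dot
    rest=${1#*.}          # text after the first dot
    minor=${rest%%.*}     # first component of the remainder
    if [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 5 ]; }; then
        # 7.5 or newer: queued and assigned, but not migratable
        echo "mosrun -E -b -i matlab"
    else
        # 7.4 or older: migratable, using the tuned copy of the script
        echo "mosrun -e mos-matlab"
    fi
}
```

The components are compared as integers, so 7.10 is treated as newer than 7.5. The printed prefix is then followed by the usual MATLAB arguments, e.g. "-nojvm -nodesktop -nodisplay < test.m".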
Question:
How to run JAVA programs
Answer:
JAVA uses (shared-memory) threads (the "CLONE_VM" flag of the clone
system-call), which are not suitable for distributed-memory architectures
(clusters).
This means that JAVA jobs can be queued and assigned by MOSIX to nodes
only as regular Linux processes (with the "mosrun -E" flag).
A JAVA job should be started by:
> mosrun -E -b java job
MOSIX will assign each job to the best node in the local cluster.
Question:
Can MOSIX migrate MPI processes
Answer:
Yes.
MPI allocates processes to slave nodes of a cluster in a Round-Robin fashion,
without checking the state of the resources, e.g. speed, current load and
available memory.
Process migration can improve the performance by load-balancing, by
migration of processes from slower to faster nodes and to nodes with
sufficient free memory, as well as by migration of MPI processes to
nodes in remote clusters, which are not part of the user's cluster.
Question:
What is HUGI
Answer:
HUGI is a campus cloud with 16 MOSIX clusters.
Most clusters are private. They are made of production servers
that belong to research groups in various departments.
Four clusters are made of workstations in student labs.
Processes of users are allowed to migrate to idle workstations
and among nodes in private clusters, subject to the priorities
among the different groups.
For example, since the workstations belong to the CS department,
processes that are started in a CS private cluster have a higher
priority to move to a workstation than already running processes
from the Chemistry cluster.
Due to the increased computing demands by our researchers,
the amount of installed memory in the workstations was increased
(beyond the needs of the students), to allow large guest processes
from the private clusters to run in these workstations.
Question:
How is HUGI managed
Answer:
All nodes in the HUGI cloud are diskless:
they do not rely on local disks for booting and running.
Local disks may be used for temporary storage.
User-files and home directories are located on central NFS servers.
Question:
What are the rules and policies for running applications on HUGI
Answer:
Rules:
- Users are requested to login and start their jobs in the private
cluster of their group.
- Remote logins to the student workstations are not permitted.
- Users that submit jobs with a large number of sequential
processes are requested to use either the -q or the -S queuing
options of mosrun.
- Users are requested to use the -m parameter of mosrun, predicting
the amount of memory they will require.
Policies:
- All the workstations are rebooted every night.
- Before rebooting a workstation, all guest processes are moved out.
Those processes can move to other nodes in HUGI; if no other
nodes are available, they are frozen in the home node.
- Processes will automatically migrate to the best available cloud
nodes (subject to the priority of their home cluster).
- Students have the highest priority over their workstations. Whenever a
student logs in, all guest processes are moved out from that workstation.
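As an illustration of the queuing and memory rules above, a batch of sequential jobs could be submitted through the queue with a memory estimate. The following is a minimal sketch, assuming that -m takes the estimate in megabytes directly after the flag (check the mosrun manual of your MOSIX version before relying on this). The function name print_queued_jobs, the program ./myprog and the input file names are hypothetical. The function only prints the command lines, so they can be reviewed before being run.

```shell
# Hypothetical helper: print one queued mosrun command per input file.
# Assumption: "-m<MB>" is the memory-estimate syntax; verify with "man mosrun".
print_queued_jobs() {
    mem_mb=$1      # predicted memory per process, in MB (assumed unit)
    prog=$2        # program to run (hypothetical)
    shift 2
    for input in "$@"; do
        # -q: place the job in the MOSIX queue instead of starting it at once
        echo "mosrun -q -m$mem_mb $prog $input"
    done
}
```

For example, "print_queued_jobs 500 ./myprog a.dat b.dat" prints two command lines, one per input file. Once they have been checked, the output can be piped to a shell to submit the jobs.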
Question:
Who is responsible for allocating freeze space
Answer:
Cluster owners should designate sufficient freeze space for
processes originating from their cluster.