MOSIX is a cluster management system, also known as a "cluster operating
system", targeted at High-Performance Computing (HPC) on x86 Linux clusters,
multi-clusters and Clouds.
When combined with VCL (see below), it can also be used to manage GPU clusters.
MOSIX supports both interactive concurrent processes and batch jobs.
It provides a single-system image, incorporating automatic resource
discovery and dynamic workload distribution by preemptive process migration.
MOSIX is implemented as a software layer that allows applications
to run on remote nodes as if they were running locally.
Users can start (sequential and parallel) applications on one node,
while MOSIX automatically seeks resources and transparently runs them
on other nodes.
There is no need to modify applications, copy files, log in or assign
processes to remote nodes: it is all done automatically.
Allocation of applications to nodes is supervised by a comprehensive
set of on-line algorithms that monitor the state of the resources and
attempt to improve the overall performance by dynamic resource allocation,
e.g. load-balancing.
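
As a rough illustration of dynamic load-balancing in general, the sketch below
repeatedly moves one process from the busiest node to the idlest one until the
loads are nearly even. It is a toy model only: the function name `rebalance`,
the `threshold` parameter and the node names are assumptions made for this
example, and it does not reproduce the actual MOSIX algorithms, which this
text does not describe.

```python
def rebalance(loads, threshold=1):
    """Toy greedy balancer (an illustration, not the MOSIX algorithm).

    `loads` maps a hypothetical node name to its number of runnable
    processes. One process at a time is moved from the busiest node to
    the idlest node until their loads differ by at most `threshold`.
    Returns the balanced loads and the list of (source, target) moves.
    """
    if threshold < 1:
        # With a threshold below 1 a single process could bounce forever.
        raise ValueError("threshold must be at least 1")
    loads = dict(loads)  # work on a copy, leave the caller's data intact
    migrations = []
    if not loads:
        return loads, migrations
    while True:
        busiest = max(loads, key=loads.get)
        idlest = min(loads, key=loads.get)
        if loads[busiest] - loads[idlest] <= threshold:
            return loads, migrations
        loads[busiest] -= 1
        loads[idlest] += 1
        migrations.append((busiest, idlest))
```

Calling `rebalance({"node1": 6, "node2": 0, "node3": 0})` spreads the six
processes evenly over the three nodes. A real system would also have to weigh
node speed, memory and migration cost, which this sketch ignores.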
A unique feature of MOSIX is that it operates at the process level,
unlike systems that operate at the job level. This means that
MOSIX adapts and redistributes the workload whenever the number
of processes of a job (and/or their demands) changes (through "fork" and
"exit"). This is especially useful for parallel jobs.
The latest version of MOSIX
can manage clusters, multi-clusters and Clouds.
Flexible management allows owners of different
clusters to share their computational resources, while still
preserving the autonomy to disconnect their clusters
at any time, without disrupting already running programs.
A MOSIX multi-cluster can extend indefinitely as long as there is trust
between the owners of its clusters.
MOSIX can run in non-virtualised or Virtual Machine (VM) environments.
A non-virtualised environment requires patching the Linux kernel and
provides better performance,
whereas a VM can run on top of unmodified Linux or Windows.
MOSIX is most suitable for running HPC applications
with a low to moderate amount of I/O.
It is particularly suitable for:
efficient utilization of cluster-wide resources;
running applications with unpredictable resource
requirements or run times;
running (and preserving) long processes;
and combining nodes of different speeds.