MOSIX is a cluster operating system that provides a single-system image.
When combined with VirtualCL (see below), it can also be used to
manage a cluster with many accelerator devices.
MOSIX supports both interactive concurrent processes and batch jobs.
It incorporates automatic resource discovery and dynamic workload
distribution by preemptive process migration.
MOSIX is implemented as a software layer that allows applications
to run on remote nodes as if they were running locally.
Users can start (sequential and parallel) applications on one node,
while MOSIX automatically seeks resources and transparently runs them
on other nodes.
There is no need to modify applications, copy files, log in or assign
processes to remote nodes; it is all done automatically.
Allocation of applications to nodes is supervised by a comprehensive
set of on-line algorithms that monitor the state of the resources and
attempt to improve the overall performance by dynamic resource allocation,
e.g. load-balancing.
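
As an illustration of what load-balancing by process migration means, here is
a minimal Python sketch. It is not the algorithm MOSIX uses: the Node class,
the speed-normalized load measure and the greedy balance_step rule are all
assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a cluster node (not a MOSIX data structure)."""
    name: str
    speed: float                                    # relative CPU speed (assumed unit)
    processes: list = field(default_factory=list)   # CPU demand of each process

    def load(self) -> float:
        # Speed-normalized load: total demand divided by node speed.
        return sum(self.processes) / self.speed


def balance_step(nodes):
    """Migrate one process from the most loaded to the least loaded node,
    but only if both nodes end up below the source's previous load.
    Returns True if a migration happened."""
    src = max(nodes, key=Node.load)
    dst = min(nodes, key=Node.load)
    if src is dst or not src.processes:
        return False
    before = src.load()
    demand = min(src.processes)                     # try the smallest process first
    new_src = (sum(src.processes) - demand) / src.speed
    new_dst = (sum(dst.processes) + demand) / dst.speed
    if max(new_src, new_dst) >= before:
        return False                                # migration would not help
    src.processes.remove(demand)
    dst.processes.append(demand)
    return True


def balance(nodes, max_steps=100):
    """Repeat balance_step until nothing improves; return the migration count."""
    steps = 0
    while steps < max_steps and balance_step(nodes):
        steps += 1
    return steps


# Four equal processes start on a slow node; a node twice as fast is idle.
slow = Node("slow", 1.0, [1, 1, 1, 1])
fast = Node("fast", 2.0, [])
migrations = balance([slow, fast])
```

In this toy run three of the four processes end up on the faster node, which
shows how a speed-aware load measure can combine nodes of different speeds.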
A unique feature of MOSIX is that it operates at the process level,
unlike systems that operate at the job level. This means that
MOSIX adapts and redistributes the workload when the number
of processes in a job (and/or their demands) changes (through "fork" and
"exit"). This is especially useful for parallel jobs.
The latest version of MOSIX
can manage clusters, multi-clusters and Clouds.
Flexible management allows owners of different
clusters to share their computational resources, while still
preserving the autonomy to disconnect their clusters
at any time, without disrupting already running programs.
A MOSIX multi-cluster can extend indefinitely as long as there is trust
between the owners of its clusters.
MOSIX can run in non-virtualized or Virtual Machine (VM) environments.
A non-virtualized environment requires patching the Linux kernel and
provides better performance,
whereas a VM can run on top of unmodified Linux or Windows.
MOSIX is suitable for running HPC applications
with a low to moderate amount of I/O.
It is particularly suitable for:
efficient utilization of cluster-wide resources;
running applications with unpredictable resource
requirements or run times;
running (and preserving) long processes;
and combining nodes of different speeds.