Deploying Servers - Failover Clusters
Configure High Availability
Although you may be using servers with RAID, redundant power supplies, and redundant network cards, servers can still fail. Faulty memory, processors, motherboards, or other components can bring down a server, and operating system or application files on a server can become corrupted or deleted. If a server cannot be down for even a minute, you need to consider installing redundant servers using a failover cluster or network load balancing.
A cluster is a set of independent computers that work together to increase the availability of services and applications. Each server that makes up the cluster is known as a node. Clustering is most commonly used for back-end applications such as Exchange Server or SQL Server, file servers, print services, and network services such as DHCP servers.
If one of the nodes fails, the remaining nodes respond by redistributing the load among themselves. The 32-bit versions of Windows Server 2008 Enterprise and Windows Server 2008 Datacenter provide rapid, automatic failover with clusters of up to 8 nodes; clusters of up to 16 nodes are possible with the 64-bit versions.
Windows Server 2008 for Itanium-Based Systems supports up to eight nodes in a failover cluster. The most common failover cluster configuration is active-passive: one node is active, providing the network services and applications, while the other node is passive, waiting to become active. If the current active node goes down, the passive node becomes the new active node and takes over providing the network services and applications.
Another failover cluster configuration is active-active, in which the cluster has multiple resources that are shared among the cluster nodes, spreading the load: some resources run on one node and other resources run on the other node. If one of the nodes fails, all of its resources fail over to the remaining node.
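The failover behavior described above can be sketched in a few lines. This is a hypothetical illustration, not a real Windows API: the `Node` and `Cluster` classes and the resource names are invented to show how ownership of cluster resources moves from the failed active node to the passive node.

```python
# Toy model of active-passive failover: the active node owns all
# cluster resources; when it fails, the passive node takes them over
# and becomes the new active node. (Illustrative names only.)

class Node:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.resources = []

class Cluster:
    def __init__(self, active, passive):
        self.active = active
        self.passive = passive

    def assign(self, resource):
        # Resources are owned by the active node while it is healthy.
        self.active.resources.append(resource)

    def fail_over(self):
        # Move every resource to the passive node, which then
        # becomes the new active node.
        self.passive.resources.extend(self.active.resources)
        self.active.resources.clear()
        self.active, self.passive = self.passive, self.active

cluster = Cluster(Node("NODE1"), Node("NODE2"))
cluster.assign("File Share")
cluster.assign("Print Spooler")

cluster.active.healthy = False   # NODE1 fails
cluster.fail_over()
print(cluster.active.name, cluster.active.resources)
# NODE2 now provides both services
```

In an active-active cluster, the same `fail_over` step would apply only to the resources owned by the failed node, rather than to the whole set.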
Cluster nodes are kept aware of the status of the other cluster nodes and their services through the use of heartbeats. A heartbeat, sent at least every 500 ms over a dedicated network card, is used both to keep track of the status of each node and to send updates on the cluster's configuration.
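Heartbeat-based failure detection can be sketched as follows. The 500 ms interval comes from the text; the threshold of five missed heartbeats before declaring a node down is an assumed value for illustration, and the class name is invented.

```python
# Toy heartbeat monitor: a node records when it last heard from its
# peer over the dedicated heartbeat network. If too many heartbeat
# intervals pass in silence, the peer is declared down and failover
# can begin. (The missed-beat limit here is an assumption.)

HEARTBEAT_INTERVAL = 0.5   # seconds between heartbeats (per the text)
MISSED_LIMIT = 5           # missed beats before declaring failure

class HeartbeatMonitor:
    def __init__(self, now=0.0):
        self.last_seen = now

    def beat(self, now):
        # Called whenever a heartbeat arrives from the peer node.
        self.last_seen = now

    def peer_is_down(self, now):
        # Down once the silence exceeds MISSED_LIMIT intervals.
        return (now - self.last_seen) > HEARTBEAT_INTERVAL * MISSED_LIMIT

monitor = HeartbeatMonitor(now=0.0)
monitor.beat(now=1.0)
print(monitor.peer_is_down(now=2.0))   # 1 s of silence -> False
print(monitor.peer_is_down(now=10.0))  # 9 s of silence -> True
```

A real cluster service also has to distinguish a failed node from a failed heartbeat network, which is one reason the witness (quorum) disk described later matters.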
Typically, network resources are assigned to a cluster and can be enabled or disabled as a node becomes active or inactive. Some of the terms that you need to understand include:
- Cluster resource: A network application, service, or hardware device (including network adapters and storage systems) that is defined and managed by the cluster service.
- Cluster resource group: A collection of cluster resources managed as a unit. When a cluster resource group fails and the cluster service cannot automatically restart it, the entire cluster resource group is placed in an offline status and failed over to another node.
- Cluster virtual server: A cluster resource group that has a network name and IP address assigned to it. Cluster virtual servers are then accessed by their NetBIOS name, DNS name, or IP address.
For example, in an active-passive cluster, you assign a different NetBIOS name, DNS name, and IP address to each of the two nodes so that each node can be addressed individually. You then define a cluster virtual server that represents the cluster as a whole and receives a third NetBIOS name, DNS name, and IP address. Other resources shared between the two nodes are defined as cluster resources and assigned to a cluster resource group. When a node is active, it has control of those cluster resources; when that node goes down, the missing heartbeat shows that it is not accessible, and the second node becomes active, taking over the cluster resources.

To keep track of the cluster configuration, the cluster uses a witness disk, also known as a quorum disk, to hold the cluster configuration database. This database defines the nodes participating in the cluster, the applications and services defined within each cluster resource group, and the status of each node and cluster resource. Because the witness disk has to be shared by all the nodes, it is typically located on shared storage, such as a SAN or a shared SCSI array.
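The relationship between these terms can be shown as a small data model. All of the names below (node names, the `contoso.com` DNS name, the IP address) are invented examples, and the `quorum` dictionary is only a loose stand-in for the cluster configuration database on the witness disk.

```python
# Toy data model: a cluster virtual server is a resource group that
# also carries a NetBIOS name, DNS name, and IP address. The quorum
# record mirrors what the witness disk tracks: participating nodes,
# resource groups, and current ownership. (All names are examples.)

from dataclasses import dataclass, field

@dataclass
class ResourceGroup:
    resources: list = field(default_factory=list)
    owner: str = "NODE1"        # node currently controlling the group
    online: bool = True

@dataclass
class VirtualServer(ResourceGroup):
    netbios_name: str = "CLUSTER1"
    dns_name: str = "cluster1.contoso.com"
    ip_address: str = "192.168.1.50"

vs = VirtualServer(resources=["File Share", "Disk Q:"])
quorum = {                      # stand-in for the witness disk database
    "nodes": ["NODE1", "NODE2"],
    "groups": {vs.netbios_name: vs.resources},
    "owner": vs.owner,
}

# On failover, the surviving node takes ownership of the group, but
# clients keep using the same virtual server name and IP address.
vs.owner = "NODE2"
quorum["owner"] = vs.owner
print(quorum["owner"])  # NODE2
```

The key point the model illustrates is that clients address the virtual server, not an individual node, so ownership can change underneath them without changing the name or address they connect to.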