As part of a MySQL Cluster setup, I recently set up a 2-node web cluster using CentOS's native cluster software suite with a twist: the web root was mounted on a DRBD partition instead of relying on periodic file syncing. This article focuses on the cluster setup and does not cover the DRBD setup/configuration. Let's go:
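For reference, everything that follows assumes a dual-primary DRBD resource named r0 exposing /dev/drbd1. A minimal sketch of such a resource is shown below; the backing disk, port, and addresses are assumptions and will differ in your environment:
======================================================
resource r0 {
  protocol C;
  net {
    allow-two-primaries;               # both nodes must be able to hold the primary role for GFS
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
  }
  on WEB_Node1 {
    device    /dev/drbd1;
    disk      /dev/sdb1;               # placeholder backing device
    address   10.255.255.225:7789;     # placeholder address/port
    meta-disk internal;
  }
  on WEB_Node2 {
    device    /dev/drbd1;
    disk      /dev/sdb1;               # placeholder backing device
    address   10.255.255.226:7789;     # placeholder address/port
    meta-disk internal;
  }
}
======================================================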
Install “Cluster Storage” group using yum:
[root@host]# yum groupinstall "Cluster Storage"
Edit /etc/cluster/cluster.conf:
======================================================
<?xml version="1.0"?>
<cluster name="drbd_srv" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="WEB_Node1" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="10.255.255.225"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="WEB_Node2" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="human" ipaddr="10.255.255.226"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
======================================================
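cluster.conf must be identical on both nodes, so copy it to the second node, e.g.:
[root@host]# scp /etc/cluster/cluster.conf WEB_Node2:/etc/cluster/cluster.conf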
Start the cluster:
[root@host]# service cman start (on both nodes)
To verify proper startup:
[root@host]# cman_tool nodes
Should show:
Node Sts Inc Joined Name
1 M 16 2009-08-11 22:13:27 WEB_Node1
2 M 24 2009-08-11 22:13:34 WEB_Node2
Status 'M' means the node is a member (normal); 'X' would mean there is a problem.
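For a more detailed view (quorum state, votes, cluster generation), cman_tool also has a status sub-command:
[root@host]# cman_tool status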
Edit /etc/lvm/lvm.conf
change:
locking_type = 1
to:
locking_type = 3
and change:
filter = [ "a/.*/" ]
to:
filter = [ "a|drbd.*|", "r|.*|" ]
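After the edits, the relevant portion of /etc/lvm/lvm.conf should look like this:
======================================================
# use built-in cluster-wide locking via clvmd
locking_type = 3
# only scan the drbd device for PVs, reject everything else
filter = [ "a|drbd.*|", "r|.*|" ]
======================================================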
Start clvmd:
[root@host]# service clvmd start (on both nodes)
Set cman and clvmd to start on bootup on both nodes:
[root@host]# chkconfig --level 345 cman on
[root@host]# chkconfig --level 345 clvmd on
Run vgscan:
[root@host]# vgscan
Create a new PV (physical volume) using the drbd block device
[root@host]# pvcreate /dev/drbd1
Create a new VG (volume group) using the drbd block device
[root@host]# vgcreate VolGroup01 /dev/drbd1 (name it whatever you like; VolGroup01 is used throughout this article)
Now when you run vgdisplay, you should see:
— Volume group —
VG Name VolGroup01
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 1
VG Access read/write
VG Status resizable
Clustered yes
Shared no
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 114.81 GB
PE Size 4.00 MB
Total PE 29391
Alloc PE / Size 0 / 0
Free PE / Size 29391 / 114.81 GB
VG UUID k9TBBF-xdg7-as4a-2F0c-XGTv-M2Wh-CabVXZ
Notice the line that reads "Clustered yes".
Create an LV (logical volume) in the VG that we just created.
** Make sure both drbd nodes are in the primary role before doing this.
If there is a "Split-Brain detected, dropping connection!" entry in /var/log/messages, then a manual split-brain recovery is necessary.
To manually recover from a split-brain scenario (on the split-brain machine):
[root@host]# drbdadm secondary r0
[root@host]# drbdadm -- --discard-my-data connect r0
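On the other (surviving) node, if it has also dropped the connection and shows cs:StandAlone, tell it to reconnect; once /proc/drbd shows the resync has finished, promote the recovered node back to primary. A short sketch, assuming the resource is named r0 as above:
[root@host]# drbdadm connect r0 (on the surviving node, only if it shows cs:StandAlone)
[root@host]# drbdadm primary r0 (on the recovered node, after the resync completes)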
Once you have verified that both nodes are in primary mode:
On node2:
[root@host]# service clvmd restart
On node1:
[root@host]# lvcreate -l 100%FREE -n gfs VolGroup01
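Since the VG is clustered, the new LV should immediately be visible on the other node as well; a quick sanity check:
[root@host]# lvs VolGroup01 (on both nodes)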
Format the LV with gfs:
[root@host]# mkfs.gfs -p lock_dlm -t drbd_srv:www -j 2 /dev/VolGroup01/gfs (drbd_srv is the cluster name, www is the locking table name, and -j 2 creates one journal per node)
Start the gfs service (on both nodes):
[root@host]# service gfs start
Set the gfs service to start automatically (on both nodes):
[root@host]# chkconfig --level 345 gfs on
Mount the filesystem (on both nodes):
[root@host]# mount -t gfs /dev/VolGroup01/gfs /srv
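The gfs init script mounts GFS filesystems listed in /etc/fstab, so to have the mount come back after a reboot, add an entry along these lines (the options shown are just defaults; tune as needed):
======================================================
/dev/VolGroup01/gfs    /srv    gfs    defaults    0 0
======================================================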
Modify the startup sequence as follows:
1) network
…
2) drbd (S15)
3) cman (S21)
4) clvmd (S24)
5) gfs (S26)
Remove the soft link for shutting down openais (K20openais in runlevel 3 for our install) because shutting down cman does the same.
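For example (the exact link name and runlevel directory may differ on your install):
[root@host]# rm /etc/rc3.d/K20openais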
Modify the shutdown sequence as follows (the reverse of the startup sequence):
1) gfs (K21)
2) clvmd (K22)
3) cman (K23)
4) drbd (K24)
…
5) network
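A quick way to confirm the resulting order is to list the runlevel 3 links; the S/K numbers above are from our install and may differ slightly on yours:
[root@host]# ls /etc/rc3.d/ | egrep 'network|drbd|cman|clvmd|gfs'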
Reboot each server to verify that the gfs filesystem comes back automatically after a reboot.
Troubleshooting:
If one node's drbd status (cat /proc/drbd) is showing:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17
1: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown r---
ns:0 nr:0 dw:40 dr:845 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 oos:12288
and the other node is showing:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17
1: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
ns:0 nr:0 dw:56 dr:645 al:3 bm:2 lo:0 pe:0 ua:0 ap:0 oos:8200
(the node showing cs:WFConnection should also still have the gfs filesystem mounted),
then there is a drbd sync issue. To solve it, restart the cluster services on the node showing cs:StandAlone in its status:
1) service clvmd stop
2) service cman stop
3) service drbd restart
Verify you see Primary/Primary in the st: section of the drbd status (cat /proc/drbd)
4) service cman start
5) service clvmd start
6) service gfs start
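Once everything is back up, both nodes should show cs:Connected and st:Primary/Primary in /proc/drbd, and the GFS filesystem should be mounted on both:
[root@host]# cat /proc/drbd
[root@host]# mount | grep gfs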