midja****@tsuna*****
Thu Jan 26 15:43:43 JST 2012
My name is Matsuura, and this is my first post to the list.

I am evaluating a cluster built on Heartbeat + DRBD and expect it to fail over as follows:

(1) node01 and node02 each send pings to the router (192.168.0.1); if there is no response, the resources on the master node are switched over to the slave node.
(2) When the Heartbeat service on the master node is stopped, or the node is shut down, the resources are switched over to the slave node.

<Problem>
For (1), I have confirmed that failover works.
For (2), the DRBD resource on the slave node is promoted to master for a moment but immediately reverts to slave, so the switchover does not complete.

<Environment>
CentOS5.6 64bit
Pacemaker 1.0.11
Heartbeat 3.0.5
drbd 8.3.8

Pacemaker configuration
-------------------------------------------------------------------------------------------
property $id="cib-bootstrap-options" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    startup-fencing="false" \
    dc-deadtime="20s"
rsc_defaults $id="rsc-options" \
    resource-stickiness="INFINITY" \
    migration-threshold="1"
primitive res_drbd0 ocf:linbit:drbd \
    params \
        drbd_resource="r0" \
        drbdconf="/etc/drbd.conf" \
    op start interval="0s" timeout="240s" on-fail="restart" \
    op monitor interval="11s" timeout="60s" on-fail="restart" \
    op monitor interval="10s" timeout="60s" on-fail="restart" role="Master" \
    op stop interval="0s" timeout="100s" on-fail="block"
primitive res_fs_drbd0 ocf:heartbeat:Filesystem \
    params \
        device="/dev/drbd0" \
        directory="/chroot" \
        fstype="ext3" \
    op start interval="0s" timeout="60s" on-fail="restart" \
    op monitor interval="10s" timeout="60s" on-fail="restart" \
    op stop interval="0s" timeout="60s" on-fail="block"
primitive res_vip ocf:heartbeat:IPaddr2 \
    params \
        nic="eth0" \
        ip="192.168.0.111" \
        cidr_netmask="24" \
    op start interval="0s" timeout="90s" on-fail="restart" \
    op monitor interval="10s" timeout="60s" on-fail="restart" \
    op stop interval="0s" timeout="100s" on-fail="block"
primitive res_ping ocf:pacemaker:pingd \
    params \
        name="default_ping_set" \
        host_list="192.168.0.1" \
        multiplier="100" \
        dampen="0" \
    meta \
        migration-threshold="10" \
    op start interval="0" timeout="90s" on-fail="restart" \
    op monitor interval="10s" timeout="60s" on-fail="restart" \
    op stop interval="0" timeout="100s" on-fail="ignore"
group rg_drbd \
    res_vip res_fs_drbd0
ms ms_drbd0 res_drbd0 \
    meta \
        master-max="1" \
        master-node-max="1" \
        clone-max="2" \
        clone-node-max="1" \
        notify="true"
clone cl_ping res_ping \
    meta \
        clone-max="2" \
        clone-node-max="1"
location loc_rg_drbd rg_drbd \
    rule 200: #uname eq node01 \
    rule 100: #uname eq node02 \
    rule -INFINITY: defined default_ping_set and default_ping_set lt 100
location loc_ms_drbd0 ms_drbd0 \
    rule 200: #uname eq node01 \
    rule 100: #uname eq node02 \
    rule role=master -INFINITY: defined default_ping_set and default_ping_set lt 100
colocation rg_on_drbd inf: rg_drbd ms_drbd0:Master
colocation cl_ping_col 1000: rg_drbd cl_ping
order ord_rg_aft_drbd inf: ms_drbd0:promote rg_drbd:start
-------------------------------------------------------------------------------------------

drbd.conf
-------------------------------------------------------------------------------------------
global {
    usage-count yes;
}
common {
    syncer {
        rate 250M;
    }
}
resource r0 {
    protocol C;
    startup {
        degr-wfc-timeout 120;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "qawsedrftgyhujiko";
    }
    on node01 {
        device /dev/drbd0;
        disk /dev/sda8;
        address 192.168.100.1:7789;
        meta-disk internal;
    }
    on node02 {
        device /dev/drbd0;
        disk /dev/sda8;
        address 192.168.100.2:7789;
        meta-disk internal;
    }
}
-------------------------------------------------------------------------------------------

/var/log/messages when the Heartbeat service is stopped
-------------------------------------------------------------------------------------------
Jan 26 14:49:38 node01 kernel: block drbd0: role( Primary -> Secondary )
Jan 26 14:49:43 node01 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Jan 26 14:49:43 node01 kernel: block drbd0: short read expecting header on sock: r=-512
Jan 26 14:49:43 node01 kernel: block drbd0: meta connection shut down by peer.
Jan 26 14:49:43 node01 kernel: block drbd0: asender terminated
Jan 26 14:49:43 node01 kernel: block drbd0: Terminating asender thread
Jan 26 14:49:43 node01 kernel: block drbd0: Connection closed
Jan 26 14:49:43 node01 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jan 26 14:49:43 node01 kernel: block drbd0: receiver terminated
Jan 26 14:49:43 node01 kernel: block drbd0: Terminating receiver thread
Jan 26 14:49:43 node01 kernel: block drbd0: disk( UpToDate -> Diskless )
Jan 26 14:49:43 node01 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Jan 26 14:49:43 node01 kernel: block drbd0: worker terminated
Jan 26 14:49:43 node01 kernel: block drbd0: Terminating worker thread
-------------------------------------------------------------------------------------------
Jan 26 14:49:38 node02 kernel: block drbd0: peer( Primary -> Secondary )
Jan 26 14:49:43 node02 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Jan 26 14:49:43 node02 kernel: block drbd0: asender terminated
Jan 26 14:49:43 node02 kernel: block drbd0: Terminating asender thread
Jan 26 14:49:43 node02 kernel: block drbd0: Connection closed
Jan 26 14:49:43 node02 kernel: block drbd0: conn( TearDown -> Unconnected )
Jan 26 14:49:43 node02 kernel: block drbd0: receiver terminated
Jan 26 14:49:43 node02 kernel: block drbd0: Restarting receiver thread
Jan 26 14:49:43 node02 kernel: block drbd0: receiver (re)started
Jan 26 14:49:43 node02 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jan 26 14:49:45 node02 kernel: block drbd0: role( Secondary -> Primary )
Jan 26 14:49:45 node02 kernel: block drbd0: Creating new current UUID
Jan 26 14:49:45 node02 kernel: block drbd0: State change failed: Need access to UpToDate data
Jan 26 14:49:45 node02 kernel: block drbd0: state = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Jan 26 14:49:45 node02 kernel: block drbd0: wanted = { cs:WFConnection ro:Primary/Unknown ds:Outdated/DUnknown r--- }
Jan 26 14:49:46 node02 kernel: block drbd0: role( Primary -> Secondary )
Jan 26 14:49:46 node02 kernel: block drbd0: disk( UpToDate -> Outdated )
-------------------------------------------------------------------------------------------

I am sorry to trouble you, but could anyone advise me on this?
Thank you in advance.