[Linux-ha-jp] DRBD: failover does not complete on shutdown

midja****@tsuna*****
Thu, 26 Jan 2012 15:43:43 JST


My name is Matsuura.
This is my first post to this list.

I am currently evaluating a cluster built on Heartbeat + DRBD,
and I expect failover to behave as described below.

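(1) node01 and node02 each send PING to the router (192.168.0.1);
    if there is no reply, the resources on the master node are switched over to the slave node.
(2) When the Heartbeat service on the master node is stopped, or the node is shut down,
    the resources are switched over to the slave node.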

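<Problem>
For (1), I have confirmed that failover works.
For (2), the DRBD resource on the slave node is promoted to master for a moment,
but it is demoted back to slave right away, so the switchover does not complete.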

<Environment>
CentOS 5.6 64-bit
Pacemaker 1.0.11
Heartbeat 3.0.5
drbd      8.3.8

Pacemaker configuration
-------------------------------------------------------------------------------------------
property $id="cib-bootstrap-options" \
     cluster-infrastructure="openais" \
     expected-quorum-votes="2" \
     no-quorum-policy="ignore" \
     stonith-enabled="false" \
     startup-fencing="false" \
     dc-deadtime="20s"

rsc_defaults $id="rsc-options" \
     resource-stickiness="INFINITY" \
     migration-threshold="1"

primitive res_drbd0 ocf:linbit:drbd \
     params \
          drbd_resource="r0" \
          drbdconf="/etc/drbd.conf" \
     op start interval="0s" timeout="240s" on-fail="restart" \
     op monitor interval="11s" timeout="60s" on-fail="restart" \
     op monitor interval="10s" timeout="60s" on-fail="restart" role="Master" \
     op stop interval="0s" timeout="100s" on-fail="block"

primitive res_fs_drbd0 ocf:heartbeat:Filesystem \
     params \
          device="/dev/drbd0" \
          directory="/chroot" \
          fstype="ext3" \
     op start interval="0s" timeout="60s" on-fail="restart" \
     op monitor interval="10s" timeout="60s" on-fail="restart" \
     op stop interval="0s" timeout="60s" on-fail="block"

primitive res_vip ocf:heartbeat:IPaddr2 \
     params \
          nic="eth0" \
          ip="192.168.0.111" \
          cidr_netmask="24" \
     op start interval="0s" timeout="90s" on-fail="restart" \
     op monitor interval="10s" timeout="60s" on-fail="restart" \
     op stop interval="0s" timeout="100s" on-fail="block"

primitive res_ping ocf:pacemaker:pingd \
     params \
          name="default_ping_set" \
          host_list="192.168.0.1" \
          multiplier="100" \
          dampen="0" \
     meta \
          migration-threshold="10" \
     op start interval="0" timeout="90s" on-fail="restart" \
     op monitor interval="10s" timeout="60s" on-fail="restart" \
     op stop interval="0" timeout="100s" on-fail="ignore"

group rg_drbd \
     res_vip res_fs_drbd0

ms ms_drbd0 res_drbd0 \
     meta \
          master-max="1" \
          master-node-max="1" \
          clone-max="2" \
          clone-node-max="1" \
          notify="true"

clone cl_ping res_ping \
     meta \
          clone-max="2" \
          clone-node-max="1"

location loc_rg_drbd rg_drbd \
     rule 200: #uname eq node01 \
     rule 100: #uname eq node02 \
     rule -INFINITY: defined default_ping_set and default_ping_set lt 100

location loc_ms_drbd0 ms_drbd0 \
     rule 200: #uname eq node01 \
     rule 100: #uname eq node02 \
     rule role=master -INFINITY: defined default_ping_set and default_ping_set lt 100

colocation rg_on_drbd inf: rg_drbd ms_drbd0:Master
colocation cl_ping_col 1000: rg_drbd cl_ping
order ord_rg_aft_drbd inf: ms_drbd0:promote rg_drbd:start
-------------------------------------------------------------------------------------------
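
To make the intent of the loc_rg_drbd rules above explicit, here is a minimal Python sketch of how I understand the scoring to work. It is only an illustration under assumptions: the function names add_scores and location_score are hypothetical, the value 1000000 for INFINITY and the "-INFINITY always wins" addition follow my reading of the Pacemaker documentation, and resource-stickiness and the colocation constraints are left out.

```python
INFINITY = 1_000_000  # assumed numeric value of Pacemaker's INFINITY


def add_scores(a, b):
    """Add two scores; -INFINITY wins over everything, then +INFINITY."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def location_score(uname, attrs):
    """Score of one node under the three rules of loc_rg_drbd."""
    score = 0
    if uname == "node01":
        score = add_scores(score, 200)
    if uname == "node02":
        score = add_scores(score, 100)
    # "defined default_ping_set and default_ping_set lt 100"
    ping = attrs.get("default_ping_set")  # None stands for "not defined"
    if ping is not None and ping < 100:
        score = add_scores(score, -INFINITY)
    return score


# With one host in host_list and multiplier="100", the attribute is
# 100 while the router answers and 0 when it does not.
print(location_score("node01", {"default_ping_set": 100}))
print(location_score("node02", {"default_ping_set": 100}))
print(location_score("node01", {"default_ping_set": 0}))
```

With the router reachable from both nodes, node01 is preferred over node02; once node01 loses the router, its score drops to -INFINITY and only node02 remains eligible, which matches the behavior I see in case (1).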

drbd.conf
-------------------------------------------------------------------------------------------
global { usage-count yes; }
common { syncer { rate 250M; } }
resource r0 {
    protocol C;
    startup {
         degr-wfc-timeout 120;
    }
    net {
         cram-hmac-alg sha1;
         shared-secret "qawsedrftgyhujiko";
    }
    on node01 {
         device    /dev/drbd0;
         disk      /dev/sda8;
         address   192.168.100.1:7789;
         meta-disk  internal;
    }
    on node02 {
         device    /dev/drbd0;
         disk      /dev/sda8;
         address   192.168.100.2:7789;
         meta-disk  internal;
    }
}
-------------------------------------------------------------------------------------------

/var/log/messages when the Heartbeat service is stopped
-------------------------------------------------------------------------------------------
Jan 26 14:49:38 node01 kernel: block drbd0: role( Primary -> Secondary )
Jan 26 14:49:43 node01 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Jan 26 14:49:43 node01 kernel: block drbd0: short read expecting header on sock: r=-512
Jan 26 14:49:43 node01 kernel: block drbd0: meta connection shut down by peer.
Jan 26 14:49:43 node01 kernel: block drbd0: asender terminated
Jan 26 14:49:43 node01 kernel: block drbd0: Terminating asender thread
Jan 26 14:49:43 node01 kernel: block drbd0: Connection closed
Jan 26 14:49:43 node01 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jan 26 14:49:43 node01 kernel: block drbd0: receiver terminated
Jan 26 14:49:43 node01 kernel: block drbd0: Terminating receiver thread
Jan 26 14:49:43 node01 kernel: block drbd0: disk( UpToDate -> Diskless )
Jan 26 14:49:43 node01 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Jan 26 14:49:43 node01 kernel: block drbd0: worker terminated
Jan 26 14:49:43 node01 kernel: block drbd0: Terminating worker thread
-------------------------------------------------------------------------------------------
Jan 26 14:49:38 node02 kernel: block drbd0: peer( Primary -> Secondary )
Jan 26 14:49:43 node02 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Jan 26 14:49:43 node02 kernel: block drbd0: asender terminated
Jan 26 14:49:43 node02 kernel: block drbd0: Terminating asender thread
Jan 26 14:49:43 node02 kernel: block drbd0: Connection closed
Jan 26 14:49:43 node02 kernel: block drbd0: conn( TearDown -> Unconnected )
Jan 26 14:49:43 node02 kernel: block drbd0: receiver terminated
Jan 26 14:49:43 node02 kernel: block drbd0: Restarting receiver thread
Jan 26 14:49:43 node02 kernel: block drbd0: receiver (re)started
Jan 26 14:49:43 node02 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jan 26 14:49:45 node02 kernel: block drbd0: role( Secondary -> Primary )
Jan 26 14:49:45 node02 kernel: block drbd0: Creating new current UUID
Jan 26 14:49:45 node02 kernel: block drbd0: State change failed: Need access to UpToDate data
Jan 26 14:49:45 node02 kernel: block drbd0:   state = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Jan 26 14:49:45 node02 kernel: block drbd0:  wanted = { cs:WFConnection ro:Primary/Unknown ds:Outdated/DUnknown r--- }
Jan 26 14:49:46 node02 kernel: block drbd0: role( Primary -> Secondary )
Jan 26 14:49:46 node02 kernel: block drbd0: disk( UpToDate -> Outdated )
-------------------------------------------------------------------------------------------
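
The order of events on node02 is easier to read when only the state transitions are pulled out of the log. Below is a minimal Python sketch, assuming the kernel message format shown above. The function name drbd_transitions and both regular expressions are my own illustration and are not part of DRBD or Pacemaker.

```python
import re

# Matches kernel lines such as:
#   "Jan 26 14:49:45 node02 kernel: block drbd0: role( Secondary -> Primary )"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"block\s+(?P<dev>\S+):\s+(?P<msg>.*)$"
)
# Matches each "field( old -> new )" group inside the message part.
CHANGE_RE = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def drbd_transitions(lines):
    """Return (timestamp, host, field, old, new) for every state change."""
    result = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a DRBD kernel line
        for field, old, new in CHANGE_RE.findall(m.group("msg")):
            result.append((m.group("ts"), m.group("host"), field, old, new))
    return result


# Four lines copied from the node02 log above.
sample = [
    "Jan 26 14:49:45 node02 kernel: block drbd0: role( Secondary -> Primary )",
    "Jan 26 14:49:45 node02 kernel: block drbd0: Creating new current UUID",
    "Jan 26 14:49:46 node02 kernel: block drbd0: role( Primary -> Secondary )",
    "Jan 26 14:49:46 node02 kernel: block drbd0: disk( UpToDate -> Outdated )",
]

for ts, host, field, old, new in drbd_transitions(sample):
    print(f"{ts} {host} {field}: {old} -> {new}")
```

Run against the node02 excerpt, this lists the promotion at 14:49:45 followed one second later by the demotion and the change of the local disk state to Outdated, which is the part I cannot explain.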

I am sorry to trouble you, but I would be grateful if someone could advise me on this.
Thank you in advance.




