mlus
mlus****@39596*****
Wed Jan 22 13:07:58 JST 2014
This is Koyama. Takatsuka-san, thank you for your reply.
Apologies in advance for the long message.

> I don't know the details for the case where the cluster layer is
> corosync, but speaking for heartbeat: once both nodes have started
> claiming to be DC and communication between them is then restored,
> I believe the split brain cannot be resolved without restarting at
> least one of the heartbeat processes.

As a test I unplugged and re-plugged the NIC on the host, but I was
not able to produce a pseudo split brain, so the situation is that
the test I had in mind has not actually been carried out. The test
ended up running with the HA daemons (pacemaker, corosync) left
running on both nodes.

For the case where the HA daemon on one side (here, the active side)
is restarted, with my current knowledge I have not been able to
restore the original state using crm commands alone, i.e. while
leaving the standby side's HA daemon running without a restart. My
question was poorly worded, but this is also what I wanted to know.

For the no-restart version, the test ran as follows, and I was able
to return to the original state.

******> Unplug the USB LAN adapter from host1

#crm_mon -rfA1

1:host1-------------------
Last updated: Wed Jan 22 11:45:27 2014
Last change: Wed Jan 22 11:32:29 2014 by root via cibadmin on host1
Current DC: host1 (2886926337) - partition with quorum
2 Nodes configured
4 Resources configured

Online: [ host2 host1 ]

Full list of resources:

 Resource Group: grp
     v_ip       (ocf::heartbeat:IPaddr2):       Started host2
     failmail   (ocf::heartbeat:MailTo):        Started host2
 Clone Set: clone_v_ping [v_ping]
     Started: [ host2 host1 ]

Node Attributes:
* Node host2:
    + pingcheck                       : 100
* Node host1:
    + pingcheck                       : 0

Migration summary:
* Node host1:
   v_ip: migration-threshold=1 fail-count=1 last-failure='Wed Jan 22 11:45:11 2014'
* Node host2:

Failed actions:
    v_ip_monitor_30000 on (null) 'unknown error' (1): c
----------------------------------------------

********> Plug host1's USB LAN adapter back in

2:host1-------------------
Last updated: Wed Jan 22 11:48:02 2014
Last change: Wed Jan 22 11:32:29 2014 by root via cibadmin on host1
Current DC: host1 (2886926337) - partition with quorum
2 Nodes configured
4 Resources configured

Online: [ host2 host1 ]

Full list of resources:

 Resource Group: grp
     v_ip       (ocf::heartbeat:IPaddr2):       Started host2
     failmail   (ocf::heartbeat:MailTo):        Started host2
 Clone Set: clone_v_ping [v_ping]
     Started: [ host2 host1 ]

Node Attributes:
* Node host2:
    + pingcheck                       : 100
* Node host1:
    + pingcheck                       : 100

Migration summary:
* Node host1:
   v_ip: migration-threshold=1 fail-count=1 last-failure='Wed Jan 22 11:45:11 2014'
* Node host2:

Failed actions:
    v_ip_monitor_30000 on (null) 'unknown error' (1): c

2:host2------------------------------------------
Last updated: Wed Jan 22 11:48:42 2014
Last change: Wed Jan 22 11:32:29 2014 by root via cibadmin on host1
Current DC: host1 (2886926337) - partition with quorum
2 Nodes configured
4 Resources configured

Online: [ host2 host1 ]

Full list of resources:

 Resource Group: grp
     v_ip       (ocf::heartbeat:IPaddr2):       Started host2
     failmail   (ocf::heartbeat:MailTo):        Started host2
 Clone Set: clone_v_ping [v_ping]
     Started: [ host2 host1 ]

Node Attributes:
* Node host2:
    + pingcheck                       : 100
* Node host1:
    + pingcheck                       : 100

Migration summary:
* Node host1:
   v_ip: migration-threshold=1 fail-count=1 last-failure='Wed Jan 22 11:45:11 2014'
* Node host2:

Failed actions:
    v_ip_monitor_30000 on host1 'unknown error' (1): call=56, status=complete, last-rc-change='Wed Jan 22 11:45:11 2014', queued=0ms, exec=0ms

*******> On host1, run the resource stop and state-clear commands

crm(live)resource# cleanup grp host1
Cleaning up v_ip on host1
Cleaning up failmail on host1
Waiting for 1 replies from the CRMd. OK
crm(live)resource# cleanup grp host2
Cleaning up v_ip on host2
Cleaning up failmail on host2
Waiting for 1 replies from the CRMd.
OK
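For reference, my understanding is that the same clear can also be
done non-interactively from an ordinary shell (just a sketch under
the assumption that crmsh and the standard Pacemaker CLI tools are
installed; resource and node names as above):

  # crmsh one-shot form: clears the recorded failure and fail-count
  # for each member of grp on the given node
  crm resource cleanup grp host1
  crm resource cleanup grp host2

  # low-level per-resource equivalent using Pacemaker's crm_resource
  crm_resource --cleanup --resource v_ip --node host1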
3:host1 ---------------------------------------------------
Last updated: Wed Jan 22 11:55:58 2014
Last change: Wed Jan 22 11:54:42 2014 by hacluster via crmd on host2
Current DC: host1 (2886926337) - partition with quorum
2 Nodes configured
4 Resources configured

Online: [ host2 host1 ]

Full list of resources:

 Resource Group: grp
     v_ip       (ocf::heartbeat:IPaddr2):       Started host2
     failmail   (ocf::heartbeat:MailTo):        Started host2
 Clone Set: clone_v_ping [v_ping]
     Started: [ host2 host1 ]

Node Attributes:
* Node host2:
    + pingcheck                       : 100
* Node host1:
    + pingcheck                       : 100

Migration summary:
* Node host1:
* Node host2:

3:host2 -------------------------------------------------------
Last updated: Wed Jan 22 11:54:48 2014
Last change: Wed Jan 22 11:54:42 2014 by hacluster via crmd on host2
Current DC: host1 (2886926337) - partition with quorum
2 Nodes configured
4 Resources configured

Online: [ host2 host1 ]

Full list of resources:

 Resource Group: grp
     v_ip       (ocf::heartbeat:IPaddr2):       Started host2
     failmail   (ocf::heartbeat:MailTo):        Started host2
 Clone Set: clone_v_ping [v_ping]
     Started: [ host2 host1 ]

Node Attributes:
* Node host2:
    + pingcheck                       : 100
* Node host1:
    + pingcheck                       : 100

Migration summary:
* Node host1:
* Node host2:

**********> host1 console: stop the resource group

crm(live)resource# stop grp

**********> host1 console: move the resource group

crm(live)resource# move grp host1 force

**********> host1 console: start the resource group

crm(live)resource# start grp

4:host1 ------------------------------------------
Last updated: Wed Jan 22 12:00:00 2014
Last change: Wed Jan 22 11:59:34 2014 by root via cibadmin on host1
Current DC: host1 (2886926337) - partition with quorum
2 Nodes configured
4 Resources configured

Online: [ host2 host1 ]

Full list of resources:

 Resource Group: grp
     v_ip       (ocf::heartbeat:IPaddr2):       Started host1
     failmail   (ocf::heartbeat:MailTo):        Started host1
 Clone Set: clone_v_ping [v_ping]
     Started: [ host2 host1 ]

Node Attributes:
* Node host2:
    + pingcheck                       : 100
* Node host1:
    + pingcheck                       : 100

Migration summary:
* Node host1:
* Node host2:

4:host2 -----------------------------------------------------
Last updated: Wed Jan 22 11:59:38 2014
Last change: Wed Jan 22 11:59:34 2014 by root via cibadmin on host1
Current DC: host1 (2886926337) - partition with quorum
2 Nodes configured
4 Resources configured

Online: [ host2 host1 ]

Full list of resources:

 Resource Group: grp
     v_ip       (ocf::heartbeat:IPaddr2):       Started host1
     failmail   (ocf::heartbeat:MailTo):        Started host1
 Clone Set: clone_v_ping [v_ping]
     Started: [ host2 host1 ]

Node Attributes:
* Node host2:
    + pingcheck                       : 100
* Node host1:
    + pingcheck                       : 100

Migration summary:
* Node host1:
* Node host2:
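One caveat about the procedure above (again only my understanding,
so corrections are welcome): as far as I know, "move grp host1 force"
works by adding a location constraint that pins grp to host1, and
that constraint is not removed by "start grp". If that is correct,
it should be cleared once the failback is confirmed, for example:

  crm(live)resource# unmove grp

Otherwise grp may be unable to fail over to host2 the next time
host1 has a failure.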