renay****@ybb*****
Tue, 17 Mar 2015 23:51:09 JST
Fukuda-san,

Good evening, this is Yamauchi.

In that case, I think it means that neither xen0 nor stonith-helper is on a path that Pacemaker manages as a stonith plugin location.
I suspect the installation of Reusable and pm_extras.
I will let you know if I find out anything else.

That is all.

----- Original Message -----
> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> Date: 2015/3/17, Tue 23:46
> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>
> Yamauchi-san,
>
> Good evening, this is Fukuda.
>
> I wonder whether I am doing something wrong with the -x setting for stonith-helper.
>
> I removed stonith-helper and started the cluster with only xen0.
>
> # crm_mon -rfA
>
> Last updated: Tue Mar 17 23:38:53 2015
> Last change: Tue Mar 17 23:30:34 2015
> Stack: heartbeat
> Current DC: lbv1.beta.com (38b0f200-83ea-8633-6f37-047d36cd39c6) - partition with quorum
> Version: 1.1.12-e32080b
> 2 Nodes configured
> 6 Resources configured
>
> Online: [ lbv1.beta.com lbv2.beta.com ]
>
> Full list of resources:
>
> Stonith1-2 (stonith:external/xen0): Stopped
> Stonith2-2 (stonith:external/xen0): Stopped
>  Resource Group: HAvarnish
>      vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
>      varnishd (lsb:varnish): Started lbv1.beta.com
>  Clone Set: clone_ping [ping]
>      Started: [ lbv1.beta.com lbv2.beta.com ]
>
> Node Attributes:
> * Node lbv1.beta.com:
>     + default_ping_set : 100
> * Node lbv2.beta.com:
>     + default_ping_set : 100
>
> Migration summary:
> * Node lbv1.beta.com:
>    Stonith2-2: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 23:38:34 2015'
> * Node lbv2.beta.com:
>    Stonith1-2: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 23:38:27 2015'
>
> Failed actions:
>     Stonith2-2_start_0 on lbv1.beta.com 'unknown error' (1): call=23, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 23:38:32 2015', queued=0ms, exec=1061ms
>     Stonith1-2_start_0 on lbv2.beta.com 'unknown error' (1): call=23, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 23:38:25 2015', queued=0ms, exec=1342ms
>
> The same failed actions appear as when stonith-helper was present.
>
> Thank you in advance.
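The stonith-ng log lines quoted in this thread report `failed to exec "stonith"`. One possible follow-up, sketched below, is to check whether a command named `stonith` can be resolved on the search path that the daemon is started with. `find_in_path` is a hypothetical helper written only for illustration; it is not part of Pacemaker, Heartbeat, or pm_extras, and the idea that the search path is what is missing here is an assumption, not something the thread confirms.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): look up a command name in a
# colon-separated search path, roughly the way exec-by-name resolves it.
find_in_path() {
    name=$1
    search_path=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # An empty PATH component means the current directory.
        candidate="${dir:-.}/$name"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            IFS=$old_ifs
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage: report whether "stonith" is reachable from the current PATH.
if location=$(find_in_path stonith "$PATH"); then
    echo "stonith found at $location"
else
    echo "stonith is not on PATH: $PATH"
fi
```

Running this as root from a login shell only shows that shell's PATH. The daemons started by heartbeat may run with a different PATH, so a result here is a hint rather than proof.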
> That is all.
>
> On 17 March 2015 at 22:38, <renay****@ybb*****> wrote:
>
>> Fukuda-san,
>>
>> Good evening, this is Yamauchi.
>>
>> By the way, if possible, checking what happens when you remove external/stonith-helper and use only external/xen0 might help isolate the problem.
>>
>> That is all.
>>
>> ----- Original Message -----
>>> From: "renay****@ybb*****" <renay****@ybb*****>
>>> To: "linux****@lists*****" <linux****@lists*****>
>>> Cc:
>>> Date: 2015/3/17, Tue 22:28
>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>
>>> Fukuda-san,
>>>
>>> Good evening, this is Yamauchi.
>>>
>>> It looks like nothing has changed...
>>>
>>> For now, probably tomorrow, I will check (on RHEL, though) whether stonith-helper works with the combination of
>>>
>>> Heartbeat 3.0.6
>>> the latest Pacemaker
>>>
>>> using a similar configuration (the resources will be Dummy, and external/xen0 will be replaced by external/ssh).
>>>
>>> # If we could see the output produced by the -x setting in stonith-helper, it would be easier to narrow down the problem...
>>>
>>> That is all.
>>>
>>> ----- Original Message -----
>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>> Date: 2015/3/17, Tue 21:24
>>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>
>>>> Yamauchi-san,
>>>>
>>>> Good evening, this is Fukuda.
>>>> Thank you for the information on the latest version.
>>>>
>>>> I installed it right away.
>>>>
>>>> This is the state after startup.
>>>>
>>>> The failed actions appear unchanged.
>>>>
>>>> # crm_mon -rfA
>>>> Last updated: Tue Mar 17 21:03:49 2015
>>>> Last change: Tue Mar 17 20:30:58 2015
>>>> Stack: heartbeat
>>>> Current DC: lbv1.beta.com (38b0f200-83ea-8633-6f37-047d36cd39c6) - partition with quorum
>>>> Version: 1.1.12-e32080b
>>>> 2 Nodes configured
>>>> 8 Resources configured
>>>>
>>>> Online: [ lbv1.beta.com lbv2.beta.com ]
>>>>
>>>> Full list of resources:
>>>>
>>>>  Resource Group: HAvarnish
>>>>      vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
>>>>      varnishd (lsb:varnish): Started lbv1.beta.com
>>>>  Resource Group: grpStonith1
>>>>      Stonith1-1 (stonith:external/stonith-helper): Stopped
>>>>      Stonith1-2 (stonith:external/xen0): Stopped
>>>>  Resource Group: grpStonith2
>>>>      Stonith2-1 (stonith:external/stonith-helper): Stopped
>>>>      Stonith2-2 (stonith:external/xen0): Stopped
>>>>  Clone Set: clone_ping [ping]
>>>>      Started: [ lbv1.beta.com lbv2.beta.com ]
>>>>
>>>> Node Attributes:
>>>> * Node lbv1.beta.com:
>>>>     + default_ping_set : 100
>>>> * Node lbv2.beta.com:
>>>>     + default_ping_set : 100
>>>>
>>>> Migration summary:
>>>> * Node lbv1.beta.com:
>>>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 21:03:39 2015'
>>>> * Node lbv2.beta.com:
>>>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 21:03:32 2015'
>>>>
>>>> Failed actions:
>>>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 21:03:37 2015', queued=0ms, exec=1085ms
>>>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=18, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 21:03:30 2015', queued=0ms, exec=1061ms
>>>>
>>>> Here are the logs.
>>>>
>>>> # less /var/log/ha-debug
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: Pacemaker support: yes
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: File /etc/ha.d//haresources exists.
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: This file is not used because pacemaker is enabled
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Core dumps could be lost if multiple dumps occur.
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: **************************
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: Configuration validated. Starting heartbeat 3.0.6
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: heartbeat: version 3.0.6
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: Heartbeat generation: 1423534116
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: seed is -1702799346
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: bound send socket to device: eth1
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: set SO_REUSEADDR
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: bound receive socket to device: eth1
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.133
>>>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: Local status now set to: 'up'
>>>> Mar 17 21:02:46 lbv1.beta.com heartbeat: [4236]: info: Link lbv2.beta.com:eth1 up.
>>>> Mar 17 21:02:46 lbv1.beta.com heartbeat: [4236]: info: Status update for node lbv2.beta.com: status up
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Comm_now_up(): updating status to active
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Local status now set to: 'active'
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: debug: get_delnodelist: delnodelist=
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4250]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109 gid 113 (pid 4250)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4246]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109 gid 113 (pid 4246)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4249]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109 gid 113 (pid 4249)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4245]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109 gid 113 (pid 4245)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4248]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0 gid 0 (pid 4248)
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4247]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 4247)
>>>> Mar 17 21:02:47 lbv1.beta.com ccm: [4245]: info: Hostname: lbv1.beta.com
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client ccm is set to 1024
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client attrd is set to 1024
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client stonith-ng is set to 1024
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Status update for node lbv2.beta.com: status active
>>>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client cib is set to 1024
>>>> Mar 17 21:02:51 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [15:17]
>>>> Mar 17 21:02:51 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>>>> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [19:21]
>>>> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>>>> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client crmd is set to 1024
>>>> Mar 17 21:02:53 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [24:26]
>>>> Mar 17 21:02:53 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>>>> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [26:28]
>>>> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>>>> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [30:32]
>>>> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>>>>
>>>> # less /var/log/error
>>>>
>>>> Mar 17 21:02:47 lbv1 attrd[4249]: error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
>>>> Mar 17 21:02:48 lbv1 attrd[4249]: error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
>>>> Mar 17 21:02:53 lbv1 stonith-ng[4247]: error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
>>>> Mar 17 21:02:53 lbv1 stonith-ng[4247]: error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
>>>> Mar 17 21:03:39 lbv1 crmd[4250]: error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=31, status=4, cib-update=42, confirmed=true) Error
>>>>
>>>> # cat syslog|egrep 'Mar 17 21:03|Mar 17 21:02' |egrep 'heartbeat|stonith|pacemaker|error'
>>>> Mar 17 21:03:24 lbv1 pengine[4253]: notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-115.bz2
>>>> Mar 17 21:03:27 lbv1 crmd[4250]: notice: run_graph: Transition 0 (Complete=15, Pending=0, Fired=0, Skipped=16, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-115.bz2): Stopped
>>>> Mar 17 21:03:29 lbv1 pengine[4253]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-116.bz2
>>>> Mar 17 21:03:34 lbv1 crmd[4250]: notice: run_graph: Transition 1 (Complete=8, Pending=0, Fired=0, Skipped=12, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-116.bz2): Stopped
>>>> Mar 17 21:03:37 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
>>>> Mar 17 21:03:37 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
>>>> Mar 17 21:03:37 lbv1 pengine[4253]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-117.bz2
>>>> Mar 17 21:03:39 lbv1 stonith-ng[4247]: notice: log_operation: Operation 'monitor' [4377] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
>>>> Mar 17 21:03:39 lbv1 stonith-ng[4247]: warning: log_operation: Stonith2-1:4377 [ Performing: stonith -t external/stonith-helper -S ]
>>>> Mar 17 21:03:39 lbv1 stonith-ng[4247]: warning: log_operation: Stonith2-1:4377 [ failed to exec "stonith" ]
>>>> Mar 17 21:03:39 lbv1 stonith-ng[4247]: warning: log_operation: Stonith2-1:4377 [ failed: 2 ]
>>>> Mar 17 21:03:39 lbv1 crmd[4250]: error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=31, status=4, cib-update=42, confirmed=true) Error
>>>> Mar 17 21:03:40 lbv1 crmd[4250]: notice: run_graph: Transition 2 (Complete=12, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-117.bz2): Stopped
>>>> Mar 17 21:03:42 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith2-1 on lbv1.beta.com: unknown error (1)
>>>> Mar 17 21:03:42 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith2-1 on lbv1.beta.com: unknown error (1)
>>>> Mar 17 21:03:42 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
>>>> Mar 17 21:03:42 lbv1 pengine[4253]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-118.bz2
>>>> Mar 17 21:03:42 lbv1 IPaddr2(vip_208)[4448]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
>>>> Mar 17 21:03:47 lbv1 crmd[4250]: notice: run_graph: Transition 3 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-118.bz2): Complete
>>>>
>>>> Thank you in advance.
>>>>
>>>> That is all.
>>>>
>>>> On 17 March 2015 at 18:31, <renay****@ybb*****> wrote:
>>>>
>>>>> Fukuda-san,
>>>>>
>>>>> Good evening, this is Yamauchi.
>>>>>
>>>>> Since it has not been tagged, the latest version as of today is:
>>>>>
>>>>> * https://github.com/ClusterLabs/pacemaker/tree/e32080b460f81486b85d08ec958582b3e72d858c
>>>>>
>>>>> You can download it from [Download ZIP] on the right-hand side.
>>>>>
>>>>> That is all.
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>> To: "renay****@ybb*****" <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>> Date: 2015/3/17, Tue 18:07
>>>>>> Subject: STONITH errors during split-brain
>>>>>>
>>>>>> Yamauchi-san,
>>>>>>
>>>>>> Hello, this is Fukuda.
>>>>>>
>>>>>> I looked here:
>>>>>> https://github.com/ClusterLabs/pacemaker/tags
>>>>>>
>>>>>> pacemaker 1.1.12 561c4cf seems to be the latest one listed there.
>>>>>> Sorry to trouble you, but could you tell me where I can find the newer versions after this one?
>>>>>>
>>>>>> Thank you in advance.
>>>>>>
>>>>>> That is all.
>>>>>>
>>>>>> On Tuesday, 17 March 2015, <renay****@ybb*****> wrote:
>>>>>>
>>>>>>> Fukuda-san,
>>>>>>>
>>>>>>> Hello, this is Yamauchi.
>>>>>>>
>>>>>>> Yes, it is old.
>>>>>>>
>>>>>>> Pacemaker gained support for Heartbeat 3.0.6 surprisingly recently.
>>>>>>> Please install something newer. (You will need to build from source again, though...)
>>>>>>>
>>>>>>> It is available from the upstream GitHub repository:
>>>>>>> * https://github.com/ClusterLabs/pacemaker
>>>>>>>
>>>>>>> In some cases the latest master may produce errors; if so, I suggest stepping back through older revisions.
>>>>>>>
>>>>>>> That is all.
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>>>> Date: 2015/3/17, Tue 16:06
>>>>>>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>>
>>>>>>>> Yamauchi-san,
>>>>>>>>
>>>>>>>> Hello, this is Fukuda.
>>>>>>>>
>>>>>>>> In an earlier mail you advised me to install the latest versions of heartbeat and pacemaker.
>>>>>>>> So this time I installed heartbeat 3.0.6 and pacemaker 1.1.12.
>>>>>>>>
>>>>>>>> heartbeat configuration: Version = "3.0.6"
>>>>>>>> pacemaker configuration: Version = 1.1.12 (Build: 561c4cf)
>>>>>>>>
>>>>>>>> Does this mean that pacemaker is still too old?
>>>>>>>>
>>>>>>>> Sorry to trouble you; thank you in advance.
>>>>>>>>
>>>>>>>> That is all.
>>>>>>>>
>>>>>>>> On 17 March 2015 at 14:59, <renay****@ybb*****> wrote:
>>>>>>>>
>>>>>>>>> Fukuda-san,
>>>>>>>>>
>>>>>>>>> Hello, this is Yamauchi.
>>>>>>>>>
>>>>>>>>> It just occurred to me: in our earlier exchange I answered as follows. Is that all right on your side?
>>>>>>>>>
>>>>>>>>>>>>>>> 2) Heartbeat 3.0.6 + latest Pacemaker: OK
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It appears that Heartbeat also needs to be the latest version, 3.0.6, in this combination.
>>>>>>>>>>>>>>> * http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/cceeb47a7d8f
>>>>>>>>>
>>>>>>>>> Judging from the crm_mon output below, the version seems to be 1.1.12.
>>>>>>>>> To combine it with Heartbeat 3.0.6, you need a fairly recent Pacemaker.
>>>>>>>>>
>>>>>>>>>> # crm_mon -rfA
>>>>>>>>>>
>>>>>>>>>> Last updated: Tue Mar 17 14:14:39 2015
>>>>>>>>>> Last change: Tue Mar 17 14:01:43 2015
>>>>>>>>>> Stack: heartbeat
>>>>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>>>>>>>>>> Version: 1.1.12-561c4cf
>>>>>>>>>
>>>>>>>>> You probably need at least a version that includes the following change:
>>>>>>>>>
>>>>>>>>> https://github.com/ClusterLabs/pacemaker/commit/f2302da063d08719d28367d8e362b8bfb0f85bf3
>>>>>>>>>
>>>>>>>>> That is all.
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>>>>>> Date: 2015/3/17, Tue 14:38
>>>>>>>>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>>>>
>>>>>>>>>> Yamauchi-san,
>>>>>>>>>>
>>>>>>>>>> Hello, this is Fukuda.
>>>>>>>>>>
>>>>>>>>>> Should I add -x to the shebang line of stonith-helper?
>>>>>>>>>> I changed the first line of stonith-helper to #!/bin/bash -x and started the cluster.
>>>>>>>>>>
>>>>>>>>>> crm_mon shows no change from before.
>>>>>>>>>>
>>>>>>>>>> # crm_mon -rfA
>>>>>>>>>>
>>>>>>>>>> Last updated: Tue Mar 17 14:14:39 2015
>>>>>>>>>> Last change: Tue Mar 17 14:01:43 2015
>>>>>>>>>> Stack: heartbeat
>>>>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>>>>>>>>>> Version: 1.1.12-561c4cf
>>>>>>>>>> 2 Nodes configured
>>>>>>>>>> 8 Resources configured
>>>>>>>>>>
>>>>>>>>>> Online: [ lbv1.beta.com lbv2.beta.com ]
>>>>>>>>>>
>>>>>>>>>> Full list of resources:
>>>>>>>>>>
>>>>>>>>>>  Resource Group: HAvarnish
>>>>>>>>>>      vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
>>>>>>>>>>      varnishd (lsb:varnish): Started lbv1.beta.com
>>>>>>>>>>  Resource Group: grpStonith1
>>>>>>>>>>      Stonith1-1 (stonith:external/stonith-helper): Stopped
>>>>>>>>>>      Stonith1-2 (stonith:external/xen0): Stopped
>>>>>>>>>>  Resource Group: grpStonith2
>>>>>>>>>>      Stonith2-1 (stonith:external/stonith-helper): Stopped
>>>>>>>>>>      Stonith2-2 (stonith:external/xen0): Stopped
>>>>>>>>>>  Clone Set: clone_ping [ping]
>>>>>>>>>>      Started: [ lbv1.beta.com lbv2.beta.com ]
>>>>>>>>>>
>>>>>>>>>> Node Attributes:
>>>>>>>>>> * Node lbv1.beta.com:
>>>>>>>>>>     + default_ping_set : 100
>>>>>>>>>> * Node lbv2.beta.com:
>>>>>>>>>>     + default_ping_set : 100
>>>>>>>>>>
>>>>>>>>>> Migration summary:
>>>>>>>>>> * Node lbv2.beta.com:
>>>>>>>>>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:16 2015'
>>>>>>>>>> * Node lbv1.beta.com:
>>>>>>>>>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:21 2015'
>>>>>>>>>>
>>>>>>>>>> Failed actions:
>>>>>>>>>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 14:12:14 2015', queued=0ms, exec=1065ms
>>>>>>>>>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=26, status=Error, last-rc-change='Tue Mar 17 14:12:19 2015', queued=0ms, exec=1081ms
>>>>>>>>>>
>>>>>>>>>> I looked for other logs.
>>>>>>>>>>
>>>>>>>>>> This is from heartbeat startup.
>>>>>>>>>>
>>>>>>>>>> # less /var/log/pm_logconv.out
>>>>>>>>>> Mar 17 14:11:28 lbv1.beta.com info: Starting Heartbeat 3.0.6.
>>>>>>>>>> Mar 17 14:11:33 lbv1.beta.com info: Link lbv2.beta.com:eth1 is up.
>>>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "ccm" process. (pid=13264)
>>>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "lrmd" process. (pid=13267)
>>>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "attrd" process. (pid=13268)
>>>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "stonithd" process. (pid=13266)
>>>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "cib" process. (pid=13265)
>>>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "crmd" process. (pid=13269)
>>>>>>>>>>
>>>>>>>>>> # less /var/log/error
>>>>>>>>>> Mar 17 14:12:20 lbv1 crmd[13269]: error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=26, status=4, cib-update=19, confirmed=true) Error
>>>>>>>>>>
>>>>>>>>>> This is what grepping syslog for stonith gives:
>>>>>>>>>>
>>>>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
>>>>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13266]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 13266)
>>>>>>>>>> Mar 17 14:11:34 lbv1 stonithd[13266]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
>>>>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: the send queue length from heartbeat to client stonithd is set to 1024
>>>>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]: notice: setup_cib: Watching for stonith topology changes
>>>>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>>>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]: warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
>>>>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]: warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
>>>>>>>>>> Mar 17 14:11:41 lbv1 stonithd[13266]: notice: stonith_device_register: Added 'Stonith2-1' to the device list (1 active devices)
>>>>>>>>>> Mar 17 14:11:41 lbv1 stonithd[13266]: notice: stonith_device_register: Added 'Stonith2-2' to the device list (2 active devices)
>>>>>>>>>> Mar 17 14:12:04 lbv1 stonithd[13266]: notice: xml_patch_version_check: Versions did not change in patch 0.5.0
>>>>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]: notice: log_operation: Operation 'monitor' [13386] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
>>>>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ Performing: stonith -t external/stonith-helper -S ]
>>>>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ failed to exec "stonith" ]
>>>>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ failed: 2 ]
>>>>>>>>>>
>>>>>>>>>> Thank you in advance.
>>>>>>>>>>
>>>>>>>>>> That is all.
>>>>>>>>>>
>>>>>>>>>> On 17 March 2015 at 13:32, <renay****@ybb*****> wrote:
>>>>>>>>>>
>>>>>>>>>>> Fukuda-san,
>>>>>>>>>>>
>>>>>>>>>>> Hello, this is Yamauchi.
>>>>>>>>>>>
>>>>>>>>>>> That suggests the problem lies in the start of stonith-helper.
>>>>>>>>>>>
>>>>>>>>>>> If you put
>>>>>>>>>>>
>>>>>>>>>>> #!/bin/bash -x
>>>>>>>>>>>
>>>>>>>>>>> at the top of stonith-helper and start the cluster, we may learn something.
>>>>>>>>>>>
>>>>>>>>>>> By the way, I would expect stonith-helper's log to be written somewhere as well...
>>>>>>>>>>>
>>>>>>>>>>> That is all.
>>>>>>>>>>>
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>>>>>>>> Date: 2015/3/17, Tue 12:31
>>>>>>>>>>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>>>>>>
>>>>>>>>>>>> Yamauchi-san,
>>>>>>>>>>>> cc: Matsushima-san
>>>>>>>>>>>>
>>>>>>>>>>>> Hello, this is Fukuda.
>>>>>>>>>>>>
>>>>>>>>>>>> xen0 was in the same directory.
>>>>>>>>>>>>
>>>>>>>>>>>> # pwd
>>>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external
>>>>>>>>>>>>
>>>>>>>>>>>> # ls
>>>>>>>>>>>> drac5          ibmrsa         kdumpcheck  riloe           vmware
>>>>>>>>>>>> dracmc-telnet  ibmrsa-telnet  libvirt     ssh             xen0
>>>>>>>>>>>> hetzner        ipmi           nut         stonith-helper  xen0-ha
>>>>>>>>>>>> hmchttp        ippower9258    rackpdu     vcenter
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>
>>>>>>>>>>>> That is all.
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-03-17 10:53 GMT+09:00 <renay****@ybb*****>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Fukuda-san,
>>>>>>>>>>>>> cc: Matsushima-san
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, this is Yamauchi.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> There was no standard output or standard error output.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is something wrong with stonith-helper?
>>>>>>>>>>>>>> Since stonith-helper is a shell script, I did not pay much attention to how it was installed.
>>>>>>>>>>>>>> stonith-helper is located here:
>>>>>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is xen0 also in this directory?
>>>>>>>>>>>>> If it is not, that is a problem, so please try copying the stonith-helper file, preserving its attributes, into the same directory as xen0.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If it works after that, the problem lies in the pm_extras installation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That is all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>>>>>>>>>> Date: 2015/3/17, Tue 10:31
>>>>>>>>>>>>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yamauchi-san,
>>>>>>>>>>>>>> cc: Matsushima-san
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Good morning, this is Fukuda.
>>>>>>>>>>>>>> Thank you for the crm example.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I adapted it to our environment right away.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ cat test.crm
>>>>>>>>>>>>>> ### Cluster Option ###
>>>>>>>>>>>>>> property \
>>>>>>>>>>>>>>     no-quorum-policy="ignore" \
>>>>>>>>>>>>>>     stonith-enabled="true" \
>>>>>>>>>>>>>>     startup-fencing="false" \
>>>>>>>>>>>>>>     stonith-timeout="710s" \
>>>>>>>>>>>>>>     crmd-transition-delay="2s"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ### Resource Default ###
>>>>>>>>>>>>>> rsc_defaults \
>>>>>>>>>>>>>>     resource-stickiness="INFINITY" \
>>>>>>>>>>>>>>     migration-threshold="1"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ### Group Configuration ###
>>>>>>>>>>>>>> group HAvarnish \
>>>>>>>>>>>>>>     vip_208 \
>>>>>>>>>>>>>>     varnishd
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> group grpStonith1 \
>>>>>>>>>>>>>>     Stonith1-1 \
>>>>>>>>>>>>>>     Stonith1-2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> group grpStonith2 \
>>>>>>>>>>>>>>     Stonith2-1 \
>>>>>>>>>>>>>>     Stonith2-2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ### Clone Configuration ###
>>>>>>>>>>>>>> clone clone_ping \
>>>>>>>>>>>>>>     ping
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ### Fencing Topology ###
>>>>>>>>>>>>>> fencing_topology \
>>>>>>>>>>>>>>     lbv1.beta.com: Stonith1-1 Stonith1-2 \
>>>>>>>>>>>>>>     lbv2.beta.com: Stonith2-1 Stonith2-2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ### Primitive Configuration ###
>>>>>>>>>>>>>> primitive vip_208 ocf:heartbeat:IPaddr2 \
>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>         ip="192.168.17.208" \
>>>>>>>>>>>>>>         nic="eth0" \
>>>>>>>>>>>>>>         cidr_netmask="24" \
>>>>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>>>>>>>>>>     op monitor interval="5s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> primitive varnishd lsb:varnish \
>>>>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>>>>>>>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> primitive ping ocf:pacemaker:ping \
>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>         name="default_ping_set" \
>>>>>>>>>>>>>>         host_list="192.168.17.254" \
>>>>>>>>>>>>>>         multiplier="100" \
>>>>>>>>>>>>>>         dampen="1" \
>>>>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>>>>>>>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> primitive Stonith1-1 stonith:external/stonith-helper \
>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>         pcmk_reboot_retries="1" \
>>>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
>>>>>>>>>>>>>>         hostlist="lbv1.beta.com" \
>>>>>>>>>>>>>>         dead_check_target="192.168.17.132 10.0.17.132" \
>>>>>>>>>>>>>>         standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
>>>>>>>>>>>>>>         run_online_check="yes" \
>>>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> primitive Stonith1-2 stonith:external/xen0 \
>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
>>>>>>>>>>>>>>         hostlist="lbv1.beta.com:/etc/xen/lbv1.cfg" \
>>>>>>>>>>>>>>         dom0="xen0.beta.com" \
>>>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> primitive Stonith2-1 stonith:external/stonith-helper \
>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>         pcmk_reboot_retries="1" \
>>>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
>>>>>>>>>>>>>>         hostlist="lbv2.beta.com" \
>>>>>>>>>>>>>>         dead_check_target="192.168.17.133 10.0.17.133" \
>>>>>>>>>>>>>>         standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
>>>>>>>>>>>>>>         run_online_check="yes" \
>>>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> primitive Stonith2-2 stonith:external/xen0 \
>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
>>>>>>>>>>>>>>         hostlist="lbv2.beta.com:/etc/xen/lbv2.cfg" \
>>>>>>>>>>>>>>         dom0="xen0.beta.com" \
>>>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ### Resource Location ###
>>>>>>>>>>>>>> location HA_location-1 HAvarnish \
>>>>>>>>>>>>>>     rule 200: #uname eq lbv1.beta.com \
>>>>>>>>>>>>>>     rule 100: #uname eq lbv2.beta.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> location HA_location-2 HAvarnish \
>>>>>>>>>>>>>>     rule -INFINITY: not_defined default_ping_set or default_ping_set lt 100
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> location HA_location-3 grpStonith1 \
>>>>>>>>>>>>>>     rule -INFINITY: #uname eq lbv1.beta.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> location HA_location-4 grpStonith2 \
>>>>>>>>>>>>>>     rule -INFINITY: #uname eq lbv2.beta.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I loaded this configuration, the messages were different from yesterday's.
>>>>>>>>>>>>>> The ping messages are gone.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # crm_mon -rfA
>>>>>>>>>>>>>> Last updated: Tue Mar 17 10:21:28 2015
>>>>>>>>>>>>>> Last change: Tue Mar 17 10:21:09 2015
>>>>>>>>>>>>>> Stack: heartbeat
>>>>>>>>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>>>>>>>>>>>>>> Version: 1.1.12-561c4cf
>>>>>>>>>>>>>> 2 Nodes configured
>>>>>>>>>>>>>> 8 Resources configured
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Online: [ lbv1.beta.com lbv2.beta.com ]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Full list of resources:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Resource Group: HAvarnish
>>>>>>>>>>>>>>      vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
>>>>>>>>>>>>>>      varnishd (lsb:varnish): Started lbv1.beta.com
>>>>>>>>>>>>>>  Resource Group: grpStonith1
>>>>>>>>>>>>>>      Stonith1-1 (stonith:external/stonith-helper): Stopped
>>>>>>>>>>>>>>      Stonith1-2 (stonith:external/xen0): Stopped
>>>>>>>>>>>>>>  Resource Group: grpStonith2
>>>>>>>>>>>>>>      Stonith2-1 (stonith:external/stonith-helper): Stopped
>>>>>>>>>>>>>>      Stonith2-2 (stonith:external/xen0): Stopped
>>>>>>>>>>>>>>  Clone Set: clone_ping [ping]
>>>>>>>>>>>>>>      Started: [ lbv1.beta.com lbv2.beta.com ]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Node Attributes:
>>>>>>>>>>>>>> * Node lbv1.beta.com:
>>>>>>>>>>>>>>     + default_ping_set : 100
>>>>>>>>>>>>>> * Node lbv2.beta.com:
>>>>>>>>>>>>>>     + default_ping_set : 100
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Migration summary:
>>>>>>>>>>>>>> * Node lbv2.beta.com:
>>>>>>>>>>>>>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
>>>>>>>>>>>>>> * Node lbv1.beta.com:
>>>>>>>>>>>>>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Failed actions:
>>>>>>>>>>>>>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:15 2015', queued=0ms, exec=1082ms
>>>>>>>>>>>>>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:16 2015', queued=0ms, exec=1079ms
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the log from /var/log/ha-debug.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Adding inet address 192.168.17.208/24 with broadcast address 192.168.17.255 to device eth0
>>>>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Bringing device eth0 up
>>>>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nothing was written to standard output or standard error.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could something be wrong with stonith-helper?
>>>>>>>>>>>>>> Since stonith-helper is a shell script, I had not paid much attention to how it was installed.
>>>>>>>>>>>>>> stonith-helper is located here:
>>>>>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That is all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-03-17 9:45 GMT+09:00 <renay****@ybb*****>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fukuda-san,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Good morning. This is Yamauchi.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just in case, I am sending an excerpt of an example I have on hand that uses multiple stonith resources.
>>>>>>>>>>>>>>> (In practice, please be careful with the line breaks.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The example below is a configuration for the Pacemaker 1.1 series.
>>>>>>>>>>>>>>> For nodea, stonith is executed in the order prmStonith1-1, then prmStonith1-2.
>>>>>>>>>>>>>>> For nodeb, stonith is executed in the order prmStonith2-1, then prmStonith2-2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The stonith plugins themselves are helper and ssh.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (snip)
>>>>>>>>>>>>>>> ### Group Configuration ###
>>>>>>>>>>>>>>> group grpStonith1 \
>>>>>>>>>>>>>>>     prmStonith1-1 \
>>>>>>>>>>>>>>>     prmStonith1-2
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> group grpStonith2 \
>>>>>>>>>>>>>>>     prmStonith2-1 \
>>>>>>>>>>>>>>>     prmStonith2-2
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ### Fencing Topology ###
>>>>>>>>>>>>>>> fencing_topology \
>>>>>>>>>>>>>>>     nodea: prmStonith1-1 prmStonith1-2 \
>>>>>>>>>>>>>>>     nodeb: prmStonith2-1 prmStonith2-2
>>>>>>>>>>>>>>> (snip)
>>>>>>>>>>>>>>> primitive prmStonith1-1 stonith:external/stonith-helper \
>>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>>         pcmk_reboot_retries="1" \
>>>>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
>>>>>>>>>>>>>>>         hostlist="nodea" \
>>>>>>>>>>>>>>>         dead_check_target="192.168.28.60 192.168.28.70" \
>>>>>>>>>>>>>>>         standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
>>>>>>>>>>>>>>>         run_online_check="yes" \
>>>>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> primitive prmStonith1-2 stonith:external/ssh \
>>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
>>>>>>>>>>>>>>>         hostlist="nodea" \
>>>>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> primitive prmStonith2-1 stonith:external/stonith-helper \
>>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>>         pcmk_reboot_retries="1" \
>>>>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
>>>>>>>>>>>>>>>         hostlist="nodeb" \
>>>>>>>>>>>>>>>         dead_check_target="192.168.28.61 192.168.28.71" \
>>>>>>>>>>>>>>>         standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
>>>>>>>>>>>>>>>         run_online_check="yes" \
>>>>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> primitive prmStonith2-2 stonith:external/ssh \
>>>>>>>>>>>>>>>     params \
>>>>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
>>>>>>>>>>>>>>>         hostlist="nodeb" \
>>>>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>>>> (snip)
>>>>>>>>>>>>>>> location rsc_location-grpStonith1-2 grpStonith1 \
>>>>>>>>>>>>>>>     rule -INFINITY: #uname eq nodea
>>>>>>>>>>>>>>> location rsc_location-grpStonith2-3 grpStonith2 \
>>>>>>>>>>>>>>>     rule -INFINITY: #uname eq nodeb
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That is all.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ELF Systems
>>>>>>>>>>>>>> Masamichi Fukuda
>>>>>>>>>>>>>> mail to: masamichi_fukud****@elf-s*****
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Linux-ha-japan mailing list
>>>>>>>>>>>>> Linux****@lists*****
>>>>>>>>>>>>> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
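
The thread reports that stonith-helper is installed under /usr/local/heartbeat/lib/stonith/plugins/external/, and the reply at the top suspects that neither xen0 nor stonith-helper sits under the plugin path that Pacemaker's stonith layer actually uses. A minimal sketch of how one might check which plugin directories contain a given external plugin follows. The function name find_external_plugin and the second directory (/usr/lib64/stonith/plugins) are assumptions for illustration only; the directory a particular build searches depends on how cluster-glue was configured.

```python
import os
from pathlib import Path


def find_external_plugin(name, plugin_roots):
    """Return the paths <root>/external/<name> that exist and are executable.

    `plugin_roots` is a list of candidate stonith plugin directories.
    Which directory a given installation really searches is build
    dependent, so the caller supplies the candidates (assumption).
    """
    hits = []
    for root in plugin_roots:
        candidate = Path(root) / "external" / name
        # A plugin must be a regular file with the execute bit set.
        if candidate.is_file() and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # First root is the location reported in the thread; the second is a
    # hypothetical packaged default and may not exist on your system.
    roots = [
        "/usr/local/heartbeat/lib/stonith/plugins",
        "/usr/lib64/stonith/plugins",
    ]
    for plugin in ("stonith-helper", "xen0"):
        found = find_external_plugin(plugin, roots)
        if found:
            for path in found:
                print(f"{plugin}: {path}")
        else:
            print(f"{plugin}: not found under any candidate root")
```

If a plugin shows up only under a directory that the running stonith daemon does not search, the start failures seen in crm_mon would be consistent with that, although this check alone does not prove it.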