Masamichi Fukuda - elf-systems
masamichi_fukud****@elf-s*****
Tue, 17 Mar 2015 14:38:47 JST
Yamauchi-san

Hello, this is Fukuda.

Do you mean I should add -x to the shebang line of stonith-helper?
I changed the first line of stonith-helper to #!/bin/bash -x and started the cluster.
crm_mon looks the same as before:

# crm_mon -rfA
Last updated: Tue Mar 17 14:14:39 2015
Last change: Tue Mar 17 14:01:43 2015
Stack: heartbeat
Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
Version: 1.1.12-561c4cf
2 Nodes configured
8 Resources configured

Online: [ lbv1.beta.com lbv2.beta.com ]

Full list of resources:

 Resource Group: HAvarnish
     vip_208    (ocf::heartbeat:IPaddr2):       Started lbv1.beta.com
     varnishd   (lsb:varnish):                  Started lbv1.beta.com
 Resource Group: grpStonith1
     Stonith1-1 (stonith:external/stonith-helper):      Stopped
     Stonith1-2 (stonith:external/xen0):                Stopped
 Resource Group: grpStonith2
     Stonith2-1 (stonith:external/stonith-helper):      Stopped
     Stonith2-2 (stonith:external/xen0):                Stopped
 Clone Set: clone_ping [ping]
     Started: [ lbv1.beta.com lbv2.beta.com ]

Node Attributes:
* Node lbv1.beta.com:
    + default_ping_set : 100
* Node lbv2.beta.com:
    + default_ping_set : 100

Migration summary:
* Node lbv2.beta.com:
   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:16 2015'
* Node lbv1.beta.com:
   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:21 2015'

Failed actions:
    Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 14:12:14 2015', queued=0ms, exec=1065ms
    Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=26, status=Error, last-rc-change='Tue Mar 17 14:12:19 2015', queued=0ms, exec=1081ms

I looked for other logs. This is from Heartbeat startup:

# less /var/log/pm_logconv.out
Mar 17 14:11:28 lbv1.beta.com info: Starting Heartbeat 3.0.6.
Mar 17 14:11:33 lbv1.beta.com info: Link lbv2.beta.com:eth1 is up.
Mar 17 14:11:34 lbv1.beta.com info: Start "ccm" process. (pid=13264)
Mar 17 14:11:34 lbv1.beta.com info: Start "lrmd" process. (pid=13267)
Mar 17 14:11:34 lbv1.beta.com info: Start "attrd" process. (pid=13268)
Mar 17 14:11:34 lbv1.beta.com info: Start "stonithd" process. (pid=13266)
Mar 17 14:11:34 lbv1.beta.com info: Start "cib" process. (pid=13265)
Mar 17 14:11:34 lbv1.beta.com info: Start "crmd" process. (pid=13269)

# less /var/log/error
Mar 17 14:12:20 lbv1 crmd[13269]: error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=26, status=4, cib-update=19, confirmed=true) Error

This is syslog grepped for stonith:

Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
Mar 17 14:11:34 lbv1 heartbeat: [13266]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 13266)
Mar 17 14:11:34 lbv1 stonithd[13266]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: the send queue length from heartbeat to client stonithd is set to 1024
Mar 17 14:11:40 lbv1 stonithd[13266]: notice: setup_cib: Watching for stonith topology changes
Mar 17 14:11:40 lbv1 stonithd[13266]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 17 14:11:40 lbv1 stonithd[13266]: warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
Mar 17 14:11:40 lbv1 stonithd[13266]: warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
Mar 17 14:11:41 lbv1 stonithd[13266]: notice: stonith_device_register: Added 'Stonith2-1' to the device list (1 active devices)
Mar 17 14:11:41 lbv1 stonithd[13266]: notice: stonith_device_register: Added 'Stonith2-2' to the device list (2 active devices)
Mar 17 14:12:04 lbv1 stonithd[13266]: notice: xml_patch_version_check: Versions did not change in patch 0.5.0
Mar 17 14:12:20 lbv1 stonithd[13266]: notice: log_operation: Operation 'monitor' [13386] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ Performing: stonith -t external/stonith-helper -S ]
Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ failed to exec "stonith" ]
Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ failed: 2 ]

Thank you in advance.

Regards

On Mar 17, 2015 at 13:32, <renay****@ybb*****> wrote:
> Fukuda-san
>
> Hello, this is Yamauchi.
>
> So it seems the problem is in the start of stonith-helper.
>
> If you put
>
> #!/bin/bash -x
>
> at the top of stonith-helper and start the cluster, we might learn something.
>
> Incidentally, I would expect stonith-helper's own log to show up somewhere as well...
>
> That is all.
>
> ----- Original Message -----
> >From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >To: Hideo Yamauchi <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >Date: 2015/3/17, Tue 12:31
> >Subject: Re: [Linux-ha-jp] About STONITH errors during split brain
> >
> >Yamauchi-san
> >cc: Matsushima-san
> >
> >Hello, this is Fukuda.
> >
> >xen0 was in the same directory.
> >
> ># pwd
> >/usr/local/heartbeat/lib/stonith/plugins/external
> >
> ># ls
> >drac5          ibmrsa         kdumpcheck  riloe           vmware
> >dracmc-telnet  ibmrsa-telnet  libvirt     ssh             xen0
> >hetzner        ipmi           nut         stonith-helper  xen0-ha
> >hmchttp        ippower9258    rackpdu     vcenter
> >
> >Thank you in advance.
> >
> >Regards
> >
> >2015-03-17 10:53 GMT+09:00 <renay****@ybb*****>:
> >
> >Fukuda-san
> >>cc: Matsushima-san
> >>
> >>Hello, this is Yamauchi.
> >>
> >>>There was no standard output or standard error.
> >>>
> >>>Is something wrong with stonith-helper?
> >>>stonith-helper is a shell script, so I did not pay much attention to how it was installed.
> >>>stonith-helper is located here:
> >>>/usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
> >>
> >>Is xen0 in this directory as well?
> >>If it is not, that is a problem; try copying the stonith-helper file, with its attributes preserved, into the same directory as xen0.
> >>
> >>If it works after that, it means the pm_extras installation has a problem.
> >>
> >>That is all.
> >>
> >>----- Original Message -----
> >>>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>To: Hideo Yamauchi <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>Date: 2015/3/17, Tue 10:31
> >>>Subject: Re: [Linux-ha-jp] About STONITH errors during split brain
> >>>
> >>>Yamauchi-san
> >>>cc: Matsushima-san
> >>>
> >>>Good morning, this is Fukuda.
> >>>Thank you for the crm example.
> >>>
> >>>I adapted it to our environment right away.
> >>>
> >>>$ cat test.crm
> >>>### Cluster Option ###
> >>>property \
> >>>    no-quorum-policy="ignore" \
> >>>    stonith-enabled="true" \
> >>>    startup-fencing="false" \
> >>>    stonith-timeout="710s" \
> >>>    crmd-transition-delay="2s"
> >>>
> >>>### Resource Default ###
> >>>rsc_defaults \
> >>>    resource-stickiness="INFINITY" \
> >>>    migration-threshold="1"
> >>>
> >>>### Group Configuration ###
> >>>group HAvarnish \
> >>>    vip_208 \
> >>>    varnishd
> >>>
> >>>group grpStonith1 \
> >>>    Stonith1-1 \
> >>>    Stonith1-2
> >>>
> >>>group grpStonith2 \
> >>>    Stonith2-1 \
> >>>    Stonith2-2
> >>>
> >>>### Clone Configuration ###
> >>>clone clone_ping \
> >>>    ping
> >>>
> >>>### Fencing Topology ###
> >>>fencing_topology \
> >>>    lbv1.beta.com: Stonith1-1 Stonith1-2 \
> >>>    lbv2.beta.com: Stonith2-1 Stonith2-2
> >>>
> >>>### Primitive Configuration ###
> >>>primitive vip_208 ocf:heartbeat:IPaddr2 \
> >>>    params \
> >>>        ip="192.168.17.208" \
> >>>        nic="eth0" \
> >>>        cidr_netmask="24" \
> >>>    op start interval="0s" timeout="90s" on-fail="restart" \
> >>>    op monitor interval="5s" timeout="60s" on-fail="restart" \
> >>>    op stop interval="0s" timeout="100s" on-fail="fence"
> >>>
> >>>primitive varnishd lsb:varnish \
> >>>    op start interval="0s" timeout="90s" on-fail="restart" \
> >>>    op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>    op stop interval="0s" timeout="100s" on-fail="fence"
> >>>
> >>>primitive ping ocf:pacemaker:ping \
> >>>    params \
> >>>        name="default_ping_set" \
> >>>        host_list="192.168.17.254" \
> >>>        multiplier="100" \
> >>>        dampen="1" \
> >>>    op start interval="0s" timeout="90s" on-fail="restart" \
> >>>    op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>    op stop interval="0s" timeout="100s" on-fail="fence"
> >>>
> >>>primitive Stonith1-1 stonith:external/stonith-helper \
> >>>    params \
> >>>        pcmk_reboot_retries="1" \
> >>>        pcmk_reboot_timeout="40s" \
> >>>        hostlist="lbv1.beta.com" \
> >>>        dead_check_target="192.168.17.132 10.0.17.132" \
> >>>        standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
> >>>        run_online_check="yes" \
> >>>    op start interval="0s" timeout="60s" on-fail="restart" \
> >>>    op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>
> >>>primitive Stonith1-2 stonith:external/xen0 \
> >>>    params \
> >>>        pcmk_reboot_timeout="60s" \
> >>>        hostlist="lbv1.beta.com:/etc/xen/lbv1.cfg" \
> >>>        dom0="xen0.beta.com" \
> >>>    op start interval="0s" timeout="60s" on-fail="restart" \
> >>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>    op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>
> >>>primitive Stonith2-1 stonith:external/stonith-helper \
> >>>    params \
> >>>        pcmk_reboot_retries="1" \
> >>>        pcmk_reboot_timeout="40s" \
> >>>        hostlist="lbv2.beta.com" \
> >>>        dead_check_target="192.168.17.133 10.0.17.133" \
> >>>        standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
> >>>        run_online_check="yes" \
> >>>    op start interval="0s" timeout="60s" on-fail="restart" \
> >>>    op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>
> >>>primitive Stonith2-2 stonith:external/xen0 \
> >>>    params \
> >>>        pcmk_reboot_timeout="60s" \
> >>>        hostlist="lbv2.beta.com:/etc/xen/lbv2.cfg" \
> >>>        dom0="xen0.beta.com" \
> >>>    op start interval="0s" timeout="60s" on-fail="restart" \
> >>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>    op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>
> >>>### Resource Location ###
> >>>location HA_location-1 HAvarnish \
> >>>    rule 200: #uname eq lbv1.beta.com \
> >>>    rule 100: #uname eq lbv2.beta.com
> >>>
> >>>location HA_location-2 HAvarnish \
> >>>    rule -INFINITY: not_defined default_ping_set or default_ping_set lt 100
> >>>
> >>>location HA_location-3 grpStonith1 \
> >>>    rule -INFINITY: #uname eq lbv1.beta.com
> >>>
> >>>location HA_location-4 grpStonith2 \
> >>>    rule -INFINITY: #uname eq lbv2.beta.com
> >>>
> >>>
> >>>After loading this, the messages differ from yesterday's.
> >>>The ping messages are gone.
> >>>
> >>># crm_mon -rfA
> >>>Last updated: Tue Mar 17 10:21:28 2015
> >>>Last change: Tue Mar 17 10:21:09 2015
> >>>Stack: heartbeat
> >>>Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
> >>>Version: 1.1.12-561c4cf
> >>>2 Nodes configured
> >>>8 Resources configured
> >>>
> >>>
> >>>Online: [ lbv1.beta.com lbv2.beta.com ]
> >>>
> >>>Full list of resources:
> >>>
> >>> Resource Group: HAvarnish
> >>>     vip_208    (ocf::heartbeat:IPaddr2):       Started lbv1.beta.com
> >>>     varnishd   (lsb:varnish):                  Started lbv1.beta.com
> >>> Resource Group: grpStonith1
> >>>     Stonith1-1 (stonith:external/stonith-helper):      Stopped
> >>>     Stonith1-2 (stonith:external/xen0):                Stopped
> >>> Resource Group: grpStonith2
> >>>     Stonith2-1 (stonith:external/stonith-helper):      Stopped
> >>>     Stonith2-2 (stonith:external/xen0):                Stopped
> >>> Clone Set: clone_ping [ping]
> >>>     Started: [ lbv1.beta.com lbv2.beta.com ]
> >>>
> >>>Node Attributes:
> >>>* Node lbv1.beta.com:
> >>>    + default_ping_set : 100
> >>>* Node lbv2.beta.com:
> >>>    + default_ping_set : 100
> >>>
> >>>Migration summary:
> >>>* Node lbv2.beta.com:
> >>>   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
> >>>* Node lbv1.beta.com:
> >>>   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
> >>>
> >>>Failed actions:
> >>>    Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:15 2015', queued=0ms, exec=1082ms
> >>>    Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:16 2015', queued=0ms, exec=1079ms
> >>>
> >>>
> >>>This is the /var/log/ha-debug log:
> >>>
> >>>IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Adding inet address 192.168.17.208/24 with broadcast address 192.168.17.255 to device eth0
> >>>IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Bringing device eth0 up
> >>>IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
> >>>
> >>>There was no standard output or standard error.
> >>>
> >>>Is something wrong with stonith-helper?
> >>>stonith-helper is a shell script, so I did not pay much attention to how it was installed.
> >>>stonith-helper is located here:
> >>>/usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
> >>>
> >>>Thank you in advance.
> >>>
> >>>Regards
> >>>
> >>>2015-03-17 9:45 GMT+09:00 <renay****@ybb*****>:
> >>>
> >>>Fukuda-san
> >>>>
> >>>>Good morning. This is Yamauchi.
> >>>>
> >>>>Just in case, here is an excerpt from an example I have on hand that uses multiple stonith plugins.
> >>>>(In practice, be careful with the line breaks.)
> >>>>
> >>>>The example below is a configuration for the PM1.1 series.
> >>>>On nodea, stonith is executed in the order prmStonith1-1, prmStonith1-2.
> >>>>On nodeb, stonith is executed in the order prmStonith2-1, prmStonith2-2.
> >>>>
> >>>>The stonith plugins themselves are helper and ssh.
> >>>>
> >>>>
> >>>>(snip)
> >>>>### Group Configuration ###
> >>>>group grpStonith1 \
> >>>>prmStonith1-1 \
> >>>>prmStonith1-2
> >>>>
> >>>>group grpStonith2 \
> >>>>prmStonith2-1 \
> >>>>prmStonith2-2
> >>>>
> >>>>### Fencing Topology ###
> >>>>fencing_topology \
> >>>>nodea: prmStonith1-1 prmStonith1-2 \
> >>>>nodeb: prmStonith2-1 prmStonith2-2
> >>>>(snip)
> >>>>primitive prmStonith1-1 stonith:external/stonith-helper \
> >>>>params \
> >>>>pcmk_reboot_retries="1" \
> >>>>pcmk_reboot_timeout="40s" \
> >>>>hostlist="nodea" \
> >>>>dead_check_target="192.168.28.60 192.168.28.70" \
> >>>>standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
> >>>>run_online_check="yes" \
> >>>>op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>
> >>>>primitive prmStonith1-2 stonith:external/ssh \
> >>>>params \
> >>>>pcmk_reboot_timeout="60s" \
> >>>>hostlist="nodea" \
> >>>>op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>>op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>
> >>>>primitive prmStonith2-1 stonith:external/stonith-helper \
> >>>>params \
> >>>>pcmk_reboot_retries="1" \
> >>>>pcmk_reboot_timeout="40s" \
> >>>>hostlist="nodeb" \
> >>>>dead_check_target="192.168.28.61 192.168.28.71" \
> >>>>standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
> >>>>run_online_check="yes" \
> >>>>op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>
> >>>>primitive prmStonith2-2 stonith:external/ssh \
> >>>>params \
> >>>>pcmk_reboot_timeout="60s" \
> >>>>hostlist="nodeb" \
> >>>>op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>>op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>(snip)
> >>>>location rsc_location-grpStonith1-2 grpStonith1 \
> >>>>rule -INFINITY: #uname eq nodea
> >>>>location rsc_location-grpStonith2-3 grpStonith2 \
> >>>>rule -INFINITY: #uname eq nodeb
> >>>>
> >>>>
> >>>>That is all.
> >>>
> >>>--
> >>>ELF Systems
> >>>Masamichi Fukuda
> >>>mail to: masamichi_fukud****@elf-s*****
> >>
> >>_______________________________________________
> >>Linux-ha-japan mailing list
> >>Linux****@lists*****
> >>http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
> >
> >--
> >ELF Systems
> >Masamichi Fukuda
> >mail to: masamichi_fukud****@elf-s*****
>

--
ELF Systems
Masamichi Fukuda
mail to: masamichi_fukud****@elf-s***** <elfsy****@gmail*****>
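The syslog lines quoted in the message above ("Performing: stonith -t external/stonith-helper -S", "failed to exec \"stonith\"", "failed: 2") suggest that stonithd could not launch the `stonith` CLI at all; an error code of 2 is typically ENOENT ("No such file or directory"), i.e. the binary is not on the daemon's PATH. A minimal sketch of a check, assuming a from-source install under /usr/local/heartbeat (the `check_cmd` helper and the bin path below are illustrative assumptions, not confirmed by the thread):

```shell
#!/bin/sh
# Sketch: check whether the `stonith` CLI (cluster-glue), through which
# external/* plugins are invoked per the log line above, resolves on PATH.
# The helper returns 2 to mirror the "failed: 2" (ENOENT) in the syslog.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $(command -v "$1")"
    else
        echo "not found in PATH: $1"
        return 2
    fi
}

# ASSUMPTION: /usr/local/heartbeat/bin is where this from-source layout
# might place `stonith`; adjust for the actual install, or symlink it
# into a directory that is on the daemons' PATH.
check_cmd stonith \
    || echo 'try: export PATH=$PATH:/usr/local/heartbeat/bin (or symlink the binary)'
```

This only tests the interactive shell's PATH; the environment stonithd inherits from Heartbeat may be narrower, so placing the binary in a standard location such as /usr/sbin is the safer fix.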