Masamichi Fukuda - elf-systems
masamichi_fukud****@elf-s*****
Tue Mar 17 09:30:50 JST 2015
Yamauchi-san,

Good morning, this is Fukuda.
Thank you for the reference URLs with the samples and related information.

Thank you for your help.

That is all.

2015-03-16 21:48 GMT+09:00 <renay****@ybb*****>:

> Fukuda-san,
>
> Good evening, this is Yamauchi.
>
> There appears to be a fencing_topology sample from last year's OSC Tokyo at the following URL:
>
> * http://linux-ha.sourceforge.jp/wp/wp-content/uploads/osc2014_crm.txt
>
> With fencing_topology you can control which nodes are targeted and which stonith agents are executed.
>
> -----------------
> fencing_topology \
>     server01: prmStonith1 \
>     server02: prmStonith2
> -----------------
>
> In this format, each line lists a target node followed by the stonith agent(s) to execute...[more than one may be given]
> Upstream information is also available here:
> * http://clusterlabs.org/wiki/Fencing_topology
> That is all.
>
> ----- Original Message -----
> >From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >To: "linux****@lists*****" <linux****@lists*****>
> >Date: 2015/3/16, Mon 19:24
> >Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >
> >
> >Matsushima-san,
> >
> >Good evening, this is Fukuda.
> >Thank you for your prompt reply.
> >
> >Here is the output of crm_mon -rfA.
> >
> >Last updated: Mon Mar 16 18:26:37 2015
> >Last change: Mon Mar 16 18:04:31 2015
> >Stack: heartbeat
> >Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
> >Version: 1.1.12-561c4cf
> >2 Nodes configured
> >10 Resources configured
> >
> >
> >Online: [ lbv1.beta.com lbv2.beta.com ]
> >
> >Full list of resources:
> >
> > Resource Group: HAvarnish
> >     vip_208 (ocf::heartbeat:IPaddr2): Stopped
> >     varnishd (lsb:varnish): Stopped
> > Resource Group: grpStonith1
> >     Stonith1-1 (stonith:external/stonith-helper): Stopped
> >     Stonith1-2 (stonith:external/xen0): Stopped
> >     Stonith1-3 (stonith:meatware): Stopped
> > Resource Group: grpStonith2
> >     Stonith2-1 (stonith:external/stonith-helper): Stopped
> >     Stonith2-2 (stonith:external/xen0): Stopped
> >     Stonith2-3 (stonith:meatware): Stopped
> > Clone Set: clone_ping [ping]
> >     Stopped: [ lbv1.beta.com lbv2.beta.com ]
> >
> >Node Attributes:
> >* Node lbv1.beta.com:
> >* Node lbv2.beta.com:
> >
> >Migration summary:
> >* Node lbv2.beta.com:
> >   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
> >   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
> >* Node lbv1.beta.com:
> >   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:48 2015'
> >   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:55 2015'
> >
> >Failed actions:
> >    Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=39, status=Error, last-rc-change='Mon Mar 16 18:23:44 2015', queued=0ms, exec=2014ms
> >    ping_start_0 on lbv2.beta.com 'unknown error' (1): call=40, status=complete, last-rc-change='Mon Mar 16 18:23:45 2015', queued=0ms, exec=995ms
> >    Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=39, status=Error, last-rc-change='Mon Mar 16 18:23:45 2015', queued=0ms, exec=2009ms
> >    ping_start_0 on lbv1.beta.com 'unknown error' (1): call=41, status=complete, last-rc-change='Mon Mar 16 18:23:54 2015', queued=0ms, exec=182ms
> >
> >
> >There was no standard output or standard error output, so here is the log (/var/log/ha-debug).
> >
> >Node 1 (lbv1)
> >
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: info: Pacemaker support: yes
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: WARN: File /etc/ha.d//haresources exists.
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: WARN: This file is not used because pacemaker is enabled
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Core dumps could be lost if multiple dumps occur.
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: info: **************************
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: info: Configuration validated. Starting heartbeat 3.0.6
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: heartbeat: version 3.0.6
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: Heartbeat generation: 1423534103
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: seed is -1702799346
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: bound send socket to device: eth1
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: set SO_REUSEADDR
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: bound receive socket to device: eth1
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.133
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: Local status now set to: 'up'
> >Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: info: Link lbv2.beta.com:eth1 up.
> >Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: info: Status update for node lbv2.beta.com: status up
> >Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: debug: get_delnodelist: delnodelist=
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Comm_now_up(): updating status to active
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Local status now set to: 'active'
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Status update for node lbv2.beta.com: status active
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2868]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 2868)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2866]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109 gid 113 (pid 2866)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2871]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109 gid 113 (pid 2871)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2869]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0 gid 0 (pid 2869)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2867]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109 gid 113 (pid 2867)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2870]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109 gid 113 (pid 2870)
> >Mar 16 18:22:54 lbv1.beta.com ccm: [2866]: info: Hostname: lbv1.beta.com
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client ccm is set to 1024
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client attrd is set to 1024
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client stonithd is set to 1024
> >Mar 16 18:22:55 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client cib is set to 1024
> >Mar 16 18:22:58 lbv1.beta.com heartbeat: [1957]: WARN: 1 lost packet(s) for [lbv2.beta.com] [33:35]
> >Mar 16 18:22:58 lbv1.beta.com heartbeat: [1957]: info: No pkts missing from lbv2.beta.com!
> >Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client crmd is set to 1024
> >Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: WARN: 1 lost packet(s) for [lbv2.beta.com] [40:42]
> >Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: info: No pkts missing from lbv2.beta.com!
> >ping(ping)[3164]: 2015/03/16_18:23:54 WARNING: Could not update default_ping_set = 100: rc=127
> >
> >Node 2 (lbv2)
> >
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: Pacemaker support: yes
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: File /etc/ha.d//haresources exists.
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: This file is not used because pacemaker is enabled
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Core dumps could be lost if multiple dumps occur.
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: **************************
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: Configuration validated. Starting heartbeat 3.0.6
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: heartbeat: version 3.0.6
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: Heartbeat generation: 1423534179
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: seed is 2086609325
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: bound send socket to device: eth1
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: set SO_REUSEADDR
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: bound receive socket to device: eth1
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.132
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: Local status now set to: 'up'
> >Mar 16 18:22:48 lbv2.beta.com heartbeat: [1977]: info: Link lbv1.beta.com:eth1 up.
> >Mar 16 18:22:48 lbv2.beta.com heartbeat: [1977]: info: Status update for node lbv1.beta.com: status up
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: debug: get_delnodelist: delnodelist=
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Comm_now_up(): updating status to active
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Local status now set to: 'active'
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3026]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109 gid 113 (pid 3026)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3023]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109 gid 113 (pid 3023)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3025]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0 gid 0 (pid 3025)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3024]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 3024)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3022]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109 gid 113 (pid 3022)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3027]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109 gid 113 (pid 3027)
> >Mar 16 18:22:54 lbv2.beta.com ccm: [3022]: info: Hostname: lbv2.beta.com
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client ccm is set to 1024
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client attrd is set to 1024
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: Status update for node lbv1.beta.com: status active
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client stonithd is set to 1024
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client cib is set to 1024
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: quorum plugin: majority
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: quorum plugin: twonodes
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: info: Break tie for 2 nodes cluster
> >Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: WARN: 1 lost packet(s) for [lbv1.beta.com] [30:32]
> >Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: info: No pkts missing from lbv1.beta.com!
> >Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client crmd is set to 1024
> >Mar 16 18:22:59 lbv2.beta.com heartbeat: [1977]: WARN: 1 lost packet(s) for [lbv1.beta.com] [35:37]
> >Mar 16 18:22:59 lbv2.beta.com heartbeat: [1977]: info: No pkts missing from lbv1.beta.com!
> >Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: quorum plugin: majority
> >Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
> >Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
> >ping(ping)[3144]: 2015/03/16_18:23:46 WARNING: Could not update default_ping_set = 100: rc=127
> >
> >
> >Thank you for your help.
> >
> >That is all.
> >
> >
> >On March 16, 2015 at 18:53, Takehiro Matsushima <takeh****@gmail*****> wrote:
> >
> >>Fukuda-san,
> >>
> >>Good evening, this is Matsushima.
> >>Let me quickly confirm one point.
> >>
> >>I am also concerned that the start of the ping RA ends in an unknown error, so
> >>I would appreciate it if you could quote the relevant log excerpts for ping and Stonith Helper,
> >>including anything each RA wrote to standard output and standard error.
> >>
> >>----
> >>Takehiro Matsushima
> >>
> >>_______________________________________________
> >>Linux-ha-japan mailing list
> >>Linux****@lists*****
> >>http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
> >>
> >
> >--
> >
> >ELF Systems
> >Masamichi Fukuda
> >mail to: masamichi_fukud****@elf-s*****

--
ELF Systems
Masamichi Fukuda
mail to: *masamichi_fukud****@elf-s***** <elfsy****@gmail*****>*
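
For reference, the fencing_topology format described in the quoted message might look roughly like the sketch below for the two nodes in this thread. The node and resource names come from the crm_mon output, but which resources fence which node, and the order of the levels, are assumptions made only for illustration. This is not the configuration actually used on these machines. In crm shell syntax, resources separated by spaces are tried as successive fencing levels for the target node.

```
# Assumption: the Stonith1-* resources fence lbv1 and the Stonith2-* resources fence lbv2,
# each tried in the order listed (stonith-helper, then xen0, then meatware).
fencing_topology \
    lbv1.beta.com: Stonith1-1 Stonith1-2 Stonith1-3 \
    lbv2.beta.com: Stonith2-1 Stonith2-2 Stonith2-3
```

A definition like this would be entered through crm configure. Whether it fits this cluster depends on how the stonith resources are defined, which is not shown in this message.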