NAKAHIRA Kazutomo
nakah****@oss*****
Mon, 10 Oct 2011 15:04:32 JST
To: Miyamoto-san

This is Nakahira.

To investigate the cause of this issue, I believe the key question is what configuration was actually loaded via the crm command. If it is no trouble, could you provide the following files?

・stonith-setup2.txt.node001
・stonith-setup2.txt.node002
・/var/lib/heartbeat/crm/cib.xml
  → ideally both the one from before and the one from after running the crm command
・/etc/corosync/corosync.conf

From the log attached to your previous mail, possibilities such as the following come to mind:

1. When the STONITH configuration was loaded via the crm command, a
   STONITH configuration was already defined, and some problem occurred
   while it was being rewritten.
2. The contents of the STONITH configuration files for node001 and for
   node002 were inconsistent with each other, and that caused a problem.

However, the details are unclear without seeing the configuration.

--- Log excerpts with comments below (fine detail; feel free to skip) ---

Oct 08 17:05:59 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node001 on node002, type changed: external/ipmi -> <null>
Oct 08 17:05:59 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node001 on node002, class changed: stonith -> <null>
# Because the parameters of stonith-node001 (the STONITH resource group)
# changed, the PE is trying to force a restart of the resource.
# However, stonith-node001 is a group name, and should not itself carry
# the parameters of the ipmi STONITH resource (stonith-node001-2).
Oct 08 17:05:59 node001 pengine: [4422]: notice: DeleteRsc: Removing stonith-node001 from node002
# It is also trying to delete the stonith-node001 resource.
# Could it be that a resource with the same name as the STONITH group was
# originally defined??
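For reference, the kind of per-node STONITH definition being discussed would look roughly like the sketch below in crm shell syntax. This is only a guess at the shape of the requested stonith-setup2.txt files: the resource and group names and the plugin parameter names come from the logs in this thread, but every parameter value is a placeholder, and the location rule is an assumption.

```text
# Hypothetical sketch (crm shell syntax) of a per-node STONITH group.
# Names are from the log; all parameter values are placeholders.
primitive stonith-node001-1 stonith:external/stonith-helper \
        params priority="1" hostlist="node001" dead_check_target="192.0.2.1"
primitive stonith-node001-2 stonith:external/ipmi \
        params priority="2" hostname="node001" ipaddr="192.0.2.101" \
              userid="admin" passwd="secret" interface="lan"
primitive stonith-node001-3 stonith:meatware \
        params priority="3" hostlist="node001"
# Hypothesis 1 above would apply if the group id below (or a primitive id)
# collides with a resource id already present in the CIB.
group stonith-node001 stonith-node001-1 stonith-node001-2 stonith-node001-3
location loc-stonith-node001 stonith-node001 \
        rule -INFINITY: #uname eq node001
```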
# It is also puzzling that a restart and a delete are being executed on
# the same resource at the same time. Why that happened is unclear at
# this point.
# A resource stop → resource delete sequence would make sense.
Oct 08 17:07:19 node001 crmd: [4423]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=false)
Oct 08 17:07:19 node001 crmd: [4423]: ERROR: print_elem: Aborting transition, action lost: [Action 10]: In-flight (id: stonith-node001_delete_0, loc: node002, priority: 0)
Oct 08 17:07:19 node001 crmd: [4423]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
# The deletion of stonith-node001 failed; the action was lost.
Oct 08 17:07:19 node001 crmd: [4423]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
Oct 08 17:07:19 node001 crmd: [4423]: info: update_abort_priority: Abort action done superceeded by restart
Oct 08 17:07:19 node001 crmd: [4423]: WARN: cib_action_update: rsc_op 10: stonith-node001_delete_0 on node002 timed out
Oct 08 17:07:19 node001 crmd: [4423]: WARN: find_xml_node: Could not find primitive in rsc_op.
# Because the delete operation was superseded by the restart, did the
# operation itself vanish and then time out??
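Incidentally, the consistent sequence mentioned above (stop first, then delete) can be issued explicitly from the crm shell when removing a resource group by hand. A sketch, assuming the crm shell shipped with Pacemaker 1.0; only the group name is taken from this thread:

```text
# Stop the resource group first and wait until crm_mon shows it Stopped,
crm resource stop stonith-node001
# then remove its definition from the CIB.
crm configure delete stonith-node001
```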
# At this point, the STONITH resource group stonith-node001 had been
# stopped, and stonith-node002 had been started.

--- End of log excerpts ---

The sequence above (a restart and a delete executed simultaneously) is
repeated for the two resources (groups) stonith-node001 and
stonith-node002, alternating between them. The trigger appears to be the
configuration change to both resources (type and class becoming null).

Best regards.

(2011/10/08 17:31), N.Miyamoto wrote:
>
> Hello,
> This is Miyamoto.
>
> I have run into a phenomenon in which STONITH resources repeatedly stop
> and start. Could you tell me the cause and how to avoid it?
>
> The test environment is as follows:
> pacemaker-1.0.10-1.4.el5 + corosync-1.2.5-1.3.el5.
>
> Steps to reproduce:
> (1) crm < stonith-setup2.txt.node001
>
> (2) crm_mon -f
>     Confirm that the resources become Started.
>
> Online: [ node001 node002 ]
>
> Resource Group: rscgroup
> mntrsc1 (ocf::heartbeat:Filesystem): Started node001
> mntrsc2 (ocf::heartbeat:Filesystem): Started node001
> mgrrsc (lsb:mgrrsc): Started node001
> viprsc (ocf::heartbeat:IPaddr2): Started node001
> Resource Group: stonith-node001
> stonith-node001-1 (stonith:external/stonith-helper): Started node002
> stonith-node001-2 (stonith:external/ipmi): Started node002
> stonith-node001-3 (stonith:meatware): Started node002
>
> Migration summary:
> * Node node001:
> * Node node002:
>
> (3) crm < stonith-setup2.txt.node002
>
> (4) crm_mon -f
>     ☆ stonith-node001 stops(?) and stonith-node002 starts.
>
> Online: [ node001 node002 ]
>
> Resource Group: rscgroup
> mntrsc1 (ocf::heartbeat:Filesystem): Started node001
> mntrsc2 (ocf::heartbeat:Filesystem): Started node001
> mgrrsc (lsb:mgrrsc): Started node001
> viprsc (ocf::heartbeat:IPaddr2): Started node001
> Resource Group: stonith-node002
> stonith-node002-1 (stonith:external/stonith-helper): Started node001
> stonith-node002-2 (stonith:external/ipmi): Started node001
> stonith-node002-3 (stonith:meatware): Started node001
>
> Migration summary:
> * Node node001:
> * Node node002:
>
> (5) (2) → (4) → (2) → (4) → ... repeats at intervals of roughly 1 minute 20 seconds.
>
> Log at the time the phenomenon occurs:
> Oct 08 17:05:59 node001 crmd: [4423]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=false)
> Oct 08 17:05:59 node001 crmd: [4423]: ERROR: print_elem: Aborting transition, action lost: [Action 10]: In-flight (id: stonith-node002_delete_0, loc: node001, priority: 0)
> Oct 08 17:05:59 node001 crmd: [4423]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
> Oct 08 17:05:59 node001 crmd: [4423]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Oct 08 17:05:59 node001 crmd: [4423]: info: update_abort_priority: Abort action done superceeded by restart
> Oct 08 17:05:59 node001 crmd: [4423]: WARN: cib_action_update: rsc_op 10: stonith-node002_delete_0 on node001 timed out
> Oct 08 17:05:59 node001 crmd: [4423]: WARN: find_xml_node: Could not find primitive in rsc_op.
> Oct 08 17:05:59 node001 crmd: [4423]: info: run_graph: ====================================================
> Oct 08 17:05:59 node001 crmd: [4423]: notice: run_graph: Transition 55 (Complete=14, Pending=0, Fired=0, Skipped=9, Incomplete=0, Source=/var/lib/pengine/pe-input-1.bz2): Stopped
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_graph_trigger: Transition 55 is now complete
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_pe_invoke: Query 935: Requesting the current CIB: S_POLICY_ENGINE
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_pe_invoke_callback: Invoking the PE: query=935, ref=pe_calc-dc-1318061159-605, seq=130536, quorate=1
> Oct 08 17:05:59 node001 pengine: [4422]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Oct 08 17:05:59 node001 pengine: [4422]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Oct 08 17:05:59 node001 pengine: [4422]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> Oct 08 17:05:59 node001 pengine: [4422]: info: determine_online_status: Node node001 is online
> Oct 08 17:05:59 node001 pengine: [4422]: info: determine_online_status: Node node002 is online
> Oct 08 17:05:59 node001 pengine: [4422]: notice: group_print: Resource Group: rscgroup
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: mntrsc1 (ocf::heartbeat:Filesystem): Started node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: mntrsc2 (ocf::heartbeat:Filesystem): Started node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: mgrrsc (lsb:mgrrsc): Started node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: viprsc (ocf::heartbeat:IPaddr2): Started node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: group_print: Resource Group: stonith-node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: stonith-node001-1 (stonith:external/stonith-helper): Started node002
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: stonith-node001-2 (stonith:external/ipmi): Started node002
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: stonith-node001-3 (stonith:meatware): Started node002
> Oct 08 17:05:59 node001 pengine: [4422]: notice: group_print: Resource Group: stonith-node002
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: stonith-node002-1 (stonith:external/stonith-helper): Stopped
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: stonith-node002-2 (stonith:external/ipmi): Stopped
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print: stonith-node002-3 (stonith:meatware): Stopped
> Oct 08 17:05:59 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node001 on node002, type changed: external/ipmi -> <null>
> Oct 08 17:05:59 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node001 on node002, class changed: stonith -> <null>
> Oct 08 17:05:59 node001 pengine: [4422]: notice: DeleteRsc: Removing stonith-node001 from node002
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: notice: RecurringOp: Start recurring monitor (60s) for stonith-node002-1 on node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: RecurringOp: Start recurring monitor (10s) for stonith-node002-2 on node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: RecurringOp: Start recurring monitor (10s) for stonith-node002-3 on node001
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Leave resource mntrsc1 (Started node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Leave resource mntrsc2 (Started node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Leave resource mgrrsc (Started node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Leave resource viprsc (Started node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node001-1 (Started node002)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node001-2 (Started node002)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node001-3 (Started node002)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Start stonith-node002-1 (node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Start stonith-node002-2 (node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Start stonith-node002-3 (node001)
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 08 17:05:59 node001 crmd: [4423]: info: unpack_graph: Unpacked transition 56: 23 actions in 23 synapses
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_te_invoke: Processing graph 56 (ref=pe_calc-dc-1318061159-605) derived from /var/lib/pengine/pe-input-2.bz2
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 9 fired and confirmed
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 10: delete stonith-node001_delete_0 on node002
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 31: stop stonith-node001-3_stop_0 on node002
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 42 fired and confirmed
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 36: start stonith-node002-1_start_0 on node001 (local)
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=36:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-1_start_0 )
> Oct 08 17:05:59 node001 lrmd: [4420]: info: rsc:stonith-node002-1:152: start
> Oct 08 17:05:59 node001 lrmd: [28668]: info: Try to start STONITH resource<rsc_id=stonith-node002-1> : Device=external/stonith-helper
> Oct 08 17:05:59 node001 stonithd: [4418]: info: Cannot get parameter run_dead_check from StonithNVpair
> Oct 08 17:05:59 node001 stonithd: [4418]: info: Cannot get parameter run_quorum_check from StonithNVpair
> Oct 08 17:05:59 node001 stonithd: [4418]: info: Cannot get parameter run_standby_wait from StonithNVpair
> Oct 08 17:05:59 node001 stonithd: [4418]: info: Cannot get parameter check_quorum_wait_time from StonithNVpair
> Oct 08 17:05:59 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-3_stop_0 (31) confirmed on node002 (rc=0)
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 29: stop stonith-node001-2_stop_0 on node002
> Oct 08 17:05:59 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-2_stop_0 (29) confirmed on node002 (rc=0)
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 27: stop stonith-node001-1_stop_0 on node002
> Oct 08 17:05:59 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-1_stop_0 (27) confirmed on node002 (rc=0)
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 35 fired and confirmed
> Oct 08 17:05:59 node001 pengine: [4422]: info: process_pe_message: Transition 56: PEngine Input stored in: /var/lib/pengine/pe-input-2.bz2
> Oct 08 17:05:59 node001 pengine: [4422]: info: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
> Oct 08 17:06:00 node001 stonithd: [4418]: info: stonith-node002-1 stonith resource started
> Oct 08 17:06:00 node001 lrmd: [4420]: debug: stonithRA plugin: provider attribute is not needed and will be ignored.
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-1_start_0 (call=152, rc=0, cib-update=936, confirmed=true) ok
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-1_start_0 (36) confirmed on node001 (rc=0)
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 37: monitor stonith-node002-1_monitor_60000 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=37:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-1_monitor_60000 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-1:153: monitor
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 38: start stonith-node002-2_start_0 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=38:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-2_start_0 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-2:154: start
> Oct 08 17:06:00 node001 lrmd: [28734]: info: Try to start STONITH resource<rsc_id=stonith-node002-2> : Device=external/ipmi
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-1_monitor_60000 (call=153, rc=0, cib-update=937, confirmed=false) ok
> Oct 08 17:06:00 node001 stonithd: [4418]: info: stonith-node002-2 stonith resource started
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-1_monitor_60000 (37) confirmed on node001 (rc=0)
> Oct 08 17:06:00 node001 lrmd: [4420]: debug: stonithRA plugin: provider attribute is not needed and will be ignored.
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-2_start_0 (call=154, rc=0, cib-update=938, confirmed=true) ok
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-2_start_0 (38) confirmed on node001 (rc=0)
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 39: monitor stonith-node002-2_monitor_10000 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=39:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-2_monitor_10000 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-2:155: monitor
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 40: start stonith-node002-3_start_0 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=40:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-3_start_0 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-3:156: start
> Oct 08 17:06:00 node001 lrmd: [28772]: info: Try to start STONITH resource<rsc_id=stonith-node002-3> : Device=meatware
> Oct 08 17:06:00 node001 stonithd: [4418]: info: parse config info info=node002
> Oct 08 17:06:00 node001 stonithd: [4418]: info: stonith-node002-3 stonith resource started
> Oct 08 17:06:00 node001 lrmd: [4420]: debug: stonithRA plugin: provider attribute is not needed and will be ignored.
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-3_start_0 (call=156, rc=0, cib-update=939, confirmed=true) ok
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-3_start_0 (40) confirmed on node001 (rc=0)
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 43 fired and confirmed
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 41: monitor stonith-node002-3_monitor_10000 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=41:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-3_monitor_10000 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-3:157: monitor
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-3_monitor_10000 (call=157, rc=0, cib-update=940, confirmed=false) ok
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-3_monitor_10000 (41) confirmed on node001 (rc=0)
> Oct 08 17:06:01 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-2_monitor_10000 (call=155, rc=0, cib-update=941, confirmed=false) ok
> Oct 08 17:06:01 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-2_monitor_10000 (39) confirmed on node001 (rc=0)
> Oct 08 17:06:38 node001 cib: [4419]: info: cib_stats: Processed 80 operations (3000.00us average, 0% utilization) in the last 10min
> Oct 08 17:07:19 node001 crmd: [4423]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=false)
> Oct 08 17:07:19 node001 crmd: [4423]: ERROR: print_elem: Aborting transition, action lost: [Action 10]: In-flight (id: stonith-node001_delete_0, loc: node002, priority: 0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
> Oct 08 17:07:19 node001 crmd: [4423]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Oct 08 17:07:19 node001 crmd: [4423]: info: update_abort_priority: Abort action done superceeded by restart
> Oct 08 17:07:19 node001 crmd: [4423]: WARN: cib_action_update: rsc_op 10: stonith-node001_delete_0 on node002 timed out
> Oct 08 17:07:19 node001 crmd: [4423]: WARN: find_xml_node: Could not find primitive in rsc_op.
> Oct 08 17:07:19 node001 crmd: [4423]: info: run_graph: ====================================================
> Oct 08 17:07:19 node001 crmd: [4423]: notice: run_graph: Transition 56 (Complete=14, Pending=0, Fired=0, Skipped=9, Incomplete=0, Source=/var/lib/pengine/pe-input-2.bz2): Stopped
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_graph_trigger: Transition 56 is now complete
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_pe_invoke: Query 942: Requesting the current CIB: S_POLICY_ENGINE
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_pe_invoke_callback: Invoking the PE: query=942, ref=pe_calc-dc-1318061239-616, seq=130536, quorate=1
> Oct 08 17:07:19 node001 pengine: [4422]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Oct 08 17:07:19 node001 pengine: [4422]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Oct 08 17:07:19 node001 pengine: [4422]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> Oct 08 17:07:19 node001 pengine: [4422]: info: determine_online_status: Node node001 is online
> Oct 08 17:07:19 node001 pengine: [4422]: info: determine_online_status: Node node002 is online
> Oct 08 17:07:19 node001 pengine: [4422]: notice: group_print: Resource Group: rscgroup
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: mntrsc1 (ocf::heartbeat:Filesystem): Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: mntrsc2 (ocf::heartbeat:Filesystem): Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: mgrrsc (lsb:mgrrsc): Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: viprsc (ocf::heartbeat:IPaddr2): Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: group_print: Resource Group: stonith-node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: stonith-node001-1 (stonith:external/stonith-helper): Stopped
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: stonith-node001-2 (stonith:external/ipmi): Stopped
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: stonith-node001-3 (stonith:meatware): Stopped
> Oct 08 17:07:19 node001 pengine: [4422]: notice: group_print: Resource Group: stonith-node002
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: stonith-node002-1 (stonith:external/stonith-helper): Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: stonith-node002-2 (stonith:external/ipmi): Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print: stonith-node002-3 (stonith:meatware): Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node002 on node001, type changed: external/ipmi -> <null>
> Oct 08 17:07:19 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node002 on node001, class changed: stonith -> <null>
> Oct 08 17:07:19 node001 pengine: [4422]: notice: DeleteRsc: Removing stonith-node002 from node001
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: notice: RecurringOp: Start recurring monitor (60s) for stonith-node001-1 on node002
> Oct 08 17:07:19 node001 pengine: [4422]: notice: RecurringOp: Start recurring monitor (10s) for stonith-node001-2 on node002
> Oct 08 17:07:19 node001 pengine: [4422]: notice: RecurringOp: Start recurring monitor (10s) for stonith-node001-3 on node002
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Leave resource mntrsc1 (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Leave resource mntrsc2 (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Leave resource mgrrsc (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Leave resource viprsc (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Start stonith-node001-1 (node002)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Start stonith-node001-2 (node002)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Start stonith-node001-3 (node002)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node002-1 (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node002-2 (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node002-3 (Started node001)
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 08 17:07:19 node001 crmd: [4423]: info: unpack_graph: Unpacked transition 57: 23 actions in 23 synapses
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_te_invoke: Processing graph 57 (ref=pe_calc-dc-1318061239-616) derived from /var/lib/pengine/pe-input-3.bz2
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 33 fired and confirmed
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 27: start stonith-node001-1_start_0 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 9 fired and confirmed
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 10: delete stonith-node002_delete_0 on node001 (local)
> Oct 08 17:07:19 node001 crmd: [4423]: WARN: find_xml_node: Could not find primitive in rsc_op.
> Oct 08 17:07:19 node001 crmd: [4423]: ERROR: crm_abort: do_lrm_invoke: Triggered asser****@lrm*****:1285 : xml_rsc != NULL
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 41: stop stonith-node002-3_stop_0 on node001 (local)
> Oct 08 17:07:19 node001 lrmd: [4420]: info: cancel_op: operation monitor[157] on stonith::meatware::stonith-node002-3 for client 4423, its parameters: CRM_meta_interval=[10000] on_fail=[restart] stonith-timeout=[600s] hostlist=[node002] CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000] crm_feature_set=[3.0.1] priority=[3] CRM_meta_name=[monitor] cancelled
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=41:57:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-3_stop_0 )
> Oct 08 17:07:19 node001 lrmd: [4420]: info: rsc:stonith-node002-3:158: stop
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-3_monitor_10000 (call=157, status=1, cib-update=0, confirmed=true) Cancelled
> Oct 08 17:07:19 node001 lrmd: [29829]: info: Try to stop STONITH resource<rsc_id=stonith-node002-3> : Device=meatware
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-3_stop_0 (call=158, rc=0, cib-update=943, confirmed=true) ok
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-3_stop_0 (41) confirmed on node001 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 39: stop stonith-node002-2_stop_0 on node001 (local)
> Oct 08 17:07:19 node001 lrmd: [4420]: info: cancel_op: operation monitor[155] on stonith::external/ipmi::stonith-node002-2 for client 4423, its parameters: CRM_meta_interval=[10000] ipaddr=[172.25.1.2] on_fail=[restart] interface=[lan] CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor] hostname=[node002] passwd=[admin00] userid=[admin] cancelled
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=39:57:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-2_stop_0 )
> Oct 08 17:07:19 node001 lrmd: [4420]: info: rsc:stonith-node002-2:159: stop
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-2_monitor_10000 (call=155, status=1, cib-update=0, confirmed=true) Cancelled
> Oct 08 17:07:19 node001 lrmd: [29831]: info: Try to stop STONITH resource<rsc_id=stonith-node002-2> : Device=external/ipmi
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-2_stop_0 (call=159, rc=0, cib-update=944, confirmed=true) ok
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-2_stop_0 (39) confirmed on node001 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 37: stop stonith-node002-1_stop_0 on node001 (local)
> Oct 08 17:07:19 node001 lrmd: [4420]: info: cancel_op: operation monitor[153] on stonith::external/stonith-helper::stonith-node002-1 for client 4423, its parameters: CRM_meta_interval=[60000] standby_wait_time=[15] stonith-timeout=[180s] hostlist=[node002] CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000] standby_check_command=[/usr/sbin/crm_resource -r rscgroup -W | grep -q `hostnamcrm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor] dead_check_target=[172.25.0.2] cancelled
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=37:57:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-1_stop_0 )
> Oct 08 17:07:19 node001 lrmd: [4420]: info: rsc:stonith-node002-1:160: stop
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-1_monitor_60000 (call=153, status=1, cib-update=0, confirmed=true) Cancelled
> Oct 08 17:07:19 node001 lrmd: [29833]: info: Try to stop STONITH resource<rsc_id=stonith-node002-1> : Device=external/stonith-helper
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-1_stop_0 (call=160, rc=0, cib-update=945, confirmed=true) ok
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-1_stop_0 (37) confirmed on node001 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 45 fired and confirmed
> Oct 08 17:07:19 node001 pengine: [4422]: info: process_pe_message: Transition 57: PEngine Input stored in: /var/lib/pengine/pe-input-3.bz2
> Oct 08 17:07:19 node001 pengine: [4422]: info: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-1_start_0 (27) confirmed on node002 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 28: monitor stonith-node001-1_monitor_60000 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 29: start stonith-node001-2_start_0 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-2_start_0 (29) confirmed on node002 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 30: monitor stonith-node001-2_monitor_10000 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 31: start stonith-node001-3_start_0 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-3_start_0 (31) confirmed on node002 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 34 fired and confirmed
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 32: monitor stonith-node001-3_monitor_10000 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-3_monitor_10000 (32) confirmed on node002 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-2_monitor_10000 (30) confirmed on node002 (rc=0)
> Oct 08 17:07:20 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-1_monitor_60000 (28) confirmed on node002 (rc=0)
> Oct 08 17:08:39 node001 crmd: [4423]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=false)
>
> That is all; thank you in advance.
>
> ----------------------------------------------
> Nobuaki Miyamoto
> mail:fj508****@aa*****
>
> _______________________________________________
> Linux-ha-japan mailing list
> Linux****@lists*****
> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>
>

--
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Research and Development Planning Department
NTT Open Source Software Center
Kazutomo NAKAHIRA
TEL: 03-5860-5135 FAX: 03-5463-6490
Mail: nakah****@oss*****