[Linux-ha-jp] STONITH resources repeatedly stop and start


NAKAHIRA Kazutomo nakah****@oss*****
Mon, 10 Oct 2011 15:04:32 JST


To: Miyamoto-san

This is Nakahira.

To investigate the root cause here, the key piece of information is
exactly what configuration was loaded with the crm command.

If you don't mind, could you provide the following files?

・stonith-setup2.txt.node001
・stonith-setup2.txt.node002
・/var/lib/heartbeat/crm/cib.xml
  → if possible, I would like both the pre- and post-crm-command
    versions (see the command sketch below the list)
・/etc/corosync/corosync.conf
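
For reference, one way to capture those is sketched below (assuming the
stock Pacemaker 1.0 command-line tools):

  # before loading the configuration
  cibadmin -Q > cib-before.xml          # dump the live CIB as XML
  crm configure show > crm-before.txt   # same content in crm shell syntax

  crm < stonith-setup2.txt.node001      # load the configuration

  # after loading
  cibadmin -Q > cib-after.xml
  crm configure show > crm-after.txt

A plain diff of the before/after dumps would also help to pinpoint which
definition actually changed.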

Judging from the log attached to your previous mail, the possibilities
include:

1. A STONITH definition already existed when the new STONITH configuration
   was loaded via the crm command, and something went wrong while
   rewriting it.

2. The STONITH definition files for node001 and node002 were inconsistent
   with each other, causing some problem.

The details, however, cannot be determined without seeing the actual
configuration.
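
To illustrate possibility 1 with a purely hypothetical example (this is
not your actual configuration): if the CIB originally contained a
primitive whose ID was stonith-node001, and the new file redefines the
same ID as a group, the PE would see the class/type of stonith-node001
drop to <null>, which matches the messages quoted below:

  # hypothetical pre-existing definition: a single primitive
  primitive stonith-node001 stonith:external/ipmi \
      params hostname="node001" userid="admin" \
      op monitor interval="10s"

  # hypothetical new definition: a group reusing the same ID
  group stonith-node001 \
      stonith-node001-1 stonith-node001-2 stonith-node001-3

A group element carries no class/type attributes of its own, which would
be consistent with "external/ipmi -> <null>" and "stonith -> <null>".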

--- Log excerpt with comments below (fine details; feel free to skip) ---
Oct 08 17:05:59 node001 pengine: [4422]: notice: check_rsc_parameters:
Forcing restart of stonith-node001 on node002, type changed:
external/ipmi -> <null>
Oct 08 17:05:59 node001 pengine: [4422]: notice: check_rsc_parameters:
Forcing restart of stonith-node001 on node002, class changed: stonith ->
<null>
# Because parameters of stonith-node001 (the STONITH resource group)
# changed, the PE is trying to force a restart of the resource.
# However, stonith-node001 is a group name and should not carry the
# parameters of the ipmi STONITH resource (stonith-node001-2).

Oct 08 17:05:59 node001 pengine: [4422]: notice: DeleteRsc: Removing
stonith-node001 from node002
# It is attempting to delete the stonith-node001 resource.
# Could a resource with the same name as the STONITH group have been
# defined there originally??
# It is also puzzling that a restart and a delete are issued for the same
# resource at the same time; why that happened is unclear at this point.
# A sequence of resource stop -> resource delete would make sense
# (see the sketch just below).
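
As an aside, that safe sequence would look something like the following
(a sketch using the standard crm shell; resource names as in your setup):

  crm resource stop stonith-node001      # stop the group first
  crm_mon -1                             # confirm it shows Stopped
  crm configure delete stonith-node001   # then remove the definition

Deleting only after the stop has completed avoids exactly the kind of
in-flight delete that times out in the log below.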

Oct 08 17:07:19 node001 crmd: [4423]: WARN: action_timer_callback: Timer
popped (timeout=20000, abort_level=0, complete=false)
Oct 08 17:07:19 node001 crmd: [4423]: ERROR: print_elem: Aborting
transition, action lost: [Action 10]: In-flight (id:
stonith-node001_delete_0, loc: node002, priority: 0)
Oct 08 17:07:19 node001 crmd: [4423]: info: abort_transition_graph:
action_timer_callback:486 - Triggered transition abort (complete=0) :
Action lost
# The deletion of stonith-node001 failed; the action was lost.

Oct 08 17:07:19 node001 crmd: [4423]: info: update_abort_priority: Abort
priority upgraded from 0 to 1000000
Oct 08 17:07:19 node001 crmd: [4423]: info: update_abort_priority: Abort
action done superceeded by restart
Oct 08 17:07:19 node001 crmd: [4423]: WARN: cib_action_update: rsc_op
10: stonith-node001_delete_0 on node002 timed out
Oct 08 17:07:19 node001 crmd: [4423]: WARN: find_xml_node: Could not
find primitive in rsc_op.
# Because the delete was superseded by the restart, the delete operation
# itself was lost and then timed out??
# At this point, the STONITH resource group stonith-node001 had been
# stopped and stonith-node002 had been started.
--- End of log excerpt ---

For the two resources (groups) stonith-node001 and stonith-node002, the
sequence above (a restart and a delete issued simultaneously) keeps
repeating, alternating between the two.

The trigger appears to be the configuration change detected for both
resources (type and class going to null).
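
If you want to check what the cluster currently holds under those IDs,
something like the following (stock tools only) should show whether each
ID is defined as a group, a primitive, or both:

  crm configure show stonith-node001 stonith-node002
  cibadmin -Q -o resources | grep -n 'id="stonith-node00'

Running crm_verify -L, as the log itself suggests, should also flag the
on_fail=fence vs. stonith-enabled=false mismatch reported by the PE.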

That is all; thank you in advance.


(2011/10/08 17:31), N.Miyamoto wrote:
> 
> Thank you as always.
> This is Miyamoto.
> 
> I am running into a phenomenon where STONITH resources repeatedly stop and start.
> Could you tell me the cause and how to work around it?
> 
> The test environment is as follows:
> pacemaker-1.0.10-1.4.el5 + corosync-1.2.5-1.3.el5
> 
> Steps to reproduce:
> (1) crm < stonith-setup2.txt.node001
> 
> (2) crm_mon -f
>      Confirm that the resources show as Started.
> 
>      Online: [ node001 node002 ]
> 
>       Resource Group: rscgroup
>           mntrsc1 (ocf::heartbeat:Filesystem):    Started node001
>           mntrsc2 (ocf::heartbeat:Filesystem):    Started node001
>           mgrrsc  (lsb:mgrrsc):   Started node001
>           viprsc  (ocf::heartbeat:IPaddr2):       Started node001
>       Resource Group: stonith-node001
>           stonith-node001-1 (stonith:external/stonith-helper):      Started node002
>           stonith-node001-2 (stonith:external/ipmi):        Started node002
>           stonith-node001-3 (stonith:meatware):     Started node002
> 
>      Migration summary:
>      * Node node001:
>      * Node node002:
> 
> (3) crm < stonith-setup2.txt.node002
> 
> (4) crm_mon -f
>      ☆ stonith-node001 stops(?) and stonith-node002 starts.
> 
>      Online: [ node001 node002 ]
> 
>       Resource Group: rscgroup
>           mntrsc1 (ocf::heartbeat:Filesystem):    Started node001
>           mntrsc2 (ocf::heartbeat:Filesystem):    Started node001
>           mgrrsc  (lsb:mgrrsc):   Started node001
>           viprsc  (ocf::heartbeat:IPaddr2):       Started node001
>       Resource Group: stonith-node002
>           stonith-node002-1 (stonith:external/stonith-helper):      Started node001
>           stonith-node002-2 (stonith:external/ipmi):        Started node001
>           stonith-node002-3 (stonith:meatware):     Started node001
> 
>      Migration summary:
>      * Node node001:
>      * Node node002:
> 
> (5) Steps (2) → (4) → (2) → (4) → ... keep repeating at intervals of about 1 minute 20 seconds.
> 
> Log from when the phenomenon occurred:
> Oct 08 17:05:59 node001 crmd: [4423]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=false)
> Oct 08 17:05:59 node001 crmd: [4423]: ERROR: print_elem: Aborting transition, action lost: [Action 10]: In-flight (id: stonith-node002_delete_0, loc: node001, priority: 0)
> Oct 08 17:05:59 node001 crmd: [4423]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
> Oct 08 17:05:59 node001 crmd: [4423]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Oct 08 17:05:59 node001 crmd: [4423]: info: update_abort_priority: Abort action done superceeded by restart
> Oct 08 17:05:59 node001 crmd: [4423]: WARN: cib_action_update: rsc_op 10: stonith-node002_delete_0 on node001 timed out
> Oct 08 17:05:59 node001 crmd: [4423]: WARN: find_xml_node: Could not find primitive in rsc_op.
> Oct 08 17:05:59 node001 crmd: [4423]: info: run_graph: ====================================================
> Oct 08 17:05:59 node001 crmd: [4423]: notice: run_graph: Transition 55 (Complete=14, Pending=0, Fired=0, Skipped=9, Incomplete=0, Source=/var/lib/pengine/pe-input-1.bz2): Stopped
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_graph_trigger: Transition 55 is now complete
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_state_transition: State transition S_TRANSITION_ENGINE ->  S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_pe_invoke: Query 935: Requesting the current CIB: S_POLICY_ENGINE
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_pe_invoke_callback: Invoking the PE: query=935, ref=pe_calc-dc-1318061159-605, seq=130536, quorate=1
> Oct 08 17:05:59 node001 pengine: [4422]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Oct 08 17:05:59 node001 pengine: [4422]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Oct 08 17:05:59 node001 pengine: [4422]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> Oct 08 17:05:59 node001 pengine: [4422]: info: determine_online_status: Node node001 is online
> Oct 08 17:05:59 node001 pengine: [4422]: info: determine_online_status: Node node002 is online
> Oct 08 17:05:59 node001 pengine: [4422]: notice: group_print:  Resource Group: rscgroup
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      mntrsc1 (ocf::heartbeat:Filesystem):    Started node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      mntrsc2 (ocf::heartbeat:Filesystem):    Started node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      mgrrsc  (lsb:mgrrsc):   Started node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      viprsc  (ocf::heartbeat:IPaddr2):       Started node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: group_print:  Resource Group: stonith-node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      stonith-node001-1 (stonith:external/stonith-helper):      Started node002
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      stonith-node001-2 (stonith:external/ipmi):        Started node002
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      stonith-node001-3 (stonith:meatware):     Started node002
> Oct 08 17:05:59 node001 pengine: [4422]: notice: group_print:  Resource Group: stonith-node002
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      stonith-node002-1 (stonith:external/stonith-helper):      Stopped
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      stonith-node002-2 (stonith:external/ipmi):        Stopped
> Oct 08 17:05:59 node001 pengine: [4422]: notice: native_print:      stonith-node002-3 (stonith:meatware):     Stopped
> Oct 08 17:05:59 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node001 on node002, type changed: external/ipmi ->  <null>
> Oct 08 17:05:59 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node001 on node002, class changed: stonith ->  <null>
> Oct 08 17:05:59 node001 pengine: [4422]: notice: DeleteRsc: Removing stonith-node001 from node002
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: notice: RecurringOp:  Start recurring monitor (60s) for stonith-node002-1 on node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: RecurringOp:  Start recurring monitor (10s) for stonith-node002-2 on node001
> Oct 08 17:05:59 node001 pengine: [4422]: notice: RecurringOp:  Start recurring monitor (10s) for stonith-node002-3 on node001
> Oct 08 17:05:59 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Leave resource mntrsc1 (Started node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Leave resource mntrsc2 (Started node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Leave resource mgrrsc  (Started node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Leave resource viprsc  (Started node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node001-1       (Started node002)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node001-2       (Started node002)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node001-3       (Started node002)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Start stonith-node002-1  (node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Start stonith-node002-2  (node001)
> Oct 08 17:05:59 node001 pengine: [4422]: notice: LogActions: Start stonith-node002-3  (node001)
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_state_transition: State transition S_POLICY_ENGINE ->  S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 08 17:05:59 node001 crmd: [4423]: info: unpack_graph: Unpacked transition 56: 23 actions in 23 synapses
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_te_invoke: Processing graph 56 (ref=pe_calc-dc-1318061159-605) derived from /var/lib/pengine/pe-input-2.bz2
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 9 fired and confirmed
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 10: delete stonith-node001_delete_0 on node002
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 31: stop stonith-node001-3_stop_0 on node002
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 42 fired and confirmed
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 36: start stonith-node002-1_start_0 on node001 (local)
> Oct 08 17:05:59 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=36:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-1_start_0 )
> Oct 08 17:05:59 node001 lrmd: [4420]: info: rsc:stonith-node002-1:152: start
> Oct 08 17:05:59 node001 lrmd: [28668]: info: Try to start STONITH resource<rsc_id=stonith-node002-1>  : Device=external/stonith-helper
> Oct 08 17:05:59 node001 stonithd: [4418]: info: Cannot get parameter run_dead_check from StonithNVpair
> Oct 08 17:05:59 node001 stonithd: [4418]: info: Cannot get parameter run_quorum_check from StonithNVpair
> Oct 08 17:05:59 node001 stonithd: [4418]: info: Cannot get parameter run_standby_wait from StonithNVpair
> Oct 08 17:05:59 node001 stonithd: [4418]: info: Cannot get parameter check_quorum_wait_time from StonithNVpair
> Oct 08 17:05:59 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-3_stop_0 (31) confirmed on node002 (rc=0)
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 29: stop stonith-node001-2_stop_0 on node002
> Oct 08 17:05:59 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-2_stop_0 (29) confirmed on node002 (rc=0)
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_rsc_command: Initiating action 27: stop stonith-node001-1_stop_0 on node002
> Oct 08 17:05:59 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-1_stop_0 (27) confirmed on node002 (rc=0)
> Oct 08 17:05:59 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 35 fired and confirmed
> Oct 08 17:05:59 node001 pengine: [4422]: info: process_pe_message: Transition 56: PEngine Input stored in: /var/lib/pengine/pe-input-2.bz2
> Oct 08 17:05:59 node001 pengine: [4422]: info: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
> Oct 08 17:06:00 node001 stonithd: [4418]: info: stonith-node002-1 stonith resource started
> Oct 08 17:06:00 node001 lrmd: [4420]: debug: stonithRA plugin: provider attribute is not needed and will be ignored.
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-1_start_0 (call=152, rc=0, cib-update=936, confirmed=true) ok
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-1_start_0 (36) confirmed on node001 (rc=0)
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 37: monitor stonith-node002-1_monitor_60000 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=37:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-1_monitor_60000 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-1:153: monitor
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 38: start stonith-node002-2_start_0 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=38:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-2_start_0 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-2:154: start
> Oct 08 17:06:00 node001 lrmd: [28734]: info: Try to start STONITH resource<rsc_id=stonith-node002-2>  : Device=external/ipmi
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-1_monitor_60000 (call=153, rc=0, cib-update=937, confirmed=false) ok
> Oct 08 17:06:00 node001 stonithd: [4418]: info: stonith-node002-2 stonith resource started
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-1_monitor_60000 (37) confirmed on node001 (rc=0)
> Oct 08 17:06:00 node001 lrmd: [4420]: debug: stonithRA plugin: provider attribute is not needed and will be ignored.
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-2_start_0 (call=154, rc=0, cib-update=938, confirmed=true) ok
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-2_start_0 (38) confirmed on node001 (rc=0)
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 39: monitor stonith-node002-2_monitor_10000 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=39:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-2_monitor_10000 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-2:155: monitor
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 40: start stonith-node002-3_start_0 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=40:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-3_start_0 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-3:156: start
> Oct 08 17:06:00 node001 lrmd: [28772]: info: Try to start STONITH resource<rsc_id=stonith-node002-3>  : Device=meatware
> Oct 08 17:06:00 node001 stonithd: [4418]: info: parse config info info=node002
> Oct 08 17:06:00 node001 stonithd: [4418]: info: stonith-node002-3 stonith resource started
> Oct 08 17:06:00 node001 lrmd: [4420]: debug: stonithRA plugin: provider attribute is not needed and will be ignored.
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-3_start_0 (call=156, rc=0, cib-update=939, confirmed=true) ok
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-3_start_0 (40) confirmed on node001 (rc=0)
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 43 fired and confirmed
> Oct 08 17:06:00 node001 crmd: [4423]: info: te_rsc_command: Initiating action 41: monitor stonith-node002-3_monitor_10000 on node001 (local)
> Oct 08 17:06:00 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=41:56:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-3_monitor_10000 )
> Oct 08 17:06:00 node001 lrmd: [4420]: info: rsc:stonith-node002-3:157: monitor
> Oct 08 17:06:00 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-3_monitor_10000 (call=157, rc=0, cib-update=940, confirmed=false) ok
> Oct 08 17:06:00 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-3_monitor_10000 (41) confirmed on node001 (rc=0)
> Oct 08 17:06:01 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-2_monitor_10000 (call=155, rc=0, cib-update=941, confirmed=false) ok
> Oct 08 17:06:01 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-2_monitor_10000 (39) confirmed on node001 (rc=0)
> Oct 08 17:06:38 node001 cib: [4419]: info: cib_stats: Processed 80 operations (3000.00us average, 0% utilization) in the last 10min
> Oct 08 17:07:19 node001 crmd: [4423]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=false)
> Oct 08 17:07:19 node001 crmd: [4423]: ERROR: print_elem: Aborting transition, action lost: [Action 10]: In-flight (id: stonith-node001_delete_0, loc: node002, priority: 0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
> Oct 08 17:07:19 node001 crmd: [4423]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Oct 08 17:07:19 node001 crmd: [4423]: info: update_abort_priority: Abort action done superceeded by restart
> Oct 08 17:07:19 node001 crmd: [4423]: WARN: cib_action_update: rsc_op 10: stonith-node001_delete_0 on node002 timed out
> Oct 08 17:07:19 node001 crmd: [4423]: WARN: find_xml_node: Could not find primitive in rsc_op.
> Oct 08 17:07:19 node001 crmd: [4423]: info: run_graph: ====================================================
> Oct 08 17:07:19 node001 crmd: [4423]: notice: run_graph: Transition 56 (Complete=14, Pending=0, Fired=0, Skipped=9, Incomplete=0, Source=/var/lib/pengine/pe-input-2.bz2): Stopped
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_graph_trigger: Transition 56 is now complete
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_state_transition: State transition S_TRANSITION_ENGINE ->  S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_pe_invoke: Query 942: Requesting the current CIB: S_POLICY_ENGINE
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_pe_invoke_callback: Invoking the PE: query=942, ref=pe_calc-dc-1318061239-616, seq=130536, quorate=1
> Oct 08 17:07:19 node001 pengine: [4422]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Oct 08 17:07:19 node001 pengine: [4422]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Oct 08 17:07:19 node001 pengine: [4422]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> Oct 08 17:07:19 node001 pengine: [4422]: info: determine_online_status: Node node001 is online
> Oct 08 17:07:19 node001 pengine: [4422]: info: determine_online_status: Node node002 is online
> Oct 08 17:07:19 node001 pengine: [4422]: notice: group_print:  Resource Group: rscgroup
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      mntrsc1 (ocf::heartbeat:Filesystem):    Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      mntrsc2 (ocf::heartbeat:Filesystem):    Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      mgrrsc  (lsb:mgrrsc):   Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      viprsc  (ocf::heartbeat:IPaddr2):       Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: group_print:  Resource Group: stonith-node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      stonith-node001-1 (stonith:external/stonith-helper):      Stopped
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      stonith-node001-2 (stonith:external/ipmi):        Stopped
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      stonith-node001-3 (stonith:meatware):     Stopped
> Oct 08 17:07:19 node001 pengine: [4422]: notice: group_print:  Resource Group: stonith-node002
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      stonith-node002-1 (stonith:external/stonith-helper):      Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      stonith-node002-2 (stonith:external/ipmi):        Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: native_print:      stonith-node002-3 (stonith:meatware):     Started node001
> Oct 08 17:07:19 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node002 on node001, type changed: external/ipmi ->  <null>
> Oct 08 17:07:19 node001 pengine: [4422]: notice: check_rsc_parameters: Forcing restart of stonith-node002 on node001, class changed: stonith ->  <null>
> Oct 08 17:07:19 node001 pengine: [4422]: notice: DeleteRsc: Removing stonith-node002 from node001
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: notice: RecurringOp:  Start recurring monitor (60s) for stonith-node001-1 on node002
> Oct 08 17:07:19 node001 pengine: [4422]: notice: RecurringOp:  Start recurring monitor (10s) for stonith-node001-2 on node002
> Oct 08 17:07:19 node001 pengine: [4422]: notice: RecurringOp:  Start recurring monitor (10s) for stonith-node001-3 on node002
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Leave resource mntrsc1 (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Leave resource mntrsc2 (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Leave resource mgrrsc  (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Leave resource viprsc  (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Start stonith-node001-1  (node002)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Start stonith-node001-2  (node002)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Start stonith-node001-3  (node002)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node002-1       (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node002-2       (Started node001)
> Oct 08 17:07:19 node001 pengine: [4422]: notice: LogActions: Restart resource stonith-node002-3       (Started node001)
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_state_transition: State transition S_POLICY_ENGINE ->  S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 08 17:07:19 node001 crmd: [4423]: info: unpack_graph: Unpacked transition 57: 23 actions in 23 synapses
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_te_invoke: Processing graph 57 (ref=pe_calc-dc-1318061239-616) derived from /var/lib/pengine/pe-input-3.bz2
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 33 fired and confirmed
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 27: start stonith-node001-1_start_0 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 9 fired and confirmed
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 10: delete stonith-node002_delete_0 on node001 (local)
> Oct 08 17:07:19 node001 crmd: [4423]: WARN: find_xml_node: Could not find primitive in rsc_op.
> Oct 08 17:07:19 node001 crmd: [4423]: ERROR: crm_abort: do_lrm_invoke: Triggered asser****@lrm*****:1285 : xml_rsc != NULL
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 41: stop stonith-node002-3_stop_0 on node001 (local)
> Oct 08 17:07:19 node001 lrmd: [4420]: info: cancel_op: operation monitor[157] on stonith::meatware::stonith-node002-3 for client 4423, its parameters: CRM_meta_interval=[10000] on_fail=[restart] stonith-timeout=[600s] hostlist=[node002] CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000] crm_feature_set=[3.0.1] priority=[3] CRM_meta_name=[monitor]  cancelled
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=41:57:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-3_stop_0 )
> Oct 08 17:07:19 node001 lrmd: [4420]: info: rsc:stonith-node002-3:158: stop
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-3_monitor_10000 (call=157, status=1, cib-update=0, confirmed=true) Cancelled
> Oct 08 17:07:19 node001 lrmd: [29829]: info: Try to stop STONITH resource<rsc_id=stonith-node002-3>  : Device=meatware
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-3_stop_0 (call=158, rc=0, cib-update=943, confirmed=true) ok
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-3_stop_0 (41) confirmed on node001 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 39: stop stonith-node002-2_stop_0 on node001 (local)
> Oct 08 17:07:19 node001 lrmd: [4420]: info: cancel_op: operation monitor[155] on stonith::external/ipmi::stonith-node002-2 for client 4423, its parameters: CRM_meta_interval=[10000] ipaddr=[172.25.1.2] on_fail=[restart] interface=[lan] CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor] hostname=[node002] passwd=[admin00] userid=[admin]  cancelled
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=39:57:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-2_stop_0 )
> Oct 08 17:07:19 node001 lrmd: [4420]: info: rsc:stonith-node002-2:159: stop
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-2_monitor_10000 (call=155, status=1, cib-update=0, confirmed=true) Cancelled
> Oct 08 17:07:19 node001 lrmd: [29831]: info: Try to stop STONITH resource<rsc_id=stonith-node002-2>  : Device=external/ipmi
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-2_stop_0 (call=159, rc=0, cib-update=944, confirmed=true) ok
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-2_stop_0 (39) confirmed on node001 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 37: stop stonith-node002-1_stop_0 on node001 (local)
> Oct 08 17:07:19 node001 lrmd: [4420]: info: cancel_op: operation monitor[153] on stonith::external/stonith-helper::stonith-node002-1 for client 4423, its parameters: CRM_meta_interval=[60000] standby_wait_time=[15] stonith-timeout=[180s] hostlist=[node002] CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000] standby_check_command=[/usr/sbin/crm_resource -r rscgroup -W | grep -q `hostnamcrm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor] dead_check_target=[172.25.0.2]  cancelled
> Oct 08 17:07:19 node001 crmd: [4423]: info: do_lrm_rsc_op: Performing key=37:57:0:76d16842-4a6f-4ae1-908b-890f2c3926c1 op=stonith-node002-1_stop_0 )
> Oct 08 17:07:19 node001 lrmd: [4420]: info: rsc:stonith-node002-1:160: stop
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-1_monitor_60000 (call=153, status=1, cib-update=0, confirmed=true) Cancelled
> Oct 08 17:07:19 node001 lrmd: [29833]: info: Try to stop STONITH resource<rsc_id=stonith-node002-1>  : Device=external/stonith-helper
> Oct 08 17:07:19 node001 crmd: [4423]: info: process_lrm_event: LRM operation stonith-node002-1_stop_0 (call=160, rc=0, cib-update=945, confirmed=true) ok
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node002-1_stop_0 (37) confirmed on node001 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 45 fired and confirmed
> Oct 08 17:07:19 node001 pengine: [4422]: info: process_pe_message: Transition 57: PEngine Input stored in: /var/lib/pengine/pe-input-3.bz2
> Oct 08 17:07:19 node001 pengine: [4422]: info: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-1_start_0 (27) confirmed on node002 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 28: monitor stonith-node001-1_monitor_60000 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 29: start stonith-node001-2_start_0 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-2_start_0 (29) confirmed on node002 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 30: monitor stonith-node001-2_monitor_10000 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 31: start stonith-node001-3_start_0 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-3_start_0 (31) confirmed on node002 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_pseudo_action: Pseudo action 34 fired and confirmed
> Oct 08 17:07:19 node001 crmd: [4423]: info: te_rsc_command: Initiating action 32: monitor stonith-node001-3_monitor_10000 on node002
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-3_monitor_10000 (32) confirmed on node002 (rc=0)
> Oct 08 17:07:19 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-2_monitor_10000 (30) confirmed on node002 (rc=0)
> Oct 08 17:07:20 node001 crmd: [4423]: info: match_graph_event: Action stonith-node001-1_monitor_60000 (28) confirmed on node002 (rc=0)
> Oct 08 17:08:39 node001 crmd: [4423]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=false)
> 
> That is all; thank you in advance.
> 
> ----------------------------------------------
> Nobuaki Miyamoto
> mail:fj508****@aa*****
> 
> _______________________________________________
> Linux-ha-japan mailing list
> Linux****@lists*****
> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
> 
> 


-- 
Nippon Telegraph and Telephone Corporation
Research and Development Planning Department
NTT Open Source Software Center
NAKAHIRA Kazutomo
TEL: 03-5860-5135 FAX: 03-5463-6490
Mail: nakah****@oss*****




