Rejoining a node after a Pacemaker failover (Linux-ha-jp) - Linux-HA Japan

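Tsuji-san,

Hello, this is Yamauchi.

First, the stop sequence after PostgreSQL was killed on node1 appears to have no problems.
Looking at the logs, there are a few behaviors that concern me, but for now I will limit my answer to the double start.

The cause of the second start is in the following log excerpt.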
----
May 31 07:40:19 node2-001 pacemaker-controld  [242326] (run_graph) 	notice: Transition 3 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-301.bz2): Complete
May 31 07:40:19 node2-001 pacemaker-controld  [242326] (do_state_transition) 	info: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (unpack_config) 	notice: On loss of quorum: Ignore
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (determine_online_status) 	info: Node node1 is online
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (determine_online_status) 	info: Node node2 is online
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (unpack_rsc_op_failure) 	warning: Unexpected result (error) was recorded for start of r_pgsql:0 on node1 at May 31 07:38:59 2022 | rc=1 id=r_pgsql_last_failure_0
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (pe_get_failcount) 	info: r_pgsql:0 has failed INFINITY times on node1
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (check_migration_threshold) 	warning: Forcing ms_pgsql away from node1 after 1000000 failures (max=1)
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (pe_get_failcount) 	info: r_pgsql:1 has failed INFINITY times on node1
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (check_migration_threshold) 	warning: Forcing ms_pgsql away from node1 after 1000000 failures (max=1)
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (pcmk__native_allocate) 	info: Resource r_pgsql:1 cannot run anywhere
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (pcmk__set_instance_roles) 	info: Promoting r_pgsql:0 (Master node2)
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (pcmk__set_instance_roles) 	info: ms_pgsql: Promoted 1 instances of a possible 1
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (rsc_action_default) 	info: Leave   r_service_fh	(Started node2)
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (rsc_action_default) 	info: Leave   r_pgsql:0	(Master node2)
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (rsc_action_default) 	info: Leave   r_pgsql:1	(Stopped)
May 31 07:40:19 node2-001 pacemaker-schedulerd[242325] (pcmk__log_transition_summary) 	notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-302.bz2
May 31 07:40:19 node2-001 pacemaker-controld  [242326] (do_state_transition) 	info: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
May 31 07:40:19 node2-001 pacemaker-controld  [242326] (do_te_invoke) 	info: Processing graph 4 (ref=pe_calc-dc-1653982819-61) derived from /var/lib/pacemaker/pengine/pe-input-302.bz2
May 31 07:40:19 node2-001 pacemaker-controld  [242326] (run_graph) 	notice: Transition 4 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-302.bz2): Complete
May 31 07:40:19 node2-001 pacemaker-controld  [242326] (do_log) 	info: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
May 31 07:40:19 node2-001 pacemaker-controld  [242326] (do_state_transition) 	notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
★ After processing the failed start of r_pgsql on node1, the cluster enters a stable state for the time being.
----
★ The following log entries continue for a while...
May 31 07:40:21 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[268937] error output [ # fh_monitor  dig success[OK]  STAGE: 1  stpcnt: 0 ]
May 31 07:40:21 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[268937] error output [ # fh_monitor  hostname success[OK]  STAGE: 2  stpcnt: 0 ]
May 31 07:40:21 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[268937] error output [ # fh_monitor  dug_ip: xx.xx.xx.xx  host_ip: xx.xx.xx.xx  stpcnt: 0 ]
May 31 07:40:21 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[268937] error output [ # fh_monitor  Exit:OCF_SUCCESS[OK] ]
May 31 07:40:24 node2-001 pacemaker-based     [242321] (cib_process_ping) 	info: Reporting our current digest to node2: cc0b84d295ff3f31db6079faf3ac1c05 for 0.550.1 (0x564c388d0a90 0)
May 31 07:40:31 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[269358] error output [ # fh_monitor  dig success[OK]  STAGE: 1  stpcnt: 0 ]
May 31 07:40:31 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[269358] error output [ # fh_monitor  hostname success[OK]  STAGE: 2  stpcnt: 0 ]
May 31 07:40:31 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[269358] error output [ # fh_monitor  dug_ip: xx.xx.xx.xx  host_ip: xx.xx.xx.xx  stpcnt: 0 ]
May 31 07:40:31 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[269358] error output [ # fh_monitor  Exit:OCF_SUCCESS[OK] ]
May 31 07:40:41 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[269783] error output [ # fh_monitor  dig success[OK]  STAGE: 1  stpcnt: 0 ]
May 31 07:40:41 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[269783] error output [ # fh_monitor  hostname success[OK]  STAGE: 2  stpcnt: 0 ]
May 31 07:40:41 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[269783] error output [ # fh_monitor  dug_ip: xx.xx.xx.xx  host_ip: xx.xx.xx.xx  stpcnt: 0 ]
May 31 07:40:41 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[269783] error output [ # fh_monitor  Exit:OCF_SUCCESS[OK] ]
May 31 07:40:51 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[270595] error output [ # fh_monitor  dig success[OK]  STAGE: 1  stpcnt: 0 ]
May 31 07:40:51 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[270595] error output [ # fh_monitor  hostname success[OK]  STAGE: 2  stpcnt: 0 ]
May 31 07:40:51 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[270595] error output [ # fh_monitor  dug_ip: xx.xx.xx.xx  host_ip: xx.xx.xx.xx  stpcnt: 0 ]
May 31 07:40:51 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[270595] error output [ # fh_monitor  Exit:OCF_SUCCESS[OK] ]
May 31 07:41:01 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[271017] error output [ # fh_monitor  dig success[OK]  STAGE: 1  stpcnt: 0 ]
May 31 07:41:01 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[271017] error output [ # fh_monitor  hostname success[OK]  STAGE: 2  stpcnt: 0 ]
May 31 07:41:01 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[271017] error output [ # fh_monitor  dug_ip: xx.xx.xx.xx  host_ip: xx.xx.xx.xx  stpcnt: 0 ]
May 31 07:41:01 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[271017] error output [ # fh_monitor  Exit:OCF_SUCCESS[OK] ]
May 31 07:41:11 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[271548] error output [ # fh_monitor  dig success[OK]  STAGE: 1  stpcnt: 0 ]
May 31 07:41:11 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[271548] error output [ # fh_monitor  hostname success[OK]  STAGE: 2  stpcnt: 0 ]
May 31 07:41:11 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[271548] error output [ # fh_monitor  dug_ip: xx.xx.xx.xx  host_ip: xx.xx.xx.xx  stpcnt: 0 ]
May 31 07:41:11 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[271548] error output [ # fh_monitor  Exit:OCF_SUCCESS[OK] ]
May 31 07:41:21 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[272286] error output [ # fh_monitor  dig success[OK]  STAGE: 1  stpcnt: 0 ]
May 31 07:41:21 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[272286] error output [ # fh_monitor  hostname success[OK]  STAGE: 2  stpcnt: 0 ]
May 31 07:41:21 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[272286] error output [ # fh_monitor  dug_ip: xx.xx.xx.xx  host_ip: xx.xx.xx.xx  stpcnt: 0 ]
May 31 07:41:21 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[272286] error output [ # fh_monitor  Exit:OCF_SUCCESS[OK] ]
May 31 07:41:31 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[272782] error output [ # fh_monitor  dig success[OK]  STAGE: 1  stpcnt: 0 ]
May 31 07:41:31 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[272782] error output [ # fh_monitor  hostname success[OK]  STAGE: 2  stpcnt: 0 ]
May 31 07:41:31 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[272782] error output [ # fh_monitor  dug_ip: xx.xx.xx.xx  host_ip: xx.xx.xx.xx  stpcnt: 0 ]
May 31 07:41:31 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[272782] error output [ # fh_monitor  Exit:OCF_SUCCESS[OK] ]
May 31 07:41:41 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[273223] error output [ # fh_monitor  dig success[OK]  STAGE: 1  stpcnt: 0 ]
May 31 07:41:41 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[273223] error output [ # fh_monitor  hostname success[OK]  STAGE: 2  stpcnt: 0 ]
May 31 07:41:41 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[273223] error output [ # fh_monitor  dug_ip: xx.xx.xx.xx  host_ip: xx.xx.xx.xx  stpcnt: 0 ]
May 31 07:41:41 node2-001 pacemaker-execd     [242323] (log_op_output) 	notice: r_service_fh_monitor_10000[273223] error output [ # fh_monitor  Exit:OCF_SUCCESS[OK] ]
----
★ 1 minute 27 seconds after S_IDLE (07:40:19), the error on node1 is cleared from node2.
May 31 07:41:46 node2-001 pacemaker-attrd     [242324] (attrd_peer_update) 	notice: Setting last-failure-r_pgsql#start_0[node1]: 1653982739 -> (unset) | from node2
May 31 07:41:46 node2-001 pacemaker-attrd     [242324] (write_attribute) 	info: Sent CIB request 21 with 2 changes for last-failure-r_pgsql#start_0 (id n/a, set n/a)
May 31 07:41:46 node2-001 pacemaker-based     [242321] (cib_process_request) 	info: Forwarding cib_modify operation for section status to all (origin=local/attrd/21)
May 31 07:41:46 node2-001 pacemaker-attrd     [242324] (attrd_peer_update) 	notice: Setting fail-count-r_pgsql#start_0[node1]: INFINITY -> (unset) | from node2
May 31 07:41:46 node2-001 pacemaker-attrd     [242324] (write_attribute) 	info: Sent CIB request 22 with 2 changes for fail-count-r_pgsql#start_0 (id n/a, set n/a)
May 31 07:41:46 node2-001 pacemaker-based     [242321] (cib_process_request) 	info: Forwarding cib_modify operation for section status to all (origin=local/attrd/22)
May 31 07:41:46 node2-001 pacemaker-based     [242321] (cib_perform_op) 	info: Diff: --- 0.550.1 2
May 31 07:41:46 node2-001 pacemaker-based     [242321] (cib_perform_op) 	info: Diff: +++ 0.550.2 (null)
May 31 07:41:46 node2-001 pacemaker-based     [242321] (cib_perform_op) 	info: -- /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-r_pgsql.start_0']
May 31 07:41:46 node2-001 pacemaker-based     [242321] (cib_perform_op) 	info: +  /cib:  @num_updates=2
May 31 07:41:46 node2-001 pacemaker-based     [242321] (cib_process_request) 	info: Completed cib_modify operation for section status: OK (rc=0, origin=node2/attrd/21, version=0.550.2)
May 31 07:41:46 node2-001 pacemaker-attrd     [242324] (attrd_cib_callback) 	info: CIB update 21 result for last-failure-r_pgsql#start_0: OK | rc=0
May 31 07:41:46 node2-001 pacemaker-attrd     [242324] (attrd_cib_callback) 	info: * last-failure-r_pgsql#start_0[node1]=(null)
May 31 07:41:46 node2-001 pacemaker-attrd     [242324] (attrd_cib_callback) 	info: * last-failure-r_pgsql#start_0[node2]=(null)
May 31 07:41:46 node2-001 pacemaker-controld  [242326] (abort_transition_graph) 	info: Transition 4 aborted by deletion of nvpair[@id='status-1-last-failure-r_pgsql.start_0']: Transient attribute change | cib=0.550.2 source=abort_unless_down:327 path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-r_pgsql.start_0'] complete=true
May 31 07:41:46 node2-001 pacemaker-controld  [242326] (do_state_transition) 	notice: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph

★ Because the error was cleared, a start of r_pgsql on node1 is calculated and executed again from this point on.
----
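The 1 minute 27 second figure above can be reproduced from the timestamps alone. The following is a minimal Python sketch, not a Pacemaker tool: it takes two lines copied from the excerpt above, finds the transition to S_IDLE and the first fail-count reset, and prints the gap. The helper names (`parse_time`, `first_match`, `idle_to_clear_seconds`) are made up for illustration, and the year is passed in because the log lines do not carry one.

```python
import re
from datetime import datetime

# Two lines copied verbatim from the node2 log excerpt above.
LOG_LINES = [
    "May 31 07:40:19 node2-001 pacemaker-controld  [242326] (do_state_transition) \tnotice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd",
    "May 31 07:41:46 node2-001 pacemaker-attrd     [242324] (attrd_peer_update) \tnotice: Setting fail-count-r_pgsql#start_0[node1]: INFINITY -> (unset) | from node2",
]

# Syslog-style prefix: month name, day, HH:MM:SS.
TIMESTAMP = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) ")


def parse_time(line, year=2022):
    """Parse the leading timestamp; the year is an assumption supplied by the caller."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def first_match(lines, pattern):
    """Return the first line containing the regex pattern, or None."""
    for line in lines:
        if re.search(pattern, line):
            return line
    return None


def idle_to_clear_seconds(lines):
    """Seconds between the first S_IDLE transition and the first fail-count reset."""
    idle = first_match(lines, r"-> S_IDLE ")
    cleared = first_match(lines, r"fail-count-\S+: \S+ -> \(unset\)")
    if idle is None or cleared is None:
        return None
    return int((parse_time(cleared) - parse_time(idle)).total_seconds())


if __name__ == "__main__":
    gap = idle_to_clear_seconds(LOG_LINES)
    print(f"fail-count cleared {gap // 60} min {gap % 60} s after S_IDLE")
```

Running the same function over a full log file (one string per line) would show whether the reset always follows S_IDLE by a similar interval, which would hint at a timer rather than a manual cleanup.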

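Is anything in the resources or procedures you use clearing the error count of r_pgsql on node1 from node2?
Alternatively (I have never used it myself), are you using a setting such as failure-timeout that clears the error count after a fixed period of time?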
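Once the configuration is available, the failure-timeout question can be checked mechanically. The following Python sketch scans a CIB XML document for any `failure-timeout` nvpair and reports which element owns it. The CIB fragment is entirely hypothetical (the actual configuration has not been shared in this thread); only `migration-threshold=1` mirrors the `(max=1)` seen in the log above. It assumes the CIB has been exported as XML, for example with `cibadmin --query`.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment for illustration only; ids, provider and the
# failure-timeout value are assumptions, not the poster's configuration.
SAMPLE_CIB = """
<cib>
  <configuration>
    <resources>
      <clone id="ms_pgsql">
        <primitive id="r_pgsql" class="ocf" provider="heartbeat" type="pgsql">
          <meta_attributes id="r_pgsql-meta">
            <nvpair id="r_pgsql-meta-mt" name="migration-threshold" value="1"/>
            <nvpair id="r_pgsql-meta-ft" name="failure-timeout" value="60s"/>
          </meta_attributes>
        </primitive>
      </clone>
    </resources>
    <rsc_defaults/>
  </configuration>
</cib>
"""


def find_failure_timeouts(cib_xml):
    """Return (owner id, value) for every failure-timeout nvpair in the CIB."""
    root = ET.fromstring(cib_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    found = []
    for nvpair in root.iter("nvpair"):
        if nvpair.get("name") != "failure-timeout":
            continue
        holder = parents[nvpair]      # normally a meta_attributes element
        owner = parents.get(holder)   # primitive, clone, rsc_defaults, ...
        if owner is not None:
            owner_id = owner.get("id", owner.tag)
        else:
            owner_id = holder.tag
        found.append((owner_id, nvpair.get("value")))
    return found


if __name__ == "__main__":
    for owner_id, value in find_failure_timeouts(SAMPLE_CIB):
        print(f"failure-timeout={value} is set on {owner_id}")
```

An empty result would rule out failure-timeout and point back toward something on node2 clearing the count explicitly.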

If you can share the configuration file you load into Pacemaker, I can also try to verify this on my side with a simple setup.

That is all. Thank you.


> ----- Original Message -----
> 
> From: "辻　真吾" <tsuji****@ryobi*****>
> To: "renay****@ybb*****" <renay****@ybb*****>; "LINUX-HA" <linux****@lists*****>
> Cc: "d-ike****@ryobi*****" <d-ike****@ryobi*****>
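> Date: 2022/06/07 Tue 17:09
> Subject: RE: RE: [Linux-ha-jp]  Rejoining a node after a Pacemaker failover
> 
> 
> Yamauchi-san,
> 
> Thank you for your help. This is Tsuji.
> 
> > I have not looked at the logs yet, but I have never heard of a double start
> > (for example, a stop and then a start (restart) after a failure).
> 
> Understood. Thank you.
> 
> > I will look at the logs and get back to you.
> 
> Thank you. I appreciate your trouble.
> 
> That is all. Thank you.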
> 
> > -----Original Message-----
> > From: renay****@ybb***** <>
> > Sent: Tuesday, June 7, 2022 4:57 PM
> > To: tsuji****@ryobi*****; LINUX-HA <linux****@lists*****>
> > Cc: d-ike****@ryobi*****
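> > Subject: Re: RE: [Linux-ha-jp] Rejoining a node after a Pacemaker failover
> > 
> > Tsuji-san,
> > 
> > Hello, this is Yamauchi.
> > 
> > Thank you for sending the logs. I will take a look at them.
> > 
> > > By the way, have other users ever run into a similar case where start
> > > was called twice?
> > > If you have any information on when a double start happens, I would
> > > appreciate it if you could share what you can.
> > I have not looked at the logs yet, but I have never heard of a double start
> > (for example, a stop and then a start (restart) after a failure).
> > 
> > Someone else with relevant knowledge may comment.
> > 
> > Regarding 1) and 2), understood.
> > I will look at the logs and get back to you.
> > 
> > That is all. Thank you.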
> > 
> > 
> > > ----- Original Message -----
> > >
> > > From: "辻　真吾" <tsuji****@ryobi*****>
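> > > To: "renay****@ybb*****" <renay****@ybb*****>; "LINUX-HA" <linux****@lists*****>
> > > Cc: "d-ike****@ryobi*****" <d-ike****@ryobi*****>
> > > Date: 2022/06/07 Tue 15:42
> > > Subject: RE: [Linux-ha-jp]  Rejoining a node after a Pacemaker failover
> > >
> > >
> > > Yamauchi-san,
> > >
> > > Thank you for your help. This is Tsuji.
> > >
> > > Note: I sent an email with the same content a short while ago,
> > > but it failed to send, so I am resending it.
> > > If you receive it twice, please discard the earlier one.
> > >
> > > Thank you for the quick reply.
> > >
> > > > You mention that the RA is started twice; if I could see the logs,
> > > > I might be able to tell something.
> > >
> > > I am sorry to trouble you, but I will send the logs, so if you notice
> > > anything, I would appreciate your advice.
> > > (Sending the attachment seems to have failed, so after this email
> > >  I will send the logs to you individually, Yamauchi-san.)
> > >
> > > By the way, have other users ever run into a similar case where start
> > > was called twice?
> > > If you have any information on when a double start happens, I would
> > > appreciate it if you could share what you can.
> > >
> > > > 1) I assume STONITH is configured; did the master node complete
> > > > fencing normally and reboot?
> > >
> > > The stop-time logs contained the following fencing-related messages.
> > > It looks as though the node was stopped by the final "disconnected" message;
> > > what do you think?
> > > We started the rejoin after this message.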
> > >
> > > May 31 07:32:28 node1-001 pacemaker-controld  [224147] (stonith__watchdog_fencing_enabled_for_node_api) 	warning: watchdog-fencing-query failed
> > > May 31 07:34:36 node1-001 pacemaker-controld  [224147] (stonith__watchdog_fencing_enabled_for_node_api) 	warning: watchdog-fencing-query failed
> > > May 31 07:35:19 node1-001 pacemaker-controld  [224147] (tengine_stonith_connection_destroy) 	info: Fencing daemon disconnected
> > >
> > >
> > > > 2) Could automatic startup of corosync/pacemaker via systemd be enabled?
> > >
> > > We have disabled this setting.
> > >
> > > That is all. Thank you.
> > >
> > > > -----Original Message-----
> > > > From: Linux-ha-japan <> On Behalf Of renay****@ybb*****
> > > > Sent: Saturday, June 4, 2022 9:20 AM
> > > > To: linux****@lists*****
> > > > Cc: d-ike****@ryobi*****
> > > > Subject: Re: [Linux-ha-jp] Rejoining a node after a Pacemaker failover
> > > >
> > > > Tsuji-san,
> > > >
> > > > Hello, this is Yamauchi.
> > > >
> > > > Since you say step 5) completed normally, I believe synchronization with
> > > > the slave that was promoted to master was carried out correctly.
> > > > Something is therefore probably happening while Pacemaker starts its
> > > > resources on the restarted node, which is trying to come up as a slave.
> > > >
> > > > You mention that the RA is started twice; if I could see the logs,
> > > > I might be able to tell something.
> > > >
> > > > For now, I think it would be good to check the following points.
> > > > 1) I assume STONITH is configured; did the master node complete
> > > > fencing normally and reboot?
> > > > 2) Could automatic startup of corosync/pacemaker via systemd be enabled?
> > > >
> > > > That is all. Thank you.
> > > >
> > > > > ----- Original Message -----
> > > > >
> > > > > From: "辻　真吾" <tsuji****@ryobi*****>
> > > > > To: "LINUX-HA" <linux****@lists*****>
> > > > > Cc: "d-ike****@ryobi*****" <d-ike****@ryobi*****>
> > > > > Date: 2022/06/03 Fri 16:29
> > > > > Subject: [Linux-ha-jp] Rejoining a node after a Pacemaker failover
> > > > >
> > > > >
> > > > > This is my first post.
> > > > > My name is Tsuji.
> > > > >
> > > > > We run DB servers (PostgreSQL replication configuration) as an
> > > > > HA cluster using Pacemaker+Corosync.
> > > > >
> > > > > In this cluster, we triggered a failover and then
> > > > > tried to rejoin the node that had been brought down,
> > > > > but the rejoin fails.
> > > > >
> > > > > Starting from the initial node roles below, the steps we performed are as follows.
> > > > >   - Master node: node 1
> > > > >   - Slave node: node 2
> > > > >
> > > > > Procedure
> > > > >  1. Kill the postgres process on node 1
> > > > >  2. Node 2 is promoted to master
> > > > >  3. Stop pacemaker and corosync on node 1 with systemctl
> > > > >  4. On node 1, delete the contents of the tablespace directory and the PGDATA directory
> > > > >  5. Run pg_basebackup on node 1
> > > > >     $ /usr/pgsql-14/bin/pg_basebackup -h <node 2> -D $PGDATA -Xs -P -n
> > > > >  6. Start corosync and pacemaker on node 1 with systemctl
> > > > >     → We expect node 1 to reach the "sync" state here, but
> > > > >        in practice, after the start processing runs,
> > > > >        it transitions to the stopped state. (*)
> > > > >
> > > > >   (*)
> > > > >   When the start in step 6 is performed, the start action of the pgsql RA is called twice.
> > > > >   The first start completes normally, but the second start fails
> > > > >   and the resource transitions to the stopped state.
> > > > >
> > > > > The versions in use are as follows.
> > > > > corosync  : 3.1.5-1
> > > > > pacemaker : 2.1.0-8
> > > > > pcs       : 0.10.10-4
> > > > > PostgreSQL: 14.1
> > > > >
> > > > > If anyone knows the cause and how to address it, please advise.
> > > > > Thank you.
> > > > >
> > > > > _______________________________________________
> > > > > Linux-ha-japan mailing list
> > > > > Linux****@lists*****
> > > > > https://lists.osdn.me/mailman/listinfo/linux-ha-japan
> > > > >
> > > >
> > > > _______________________________________________
> > > > Linux-ha-japan mailing list
> > > > Linux****@lists*****
> > > > https://lists.osdn.me/mailman/listinfo/linux-ha-japan
> > >
> 
>
