酒井 聡司
ssaka****@opend*****
Thu Sep 19 17:13:53 JST 2013
My name is Sakai.

I cannot get Pacemaker, Heartbeat, and nginx to work together.
I would be grateful if someone could tell me what the cause might be.

・Environment
HW        : virtual server on VMware
OS        : CentOS 6.4
Pacemaker : 1.0.13-1.1
Heartbeat : 3.0.5
nginx     : 1.4.2

This is what I did.

・Installed nginx
・Installed Pacemaker
tar zxvf pacemaker-1.0.13-1.1.el6.x86_64.repo.tar.gz -C /tmp
yum -c /tmp/pacemaker-1.0.13-1.1.el6.x86_64.repo/pacemaker.repo install pacemaker-1.0.13 heartbeat-3.0.5 pm_extras-1.3

ha.cf
===============================================================
pacemaker on
logfacility local1
debug 0
udpport 694
keepalive 2
warntime 20
deadtime 24
initdead 48
bcast eth1
node nginx1
node nginx2
watchdog /dev/watchdog
===============================================================

authkeys
===============================================================
auth 1
1 sha1 abcdefg
===============================================================

chmod 600 authkeys
/etc/init.d/heartbeat start

Adding the resource
crm configure property no-quorum-policy="ignore" stonith-enabled="false"
crm configure rsc_defaults resource-stickiness="INFINITY" migration-threshold="1"
crm configure primitive r-nginx ocf:heartbeat:nginx params configfile="/usr/local/nginx/conf/nginx.conf" op start interval="0" timeout="40" op stop interval="0" timeout="60"

After these steps, crm_mon shows the following.

============
Stack: Heartbeat
Current DC: nginx2 (f972658e-c709-4bb3-b2b9-1c354b6722c4) - partition with quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ nginx2 ]
OFFLINE: [ nginx1 ]

Failed actions:
    r-nginx_start_0 (node=nginx2, call=3, rc=-2, status=Timed Out): unknown exec error

The log contains the following.

(excerpt)
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: rsc:r-nginx start[3] (pid 2458)
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) ls:
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) cannot access mime.types
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) : No such file or directory
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr)
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) ls:
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) cannot access mime.types
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) : No such file or directory
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr)
Sep 18 18:44:33 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) /usr/lib/ocf/resource.d//heartbeat/nginx: line 403: [: too many arguments
Sep 18 18:44:34 nginx2 nginx(r-nginx)[2458]: INFO: nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful
Sep 18 18:44:34 nginx2 nginx(r-nginx)[2458]: INFO: Starting /usr/local/nginx/sbin/nginx - nginx version: nginx/1.4.2
Sep 18 18:44:34 nginx2 nginx(r-nginx)[2458]: INFO: /usr/local/nginx/sbin/nginx build configuration: configure arguments: --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_geoip_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module
Sep 18 18:44:34 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) /usr/lib/ocf/resource.d//heartbeat/nginx: line 403: [: too many arguments
Sep 18 18:44:34 nginx2 nginx(r-nginx)[2458]: INFO: nginx not running
Sep 18 18:44:34 nginx2 nginx(r-nginx)[2458]: INFO: Waiting for /usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf to come up (try 1)
Sep 18 18:44:35 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) /usr/lib/ocf/resource.d//heartbeat/nginx: line 403: [: too many arguments
Sep 18 18:44:35 nginx2 nginx(r-nginx)[2458]: INFO: nginx not running
Sep 18 18:44:35 nginx2 nginx(r-nginx)[2458]: INFO: Waiting for /usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf to come up (try 2)
Sep 18 18:44:36 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) /usr/lib/ocf/resource.d//heartbeat/nginx: line 403: [: too many arguments
(snip)
Sep 18 18:45:13 nginx2 lrmd: [2273]: info: RA output: (r-nginx:start:stderr) /usr/lib/ocf/resource.d//heartbeat/nginx: line 403: [: too many arguments
Sep 18 18:45:13 nginx2 nginx(r-nginx)[2458]: INFO: nginx not running
Sep 18 18:45:13 nginx2 nginx(r-nginx)[2458]: INFO: Waiting for /usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf to come up (try 40)
Sep 18 18:45:13 nginx2 lrmd: [2273]: WARN: r-nginx:start process (PID 2458) timed out (try 1). Killing with signal SIGTERM (15).
Sep 18 18:45:13 nginx2 lrmd: [2273]: WARN: operation start[3] on r-nginx for client 2276: pid 2458 timed out
Sep 18 18:45:13 nginx2 crmd: [2276]: ERROR: process_lrm_event: LRM operation r-nginx_start_0 (3) Timed Out (timeout=40000ms)
Sep 18 18:45:13 nginx2 crmd: [2276]: WARN: status_from_rc: Action 5 (r-nginx_start_0) on nginx2 failed (target: 0 vs. rc: -2): Error
Sep 18 18:45:14 nginx2 crmd: [2276]: WARN: update_failcount: Updating failcount for r-nginx on nginx2 after failed start: rc=-2 (update=INFINITY, time=1379497514)
Sep 18 18:45:14 nginx2 crmd: [2276]: info: abort_transition_graph: match_graph_event:299 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=r-nginx_start_0, magic=2:-2;5:3:0:c339c71a-c03d-4d27-9134-ff9ea830bed3, cib=0.12.5) : Event failed
Sep 18 18:45:14 nginx2 crmd: [2276]: info: update_abort_priority: Abort priority upgraded from 0 to 1
Sep 18 18:45:14 nginx2 crmd: [2276]: info: update_abort_priority: Abort action done superceeded by restart
Sep 18 18:45:14 nginx2 crmd: [2276]: info: match_graph_event: Action r-nginx_start_0 (5) confirmed on nginx2 (rc=4)
Sep 18 18:45:14 nginx2 crmd: [2276]: info: run_graph: ====================================================
Sep 18 18:45:14 nginx2 crmd: [2276]: notice: run_graph: Transition 3 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-56.bz2): Complete
Sep 18 18:45:14 nginx2 crmd: [2276]: info: te_graph_trigger: Transition 3 is now complete
Sep 18 18:45:14 nginx2 crmd: [2276]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 18 18:45:14 nginx2 crmd: [2276]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Sep 18 18:45:14 nginx2 attrd: [2275]: info: find_hash_entry: Creating hash entry for fail-count-r-nginx
Sep 18 18:45:14 nginx2 attrd: [2275]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-r-nginx (INFINITY)
Sep 18 18:45:14 nginx2 crmd: [2276]: info: do_pe_invoke: Query 85: Requesting the current CIB: S_POLICY_ENGINE
Sep 18 18:45:14 nginx2 crmd: [2276]: info: do_pe_invoke_callback: Invoking the PE: query=85, ref=pe_calc-dc-1379497514-30, seq=1, quorate=1
Sep 18 18:45:14 nginx2 attrd: [2275]: info: attrd_perform_update: Sent update 19: fail-count-r-nginx=INFINITY
Sep 18 18:45:14 nginx2 attrd: [2275]: info: find_hash_entry: Creating hash entry for last-failure-r-nginx
Sep 18 18:45:14 nginx2 attrd: [2275]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-r-nginx (1379497514)
Sep 18 18:45:14 nginx2 pengine: [2278]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 18 18:45:14 nginx2 pengine: [2278]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Sep 18 18:45:14 nginx2 pengine: [2278]: info: determine_online_status: Node nginx2 is online
Sep 18 18:45:14 nginx2 pengine: [2278]: WARN: unpack_rsc_op: Processing failed op r-nginx_start_0 on nginx2: unknown exec error (-2)
Sep 18 18:45:14 nginx2 pengine: [2278]: notice: native_print: r-nginx#011(ocf::heartbeat:nginx):#011Started nginx2 FAILED
Sep 18 18:45:14 nginx2 pengine: [2278]: notice: LogActions: Recover resource r-nginx#011(Started nginx2)
Sep 18 18:45:14 nginx2 attrd: [2275]: info: attrd_perform_update: Sent update 22: last-failure-r-nginx=1379497514
Sep 18 18:45:14 nginx2 crmd: [2276]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-f972658e-c709-4bb3-b2b9-1c354b6722c4-fail-count-r-nginx, name=fail-count-r-nginx, value=INFINITY, magic=NA, cib=0.12.6) : Transient attribute: update
Sep 18 18:45:14 nginx2 crmd: [2276]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-f972658e-c709-4bb3-b2b9-1c354b6722c4-last-failure-r-nginx, name=last-failure-r-nginx, value=1379497514, magic=NA, cib=0.12.7) : Transient attribute: update
Sep 18 18:45:14 nginx2 crmd: [2276]: info: handle_response: pe_calc calculation pe_calc-dc-1379497514-30 is obsolete
Sep 18 18:45:14 nginx2 crmd: [2276]: info: do_pe_invoke: Query 86: Requesting the current CIB: S_POLICY_ENGINE
Sep 18 18:45:14 nginx2 crmd: [2276]: info: do_pe_invoke: Query 87: Requesting the current CIB: S_POLICY_ENGINE
Sep 18 18:45:14 nginx2 pengine: [2278]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-57.bz2
Sep 18 18:45:14 nginx2 crmd: [2276]: info: do_pe_invoke_callback: Invoking the PE: query=87, ref=pe_calc-dc-1379497514-31, seq=1, quorate=1
Sep 18 18:45:14 nginx2 pengine: [2278]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 18 18:45:14 nginx2 pengine: [2278]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Sep 18 18:45:14 nginx2 pengine: [2278]: info: determine_online_status: Node nginx2 is online
Sep 18 18:45:14 nginx2 pengine: [2278]: WARN: unpack_rsc_op: Processing failed op r-nginx_start_0 on nginx2: unknown exec error (-2)
Sep 18 18:45:14 nginx2 pengine: [2278]: notice: native_print: r-nginx#011(ocf::heartbeat:nginx):#011Started nginx2 FAILED
Sep 18 18:45:14 nginx2 pengine: [2278]: info: get_failcount: r-nginx has failed INFINITY times on nginx2
Sep 18 18:45:14 nginx2 pengine: [2278]: WARN: common_apply_stickiness: Forcing r-nginx away from nginx2 after 1000000 failures (max=1)
(end of excerpt)

What could be causing this?
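
For reference, the repeated "[: too many arguments" message in the log is the kind of error a shell's `[` command reports when an unquoted expansion turns into several words. The sketch below is only a generic reproduction of that class of error under that assumption; it is not the code at line 403 of the nginx RA, which has not been inspected here, and the variable name `value` and its contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for any expansion that yields more than one word.
value="mime.types conf.d"

# Unquoted: the shell splits $value into two words, so `[` receives
# four operands (mime.types conf.d = mime.types) and reports a usage error.
unquoted_msg=$( { [ $value = "mime.types" ]; } 2>&1 )
unquoted_rc=$?

# Quoted: `[` receives one left-hand operand and simply evaluates to false.
quoted_msg=$( { [ "$value" = "mime.types" ]; } 2>&1 )
quoted_rc=$?

echo "unquoted: rc=$unquoted_rc msg=$unquoted_msg"
echo "quoted:   rc=$quoted_rc msg=$quoted_msg"
```

The exact wording of the message depends on the shell, but the exit status separates the two cases: a usage error from `[` is distinct from an ordinary false result. Whether this is what actually happens inside the RA would have to be confirmed by reading the script around line 403.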