島田雄太
yuta.****@gmail*****
2008年 11月 17日 (月) 22:27:08 JST
島田と申します linux-haの挙動で質問があります。 現在、VMwareのホストOS2台を用いて、その上にそれぞれRedhatES4をゲストOSとして 立ち上げています。 heartbeatとDRBDを用いて、その上にpostgresを使って構築をしております。 VM上でシリアルケーブルを認識させているのですが、通常運用時に以下のようなエラーが出ており WatchDog上で障害だと判断し、切り替わりが発生してしまったように見受けられます なぜ、そうなったのか、調査したいと考えております。 そもそも設定がおかしいというのも考えられるのでしょうか。 どういった情報をお伝えすればよいか分からない為、不適切なご相談になってしまうようでしたら 必要な情報をご教授いただけますでしょうか。 ご迷惑をお掛け致しますが、宜しくお願い致します 1.コンソール上において以下のようなメッセージが出ていました 「1xxxdb01 kernel: ttyS0: 1 input overrun(s)」 というメッセージが表示されており、最後に 「SoftDog: Initiaing system reboot.」 というメッセージが表示されていた 2./var/log/messageについて Nov 17 05:37:03 1mctdb01 kernel: drbd0: resync bitmap: bits=12799789 words=399994 Nov 17 05:37:03 1mctdb01 kernel: klogd 1.4.1, ---------- state change ---------- Nov 17 05:37:03 1mctdb01 kernel: drbd0: size = 48 GB (51199155 KB) Nov 17 05:37:03 1mctdb01 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map. Nov 17 05:37:03 1mctdb01 kernel: drbd0: Found 4 transactions (192 active extents) in activity log. Nov 17 05:37:03 1mctdb01 kernel: drbd0: Marked additional 507 MB as out-of-sync based on AL. Nov 17 05:37:03 1mctdb01 kernel: drbd0: drbdsetup [4700]: cstate Unconfigured --> StandAlone Nov 17 05:37:03 1mctdb01 drbd: s0 Nov 17 05:37:03 1mctdb01 drbd: n0 Nov 17 05:37:03 1mctdb01 kernel: drbd0: drbdsetup [4713]: cstate StandAlone --> Unconnected Nov 17 05:37:03 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate Unconnected --> WFConnection Nov 17 05:37:03 1mctdb01 drbd: ] Nov 17 05:37:03 1mctdb01 drbd: . Nov 17 05:37:04 1mctdb01 drbd: WARN: stdin/stdout is not a TTY; using /dev/console Nov 17 05:37:05 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate WFConnection --> WFReportParams Nov 17 05:37:05 1mctdb01 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74 Nov 17 05:37:05 1mctdb01 kernel: drbd0: Connection established. Nov 17 05:37:05 1mctdb01 kernel: drbd0: I am(S): 1:00000002:00000001:0000001e:0000000e:11 Nov 17 05:37:05 1mctdb01 kernel: drbd0: Peer(P): 1:00000002:00000001:0000001e:0000000f:10 Nov 17 05:37:05 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate WFReportParams --> WFBitMapT Nov 17 05:37:05 1mctdb01 kernel: drbd0: Secondary/Unknown --> Secondary/Primary Nov 17 05:37:05 1mctdb01 rc: Starting drbd: succeeded Nov 17 05:37:05 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate WFBitMapT --> SyncTarget Nov 17 05:37:05 1mctdb01 kernel: drbd0: Resync started as SyncTarget (need to sync 519348 KB [129837 bits set]). Nov 17 05:37:06 1mctdb01 logd: [4739]: info: logd started with default configuration. Nov 17 05:37:06 1mctdb01 logd: [4739]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Nov 17 05:37:06 1mctdb01 logd: [4748]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Nov 17 05:37:06 1mctdb01 kernel: Software Watchdog Timer: 0.07 initialized. soft_noboot=0 soft_margin=60 sec (nowayout= 0) Nov 17 05:37:09 1mctdb01 sendmail[4833]: sql_select option missing Nov 17 05:37:09 1mctdb01 sendmail[4833]: auxpropfunc error no mechanism available Nov 17 05:37:09 1mctdb01 sendmail: sendmail startup succeeded Nov 17 05:37:09 1mctdb01 sendmail: sm-client startup succeeded Nov 17 05:37:09 1mctdb01 gpm[4855]: *** info [startup.c(95)]: Nov 17 05:37:09 1mctdb01 gpm[4855]: Started gpm successfully. Entered daemon mode. Nov 17 05:37:09 1mctdb01 gpm[4855]: *** info [mice.c(1766)]: Nov 17 05:37:09 1mctdb01 gpm[4855]: imps2: Auto-detected intellimouse PS/2 Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Nov 17 05:37:10 1mctdb01 attrd: [4860]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Nov 17 05:37:10 1mctdb01 ccm: [4856]: info: Hostname: 1mctdb01.tdc-asp.jp Nov 17 05:37:10 1mctdb01 attrd: [4860]: info: register_with_ha: Hostname: 1mctdb01.tdc-asp.jp Nov 17 05:37:10 1mctdb01 gpm: gpm startup succeeded Nov 17 05:37:10 1mctdb01 cib: [4857]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Nov 17 05:37:10 1mctdb01 cib: [4857]: info: G_main_add_TriggerHandler: Added signal manual handler Nov 17 05:37:10 1mctdb01 cib: [4857]: info: G_main_add_SignalHandler: Added signal handler for signal 17 Nov 17 05:37:10 1mctdb01 cib: [4857]: info: main: Retrieval of a per-action CIB: disabled Nov 17 05:37:10 1mctdb01 ipfail: [4862]: ERROR: No managed resources Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler: Added signal handler for signal 17 Nov 17 05:37:10 1mctdb01 ccm: [4856]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler: Added signal handler for signal 10 Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler: Added signal handler for signal 12 Nov 17 05:37:10 1mctdb01 cib: [4857]: ERROR: crm_is_writable: /var/lib/heartbeat/crm/cib.xml must be owned and r/w by user hacluster Nov 17 05:37:10 1mctdb01 cib: [4857]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig) Nov 17 05:37:10 1mctdb01 attrd: [4860]: info: register_with_ha: UUID: c73aec4b-6947-40a2-b1e2-853d10c4d22d Nov 17 05:37:10 1mctdb01 stonithd: [4859]: info: G_main_add_SignalHandler: Added signal handler for signal 10 Nov 17 05:37:10 1mctdb01 stonithd: [4859]: info: G_main_add_SignalHandler: Added signal handler for signal 12 Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: Started. Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: main: CRM Hg Version: node: 552305612591183b1628baa5bc6e903e0f1e26a3 Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: crmd_init: Starting crmd Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: G_main_add_TriggerHandler: Added signal manual handler Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: G_main_add_SignalHandler: Added signal handler for signal 17 3./var/ha-log heartbeat[4806]: 2008/11/17_05:37:07 WARN: Logging daemon is disabled --enabling logging daemon is recommended heartbeat[4806]: 2008/11/17_05:37:07 info: ************************** heartbeat[4806]: 2008/11/17_05:37:07 info: Configuration validated. Starting heartbeat 2.1.3 heartbeat[4807]: 2008/11/17_05:37:07 info: heartbeat: version 2.1.3 heartbeat[4807]: 2008/11/17_05:37:07 info: Heartbeat generation: 1211968635 heartbeat[4807]: 2008/11/17_05:37:08 info: glib: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud) heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1 heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: bound send socket to device: eth1 heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: bound receive socket to device: eth1 heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: started on port 694 interface eth1 to 192.168.201.2 heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ping heartbeat started. heartbeat[4807]: 2008/11/17_05:37:08 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[4807]: 2008/11/17_05:37:08 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[4807]: 2008/11/17_05:37:08 notice: Using watchdog device: /dev/watchdog heartbeat[4807]: 2008/11/17_05:37:08 info: G_main_add_SignalHandler: Added signal handler for signal 17 heartbeat[4807]: 2008/11/17_05:37:08 info: Local status now set to: 'up' heartbeat[4807]: 2008/11/17_05:37:09 info: Link 192.168.70.254:192.168.70.254 up. heartbeat[4807]: 2008/11/17_05:37:09 info: Status update for node 192.168.70.254: status ping heartbeat[4807]: 2008/11/17_05:37:09 info: Link 1mctdb02.xxx-asp.jp:/dev/ttyS0 up. heartbeat[4807]: 2008/11/17_05:37:09 info: Status update for node 1mctdb02.xxx-asp.jp: status active heartbeat[4807]: 2008/11/17_05:37:09 info: Link 1mctdb02.xxx-asp.jp:eth1 up. heartbeat[4807]: 2008/11/17_05:37:09 info: Comm_now_up(): updating status to active heartbeat[4807]: 2008/11/17_05:37:09 info: Local status now set to: 'active' heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client "/usr/lib/heartbeat/ccm" (400,400) heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client "/usr/lib/heartbeat/cib" (400,400) heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client "/usr/lib/heartbeat/lrmd -r" (0,0) heartbeat[4857]: 2008/11/17_05:37:09 info: Starting "/usr/lib/heartbeat/cib" as uid 400 gid 400 (pid 4857) heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0) heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client "/usr/lib/heartbeat/attrd" (400,400) heartbeat[4859]: 2008/11/17_05:37:09 info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 4859) heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client "/usr/lib/heartbeat/crmd" (400,400) heartbeat[4856]: 2008/11/17_05:37:09 info: Starting "/usr/lib/heartbeat/ccm" as uid 400 gid 400 (pid 4856) heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client "/usr/lib/heartbeat/ipfail" (400,400) heartbeat[4858]: 2008/11/17_05:37:09 info: Starting "/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 4858) heartbeat[4860]: 2008/11/17_05:37:09 info: Starting "/usr/lib/heartbeat/attrd" as uid 400 gid 400 (pid 4860) heartbeat[4807]: 2008/11/17_05:37:09 WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 80 ms (> 50 ms) (GSource: 0x9148158) heartbeat[4862]: 2008/11/17_05:37:09 info: Starting "/usr/lib/heartbeat/ipfail" as uid 400 gid 400 (pid 4862) heartbeat[4861]: 2008/11/17_05:37:10 info: Starting "/usr/lib/heartbeat/crmd" as uid 400 gid 400 (pid 4861) lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 15 attrd[4860]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 15 ccm[4856]: 2008/11/17_05:37:10 info: Hostname: 1mctdb01.xxx-asp.jp attrd[4860]: 2008/11/17_05:37:10 info: register_with_ha: Hostname: 1mctdb01.xxx-asp.jp cib[4857]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 15 cib[4857]: 2008/11/17_05:37:10 info: G_main_add_TriggerHandler: Added signal manual handler cib[4857]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 17 heartbeat[4807]: 2008/11/17_05:37:10 ERROR: api_process_request: bad request [getrsc] cib[4857]: 2008/11/17_05:37:10 info: main: Retrieval of a per-action CIB: disabled heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG: Dumping message with 5 fields heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[0] : [t=hbapi-req] heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[1] : [reqtype=getrsc] heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[2] : [dest=1mctdb01.xxx-asp.jp] heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[3] : [pid=4862] heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[4] : [from_id=ipfail] ipfail[4862]: 2008/11/17_05:37:10 ERROR: No managed resources heartbeat[4807]: 2008/11/17_05:37:10 WARN: Managed /usr/lib/heartbeat/ipfail process 4862 exited with return code 100. lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 17 lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 10 ccm[4856]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 15 lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 12 cib[4857]: 2008/11/17_05:37:10 ERROR: crm_is_writable: /var/lib/heartbeat/crm/cib.xml must be owned and r/w by user hacluster cib[4857]: 2008/11/17_05:37:10 info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig) stonithd[4859]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 10 stonithd[4859]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 12 attrd[4860]: 2008/11/17_05:37:10 info: register_with_ha: UUID: c73aec4b-6947-40a2-b1e2-853d10c4d22d lrmd[4858]: 2008/11/17_05:37:10 info: Started. crmd[4861]: 2008/11/17_05:37:10 info: main: CRM Hg Version: node: 552305612591183b1628baa5bc6e903e0f1e26a3 crmd[4861]: 2008/11/17_05:37:10 info: crmd_init: Starting crmd crmd[4861]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 15 crmd[4861]: 2008/11/17_05:37:10 info: G_main_add_TriggerHandler: Added signal manual handler crmd[4861]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added signal handler for signal 17 cib[4857]: 2008/11/17_05:37:10 WARN: validate_cib_digest: No on-disk digest present あとこういったエラーログも出ていました heartbeat[4807]: 2008/11/17_05:37:59 ERROR: ha_msg_addraw_ll: illegal field heartbeat[4807]: 2008/11/17_05:37:59 ERROR: ha_msg_addraw(): ha_msg_addraw_ll failed heartbeat[4807]: 2008/11/17_05:37:59 ERROR: NV failure (string2msg_ll): heartbeat[4807]: 2008/11/17_05:37:59 ERROR: Input string: [>>> t=cib cib_clientid=5977 cib_callopt=1052672 cib_callid=4 cib_op=cib_apply_diff cib_section=status cib_delegated_from=1mctdb01.xxx-asp.jp cib_clientname=5977 (4)cib_calldata=eJyV0EEOwiAQBdC912ia2AQSmE5bbcJZCC2zIFE0QD2/QrqnfuDb6XWhS2CyKsKQWaCB5vJwESWo/aZYlArFILlqOJVfLG3/qUJJS3o8/1FmEhFgj8cXAPxpoiYfzC4234a2 hg=483d2dbb ts=49208487 ld=0.42 0.13 0.03<< >>> t=cib cib_clientid=ab2880a6-3d82-472f-8275-2c9846356fcb cited_from=1mctdb01.xxx-asp.jp cib_clientname=4861 (4)cib_calldata=eJyVUtlu4yAUfe9vRJFaCSJW28mIb0Es1xk6trEAV8rfF9vpVG1nHvKCdLez3MvhcHhi7PJMXrSezAhaqyl60LmYAkhsleCVa7kBJyxuzqLFghiGLQWGO8k9JU54xjxi3dq9rDC Kjq54S+ipeIdNnk+vM+Jb3aURe7DLFccUrmFSPupl9pVOJ8hxSQ5QS/jlmb0MaVSNRvMKlYMKU6/NScnY7zzs1+cm9VxMVXZj3HXPQUS+hv2sMY64KrnPPaFWdIpoQ4qb3+wGkasvlJZsphRdB/4KZk e6m+HO+sBS8wZ+6MhaMcd9AYLKnvJXBOO5Coab7Nj+YanKqCfz2Asq/ZmWHQ1TGlaN9LNaD7urwl1R8MRbHTxwGS067+a1VjsW9g++JLXjP7zaYC6c0MNSHkvcWHK+SiHAFCO9vyvkpgzDbOey4lUCZ NTyWg4/H49K/nHeAYBD8= cib_isreplyto=1mctdb01.xxx-asp.jp cib_update=true (4)cib_update_diff=eJydU+1uozAQ/N/XiCI1kjl5/YXhxLNYxl5yqAEi41Tq258NtGma6KTLHwt2ZtezO+vdbvcCqn6lB2NGO6Axje+7joCq6ld2yN9FwGF6R9/sEpeJO+4nTmSqIw+ubxcmyBtm ChPQOTReBnM5extxbiSlZL/fv5BVg/VDPxo8T+5PQwlAjq1/QP+VXVXll1rr/adWdq91QYnW5RNSgWi2tDhHGy/zmqpvUleElBoyb5w8mhzCR3quKFGaZf4pDI/0pDCRVbkxTMB5ugSHqwB2x70SiOT 6Z9YjF7/jRCjxlTM7M50fab+ihC+1et/40HpaU3Oe5mjGKfbdh/FpNSKaZF6VWdMZg439NDYrTvgyPheGwmN7ORZT6I/92PhpG/tVlqKZGYMd5z5XMG/40ciyBlY7rtsWvSg4c1UhHPBCo7KFBN9J5B w0SqLUj/zBHnvXJMG//6MK8EWvPZ1M6hiArHNJDZgOk/UhOYqxYb9SxwsSnHHJ57zNYp2A2baHbsX6MWJ4t6cUEHKj+P6Ic2wcRQq6LXmXJDDWKuc9lxKBSduBRMI3V/L9eb1TU+ENQzJo2fM6Jn/W5 8VvTVqjyyHYBrmSW3SiLVQlykJQy4oWkBVacg/UCc+Yfzrlejz50uH71dt8RMudsJ1T1noG0tOSMc4sp9RqqNSa8hfq6W3F oseq=62 _compression_algorithm=zlib from_id=cib to_id=cib client_gen=3 src=1mctdb02.xxx-asp.jp (1)srcuuid=PhGaPNvdTVSt7wPintDUbg== seq=4234a3 hg=483d2dbb ts=49208488 ld=0.42 0.13 0.03 2/89 24799 ttl=3 auth=1 2a06b826 <<< ]