[Linux-ha-jp] VM上でシリアルケーブル接続をした際の挙動

Back to archive index

島田雄太 yuta.****@gmail*****
2008年 11月 17日 (月) 22:27:08 JST


島田と申します

linux-haの挙動で質問があります。

現在、VMwareのホストOS2台を用いて、その上にそれぞれRedhatES4をゲストOSとして
立ち上げています。

heartbeatとDRBDを用いて、その上にpostgresを使って構築をしております。
VM上でシリアルケーブルを認識させているのですが、通常運用時に以下のようなエラーが出ており
WatchDog上で障害だと判断し、切り替わりが発生してしまったように見受けられます

なぜ、そうなったのか、調査したいと考えております。


そもそも設定がおかしいというのも考えられるのでしょうか。
どういった情報をお伝えすればよいか分からない為、不適切なご相談になってしまうようでしたら
必要な情報をご教授いただけますでしょうか。

ご迷惑をお掛け致しますが、宜しくお願い致します

1.コンソール上において以下のようなメッセージが出ていました
	「1xxxdb01 kernel: ttyS0: 1 input overrun(s)」
	というメッセージが表示されており、最後に
	「SoftDog: Initiaing system reboot.」
	というメッセージが表示されていた

2./var/log/messageについて

Nov 17 05:37:03 1mctdb01 kernel: drbd0: resync bitmap: bits=12799789
words=399994
Nov 17 05:37:03 1mctdb01 kernel: klogd 1.4.1, ---------- state change ----------
Nov 17 05:37:03 1mctdb01 kernel: drbd0: size = 48 GB (51199155 KB)
Nov 17 05:37:03 1mctdb01 kernel: drbd0: 0 KB marked out-of-sync by on
disk bit-map.
Nov 17 05:37:03 1mctdb01 kernel: drbd0: Found 4 transactions (192
active extents) in activity log.
Nov 17 05:37:03 1mctdb01 kernel: drbd0: Marked additional 507 MB as
out-of-sync based on AL.
Nov 17 05:37:03 1mctdb01 kernel: drbd0: drbdsetup [4700]: cstate
Unconfigured --> StandAlone
Nov 17 05:37:03 1mctdb01 drbd: s0
Nov 17 05:37:03 1mctdb01 drbd: n0
Nov 17 05:37:03 1mctdb01 kernel: drbd0: drbdsetup [4713]: cstate
StandAlone --> Unconnected
Nov 17 05:37:03 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate
Unconnected --> WFConnection
Nov 17 05:37:03 1mctdb01 drbd: ]
Nov 17 05:37:03 1mctdb01 drbd: .
Nov 17 05:37:04 1mctdb01 drbd: WARN: stdin/stdout is not a TTY; using
/dev/console
Nov 17 05:37:05 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate
WFConnection --> WFReportParams
Nov 17 05:37:05 1mctdb01 kernel: drbd0: Handshake successful: DRBD
Network Protocol version 74
Nov 17 05:37:05 1mctdb01 kernel: drbd0: Connection established.
Nov 17 05:37:05 1mctdb01 kernel: drbd0: I am(S):
1:00000002:00000001:0000001e:0000000e:11
Nov 17 05:37:05 1mctdb01 kernel: drbd0: Peer(P):
1:00000002:00000001:0000001e:0000000f:10
Nov 17 05:37:05 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate
WFReportParams --> WFBitMapT
Nov 17 05:37:05 1mctdb01 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Nov 17 05:37:05 1mctdb01 rc: Starting drbd:  succeeded
Nov 17 05:37:05 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate
WFBitMapT --> SyncTarget
Nov 17 05:37:05 1mctdb01 kernel: drbd0: Resync started as SyncTarget
(need to sync 519348 KB [129837 bits set]).
Nov 17 05:37:06 1mctdb01 logd: [4739]: info: logd started with default
configuration.
Nov 17 05:37:06 1mctdb01 logd: [4739]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Nov 17 05:37:06 1mctdb01 logd: [4748]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Nov 17 05:37:06 1mctdb01 kernel: Software Watchdog Timer: 0.07
initialized. soft_noboot=0 soft_margin=60 sec (nowayout= 0)
Nov 17 05:37:09 1mctdb01 sendmail[4833]: sql_select option missing
Nov 17 05:37:09 1mctdb01 sendmail[4833]: auxpropfunc error no
mechanism available
Nov 17 05:37:09 1mctdb01 sendmail: sendmail startup succeeded
Nov 17 05:37:09 1mctdb01 sendmail: sm-client startup succeeded
Nov 17 05:37:09 1mctdb01 gpm[4855]: *** info [startup.c(95)]:
Nov 17 05:37:09 1mctdb01 gpm[4855]: Started gpm successfully. Entered
daemon mode.
Nov 17 05:37:09 1mctdb01 gpm[4855]: *** info [mice.c(1766)]:
Nov 17 05:37:09 1mctdb01 gpm[4855]: imps2: Auto-detected intellimouse PS/2
Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Nov 17 05:37:10 1mctdb01 attrd: [4860]: info:
G_main_add_SignalHandler: Added signal handler for signal 15
Nov 17 05:37:10 1mctdb01 ccm: [4856]: info: Hostname: 1mctdb01.tdc-asp.jp
Nov 17 05:37:10 1mctdb01 attrd: [4860]: info: register_with_ha:
Hostname: 1mctdb01.tdc-asp.jp
Nov 17 05:37:10 1mctdb01 gpm: gpm startup succeeded
Nov 17 05:37:10 1mctdb01 cib: [4857]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Nov 17 05:37:10 1mctdb01 cib: [4857]: info: G_main_add_TriggerHandler:
Added signal manual handler
Nov 17 05:37:10 1mctdb01 cib: [4857]: info: G_main_add_SignalHandler:
Added signal handler for signal 17
Nov 17 05:37:10 1mctdb01 cib: [4857]: info: main: Retrieval of a
per-action CIB: disabled
Nov 17 05:37:10 1mctdb01 ipfail: [4862]: ERROR: No managed resources
Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler:
Added signal handler for signal 17
Nov 17 05:37:10 1mctdb01 ccm: [4856]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler:
Added signal handler for signal 10
Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler:
Added signal handler for signal 12
Nov 17 05:37:10 1mctdb01 cib: [4857]: ERROR: crm_is_writable:
/var/lib/heartbeat/crm/cib.xml must be owned and r/w by user hacluster
Nov 17 05:37:10 1mctdb01 cib: [4857]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Nov 17 05:37:10 1mctdb01 attrd: [4860]: info: register_with_ha: UUID:
c73aec4b-6947-40a2-b1e2-853d10c4d22d
Nov 17 05:37:10 1mctdb01 stonithd: [4859]: info:
G_main_add_SignalHandler: Added signal handler for signal 10
Nov 17 05:37:10 1mctdb01 stonithd: [4859]: info:
G_main_add_SignalHandler: Added signal handler for signal 12
Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: Started.
Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: main: CRM Hg Version:
node: 552305612591183b1628baa5bc6e903e0f1e26a3
Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: crmd_init: Starting crmd
Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Nov 17 05:37:10 1mctdb01 crmd: [4861]: info:
G_main_add_TriggerHandler: Added signal manual handler
Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: G_main_add_SignalHandler:
Added signal handler for signal 17

3./var/ha-log
heartbeat[4806]: 2008/11/17_05:37:07 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[4806]: 2008/11/17_05:37:07 info: **************************
heartbeat[4806]: 2008/11/17_05:37:07 info: Configuration validated.
Starting heartbeat 2.1.3
heartbeat[4807]: 2008/11/17_05:37:07 info: heartbeat: version 2.1.3
heartbeat[4807]: 2008/11/17_05:37:07 info: Heartbeat generation: 1211968635
heartbeat[4807]: 2008/11/17_05:37:08 info: glib: Starting serial
heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth1
heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: bound send
socket to device: eth1
heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: bound receive
socket to device: eth1
heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: started on
port 694 interface eth1 to 192.168.201.2
heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ping heartbeat started.
heartbeat[4807]: 2008/11/17_05:37:08 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[4807]: 2008/11/17_05:37:08 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[4807]: 2008/11/17_05:37:08 notice: Using watchdog device:
/dev/watchdog
heartbeat[4807]: 2008/11/17_05:37:08 info: G_main_add_SignalHandler:
Added signal handler for signal 17
heartbeat[4807]: 2008/11/17_05:37:08 info: Local status now set to: 'up'
heartbeat[4807]: 2008/11/17_05:37:09 info: Link
192.168.70.254:192.168.70.254 up.
heartbeat[4807]: 2008/11/17_05:37:09 info: Status update for node
192.168.70.254: status ping
heartbeat[4807]: 2008/11/17_05:37:09 info: Link
1mctdb02.xxx-asp.jp:/dev/ttyS0 up.
heartbeat[4807]: 2008/11/17_05:37:09 info: Status update for node
1mctdb02.xxx-asp.jp: status active
heartbeat[4807]: 2008/11/17_05:37:09 info: Link 1mctdb02.xxx-asp.jp:eth1 up.
heartbeat[4807]: 2008/11/17_05:37:09 info: Comm_now_up(): updating
status to active
heartbeat[4807]: 2008/11/17_05:37:09 info: Local status now set to: 'active'
heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
"/usr/lib/heartbeat/ccm" (400,400)
heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
"/usr/lib/heartbeat/cib" (400,400)
heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
"/usr/lib/heartbeat/lrmd -r" (0,0)
heartbeat[4857]: 2008/11/17_05:37:09 info: Starting
"/usr/lib/heartbeat/cib" as uid 400  gid 400 (pid 4857)
heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
"/usr/lib/heartbeat/stonithd" (0,0)
heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
"/usr/lib/heartbeat/attrd" (400,400)
heartbeat[4859]: 2008/11/17_05:37:09 info: Starting
"/usr/lib/heartbeat/stonithd" as uid 0  gid 0 (pid 4859)
heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
"/usr/lib/heartbeat/crmd" (400,400)
heartbeat[4856]: 2008/11/17_05:37:09 info: Starting
"/usr/lib/heartbeat/ccm" as uid 400  gid 400 (pid 4856)
heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
"/usr/lib/heartbeat/ipfail" (400,400)
heartbeat[4858]: 2008/11/17_05:37:09 info: Starting
"/usr/lib/heartbeat/lrmd -r" as uid 0  gid 0 (pid 4858)
heartbeat[4860]: 2008/11/17_05:37:09 info: Starting
"/usr/lib/heartbeat/attrd" as uid 400  gid 400 (pid 4860)
heartbeat[4807]: 2008/11/17_05:37:09 WARN: G_CH_dispatch_int: Dispatch
function for read child took too long to execute: 80 ms (> 50 ms)
(GSource: 0x9148158)
heartbeat[4862]: 2008/11/17_05:37:09 info: Starting
"/usr/lib/heartbeat/ipfail" as uid 400  gid 400 (pid 4862)
heartbeat[4861]: 2008/11/17_05:37:10 info: Starting
"/usr/lib/heartbeat/crmd" as uid 400  gid 400 (pid 4861)
lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 15
attrd[4860]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 15
ccm[4856]: 2008/11/17_05:37:10 info: Hostname: 1mctdb01.xxx-asp.jp
attrd[4860]: 2008/11/17_05:37:10 info: register_with_ha: Hostname:
1mctdb01.xxx-asp.jp
cib[4857]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 15
cib[4857]: 2008/11/17_05:37:10 info: G_main_add_TriggerHandler: Added
signal manual handler
cib[4857]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 17
heartbeat[4807]: 2008/11/17_05:37:10 ERROR: api_process_request: bad
request [getrsc]
cib[4857]: 2008/11/17_05:37:10 info: main: Retrieval of a per-action
CIB: disabled
heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG: Dumping message with 5 fields
heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[0] : [t=hbapi-req]
heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[1] : [reqtype=getrsc]
heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[2] : [dest=1mctdb01.xxx-asp.jp]
heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[3] : [pid=4862]
heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[4] : [from_id=ipfail]
ipfail[4862]: 2008/11/17_05:37:10 ERROR: No managed resources
heartbeat[4807]: 2008/11/17_05:37:10 WARN: Managed
/usr/lib/heartbeat/ipfail process 4862 exited with return code 100.
lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 17
lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 10
ccm[4856]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 15
lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 12
cib[4857]: 2008/11/17_05:37:10 ERROR: crm_is_writable:
/var/lib/heartbeat/crm/cib.xml must be owned and r/w by user hacluster
cib[4857]: 2008/11/17_05:37:10 info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
stonithd[4859]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler:
Added signal handler for signal 10
stonithd[4859]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler:
Added signal handler for signal 12
attrd[4860]: 2008/11/17_05:37:10 info: register_with_ha: UUID:
c73aec4b-6947-40a2-b1e2-853d10c4d22d
lrmd[4858]: 2008/11/17_05:37:10 info: Started.
crmd[4861]: 2008/11/17_05:37:10 info: main: CRM Hg Version: node:
552305612591183b1628baa5bc6e903e0f1e26a3

crmd[4861]: 2008/11/17_05:37:10 info: crmd_init: Starting crmd
crmd[4861]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 15
crmd[4861]: 2008/11/17_05:37:10 info: G_main_add_TriggerHandler: Added
signal manual handler
crmd[4861]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
signal handler for signal 17
cib[4857]: 2008/11/17_05:37:10 WARN: validate_cib_digest: No on-disk
digest present


あとこういったエラーログも出ていました

heartbeat[4807]: 2008/11/17_05:37:59 ERROR: ha_msg_addraw_ll: illegal field
heartbeat[4807]: 2008/11/17_05:37:59 ERROR: ha_msg_addraw():
ha_msg_addraw_ll failed
heartbeat[4807]: 2008/11/17_05:37:59 ERROR: NV failure (string2msg_ll):
heartbeat[4807]: 2008/11/17_05:37:59 ERROR: Input string: [>>>
t=cib
cib_clientid=5977
cib_callopt=1052672
cib_callid=4
cib_op=cib_apply_diff
cib_section=status
cib_delegated_from=1mctdb01.xxx-asp.jp
cib_clientname=5977
(4)cib_calldata=eJyV0EEOwiAQBdC912ia2AQSmE5bbcJZCC2zIFE0QD2/QrqnfuDb6XWhS2CyKsKQWaCB5vJwESWo/aZYlArFILlqOJVfLG3/qUJJS3o8/1FmEhFgj8cXAPxpoiYfzC4234a2
hg=483d2dbb
ts=49208487
ld=0.42 0.13 0.03<<
>>>
t=cib
cib_clientid=ab2880a6-3d82-472f-8275-2c9846356fcb
cited_from=1mctdb01.xxx-asp.jp
cib_clientname=4861
(4)cib_calldata=eJyVUtlu4yAUfe9vRJFaCSJW28mIb0Es1xk6trEAV8rfF9vpVG1nHvKCdLez3MvhcHhi7PJMXrSezAhaqyl60LmYAkhsleCVa7kBJyxuzqLFghiGLQWGO8k9JU54xjxi3dq9rDC
Kjq54S+ipeIdNnk+vM+Jb3aURe7DLFccUrmFSPupl9pVOJ8hxSQ5QS/jlmb0MaVSNRvMKlYMKU6/NScnY7zzs1+cm9VxMVXZj3HXPQUS+hv2sMY64KrnPPaFWdIpoQ4qb3+wGkasvlJZsphRdB/4KZk
e6m+HO+sBS8wZ+6MhaMcd9AYLKnvJXBOO5Coab7Nj+YanKqCfz2Asq/ZmWHQ1TGlaN9LNaD7urwl1R8MRbHTxwGS067+a1VjsW9g++JLXjP7zaYC6c0MNSHkvcWHK+SiHAFCO9vyvkpgzDbOey4lUCZ
NTyWg4/H49K/nHeAYBD8=
cib_isreplyto=1mctdb01.xxx-asp.jp
cib_update=true
(4)cib_update_diff=eJydU+1uozAQ/N/XiCI1kjl5/YXhxLNYxl5yqAEi41Tq258NtGma6KTLHwt2ZtezO+vdbvcCqn6lB2NGO6Axje+7joCq6ld2yN9FwGF6R9/sEpeJO+4nTmSqIw+ubxcmyBtm
ChPQOTReBnM5extxbiSlZL/fv5BVg/VDPxo8T+5PQwlAjq1/QP+VXVXll1rr/adWdq91QYnW5RNSgWi2tDhHGy/zmqpvUleElBoyb5w8mhzCR3quKFGaZf4pDI/0pDCRVbkxTMB5ugSHqwB2x70SiOT
6Z9YjF7/jRCjxlTM7M50fab+ihC+1et/40HpaU3Oe5mjGKfbdh/FpNSKaZF6VWdMZg439NDYrTvgyPheGwmN7ORZT6I/92PhpG/tVlqKZGYMd5z5XMG/40ciyBlY7rtsWvSg4c1UhHPBCo7KFBN9J5B
w0SqLUj/zBHnvXJMG//6MK8EWvPZ1M6hiArHNJDZgOk/UhOYqxYb9SxwsSnHHJ57zNYp2A2baHbsX6MWJ4t6cUEHKj+P6Ic2wcRQq6LXmXJDDWKuc9lxKBSduBRMI3V/L9eb1TU+ENQzJo2fM6Jn/W5
8VvTVqjyyHYBrmSW3SiLVQlykJQy4oWkBVacg/UCc+Yfzrlejz50uH71dt8RMudsJ1T1noG0tOSMc4sp9RqqNSa8hfq6W3F
oseq=62
_compression_algorithm=zlib
from_id=cib
to_id=cib
client_gen=3
src=1mctdb02.xxx-asp.jp
(1)srcuuid=PhGaPNvdTVSt7wPintDUbg==
seq=4234a3
hg=483d2dbb
ts=49208488
ld=0.42 0.13 0.03 2/89 24799
ttl=3
auth=1 2a06b826
<<<
]





Linux-ha-japan メーリングリストの案内
Back to archive index