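Scenario: our company's user VMs kept ending up with an abnormal 169.254.x.x (APIPA) IP address. The network experts traced it to a problem in the VM's OS, so we opened a high-severity case with Microsoft and helped collect logs. After more than a month, the preliminary conclusion was: audiodg.exe (the Windows audio process) probably inherited a handle from its parent process (the DHCP Client service) when it was created, so one endpoint was never released.
As a result, the IP lease-renewal replies arriving at this endpoint were dropped for lack of buffer space (consistent with the earlier analysis), and lease renewal failed.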
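Their analysis log is attached below, but some questions remain. Microsoft's engineer replied that they analyzed it by debugging with their source code; with only the dump on our side, it is very hard for us to debug and analyze it ourselves. Some of our questions never got a clear answer from Microsoft, so I'm here to ask Mr. Zhang for guidance. From the AFD log: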
datagram dropped: 2: Process 0x8876e658, Endpoint 0x86cc5ee0, Buffer 0x87fc44d2, Length 312, Address 10.66.104.1:67, Seq 10001, Reason Insufficient local buffer space
So we can determine that the problematic endpoint is 0x86cc5ee0 and that the DHCP Client service process is 0x8876e658.
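To pull the same fields out of a longer AFD trace, a small parser helps. The sketch below is a minimal example that assumes every drop record has exactly the layout of the line above; the leading `2:` is kept as an opaque `tag` because the log itself does not say what it means, and `parse_drop` / `drops_by_endpoint` are hypothetical helper names. Port 67 is the DHCP server port (the client listens on 68), so the dropped 312-byte datagram from 10.66.104.1:67 is a DHCP reply. Counting drops per endpoint shows whether a single endpoint accounts for all of them.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Layout assumed from the single AFD log line quoted in the post.
AFD_DROP_RE = re.compile(
    r"datagram dropped: (?P<tag>\d+): Process (?P<process>0x[0-9a-fA-F]+), "
    r"Endpoint (?P<endpoint>0x[0-9a-fA-F]+), Buffer (?P<buffer>0x[0-9a-fA-F]+), "
    r"Length (?P<length>\d+), Address (?P<ip>[\d.]+):(?P<port>\d+), "
    r"Seq (?P<seq>\d+), Reason (?P<reason>.+)"
)

def parse_drop(line: str) -> Optional[Dict[str, object]]:
    """Return the interesting fields of one drop record, or None if it doesn't match."""
    m = AFD_DROP_RE.search(line)
    if m is None:
        return None
    return {
        "process": int(m["process"], 16),
        "endpoint": int(m["endpoint"], 16),
        "length": int(m["length"]),
        "peer": (m["ip"], int(m["port"])),
        "reason": m["reason"].strip(),
    }

def drops_by_endpoint(lines: Iterable[str]) -> Counter:
    """Count dropped datagrams per AFD endpoint address."""
    counts: Counter = Counter()
    for line in lines:
        rec = parse_drop(line)
        if rec is not None:
            counts[rec["endpoint"]] += 1
    return counts

LINE = ("datagram dropped: 2: Process 0x8876e658, Endpoint 0x86cc5ee0, "
        "Buffer 0x87fc44d2, Length 312, Address 10.66.104.1:67, Seq 10001, "
        "Reason Insufficient local buffer space")

if __name__ == "__main__":
    rec = parse_drop(LINE)
    print(hex(rec["endpoint"]), rec["peer"], rec["reason"])
```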
In the dump we found the following tcpip (UDP) endpoint bound to port 68:
0x89748150
Its corresponding AFD endpoint is 0x86cc5ee0.
On a clean machine, these endpoints are cleaned up when the DHCP Client service closes the socket. The corresponding call stack is:
fffff880`0367f818 fffff880`024b3c90 tcpip!UdpCloseEndpoint
fffff880`0367f820 fffff880`024b4122 afd!AfdCleanupCore+0x410
fffff880`0367f9a0 fffff800`017d968f afd!AfdDispatch+0x42
fffff880`0367f9f0 fffff800`017bf304 nt!IopCloseFile+0x11f
fffff880`0367fa80 fffff800`017d9181 nt!ObpDecrementHandleCount+0xb4
fffff880`0367fb00 fffff800`017d9094 nt!ObpCloseHandleTableEntry+0xb1
fffff880`0367fb90 fffff800`014c5153 nt!ObpCloseHandle+0x94
fffff880`0367fbe0 00000000`77bfffaa nt!KiSystemServiceCopyEnd+0x13
00000000`014ef8f8 000007fe`fd3419ca ntdll!ZwClose+0xa
In the dump, the corresponding AfdCleanupCore was never called, yet the earlier dump of svchost.exe (DHCP Client service) confirms that the user-mode socket handle had already been closed.
The only explanation is that some program or driver still has the file object (FILE_OBJECT) of this AFD endpoint open.
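This reasoning rests on handle-count semantics: the I/O manager sends IRP_MJ_CLEANUP (which AFD handles in AfdCleanupCore, as in the stack above) only when the last handle to the file object is closed, so a second handle held by another process keeps the endpoint alive after svchost.exe closes its own. The toy model below is purely illustrative; `FileObject` and `simulate` are hypothetical names, and it ignores the pointer count and IRP_MJ_CLOSE entirely.

```python
from dataclasses import dataclass

@dataclass
class FileObject:
    """Toy stand-in for a kernel file object (not the real FILE_OBJECT layout)."""
    name: str
    handle_count: int = 0
    cleaned_up: bool = False

    def open_handle(self) -> None:
        self.handle_count += 1

    def close_handle(self) -> None:
        self.handle_count -= 1
        if self.handle_count == 0:
            # Models the point where IRP_MJ_CLEANUP would reach the driver
            # (AfdDispatch -> AfdCleanupCore in the stack above).
            self.cleaned_up = True

def simulate(audiodg_inherits: bool) -> FileObject:
    endpoint = FileObject(r"\Endpoint {Afd}")
    endpoint.open_handle()        # svchost (DHCP Client) creates the socket
    if audiodg_inherits:
        endpoint.open_handle()    # child process gets its own copy of the handle
    endpoint.close_handle()       # svchost closes its socket handle
    return endpoint

if __name__ == "__main__":
    print(simulate(False))  # cleanup ran
    print(simulate(True))   # HandleCount stays 1, no cleanup
```

With inheritance, the model ends with a handle count of 1 and no cleanup, which matches the `HandleCount: 1` reported by `!object` for the leaked endpoint.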
From the dump I found 130+ AFD endpoint file objects in total and eventually found 0x86b1c480. ---- Question: from a kernel dump, is there some other way to get all the file objects belonging to an endpoint?
2: kd> !object 86b1c480
Object: 86b1c480  Type: (86960f78) File
    ObjectHeader: 86b1c468 (new version)
    HandleCount: 1  PointerCount: 1
    Directory Object: 00000000  Name: \Endpoint {Afd}
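So some handle must still be open on it. Next I walked the system handle tables and finally found it in audiodg.exe. ---- Question: how do you find the owning process name from a file object?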
04b8: Object: 86b1c480 GrantedAccess: 0016019f (Inherit)
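Walking every handle table by hand is tedious. One approach is to dump the handle tables to a text file with the `!handle` extension and search it for the file object address. The sketch below assumes that output layout: a `PROCESS ...` header with an `Image:` field, followed by that process's handle entries in the `04b8: Object: ... GrantedAccess: ...` form shown above. The exact layout is an assumption, and `find_handles` / `HandleHit` are hypothetical names.

```python
import re
from typing import List, NamedTuple

class HandleHit(NamedTuple):
    image: str      # owning process image name
    handle: int     # handle value in that process
    access: int     # GrantedAccess mask
    inherit: bool   # True if the entry carries the (Inherit) attribute

IMAGE_RE = re.compile(r"Image:\s*(\S+)")
HANDLE_RE = re.compile(
    r"^\s*([0-9a-fA-F]+): Object: ([0-9a-fA-F]+)\s+GrantedAccess: ([0-9a-fA-F]+)(.*)$"
)

def find_handles(dump_text: str, target_object: str) -> List[HandleHit]:
    """Return every handle entry that points at target_object, with its process image."""
    target = int(target_object, 16)
    current_image = "<unknown>"
    hits: List[HandleHit] = []
    for line in dump_text.splitlines():
        m = IMAGE_RE.search(line)
        if m:
            current_image = m.group(1)   # subsequent handle lines belong to this process
            continue
        m = HANDLE_RE.match(line)
        if m and int(m.group(2), 16) == target:
            hits.append(HandleHit(current_image, int(m.group(1), 16),
                                  int(m.group(3), 16), "(Inherit)" in m.group(4)))
    return hits

SAMPLE = """\
PROCESS 88696278  SessionId: 0  Cid: 1fc8    Peb: 7ffdc000  ParentCid: 02a4
    Image: Smc.exe
0010: Object: 8a000010  GrantedAccess: 001f0003
PROCESS 86c7aa58  SessionId: 0  Cid: 0528    Peb: 7ffde000  ParentCid: 03b4
    Image: audiodg.exe
04b8: Object: 86b1c480  GrantedAccess: 0016019f (Inherit)
"""

if __name__ == "__main__":
    for hit in find_handles(SAMPLE, "86b1c480"):
        print(hit)
```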
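Inherit means the handle is marked inheritable, which suggests it was inherited from the parent process. So who is the parent of audiodg.exe? According to the dump, it is the svchost process hosting the DHCP Client service. MSDN explains: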
A child process can inherit handles from its parent process. An inherited handle is valid only in the context of the child process. To enable a child process to inherit open handles from its parent process, use the following steps.
1. Create the handle with the bInheritHandle member of the SECURITY_ATTRIBUTES structure set to TRUE.
2. Create the child process using the CreateProcess function, with the bInheritHandles parameter set to TRUE.
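The two MSDN steps have a close POSIX analog, which makes the leak easy to reproduce outside Windows. In the sketch below, a pipe's write end stands in for the socket (an assumption, not the real AFD object), `os.set_inheritable` plays the role of step 1, and `subprocess.Popen(..., close_fds=False)` plays the role of step 2. After the parent closes its write end, the reader only sees EOF if no child still holds an inherited copy. This is POSIX-only (it uses `select` on a pipe), and `eof_visible` is a hypothetical helper name.

```python
import os
import select
import subprocess
import sys

# Child that simply keeps whatever descriptors it inherited for a few seconds.
CHILD = [sys.executable, "-c", "import time; time.sleep(3)"]

def eof_visible(inherit: bool, wait: float = 0.5) -> bool:
    """POSIX-only: does the reader see EOF once the parent closes its write end?"""
    r, w = os.pipe()                 # stand-in for the endpoint (assumption)
    os.set_inheritable(w, inherit)   # step 1 analog: mark the handle inheritable
    # step 2 analog: spawn a child that is allowed to inherit inheritable fds
    child = subprocess.Popen(CHILD, close_fds=False)
    try:
        os.close(w)                  # the parent "closes its socket"
        ready, _, _ = select.select([r], [], [], wait)
        # readable plus a zero-byte read means EOF: no writer is left anywhere
        return bool(ready) and os.read(r, 1) == b""
    finally:
        child.kill()
        child.wait()
        os.close(r)

if __name__ == "__main__":
    print("not inheritable, EOF seen:", eof_visible(False))
    print("inheritable, EOF seen:", eof_visible(True))
```

The same shape explains the dump: the parent's close succeeds, but the underlying object stays alive for as long as the child process holds its inherited copy.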
So audiodg.exe most likely inherited the parent's handle when it was created. Looking at the process creation time:
2: kd> !process 86c7aa58 1
PROCESS 86c7aa58  SessionId: 0  Cid: 0528    Peb: 7ffde000  ParentCid: 03b4
    DirBase: eeaf6bc0  ObjectTable: be3423b0  HandleCount: 209.
    Image: audiodg.exe
    VadRoot 88de3a00 Vads 83 Clone 0 Private 2545. Modified 8805. Locked 0.
    DeviceMap b31c6d48
    Token                             be258668
    ElapsedTime                       7 Days 08:14:05.302
    UserTime                          00:00:06.328
    KernelTime                        00:00:09.390
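Dump creation time: Mon Oct 22 17:05:19.201 2012 (UTC + 8:00)
UDP_ENDPOINT creation time: Mon Oct 15 08:51:13.023 2012 (UTC + 8:00)
Subtracting audiodg.exe's ElapsedTime (7 Days 08:14:05) from the dump time gives Oct 22 - 7 days = Oct 15 and 17:05:19 - 08:14:05 = 08:51:14, which confirms that audiodg.exe was created at exactly that time.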
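The same subtraction can be done with full millisecond precision. The sketch below parses the `ElapsedTime` value from the `!process` output and subtracts it from the dump time; `parse_elapsed` is a hypothetical helper, and both timestamps are taken as naive local times (UTC+8) straight from the text above.

```python
import re
from datetime import datetime, timedelta

def parse_elapsed(text: str) -> timedelta:
    """Parse an ElapsedTime value such as '7 Days 08:14:05.302'."""
    m = re.fullmatch(r"(?:(\d+) Days? )?(\d+):(\d{2}):(\d{2})\.(\d{3})", text.strip())
    if m is None:
        raise ValueError(f"unrecognized elapsed time: {text!r}")
    days, hours, minutes, seconds, millis = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)

dump_time = datetime(2012, 10, 22, 17, 5, 19, 201000)        # dump creation (UTC+8)
endpoint_created = datetime(2012, 10, 15, 8, 51, 13, 23000)  # UDP_ENDPOINT creation (UTC+8)

process_created = dump_time - parse_elapsed("7 Days 08:14:05.302")
gap = process_created - endpoint_created

if __name__ == "__main__":
    print(process_created)  # 2012-10-15 08:51:13.899000
    print(gap)              # 0:00:00.876000
```

At millisecond precision, audiodg.exe starts about 0.9 s after the endpoint was created, which is consistent with the endpoint already being open in the parent when the child was spawned.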
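From the dump you collected today right after running smc -stop and smc -start, at which point the problem disappeared, the last few processes created were: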
PROCESS 88696278 SessionId: 0 Cid: 1fc8 Peb: 7ffdc000 ParentCid: 02a4 DirBase: eeddf240 ObjectTable: ac4e1698 HandleCount: 727. Image: Smc.exe
PROCESS 88068bf8 SessionId: 1 Cid: 2084 Peb: 7ffd6000 ParentCid: 1fc8 DirBase: eeddfec0 ObjectTable: 900d3d20 HandleCount: 345. Image: SmcGui.exe
PROCESS 89b02d40 SessionId: 0 Cid: 1e70 Peb: 7ffd4000 ParentCid: 0eb8 DirBase: eeddfe20 ObjectTable: b28e6968 HandleCount: 172. Image: w3wp.exe
PROCESS 883ff030 SessionId: 0 Cid: 1c8c Peb: 7ffd5000 ParentCid: 03bc DirBase: eeddfca0 ObjectTable: b23740a0 HandleCount: 127. Image: audiodg.exe
PROCESS 88fc4778 SessionId: 1 Cid: 18c4 Peb: 7ffdf000 ParentCid: 0314 DirBase: eeddfd00 ObjectTable: a5eae7c0 HandleCount: 124. Image: dllhost.exe
PROCESS 87d36030 SessionId: 1 Cid: 1ba8 Peb: 7ffde000 ParentCid: 1580 DirBase: eeddfd80 ObjectTable: b1b5a228 HandleCount: 96. Image: NotMyfault.exe
This also shows that audiodg.exe was restarted before the problem disappeared. I will keep investigating how audiodg.exe gets created.
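To follow audiodg.exe's creation chain, each `ParentCid` has to be resolved to the process whose `Cid` matches it. The sketch below parses `PROCESS` lines in the layout quoted above and resolves parents within the same listing; `parse_processes` and `parents_of` are hypothetical helpers. A ParentCid only records the creator's process ID at creation time, and process IDs can be reused, so a match should be cross-checked against creation times. In this excerpt, audiodg.exe's parent (03bc) is not among the listed processes, so its parent image resolves to None.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

class Proc(NamedTuple):
    eprocess: int
    cid: int
    parent_cid: int
    image: str

# DOTALL so the pattern also works when "Image:" sits on the next line of the output.
PROC_RE = re.compile(
    r"PROCESS\s+(?P<addr>[0-9a-fA-F]+)\s+SessionId:\s*\S+\s+Cid:\s*(?P<cid>[0-9a-fA-F]+)"
    r".*?ParentCid:\s*(?P<ppid>[0-9a-fA-F]+).*?Image:\s*(?P<image>\S+)",
    re.DOTALL,
)

def parse_processes(text: str) -> List[Proc]:
    return [
        Proc(int(m["addr"], 16), int(m["cid"], 16), int(m["ppid"], 16), m["image"])
        for m in PROC_RE.finditer(text)
    ]

def parents_of(procs: List[Proc], image: str) -> List[Tuple[int, int, Optional[str]]]:
    """For each process named `image`, return (cid, parent_cid, parent image or None)."""
    by_cid = {p.cid: p for p in procs}
    out = []
    for p in procs:
        if p.image.lower() == image.lower():
            parent = by_cid.get(p.parent_cid)
            out.append((p.cid, p.parent_cid, parent.image if parent else None))
    return out

SAMPLE = """\
PROCESS 88696278 SessionId: 0 Cid: 1fc8 Peb: 7ffdc000 ParentCid: 02a4 DirBase: eeddf240 ObjectTable: ac4e1698 HandleCount: 727. Image: Smc.exe
PROCESS 88068bf8 SessionId: 1 Cid: 2084 Peb: 7ffd6000 ParentCid: 1fc8 DirBase: eeddfec0 ObjectTable: 900d3d20 HandleCount: 345. Image: SmcGui.exe
PROCESS 883ff030 SessionId: 0 Cid: 1c8c Peb: 7ffd5000 ParentCid: 03bc DirBase: eeddfca0 ObjectTable: b23740a0 HandleCount: 127. Image: audiodg.exe
"""

if __name__ == "__main__":
    procs = parse_processes(SAMPLE)
    print(parents_of(procs, "SmcGui.exe"))
    print(parents_of(procs, "audiodg.exe"))
```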