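Debugging Notes: BSOD 0xA
Stop A (IRQL_NOT_LESS_OR_EQUAL) is a fairly common kernel failure that is hard to track down. This note walks through a real example.
First, load the kernel dump file into WinDbg and run !analyze -v to get an overview: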
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: ffefad8c, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: 8042c157, address which referenced memory
Debugging Details:
------------------
OVERLAPPED_MODULE: kmixer
READ_ADDRESS: ffefad8c Nonpaged pool expansion
CURRENT_IRQL: 2
FAULTING_IP:
nt!KeWaitForSingleObject+4f
8042c157 803b02 cmp byte ptr [ebx],0x2
DEFAULT_BUCKET_ID: INTEL_CPU_MICROCODE_ZERO
BUGCHECK_STR: 0xA
LAST_CONTROL_TRANSFER: from 80414f03 to 8042c157
IRP_ADDRESS: 815bfce8
TRAP_FRAME: b4ff5800 -- (.trap ffffffffb4ff5800)
ErrCode = 00000000
eax=00000001 ebx=ffefad8c ecx=00000000 edx=b4ff58b8 esi=81807340 edi=818073ac
eip=8042c157 esp=b4ff5874 ebp=b4ff5894 iopl=0 nv up ei pl zr na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
nt!KeWaitForSingleObject+0x4f:
8042c157 803b02 cmp byte ptr [ebx],0x2 ds:0023:ffefad8c=??
Resetting default scope
STACK_TEXT:
b4ff5894 80414f03 ffefad8c 00000000 00000000 nt!KeWaitForSingleObject+0x4f
b4ff58d0 8041457c 815f7b70 00d0b0b0 b4ff58e8 nt!ExpWaitForResource+0x2d
b4ff58e0 804145c1 b4ff5908 804d5d07 bad0b0b0 nt!ExpAcquireResourceExclusiveLite+0x64
b4ff58e8 804d5d07 bad0b0b0 00000001 815f7b70 nt!ExAcquireResourceExclusiveLite+0x37
b4ff5908 8044e981 815f7b88 815f7b88 00000000 nt!ObpRemoveObjectRoutine+0x47
b4ff592c 80423b1a 81807340 815bfd28 80064bd4 nt!ObfDereferenceObject+0x149
b4ff5994 8042f5da 815bfd28 b4ff59d0 b4ff59c4 nt!IopCompleteRequest+0x190
b4ff59d4 80065303 00000000 00000000 b4ff59ec nt!KiDeliverApc+0x8e
b4ff59d4 80064beb 00000000 00000000 b4ff59ec hal!HalpApcInterrupt+0xaf
b4ff5a5c 8042ce6e 00000000 00000000 b4ff5a9c hal!KfLowerIrql+0x17
b4ff5a6c 8041e152 815bfd28 815f7b88 00000000 nt!KeInsertQueueApc+0x3c
b4ff5a9c eb5760c8 818c4bf0 815bfce8 818c4bf0 nt!IopfCompleteRequest+0x258
b4ff5ae4 bfafe5a7 818c4bf0 815bfce8 8041dded sysaudio!PinDispatchIoControl+0x18d
b4ff5af0 8041dded 818c4bf0 815bfce8 815f7b88 KS!DispatchDeviceIoControl+0x25
b4ff5b04 bfaf8829 b4ea0610 e2cffd18 e2cffd08 nt!IopfCallDriver+0x35
b4ff5b30 b4e9e102 815f7b88 00000000 002f0003 KS!KsSynchronousIoControlDevice+0xbb
b4ff5b8c b4e9e18d 815f7b88 00000029 ffffffff AudSight!ProSetVol+0x105 [c:\work\icafe\audsight\hookiat.c @ 979]
b4ff5bac b4e9d716 81762d08 00000000 81762d08 AudSight!SetVol+0x2f [c:\work\icafe\audsight\hookiat.c @ 1053]
b4ff5bc8 b4e9d78f 81942d28 00000001 81762d08 AudSight!AudSightDeviceControl+0x158 [c:\work\icafe\audsight\audsight.c @ 388]
b4ff5c00 8041dded 8180ea60 81762d08 816fb928 AudSight!AudSightDispatch+0x59 [c:\work\icafe\audsight\audsight.c @ 510]
b4ff5c14 804ae80c 81762d1c 00000000 816fb928 nt!IopfCallDriver+0x35
b4ff5c28 804af676 8180ea60 816fb928 81942d28 nt!IopSynchronousServiceTail+0x60
b4ff5d00 804a71ee 000000c8 00000000 00000000 nt!IopXxxControlFile+0x5e4
b4ff5d34 80464f84 000000c8 00000000 00000000 nt!NtDeviceIoControlFile+0x28
b4ff5d34 77f82ca0 000000c8 00000000 00000000 nt!KiSystemService+0xc4
00bbf5e0 77e694e7 000000c8 00000000 00000000 ntdll!NtDeviceIoControlFile+0xb
00bbf644 00401353 000000c8 83050020 00bbf66c KERNEL32!DeviceIoControl+0xf8
WARNING: Stack unwind information not available. Following frames may be wrong.
00bbf684 004012fe 00000000 00000000 00000051 SvcServ+0x1353
00bbf6c4 00401703 000003eb 00000051 004029e6 SvcServ+0x12fe
00bbf700 77e1a3d0 00020048 0000046d 00000003 SvcServ+0x1703
00bbf720 77df6381 004028f3 00020048 0000046d USER32!UserCallWinProc+0x18
00bbf750 77df68c4 0056fed0 0000046d 00000003 USER32!SendMessageWorker+0x31e
00bbf770 0040208a 00020048 0000046d 00000003 USER32!SendMessageA+0x44
00bbf7a8 0040213b 00bbf7e8 00000000 00000004 SvcServ+0x208a
00bbf814 00401f85 00402070 00419548 0014eb70 SvcServ+0x213b
00bbf850 779c776d 00aa4bb8 0000000d 0014ec3c SvcServ+0x1f85
00bbf888 779924ac 00aa4bb8 0000000d 0014ec3c OLEAUT32!IDispatch_Invoke_Stub+0x6d
00bbf8c0 7875a3b7 00bbf9dc 00000000 00000000 OLEAUT32!IDispatch_RemoteInvoke_Thunk+0x3c
00bbfb7c 78753a2c 0014e438 0014cc2c 0014f178 RPCRT4!NdrStubCall2+0x604
00bbfbe0 779c916b 0014e438 0014f178 0014cc2c RPCRT4!CStdStubBuffer_Invoke+0xc8
00bbfc00 77b05616 00152a48 0014f178 0014cc2c OLEAUT32!CDispStubWrapper::Invoke+0xfb
00bbfc44 77b058f1 0014f178 00152ac4 0014be88 ole32!SyncStubInvoke+0x61
00bbfc8c 77a981ed 0014f178 00156a10 00152a48 ole32!StubInvoke+0xa8
00bbfcf0 77a8a60c 0014cc2c 00000000 00152a48 ole32!CCtxComChnl::ContextInvoke+0xbb
00bbfd0c 77a8a476 0014f178 00000001 00152a48 ole32!MTAInvoke+0x18
00bbfd3c 77b054ce 0014f178 00000001 00152a48 ole32!STAInvoke+0x56
00bbfd70 77b05c1f 0014f128 0014cc2c 00152a48 ole32!AppInvoke+0x88
00bbfe30 77b05969 001458c8 00000000 00000000 ole32!ComInvokeWithLockAndIPID+0x297
00bbfe50 77a945aa 0014f128 00000400 00139d50 ole32!ComInvoke+0x41
00bbfe60 77a4c3de 0014f128 00bbff48 00bbff50 ole32!ThreadDispatch+0x21
FOLLOWUP_IP:
sysaudio!PinDispatchIoControl+18d
eb5760c8 8bc6 mov eax,esi
SYMBOL_STACK_INDEX: c
FOLLOWUP_NAME: MachineOwner
SYMBOL_NAME: sysaudio!PinDispatchIoControl+18d
MODULE_NAME: sysaudio
IMAGE_NAME: sysaudio.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 3e9cda58
STACK_COMMAND: .trap ffffffffb4ff5800 ; kb
FAILURE_BUCKET_ID: 0xA_sysaudio!PinDispatchIoControl+18d
BUCKET_ID: 0xA_sysaudio!PinDispatchIoControl+18d
Followup: MachineOwner
---------
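First, the data above makes the immediate cause of the BSOD easy to see: KeWaitForSingleObject() was called at an elevated IRQL. The faulting instruction (cmp byte ptr [ebx],0x2) reads the first byte of the object passed as the first argument, ffefad8c, and that address could not be read (ds:0023:ffefad8c=??).
!analyze reports that the current IRQL is 2, i.e. DISPATCH_LEVEL. According to the DDK, KeWaitForSingleObject() should normally be called at IRQL <= APC_LEVEL (typically PASSIVE_LEVEL, 0). It may be called at DISPATCH_LEVEL only on one condition: the fifth parameter, Timeout, must be a non-NULL pointer to a zero value, so that the call returns immediately instead of blocking.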
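The DDK rule above can be written down as a small predicate. This is only a user-mode sketch for illustration: wait_is_legal() is a hypothetical helper, not a DDK function, and the timeout is modelled as a plain int64_t instead of a LARGE_INTEGER.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PASSIVE_LEVEL  0
#define APC_LEVEL      1
#define DISPATCH_LEVEL 2

/* Hypothetical helper (not part of the DDK): returns true if calling
 * KeWaitForSingleObject() would be legal at the given IRQL.
 * "timeout" stands in for the fifth parameter (PLARGE_INTEGER Timeout). */
static bool wait_is_legal(unsigned irql, const int64_t *timeout)
{
    if (irql <= APC_LEVEL)
        return true;   /* any timeout is fine, including NULL (wait forever) */
    if (irql == DISPATCH_LEVEL)
        return timeout != NULL && *timeout == 0;   /* must not block */
    return false;      /* never legal above DISPATCH_LEVEL */
}
```

For example, wait_is_legal(DISPATCH_LEVEL, NULL) is false, because a NULL timeout means "wait forever", which would block.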
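The stack dump from !analyze shows only the first three arguments of each frame, but since this frame's EBP is known (b4ff5894), the remaining arguments are easy to print: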
kd> dd b4ff5894
b4ff5894 b4ff58d0 80414f03 ffefad8c 00000000
b4ff58a4 00000000 00000000 b4ff58b8 00000000
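EBP+8 is the first parameter, so the fifth parameter sits at EBP+0x18 and is b4ff58b8. It is not NULL. It is a kernel stack address, i.e. a pointer to the timeout value, not a number in its own right, and it is the value it points to that would have to be 0 for the wait to be legal at DISPATCH_LEVEL.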
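The offset arithmetic can be checked with a few lines of C. This is a sketch only: stack_arg() is a hypothetical helper, and the array simply holds the eight DWORDs copied from the dd output above. It assumes a standard x86 EBP frame (saved EBP at EBP+0, return address at EBP+4, arguments from EBP+8).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_DWORDS 8

/* The eight DWORDs printed by "dd b4ff5894", in order. */
static const uint32_t frame[FRAME_DWORDS] = {
    0xb4ff58d0, 0x80414f03, 0xffefad8c, 0x00000000,
    0x00000000, 0x00000000, 0xb4ff58b8, 0x00000000,
};

/* Hypothetical helper: returns the n-th argument (1-based) of the frame.
 * frame[0] is the saved EBP and frame[1] the return address, so
 * argument n lives at EBP + 4 + 4*n, which is frame[n + 1]. */
static uint32_t stack_arg(const uint32_t *ebp_frame, size_t n)
{
    assert(n >= 1 && n + 1 < FRAME_DWORDS);   /* stay inside the dumped range */
    return ebp_frame[n + 1];
}
```

With this data, stack_arg(frame, 1) yields ffefad8c (the object being waited on) and stack_arg(frame, 5) yields b4ff58b8 (the Timeout pointer).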
For reference, the IRQL values as defined in the DDK headers:
#define PASSIVE_LEVEL 0 // Passive release level
#define LOW_LEVEL 0 // Lowest interrupt level
#define APC_LEVEL 1 // APC interrupt level
#define DISPATCH_LEVEL 2 // Dispatcher level
#define PROFILE_LEVEL 27 // timer used for profiling.
#define CLOCK1_LEVEL 28 // Interval clock 1 level - Not used on x86
#define CLOCK2_LEVEL 28 // Interval clock 2 level
#define IPI_LEVEL 29 // Interprocessor interrupt level
#define POWER_LEVEL 30 // Power failure level
#define HIGH_LEVEL 31 // Highest interrupt level
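Knowing the immediate cause is still a long way from solving the problem, so the analysis has to continue. Following the call stack downward to find out why the wait was issued, one can infer that a kernel object was being deleted: ObfDereferenceObject() dropped the object's reference count to 0, which triggered the object-deletion path (ObpRemoveObjectRoutine).
Further down, the stack shows in some detail how the IRP was issued, dispatched and completed. At this point one can tentatively conclude that the root cause is probably this: an IRP was sent to a file object without first raising the object's reference count through the appropriate API (one of the ObReferenceObjectByXxx routines). While that IRP was being processed, another thread dropped references on the same object and brought the count to 0. When the IRP then reached its completion stage, the object-deletion path was triggered again, hence the BSOD.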
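The suspected race can be illustrated with a toy reference-counting model. This is a user-mode sketch of the hypothesis only, not kernel code: toy_object, toy_reference(), toy_dereference() and alive_at_completion() are all made-up names, and the "other thread" is simulated by a plain function call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a kernel object; not a real Windows structure. */
typedef struct {
    long ref_count;
    bool deleted;
} toy_object;

static void toy_reference(toy_object *o)
{
    o->ref_count++;
}

/* Same idea as ObfDereferenceObject(): deletion runs when the count hits 0. */
static void toy_dereference(toy_object *o)
{
    assert(o->ref_count > 0);   /* guard against underflow */
    if (--o->ref_count == 0)
        o->deleted = true;
}

/* Simulates sending an IRP while another thread drops its reference.
 * Returns true if the object is still alive when completion starts. */
static bool alive_at_completion(bool take_reference_first)
{
    toy_object file = { 1, false };   /* one reference, held by the other thread */
    bool alive;

    if (take_reference_first)
        toy_reference(&file);         /* the sender pins the object */

    toy_dereference(&file);           /* the other thread lets go mid-request */
    alive = !file.deleted;

    if (take_reference_first)
        toy_dereference(&file);       /* the sender releases its reference after completion */
    return alive;
}
```

In this model alive_at_completion(false) returns false: the object is already gone when completion starts, which corresponds to the situation suspected in the dump. Taking the extra reference first keeps it alive until the sender is done.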