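Just finished dinner and took a look at the DUMP:
First, the parameters on the stack are indeed different, but that is not where the problem lies: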
b8a60aa8 8084c103 0000001a 00003452 60839101 nt!KeBugCheckEx+0x1b
b8a60c64 8084c24d 88cc8d88 88cc8d88 00000000 nt!MiDeleteAddressesInWorkingSet+0x155
b8a60c84 8094b539 88cc8fb0 88cf12f8 88cf1538 nt!MmCleanProcessAddressSpace+0x111
b8a60d0c 8094b68d 00000003 00000000 88cc8d88 nt!PspExitThread+0x5f1
b8a60d24 8094b887 88cf12f8 00000003 00000001 nt!PspTerminateThreadByPointer+0x4b
b8a60d54 80888c6c 00000000 00000003 0012ff14 nt!NtTerminateProcess+0x125
b8a60d54 7c95ed54 00000000 00000003 0012ff14 nt!KiFastCallEntry+0xfc
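My guess is that the compiler-generated code, through "optimization", clobbered the PEPROCESS parameter.
The actual cause of the problem is a failed PTE validity check. Part of the MiDeleteAddressesInWorkingSet source is shown below: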
VOID MiDeleteAddressesInWorkingSet ( IN PEPROCESS Process )
/*++
Routine Description:
This routine deletes all user mode addresses from the working set list.
Arguments:
Process - Supplies a pointer to the current process.
Return Value:
None.
Environment:
Kernel mode, Working Set Lock held.
--*/
{
    PVOID Va;
    PMMPTE PointerPte;
    ......
    ......
    Va = Wsle->u1.VirtualAddress;

    //
    // Ensure the WSLE and the PTE are both valid so that any
    // conflict between them can be resolved before both are deleted.
    //

    PointerPte = MiGetPteAddress (Va);

    if (PointerPte->u.Hard.Valid == 0) {
        KeBugCheckEx (MEMORY_MANAGEMENT,
                      0x3452,
                      (ULONG_PTR) Va,
                      (ULONG_PTR) Wsle,
                      (ULONG_PTR) PointerPte->u.Long);
    }
    ......
}
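As a side note, the address arithmetic behind MiGetPteAddress can be sketched in user-mode C. This is only a sketch under assumptions the post does not state: a non-PAE x86 layout (4 KB pages, 4-byte PTEs, page tables mapped starting at 0xC0000000), which matches the 32-bit MMPTE discussed here. A PAE kernel uses 8-byte entries and different arithmetic. The names PTE_BASE_X86 and pte_address_for_va are made up for illustration and are not kernel identifiers.

```c
#include <stdint.h>

/* Assumption: non-PAE x86, page tables mapped starting at 0xC0000000. */
#define PTE_BASE_X86   0xC0000000u
#define PAGE_SHIFT_X86 12u  /* 4 KB pages */
#define PTE_SIZE_SHIFT 2u   /* 4-byte PTEs */

/* Hypothetical helper: virtual address of the PTE that maps `va`.
   The low 12 bits of `va` (page offset or flag bits) are discarded. */
uint32_t pte_address_for_va(uint32_t va)
{
    return PTE_BASE_X86 + ((va >> PAGE_SHIFT_X86) << PTE_SIZE_SHIFT);
}
```

Under these assumptions, pte_address_for_va(0x60839101) evaluates to 0xC01820E4; on a PAE system the result would differ.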
This corresponds to the original poster's blue screen (nt!MiDeleteAddressesInWorkingSet+0x155):
0: kd> nt!MiDeleteAddressesInWorkingSet+0x13f:
8084c0ed c9              leave
8084c0ee c20400          ret     4
8084c0f1 ff36            push    dword ptr [esi]
8084c0f3 ff75fc          push    dword ptr [ebp-4]
8084c0f6 51              push    ecx
8084c0f7 6852340000      push    3452h
8084c0fc 6a1a            push    1Ah
8084c0fe e833b3fdff      call    nt!KeBugCheckEx (80827436)
0: kd> nt!MiDeleteAddressesInWorkingSet+0x155:
8084c103 cc              int     3
8084c104 41              inc     ecx
8084c105 20647269        and     byte ptr [edx+esi*2+69h],ah
8084c109 7665            jbe     nt!MmCleanProcessAddressSpace+0x34 (8084c170)
8084c10b 7220            jb      nt!MiDeleteAddressesInWorkingSet+0x17f (8084c12d)
8084c10d 686173206c      push    6C207361h
8084c112 6561            popad
8084c114 6b656420        imul    esp,dword ptr [ebp+64h],20h
The last time I read the Bug Check Code Reference, I was curious about the parameters of Bug Check 0x1A: MEMORY_MANAGEMENT. Microsoft tells us: "Parameter 1 is the only parameter of interest; this identifies the exact violation." So what are the remaining parameters for?
MEMORY_MANAGEMENT (1a)
    # Any other values for parameter 1 must be individually examined.
Arguments:
Arg1: 00003452, The subtype of the bugcheck.
Arg2: 60839101
Arg3: c0881fec
Arg4: 00000000
Now I know: parameter 4 is (ULONG_PTR) PointerPte->u.Long. Nice!
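The mapping between the four bugcheck arguments and the KeBugCheckEx call in the source can be written down as a small sketch. The struct and function names are hypothetical; only the order of the arguments comes from the quoted source, and 32-bit values are assumed.

```c
#include <stdint.h>

/* Illustrative view of the four parameters for subtype 0x3452, named
   after the arguments of the KeBugCheckEx call in the quoted source. */
typedef struct BugCheck3452 {
    uint32_t subtype;       /* Arg1: 0x3452 */
    uint32_t va;            /* Arg2: (ULONG_PTR) Va */
    uint32_t wsle;          /* Arg3: (ULONG_PTR) Wsle */
    uint32_t pte_contents;  /* Arg4: (ULONG_PTR) PointerPte->u.Long */
} BugCheck3452;

/* Returns 1 and fills `out` when args[0] is 0x3452, otherwise returns 0. */
int decode_bugcheck_3452(const uint32_t args[4], BugCheck3452 *out)
{
    if (args[0] != 0x3452u) {
        return 0;  /* other subtypes carry different parameters */
    }
    out->subtype      = args[0];
    out->va           = args[1];
    out->wsle         = args[2];
    out->pte_contents = args[3];
    return 1;
}
```

Fed with the arguments from this dump (00003452, 60839101, c0881fec, 00000000), the sketch reports Va = 60839101, Wsle = c0881fec and PTE contents = 00000000.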
Comparing against \base\ntos\mm\i386\mi386.h shows how this union relates to if (PointerPte->u.Hard.Valid == 0):
typedef struct _MMPTE {
    union {
        ULONG Long;
        HARDWARE_PTE Flush;
        MMPTE_HARDWARE Hard;
        MMPTE_PROTOTYPE Proto;
        MMPTE_SOFTWARE Soft;
        MMPTE_TRANSITION Trans;
        MMPTE_SUBSECTION Subsect;
        MMPTE_LIST List;
    } u;
} MMPTE;
The bits of this 32-bit PTE are defined as follows:
//
// A Page Table Entry on the x86 has the following definition.
// Note the MP version is to avoid stalls when flushing TBs across processors.
//

typedef struct _MMPTE_HARDWARE {
    ULONG Valid : 1;
#if defined(NT_UP)
    ULONG Write : 1;        // UP version
#else
    ULONG Writable : 1;     // changed for MP version
#endif
    ULONG Owner : 1;
    ULONG WriteThrough : 1;
    ULONG CacheDisable : 1;
    ULONG Accessed : 1;
    ULONG Dirty : 1;
    ULONG LargePage : 1;
    ULONG Global : 1;
    ULONG CopyOnWrite : 1;  // software field
    ULONG Prototype : 1;    // software field
#if defined(NT_UP)
    ULONG reserved : 1;     // software field
#else
    ULONG Write : 1;        // software field - MP change
#endif
    ULONG PageFrameNumber : 20;
} MMPTE_HARDWARE, *PMMPTE_HARDWARE;
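Now it is clear: the kernel raised this blue screen deliberately when the check if (PointerPte->u.Hard.Valid == 0) fired. At that point the PTE contents (PointerPte->u.Long) were 00000000. PointerPte->u.Hard.Valid refers to the PTE's Present bit (bit 0, the P flag), so this is an invalid PTE. Under normal circumstances that would not be the case.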
OK. So who corrupted the page attributes? Also, could an Oracle database affect things such as the page size? (Arg1: 00003452, The subtype of the bugcheck. Arg2: 004b2221 Arg3: c0884bfc Arg4: 00000080 - Page size (PS) flag, bit 7: when the flag is set, the page size is 4 MBytes for normal 32-bit addressing (and 2 MBytes if extended physical addressing is enabled) and the page-directory entry points to a page.)
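To compare the two Arg4 values quoted in this thread (00000000 and 00000080), here is a minimal sketch that tests individual bits of a 32-bit PTE value with masks. The bit positions follow the MMPTE_HARDWARE layout above (Valid at bit 0, LargePage at bit 7, PageFrameNumber in the top 20 bits). The macro and function names are illustrative only, and explicit masks are used instead of C bit-fields so the result does not depend on the compiler's bit-field ordering.

```c
#include <stdint.h>

/* Bit positions taken from the MMPTE_HARDWARE layout quoted in the post. */
#define PTE_VALID_MASK      (1u << 0)  /* Valid (Present, P flag) */
#define PTE_LARGE_PAGE_MASK (1u << 7)  /* LargePage (bit 7) */
#define PTE_PFN_SHIFT       12u        /* PageFrameNumber: upper 20 bits */

/* The same test the kernel performs before it bugchecks with 0x3452. */
int pte_is_valid(uint32_t pte)
{
    return (pte & PTE_VALID_MASK) != 0;
}

/* Reports whether bit 7 is set, regardless of the Valid bit. */
int pte_bit7_set(uint32_t pte)
{
    return (pte & PTE_LARGE_PAGE_MASK) != 0;
}

/* Extracts the 20-bit page frame number. */
uint32_t pte_pfn(uint32_t pte)
{
    return pte >> PTE_PFN_SHIFT;
}
```

With these helpers, pte_is_valid returns 0 for both 0x00000000 and 0x00000080, so either value trips the check. pte_bit7_set(0x00000080) returns 1, but when bit 0 is clear the processor does not interpret the rest of the entry, so bit 7 here may not carry its hardware page-size meaning.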