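Ubuntu Kernel Debugging Essentials (Part 3): Common Problems
The previous two installments covered installing debug symbols and source files, and then setting up the target machine and starting a debug session. This installment covers two common problems you may run into: an architecture mismatch and displaced kernel addresses.
Architecture mismatch
If the target machine is 32-bit, there is usually no problem. If it is 64-bit, GDB on the host may wrongly assume the other side is 32-bit. The symptom is that function symbols cannot be found (a 64-bit address is mistaken for a 32-bit one, so only half of it is used), and the registers all look wrong as well, for example: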
(gdb) info registers
eax 0x1b 27
ecx 0x0 0
edx 0x67 103
ebx 0x0 0
esp 0x6 0x6 <irq_stack_union+6>
ebp 0x0 0x0 <irq_stack_union>
esi 0x0 0
edi 0x0 0
eip 0x246 0x246 <irq_stack_union+582>
eflags 0x0 [ ]
cs 0x3f40dc80 1061215360
ss 0xffff96dd -26915
ds 0x1e5be38 31833656
es 0xffffaf96 -20586
fs 0x1e5be38 31833656
gs 0xffffaf96 -20586
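Look closely at the register contents above. Readers familiar with the x86 architecture will quickly spot several anomalies. The segment registers are all 16 bits wide, so their values should be small, yet the ones shown are very large. The instruction pointer (eip), on the other hand, is far too small. Both happen because GDB is forcing a 64-bit register context into a 32-bit layout.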
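To make the mis-parse concrete, here is a minimal Python sketch. It assumes that the remote stub sends the general registers as one little-endian block in GDB's amd64 order (rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, ...), and that a GDB running in i386 mode slices the same block into 4-byte registers in i386 order. The 64-bit values in ASSUMED_REGS64 do not come from the original session; they are inferred by pairing up the 32-bit values in the dump above. The helper names are invented for illustration.

```python
import struct

# Leading part of GDB's raw register order for each architecture.
AMD64_ORDER = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"]
I386_ORDER = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
              "eip", "eflags", "cs", "ss", "ds", "es", "fs", "gs"]

# ASSUMPTION: 64-bit values reconstructed from the 32-bit dump,
# not captured from the real target.
ASSUMED_REGS64 = {
    "rax": 0x1B,
    "rbx": 0x67,
    "rcx": 0x6,
    "rdx": 0x0,
    "rsi": 0x246,
    "rdi": 0xFFFF96DD3F40DC80,
    "rbp": 0xFFFFAF9601E5BE38,
    "rsp": 0xFFFFAF9601E5BE38,
}


def pack_amd64(regs):
    """Serialize the 64-bit registers as one little-endian block."""
    return b"".join(struct.pack("<Q", regs[name]) for name in AMD64_ORDER)


def misread_as_i386(blob):
    """Slice the first 64 bytes into sixteen 32-bit registers, i386 order."""
    values = struct.unpack("<16I", blob[:64])
    return dict(zip(I386_ORDER, values))


if __name__ == "__main__":
    for name, value in misread_as_i386(pack_amd64(ASSUMED_REGS64)).items():
        print(f"{name:<8}{value:#x}")
```

Under these assumptions the sketch reproduces the dump: rsi = 0x246 ends up in eip, and the low and high halves of rdi end up in cs and ss, which explains the oversized segment values.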
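Some may ask: is GDB really that naive? It really is. To find out what happens inside GDB, Lao Lei started a second GDB and used it to debug the first one. A screenshot follows:
In the screenshot above, the left side is the GDB doing the kernel debugging, and the right side is the GDB debugging the GDB on the left.
When we run the target remote command, GDB initializes a gdbarch instance. At that point the connection to the target has not yet been established, so GDB does not know whether the other side is 32-bit or 64-bit. As a result, GDB always initializes a 32-bit gdbarch here. The call stack is as follows: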
#0 i386_gdbarch_init (info=..., arches=0x1240c00) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/i386-tdep.c:8255
#1 0x00000000005e218f in gdbarch_find_by_info (info=...) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/gdbarch.c:5100
#2 0x00000000005e2b71 in gdbarch_update_p (info=...) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/arch-utils.c:522
#3 0x00000000006bb2c3 in target_clear_description () at /build/gdb-9un5Xp/gdb-7.11.1/gdb/target-descriptions.c:400
#4 0x0000000000600637 in target_pre_inferior (from_tty=from_tty@entry=1) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/target.c:2161
#5 0x0000000000601aad in target_preopen (from_tty=from_tty@entry=1) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/target.c:2220
#6 0x00000000004c6b73 in remote_open_1 (name=0x1082e7e "/dev/ttyS0", from_tty=1, target=0xc44040 <remote_ops>, extended_p=0)
at /build/gdb-9un5Xp/gdb-7.11.1/gdb/remote.c:4886
#7 0x00000000005f4538 in open_target (args=0x1082e7e "/dev/ttyS0", from_tty=1, command=<optimized out>)
at /build/gdb-9un5Xp/gdb-7.11.1/gdb/target.c:356
#8 0x000000000069dbc6 in execute_command (p=<optimized out>, p@entry=0x1082e70 "", from_tty=1) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/top.c:475
#9 0x00000000005d490c in command_handler (command=0x1082e70 "") at /build/gdb-9un5Xp/gdb-7.11.1/gdb/event-top.c:491
#10 0x00000000005d4fef in command_line_handler (rl=<optimized out>) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/event-top.c:690
#11 0x00007f9a096586f5 in rl_callback_read_char () from /lib/x86_64-linux-gnu/libreadline.so.6
#12 0x00000000005d4969 in rl_callback_read_char_wrapper (client_data=<optimized out>) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/event-top.c:171
#13 0x00000000005d49b3 in stdin_event_handler (error=<optimized out>, client_data=0x0) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/event-top.c:430
#14 0x00000000005d3795 in gdb_wait_for_event (block=block@entry=1) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/event-loop.c:834
#15 0x00000000005d3939 in gdb_do_one_event () at /build/gdb-9un5Xp/gdb-7.11.1/gdb/event-loop.c:323
#16 0x00000000005d3a7e in start_event_loop () at /build/gdb-9un5Xp/gdb-7.11.1/gdb/event-loop.c:347
#17 0x00000000005cd443 in captured_command_loop (data=data@entry=0x0) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/main.c:318
#18 0x00000000005ca25d in catch_errors (func=func@entry=0x5cd430 <captured_command_loop>, func_args=func_args@entry=0x0,
errstring=errstring@entry=0x7abfeb "", mask=mask@entry=RETURN_MASK_ALL) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/exceptions.c:240
#19 0x00000000005ce036 in captured_main (data=data@entry=0x7ffec93a15d0) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/main.c:1157
#20 0x00000000005ca25d in catch_errors (func=func@entry=0x5cd990 <captured_main>, func_args=func_args@entry=0x7ffec93a15d0,
errstring=errstring@entry=0x7abfeb "", mask=mask@entry=RETURN_MASK_ALL) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/exceptions.c:240
#21 0x00000000005ce90b in gdb_main (args=args@entry=0x7ffec93a15d0) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/main.c:1165
#22 0x000000000045ecd5 in main (argc=<optimized out>, argv=<optimized out>) at /build/gdb-9un5Xp/gdb-7.11.1/gdb/gdb.c:32
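Although GDB's code base is not small, its design logic is quite clear. A global variable, current_inferior_, records the object currently being debugged. In GDB, debuggees are collectively called inferiors, meaning subordinates or juniors. Lao Lei renders the term in Chinese as “下程” (short for "subordinate program").
The current_inferior_ structure has a gdbarch member, which records the architecture of the current debug target (32-bit, 64-bit, and so on):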
(gdb) p *current_inferior_
$20 = {next = 0x0, num = 1, pid = 42000, fake_pid_p = 1,
highest_thread_num = 384, control = {stop_soon = STOP_QUIETLY_REMOTE},
removable = 0, aspace = 0x22d3fc0, pspace = 0x11992b0, args = 0x0,
argc = 0, argv = 0x0, terminal = 0x0, environment = 0x11c9780,
attach_flag = 0, vfork_parent = 0x0, vfork_child = 0x0,
pending_detach = 0, waiting_for_vfork_done = 0, detaching = 0,
continuations = 0x0, needs_setup = 0, priv = 0x0, has_exit_code = 0,
exit_code = 0, symfile_flags = 0, tdesc_info = 0x11f2e30,
gdbarch = 0x1242b80, registry_data = {data = 0x123d160, num_data = 7}}
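Inspecting gdbarch shows that the function pointers inside it all point to 32-bit versions (names beginning with i386).
You can use the watch command to set a hardware watchpoint on the gdbarch field and monitor changes to it, but when GDB connects to the target, the watchpoint is not hit. At least in the GDB that Lao Lei debugged (7.11.1, according to the paths in the backtrace), GDB does not automatically detect the target's architecture and switch to it.
What can be done? The answer is to switch manually with the following command: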
(gdb) set architecture i386:x86-64
The target architecture is assumed to be i386:x86-64
After this switch, inspecting gdbarch again shows the 64-bit version:
(gdb) p *gdbarch
$42 = {initialized_p = 1, obstack = 0x12e40b0, bfd_arch_info = 0x952dc0 <bfd_x86_64_arch>, byte_order = BFD_ENDIAN_LITTLE,
byte_order_for_code = BFD_ENDIAN_LITTLE, osabi = GDB_OSABI_LINUX, target_desc = 0x0, tdep = 0x12e3f10, dump_tdep = 0x0, nr_data = 26,
data = 0x12f9990, bits_big_endian = 0, short_bit = 16, int_bit = 32, long_bit = 64, long_long_bit = 64, long_long_align_bit = 32, half_bit = 16,
half_format = 0xc37970 <floatformats_ieee_half>, float_bit = 32, float_format = 0xc37960 <floatformats_ieee_single>, double_bit = 64,
double_format = 0xc37950 <floatformats_ieee_double>, long_double_bit = 128, long_double_format = 0xc37930 <floatformats_i387_ext>, ptr_bit = 64,
addr_bit = 64, dwarf2_addr_size = 8, char_signed = 1, read_pc = 0x0, write_pc = 0x46bf90 <amd64_linux_write_pc>,
virtual_frame_pointer = 0x5e29f0 <legacy_virtual_frame_pointer>, pseudo_register_read = 0x0,
pseudo_register_read_value = 0x45fc30 <amd64_pseudo_register_read_value>, pseudo_register_write = 0x45fb20 <amd64_pseudo_register_write>,
num_regs = 152, num_pseudo_regs = 52, ax_pseudo_register_collect = 0x0, ax_pseudo_register_push_stack = 0x0, sp_regnum = 7, pc_regnum = 16,
ps_regnum = 17, fp0_regnum = 24, stab_reg_to_regnum = 0x45f9c0 <amd64_dwarf_reg_to_regnum>, ecoff_reg_to_regnum = 0x5e2990 <no_op_reg_to_regnum>,
sdb_reg_to_regnum = 0x47ef30 <i386_dbx_reg_to_regnum>, dwarf2_reg_to_regnum = 0x45f9c0 <amd64_dwarf_reg_to_regnum>,
register_name = 0x6bb050 <tdesc_register_name>, register_type = 0x6bc1a0 <tdesc_register_type>, dummy_id = 0x45f950 <amd64_dummy_id>,
deprecated_fp_regnum = -1, push_dummy_call = 0x467330 <amd64_push_dummy_call>, call_dummy_location = 1,
push_dummy_code = 0x477b80 <i386_push_dummy_code>, print_registers_info = 0x5aef10 <default_print_registers_info>,
print_float_info = 0x488170 <i387_print_float_info>, print_vector_info = 0x0, register_sim_regno = 0x5e2890 <legacy_register_sim_regno>,
cannot_fetch_register = 0x5e29e0 <cannot_register_not>, cannot_store_register = 0x5e29e0 <cannot_register_not>,
get_longjmp_target = 0x45f400 <amd64_get_longjmp_target>, believe_pcc_promotion = 0, convert_register_p = 0x488c50 <i387_convert_register_p>,
register_to_value = 0x488c80 <i387_register_to_value>, value_to_register = 0x488dc0 <i387_value_to_register>,
value_from_register = 0x561a10 <default_value_from_register>, pointer_to_address = 0x561790 <unsigned_pointer_to_address>,
address_to_pointer = 0x5617f0 <unsigned_address_to_pointer>, integer_to_address = 0x0, return_value = 0x465e20 <amd64_return_value>,
return_in_first_hidden_param_p = 0x5e3880 <default_return_in_first_hidden_param_p>, skip_prologue = 0x4674a0 <amd64_skip_prologue>,
skip_main_prologue = 0x0, skip_entrypoint = 0x0, inner_than = 0x5e2950 <core_addr_lessthan>,
breakpoint_from_pc = 0x477b10 <i386_breakpoint_from_pc>, remote_breakpoint_from_pc = 0x5e3850 <default_remote_breakpoint_from_pc>,
adjust_breakpoint_address = 0x0, memory_insert_breakpoint = 0x5f1ba0 <default_memory_insert_breakpoint>,
memory_remove_breakpoint = 0x5f1c70 <default_memory_remove_breakpoint>, decr_pc_after_break = 1, deprecated_function_start_offset = 0,
remote_register_number = 0x6bb020 <tdesc_remote_register_number>, fetch_tls_load_module_address = 0x493be0 <svr4_fetch_objfile_link_map>,
frame_args_skip = 8, unwind_pc = 0x47a250 <i386_unwind_pc>, unwind_sp = 0x0, frame_num_args = 0x0, frame_align = 0x45efd0 <amd64_frame_align>,
stabs_argument_has_addr = 0x5e2ac0 <default_stabs_argument_has_addr>, frame_red_zone_size = 128,
convert_from_func_ptr_addr = 0x5e2980 <convert_from_func_ptr_addr_identity>, addr_bits_remove = 0x5e2970 <core_addr_identity>,
software_single_step = 0x0, single_step_through_delay = 0x0, print_insn = 0x47e170 <i386_print_insn>,
skip_trampoline_code = 0x613c30 <find_solib_trampoline_target>, skip_solib_resolver = 0x490380 <glibc_skip_solib_resolver>,
in_solib_return_trampoline = 0x5e2930 <generic_in_solib_return_trampoline>, stack_frame_destroyed_p = 0x5e2940 <generic_stack_frame_destroyed_p>,
elf_make_msymbol_special = 0x0, coff_make_msymbol_special = 0x5e29a0 <default_coff_make_msymbol_special>,
make_symbol_special = 0x5e29b0 <default_make_symbol_special>, adjust_dwarf2_addr = 0x5e29c0 <default_adjust_dwarf2_addr>,
adjust_dwarf2_line = 0x5e29d0 <default_adjust_dwarf2_line>, cannot_step_breakpoint = 0, have_nonsteppable_watchpoint = 0,
address_class_type_flags = 0x0, address_class_type_flags_to_name = 0x0, address_class_name_to_type_flags = 0x0,
register_reggroup_p = 0x474950 <amd64_linux_register_reggroup_p>, fetch_pointer_argument = 0x4796f0 <i386_fetch_pointer_argument>,
iterate_over_regset_sections = 0x46b610 <amd64_linux_iterate_over_regset_sections>, make_corefile_notes = 0x496c60 <linux_make_corefile_notes>,
elfcore_write_linux_prpsinfo = 0x0, find_memory_regions = 0x4976a0 <linux_find_memory_regions>, core_xfer_shared_libraries = 0x0,
core_xfer_shared_libraries_aix = 0x0, core_pid_to_str = 0x495e30 <linux_core_pid_to_str>, core_thread_name = 0x0, gcore_bfd_target = 0x0,
vtable_function_descriptors = 0, vbit_in_delta = 0, skip_permanent_breakpoint = 0x5e38c0 <default_skip_permanent_breakpoint>, max_insn_length = 16,
displaced_step_copy_insn = 0x467770 <amd64_displaced_step_copy_insn>,
displaced_step_hw_singlestep = 0x5e2810 <default_displaced_step_hw_singlestep>, displaced_step_fixup = 0x467b10 <amd64_displaced_step_fixup>,
displaced_step_free_closure = 0x5e2800 <simple_displaced_step_free_closure>, displaced_step_location = 0x4981d0 <linux_displaced_step_location>,
relocate_instruction = 0x45f0f0 <amd64_relocate_instruction>, overlay_update = 0x0,
core_read_description = 0x4748a0 <amd64_linux_core_read_description>, static_transform_name = 0x0, sofun_address_maybe_missing = 0,
process_record = 0x480860 <i386_process_record>, process_record_signal = 0x474800 <amd64_linux_record_signal>,
gdb_signal_from_target = 0x494280 <linux_gdb_signal_from_target>, gdb_signal_to_target = 0x4944c0 <linux_gdb_signal_to_target>,
get_siginfo_type = 0x48b2d0 <x86_linux_get_siginfo_type>, record_special_symbol = 0x0,
get_syscall_number = 0x46bf10 <amd64_linux_get_syscall_number>, xml_syscall_file = 0x79fce7 "syscalls/amd64-linux.xml", syscalls_info = 0x0,
stap_integer_prefixes = 0x79f3b0 <stap_integer_prefixes>, stap_integer_suffixes = 0x0, stap_register_prefixes = 0x79f3a0 <stap_register_prefixes>,
stap_register_suffixes = 0x0, stap_register_indirection_prefixes = 0x79f390 <stap_register_indirection_prefixes>,
stap_register_indirection_suffixes = 0x79f380 <stap_register_indirection_suffixes>, stap_gdb_register_prefix = 0x0, stap_gdb_register_suffix = 0x0,
stap_is_single_operand = 0x478270 <i386_stap_is_single_operand>, stap_parse_special_token = 0x478b80 <i386_stap_parse_special_token>,
dtrace_parse_probe_argument = 0x474990 <amd64_dtrace_parse_probe_argument>, dtrace_probe_is_enabled = 0x46c8e0 <amd64_dtrace_probe_is_enabled>,
dtrace_enable_probe = 0x46c8c0 <amd64_dtrace_enable_probe>, dtrace_disable_probe = 0x46c8a0 <amd64_dtrace_disable_probe>, has_global_solist = 0,
has_global_breakpoints = 0, has_shared_address_space = 0x4981c0 <linux_has_shared_address_space>,
fast_tracepoint_valid_at = 0x4795d0 <i386_fast_tracepoint_valid_at>, auto_charset = 0x566ae0 <default_auto_charset>,
auto_wide_charset = 0x566af0 <default_auto_wide_charset>, solib_symbols_extension = 0x0, has_dos_based_file_system = 0,
gen_return_address = 0x45f090 <amd64_gen_return_address>, info_proc = 0x494f70 <linux_info_proc>, core_info_proc = 0x4976f0 <linux_core_info_proc>,
iterate_over_objfiles_in_search_order = 0x60fa60 <default_iterate_over_objfiles_in_search_order>, ravenscar_ops = 0x0,
insn_is_call = 0x45f080 <amd64_insn_is_call>, insn_is_ret = 0x45f070 <amd64_insn_is_ret>, insn_is_jump = 0x45f060 <amd64_insn_is_jump>,
auxv_parse = 0x0, vsyscall_range = 0x494990 <linux_vsyscall_range>, infcall_mmap = 0x494800 <linux_infcall_mmap>,
infcall_munmap = 0x494710 <linux_infcall_munmap>, gcc_target_options = 0x5e3970 <default_gcc_target_options>,
gnu_triplet_regexp = 0x477ba0 <i386_gnu_triplet_regexp>, addressable_memory_unit_size = 0x5e39d0 <default_addressable_memory_unit_size>}
Now the registers look right when inspected again, for example:
(gdb) i r
rax 0x2f 47
rbx 0xffffffff81efadc0 -2114998848
rcx 0xffffffff81e5f568 -2115635864
rdx 0x0 0
rsi 0x246 582
rdi 0x246 582
rbp 0xffffc90000643da8 0xffffc90000643da8
rsp 0xffffc90000643da8 0xffffc90000643da8
r8 0xb57e8 743400
r9 0x207 519
r10 0xd2a81d91 3534232977
r11 0xffffffff822487ed -2111535123
r12 0xffff880111a2be40 -131936804487616
r13 0x0 0
r14 0x55a9bbc50cf0 94187488087280
r15 0x55a9bbc503c0 94187488084928
rip 0xffffffff81141344 0xffffffff81141344 <kgdb_breakpoint+20>
eflags 0x202 [ IF ]
cs 0x10 16
ss 0x0 0
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0xb 11
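Displaced kernel addresses
The second common problem is that GDB cannot find function names. In short, GDB assumes that the kernel's compile-time addresses and run-time addresses are the same, so it uses the addresses found in the symbol file directly to access the target machine. In fact, recent kernels (3.14+) may have KASLR (kernel address space layout randomization) enabled, which relocates the kernel itself.
A small experiment helps to understand KASLR. First look at the address of the vfs_read function in /proc/kallsyms (the actual run-time address), then look at the compile-time address in the System.map file under /boot. The two are clearly different.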
gedu@gedu-VirtualBox:~$ sudo cat /proc/kallsyms | grep " vfs_read"
ffffffff86c31960 T vfs_readf
ffffffff86c32ce0 T vfs_read
ffffffff86c33210 T vfs_readv
gedu@gedu-VirtualBox:~$ sudo cat /boot/System.map-4.8.0-36-generic | grep " vfs_read"
ffffffff81231960 T vfs_readf
ffffffff81232ce0 T vfs_read
ffffffff81233210 T vfs_readv
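The difference between the two listings can be worked out directly. The following Python sketch embeds the six lines shown above as strings and subtracts the System.map address from the /proc/kallsyms address for each symbol. The function names are illustrative, and the claim that the offset is the same for all kernel text symbols is only checked here against these three symbols.

```python
# Run-time addresses, copied from the /proc/kallsyms output above.
KALLSYMS = """\
ffffffff86c31960 T vfs_readf
ffffffff86c32ce0 T vfs_read
ffffffff86c33210 T vfs_readv
"""

# Compile-time addresses, copied from the System.map output above.
SYSTEM_MAP = """\
ffffffff81231960 T vfs_readf
ffffffff81232ce0 T vfs_read
ffffffff81233210 T vfs_readv
"""


def parse_symbols(text):
    """Parse 'address type name' lines into a {name: address} dict."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            table[parts[2]] = int(parts[0], 16)
    return table


def kaslr_offsets(runtime, linked):
    """Return run-time minus compile-time address for each shared symbol."""
    return {name: runtime[name] - linked[name]
            for name in runtime if name in linked}


if __name__ == "__main__":
    offsets = kaslr_offsets(parse_symbols(KALLSYMS), parse_symbols(SYSTEM_MAP))
    for name, offset in offsets.items():
        print(f"{name}: {offset:#x}")
```

For this boot all three symbols are shifted by the same amount, 0x5a00000 (90 MiB). The value is chosen at boot time, so a different boot of the same kernel would normally show a different offset.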
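KASLR is a security measure, a game of hide-and-seek with attackers. But all this dodging leaves GDB confused as well.
So how do we tell GDB that the kernel has moved? The most common approach today is simply to disable KASLR: add nokaslr to the kernel command line and reboot, and the problem goes away.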
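After rebooting, it is worth confirming that the parameter actually took effect. On a running Linux system the kernel command line can be read from /proc/cmdline. The sketch below is a minimal check written in Python; the function name and both sample command lines (including the root= device) are hypothetical. Note that the absence of nokaslr does not by itself prove that KASLR is active, since that also depends on how the kernel was configured.

```python
def kaslr_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains the nokaslr token."""
    return "nokaslr" in cmdline.split()


# Hypothetical command lines, for illustration only.
SAMPLE_WITH = ("BOOT_IMAGE=/boot/vmlinuz-4.8.0-36-generic "
               "root=/dev/sda1 ro quiet splash nokaslr")
SAMPLE_WITHOUT = ("BOOT_IMAGE=/boot/vmlinuz-4.8.0-36-generic "
                  "root=/dev/sda1 ro quiet splash")

if __name__ == "__main__":
    print(kaslr_disabled(SAMPLE_WITH))
    print(kaslr_disabled(SAMPLE_WITHOUT))
```

On the target machine the same function can be applied to the contents of /proc/cmdline. On Ubuntu the parameter is usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by running update-grub, though the exact steps depend on the boot loader setup.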
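Having read this far, you may feel that there are a lot of traps. Indeed, there are always new problems and new challenges. GNU leader RMS wrote on the cover of the GDB manual: “Don't worry if it doesn't work right. If everything did, you'd be out of a job.” Actually, I do not entirely agree with this saying, because it is not positive enough. But RMS was only joking. Look through his photo albums and you will see that wherever he goes, he pulls out his laptop and fixes bugs on the spot. :-)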