From d52691ca338b9d03598f2346efe1dc5440a7e9c3 Mon Sep 17 00:00:00 2001 From: Rodrigo Arias Mallo Date: Fri, 5 Jul 2024 16:53:07 +0200 Subject: [PATCH] Add journal to the repository --- JOURNAL.md | 649 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 649 insertions(+) create mode 100644 JOURNAL.md diff --git a/JOURNAL.md b/JOURNAL.md new file mode 100644 index 0000000..e563135 --- /dev/null +++ b/JOURNAL.md @@ -0,0 +1,649 @@ +## 2024-07-02 + +Aleix provided some notes on how to trace the kernel using tracepoints that are +printed to the console: + +``` +BOOT TIME TRACING + - compile with CONFIG_BOOTTIME_TRACING=y + - add to kernel parameters: + trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M ftrace=function ftrace_filter="vfs*" + trace_options=sym-addr trace_event=sched:* tp_printk trace_buf_size=1M + - tp_printk sends tracepoint (TRACE_EVENT) to printk buffer + - trace_event=[subsytem:event]; accepts + - *: + - :* + - *:* all + - many more options at https://www.kernel.org/doc/html/latest/trace/boottime-trace.html +``` + +Start with: + +``` +trace_options=sym-addr trace_event=sched:* tp_printk trace_buf_size=1M loglevel=7 +``` + +Can it be caused by the D extension? If I set the ISA to: + +``` +riscv,isa = "rv64imaf"; +``` + +It locks the stage1 script without providing any output. Let see trying again. +Locked again. + +I can disable the FPU in the kernel, and then it will act as a detector of any +floating point instruction. + +Then I just need to rebuild busybox without support for double instructions. + +Let see if I can build busybox with double instruction + +Nope, the u-boot is reporting the d extension is in the isa: + +> riscv,isa = "rv64imafd"; + +## 2024-07-03 + +I cannot switch to `gcc.arch = rv64ima` because rust fails to build. + +**Assumption**: The extensions F and D work well and don't cause a hang in the +CPU. + +Let's go back and try to get the initrd shell, so we can systematically hang it +in the `switch_root` + +**Observation**: The riscv-timer seems to be causing interrupts with IRQ 5: + +``` +[ 62.439060] irq_handler_entry: irq=5 name=riscv-timer +[ 62.444980] irq_handler_exit: irq=5 ret=handled +``` + +**Observation**: Rohan reports the serial startup routine being running *after*{{{ +the init begins. + +**Observation**: Only interrupts in timer, others are zero. + +With: + + commit 4c656bd8ddd2e41ccaa976ff8d6bd9209175a632 (HEAD -> lagarto-ox) + Author: Rodrigo Arias Mallo + Date: Wed Jul 3 13:21:04 2024 +0200 + + Add busybox patch to see debug lines + + The switch_root command seems to hang in the execv() syscall. + +I can see this: + + ~ # cat /proc/interrupts ; sleep 10 ; cat /proc/interrupts + CPU0 + 10: 42926 RISC-V INTC 5 Edge riscv-timer + IPI0: 0 Rescheduling interrupts + IPI1: 0 Function call interrupts + IPI2: 0 CPU stop interrupts + IPI3: 0 CPU stop (for crash dump) interrupts + IPI4: 0 IRQ work interrupts + IPI5: 0 Timer broadcast interrupts + CPU0 + 10: 46023 RISC-V INTC 5 Edge riscv-timer + IPI0: 0 Rescheduling interrupts + IPI1: 0 Function call interrupts + IPI2: 0 CPU stop interrupts + IPI3: 0 CPU stop (for crash dump) interrupts + IPI4: 0 IRQ work interrupts + IPI5: 0 Timer broadcast interrupts + +**Observation**: There is a timer configured in 0x40170000 but in the device +tree we only have one at `timer@40002000`. + + #define OX_ALVEO_TIMER_BASE 0x40170000 + #define ADDR_TIME_L 0x0u // 32 lower bits of the time register + #define ADDR_TIME_H 0x1u // 32 higher bits of the time register + #define ADDR_TIMECMP_L 0x2u // 32 lower bits of the time comparator + #define ADDR_TIMECMP_H 0x3u // 32 higher bits of the time comparator + + https://gitlab.bsc.es/hwdesign/bsc-linux/-/blob/d6d194bd30d9a8fe49c2a278ffb3c3ae7852e75d/bsc_tree/patches/ox_alveo/opensbi/0001-opensbi-ox_alveo-platform.patch#L63 + +**Observation**: When the serial console starts, the speed of the serial port +changes to 9600: + + [ 6.845400] io scheduler mq-deadline registered + [ 6.851500] io scheduler kyber registered + [ 17.644460] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled + [ 18.141160] printk: console [ttyS0] disabled + [ 18.229480] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 11, base_baud = 3125000) is a 16550 + + *** baud: 230400 *** + + *** baud: 460800 *** + + *** baud: 500000 *** + + *** baud: 576000 *** + + *** baud: 500000 *** + + *** baud: 460800 *** + + *** baud: 230400 *** + + *** baud: 115200 *** + + *** baud: 57600 *** + + *** baud: 38400 *** + + *** baud: 19200 *** + faaa0?xx + +**Observation**: Trying to read from the serial console /dev/ttyS0 causes no +more messages in the console (or a hang). + +**Question**: Can we make a heartbeat for the kernel? The idea is to keep a +counter in some memory of the kernel so we can see it from the host being moved. + +**Question**: Can we disable the serial driver 8250 from loading? + + initcall_blacklist= + +I need to know the 8250 init function name: + + drivers/tty/serial/8250/8250_core.c:static int __init serial8250_init(void) + +So... + + initcall_blacklist=serial8250_init + +Yes, but that doesn't seem to do anything. It is hanging: + + + modprobe dm_mod + [ 627.473580] stage-1-init: [Thu Jan 1 00:10:26 UTC 1970] + echo init /nix/store/v6pi2mqfgshxdsbyxlvpm9nvawxrpijv-nixos-system-nixos-riscv-23.11pre-git/init + [ 628.249440] stage-1-init: [Thu Jan 1 00:10:27 UTC 1970] + set -- init /nix/store/v6pi2mqfgshxdsbyxlvpm9nvawxrpijv-nixos-system-nixos-riscv-23.11pre-git/init + [ 629.004840] stage-1-init: [Thu Jan 1 00:10:28 UTC 1970] + stage2Init=/nix/store/v6pi2mqfgshxdsbyxlvpm9nvawxrpijv-nixos-system-nixos-riscv-23.11pre-git/init + [ 629.733920] stage-1-init: [Thu Jan 1 00:10:29 UTC 1970] + echo /nix/store/snvvqpxmryw1szlllk0bxpm37p8vj8sw-extra-utils/bin/modprobe + + +**Question**: What happens if we remap the interruptions? + +- Move the serial from 0 to 1 +- Move the plic from 3 to 2 and remove 7 + +Now we have one context only: + + [ 0.000000] riscv-intc: 64 local interrupts mapped + [ 0.000000] plic: plic@40800000: mapped 3 interrupts with 0 handlers for 1 contexts. + [ 0.000000] riscv: providing IPIs using SBI IPI extension + +Rather than two: + + [ 0.000000] riscv-intc: 64 local interrupts mapped + [ 0.000000] plic: plic@40800000: mapped 3 interrupts with 0 handlers for 2 contexts. + [ 0.000000] riscv: providing IPIs using SBI IPI extension + +**Question**: What happens if we block the `sbi_ipi` driver? + + initcall_blacklist=sbi_ipi_init + +Nothing, it cannot be disabled it seems. I will remove SMP support so it won't +be compiled in. + +**Observation**: Searching for 'riscv,plic0' only matches irq-sifive-plic driver. + + hut% rg 'riscv,plic0' + Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml + 72: - const: riscv,plic0 + + drivers/irqchip/irq-sifive-plic.c + 572:IRQCHIP_DECLARE(riscv_plic0, "riscv,plic0", plic_init); /* for legacy systems */ + +So it looks that the only driver that setups the plic is the one used by SiFive. +Here is the doc: https://static.dev.sifive.com/U54-MC-RVCoreIP.pdf + +**Observation**: The number of handlers is 0, so there are no interruptions. + +It seems the number next to the phandle of the interrupts-extended attribute in +the plic follows a different convention of values. Using 9 and 11: + + plic: plic@40800000: mapped 3 interrupts with 1 handlers for 2 contexts. + +**Remark**: The key combination to run Magic SysRq using the HVC console is +Ctrl-O and then the SysRq key. It only works it the console is being actively +polled, otherwise it hangs. + +## 2024-07-04 + +**Observation**: I saw they changed this option in Cinco Ranch DTS for the +serial: + +> reg-shift = <0>; // regs are spaced on 8 bit boundary (modified from Xilinx UART16550 to be ns16550 compatible) + +Tested booting with debug1 and the ttyS0 console, and it goes extremely slow +(but still outputs at 115200) and then continues to fail to read keyboard input. + +**Question**: Let's try setting the console in poll mode. + +setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=uart,io,0x40001000,115200n8 boot.trace console=uart,io,0x40001000,115200n8 debug1 init=/nix/store/wavmnv6wjj8y10ha07wxd5f0sqacivj8-nixos-system-nixos-riscv-23.11pre-git/init" + + [ 5.909360] io scheduler mq-deadline registered + [ 5.914900] io scheduler kyber registered + [ 14.405980] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled + [ 14.856040] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 2, base_baud = 3125000) is a 16550 + [ 15.059680] nd_pmem namespace0.0: unable to guarantee persistence of writes + ... + [ 40.250100] clk: Disabling unused clocks + [ 40.256220] Warning: unable to open an initial console. <---- SEE THIS + [ 40.618300] Freeing unused kernel image (initmem) memory: 5592K + [ 40.728300] Checked W+X mappings: passed, no W+X pages found + [ 40.735540] Run /init as init process + [ 40.740080] with arguments: + [ 40.743920] /init + [ 40.746660] with environment: + [ 40.750740] HOME=/ + [ 40.754020] TERM=linux + [ 46.569960] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 + [ 46.578100] CPU: 0 PID: 1 Comm: init Not tainted 6.6.1 #1-NixOS + [ 46.584600] Hardware name: Barcelona Supercomputing Center - Lagarto Ox (NixOS) (DT) + [ 46.592740] Call Trace: + [ 46.595640] [] dump_backtrace+0x38/0x48 + [ 46.601760] [] show_stack+0x50/0x68 + [ 46.607540] [] dump_stack_lvl+0x3c/0x5c + [ 46.613660] [] dump_stack+0x20/0x30 + [ 46.619400] [] panic+0x158/0x374 + [ 46.624900] [] do_exit+0x9e8/0x9f0 + [ 46.630580] [] do_group_exit+0x44/0xb0 + [ 46.636600] [] __wake_up_parent+0x0/0x40 + [ 46.642780] [] do_trap_ecall_u+0x14c/0x168 + [ 46.649140] [] ret_from_exception+0x0/0xac + [ 46.655500] Kernel Offset: 0x0 from 0xffffffff80000000 + [ 46.661160] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 ]--- + +Also found: `no_console_suspend` + +**Observation**: There are messages of address space being assigned to +registers: + + Slave segment '/MEEP_uart_0/S_AXI/Reg' is being assigned into address space '/m_axi_uart0' at <0x0000_0000 [ 4K ]>. + Slave segment '/MEEP_uart_1/S_AXI/Reg' is being assigned into address space '/m_axi_uart1' at <0x0000_0000 [ 4K ]>. + +**Question**: What happens if I enable `CONFIG_CONSOLE_POLL`? + +With `console=ttyS0,115200n8 debug1` I cannot type. + +**Observation**: I can dump iomem memory with the tool devmem: + +But it seems I cannot dump the registers of the serial io mapped region: + + ~ # cat /proc/iomem + 40001000-400010ff : serial + 60000000-7fffffff : Reserved + 80000000-ffefffff : System RAM + 80201000-81fa0b87 : Kernel image + 80201000-80cb177f : Kernel code + 81400000-819fffff : Kernel rodata + 81c00000-81f18747 : Kernel data + 81f19000-81fa0b87 : Kernel bss + 100000000-1bfffffff : namespace0.0 + + ~ # devmem 0x40001000 + devmem: mmap: Operation not permitted + +It looks like the following options may be required to be disabled to allow +user-space tools read those regions. + + ~ # zcat /proc/config.gz | grep CONFIG_STRICT_DEVMEM + CONFIG_STRICT_DEVMEM=y + ~ # zcat /proc/config.gz | grep CONFIG_IO_STRICT_DEVMEM + CONFIG_IO_STRICT_DEVMEM=y + +Let's try disabling `CONFIG_STRICT_DEVMEM` and see if we can read the serial +registers. + +It works! + + ~ # cat /proc/iomem + 40001000-400010ff : serial + 60000000-7fffffff : Reserved + 80000000-ffefffff : System RAM + 80201000-81fa0b87 : Kernel image + 80201000-80cb159f : Kernel code + 81400000-819fffff : Kernel rodata + 81c00000-81f18707 : Kernel data + 81f19000-81fa0b87 : Kernel bss + 100000000-1bfffffff : namespace0.0 + ~ # devmem 0x40001000 + 0x0000006E + ~ # devmem 0x40001000 + 0x0000006C + ~ # devmem 0x40001000 + 0x00000072 + ~ # devmem 0x40001000 + 0x0000000D + +**Observation**: The interrupt register of the serial console is 0x0: + +Assuming the console registers follow AXI UART 16550, here is the IER: + +> 0x1004 IER R/W Interrupt Enable Register + +Which is zero: + +~ # devmem 0x40001004 +0x00000000 + +The line control register is 0x3: + +~ # devmem 0x4000100C +0x00000003 + +**Question**: Can I write to some memory address and see the result from the +host? + +For that I would need to find some address that is mapped to the DMA or to the +pmem. Xavi recommended `0x6000_0000` as it is uncached. + +It seems to have some content already: + + ~ # devmem 0x60000000 + 0x00000093 + ~ # devmem 0x60000004 + 0x00000113 + ~ # devmem 0x60000008 + 0x00000193 + ~ # devmem 0x60000010 + 0x00000293 + ~ # devmem 0x60000014 + 0x00000313 + ~ # devmem 0x60000018 + 0x00000393 + +Writing test seems to work: + + ~ # devmem 0x60000000 32 0x11223344 + ~ # devmem 0x60000000 + 0x11223344 + +So, technically we should be using the `/dev/qdma34000-MM-0` device, as we +already use the other one to map the memory. + + [bsc015557@fpgan02 nixos]$ for i in {0..16}; do addr=$(($i * 0x10000000)); \ + printf "addr 0x%09x: " $addr; dd if=/dev/qdma34000-MM-0 count=16 bs=1 skip=$addr 2>/dev/null | xxd; done + addr 0x000000000: 00000000: 4444 4444 4444 4444 4444 4444 4444 4444 DDDDDDDDDDDDDDDD + addr 0x010000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x020000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333 + addr 0x030000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................ + addr 0x040000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................ + addr 0x050000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x060000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x070000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x080000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333 + addr 0x090000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x0a0000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x0b0000000: 00000000: 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f //////////////// + addr 0x0c0000000: 00000000: 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e nnnnnnnnnnnnnnnn + addr 0x0d0000000: 00000000: 2020 2020 2020 2020 2020 2020 2020 2020 + addr 0x0e0000000: 00000000: 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c llllllllllllllll + addr 0x0f0000000: 00000000: 6767 6767 6767 6767 6767 6767 6767 6767 gggggggggggggggg + addr 0x100000000: 00000000: 2424 2424 2424 2424 2424 2424 2424 2424 $$$$$$$$$$$$$$$$ + + [bsc015557@fpgan02 nixos]$ for i in {0..16}; do addr=$(($i * 0x10000000)); \ + printf "addr 0x%09x: " $addr; dd if=/dev/qdma34000-MM-1 count=16 bs=1 skip=$addr 2>/dev/null | xxd; done + addr 0x000000000: 00000000: 4444 4444 4444 4444 4444 4444 4444 4444 DDDDDDDDDDDDDDDD + addr 0x010000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x020000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333 + addr 0x030000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................ + addr 0x040000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................ + addr 0x050000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x060000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x070000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x080000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333 + addr 0x090000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x0a0000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ + addr 0x0b0000000: 00000000: 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f //////////////// + addr 0x0c0000000: 00000000: 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e nnnnnnnnnnnnnnnn + addr 0x0d0000000: 00000000: 2020 2020 2020 2020 2020 2020 2020 2020 + addr 0x0e0000000: 00000000: 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c llllllllllllllll + addr 0x0f0000000: 00000000: 6767 6767 6767 6767 6767 6767 6767 6767 gggggggggggggggg + addr 0x100000000: 00000000: 2424 2424 2424 2424 2424 2424 2424 2424 $$$$$$$$$$$$$$$$ + +None of the two queues seem to have the value 0x11223344 at any multiple of `0x1000_0000`. + +Let's verify first that this method works. The kernel is loaded here: + + [bsc015557@fpgan02 nixos]$ printf '0x%x\n' $FPGACTL_KERNEL_ADDR + 0x24000000 + +So we should see the same values as the kernel file: + + [bsc015557@fpgan02 nixos]$ dd if=kernel.bin count=16 bs=1 2>/dev/null| xxd + 00000000: 6f00 400d 0000 0000 0000 2000 0000 0000 o.@....... ..... + +But we don't see the same: + + [bsc015557@fpgan02 nixos]$ dd if=/dev/qdma34000-MM-1 count=16 bs=1 skip=$FPGACTL_KERNEL_ADDR 2>/dev/null | xxd + 00000000: 9797 9797 9797 9797 9797 9797 9797 9797 ................ + [bsc015557@fpgan02 nixos]$ dd if=/dev/qdma34000-MM-0 count=16 bs=1 skip=$FPGACTL_KERNEL_ADDR 2>/dev/null | xxd + 00000000: 9797 9797 9797 9797 9797 9797 9797 9797 ................ +**Question**: Missing forward M to S via Mideleg? + +Can it be happening that he MEDELEG is not forwarding the interruptions to the +Supervisor (kernel)? + + Boot HART MIDELEG : 0x0000000000000222 + Boot HART MEDELEG : 0x000000000000b109 + +**Question**: Can we add a timer to the PLIC to test the interrupts? + +**Observation**: Here is the PLIC register dump: + + ~ # for i in `seq 0 16`; do addr=$((0x40600000 + $i)); printf '%08x: ' $addr; devmem $addr; done + 40600000: 0x00010002 + 40600001: 0x09000000 + 40600002: 0x00090000 + 40600003: 0x00000900 + 40600004: 0x00010009 + 40600005: 0x00000000 + 40600006: 0x00000000 + 40600007: 0x00000000 + 40600008: 0x00000000 + 40600009: 0x00000000 + 4060000a: 0x00000000 + 4060000b: 0x00000000 + 4060000c: 0x00000000 + 4060000d: 0x00000000 + 4060000e: 0x00000000 + 4060000f: 0x00000000 + 40600010: 0x00000000 + +**Question**: Can we boot with the new bitstream that includes the second UART? + +The interruptions are enabled for the UART 1, not the default UART 0. + + +**Observation**: I'm using 0x100 not 0x1000 in the serial range: + + reg = <0x0 0x40003000 0x0 0x100>; + reg = <0x0 0x40003000 0x0 0x1000>; + +Can this produce any problem? + +It doesn't seem to change anything, still unable to send any bytes. + +**Question**: Can we use virtio to mount a FS in the DMA shared memory? + +## 2024-07-05 + +**Observation**: The kernel continues working when the console hangs. + +Switching to 0x100000000 as 0x60000000 shows: + + ~ # devmem 0x6000000 + 0xBADCAB1E + +With the following loop: + + ~ # i=0; while [ 1 ]; do let i=$i+1; devmem 0x100000000 32 $i; done & + ~ # cat /dev/ttyS0 + (hangs) + +Shows the kernel works: + + [bsc015557@fpgan02 nixos]$ while [ 1 ]; do xxd -s $((0x100000000 - 0x60000000)) -l 4 /dev/qdma34000-MM-1; sleep 0.2; done + ... + a0000000: 6400 0000 d... + a0000000: 6500 0000 e... + a0000000: 6600 0000 f... + a0000000: 6700 0000 g... + a0000000: 6800 0000 h... + a0000000: 6900 0000 i... + +**Question**: Can we reproduce it with `switch_root`? + +For that I would have to ensure the process continues to operate, even if we +exit the console. Maybe I can make a double fork? + +I cannot use `0x1000_0000` as that is where the pmem will be. But I can try to +use an address in the end, as we are not filling the whole space. + + [0x1_0000_0000, 0x1_c000_0000) -> PMEM (3072 MiB) + +Maybe `0x1_bfff_0000`? Let's try first from the initrd shell. + + i=0; while [ 1 ]; do let i=$i+1; devmem 0x1bfff0000 32 $i; done & + +Then + + while [ 1 ]; do xxd -s $((0x1bfff0000 - 0x60000000)) -l 4 /dev/qdma34000-MM-1; sleep 0.2; done + +Yes, it seems to be working. Let's load the rootfs too. + +I added a loop in the stage1 script. + +**Question**: Can we see any clock in memory? This will allow us to check if the +AXI still works. + +**Observation**: The kernel stops updating the counter in the mount phase. + +Managed to reach the mount and hang there: + + [ 337.504740] stage-1-init: [Thu Jan 1 00:05:36 UTC 1970] + '[' -d + /dev/disk/by-label/NIXOS_SD ] + [ 338.284560] stage-1-init: [Thu Jan 1 00:05:37 UTC 1970] + mkdir -m 0755 -p + /mnt-root/ + [ 339.017420] stage-1-init: [Thu Jan 1 00:05:38 UTC 1970] + local 'n=0' + [ 339.752560] stage-1-init: [Thu Jan 1 00:05:39 UTC 1970] + true + [ 340.488960] stage-1-init: [Thu Jan 1 00:05:39 UTC 1970] + mount /mnt-root/ + +After almost 6 minutes, with 571 beats: + + $ xxd -s 5905514496 -l 4 /dev/qdma34000-MM-1 + 5fff0000: 3b02 0000 ;... + +It looks like the kernel is the one getting stuck *or* at least is unable to +propagate the heartbeat changes to the host. It would be nice to monitor a +hardware clock from the DMA region too, so we can discard problems in the AXI. + + + [ 177.009540] stage-1-init: [Thu Jan 1 00:02:56 UTC 1970] + udevadm settle + + kbd_mode -u -C /dev/console + kbd_mode: KDSKBMODE: Inappropriate ioctl for device + + printf '\033%%G' + + loadkmap + [ 266.301040] stage-1-init: [Thu Jan 1 00:04:25 UTC 1970] + kbd_mode -u -C /dev/console + +**Assumption**: The kernel hangs. + +If the kernel hangs, there must be an instruction or sequence of instructions +that causes it. First I need to determine what is being executed by the kernel. +For that I could use `ftrace` to see which program is running at the time it +hangs. + + trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M + + (prev_comm != 2 && next_comm != 2) + +So, we can just enable the `tp_printk` but not the tracer. Then in the initrd +script, I enable the function tracer and the filter. + +**Observation**: It takes a long time to init the pty: + +Interesting timing: + + [ 12.612620] initcall_start: func=pty_init+0x0/0x3f4 + [ 20.962640] initcall_finish: func=pty_init+0x0/0x3f4 ret=0 + +**Observation**: The kcompactd0 daemon is using the CPU: + + [ 290.394920] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=init next_pid=69 next_prio=120 + [ 290.408160] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120 + [ 290.420720] sched_switch: prev_comm=tee prev_pid=68 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=120 + [ 290.433960] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120 + [ 290.447100] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120 + [ 290.460180] sched_switch: prev_comm=kcompactd0 prev_pid=22 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120 + [ 290.473400] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kworker/u2:2 next_pid=19 next_prio=120 + [ 290.486960] sched_switch: prev_comm=kworker/u2:2 prev_pid=19 prev_prio=120 prev_state=R ==> next_comm=khungtaskd next_pid=18 next_prio=120 + [ 290.500800] sched_switch: prev_comm=khungtaskd prev_pid=18 prev_prio=120 prev_state=R ==> next_comm=kworker/0:1 next_pid=13 next_prio=120 + [ 290.514560] sched_switch: prev_comm=kworker/0:1 prev_pid=13 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120 + [ 290.527720] sched_switch: prev_comm=tee prev_pid=68 prev_prio=120 prev_state=R+ ==> next_comm=init next_pid=69 next_prio=120 + [ 290.540360] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=devmem next_pid=129 next_prio=120 + [ 290.553280] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=120 + [ 290.566780] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120 + [ 290.580500] sched_switch: prev_comm=kcompactd0 prev_pid=22 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120 + [ 290.593740] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120 + [ 290.606340] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kworker/u2:2 next_pid=19 next_prio=120 + [ 290.619780] sched_switch: prev_comm=kworker/u2:2 prev_pid=19 prev_prio=120 prev_state=R ==> next_comm=khungtaskd next_pid=18 next_prio=120 + [ 290.633620] sched_switch: prev_comm=khungtaskd prev_pid=18 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120 + [ 290.646700] sched_switch: prev_comm=tee prev_pid=68 prev_prio=120 prev_state=R+ ==> next_comm=init next_pid=69 next_prio=120 + [ 290.659320] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=kworker/0:1 next_pid=13 next_prio=120 + [ 290.672560] sched_switch: prev_comm=kworker/0:1 prev_pid=13 prev_prio=120 prev_state=R ==> next_comm=devmem next_pid=129 next_prio=120 + [ 290.686080] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=120 + [ 290.699720] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120 + [ 290.712880] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120 + [ 290.725500] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120 + +**Question**: Can we reproduce this hang with 6.9.7? +