## 2024-07-02 Aleix provided some notes on how to trace the kernel using tracepoints that are printed to the console: ``` BOOT TIME TRACING - compile with CONFIG_BOOTTIME_TRACING=y - add to kernel parameters: trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M ftrace=function ftrace_filter="vfs*" trace_options=sym-addr trace_event=sched:* tp_printk trace_buf_size=1M - tp_printk sends tracepoint (TRACE_EVENT) to printk buffer - trace_event=[subsytem:event]; accepts - *: - :* - *:* all - many more options at https://www.kernel.org/doc/html/latest/trace/boottime-trace.html ``` Start with: ``` trace_options=sym-addr trace_event=sched:* tp_printk trace_buf_size=1M loglevel=7 ``` Can it be caused by the D extension? If I set the ISA to: ``` riscv,isa = "rv64imaf"; ``` It locks the stage1 script without providing any output. Let see trying again. Locked again. I can disable the FPU in the kernel, and then it will act as a detector of any floating point instruction. Then I just need to rebuild busybox without support for double instructions. Let see if I can build busybox with double instruction Nope, the u-boot is reporting the d extension is in the isa: > riscv,isa = "rv64imafd"; ## 2024-07-03 I cannot switch to `gcc.arch = rv64ima` because rust fails to build. **Assumption**: The extensions F and D work well and don't cause a hang in the CPU. Let's go back and try to get the initrd shell, so we can systematically hang it in the `switch_root` ### OBSERVATION: The riscv-timer seems to be causing interrupts with IRQ 5: ``` [ 62.439060] irq_handler_entry: irq=5 name=riscv-timer [ 62.444980] irq_handler_exit: irq=5 ret=handled ``` ### OBSERVATION: Rohan reports the serial startup routine being running *after* the init begins. ### OBSERVATION: Only interrupts in timer, others are zero. With: commit 4c656bd8ddd2e41ccaa976ff8d6bd9209175a632 (HEAD -> lagarto-ox) Author: Rodrigo Arias Mallo Date: Wed Jul 3 13:21:04 2024 +0200 Add busybox patch to see debug lines The switch_root command seems to hang in the execv() syscall. I can see this: ~ # cat /proc/interrupts ; sleep 10 ; cat /proc/interrupts CPU0 10: 42926 RISC-V INTC 5 Edge riscv-timer IPI0: 0 Rescheduling interrupts IPI1: 0 Function call interrupts IPI2: 0 CPU stop interrupts IPI3: 0 CPU stop (for crash dump) interrupts IPI4: 0 IRQ work interrupts IPI5: 0 Timer broadcast interrupts CPU0 10: 46023 RISC-V INTC 5 Edge riscv-timer IPI0: 0 Rescheduling interrupts IPI1: 0 Function call interrupts IPI2: 0 CPU stop interrupts IPI3: 0 CPU stop (for crash dump) interrupts IPI4: 0 IRQ work interrupts IPI5: 0 Timer broadcast interrupts ### OBSERVATION: There is a timer configured in 0x40170000 but in the device tree we only have one at `timer@40002000`. #define OX_ALVEO_TIMER_BASE 0x40170000 #define ADDR_TIME_L 0x0u // 32 lower bits of the time register #define ADDR_TIME_H 0x1u // 32 higher bits of the time register #define ADDR_TIMECMP_L 0x2u // 32 lower bits of the time comparator #define ADDR_TIMECMP_H 0x3u // 32 higher bits of the time comparator https://gitlab.bsc.es/hwdesign/bsc-linux/-/blob/d6d194bd30d9a8fe49c2a278ffb3c3ae7852e75d/bsc_tree/patches/ox_alveo/opensbi/0001-opensbi-ox_alveo-platform.patch#L63 ### OBSERVATION: When the serial console starts, the speed of the serial port changes to 9600: [ 6.845400] io scheduler mq-deadline registered [ 6.851500] io scheduler kyber registered [ 17.644460] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 18.141160] printk: console [ttyS0] disabled [ 18.229480] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 11, base_baud = 3125000) is a 16550 *** baud: 230400 *** *** baud: 460800 *** *** baud: 500000 *** *** baud: 576000 *** *** baud: 500000 *** *** baud: 460800 *** *** baud: 230400 *** *** baud: 115200 *** *** baud: 57600 *** *** baud: 38400 *** *** baud: 19200 *** faaa0?xx ### OBSERVATION: Trying to read from the serial console /dev/ttyS0 causes no more messages in the console (or a hang). ### QUESTION: Can we make a heartbeat for the kernel? The idea is to keep a counter in some memory of the kernel so we can see it from the host being moved. ### QUESTION: Can we disable the serial driver 8250 from loading? initcall_blacklist= I need to know the 8250 init function name: drivers/tty/serial/8250/8250_core.c:static int __init serial8250_init(void) So... initcall_blacklist=serial8250_init Yes, but that doesn't seem to do anything. It is hanging: + modprobe dm_mod [ 627.473580] stage-1-init: [Thu Jan 1 00:10:26 UTC 1970] + echo init /nix/store/v6pi2mqfgshxdsbyxlvpm9nvawxrpijv-nixos-system-nixos-riscv-23.11pre-git/init [ 628.249440] stage-1-init: [Thu Jan 1 00:10:27 UTC 1970] + set -- init /nix/store/v6pi2mqfgshxdsbyxlvpm9nvawxrpijv-nixos-system-nixos-riscv-23.11pre-git/init [ 629.004840] stage-1-init: [Thu Jan 1 00:10:28 UTC 1970] + stage2Init=/nix/store/v6pi2mqfgshxdsbyxlvpm9nvawxrpijv-nixos-system-nixos-riscv-23.11pre-git/init [ 629.733920] stage-1-init: [Thu Jan 1 00:10:29 UTC 1970] + echo /nix/store/snvvqpxmryw1szlllk0bxpm37p8vj8sw-extra-utils/bin/modprobe ### QUESTION: What happens if we remap the interruptions? - Move the serial from 0 to 1 - Move the plic from 3 to 2 and remove 7 Now we have one context only: [ 0.000000] riscv-intc: 64 local interrupts mapped [ 0.000000] plic: plic@40800000: mapped 3 interrupts with 0 handlers for 1 contexts. [ 0.000000] riscv: providing IPIs using SBI IPI extension Rather than two: [ 0.000000] riscv-intc: 64 local interrupts mapped [ 0.000000] plic: plic@40800000: mapped 3 interrupts with 0 handlers for 2 contexts. [ 0.000000] riscv: providing IPIs using SBI IPI extension ### QUESTION: What happens if we block the `sbi_ipi` driver? initcall_blacklist=sbi_ipi_init Nothing, it cannot be disabled it seems. I will remove SMP support so it won't be compiled in. ### OBSERVATION: Searching for 'riscv,plic0' only matches irq-sifive-plic driver. hut% rg 'riscv,plic0' Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml 72: - const: riscv,plic0 drivers/irqchip/irq-sifive-plic.c 572:IRQCHIP_DECLARE(riscv_plic0, "riscv,plic0", plic_init); /* for legacy systems */ So it looks that the only driver that setups the plic is the one used by SiFive. Here is the doc: https://static.dev.sifive.com/U54-MC-RVCoreIP.pdf ### OBSERVATION: The number of handlers is 0, so there are no interruptions. It seems the number next to the phandle of the interrupts-extended attribute in the plic follows a different convention of values. Using 9 and 11: plic: plic@40800000: mapped 3 interrupts with 1 handlers for 2 contexts. **Remark**: The key combination to run Magic SysRq using the HVC console is Ctrl-O and then the SysRq key. It only works it the console is being actively polled, otherwise it hangs. ## 2024-07-04 ### OBSERVATION: I saw they changed this option in Cinco Ranch DTS for the serial: > reg-shift = <0>; // regs are spaced on 8 bit boundary (modified from Xilinx UART16550 to be ns16550 compatible) Tested booting with debug1 and the ttyS0 console, and it goes extremely slow (but still outputs at 115200) and then continues to fail to read keyboard input. ### QUESTION: Let's try setting the console in poll mode. setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=uart,io,0x40001000,115200n8 boot.trace console=uart,io,0x40001000,115200n8 debug1 init=/nix/store/wavmnv6wjj8y10ha07wxd5f0sqacivj8-nixos-system-nixos-riscv-23.11pre-git/init" [ 5.909360] io scheduler mq-deadline registered [ 5.914900] io scheduler kyber registered [ 14.405980] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 14.856040] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 2, base_baud = 3125000) is a 16550 [ 15.059680] nd_pmem namespace0.0: unable to guarantee persistence of writes ... [ 40.250100] clk: Disabling unused clocks [ 40.256220] Warning: unable to open an initial console. <---- SEE THIS [ 40.618300] Freeing unused kernel image (initmem) memory: 5592K [ 40.728300] Checked W+X mappings: passed, no W+X pages found [ 40.735540] Run /init as init process [ 40.740080] with arguments: [ 40.743920] /init [ 40.746660] with environment: [ 40.750740] HOME=/ [ 40.754020] TERM=linux [ 46.569960] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 [ 46.578100] CPU: 0 PID: 1 Comm: init Not tainted 6.6.1 #1-NixOS [ 46.584600] Hardware name: Barcelona Supercomputing Center - Lagarto Ox (NixOS) (DT) [ 46.592740] Call Trace: [ 46.595640] [] dump_backtrace+0x38/0x48 [ 46.601760] [] show_stack+0x50/0x68 [ 46.607540] [] dump_stack_lvl+0x3c/0x5c [ 46.613660] [] dump_stack+0x20/0x30 [ 46.619400] [] panic+0x158/0x374 [ 46.624900] [] do_exit+0x9e8/0x9f0 [ 46.630580] [] do_group_exit+0x44/0xb0 [ 46.636600] [] __wake_up_parent+0x0/0x40 [ 46.642780] [] do_trap_ecall_u+0x14c/0x168 [ 46.649140] [] ret_from_exception+0x0/0xac [ 46.655500] Kernel Offset: 0x0 from 0xffffffff80000000 [ 46.661160] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 ]--- Also found: `no_console_suspend` ### OBSERVATION: There are messages of address space being assigned to registers: Slave segment '/MEEP_uart_0/S_AXI/Reg' is being assigned into address space '/m_axi_uart0' at <0x0000_0000 [ 4K ]>. Slave segment '/MEEP_uart_1/S_AXI/Reg' is being assigned into address space '/m_axi_uart1' at <0x0000_0000 [ 4K ]>. ### QUESTION: What happens if I enable `CONFIG_CONSOLE_POLL`? With `console=ttyS0,115200n8 debug1` I cannot type. ### OBSERVATION: I can dump iomem memory with the tool devmem: But it seems I cannot dump the registers of the serial io mapped region: ~ # cat /proc/iomem 40001000-400010ff : serial 60000000-7fffffff : Reserved 80000000-ffefffff : System RAM 80201000-81fa0b87 : Kernel image 80201000-80cb177f : Kernel code 81400000-819fffff : Kernel rodata 81c00000-81f18747 : Kernel data 81f19000-81fa0b87 : Kernel bss 100000000-1bfffffff : namespace0.0 ~ # devmem 0x40001000 devmem: mmap: Operation not permitted It looks like the following options may be required to be disabled to allow user-space tools read those regions. ~ # zcat /proc/config.gz | grep CONFIG_STRICT_DEVMEM CONFIG_STRICT_DEVMEM=y ~ # zcat /proc/config.gz | grep CONFIG_IO_STRICT_DEVMEM CONFIG_IO_STRICT_DEVMEM=y Let's try disabling `CONFIG_STRICT_DEVMEM` and see if we can read the serial registers. It works! ~ # cat /proc/iomem 40001000-400010ff : serial 60000000-7fffffff : Reserved 80000000-ffefffff : System RAM 80201000-81fa0b87 : Kernel image 80201000-80cb159f : Kernel code 81400000-819fffff : Kernel rodata 81c00000-81f18707 : Kernel data 81f19000-81fa0b87 : Kernel bss 100000000-1bfffffff : namespace0.0 ~ # devmem 0x40001000 0x0000006E ~ # devmem 0x40001000 0x0000006C ~ # devmem 0x40001000 0x00000072 ~ # devmem 0x40001000 0x0000000D ### OBSERVATION: The interrupt register of the serial console is 0x0: Assuming the console registers follow AXI UART 16550, here is the IER: > 0x1004 IER R/W Interrupt Enable Register Which is zero: ~ # devmem 0x40001004 0x00000000 The line control register is 0x3: ~ # devmem 0x4000100C 0x00000003 ### QUESTION: Can I write to some memory address and see the result from the host? For that I would need to find some address that is mapped to the DMA or to the pmem. Xavi recommended `0x6000_0000` as it is uncached. It seems to have some content already: ~ # devmem 0x60000000 0x00000093 ~ # devmem 0x60000004 0x00000113 ~ # devmem 0x60000008 0x00000193 ~ # devmem 0x60000010 0x00000293 ~ # devmem 0x60000014 0x00000313 ~ # devmem 0x60000018 0x00000393 Writing test seems to work: ~ # devmem 0x60000000 32 0x11223344 ~ # devmem 0x60000000 0x11223344 So, technically we should be using the `/dev/qdma34000-MM-0` device, as we already use the other one to map the memory. [bsc015557@fpgan02 nixos]$ for i in {0..16}; do addr=$(($i * 0x10000000)); \ printf "addr 0x%09x: " $addr; dd if=/dev/qdma34000-MM-0 count=16 bs=1 skip=$addr 2>/dev/null | xxd; done addr 0x000000000: 00000000: 4444 4444 4444 4444 4444 4444 4444 4444 DDDDDDDDDDDDDDDD addr 0x010000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x020000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333 addr 0x030000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................ addr 0x040000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................ addr 0x050000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x060000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x070000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x080000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333 addr 0x090000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x0a0000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x0b0000000: 00000000: 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f //////////////// addr 0x0c0000000: 00000000: 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e nnnnnnnnnnnnnnnn addr 0x0d0000000: 00000000: 2020 2020 2020 2020 2020 2020 2020 2020 addr 0x0e0000000: 00000000: 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c llllllllllllllll addr 0x0f0000000: 00000000: 6767 6767 6767 6767 6767 6767 6767 6767 gggggggggggggggg addr 0x100000000: 00000000: 2424 2424 2424 2424 2424 2424 2424 2424 $$$$$$$$$$$$$$$$ [bsc015557@fpgan02 nixos]$ for i in {0..16}; do addr=$(($i * 0x10000000)); \ printf "addr 0x%09x: " $addr; dd if=/dev/qdma34000-MM-1 count=16 bs=1 skip=$addr 2>/dev/null | xxd; done addr 0x000000000: 00000000: 4444 4444 4444 4444 4444 4444 4444 4444 DDDDDDDDDDDDDDDD addr 0x010000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x020000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333 addr 0x030000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................ addr 0x040000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................ addr 0x050000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x060000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x070000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x080000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333 addr 0x090000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x0a0000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ addr 0x0b0000000: 00000000: 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f //////////////// addr 0x0c0000000: 00000000: 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e nnnnnnnnnnnnnnnn addr 0x0d0000000: 00000000: 2020 2020 2020 2020 2020 2020 2020 2020 addr 0x0e0000000: 00000000: 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c llllllllllllllll addr 0x0f0000000: 00000000: 6767 6767 6767 6767 6767 6767 6767 6767 gggggggggggggggg addr 0x100000000: 00000000: 2424 2424 2424 2424 2424 2424 2424 2424 $$$$$$$$$$$$$$$$ None of the two queues seem to have the value 0x11223344 at any multiple of `0x1000_0000`. Let's verify first that this method works. The kernel is loaded here: [bsc015557@fpgan02 nixos]$ printf '0x%x\n' $FPGACTL_KERNEL_ADDR 0x24000000 So we should see the same values as the kernel file: [bsc015557@fpgan02 nixos]$ dd if=kernel.bin count=16 bs=1 2>/dev/null| xxd 00000000: 6f00 400d 0000 0000 0000 2000 0000 0000 o.@....... ..... But we don't see the same: [bsc015557@fpgan02 nixos]$ dd if=/dev/qdma34000-MM-1 count=16 bs=1 skip=$FPGACTL_KERNEL_ADDR 2>/dev/null | xxd 00000000: 9797 9797 9797 9797 9797 9797 9797 9797 ................ [bsc015557@fpgan02 nixos]$ dd if=/dev/qdma34000-MM-0 count=16 bs=1 skip=$FPGACTL_KERNEL_ADDR 2>/dev/null | xxd 00000000: 9797 9797 9797 9797 9797 9797 9797 9797 ................ ### QUESTION: Missing forward M to S via Mideleg? Can it be happening that he MEDELEG is not forwarding the interruptions to the Supervisor (kernel)? Boot HART MIDELEG : 0x0000000000000222 Boot HART MEDELEG : 0x000000000000b109 ### QUESTION: Can we add a timer to the PLIC to test the interrupts? ### OBSERVATION: Here is the PLIC register dump: ~ # for i in `seq 0 16`; do addr=$((0x40600000 + $i)); printf '%08x: ' $addr; devmem $addr; done 40600000: 0x00010002 40600001: 0x09000000 40600002: 0x00090000 40600003: 0x00000900 40600004: 0x00010009 40600005: 0x00000000 40600006: 0x00000000 40600007: 0x00000000 40600008: 0x00000000 40600009: 0x00000000 4060000a: 0x00000000 4060000b: 0x00000000 4060000c: 0x00000000 4060000d: 0x00000000 4060000e: 0x00000000 4060000f: 0x00000000 40600010: 0x00000000 ### QUESTION: Can we boot with the new bitstream that includes the second UART? The interruptions are enabled for the UART 1, not the default UART 0. ### OBSERVATION: I'm using 0x100 not 0x1000 in the serial range: reg = <0x0 0x40003000 0x0 0x100>; reg = <0x0 0x40003000 0x0 0x1000>; Can this produce any problem? It doesn't seem to change anything, still unable to send any bytes. ### QUESTION: Can we use virtio to mount a FS in the DMA shared memory? ## 2024-07-05 ### OBSERVATION: The kernel continues working when the console hangs. Switching to 0x100000000 as 0x60000000 shows: ~ # devmem 0x6000000 0xBADCAB1E With the following loop: ~ # i=0; while [ 1 ]; do let i=$i+1; devmem 0x100000000 32 $i; done & ~ # cat /dev/ttyS0 (hangs) Shows the kernel works: [bsc015557@fpgan02 nixos]$ while [ 1 ]; do xxd -s $((0x100000000 - 0x60000000)) -l 4 /dev/qdma34000-MM-1; sleep 0.2; done ... a0000000: 6400 0000 d... a0000000: 6500 0000 e... a0000000: 6600 0000 f... a0000000: 6700 0000 g... a0000000: 6800 0000 h... a0000000: 6900 0000 i... ### QUESTION: Can we reproduce it with `switch_root`? For that I would have to ensure the process continues to operate, even if we exit the console. Maybe I can make a double fork? I cannot use `0x1000_0000` as that is where the pmem will be. But I can try to use an address in the end, as we are not filling the whole space. [0x1_0000_0000, 0x1_c000_0000) -> PMEM (3072 MiB) Maybe `0x1_bfff_0000`? Let's try first from the initrd shell. i=0; while [ 1 ]; do let i=$i+1; devmem 0x1bfff0000 32 $i; done & Then while [ 1 ]; do xxd -s $((0x1bfff0000 - 0x60000000)) -l 4 /dev/qdma34000-MM-1; sleep 0.2; done Yes, it seems to be working. Let's load the rootfs too. I added a loop in the stage1 script. ### QUESTION: Can we see any clock in memory? This will allow us to check if the AXI still works. ### OBSERVATION: The kernel stops updating the counter in the mount phase. Managed to reach the mount and hang there: [ 337.504740] stage-1-init: [Thu Jan 1 00:05:36 UTC 1970] + '[' -d /dev/disk/by-label/NIXOS_SD ] [ 338.284560] stage-1-init: [Thu Jan 1 00:05:37 UTC 1970] + mkdir -m 0755 -p /mnt-root/ [ 339.017420] stage-1-init: [Thu Jan 1 00:05:38 UTC 1970] + local 'n=0' [ 339.752560] stage-1-init: [Thu Jan 1 00:05:39 UTC 1970] + true [ 340.488960] stage-1-init: [Thu Jan 1 00:05:39 UTC 1970] + mount /mnt-root/ After almost 6 minutes, with 571 beats: $ xxd -s 5905514496 -l 4 /dev/qdma34000-MM-1 5fff0000: 3b02 0000 ;... It looks like the kernel is the one getting stuck *or* at least is unable to propagate the heartbeat changes to the host. It would be nice to monitor a hardware clock from the DMA region too, so we can discard problems in the AXI. ### OBSERVATION: There is an ioctl failed for /dev/console [ 177.009540] stage-1-init: [Thu Jan 1 00:02:56 UTC 1970] + udevadm settle + kbd_mode -u -C /dev/console kbd_mode: KDSKBMODE: Inappropriate ioctl for device + printf '\033%%G' + loadkmap [ 266.301040] stage-1-init: [Thu Jan 1 00:04:25 UTC 1970] + kbd_mode -u -C /dev/console ### ASSUMPTION: The kernel hangs. If the kernel hangs, there must be an instruction or sequence of instructions that causes it. First I need to determine what is being executed by the kernel. For that I could use `ftrace` to see which program is running at the time it hangs. trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M (prev_comm != 2 && next_comm != 2) So, we can just enable the `tp_printk` but not the tracer. Then in the initrd script, I enable the function tracer and the filter. ### OBSERVATION: It takes a long time to init the pty: Interesting timing: [ 12.612620] initcall_start: func=pty_init+0x0/0x3f4 [ 20.962640] initcall_finish: func=pty_init+0x0/0x3f4 ret=0 ### OBSERVATION: The kcompactd0 daemon is using the CPU: [ 290.394920] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=init next_pid=69 next_prio=120 [ 290.408160] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120 [ 290.420720] sched_switch: prev_comm=tee prev_pid=68 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=120 [ 290.433960] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120 [ 290.447100] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120 [ 290.460180] sched_switch: prev_comm=kcompactd0 prev_pid=22 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120 [ 290.473400] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kworker/u2:2 next_pid=19 next_prio=120 [ 290.486960] sched_switch: prev_comm=kworker/u2:2 prev_pid=19 prev_prio=120 prev_state=R ==> next_comm=khungtaskd next_pid=18 next_prio=120 [ 290.500800] sched_switch: prev_comm=khungtaskd prev_pid=18 prev_prio=120 prev_state=R ==> next_comm=kworker/0:1 next_pid=13 next_prio=120 [ 290.514560] sched_switch: prev_comm=kworker/0:1 prev_pid=13 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120 [ 290.527720] sched_switch: prev_comm=tee prev_pid=68 prev_prio=120 prev_state=R+ ==> next_comm=init next_pid=69 next_prio=120 [ 290.540360] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=devmem next_pid=129 next_prio=120 [ 290.553280] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=120 [ 290.566780] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120 [ 290.580500] sched_switch: prev_comm=kcompactd0 prev_pid=22 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120 [ 290.593740] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120 [ 290.606340] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kworker/u2:2 next_pid=19 next_prio=120 [ 290.619780] sched_switch: prev_comm=kworker/u2:2 prev_pid=19 prev_prio=120 prev_state=R ==> next_comm=khungtaskd next_pid=18 next_prio=120 [ 290.633620] sched_switch: prev_comm=khungtaskd prev_pid=18 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120 [ 290.646700] sched_switch: prev_comm=tee prev_pid=68 prev_prio=120 prev_state=R+ ==> next_comm=init next_pid=69 next_prio=120 [ 290.659320] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=kworker/0:1 next_pid=13 next_prio=120 [ 290.672560] sched_switch: prev_comm=kworker/0:1 prev_pid=13 prev_prio=120 prev_state=R ==> next_comm=devmem next_pid=129 next_prio=120 [ 290.686080] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=120 [ 290.699720] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120 [ 290.712880] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120 [ 290.725500] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120 ### QUESTION: Can we reproduce this hang with 6.9.7? Disabling clang as it is failing to build: hut% nix develop '.#lagarto-ox' error: builder for '/nix/store/x1nfa792pv28px70kvfakm3aalcfbdyw-clang-epi-479518d.drv' failed with exit code 2; last 10 log lines: > | ^~~~~~~~~~~~~~~ > 2 errors generated. > make[2]: *** [lib/Support/CMakeFiles/LLVMSupport.dir/build.make:1868: lib/Support/CMakeFiles/LLVMSupport.dir/Signals.cpp.o] Error 1 > make[2]: *** Waiting for unfinished jobs.... > [ 9%] Built target obj.clang-tblgen > 1 warning generated. > make[1]: *** [CMakeFiles/Makefile2:9468: lib/Support/CMakeFiles/LLVMSupport.dir/all] Error 2 > make[1]: *** Waiting for unfinished jobs.... > [ 9%] Built target obj.llvm-tblgen > make: *** [Makefile:156: all] Error 2 For full logs, run 'nix log /nix/store/x1nfa792pv28px70kvfakm3aalcfbdyw-clang-epi-479518d.drv'. error: 1 dependencies of derivation '/nix/store/m54sxxyi3cg062djrcddcawp10z7r49l-riscv64-unknown-linux-gnu-clang-epi-wrapper-479518d.drv' failed to build error: 1 dependencies of derivation '/nix/store/yl26dbqqj0snl807c0wjabg4dpbq5gvp-stdenv-linux.drv' failed to build error: 1 dependencies of derivation '/nix/store/h6180fcl30kqy3apaqjsbkkik2p1spmr-rvb-riscv64-unknown-linux-gnu-da202d6.drv' failed to build error: 1 dependencies of derivation '/nix/store/b13shgqj7128rdsdzzp4qicqbzl0wnfw-system-path.drv' failed to build error: 1 dependencies of derivation '/nix/store/6qghlihqcyg6155309ldj5xm9m0v835i-nixos-system-nixos-riscv-24.11pre-git.drv' failed to build error: 1 dependencies of derivation '/nix/store/l2x18cih29r1kn6vi8imwhkyk98yhw4i-nix-shell-riscv64-unknown-linux-gnu-env.drv' failed to build ### QUESTION: Missing cache information may affect? Other CPUs report the cache details in the DT. For example this one https://github.com/torvalds/linux/blob/master/arch/riscv/boot/dts/sifive/fu540-c000.dtsi#L45 cpu1: cpu@1 { compatible = "sifive,u54-mc", "sifive,rocket0", "riscv"; d-cache-block-size = <64>; d-cache-sets = <64>; d-cache-size = <32768>; d-tlb-sets = <1>; d-tlb-size = <32>; device_type = "cpu"; i-cache-block-size = <64>; i-cache-sets = <64>; i-cache-size = <32768>; i-tlb-sets = <1>; i-tlb-size = <32>; mmu-type = "riscv,sv39"; reg = <1>; riscv,isa = "rv64imafdc"; riscv,isa-base = "rv64i"; riscv,isa-extensions = "i", "m", "a", "f", "d", "c", "zicntr", "zicsr", "zifencei", "zihpm"; tlb-split; next-level-cache = <&l2cache>; cpu1_intc: interrupt-controller { #interrupt-cells = <1>; compatible = "riscv,cpu-intc"; interrupt-controller; }; }; We may want to add it to our DT to be sure that it has no effect. ### OBSERVATION: Arrived to stage 2! + kill -9 74 + readlink /proc/75/exe [ 374.961120] stage-1-init: [Thu Jan 1 00:06:14 UTC 1970] + test 0 -ge 8 + '[' 75 -eq 1 ] + kill -9 75 + readlink /proc/102/exe + '[' 102 -eq 1 ] + kill -9 102 + readlink /proc/137/exe + continue + readlink /proc/674/exe + continue + readlink /proc/675/exe + continue + test -n + echo /sbin/modprobe + '[' '!' -e /mnt-root//nix/store/xwqaqpc66ijvay9wxnm5nqmi30f2lp1i-nixos-system-nixos-riscv-24.11pre-git/init ] + mkdir -m 0755 -p /mnt-root/proc /mnt-root/sys /mnt-root/dev /mnt-root/run + mount --move /proc /mnt-root/proc + mount --move /sys /mnt-root/sys + mount --move /dev /mnt-root/dev + mount --move /run /mnt-root/run + type -P switch_root + exec env -i /nix/store/988axh0bq3wqp90gms4b4a0hkfwvkd3i-extra-utils/bin/switch_root /mnt-root /nix/store/xwqaqpc66ijvay9wxnm5nqmi30f2lp1i-nixos-system-nixos-riscv-24.11pre-git/init <<< NixOS Stage 2 >>> [ 384.203680] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none. [ 384.287600] booting system configuration /nix/store/xwqaqpc66ijvay9wxnm5nqmi30f2lp1i-nixos-system-nixos-riscv-24.11pre-git running activation script... [ 388.163860] stage-2-init: running activation script... [ 391.643500] random: perl: uninitialized urandom read (4 bytes read) [ 391.884800] random: perl: uninitialized urandom read (4 bytes read) [ 425.302000] random: perl: uninitialized urandom read (4 bytes read) But then it hangs. ## 2024-07-08 ### QUESTION: Who sets the plic interrupts? Shouldn't OpenSBI read the DT and do some configuration in the PLIC while in machine mode? ### OBSERVATION: Semi-stack trace from CincoRanch hvc_remove? console_unlock <-- only called from hvc_remove() prb_read_valid desc_read_finalized_seq __memcpy (multiple times) get_data do_trap_break report_bug is_valid_bugaddr copy_from_kernel_nofault copy_from_kernel_nofault_allowed find_bug _printk vprintk vprintk_default vprintk_emit vprintk_store sched_clock vsnprintf format_decode __memcpy printk_parse_prefix prb_reserve do_page_fault fixup_exception search_exception_tables search_extable cmp_ex_search (multiple times) search_module_extables __module_address no_context.part.0 die_kernel_fault <-- last frame(?) ### QUESTION: Can we place a trace point in `hvc_remove`? If we are getting stuck in the same place, we should be able to see the backtrace (assuming the console still works) just before we try to remove the console device. Placed, but still unable to see anything in any hang. Here is a hang in the Stage 2: <<< NixOS Stage 2 >>> [ 404.158340] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none. [ 404.242500] booting system configuration /nix/store/0za1vqh5alk7mxqs59qxx8izmwmf21w6-nixos-system-nixos-riscv-24.11pre-git running activation script... [ 408.148380] stage-2-init: running activation script... [ 411.612240] random: perl: uninitialized urandom read (4 bytes read) [ 411.866440] random: perl: uninitialized urandom read (4 bytes read) [ 447.588880] random: perl: uninitialized urandom read (4 bytes read) Still, it may be hang in a similar way, causing a loop of page faults just while trying to printk to the console, which would explain why we don't see anything and why the heartbeat stops. Fran has created another bitstream with two consoles enabled (`ox_u55c_a234c132.bit`), let see if I manage to boot with it. First I will need to enable the consoles on the DTS.