diff --git a/JOURNAL.md b/JOURNAL.md index 1c4e949..5574148 100644 --- a/JOURNAL.md +++ b/JOURNAL.md @@ -1229,3 +1229,165 @@ Reproduced from U-Boot: Let's see if we can fix the boot hang by reducing the memory enough to avoid this bad region. + +## 2024-07-11 + +### OBSERVATION: U-Boot mtest hangs in the last 256 MiB + +After reducing the size of the RAM segment, I run again the mtest, but this time +it fails in the last 256 MiB block. + +I assume that U-Boot moves itself to the last part of the memory, and them mtest +overwrites the U-Boot code, causing a hang. + +So, I simply changed the FDT from U-Boot to skip the first 2M: + + fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x40000000> + +And then I enabled the memtest in the kernel boot parameters: + + => fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x30000000> + => setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 boot.trace boot.tracedebug init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-riscv-24.11pre-git/init" + => setenv ramdisk_size 12611657 + => setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 boot.trace boot.tracedebug memtest=3 init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-risc=> 4.11pre-git/init" + => booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr} + Moving Image from 0x84000000 to 0x80200000, end=8303c4d0 + ## Flattened Device Tree blob at 80013000 + Booting using the fdt blob at 0x80013000 + Working FDT set to 80013000 + Using Device Tree in place at 0000000080013000, end 000000008001696f + Working FDT set to 80013000 + + Starting kernel ... + + [ 0.000000] Linux version 6.9.7 (nixbld@localhost) (riscv64-unknown-linux-gnu-gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.41) #1-NixOS Thu Jun 27 11:52:32 UTC 2024 + [ 0.000000] Machine model: Barcelona Supercomputing Center - Lagarto Ox (NixOS) + [ 0.000000] SBI specification v2.0 detected + [ 0.000000] SBI implementation ID=0x1 Version=0x10004 + [ 0.000000] SBI TIME extension detected + [ 0.000000] SBI IPI extension detected + [ 0.000000] SBI RFENCE extension detected + [ 0.000000] SBI DBCN extension detected + [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') + [ 0.000000] printk: legacy bootconsole [sbi0] enabled + [ 0.000000] Reserved memory: created DMA memory pool at 0x0000000060000000, size 256 MiB + [ 0.000000] OF: reserved mem: initialized node dma_pool@60000000, compatible id shared-dma-pool + [ 0.000000] OF: reserved mem: 0x0000000060000000..0x000000006fffffff (262144 KiB) map non-reusable dma_pool@60000000 + [ 0.000000] Reserved memory: created DMA memory pool at 0x0000000070000000, size 256 MiB + [ 0.000000] OF: reserved mem: initialized node dma_pool@70000000, compatible id shared-dma-pool + [ 0.000000] OF: reserved mem: 0x0000000070000000..0x000000007fffffff (262144 KiB) map non-reusable dma_pool@70000000 + [ 0.000000] cma: Reserved 16 MiB at 0x00000000af000000 on node -1 + [ 0.000000] early_memtest: # of tests: 3 + [ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern 5555555555555555 + [ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern 5555555555555555 + [ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern 5555555555555555 + [ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern 5555555555555555 + [ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern 5555555555555555 + [ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern ffffffffffffffff + [ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern ffffffffffffffff + [ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern ffffffffffffffff + [ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern ffffffffffffffff + [ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern ffffffffffffffff + [ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern 0000000000000000 + [ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern 0000000000000000 + [ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern 0000000000000000 + [ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern 0000000000000000 + [ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern 0000000000000000 + [ 0.000000] Zone ranges: + [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000b01fffff] + [ 0.000000] Normal empty + [ 0.000000] Movable zone start for each node + [ 0.000000] Early memory node ranges + [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000b01fffff] + +That seems to pass the memtest fine, however the boot process hangs in different +stages. + +### OBSERVATION: Cannot open /dev/ttyS0 + + ~ # setserial -g /dev/ttyS1 -a + /dev/ttyS1, Line 1, UART: 16550, Port: 0x0000, IRQ: 1 + Baud_base: 3125000, close_delay: 50, divisor: 0 + closing_wait: 3000 + Flags: spd_normal + + ~ # setserial -g /dev/ttyS0 -a + (hangs) + +This page seems to have good resources on the serial console: + + https://tldp.org/HOWTO/Serial-HOWTO-16.html + +It seems that there are some differences in the way the serial port is handled +regarding 16550 and 16550A. + +I can write to the UART console from U-Boot by directly writing in the +0x40001000 address (A = 0x41): + + => help mw + mw - memory write (fill) + + Usage: + mw [.b, .w, .l, .q] address value [count] + => mw 0x40001000 0x41 + A=> mw 0x40001000 0x42 + B=> mw 0x40001000 0x43 + C=> + +### OBSERVATION: I can type with the ttyS0 8250 driver + +Tried to boot too, but hangs: + + => fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x30000000> + => setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=ttyS0,115200n8 boot.trace boot.tracedebug init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos- + system-nixos-riscv-24.11pre-git/init" + => setenv ramdisk_size 12611657 + => booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr} + ... + [ 30.740740] riscv-plic 40800000.plic: mapped 3 interrupts with 1 handlers for 2 contexts. + [ 40.048300] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled + [ 40.420000] of_serial 40001000.serial: error -ENXIO: IRQ index 0 not found + [ 40.496720] printk: legacy console [ttyS0] disabled + [ 40.558860] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 0, base_baud = 3125000) is a 16550 + [ 40.583940] printk: legacy console [ttyS0] enabled + [ 40.583940] printk: legacy console [ttyS0] enabled + [ 40.595760] printk: legacy bootconsole [sbi0] disabled + [ 40.595760] printk: legacy bootconsole [sbi0] disabled + [ 40.820380] 40003000.serial: ttyS1 at MMIO 0x40003000 (irq = 1, base_baud = 3125000) is a 16550 + + ... + + <<< NixOS Stage 2 >>> + + [ 394.678980] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none. + [ 394.764620] booting system configuration /nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-riscv-24.11pre-git + runnin[ 398.543300] stage-2-init: running activation script... + g activation script... + +So, if we observe a hang when writing to a bad memory segment, can there be a +problem in the place we are placing the pmem? Maybe we can test it with u-boot +first. + +Another note, the serial device 16550 doesn't seem to use a FIFO, but the +16550A does. + +We may want to switch to the A variant, as it seems to be supported by U-boot and +the kernel: + + https://github.com/u-boot/u-boot/blob/master/drivers/serial/ns16550.c#L607-L619 + https://github.com/torvalds/linux/blob/v6.9/drivers/tty/serial/8250/8250_of.c#L285 + +And defines a FIFO size of 16 bytes: + + https://github.com/torvalds/linux/blob/v6.9/drivers/tty/serial/8250/8250_port.c#L74-L81 + +Still, we would have to wait for a bitstream that can forward the interrupts +from the host to the serial console to test it. + +### OBSERVATION: The memory for the pmem seems to be ok + + => mtest 0x100000000 0x1c0000000 0 2 + Testing 100000000 ... 1c0000000: + Pattern FFFFFFFFFFFFFFFF Writing... Reading...Iteration: 2 + Tested 2 iteration(s) with 0 errors. +