Add more experiments with mtest

This commit is contained in:
Rodrigo Arias 2024-07-11 15:36:52 +02:00
parent b7dba89d63
commit 131713e7fc

View File

@ -1229,3 +1229,165 @@ Reproduced from U-Boot:
Let's see if we can fix the boot hang by reducing the memory enough to avoid Let's see if we can fix the boot hang by reducing the memory enough to avoid
this bad region. this bad region.
## 2024-07-11
### OBSERVATION: U-Boot mtest hangs in the last 256 MiB
After reducing the size of the RAM segment, I run again the mtest, but this time
it fails in the last 256 MiB block.
I assume that U-Boot moves itself to the last part of the memory, and them mtest
overwrites the U-Boot code, causing a hang.
So, I simply changed the FDT from U-Boot to skip the first 2M:
fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x40000000>
And then I enabled the memtest in the kernel boot parameters:
=> fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x30000000>
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 boot.trace boot.tracedebug init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-riscv-24.11pre-git/init"
=> setenv ramdisk_size 12611657
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 boot.trace boot.tracedebug memtest=3 init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-risc=> 4.11pre-git/init"
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
Moving Image from 0x84000000 to 0x80200000, end=8303c4d0
## Flattened Device Tree blob at 80013000
Booting using the fdt blob at 0x80013000
Working FDT set to 80013000
Using Device Tree in place at 0000000080013000, end 000000008001696f
Working FDT set to 80013000
Starting kernel ...
[ 0.000000] Linux version 6.9.7 (nixbld@localhost) (riscv64-unknown-linux-gnu-gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.41) #1-NixOS Thu Jun 27 11:52:32 UTC 2024
[ 0.000000] Machine model: Barcelona Supercomputing Center - Lagarto Ox (NixOS)
[ 0.000000] SBI specification v2.0 detected
[ 0.000000] SBI implementation ID=0x1 Version=0x10004
[ 0.000000] SBI TIME extension detected
[ 0.000000] SBI IPI extension detected
[ 0.000000] SBI RFENCE extension detected
[ 0.000000] SBI DBCN extension detected
[ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[ 0.000000] printk: legacy bootconsole [sbi0] enabled
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000060000000, size 256 MiB
[ 0.000000] OF: reserved mem: initialized node dma_pool@60000000, compatible id shared-dma-pool
[ 0.000000] OF: reserved mem: 0x0000000060000000..0x000000006fffffff (262144 KiB) map non-reusable dma_pool@60000000
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000070000000, size 256 MiB
[ 0.000000] OF: reserved mem: initialized node dma_pool@70000000, compatible id shared-dma-pool
[ 0.000000] OF: reserved mem: 0x0000000070000000..0x000000007fffffff (262144 KiB) map non-reusable dma_pool@70000000
[ 0.000000] cma: Reserved 16 MiB at 0x00000000af000000 on node -1
[ 0.000000] early_memtest: # of tests: 3
[ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern 5555555555555555
[ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern 5555555555555555
[ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern 5555555555555555
[ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern 5555555555555555
[ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern 5555555555555555
[ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern ffffffffffffffff
[ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern ffffffffffffffff
[ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern ffffffffffffffff
[ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern ffffffffffffffff
[ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern ffffffffffffffff
[ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern 0000000000000000
[ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern 0000000000000000
[ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern 0000000000000000
[ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern 0000000000000000
[ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern 0000000000000000
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000b01fffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080200000-0x00000000b01fffff]
That seems to pass the memtest fine, however the boot process hangs in different
stages.
### OBSERVATION: Cannot open /dev/ttyS0
~ # setserial -g /dev/ttyS1 -a
/dev/ttyS1, Line 1, UART: 16550, Port: 0x0000, IRQ: 1
Baud_base: 3125000, close_delay: 50, divisor: 0
closing_wait: 3000
Flags: spd_normal
~ # setserial -g /dev/ttyS0 -a
(hangs)
This page seems to have good resources on the serial console:
https://tldp.org/HOWTO/Serial-HOWTO-16.html
It seems that there are some differences in the way the serial port is handled
regarding 16550 and 16550A.
I can write to the UART console from U-Boot by directly writing in the
0x40001000 address (A = 0x41):
=> help mw
mw - memory write (fill)
Usage:
mw [.b, .w, .l, .q] address value [count]
=> mw 0x40001000 0x41
A=> mw 0x40001000 0x42
B=> mw 0x40001000 0x43
C=>
### OBSERVATION: I can type with the ttyS0 8250 driver
Tried to boot too, but hangs:
=> fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x30000000>
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=ttyS0,115200n8 boot.trace boot.tracedebug init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-
system-nixos-riscv-24.11pre-git/init"
=> setenv ramdisk_size 12611657
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
...
[ 30.740740] riscv-plic 40800000.plic: mapped 3 interrupts with 1 handlers for 2 contexts.
[ 40.048300] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 40.420000] of_serial 40001000.serial: error -ENXIO: IRQ index 0 not found
[ 40.496720] printk: legacy console [ttyS0] disabled
[ 40.558860] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 0, base_baud = 3125000) is a 16550
[ 40.583940] printk: legacy console [ttyS0] enabled
[ 40.583940] printk: legacy console [ttyS0] enabled
[ 40.595760] printk: legacy bootconsole [sbi0] disabled
[ 40.595760] printk: legacy bootconsole [sbi0] disabled
[ 40.820380] 40003000.serial: ttyS1 at MMIO 0x40003000 (irq = 1, base_baud = 3125000) is a 16550
...
<<< NixOS Stage 2 >>>
[ 394.678980] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
[ 394.764620] booting system configuration /nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-riscv-24.11pre-git
runnin[ 398.543300] stage-2-init: running activation script...
g activation script...
So, if we observe a hang when writing to a bad memory segment, can there be a
problem in the place we are placing the pmem? Maybe we can test it with u-boot
first.
Another note, the serial device 16550 doesn't seem to use a FIFO, but the
16550A does.
We may want to switch to the A variant, as it seems to be supported by U-boot and
the kernel:
https://github.com/u-boot/u-boot/blob/master/drivers/serial/ns16550.c#L607-L619
https://github.com/torvalds/linux/blob/v6.9/drivers/tty/serial/8250/8250_of.c#L285
And defines a FIFO size of 16 bytes:
https://github.com/torvalds/linux/blob/v6.9/drivers/tty/serial/8250/8250_port.c#L74-L81
Still, we would have to wait for a bitstream that can forward the interrupts
from the host to the serial console to test it.
### OBSERVATION: The memory for the pmem seems to be ok
=> mtest 0x100000000 0x1c0000000 0 2
Testing 100000000 ... 1c0000000:
Pattern FFFFFFFFFFFFFFFF Writing... Reading...Iteration: 2
Tested 2 iteration(s) with 0 errors.