Add more experiments with mtest
This commit is contained in:
parent
b7dba89d63
commit
131713e7fc
162
JOURNAL.md
162
JOURNAL.md
@ -1229,3 +1229,165 @@ Reproduced from U-Boot:
|
|||||||
|
|
||||||
Let's see if we can fix the boot hang by reducing the memory enough to avoid
|
Let's see if we can fix the boot hang by reducing the memory enough to avoid
|
||||||
this bad region.
|
this bad region.
|
||||||
|
|
||||||
|
## 2024-07-11
|
||||||
|
|
||||||
|
### OBSERVATION: U-Boot mtest hangs in the last 256 MiB
|
||||||
|
|
||||||
|
After reducing the size of the RAM segment, I run again the mtest, but this time
|
||||||
|
it fails in the last 256 MiB block.
|
||||||
|
|
||||||
|
I assume that U-Boot moves itself to the last part of the memory, and them mtest
|
||||||
|
overwrites the U-Boot code, causing a hang.
|
||||||
|
|
||||||
|
So, I simply changed the FDT from U-Boot to skip the first 2M:
|
||||||
|
|
||||||
|
fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x40000000>
|
||||||
|
|
||||||
|
And then I enabled the memtest in the kernel boot parameters:
|
||||||
|
|
||||||
|
=> fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x30000000>
|
||||||
|
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 boot.trace boot.tracedebug init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-riscv-24.11pre-git/init"
|
||||||
|
=> setenv ramdisk_size 12611657
|
||||||
|
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 boot.trace boot.tracedebug memtest=3 init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-risc=> 4.11pre-git/init"
|
||||||
|
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||||||
|
Moving Image from 0x84000000 to 0x80200000, end=8303c4d0
|
||||||
|
## Flattened Device Tree blob at 80013000
|
||||||
|
Booting using the fdt blob at 0x80013000
|
||||||
|
Working FDT set to 80013000
|
||||||
|
Using Device Tree in place at 0000000080013000, end 000000008001696f
|
||||||
|
Working FDT set to 80013000
|
||||||
|
|
||||||
|
Starting kernel ...
|
||||||
|
|
||||||
|
[ 0.000000] Linux version 6.9.7 (nixbld@localhost) (riscv64-unknown-linux-gnu-gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.41) #1-NixOS Thu Jun 27 11:52:32 UTC 2024
|
||||||
|
[ 0.000000] Machine model: Barcelona Supercomputing Center - Lagarto Ox (NixOS)
|
||||||
|
[ 0.000000] SBI specification v2.0 detected
|
||||||
|
[ 0.000000] SBI implementation ID=0x1 Version=0x10004
|
||||||
|
[ 0.000000] SBI TIME extension detected
|
||||||
|
[ 0.000000] SBI IPI extension detected
|
||||||
|
[ 0.000000] SBI RFENCE extension detected
|
||||||
|
[ 0.000000] SBI DBCN extension detected
|
||||||
|
[ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
|
||||||
|
[ 0.000000] printk: legacy bootconsole [sbi0] enabled
|
||||||
|
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000060000000, size 256 MiB
|
||||||
|
[ 0.000000] OF: reserved mem: initialized node dma_pool@60000000, compatible id shared-dma-pool
|
||||||
|
[ 0.000000] OF: reserved mem: 0x0000000060000000..0x000000006fffffff (262144 KiB) map non-reusable dma_pool@60000000
|
||||||
|
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000070000000, size 256 MiB
|
||||||
|
[ 0.000000] OF: reserved mem: initialized node dma_pool@70000000, compatible id shared-dma-pool
|
||||||
|
[ 0.000000] OF: reserved mem: 0x0000000070000000..0x000000007fffffff (262144 KiB) map non-reusable dma_pool@70000000
|
||||||
|
[ 0.000000] cma: Reserved 16 MiB at 0x00000000af000000 on node -1
|
||||||
|
[ 0.000000] early_memtest: # of tests: 3
|
||||||
|
[ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern 5555555555555555
|
||||||
|
[ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern 5555555555555555
|
||||||
|
[ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern 5555555555555555
|
||||||
|
[ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern 5555555555555555
|
||||||
|
[ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern 5555555555555555
|
||||||
|
[ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern ffffffffffffffff
|
||||||
|
[ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern ffffffffffffffff
|
||||||
|
[ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern ffffffffffffffff
|
||||||
|
[ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern ffffffffffffffff
|
||||||
|
[ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern ffffffffffffffff
|
||||||
|
[ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern 0000000000000000
|
||||||
|
[ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern 0000000000000000
|
||||||
|
[ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern 0000000000000000
|
||||||
|
[ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern 0000000000000000
|
||||||
|
[ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern 0000000000000000
|
||||||
|
[ 0.000000] Zone ranges:
|
||||||
|
[ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000b01fffff]
|
||||||
|
[ 0.000000] Normal empty
|
||||||
|
[ 0.000000] Movable zone start for each node
|
||||||
|
[ 0.000000] Early memory node ranges
|
||||||
|
[ 0.000000] node 0: [mem 0x0000000080200000-0x00000000b01fffff]
|
||||||
|
|
||||||
|
That seems to pass the memtest fine, however the boot process hangs in different
|
||||||
|
stages.
|
||||||
|
|
||||||
|
### OBSERVATION: Cannot open /dev/ttyS0
|
||||||
|
|
||||||
|
~ # setserial -g /dev/ttyS1 -a
|
||||||
|
/dev/ttyS1, Line 1, UART: 16550, Port: 0x0000, IRQ: 1
|
||||||
|
Baud_base: 3125000, close_delay: 50, divisor: 0
|
||||||
|
closing_wait: 3000
|
||||||
|
Flags: spd_normal
|
||||||
|
|
||||||
|
~ # setserial -g /dev/ttyS0 -a
|
||||||
|
(hangs)
|
||||||
|
|
||||||
|
This page seems to have good resources on the serial console:
|
||||||
|
|
||||||
|
https://tldp.org/HOWTO/Serial-HOWTO-16.html
|
||||||
|
|
||||||
|
It seems that there are some differences in the way the serial port is handled
|
||||||
|
regarding 16550 and 16550A.
|
||||||
|
|
||||||
|
I can write to the UART console from U-Boot by directly writing in the
|
||||||
|
0x40001000 address (A = 0x41):
|
||||||
|
|
||||||
|
=> help mw
|
||||||
|
mw - memory write (fill)
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
mw [.b, .w, .l, .q] address value [count]
|
||||||
|
=> mw 0x40001000 0x41
|
||||||
|
A=> mw 0x40001000 0x42
|
||||||
|
B=> mw 0x40001000 0x43
|
||||||
|
C=>
|
||||||
|
|
||||||
|
### OBSERVATION: I can type with the ttyS0 8250 driver
|
||||||
|
|
||||||
|
Tried to boot too, but hangs:
|
||||||
|
|
||||||
|
=> fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x30000000>
|
||||||
|
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=ttyS0,115200n8 boot.trace boot.tracedebug init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-
|
||||||
|
system-nixos-riscv-24.11pre-git/init"
|
||||||
|
=> setenv ramdisk_size 12611657
|
||||||
|
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||||||
|
...
|
||||||
|
[ 30.740740] riscv-plic 40800000.plic: mapped 3 interrupts with 1 handlers for 2 contexts.
|
||||||
|
[ 40.048300] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
|
||||||
|
[ 40.420000] of_serial 40001000.serial: error -ENXIO: IRQ index 0 not found
|
||||||
|
[ 40.496720] printk: legacy console [ttyS0] disabled
|
||||||
|
[ 40.558860] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 0, base_baud = 3125000) is a 16550
|
||||||
|
[ 40.583940] printk: legacy console [ttyS0] enabled
|
||||||
|
[ 40.583940] printk: legacy console [ttyS0] enabled
|
||||||
|
[ 40.595760] printk: legacy bootconsole [sbi0] disabled
|
||||||
|
[ 40.595760] printk: legacy bootconsole [sbi0] disabled
|
||||||
|
[ 40.820380] 40003000.serial: ttyS1 at MMIO 0x40003000 (irq = 1, base_baud = 3125000) is a 16550
|
||||||
|
|
||||||
|
...
|
||||||
|
|
||||||
|
<<< NixOS Stage 2 >>>
|
||||||
|
|
||||||
|
[ 394.678980] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
|
||||||
|
[ 394.764620] booting system configuration /nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-riscv-24.11pre-git
|
||||||
|
runnin[ 398.543300] stage-2-init: running activation script...
|
||||||
|
g activation script...
|
||||||
|
|
||||||
|
So, if we observe a hang when writing to a bad memory segment, can there be a
|
||||||
|
problem in the place we are placing the pmem? Maybe we can test it with u-boot
|
||||||
|
first.
|
||||||
|
|
||||||
|
Another note, the serial device 16550 doesn't seem to use a FIFO, but the
|
||||||
|
16550A does.
|
||||||
|
|
||||||
|
We may want to switch to the A variant, as it seems to be supported by U-boot and
|
||||||
|
the kernel:
|
||||||
|
|
||||||
|
https://github.com/u-boot/u-boot/blob/master/drivers/serial/ns16550.c#L607-L619
|
||||||
|
https://github.com/torvalds/linux/blob/v6.9/drivers/tty/serial/8250/8250_of.c#L285
|
||||||
|
|
||||||
|
And defines a FIFO size of 16 bytes:
|
||||||
|
|
||||||
|
https://github.com/torvalds/linux/blob/v6.9/drivers/tty/serial/8250/8250_port.c#L74-L81
|
||||||
|
|
||||||
|
Still, we would have to wait for a bitstream that can forward the interrupts
|
||||||
|
from the host to the serial console to test it.
|
||||||
|
|
||||||
|
### OBSERVATION: The memory for the pmem seems to be ok
|
||||||
|
|
||||||
|
=> mtest 0x100000000 0x1c0000000 0 2
|
||||||
|
Testing 100000000 ... 1c0000000:
|
||||||
|
Pattern FFFFFFFFFFFFFFFF Writing... Reading...Iteration: 2
|
||||||
|
Tested 2 iteration(s) with 0 errors.
|
||||||
|
|
||||||
|
Loading…
Reference in New Issue
Block a user