3341 lines
143 KiB
Markdown
3341 lines
143 KiB
Markdown
## 2024-07-02
|
||
|
||
Aleix provided some notes on how to trace the kernel using tracepoints that are
|
||
printed to the console:
|
||
|
||
```
|
||
BOOT TIME TRACING
|
||
- compile with CONFIG_BOOTTIME_TRACING=y
|
||
- add to kernel parameters:
|
||
trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M ftrace=function ftrace_filter="vfs*"
|
||
trace_options=sym-addr trace_event=sched:* tp_printk trace_buf_size=1M
|
||
- tp_printk sends tracepoint (TRACE_EVENT) to printk buffer
|
||
- trace_event=[subsytem:event]; accepts
|
||
- *:<event_name>
|
||
- <subsystem>:*
|
||
- *:* all
|
||
- many more options at https://www.kernel.org/doc/html/latest/trace/boottime-trace.html
|
||
```
|
||
|
||
Start with:
|
||
|
||
```
|
||
trace_options=sym-addr trace_event=sched:* tp_printk trace_buf_size=1M loglevel=7
|
||
```
|
||
|
||
Can it be caused by the D extension? If I set the ISA to:
|
||
|
||
```
|
||
riscv,isa = "rv64imaf";
|
||
```
|
||
|
||
It locks the stage1 script without providing any output. Let see trying again.
|
||
Locked again.
|
||
|
||
I can disable the FPU in the kernel, and then it will act as a detector of any
|
||
floating point instruction.
|
||
|
||
Then I just need to rebuild busybox without support for double instructions.
|
||
|
||
Let see if I can build busybox with double instruction
|
||
|
||
Nope, the u-boot is reporting the d extension is in the isa:
|
||
|
||
> riscv,isa = "rv64imafd";
|
||
|
||
## 2024-07-03<!--{{{-->
|
||
|
||
I cannot switch to `gcc.arch = rv64ima` because rust fails to build.
|
||
|
||
**Assumption**: The extensions F and D work well and don't cause a hang in the<!--{{{-->
|
||
CPU.
|
||
|
||
Let's go back and try to get the initrd shell, so we can systematically hang it
|
||
in the `switch_root`
|
||
<!--}}}-->
|
||
### OBSERVATION: The riscv-timer seems to be causing interrupts with IRQ 5:<!--{{{-->
|
||
|
||
```
|
||
[ 62.439060] irq_handler_entry: irq=5 name=riscv-timer
|
||
[ 62.444980] irq_handler_exit: irq=5 ret=handled
|
||
```
|
||
<!--}}}-->
|
||
### OBSERVATION: Rohan reports the serial startup routine being running *after* <!--{{{-->
|
||
the init begins.
|
||
<!--}}}-->
|
||
### OBSERVATION: Only interrupts in timer, others are zero.<!--{{{-->
|
||
|
||
With:
|
||
|
||
commit 4c656bd8ddd2e41ccaa976ff8d6bd9209175a632 (HEAD -> lagarto-ox)
|
||
Author: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
|
||
Date: Wed Jul 3 13:21:04 2024 +0200
|
||
|
||
Add busybox patch to see debug lines
|
||
|
||
The switch_root command seems to hang in the execv() syscall.
|
||
|
||
I can see this:
|
||
|
||
~ # cat /proc/interrupts ; sleep 10 ; cat /proc/interrupts
|
||
CPU0
|
||
10: 42926 RISC-V INTC 5 Edge riscv-timer
|
||
IPI0: 0 Rescheduling interrupts
|
||
IPI1: 0 Function call interrupts
|
||
IPI2: 0 CPU stop interrupts
|
||
IPI3: 0 CPU stop (for crash dump) interrupts
|
||
IPI4: 0 IRQ work interrupts
|
||
IPI5: 0 Timer broadcast interrupts
|
||
CPU0
|
||
10: 46023 RISC-V INTC 5 Edge riscv-timer
|
||
IPI0: 0 Rescheduling interrupts
|
||
IPI1: 0 Function call interrupts
|
||
IPI2: 0 CPU stop interrupts
|
||
IPI3: 0 CPU stop (for crash dump) interrupts
|
||
IPI4: 0 IRQ work interrupts
|
||
IPI5: 0 Timer broadcast interrupts
|
||
<!--}}}-->
|
||
### OBSERVATION: There is a timer configured in 0x40170000 but in the device<!--{{{-->
|
||
tree we only have one at `timer@40002000`.
|
||
|
||
#define OX_ALVEO_TIMER_BASE 0x40170000
|
||
#define ADDR_TIME_L 0x0u // 32 lower bits of the time register
|
||
#define ADDR_TIME_H 0x1u // 32 higher bits of the time register
|
||
#define ADDR_TIMECMP_L 0x2u // 32 lower bits of the time comparator
|
||
#define ADDR_TIMECMP_H 0x3u // 32 higher bits of the time comparator
|
||
|
||
https://gitlab.bsc.es/hwdesign/bsc-linux/-/blob/d6d194bd30d9a8fe49c2a278ffb3c3ae7852e75d/bsc_tree/patches/ox_alveo/opensbi/0001-opensbi-ox_alveo-platform.patch#L63
|
||
<!--}}}-->
|
||
### OBSERVATION: When the serial console starts, the speed of the serial port<!--{{{-->
|
||
changes to 9600:
|
||
|
||
[ 6.845400] io scheduler mq-deadline registered
|
||
[ 6.851500] io scheduler kyber registered
|
||
[ 17.644460] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
|
||
[ 18.141160] printk: console [ttyS0] disabled
|
||
[ 18.229480] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 11, base_baud = 3125000) is a 16550
|
||
|
||
*** baud: 230400 ***
|
||
|
||
*** baud: 460800 ***
|
||
|
||
*** baud: 500000 ***
|
||
|
||
*** baud: 576000 ***
|
||
|
||
*** baud: 500000 ***
|
||
|
||
*** baud: 460800 ***
|
||
|
||
*** baud: 230400 ***
|
||
|
||
*** baud: 115200 ***
|
||
|
||
*** baud: 57600 ***
|
||
|
||
*** baud: 38400 ***
|
||
|
||
*** baud: 19200 ***
|
||
faaa0?xx
|
||
<ffa
|
||
fx3fÆf
|
||
ff
|
||
~3xf03faaa0?~3<3<`<~<f<`<憆3~fÆf̆x3fff
|
||
|
||
*** baud: 9600 ***
|
||
|
||
*** baud: 19200 ***
|
||
|
||
*** baud: 38400 ***
|
||
xxxxxx<xx<xx<xx<xxxxxxxx
|
||
*** baud: 57600 ***
|
||
|
||
*** baud: 38400 ***
|
||
xxxxxxx<x<xx<xx<xx<xx<x<xxxxxxx<xxx<x<x<xxx<xxxxxx<xx<xxxxx<x<xxxxxxxxxxxxxx<xxxxxxxx<xxxxx<x
|
||
*** baud: 19200 ***
|
||
faaa??~3fx3x3憆`?3f
|
||
ff3f<ffa<xÞ?<?f<x<x<`<`<f<x??`?`<怘
|
||
|
||
*** baud: 9600 ***
|
||
[ 77.929980] stage-1-init: [Thu Jan 1 00:01:17 UTC 1970] + ln -sfn /proc/self/fd/1 /dev/stdout
|
||
[ 79.258380] stage-1-init: [Thu Jan 1 00:01:18 UTC 1970] + ln -sfn /proc/self/fd/2 /dev/stderr
|
||
[ 80.212360] stage-1-init: [Thu Jan 1 00:01:19 UTC 1970] + mkdir -p /etc/systemd
|
||
[ 81.224900] stage-1-init: [Thu Jan 1 00:01:20 UTC 1970] + ln -sfn /nix/store/r4ycv0ymw9wrnalh0sdgkp39nwy3szqz-link-units /etc/systemd/network
|
||
[ 82.270000] stage-1-init: [Thu Jan 1 00:01:21 UTC 1970] + mkdir -p /etc/udev
|
||
[ 83.249640] stage-1-init: [Thu Jan 1 00:01:22 UTC 1970] + ln -sfn /nix/store/lx1q7j61dldj8bv0p33v42q0nldb6sd7-udev-rules /etc/udev/rules.d
|
||
[ 84.304940] stage-1-init: [Thu Jan 1 00:01:23 UTC 1970] + mkdir -p /dev/.mdadm
|
||
[ 85.221060] stage-1-init: [Thu Jan 1 00:01:24 UTC 1970] + systemd-udevd --daemon
|
||
[ 86.157320] stage-1-init: [Thu Jan 1 00:01:25 UTC 1970] Starting systemd-udevd version 254.3
|
||
[ 87.130060] stage-1-init: [Thu Jan 1 00:01:26 UTC 1970] + udevadm trigger '--action=add'
|
||
|
||
*** break sent ***
|
||
[ 158.281900] stage-1-init: [Thu Jan 1 00:02:37 UTC 1970] + udevadm settle
|
||
|
||
*** break sent ***
|
||
|
||
*** break sent ***
|
||
[ 214.524500] stage-1-init: [Thu Jan 1 00:03:33 UTC 1970] + kbd_mode -u -C /dev/console
|
||
[ 215.392960] stage-1-init: [Thu Jan 1 00:03:34 UTC 1970] kbd_mode: KDSKBMODE: Inappropriate ioctl for device
|
||
|
||
That was my mistake as I need to put the baud speed in the ttyS0, like this:
|
||
|
||
console=ttyS0,115200n8
|
||
<!--}}}-->
|
||
### OBSERVATION: Trying to read from the serial console /dev/ttyS0 causes no<!--{{{-->
|
||
more messages in the console (or a hang).
|
||
<!--}}}-->
|
||
### QUESTION: Can we make a heartbeat for the kernel? <!--{{{-->
|
||
The idea is to keep a counter in some memory of the kernel so we can see it from
|
||
the host being moved.
|
||
<!--}}}-->
|
||
### QUESTION: Can we disable the serial driver 8250 from loading?<!--{{{-->
|
||
|
||
initcall_blacklist=<driver_init>
|
||
|
||
I need to know the 8250 init function name:
|
||
|
||
drivers/tty/serial/8250/8250_core.c:static int __init serial8250_init(void)
|
||
|
||
So...
|
||
|
||
initcall_blacklist=serial8250_init
|
||
|
||
Yes, but that doesn't seem to do anything. It is hanging:
|
||
|
||
+ modprobe dm_mod
|
||
[ 627.473580] stage-1-init: [Thu Jan 1 00:10:26 UTC 1970] + echo init /nix/store/v6pi2mqfgshxdsbyxlvpm9nvawxrpijv-nixos-system-nixos-riscv-23.11pre-git/init
|
||
[ 628.249440] stage-1-init: [Thu Jan 1 00:10:27 UTC 1970] + set -- init /nix/store/v6pi2mqfgshxdsbyxlvpm9nvawxrpijv-nixos-system-nixos-riscv-23.11pre-git/init
|
||
[ 629.004840] stage-1-init: [Thu Jan 1 00:10:28 UTC 1970] + stage2Init=/nix/store/v6pi2mqfgshxdsbyxlvpm9nvawxrpijv-nixos-system-nixos-riscv-23.11pre-git/init
|
||
[ 629.733920] stage-1-init: [Thu Jan 1 00:10:29 UTC 1970] + echo /nix/store/snvvqpxmryw1szlllk0bxpm37p8vj8sw-extra-utils/bin/modprobe
|
||
|
||
<!--}}}-->
|
||
### QUESTION: What happens if we remap the interruptions?<!--{{{-->
|
||
|
||
- Move the serial from 0 to 1
|
||
- Move the plic from 3 to 2 and remove 7
|
||
|
||
Now we have one context only:
|
||
|
||
[ 0.000000] riscv-intc: 64 local interrupts mapped
|
||
[ 0.000000] plic: plic@40800000: mapped 3 interrupts with 0 handlers for 1 contexts.
|
||
[ 0.000000] riscv: providing IPIs using SBI IPI extension
|
||
|
||
Rather than two:
|
||
|
||
[ 0.000000] riscv-intc: 64 local interrupts mapped
|
||
[ 0.000000] plic: plic@40800000: mapped 3 interrupts with 0 handlers for 2 contexts.
|
||
[ 0.000000] riscv: providing IPIs using SBI IPI extension
|
||
<!--}}}-->
|
||
### QUESTION: What happens if we block the `sbi_ipi` driver?<!--{{{-->
|
||
|
||
initcall_blacklist=sbi_ipi_init
|
||
|
||
Nothing, it cannot be disabled it seems. I will remove SMP support so it won't
|
||
be compiled in.
|
||
<!--}}}-->
|
||
### OBSERVATION: Searching for 'riscv,plic0' only matches irq-sifive-plic driver.<!--{{{-->
|
||
|
||
hut% rg 'riscv,plic0'
|
||
Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml
|
||
72: - const: riscv,plic0
|
||
|
||
drivers/irqchip/irq-sifive-plic.c
|
||
572:IRQCHIP_DECLARE(riscv_plic0, "riscv,plic0", plic_init); /* for legacy systems */
|
||
|
||
So it looks that the only driver that setups the plic is the one used by SiFive.
|
||
Here is the doc: https://static.dev.sifive.com/U54-MC-RVCoreIP.pdf
|
||
<!--}}}-->
|
||
### OBSERVATION: The number of handlers is 0, so there are no interruptions.<!--{{{-->
|
||
|
||
It seems the number next to the phandle of the interrupts-extended attribute in
|
||
the plic follows a different convention of values. Using 9 and 11:
|
||
|
||
plic: plic@40800000: mapped 3 interrupts with 1 handlers for 2 contexts.
|
||
<!--}}}-->
|
||
**Remark**: The key combination to run Magic SysRq using the HVC console is<!--{{{-->
|
||
Ctrl-O and then the SysRq key. It only works it the console is being actively
|
||
polled, otherwise it hangs.<!--}}}-->
|
||
<!--}}}-->
|
||
## 2024-07-04<!--{{{-->
|
||
|
||
### OBSERVATION: I saw they changed this option in Cinco Ranch DTS for the<!--{{{-->
|
||
serial:
|
||
|
||
> reg-shift = <0>; // regs are spaced on 8 bit boundary (modified from Xilinx UART16550 to be ns16550 compatible)
|
||
|
||
Tested booting with debug1 and the ttyS0 console, and it goes extremely slow
|
||
(but still outputs at 115200) and then continues to fail to read keyboard input.
|
||
<!--}}}-->
|
||
### QUESTION: Let's try setting the console in poll mode.<!--{{{-->
|
||
|
||
setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=uart,io,0x40001000,115200n8 boot.trace console=uart,io,0x40001000,115200n8 debug1 init=/nix/store/wavmnv6wjj8y10ha07wxd5f0sqacivj8-nixos-system-nixos-riscv-23.11pre-git/init"
|
||
|
||
[ 5.909360] io scheduler mq-deadline registered
|
||
[ 5.914900] io scheduler kyber registered
|
||
[ 14.405980] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
|
||
[ 14.856040] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 2, base_baud = 3125000) is a 16550
|
||
[ 15.059680] nd_pmem namespace0.0: unable to guarantee persistence of writes
|
||
...
|
||
[ 40.250100] clk: Disabling unused clocks
|
||
[ 40.256220] Warning: unable to open an initial console. <---- SEE THIS
|
||
[ 40.618300] Freeing unused kernel image (initmem) memory: 5592K
|
||
[ 40.728300] Checked W+X mappings: passed, no W+X pages found
|
||
[ 40.735540] Run /init as init process
|
||
[ 40.740080] with arguments:
|
||
[ 40.743920] /init
|
||
[ 40.746660] with environment:
|
||
[ 40.750740] HOME=/
|
||
[ 40.754020] TERM=linux
|
||
[ 46.569960] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
|
||
[ 46.578100] CPU: 0 PID: 1 Comm: init Not tainted 6.6.1 #1-NixOS
|
||
[ 46.584600] Hardware name: Barcelona Supercomputing Center - Lagarto Ox (NixOS) (DT)
|
||
[ 46.592740] Call Trace:
|
||
[ 46.595640] [<ffffffff80006688>] dump_backtrace+0x38/0x48
|
||
[ 46.601760] [<ffffffff80a96a04>] show_stack+0x50/0x68
|
||
[ 46.607540] [<ffffffff80aa3f74>] dump_stack_lvl+0x3c/0x5c
|
||
[ 46.613660] [<ffffffff80aa3fb4>] dump_stack+0x20/0x30
|
||
[ 46.619400] [<ffffffff80a970dc>] panic+0x158/0x374
|
||
[ 46.624900] [<ffffffff8001c18c>] do_exit+0x9e8/0x9f0
|
||
[ 46.630580] [<ffffffff8001c3b0>] do_group_exit+0x44/0xb0
|
||
[ 46.636600] [<ffffffff8001c454>] __wake_up_parent+0x0/0x40
|
||
[ 46.642780] [<ffffffff80aa4eac>] do_trap_ecall_u+0x14c/0x168
|
||
[ 46.649140] [<ffffffff800035ec>] ret_from_exception+0x0/0xac
|
||
[ 46.655500] Kernel Offset: 0x0 from 0xffffffff80000000
|
||
[ 46.661160] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 ]---
|
||
|
||
Also found: `no_console_suspend`
|
||
<!--}}}-->
|
||
### OBSERVATION: There are messages of address space being assigned to<!--{{{-->
|
||
registers:
|
||
|
||
Slave segment '/MEEP_uart_0/S_AXI/Reg' is being assigned into address space '/m_axi_uart0' at <0x0000_0000 [ 4K ]>.
|
||
Slave segment '/MEEP_uart_1/S_AXI/Reg' is being assigned into address space '/m_axi_uart1' at <0x0000_0000 [ 4K ]>.
|
||
<!--}}}-->
|
||
### QUESTION: What happens if I enable `CONFIG_CONSOLE_POLL`?<!--{{{-->
|
||
|
||
With `console=ttyS0,115200n8 debug1` I cannot type.
|
||
<!--}}}-->
|
||
### OBSERVATION: I can dump iomem memory with the tool devmem:<!--{{{-->
|
||
|
||
But it seems I cannot dump the registers of the serial io mapped region:
|
||
|
||
~ # cat /proc/iomem
|
||
40001000-400010ff : serial
|
||
60000000-7fffffff : Reserved
|
||
80000000-ffefffff : System RAM
|
||
80201000-81fa0b87 : Kernel image
|
||
80201000-80cb177f : Kernel code
|
||
81400000-819fffff : Kernel rodata
|
||
81c00000-81f18747 : Kernel data
|
||
81f19000-81fa0b87 : Kernel bss
|
||
100000000-1bfffffff : namespace0.0
|
||
|
||
~ # devmem 0x40001000
|
||
devmem: mmap: Operation not permitted
|
||
|
||
It looks like the following options may be required to be disabled to allow
|
||
user-space tools read those regions.
|
||
|
||
~ # zcat /proc/config.gz | grep CONFIG_STRICT_DEVMEM
|
||
CONFIG_STRICT_DEVMEM=y
|
||
~ # zcat /proc/config.gz | grep CONFIG_IO_STRICT_DEVMEM
|
||
CONFIG_IO_STRICT_DEVMEM=y
|
||
|
||
Let's try disabling `CONFIG_STRICT_DEVMEM` and see if we can read the serial
|
||
registers.
|
||
|
||
It works!
|
||
|
||
~ # cat /proc/iomem
|
||
40001000-400010ff : serial
|
||
60000000-7fffffff : Reserved
|
||
80000000-ffefffff : System RAM
|
||
80201000-81fa0b87 : Kernel image
|
||
80201000-80cb159f : Kernel code
|
||
81400000-819fffff : Kernel rodata
|
||
81c00000-81f18707 : Kernel data
|
||
81f19000-81fa0b87 : Kernel bss
|
||
100000000-1bfffffff : namespace0.0
|
||
~ # devmem 0x40001000
|
||
0x0000006E
|
||
~ # devmem 0x40001000
|
||
0x0000006C
|
||
~ # devmem 0x40001000
|
||
0x00000072
|
||
~ # devmem 0x40001000
|
||
0x0000000D
|
||
<!--}}}-->
|
||
### OBSERVATION: The interrupt register of the serial console is 0x0:<!--{{{-->
|
||
|
||
Assuming the console registers follow AXI UART 16550, here is the IER:
|
||
|
||
> 0x1004 IER R/W Interrupt Enable Register
|
||
|
||
Which is zero:
|
||
|
||
~ # devmem 0x40001004
|
||
0x00000000
|
||
|
||
The line control register is 0x3:
|
||
|
||
~ # devmem 0x4000100C
|
||
0x00000003
|
||
<!--}}}-->
|
||
### QUESTION: Can I write to some memory address and see the result from the host?<!--{{{-->
|
||
|
||
For that I would need to find some address that is mapped to the DMA or to the
|
||
pmem. Xavi recommended `0x6000_0000` as it is uncached.
|
||
|
||
It seems to have some content already:
|
||
|
||
~ # devmem 0x60000000
|
||
0x00000093
|
||
~ # devmem 0x60000004
|
||
0x00000113
|
||
~ # devmem 0x60000008
|
||
0x00000193
|
||
~ # devmem 0x60000010
|
||
0x00000293
|
||
~ # devmem 0x60000014
|
||
0x00000313
|
||
~ # devmem 0x60000018
|
||
0x00000393
|
||
|
||
Writing test seems to work:
|
||
|
||
~ # devmem 0x60000000 32 0x11223344
|
||
~ # devmem 0x60000000
|
||
0x11223344
|
||
|
||
So, technically we should be using the `/dev/qdma34000-MM-0` device, as we
|
||
already use the other one to map the memory.
|
||
|
||
[bsc015557@fpgan02 nixos]$ for i in {0..16}; do addr=$(($i * 0x10000000)); \
|
||
printf "addr 0x%09x: " $addr; dd if=/dev/qdma34000-MM-0 count=16 bs=1 skip=$addr 2>/dev/null | xxd; done
|
||
addr 0x000000000: 00000000: 4444 4444 4444 4444 4444 4444 4444 4444 DDDDDDDDDDDDDDDD
|
||
addr 0x010000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x020000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333
|
||
addr 0x030000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................
|
||
addr 0x040000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................
|
||
addr 0x050000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x060000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x070000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x080000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333
|
||
addr 0x090000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x0a0000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x0b0000000: 00000000: 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f ////////////////
|
||
addr 0x0c0000000: 00000000: 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e nnnnnnnnnnnnnnnn
|
||
addr 0x0d0000000: 00000000: 2020 2020 2020 2020 2020 2020 2020 2020
|
||
addr 0x0e0000000: 00000000: 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c llllllllllllllll
|
||
addr 0x0f0000000: 00000000: 6767 6767 6767 6767 6767 6767 6767 6767 gggggggggggggggg
|
||
addr 0x100000000: 00000000: 2424 2424 2424 2424 2424 2424 2424 2424 $$$$$$$$$$$$$$$$
|
||
|
||
[bsc015557@fpgan02 nixos]$ for i in {0..16}; do addr=$(($i * 0x10000000)); \
|
||
printf "addr 0x%09x: " $addr; dd if=/dev/qdma34000-MM-1 count=16 bs=1 skip=$addr 2>/dev/null | xxd; done
|
||
addr 0x000000000: 00000000: 4444 4444 4444 4444 4444 4444 4444 4444 DDDDDDDDDDDDDDDD
|
||
addr 0x010000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x020000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333
|
||
addr 0x030000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................
|
||
addr 0x040000000: 00000000: cccc cccc cccc cccc cccc cccc cccc cccc ................
|
||
addr 0x050000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x060000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x070000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x080000000: 00000000: 3333 3333 3333 3333 3333 3333 3333 3333 3333333333333333
|
||
addr 0x090000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x0a0000000: 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
|
||
addr 0x0b0000000: 00000000: 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f ////////////////
|
||
addr 0x0c0000000: 00000000: 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e 6e6e nnnnnnnnnnnnnnnn
|
||
addr 0x0d0000000: 00000000: 2020 2020 2020 2020 2020 2020 2020 2020
|
||
addr 0x0e0000000: 00000000: 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c 6c6c llllllllllllllll
|
||
addr 0x0f0000000: 00000000: 6767 6767 6767 6767 6767 6767 6767 6767 gggggggggggggggg
|
||
addr 0x100000000: 00000000: 2424 2424 2424 2424 2424 2424 2424 2424 $$$$$$$$$$$$$$$$
|
||
|
||
None of the two queues seem to have the value 0x11223344 at any multiple of `0x1000_0000`.
|
||
|
||
Let's verify first that this method works. The kernel is loaded here:
|
||
|
||
[bsc015557@fpgan02 nixos]$ printf '0x%x\n' $FPGACTL_KERNEL_ADDR
|
||
0x24000000
|
||
|
||
So we should see the same values as the kernel file:
|
||
|
||
[bsc015557@fpgan02 nixos]$ dd if=kernel.bin count=16 bs=1 2>/dev/null| xxd
|
||
00000000: 6f00 400d 0000 0000 0000 2000 0000 0000 o.@....... .....
|
||
|
||
But we don't see the same:
|
||
|
||
[bsc015557@fpgan02 nixos]$ dd if=/dev/qdma34000-MM-1 count=16 bs=1 skip=$FPGACTL_KERNEL_ADDR 2>/dev/null | xxd
|
||
00000000: 9797 9797 9797 9797 9797 9797 9797 9797 ................
|
||
[bsc015557@fpgan02 nixos]$ dd if=/dev/qdma34000-MM-0 count=16 bs=1 skip=$FPGACTL_KERNEL_ADDR 2>/dev/null | xxd
|
||
00000000: 9797 9797 9797 9797 9797 9797 9797 9797 ................
|
||
|
||
<!--}}}-->
|
||
### QUESTION: Missing forward M to S via Mideleg?<!--{{{-->
|
||
|
||
Can it be happening that he MEDELEG is not forwarding the interruptions to the
|
||
Supervisor (kernel)?
|
||
|
||
Boot HART MIDELEG : 0x0000000000000222
|
||
Boot HART MEDELEG : 0x000000000000b109
|
||
|
||
<!--}}}-->
|
||
### QUESTION: Can we add a timer to the PLIC to test the interrupts?<!--{{{-->
|
||
|
||
<!--}}}-->
|
||
### OBSERVATION: Here is the PLIC register dump:<!--{{{-->
|
||
|
||
~ # for i in `seq 0 16`; do addr=$((0x40600000 + $i)); printf '%08x: ' $addr; devmem $addr; done
|
||
40600000: 0x00010002
|
||
40600001: 0x09000000
|
||
40600002: 0x00090000
|
||
40600003: 0x00000900
|
||
40600004: 0x00010009
|
||
40600005: 0x00000000
|
||
40600006: 0x00000000
|
||
40600007: 0x00000000
|
||
40600008: 0x00000000
|
||
40600009: 0x00000000
|
||
4060000a: 0x00000000
|
||
4060000b: 0x00000000
|
||
4060000c: 0x00000000
|
||
4060000d: 0x00000000
|
||
4060000e: 0x00000000
|
||
4060000f: 0x00000000
|
||
40600010: 0x00000000
|
||
<!--}}}-->
|
||
### QUESTION: Can we boot with the new bitstream that includes the second UART?<!--{{{-->
|
||
|
||
The interruptions are enabled for the UART 1, not the default UART 0.
|
||
|
||
<!--}}}-->
|
||
### OBSERVATION: I'm using 0x100 not 0x1000 in the serial range:<!--{{{-->
|
||
|
||
reg = <0x0 0x40003000 0x0 0x100>;
|
||
reg = <0x0 0x40003000 0x0 0x1000>;
|
||
|
||
Can this produce any problem?
|
||
|
||
It doesn't seem to change anything, still unable to send any bytes.
|
||
|
||
<!--}}}-->
|
||
### QUESTION: Can we use virtio to mount a FS in the DMA shared memory?<!--{{{-->
|
||
|
||
<!--}}}-->
|
||
<!--}}}-->
|
||
## 2024-07-05<!--{{{-->
|
||
|
||
### OBSERVATION: The kernel continues working when the console hangs.<!--{{{-->
|
||
|
||
Switching to 0x100000000 as 0x60000000 shows:
|
||
|
||
~ # devmem 0x6000000
|
||
0xBADCAB1E
|
||
|
||
With the following loop:
|
||
|
||
~ # i=0; while [ 1 ]; do let i=$i+1; devmem 0x100000000 32 $i; done &
|
||
~ # cat /dev/ttyS0
|
||
(hangs)
|
||
|
||
Shows the kernel works:
|
||
|
||
[bsc015557@fpgan02 nixos]$ while [ 1 ]; do xxd -s $((0x100000000 - 0x60000000)) -l 4 /dev/qdma34000-MM-1; sleep 0.2; done
|
||
...
|
||
a0000000: 6400 0000 d...
|
||
a0000000: 6500 0000 e...
|
||
a0000000: 6600 0000 f...
|
||
a0000000: 6700 0000 g...
|
||
a0000000: 6800 0000 h...
|
||
a0000000: 6900 0000 i...
|
||
|
||
<!--}}}-->
|
||
### QUESTION: Can we reproduce it with `switch_root`?<!--{{{-->
|
||
|
||
For that I would have to ensure the process continues to operate, even if we
|
||
exit the console. Maybe I can make a double fork?
|
||
|
||
I cannot use `0x1000_0000` as that is where the pmem will be. But I can try to
|
||
use an address in the end, as we are not filling the whole space.
|
||
|
||
[0x1_0000_0000, 0x1_c000_0000) -> PMEM (3072 MiB)
|
||
|
||
Maybe `0x1_bfff_0000`? Let's try first from the initrd shell.
|
||
|
||
i=0; while [ 1 ]; do let i=$i+1; devmem 0x1bfff0000 32 $i; done &
|
||
|
||
Then
|
||
|
||
while [ 1 ]; do xxd -s $((0x1bfff0000 - 0x60000000)) -l 4 /dev/qdma34000-MM-1; sleep 0.2; done
|
||
|
||
Yes, it seems to be working. Let's load the rootfs too.
|
||
|
||
I added a loop in the stage1 script.<!--}}}-->
|
||
### QUESTION: Can we see any clock in memory?<!--{{{-->
|
||
|
||
This will allow us to check if the AXI still works.
|
||
<!--}}}-->
|
||
### OBSERVATION: The kernel stops updating the counter in the mount phase.<!--{{{-->
|
||
|
||
Managed to reach the mount and hang there:
|
||
|
||
[ 337.504740] stage-1-init: [Thu Jan 1 00:05:36 UTC 1970] + '[' -d
|
||
/dev/disk/by-label/NIXOS_SD ]
|
||
[ 338.284560] stage-1-init: [Thu Jan 1 00:05:37 UTC 1970] + mkdir -m 0755 -p
|
||
/mnt-root/
|
||
[ 339.017420] stage-1-init: [Thu Jan 1 00:05:38 UTC 1970] + local 'n=0'
|
||
[ 339.752560] stage-1-init: [Thu Jan 1 00:05:39 UTC 1970] + true
|
||
[ 340.488960] stage-1-init: [Thu Jan 1 00:05:39 UTC 1970] + mount /mnt-root/
|
||
|
||
After almost 6 minutes, with 571 beats:
|
||
|
||
$ xxd -s 5905514496 -l 4 /dev/qdma34000-MM-1
|
||
5fff0000: 3b02 0000 ;...
|
||
|
||
It looks like the kernel is the one getting stuck *or* at least is unable to
|
||
propagate the heartbeat changes to the host. It would be nice to monitor a
|
||
hardware clock from the DMA region too, so we can discard problems in the AXI.
|
||
|
||
<!--}}}-->
|
||
### OBSERVATION: There is an ioctl failed for /dev/console<!--{{{-->
|
||
|
||
[ 177.009540] stage-1-init: [Thu Jan 1 00:02:56 UTC 1970] + udevadm settle
|
||
+ kbd_mode -u -C /dev/console
|
||
kbd_mode: KDSKBMODE: Inappropriate ioctl for device
|
||
+ printf '\033%%G'
|
||
+ loadkmap
|
||
[ 266.301040] stage-1-init: [Thu Jan 1 00:04:25 UTC 1970] + kbd_mode -u -C /dev/console
|
||
|
||
<!--}}}-->
|
||
### ASSUMPTION: The kernel hangs.<!--{{{-->
|
||
|
||
If the kernel hangs, there must be an instruction or sequence of instructions
|
||
that causes it. First I need to determine what is being executed by the kernel.
|
||
For that I could use `ftrace` to see which program is running at the time it
|
||
hangs.
|
||
|
||
trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M
|
||
|
||
(prev_comm != 2 && next_comm != 2)
|
||
|
||
So, we can just enable the `tp_printk` but not the tracer. Then in the initrd
|
||
script, I enable the function tracer and the filter.
|
||
|
||
<!--}}}-->
|
||
### OBSERVATION: It takes a long time to init the pty:<!--{{{-->
|
||
|
||
Interesting timing:
|
||
|
||
[ 12.612620] initcall_start: func=pty_init+0x0/0x3f4
|
||
[ 20.962640] initcall_finish: func=pty_init+0x0/0x3f4 ret=0
|
||
|
||
<!--}}}-->
|
||
### OBSERVATION: The kcompactd0 daemon is using the CPU:<!--{{{-->
|
||
|
||
[ 290.394920] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=init next_pid=69 next_prio=120
|
||
[ 290.408160] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120
|
||
[ 290.420720] sched_switch: prev_comm=tee prev_pid=68 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=120
|
||
[ 290.433960] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120
|
||
[ 290.447100] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120
|
||
[ 290.460180] sched_switch: prev_comm=kcompactd0 prev_pid=22 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120
|
||
[ 290.473400] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kworker/u2:2 next_pid=19 next_prio=120
|
||
[ 290.486960] sched_switch: prev_comm=kworker/u2:2 prev_pid=19 prev_prio=120 prev_state=R ==> next_comm=khungtaskd next_pid=18 next_prio=120
|
||
[ 290.500800] sched_switch: prev_comm=khungtaskd prev_pid=18 prev_prio=120 prev_state=R ==> next_comm=kworker/0:1 next_pid=13 next_prio=120
|
||
[ 290.514560] sched_switch: prev_comm=kworker/0:1 prev_pid=13 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120
|
||
[ 290.527720] sched_switch: prev_comm=tee prev_pid=68 prev_prio=120 prev_state=R+ ==> next_comm=init next_pid=69 next_prio=120
|
||
[ 290.540360] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=devmem next_pid=129 next_prio=120
|
||
[ 290.553280] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=120
|
||
[ 290.566780] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120
|
||
[ 290.580500] sched_switch: prev_comm=kcompactd0 prev_pid=22 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120
|
||
[ 290.593740] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120
|
||
[ 290.606340] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kworker/u2:2 next_pid=19 next_prio=120
|
||
[ 290.619780] sched_switch: prev_comm=kworker/u2:2 prev_pid=19 prev_prio=120 prev_state=R ==> next_comm=khungtaskd next_pid=18 next_prio=120
|
||
[ 290.633620] sched_switch: prev_comm=khungtaskd prev_pid=18 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120
|
||
[ 290.646700] sched_switch: prev_comm=tee prev_pid=68 prev_prio=120 prev_state=R+ ==> next_comm=init next_pid=69 next_prio=120
|
||
[ 290.659320] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=kworker/0:1 next_pid=13 next_prio=120
|
||
[ 290.672560] sched_switch: prev_comm=kworker/0:1 prev_pid=13 prev_prio=120 prev_state=R ==> next_comm=devmem next_pid=129 next_prio=120
|
||
[ 290.686080] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=120
|
||
[ 290.699720] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120
|
||
[ 290.712880] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120
|
||
[ 290.725500] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120
|
||
|
||
<!--}}}-->
|
||
### QUESTION: Can we reproduce this hang with 6.9.7?<!--{{{-->
|
||
|
||
Disabling clang as it is failing to build:
|
||
|
||
hut% nix develop '.#lagarto-ox'
|
||
error: builder for '/nix/store/x1nfa792pv28px70kvfakm3aalcfbdyw-clang-epi-479518d.drv' failed with exit code 2;
|
||
last 10 log lines:
|
||
> | ^~~~~~~~~~~~~~~
|
||
> 2 errors generated.
|
||
> make[2]: *** [lib/Support/CMakeFiles/LLVMSupport.dir/build.make:1868: lib/Support/CMakeFiles/LLVMSupport.dir/Signals.cpp.o] Error 1
|
||
> make[2]: *** Waiting for unfinished jobs....
|
||
> [ 9%] Built target obj.clang-tblgen
|
||
> 1 warning generated.
|
||
> make[1]: *** [CMakeFiles/Makefile2:9468: lib/Support/CMakeFiles/LLVMSupport.dir/all] Error 2
|
||
> make[1]: *** Waiting for unfinished jobs....
|
||
> [ 9%] Built target obj.llvm-tblgen
|
||
> make: *** [Makefile:156: all] Error 2
|
||
For full logs, run 'nix log /nix/store/x1nfa792pv28px70kvfakm3aalcfbdyw-clang-epi-479518d.drv'.
|
||
error: 1 dependencies of derivation '/nix/store/m54sxxyi3cg062djrcddcawp10z7r49l-riscv64-unknown-linux-gnu-clang-epi-wrapper-479518d.drv' failed to build
|
||
error: 1 dependencies of derivation '/nix/store/yl26dbqqj0snl807c0wjabg4dpbq5gvp-stdenv-linux.drv' failed to build
|
||
error: 1 dependencies of derivation '/nix/store/h6180fcl30kqy3apaqjsbkkik2p1spmr-rvb-riscv64-unknown-linux-gnu-da202d6.drv' failed to build
|
||
error: 1 dependencies of derivation '/nix/store/b13shgqj7128rdsdzzp4qicqbzl0wnfw-system-path.drv' failed to build
|
||
error: 1 dependencies of derivation '/nix/store/6qghlihqcyg6155309ldj5xm9m0v835i-nixos-system-nixos-riscv-24.11pre-git.drv' failed to build
|
||
error: 1 dependencies of derivation '/nix/store/l2x18cih29r1kn6vi8imwhkyk98yhw4i-nix-shell-riscv64-unknown-linux-gnu-env.drv' failed to build
|
||
|
||
<!--}}}-->
|
||
### QUESTION: Missing cache information may affect?<!--{{{-->
|
||
|
||
Other CPUs report the cache details in the DT. For example this one
|
||
https://github.com/torvalds/linux/blob/master/arch/riscv/boot/dts/sifive/fu540-c000.dtsi#L45
|
||
|
||
cpu1: cpu@1 {
|
||
compatible = "sifive,u54-mc", "sifive,rocket0", "riscv";
|
||
d-cache-block-size = <64>;
|
||
d-cache-sets = <64>;
|
||
d-cache-size = <32768>;
|
||
d-tlb-sets = <1>;
|
||
d-tlb-size = <32>;
|
||
device_type = "cpu";
|
||
i-cache-block-size = <64>;
|
||
i-cache-sets = <64>;
|
||
i-cache-size = <32768>;
|
||
i-tlb-sets = <1>;
|
||
i-tlb-size = <32>;
|
||
mmu-type = "riscv,sv39";
|
||
reg = <1>;
|
||
riscv,isa = "rv64imafdc";
|
||
riscv,isa-base = "rv64i";
|
||
riscv,isa-extensions = "i", "m", "a", "f", "d", "c", "zicntr", "zicsr",
|
||
"zifencei", "zihpm";
|
||
tlb-split;
|
||
next-level-cache = <&l2cache>;
|
||
cpu1_intc: interrupt-controller {
|
||
#interrupt-cells = <1>;
|
||
compatible = "riscv,cpu-intc";
|
||
interrupt-controller;
|
||
};
|
||
};
|
||
|
||
We may want to add it to our DT to be sure that it has no effect.<!--}}}-->
|
||
### OBSERVATION: Arrived to stage 2!<!--{{{-->
|
||
|
||
+ kill -9 74
|
||
+ readlink /proc/75/exe
|
||
[ 374.961120] stage-1-init: [Thu Jan 1 00:06:14 UTC 1970] + test 0 -ge 8
|
||
+ '[' 75 -eq 1 ]
|
||
+ kill -9 75
|
||
+ readlink /proc/102/exe
|
||
+ '[' 102 -eq 1 ]
|
||
+ kill -9 102
|
||
+ readlink /proc/137/exe
|
||
+ continue
|
||
+ readlink /proc/674/exe
|
||
+ continue
|
||
+ readlink /proc/675/exe
|
||
+ continue
|
||
+ test -n
|
||
+ echo /sbin/modprobe
|
||
+ '[' '!' -e /mnt-root//nix/store/xwqaqpc66ijvay9wxnm5nqmi30f2lp1i-nixos-system-nixos-riscv-24.11pre-git/init ]
|
||
+ mkdir -m 0755 -p /mnt-root/proc /mnt-root/sys /mnt-root/dev /mnt-root/run
|
||
+ mount --move /proc /mnt-root/proc
|
||
+ mount --move /sys /mnt-root/sys
|
||
+ mount --move /dev /mnt-root/dev
|
||
+ mount --move /run /mnt-root/run
|
||
+ type -P switch_root
|
||
+ exec env -i /nix/store/988axh0bq3wqp90gms4b4a0hkfwvkd3i-extra-utils/bin/switch_root /mnt-root /nix/store/xwqaqpc66ijvay9wxnm5nqmi30f2lp1i-nixos-system-nixos-riscv-24.11pre-git/init
|
||
|
||
<<< NixOS Stage 2 >>>
|
||
|
||
[ 384.203680] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
|
||
[ 384.287600] booting system configuration /nix/store/xwqaqpc66ijvay9wxnm5nqmi30f2lp1i-nixos-system-nixos-riscv-24.11pre-git
|
||
running activation script...
|
||
[ 388.163860] stage-2-init: running activation script...
|
||
[ 391.643500] random: perl: uninitialized urandom read (4 bytes read)
|
||
[ 391.884800] random: perl: uninitialized urandom read (4 bytes read)
|
||
[ 425.302000] random: perl: uninitialized urandom read (4 bytes read)
|
||
|
||
But then it hangs.
|
||
<!--}}}-->
|
||
<!--}}}-->
|
||
## 2024-07-08
|
||
|
||
### QUESTION: Who sets the plic interrupts?<!--{{{-->
|
||
|
||
Shouldn't OpenSBI read the DT and do some configuration in the PLIC while in
|
||
machine mode?
|
||
|
||
<!--}}}-->
|
||
### OBSERVATION: Semi-stack trace from CincoRanch<!--{{{-->
|
||
|
||
hvc_remove?
|
||
console_unlock <-- only called from hvc_remove()
|
||
prb_read_valid
|
||
desc_read_finalized_seq
|
||
__memcpy (multiple times)
|
||
get_data
|
||
do_trap_break
|
||
report_bug
|
||
is_valid_bugaddr
|
||
copy_from_kernel_nofault
|
||
copy_from_kernel_nofault_allowed
|
||
find_bug
|
||
_printk
|
||
vprintk
|
||
vprintk_default
|
||
vprintk_emit
|
||
vprintk_store
|
||
sched_clock
|
||
vsnprintf
|
||
format_decode
|
||
__memcpy
|
||
printk_parse_prefix
|
||
prb_reserve
|
||
do_page_fault
|
||
fixup_exception
|
||
search_exception_tables
|
||
search_extable
|
||
cmp_ex_search (multiple times)
|
||
search_module_extables
|
||
__module_address
|
||
no_context.part.0
|
||
die_kernel_fault <-- last frame(?)
|
||
|
||
<!--}}}-->
|
||
### QUESTION: Can we place a trace point in `hvc_remove`?<!--{{{-->
|
||
|
||
If we are getting stuck in the same place, we should be able to see the
|
||
backtrace (assuming the console still works) just before we try to remove the
|
||
console device.
|
||
|
||
Placed, but still unable to see anything in any hang. Here is a hang in the
|
||
Stage 2:
|
||
|
||
<<< NixOS Stage 2 >>>
|
||
|
||
[ 404.158340] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
|
||
[ 404.242500] booting system configuration /nix/store/0za1vqh5alk7mxqs59qxx8izmwmf21w6-nixos-system-nixos-riscv-24.11pre-git
|
||
running activation script...
|
||
[ 408.148380] stage-2-init: running activation script...
|
||
[ 411.612240] random: perl: uninitialized urandom read (4 bytes read)
|
||
[ 411.866440] random: perl: uninitialized urandom read (4 bytes read)
|
||
[ 447.588880] random: perl: uninitialized urandom read (4 bytes read)
|
||
|
||
Still, it may be hang in a similar way, causing a loop of page faults just
|
||
while trying to printk to the console, which would explain why we don't see
|
||
anything and why the heartbeat stops.
|
||
|
||
<!--}}}-->
|
||
|
||
Fran has created another bitstream with two consoles enabled
|
||
(`ox_u55c_a234c132.bit`), let see if I manage to boot with it.
|
||
|
||
First I will need to enable the consoles on the DTS.
|
||
|
||
It doesn't seem to produce any output in the UART. I cannot see OpenSBI while
|
||
loading it on each baud rate:
|
||
|
||
Type [C-a] [C-h] to see available commands
|
||
Terminal ready
|
||
|
||
*** baud: 57600 ***
|
||
|
||
*** baud: 38400 ***
|
||
|
||
*** baud: 19200 ***
|
||
|
||
*** baud: 9600 ***
|
||
|
||
[...]
|
||
|
||
*** baud: 230400 ***
|
||
|
||
*** baud: 460800 ***
|
||
|
||
Let's keep the bitstream files in a repository, so I can carefully track them
|
||
with git too.
|
||
|
||
### OBSERVATION: The new bitstream requires a bootrom to start
|
||
|
||
I added it to the bitstream repository, as it is a binary blob too. Now I need
|
||
to update the load addresses:
|
||
|
||
https://gitlab.bsc.es/hwdesign/fpga/integration-lab/fpga-tools/-/blob/6a63bcea6d1d59df3c7d62311aa4935efd54d3a3/boot_riscv/boot_sa.sh#L36-40
|
||
|
||
Continues to hang just after those perl messages:
|
||
|
||
+ mount --move /proc /mnt-root/proc
|
||
+ mount --move /sys /mnt-root/sys
|
||
+ mount --move /dev /mnt-root/dev
|
||
+ mount --move /run /mnt-root/run
|
||
+ type -P switch_root
|
||
+ exec env -i /nix/store/988axh0bq3wqp90gms4b4a0hkfwvkd3i-extra-utils/bin/switch_root /mnt-root /nix/store/0za1vqh5alk7mxqs59qxx8izmwmf21w6-nixos-system-nixos-riscv-24.11pre-git/init
|
||
|
||
<<< NixOS Stage 2 >>>
|
||
|
||
[ 541.559320] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
|
||
[ 541.641280] booting system configuration /nix/store/0za1vqh5alk7mxqs59qxx8izmwmf21w6-nixos-system-nixos-riscv-24.11pre-git
|
||
running activation script...
|
||
[ 545.569700] stage-2-init: running activation script...
|
||
[ 549.019380] random: perl: uninitialized urandom read (4 bytes read)
|
||
[ 549.274940] random: perl: uninitialized urandom read (4 bytes read)
|
||
|
||
### QUESTION: What happens if we disable `CONFIG_BUG`
|
||
|
||
May be a long shot, but if we are experiencing the same page fault loop as in
|
||
cincoranch we may as well try.
|
||
|
||
## 2024-07-09
|
||
|
||
### QUESTION: Maybe we can try without out-of-order?
|
||
|
||
I made a small tool in C to view and change the CSR register that controls the
|
||
in-order/out-of-order. Maybe we can try with the "in-order" setting.
|
||
|
||
We arrive to execute `systemd`:
|
||
|
||
|
||
Starting interactive shell...
|
||
+ setsid /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/ash -c 'exec /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/ash < /dev/hvc0 >/dev/hvc0 2>/dev/hvc0'
|
||
[ 90.077300] stage-1-init: [Thu Jan 1 00:01:27 UTC 1970] + '[' -n 1 -a i '=' f ]
|
||
[ 90.639760] stage-1-init: [Thu Jan 1 00:01:28 UTC 1970] + '[' -n 1 -a i '=' i ]
|
||
~ # [ 90.967260] stage-1-init: [Thu Jan 1 00:01:28 UTC 1970] + echo 'Starting interactive shell...'
|
||
[ 91.234980] stage-1-init: [Thu Jan 1 00:01:28 UTC 1970] Starting interactive shell...
|
||
[ 91.569580] stage-1-init: [Thu Jan 1 00:01:29 UTC 1970] + setsid /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/ash -c 'exec /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extr
|
||
a-utils/bin/ash < /dev/hvc0 >/dev/hvc0 2>/dev/hvc0'
|
||
which csrtool
|
||
/nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/csrtool
|
||
~ # csrtool
|
||
CSR 0x801 = 0u
|
||
~ # csrtool o
|
||
unknown 'o', use: mem-in-order, all-in-order or all-out-of-order
|
||
~ # csrtool all-in-order
|
||
CSR 0x801 = 7u
|
||
~ # csrtool
|
||
CSR 0x801 = 7u
|
||
~ #
|
||
+ IFS='='
|
||
+ echo init /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init
|
||
+ set -- init /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init
|
||
+ stage2Init=/nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init
|
||
+ echo /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/modprobe
|
||
+ basename dm_mod
|
||
[...]
|
||
+ echo /sbin/modprobe
|
||
+ '[' '!' -e /mnt-root//nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init ]
|
||
+ mkdir -m 0755 -p /mnt-root/proc /mnt-root/sys /mnt-root/dev /mnt-root/run
|
||
+ mount --move /proc /mnt-root/proc
|
||
+ mount --move /sys /mnt-root/sys
|
||
+ mount --move /dev /mnt-root/dev
|
||
+ mount --move /run /mnt-root/run
|
||
+ type -P switch_root
|
||
+ exec env -i /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/switch_root /mnt-root /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init
|
||
|
||
<<< NixOS Stage 2 >>>
|
||
|
||
[ 967.703320] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
|
||
[ 967.928020] booting system configuration /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git
|
||
running activation script...
|
||
[ 977.608980] stage-2-init: running activation script...
|
||
bbbbbbbbbbbbbbbbbbbbsetting up /etc...
|
||
[ 1084.376420] stage-2-init: setting up /etc...
|
||
starting systemd...
|
||
|
||
Not sure if it is a good reproducer, as it is taking around 15 minutes to hang
|
||
in a very large piece of software, while when we have the out-of-order enabled,
|
||
we can hang in half of the time in some script.
|
||
|
||
Either way, we need to see a backtrace of where it is hanging to understand why
|
||
it does. We may also enable a stage-2 heartbeat to be sure that it is hanging
|
||
the kernel and not only the console.
|
||
|
||
Another idea is to arrive at a proper bash shell, where we can have debugging
|
||
tools, which may allow us to go slowly until we catch the bug.
|
||
|
||
### OBSERVATION: Setting memory in-order only causes a hang
|
||
|
||
Tested with:
|
||
|
||
$ csrtool mem-in-order
|
||
|
||
And it hangs just after exiting the tool.
|
||
|
||
### QUESTION: Can we see the printk buffer from the host?
|
||
|
||
If the problem that we are observing is somehow related to the recursive
|
||
segfault of the kernel in Cincoranch, we may be able to see the printk ring
|
||
buffer by directly poking at the memory from the host.
|
||
|
||
### QUESTION: Can we crash the CPU by exercising the memory?
|
||
|
||
I did a small tool `memtool` that performs allocations and
|
||
deallocations.
|
||
|
||
Good news, the tool has hang the console (potentially the kernel too).
|
||
|
||
~ # which memtool
|
||
/nix/store/amj11aclwx62d4mnvkhdgj19kq5gjb9y-extra-utils/bin/memtool
|
||
~ # memtool
|
||
iter 0, nblocks 1, nbytes 0.1M (A)
|
||
iter 1, nblocks 2, nbytes 3.5M (A)
|
||
iter 2, nblocks 3, nbytes 3.8M (A)
|
||
iter 3, nblocks 4, nbytes 5.7M (A)
|
||
iter 4, nblocks 5, nbytes 6.8M (A)
|
||
iter 5, nblocks 6, nbytes 10.7M (A)
|
||
iter 6, nblocks 5, nbytes 10.6M (D)
|
||
iter 7, nblocks 6, nbytes 13.9M (A)
|
||
iter 8, nblocks 7, nbytes 16.4M (A)
|
||
iter 9, nblocks 8, nbytes 19.8M (A)
|
||
iter 10, nblocks 9, nbytes 21.2M (A)
|
||
iter 11, nblocks 10, nbytes 24.3M (A)
|
||
iter 12, nblocks 11, nbytes 27.4M (A)
|
||
iter 13, nblocks 12, nbytes 28.3M (A)
|
||
iter 14, nblocks 13, nbytes 31.6M (A)
|
||
iter 15, nblocks 12, nbytes 28.2M (D)
|
||
iter 16, nblocks 13, nbytes 29.2M (A)
|
||
iter 17, nblocks 14, nbytes 30.8M (A)
|
||
iter 18, nblocks 15, nbytes 32.6M (A)
|
||
iter 19, nblocks 16, nbytes 32.8M (A)
|
||
iter 20, nblocks 17, nbytes 36.8M (A)
|
||
iter 21, nblocks 18, nbytes 39.6M (A)
|
||
iter 22, nblocks 19, nbytes 41.1M (A)
|
||
iter 23, nblocks 20, nbytes 44.1M (A)
|
||
iter 24, nblocks 21, nbytes 46.9M (A)
|
||
iter 25, nblocks 20, nbytes 46.5M (D)
|
||
iter 26, nblocks 21, nbytes 50.2M (A)
|
||
iter 27, nblocks 22, nbytes 53.8M (A)
|
||
|
||
Let's see if we can reproduce it again in the same position.
|
||
|
||
~ # memtool
|
||
iter 0, nblocks 1, nbytes 0.1M (A)
|
||
iter 1, nblocks 2, nbytes 3.5M (A)
|
||
iter 2, nblocks 3, nbytes 3.8M (A)
|
||
iter 3, nblocks 4, nbytes 5.7M (A)
|
||
iter 4, nblocks 5, nbytes 6.8M (A)
|
||
iter 5, nblocks 6, nbytes 10.7M (A)
|
||
iter 6, nblocks 5, nbytes 10.6M (D)
|
||
iter 7, nblocks 6, nbytes 13.9M (A)
|
||
iter 8, nblocks 7, nbytes 16.4M (A)
|
||
iter 9, nblocks 8, nbytes 19.8M (A)
|
||
iter 10, nblocks 9, nbytes 21.2M (A)
|
||
iter 11, nblocks 10, nbytes 24.3M (A)
|
||
iter 12, nblocks 11, nbytes 27.4M (A)
|
||
iter 13, nblocks 12, nbytes 28.3M (A)
|
||
iter 14, nblocks 13, nbytes 31.6M (A)
|
||
iter 15, nblocks 12, nbytes 28.2M (D)
|
||
iter 16, nblocks 13, nbytes 29.2M (A)
|
||
iter 17, nblocks 14, nbytes 30.8M (A)
|
||
iter 18, nblocks 15, nbytes 32.6M (A)
|
||
iter 19, nblocks 16, nbytes 32.8M (A)
|
||
iter 20, nblocks 17, nbytes 36.8M (A)
|
||
iter 21, nblocks 18, nbytes 39.6M (A)
|
||
iter 22, nblocks 19, nbytes 41.1M (A)
|
||
iter 23, nblocks 20, nbytes 44.1M (A)
|
||
iter 24, nblocks 21, nbytes 46.9M (A)
|
||
iter 25, nblocks 20, nbytes 46.5M (D)
|
||
iter 26, nblocks 21, nbytes 50.2M (A)
|
||
iter 27, nblocks 22, nbytes 53.8M (A)
|
||
|
||
Let's make it automatic, so we only need to boot and confirm that it
|
||
hangs. Just in case we can make it not hang by a miracle.
|
||
|
||
|
||
With blocks of up to 64K we hang in the ~50 M region.
|
||
|
||
iter=2042 nblocks=1577 allocated=50458K (A)
|
||
iter=2043 nblocks=1578 allocated=50489K (A)
|
||
iter=2044 nblocks=1579 allocated=50550K (A)
|
||
iter=2045 nblocks=1580 allocated=50605K (A)
|
||
|
||
With blocks of maxsize=512K it also hangs around ~57M.
|
||
|
||
memtool v1.0.0 maxsize=512K
|
||
iter=0 nblocks=1 allocated=88K (A)
|
||
iter=1 nblocks=2 allocated=464K (A)
|
||
...
|
||
iter=275 nblocks=218 allocated=56674K (A)
|
||
iter=276 nblocks=219 allocated=56787K (A)
|
||
iter=277 nblocks=220 allocated=57252K (A)
|
||
iter=278 nblocks=221 allocated=57493K (A)
|
||
iter=279 nblocks=222 allocated=57581K (A)
|
||
iter=280 nblocks=221 allocated=57416K (D)
|
||
iter=281 nblocks=222 allocated=57521K (A)
|
||
|
||
Maybe there is a problem in the memory segment? Can we reduce it to 1 GiB only
|
||
and see if it has any effect?
|
||
|
||
iter=289 nblocks=228 allocated=58636K (A)
|
||
iter=290 nblocks=227 allocated=58412K (D)
|
||
iter=291 nblocks=228 allocated=58480K (A)
|
||
iter=292 nblocks=229 allocated=58599K (A)
|
||
|
||
Has changed, but not much.
|
||
|
||
aaaiter=291 nblocks=228 allocated=58480K (A)
|
||
allocating...
|
||
filling...
|
||
aaaaaaaaiter=292 nblocks=229 allocated=58599K (A)
|
||
allocating...
|
||
aafilling...
|
||
|
||
It seems to be getting stuck in the filling phase. Can we trace it down with
|
||
ftrace? It should be generating page faults.
|
||
|
||
### 2024-07-10
|
||
|
||
So, if we manage to crash in the filling phase, we can further pinpoint the
|
||
issue and remove any effect of `malloc()`. It would be only related to a page
|
||
fault and the MMU at this point.
|
||
|
||
Let's make a much simpler program that only allocates once a buffer of N bytes
|
||
and then begins filling it, printing the progress in the output.
|
||
|
||
### OBSERVATION: Writing to a vector also stops around 58 MiB
|
||
|
||
+ memtool fill 536870912
|
||
memtool v0.0.1 - Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
|
||
mode fill: nbytes=512M, n=134217728
|
||
written=0K, addr=0x3f9b800020 OK
|
||
written=4096K, addr=0x3f9bc00020 OK
|
||
written=8192K, addr=0x3f9c000020 OK
|
||
written=12288K, addr=0x3f9c400020 OK
|
||
written=16384K, addr=0x3f9c800020 OK
|
||
written=20480K, addr=0x3f9cc00020 OK
|
||
written=24576K, addr=0x3f9d000020 OK
|
||
written=28672K, addr=0x3f9d400020 OK
|
||
written=32768K, addr=0x3f9d800020 OK
|
||
written=36864K, addr=0x3f9dc00020 OK
|
||
written=40960K, addr=0x3f9e000020 OK
|
||
written=45056K, addr=0x3f9e400020 OK
|
||
written=49152K, addr=0x3f9e800020 OK
|
||
written=53248K, addr=0x3f9ec00020 OK
|
||
written=57344K, addr=0x3f9f000020 OK
|
||
|
||
Let's see if we can trace the page fault.
|
||
|
||
In today's meeting, Jonnatan suggests test the memtool program with all-in-order
|
||
configuration. I should also try to reproduce this hang with a "production"
|
||
bitstream (from master).
|
||
|
||
Also, Xavi reports that the memtool chain test continued to run until the memory
|
||
was exhausted using an old bistream.
|
||
|
||
Let's do the quick CSR test first, and then we go back to the ftrace testing,
|
||
which will take more time.
|
||
|
||
Here are the commands I was testing, but nothing comes out of the console, even
|
||
after booting with the `tp_printk trace_buf_size=1M` boot options:
|
||
|
||
mkdir -p /sys/kernel/debug/
|
||
mount -t debugfs none /sys/kernel/debug/
|
||
td=/sys/kernel/debug/tracing
|
||
echo nop > $td/current_tracer
|
||
echo 100 > $td/max_graph_depth
|
||
echo do_page_fault > $td/set_graph_function
|
||
echo function_graph > $td/current_tracer
|
||
|
||
### OBSERVATION: Using all-in-order causes the hang in the same place
|
||
|
||
After setting the CSR 0x801 register to 0x7, the all-in-order configuration, the
|
||
memtool fill tests continues to hang in the same position:
|
||
|
||
+ csrtool all-in-order
|
||
CSR 0x801 = 7u
|
||
+ memtool fill 536870912
|
||
memtool v0.0.1 - Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
|
||
mode fill: nbytes=512M, n=134217728
|
||
written=0K, addr=0x3f8d600020 OK
|
||
written=4096K, addr=0x3f8da00020 OK
|
||
written=8192K, addr=0x3f8de00020 OK
|
||
written=12288K, addr=0x3f8e200020 OK
|
||
written=16384K, addr=0x3f8e600020 OK
|
||
written=20480K, addr=0x3f8ea00020 OK
|
||
written=24576K, addr=0x3f8ee00020 OK
|
||
written=28672K, addr=0x3f8f200020 OK
|
||
written=32768K, addr=0x3f8f600020 OK
|
||
written=36864K, addr=0x3f8fa00020 OK
|
||
written=40960K, addr=0x3f8fe00020 OK
|
||
written=45056K, addr=0x3f90200020 OK
|
||
written=49152K, addr=0x3f90600020 OK
|
||
written=53248K, addr=0x3f90a00020 OK
|
||
written=57344K, addr=0x3f90e00020 OK
|
||
|
||
It doesn't seem to have any observable effect with this test, other than going
|
||
more slow.
|
||
|
||
Interesting [article](https://wiki.osdev.org/RISC-V_Bare_Bones) on how to write
|
||
a simple bootrom that outputs some ASCII text into the console.
|
||
|
||
Regarding the `do_page_fault` filter, it seems that is no available. I need to
|
||
pick one from `available_filter_functions` instead. Let's try `handle_page_fault`.
|
||
|
||
It still doesn't seem to appear in the console. It is visible with a SysRq
|
||
request Ctrl+O then 'z'.
|
||
|
||
Maybe I can try disabling the huge pages, just to discard that it may be
|
||
related to it: `CONFIG_HUGETLBFS`.
|
||
|
||
Same hang with huge pages disabled, but a bit further ~70 MB.
|
||
|
||
### OBSERVATION: The Linux memtest fails in the first round
|
||
|
||
[ 0.000000] Linux version 6.9.7 (nixbld@localhost) (riscv64-unknown-linux-gnu-gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.41) #1-NixOS Thu Jun 27 11:52:32 UTC 2024
|
||
[ 0.000000] Machine model: Barcelona Supercomputing Center - Lagarto Ox (NixOS)
|
||
[ 0.000000] SBI specification v2.0 detected
|
||
[ 0.000000] SBI implementation ID=0x1 Version=0x10004
|
||
[ 0.000000] SBI TIME extension detected
|
||
[ 0.000000] SBI IPI extension detected
|
||
[ 0.000000] SBI RFENCE extension detected
|
||
[ 0.000000] SBI DBCN extension detected
|
||
[ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
|
||
[ 0.000000] printk: legacy bootconsole [sbi0] enabled
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000060000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@60000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000060000000..0x000000006fffffff (262144 KiB) map non-reusable dma_pool@60000000
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000070000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@70000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000070000000..0x000000007fffffff (262144 KiB) map non-reusable dma_pool@70000000
|
||
[ 0.000000] cma: Reserved 16 MiB at 0x00000000bf000000 on node -1
|
||
[ 0.000000] early_memtest: # of tests: 3
|
||
[ 0.000000] 0x0000000080000000 - 0x0000000080013000 pattern 5555555555555555
|
||
|
||
This is not suggesting that the problem is not in the virtual memory, but with
|
||
the actual physical memory.
|
||
|
||
I will try a similar test with uboot with the `mtest` command, but requires
|
||
enabling it first.
|
||
|
||
### OBSERVATION: Memory in the 0xb0000000..0xc0000000 range is bad
|
||
|
||
Reproduced from U-Boot:
|
||
|
||
=> mtest 0x80000000 0x90000000 0 2
|
||
Testing 80000000 ... 90000000:
|
||
Pattern FFFFFFFFFFFFFFFF Writing... Reading...Iteration: 2
|
||
Tested 2 iteration(s) with 0 errors.
|
||
=> mtest 0x90000000 0xa0000000 0 2
|
||
Testing 90000000 ... a0000000:
|
||
Pattern FFFFFFFFFFFFFFFF Writing... Reading...Iteration: 2
|
||
Tested 2 iteration(s) with 0 errors.
|
||
=> mtest 0xa0000000 0xb0000000 0 2
|
||
Testing a0000000 ... b0000000:
|
||
Pattern FFFFFFFFFFFFFFFF Writing... Reading...Iteration: 2
|
||
Tested 2 iteration(s) with 0 errors.
|
||
=> mtest 0xb0000000 0xc0000000 0 2
|
||
Testing b0000000 ... c0000000:
|
||
Pattern 0000000000000000 Writing...
|
||
|
||
Let's see if we can fix the boot hang by reducing the memory enough to avoid
|
||
this bad region.
|
||
|
||
## 2024-07-11
|
||
|
||
### OBSERVATION: U-Boot mtest hangs in the last 256 MiB
|
||
|
||
After reducing the size of the RAM segment, I run again the mtest, but this time
|
||
it fails in the last 256 MiB block.
|
||
|
||
I assume that U-Boot moves itself to the last part of the memory, and them mtest
|
||
overwrites the U-Boot code, causing a hang.
|
||
|
||
So, I simply changed the FDT from U-Boot to skip the first 2M:
|
||
|
||
fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x40000000>
|
||
|
||
And then I enabled the memtest in the kernel boot parameters:
|
||
|
||
=> fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x30000000>
|
||
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 boot.trace boot.tracedebug init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-riscv-24.11pre-git/init"
|
||
=> setenv ramdisk_size 12611657
|
||
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 boot.trace boot.tracedebug memtest=3 init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-risc=> 4.11pre-git/init"
|
||
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||
Moving Image from 0x84000000 to 0x80200000, end=8303c4d0
|
||
## Flattened Device Tree blob at 80013000
|
||
Booting using the fdt blob at 0x80013000
|
||
Working FDT set to 80013000
|
||
Using Device Tree in place at 0000000080013000, end 000000008001696f
|
||
Working FDT set to 80013000
|
||
|
||
Starting kernel ...
|
||
|
||
[ 0.000000] Linux version 6.9.7 (nixbld@localhost) (riscv64-unknown-linux-gnu-gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.41) #1-NixOS Thu Jun 27 11:52:32 UTC 2024
|
||
[ 0.000000] Machine model: Barcelona Supercomputing Center - Lagarto Ox (NixOS)
|
||
[ 0.000000] SBI specification v2.0 detected
|
||
[ 0.000000] SBI implementation ID=0x1 Version=0x10004
|
||
[ 0.000000] SBI TIME extension detected
|
||
[ 0.000000] SBI IPI extension detected
|
||
[ 0.000000] SBI RFENCE extension detected
|
||
[ 0.000000] SBI DBCN extension detected
|
||
[ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
|
||
[ 0.000000] printk: legacy bootconsole [sbi0] enabled
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000060000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@60000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000060000000..0x000000006fffffff (262144 KiB) map non-reusable dma_pool@60000000
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000070000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@70000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000070000000..0x000000007fffffff (262144 KiB) map non-reusable dma_pool@70000000
|
||
[ 0.000000] cma: Reserved 16 MiB at 0x00000000af000000 on node -1
|
||
[ 0.000000] early_memtest: # of tests: 3
|
||
[ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern 5555555555555555
|
||
[ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern 5555555555555555
|
||
[ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern 5555555555555555
|
||
[ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern 5555555555555555
|
||
[ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern 5555555555555555
|
||
[ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern ffffffffffffffff
|
||
[ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern ffffffffffffffff
|
||
[ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern ffffffffffffffff
|
||
[ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern ffffffffffffffff
|
||
[ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern ffffffffffffffff
|
||
[ 0.000000] 0x0000000083200000 - 0x000000008c300000 pattern 0000000000000000
|
||
[ 0.000000] 0x000000009e912000 - 0x00000000aeff9308 pattern 0000000000000000
|
||
[ 0.000000] 0x00000000aeff9337 - 0x00000000aeff9338 pattern 0000000000000000
|
||
[ 0.000000] 0x00000000aeff9367 - 0x00000000aeff9368 pattern 0000000000000000
|
||
[ 0.000000] 0x00000000aeffcffc - 0x00000000aeffd000 pattern 0000000000000000
|
||
[ 0.000000] Zone ranges:
|
||
[ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000b01fffff]
|
||
[ 0.000000] Normal empty
|
||
[ 0.000000] Movable zone start for each node
|
||
[ 0.000000] Early memory node ranges
|
||
[ 0.000000] node 0: [mem 0x0000000080200000-0x00000000b01fffff]
|
||
|
||
That seems to pass the memtest fine, however the boot process hangs in different
|
||
stages.
|
||
|
||
### OBSERVATION: Cannot open /dev/ttyS0
|
||
|
||
~ # setserial -g /dev/ttyS1 -a
|
||
/dev/ttyS1, Line 1, UART: 16550, Port: 0x0000, IRQ: 1
|
||
Baud_base: 3125000, close_delay: 50, divisor: 0
|
||
closing_wait: 3000
|
||
Flags: spd_normal
|
||
|
||
~ # setserial -g /dev/ttyS0 -a
|
||
(hangs)
|
||
|
||
This page seems to have good resources on the serial console:
|
||
|
||
https://tldp.org/HOWTO/Serial-HOWTO-16.html
|
||
|
||
It seems that there are some differences in the way the serial port is handled
|
||
regarding 16550 and 16550A.
|
||
|
||
I can write to the UART console from U-Boot by directly writing in the
|
||
0x40001000 address (A = 0x41):
|
||
|
||
=> help mw
|
||
mw - memory write (fill)
|
||
|
||
Usage:
|
||
mw [.b, .w, .l, .q] address value [count]
|
||
=> mw 0x40001000 0x41
|
||
A=> mw 0x40001000 0x42
|
||
B=> mw 0x40001000 0x43
|
||
C=>
|
||
|
||
### OBSERVATION: I can type with the ttyS0 8250 driver
|
||
|
||
Tried to boot too, but hangs:
|
||
|
||
=> fdt set /memory@80000000 reg <0x0 0x80200000 0x0 0x30000000>
|
||
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=ttyS0,115200n8 boot.trace boot.tracedebug init=/nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-
|
||
system-nixos-riscv-24.11pre-git/init"
|
||
=> setenv ramdisk_size 12611657
|
||
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||
...
|
||
[ 30.740740] riscv-plic 40800000.plic: mapped 3 interrupts with 1 handlers for 2 contexts.
|
||
[ 40.048300] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
|
||
[ 40.420000] of_serial 40001000.serial: error -ENXIO: IRQ index 0 not found
|
||
[ 40.496720] printk: legacy console [ttyS0] disabled
|
||
[ 40.558860] 40001000.serial: ttyS0 at MMIO 0x40001000 (irq = 0, base_baud = 3125000) is a 16550
|
||
[ 40.583940] printk: legacy console [ttyS0] enabled
|
||
[ 40.583940] printk: legacy console [ttyS0] enabled
|
||
[ 40.595760] printk: legacy bootconsole [sbi0] disabled
|
||
[ 40.595760] printk: legacy bootconsole [sbi0] disabled
|
||
[ 40.820380] 40003000.serial: ttyS1 at MMIO 0x40003000 (irq = 1, base_baud = 3125000) is a 16550
|
||
|
||
...
|
||
|
||
<<< NixOS Stage 2 >>>
|
||
|
||
[ 394.678980] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
|
||
[ 394.764620] booting system configuration /nix/store/zxbq93zfg8ijkyq5cq5sb4742rczqfck-nixos-system-nixos-riscv-24.11pre-git
|
||
runnin[ 398.543300] stage-2-init: running activation script...
|
||
g activation script...
|
||
|
||
So, if we observe a hang when writing to a bad memory segment, can there be a
|
||
problem in the place we are placing the pmem? Maybe we can test it with u-boot
|
||
first.
|
||
|
||
Another note, the serial device 16550 doesn't seem to use a FIFO, but the
|
||
16550A does.
|
||
|
||
We may want to switch to the A variant, as it seems to be supported by U-boot and
|
||
the kernel:
|
||
|
||
https://github.com/u-boot/u-boot/blob/master/drivers/serial/ns16550.c#L607-L619
|
||
https://github.com/torvalds/linux/blob/v6.9/drivers/tty/serial/8250/8250_of.c#L285
|
||
|
||
And defines a FIFO size of 16 bytes:
|
||
|
||
https://github.com/torvalds/linux/blob/v6.9/drivers/tty/serial/8250/8250_port.c#L74-L81
|
||
|
||
Still, we would have to wait for a bitstream that can forward the interrupts
|
||
from the host to the serial console to test it.
|
||
|
||
### OBSERVATION: The memory for the pmem seems to be ok
|
||
|
||
=> mtest 0x100000000 0x1c0000000 0 2
|
||
Testing 100000000 ... 1c0000000:
|
||
Pattern FFFFFFFFFFFFFFFF Writing... Reading...Iteration: 2
|
||
Tested 2 iteration(s) with 0 errors.
|
||
|
||
It may be worth enabling huge pages again, as there didn't seem to have any
|
||
effect.
|
||
|
||
### OBSERVATION: There are extra regions mapped by OpenSBI
|
||
|
||
In the OpenSBI 1.2 test from buildroot:
|
||
|
||
Domain0 Region00 : 0x0000000080000000-0x000000008003ffff ()
|
||
Domain0 Region01 : 0x0000000000000000-0xffffffffffffffff (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080010000
|
||
|
||
But with OpenSBI 1.4:
|
||
|
||
Domain0 Region00 : 0x0000000040000000-0x0000000040000fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region01 : 0x0000000080020000-0x000000008002ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
|
||
Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080013000
|
||
|
||
## 2024-07-12
|
||
|
||
I find it strange that if we are writing to the 0x80000000 region from the
|
||
kernel and OpenSBI has mapped a segment for Machine mode only, there is no error
|
||
message about it in the console.
|
||
|
||
Let's update to OpenSBI 1.5 for now, and see if we get any more information.
|
||
|
||
It seems to be booting fine.
|
||
|
||
So, I observed the `udevadm settle` to take a lot of time, maybe I can get some
|
||
clues by enabling the debug mode.
|
||
|
||
If a trap is being issued, we may be able to modify OpenSBI to print some
|
||
information to the serial console so we can understand what is going on.
|
||
|
||
In the meanwhile, let's try a simple test. We can enable debug output of systemd
|
||
with the following boot options:
|
||
|
||
systemd.log_level=debug systemd.log_target=console
|
||
|
||
We will also need to run `csrtool all-in-order` to arrive to systemd.
|
||
|
||
### OBSERVATION: Hangs in `switch_root` again.
|
||
|
||
Technically, we cannot discard the hypothesis than only the console has crashed,
|
||
as when we switch to the stage 2 we don't have the heartbeat counter. We may as
|
||
well run it again before we arrive to systemd just to verify that userland
|
||
crashed.
|
||
|
||
On the other hand, I don't understand why we hang in such a way when we try to
|
||
write to the `0x8000_0000` area from the kernel memtest. I've been reading the
|
||
OpenSBI source code and they seem to have a trap handler that can emit verbose
|
||
information to the console when a problem with the trap is detected. I would
|
||
expect to see some error being dumped to the console in that case.
|
||
|
||
From the OpenSBI information, this line:
|
||
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
|
||
|
||
Suggests that it registers a region with no write permission at `0x8000_0000`,
|
||
so it should fail right away from the kernel side. However, this is not reported
|
||
anywhere in the console.
|
||
|
||
As we have an easy way to trigger this situation, maybe we can use it as a test
|
||
to modify OpenSBI to report that problem to the console and verify that it is
|
||
working. With that information, we could rule out that a similar problem is
|
||
happening when we try to run systemd. Maybe we could also try to debug other
|
||
traps.
|
||
|
||
Another observation is that the memtest lines we see on the console are printed
|
||
*before* the actual test begins:
|
||
|
||
pr_info(" %pa - %pa pattern %016llx\n",
|
||
&this_start, &this_end, cpu_to_be64(pattern));
|
||
memtest(pattern, this_start, this_end - this_start);
|
||
|
||
So when this line is shown:
|
||
|
||
[ 0.000000] early_memtest: # of tests: 3
|
||
[ 0.000000] 0x0000000080000000 - 0x0000000080013000 pattern 5555555555555555
|
||
|
||
We an infer that the problem is located in that region, which agrees with the
|
||
hypothesis that is related with the OpenSBI regions.
|
||
|
||
This is the output I get with OpenSBI 1.5:
|
||
|
||
OpenSBI v1.5
|
||
____ _____ ____ _____
|
||
/ __ \ / ____| _ \_ _|
|
||
| | | |_ __ ___ _ __ | (___ | |_) || |
|
||
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
|
||
| |__| | |_) | __/ | | |____) | |_) || |_
|
||
\____/| .__/ \___|_| |_|_____/|____/_____|
|
||
| |
|
||
|_|
|
||
|
||
Platform Name : ox (Rodrigo NixOS version)
|
||
Platform Features : medeleg
|
||
Platform HART Count : 1
|
||
Platform IPI Device : ---
|
||
Platform Timer Device : axi_timer @ 50000000Hz
|
||
Platform Console Device : uart8250
|
||
Platform HSM Device : ---
|
||
Platform PMU Device : ---
|
||
Platform Reboot Device : ---
|
||
Platform Shutdown Device : ---
|
||
Platform Suspend Device : ---
|
||
Platform CPPC Device : ---
|
||
Firmware Base : 0x80000000
|
||
Firmware Size : 310 KB
|
||
Firmware RW Offset : 0x40000
|
||
Firmware RW Size : 54 KB
|
||
Firmware Heap Offset : 0x45000
|
||
Firmware Heap Size : 34 KB (total), 2 KB (reserved), 11 KB (used), 20 KB (free)
|
||
Firmware Scratch Size : 4096 B (total), 368 B (used), 3728 B (free)
|
||
Runtime SBI Version : 2.0
|
||
|
||
Domain0 Name : root
|
||
Domain0 Boot HART : 0
|
||
Domain0 HARTs : 0*
|
||
Domain0 Region00 : 0x0000000040000000-0x0000000040000fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region01 : 0x0000000080040000-0x000000008004ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080017000
|
||
Domain0 Next Mode : S-mode
|
||
Domain0 SysReset : yes
|
||
Domain0 SysSuspend : yes
|
||
|
||
Boot HART ID : 0
|
||
Boot HART Domain : root
|
||
Boot HART Priv Version : v1.10
|
||
Boot HART Base ISA : rv64imafdc
|
||
Boot HART ISA Extensions : zicntr,zihpm,sdtrig
|
||
Boot HART PMP Count : 0
|
||
Boot HART PMP Granularity : 0 bits
|
||
Boot HART PMP Address Bits: 0
|
||
Boot HART MHPM Info : 29 (0xfffffff8)
|
||
Boot HART Debug Triggers : 0 triggers
|
||
Boot HART MIDELEG : 0x0000000000000222
|
||
Boot HART MEDELEG : 0x000000000000b109
|
||
|
||
<debug_uart>
|
||
Core: 12 devices, 8 uclasses, devicetree: board
|
||
Loading Environment from nowhere... OK
|
||
In: serial,usbkbd
|
||
Out: serial,vidconsole
|
||
Err: serial,vidconsole
|
||
No working controllers found
|
||
Net: No ethernet found.
|
||
Working FDT set to 80017000
|
||
Hit any key to stop autoboot: 0
|
||
|
||
Device 0: unknown device
|
||
|
||
Device 1: unknown device
|
||
scanning bus for devices...
|
||
|
||
Device 0: unknown device
|
||
starting USB...
|
||
No working controllers found
|
||
No ethernet found.
|
||
No ethernet found.
|
||
=>
|
||
|
||
Where now the regions are slightly off:
|
||
|
||
Domain0 Region00 : 0x0000000040000000-0x0000000040000fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region01 : 0x0000000080040000-0x000000008004ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
|
||
I would assume that the region 1 is where OpenSBI places its own data, and
|
||
region 2 is where it places its own code. Then, in region 0 there is the serial
|
||
area.
|
||
|
||
Interestingly, I can read and write to the 0x80000000 - 0x81000000 from u-boot
|
||
without problems:
|
||
|
||
=> mtest 0x80000000 0x81000000 0 4
|
||
Testing 80000000 ... 81000000:
|
||
Pattern FFFFFFFFFFFFFFFF Writing... Reading...Iteration: 4
|
||
Tested 4 iteration(s) with 0 errors.
|
||
|
||
So I suspect that it disables those regions before jumping into U-Boot.
|
||
|
||
What I don't understand is why the MMIO region 0 is starting at 0x40000000 while
|
||
the UART port should be mapped in 0x40001000 as per the device tree. Maybe we
|
||
could try with the generic configuration of OpenSBI and see if it can load the
|
||
plic and the serial ports properly directly from the device tree.
|
||
|
||
Nice, with OpenSBI 1.5 I can see the console errors when trying the generic
|
||
configuration:
|
||
|
||
OpenSBI v1.5
|
||
____ _____ ____ _____
|
||
/ __ \ / ____| _ \_ _|
|
||
| | | |_ __ ___ _ __ | (___ | |_) || |
|
||
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
|
||
| |__| | |_) | __/ | | |____) | |_) || |_
|
||
\____/| .__/ \___|_| |_|_____/|____/_____|
|
||
| |
|
||
|_|
|
||
|
||
init_coldboot: timer init failed (error -3)
|
||
|
||
Seems to be failing in `sbi_timer_init()` with `cold_boot = true`.
|
||
|
||
And the -3 error seems to be:
|
||
|
||
#define SBI_ERR_INVALID_PARAM -3
|
||
|
||
I assume it is calling `fdt_timer_init()`.
|
||
|
||
Let's try removing the `reg-names` property, as it seems to cause it to enter a
|
||
different branch, but "control" is never matched there.
|
||
|
||
Still failing, the problem must be somewhere else.
|
||
|
||
Let's try with openpiton configuration instead.
|
||
|
||
Doesn't even start the UART:
|
||
|
||
GGGGGGGGG
|
||
|
||
These G's must be coming from the bootrom.
|
||
|
||
So let's go back to the generic platform and place some `printf()` calls to
|
||
determine where it is failing.
|
||
|
||
OpenSBI v1.5
|
||
____ _____ ____ _____
|
||
/ __ \ / ____| _ \_ _|
|
||
| | | |_ __ ___ _ __ | (___ | |_) || |
|
||
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
|
||
| |__| | |_) | __/ | | |____) | |_) || |_
|
||
\____/| .__/ \___|_| |_|_____/|____/_____|
|
||
| |
|
||
|_|
|
||
|
||
sbi_timer_init: begins
|
||
sbi_timer_init: got Zicntr extension
|
||
fdt_timer_cold_init: pos = 0
|
||
fdt_timer_cold_init: got match, name = riscv,clint0
|
||
fdt_timer_cold_init: enabled
|
||
fdt_timer_cold_init: drc->cold_init = -3
|
||
fdt_timer_init: fdt_timer_cold_init failed (-3)
|
||
sbi_platform_timer: sbi_platform_timer_init failed (-3)
|
||
init_coldboot: timer init failed (error -3)
|
||
|
||
Okay, now we can see where it failed. I wonder why aren't these messages enabled
|
||
by default. I'll guess this is the `timer_mtimer_cold_init()` function, so let's
|
||
add some more instrumentation there.
|
||
|
||
It seems that SiFive timer has a very long weird offset:
|
||
|
||
if (is_clint) { /* SiFive CLINT */
|
||
/* Set CLINT addresses */
|
||
mt->mtimecmp_addr = addr[0] + ACLINT_DEFAULT_MTIMECMP_OFFSET;
|
||
mt->mtimecmp_size = ACLINT_DEFAULT_MTIMECMP_SIZE;
|
||
if (!quirks->clint_without_mtime) {
|
||
mt->mtime_addr = addr[0] + ACLINT_DEFAULT_MTIME_OFFSET;
|
||
mt->mtime_size = size[0] - mt->mtimecmp_size;
|
||
/* Adjust MTIMER address and size for CLINT device */
|
||
mt->mtime_addr += quirks->clint_mtime_offset;
|
||
mt->mtime_size -= quirks->clint_mtime_offset;
|
||
} else {
|
||
mt->mtime_addr = mt->mtime_size = 0;
|
||
}
|
||
mt->mtimecmp_addr += quirks->clint_mtime_offset;
|
||
} else { /* RISC-V ACLINT MTIMER */
|
||
/* Set ACLINT MTIMER addresses */
|
||
mt->mtime_addr = addr[0];
|
||
mt->mtime_size = size[0];
|
||
mt->mtimecmp_addr = addr[1];
|
||
mt->mtimecmp_size = size[1];
|
||
}
|
||
|
||
We may want to use the ACLINT timer instead. Let's first see where the addresses
|
||
lay in memory, and then use that to verify we change it to 0x0 and 0x8.
|
||
|
||
GGGGGGGGG
|
||
OpenSBI v1.5
|
||
____ _____ ____ _____
|
||
/ __ \ / ____| _ \_ _|
|
||
| | | |_ __ ___ _ __ | (___ | |_) || |
|
||
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
|
||
| |__| | |_) | __/ | | |____) | |_) || |_
|
||
\____/| .__/ \___|_| |_|_____/|____/_____|
|
||
| |
|
||
|_|
|
||
|
||
sbi_timer_init: begins
|
||
sbi_timer_init: got Zicntr extension
|
||
fdt_timer_cold_init: pos = 0
|
||
fdt_timer_cold_init: got match, name = riscv,clint0
|
||
fdt_timer_cold_init: enabled
|
||
timer_mtimer_cold_init: begins, is_clint = 1
|
||
timer_mtimer_cold_init: mtime_addr = 0x4000dff8
|
||
timer_mtimer_cold_init: mtime_size = 0x000b4008
|
||
timer_mtimer_cold_init: mtime_addr = 0x40006000
|
||
timer_mtimer_cold_init: mtime_addr = 0x00007ff8
|
||
fdt_timer_cold_init: drc->cold_init = -3
|
||
fdt_timer_init: fdt_timer_cold_init failed (-3)
|
||
sbi_platform_timer: sbi_platform_timer_init failed (-3)
|
||
init_coldboot: timer init failed (error -3)
|
||
|
||
Yeah, those addresses are not what we want. Based on the device tree, the clint
|
||
must be at 0x40002000, so they should be 40002000 and 40002008. Also I made some
|
||
typos in the printf command, this is the patch:
|
||
|
||
+ sbi_printf("timer_mtimer_cold_init: mtime_addr = 0x%08lx\n", mt->mtime_addr);
|
||
+ sbi_printf("timer_mtimer_cold_init: mtime_size = 0x%08lx\n", mt->mtime_size);
|
||
+ sbi_printf("timer_mtimer_cold_init: mtime_addr = 0x%08lx\n", mt->mtimecmp_addr);
|
||
+ sbi_printf("timer_mtimer_cold_init: mtime_addr = 0x%08lx\n", mt->mtimecmp_size);
|
||
|
||
So, let's fix the patch and switch to the "riscv,aclint-mtimer" timer, which
|
||
doesn't set any quirk or weird offset.
|
||
|
||
GGGGGGGGG
|
||
OpenSBI v1.5
|
||
____ _____ ____ _____
|
||
/ __ \ / ____| _ \_ _|
|
||
| | | |_ __ ___ _ __ | (___ | |_) || |
|
||
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
|
||
| |__| | |_) | __/ | | |____) | |_) || |_
|
||
\____/| .__/ \___|_| |_|_____/|____/_____|
|
||
| |
|
||
|_|
|
||
|
||
sbi_timer_init: begins
|
||
sbi_timer_init: got Zicntr extension
|
||
fdt_timer_cold_init: pos = 0
|
||
fdt_timer_cold_init: got match, name = riscv,aclint-mtimer
|
||
fdt_timer_cold_init: enabled
|
||
timer_mtimer_cold_init: begins, is_clint = 0
|
||
timer_mtimer_cold_init: mtime_addr = 0x40002000
|
||
timer_mtimer_cold_init: mtime_size = 0x000c0000
|
||
timer_mtimer_cold_init: mtimecmp_addr = 0x00000000
|
||
timer_mtimer_cold_init: mtimecmp_size = 0x00000000
|
||
fdt_timer_cold_init: drc->cold_init = -3
|
||
fdt_timer_init: fdt_timer_cold_init failed (-3)
|
||
sbi_platform_timer: sbi_platform_timer_init failed (-3)
|
||
init_coldboot: timer init failed (error -3)
|
||
|
||
So, now we have better addresses, but this is not what I would expect. The regs
|
||
property is being used to set the mtime address and size, while I would expect
|
||
it to set the `mtimecmp_addr` too, but it seems it is expecting two pairs of
|
||
address and size in the regs:
|
||
|
||
reg = <0x0 0x40002000 0x0 0x000c0000>;
|
||
|
||
So, first let's add the mtimecmp after the mtime registers and see if that
|
||
clears the -3 error.
|
||
|
||
I'll leave it at 0x40002000, but I suspect the address might be 0x40170000:
|
||
|
||
#define OX_ALVEO_TIMER_BASE 0x40170000
|
||
|
||
After booting again, now it seems to work. But I'm not sure if that
|
||
may cause more problems down the line.
|
||
|
||
GGGGGGGGG
|
||
OpenSBI v1.5
|
||
____ _____ ____ _____
|
||
/ __ \ / ____| _ \_ _|
|
||
| | | |_ __ ___ _ __ | (___ | |_) || |
|
||
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
|
||
| |__| | |_) | __/ | | |____) | |_) || |_
|
||
\____/| .__/ \___|_| |_|_____/|____/_____|
|
||
| |
|
||
|_|
|
||
|
||
sbi_timer_init: begins
|
||
sbi_timer_init: got Zicntr extension
|
||
fdt_timer_cold_init: pos = 0
|
||
fdt_timer_cold_init: got match, name = riscv,aclint-mtimer
|
||
fdt_timer_cold_init: enabled
|
||
timer_mtimer_cold_init: begins, is_clint = 0
|
||
timer_mtimer_cold_init: mtime_addr = 0x40002000
|
||
timer_mtimer_cold_init: mtime_size = 0x00000008
|
||
timer_mtimer_cold_init: mtimecmp_addr = 0x40002008
|
||
timer_mtimer_cold_init: mtimecmp_size = 0x00000008
|
||
fdt_timer_cold_init: drc->cold_init = 0
|
||
fdt_timer_cold_init: pos = 1
|
||
fdt_timer_cold_init: returns 0
|
||
Platform Name : Barcelona Supercomputing Center - Lagarto Ox (NixOS)
|
||
Platform Features : medeleg
|
||
Platform HART Count : 1
|
||
Platform IPI Device : ---
|
||
Platform Timer Device : aclint-mtimer @ 50000Hz
|
||
Platform Console Device : uart8250
|
||
Platform HSM Device : ---
|
||
Platform PMU Device : ---
|
||
Platform Reboot Device : ---
|
||
Platform Shutdown Device : ---
|
||
Platform Suspend Device : ---
|
||
Platform CPPC Device : ---
|
||
Firmware Base : 0x80000000
|
||
Firmware Size : 327 KB
|
||
Firmware RW Offset : 0x40000
|
||
Firmware RW Size : 71 KB
|
||
Firmware Heap Offset : 0x49000
|
||
Firmware Heap Size : 35 KB (total), 2 KB (reserved), 11 KB (used), 21 KB (free)
|
||
Firmware Scratch Size : 4096 B (total), 408 B (used), 3688 B (free)
|
||
Runtime SBI Version : 2.0
|
||
|
||
Domain0 Name : root
|
||
Domain0 Boot HART : 0
|
||
Domain0 HARTs : 0*
|
||
Domain0 Region00 : 0x0000000040002000-0x000000004000200f M: (I,R,W) S/U: ()
|
||
Domain0 Region01 : 0x0000000040001000-0x0000000040001fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region02 : 0x0000000080040000-0x000000008005ffff M: (R,W) S/U: ()
|
||
Domain0 Region03 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region04 : 0x0000000040800000-0x0000000040bfffff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region05 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000082200000
|
||
Domain0 Next Mode : S-mode
|
||
Domain0 SysReset : yes
|
||
Domain0 SysSuspend : yes
|
||
|
||
Boot HART ID : 0
|
||
Boot HART Domain : root
|
||
Boot HART Priv Version : v1.10
|
||
Boot HART Base ISA : rv64imafdc
|
||
Boot HART ISA Extensions : zicntr,zihpm,sdtrig
|
||
Boot HART PMP Count : 0
|
||
Boot HART PMP Granularity : 0 bits
|
||
Boot HART PMP Address Bits: 0
|
||
Boot HART MHPM Info : 29 (0xfffffff8)
|
||
Boot HART Debug Triggers : 0 triggers
|
||
Boot HART MIDELEG : 0x0000000000000222
|
||
Boot HART MEDELEG : 0x000000000000b109
|
||
|
||
<debug_uart>
|
||
Core: 12 devices, 8 uclasses, devicetree: board
|
||
Loading Environment from nowhere... OK
|
||
In: serial,usbkbd
|
||
Out: serial,vidconsole
|
||
Err: serial,vidconsole
|
||
No working controllers found
|
||
Net: No ethernet found.
|
||
Working FDT set to 82200000
|
||
Hit any key to stop autoboot: 0
|
||
|
||
Device 0: unknown device
|
||
|
||
Device 1: unknown device
|
||
scanning bus for devices...
|
||
|
||
Device 0: unknown device
|
||
starting USB...
|
||
No working controllers found
|
||
No ethernet found.
|
||
No ethernet found.
|
||
|
||
However, now I cannot boot the kernel, as it is overwriting the FDT:
|
||
|
||
=> printenv fdtcontroladdr
|
||
fdtcontroladdr=82200000
|
||
=> fdt addr ${fdtcontroladdr }
|
||
Working fdt: 82200000
|
||
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0"
|
||
=> setenv ramdisk_size 12614846
|
||
=> #booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||
=> printenv kernel_addr_r
|
||
kernel_addr_r=0x84000000
|
||
=> printenv ramdisk_addr_r
|
||
ramdisk_addr_r=0x8c300000
|
||
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||
Moving Image from 0x84000000 to 0x80200000, end=83044650
|
||
ERROR: Did not find a cmdline Flattened Device Tree
|
||
Could not find a valid device tree
|
||
|
||
Let's move the FDT to 0x80100000.
|
||
|
||
Now I can load the kernel, but it gets stuck in the middle of the boot:
|
||
|
||
GGGGGGGGG
|
||
OpenSBI v1.5
|
||
____ _____ ____ _____
|
||
/ __ \ / ____| _ \_ _|
|
||
| | | |_ __ ___ _ __ | (___ | |_) || |
|
||
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
|
||
| |__| | |_) | __/ | | |____) | |_) || |_
|
||
\____/| .__/ \___|_| |_|_____/|____/_____|
|
||
| |
|
||
|_|
|
||
|
||
sbi_timer_init: begins
|
||
sbi_timer_init: got Zicntr extension
|
||
fdt_timer_cold_init: pos = 0
|
||
fdt_timer_cold_init: got match, name = riscv,aclint-mtimer
|
||
fdt_timer_cold_init: enabled
|
||
timer_mtimer_cold_init: begins, is_clint = 0
|
||
timer_mtimer_cold_init: mtime_addr = 0x40002000
|
||
timer_mtimer_cold_init: mtime_size = 0x00000008
|
||
timer_mtimer_cold_init: mtimecmp_addr = 0x40002008
|
||
timer_mtimer_cold_init: mtimecmp_size = 0x00000008
|
||
fdt_timer_cold_init: drc->cold_init = 0
|
||
fdt_timer_cold_init: pos = 1
|
||
fdt_timer_cold_init: returns 0
|
||
Platform Name : Barcelona Supercomputing Center - Lagarto Ox (NixOS)
|
||
Platform Features : medeleg
|
||
Platform HART Count : 1
|
||
Platform IPI Device : ---
|
||
Platform Timer Device : aclint-mtimer @ 50000Hz
|
||
Platform Console Device : uart8250
|
||
Platform HSM Device : ---
|
||
Platform PMU Device : ---
|
||
Platform Reboot Device : ---
|
||
Platform Shutdown Device : ---
|
||
Platform Suspend Device : ---
|
||
Platform CPPC Device : ---
|
||
Firmware Base : 0x80000000
|
||
Firmware Size : 327 KB
|
||
Firmware RW Offset : 0x40000
|
||
Firmware RW Size : 71 KB
|
||
Firmware Heap Offset : 0x49000
|
||
Firmware Heap Size : 35 KB (total), 2 KB (reserved), 11 KB (used), 21 KB (free)
|
||
Firmware Scratch Size : 4096 B (total), 408 B (used), 3688 B (free)
|
||
Runtime SBI Version : 2.0
|
||
|
||
Domain0 Name : root
|
||
Domain0 Boot HART : 0
|
||
Domain0 HARTs : 0*
|
||
Domain0 Region00 : 0x0000000040002000-0x000000004000200f M: (I,R,W) S/U: ()
|
||
Domain0 Region01 : 0x0000000040001000-0x0000000040001fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region02 : 0x0000000080040000-0x000000008005ffff M: (R,W) S/U: ()
|
||
Domain0 Region03 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region04 : 0x0000000040800000-0x0000000040bfffff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region05 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080100000
|
||
Domain0 Next Mode : S-mode
|
||
Domain0 SysReset : yes
|
||
Domain0 SysSuspend : yes
|
||
|
||
Boot HART ID : 0
|
||
Boot HART Domain : root
|
||
Boot HART Priv Version : v1.10
|
||
Boot HART Base ISA : rv64imafdc
|
||
Boot HART ISA Extensions : zicntr,zihpm,sdtrig
|
||
Boot HART PMP Count : 0
|
||
Boot HART PMP Granularity : 0 bits
|
||
Boot HART PMP Address Bits: 0
|
||
Boot HART MHPM Info : 29 (0xfffffff8)
|
||
Boot HART Debug Triggers : 0 triggers
|
||
Boot HART MIDELEG : 0x0000000000000222
|
||
Boot HART MEDELEG : 0x000000000000b109
|
||
|
||
<debug_uart>
|
||
Core: 12 devices, 8 uclasses, devicetree: board
|
||
Loading Environment from nowhere... OK
|
||
In: serial,usbkbd
|
||
Out: serial,vidconsole
|
||
Err: serial,vidconsole
|
||
No working controllers found
|
||
Net: No ethernet found.
|
||
Working FDT set to 80100000
|
||
Hit any key to stop autoboot: 0
|
||
|
||
Device 0: unknown device
|
||
|
||
Device 1: unknown device
|
||
scanning bus for devices...
|
||
|
||
Device 0: unknown device
|
||
starting USB...
|
||
No working controllers found
|
||
No ethernet found.
|
||
No ethernet found.
|
||
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0"
|
||
=> setenv ramdisk_size 12614846
|
||
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||
Moving Image from 0x84000000 to 0x80200000, end=83044650
|
||
## Flattened Device Tree blob at 80100000
|
||
Booting using the fdt blob at 0x80100000
|
||
Working FDT set to 80100000
|
||
ERROR: reserving fdt memory region failed (addr=80000000 size=4000000 flags=4)
|
||
Using Device Tree in place at 0000000080100000, end 0000000080103dd0
|
||
Working FDT set to 80100000
|
||
|
||
Starting kernel ...
|
||
|
||
[ 0.000000] Linux version 6.9.7 (nixbld@localhost) (riscv64-unknown-linux-gnu-gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.41) #1-NixOS Thu Jun 27 11:52:32 UTC 2024
|
||
[ 0.000000] Machine model: Barcelona Supercomputing Center - Lagarto Ox (NixOS)
|
||
[ 0.000000] SBI specification v2.0 detected
|
||
[ 0.000000] SBI implementation ID=0x1 Version=0x10005
|
||
[ 0.000000] SBI TIME extension detected
|
||
[ 0.000000] SBI IPI extension detected
|
||
[ 0.000000] SBI RFENCE extension detected
|
||
[ 0.000000] SBI DBCN extension detected
|
||
[ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
|
||
[ 0.000000] printk: legacy bootconsole [sbi0] enabled
|
||
[ 0.000000] OF: reserved mem: Reserved memory: failed to reserve memory for node 'reserved@80000000': base 0x0000000080000000, size 64 MiB
|
||
[ 0.000000] OF: reserved mem: OVERLAP DETECTED!
|
||
[ 0.000000] mmode_resv1@80000000 (0x0000000080000000--0x0000000080040000) overlaps with reserved@80000000 (0x0000000080000000--0x0000000084000000)
|
||
[ 0.000000] OF: reserved mem: OVERLAP DETECTED!
|
||
[ 0.000000] reserved@80000000 (0x0000000080000000--0x0000000084000000) overlaps with mmode_resv0@80040000 (0x0000000080040000--0x0000000080060000)
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000060000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@60000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000060000000..0x000000006fffffff (262144 KiB) map non-reusable dma_pool@60000000
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000070000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@70000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000070000000..0x000000007fffffff (262144 KiB) map non-reusable dma_pool@70000000
|
||
[ 0.000000] OF: reserved mem: 0x0000000080000000..0x000000008003ffff (256 KiB) nomap non-reusable mmode_resv1@80000000
|
||
[ 0.000000] OF: reserved mem: 0x0000000080000000..0x0000000083ffffff (65536 KiB) nomap non-reusable reserved@80000000
|
||
[ 0.000000] OF: reserved mem: 0x0000000080040000..0x000000008005ffff (128 KiB) nomap non-reusable mmode_resv0@80040000
|
||
[ 0.000000] cma: Reserved 16 MiB at 0x00000000af000000 on node -1
|
||
[ 0.000000] Zone ranges:
|
||
[ 0.000000] DMA32 [mem 0x0000000080000000-0x00000000afffffff]
|
||
[ 0.000000] Normal empty
|
||
[ 0.000000] Movable zone start for each node
|
||
[ 0.000000] Early memory node ranges
|
||
[ 0.000000] node 0: [mem 0x0000000080000000-0x000000008005ffff]
|
||
[ 0.000000] node 0: [mem 0x0000000080060000-0x00000000afffffff]
|
||
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000000afffffff]
|
||
[ 0.000000] Falling back to deprecated "riscv,isa"
|
||
[ 0.000000] riscv: base ISA extensions adfim
|
||
[ 0.000000] riscv: ELF capabilities adfim
|
||
[ 0.000000] pcpu-alloc: s0 r0 d131072 u131072 alloc=1*131072
|
||
[ 0.000000] pcpu-alloc: [0] 0
|
||
[ 0.000000] Kernel command line: root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0
|
||
[ 0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
|
||
[ 0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
|
||
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 193536
|
||
[ 0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
|
||
[ 0.000000] Virtual kernel memory layout:
|
||
[ 0.000000] fixmap : 0xffffffc6fea00000 - 0xffffffc6ff000000 (6144 kB)
|
||
[ 0.000000] pci io : 0xffffffc6ff000000 - 0xffffffc700000000 ( 16 MB)
|
||
[ 0.000000] vmemmap : 0xffffffc700000000 - 0xffffffc800000000 (4096 MB)
|
||
[ 0.000000] vmalloc : 0xffffffc800000000 - 0xffffffd800000000 ( 64 GB)
|
||
[ 0.000000] modules : 0xffffffff02e45000 - 0xffffffff80000000 (2001 MB)
|
||
[ 0.000000] lowmem : 0xffffffd800000000 - 0xffffffd830000000 ( 768 MB)
|
||
[ 0.000000] kernel : 0xffffffff80000000 - 0xffffffffffffffff (2047 MB)
|
||
[ 0.000000] Memory: 386632K/786432K available (17075K kernel code, 9047K rwdata, 10240K rodata, 8737K init, 917K bss, 383416K reserved, 16384K cma-reserved)
|
||
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
|
||
[ 0.000000] ftrace: allocating 46961 entries in 184 pages
|
||
[ 0.000000] ftrace: allocated 184 pages with 4 groups
|
||
[ 0.000000] trace event string verifier disabled
|
||
[ 0.000000] RCU Tasks Rude: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1.
|
||
[ 0.000000] RCU Tasks Trace: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1.
|
||
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
|
||
[ 0.000000] riscv-intc: 64 local interrupts mapped
|
||
[ 0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x179dd7f66, max_idle_ns: 56421785867800 ns
|
||
[ 0.000020] sched_clock: 64 bits at 50kHz, resolution 20000ns, wraps every 70368744170000ns
|
||
[ 0.015020] kfence: initialized - using 2097152 bytes for 255 objects at 0x(____ptrval____)-0x(____ptrval____)
|
||
[ 0.035900] Console: colour dummy device 80x25
|
||
[ 0.041720] Calibrating delay loop (skipped), value calculated using timer frequency.. 0.10 BogoMIPS (lpj=200)
|
||
[ 0.052480] pid_max: default: 32768 minimum: 301
|
||
[ 0.187380] LSM: initializing lsm=capability,landlock,yama
|
||
[ 0.479640] landlock: Up and running.
|
||
[ 0.483700] Yama: becoming mindful.
|
||
[ 0.521600] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
|
||
[ 0.529780] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
|
||
[ 0.784220] riscv: ELF compat mode unsupported
|
||
[ 0.784600] ASID allocator disabled (0 bits)
|
||
[ 0.867620] devtmpfs: initialized
|
||
[ 0.986340] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
|
||
[ 0.996900] futex hash table entries: 256 (order: 1, 12288 bytes, linear)
|
||
[ 1.084560] pinctrl core: initialized pinctrl subsystem
|
||
[ 1.195460] NET: Registered PF_NETLINK/PF_ROUTE protocol family
|
||
[ 1.244960] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
|
||
[ 1.255960] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
|
||
[ 1.266500] audit: initializing netlink subsys (disabled)
|
||
[ 1.317920] thermal_sys: Registered thermal governor 'step_wise'
|
||
[ 1.319800] cpuidle: using governor ladder
|
||
|
||
This is very interesting, because all these lockups show the same symptoms and
|
||
they may be related with the kernel trying to access an area of memory that it
|
||
shouldn't.
|
||
|
||
First, let's remove the reserved region, as now OpenSBI properly forwards the
|
||
regions to the kernel.
|
||
|
||
It continues to hang, but at least now we don't have overlap of memory regions:
|
||
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000060000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@60000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000060000000..0x000000006fffffff (262144 KiB) map non-reusable dma_pool@60000000
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000070000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@70000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000070000000..0x000000007fffffff (262144 KiB) map non-reusable dma_pool@70000000
|
||
[ 0.000000] OF: reserved mem: 0x0000000080000000..0x000000008003ffff (256 KiB) nomap non-reusable mmode_resv1@80000000
|
||
[ 0.000000] OF: reserved mem: 0x0000000080040000..0x000000008005ffff (128 KiB) nomap non-reusable mmode_resv0@80040000
|
||
[ 0.000000] cma: Reserved 16 MiB at 0x00000000af000000 on node -1
|
||
[ 0.000000] Zone ranges:
|
||
[ 0.000000] DMA32 [mem 0x0000000080000000-0x00000000afffffff]
|
||
[ 0.000000] Normal empty
|
||
[ 0.000000] Movable zone start for each node
|
||
[ 0.000000] Early memory node ranges
|
||
[ 0.000000] node 0: [mem 0x0000000080000000-0x000000008005ffff]
|
||
[ 0.000000] node 0: [mem 0x0000000080060000-0x00000000afffffff]
|
||
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000000afffffff]
|
||
|
||
Let's remove the clint from the device tree for now, just to see if it is
|
||
affecting.
|
||
|
||
Disabling the clint doesn't seem to cause any effect. In fact, the kernel is
|
||
still able to get a clock as shown in the kernel messages.
|
||
|
||
### OBSERVATION: The kernel hangs after the unaligned check begins
|
||
|
||
I enabled the ftrace for initcall and I can see that is getting stuck in the
|
||
unaligned check:
|
||
|
||
Domain0 Name : root
|
||
Domain0 Boot HART : 0
|
||
Domain0 HARTs : 0*
|
||
Domain0 Region00 : 0x0000000040001000-0x0000000040001fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region01 : 0x0000000080040000-0x000000008005ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region03 : 0x0000000040800000-0x0000000040bfffff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region04 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080100000
|
||
Domain0 Next Mode : S-mode
|
||
Domain0 SysReset : yes
|
||
Domain0 SysSuspend : yes
|
||
...
|
||
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 trace_event=initcall:* trace_options=sym-addr tp_printk trace_buf_size=1M"
|
||
=> setenv ramdisk_size 12614846
|
||
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||
...
|
||
[ 0.000000] printk: legacy bootconsole [sbi0] enabled
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000060000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@60000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000060000000..0x000000006fffffff (262144 KiB) map non-reusable dma_pool@60000000
|
||
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000070000000, size 256 MiB
|
||
[ 0.000000] OF: reserved mem: initialized node dma_pool@70000000, compatible id shared-dma-pool
|
||
[ 0.000000] OF: reserved mem: 0x0000000070000000..0x000000007fffffff (262144 KiB) map non-reusable dma_pool@70000000
|
||
[ 0.000000] OF: reserved mem: 0x0000000080000000..0x000000008003ffff (256 KiB) nomap non-reusable mmode_resv1@80000000
|
||
[ 0.000000] OF: reserved mem: 0x0000000080040000..0x000000008005ffff (128 KiB) nomap non-reusable mmode_resv0@80040000
|
||
[ 0.000000] cma: Reserved 16 MiB at 0x00000000af000000 on node -1
|
||
[ 0.000000] Zone ranges:
|
||
[ 0.000000] DMA32 [mem 0x0000000080000000-0x00000000afffffff]
|
||
[ 0.000000] Normal empty
|
||
[ 0.000000] Movable zone start for each node
|
||
[ 0.000000] Early memory node ranges
|
||
[ 0.000000] node 0: [mem 0x0000000080000000-0x000000008005ffff]
|
||
[ 0.000000] node 0: [mem 0x0000000080060000-0x00000000afffffff]
|
||
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000000afffffff]
|
||
...
|
||
[ 2.736300] initcall_finish: func=init_ladder+0x0/0x40 ret=0
|
||
[ 2.742640] initcall_start: func=init_menu+0x0/0x38
|
||
[ 2.748200] initcall_finish: func=init_menu+0x0/0x38 ret=0
|
||
[ 2.754400] initcall_start: func=rpmsg_init+0x0/0xc0
|
||
[ 2.763400] initcall_finish: func=rpmsg_init+0x0/0xc0 ret=0
|
||
[ 2.769840] initcall_start: func=rpmsg_chrdev_init+0x0/0xc0
|
||
[ 2.777660] initcall_finish: func=rpmsg_chrdev_init+0x0/0xc0 ret=0
|
||
[ 2.784540] initcall_start: func=rpmsg_ctrldev_init+0x0/0xbc
|
||
[ 2.792460] initcall_finish: func=rpmsg_ctrldev_init+0x0/0xbc ret=0
|
||
[ 2.799420] initcall_start: func=rpmsg_ns_init+0x0/0x6c
|
||
[ 2.807460] initcall_finish: func=rpmsg_ns_init+0x0/0x6c ret=0
|
||
[ 2.814120] initcall_start: func=kobject_uevent_init+0x0/0x30
|
||
[ 2.821120] initcall_finish: func=kobject_uevent_init+0x0/0x30 ret=0
|
||
[ 2.830980] initcall_level: level=arch
|
||
[ 2.835460] initcall_start: func=riscv_cpuinfo_init+0x0/0x78
|
||
[ 2.841960] initcall_finish: func=riscv_cpuinfo_init+0x0/0x78 ret=0
|
||
[ 2.848920] initcall_start: func=vdso_init+0x0/0x4c
|
||
[ 2.855560] initcall_finish: func=vdso_init+0x0/0x4c ret=0
|
||
[ 2.861900] initcall_start: func=check_unaligned_access_all_cpus+0x0/0x1d4
|
||
|
||
This makes me wonder if what has just happened is that somehow we have
|
||
overwritten the OpenSBI trap for unaligned access and we are now executing some
|
||
garbage code in machine mode.
|
||
|
||
Based on the OpenSBI regions, I would assume here is where the traps should be:
|
||
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
|
||
It should be feasible to read the memory from the host where those traps are
|
||
placed.
|
||
|
||
Here are the FDT reserved areas that U-Boot sees before switching to the kernel:
|
||
|
||
=> fdt print /reserved-memory
|
||
reserved-memory {
|
||
#address-cells = <0x00000002>;
|
||
#size-cells = <0x00000002>;
|
||
ranges;
|
||
mmode_resv1@80000000 {
|
||
reg = <0x00000000 0x80000000 0x00000000 0x00040000>;
|
||
no-map;
|
||
};
|
||
mmode_resv0@80040000 {
|
||
reg = <0x00000000 0x80040000 0x00000000 0x00020000>;
|
||
no-map;
|
||
};
|
||
dma_pool@60000000 {
|
||
reg = <0x00000000 0x60000000 0x00000000 0x10000000>;
|
||
compatible = "shared-dma-pool";
|
||
};
|
||
dma_pool@70000000 {
|
||
reg = <0x00000000 0x70000000 0x00000000 0x10000000>;
|
||
compatible = "shared-dma-pool";
|
||
};
|
||
};
|
||
|
||
Which match with the ones the kernel parse:
|
||
|
||
[ 0.000000] OF: reserved mem: 0x0000000080000000..0x000000008003ffff (256 KiB) nomap non-reusable mmode_resv1@80000000
|
||
[ 0.000000] OF: reserved mem: 0x0000000080040000..0x000000008005ffff (128 KiB) nomap non-reusable mmode_resv0@80040000
|
||
|
||
And with the OpenSBI regions (in reverse order):
|
||
|
||
Domain0 Region01 : 0x0000000080040000-0x000000008005ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
|
||
Let's add some instrumentation in the code that handles the traps in OpenSBI.
|
||
|
||
Memory regions:
|
||
|
||
0x8000_0000 to 0x8003_ffff : OpenSBI code
|
||
0x8004_0000 to 0x8005_ffff : OpenSBI data
|
||
0x8010_0000 to 0x801._.... : FDT
|
||
0x8020_0000 to 0x8020_.... : U-Boot (later kernel)
|
||
|
||
0x8020_1000 to 0x81fa_0b87 : Kernel image
|
||
80201000-80cb177f : Kernel code
|
||
81400000-819fffff : Kernel rodata
|
||
81c00000-81f18747 : Kernel data
|
||
81f19000-81fa0b87 : Kernel bss
|
||
|
||
0x8400_0000 to 0x84.._.... M: (R,W) S/U: () Linux kernel
|
||
|
||
Okay, so we enter the unaligned access check:
|
||
|
||
![ 2.947680] initcall_start: func=check_unaligned_access_all_cpus+0x0/0x1d4
|
||
$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
...
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned load$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
!OpenSBI: misaligned store$
|
||
|
||
But we never arrive to anywhere else.
|
||
|
||
Here are the current options that match ALIGN:
|
||
|
||
hut% grep ALIGN /nix/store/c9jr35xnh2ffzjvkq8nvzj9i2siz1n4s-linux-config-riscv64-unknown-linux-gnu-6.9.7
|
||
CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW=y
|
||
CONFIG_RISCV_MISALIGNED=y
|
||
CONFIG_RISCV_PROBE_UNALIGNED_ACCESS=y
|
||
# CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS is not set
|
||
# CONFIG_RISCV_SLOW_UNALIGNED_ACCESS is not set
|
||
# CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS is not set
|
||
CONFIG_HAVE_64BIT_ALIGNED_ACCESS=y
|
||
CONFIG_FUNCTION_ALIGNMENT=0
|
||
CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC=y
|
||
CONFIG_CMA_ALIGNMENT=8
|
||
# CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B is not set
|
||
|
||
We may want to set these two:
|
||
|
||
# CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS is not set
|
||
# CONFIG_RISCV_SLOW_UNALIGNED_ACCESS is not set
|
||
|
||
And disable
|
||
|
||
CONFIG_RISCV_PROBE_UNALIGNED_ACCESS=y
|
||
|
||
So we don't perform the probing. However, this may bite later, so probably is a
|
||
better idea to debug it now. I could bound where it is failing as it doesn't
|
||
seem to be outside the speed check function.
|
||
|
||
|
||
Those 8 pairs of load and store calls seem to match this assembly function in
|
||
arch/riscv/kernel/copy-unaligned.S:
|
||
|
||
/* void __riscv_copy_bytes_unaligned(void *, const void *, size_t) */
|
||
/* Performs a memcpy without aligning buffers, using only byte accesses. */
|
||
/* Note: The size is truncated to a multiple of 8 */
|
||
SYM_FUNC_START(__riscv_copy_bytes_unaligned)
|
||
andi a4, a2, ~(8-1)
|
||
beqz a4, 2f
|
||
add a3, a1, a4
|
||
1:
|
||
lb a4, 0(a1)
|
||
lb a5, 1(a1)
|
||
lb a6, 2(a1)
|
||
lb a7, 3(a1)
|
||
lb t0, 4(a1)
|
||
lb t1, 5(a1)
|
||
lb t2, 6(a1)
|
||
lb t3, 7(a1)
|
||
sb a4, 0(a0)
|
||
sb a5, 1(a0)
|
||
sb a6, 2(a0)
|
||
sb a7, 3(a0)
|
||
sb t0, 4(a0)
|
||
sb t1, 5(a0)
|
||
sb t2, 6(a0)
|
||
sb t3, 7(a0)
|
||
addi a0, a0, 8
|
||
addi a1, a1, 8
|
||
bltu a1, a3, 1b
|
||
|
||
2:
|
||
ret
|
||
SYM_FUNC_END(__riscv_copy_bytes_unaligned)
|
||
|
||
Booted again with simpler output:
|
||
<!--{{{-->
|
||
|
||
><E><E><E><E[ 2.902540] initcall_finish: func=riscv_cpuinfo_init+0x0/0x78 ret=0
|
||
><E[ 2.909720] initcall_start: func=vdso_init+0x0/0x4c
|
||
><E[ 2.917380] initcall_finish: func=vdso_init+0x0/0x4c ret=0
|
||
><E[ 2.923940] initcall_start: func=check_unaligned_access_all_cpus+0x0/0x1d4
|
||
><L><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S><L><L><L><L><L><L><L><L><S><S><S><S><S><S><S><S>
|
||
|
||
<!--}}}-->
|
||
So here is the simplified output using the format `<X>` where X is one character
|
||
that differentiates the type of trap.
|
||
|
||
Based on the code that performs the check:
|
||
|
||
/*
|
||
* For a fixed amount of time, repeatedly try the function, and take
|
||
* the best time in cycles as the measurement.
|
||
*/
|
||
while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
|
||
start_cycles = get_cycles64();
|
||
/* Ensure the CSR read can't reorder WRT to the copy. */
|
||
mb();
|
||
__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
|
||
/* Ensure the copy ends before the end time is snapped. */
|
||
mb();
|
||
end_cycles = get_cycles64();
|
||
if ((end_cycles - start_cycles) < word_cycles)
|
||
word_cycles = end_cycles - start_cycles;
|
||
}
|
||
|
||
byte_cycles = -1ULL;
|
||
__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
|
||
start_jiffies = jiffies;
|
||
while ((now = jiffies) == start_jiffies)
|
||
cpu_relax();
|
||
|
||
while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
|
||
start_cycles = get_cycles64();
|
||
mb();
|
||
__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
|
||
mb();
|
||
end_cycles = get_cycles64();
|
||
if ((end_cycles - start_cycles) < byte_cycles)
|
||
byte_cycles = end_cycles - start_cycles;
|
||
}
|
||
|
||
preempt_enable();
|
||
|
||
/* Don't divide by zero. */
|
||
if (!word_cycles || !byte_cycles) {
|
||
pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
|
||
cpu);
|
||
|
||
return 0;
|
||
}
|
||
|
||
if (word_cycles < byte_cycles)
|
||
speed = RISCV_HWPROBE_MISALIGNED_FAST;
|
||
|
||
ratio = div_u64((byte_cycles * 100), word_cycles);
|
||
pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
|
||
cpu,
|
||
ratio / 100,
|
||
ratio % 100,
|
||
(speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
|
||
|
||
The fact that the print line with the "Ratio of byte access..." doesn't appear,
|
||
puts the hang place at some point in between the check and the `pr_info()` call.
|
||
|
||
If this is a problem on the OpenSBI side, we can bisect the code to find out
|
||
where the problem was introduced. But first, I would have to try OpenSBI 1.4 and
|
||
ensure we can reproduce it.
|
||
|
||
Okay so with OpenSBI 1.4 we have a hang in the same place.
|
||
|
||
Lets compare the domain regions:
|
||
|
||
With OpenSBI 1.4 `fpga/alveo_ox`:
|
||
|
||
Domain0 Region00 : 0x0000000040000000-0x0000000040000fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region01 : 0x0000000080040000-0x000000008004ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080017000
|
||
|
||
With OpenSBI 1.4 `generic`:
|
||
|
||
Domain0 Region00 : 0x0000000040001000-0x0000000040001fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region01 : 0x0000000080040000-0x000000008005ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region03 : 0x0000000040800000-0x0000000040bfffff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region04 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080100000
|
||
|
||
With OpenSBI 1.5 `generic`:
|
||
|
||
Domain0 Region00 : 0x0000000040001000-0x0000000040001fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region01 : 0x0000000080040000-0x000000008005ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region03 : 0x0000000040800000-0x0000000040bfffff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region04 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080100000
|
||
|
||
So we have several changes.
|
||
|
||
First, the PLIC has a new memory map. Let's comment it out in the device tree,
|
||
and see what happens.
|
||
|
||
Hangs in the same place, but now we don't have the 0x40800000 region with
|
||
OpenSBI 1.4 generic:
|
||
|
||
Domain0 Region00 : 0x0000000040001000-0x0000000040001fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region01 : 0x0000000080040000-0x000000008005ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080100000
|
||
|
||
Let's try to move the serial region to 0x40000000, although that seems to be
|
||
wrong.
|
||
|
||
Yep, that causes OpenSBI to not emit any message in the console, so let's put it
|
||
back in 0x40001000.
|
||
|
||
Next, we may want to place the FDT in the previous location, at 0x80017000.
|
||
|
||
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 trace_event=initcall:* trace_options=sym-addr tp_printk trace_buf_size=1M"
|
||
=> setenv ramdisk_size 12614846
|
||
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||
|
||
Hmm, continues to hang in the same point:
|
||
|
||
[ 2.852200] initcall_start: func=check_unaligned_access_all_cpus+0x0/0x1d4
|
||
|
||
With this regions:
|
||
|
||
OpenSBI v1.4
|
||
____ _____ ____ _____
|
||
/ __ \ / ____| _ \_ _|
|
||
| | | |_ __ ___ _ __ | (___ | |_) || |
|
||
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
|
||
| |__| | |_) | __/ | | |____) | |_) || |_
|
||
\____/| .__/ \___|_| |_|_____/|____/_____|
|
||
| |
|
||
|_|
|
||
|
||
Platform Name : Barcelona Supercomputing Center - Lagarto Ox (NixOS)
|
||
Platform Features : medeleg
|
||
Platform HART Count : 1
|
||
Platform IPI Device : ---
|
||
Platform Timer Device : --- @ 0Hz
|
||
Platform Console Device : uart8250
|
||
Platform HSM Device : ---
|
||
Platform PMU Device : ---
|
||
Platform Reboot Device : ---
|
||
Platform Shutdown Device : ---
|
||
Platform Suspend Device : ---
|
||
Platform CPPC Device : ---
|
||
Firmware Base : 0x80000000
|
||
Firmware Size : 323 KB
|
||
Firmware RW Offset : 0x40000
|
||
Firmware RW Size : 67 KB
|
||
Firmware Heap Offset : 0x48000
|
||
Firmware Heap Size : 35 KB (total), 2 KB (reserved), 8 KB (used), 24 KB (free)
|
||
Firmware Scratch Size : 4096 B (total), 288 B (used), 3808 B (free)
|
||
Runtime SBI Version : 2.0
|
||
|
||
Domain0 Name : root
|
||
Domain0 Boot HART : 0
|
||
Domain0 HARTs : 0*
|
||
Domain0 Region00 : 0x0000000040001000-0x0000000040001fff M: (I,R,W) S/U: (R,W)
|
||
Domain0 Region01 : 0x0000000080040000-0x000000008005ffff M: (R,W) S/U: ()
|
||
Domain0 Region02 : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
|
||
Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
|
||
Domain0 Next Address : 0x0000000080200000
|
||
Domain0 Next Arg1 : 0x0000000080017000
|
||
Domain0 Next Mode : S-mode
|
||
Domain0 SysReset : yes
|
||
Domain0 SysSuspend : yes
|
||
|
||
I will also try 0x80013000 as address as I saw it being used before, but I don't
|
||
think it is the problem. It must be related with how the initialization is now
|
||
different.
|
||
|
||
OpenSBI v1.4
|
||
____ _____ ____ _____
|
||
/ __ \ / ____| _ \_ _|
|
||
| | | |_ __ ___ _ __ | (___ | |_) || |
|
||
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
|
||
| |__| | |_) | __/ | | |____) | |_) || |_
|
||
\____/| .__/ \___|_| |_|_____/|____/_____|
|
||
| |
|
||
|_|
|
||
|
||
sbi_trap_error: hart0: illegal instruction handler failed (error -2)
|
||
sbi_trap_error: hart0: mcause=0x0000000000000002 mtval=0x0000000000000000
|
||
sbi_trap_error: hart0: mepc=0x00000000800131a4 mstatus=0x8000000a00007800
|
||
sbi_trap_error: hart0: ra=0x0000000080013bb0 sp=0x0000000080046e70
|
||
sbi_trap_error: hart0: gp=0x0000000000000000 tp=0x0000000080047000
|
||
sbi_trap_error: hart0: s0=0x0000000080046eb0 s1=0x0000000000000118
|
||
sbi_trap_error: hart0: a0=0x0000000080013000 a1=0x0000000080046e74
|
||
sbi_trap_error: hart0: a2=0x00000000800133c0 a3=0x0000000000000600
|
||
sbi_trap_error: hart0: a4=0x000000000000062c a5=0x0000000000000284
|
||
sbi_trap_error: hart0: a6=0x0000000000000000 a7=0x000000000000002c
|
||
sbi_trap_error: hart0: s2=0x0000000080013000 s3=0x0000000000000000
|
||
sbi_trap_error: hart0: s4=0x0000000080047000 s5=0x0000000000000000
|
||
sbi_trap_error: hart0: s6=0x0000000000000000 s7=0x0000000000000001
|
||
sbi_trap_error: hart0: s8=0x0000000000002000 s9=0x00000000800436f0
|
||
sbi_trap_error: hart0: s10=0x0000000000000000 s11=0x0000000000000000
|
||
sbi_trap_error: hart0: t0=0x0000000080009796 t1=0x0000000000000268
|
||
sbi_trap_error: hart0: t2=0x0000000000001000 t3=0x00000000000000d7
|
||
sbi_trap_error: hart0: t4=0x000000007fffffff t5=0x0000000000000001
|
||
sbi_trap_error: hart0: t6=0x0000000000000004
|
||
|
||
With 0x80013000 we seem to enter OpenSBI code region.
|
||
|
||
Let's try with openpiton again, without the FDT address. We should disable the
|
||
secondary console from the DT too.
|
||
|
||
No output, lets enable the PLIC again in the DT. Same.
|
||
|
||
Probably we have a better chance to fix it in the generic platform.
|
||
|
||
Let's use a very far away address for the FDT just in case: `0x0_c000_0000`
|
||
|
||
This one doesn't work:
|
||
|
||
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 ftrace=function ftrace_filter=* tp_printk"
|
||
|
||
This one yes:
|
||
|
||
=> setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=sbi console=hvc0 trace_event=*:* tp_printk trace_buf_size=1M"
|
||
=> setenv ramdisk_size 12614846
|
||
=> booti ${kernel_addr_r} ${ramdisk_addr_r}:${ramdisk_size} ${fdtcontroladdr}
|
||
|
||
[ 164.849640] initcall_finish: func=vdso_init+0x0/0x4c ret=0
|
||
[ 164.849960] console: initcall_finish: func=vdso_init+0x0/0x4c ret=0
|
||
[ 164.862820] initcall_start: func=check_unaligned_access_all_cpus+0x0/0x1d4
|
||
[ 164.863140] console: initcall_start: func=check_unaligned_access_all_cpus+0x0/0x1d4
|
||
[ 164.878860] kmalloc: call_site=check_unaligned_access_all_cpus+0xa8/0x1d4 ptr=(____ptrval____) bytes_req=8 bytes_alloc=8 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false
|
||
[ 164.879200] console: kmalloc: call_site=check_unaligned_access_all_cpus+0xa8/0x1d4 ptr=(____ptrval____) bytes_req=8 bytes_alloc=8 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false
|
||
[ 164.912380] mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d28 order=2 migratetype=0 percpu_refill=1
|
||
[ 164.912680] console: mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d28 order=2 migratetype=0 percpu_refill=1
|
||
[ 164.934340] mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d2c order=2 migratetype=0 percpu_refill=1
|
||
[ 164.934660] console: mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d2c order=2 migratetype=0 percpu_refill=1
|
||
[ 164.956300] mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d30 order=2 migratetype=0 percpu_refill=1
|
||
[ 164.956620] console: mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d30 order=2 migratetype=0 percpu_refill=1
|
||
[ 164.978280] mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d34 order=2 migratetype=0 percpu_refill=1
|
||
[ 164.978600] console: mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d34 order=2 migratetype=0 percpu_refill=1
|
||
[ 165.000260] mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d38 order=2 migratetype=0 percpu_refill=1
|
||
[ 165.000560] console: mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d38 order=2 migratetype=0 percpu_refill=1
|
||
[ 165.022200] mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d3c order=2 migratetype=0 percpu_refill=1
|
||
[ 165.022540] console: mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d3c order=2 migratetype=0 percpu_refill=1
|
||
[ 165.044200] mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d40 order=2 migratetype=0 percpu_refill=1
|
||
[ 165.044500] console: mm_page_alloc_zone_locked: page=(____ptrval____) pfn=0x83d40 order=2 migratetype=0 percpu_refill=1
|
||
[ 165.066220] mm_page_alloc: page=(____ptrval____) pfn=0x83d28 order=2 migratetype=0 gfp_flags=GFP_KERNEL
|
||
[ 165.066540] console: mm_page_alloc: page=(____ptrval____) pfn=0x83d28 order=2 migratetype=0 gfp_flags=GFP_KERNEL
|
||
|
||
## 2024-08-01
|
||
|
||
Now that we have a new bitstream with a CLINT connected to a PLIC input, we may
|
||
be able to generate an interrupt.
|
||
|
||
Here is the comment where I gather the pieces:
|
||
|
||
---8<---{{{
|
||
|
||
From https://gitlab.bsc.es/hwdesign/rtl/core-tile/sa-fpga/ I can see that the
|
||
auxiliary timer [is in fact another
|
||
CLINT](https://gitlab.bsc.es/hwdesign/rtl/core-tile/sa-fpga/-/blob/10ba8b2a11ef105d7cda065e13838a3d28f3c951/fpga_core_bridge/rtl/fpga_core_bridge.sv#L685).
|
||
|
||
I don't have access to the [hlib
|
||
repository](https://gitlab.bsc.es/hwdesign/hlib.git) (@jmendoza can I get access
|
||
to it?) to see the CLINT definition, but based on [this
|
||
CLINT](https://github.com/openhwgroup/cva6/blob/master/corev_apu/clint/clint.sv)
|
||
and [this one](https://github.com/pulp-platform/clint/blob/master/src/clint.sv)
|
||
I can estimate some of the previous information:
|
||
|
||
> - The information on which port number of the PLIC the timer is connected to.
|
||
|
||
https://gitlab.bsc.es/hwdesign/rtl/core-tile/sa-fpga/-/blob/main/fpga_core_bridge/rtl/fpga_core_bridge.sv#L1114
|
||
|
||
```
|
||
plic #(
|
||
.PARAMETER_BITWIDTH (7),
|
||
.NUM_TARGETS (1),
|
||
.NUM_SOURCES (4)
|
||
) plic_inst (
|
||
.clk_i (clk_i),
|
||
.rstn_i (reset),
|
||
.irq_sources_i ({plic_timer_eirq,eth_irq,uart1_irq}),
|
||
.eip_targets_o (irq),
|
||
```
|
||
|
||
If I read it from right to left starting at 1, it should be **at 4**, as the
|
||
`eth_irq` has two "wires".
|
||
|
||
|
||
> - The memory address of the timer and the mapped registers, so I can see it
|
||
> increasing its value. I think the `aux_timer` you had in the past would be
|
||
> fine.
|
||
|
||
https://gitlab.bsc.es/hwdesign/rtl/core-tile/sa-fpga/-/blob/main/fpga_core_bridge/rtl/local_includes/defines.svh#L33-36
|
||
|
||
```
|
||
//Size: 64KB
|
||
`define AUX_TIMER_XBAR_ID 2
|
||
`define AUX_TIMER_BASE_ADDR 64'h0000_0000_4001_0000 // Need to be this space because we use a clint as aux timer
|
||
`define AUX_TIMER_END_ADDR 64'h0000_0000_4001_FFFF
|
||
```
|
||
|
||
> - The specific operations I need to do in machine mode to configure the timer
|
||
> to fire at 1 Hz (probably setting two registers).
|
||
|
||
Based on the source of the CLINT, **only one interrupt will be generated** after
|
||
setting the mtimecmp register to something larger than the mtime register. Then
|
||
I suspect I would have to make the interrupt run some code to rearm it again by
|
||
modifying the mtimecmp register to some value in the future:
|
||
|
||
```
|
||
// -----------------------------
|
||
// IRQ Generation
|
||
// -----------------------------
|
||
// The mtime register has a 64-bit precision on all RV32, RV64, and RV128 systems. Platforms provide a 64-bit
|
||
// memory-mapped machine-mode timer compare register (mtimecmp), which causes a timer interrupt to be posted when the
|
||
// mtime register contains a value greater than or equal (mtime >= mtimecmp) to the value in the mtimecmp register.
|
||
// The interrupt remains posted until it is cleared by writing the mtimecmp register. The interrupt will only be taken
|
||
// if interrupts are enabled and the MTIE bit is set in the mie register.
|
||
always_comb begin : irq_gen
|
||
// check that the mtime cmp register is set to a meaningful value
|
||
for (int unsigned i = 0; i < NR_CORES; i++) begin
|
||
if (mtime_q >= mtimecmp_q[i]) begin
|
||
timer_irq_o[i] = 1'b1;
|
||
end else begin
|
||
timer_irq_o[i] = 1'b0;
|
||
end
|
||
end
|
||
end
|
||
```
|
||
|
||
I could ensure that an interrupt has been fired by reading the mtime and
|
||
mtimecmp values, and checking that mtime > mtimecmp.
|
||
|
||
Now I only need to find a bitstream that has been generated with
|
||
https://gitlab.bsc.es/hwdesign/rtl/core-tile/sa-fpga/-/commit/10ba8b2a11ef105d7cda065e13838a3d28f3c951.
|
||
|
||
|
||
This may work:
|
||
|
||
https://gitlab.bsc.es/hwdesign/fpga/integration-lab/fpga-shell/-/jobs/968583/raw
|
||
|
||
> Submodule path 'sa-fpga': checked out '12b77cb50cf1c416f107d4c7ab1c52d7b5e59056'
|
||
|
||
Which is based on fpga-shell https://gitlab.bsc.es/hwdesign/fpga/integration-lab/fpga-shell/-/commit/01265d197f256bce2c7e82d21c7f4bf5dcb44e68
|
||
|
||
Here is the bitstream job: https://gitlab.bsc.es/hwdesign/fpga/integration-lab/fpga-shell/-/jobs/968585
|
||
|
||
And the bitstream: [artifacts.zip](/uploads/d8240a779cd485771b9e3d0147e342d1/artifacts.zip)
|
||
|
||
And full log: [job.log](/uploads/a4215e4d039065b77f7a2d2b1403e475/job.log)
|
||
|
||
The memory map would need a bit of adjustment in the device tree, but to play with the timer in machine mode not much is needed.
|
||
|
||
I think I have all the pieces now.
|
||
|
||
---8<---}}}
|
||
|
||
I will try with the last bitstream that I already had compiled, as I will have
|
||
to rebuild the required packages in nix.
|
||
|
||
To compute the memory position of the registers:
|
||
|
||
`define AUX_TIMER_XBAR_ID 2
|
||
`define AUX_TIMER_BASE_ADDR 64'h0000_0000_4001_0000 // Need to be this space because we use a clint as aux timer
|
||
`define AUX_TIMER_END_ADDR 64'h0000_0000_4001_FFFF
|
||
|
||
localparam logic [15:0] MSIP_BASE = 16'h0;
|
||
localparam logic [15:0] MTIMECMP_BASE = 16'h4000;
|
||
localparam logic [15:0] MTIME_BASE = 16'hbff8;
|
||
|
||
So, the base address 0x40010000 and the first MTIME at 0xbff8 would give us a
|
||
timer at 0x4001bff8.
|
||
|
||
Here it is:
|
||
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 006e65b8 .en.
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 006e9a26 &.n.
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 006ebae1 ..n.
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 006eda45 E.n.
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 006ef9d4 ..n.
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 006f1abb ..o.
|
||
|
||
Now, the MTIMECMP should be at 0x40014000, which should be 0.
|
||
|
||
=> md 0x40014000 1
|
||
40014000: 00000000 ....
|
||
|
||
Good.
|
||
|
||
Now, I suspect the MSIP is not used, so it should be 0 at 0x40010000 too:
|
||
|
||
=> md 0x40010000 1
|
||
40010000: 00000000 ....
|
||
|
||
Nice.
|
||
|
||
Just for testing, let's see if I can make the timer cause any change in the MSIP
|
||
register by setting the MTIMECMP to a value:
|
||
|
||
=> mw 0x40014000 0x01700000 # Write the MTIMECMP
|
||
=> md 0x40014000 1
|
||
40014000: 01700000 ..p.
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 016da81a ..m.
|
||
=> md 0x40010000 1
|
||
40010000: 00000000 ....
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 016f947c |.o.
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 016fff96 ..o.
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 01704367 gCp. # Now we passed it
|
||
=> md 0x40010000 1
|
||
40010000: 00000000 .... # But MSIP is still 0
|
||
|
||
As expected, nothing happens. We cannot monitor the interrupt line from the
|
||
timer itself.
|
||
|
||
Now, let see if we can inspect the state of the PLIC.
|
||
|
||
From the `plic_interface` I can see where are the memory addresses of the
|
||
registers exposed.
|
||
|
||
The PLIC is mapped here:
|
||
|
||
//Size: 4MB
|
||
`define PLIC_XBAR_ID 5
|
||
`define PLIC_BASE_ADDR 64'h0000_0000_4080_0000
|
||
`define PLIC_END_ADDR 64'h0000_0000_40BF_FFFF
|
||
|
||
There are several ways in which the interrupts are not forwarded to the
|
||
destination, and several destinations. The PLIC specification is a good resource
|
||
to understand it:
|
||
|
||
https://github.com/riscv/riscv-plic-spec
|
||
|
||
This is important:
|
||
|
||
> The interrupt gateways are responsible for converting global interrupt signals
|
||
> into a common interrupt request format, and for controlling the flow of
|
||
> interrupt requests to the PLIC core. At most one interrupt request per
|
||
> interrupt source can be pending in the PLIC core at any time, indicated by
|
||
> setting the source’s IP bit. The gateway only forwards a new interrupt request
|
||
> to the PLIC core after receiving notification that the interrupt handler
|
||
> servicing the previous interrupt request from the same source has completed.
|
||
|
||
So, there cannot be any pending interrupt, otherwise no more interrupts will be
|
||
sent to the core.
|
||
|
||
Assuming the PLIC uses the standard memory layout, we should find:
|
||
|
||
base + 0x000000: Reserved (interrupt source 0 does not exist)
|
||
base + 0x000004: Interrupt source 1 priority
|
||
base + 0x000008: Interrupt source 2 priority
|
||
|
||
Which they should begin at 0x40800000.
|
||
|
||
=> md 0x40800000 8
|
||
40800000: 00000000 00000000 00000000 00000000 ................
|
||
40800010: 00000000 00000000 00000000 00000000 ................
|
||
|
||
All the priorities are set to 0.
|
||
|
||
Let's see the pending interrupts:
|
||
|
||
base + 0x000FFC: Interrupt source 1023 priority
|
||
base + 0x001000: Interrupt Pending bit 0-31
|
||
base + 0x00107C: Interrupt Pending bit 992-1023
|
||
|
||
They should be at 0x40801000:
|
||
|
||
=> md 0x40801000 8
|
||
40801000: 00000010 00000000 00000000 00000000 ................
|
||
40801010: 00000000 00000000 00000000 00000000 ................
|
||
|
||
Whoa, look at that.
|
||
|
||
4321
|
||
0x00000010 = 10000
|
||
| |
|
||
| int 0 (reserved)
|
||
int 4 = timer
|
||
|
||
We got the interrupt 4 pending in context 0!
|
||
|
||
Other context don't seem to see anything:
|
||
|
||
=> md 0x40801080 1
|
||
40801080: 00000000 ....
|
||
=> md 0x40801100 1
|
||
40801100: 00000000 ....
|
||
=> md 0x40801180 1
|
||
40801180: 00000000 ....
|
||
=> md 0x40801200 1
|
||
40801200: 00000000 ....
|
||
=> md 0x40801280 1
|
||
40801280: 00000000 ....
|
||
=> md 0x40801300 1
|
||
40801300: 00000000 ....
|
||
=> md 0x40801380 1
|
||
40801380: 00000000 ....
|
||
|
||
So, as the priority is 0, this means it is ignored:
|
||
|
||
> If PLIC supports Interrupt Priorities, then each PLIC interrupt source can be
|
||
> assigned a priority by writing to its 32-bit memory-mapped priority register.
|
||
> A priority value of 0 is reserved to mean "never interrupt" and effectively
|
||
> disables the interrupt. Priority 1 is the lowest active priority while the
|
||
> maximum level of priority depends on PLIC implementation. Ties between global
|
||
> interrupts of the same priority are broken by the Interrupt ID; interrupts
|
||
> with the lowest ID have the highest effective priority.
|
||
|
||
Let's claim the interrupt, by just performing a read from 0x40a00004:
|
||
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
=> md 0x40a00004 1
|
||
40a00004: 00000000 ....
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
So, it continues to be pending.
|
||
|
||
We have to write the completed interrupt, by writing the number 4 to the same
|
||
register:
|
||
|
||
=> mw 0x40a00004 4
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
Still not cleared.
|
||
|
||
Let's try making the MTIMECMP value much higher than MTIME:
|
||
|
||
=> md 0x40014000 1
|
||
40014000: 01700000 ..p.
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 03a4584b KX..
|
||
=> mw 0x40014000 0xaaaaaaaa
|
||
=> md 0x40014000 1
|
||
40014000: aaaaaaaa ....
|
||
=> md 0x4001bff8 1
|
||
4001bff8: 03abc84d M...
|
||
|
||
So... the ID that must be written to the completion register is not the
|
||
interrupt number, but the value read from the claim register, which is 0.
|
||
|
||
=> mw 0x40a00004 0
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
Still, nothing.
|
||
|
||
All interrupts are disabled:
|
||
|
||
=> md 0x40802000 4
|
||
40802000: 00000000 00000000 00000000 00000000 ................
|
||
|
||
Let's try enabling the interrupt 4, by writting:
|
||
|
||
=> mw 0x40802000 0x10
|
||
=> md 0x40802000 1
|
||
40802000: 00000010 ....
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
Now, let's set the priority to something else than 0.
|
||
|
||
First, lets make sure that the context 0 threshold priority is set to 0, so we
|
||
allow all interrupts:
|
||
|
||
0x200000: Priority threshold for context 0
|
||
|
||
=> md 0x40a00000 1
|
||
40a00000: 00000007 ....
|
||
|
||
Oh, so we are only receiving interrupts with priority 7 or higher. But our
|
||
interrupt has priority 0!
|
||
|
||
=> md 0x40800004 1
|
||
40800004: 00000000 ....
|
||
|
||
Let's make the threshold 0 and our interrupt have priority 1.
|
||
|
||
=> mw 0x40a00000 0
|
||
=> mw 0x40800004 1
|
||
=> md 0x40800004 1
|
||
40800004: 00000001 ....
|
||
=> md 0x40a00000
|
||
40a00000: 00000000 ....
|
||
|
||
Not let's see again the interrupt state:
|
||
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
Still on.
|
||
|
||
Let's read the claim register again.
|
||
|
||
=> md 0x40a00004
|
||
40a00004: 00000000 ....
|
||
|
||
Still 0, let's try to complete it:
|
||
|
||
=> mw 0x40a00004 0
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
Nope, still pending.
|
||
|
||
What, what the hell. The threshold value has changed to 1:
|
||
|
||
=> md 0x40800004 1
|
||
40800004: 00000001 ....
|
||
=> md 0x40a00000 1
|
||
40a00000: 00000001 .... <-- this was 0
|
||
|
||
Let's configure the interruption priority to something bigger than 1.
|
||
|
||
Wait, I put the priority in the wrong source:
|
||
|
||
0x000000: Reserved (interrupt source 0 does not exist)
|
||
0x000004: Interrupt source 1 priority
|
||
0x000008: Interrupt source 2 priority
|
||
|
||
Our timer should be the source 4, so 12 or 0xc:
|
||
|
||
=> md 0x4080000c 1
|
||
4080000c: 00000000 ....
|
||
|
||
(This is wrong, should be 0x40800010, see below)
|
||
|
||
Let's make it have priority 0xd:
|
||
|
||
=> mw 0x4080000c 0xd
|
||
=> md 0x4080000c 1
|
||
4080000c: 0000000d ....
|
||
|
||
Something weird is going on with the priority register?
|
||
|
||
=> md 0x40a00000 1
|
||
40a00000: 00000000 ....
|
||
=> md 0x40a00000 1
|
||
40a00000: 0000000d ....
|
||
=> md 0x40a00000 1
|
||
40a00000: 0000000d ....
|
||
=> md 0x40a00000 1
|
||
40a00000: 0000000d ....
|
||
=> md 0x40a00000 1
|
||
40a00000: 0000000d ....
|
||
|
||
Let's see the claim register, which should be in the next word:
|
||
|
||
=> md 0x40a00004 1
|
||
40a00004: 00000004 ....
|
||
|
||
Yes! Now I can see the claim register with a proper ID. Let's complete this
|
||
interrupt by writing the 4 back to that register:
|
||
|
||
=> mw 0x40a00004 4
|
||
=> md 0x40801000 1
|
||
40801000: 00000000 ....
|
||
|
||
Perfect! It properly caused the pending interrupt to disappear.
|
||
|
||
Let's try now setting the MTIMECMP to something smaller than the MTIME, so it
|
||
causes an interrupt. With a value 0 should always work, but lets choose a non
|
||
zero value:
|
||
|
||
=> md 0x40014000
|
||
40014000: aaaaaaaa ....
|
||
=> mw 0x40014000 00aaaaaa
|
||
=> md 0x40014000
|
||
40014000: 00aaaaaa ....
|
||
=> md 0x4001bff8
|
||
4001bff8: 06211a0c ..!.
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
Perfect! It causes the interrupt to appear as pending.
|
||
|
||
So, using the context 0, we can properly see the interrupt pending, claim it and
|
||
complete it. But the context 0 is not used in OpenSBI, only the 9 and 11:
|
||
|
||
From `include/sbi/riscv_encoding.h`:
|
||
|
||
#define IRQ_S_SOFT 1
|
||
#define IRQ_VS_SOFT 2
|
||
#define IRQ_M_SOFT 3
|
||
#define IRQ_S_TIMER 5
|
||
#define IRQ_VS_TIMER 6
|
||
#define IRQ_M_TIMER 7
|
||
#define IRQ_S_EXT 9
|
||
#define IRQ_VS_EXT 10
|
||
#define IRQ_M_EXT 11
|
||
#define IRQ_S_GEXT 12
|
||
#define IRQ_PMU_OVF 13
|
||
|
||
And from `lib/utils/irqchip/fdt_irqchip_plic.c`:
|
||
|
||
static int irqchip_plic_update_hartid_table(void *fdt, int nodeoff,
|
||
struct plic_data *pd)
|
||
{
|
||
const fdt32_t *val;
|
||
u32 phandle, hwirq, hartid;
|
||
struct sbi_scratch *scratch;
|
||
int i, err, count, cpu_offset, cpu_intc_offset;
|
||
|
||
val = fdt_getprop(fdt, nodeoff, "interrupts-extended", &count);
|
||
if (!val || count < sizeof(fdt32_t))
|
||
return SBI_EINVAL;
|
||
count = count / sizeof(fdt32_t);
|
||
|
||
for (i = 0; i < count; i += 2) {
|
||
phandle = fdt32_to_cpu(val[i]);
|
||
hwirq = fdt32_to_cpu(val[i + 1]);
|
||
|
||
cpu_intc_offset = fdt_node_offset_by_phandle(fdt, phandle);
|
||
if (cpu_intc_offset < 0)
|
||
continue;
|
||
|
||
cpu_offset = fdt_parent_offset(fdt, cpu_intc_offset);
|
||
if (cpu_offset < 0)
|
||
continue;
|
||
|
||
err = fdt_parse_hart_id(fdt, cpu_offset, &hartid);
|
||
if (err)
|
||
continue;
|
||
|
||
scratch = sbi_hartid_to_scratch(hartid);
|
||
if (!scratch)
|
||
continue;
|
||
|
||
plic_set_hart_data_ptr(scratch, pd);
|
||
switch (hwirq) {
|
||
case IRQ_M_EXT:
|
||
plic_set_hart_mcontext(scratch, i / 2);
|
||
break;
|
||
case IRQ_S_EXT:
|
||
plic_set_hart_scontext(scratch, i / 2);
|
||
break;
|
||
}
|
||
}
|
||
|
||
return 0;
|
||
}
|
||
|
||
So, lets try to do the same, but with the context 11 for machine mode
|
||
`IRQ_M_EXT`.
|
||
|
||
Let's compute the address of the input source for context 11:
|
||
|
||
base + 0x002000: Enable bits for sources 0-31 on context 0
|
||
base + 0x002004: Enable bits for sources 32-63 on context 0
|
||
...
|
||
base + 0x00207C: Enable bits for sources 992-1023 on context 0
|
||
base + 0x002080: Enable bits for sources 0-31 on context 1
|
||
base + 0x002084: Enable bits for sources 32-63 on context 1
|
||
...
|
||
base + 0x0020FC: Enable bits for sources 992-1023 on context 1
|
||
base + 0x002100: Enable bits for sources 0-31 on context 2
|
||
base + 0x002104: Enable bits for sources 32-63 on context 2
|
||
...
|
||
base + 0x00217C: Enable bits for sources 992-1023 on context 2
|
||
...
|
||
base + 0x1F1F80: Enable bits for sources 0-31 on context 15871
|
||
base + 0x1F1F84: Enable bits for sources 32-63 on context 15871
|
||
base + 0x1F1FFC: Enable bits for sources 992-1023 on context 15871
|
||
...
|
||
|
||
It should be:
|
||
|
||
>>> hex(0x40800000 + 0x2000 + (11 * 0x80))
|
||
'0x40802580'
|
||
|
||
They are all disabled:
|
||
|
||
=> md 0x40802580
|
||
40802580: 00000000 ....
|
||
|
||
So, let's enable the source 4 by writing 0x10
|
||
|
||
=> mw 0x40802580 0x10
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
Now, let's check the context 11 priority threshold:
|
||
|
||
0x200000: Priority threshold for context 0
|
||
0x201000: Priority threshold for context 1
|
||
0x202000: Priority threshold for context 2
|
||
0x203000: Priority threshold for context 3
|
||
|
||
The priority threshold for context 11 should be at:
|
||
|
||
>>> hex(0x40800000 + 0x200000 + (11 * 0x1000))
|
||
'0x40a0b000'
|
||
|
||
=> md 0x40a0b000
|
||
40a0b000: 00000000 ....
|
||
|
||
It has value 0, so all interrupts with non-zero priority should pass:
|
||
|
||
> For example, a threshold value of zero permits all interrupts with non-zero
|
||
> priority.
|
||
|
||
Let's see the priority of source 4 in context 11:
|
||
|
||
0x000000: Reserved (interrupt source 0 does not exist)
|
||
0x000004: Interrupt source 1 priority
|
||
0x000008: Interrupt source 2 priority
|
||
...
|
||
0x000FFC: Interrupt source 1023 priority
|
||
|
||
The address should be at:
|
||
|
||
>>> hex(0x40800000 + (4 * 0x4))
|
||
|
||
=> md 0x40800010
|
||
40800010: 00000000 ....
|
||
|
||
It has priority 0, so it would never work. Let's make it priority 1:
|
||
|
||
=> mw 0x40800010 1
|
||
=> md 0x40800010 1
|
||
40800010: 00000001
|
||
|
||
Let's check the pending interrupts:
|
||
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
It is still pending, so let's clear it my setting the MTIMECMP to a large value.
|
||
|
||
=> md 0x40014000
|
||
40014000: 00aaaaaa ....
|
||
=> mw 0x40014000 0xaaaaaaaa
|
||
=> md 0x40014000
|
||
40014000: aaaaaaaa ....
|
||
=> md 0x4001bff8
|
||
4001bff8: 0e8e6066 f`..
|
||
=> md 0x4001bff8
|
||
4001bff8: 0e8ea4c9 ....
|
||
=> md 0x4001bff8
|
||
4001bff8: 0e8ece24 $...
|
||
|
||
Now, let's claim and complete it for the context 0 which was already enabled
|
||
from the test before.
|
||
|
||
=> md 0x40a00004 1
|
||
40a00004: 00000004 ....
|
||
=> mw 0x40a00004 4
|
||
=> md 0x40801000 1
|
||
40801000: 00000000 ....
|
||
|
||
Perfect, now it is not pending anymore.
|
||
|
||
Now, the context 0 is still enabled, so the interruptions may be sent there
|
||
instead of context 11. So let's disable the context 0 first.
|
||
|
||
=> mw 0x40802000 0
|
||
=> md 0x40802000 1
|
||
40802000: 00000000 ....
|
||
|
||
Now let's fire the MTIMECMP and see if OpenSBI sees a machine trap.
|
||
|
||
=> md 0x40014000 1
|
||
40014000: aaaaaaaa ....
|
||
=> mw 0x40014000 00aaaaaa
|
||
=> md 0x40014000 1
|
||
40014000: 00aaaaaa ....
|
||
|
||
Nothing happened.
|
||
|
||
The interrupt is pending:
|
||
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
|
||
The claim on context 0 returns 0, so not interrupt there which is expected:
|
||
|
||
=> md 0x40a00004 1
|
||
40a00004: 00000000 ....
|
||
|
||
Let's compute the claim register on context 11:
|
||
|
||
0x200004: Interrupt Claim Process for context 0
|
||
0x201004: Interrupt Claim Process for context 1
|
||
0x202004: Interrupt Claim Process for context 2
|
||
0x203004: Interrupt Claim Process for context 3
|
||
...
|
||
|
||
>>> hex(0x40800000 + 0x200004 + (11 * 0x1000))
|
||
'0x40a0b004'
|
||
|
||
=> md 0x40a0b004 1
|
||
40a0b004: 00000000 ....
|
||
|
||
Hmm, there is no claim ID.
|
||
|
||
So, I checked again, and I cannot enable the interrupt on context 11:
|
||
|
||
=> md 0x40802580 1
|
||
40802580: 00000000 ....
|
||
=> mw 0x40802580 0x10
|
||
=> md 0x40802580 1
|
||
40802580: 00000000 ....
|
||
|
||
|
||
Note, the first value is 0 and must be claimed:
|
||
|
||
=> md 0x40801000 1
|
||
40801000: 00000010 ....
|
||
=> md 0x40802000 1
|
||
40802000: 00000010
|
||
=> md 0x40a00004 1
|
||
40a00004: 00000000 ....
|
||
=> mw 0x40a00004 1
|
||
=> mw 0x40a00004 4
|
||
=> md 0x40a00004 1
|
||
40a00004: 00000004 ....
|
||
=> mw 0x40a00004 4
|
||
=> md 0x40a00004 1
|
||
40a00004: 00000004 ....
|
||
=> mw 0x40a00004 4
|
||
=> md 0x40a00004 1
|
||
40a00004: 00000004 ....
|
||
|
||
## 2024-08-02
|
||
|
||
I see that the MIE sets the machine mode external interrupt enable in this way:
|
||
|
||
int sbi_irqchip_init(struct sbi_scratch *scratch, bool cold_boot)
|
||
{
|
||
int rc;
|
||
const struct sbi_platform *plat = sbi_platform_ptr(scratch);
|
||
|
||
rc = sbi_platform_irqchip_init(plat, cold_boot);
|
||
if (rc)
|
||
return rc;
|
||
|
||
if (ext_irqfn != default_irqfn)
|
||
csr_set(CSR_MIE, MIP_MEIP);
|
||
|
||
return 0;
|
||
}
|
||
|
||
Only if the external interrupt function is not the default one. But for the
|
||
PLIC, it looks like the default one is being used. So let's enable the machine
|
||
mode interrupts unconditionally.
|
||
|
||
Let's try to cause an interruption. I would need to list all the steps.
|
||
|
||
mw 0x40802000 0x10 # Enable interrupt for source 4 (timer)
|
||
mw 0x40800010 0xff # Make source 4 priority large
|
||
md 0x40a00004 1 # Show which value should be claimed
|
||
mw 0x40a00004 0 # Claim 0
|
||
|
||
Nice, I can see the trap:
|
||
|
||
Boot HART MIDELEG : 0x0000000000000022
|
||
Boot HART MEDELEG : 0x000000000000b109
|
||
...
|
||
=> mw 0x40802000 0x10 # Enable interrupt for source 4 (timer)
|
||
=> mw 0x40800010 0xff # Make source 4 priority large
|
||
=> md 0x40a00004 1 # Show which value should be claimed
|
||
40a00004: 00000000 ....
|
||
=> mw 0x40a00004 0 # Claim 0 (weird)
|
||
<i
|
||
sbi_trap_error: hart0: trap0: unhandled local interrupt (error -1000)
|
||
|
||
sbi_trap_error: hart0: trap0: mcause=0x800000000000000b mtval=0x0000000000000000
|
||
sbi_trap_error: hart0: trap0: mepc=0x00000000af71ebbc mstatus=0x8000000a00006800
|
||
sbi_trap_error: hart0: trap0: ra=0x00000000af71eba0 sp=0x00000000aeed3b00
|
||
sbi_trap_error: hart0: trap0: gp=0x00000000aeed3dd0 tp=0x0000000000000000
|
||
sbi_trap_error: hart0: trap0: s0=0x00000000af7cd170 s1=0x0000000000000000
|
||
sbi_trap_error: hart0: trap0: a0=0x0000000000000000 a1=0x0000000000000002
|
||
sbi_trap_error: hart0: trap0: a2=0x0000000000000008 a3=0x0000000000000004
|
||
sbi_trap_error: hart0: trap0: a4=0x0000000000000001 a5=0x0000000000000001
|
||
sbi_trap_error: hart0: trap0: a6=0x0000000000000008 a7=0x00000000af795778
|
||
sbi_trap_error: hart0: trap0: s2=0x0000000000000000 s3=0x00000000aeed5b90
|
||
sbi_trap_error: hart0: trap0: s4=0x0000000000000003 s5=0x00000000af7f7a4c
|
||
sbi_trap_error: hart0: trap0: s6=0x0000000000000000 s7=0x0000000000000000
|
||
sbi_trap_error: hart0: trap0: s8=0x0000000000000000 s9=0x0000000000000000
|
||
sbi_trap_error: hart0: trap0: s10=0x00000000aeed5bc0 s11=0x0000000000000000
|
||
sbi_trap_error: hart0: trap0: t0=0x00000000aeed3ac0 t1=0x0000000000000039
|
||
sbi_trap_error: hart0: trap0: t2=0x3b3d74696e695f64 t3=0x0000000000000010
|
||
sbi_trap_error: hart0: trap0: t4=0x0000000000000000 t5=0x61745f746f6f627b
|
||
sbi_trap_error: hart0: trap0: t6=0x00000000aeed3aa0
|
||
|
||
Now let's try delegating it to u-boot, and see if I can print some information.
|