From a7c460b0349f778cb8b7075aa842aacdd39bb33d Mon Sep 17 00:00:00 2001 From: Rodrigo Arias Mallo Date: Fri, 5 Jul 2024 17:05:28 +0200 Subject: [PATCH] Use headings to allow hrefs --- JOURNAL.md | 87 ++++++++++++++++++++++++++++++------------------------ 1 file changed, 49 insertions(+), 38 deletions(-) diff --git a/JOURNAL.md b/JOURNAL.md index 4221efd..c52d45a 100644 --- a/JOURNAL.md +++ b/JOURNAL.md @@ -53,17 +53,17 @@ CPU. Let's go back and try to get the initrd shell, so we can systematically hang it in the `switch_root` -**Observation**: The riscv-timer seems to be causing interrupts with IRQ 5: +### OBSERVATION: The riscv-timer seems to be causing interrupts with IRQ 5: ``` [ 62.439060] irq_handler_entry: irq=5 name=riscv-timer [ 62.444980] irq_handler_exit: irq=5 ret=handled ``` -**Observation**: Rohan reports the serial startup routine being running *after* +### OBSERVATION: Rohan reports the serial startup routine is running *after* the init begins. -**Observation**: Only interrupts in timer, others are zero. +### OBSERVATION: Only the timer has interrupts; the others are zero. With: @@ -95,7 +95,7 @@ I can see this: IPI4: 0 IRQ work interrupts IPI5: 0 Timer broadcast interrupts -**Observation**: There is a timer configured in 0x40170000 but in the device +### OBSERVATION: There is a timer configured at 0x40170000 but in the device tree we only have one at `timer@40002000`. #define OX_ALVEO_TIMER_BASE 0x40170000 @@ -106,7 +106,7 @@ tree we only have one at `timer@40002000`.
https://gitlab.bsc.es/hwdesign/bsc-linux/-/blob/d6d194bd30d9a8fe49c2a278ffb3c3ae7852e75d/bsc_tree/patches/ox_alveo/opensbi/0001-opensbi-ox_alveo-platform.patch#L63 -**Observation**: When the serial console starts, the speed of the serial port +### OBSERVATION: When the serial console starts, the speed of the serial port changes to 9600: [ 6.845400] io scheduler mq-deadline registered @@ -181,13 +181,14 @@ That was my mistake as I need to put the baud speed in the ttyS0, like this: console=ttyS0,115200n8 -**Observation**: Trying to read from the serial console /dev/ttyS0 causes no +### OBSERVATION: Trying to read from the serial console /dev/ttyS0 causes no more messages in the console (or a hang). -**Question**: Can we make a heartbeat for the kernel? The idea is to keep a -counter in some memory of the kernel so we can see it from the host being moved. +### QUESTION: Can we make a heartbeat for the kernel? +The idea is to keep a counter in some kernel memory and watch it change from +the host. -**Question**: Can we disable the serial driver 8250 from loading? +### QUESTION: Can we disable the serial driver 8250 from loading? initcall_blacklist= @@ -208,7 +209,7 @@ Yes, but that doesn't seem to do anything. It is hanging: [ 629.733920] stage-1-init: [Thu Jan 1 00:10:29 UTC 1970] + echo /nix/store/snvvqpxmryw1szlllk0bxpm37p8vj8sw-extra-utils/bin/modprobe -**Question**: What happens if we remap the interruptions? +### QUESTION: What happens if we remap the interruptions? - Move the serial from 0 to 1 - Move the plic from 3 to 2 and remove 7 @@ -225,14 +226,14 @@ Rather than two: [ 0.000000] plic: plic@40800000: mapped 3 interrupts with 0 handlers for 2 contexts. [ 0.000000] riscv: providing IPIs using SBI IPI extension -**Question**: What happens if we block the `sbi_ipi` driver? +### QUESTION: What happens if we block the `sbi_ipi` driver? initcall_blacklist=sbi_ipi_init Nothing, it cannot be disabled it seems.
I will remove SMP support so it won't be compiled in. -**Observation**: Searching for 'riscv,plic0' only matches irq-sifive-plic driver. +### OBSERVATION: Searching for 'riscv,plic0' only matches irq-sifive-plic driver. hut% rg 'riscv,plic0' Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml @@ -244,7 +245,7 @@ be compiled in. So it looks that the only driver that setups the plic is the one used by SiFive. Here is the doc: https://static.dev.sifive.com/U54-MC-RVCoreIP.pdf -**Observation**: The number of handlers is 0, so there are no interruptions. +### OBSERVATION: The number of handlers is 0, so there are no interruptions. It seems the number next to the phandle of the interrupts-extended attribute in the plic follows a different convention of values. Using 9 and 11: @@ -257,7 +258,7 @@ polled, otherwise it hangs. ## 2024-07-04 -**Observation**: I saw they changed this option in Cinco Ranch DTS for the +### OBSERVATION: I saw they changed this option in Cinco Ranch DTS for the serial: > reg-shift = <0>; // regs are spaced on 8 bit boundary (modified from Xilinx UART16550 to be ns16550 compatible) @@ -265,7 +266,7 @@ serial: Tested booting with debug1 and the ttyS0 console, and it goes extremely slow (but still outputs at 115200) and then continues to fail to read keyboard input. -**Question**: Let's try setting the console in poll mode. +### QUESTION: Let's try setting the console in poll mode. 
setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=uart,io,0x40001000,115200n8 boot.trace console=uart,io,0x40001000,115200n8 debug1 init=/nix/store/wavmnv6wjj8y10ha07wxd5f0sqacivj8-nixos-system-nixos-riscv-23.11pre-git/init" @@ -304,17 +305,17 @@ setenv bootargs "root=/dev/ram0 loglevel=7 debug rw earlycon=uart,io,0x40001000, Also found: `no_console_suspend` -**Observation**: There are messages of address space being assigned to +### OBSERVATION: There are messages of address space being assigned to registers: Slave segment '/MEEP_uart_0/S_AXI/Reg' is being assigned into address space '/m_axi_uart0' at <0x0000_0000 [ 4K ]>. Slave segment '/MEEP_uart_1/S_AXI/Reg' is being assigned into address space '/m_axi_uart1' at <0x0000_0000 [ 4K ]>. -**Question**: What happens if I enable `CONFIG_CONSOLE_POLL`? +### QUESTION: What happens if I enable `CONFIG_CONSOLE_POLL`? With `console=ttyS0,115200n8 debug1` I cannot type. -**Observation**: I can dump iomem memory with the tool devmem: +### OBSERVATION: I can dump iomem memory with the tool devmem: But it seems I cannot dump the registers of the serial io mapped region: @@ -364,7 +365,7 @@ It works! ~ # devmem 0x40001000 0x0000000D -**Observation**: The interrupt register of the serial console is 0x0: +### OBSERVATION: The interrupt register of the serial console is 0x0: Assuming the console registers follow AXI UART 16550, here is the IER: @@ -380,8 +381,7 @@ The line control register is 0x3: ~ # devmem 0x4000100C 0x00000003 -**Question**: Can I write to some memory address and see the result from the -host? +### QUESTION: Can I write to some memory address and see the result from the host? For that I would need to find some address that is mapped to the DMA or to the pmem. Xavi recommended `0x6000_0000` as it is uncached. 
@@ -467,18 +467,22 @@ But we don't see the same: [bsc015557@fpgan02 nixos]$ dd if=/dev/qdma34000-MM-1 count=16 bs=1 skip=$FPGACTL_KERNEL_ADDR 2>/dev/null | xxd 00000000: 9797 9797 9797 9797 9797 9797 9797 9797 ................ [bsc015557@fpgan02 nixos]$ dd if=/dev/qdma34000-MM-0 count=16 bs=1 skip=$FPGACTL_KERNEL_ADDR 2>/dev/null | xxd - 00000000: 9797 9797 9797 9797 9797 9797 9797 9797 ................ -**Question**: Missing forward M to S via Mideleg? + 00000000: 9797 9797 9797 9797 9797 9797 9797 9797 ................ + + +### QUESTION: Missing forward M to S via Mideleg? Can it be happening that he MEDELEG is not forwarding the interruptions to the Supervisor (kernel)? Boot HART MIDELEG : 0x0000000000000222 Boot HART MEDELEG : 0x000000000000b109 + -**Question**: Can we add a timer to the PLIC to test the interrupts? +### QUESTION: Can we add a timer to the PLIC to test the interrupts? + -**Observation**: Here is the PLIC register dump: +### OBSERVATION: Here is the PLIC register dump: ~ # for i in `seq 0 16`; do addr=$((0x40600000 + $i)); printf '%08x: ' $addr; devmem $addr; done 40600000: 0x00010002 @@ -499,12 +503,12 @@ Supervisor (kernel)? 4060000f: 0x00000000 40600010: 0x00000000 -**Question**: Can we boot with the new bitstream that includes the second UART? +### QUESTION: Can we boot with the new bitstream that includes the second UART? The interruptions are enabled for the UART 1, not the default UART 0. -**Observation**: I'm using 0x100 not 0x1000 in the serial range: +### OBSERVATION: I'm using 0x100 not 0x1000 in the serial range: reg = <0x0 0x40003000 0x0 0x100>; reg = <0x0 0x40003000 0x0 0x1000>; @@ -512,12 +516,13 @@ The interruptions are enabled for the UART 1, not the default UART 0. Can this produce any problem? It doesn't seem to change anything, still unable to send any bytes. + -**Question**: Can we use virtio to mount a FS in the DMA shared memory? +### QUESTION: Can we use virtio to mount a FS in the DMA shared memory? 
## 2024-07-05 -**Observation**: The kernel continues working when the console hangs. +### OBSERVATION: The kernel continues working when the console hangs. Switching to 0x100000000 as 0x60000000 shows: @@ -540,8 +545,9 @@ Shows the kernel works: a0000000: 6700 0000 g... a0000000: 6800 0000 h... a0000000: 6900 0000 i... + -**Question**: Can we reproduce it with `switch_root`? +### QUESTION: Can we reproduce it with `switch_root`? For that I would have to ensure the process continues to operate, even if we exit the console. Maybe I can make a double fork? @@ -563,10 +569,11 @@ Yes, it seems to be working. Let's load the rootfs too. I added a loop in the stage1 script. -**Question**: Can we see any clock in memory? This will allow us to check if the -AXI still works. +### QUESTION: Can we see any clock in memory? -**Observation**: The kernel stops updating the counter in the mount phase. +This will allow us to check if the AXI still works. + +### OBSERVATION: The kernel stops updating the counter in the mount phase. Managed to reach the mount and hang there: @@ -595,7 +602,7 @@ hardware clock from the DMA region too, so we can discard problems in the AXI. +### ASSUMPTION: The kernel hangs. If the kernel hangs, there must be an instruction or sequence of instructions that causes it. First I need to determine what is being executed by the kernel. @@ -607,16 +614,18 @@ hangs. (prev_comm != 2 && next_comm != 2) So, we can just enable the `tp_printk` but not the tracer. Then in the initrd -script, I enable the function tracer and the filter. +script, I enable the function tracer and the filter. 
-**Observation**: It takes a long time to init the pty: + +### OBSERVATION: It takes a long time to init the pty: Interesting timing: [ 12.612620] initcall_start: func=pty_init+0x0/0x3f4 [ 20.962640] initcall_finish: func=pty_init+0x0/0x3f4 ret=0 + -**Observation**: The kcompactd0 daemon is using the CPU: +### OBSERVATION: The kcompactd0 daemon is using the CPU: [ 290.394920] sched_switch: prev_comm=devmem prev_pid=129 prev_prio=120 prev_state=R ==> next_comm=init next_pid=69 next_prio=120 [ 290.408160] sched_switch: prev_comm=init prev_pid=69 prev_prio=120 prev_state=R ==> next_comm=tee next_pid=68 next_prio=120 @@ -644,8 +653,9 @@ Interesting timing: [ 290.699720] sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=120 prev_state=R ==> next_comm=init next_pid=1 next_prio=120 [ 290.712880] sched_switch: prev_comm=init prev_pid=1 prev_prio=120 prev_state=R ==> next_comm=khvcd next_pid=31 next_prio=120 [ 290.725500] sched_switch: prev_comm=khvcd prev_pid=31 prev_prio=120 prev_state=R ==> next_comm=kcompactd0 next_pid=22 next_prio=120 + -**Question**: Can we reproduce this hang with 6.9.7? +### QUESTION: Can we reproduce this hang with 6.9.7? Disabling clang as it is failing to build: @@ -669,3 +679,4 @@ Disabling clang as it is failing to build: error: 1 dependencies of derivation '/nix/store/b13shgqj7128rdsdzzp4qicqbzl0wnfw-system-path.drv' failed to build error: 1 dependencies of derivation '/nix/store/6qghlihqcyg6155309ldj5xm9m0v835i-nixos-system-nixos-riscv-24.11pre-git.drv' failed to build error: 1 dependencies of derivation '/nix/store/l2x18cih29r1kn6vi8imwhkyk98yhw4i-nix-shell-riscv64-unknown-linux-gnu-env.drv' failed to build +