Document hang missing hvc_remove trace point

This commit is contained in:
Rodrigo Arias 2024-07-08 10:41:50 +02:00
parent 6721e1e22c
commit 4641e0d9a0

View File

@ -43,7 +43,7 @@ Nope, the u-boot is reporting the d extension is in the isa:
> riscv,isa = "rv64imafd"; > riscv,isa = "rv64imafd";
## 2024-07-03 ## 2024-07-03<!--{{{-->
I cannot switch to `gcc.arch = rv64ima` because rust fails to build. I cannot switch to `gcc.arch = rv64ima` because rust fails to build.
@ -255,8 +255,8 @@ the plic follows a different convention of values. Using 9 and 11:
**Remark**: The key combination to run Magic SysRq using the HVC console is<!--{{{--> **Remark**: The key combination to run Magic SysRq using the HVC console is<!--{{{-->
Ctrl-O and then the SysRq key. It only works it the console is being actively Ctrl-O and then the SysRq key. It only works it the console is being actively
polled, otherwise it hangs.<!--}}}--> polled, otherwise it hangs.<!--}}}-->
<!--}}}-->
## 2024-07-04 ## 2024-07-04<!--{{{-->
### OBSERVATION: I saw they changed this option in Cinco Ranch DTS for the<!--{{{--> ### OBSERVATION: I saw they changed this option in Cinco Ranch DTS for the<!--{{{-->
serial: serial:
@ -518,9 +518,11 @@ Can this produce any problem?
It doesn't seem to change anything, still unable to send any bytes. It doesn't seem to change anything, still unable to send any bytes.
<!--}}}--> <!--}}}-->
### QUESTION: Can we use virtio to mount a FS in the DMA shared memory? ### QUESTION: Can we use virtio to mount a FS in the DMA shared memory?<!--{{{-->
## 2024-07-05 <!--}}}-->
<!--}}}-->
## 2024-07-05<!--{{{-->
### OBSERVATION: The kernel continues working when the console hangs.<!--{{{--> ### OBSERVATION: The kernel continues working when the console hangs.<!--{{{-->
@ -568,11 +570,10 @@ Then
Yes, it seems to be working. Let's load the rootfs too. Yes, it seems to be working. Let's load the rootfs too.
I added a loop in the stage1 script.<!--}}}--> I added a loop in the stage1 script.<!--}}}-->
### QUESTION: Can we see any clock in memory?<!--{{{-->
### QUESTION: Can we see any clock in memory?
This will allow us to check if the AXI still works. This will allow us to check if the AXI still works.
<!--}}}-->
### OBSERVATION: The kernel stops updating the counter in the mount phase.<!--{{{--> ### OBSERVATION: The kernel stops updating the counter in the mount phase.<!--{{{-->
Managed to reach the mount and hang there: Managed to reach the mount and hang there:
@ -592,8 +593,10 @@ After almost 6 minutes, with 571 beats:
It looks like the kernel is the one getting stuck *or* at least is unable to It looks like the kernel is the one getting stuck *or* at least is unable to
propagate the heartbeat changes to the host. It would be nice to monitor a propagate the heartbeat changes to the host. It would be nice to monitor a
hardware clock from the DMA region too, so we can discard problems in the AXI.<!--}}}--> hardware clock from the DMA region too, so we can discard problems in the AXI.
<!--}}}-->
### OBSERVATION: There is an ioctl failed for /dev/console<!--{{{-->
[ 177.009540] stage-1-init: [Thu Jan 1 00:02:56 UTC 1970] + udevadm settle [ 177.009540] stage-1-init: [Thu Jan 1 00:02:56 UTC 1970] + udevadm settle
+ kbd_mode -u -C /dev/console + kbd_mode -u -C /dev/console
@ -602,6 +605,7 @@ hardware clock from the DMA region too, so we can discard problems in the AXI.<!
+ loadkmap + loadkmap
[ 266.301040] stage-1-init: [Thu Jan 1 00:04:25 UTC 1970] + kbd_mode -u -C /dev/console [ 266.301040] stage-1-init: [Thu Jan 1 00:04:25 UTC 1970] + kbd_mode -u -C /dev/console
<!--}}}-->
### ASSUMPTION: The kernel hangs.<!--{{{--> ### ASSUMPTION: The kernel hangs.<!--{{{-->
If the kernel hangs, there must be an instruction or sequence of instructions If the kernel hangs, there must be an instruction or sequence of instructions
@ -681,7 +685,6 @@ Disabling clang as it is failing to build:
error: 1 dependencies of derivation '/nix/store/l2x18cih29r1kn6vi8imwhkyk98yhw4i-nix-shell-riscv64-unknown-linux-gnu-env.drv' failed to build error: 1 dependencies of derivation '/nix/store/l2x18cih29r1kn6vi8imwhkyk98yhw4i-nix-shell-riscv64-unknown-linux-gnu-env.drv' failed to build
<!--}}}--> <!--}}}-->
### QUESTION: Missing cache information may affect?<!--{{{--> ### QUESTION: Missing cache information may affect?<!--{{{-->
Other CPUs report the cache details in the DT. For example this one Other CPUs report the cache details in the DT. For example this one
@ -716,9 +719,7 @@ https://github.com/torvalds/linux/blob/master/arch/riscv/boot/dts/sifive/fu540-c
}; };
We may want to add it to our DT to be sure that it has no effect.<!--}}}--> We may want to add it to our DT to be sure that it has no effect.<!--}}}-->
### OBSERVATION: Arrived to stage 2!<!--{{{-->
### OBSERVATION: Arrived to stage 2!
+ kill -9 74 + kill -9 74
+ readlink /proc/75/exe + readlink /proc/75/exe
@ -756,13 +757,17 @@ We may want to add it to our DT to be sure that it has no effect.<!--}}}-->
[ 425.302000] random: perl: uninitialized urandom read (4 bytes read) [ 425.302000] random: perl: uninitialized urandom read (4 bytes read)
But then it hangs. But then it hangs.
<!--}}}-->
<!--}}}-->
## 2024-07-08
### QUESTION: Who sets the plic interrupts? ### QUESTION: Who sets the plic interrupts?<!--{{{-->
Shouldn't OpenSBI read the DT and do some configuration in the plic while in Shouldn't OpenSBI read the DT and do some configuration in the PLIC while in
machine mode? machine mode?
### OBSERVATION: Semi-stack trace from CincoRanch <!--}}}-->
### OBSERVATION: Semi-stack trace from CincoRanch<!--{{{-->
hvc_remove? hvc_remove?
console_unlock <-- only called from hvc_remove() console_unlock <-- only called from hvc_remove()
@ -797,8 +802,29 @@ machine mode?
no_context.part.0 no_context.part.0
die_kernel_fault <-- last frame(?) die_kernel_fault <-- last frame(?)
### QUESTION: Can we place a tracepoint in `hvc_remove`? <!--}}}-->
### QUESTION: Can we place a trace point in `hvc_remove`?<!--{{{-->
If we are getting stuck in the same place, we should be able to see the If we are getting stuck in the same place, we should be able to see the
backtrace (assuming the console still works) just before we try to remove the backtrace (assuming the console still works) just before we try to remove the
console device. console device.
Placed, but still unable to see anything in any hang. Here is a hang in the
Stage 2:
<<< NixOS Stage 2 >>>
[ 404.158340] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
[ 404.242500] booting system configuration /nix/store/0za1vqh5alk7mxqs59qxx8izmwmf21w6-nixos-system-nixos-riscv-24.11pre-git
running activation script...
[ 408.148380] stage-2-init: running activation script...
[ 411.612240] random: perl: uninitialized urandom read (4 bytes read)
[ 411.866440] random: perl: uninitialized urandom read (4 bytes read)
[ 447.588880] random: perl: uninitialized urandom read (4 bytes read)
Still, it may be hang in a similar way, causing a loop of page faults just
while trying to printk to the console, which would explain why we don't see
anything and why the heartbeat stops.
<!--}}}-->