Document hang missing hvc_remove trace point
parent
6721e1e22c
commit
4641e0d9a0
60
JOURNAL.md
@@ -43,7 +43,7 @@ Nope, the u-boot is reporting the d extension is in the isa:

 > riscv,isa = "rv64imafd";

-## 2024-07-03
+## 2024-07-03<!--{{{-->

 I cannot switch to `gcc.arch = rv64ima` because rust fails to build.

@@ -255,8 +255,8 @@ the plic follows a different convention of values. Using 9 and 11:
 **Remark**: The key combination to run Magic SysRq using the HVC console is<!--{{{-->
 Ctrl-O and then the SysRq key. It only works if the console is being actively
 polled, otherwise it hangs.<!--}}}-->
-
-## 2024-07-04
+<!--}}}-->
+## 2024-07-04<!--{{{-->

 ### OBSERVATION: I saw they changed this option in Cinco Ranch DTS for the<!--{{{-->
 serial:
@@ -518,9 +518,11 @@ Can this produce any problem?
 It doesn't seem to change anything, still unable to send any bytes.

 <!--}}}-->
-### QUESTION: Can we use virtio to mount a FS in the DMA shared memory?
+### QUESTION: Can we use virtio to mount a FS in the DMA shared memory?<!--{{{-->

-## 2024-07-05
+<!--}}}-->
+<!--}}}-->
+## 2024-07-05<!--{{{-->

 ### OBSERVATION: The kernel continues working when the console hangs.<!--{{{-->

@@ -568,11 +570,10 @@ Then
 Yes, it seems to be working. Let's load the rootfs too.

 I added a loop in the stage1 script.<!--}}}-->
-
-### QUESTION: Can we see any clock in memory?
+### QUESTION: Can we see any clock in memory?<!--{{{-->

 This will allow us to check if the AXI still works.
-
+<!--}}}-->
 ### OBSERVATION: The kernel stops updating the counter in the mount phase.<!--{{{-->

 Managed to reach the mount and hang there:
@@ -592,8 +593,10 @@ After almost 6 minutes, with 571 beats:

 It looks like the kernel is the one getting stuck *or* at least is unable to
 propagate the heartbeat changes to the host. It would be nice to monitor a
-hardware clock from the DMA region too, so we can discard problems in the AXI.<!--}}}-->
+hardware clock from the DMA region too, so we can discard problems in the AXI.
+
+<!--}}}-->
 ### OBSERVATION: There is an ioctl failed for /dev/console<!--{{{-->

 [ 177.009540] stage-1-init: [Thu Jan 1 00:02:56 UTC 1970] + udevadm settle
 + kbd_mode -u -C /dev/console
@@ -602,6 +605,7 @@ hardware clock from the DMA region too, so we can discard problems in the AXI.<!
 + loadkmap
 [ 266.301040] stage-1-init: [Thu Jan 1 00:04:25 UTC 1970] + kbd_mode -u -C /dev/console

+<!--}}}-->
 ### ASSUMPTION: The kernel hangs.<!--{{{-->

 If the kernel hangs, there must be an instruction or sequence of instructions
@@ -681,7 +685,6 @@ Disabling clang as it is failing to build:
 error: 1 dependencies of derivation '/nix/store/l2x18cih29r1kn6vi8imwhkyk98yhw4i-nix-shell-riscv64-unknown-linux-gnu-env.drv' failed to build

 <!--}}}-->
-
 ### QUESTION: Missing cache information may affect?<!--{{{-->

 Other CPUs report the cache details in the DT. For example this one
@@ -716,9 +719,7 @@ https://github.com/torvalds/linux/blob/master/arch/riscv/boot/dts/sifive/fu540-c
 };

 We may want to add it to our DT to be sure that it has no effect.<!--}}}-->
-
-
-### OBSERVATION: Arrived to stage 2!
+### OBSERVATION: Arrived to stage 2!<!--{{{-->

 + kill -9 74
 + readlink /proc/75/exe
@@ -756,13 +757,17 @@ We may want to add it to our DT to be sure that it has no effect.<!--}}}-->
 [ 425.302000] random: perl: uninitialized urandom read (4 bytes read)

 But then it hangs.
+<!--}}}-->
+<!--}}}-->
 ## 2024-07-08

-### QUESTION: Who sets the plic interrupts?
+### QUESTION: Who sets the plic interrupts?<!--{{{-->

-Shouldn't OpenSBI read the DT and do some configuration in the plic while in
+Shouldn't OpenSBI read the DT and do some configuration in the PLIC while in
 machine mode?

-### OBSERVATION: Semi-stack trace from CincoRanch
+<!--}}}-->
+### OBSERVATION: Semi-stack trace from CincoRanch<!--{{{-->

 hvc_remove?
 console_unlock <-- only called from hvc_remove()
@@ -797,8 +802,29 @@ machine mode?
 no_context.part.0
 die_kernel_fault <-- last frame(?)

-### QUESTION: Can we place a tracepoint in `hvc_remove`?
+<!--}}}-->
+### QUESTION: Can we place a trace point in `hvc_remove`?<!--{{{-->

 If we are getting stuck in the same place, we should be able to see the
 backtrace (assuming the console still works) just before we try to remove the
 console device.
+
+Placed, but still unable to see anything in any hang. Here is a hang in the
+Stage 2:
+
+<<< NixOS Stage 2 >>>
+
+[ 404.158340] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
+[ 404.242500] booting system configuration /nix/store/0za1vqh5alk7mxqs59qxx8izmwmf21w6-nixos-system-nixos-riscv-24.11pre-git
+running activation script...
+[ 408.148380] stage-2-init: running activation script...
+[ 411.612240] random: perl: uninitialized urandom read (4 bytes read)
+[ 411.866440] random: perl: uninitialized urandom read (4 bytes read)
+[ 447.588880] random: perl: uninitialized urandom read (4 bytes read)
+
+Still, it may be hanging in a similar way, causing a loop of page faults just
+while trying to printk to the console, which would explain why we don't see
+anything and why the heartbeat stops.
+
+<!--}}}-->
+