Add results of changing the CSR

It seems to arrive to systemd with all-in-order, but hangs there.
This commit is contained in:
Rodrigo Arias 2024-07-09 12:26:00 +02:00
parent 5b34b3b97b
commit dd6082e805

View File

@ -886,7 +886,83 @@ Continues to hang just after those perl messages:
May be a long shot, but if we are experiencing the same page fault loop as in May be a long shot, but if we are experiencing the same page fault loop as in
cincoranch we may as well try. cincoranch we may as well try.
## 2024-07-09
### QUESTION: Maybe we can try without out-of-order? ### QUESTION: Maybe we can try without out-of-order?
I made a small tool in C to view and change the CSR register that controls the I made a small tool in C to view and change the CSR register that controls the
in-order/out-of-order. Maybe we can try with the "in-order" setting. in-order/out-of-order. Maybe we can try with the "in-order" setting.
We arrive to execute `systemd`:
Starting interactive shell...
+ setsid /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/ash -c 'exec /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/ash < /dev/hvc0 >/dev/hvc0 2>/dev/hvc0'
[ 90.077300] stage-1-init: [Thu Jan 1 00:01:27 UTC 1970] + '[' -n 1 -a i '=' f ]
[ 90.639760] stage-1-init: [Thu Jan 1 00:01:28 UTC 1970] + '[' -n 1 -a i '=' i ]
~ # [ 90.967260] stage-1-init: [Thu Jan 1 00:01:28 UTC 1970] + echo 'Starting interactive shell...'
[ 91.234980] stage-1-init: [Thu Jan 1 00:01:28 UTC 1970] Starting interactive shell...
[ 91.569580] stage-1-init: [Thu Jan 1 00:01:29 UTC 1970] + setsid /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/ash -c 'exec /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extr
a-utils/bin/ash < /dev/hvc0 >/dev/hvc0 2>/dev/hvc0'
which csrtool
/nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/csrtool
~ # csrtool
CSR 0x801 = 0u
~ # csrtool o
unknown 'o', use: mem-in-order, all-in-order or all-out-of-order
~ # csrtool all-in-order
CSR 0x801 = 7u
~ # csrtool
CSR 0x801 = 7u
~ #
+ IFS='='
+ echo init /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init
+ set -- init /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init
+ stage2Init=/nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init
+ echo /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/modprobe
+ basename dm_mod
[...]
+ echo /sbin/modprobe
+ '[' '!' -e /mnt-root//nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init ]
+ mkdir -m 0755 -p /mnt-root/proc /mnt-root/sys /mnt-root/dev /mnt-root/run
+ mount --move /proc /mnt-root/proc
+ mount --move /sys /mnt-root/sys
+ mount --move /dev /mnt-root/dev
+ mount --move /run /mnt-root/run
+ type -P switch_root
+ exec env -i /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/switch_root /mnt-root /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init
<<< NixOS Stage 2 >>>
[ 967.703320] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none.
[ 967.928020] booting system configuration /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git
running activation script...
[ 977.608980] stage-2-init: running activation script...
bbbbbbbbbbbbbbbbbbbbsetting up /etc...
[ 1084.376420] stage-2-init: setting up /etc...
starting systemd...
Not sure if it is a good reproducer, as it is taking around 15 minutes to hang
in a very large piece of software, while when we have the out-of-order enabled,
we can hang in half of the time in some script.
Either way, we need to see a backtrace of where it is hanging to understand why
it does. We may also enable a stage-2 heartbeat to be sure that it is hanging
the kernel and not only the console.
Another idea is to arrive at a proper bash shell, where we can have debugging
tools, which may allow us to go slowly until we catch the bug.
### OBSERVATION: Setting memory in-order only causes a hang
Tested with:
$ csrtool mem-in-order
And it hangs just after exiting the tool.
### QUESTION: Can we see the printk buffer from the host?
If the problem that we are observing is somehow related to the recursive
segfault of the kernel in Cincoranch, we may be able to see the printk ring
buffer by directly poking at the memory from the host.