diff --git a/JOURNAL.md b/JOURNAL.md index ae07f53..6080c78 100644 --- a/JOURNAL.md +++ b/JOURNAL.md @@ -886,7 +886,83 @@ Continues to hang just after those perl messages: May be a long shot, but if we are experiencing the same page fault loop as in cincoranch we may as well try. +## 2024-07-09 + ### QUESTION: Maybe we can try without out-of-order? I made a small tool in C to view and change the CSR register that controls the in-order/out-of-order. Maybe we can try with the "in-order" setting. + +We arrive to execute `systemd`: + + + Starting interactive shell... + + setsid /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/ash -c 'exec /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/ash < /dev/hvc0 >/dev/hvc0 2>/dev/hvc0' + [ 90.077300] stage-1-init: [Thu Jan 1 00:01:27 UTC 1970] + '[' -n 1 -a i '=' f ] + [ 90.639760] stage-1-init: [Thu Jan 1 00:01:28 UTC 1970] + '[' -n 1 -a i '=' i ] + ~ # [ 90.967260] stage-1-init: [Thu Jan 1 00:01:28 UTC 1970] + echo 'Starting interactive shell...' + [ 91.234980] stage-1-init: [Thu Jan 1 00:01:28 UTC 1970] Starting interactive shell... + [ 91.569580] stage-1-init: [Thu Jan 1 00:01:29 UTC 1970] + setsid /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/ash -c 'exec /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extr + a-utils/bin/ash < /dev/hvc0 >/dev/hvc0 2>/dev/hvc0' + which csrtool + /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/csrtool + ~ # csrtool + CSR 0x801 = 0u + ~ # csrtool o + unknown 'o', use: mem-in-order, all-in-order or all-out-of-order + ~ # csrtool all-in-order + CSR 0x801 = 7u + ~ # csrtool + CSR 0x801 = 7u + ~ # + + IFS='=' + + echo init /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init + + set -- init /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init + + stage2Init=/nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init + + echo /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/modprobe + + basename dm_mod + [...] + + echo /sbin/modprobe + + '[' '!' -e /mnt-root//nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init ] + + mkdir -m 0755 -p /mnt-root/proc /mnt-root/sys /mnt-root/dev /mnt-root/run + + mount --move /proc /mnt-root/proc + + mount --move /sys /mnt-root/sys + + mount --move /dev /mnt-root/dev + + mount --move /run /mnt-root/run + + type -P switch_root + + exec env -i /nix/store/xm3mpj9aldz5r4s5yb7p08jdjv98hj4w-extra-utils/bin/switch_root /mnt-root /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git/init + + <<< NixOS Stage 2 >>> + + [ 967.703320] EXT4-fs (pmem0p2): re-mounted 44444444-4444-4444-8888-888888888888 r/w. Quota mode: none. + [ 967.928020] booting system configuration /nix/store/xmagm60y90pfh3yvqanvmaswa0m3cb0a-nixos-system-nixos-riscv-24.11pre-git + running activation script... + [ 977.608980] stage-2-init: running activation script... + bbbbbbbbbbbbbbbbbbbbsetting up /etc... + [ 1084.376420] stage-2-init: setting up /etc... + starting systemd... + +Not sure if it is a good reproducer, as it is taking around 15 minutes to hang +in a very large piece of software, while when we have the out-of-order enabled, +we can hang in half of the time in some script. + +Either way, we need to see a backtrace of where it is hanging to understand why +it does. We may also enable a stage-2 heartbeat to be sure that it is hanging +the kernel and not only the console. + +Another idea is to arrive at a proper bash shell, where we can have debugging +tools, which may allow us to go slowly until we catch the bug. + +### OBSERVATION: Setting memory in-order only causes a hang + +Tested with: + + $ csrtool mem-in-order + +And it hangs just after exiting the tool. + +### QUESTION: Can we see the printk buffer from the host? + +If the problem that we are observing is somehow related to the recursive +segfault of the kernel in Cincoranch, we may be able to see the printk ring +buffer by directly poking at the memory from the host.