diff --git a/JOURNAL.md b/JOURNAL.md index c62bf7e..4ca71fe 100644 --- a/JOURNAL.md +++ b/JOURNAL.md @@ -3749,3 +3749,151 @@ problems with the boot. Now that we have a CI pipeline, let's try making the whole boot process automatic, so I don't have to type anything. This basically requires setting the environment in U-Boot. + + <<< NixOS Stage 1 >>> + + + An error occurred in stage 1 of the boot process, which must mount the + root filesystem on `/mnt-root' and then start stage 2. Press one + of the following keys: + + i) to launch an interactive shell + f) to start an interactive shell having pid 1 (needed if you want to + start stage 2's init manually) + [ 22.365260] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] An error occurred in stage 1 of the boot process, which must mount the + r) to reboot immediately + *) to ignore the error and continue + [ 22.526780] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] root filesystem on `/mnt-root' and then start stage 2. Press one + [ 22.611640] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] of the following keys: + [ 22.697460] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] i) to launch an interactive shell + [ 22.788100] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] f) to start an interactive shell having pid 1 (needed if you want to + [ 22.874060] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] start stage 2's init manually) + [ 22.957940] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] r) to reboot immediately + [ 23.042520] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] *) to ignore the error and continue + iStarting interactive shell... + [ 32.314080] stage-1-init: [Thu Jan 1 00:00:32 UTC 1970] Starting interactive shell... + ~ # cat /proc/sys/kernel/random + random/ randomize_va_space + ~ # cat /proc/sys/kernel/random/entropy_avail + 0 + +Let's see what happens with strace. + +## 2024-09-02 + +Interestingly, I managed to reach the login console and run some commands after +fully booting NixOS. I just needed to disable nscd and enable a daemon to fill +the entropy pull, which is depleted on boot. + +I'm not looking at the nscd binary, which works ok when running `nscd -V` but +hangs the CPU when running `nscd --invalidate group`. I tried executing it under +GDB and then ^C, but it doesn't respond. + +As always, we don't have JTAG support ready, so this is going to be an absolute +pain to debug. I could bisect the code and try to guess at which point it must +be failing. But each attempt will take around 30 minutes, so it is extremely +expensive. + +It would be nice to reproduce this in the initrd shell, which loads much faster. + + # not strictly required, but you'll likely want the log anyway + (gdb) set logging on + + # ask gdb to not stop every screen-full + (gdb) set height 0 + + (gdb) while 1 + > x/i $pc + > stepi + > end + +Interestingly, I can step up to the end of the program, but it seems to be +failing: + + (gdb) r + Starting program: /nix/store/nh1f85icnvlqs3dc8lv2ya0ylmljsfax-system-path/bin/nscd --invalidate group + [Thread debugging using libthread_db enabled] + Using host libthread_db library "/nix/store/47a03qi49pwlk3hxpfwx2vq671mlqn57-glibc-riscv64-unknown-linux-gnu-2.39-52/lib/libthread_db.so.1". + + Breakpoint 1, 0x0000002aaaaaf738 in main () + (gdb) while 1 + >x/i $pc + >nexti + >end + => 0x2aaaaaf738 : jal 0x2aaaaaf160 + 0x0000002aaaaaf73c in main () + => 0x2aaaaaf73c : auipc a0,0x19 + 0x0000002aaaaaf740 in main () + => 0x2aaaaaf740 : ld a0,1580(a0) + 0x0000002aaaaaf744 in main () + => 0x2aaaaaf744 : jal 0x2aaaaaf690 + 0x0000002aaaaaf748 in main () + => 0x2aaaaaf748 : li a5,0 + 0x0000002aaaaaf74c in main () + => 0x2aaaaaf74c : addi a4,sp,8 + 0x0000002aaaaaf750 in main () + => 0x2aaaaaf750 : li a3,0 + 0x0000002aaaaaf754 in main () + => 0x2aaaaaf754 : mv a2,s1 + 0x0000002aaaaaf758 in main () + => 0x2aaaaaf758 : mv a1,s0 + 0x0000002aaaaaf75c in main () + => 0x2aaaaaf75c : auipc a0,0x19 + 0x0000002aaaaaf760 in main () + => 0x2aaaaaf760 : addi a0,a0,-1852 + 0x0000002aaaaaf764 in main () + => 0x2aaaaaf764 : jal 0x2aaaaaef30 + [Inferior 1 (process 1962) exited with code 01] + No registers. + +Using passwd instead hangs the CPU before reaching the main: + + [root@nixos-riscv:~]# gdb --args $(which nscd) --invalidate passwd + GNU gdb (GDB) 14.2 + Copyright (C) 2023 Free Software Foundation, Inc. + License GPLv3+: GNU GPL version 3 or later + This is free software: you are free to change and redistribute it. + There is NO WARRANTY, to the extent permitted by law. + Type "show copying" and "show warranty" for details. + This GDB was configured as "riscv64-unknown-linux-gnu". + Type "show configuration" for configuration details. + For bug reporting instructions, please see: + . + Find the GDB manual and other documentation resources online at: + . + + For help, type "help". + Type "apropos word" to search for commands related to "word"... + Reading symbols from /run/current-system/sw/bin/nscd... + (No debugging symbols found in /run/current-system/sw/bin/nscd) + (gdb) b main + Breakpoint 1 at 0x5738 + (gdb) b argp_parse + Breakpoint 2 at 0x4f38 + (gdb) c + The program is not being run. + (gdb) r + Starting program: /nix/store/nh1f85icnvlqs3dc8lv2ya0ylmljsfax-system-path/bin/nscd --invalidate passwd + +It's probably not deterministic. + +I have also disabled kaslr by adding nokaslr to the bootargs. + +Anyway, doesn't make much sense to debug this on userland and pay 30 minutes for +each test. I'll wait until we have proper JTAG support or a similar way to dump +the registers on a crash. + +# 2024-09-03 + +Wrote a small tool `plictool` to dump the state of the PLIC: + + ~ # plictool -c 2 + plic=0x40800000 nsources=1024 ncontexts=2 + src=1 pend=0 prio=1 ctx=1 thre=1 + src=2 pend=0 prio=1 + src=3 pend=0 prio=1 + src=4 pend=1 prio=1 + src=33 pend=0 prio=0 ctx=1 thre=1 + +Interestingly, the auxiliar UART interrupts don't seem to be working very well. +Also, is there another source 33 enabled?