Update journal

This commit is contained in:
Rodrigo Arias 2024-09-04 10:48:48 +02:00
parent 32a7b2f3b7
commit 158e232520

View File

@ -3749,3 +3749,151 @@ problems with the boot.
Now that we have a CI pipeline, let's try making the whole boot process
automatic, so I don't have to type anything. This basically requires setting the
environment in U-Boot.
<<< NixOS Stage 1 >>>
An error occurred in stage 1 of the boot process, which must mount the
root filesystem on `/mnt-root' and then start stage 2. Press one
of the following keys:
i) to launch an interactive shell
f) to start an interactive shell having pid 1 (needed if you want to
start stage 2's init manually)
[ 22.365260] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] An error occurred in stage 1 of the boot process, which must mount the
r) to reboot immediately
*) to ignore the error and continue
[ 22.526780] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] root filesystem on `/mnt-root' and then start stage 2. Press one
[ 22.611640] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] of the following keys:
[ 22.697460] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] i) to launch an interactive shell
[ 22.788100] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] f) to start an interactive shell having pid 1 (needed if you want to
[ 22.874060] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] start stage 2's init manually)
[ 22.957940] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] r) to reboot immediately
[ 23.042520] stage-1-init: [Thu Jan 1 00:00:22 UTC 1970] *) to ignore the error and continue
iStarting interactive shell...
[ 32.314080] stage-1-init: [Thu Jan 1 00:00:32 UTC 1970] Starting interactive shell...
~ # cat /proc/sys/kernel/random
random/ randomize_va_space
~ # cat /proc/sys/kernel/random/entropy_avail
0
Let's see what happens with strace.
## 2024-09-02
Interestingly, I managed to reach the login console and run some commands after
fully booting NixOS. I just needed to disable nscd and enable a daemon to fill
the entropy pull, which is depleted on boot.
I'm not looking at the nscd binary, which works ok when running `nscd -V` but
hangs the CPU when running `nscd --invalidate group`. I tried executing it under
GDB and then ^C, but it doesn't respond.
As always, we don't have JTAG support ready, so this is going to be an absolute
pain to debug. I could bisect the code and try to guess at which point it must
be failing. But each attempt will take around 30 minutes, so it is extremely
expensive.
It would be nice to reproduce this in the initrd shell, which loads much faster.
# not strictly required, but you'll likely want the log anyway
(gdb) set logging on
# ask gdb to not stop every screen-full
(gdb) set height 0
(gdb) while 1
> x/i $pc
> stepi
> end
Interestingly, I can step up to the end of the program, but it seems to be
failing:
(gdb) r
Starting program: /nix/store/nh1f85icnvlqs3dc8lv2ya0ylmljsfax-system-path/bin/nscd --invalidate group
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/47a03qi49pwlk3hxpfwx2vq671mlqn57-glibc-riscv64-unknown-linux-gnu-2.39-52/lib/libthread_db.so.1".
Breakpoint 1, 0x0000002aaaaaf738 in main ()
(gdb) while 1
>x/i $pc
>nexti
>end
=> 0x2aaaaaf738 <main+56>: jal 0x2aaaaaf160 <setlocale@plt>
0x0000002aaaaaf73c in main ()
=> 0x2aaaaaf73c <main+60>: auipc a0,0x19
0x0000002aaaaaf740 in main ()
=> 0x2aaaaaf740 <main+64>: ld a0,1580(a0)
0x0000002aaaaaf744 in main ()
=> 0x2aaaaaf744 <main+68>: jal 0x2aaaaaf690 <textdomain@plt>
0x0000002aaaaaf748 in main ()
=> 0x2aaaaaf748 <main+72>: li a5,0
0x0000002aaaaaf74c in main ()
=> 0x2aaaaaf74c <main+76>: addi a4,sp,8
0x0000002aaaaaf750 in main ()
=> 0x2aaaaaf750 <main+80>: li a3,0
0x0000002aaaaaf754 in main ()
=> 0x2aaaaaf754 <main+84>: mv a2,s1
0x0000002aaaaaf758 in main ()
=> 0x2aaaaaf758 <main+88>: mv a1,s0
0x0000002aaaaaf75c in main ()
=> 0x2aaaaaf75c <main+92>: auipc a0,0x19
0x0000002aaaaaf760 in main ()
=> 0x2aaaaaf760 <main+96>: addi a0,a0,-1852
0x0000002aaaaaf764 in main ()
=> 0x2aaaaaf764 <main+100>: jal 0x2aaaaaef30 <argp_parse@plt>
[Inferior 1 (process 1962) exited with code 01]
No registers.
Using passwd instead hangs the CPU before reaching the main:
[root@nixos-riscv:~]# gdb --args $(which nscd) --invalidate passwd
GNU gdb (GDB) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "riscv64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /run/current-system/sw/bin/nscd...
(No debugging symbols found in /run/current-system/sw/bin/nscd)
(gdb) b main
Breakpoint 1 at 0x5738
(gdb) b argp_parse
Breakpoint 2 at 0x4f38
(gdb) c
The program is not being run.
(gdb) r
Starting program: /nix/store/nh1f85icnvlqs3dc8lv2ya0ylmljsfax-system-path/bin/nscd --invalidate passwd
It's probably not deterministic.
I have also disabled kaslr by adding nokaslr to the bootargs.
Anyway, doesn't make much sense to debug this on userland and pay 30 minutes for
each test. I'll wait until we have proper JTAG support or a similar way to dump
the registers on a crash.
# 2024-09-03
Wrote a small tool `plictool` to dump the state of the PLIC:
~ # plictool -c 2
plic=0x40800000 nsources=1024 ncontexts=2
src=1 pend=0 prio=1 ctx=1 thre=1
src=2 pend=0 prio=1
src=3 pend=0 prio=1
src=4 pend=1 prio=1
src=33 pend=0 prio=0 ctx=1 thre=1
Interestingly, the auxiliar UART interrupts don't seem to be working very well.
Also, is there another source 33 enabled?