Update journal
parent 32a7b2f3b7
commit 158e232520
JOURNAL.md (148 lines changed)
									
							| @ -3749,3 +3749,151 @@ problems with the boot. | |||||||
| Now that we have a CI pipeline, let's try making the whole boot process | Now that we have a CI pipeline, let's try making the whole boot process | ||||||
| automatic, so I don't have to type anything. This basically requires setting the | automatic, so I don't have to type anything. This basically requires setting the | ||||||
| environment in U-Boot. | environment in U-Boot. | ||||||
|  | 
 | ||||||
|  |     <<< NixOS Stage 1 >>> | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  |     An error occurred in stage 1 of the boot process, which must mount the | ||||||
|  |     root filesystem on `/mnt-root' and then start stage 2.  Press one | ||||||
|  |     of the following keys: | ||||||
|  | 
 | ||||||
|  |       i) to launch an interactive shell | ||||||
|  |       f) to start an interactive shell having pid 1 (needed if you want to | ||||||
|  |          start stage 2's init manually) | ||||||
|  |     [   22.365260] stage-1-init: [Thu Jan  1 00:00:22 UTC 1970] An error occurred in stage 1 of the boot process, which must mount the | ||||||
|  |       r) to reboot immediately | ||||||
|  |       *) to ignore the error and continue | ||||||
|  |     [   22.526780] stage-1-init: [Thu Jan  1 00:00:22 UTC 1970] root filesystem on `/mnt-root' and then start stage 2.  Press one | ||||||
|  |     [   22.611640] stage-1-init: [Thu Jan  1 00:00:22 UTC 1970] of the following keys: | ||||||
|  |     [   22.697460] stage-1-init: [Thu Jan  1 00:00:22 UTC 1970] i) to launch an interactive shell | ||||||
|  |     [   22.788100] stage-1-init: [Thu Jan  1 00:00:22 UTC 1970] f) to start an interactive shell having pid 1 (needed if you want to | ||||||
|  |     [   22.874060] stage-1-init: [Thu Jan  1 00:00:22 UTC 1970] start stage 2's init manually) | ||||||
|  |     [   22.957940] stage-1-init: [Thu Jan  1 00:00:22 UTC 1970] r) to reboot immediately | ||||||
|  |     [   23.042520] stage-1-init: [Thu Jan  1 00:00:22 UTC 1970] *) to ignore the error and continue | ||||||
|  |     iStarting interactive shell... | ||||||
|  |     [   32.314080] stage-1-init: [Thu Jan  1 00:00:32 UTC 1970] Starting interactive shell... | ||||||
|  |     ~ # cat /proc/sys/kernel/random | ||||||
|  |     random/             randomize_va_space | ||||||
|  |     ~ # cat /proc/sys/kernel/random/entropy_avail | ||||||
|  |     0 | ||||||
|  | 
 | ||||||
|  | Let's see what happens with strace. | ||||||
|  | 
 | ||||||
|  | ## 2024-09-02 | ||||||
|  | 
 | ||||||
|  | Interestingly, I managed to reach the login console and run some commands after | ||||||
|  | fully booting NixOS. I just needed to disable nscd and enable a daemon to fill | ||||||
|  | the entropy pool, which is depleted on boot. | ||||||
|  | 
 | ||||||
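A minimal sketch of how the depleted pool could be detected from a script, for instance before starting a service that blocks on randomness. The path is the one read with `cat` in the initrd shell above; the function names, the 128-bit threshold and the polling interval are assumptions made for illustration only.

```python
import time
from pathlib import Path

# Same file that was read with `cat` in the initrd shell.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate, in bits."""
    return int(Path(path).read_text().strip())


def wait_for_entropy(minimum=128, timeout=30.0, interval=0.5, path=ENTROPY_AVAIL):
    """Poll until the estimate reaches `minimum` bits.

    Returns True if the threshold was reached, or False if `timeout`
    seconds passed first. The threshold is a hypothetical value, not
    something taken from the journal.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_entropy(path) >= minimum:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The `path` parameter exists only so that the helpers can be pointed at a regular file when testing away from the board.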
|  | I'm now looking at the nscd binary, which works ok when running `nscd -V` but | ||||||
|  | hangs the CPU when running `nscd --invalidate group`. I tried executing it under | ||||||
|  | GDB and then ^C, but it doesn't respond. | ||||||
|  | 
 | ||||||
|  | As always, we don't have JTAG support ready, so this is going to be an absolute | ||||||
|  | pain to debug. I could bisect the code and try to guess at which point it must | ||||||
|  | be failing. But each attempt will take around 30 minutes, so it is extremely | ||||||
|  | expensive. | ||||||
|  | 
 | ||||||
|  | It would be nice to reproduce this in the initrd shell, which loads much faster. | ||||||
|  | 
 | ||||||
|  |     # not strictly required, but you'll likely want the log anyway | ||||||
|  |     (gdb) set logging enabled on | ||||||
|  | 
 | ||||||
|  |     # ask gdb to not stop every screen-full | ||||||
|  |     (gdb) set height 0 | ||||||
|  | 
 | ||||||
|  |     (gdb) while 1 | ||||||
|  |      > x/i $pc | ||||||
|  |      > stepi | ||||||
|  |      > end | ||||||
|  | 
 | ||||||
|  | Interestingly, I can step up to the end of the program, but it seems to be | ||||||
|  | failing: | ||||||
|  | 
 | ||||||
|  |     (gdb) r | ||||||
|  |     Starting program: /nix/store/nh1f85icnvlqs3dc8lv2ya0ylmljsfax-system-path/bin/nscd --invalidate group | ||||||
|  |     [Thread debugging using libthread_db enabled] | ||||||
|  |     Using host libthread_db library "/nix/store/47a03qi49pwlk3hxpfwx2vq671mlqn57-glibc-riscv64-unknown-linux-gnu-2.39-52/lib/libthread_db.so.1". | ||||||
|  | 
 | ||||||
|  |     Breakpoint 1, 0x0000002aaaaaf738 in main () | ||||||
|  |     (gdb) while 1 | ||||||
|  |      >x/i $pc | ||||||
|  |      >nexti | ||||||
|  |      >end | ||||||
|  |     => 0x2aaaaaf738 <main+56>:	jal	0x2aaaaaf160 <setlocale@plt> | ||||||
|  |     0x0000002aaaaaf73c in main () | ||||||
|  |     => 0x2aaaaaf73c <main+60>:	auipc	a0,0x19 | ||||||
|  |     0x0000002aaaaaf740 in main () | ||||||
|  |     => 0x2aaaaaf740 <main+64>:	ld	a0,1580(a0) | ||||||
|  |     0x0000002aaaaaf744 in main () | ||||||
|  |     => 0x2aaaaaf744 <main+68>:	jal	0x2aaaaaf690 <textdomain@plt> | ||||||
|  |     0x0000002aaaaaf748 in main () | ||||||
|  |     => 0x2aaaaaf748 <main+72>:	li	a5,0 | ||||||
|  |     0x0000002aaaaaf74c in main () | ||||||
|  |     => 0x2aaaaaf74c <main+76>:	addi	a4,sp,8 | ||||||
|  |     0x0000002aaaaaf750 in main () | ||||||
|  |     => 0x2aaaaaf750 <main+80>:	li	a3,0 | ||||||
|  |     0x0000002aaaaaf754 in main () | ||||||
|  |     => 0x2aaaaaf754 <main+84>:	mv	a2,s1 | ||||||
|  |     0x0000002aaaaaf758 in main () | ||||||
|  |     => 0x2aaaaaf758 <main+88>:	mv	a1,s0 | ||||||
|  |     0x0000002aaaaaf75c in main () | ||||||
|  |     => 0x2aaaaaf75c <main+92>:	auipc	a0,0x19 | ||||||
|  |     0x0000002aaaaaf760 in main () | ||||||
|  |     => 0x2aaaaaf760 <main+96>:	addi	a0,a0,-1852 | ||||||
|  |     0x0000002aaaaaf764 in main () | ||||||
|  |     => 0x2aaaaaf764 <main+100>:	jal	0x2aaaaaef30 <argp_parse@plt> | ||||||
|  |     [Inferior 1 (process 1962) exited with code 01] | ||||||
|  |     No registers. | ||||||
|  | 
 | ||||||
|  | Using `passwd` instead of `group` hangs the CPU before reaching `main`: | ||||||
|  | 
 | ||||||
|  |     [root@nixos-riscv:~]# gdb --args $(which nscd) --invalidate passwd | ||||||
|  |     GNU gdb (GDB) 14.2 | ||||||
|  |     Copyright (C) 2023 Free Software Foundation, Inc. | ||||||
|  |     License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> | ||||||
|  |     This is free software: you are free to change and redistribute it. | ||||||
|  |     There is NO WARRANTY, to the extent permitted by law. | ||||||
|  |     Type "show copying" and "show warranty" for details. | ||||||
|  |     This GDB was configured as "riscv64-unknown-linux-gnu". | ||||||
|  |     Type "show configuration" for configuration details. | ||||||
|  |     For bug reporting instructions, please see: | ||||||
|  |     <https://www.gnu.org/software/gdb/bugs/>. | ||||||
|  |     Find the GDB manual and other documentation resources online at: | ||||||
|  |         <http://www.gnu.org/software/gdb/documentation/>. | ||||||
|  | 
 | ||||||
|  |     For help, type "help". | ||||||
|  |     Type "apropos word" to search for commands related to "word"... | ||||||
|  |     Reading symbols from /run/current-system/sw/bin/nscd... | ||||||
|  |     (No debugging symbols found in /run/current-system/sw/bin/nscd) | ||||||
|  |     (gdb) b main | ||||||
|  |     Breakpoint 1 at 0x5738 | ||||||
|  |     (gdb) b argp_parse | ||||||
|  |     Breakpoint 2 at 0x4f38 | ||||||
|  |     (gdb) c | ||||||
|  |     The program is not being run. | ||||||
|  |     (gdb) r | ||||||
|  |     Starting program: /nix/store/nh1f85icnvlqs3dc8lv2ya0ylmljsfax-system-path/bin/nscd --invalidate passwd | ||||||
|  | 
 | ||||||
|  | It's probably not deterministic. | ||||||
|  | 
 | ||||||
|  | I have also disabled KASLR by adding `nokaslr` to the bootargs. | ||||||
|  | 
 | ||||||
|  | Anyway, it doesn't make much sense to debug this in userland and pay 30 minutes for | ||||||
|  | each test. I'll wait until we have proper JTAG support or a similar way to dump | ||||||
|  | the registers on a crash. | ||||||
|  | 
 | ||||||
|  | ## 2024-09-03 | ||||||
|  | 
 | ||||||
|  | Wrote a small tool `plictool` to dump the state of the PLIC: | ||||||
|  | 
 | ||||||
|  |     ~ # plictool -c 2 | ||||||
|  |     plic=0x40800000 nsources=1024 ncontexts=2 | ||||||
|  |     src=1 pend=0 prio=1 ctx=1 thre=1 | ||||||
|  |     src=2 pend=0 prio=1 | ||||||
|  |     src=3 pend=0 prio=1 | ||||||
|  |     src=4 pend=1 prio=1 | ||||||
|  |     src=33 pend=0 prio=0 ctx=1 thre=1 | ||||||
|  | 
 | ||||||
|  | Interestingly, the auxiliary UART interrupts don't seem to be working very well. | ||||||
|  | Also, is there another source 33 enabled? | ||||||
|  | |||||||