DRAM errors in bay #75

Open
opened 2024-08-28 12:14:23 +02:00 by rarias · 0 comments
[132102.723222] mce: [Hardware Error]: Machine check events logged
[132102.723418] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[132102.723422] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[132102.723425] {1}[Hardware Error]: event severity: corrected
[132102.723428] {1}[Hardware Error]:  Error 0, type: corrected
[132102.723431] {1}[Hardware Error]:  fru_text: DIMM ??
[132102.723434] {1}[Hardware Error]:   section_type: memory error
[132102.723437] {1}[Hardware Error]:    error_status: Storage error in DRAM memory (0x0000000000000400)
[132102.723442] {1}[Hardware Error]:   node:0
[148184.337645] mce: [Hardware Error]: Machine check events logged
[148184.337841] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[148184.337845] {2}[Hardware Error]: It has been corrected by h/w and requires no further action
[148184.337848] {2}[Hardware Error]: event severity: corrected
[148184.337852] {2}[Hardware Error]:  Error 0, type: corrected
[148184.337855] {2}[Hardware Error]:  fru_text: DIMM ??
[148184.337858] {2}[Hardware Error]:   section_type: memory error
[148184.337861] {2}[Hardware Error]:    error_status: Storage error in DRAM memory (0x0000000000000400)
[148184.337866] {2}[Hardware Error]:   node:0
[399053.756531] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[536153.675714] perf: interrupt took too long (3141 > 3127), lowering kernel.perf_event_max_sample_rate to 63000
[735045.757437] perf: interrupt took too long (3948 > 3926), lowering kernel.perf_event_max_sample_rate to 50000
[908495.317409] RAS: Soft-offlining pfn: 0x28e9f7
[908495.322412] mce: [Hardware Error]: Machine check events logged
[908495.322417] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
[908495.322420] EDAC sbridge MC1: CPU 0: Machine Check Event: 0 Bank 11: cc000687000800c2
[908495.322425] EDAC sbridge MC1: TSC a4e31f57c27cc
[908495.322428] EDAC sbridge MC1: ADDR 28e9f7000
[908495.322431] EDAC sbridge MC1: MISC 900044048808e8c
[908495.322434] EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1723867946 SOCKET 0 APIC 0
[908495.322452] EDAC MC1: 26 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 page:0x28e9f7 offset:0x0 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0008:00c2 socket:0 ha:0 channel_mask:4 rank:255 )
[908495.324872] {3}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[908495.324876] {3}[Hardware Error]: It has been corrected by h/w and requires no further action
[908495.324879] {3}[Hardware Error]: event severity: corrected
[908495.324882] {3}[Hardware Error]:  Error 0, type: corrected
[908495.324885] {3}[Hardware Error]:  fru_text: DIMM ??
[908495.324888] {3}[Hardware Error]:   section_type: memory error
[908495.324891] {3}[Hardware Error]:    error_status: Storage error in DRAM memory (0x0000000000000400)
[908495.324895] {3}[Hardware Error]:   node:0
[924558.089513] RAS: Soft-offlining pfn: 0xa8e9e5
[924558.094521] mce: [Hardware Error]: Machine check events logged
[924558.094526] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
[924558.094529] EDAC sbridge MC1: CPU 0: Machine Check Event: 0 Bank 11: cc000bc7000800c2
[924558.094534] EDAC sbridge MC1: TSC a7cd600ff8aec
[924558.094537] EDAC sbridge MC1: ADDR a8e9e5000
[924558.094540] EDAC sbridge MC1: MISC 900202010100e8c
[924558.094543] EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1723884008 SOCKET 0 APIC 0
[924558.094562] EDAC MC1: 47 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 page:0xa8e9e5 offset:0x0 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0008:00c2 socket:0 ha:0 channel_mask:4 rank:255 )
[924558.096802] {4}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[924558.096806] {4}[Hardware Error]: It has been corrected by h/w and requires no further action
[924558.096809] {4}[Hardware Error]: event severity: corrected
[924558.096812] {4}[Hardware Error]:  Error 0, type: corrected
[924558.096816] {4}[Hardware Error]:  fru_text: DIMM ??
[924558.096819] {4}[Hardware Error]:   section_type: memory error
[924558.096821] {4}[Hardware Error]:    error_status: Storage error in DRAM memory (0x0000000000000400)
[924558.096826] {4}[Hardware Error]:   node:0
[1059868.952381] perf: interrupt took too long (4946 > 4935), lowering kernel.perf_event_max_sample_rate to 40000
[1068849.051223] mce: [Hardware Error]: Machine check events logged
[1073850.383062] mce: [Hardware Error]: Machine check events logged
[1080292.113999] mce: [Hardware Error]: Machine check events logged
[1093986.949917] RAS: Soft-offlining pfn: 0xa8e9e4
[1093986.955025] mce: [Hardware Error]: Machine check events logged
[1093986.955029] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
[1093986.955033] EDAC sbridge MC1: CPU 0: Machine Check Event: 0 Bank 7: cc00014000010092
[1093986.955038] EDAC sbridge MC1: TSC c68cca968344c
[1093986.955041] EDAC sbridge MC1: ADDR a8e9e4e00
[1093986.955044] EDAC sbridge MC1: MISC 142243886
[1093986.955046] EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1724053437 SOCKET 0 APIC 0
[1093986.955073] EDAC MC1: 5 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0xa8e9e4 offset:0xe00 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0092 socket:0 ha:0 channel_mask:4 rank:0 row:0xa1cf col:0x138 bank_addr:3 bank_group:3)
[1115770.764568] RAS: Soft-offlining pfn: 0x28e9f5
[1115770.769673] mce: [Hardware Error]: Machine check events logged
[1115770.769678] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
[1115770.769682] EDAC sbridge MC1: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010092
[1115770.769687] EDAC sbridge MC1: TSC ca80d5d4cedcc
[1115770.769690] EDAC sbridge MC1: ADDR 28e9f5200
[1115770.769693] EDAC sbridge MC1: MISC 4022a286
[1115770.769695] EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1724075221 SOCKET 0 APIC 0
[1115770.769723] EDAC MC1: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0x28e9f5 offset:0x200 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0092 socket:0 ha:0 channel_mask:4 rank:0 row:0x21cf col:0x348 bank_addr:3 bank_group:3)
[1115770.869121] mce: [Hardware Error]: Machine check events logged
[1115911.829026] mce: [Hardware Error]: Machine check events logged
[1165531.138158] mce: [Hardware Error]: Machine check events logged
[1165531.138354] {5}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[1165531.138359] {5}[Hardware Error]: It has been corrected by h/w and requires no further action
[1165531.138362] {5}[Hardware Error]: event severity: corrected
[1165531.138366] {5}[Hardware Error]:  Error 0, type: corrected
[1165531.138369] {5}[Hardware Error]:  fru_text: DIMM ??
[1165531.138372] {5}[Hardware Error]:   section_type: memory error
[1165531.138375] {5}[Hardware Error]:    error_status: Storage error in DRAM memory (0x0000000000000400)
[1165531.138380] {5}[Hardware Error]:   node:0
[1170858.124654] RAS: Soft-offlining pfn: 0xa8e9e2
[1170858.129754] mce: [Hardware Error]: Machine check events logged
[1170858.129759] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
[1170858.129762] EDAC sbridge MC1: CPU 0: Machine Check Event: 0 Bank 7: cc00020000010092
[1170858.129767] EDAC sbridge MC1: TSC d4801d001b900
[1170858.129770] EDAC sbridge MC1: ADDR a8e9e2f80
[1170858.129781] EDAC sbridge MC1: MISC 4406a3e86
[1170858.129784] EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1724130308 SOCKET 0 APIC 0
[1170858.129812] EDAC MC1: 8 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0xa8e9e2 offset:0xf80 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0092 socket:0 ha:0 channel_mask:4 rank:0 row:0xa1cf col:0xb8 bank_addr:3 bank_group:3)
[1172140.763365] RAS: Soft-offlining pfn: 0xa8e9e3
[1172140.768473] mce: [Hardware Error]: Machine check events logged
[1172140.768477] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
[1172140.768480] EDAC sbridge MC1: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010092
[1172140.768486] EDAC sbridge MC1: TSC d4bbb3ea55c60
[1172140.768489] EDAC sbridge MC1: ADDR a8e9e3300
[1172140.768492] EDAC sbridge MC1: MISC 1506e0086
[1172140.768495] EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1724131590 SOCKET 0 APIC 0
[1172140.768523] EDAC MC1: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0xa8e9e3 offset:0x300 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0092 socket:0 ha:0 channel_mask:4 rank:0 row:0xa1cf col:0xc8 bank_addr:3 bank_group:3)
[1172140.770638] mce: [Hardware Error]: Machine check events logged
[1172140.770643] RAS: Soft-offlining pfn: 0xa8e9e3
[1172140.775725] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
[1172140.775728] EDAC sbridge MC1: CPU 0: Machine Check Event: 0 Bank 7: cc00010000010092
[1172140.775732] EDAC sbridge MC1: TSC d4bbb40084a0c
[1172140.775734] EDAC sbridge MC1: ADDR a8e9e3f00
[1172140.775737] EDAC sbridge MC1: MISC 15062da86
[1172140.775739] EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1724131590 SOCKET 0 APIC 0
[1172140.775758] EDAC MC1: 4 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0xa8e9e3 offset:0xf00 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0092 socket:0 ha:0 channel_mask:4 rank:0 row:0xa1cf col:0xf8 bank_addr:3 bank_group:3)
[1172140.776155] soft_offline_page: 0xa8e9e3 page already poisoned
[1181692.861063] mce_notify_irq: 1 callbacks suppressed
[1181692.861069] mce: [Hardware Error]: Machine check events logged
[1181692.861271] {6}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[1181692.861275] {6}[Hardware Error]: It has been corrected by h/w and requires no further action
[1181692.861278] {6}[Hardware Error]: event severity: corrected
[1181692.861281] {6}[Hardware Error]:  Error 0, type: corrected
[1181692.861284] {6}[Hardware Error]:  fru_text: DIMM ??
[1181692.861288] {6}[Hardware Error]:   section_type: memory error
[1181692.861290] {6}[Hardware Error]:    error_status: Storage error in DRAM memory (0x0000000000000400)
[1181692.861295] {6}[Hardware Error]:   node:0
[1234246.590139] mce: [Hardware Error]: Machine check events logged
[1421485.280812] RAS: Soft-offlining pfn: 0x28e9f7
[1421485.285918] mce: [Hardware Error]: Machine check events logged
[1421485.285922] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
[1421485.285925] EDAC sbridge MC1: CPU 0: Machine Check Event: 0 Bank 11: cc000607000800c2
[1421485.285931] EDAC sbridge MC1: TSC 101fbe20d0ef80
[1421485.285933] EDAC sbridge MC1: ADDR 28e9f7000
[1421485.285936] EDAC sbridge MC1: MISC 900044044000e8c
[1421485.285939] EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1724380934 SOCKET 0 APIC 0
[1421485.285957] EDAC MC1: 24 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 page:0x28e9f7 offset:0x0 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0008:00c2 socket:0 ha:0 channel_mask:4 rank:255 )
[1421485.286575] soft_offline_page: 0x28e9f7 page already poisoned
[1421485.286597] {7}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[1421485.286600] {7}[Hardware Error]: It has been corrected by h/w and requires no further action
[1421485.286603] {7}[Hardware Error]: event severity: corrected
[1421485.286606] {7}[Hardware Error]:  Error 0, type: corrected
[1421485.286609] {7}[Hardware Error]:  fru_text: DIMM ??
[1421485.286613] {7}[Hardware Error]:   section_type: memory error
[1421485.286615] {7}[Hardware Error]:    error_status: Storage error in DRAM memory (0x0000000000000400)
[1421485.286620] {7}[Hardware Error]:   node:0
[1437351.613117] RAS: Soft-offlining pfn: 0xa8e9e5
[1437351.618228] mce: [Hardware Error]: Machine check events logged
[1437351.618233] EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
[1437351.618236] EDAC sbridge MC1: CPU 0: Machine Check Event: 0 Bank 11: cc000bc7000800c2
[1437351.618241] EDAC sbridge MC1: TSC 104dd026fbc616
[1437351.618244] EDAC sbridge MC1: ADDR a8e9e5000
[1437351.618248] EDAC sbridge MC1: MISC 900202010100e8c
[1437351.618250] EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1724396801 SOCKET 0 APIC 0
[1437351.618268] EDAC MC1: 47 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 page:0xa8e9e5 offset:0x0 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0008:00c2 socket:0 ha:0 channel_mask:4 rank:255 )
[1437351.618874] soft_offline_page: 0xa8e9e5 page already poisoned
[1437351.618897] {8}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[1437351.618901] {8}[Hardware Error]: It has been corrected by h/w and requires no further action
[1437351.618903] {8}[Hardware Error]: event severity: corrected
[1437351.618907] {8}[Hardware Error]:  Error 0, type: corrected
[1437351.618910] {8}[Hardware Error]:  fru_text: DIMM ??
[1437351.618913] {8}[Hardware Error]:   section_type: memory error
[1437351.618915] {8}[Hardware Error]:    error_status: Storage error in DRAM memory (0x0000000000000400)
[1437351.618919] {8}[Hardware Error]:   node:0
[1804406.402949] perf: interrupt took too long (6188 > 6182), lowering kernel.perf_event_max_sample_rate to 32000
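
To see at a glance which DIMM and which pages the corrected errors land on, the `EDAC MC…: N CE …` lines above can be tallied with a short script. This is only a sketch: the regular expression is an assumption based on the line format seen in this log, `summarize` is a hypothetical helper name, and the counts are just the numbers the EDAC driver printed (events flagged `OVERFLOW` may under-report).

```python
import re
from collections import Counter

# Assumed format, taken from the lines in this log, e.g.:
#   EDAC MC1: 26 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 page:0x28e9f7 ...)
# The "EDAC sbridge MC1: ..." detail lines are intentionally not matched.
EDAC_CE = re.compile(
    r"EDAC MC\d+: (?P<count>\d+) CE (?P<kind>memory [a-z]+ error) on "
    r"(?P<label>\S+) \(.*?page:(?P<page>0x[0-9a-f]+)"
)


def summarize(log_text):
    """Return (errors per DIMM label, errors per page) for corrected errors."""
    per_dimm = Counter()
    per_page = Counter()
    for line in log_text.splitlines():
        m = EDAC_CE.search(line)
        if m is None:
            continue  # not an EDAC corrected-error summary line
        n = int(m.group("count"))
        per_dimm[m.group("label")] += n
        per_page[m.group("page")] += n
    return per_dimm, per_page


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    dimms, pages = summarize(sys.stdin.read())
    for label, n in dimms.most_common():
        print(f"{label}: {n} CE")
    for page, n in pages.most_common():
        print(f"  page {page}: {n} CE")
```

In this log every matching line names `CPU_SrcID#0_Ha#0_Chan#2_DIMM#0`, so the per-DIMM tally should contain a single entry.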
rarias added the hw label 2024-08-28 12:14:23 +02:00
rarias changed title from DRAM errors in lake2 to DRAM errors in bay 2024-08-28 12:18:11 +02:00
Reference: rarias/jungle#75