
Unstable stable KDE (was: Fwd: too late to the party?)



Hello Soren, hello list,

first of all: I am receiving your postings twice, once through the list and
once directly. I would prefer to read them only once. You can decide
whether via the list or off-list, whatever you prefer.

It was interesting for me to learn what rasdaemon is all about, and about
the planned move away from edac. But I am still wondering about the correct
interpretation of my problems. When I had time to think about it, I was
even *hoping* a memory error would turn out to be the cause, as that could
also explain other observations with the current system. - AND - That
would open an opportunity to resolve the issues with a few bucks. ;-)

( Several years back, my computer really had memory defects and I had to
replace some chips. Since that experience, I have been using server-grade
hardware, with ECC memory and other stuff.)

You suggested a memtest. Unfortunately, my only OS that can use grub
is the outdated buster. I installed memtest from debian there, but
found it just did not work at all. That is why I turned to the PassMark
free memtest 11.0, which also comes in an efi flavor, and via grub I
could start that one. Result: 0 errors and 0 ecc errors (after 2 passes
of their whole test suite; I will have it running for some more later).

The only surprise was that during the 2nd pass there was a message,
and the test waited for confirmation before proceeding. The message read:

> UEFI Firmware Error: Couldnt start CPU 15.

tbh, I don't care. The test recognized 64 CPUs and decided to use only
16 of them, probably because it couldn't deal with the second socket of
my 2 EPYCs; only one got used. (shrugs shoulders)
After confirmation, the test went on and terminated successfully.

So I am - more or less - back to the drawing board, not knowing what
causes the instabilities. Maybe I should trust the following part of the
original kernel message:
> error:0 in libsqlite3.so.0.8.6

Anyhow, as you can see, I took your guidance literally, but cannot -
up until now - confirm there being memory issues.
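
A possible cross-check from the running system would be the error counters
that EDAC keeps in sysfs. What follows is only a minimal sketch, assuming the
usual layout under /sys/devices/system/edac/mc with one mcN directory per
memory controller, each holding ce_count (corrected) and ue_count
(uncorrected). The function name read_edac_counts is made up for
illustration, and the layout may differ on other kernels.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters.

    The default root is an assumption about the sysfs layout; pass another
    directory to test against a copy or a mock tree.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

Counters that stay at zero across the crashes would point in the same
direction as the memtest result, while a rising ce_count would suggest that
ECC is quietly correcting errors.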

DdB

-------- Forwarded Message --------
Subject: Re: too late to the party?
Date: Tue, 27 Aug 2024 14:32:49 -0700
From: Soren Stoutner <soren@stoutner.com>
To: debian-kde@lists.debian.org, DdB
<debianlist@potentially-spam.de-bruyn.de>

According to the upstream documentation, rasdaemon particularly has to
do with ECC errors (not all of which are correctable).

https://github.com/alexandrelimassantana/rasdaemon


On August 27, 2024 2:12:10 PM MST, DdB
<debianlist@potentially-spam.de-bruyn.de> wrote:
>On 27.08.2024 at 18:17, Soren Stoutner wrote:
>> Google indicates rasdaemon has to do with hardware memory errors.  If I
>> were you I would scan the system with Memtest86+.
>> 
>Wow! That seems extremely unlikely to me, because my hardware has lots
>of ECC RAM, which means, RAM errors should
>
>1. self correct
>and
>2. produce a different kind of message
>
>from dmesg: ECC is properly recognized and EDAC is running
>
>> [    8.334662] EDAC MC0: Giving out device to module amd64_edac controller F17h_M30h: DEV 0000:00:18.3 (INTERRUPT)
>> [    8.335358] EDAC amd64: Node 1: DRAM ECC enabled.
>
>That is why I don't think memtest would even work as expected.
>
>Will consider it anyhow, once the machine is less busy.
>


-- 
Soren Stoutner
soren@stoutner.com


-- 
Love is ...
Datakanja

