System froze/hung. No inputs worked. Hardware restart required. (ROME)

Hello,

Requirements:

I have searched the forum for my issue and found nothing related or helpful
I have checked the Resources category (Resources Index)
I have reviewed the Wiki for relevant information
I have read the Release Notes and Errata

OpenMandriva Lx version:

$ cat /etc/release
OpenMandriva Lx release 25.06 (ROME) Rolling for znver1
$ cat /etc/lsb-release
LSB_VERSION=
DISTRIB_ID="OpenMandriva Lx"
DISTRIB_RELEASE=25.06
DISTRIB_CODENAME=ROME
DISTRIB_DESCRIPTION="OpenMandriva Lx 25.06"
$ cat /etc/os-release
NAME="OpenMandriva Lx"
VERSION="25.06 (ROME) Rolling"
ID="openmandriva"
VERSION_ID="25.06"
PRETTY_NAME="OpenMandriva Lx 25.06 (ROME) Rolling"
BUILD_ID="20250605.11"
VERSION_CODENAME="rome"
ANSI_COLOR="1;43"
LOGO="openmandriva"
CPE_NAME="cpe:/o:openmandriva:openmandriva_lx:25.06"
HOME_URL="http://openmandriva.org/"
BUG_REPORT_URL="https://github.com/OpenMandrivaAssociation/distribution/issues/"
SUPPORT_URL="https://forum.openmandriva.org"
PRIVACY_POLICY_URL="https://www.openmandriva.org/tos"

Desktop environment (KDE, LXQT…):

KDE Plasma 6; X11

Description of the issue (screenshots if relevant):

(See title) Beyond that, I’m not sure what is relevant, but I will include what I think might be here.

-I had multiple UG Chromium tabs open, in ~6 windows across 3 workspaces.

-I also had 2 or 3 Kate documents open in different Kate tabs, and was working in one of them when it froze (ironically, I was trying to figure out how to use this template for a different issue at the time lol).

-Other running apps were Konsole, plus ProtonVPN, Proton Mail, Proton Pass and SimpleX (all Flatpaks).

-Installation was from an ISO using Ventoy, but it installed fine, so I don't think this is related to the Ventoy mention in the errata.

-Another post I found in Support that mentioned system hangs was this one, in which @ben79 (the legend) mentioned running these:

$ dmesg > dmesg.txt

dmesg.txt (151.2 KB)

$ inxi -F > inxi.txt

inxi.txt (3.2 KB)

$ sudo journalctl > journal.txt

journal.txt is too big to upload, and I have no cloud storage, nor do I want any, but I will find some if I have to :frowning: . If I do have to, can you recommend one?
I have a feeling that log is going to be important.
{EDIT: It just occurred to me that I think Proton.me offer some sort of cloud storage. Would that work if it's needed?}

-The system has also hung on boot at least twice since install. Unfortunately, both times I was making coffee after hitting the power button, so I don't know what happened. I came back to find nothing but the MSI logo and the prompts for the BIOS and boot-menu keys, neither of which actually responded, and I had to soft-off and restart.

-I tried to install HDAJackRetask from Discover a couple of days ago and it failed with the pop-up error message
“The PackageKit daemon has crashed”
(I now know this is recommended against, sorry)

I have downloaded
OpenMandrivaLx.rolling-snapshot.20250801.4119-plasma6x11.znver1.iso (the latest)
and flashed it directly onto a USB drive with balenaEtcher, in case the problem is related to having used Ventoy (or in case I otherwise have to reinstall).

Let me know what else I can do to help with this.

Thank you for your time and effort. I really can't express how much I admire volunteers.

Relevant information (hardware involved, software version, logs or output…):

Try limiting your journalctl output to the specific boot that froze. For the previous boot, that would be:

journalctl -b -1 > journal.txt

The current boot is 0. To list previous boots with their dates and times:

journalctl --list-boots

To filter logs from a specific time before the freeze:

journalctl --since "YYYY-MM-DD HH:MM:SS"
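The commands above can be combined into one small helper that saves a single boot's log to a file and reports its size. This is a minimal sketch: the function name `save_boot_log` and the default file name are made up for illustration. Shell functions can't be run through `sudo` directly, so it assumes you run it from a root shell (or as a user allowed to read the system journal, e.g. a member of the `systemd-journal` group).

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and defaults are illustrative, not an OMLx tool):
# save the journal of a single boot to a text file small enough to attach.
#   $1 = boot offset: -1 = previous boot (default), 0 = current boot,
#        -2 = the one before that, as listed by `journalctl --list-boots`
#   $2 = output file (default: journal-boot<offset>.txt)
save_boot_log() {
    local boot="${1:--1}"
    local out="${2:-journal-boot${boot}.txt}"
    # --no-pager: write plain text instead of opening a pager
    # -o short-iso: prefix every line with a full ISO 8601 timestamp
    journalctl -b "$boot" --no-pager -o short-iso > "$out" || return 1
    # Print the size so it can be compared with the forum's upload limit
    printf '%s: %s bytes\n' "$out" "$(wc -c < "$out")"
}
```

For example, after sourcing the file: `save_boot_log -1 journal-prev.txt`. If one boot is still too large, the `--since "YYYY-MM-DD HH:MM:SS"` option from the post can be added to the `journalctl` line in the same way.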

Info:
  Memory: total: 96 GiB available: 91.9 GiB used: 3.01 GiB (3.3%)
  Processes: 512 Uptime: 27m Shell: Bash inxi: 3.3.37

It's probably not memory, so I don't know what it is.

journal-10-07.txt (3.1 MB)

That worked. ty

This has happened to me on a lot of systems in the past, and it ended up being a bad CR2032 battery on the motherboard. Your BIOS should also be as up to date as it can be.

Try removing any exotic or unneeded hardware (including secondary or external monitors, especially using HDMI). Then start adding things back until it produces the issue again.

Posting this update from the laptop.

Got home today and it failed to boot again (the second time since I first posted this yesterday). This time it is stuck on a black screen, after GRUB but before the log-in screen. There is a cursor, but it isn't blinking (I found some other posts that mentioned a blinking cursor). To my limited understanding, this suggests it may be related to X11, which might lend weight to the monitors being involved?

I should have left it on the black screen for a while to see if it dropped back to the MSI logo screen, as previously reported.
Instead, I started wondering whether it might be hardware related myself, so I rebooted and started the memtest from the OM boot screen (currently still running).
It is all new hardware, other than the monitors. The BIOS is up to date (first thing I did).
There was a weird issue with an “Overcurrent have been detected on usb, shutting down to protect mainboard” message, but I thought this was related to having my K95 KB plugged into a USB 2 port instead of the USB 3 port specified in its manual. I haven't had that error since I switched it to USB 3.
I was intending to run bench/stress tests, but I didn’t get that far.

I have been on the edge of reinstalling with ROCK a couple of times now, but I hate admitting defeat, and would rather take all this as a learning experience.

Hmmm, memtest just passed, but now it's not responding to KB inputs.

I think I'll grab the old basic Logitech USB KB and mouse, and unplug one of the monitors.

####. Memtest is re-running, with no KB/mouse response. What is going to happen when I have to power off mid-test?

RAM is volatile, meaning it clears when you remove the power source. So, nothing will happen.

What have you tried from these suggestions?

OK, time for another update.
TL;DR: I'm thinking it's actually a GPU and/or mobo issue.

Last night
-Pulled everything except the GPU, drives and RAM (which has now passed a complete memtest twice, so should be fine); switched to a different, basic USB mouse/KB and a different monitor (on HDMI, for a reason explained below).
-Reset the BIOS
-Restarts and reboots, both from terminal and “start” menu, 12 times in total.
— Got several hangs on boot (around 1 in 4 or 5). Some were after log-in, before the “Plasma 6” popped up; some were (I think, was getting tired at this point) after grub, but before log-in; 2 were before grub menu, stuck on MSI splash screen “Del” and “F11” prompts with no KB response.
-Pre-GRUB boot failures were giving debug code 4E. That code is not listed in the manual, and I found no listing for it when searching (in spite of AI “hallucinations” lol).
However, codes 10–1C, 2B–2F, 31–3E and 4F are PEI progress codes, with 4F being “DXE IPL is started” (i.e. the transition to the next phase?).
I tried reading this thinking I might get some insight into what the final steps in the PEI phase are, but either it’s too dense or I am lol (specifically, there were too many new acronyms for me to translate them on the fly in order to form a coherent understanding).
-A storm and sleep interrupted me here.
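To check a debug code against the PEI progress ranges quoted from the manual above (10–1C, 2B–2F, 31–3E and 4F), a small lookup like this could help. It is only a sketch: the function name is hypothetical, it knows nothing beyond those quoted ranges, and a code outside them (such as 4E or 4d) is reported as unlisted, not explained.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: classify a hex debug code using ONLY the PEI progress
# ranges quoted from the MSI manual in this thread.
classify_post_code() {
    # Accept one or two hex digits only, e.g. 4E, 4d, 1C
    [[ $1 =~ ^[0-9A-Fa-f]{1,2}$ ]] || { echo "invalid code: $1"; return 2; }
    local code=$((16#$1))    # interpret the argument as hexadecimal
    if (( code >= 0x10 && code <= 0x1C )) ||
       (( code >= 0x2B && code <= 0x2F )) ||
       (( code >= 0x31 && code <= 0x3E )) ||
       (( code == 0x4F )); then
        echo "$1: PEI progress code (per the quoted ranges)"
    else
        echo "$1: not in the quoted PEI progress ranges"
    fi
}
```

For example, `classify_post_code 4E` reports it as not in the quoted ranges, which matches what the manual search found.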

This morning
-Removed the GPU (the mobo only has HDMI, or DP over USB-C, and I don't have a USB-C to DP cable, hence HDMI)
-Re-ran the multiple reboot series. NO FAILURES (Note, the reboot times also seemed a lot snappier without the GPU)
Thinking it must be GPU and/or driver related
-Reconnected the GPU and re-ran the reboot series (reboot from CLI x3; reboot from GUI x3; shutdown from CLI x3; shutdown from GUI x3)
—GUI reboot #2: hang after log-in
—GUI shutdown #2: hang on the MSI logo screen. Restarting with the reset button worked this time, BUT… this time it was on debug code 4d, and I got another
Overcurrent have been detected on your USB device. System will shutdown to protect mainboard
warning.
So now I am thinking it must be related to the GPU AND/OR the mobo.

Next steps include:
-Pull the drives also, then re-test.
-Test new GPU in old system, and test old GPU in new system.
-Throw a single drive back in the new system and throw in a W10 install and test again.

Should I continue keeping this updated for the curious, or just let the thread die quietly since it seems it’s probably unrelated to OMLx, or is there still a possibility it is driver related?

That actually occurred to me after I added that after-thought :embarrassed: lol

No need.

Do you have another power supply to test with?

Yeah; this made me wonder the same thing.

Good point, I should add that to the troubleshooting also.
I am a bit more doubtful that it's the PSU; it seems to me that if the fault were there, the overcurrent would be basically constant. Still definitely worth eliminating, though.
It’s also crossed my mind to remove the PCIE_PWR1 plug. New connector and all, maybe MSI need more practice with it lol (I know there is absolutely no need for it with my card but decided to hook it up anyway).
This will all have to wait until tomorrow though, today’s all booked out with other commitments.

Quick update for those interested.

-Set up with a single drive, no GPU, and no “PCIE_PWR” connected.
-Mouse/KB in a back-panel USB 3 port → multi-reboot sequence, no failures.
-Mouse/KB in a back-panel USB 2 port → multi-reboot sequence, multiple failures, including 4E and 4d debug codes at the MSI splash screen, and an “Overcurrent” warning again.

I have started an in-store warranty claim with pccasegear, along with details of the troubleshooting done.

I didn't end up testing with my older PSU. TBH I got lazy: I took one look at my rat's-nest cable “management”, and the thought of undoing it all and trying to redo it later, just for what by this point seemed a pretty slim possibility, was too much. Besides, I wanted to get the warranty process started before the weekend.

So, the bit that's actually relevant for this forum: it seems pretty unlikely this is actually an OMLx-related issue after all.


Keep us posted.

There is more in your log:

  • How many NVMe drives are installed?
  • What is your video card? A 9070 XT?
  • Have you used bifurcation in the UEFI?
  • Do you get these failures with other Linux live ISOs?
  • Is the PSU wattage sufficient?

https://www.msi.com/support/download-manual/MAG-X870E-TOMAHAWK-WIFI

  • I had 2 NVME’s installed: A 2TB Gen5 Kingston Fury in M.2_1; and a 2TB Gen3 T-Create Classic in M.2_4. (Also a 1TB Seagate Barracuda 2.5” in SATA_1)
  • GPU is an RX 9070 XT (Powercolor Reaper fwiw)
  • I didn't set any bifurcation options in the BIOS, before or after the BIOS reset I did before testing. (The only settings I changed after the reset were turning off the $#%##$%@ “MSI Driver Utility Installer” :rage face: and making sure the Secure Boot and TPM settings were all off. I didn't even turn EXPO back on.) So, going by the block diagram (manual p.73), there should have been no lane sharing active.
    (This is why I went for the MSI board instead of Gigabyte, for the first time in my life. It was the only X870E board I could find in my acceptable price range that didn't split the x16 CPU lanes to share them with the primary Gen 5 M.2 (or anything else). I know the 9070 probably wouldn't need more than 8 Gen5 lanes anyway, but this is supposed to be a 10+ year build, with a probable GPU upgrade in another 2–4 generations.)
  • I didn’t end up testing with other OS’s, live or otherwise. I had intended to but that plan somehow got lost in my chaotic cogitations. /-:
    And now the board is all packed up ready to send back.
    I'm pretty sure it is a board issue though; I can't see OS errors causing an “Overcurrent have been detected on usb device” failure. Not that I would know, I suppose. Besides, I got that error before I even installed, but managed to convince myself it was the fancy KB with all the flashy bling being in USB 2 instead of 3, and promptly forgot about it while installing and setting up OMLx. (Note to self: try not to let your instincts be overridden by your wishes in the future.)
  • PSU is a SuperFlower Leadex VII Pro; 1000W; 80+Platinum; Cybernetics Platinum; 10 Year Warranty. Again, overkill for current build config, but who knows how hungry GPU’s will be in the coming generations.

One day I will be able to not only read journalctl logs, but understand them also! (-:
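As a starting point for reading a saved journal, a `grep` for messages that often appear around freezes can narrow a 3 MB file down to a handful of lines. This is a rough sketch: the function name is hypothetical, and the patterns are my assumption of commonly seen kernel wording (amdgpu timeouts and resets, soft/hard lockups, hung tasks, USB over-current, machine-check errors), not a complete or authoritative list. Some matches (e.g. “watchdog”) can be harmless, and a freeze that leaves nothing in the log will not show up at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) lines from a saved journal
# that match wording often seen around freezes. Patterns are assumptions.
scan_freeze_hints() {
    local file="$1"
    # -n: show line numbers, -i: ignore case, -E: extended regex alternation
    grep -n -i -E \
        'amdgpu.*(timeout|error|fail)|gpu reset|soft lockup|hard lockup|blocked for more than|over-current|mce:|hardware error|watchdog' \
        "$file"
}
```

For example: `scan_freeze_hints journal-10-07.txt`. The line numbers make it easy to open the file in Kate and read what happened just before each match.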

(Ps. I hate emojis! Bring back smileys please, just so this old man has one less cloud to scream at lol)

In your case, if possible avoid Gen3 NVMe drives; use only Gen5 or Gen4 in the limited slot M.2.

I have a Gigabyte X870E Elite with 4 NVMe drives, but it shares Gen5 PCIe lanes with the video card:

with one NVMe, Gen5 is OK;

with 2 NVMe drives, the card drops back to x8;

with more, it goes down to x4.

I have an 850 W Titanium PSU; this is enough for a 9900X and a Gigabyte 9070 XT.

On boot I sometimes get a lost video link (DP) from PM amdgpu, and I see in systemd-analyze that the NVMe drives come up later (sometimes 1 second for the boot NVMe).

First, I think I owe you (and everyone) an apology. It looks like English is not your first language. I should be using more straightforward language. I'm sorry.

I’m not sure what you mean by “limited slot M.2”. I’m guessing you mean the M.2 slots connected through the chipset/s.
I’m not sure why I should avoid Gen 3 M.2’s, other than they would only use half the bandwidth of the Gen 4 lanes.
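The “half the bandwidth” point can be made concrete with the usual per-lane arithmetic. PCIe Gen3, Gen4 and Gen5 signal at R = 8, 16 and 32 GT/s per lane with 128b/130b encoding, so usable throughput per lane and direction is roughly R × 128/130 ÷ 8 bits per byte (ignoring further protocol overhead):

$$
B_{\text{lane}} = \frac{R \cdot \tfrac{128}{130}}{8}\ \text{GB/s}, \qquad
B_{\text{Gen3}} \approx 0.985,\quad B_{\text{Gen4}} \approx 1.969,\quad B_{\text{Gen5}} \approx 3.938\ \text{GB/s}
$$

$$
\text{x4 Gen3} \approx 3.94\ \text{GB/s}, \quad \text{x4 Gen4} \approx 7.88\ \text{GB/s}, \quad \text{x8 Gen5} \approx 31.5\ \text{GB/s}, \quad \text{x16 Gen5} \approx 63.0\ \text{GB/s}
$$

So a Gen3 drive in a Gen4 x4 slot links at Gen3 and gets about half of that slot's ceiling, while a GPU link that drops from x16 to x8 through lane sharing halves its ceiling in the same way.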

That is why I chose MSI Tomahawk, it doesn’t bifurcate the x16 GPU lanes at all, other than in a dual GPU configuration.
(Unless the MSI manual, including the block-diagram, is lying to me. I wouldn’t be surprised if it was lol)

If I upgrade the GPU in 4 – 8 years 850W might not be enough. 1000W might not be enough either, but it was the most $ I was willing to spend.

I had to look up what “PM” amdgpu is. I found this article (from a website I am not familiar with).
I assume you are not referring to OC-ing your gpu.
I think you are saying that sometimes your amd drivers are reporting that the gpu is not connecting properly over Display Port on boot.
Do you think this might be related to my boot issues? If so, can you explain how and/or how to check for myself after I have re-built with the replacement board.
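One way to check this after the rebuild is to pull only the GPU/display kernel messages for a given boot. A minimal sketch, assuming the same journal access as the earlier commands; the function name is hypothetical, and I can't say exactly what wording the driver uses for a lost DP link, so which lines matter is left to judgement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show kernel messages from the amdgpu / DRM display
# stack for one boot (default: the current boot, 0; -1 = previous boot).
amdgpu_messages() {
    local boot="${1:-0}"
    # -k (--dmesg): kernel messages only; -b selects the boot
    journalctl -k -b "$boot" --no-pager | grep -i -E 'amdgpu|\[drm'
}
```

Running it for a boot that had the problem and for one that didn't (e.g. `amdgpu_messages -1` vs `amdgpu_messages 0`) and comparing the two is probably more useful than reading either alone.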

I think you are saying the boot-drive M.2 is taking too long to… (connect? or initialise? or respond and boot?)
Do you mean in your PC or in mine?
Is “systemd-analyze” the same as the journalctl output?
I really need to spend some more time trying to learn how to read that log.
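On the `systemd-analyze` question: it is a separate tool from `journalctl`. `journalctl` prints the logged messages, while `systemd-analyze` reports how long the boot took and which units were slow. A minimal sketch of its common subcommands, wrapped in a hypothetical function; it only covers the current boot.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around real systemd-analyze subcommands.
# Unlike journalctl, these report boot timing for the current boot, not log text.
#   $1 = how many of the slowest units to show (default 10)
boot_timing() {
    local n="${1:-10}"
    systemd-analyze                        # total boot time, split into phases
    systemd-analyze blame | head -n "$n"   # units sorted by start-up time, slowest first
    systemd-analyze critical-chain         # the chain of units the boot actually waited on
}
```

Comparing the output of a normal boot with a slow one is more informative than a single run.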

I hope I have been clear and this is at least easy for the AI to translate.
If someone can recommend an online translator that isn't tied to Google
(don't be evil, don't believe),
you could post in your own language and I could try that, perhaps?