On Thu, 1 Feb 2024 at 09:45, Srdjan Todorovic todorovic.s@googlemail.com wrote:
I forgot to mention, the lockups happen more often when graphics is being used, e.g. steam games, obsidian graph view, sometimes YouTube videos. So I may need to look into the Nvidia drivers too. Yes, tainted kernel.
On Thu, 1 Feb 2024, 09:29 Adam Bower, adam@thebowery.co.uk wrote:
On Wed, Jan 31, 2024 at 10:00:38PM +0000, Srdjan Todorovic wrote:
How does one even know if their NVMe drive is failing? AFAIK, smartctl doesn't work on them in the same way it does on spinning drives. Or is this a kernel bug?
There's a command nvme or nvme-cli that may help.
https://github.com/linux-nvme/nvme-cli
I'd just install the older kernel and boot that manually and see what
In terms of installing older kernels, when I tried (I don't understand the Ubuntu way of doing this), apt seemed to want to uninstall a lot of things, and I wasn't sure if I'd have a working system afterwards.
Before December, the machine was incredibly stable no matter what I threw at it.
OK, so if I try to install the older kernel:
    sudo apt-get install linux-image-5.15.0-88-generic
it says:
    linux-image-5.15.0-88-generic is already the newest version (5.15.0-88.98).
However, uname lists this as the current running kernel: 5.15.0-94-generic
I am not confident enough with grub / grub2 to know how to make it boot the old one permanently - wasn't it the case that the config files are no longer config files but are now actually scripts?
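On the GRUB question: on Ubuntu, /boot/grub/grub.cfg is generated by the scripts in /etc/grub.d, but /etc/default/grub is still a plain settings file that update-grub reads. A minimal sketch of pinning the older kernel follows. It assumes the stock Ubuntu menu titles ("Advanced options for Ubuntu" and "Ubuntu, with Linux 5.15.0-88-generic"), which should be checked against the real grub.cfg first. set_grub_default is a hypothetical helper name, not something GRUB ships.

```shell
#!/bin/sh
# Hypothetical helper: point GRUB_DEFAULT in a grub defaults file at one menu entry.
# $1 = path to the defaults file, $2 = "submenu title>entry title"
set_grub_default() {
  # Keep a .bak copy next to the file, then replace the existing GRUB_DEFAULT line.
  sed -i.bak "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$2\"|" "$1"
}

# Assumed stock Ubuntu titles; verify with:
#   grep -E "menuentry |submenu " /boot/grub/grub.cfg
ENTRY='Advanced options for Ubuntu>Ubuntu, with Linux 5.15.0-88-generic'

# Work on a copy so nothing changes until it has been reviewed:
#   cp /etc/default/grub /tmp/grub.new
#   set_grub_default /tmp/grub.new "$ENTRY"
#   sudo cp /tmp/grub.new /etc/default/grub && sudo update-grub
```

Choosing the same entry by hand from the "Advanced options for Ubuntu" submenu at boot gives the older kernel for that one boot without touching any file, which may be enough to test whether the lockups go away.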
However I did install nvme-cli, and just got this:
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning : 0
temperature : 48 C (321 Kelvin)
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 1%
endurance group critical warning summary: 0
data_units_read : 68,600,474
data_units_written : 68,516,189
host_read_commands : 304,422,351
host_write_commands : 428,570,321
controller_busy_time : 2,254
power_cycles : 1,189
power_on_hours : 2,483
unsafe_shutdowns : 20
media_errors : 0
num_err_log_entries : 3,566
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 48 C (321 Kelvin)
Temperature Sensor 2 : 52 C (325 Kelvin)
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
Particularly of note is the large value of 'num_err_log_entries' (3,566). I'm doing some googling to work out whether this is bad; if anyone knows already, please let me know.
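The counter on its own does not say what the errors were. The nvme-cli error-log subcommand lists the individual entries, and comparing the counter before and after a lockup would show whether the two are related. A small sketch for pulling one field out of the smart-log text follows; smart_field is a hypothetical helper name, and the device path is the one from the log above.

```shell
#!/bin/sh
# Hypothetical helper: print one field from `nvme smart-log` text read on stdin.
# $1 = field name exactly as printed, e.g. num_err_log_entries
smart_field() {
  # Split each line on the colon (with optional surrounding blanks),
  # match the field name, and strip thousands separators from the value.
  awk -F'[ \t]*:[ \t]*' -v key="$1" '$1 == key { gsub(/,/, "", $2); print $2 }'
}

# Usage on a live system (needs root):
#   sudo nvme smart-log /dev/nvme0n1 | smart_field num_err_log_entries
#   sudo nvme error-log /dev/nvme0n1
```

Logging the value with a timestamp after each boot is one simple way to see whether it climbs steadily or only jumps around the lockups.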
Helpful pointers appreciated, thanks!
Srdjan