Why nvidia drivers sometimes don’t build when a new kernel is introduced

graphics
nvidia
omlx-3.0
kernel

(Ben Bullard) #1

Note on spelling: nVidia is meant to refer to the company/organization. All lower case nvidia is meant to refer to any nvidia driver software.

Full disclosure: I don’t have nvidia hardware, so my interest is from the perspective of the QA-Team, and I may have had a few misconceptions. One I seem to share with a lot of our users is that as soon as a new kernel package is introduced, the existing nvidia kernel modules should automatically work with it right away. By now we should all realize that it does not work this way. Why?

nvidia driver releases are designed to work up to a specific kernel version and often need to be patched to work with later kernel versions. Be aware that the nvidia driver is closed source, so these patches are mostly provided by nVidia or by other kernel hackers. Whatever the source of the patches, they are not usually available until after a new kernel is released. It is therefore inevitable that in some instances there will be a delay between the time when a new kernel is released by OpenMandriva and the time when a new nvidia driver package can be made available.

So what should nVidia proprietary driver users do?

  1. Pay attention when you update your system so you know when a new kernel version is available. Then:
    a. either don’t install it until the nvidia drivers for that kernel are ready, OR:
    b. install the new kernel but continue to boot into the previous one, with its working nvidia driver, until drivers for the new kernel are available.

  2. Be patient and realize that OpenMandriva developers are a Community of some of the hardest-working, all-volunteer (and part-time) folks you would ever want to meet.
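For option 1b, it helps to know which installed kernels actually have a built nvidia module before choosing one at boot. The following is only a rough sketch, not an official OpenMandriva tool. The function name is made up for illustration, and it assumes that a built driver shows up as a file matching `nvidia*.ko*` somewhere under `/lib/modules/<kernel release>/`, which may not hold on every setup.

```python
from pathlib import Path


def kernels_with_nvidia_module(modules_root="/lib/modules"):
    """Map each installed kernel release to True/False: is an nvidia module present?

    Assumption (not from the thread): a built driver appears as a file named
    nvidia*.ko, possibly compressed (for example .ko.xz), somewhere below
    <modules_root>/<kernel release>/.
    """
    result = {}
    root = Path(modules_root)
    if not root.is_dir():
        return result
    for kernel_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob searches all subdirectories of this kernel's module tree
        result[kernel_dir.name] = any(kernel_dir.rglob("nvidia*.ko*"))
    return result
```

Calling `kernels_with_nvidia_module()` with no arguments inspects the running system. A kernel mapped to `False` is one to avoid booting until the driver for it has been built.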

And 2 more points worthy of consideration from @bero:

Indeed… I’d like to make sure 2 extra points are pointed out:

  1. The best fix is to not support a company that is rabidly anti-Linux. If you’re buying new hardware, look for something with an AMD or Intel GPU, both of those will do much better. Also email nvidia support and tell them to open their drivers, support nouveau, or be boycotted next time.

  2. If you want to keep your old hardware (let’s not waste electronics), there is a supported driver and it’s called nouveau. It’s not perfect, but certainly good enough these days unless you’re playing ultra high-end games every day.

And thanks to @Colin for the wording in the 3rd paragraph that explains why nvidia driver patches are sometimes not available for latest kernel.

Edit: Edited on 1/27/18 to include language from @Colin and some important points from @bero.


(Ben Bullard) #2

#3

It is not only that nVidia can’t keep up with kernel development, whether willfully or not, so that we often have to search for a patch to get things working again.
It is also because OMA releases the latest versions of other programs without thinking about the consequences. In ‘cooker’ that is okay, but not in the production version.

An example:
gcc was recently updated from version 7.2.x to 7.3.x.
As a result, nVidia 384.111 and 390.25 no longer build successfully.
Version 384.111 built successfully on Jan. 26 (with gcc 7.2.x).

Today:

CC [M] /var/lib/dkms/nvidia-current/384.111-1/build/nvidia/nv-frontend.o
cc1: error: incompatible gcc/plugin versions
CC [M] /var/lib/dkms/nvidia-current/390.25-1/build/nvidia/nv-frontend.o
cc1: error: incompatible gcc/plugin versions


(Tomasz Paweł Gajc) #4

This means that the kernel you are running was compiled with a different gcc version than the gcc you have installed.
This turbulence is caused by Intel and its Spectre/Meltdown vulnerabilities. We had to release an immune kernel, together with a gcc that is immune to these issues.
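A quick way to spot this kind of mismatch is to compare the compiler recorded in `/proc/version` with the installed gcc. This is a minimal sketch with made-up helper names. It assumes the older `gcc version X.Y.Z` wording in `/proc/version` (newer kernels format this differently), and a plain string comparison is only a rough indicator, since the gcc plugin check may be stricter than the version number alone.

```python
import re


def kernel_gcc_version(proc_version_text):
    """Return the gcc version named in /proc/version-style text, or None."""
    match = re.search(r"gcc version (\d+(?:\.\d+)*)", proc_version_text)
    return match.group(1) if match else None


def same_compiler(proc_version_text, installed_gcc_version):
    """True/False if the versions match or differ, None if the kernel's is unknown."""
    built_with = kernel_gcc_version(proc_version_text)
    if built_with is None:
        return None
    return built_with == installed_gcc_version
```

On a real system the first argument would be the contents of `/proc/version` and the second the version number reported by `gcc --version`. With the versions mentioned in this thread, a kernel built with 7.2.1 and an installed gcc 7.3.x would report a mismatch.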


#5

TPG,

Another example.
If I try to build the 390.25 version without modifying the Kbuild file, I get more information about the gcc failure.

make.log.txt (1.5 KB)

Kernel 4.14.14-desktop-22omv was built with gcc version 7.2.1
gcc was updated later. Why were the two not updated together?


(Ben Bullard) #6

The only way to make this better is to participate in package testing. We’ve asked for volunteers for this many times. We are still asking. OpenMandriva is a Community distribution. If the community does not do things then those things do not get done.

OpenMandriva has never since day one had enough people involved in QA and package testing to do all that needs doing.

So to all of you we very much need your participation.

Edit: This is an answer but not the one I’d prefer to be making.


#7

I’m looking forward to the new kernel and nvidia packages. I would like to install the testing packages, but it seems that turning the testing repos “on” and updating the system also means installing other packages such as systemd.

The question is: is it safe and advisable to install only the kernel and nvidia packages from testing, without also updating systemd? I need a working system for my daily needs.


(Ben Bullard) #8

I am a regular tester and I don’t set testing repos on and leave it. And it is perfectly OK to be a tester for specific packages. nVidia is one in particular where we need regular testers to test new packages and also to prompt when nVidia releases new versions. Kernel testing is also needed.

It is OK to selectively update packages if you know what you are doing. For instance, I prefer to test things like glibc/libc ONLY, systemd ONLY, and so on. Kernels need to be tested together with any packages that have kernel modules to build, such as VirtualBox and nVidia. So your answer is yes.


#9

I installed the new nvidia 390.25 and kernel 4.15, and it seems to work. The only problem is having to run XFdrake to get past the black screen at first reboot.


(Miloslav Havrda) #10

It does not work.
I have a Dell E6520 laptop with dual graphics cards. I use only the Nvidia one.
It works only with 4.13.12.

DKMS make.log for nvidia-current-384.98-2 for kernel 4.15.1-desktop-1omv (x86_64)

I see many messages like this in the /var/lib/dkms/nvidia-current/384.98-2/build/make.log file:
/var/lib/dkms/nvidia-current/384.98-2/build/nvidia/nv-gpu-numa.c:123:39: warning: passing argument 3 of ‘kernel_read’ makes integer from pointer without a cast [-Wint-conversion]
read_count = kernel_read(filp, 0, read_buffer, read_buffer_size - 1);
^~~~~~~~~~~
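Since a DKMS make.log can mix harmless warnings with the errors that actually stop the build, it can help to count them separately before reporting. This is a rough sketch with a made-up function name. It only looks for the `warning:` and `error:` markers that gcc prints, as seen in the excerpts in this thread, and does not try to understand anything else in the log.

```python
import re

# gcc marks diagnostics as "warning:" or "error:" (the latter also matches "fatal error:")
DIAGNOSTIC = re.compile(r"\b(warning|error):")


def count_diagnostics(log_text):
    """Count lines in a build log that carry a gcc warning or error marker."""
    counts = {"warning": 0, "error": 0}
    for line in log_text.splitlines():
        match = DIAGNOSTIC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the text of a file such as /var/lib/dkms/nvidia-current/384.98-2/build/make.log gives a quick idea of whether the build only warned or really failed.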


#11

Mila,

The new nvidia 390.25 drivers should work fine. They are working here; I installed them from the testing repos. I guess the new nvidia drivers will be available in non-free-updates soon.


(Miloslav Havrda) #12

Yes, you are right.
I have updated to version 390.25. It works with the latest kernel.

Thank you.


(Ben Bullard) #13