### Submission type
- Bug report
### systemd version the issue has been s…een with
v237
### Used distribution
Debian unstable
Filed as downstream bug report at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890641
The following is copied from there.
Hi,
I'm filing this against udev, because that's where the fallout of this
first became obvious to me, though looking at the code there are other
places in systemd effected by it too (pretty much everywhere that the
udev_device_get_action() function is used). And given we'll release
with a kernel that breaks the assumptions this code makes, it really
should be RC - but I'm not going to niggle over severity right now.
The symptom that first got my attention was that setting the SYMLINK
key in a rule for a USB device was no longer creating that symlink in
Buster, despite working as expected in Stretch and several releases
prior.
After fixing the broken udev logging in Buster which made it impossible
to see why that was happening, it appears that the answer is Linux 4.12
added the actions "bind" and "unbind" for notification of devices being
bound to, or released from, a driver. And udev handles that poorly, so
the symlink gets created as expected for the "add" event, and then
promptly gets deleted again when the "bind" event is processed.
Bind events were added in 1455cf8dbfd06aa7651dcfccbadb7a093944ca65 of
Linus' tree. There was a followup commit two months later (Sep 2017)
6878e7de6af726de47f9f3bec649c3f49e786586, to work around a different
symptom of the same brand of problems in udev when MODALIAS is included
in an unbind event - but this one is still a live breakage. And there
may well be others that still just haven't been noticed yet.
The code in systemd/udev (and all other users of these events too)
really should be explicit about exactly which actions are expected to
be processed in any given code path, with "default" paths not doing
anything more than reporting an unknown event was received. It can't
possibly assume safely that any unknown future event should just be
treated the same as "add" or "change" would be. And looking at the
current code it's almost impossible to tell whether some of the less
common action types really were expected to follow the path they will,
or if they are just rare enough that nobody has both seen it explode
and understood why it did so ...
This is a minimal debug log from udev that demonstrates the problem
as seen from a rule setting SYMLINK+="bitbabbler/$attr{serial}", it
was logged with 50-udev-default.rules and 80-drivers.rules disabled
(while trying to figure out if it was some bad interaction with them
at fault), but that makes no difference to the end result here, it
just eliminates the distracting noise they'd interleave in the log.
- The device is hotplugged
```
Feb 17 10:16:21 buster kernel: [ 1208.116054] usb 1-2: new high-speed USB device number 9 using xhci_hcd
Feb 17 10:16:21 buster kernel: [ 1208.757285] usb 1-2: New USB device found, idVendor=0403, idProduct=7840
Feb 17 10:16:21 buster kernel: [ 1208.758344] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Feb 17 10:16:21 buster kernel: [ 1208.759314] usb 1-2: Product: White RNG
Feb 17 10:16:21 buster kernel: [ 1208.759842] usb 1-2: Manufacturer: BitBabbler
Feb 17 10:16:21 buster kernel: [ 1208.760464] usb 1-2: SerialNumber: JAXYAZ
Feb 17 10:16:21 buster systemd-udevd[262]: seq 1794 queued, 'add' 'usb'
Feb 17 10:16:21 buster systemd-udevd[262]: Validate module index
Feb 17 10:16:21 buster systemd-udevd[262]: Check if link configuration needs reloading.
Feb 17 10:16:21 buster systemd-udevd[262]: seq 1794 forked new worker [568]
```
- Begin processing the "add" action for seq 1794
```
Feb 17 10:16:21 buster systemd-udevd[568]: seq 1794 running
Feb 17 10:16:21 buster systemd-udevd[568]: GROUP 110 /lib/udev/rules.d/60-bit-babbler.rules:27
Feb 17 10:16:21 buster systemd-udevd[568]: MODE 0660 /lib/udev/rules.d/60-bit-babbler.rules:27
Feb 17 10:16:21 buster systemd-udevd[568]: LINK 'bitbabbler/JAXYAZ' /lib/udev/rules.d/60-bit-babbler.rules:27
Feb 17 10:16:21 buster systemd-udevd[568]: RUN '/usr/bin/bbvirt attach $attr{serial} --busnum $attr{busnum} --devnum $attr{devnum}' /lib/udev/rules.d/60-bit-babbler.rules:27
Feb 17 10:16:21 buster systemd-udevd[568]: RUN '/usr/bin/setfacl -m g:bit-babbler:rw $devnode' /lib/udev/rules.d/60-bit-babbler.rules:36
Feb 17 10:16:21 buster systemd-udevd[568]: ATTR '/sys/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-2/power/control' writing 'auto' /lib/udev/rules.d/60-bit-babbler.rules:43
Feb 17 10:16:21 buster systemd-udevd[568]: ATTR '/sys/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-2/power/autosuspend_delay_ms' writing '2000' /lib/udev/rules.d/60-bit-babbler.rules:44
Feb 17 10:16:21 buster systemd-udevd[262]: seq 1795 queued, 'add' 'usb'
Feb 17 10:16:21 buster systemd-udevd[262]: seq 1796 queued, 'bind' 'usb'
Feb 17 10:16:21 buster systemd-udevd[568]: handling device node '/dev/bus/usb/001/009', devnum=c189:8, mode=0660, uid=0, gid=110
Feb 17 10:16:21 buster systemd-udevd[568]: set permissions /dev/bus/usb/001/009, 020660, uid=0, gid=110
Feb 17 10:16:21 buster systemd-udevd[568]: creating symlink '/dev/char/189:8' to '../bus/usb/001/009'
Feb 17 10:16:21 buster systemd-udevd[568]: creating link '/dev/bitbabbler/JAXYAZ' to '/dev/bus/usb/001/009'
Feb 17 10:16:21 buster systemd-udevd[568]: creating symlink '/dev/bitbabbler/JAXYAZ' to '../bus/usb/001/009'
Feb 17 10:16:21 buster bbvirt[569]: starting '/usr/bin/bbvirt attach JAXYAZ --busnum 1 --devnum 9'
Feb 17 10:16:21 buster systemd-udevd[568]: Process '/usr/bin/bbvirt attach JAXYAZ --busnum 1 --devnum 9' succeeded.
Feb 17 10:16:21 buster setfacl[570]: starting '/usr/bin/setfacl -m g:bit-babbler:rw /dev/bus/usb/001/009'
Feb 17 10:16:21 buster systemd-udevd[568]: Process '/usr/bin/setfacl -m g:bit-babbler:rw /dev/bus/usb/001/009' succeeded.
Feb 17 10:16:21 buster systemd-udevd[568]: seq 1794 processed
```
- The "add" event completes being handled. At this point everything
has worked exactly as we normally expect it to.
```
Feb 17 10:16:21 buster systemd-udevd[568]: seq 1795 running
Feb 17 10:16:21 buster systemd-udevd[568]: seq 1795 processed
```
- Then we begin processing the "bind" action seq 1796
```
Feb 17 10:16:21 buster systemd-udevd[568]: seq 1796 running
Feb 17 10:16:21 buster systemd-udevd[568]: update old name, '/dev/bitbabbler/JAXYAZ' no longer belonging to '/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-2'
Feb 17 10:16:21 buster systemd-udevd[568]: no reference left, remove '/dev/bitbabbler/JAXYAZ'
Feb 17 10:16:21 buster systemd-udevd[568]: handling device node '/dev/bus/usb/001/009', devnum=c189:8, mode=0600, uid=0, gid=0
Feb 17 10:16:21 buster systemd-udevd[568]: preserve already existing symlink '/dev/char/189:8' to '../bus/usb/001/009'
Feb 17 10:16:21 buster systemd-udevd[568]: seq 1796 processed
```
- And it's all gone out for a long lunch ...
```
Feb 17 10:16:21 buster systemd-udevd[262]: cleanup idle workers
Feb 17 10:16:21 buster systemd-udevd[568]: Unload module index
Feb 17 10:16:21 buster systemd-udevd[568]: Unloaded link configuration context.
Feb 17 10:16:21 buster systemd-udevd[262]: worker [568] exited
```
This is what 'udevadm monitor' reports for the events which do what is
seen above. It's not from the same test run, but aside from the USB
device enumeration changing, it's the same sequence of events from a
hotplug of the device.
```
KERNEL[52.409511] add /devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3 (usb)
ACTION=add
BUSNUM=001
DEVNAME=/dev/bus/usb/001/004
DEVNUM=004
DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3
DEVTYPE=usb_device
MAJOR=189
MINOR=3
PRODUCT=403/7840/900
SEQNUM=1760
SUBSYSTEM=usb
TYPE=0/0/0
KERNEL[52.411157] add /devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3/1-3:1.0 (usb)
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3/1-3:1.0
DEVTYPE=usb_interface
INTERFACE=255/255/255
MODALIAS=usb:v0403p7840d0900dc00dsc00dp00icFFiscFFipFFin00
PRODUCT=403/7840/900
SEQNUM=1761
SUBSYSTEM=usb
TYPE=0/0/0
KERNEL[52.411193] bind /devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3 (usb)
ACTION=bind
BUSNUM=001
DEVNAME=/dev/bus/usb/001/004
DEVNUM=004
DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3
DEVTYPE=usb_device
DRIVER=usb
MAJOR=189
MINOR=3
PRODUCT=403/7840/900
SEQNUM=1762
SUBSYSTEM=usb
TYPE=0/0/0
UDEV [52.416928] add /devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3 (usb)
ACTION=add
BUSNUM=001
DEVLINKS=/dev/bitbabbler/JAXYAZ
DEVNAME=/dev/bus/usb/001/004
DEVNUM=004
DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3
DEVTYPE=usb_device
ID_BUS=usb
ID_MODEL=White_RNG
ID_MODEL_ENC=White\x20RNG
ID_MODEL_ID=7840
ID_REVISION=0900
ID_SERIAL=BitBabbler_White_RNG_JAXYAZ
ID_SERIAL_SHORT=JAXYAZ
ID_USB_INTERFACES=:ffffff:
ID_VENDOR=BitBabbler
ID_VENDOR_ENC=BitBabbler
ID_VENDOR_FROM_DATABASE=Future Technology Devices International, Ltd
ID_VENDOR_ID=0403
MAJOR=189
MINOR=3
PRODUCT=403/7840/900
SEQNUM=1760
SUBSYSTEM=usb
TYPE=0/0/0
USEC_INITIALIZED=52411809
UDEV [52.417938] add /devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3/1-3:1.0 (usb)
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3/1-3:1.0
DEVTYPE=usb_interface
ID_VENDOR_FROM_DATABASE=Future Technology Devices International, Ltd
INTERFACE=255/255/255
MODALIAS=usb:v0403p7840d0900dc00dsc00dp00icFFiscFFipFFin00
PRODUCT=403/7840/900
SEQNUM=1761
SUBSYSTEM=usb
TYPE=0/0/0
USEC_INITIALIZED=52417785
UDEV [52.418633] bind /devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3 (usb)
ACTION=bind
BUSNUM=001
DEVNAME=/dev/bus/usb/001/004
DEVNUM=004
DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3
DEVTYPE=usb_device
DRIVER=usb
ID_BUS=usb
ID_MODEL=White_RNG
ID_MODEL_ENC=White\x20RNG
ID_MODEL_ID=7840
ID_REVISION=0900
ID_SERIAL=BitBabbler_White_RNG_JAXYAZ
ID_SERIAL_SHORT=JAXYAZ
ID_USB_INTERFACES=:ffffff:
ID_VENDOR=BitBabbler
ID_VENDOR_ENC=BitBabbler
ID_VENDOR_FROM_DATABASE=Future Technology Devices International, Ltd
ID_VENDOR_ID=0403
MAJOR=189
MINOR=3
PRODUCT=403/7840/900
SEQNUM=1762
SUBSYSTEM=usb
TYPE=0/0/0
USEC_INITIALIZED=52411809
KERNEL[52.422682] bind /devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3/1-3:1.0 (usb)
ACTION=bind
DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3/1-3:1.0
DEVTYPE=usb_interface
DRIVER=usbfs
INTERFACE=255/255/255
MODALIAS=usb:v0403p7840d0900dc00dsc00dp00icFFiscFFipFFin00
PRODUCT=403/7840/900
SEQNUM=1763
SUBSYSTEM=usb
TYPE=0/0/0
UDEV [52.423548] bind /devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3/1-3:1.0 (usb)
ACTION=bind
DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/0000:02:03.0/usb1/1-3/1-3:1.0
DEVTYPE=usb_interface
DRIVER=usbfs
ID_VENDOR_FROM_DATABASE=Future Technology Devices International, Ltd
INTERFACE=255/255/255
MODALIAS=usb:v0403p7840d0900dc00dsc00dp00icFFiscFFipFFin00
PRODUCT=403/7840/900
SEQNUM=1763
SUBSYSTEM=usb
TYPE=0/0/0
USEC_INITIALIZED=52417785
```
And when it is unplugged, there are likewise new "unbind" events to
deal with, along with the normal "remove" set.
And that should be enough of what I collected while getting to the
bottom of this to make the problem clear, and reasonably easy to
reproduce with any handy device. Looking at the code the problem
should be obvious enough anyway though. I was tempted to also include
a preliminary patch for it - but it really all needs careful auditing
by as many of the eyes responsible for the existing assumptions as
possible, at least in the first pass through.
The udev rules file for bit babbler looks like this:
```
SUBSYSTEM!="usb", GOTO="bb_end"
ACTION!="add|change", GOTO="bb_end_add"
# This is what we'd like to do. Skip all the rules here if Vendor:Product
# is not 0403:7840 -- but that's not what these two tests will actually do
# (at least with udev versions up to 232-25 which shipped in Stretch).
# If the device we're handling an event for doesn't have the idVendor or
# idProduct attributes at all, then these tests are still false, the same
# as if they did have the values which we are testing for ...
#ATTR{idVendor}!="0403", GOTO="bb_end"
#ATTR{idProduct}!="7840", GOTO="bb_end"
# So instead, we need to explicitly test that they *are* the values we are
# looking for, and play goto leapfrog to get the control flow logic we want.
ATTR{idVendor}=="0403", ATTR{idProduct}=="7840", GOTO="bb_add"
GOTO="bb_end"
LABEL="bb_add"
# Allow users in group bit-babbler to access the device directly.
# Create a symlink to a well known name that can be used in the cgroup_device_acl
# configuration in /etc/libvirt/qemu.conf, and for other similar purposes too.
# And run the bbvirt script to see if this device was configured for hotplugging
# into a virtual machine.
GROUP="bit-babbler", MODE="0660", SYMLINK="bitbabbler/$attr{serial}", \
RUN+="/usr/bin/bbvirt attach $attr{serial} --busnum $attr{busnum} --devnum $attr{devnum}"
# If ACLs are supported, grant users in the bit-babbler group access to the device
# with them too. This is mainly so that if a VM is halted, the device will revert
# to normal access from the host system again. The libvirt 'managed' mode will not
# restore the original ownership when it releases the device, it will just make it
# be root:root, stomping the GROUP we set above.
TEST=="/usr/bin/setfacl", RUN+="/usr/bin/setfacl -m g:bit-babbler:rw $devnode"
# Enable USB autosuspend. The BitBabbler devices support suspending correctly,
# though not every controller they might be plugged into will always play nicely.
# It should be safe to enable it here, even if an upstream hub or controller
# needs it disabled. The XHCI controllers seem to be the most troublesome, but
# mostly with older kernels.
TEST=="power/control", ATTR{power/control}="auto"
TEST=="power/autosuspend_delay_ms", ATTR{power/autosuspend_delay_ms}="2000"
LABEL="bb_end_add"
ACTION!="remove", GOTO="bb_end"
# Explicitly detach unplugged devices from the VM if they were passed through to it.
# If we don't do this, the stale <hostdev> configuration will remain, and could
# match some other completely different device that is plugged in later ...
# This is why we can't make persistent changes to the domain definition for VMs that
# aren't running when the device is plugged in, because if the host goes down without
# this rule being run, we'd never clean those up.
#
# We can't test against the attributes here, if this would match they are already gone.
ENV{ID_VENDOR_ID}=="0403", ENV{ID_MODEL_ID}=="7840", \
RUN+="/usr/bin/bbvirt detach $env{ID_SERIAL_SHORT} --busnum $env{BUSNUM} --devnum $env{DEVNUM}"
LABEL="bb_end"
```