Access Control on a Device File

Offering access control is sometimes vital for the reliability of a device node. Not only should unauthorized users not be permitted to use the device (a restriction that is enforced by the filesystem permission bits), but sometimes only one authorized user should be allowed to open the device at a time.

The problem is similar to that of using ttys. In that case, the login process changes the ownership of the device node whenever a user logs into the system, in order to prevent other users from interfering with or sniffing the tty data flow. However, it’s impractical to use a privileged program to change the ownership of a device every time it is opened, just to grant unique access to it.

None of the code shown up to now implements any access control beyond the filesystem permission bits. If the open system call forwards the request to the driver, open will succeed. We now introduce a few techniques for implementing some additional checks.

Every device shown in this section has the same behavior as the bare scull device (that is, it implements a persistent memory area) but differs from scull in access control, which is implemented in the open and close operations.

Single-Open Devices

The brute-force way to provide access control is to permit a device to be opened by only one process at a time (single openness). This technique is best avoided because it inhibits user ingenuity. A user might well want to run different processes on the same device, one reading status information while the other is writing data. In some cases, users can get a lot done by running a few simple programs through a shell script, as long as they can access the device concurrently. In other words, implementing a single-open behavior amounts to creating policy, which may get in the way of what your users want to do.

Allowing only a single process to open a device has undesirable properties, but it is also the easiest access control to implement for a device driver, so it’s shown here. The source code is extracted from a device called scullsingle.

The open call refuses access based on a global integer flag:

int scull_s_open(struct inode *inode, struct file *filp)
{
  Scull_Dev *dev = &scull_s_device; /* device information */
  int num = NUM(inode->i_rdev);

  if (!filp->private_data && num > 0)
    return -ENODEV; /* not devfs: allow 1 device only */
  spin_lock(&scull_s_lock);
  if (scull_s_count) {
    spin_unlock(&scull_s_lock);
    return -EBUSY; /* already open */
  }
  scull_s_count++;
  spin_unlock(&scull_s_lock);
  /* then, everything else is copied from the bare scull device */

  if ( (filp->f_flags & O_ACCMODE) == O_WRONLY)
    scull_trim(dev);
  if (!filp->private_data)
    filp->private_data = dev;
  MOD_INC_USE_COUNT;
  return 0;     /* success */
}

The close call, on the other hand, marks the device as no longer busy.

int scull_s_release(struct inode *inode, struct file *filp)
{
  scull_s_count--; /* release the device */
  MOD_DEC_USE_COUNT;
  return 0;
}

Normally, we recommend that you put the open flag scull_s_count (with the accompanying spinlock, scull_s_lock, whose role is explained in the next subsection) within the device structure (Scull_Dev here) because, conceptually, it belongs to the device. The scull driver, however, uses standalone variables to hold the flag and the lock in order to use the same device structure and methods as the bare scull device and minimize code duplication.

Another Digression into Race Conditions

Consider once again the test on the variable scull_s_count just shown. Two separate actions are taken there: (1) the value of the variable is tested, and the open is refused if it is not 0, and (2) the variable is incremented to mark the device as taken. On a single-processor system, this sequence is safe because no other process will be able to run between the two actions.

As soon as you get into the SMP world, however, a problem arises. If two processes on two processors attempt to open the device simultaneously, it is possible that they could both test the value of scull_s_count before either modifies it. In this scenario you’ll find that, at best, the single-open semantics of the device is not enforced. In the worst case, unexpected concurrent access could create data structure corruption and system crashes.

In other words, we have another race condition here. This one could be solved in much the same way as the races we already saw in Chapter 3. Those race conditions were triggered by access to a status variable of a potentially shared data structure and were solved using semaphores. In general, however, semaphores can be expensive to use, because they can put the calling process to sleep. They are a heavyweight solution for the problem of protecting a quick check on a status variable.

Instead, scullsingle uses a different locking mechanism called a spinlock. Spinlocks will never put a process to sleep. Instead, if a lock is not available, the spinlock primitives will simply retry, over and over (i.e., “spin”), until the lock is freed. Spinlocks thus have very little locking overhead, but they also have the potential to cause a processor to spin for a long time if somebody hogs the lock. Another advantage of spinlocks over semaphores is that their implementation is empty when compiling code for a uniprocessor system (where these SMP-specific races can’t happen). Semaphores are a more general resource that make sense on uniprocessor computers as well as SMP, so they don’t get optimized away in the uniprocessor case.

Spinlocks can be the ideal mechanism for small critical sections. Processes should hold spinlocks for the minimum time possible, and must never sleep while holding a lock. Thus, the main scull driver, which exchanges data with user space and can therefore sleep, is not suitable for a spinlock solution. But spinlocks work nicely for controlling access to scull_s_count (even if they still are not the optimal solution, which we will see in Chapter 9).

Spinlocks are declared with a type of spinlock_t, which is defined in <linux/spinlock.h>. Prior to use, they must be initialized:

 spin_lock_init(spinlock_t *lock);

A process entering a critical section will obtain the lock with spin_lock:

 spin_lock(spinlock_t *lock);

The lock is released at the end with spin_unlock:

 spin_unlock(spinlock_t *lock);

Spinlocks can be more complicated than this, and we’ll get into the details in Chapter 9. But the simple case as shown here suits our needs for now, and all of the access-control variants of scull will use simple spinlocks in this manner.

The astute reader may have noticed that whereas scull_s_open acquires the scull_s_lock lock prior to incrementing the scull_s_count flag, scull_s_release takes no such precautions. This code is safe because no other code will change the value of scull_s_count if it is nonzero, so there will be no conflict with this particular decrement.
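The test-and-increment pattern can be tried outside the kernel. What follows is a minimal user-space sketch, not the kernel implementation: it assumes C11 atomics are available, and every demo_ name is hypothetical. The lock is a bare atomic flag that is retried until it is free, which is the "spin" described above. The count and its lock are kept together in one structure, as recommended earlier, and the release path takes the lock as well, simply to keep the sketch uniform.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for spinlock_t: one atomic flag. */
typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

void demo_spin_lock_init(demo_spinlock_t *lock)
{
    atomic_flag_clear(&lock->locked);
}

void demo_spin_lock(demo_spinlock_t *lock)
{
    /* test_and_set returns the previous value; loop while it was already set */
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire))
        ; /* spin: retry until the holder clears the flag */
}

void demo_spin_unlock(demo_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Hypothetical device: the open count and its lock live together. */
struct demo_single_dev {
    demo_spinlock_t lock;
    int count;
};

void demo_single_init(struct demo_single_dev *dev)
{
    demo_spin_lock_init(&dev->lock);
    dev->count = 0;
}

int demo_single_open(struct demo_single_dev *dev)
{
    demo_spin_lock(&dev->lock);
    if (dev->count) {               /* (1) test ... */
        demo_spin_unlock(&dev->lock);
        return -EBUSY;              /* already open */
    }
    dev->count++;                   /* (2) ... and mark as taken */
    demo_spin_unlock(&dev->lock);
    return 0;
}

void demo_single_release(struct demo_single_dev *dev)
{
    demo_spin_lock(&dev->lock);
    assert(dev->count > 0);         /* release without open is a bug */
    dev->count--;
    demo_spin_unlock(&dev->lock);
}
```

With this sketch, a first call to demo_single_open succeeds, a second one fails with -EBUSY, and the device can be opened again after demo_single_release. Because the test and the increment happen under the same lock, two threads cannot both see a zero count.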

Restricting Access to a Single User at a Time

The next step beyond a single system-wide lock is to let a single user open a device in multiple processes but allow only one user to have the device open at a time. This solution makes it easy to test the device, since the user can read and write from several processes at once, but assumes that the user takes some responsibility for maintaining the integrity of the data during multiple accesses. This is accomplished by adding checks in the open method; such checks are performed after the normal permission checking and can only make access more restrictive than that specified by the owner and group permission bits. This is the same access policy as that used for ttys, but it doesn’t resort to an external privileged program.

This kind of access policy is a little trickier to implement than a single-open policy. In this case, two items are needed: an open count and the uid of the “owner” of the device. Once again, the best place for such items is within the device structure; our example uses global variables instead, for the reason explained earlier for scullsingle. The name of the device is sculluid.

The open call grants access on first open, but remembers the owner of the device. This means that a user can open the device multiple times, thus allowing cooperating processes to work concurrently on the device. At the same time, no other user can open it, thus avoiding external interference. Since this version of the function is almost identical to the preceding one, only the relevant part is reproduced here:

 spin_lock(&scull_u_lock);
 if (scull_u_count && 
   (scull_u_owner != current->uid) && /* allow user */
   (scull_u_owner != current->euid) && /* allow whoever did su */
         !capable(CAP_DAC_OVERRIDE)) { /* still allow root */
     spin_unlock(&scull_u_lock);
     return -EBUSY;  /* -EPERM would confuse the user */
 }

 if (scull_u_count == 0)
   scull_u_owner = current->uid; /* grab it */

 scull_u_count++;
 spin_unlock(&scull_u_lock);

We chose to return -EBUSY and not -EPERM, even though the code is performing a permission check, in order to point a user who is denied access in the right direction. The reaction to “Permission denied” is usually to check the mode and owner of the /dev file, while “Device busy” correctly suggests that the user should look for a process already using the device.

This code also checks to see if the process attempting the open has the ability to override file access permissions; if so, the open will be allowed even if the opening process is not the owner of the device. The CAP_DAC_OVERRIDE capability fits the task well in this case.

The code for close is not shown, since all it does is decrement the usage count.
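To see the policy in isolation, the decision can be modeled in user space. The following is only a sketch under stated assumptions: the demo_ structures and functions are hypothetical, the caller's identity is passed in explicitly instead of being read from current, the can_override field stands in for capable(CAP_DAC_OVERRIDE), and locking is left out because the sketch is meant to be called from a single thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical device state: the two items the policy needs. */
struct demo_uid_dev {
    int count;      /* number of current opens */
    uid_t owner;    /* meaningful only while count > 0 */
};

/* Hypothetical description of the opening process. */
struct demo_caller {
    uid_t uid;
    uid_t euid;
    bool can_override;  /* stands in for capable(CAP_DAC_OVERRIDE) */
};

int demo_uid_open(struct demo_uid_dev *dev, const struct demo_caller *who)
{
    if (dev->count &&
        dev->owner != who->uid &&     /* allow user */
        dev->owner != who->euid &&    /* allow whoever did su */
        !who->can_override)           /* still allow a privileged caller */
        return -EBUSY;

    if (dev->count == 0)
        dev->owner = who->uid;        /* first open: remember the owner */

    dev->count++;
    return 0;
}

void demo_uid_release(struct demo_uid_dev *dev)
{
    assert(dev->count > 0);
    dev->count--;                     /* all that close has to do */
}
```

In this model, the user who opened the device first can open it again, a different unprivileged user gets -EBUSY, and a caller whose effective uid matches the owner, or who can override permissions, is let through. Once the count drops to zero, the next opener becomes the new owner.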

Blocking open as an Alternative to EBUSY

When the device isn’t accessible, returning an error is usually the most sensible approach, but there are situations in which you’d prefer to wait for the device.

For example, if a data communication channel is used both to transmit reports on a regular schedule (using crontab) and for casual usage according to people’s needs, it’s much better for the scheduled report to be slightly delayed rather than fail just because the channel is currently busy.

This is one of the choices that the programmer must make when designing a device driver, and the right answer depends on the particular problem being solved.

The alternative to EBUSY, as you may have guessed, is to implement blocking open.

The scullwuid device is a version of sculluid that waits for the device on open instead of returning -EBUSY. It differs from sculluid only in the following part of the open operation:

 spin_lock(&scull_w_lock);
 while (scull_w_count && 
  (scull_w_owner != current->uid) && /* allow user */
  (scull_w_owner != current->euid) && /* allow whoever did su */
  !capable(CAP_DAC_OVERRIDE)) {
   spin_unlock(&scull_w_lock);
   if (filp->f_flags & O_NONBLOCK) return -EAGAIN; 
   interruptible_sleep_on(&scull_w_wait);
   if (signal_pending(current)) /* a signal arrived */
    return -ERESTARTSYS; /* tell the fs layer to handle it */
   /* else, loop */
   spin_lock(&scull_w_lock);
 }
 if (scull_w_count == 0)
   scull_w_owner = current->uid; /* grab it */
 scull_w_count++;
 spin_unlock(&scull_w_lock);

The implementation is based once again on a wait queue. Wait queues were created to maintain a list of processes that sleep while waiting for an event, so they fit perfectly here.

The release method, then, is in charge of awakening any pending process:

int scull_w_release(struct inode *inode, struct file *filp)
{
  scull_w_count--;
  if (scull_w_count == 0)
    wake_up_interruptible(&scull_w_wait); /* awaken other uid's */
  MOD_DEC_USE_COUNT;
  return 0;
}
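The same wait-then-retry structure can be sketched in user space. Note that this is an analogue and not the kernel mechanism: it assumes POSIX threads, replaces the spinlock and wait queue with a mutex and a condition variable, and drops the euid, capability, and signal handling to stay short. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical device state for a blocking-open policy. */
struct demo_wait_dev {
    pthread_mutex_t lock;   /* protects count and owner */
    pthread_cond_t wait;    /* plays the role of the wait queue */
    int count;
    uid_t owner;
};

void demo_wait_init(struct demo_wait_dev *dev)
{
    pthread_mutex_init(&dev->lock, NULL);
    pthread_cond_init(&dev->wait, NULL);
    dev->count = 0;
    dev->owner = 0;
}

int demo_wait_open(struct demo_wait_dev *dev, uid_t uid, bool nonblock)
{
    pthread_mutex_lock(&dev->lock);
    while (dev->count && dev->owner != uid) {
        if (nonblock) {
            pthread_mutex_unlock(&dev->lock);
            return -EAGAIN;           /* caller asked not to wait */
        }
        /* Releases the mutex while asleep and reacquires it on wake-up,
         * so the condition is always rechecked with the lock held. */
        pthread_cond_wait(&dev->wait, &dev->lock);
    }
    if (dev->count == 0)
        dev->owner = uid;             /* grab it */
    dev->count++;
    pthread_mutex_unlock(&dev->lock);
    return 0;
}

void demo_wait_release(struct demo_wait_dev *dev)
{
    pthread_mutex_lock(&dev->lock);
    assert(dev->count > 0);
    if (--dev->count == 0)
        pthread_cond_broadcast(&dev->wait);  /* awaken waiting openers */
    pthread_mutex_unlock(&dev->lock);
}
```

A thread that calls demo_wait_open with nonblock set to false while another uid holds the device sleeps until the last release, and then competes with any other waiter for ownership. Passing true instead mirrors the O_NONBLOCK branch and returns -EAGAIN immediately.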

The problem with a blocking-open implementation is that it is really unpleasant for the interactive user, who has to keep guessing what is going wrong. The interactive user usually invokes precompiled commands such as cp and tar and can’t just add O_NONBLOCK to the open call. Someone who’s making a backup using the tape drive in the next room would prefer to get a plain “device or resource busy” message instead of being left to guess why the hard drive is so silent today while tar is scanning it.

This kind of problem (different, incompatible policies for the same device) is best solved by implementing one device node for each access policy. An example of this practice can be found in the Linux tape driver, which provides multiple device files for the same device. Different device files will, for example, cause the drive to record with or without compression, or to automatically rewind the tape when the device is closed.

Cloning the Device on Open

Another technique to manage access control is creating different private copies of the device depending on the process opening it.

Clearly this is possible only if the device is not bound to a hardware object; scull is an example of such a “software” device. The internals of /dev/tty use a similar technique in order to give each process a different “view” of what the /dev entry point represents. When copies of the device are created by the software driver, we call them virtual devices, just as virtual consoles use a single physical tty device.

Although this kind of access control is rarely needed, the implementation can be enlightening in showing how easily kernel code can change the application’s perspective of the surrounding world (i.e., the computer). The topic is quite exotic, actually, so if you aren’t interested, you can jump directly to the next section.

The /dev/scullpriv device node implements virtual devices within the scull package. The scullpriv implementation uses the minor number of the process’s controlling tty as a key to access the virtual device. You can nonetheless easily modify the sources to use any integer value for the key; each choice leads to a different policy. For example, using the uid leads to a different virtual device for each user, while using a pid key creates a new device for each process accessing it.

The decision to use the controlling terminal is meant to enable easy testing of the device using input/output redirection: the device is shared by all commands run on the same virtual terminal and is kept separate from the one seen by commands run on another terminal.
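The effect of the key choice can be made concrete with a small sketch. It is purely illustrative: the demo_task structure and the demo_pick_key function are hypothetical and simply model a process by the three values mentioned above, whereas the real driver reads the tty minor number from the current process.

```c
#include <assert.h>

/* Hypothetical policies, one per key choice discussed in the text. */
enum demo_key_policy {
    DEMO_KEY_TTY,   /* one virtual device per controlling terminal */
    DEMO_KEY_UID,   /* one virtual device per user */
    DEMO_KEY_PID    /* one virtual device per process */
};

/* Hypothetical snapshot of the opening process. */
struct demo_task {
    int tty_minor;
    int uid;
    int pid;
};

int demo_pick_key(enum demo_key_policy policy, const struct demo_task *task)
{
    switch (policy) {
    case DEMO_KEY_UID:
        return task->uid;
    case DEMO_KEY_PID:
        return task->pid;
    case DEMO_KEY_TTY:
    default:
        return task->tty_minor;
    }
}
```

Two processes started from the same terminal by the same user get the same key under the tty and uid policies, and therefore share a virtual device, but get different keys under the pid policy.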

The open method looks like the following code. It must look for the right virtual device and possibly create one. The final part of the function is not shown because it is copied from the bare scull, which we’ve already seen.

/* The clone-specific data structure includes a key field */
struct scull_listitem {
  Scull_Dev device;
  int key;
  struct scull_listitem *next;
};

/* The list of devices, and a lock to protect it */
struct scull_listitem *scull_c_head;
spinlock_t scull_c_lock;

/* Look for a device or create one if missing */
static Scull_Dev *scull_c_lookfor_device(int key)
{
  struct scull_listitem *lptr, *prev = NULL;

  for (lptr = scull_c_head; lptr && (lptr->key != key); lptr = lptr->next)
    prev=lptr;
  if (lptr) return &(lptr->device);

  /* not found */
  lptr = kmalloc(sizeof(struct scull_listitem), GFP_ATOMIC);
  if (!lptr) return NULL;

  /* initialize the device */
  memset(lptr, 0, sizeof(struct scull_listitem));
  lptr->key = key;
  scull_trim(&(lptr->device)); /* initialize it */
  sema_init(&(lptr->device.sem), 1);

  /* place it in the list */
  if (prev) prev->next = lptr;
  else    scull_c_head = lptr;

  return &(lptr->device);
}

int scull_c_open(struct inode *inode, struct file *filp)
{
  Scull_Dev *dev;
  int key, num = NUM(inode->i_rdev);
 
  if (!filp->private_data && num > 0)
    return -ENODEV; /* not devfs: allow 1 device only */

  if (!current->tty) { 
    PDEBUG("Process \"%s\" has no ctl tty\n",current->comm);
    return -EINVAL;
  }
  key = MINOR(current->tty->device);

  /* look for a scullc device in the list */
  spin_lock(&scull_c_lock);
  dev = scull_c_lookfor_device(key);
  spin_unlock(&scull_c_lock);

  if (!dev) return -ENOMEM;

  /* then, everything else is copied from the bare scull device */

The release method does nothing special. It would normally release the device on last close, but we chose not to maintain an open count in order to simplify the testing of the driver. If the device were released on last close, you wouldn’t be able to read the same data after writing to the device unless a background process were to keep it open. The sample driver takes the easier approach of keeping the data, so that at the next open, you’ll find it there. The devices are released when scull_cleanup is called.

Here’s the release implementation for /dev/scullpriv, which closes the discussion of device methods.

int scull_c_release(struct inode *inode, struct file *filp)
{
  /*
   * Nothing to do, because the device is persistent.
   * A `real' cloned device should be freed on last close
   */
  MOD_DEC_USE_COUNT;
  return 0;
}
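The lookup-or-create logic of scull_c_lookfor_device can also be exercised outside the kernel. The sketch below is a user-space approximation under several assumptions: calloc replaces kmalloc plus memset, a placeholder structure stands in for Scull_Dev, the list head is passed as a parameter instead of being global, and no lock is taken because the sketch is meant for a single thread. It walks the list through a pointer to the link field, which is a variation on the prev pointer used in the driver, and it adds a hypothetical cleanup helper in the spirit of what scull_cleanup does for the real devices.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder for Scull_Dev; the real structure holds the data area. */
struct demo_dev {
    int data;
};

struct demo_listitem {
    struct demo_dev device;
    int key;
    struct demo_listitem *next;
};

/* Look for the device with this key, or append a new one if missing. */
struct demo_dev *demo_lookfor_device(struct demo_listitem **head, int key)
{
    struct demo_listitem **link = head;
    struct demo_listitem *item;

    for (; *link; link = &(*link)->next)
        if ((*link)->key == key)
            return &(*link)->device;      /* found */

    item = calloc(1, sizeof(*item));      /* zeroed, like kmalloc + memset */
    if (!item)
        return NULL;
    item->key = key;
    *link = item;                         /* link points at the tail slot */
    return &item->device;
}

/* Free every virtual device; the head is left as an empty list. */
void demo_free_devices(struct demo_listitem **head)
{
    while (*head) {
        struct demo_listitem *next = (*head)->next;
        free(*head);
        *head = next;
    }
}
```

Looking up the same key twice returns the same device, so data stored through one lookup is visible through the next, which matches the persistent behavior described above. A different key yields a separate device.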
