Using I/O Memory

Despite the popularity of I/O ports in the x86 world, the main mechanism used to communicate with devices is through memory-mapped registers and device memory. Both are called I/O memory because the difference between registers and memory is transparent to software.

I/O memory is simply a region of RAM-like locations that the device makes available to the processor over the bus. This memory can be used for a number of purposes, such as holding video data or Ethernet packets, as well as implementing device registers that behave just like I/O ports (i.e., they have side effects associated with reading and writing them).

The way used to access I/O memory depends on the computer architecture, bus, and device being used, though the principles are the same everywhere. The discussion in this chapter touches mainly on ISA and PCI memory, while trying to convey general information as well. Although access to PCI memory is introduced here, a thorough discussion of PCI is deferred to Chapter 15.

According to the computer platform and bus being used, I/O memory may or may not be accessed through page tables. When access passes though page tables, the kernel must first arrange for the physical address to be visible from your driver (this usually means that you must call ioremap before doing any I/O). If no page tables are needed, then I/O memory locations look pretty much like I/O ports, and you can just read and write to them using proper wrapper functions.

Whether or not ioremap is required to access I/O memory, direct use of pointers to I/O memory is a discouraged practice. Even though (as introduced in Section 8.1) I/O memory is addressed like normal RAM at hardware level, the extra care outlined in Section 8.1.1 suggests avoiding normal pointers. The wrapper functions used to access I/O memory are both safe on all platforms and optimized away whenever straight pointer dereferencing can perform the operation.

Therefore, even though dereferencing a pointer works (for now) on the x86, failure to use the proper macros will hinder the portability and readability of the driver.

Remember from Chapter 2 that device memory regions must be allocated prior to use. This is similar to how I/O ports are registered and is accomplished by the following functions:

int check_mem_region(unsigned long start, unsigned long len);
void request_mem_region(unsigned long start, unsigned long len, 
char *name);
void release_mem_region(unsigned long start, unsigned long len);

The start argument to pass to the functions is the physical address of the memory region, before any remapping takes place. The functions would normally be used in a manner such as the following:

if (check_mem_region(mem_addr, mem_size)) {
    printk("drivername: memory already in use\n");
    return -EBUSY;
}
    request_mem_region(mem_addr, mem_size, "drivername");

    [...]

    release_mem_region(mem_addr, mem_size);

Directly Mapped Memory

Several computer platforms reserve part of their memory address space for I/O locations, and automatically disable memory management for any (virtual) address in that memory range.

The MIPS processors used in personal digital assistants (PDAs) offer an interesting example of this setup. Two address ranges, 512 MB each, are directly mapped to physical addresses. Any memory access to either of those address ranges bypasses the MMU, and any access to one of those ranges bypasses the cache as well. A section of these 512 megabytes is reserved for peripheral devices, and drivers can access their I/O memory directly by using the noncached address range.

Other platforms have other means to offer directly mapped address ranges: some of them have special address spaces to dereference physical addresses (for example, SPARC64 uses a special “address space identifier” for this aim), and others use virtual addresses set up to bypass processor caches.

When you need to access a directly mapped I/O memory area, you still shouldn’t dereference your I/O pointers, even though, on some architectures, you may well be able to get away with doing exactly that. To write code that will work across systems and kernel versions, however, you must avoid direct accesses and instead use the following functions.

unsigned readb(address); , unsigned readw(address); , unsigned readl(address);

These macros are used to retrieve 8-bit, 16-bit, and 32-bit data values from I/O memory. The advantage of using macros is the typelessness of the argument: address is cast before being used, because the value “is not clearly either an integer or a pointer, and we will accept both” (from asm-alpha/io.h). Neither the reading nor the writing functions check the validity of address, because they are meant to be as fast as pointer dereferencing (we already know that sometimes they actually expand into pointer dereferencing).

void writeb(unsigned value, address); , void writew(unsigned value, address); , void writel(unsigned value, address);

Like the previous functions, these functions (macros) are used to write 8-bit, 16-bit, and 32-bit data items.

memset_io(address, value, count);

When you need to call memset on I/O memory, this function does what you need, while keeping the semantics of the original memset.

memcpy_fromio(dest, source, num); , memcpy_toio(dest, source, num);

These functions move blocks of data to and from I/O memory and behave like the C library routine memcpy.

In modern versions of the kernel, these functions are available across all architectures. The implementation will vary, however; on some they are macros that expand to pointer operations, and on others they are real functions. As a driver writer, however, you need not worry about how they work, as long as you use them.

Some 64-bit platforms also offer readq and writeq, for quad-word (eight-byte) memory operations on the PCI bus. The quad-word nomenclature is a historical leftover from the times when all real processors had 16-bit words. Actually, the L naming used for 32-bit values has become incorrect too, but renaming everything would make things still more confused.

Reusing short for I/O Memory

The short sample module, introduced earlier to access I/O ports, can be used to access I/O memory as well. To this aim, you must tell it to use I/O memory at load time; also, you’ll need to change the base address to make it point to your I/O region.

For example, this is how we used short to light the debug LEDs on a MIPS development board:

 mips.root#./short_load use_mem=1 base=0xb7ffffc0
 mips.root# echo -n 7 > /dev/short0

Use of short for I/O memory is the same as it is for I/O ports; however, since no pausing or string instructions exist for I/O memory, access to /dev/short0p and /dev/short0s performs the same operation as /dev/short0.

The following fragment shows the loop used by short in writing to a memory location:

while (count--) {
    writeb(*(ptr++), address);
    wmb();
}

Note the use of a write memory barrier here. Because writeb likely turns into a direct assignment on many architectures, the memory barrier is needed to ensure that the writes happen in the expected order.

Software-Mapped I/O Memory

The MIPS class of processors notwithstanding, directly mapped I/O memory is pretty rare in the current platform arena; this is especially true when a peripheral bus is used with memory-mapped devices (which is most of the time).

The most common hardware and software arrangement for I/O memory is this: devices live at well-known physical addresses, but the CPU has no predefined virtual address to access them. The well-known physical address can be either hardwired in the device or assigned by system firmware at boot time. The former is true, for example, of ISA devices, whose addresses are either burned in device logic circuits, statically assigned in local device memory, or set by means of physical jumpers. The latter is true of PCI devices, whose addresses are assigned by system software and written to device memory, where they persist only while the device is powered on.

Either way, for software to access I/O memory, there must be a way to assign a virtual address to the device. This is the role of the ioremap function, introduced in Section 7.4. The function, which was covered in the previous chapter because it is related to memory use, is designed specifically to assign virtual addresses to I/O memory regions. Moreover, kernel developers implemented ioremap so that it doesn’t do anything if applied to directly mapped I/O addresses.

Once equipped with ioremap (and iounmap), a device driver can access any I/O memory address, whether it is directly mapped to virtual address space or not. Remember, though, that these addresses should not be dereferenced directly; instead, functions like readb should be used. We could thus arrange short to work with both MIPS I/O memory and the more common ISA/PCI x86 memory by equipping the module with ioremap/iounmap calls whenever the use_mem parameter is set.

Before we show how short calls the functions, we’d better review the prototypes of the functions and introduce a few details that we passed over in the previous chapter.

The functions are called according to the following definition:

#include <asm/io.h>
void *ioremap(unsigned long phys_addr, unsigned long size);
void *ioremap_nocache(unsigned long phys_addr, unsigned long size);
void iounmap(void * addr);

First of all, you’ll notice the new function ioremap_nocache. We didn’t cover it in Chapter 7, because its meaning is definitely hardware related. Quoting from one of the kernel headers: “It’s useful if some control registers are in such an area and write combining or read caching is not desirable.” Actually, the function’s implementation is identical to ioremap on most computer platforms: in situations in which all of I/O memory is already visible through noncacheable addresses, there’s no reason to implement a separate, noncaching version of ioremap.

Another important feature of ioremap is the different behavior of the 2.0 version with respect to later ones. Under Linux 2.0, the function (called, remember, vremap at the time) refused to remap any non-page-aligned memory region. This was a sensible choice, since at CPU level everything happens with page-sized granularity. However, sometimes you need to map small regions of I/O registers whose (physical) address is not page aligned. To fit this new need, version 2.1.131 and later of the kernel are able to remap unaligned addresses.

Our short module, in order to be backward portable to version 2.0 and to be able to access non-page-aligned registers, includes the following code instead of calling ioremap directly:

/* Remap a not (necessarily) aligned port region */
void *short_remap(unsigned long phys_addr)
{
    /* The code comes mainly from arch/any/mm/ioremap.c */
    unsigned long offset, last_addr, size;

    last_addr = phys_addr + SHORT_NR_PORTS - 1;
    offset = phys_addr & ~PAGE_MASK;
    
    /* Adjust the begin and end to remap a full page */
    phys_addr &= PAGE_MASK;
    size = PAGE_ALIGN(last_addr) - phys_addr;
    return ioremap(phys_addr, size) + offset;
}

/* Unmap a region obtained with short_remap */
void short_unmap(void *virt_add)
{
    iounmap((void *)((unsigned long)virt_add & PAGE_MASK));
}

ISA Memory Below 1 MB

One of the most well-known I/O memory regions is the ISA range as found on personal computers. This is the memory range between 640 KB (0xA0000) and 1 MB (0x100000). It thus appears right in the middle of regular system RAM. This positioning may seem a little strange; it is an artifact of a decision made in the early 1980s, when 640 KB of memory seemed like more than anybody would ever be able to use.

This memory range belongs to the non-directly-mapped class of memory.[35] You can read/write a few bytes in that memory range using the short module as explained previously, that is, by setting use_mem at load time.

Although ISA I/O memory exists only in x86-class computers, we think it’s worth spending a few words and a sample driver on it.

We are not going to discuss PCI memory in this chapter, since it is the cleanest kind of I/O memory: once you know the physical address you can simply remap and access it. The “problem” with PCI I/O memory is that it doesn’t lend itself to a working example for this chapter, because we can’t know in advance the physical addresses your PCI memory is mapped to, nor whether it’s safe to access either of those ranges. We chose to describe the ISA memory range because it’s both less clean and more suitable to running sample code.

To demonstrate access to ISA memory, we will make use of yet another silly little module (part of the sample sources). In fact, this one is called silly, as an acronym for Simple Tool for Unloading and Printing ISA Data, or something like that.

The module supplements the functionality of short by giving access to the whole 384-KB memory space and by showing all the different I/O functions. It features four device nodes that perform the same task using different data transfer functions. The silly devices act as a window over I/O memory, in a way similar to /dev/mem. You can read and write data, and lseek to an arbitrary I/O memory address.

Because silly provides access to ISA memory, it must start by mapping the physical ISA addresses into kernel virtual addresses. In the early days of the Linux kernel, one could simply assign a pointer to an ISA address of interest, then dereference it directly. In the modern world, though, we must work with the virtual memory system and remap the memory range first. This mapping is done with ioremap, as explained earlier for short:

#define ISA_BASE    0xA0000
#define ISA_MAX    0x100000  /* for general memory access */

    /* this line appears in silly_init */
    io_base = ioremap(ISA_BASE, ISA_MAX - ISA_BASE);

ioremap returns a pointer value that can be used with readb and the other functions explained in Section 8.4.1.

Let’s look back at our sample module to see how these functions might be used. /dev/sillyb, featuring minor number 0, accesses I/O memory with readb and writeb. The following code shows the implementation for read, which makes the address range 0xA0000-0xFFFFF available as a virtual file in the range 0-0x5FFFF. The read function is structured as a switch statement over the different access modes; here is the sillyb case:

case M_8: 
  while (count) {
      *ptr = readb(add);
      add++; count--; ptr++;
  }
  break;

The next two devices are /dev/sillyw (minor number 1) and /dev/sillyl (minor number 2). They act like /dev/sillyb, except that they use 16-bit and 32-bit functions. Here’s the write implementation of sillyl, again part of a switch:

case M_32: 
  while (count >= 4) {
      writel(*(u32 *)ptr, add);
      add+=4; count-=4; ptr+=4;
  }
  break;

The last device is /dev/sillycp (minor number 3), which uses the memcpy_*io functions to perform the same task. Here’s the core of its read implementation:

case M_memcpy:
  memcpy_fromio(ptr, add, count);
  break;

Because ioremap was used to provide access to the ISA memory area, silly must invoke iounmap when the module is unloaded:

iounmap(io_base);

isa_readb and Friends

A look at the kernel source will turn up another set of routines with names like isa_readb. In fact, each of the functions just described has an isa_ equivalent. These functions provide access to ISA memory without the need for a separate ioremap step. The word from the kernel developers, however, is that these functions are intended to be temporary driver-porting aids, and that they may go away in the future. Their use is thus best avoided.

Probing for ISA Memory

Even though most modern devices rely on better I/O bus architectures, like PCI, sometimes programmers must still deal with ISA devices and their I/O memory, so we’ll spend a page on this issue. We won’t touch high ISA memory (the so-called memory hole in the 14 MB to 16 MB physical address range), because that kind of I/O memory is extremely rare nowadays and is not supported by the majority of modern motherboards or by the kernel. To access that range of I/O memory you’d need to hack the kernel initialization sequence, and that is better not covered here.

When using ISA memory-mapped devices, the driver writer often ignores where relevant I/O memory is located in the physical address space, since the actual address is usually assigned by the user among a range of possible addresses. Or it may be necessary simply to see if a device is present at a given address or not.

The memory resource management scheme can be helpful in probing, since it will identify regions of memory that have already been claimed by another driver. The resource manager, however, cannot tell you about devices whose drivers have not been loaded, or whether a given region contains the device that you are interested in. Thus, it can still be necessary to actually probe memory to see what is there. There are three distinct cases that you will encounter: that RAM is mapped to the address, that ROM is there (the VGA BIOS, for example), or that the area is free.

The skull sample source shows a way to deal with such memory, but since skull is not related to any physical device, it just prints information about the 640 KB to 1 MB memory region and then exits. However, the code used to analyze memory is worth describing, since it shows how memory probes can be done.

The code to check for RAM segments makes use of cli to disable interrupts, because these segments can be identified only by physically writing and rereading data, and real RAM might be changed by an interrupt handler in the middle of our tests. The following code is not completely foolproof, because it might mistake RAM memory on acquisition boards for empty regions if a device is actively writing to its own memory while this code is scanning the area. However, this situation is quite unlikely to happen.

unsigned char oldval, newval; /* values read from memory   */
unsigned long flags;          /* used to hold system flags */
unsigned long add, i;
void *base;
    
/* Use ioremap to get a handle on our region */
base = ioremap(ISA_REGION_BEGIN, ISA_REGION_END - ISA_REGION_BEGIN);
base -= ISA_REGION_BEGIN;  /* Do the offset once */
    
/* probe all the memory hole in 2-KB steps */
for (add = ISA_REGION_BEGIN; add < ISA_REGION_END; add += STEP) {
   /*
    * Check for an already allocated region.
    */
   if (check_mem_region (add, 2048)) {
          printk(KERN_INFO "%lx: Allocated\n", add);
          continue;
   }
   /*
    * Read and write the beginning of the region and see what happens.
    */
   save_flags(flags); 
   cli();
   oldval = readb (base + add);  /* Read a byte */
   writeb (oldval^0xff, base + add);
   mb();
   newval = readb (base + add);
   writeb (oldval, base + add);
   restore_flags(flags);

   if ((oldval^newval) == 0xff) {  /* we reread our change: it's RAM */
       printk(KERN_INFO "%lx: RAM\n", add);
       continue;
   }
   if ((oldval^newval) != 0) {  /* random bits changed: it's empty */
       printk(KERN_INFO "%lx: empty\n", add);
       continue;
   }
	
   /*
    * Expansion ROM (executed at boot time by the BIOS)
    * has a signature where the first byte is 0x55, the second 0xaa,
    * and the third byte indicates the size of such ROM
    */
   if ( (oldval == 0x55) && (readb (base + add + 1) == 0xaa)) {
       int size = 512 * readb (base + add + 2);
       printk(KERN_INFO "%lx: Expansion ROM, %i bytes\n",
              add, size);
       add += (size & ~2048) - 2048; /* skip it */
       continue;
   }
	
   /*
    * If the tests above failed, we still don't know if it is ROM or
    * empty. Since empty memory can appear as 0x00, 0xff, or the low
    * address byte, we must probe multiple bytes: if at least one of
    * them is different from these three values, then this is ROM
    * (though not boot ROM).
    */
   printk(KERN_INFO "%lx: ", add);
   for (i=0; i<5; i++) {
       unsigned long radd = add + 57*(i+1);  /* a "random" value */
       unsigned char val = readb (base + radd);
       if (val && val != 0xFF && val != ((unsigned long) radd&0xFF))
          break;
   }    
   printk("%s\n", i==5 ? "empty" : "ROM");
}

Detecting memory doesn’t cause collisions with other devices, as long as you take care to restore any byte you modified while you were probing. It is worth noting that it is always possible that writing to another device’s memory will cause that device to do something undesirable. In general, this method of probing memory should be avoided if possible, but it’s not always possible when dealing with older hardware.



[35] Actually, this is not completely true. The memory range is so small and so frequently used that the kernel builds page tables at boot time to access those addresses. However, the virtual address used to access them is not the same as the physical address, and thus ioremap is needed anyway. Moreover, version 2.0 of the kernel had that range directly mapped. See Section 8.5 for 2.0 issues.

Get Linux Device Drivers, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.