The Socket Buffers

We’ve now discussed most of the issues related to network interfaces. What’s still missing is a more detailed discussion of the sk_buff structure. The structure is at the core of the network subsystem of the Linux kernel, and we now introduce both the main fields of the structure and the functions used to act on it.

Although there is no strict need to understand the internals of sk_buff, the ability to look at its contents can be helpful when you are tracking down problems and when you are trying to optimize the code. For example, if you look in loopback.c, you’ll find an optimization based on knowledge of the sk_buff internals. The usual warning applies here: if you write code that takes advantage of knowledge of the sk_buff structure, you should be prepared to see it break with future kernel releases. Still, sometimes the performance advantages justify the additional maintenance cost.

We are not going to describe the whole structure here, just the fields that might be used from within a driver. If you want to see more, you can look at <linux/skbuff.h>, where the structure is defined and the functions are prototyped. Additional details about how the fields and functions are used can be easily retrieved by grepping in the kernel sources.

The Important Fields

The fields introduced here are the ones a driver might need to access. They are listed in no particular order.

struct net_device *rx_dev;
struct net_device *dev;

The devices receiving and sending this buffer, respectively.

union { /* ... */ } h;
union { /* ... */ } nh;
union { /* ... */ } mac;

Pointers to the various levels of headers contained within the packet. Each field of the unions is a pointer to a different type of data structure. h hosts pointers to transport layer headers (for example, struct tcphdr *th); nh includes network layer headers (such as struct iphdr *iph); and mac collects pointers to link layer headers (such as struct ethhdr *ethernet).

If your driver needs to look at the source and destination ports of a TCP packet, it can find them in skb->h.th; the source and destination IP addresses are in the network layer header, skb->nh.iph. See the header file for the full set of header types that can be accessed in this way.

Note that network drivers are responsible for setting the mac pointer for incoming packets. This task is normally handled by eth_type_trans, but non-Ethernet drivers will have to set skb->mac.raw directly, as shown later in Section 14.10.3.
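To make the three header levels concrete, here is a minimal user-space sketch, not kernel code. It assumes an Ethernet frame carrying IPv4 and TCP, and every name in it (struct layer_ptrs, locate_headers, tcp_source_port) is hypothetical; the three pointers merely play the roles of skb->mac.raw, skb->nh.raw, and skb->h.raw.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for skb->mac.raw, skb->nh.raw and skb->h.raw. */
struct layer_ptrs {
    const uint8_t *mac;  /* link layer header      */
    const uint8_t *nh;   /* network layer header   */
    const uint8_t *h;    /* transport layer header */
};

enum {
    LINK_HDR_LEN   = 14, /* Ethernet: 6 + 6 address octets, 2 type octets */
    MIN_IP_HDR_LEN = 20, /* IPv4 header without options */
    TCP_PORTS_LEN  = 4   /* source port + destination port */
};

/*
 * Fill in the three pointers for an Ethernet frame carrying IPv4.
 * Returns 0 on success, -1 if the frame is too short or is not IPv4.
 */
int locate_headers(const uint8_t *frame, size_t len, struct layer_ptrs *p)
{
    size_t ip_hdr_len;

    if (len < LINK_HDR_LEN + MIN_IP_HDR_LEN)
        return -1;
    if (frame[12] != 0x08 || frame[13] != 0x00)  /* EtherType 0x0800 = IPv4 */
        return -1;

    /* The low nibble of the first IP octet is the header length in 32-bit words. */
    ip_hdr_len = (size_t)(frame[LINK_HDR_LEN] & 0x0f) * 4;
    if (ip_hdr_len < MIN_IP_HDR_LEN ||
        len < LINK_HDR_LEN + ip_hdr_len + TCP_PORTS_LEN)
        return -1;

    p->mac = frame;
    p->nh  = frame + LINK_HDR_LEN;
    p->h   = p->nh + ip_hdr_len;
    return 0;
}

/*
 * Read the TCP source port (first two octets of the TCP header, in
 * network byte order). Returns -1 if the IP protocol field is not 6 (TCP).
 */
int tcp_source_port(const struct layer_ptrs *p)
{
    if (p->nh[9] != 6)
        return -1;
    return (int)(((unsigned int)p->h[0] << 8) | p->h[1]);
}
```

In a real driver the kernel's own structures (struct ethhdr, struct iphdr, struct tcphdr) would be used through the unions rather than raw octet offsets; the sketch uses offsets only so that it can be compiled and tried outside the kernel.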

unsigned char *head;
unsigned char *data;
unsigned char *tail;
unsigned char *end;

Pointers used to address the data in the packet. head points to the beginning of the allocated space, data is the beginning of the valid octets (and is usually slightly greater than head), tail is the end of the valid octets, and end points to the maximum address tail can reach. Another way to look at it is that the available buffer space is skb->end - skb->head, and the currently used data space is skb->tail - skb->data.

unsigned long len;

The length of the data itself (skb->tail - skb->data).
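The relationship among the four pointers and len can be checked with a small user-space model. This is only a sketch: struct buf_view and its helpers are hypothetical names that mirror the fields described above, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical model of the pointer fields described in the text. */
struct buf_view {
    unsigned char *head;  /* start of the allocated space   */
    unsigned char *data;  /* start of the valid octets      */
    unsigned char *tail;  /* end of the valid octets        */
    unsigned char *end;   /* highest address tail may reach */
    size_t len;           /* tail - data                    */
};

/* Caller must guarantee headroom + used <= size. */
void buf_view_init(struct buf_view *b, unsigned char *storage, size_t size,
                   size_t headroom, size_t used)
{
    b->head = storage;
    b->data = storage + headroom;
    b->tail = b->data + used;
    b->end  = storage + size;
    b->len  = used;
}

/* Available buffer space: end - head. */
size_t buf_space(const struct buf_view *b)
{
    return (size_t)(b->end - b->head);
}

/* Currently used data space: tail - data. */
size_t buf_used(const struct buf_view *b)
{
    return (size_t)(b->tail - b->data);
}

/* Nonzero when head <= data <= tail <= end and len matches tail - data. */
int buf_view_ok(const struct buf_view *b)
{
    return b->head <= b->data && b->data <= b->tail &&
           b->tail <= b->end && b->len == buf_used(b);
}
```

With a 64-octet buffer, 16 octets of headroom, and 40 octets of data, buf_space reports 64 and buf_used reports 40, leaving 8 octets between tail and end.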

unsigned char ip_summed;

The checksum policy for this packet. The field is set by the driver on incoming packets, as was described in Section 14.6.

unsigned char pkt_type;

The packet classification used in its delivery. The driver is responsible for setting it to PACKET_HOST (this packet is for me), PACKET_BROADCAST, PACKET_MULTICAST, or PACKET_OTHERHOST (no, this packet is not for me). Ethernet drivers don’t modify pkt_type explicitly because eth_type_trans does it for them.
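A non-Ethernet driver has to make this classification itself. The following user-space sketch shows the kind of decision involved for a 6-octet, Ethernet-style destination address. The enum values and classify_dest are hypothetical stand-ins for the PACKET_* constants, and the logic is an illustration of the idea rather than a copy of what eth_type_trans does.

```c
#include <string.h>

/* Hypothetical stand-ins for PACKET_HOST, PACKET_BROADCAST, and so on. */
enum frame_class {
    FRAME_HOST,
    FRAME_BROADCAST,
    FRAME_MULTICAST,
    FRAME_OTHERHOST
};

/* Classify a destination address against the interface's own address. */
enum frame_class classify_dest(const unsigned char dest[6],
                               const unsigned char own[6])
{
    static const unsigned char broadcast[6] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    if (memcmp(dest, broadcast, 6) == 0)
        return FRAME_BROADCAST;
    if (dest[0] & 0x01)            /* group bit set: multicast address */
        return FRAME_MULTICAST;
    if (memcmp(dest, own, 6) == 0)
        return FRAME_HOST;
    return FRAME_OTHERHOST;        /* seen only in promiscuous mode */
}
```

The broadcast test comes first because the broadcast address also has the group bit set and would otherwise be reported as multicast.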

The remaining fields in the structure are not particularly interesting. They are used to maintain lists of buffers, to account for memory belonging to the socket that owns the buffer, and so on.

Functions Acting on Socket Buffers

Network devices that use an sk_buff act on the structure by means of the official interface functions. Many functions operate on socket buffers; here are the most interesting ones:

struct sk_buff *alloc_skb(unsigned int len, int priority);
struct sk_buff *dev_alloc_skb(unsigned int len);

Allocate a buffer. The alloc_skb function allocates a buffer and initializes both skb->data and skb->tail to skb->head. The dev_alloc_skb function is a shortcut that calls alloc_skb with GFP_ATOMIC priority and reserves some space between skb->head and skb->data. This data space is used for optimizations within the network layer and should not be touched by the driver.

void kfree_skb(struct sk_buff *skb);
void dev_kfree_skb(struct sk_buff *skb);

Free a buffer. The kfree_skb call is used internally by the kernel. A driver should use dev_kfree_skb instead, which is intended to be safe to call from driver context.

unsigned char *skb_put(struct sk_buff *skb, int len);
unsigned char *__skb_put(struct sk_buff *skb, int len);

These inline functions update the tail and len fields of the sk_buff structure; they are used to add data to the end of the buffer. Each function’s return value is the previous value of skb->tail (in other words, it points to the data space just created). Drivers can use the return value to copy data by invoking ins(ioaddr, skb_put(...)) or memcpy(skb_put(...), data, len). The difference between the two functions is that skb_put checks to be sure that the data will fit in the buffer, whereas __skb_put omits the check.

unsigned char *skb_push(struct sk_buff *skb, int len);
unsigned char *__skb_push(struct sk_buff *skb, int len);

These functions decrement skb->data and increment skb->len. They are similar to skb_put, except that data is added to the beginning of the packet instead of the end. The return value points to the data space just created. The functions are used to add a hardware header before transmitting a packet. Once again, __skb_push differs in that it does not check for adequate available space.

int skb_tailroom(struct sk_buff *skb);

This function returns the amount of space available for putting data in the buffer. If a driver puts more data into the buffer than it can hold, the system panics. Although you might object that a printk would be sufficient to tag the error, memory corruption is so harmful to the system that the developers decided to take definitive action. In practice, you shouldn’t need to check the available space if the buffer has been correctly allocated. Since drivers usually get the packet size before allocating a buffer, only a severely broken driver will put too much data in the buffer, and a panic might be seen as due punishment.

int skb_headroom(struct sk_buff *skb);

Returns the amount of space available in front of data, that is, how many octets one can “push” to the buffer.
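The pointer arithmetic behind skb_put, skb_push, skb_tailroom, and skb_headroom can be tried out in user space with a small model. This is a sketch under stated assumptions: struct toy_skb and the toy_* functions are hypothetical, and where the text says the kernel panics on overflow, the model simply returns NULL so that the failure can be observed.

```c
#include <stddef.h>

/* Hypothetical user-space model of the pointer fields of a socket buffer. */
struct toy_skb {
    unsigned char *head, *data, *tail, *end;
    size_t len;
};

/* Start with an empty buffer that has `headroom` octets in front of data. */
void toy_init(struct toy_skb *s, unsigned char *storage, size_t size,
              size_t headroom)
{
    s->head = storage;
    s->end  = storage + size;
    s->data = s->tail = storage + headroom;
    s->len  = 0;
}

/* Octets that can still be pushed in front of data. */
size_t toy_headroom(const struct toy_skb *s)
{
    return (size_t)(s->data - s->head);
}

/* Octets that can still be put after tail. */
size_t toy_tailroom(const struct toy_skb *s)
{
    return (size_t)(s->end - s->tail);
}

/* Extend the data area at the end; returns the previous tail. */
unsigned char *toy_put(struct toy_skb *s, size_t n)
{
    unsigned char *old_tail = s->tail;

    if (n > toy_tailroom(s))
        return NULL;            /* the model reports the error instead of panicking */
    s->tail += n;
    s->len  += n;
    return old_tail;
}

/* Extend the data area at the front; returns the new data pointer. */
unsigned char *toy_push(struct toy_skb *s, size_t n)
{
    if (n > toy_headroom(s))
        return NULL;
    s->data -= n;
    s->len  += n;
    return s->data;
}
```

For example, with a 32-octet buffer and 8 octets of headroom, putting 10 octets returns the address 8 octets into the storage and leaves 14 octets of tailroom; pushing a 4-octet header then moves data back to offset 4 and raises len to 14.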

void skb_reserve(struct sk_buff *skb, int len);

This function increments both data and tail. The function can be used to reserve headroom before filling the buffer. Most Ethernet interfaces reserve 2 bytes in front of the packet; thus, the IP header is aligned on a 16-byte boundary, after a 14-byte Ethernet header. snull does this as well, although the instruction was not shown in Section 14.6 to avoid introducing extra concepts at that point.

unsigned char *skb_pull(struct sk_buff *skb, int len);

Removes data from the head of the packet. The driver won’t need to use this function, but it is included here for completeness. It decrements skb->len and increments skb->data; this is how the hardware header (Ethernet or equivalent) is stripped from the beginning of incoming packets.
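The receive-side sequence described for skb_reserve and skb_pull (reserve 2 octets, copy the frame in, then strip the 14-octet hardware header) can be modeled in user space as follows. The names struct rx_buf, rx_load, and rx_pull are hypothetical, and the sketch assumes the storage itself begins on a suitably aligned address.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of a receive buffer. */
struct rx_buf {
    unsigned char *head, *data, *tail, *end;
    size_t len;
};

/*
 * Reserve `reserve` octets of headroom in an empty buffer, then copy the
 * received frame behind it. Returns 0 on success, -1 if it does not fit.
 */
int rx_load(struct rx_buf *b, unsigned char *storage, size_t size,
            const unsigned char *frame, size_t frame_len, size_t reserve)
{
    if (reserve > size || frame_len > size - reserve)
        return -1;

    b->head = storage;
    b->end  = storage + size;
    b->data = b->tail = storage + reserve;  /* data and tail move together */
    memcpy(b->tail, frame, frame_len);
    b->tail += frame_len;
    b->len   = frame_len;
    return 0;
}

/*
 * Strip n octets from the front of the data: len shrinks, data advances.
 * Returns the new data pointer, or NULL if fewer than n octets are present.
 */
unsigned char *rx_pull(struct rx_buf *b, size_t n)
{
    if (n > b->len)
        return NULL;
    b->len  -= n;
    b->data += n;
    return b->data;
}
```

After rx_load with a reserve of 2 and rx_pull of 14, data sits 16 octets past head, which is the alignment of the IP header that the 2-octet reservation is meant to achieve.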

The kernel defines several other functions that act on socket buffers, but they are meant to be used in higher layers of networking code, and the driver won’t need them.
