Sets and linked lists

Last post when creating rules we used:

nft add rule ip foo bar tcp dport http counter
nft add rule ip foo bar tcp dport https counter

Two rules used for the same command, could be convenient to add http and https into a single structure, right? For this we have sets, instead of the previous two rules we can type:

nft add rule ip foo bar tcp dport { http, https } counter

Where { http, https } is a set. It’s possible to create a named set, where you can add and delete elements as you will:

nft add set ip foo serv_set { type inet_service \; }

The new set is named serv_set and holds elements of type inet_service. To add elements to it:

nft add element ip foo serv_set { http, https }

And to delete http from serv_set:

nft add delete ip foo serv_set { http }

Rules can reference named sets by “@set_name”:

nft add rule ip foo bar tcp dport @serv_set counter

Now we know what are and how to use sets, the elements it holds can be of different types, not necessarily inet_service. The elements a set holds are available in a linked list, nft uses the same implementation of kernel’s linked lists, even though it runs in userspace. The kernel has an official linked list implementation since version 2.1, it was necessary to avoid code duplication and guarantee efficiency.

This list is circular and doubly linked, has a pointer to next and previous elements. An element is represented by the struct list_head:

struct list_head {
struct list_head *next;
struct list_head *prev;

To create a linked list of your own structs you just need to embed struct list_head in it. Taking struct book as example:

struct book {
int        npages;
int        pdate;
char       *name;
char       *author;

To make a list of struct book it becomes:

struct book {
struct list_head    blist;
int                 npages;
int                 pdate;
char                *name;
char                *author;

To iterate on the elements, or to modify the list you just need to write routines, or use the ones available, that manipulate struct list_head. Accessing the element that contains the struct list_head is simple with the macro:

container_of(pointer to list_head, typeof of the struct it is embedded in, name of the list attribute in the struct)

In our example:

container_of(&variable, typeof(struct book), blist)

This implementation is well documented and is available here.

Now, returning to nft sets. Every time the set is created, the elements are stored in a different order, that’s because the kernel uses a hash table with a random seed.  When “nft list ruleset” is called, set elements are returned in a linked list in the order defined during the set creation. Then, when a set is created twice, the calls of “nft list ruleset” might return the elements in a different order.

When tracking your ruleset via git, some changes can be unnecessarily triggered, in case the set has the same elements as before but now listed in a different order. To solve this issue it’s necessary to sort the elements.

There is no standard routine to sort linked lists in C, so I had to implement it. A trivial sort with O(n²) complexity took minutes to list big sets, so, a faster algorithm was needed. I chose Merge sort, it has O(n*log(n)) complexity in all cases.

The algorithm is basically:
• Split the list in two
• Sort the two halves separately
• Merge the two halves sorted

The best part was to implement the comparator of elements, this part is specific to what is being sorted. In nft I had to sort elements of a custom type, won’t enter in much detail now because I want to write about how the structures in nft’s codebase in a future post.

That’s it, here is the patch. See you.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s