Documentation weeks

nftables has two main documentation sources:

nftables wiki, the wiki provides an example oriented documentation, so the user can see how the features are useful in practice. Usually the wiki also states which Kernel and nft versions are needed for each feature. Also, since many nftables users come from iptables, it is useful to compare a feature to the one it replaces in iptables.
nft manpages, the manpages are directed to users who have some experience with the software, usually the grammar of a feature is displayed and the existent values for each component listed, along with a short description.

These past two weeks were all about documenting parts I helped implementing and others which I didn’t. Providing a good documentation is tricky, you should put yourself in the user shoes and write what’s relevant to them.

I have a feeling that documenting a feature you didn’t work leads to better results, since you don’t need to make an effort to visualize the system as an unexperienced user does. However, it is a lot harder, when you are writing references for a feature it usually means you can’t find other references except on git log and the code itself.

It feels similar to hunting bugs, actually odds are you find some in the process, or at least some unexpected behavior. I found a few places I thought worthy of improvements but this thought didn’t ripen, the reason being it provides less benefits than loss to fix them. In these past two weeks I’ve seen this a few times, after some thinking and tracking the code changes I’d see they are planned behaviors, using git blame and git log you can track the reason for the changes and often they’re a trade, an undesired behavior is allowed (when it doesn’t brake things) to avoid code duplication or too much complexity. Guess I should change my mindset to optimizing for simplicity and code maintenance.

Even though most of the “bugs” weren’t real bugs, I think I found one that really is and will try to fix it for now, see you.

Bugs solving week

There is only one week since my last post, so this is a short one.

Last week was focused on searching bugs and solving the ones I’m able to, some of them were suggested by my mentor and others I tried to choose by myself, wasn’t very lucky with those.

A good(?) thing about bugs is that they happen in every part of the system and you must chase them wherever they are, including places you’re not comfortable in.

For example, one of the bugs was a dependency issue, usually the building process follows this flow (when autoconf is used):

sh autogen.sh   (1)
./configure          (2)
make                   (3)
make install       (4)

There is usually a file named configure.ac, which contains system specifications and dependencies; this file is used in (1) to generate a configure script, which by its turn will be used in (2) to create the Makefile, needed in (3) to compile your files together. Finally (4) puts the resulting file in a appropriate place and the program is ready to be executed.

It’s expected that if ./configure finishes without errors then make and make install also will, however, that wasn’t always the case in nftables. To solve this bug I just needed to change the dependencies in configure.ac, the fix patch is a boring oneliner, the fun is in reproducing the bug and testing it.

To check the version of a dependency, configure.ac uses PKG_CHECK_MODULES(), this macro searches the dependency in some specific folders (read man pkg-config). It’s up to the developers to provide a .pc file when the software is installed, so pkg-config can find it; sometimes they don’t and you have two options, search for a source which does or write this file yourself, see what xtables.pc looks like:

prefix=/usr/local
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
xtlibdir=${exec_prefix}/lib/xtables
includedir=${prefix}/include

Name:           xtables
Description:    Shared Xtables code for extensions and iproute2
Version:        1.6.1
Cflags:         -I${includedir}
Libs:           -L${libdir} -lxtables
Libs.private:   -ldl

Also, sometimes you upgrade or downgrade a library and the .pc file isn’t updated, what misleads your configure script and may cause unwanted behavior, be careful about it.

Other bugs were less interesting, two of them were only a table presentation fix and the last one I couldn’t reproduce, even after a lot of code digging and configuration changes, apparently it vanished somehow within the updates – and not much information was given, what makes reproducing it harder.

I’m still working on one, actually it’s a request for a new small feature for the parser, will enter in details later when I have some conclusion about it. See you.

Stateful objects, ICMP, bugs and tests

Sorry for the delay to post, this was a full week. Moving out is a laborious task, fortunately it is over now and I found time and inspiration to blog.

The past weeks were focused on a few good tasks, I’ll talk about them separately. One point they share is the need to track the execution path in code, many git greps and printfs were used but I think now I got a much better view of how everything works. When I need to see what a specific part of the execution does, I nearly always know instinctively what file/routine to analyze; this applies to nftables and libnftnl, in the kernel code I can’t always find my way easily, that’s a work in progress.

Now, about the actual tasks, one of them dealt with ICMP headers.

You can build rules in nftables to filter ICMP packets based on its header fields. Wikipedia has a good article about this protocol and shows that some header fields are variable. The meaning of the last 4 btyes depends on the fields type and code, the same field can represent mtu and sequence for example. But, this is a normal behavior, why this is relevant? Because nftables currently matches the offset of the field, to know which field to display on list ruleset. You can see this adding the rule:

$ nft add rule filter input icmp mtu 33 accept
$ nft list ruleset

<…>
icmp sequence 33 accept
<…>

Like I said, it matches the offset to know the field, since the fields mtu and sequence have the same offset they are displayed the same – in this case as sequence. I don’t think it spoils the filtering, the system will filter mtu, even though it displays sequence, although I’m not 100% sure on how the kernel handle this kind of rule. I say this because when a rule is written it has the right field on it, and the right message is sent to the kernel, which will set up the filtering. The problem happens on list ruleset, nft asks the active rules and the kernel return some structures. With these structures nft builds and displays the ruleset; when the rule is about ICMP header fields it has only the offset field available, and based on the offset the field name is chosen.

I spent more time that I’d like to admit to find the exact routine, where the field name is chosen, and to understand how the header matching works. Went trough the whole process of matching a rule, evaluating it, linearizing it to send netlkink messages to the kernel, and finally doing something similar to list the rule. My conclusion is: with the available information, received from the kernel, it’s not possible to always display the right field. I think we need to add a new field to the message describing the rule the kernel returns.

I added a new field to header structures, on the corresponding parts of the nft, libnftnl and kernel code; After a little debugging, the ICMP fields were displayed as expected with list ruleset. However, the changes were a little intrusive and I can’t evaluate the side effects they have, also, maybe there is a simpler way that I didn’t see. The patches are still being evaluated.

That’s it for ICMP, in summary, the fix I proposed wasn’t applied yet, maybe never will, but making it taught me a lot about how the system works. It was good for learning, hopefully the report was useful to at least provide a new insight about the problem.

The next task was about stateful objects, this one yielded patches after some time invested. Stateful objects are a new feature of nftables, you can read about them here. In a few words, they make counters and quotas, also limits in the future, independent from rules, to help organizing the ruleset. Most of it was implemented in the linked patchset, but some features lacked the code, in nft, to work.

The main feature needed was to reset a single object in a table, a provisional patch was available to base the changes on. This provisional patch proved to be almost ready to join the codebase, only needed improvement on evaluating the command before executing and some testing. Then I worked on listing a single object, and reseting and listing all objects in a table, all related to the first one. After those new features I went to create some tests for stateful objects, the testing system in nft is divided in python and shell tests.

Shell tests are used to test for high level functionalities and bugs, sometimes when a bug is disclosed a shell test is created to make sure this bug never appears again. I wrote a shell test for a known bug last week, and while thinking and experimenting I found another one. Usually the bugs of nftables are archived here, until they’re cleared. The new found bug was solved by Pablo, in a lot less time than it took me to experiment and write the bug report, and also inspired a shell test.

Python tests are more focused on functionalities and system behavior, at a lower level than shell tests, it simulates the end user. Many of the possible rules are tested, an example is create a set and reference it in a rule. What I said about ICMP and the header fields is tested in this suit, and results in many warnings because a mtu rule is listed as sequence.

Before sending a patch to the mailing list you must run both test suits. If it triggers a new error, then you better not send it; I did, on my first patch, tested a lot before sending but wasn’t aware of the automated tests and broke some of them with the patch. A few hours later I received a friendly warning to never do it again, never did. In fact, the past week I helped on making new tests, the shell tests mentioned and a some pytests for stateful objects.

As I said, stateful objects are a new feature, there were no tests for it, actually, the python tests had no support for adding stateful objects to them. Then, the first step was to provide this support, modifying the script that read the testcases so it allows adding objects to tables and referencing them in rules. Next step was to create the actual tests, they test for adding objects to tables and creating simple rules with them. Having tests to detect bugs is very helpful, sometimes even when the bug has no solution yet it’s useful to create a test for it, to remind that the problem exists and must be addressed.

Although this past week was full of unrelated but urgent issues, I liked those past three weeks very much – this one isn’t over yet, the weekend will be full of bug solving :), I did work on many different parts of the system and get used with them.

Once I said about having a post dedicated to how nftables is organized under the sheets, a kind of guide aimed to those starting on it as developers. I feel more comfortable to start it now, probably this week I’ll begin it and update it when its needed. See you.

Sets and linked lists

Last post when creating rules we used:

nft add rule ip foo bar tcp dport http counter
nft add rule ip foo bar tcp dport https counter

Two rules used for the same command, could be convenient to add http and https into a single structure, right? For this we have sets, instead of the previous two rules we can type:

nft add rule ip foo bar tcp dport { http, https } counter

Where { http, https } is a set. It’s possible to create a named set, where you can add and delete elements as you will:

nft add set ip foo serv_set { type inet_service \; }

The new set is named serv_set and holds elements of type inet_service. To add elements to it:

nft add element ip foo serv_set { http, https }

And to delete http from serv_set:

nft add delete ip foo serv_set { http }

Rules can reference named sets by “@set_name”:

nft add rule ip foo bar tcp dport @serv_set counter

Now we know what are and how to use sets, the elements it holds can be of different types, not necessarily inet_service. The elements a set holds are available in a linked list, nft uses the same implementation of kernel’s linked lists, even though it runs in userspace. The kernel has an official linked list implementation since version 2.1, it was necessary to avoid code duplication and guarantee efficiency.

This list is circular and doubly linked, has a pointer to next and previous elements. An element is represented by the struct list_head:

struct list_head {
struct list_head *next;
struct list_head *prev;
};

To create a linked list of your own structs you just need to embed struct list_head in it. Taking struct book as example:

struct book {
int        npages;
int        pdate;
char       *name;
char       *author;
};

To make a list of struct book it becomes:

struct book {
struct list_head    blist;
int                 npages;
int                 pdate;
char                *name;
char                *author;
};

To iterate on the elements, or to modify the list you just need to write routines, or use the ones available, that manipulate struct list_head. Accessing the element that contains the struct list_head is simple with the macro:

container_of(pointer to list_head, typeof of the struct it is embedded in, name of the list attribute in the struct)

In our example:

container_of(&variable, typeof(struct book), blist)

This implementation is well documented and is available here.

Now, returning to nft sets. Every time the set is created, the elements are stored in a different order, that’s because the kernel uses a hash table with a random seed. When “nft list ruleset” is called, set elements are returned in a linked list in the order defined during the set creation. Then, when a set is created twice, the calls of “nft list ruleset” might return the elements in a different order.

When tracking your ruleset via git, some changes can be unnecessarily triggered, in case the set has the same elements as before but now listed in a different order. To solve this issue it’s necessary to sort the elements.

There is no standard routine to sort linked lists in C, so I had to implement it. A trivial sort with O(n²) complexity took minutes to list big sets, so, a faster algorithm was needed. I chose Merge sort, it has O(n*log(n)) complexity in all cases.

The algorithm is basically:
• Split the list in two
• Sort the two halves separately
• Merge the two halves sorted

The best part was to implement the comparator of elements, this part is specific to what is being sorted. In nft I had to sort elements of a custom type, won’t enter in much detail now because I want to write about how the structures in nft’s codebase in a future post.

That’s it, here is the patch. See you.

NFtables

Now it’s time to talk about nftables. Quoting nftables wiki:

“nftables is the new packet classification framework that intends to replace the existing {ip,ip6,arp,eb}_tables infrastructure.”

Using it you can filter network traffic on your machine:

Configure firewalls, to accept or drop (among other features) packets based on port numbers and addresses, this is high customizable;
Control traffic flow rate;
Log traffic;
Perform NAT;
Many other things.

To organize packets classification, nftables defines many data structures.

Tables are the base of nftables, not surprised, are we? You can have many of them, they determine the family of packets (ip, ip6…) and hold other structures. To create a new table just type on command line:

nft add table ip foo

Where ip is the family of the table and foo is its name.

A table alone has no use, they must be populated with chains, sets or maps. Let’s forget maps and sets for now and look at chains first, they hold rules (the ones in charge of packets classification) and determine the type (filtering, rerouting…) and hook (input, output…) of classification. Try it with:

nft add chain ip foo bar {type filter hook output priority 0 \;}

Again, ip is the family of table foo and bar is the name of this chain. The code within brackets says it’ll filter packets originated in the local system (output), the priority determine the precedence of chains in case of conflicting rules.

Still, nothing happens, we need rules. Rules go inside chains and determine the action which packets will trigger, there are many possible rules, read this reference to create yours, it also has info for chains and tables. Simple rules are:

nft add rule ip foo bar tcp dport http counter

nft add rule ip foo bar tcp dport https counter

These rules counter the packets that leave the local system, using tcp protocol with destination ports http (80) and https (443). To see it working type:

nft list ruleset

If you have a browser sending page requests, you should see the numbers increasing with time to time. Also it displays your tables, chains and rules.

You can save the ruleset in a text file to load it again later:

nft list ruleset > tmp

The file tmp can be loaded with:

nft -f tmp

To destroy the ruleset use flush command:

nft flush ruleset

That’s a quick and simplified view on how nftables is used, I’ll use it as reference for future posts. When talking about chains, I said to forget about sets and maps, this week I worked with them and they shall appear soon, on the following post.
To a better understanding of nftables I recommend this tutorial. And for further reference, including install instructions, refer to wiki-nftables.