IPtables-translate, JSON and misfortune

Over these past weeks I started working on an iptables translation.

We know that nftables is here to replace iptables, so it is natural that many of its users have their preferred rules written for iptables and would appreciate an easy way to set up a similar ruleset using nftables. This is what iptables-translate is for.

Using iptables-translate is very simple: you write your iptables rule and it outputs a similar rule in nftables if the rule is supported; if not, the iptables rule is printed back. A usage example:

$ iptables-translate -A OUTPUT -m tcp -p tcp --dport 443 -m hashlimit --hashlimit-above 20kb/s --hashlimit-burst 1mb --hashlimit-mode dstip --hashlimit-name https --hashlimit-dstmask 24 -m state --state NEW -j DROP

Translates to:

nft add rule ip filter OUTPUT tcp dport 443 flow table https { ip daddr and 255.255.255.0 timeout 60s limit rate over 20 kbytes/second  burst 1 mbytes} ct state new counter drop

The above example comes from the module I wrote the translation for, hashlimit, which is similar to flow tables in nftables. Each module is translated separately and the code lives in its iptables source file. Most of the supported features already have their translation written, but some still need work. Writing them is an actual nftables task in this round. Future interns, go and check the xlate functions in the iptables files; it can be of great help to the community and to yourself 🙂
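The 255.255.255.0 in the translated rule is just --hashlimit-dstmask 24 written as a netmask. Here is a minimal Python sketch of that conversion, using only the standard ipaddress module. The helper names are mine for illustration; they are not part of iptables or nft.

```python
import ipaddress

def prefix_to_netmask(prefix_len):
    """Return the dotted IPv4 netmask for a prefix length such as 24."""
    return str(ipaddress.ip_network(f"0.0.0.0/{prefix_len}").netmask)

def masked(address, prefix_len):
    """Apply the mask to an address, as 'ip daddr and 255.255.255.0' does."""
    net = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    return str(net.network_address)

print(prefix_to_netmask(24))
print(masked("192.168.1.77", 24))
```

Two destinations in the same /24, say 192.168.1.77 and 192.168.1.200, end up with the same masked value, so they share a single entry in the flow table.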

After this task I looked into the JSON export of the nftables ruleset. In the future, importing a ruleset via JSON should also be possible, but for now only exporting is. This feature is still being defined and many changes are happening. What I did was to complement a patch that defines some standard functions and uses them to export rules. JSON in nftables is a little messy; it will probably get more attention soon.

Now about misfortune: last week an accident happened and my notebook is no longer working. I’m trying to have it fixed, but it stalled my patch contributions. Hopefully next week this will be sorted and I can finish some patches.

I’ll probably write a new post about my experience with Outreachy soon, now it is late and I need to go home :), see you.

Documentation weeks

nftables has two main documentation sources:

  • nftables wiki: the wiki provides example-oriented documentation, so the user can see how the features are useful in practice. Usually the wiki also states which kernel and nft versions are needed for each feature. Also, since many nftables users come from iptables, it is useful to compare a feature to the one it replaces in iptables.
  • nft manpages: the manpages are directed at users who have some experience with the software. Usually the grammar of a feature is displayed and the existing values for each component are listed, along with a short description.

These past two weeks were all about documenting parts I helped implement and others I didn’t. Providing good documentation is tricky: you should put yourself in the user’s shoes and write what’s relevant to them.

I have a feeling that documenting a feature you didn’t work on leads to better results, since you don’t need to make an effort to see the system as an inexperienced user does. However, it is a lot harder: when you are writing the reference for a feature, it usually means you can’t find other references except the git log and the code itself.

It feels similar to hunting bugs; actually, odds are you will find some in the process, or at least some unexpected behavior. I found a few places I thought worthy of improvement, but the thought didn’t ripen, because fixing them would cost more than it gains. In these past two weeks I’ve seen this a few times: after some thinking and tracking of the code changes, I’d see that they are planned behaviors. Using git blame and git log you can track the reason for the changes, and often they’re a trade-off: an undesired behavior is allowed (when it doesn’t break things) to avoid code duplication or too much complexity. I guess I should change my mindset to optimize for simplicity and code maintenance.

Even though most of the “bugs” weren’t real bugs, I think I found one that really is and will try to fix it for now, see you.

Bugs solving week

It has been only one week since my last post, so this is a short one.

Last week was focused on searching for bugs and solving the ones I’m able to. Some of them were suggested by my mentor and others I tried to choose by myself; I wasn’t very lucky with those.

A good(?) thing about bugs is that they happen in every part of the system and you must chase them wherever they are, including places you’re not comfortable in.

For example, one of the bugs was a dependency issue, usually the building process follows this flow (when autoconf is used):

sh autogen.sh    (1)
./configure      (2)
make             (3)
make install     (4)

There is usually a file named configure.ac, which contains system specifications and dependencies; this file is used in (1) to generate a configure script, which in turn is used in (2) to create the Makefile, needed in (3) to compile your files together. Finally, (4) puts the resulting file in an appropriate place and the program is ready to be executed.

It’s expected that if ./configure finishes without errors then make and make install will too; however, that wasn’t always the case in nftables. To solve this bug I just needed to change the dependencies in configure.ac. The fix is a boring one-liner; the fun is in reproducing the bug and testing it.

To check the version of a dependency, configure.ac uses PKG_CHECK_MODULES(); this macro searches for the dependency in some specific folders (read man pkg-config). It’s up to the developers to provide a .pc file when the software is installed, so pkg-config can find it. Sometimes they don’t, and you have two options: search for a source which does, or write this file yourself. See what xtables.pc looks like:

prefix=/usr/local
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
xtlibdir=${exec_prefix}/lib/xtables
includedir=${prefix}/include

Name:           xtables
Description:    Shared Xtables code for extensions and iproute2
Version:        1.6.1
Cflags:         -I${includedir}
Libs:           -L${libdir} -lxtables
Libs.private:   -ldl

Also, sometimes you upgrade or downgrade a library and the .pc file isn’t updated, which misleads your configure script and may cause unwanted behavior. Be careful about it.
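To make the .pc format more concrete, here is a rough Python sketch that reads a file like xtables.pc, expands the ${...} variables and extracts the version. It is a simplified reading of the format written for illustration, not pkg-config’s real parser, and the function name parse_pc is made up.

```python
import re

PC_TEXT = """\
prefix=/usr/local
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: xtables
Version: 1.6.1
Cflags: -I${includedir}
Libs: -L${libdir} -lxtables
"""

def parse_pc(text):
    """Split a .pc file into variables (name=value) and fields (Key: value)."""
    variables, fields = {}, {}

    def expand(value):
        # Replace each ${name} with the variable defined earlier in the file.
        return re.sub(r"\$\{(\w+)\}",
                      lambda m: variables.get(m.group(1), ""), value)

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field = re.match(r"^([A-Za-z0-9_.]+):\s*(.*)$", line)
        if field:
            fields[field.group(1)] = expand(field.group(2))
        else:
            name, _, value = line.partition("=")
            variables[name.strip()] = expand(value.strip())
    return variables, fields

variables, fields = parse_pc(PC_TEXT)
version = tuple(int(part) for part in fields["Version"].split("."))
print(fields["Libs"], version)
```

On a real system, pkg-config --modversion xtables prints the version that pkg-config itself sees, which is a quick way to spot a stale .pc file.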

Other bugs were less interesting: two of them were only table presentation fixes, and the last one I couldn’t reproduce, even after a lot of code digging and configuration changes. Apparently it vanished somehow with the updates, and not much information was given, which makes reproducing it harder.

I’m still working on one; actually, it’s a request for a small new feature in the parser. I’ll go into details later, when I have some conclusion about it. See you.

Stateful objects, ICMP, bugs and tests

Sorry for the delay to post, this was a full week. Moving out is a laborious task, fortunately it is over now and I found time and inspiration to blog.

The past weeks were focused on a few good tasks, I’ll talk about them separately. One point they share is the need to track the execution path in code, many git greps and printfs were used but I think now I got a much better view of how everything works. When I need to see what a specific part of the execution does, I nearly always know instinctively what file/routine to analyze; this applies to nftables and libnftnl, in the kernel code I can’t always find my way easily, that’s a work in progress.

Now, about the actual tasks, one of them dealt with ICMP headers.

You can build rules in nftables to filter ICMP packets based on their header fields. Wikipedia has a good article about this protocol and shows that some header fields are variable. The meaning of the last 4 bytes depends on the type and code fields; the same bytes can represent mtu or sequence, for example. This is normal behavior, so why is it relevant? Because nftables currently matches on the offset of the field to decide which field name to display on list ruleset. You can see this by adding the rule:

$ nft add rule filter input icmp mtu 33 accept
$ nft list ruleset

<…>
icmp sequence 33 accept
<…>

Like I said, it matches the offset to find the field; since the fields mtu and sequence have the same offset, they are displayed the same, in this case as sequence. I don’t think it spoils the filtering: the system will filter on mtu even though it displays sequence, although I’m not 100% sure how the kernel handles this kind of rule. I say this because when a rule is written it has the right field in it, and the right message is sent to the kernel, which sets up the filtering. The problem happens on list ruleset: nft asks for the active rules and the kernel returns some structures. With these structures nft builds and displays the ruleset; when the rule is about ICMP header fields, it has only the offset of the field available, and the field name is chosen based on that offset.
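The ambiguity is easy to reproduce in a few lines. The sketch below is not nft’s code; it is a toy table in Python, with offsets taken from the usual ICMP header layout (sequence from the echo messages, mtu from the "fragmentation needed" message), and a reverse lookup that only knows the offset and length.

```python
# (name, offset in bytes, length in bytes) of some ICMP header fields.
# The order of this table is an assumption made for the illustration.
ICMP_FIELDS = [
    ("type", 0, 1),
    ("code", 1, 1),
    ("checksum", 2, 2),
    ("id", 4, 2),
    ("sequence", 6, 2),
    ("gateway", 4, 4),
    ("mtu", 6, 2),
]

def offset_of(name):
    """What gets stored for a rule: only where the field lives."""
    for field, offset, length in ICMP_FIELDS:
        if field == name:
            return offset, length
    raise KeyError(name)

def name_from_offset(offset, length):
    """What listing can do: return the first field found at that position."""
    for field, off, ln in ICMP_FIELDS:
        if (off, ln) == (offset, length):
            return field
    return None

# The rule was written with mtu, but the reverse lookup answers sequence.
print(name_from_offset(*offset_of("mtu")))
```

Since both names map to the same (offset, length) pair, no lookup based on that pair alone can tell them apart; some extra piece of information has to travel with the rule.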

I spent more time than I’d like to admit finding the exact routine where the field name is chosen, and understanding how the header matching works. I went through the whole process of matching a rule, evaluating it, linearizing it to send netlink messages to the kernel, and finally doing something similar to list the rule. My conclusion is: with the available information, received from the kernel, it’s not possible to always display the right field. I think we need to add a new field to the message the kernel returns describing the rule.

I added a new field to the header structures, in the corresponding parts of the nft, libnftnl and kernel code. After a little debugging, the ICMP fields were displayed as expected with list ruleset. However, the changes were a little intrusive and I can’t evaluate the side effects they have; also, maybe there is a simpler way that I didn’t see. The patches are still being evaluated.

That’s it for ICMP, in summary, the fix I proposed wasn’t applied yet, maybe never will, but making it taught me a lot about how the system works. It was good for learning, hopefully the report was useful to at least provide a new insight about the problem.

The next task was about stateful objects, this one yielded patches after some time invested. Stateful objects are a new feature of nftables, you can read about them here. In a few words, they make counters and quotas, also limits in the future, independent from rules, to help organizing the ruleset. Most of it was implemented in the linked patchset, but some features lacked the code, in nft, to work.

The main feature needed was to reset a single object in a table, and a provisional patch was available to base the changes on. This provisional patch proved to be almost ready to join the codebase; it only needed improvement in evaluating the command before executing it, and some testing. Then I worked on listing a single object, and on resetting and listing all objects in a table, all related to the first one. After those new features I went on to create some tests for stateful objects. The testing system in nft is divided into python and shell tests.

Shell tests are used to test for high level functionalities and bugs, sometimes when a bug is disclosed a shell test is created to make sure this bug never appears again. I wrote a shell test for a known bug last week, and while thinking and experimenting I found another one. Usually the bugs of nftables are archived here, until they’re cleared. The new found bug was solved by Pablo, in a lot less time than it took me to experiment and write the bug report, and also inspired a shell test.

Python tests are more focused on functionalities and system behavior, at a lower level than shell tests, and they simulate the end user. Many of the possible rules are tested; an example is creating a set and referencing it in a rule. What I said about ICMP and the header fields is tested in this suite, and results in many warnings because an mtu rule is listed as sequence.

Before sending a patch to the mailing list you must run both test suites. If it triggers a new error, then you’d better not send it. I did, on my first patch: I tested a lot before sending but wasn’t aware of the automated tests and broke some of them with the patch. A few hours later I received a friendly warning to never do it again, and I never did. In fact, this past week I helped make new tests: the shell tests mentioned and some pytests for stateful objects.

As I said, stateful objects are a new feature and there were no tests for them; actually, the python tests had no support for adding stateful objects at all. So the first step was to provide this support, modifying the script that reads the testcases so it allows adding objects to tables and referencing them in rules. The next step was to create the actual tests; they test adding objects to tables and creating simple rules with them. Having tests to detect bugs is very helpful. Sometimes, even when a bug has no solution yet, it’s useful to create a test for it, as a reminder that the problem exists and must be addressed.

Although this past week was full of unrelated but urgent issues, I liked these past three weeks very much (this one isn’t over yet, the weekend will be full of bug solving :)). I worked on many different parts of the system and got used to them.

I once talked about having a post dedicated to how nftables is organized under the hood, a kind of guide aimed at those starting on it as developers. I feel more comfortable starting it now; I’ll probably begin it this week and update it when it’s needed. See you.

Sets and linked lists

In the last post, when creating rules, we used:

nft add rule ip foo bar tcp dport http counter
nft add rule ip foo bar tcp dport https counter

Two rules that do the same thing; it would be convenient to put http and https into a single structure, right? For this we have sets. Instead of the previous two rules we can type:

nft add rule ip foo bar tcp dport { http, https } counter

Where { http, https } is a set. It’s possible to create a named set, where you can add and delete elements as you will:

nft add set ip foo serv_set { type inet_service \; }

The new set is named serv_set and holds elements of type inet_service. To add elements to it:

nft add element ip foo serv_set { http, https }

And to delete http from serv_set:

nft delete element ip foo serv_set { http }

Rules can reference named sets by “@set_name”:

nft add rule ip foo bar tcp dport @serv_set counter

Now we know what sets are and how to use them. The elements a set holds can be of different types, not necessarily inet_service. These elements are kept in a linked list; nft uses the same implementation as the kernel’s linked lists, even though it runs in userspace. The kernel has had an official linked list implementation since version 2.1; it was needed to avoid code duplication and guarantee efficiency.

This list is circular and doubly linked, has a pointer to next and previous elements. An element is represented by the struct list_head:

struct list_head {
        struct list_head *next;
        struct list_head *prev;
};

To create a linked list of your own structs you just need to embed struct list_head in it. Taking struct book as example:

struct book {
        int        npages;
        int        pdate;
        char       *name;
        char       *author;
};

To make a list of struct book it becomes:

struct book {
        struct list_head    blist;
        int                 npages;
        int                 pdate;
        char                *name;
        char                *author;
};

To iterate over the elements or to modify the list, you just need to write routines, or use the ones available, that manipulate struct list_head. Accessing the element that contains the struct list_head is simple with the macro:

container_of(pointer to the list_head, type of the struct it is embedded in, name of the list_head member in that struct)

In our example, with ptr pointing to the blist member of some book:

container_of(ptr, struct book, blist)

This implementation is well documented and is available here.
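For readers who prefer to see the pointer juggling run, here is a small Python model of the same idea. It only mimics the behavior of the circular, doubly linked list; the function names are borrowed from the kernel’s list API, but the bodies are my own sketch. Python has no container_of(), so each list node carries an owner reference instead, which is an assumption of this model and not something the C struct has.

```python
class ListHead:
    """A node that starts out linked to itself, like an empty list head."""
    def __init__(self, owner=None):
        self.next = self
        self.prev = self
        self.owner = owner  # stands in for container_of()

def list_add_tail(new, head):
    """Insert 'new' just before 'head', i.e. at the end of the list."""
    last = head.prev
    last.next = new
    new.prev = last
    new.next = head
    head.prev = new

def list_del_init(entry):
    """Unlink 'entry' and make it point to itself again."""
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.next = entry.prev = entry

def list_for_each(head):
    """Walk the list once, stopping when we are back at the head."""
    pos = head.next
    while pos is not head:
        yield pos
        pos = pos.next

class Book:
    def __init__(self, name, npages):
        self.blist = ListHead(owner=self)  # the embedded list node
        self.name = name
        self.npages = npages

shelf = ListHead()
first = Book("first", 100)
second = Book("second", 200)
list_add_tail(first.blist, shelf)
list_add_tail(second.blist, shelf)
print([node.owner.name for node in list_for_each(shelf)])
```

Because the list is circular, there is no special case for the first or last element: the head is just another node, and an empty list is a head that points to itself.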

Now, returning to nft sets. Every time a set is created, the elements are stored in a different order; that’s because the kernel uses a hash table with a random seed. When “nft list ruleset” is called, set elements are returned in a linked list in the order defined during the set creation. So, when a set is created twice, the calls to “nft list ruleset” might return the elements in a different order.

When tracking your ruleset via git, some changes can be unnecessarily triggered, in case the set has the same elements as before but now listed in a different order. To solve this issue it’s necessary to sort the elements.

There is no standard routine to sort linked lists in C, so I had to implement it. A trivial sort with O(n²) complexity took minutes to list big sets, so, a faster algorithm was needed. I chose Merge sort, it has O(n*log(n)) complexity in all cases.

The algorithm is basically:
  • Split the list in two
  • Sort the two halves separately
  • Merge the two sorted halves
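These three steps can be sketched in Python on a simple singly linked list. This is only an illustration of the algorithm, not the C code from the patch: the Node class, the helper names and the default comparator are all assumptions made for the example.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def from_iterable(values):
    """Build a linked list that keeps the order of 'values'."""
    head = None
    for value in reversed(list(values)):
        head = Node(value, head)
    return head

def to_list(head):
    out = []
    while head:
        out.append(head.value)
        head = head.next
    return out

def split(head):
    """Cut the list in the middle using a slow and a fast pointer."""
    slow, fast = head, head.next
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    second = slow.next
    slow.next = None
    return head, second

def merge(a, b, less):
    """Merge two sorted lists; ties keep the element from 'a' first."""
    dummy = tail = Node(None)
    while a and b:
        if less(b.value, a.value):
            tail.next, b = b, b.next
        else:
            tail.next, a = a, a.next
        tail = tail.next
    tail.next = a or b
    return dummy.next

def merge_sort(head, less=lambda x, y: x < y):
    if head is None or head.next is None:
        return head
    left, right = split(head)
    return merge(merge_sort(left, less), merge_sort(right, less), less)

ports = from_iterable([443, 80, 22, 8080, 53])
print(to_list(merge_sort(ports)))
```

Only the less function knows what is being sorted, so the same skeleton works for any element type once a comparator is supplied.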

The best part was implementing the comparator of elements; this part is specific to what is being sorted. In nft I had to sort elements of a custom type. I won’t go into much detail now because I want to write about how the structures in nft’s codebase work in a future post.

That’s it, here is the patch. See you.

NFtables

Now it’s time to talk about nftables. Quoting nftables wiki:

“nftables is the new packet classification framework that intends to replace the existing {ip,ip6,arp,eb}_tables infrastructure.”

Using it you can filter network traffic on your machine:

  • Configure firewalls, to accept or drop (among other features) packets based on port numbers and addresses; this is highly customizable;
  • Control traffic flow rate;
  • Log traffic;
  • Perform NAT;
  • Many other things.

To organize packet classification, nftables defines many data structures.

Tables are the base of nftables, not surprised, are we? You can have many of them, they determine the family of packets (ip, ip6…) and hold other structures. To create a new table just type on command line:

nft add table ip foo

Where ip is the family of the table and foo is its name.

A table alone has no use; it must be populated with chains, sets or maps. Let’s forget maps and sets for now and look at chains first. They hold rules (the ones in charge of packet classification) and determine the type (filtering, rerouting…) and hook (input, output…) of classification. Try it with:

nft add chain ip foo bar {type filter hook output priority 0 \;}

Again, ip is the family of table foo and bar is the name of this chain. The code within braces says it’ll filter packets originating in the local system (output); the priority determines the order in which chains attached to the same hook are evaluated.

Still, nothing happens, we need rules. Rules go inside chains and determine the action which packets will trigger, there are many possible rules, read this reference to create yours, it also has info for chains and tables. Simple rules are:

nft add rule ip foo bar tcp dport http counter
nft add rule ip foo bar tcp dport https counter

These rules count the packets that leave the local system using the tcp protocol with destination ports http (80) and https (443). To see it working, type:

nft list ruleset

If you have a browser sending page requests, you should see the numbers increasing from time to time. It also displays your tables, chains and rules.

You can save the ruleset in a text file to load it again later:

nft list ruleset > tmp

The file tmp can be loaded with:

nft -f tmp

To destroy the ruleset use the flush command:

nft flush ruleset

That’s a quick and simplified view of how nftables is used; I’ll use it as a reference for future posts. When talking about chains, I said to forget about sets and maps. This week I worked with them and they shall appear soon, in the following post.
For a better understanding of nftables I recommend this tutorial. And for further reference, including install instructions, refer to wiki-nftables.

IPv6 and Scapy

I’m interning on the Linux kernel, in the nftables project, which filters network traffic (soon I’ll blog about it, promise), and recently I had to work with code that deals with IPv6.

Probably we’ve all heard about IP addresses, everything connected to the internet has this identifier. IP is the internet protocol that tries to send messages to IP addresses, it runs in Network Layer (L3) in the core of the internet.

There are two well-known versions of IP, v4 and v6. Nowadays most internet traffic is based on v4, which might change in the near (maybe not so near) future.

The main reason to replace v4 is the number of addresses: there are only 32 bits (~4 billion addresses) available. A few decades ago this number was huge, almost unimaginable, but the game has changed and the last free address blocks were handed out to the regional registries a few years ago.

NATs are used to make a single IP address identify multiple hosts. Probably in your home only the ISP modem has a unique IP address, while the computers, cellphones and other devices all have a local IP like 192.168.x.y (you can use ifconfig in a terminal window to check the IP of your network interface, probably named eth0). However, many people dislike NATs for many reasons, but that belongs to another discussion.

Now, IPv6 reserves 128 bits (~a lot) for each of the source and destination addresses. Every grain of sand on Earth can have its own IP address; how long will it last? Moreover, the decades of experience with v4 were used to improve IP in many ways. Take a look at both headers and how different they are:

[Figure: IPv4 and IPv6 header comparison, found in [1]]

It’s important to notice that the v4 header has variable length, since options may be sent in it, which slows processing. The IPv6 header has a fixed length of 40 bytes, with no options allowed in the base header; however, options are a useful feature, so v6 still has them, but they’re sent in extension headers.

The ‘Next Header’ field in v6 defines which header follows the current one; that’s how extension headers are chained. In the last header this field contains the upper layer protocol (TCP, UDP etc.) that is used, similar to the v4 ‘Protocol’ field.
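Following that chain by hand looks roughly like the Python sketch below. It is not the kernel’s ipv6_skip_exthdr(): it is a simplified walk that only understands hop-by-hop, routing, destination options and fragment headers, and every function name in it is made up for the example. The test packet is built with plain bytes, so nothing is sent on the network.

```python
import struct

IPV6_HEADER_LEN = 40
VARIABLE_LEN_EXT = {0, 43, 60}   # hop-by-hop, routing, destination options
FRAGMENT = 44                    # fragment header, always 8 bytes
LOOPBACK = bytes(15) + b"\x01"   # ::1

def ipv6_header(next_header, payload_len):
    """Fixed 40-byte header: version/class/label, length, next, hop limit."""
    first = struct.pack("!IHBB", 0x60000000, payload_len, next_header, 64)
    return first + LOOPBACK + LOOPBACK

def skip_ext_headers(packet):
    """Return (upper layer protocol, offset where that protocol starts)."""
    nexthdr = packet[6]            # Next Header byte of the fixed header
    offset = IPV6_HEADER_LEN
    while nexthdr in VARIABLE_LEN_EXT or nexthdr == FRAGMENT:
        if offset + 8 > len(packet):
            raise ValueError("truncated extension header")
        if nexthdr == FRAGMENT:
            hdrlen = 8
        else:
            # Length is in 8-byte units, not counting the first 8 bytes.
            hdrlen = (packet[offset + 1] + 1) * 8
        nexthdr = packet[offset]   # each extension header names its successor
        offset += hdrlen
    return nexthdr, offset

# Routing header with two addresses: next=TCP (6), length=4 units, 40 bytes.
routing = bytes([6, 4, 0, 2]) + bytes(4) + LOOPBACK + LOOPBACK
packet = ipv6_header(43, len(routing)) + routing
print(skip_ext_headers(packet))
```

With no extension headers the loop body never runs and the function simply returns the Next Header value of the fixed header together with offset 40.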

Sometimes it is necessary to know which protocol an IPv6 packet uses, and somewhere in the Netfilter subsystem of the kernel ipv6_find_hdr() is used for that. I was given the task of analyzing whether ipv6_skip_exthdr() can be used instead, because it’s simpler and faster. These functions are used by nftables to process every IPv6 packet, and to test them I had to create IPv6 packets with extension headers. Now comes Scapy.

Scapy is a packet manipulation program; it has its own shell, but it’s also possible to use it from Python. Creating IPv6 packets with it is very simple. I’ll just show it, since a code sample is worth a thousand words. The following sends a message over TCP and IPv6:

from scapy.all import *

base = IPv6()

base.dst = '::1'  # ::1 is localhost in IPv6
base.src = '::1'

prot = TCP(sport=1234, dport=1234, flags='S') # Attributes can be defined in initialization

data = Raw(load="Important message!")

packet = base / prot / data # '/' encapsulates the messages

packet.show2() # Displays the fields in the packet

send(packet)

[Figure: output of packet.show2()]

You can add as many extension headers to a v6 packet as you want; just initialize and encapsulate them before sending:

...
ext = IPv6ExtHdrRouting(addresses=["::1", "::1"])

packet = base / ext / prot / data
...

[Figure: output of packet.show2() with one extension header]

With it I was able to test the functions with different kinds of IPv6 packets. The values used in the packets don’t mean anything, I just needed to test how the extension headers are skipped 🙂

Scapy seems very useful for manipulating packets at different layers, and it’s easy to use; I’m looking forward to exploring it in future tasks. Thank you for reading!

[1] http://www.gta.ufrj.br/ensino/eel879/trabalhos_vf_2009_2/priscilla/ipv6_cabecalho.html