Eric Stewart: Running Off At The Mouth

Packet Captures and Offloaded Functions

by Eric Stewart on Sep.05, 2013, under Networking, Technology

It is not news that we (the “work” we) use servers running Wireshark or tcpdump as capture devices.  However, we (well, Toivo) ran across an issue today that took some research on my part to fix properly.  As my solution pulled from multiple sites (and at least one other person), I decided to put up a blog post.

Toivo was attempting to troubleshoot (or verify, or prove it wasn’t our fault) an issue with a server using jumbo frames.  Thing is, after verifying that the packets would be fragmented somewhere between the server sending out jumbo frames and the capture point, a Wireshark capture would show the jumbo frame in its entirety.  (EDIT: In actuality, it wasn’t the jumbo frame; it was the TCP packet, which would have been split into several frames on its way down the stack.  The capture point was presenting the frames that carried the original TCP packet’s data as a single 13,000-byte frame.)  Seeing as how that wasn’t possible, Toivo took the time to figure out why this was happening.

Turns out that the capture interface on the server (CentOS 6.4) was performing some reassembly of the packets, as any interface capable of doing TCP offload work would.  You can check to see what a particular interface might be doing with “ethtool”:

[root@monbox ~]# ethtool -k em1
Features for em1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off
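
When you only care about what is still enabled, a small filter over that output saves some squinting.  This is just a sketch: offloads_on is a name I made up for illustration, and it assumes the “feature: on/off” layout shown above.

```shell
# List the features that `ethtool -k` reports as "on".
# Reads the ethtool output on stdin, so it also works on saved output.
offloads_on() {
    # Split each line on ": "; keep lines whose value starts with "on"
    # and strip any leading whitespace from the feature name.
    awk -F': ' '$2 ~ /^on/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Typical use on a live interface (needs ethtool installed):
#   ethtool -k em1 | offloads_on
```

The “Features for em1:” header has no value after the colon, so it drops out on its own.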

Usually this isn’t a problem – this is actually what you want for regular interfaces.  It reduces the work a system’s CPU has to do when it comes to processing network information.  For packet captures, though, you’re often looking for the data in as raw a format as possible.  It turns out that this behavior is covered in the Wireshark Wiki, and they make some suggestions there on what to change.  You can use

ethtool -K ifname tx off rx off tso off gso off

to take things a step farther – it:

  • Turns off tx-checksumming
  • Turns off rx-checksumming
  • Turns off tcp-segmentation-offload
  • Turns off generic-segmentation-offload, which Toivo suggests also turns off scatter-gather and probably generic-receive-offload

On our systems I have yet to get a system to successfully turn off the “rx/tx-vlan-offload” options.

Problem is, these changes will not stay through a reboot.  After two days of Googling, reading several web pages, and some trial and error, I eventually came up with two solutions: The Scalpel, and The Sledgehammer.

The Scalpel: /etc/udev/rules.d/

It was mentioned in a few cases in my searching that creating a custom /etc/udev/rules.d option that ran the ethtool commands for you was the way to go.  I couldn’t get this to work quickly enough and resorted to the Sledgehammer option explained farther down.  The next day, I did some additional searching, and (after tripping over a typo that brought down networking on the server completely after a reboot), got it working first on CentOS 6.4, and then eventually on CentOS 5.9.  Unfortunately, udev works quite differently, especially for networking devices, between the two, and has two different command sets.  I try to cover both sufficiently below.

At some point, I came across an explanation that suggested custom rules go in the “50”s (sorry, lost the link for that one).  Seeing as how “50-ethtool.rules” was also suggested on one of the links, I just kept that file name.  Here’s what’s in it:

CentOS 5.9:

ACTION=="add", SUBSYSTEM=="net", KERNEL=="ifname", \
   RUN+="/sbin/ethtool -K ifname tx off rx off tso off gso off"

CentOS 6.4:

ACTION=="add", SUBSYSTEM=="net", \
   DEVPATH=="/devices/pci0000:00/0000:00:03.0/0000:02:00.1/net/ifname", \
   RUN+="/sbin/ethtool -K ifname tx off rx off tso off gso off"

(Wrapped for your readability, but each rule should actually be one line with no “\”.)  That’s pretty much it.  For each interface you want to apply the ethtool command to, you’ll need one rule line with the commands you want to apply to it (changing the “ifname” value appropriately).  The entries on a CentOS 5.x box are fairly straightforward.  For CentOS 6.4, not so much; for different interfaces, the DEVPATH can be different.  Using

udevadm info --query=path --path=/sys/class/net/ifname

you can get the value for DEVPATH.
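
If you have several capture interfaces, typing those DEVPATH values by hand is where typos creep in.  Here is a minimal sketch that prints a CentOS 6 style rule for one interface; make_ethtool_rule is a hypothetical helper name of mine, and it only formats the rule shown above, nothing more.

```shell
# Print one CentOS 6 style udev rule line.
#   $1 = interface name
#   $2 = DEVPATH, as printed by `udevadm info --query=path`
make_ethtool_rule() {
    printf 'ACTION=="add", SUBSYSTEM=="net", DEVPATH=="%s", RUN+="/sbin/ethtool -K %s tx off rx off tso off gso off"\n' "$2" "$1"
}

# Typical use (assumes udevadm is available and the interface exists):
#   make_ethtool_rule monif \
#       "$(udevadm info --query=path --path=/sys/class/net/monif)"
```

Look the output over before appending it to /etc/udev/rules.d/50-ethtool.rules.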

It is unfortunate, but DEVPATH was really the only option available (at least in my initial testing) for indicating a specific interface.  Of the available rules.d key values, neither NAME nor KERNEL is set for network devices (on my CentOS 6.4 system, anyway).  A value for INTERFACE is eventually linked to the device, but is not considered a valid key value.

Interesting side notes: While using INTERFACE as a possible value, it became apparent that the ethtool command was running for every interface that came up; but since the interface was specified on the ethtool command line, it just meant that the system attempted to run the command on the same interface every time.  Also, be sure to use “==” when testing key values and not “=”; while testing the NAME key, using “=” killed networking.
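
Since that “=” versus “==” slip cost me networking on a reboot, a quick check before rebooting seems worthwhile.  The following is only a rough sketch: lint_udev_rules is a made-up name, the list of keys is my own assumption about what should be a comparison in this particular file, and it is not a general udev validator (other rules files assign NAME with “=” on purpose).

```shell
# Flag rule lines where a key we only mean to compare is written with "=".
#   $1 = path to a rules file
# Prints the offending lines (with line numbers) and returns 1 if any exist.
lint_udev_rules() {
    # KEY=" can only match a single "="; KEY==" and KEY!=" do not match.
    if grep -nE '(^|[[:space:],])(ACTION|SUBSYSTEM|KERNEL|DEVPATH|NAME)="' "$1"; then
        return 1
    fi
    return 0
}

# Typical use:
#   lint_udev_rules /etc/udev/rules.d/50-ethtool.rules && echo "looks OK"
```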

Now, once you have your /etc/udev/rules.d/50-ethtool.rules set up, it is suggested you “test” this first with (on CentOS 6) udevadm test /sys/class/net/ifname/:

run_command: calling: test
udevadm_test: version 147
This program is for debugging only, it does not run any program,
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.
parse_file: reading '/lib/udev/rules.d/10-console.rules' as rules file
[snip]
parse_file: reading '/etc/udev/rules.d/50-ethtool.rules' as rules file
[snip]
parse_file: reading '/dev/.udev/rules.d/99-root.rules' as rules file
udev_rules_new: rules use 147768 bytes tokens (12314 * 12 bytes), 35197 bytes buffer
[snip]
udev_rules_apply_to_event: RUN '/sbin/ethtool -K ifname tx off rx off tso off gso off' /etc/udev/rules.d/50-ethtool.rules:1
udev_rules_apply_to_event: PROGRAM '/lib/udev/rename_device' /lib/udev/rules.d/60-net.rules:2
[snip]
udev_rules_apply_to_event: RUN 'socket:@/org/freedesktop/hal/udev_event' /etc/udev/rules.d/90-hal.rules:2
udevadm_test: UDEV_LOG=6
udevadm_test: DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:02:00.1/net/ifname
udevadm_test: INTERFACE=ifname
udevadm_test: IFINDEX=5
udevadm_test: ACTION=add
udevadm_test: SUBSYSTEM=net
udevadm_test: INTERFACE_NAME=ifname
udevadm_test: ID_VENDOR_FROM_DATABASE=Broadcom Corporation
udevadm_test: ID_MODEL_FROM_DATABASE=NetXtreme II BCM5709 Gigabit Ethernet
udevadm_test: ID_BUS=pci
udevadm_test: ID_VENDOR_ID=0x14e4
udevadm_test: ID_MODEL_ID=0x1639
udevadm_test: NET_MATCHID=0000:02:00.1
udevadm_test: run: '/sbin/ethtool -K ifname tx off rx off tso off gso off'
udevadm_test: run: '/etc/sysconfig/network-scripts/net.hotplug'
udevadm_test: run: 'socket:@/org/freedesktop/hal/udev_event'

(Some output snipped for readability – it dumps a lot more than this out for a test run; on CentOS 5, use udevtest /class/net/ifname.) If you run it a second time for a different interface, you should see that the ethtool command is not run.  The next step is to apply it, which should only require

  • CentOS 6: udevadm trigger
  • CentOS 5: udevtrigger

Use

ethtool -k ifname

to check:

[root@monbox rules.d]# ethtool -k ifname
Features for ifname:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

And you’re all set.
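
If you would rather have a script tell you than read the list yourself, the check can be reduced to an exit status.  A minimal sketch, assuming the same output layout as above; features_all_off is a name I invented for this example.

```shell
# Succeed only if every named feature is reported as "off".
# Reads `ethtool -k` output on stdin; feature names are the arguments.
features_all_off() {
    input=$(cat)
    for feature in "$@"; do
        printf '%s\n' "$input" | grep -Eq "^[[:space:]]*${feature}: off" || {
            echo "still on (or missing): $feature" >&2
            return 1
        }
    done
}

# Typical use (needs ethtool installed):
#   ethtool -k ifname | features_all_off rx-checksumming tx-checksumming \
#       tcp-segmentation-offload generic-segmentation-offload
```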

The Sledgehammer: /sbin/ifup-local

In my searches, I found a nice article that explained how to (for CentOS and Red Hat Enterprise Linux, anyway) make those ethtool changes persistent across a reboot, using a simple bash script saved to a special file name.  This was the option I got working first, but it is very “brute force” and you could run into edge cases where the script doesn’t work as desired (say, if you have only two interfaces and don’t use bonding, you’d have to edit the script appropriately).

During the process of bringing up an interface (say, if you use /etc/init.d/network restart) eventually a script at /etc/sysconfig/network-scripts/ifup-post is called (obviously, after the interface is actually up).  It looks for /sbin/ifup-local, and, if there, runs it with the network interface’s device name as the argument.  To solve our issues, we put the following bash script in on all of our capture servers:

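#!/bin/bash
# Called by ifup-post with the interface name as $1.
case "$1" in
lo|bond0|em1|em2|eth0|eth1)
 # Loopback, bond, and regular interfaces: leave offloading alone.
;;
*)
 # Anything else is treated as a capture interface.
 echo "Turning off offloading on $1"
 /sbin/ethtool -K "$1" tx off rx off tso off gso off
;;
esac
exit 0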

What the script does for lo (loopback), bond0 (the bonded interface), em1, em2 (embedded interfaces that we usually use as the bonded interfaces), eth0, and eth1 (older hardware will often report their interfaces using this name/number scheme), is, well, nothing.  It continues on as normal.  However, for everything else, it runs the ethtool command and configures the interface as desired:

[root@monbox rules.d]# ethtool -k monif
Features for monif:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

And for Toivo’s case this seemed to fix the issue to his satisfaction.
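
The script above skips a fixed list of names and touches everything else, which is where the edge cases come from.  An inverted version that only acts on interfaces you name might be safer on some boxes.  This is a sketch under my own assumptions, not what we run: CAPTURE_IFS and ETHTOOL are variables I made up for illustration, and “monif” is just the example name from the output above.

```shell
#!/bin/bash
# Hypothetical allow-list variant of /sbin/ifup-local.
# CAPTURE_IFS: space-separated capture interface names (assumed default).
# ETHTOOL: path to ethtool, overridable for a dry run (e.g. ETHTOOL=echo).
CAPTURE_IFS="${CAPTURE_IFS:-monif}"
ETHTOOL="${ETHTOOL:-/sbin/ethtool}"

ifup_local() {
    for capture in $CAPTURE_IFS; do
        if [ "$1" = "$capture" ]; then
            echo "Turning off offloading on $1"
            "$ETHTOOL" -K "$1" tx off rx off tso off gso off
            return
        fi
    done
    # Not a capture interface: do nothing.
    return 0
}

# ifup-post passes the interface name as the first argument.
if [ -n "${1:-}" ]; then
    ifup_local "$1"
fi
```

Running it by hand with ETHTOOL=echo prints the command it would run, without touching the interface.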


