
Scalable TCP Tuning

René Pfeiffer [lynx at luchs.at]


Mon, 31 Mar 2008 21:29:06 +0200

On Mar 31, 2008 at 0024 -0700, Erik van Zijst appeared and said:

> René Pfeiffer wrote:
>> [...]
>>  - /proc/sys/net/ipv4/tcp_low_latency controls if the data is forwarded
>>    directly through the TCP stack to the application buffer (=1) or not
>>    (=0). I have never benchmarked or compared this setting, though it's
>>    always on on my laptop (as I noticed just now, I must have fiddled
>>    with sysctl.conf here).
>
> I'm not sure what that one does exactly, but the problem is not the 
> client-side, as it is fast enough to read the video from the socket. 
> Instead, it's the server-side that saturates the socket, filling up the 
> entire send buffer and thereby increasing the end-to-end time it takes for 
> data to travel from server to client.

I meant to try this on the server. I think it is designed to work on the client side, but I am not sure.
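
If you want to try it there, the switch is a single sysctl. On a running system:

  echo 1 > /proc/sys/net/ipv4/tcp_low_latency

or, to make it permanent, add "net.ipv4.tcp_low_latency = 1" to /etc/sysctl.conf.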

> The way our streaming solution works is by letting the server anticipate 
> congestion (blocking write calls) by reducing the video bitrate in 
> real-time. As a result, the send buffer is usually completely filled. For 
> that same reason, disabling Nagle's algorithm has no effect either: the send 
> buffer always contains more than one MSS of data.

I see.

> This is fine, but as I frequently get buffer underruns on networks with 
> highly fluctuating Bandwidth-Delay-Products, it looks like Linux is happy to 
> increase the send buffer's capacity when beneficial, but less so to decrease 
> it again when circumstances change.

Judging from the measurements I've seen when playing with the congestion algorithms, the Linux kernel seems to be able to decrease the sender window. However, I think the behaviour is really targeted at having a full buffer and a suitable queue all of the time. You could check which of the algorithms works best for your application and create another kernel module with the desired window behaviour. I make the distinction between buffer and window size since I believe that the congestion algorithms only affect the window handling, not the buffer handling.
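
The available algorithms and the system-wide default are exposed under /proc, so comparing them is easy. Which names show up depends on the modules your kernel has built in or loaded:

  # list what the kernel currently offers
  cat /proc/sys/net/ipv4/tcp_available_congestion_control
  # load another one, e.g. Westwood, and make it the default
  modprobe tcp_westwood
  echo westwood > /proc/sys/net/ipv4/tcp_congestion_control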

>>  - The application keeps its own buffer, but you can also influence the
>>    maximum socket buffers of the TCP stack in the kernel.
>>    http://dsd.lbl.gov/TCP-tuning/linux.html describes the maximum size
>>    of send/receive buffers. You could try reducing this, but maybe you
>>    can't influence both sides of the connection.
>
> Yes, I've been tempted to manually shrink the send buffer from the 
> application-level, but since the fluctuating bandwidth and delay justify a 
> dynamic buffer size, I'm reluctant to try and hardwire any fixed values in 
> user space.

Yes, I agree, having an algorithm do that automatically would be more useful.
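
For completeness, hardwiring a value from user space is just one setsockopt() call. A minimal sketch, with a purely illustrative 64 kB figure:

  #include <stdio.h>
  #include <sys/socket.h>

  /* Pin the send buffer of a TCP socket to a fixed size. Note that
   * Linux doubles the value you pass in to allow for bookkeeping
   * overhead, and that setting SO_SNDBUF by hand switches off the
   * kernel's send-buffer autotuning for this socket. */
  static int shrink_sndbuf(int fd)
  {
      int sndbuf = 64 * 1024;  /* illustrative value only */

      if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                     &sndbuf, sizeof(sndbuf)) < 0) {
          perror("setsockopt(SO_SNDBUF)");
          return -1;
      }
      return 0;
  }

The autotuning point is probably the deal-breaker here: a fixed value is exactly what fluctuating bandwidth and delay argue against.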

> What I need effectively (I think), is to let the kernel make sure the total 
> send buffer is always exactly twice the cwnd. There's an interesting 2002 
> paper addressing exactly this issue: 
> http://www.eecg.toronto.edu/~ashvin/publications/iwqos2002.pdf

I haven't seen this one, thanks. Now I know how to start the day tomorrow at the office. :) This publication seems to fit your problem perfectly.
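
Until somebody implements it in the kernel, you could approximate the paper's policy from user space: poll the congestion window via TCP_INFO and clamp the send buffer to roughly twice that. An untested sketch (struct tcp_info and its tcpi_* fields come from the glibc headers):

  #include <stdio.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  /* Keep the send buffer at about twice the congestion window.
   * Call this periodically on a connected TCP socket. */
  static int clamp_sndbuf_to_cwnd(int fd)
  {
      struct tcp_info info;
      socklen_t len = sizeof(info);
      int sndbuf;

      if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0) {
          perror("getsockopt(TCP_INFO)");
          return -1;
      }

      /* tcpi_snd_cwnd is in segments; convert to bytes via the MSS. */
      sndbuf = 2 * info.tcpi_snd_cwnd * info.tcpi_snd_mss;

      /* The kernel doubles this internally and clips it against the
       * net.core.wmem limits, so treat the result as approximate. */
      if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                     &sndbuf, sizeof(sndbuf)) < 0) {
          perror("setsockopt(SO_SNDBUF)");
          return -1;
      }
      return 0;
  }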

BTW, I am playing with IPv6 now, and the typical delays increase if you don't have native IPv6 connectivity, only tunnels in IPv4 space. Have you done any experiments with streaming through IPv6-in-IPv4 tunnels? It might not be widely deployed, but I am curious.

Cheers, René.




Erik van Zijst [erik.van.zijst at layerstream.com]


Mon, 31 Mar 2008 21:41:57 -0700

René Pfeiffer wrote:

> Judging from the measurements I've seen when playing with the
> congestion algorithms, the Linux kernel seems to be able to decrease
> the sender window. However, I think the behaviour is really targeted at
> having a full buffer and a suitable queue all of the time. You could
> check which of the algorithms works best for your application and
> create another kernel module with the desired window behaviour. I make
> the distinction between buffer and window size since I believe that the
> congestion algorithms only affect the window handling, not the buffer
> handling.

Yes, after more experimentation I can confirm that Linux also decreases the buffer size. From what I've seen so far, tcp_westwood works best and seems capable of decreasing average latency. Not surprisingly, it also uses the smallest send buffer.
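
A simple way to watch the buffer from user space, by the way, is the SIOCOUTQ ioctl, which reports how many bytes the application has written that the peer has not yet acknowledged. Roughly:

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/sockios.h>  /* SIOCOUTQ */

  /* Bytes written to the socket but not yet acked by the peer;
   * sampling this over time shows how full the send buffer gets
   * under each congestion control algorithm. */
  static int queued_bytes(int fd)
  {
      int unsent = 0;

      if (ioctl(fd, SIOCOUTQ, &unsent) < 0) {
          perror("ioctl(SIOCOUTQ)");
          return -1;
      }
      return unsent;
  }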

I agree that the main responsibility of the congestion algorithms is manipulation of the sliding window rather than the send buffer, but if real-time buffer tuning is possible in the kernel module, it'd be nice to see an implementation that provides low end-to-end latency even on congested networks. Not sure I have the required skills though ;-)

>> What I need effectively (I think), is to let the kernel make sure the total 
>> send buffer is always exactly twice the cwnd. There's an interesting 2002 
>> paper addressing exactly this issue: 
>> http://www.eecg.toronto.edu/~ashvin/publications/iwqos2002.pdf
> 
> I haven't seen this one, thanks. Now I know how to start the day
> tomorrow at the office. :) This publication seems to fit your problem
> perfectly.

Yes it does.

With the spectacular growth of online video, I'm sure we are not the only ones pushing TCP as a viable protocol for real-time streaming applications. Its reliability eliminates forward-error-correction overhead, while its congestion control prevents unfair resource hogging, which is good for everyone. Currently, however, it could benefit from a bit of tuning.

> BTW, I am playing with IPv6 now and the typical delays increase if you
> don't have native IPv6 connectivity but tunnels in IPv4 space. Have you
> done any experiments with streaming through IPv6-in-IPv4 tunnels? It
> might not be widely deployed, but I am curious.

No, I haven't. My measurements are gathered from production servers with real streams. All IPv4 at the moment.

cheers, Erik




René Pfeiffer [lynx at luchs.at]


Tue, 1 Apr 2008 12:57:11 +0200

On Mar 31, 2008 at 1143 -0700, Erik van Zijst appeared and said:

> Rene,
>
> One short follow-up question: is the new TCP module effective on each new 
> TCP connection immediately after loading, or does it require a restart of 
> the server process? Also, what happens to established connections? Do they 
> continue to use the old congestion control algorithm until they are torn 
> down?

There is one setting that helps to use the congestion modules to the fullest. You should set

net.ipv4.tcp_no_metrics_save=1

in /etc/sysctl.conf (or write 1 to /proc/sys/net/ipv4/tcp_no_metrics_save). According to the kernel sources and the Gentoo Wiki (http://gentoo-wiki.com/HOWTO_TCP_Tuning) it does the following:

"This removes an odd behavior in the 2.6 kernels, whereby the kernel stores the slow start threshold for a client between TCP sessions. This can cause undesired results, as a single period of congestion can affect many subsequent connections."

I think established connections are not affected by a change of the TCP module, but I've never verified this. I noticed that you can even set the congestion algorithm per connection by passing a parameter to setsockopt(), but I don't remember the URL. I saw it yesterday though.
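
The option itself is TCP_CONGESTION, available since kernel 2.6.13. A minimal sketch (if your libc headers don't define the constant yet, it is 13 on Linux):

  #include <stdio.h>
  #include <string.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  #ifndef TCP_CONGESTION
  #define TCP_CONGESTION 13
  #endif

  /* Select the congestion control algorithm for one socket only,
   * e.g. set_congestion_control(fd, "westwood"). The matching
   * module has to be loaded, and unprivileged processes can only
   * pick algorithms listed in
   * /proc/sys/net/ipv4/tcp_allowed_congestion_control. */
  static int set_congestion_control(int fd, const char *algo)
  {
      if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                     algo, strlen(algo)) < 0) {
          perror("setsockopt(TCP_CONGESTION)");
          return -1;
      }
      return 0;
  }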

Cheers, René.

-- 
  )\._.,--....,'``.  fL  Let GNU/Linux work for you while you take a nap.
 /,   _.. \   _\  (`._ ,. R. Pfeiffer <lynx at luchs.at> + http://web.luchs.at/
`._.-(,_..'--(,_..'`-.;.'  - System administration + Consulting + Teaching -
Got mail delivery problems?  http://web.luchs.at/information/blockedmail.php




René Pfeiffer [lynx at luchs.at]


Wed, 16 Apr 2008 13:46:32 +0200

Hello, Erik!

While preparing a network programming tutorial for game developers I found an interesting article that might also be useful for streaming data. It deals with the interaction of Nagle's algorithm and delayed ACK packets.

http://www.stuartcheshire.org/papers/NagleDelayedAck/

Maybe you have heard of this, maybe not; in either case it's interesting, so it goes to TAG as well.

Best, René.




Erik van Zijst [erik.van.zijst at gmail.com]


Thu, 17 Apr 2008 17:34:03 +0200

Hi Rene,

Thanks for the article!

I'm still kind of struggling with the fact that the TCP send buffer has a tendency to get bigger than necessary, but I found some relief in Linux's pluggable congestion algorithms.

The article is interesting, but isn't really applicable to my case. I'm well aware of the interaction between delayed ACKs and Nagle's algorithm, but it mainly plagues interactive communication that involves application-level replies, which is exactly what the article exposes: after every 100 kB, the client sends an application-level reply that triggers the next 100 kB.

In our streaming environment there's no interactivity of this kind. The server just continuously sends packets. There's no application-level return traffic.

I've not actually tried disabling Nagle on the sending side to see what happens, but I expect no noticeable effect in my case.
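
For reference, disabling it is a one-liner per socket; a minimal sketch:

  #include <stdio.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  /* Disable Nagle's algorithm: small segments go out immediately
   * instead of waiting for outstanding data to be acknowledged. */
  static int disable_nagle(int fd)
  {
      int one = 1;

      if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
                     &one, sizeof(one)) < 0) {
          perror("setsockopt(TCP_NODELAY)");
          return -1;
      }
      return 0;
  }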

cheers,

Erik




René Pfeiffer [lynx at luchs.at]


Thu, 17 Apr 2008 18:49:53 +0200

Hello, Erik!

On Apr 17, 2008 at 1734 +0200, Erik van Zijst appeared and said:

> [...]
> I'm still kind of struggling with the fact that the TCP send buffer has a 
> tendency to get bigger than necessary, but I found some relief in Linux's 
> pluggable congestion algorithms.

Maybe we'll see your name some day in the kernel's changelog. :)

> [...]
> In our streaming environment there's no interactivity of this kind. The 
> server just continuously sends packets. There's no application-level return 
> traffic.

I wasn't sure about that since I remembered Real Player's statistics feedback. I assume that some streaming protocols have some kind of feedback mechanism to tell the server about the link quality, but I have next to no in-depth experience with streaming.

Best, René.




Erik van Zijst [erik.van.zijst at gmail.com]


Thu, 17 Apr 2008 19:29:36 +0200

René Pfeiffer wrote:

> 
> Maybe we'll see your name some day in the kernel's changelog. :)

Who knows, but I doubt I have such skills ;-)

>> In our streaming environment there's no interactivity of this kind. The 
>> server just continuously sends packets. There's no application-level return 
>> traffic.
> 
> I wasn't sure about that since I remembered Real Player's statistics
> feedback. I assume that some streaming protocols have some kind of
> feedback mechanism to tell the server about the link quality, but I have
> next to no in-depth experience with streaming.

Yes, you're right, they do. And we pride ourselves on the fact that we don't :-)

In our little start-up company we designed and built a new real-time, TCP-friendly streaming protocol in combination with bitrate-adaptive video coding. The protocol does network-capacity analysis without any interactivity between client and server.

cheers, Erik

