Official eMule-Board: Upload Throttling Performance Improvements


Upload Throttling Performance Improvements: support for high speeds on a single slot

#1 lupzz

  • Member
  • Group: Members
  • Posts: 34
  • Joined: 03-November 05

Posted 20 January 2006 - 02:15 AM

This patch greatly improves the performance of the eMule upload throttling procedure.

There are two fixes, independent of each other:
1) fix the throttler's "mad sleeping" and the number of packets sent at once
2) enlarge the TCP window

Note 1: the second fix is optional and could be improved by enlarging the window less on connections with a low bandwidth*delay product.
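A minimal sketch of that improvement, not part of the patch: size the buffers from the bandwidth*delay product and cap them at the 256 KB the patch hard-codes. The helper name `ChooseWindowSize`, the availability of an RTT estimate and the 16 KB floor are all assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not in eMule or the patch): pick SO_SNDBUF/SO_RCVBUF
// from the bandwidth*delay product instead of a fixed 256 KB.
//   bytesPerSec: expected upload rate in bytes per second
//   rttMs:       estimated round-trip time in milliseconds (assumed known)
uint32_t ChooseWindowSize(uint32_t bytesPerSec, uint32_t rttMs)
{
    const uint64_t kMinWindow = 16 * 1024;   // assumed floor
    const uint64_t kMaxWindow = 256 * 1024;  // the value the patch hard-codes
    // Bandwidth*delay product in bytes; 64-bit math avoids overflow.
    const uint64_t bdp = static_cast<uint64_t>(bytesPerSec) * rttMs / 1000;
    return static_cast<uint32_t>(std::clamp(bdp, kMinWindow, kMaxWindow));
}
```

For a symmetric 10 Mbit line (about 1,250,000 bytes/s) with a 100 ms RTT this gives 125,000 bytes (about 122 KB), roughly half of the fixed 256 KB; a 6 KB/s line would get only the floor.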

Note 2: this is a port of a fix I originally made to the aMule code (which is identical here).

Note 3: there are some debug messages left in, so don't mind them.

Note 4: I haven't found a better way to post the patch... suggestions are welcome.

Note 5: I have no more notes :D

EDIT: I forgot an important note: the performance improvements are noticeable on connections with high upload speeds (e.g. mine is a symmetric 10 Mbit line).

diff -urd ../srchybrid-orig/AsyncSocketEx.cpp ./AsyncSocketEx.cpp
--- ../srchybrid-orig/AsyncSocketEx.cpp	2006-01-20 02:58:18.000000000 +0100
+++ ./AsyncSocketEx.cpp	2006-01-20 03:02:21.000000000 +0100
@@ -513,6 +513,12 @@
  	SOCKET hSocket=socket(AF_INET, nSocketType, 0);
  	if (hSocket==INVALID_SOCKET)
    return FALSE;
+
+  int window_size = 256 * 1024;
+
+  setsockopt(hSocket, SOL_SOCKET, SO_SNDBUF, (char *) &window_size, sizeof(window_size) );
+  setsockopt(hSocket, SOL_SOCKET, SO_RCVBUF, (char *) &window_size, sizeof(window_size) );
+
  	m_SocketData.hSocket=hSocket;
  	AttachHandle(hSocket);
  	if (!AsyncSelect(lEvent))
diff -urd ../srchybrid-orig/AsyncSocketExLayer.cpp ./AsyncSocketExLayer.cpp
--- ../srchybrid-orig/AsyncSocketExLayer.cpp	2006-01-20 02:58:18.000000000 +0100
+++ ./AsyncSocketExLayer.cpp	2006-01-20 03:02:20.000000000 +0100
@@ -476,6 +476,12 @@
  	SOCKET hSocket=socket(AF_INET, nSocketType, 0);
  	if (hSocket==INVALID_SOCKET)
    res=FALSE;
+  
+  int window_size = 256 * 1024;
+
+  setsockopt(hSocket, SOL_SOCKET, SO_SNDBUF, (char *) &window_size, sizeof(window_size) );
+  setsockopt(hSocket, SOL_SOCKET, SO_RCVBUF, (char *) &window_size, sizeof(window_size) );
+
  	m_pOwnerSocket->m_SocketData.hSocket=hSocket;
  	m_pOwnerSocket->AttachHandle(hSocket);
  	if (!m_pOwnerSocket->AsyncSelect(lEvent))
@@ -600,4 +606,4 @@
  }
  else
  	return m_pNextLayer->ShutDownNext(nHow);
-}
\ No newline at end of file
+}
diff -urd ../srchybrid-orig/UploadBandwidthThrottler.cpp ./UploadBandwidthThrottler.cpp
--- ../srchybrid-orig/UploadBandwidthThrottler.cpp	2006-01-20 02:58:20.000000000 +0100
+++ ./UploadBandwidthThrottler.cpp	2006-01-20 03:00:16.000000000 +0100
@@ -23,6 +23,7 @@
 #include "LastCommonRouteFinder.h"
 #include "OtherFunctions.h"
 #include "emuledlg.h"
+#include "Log.h"
 
 #ifdef _DEBUG
 #define new DEBUG_NEW
@@ -368,6 +369,20 @@
     uint32 rememberedSlotCounter = 0;
     DWORD lastTickReachedBandwidth = ::GetTickCount();
 
+	uint32 extraSleepTime = 0;
+	uint32 nextPrint = 0; // needed for debugging
+	DWORD lastSpent = lastLoopTick;
+	uint64 spentBytes = 0;
+	uint64 spentOverhead = 0;
+	sint64 estSendBytesT = 0;
+	uint32 timeSinceLastSpent = 0;
+	uint32 nextEstSBT = 0;
+
+
+	const uint32 maxScale = 1200 * 1024, minScale = 6 * 1024, minFragments = 2, maxFragments = 128;
+	const float factor = (float)(maxFragments - minFragments) / 2 / (maxScale - minScale);
+	const uint16 avgPeriod = 10;
+
  while(doRun) {
         pauseEvent->Lock();
 
@@ -377,7 +392,12 @@
  	allowedDataRate = theApp.lastCommonRouteFinder->GetUpload();
 
  	uint32 minFragSize = 1300;
-        uint32 doubleSendSize = minFragSize*2; // send two packages at a time so they can share an ACK
+
+  // Linearly scaling fragments number
+  uint16 nFragments = (uint16)( factor * ( ( allowedDataRate < maxScale ? allowedDataRate : maxScale ) - minScale ) + minFragments/2 + .5 ) * 2;
+
+  uint32 doubleSendSize = minFragSize*nFragments;
+
  	if(allowedDataRate < 6*1024) {
    minFragSize = 536;
             doubleSendSize = minFragSize; // don't send two packages at a time at very low speeds to give them a smoother load
@@ -386,8 +406,7 @@
 #define TIME_BETWEEN_UPLOAD_LOOPS 1
         uint32 sleepTime;
         if(allowedDataRate == 0 || allowedDataRate == _UI32_MAX || realBytesToSpend >= 1000) {
-            // we could send at once, but sleep a while to not suck up all cpu
-            sleepTime = TIME_BETWEEN_UPLOAD_LOOPS;
+  	sleepTime = extraSleepTime;
         } else {
             // sleep for just as long as we need to get back to having one byte to send
             sleepTime = max((uint32)ceil((double)(-realBytesToSpend + 1000)/allowedDataRate), TIME_BETWEEN_UPLOAD_LOOPS);
@@ -419,7 +438,34 @@
 
                 realBytesToSpend += allowedDataRate*timeSinceLastLoop;
 
+    // keep in mind that we don't count the IP overhead (~ +3% [~ 1500/1460])
+    // and we're prone to underestimate using an EWMA (~ +2% [simulation])
+    if ((uint32)(estSendBytesT/avgPeriod*1.05f) > allowedDataRate)
+        realBytesToSpend = 0;
+
                 bytesToSpend = realBytesToSpend/1000;
+
+    // debug message
+    
+    if (nextPrint < time(NULL)) {
+    	AddDebugLogLine(DLP_VERYLOW, false,
+        	_T("dLL: %u dLS: %u TS: %d B realTS: %d B SB: %d B DR: %u KB/s MaxUpload: %u KB/s dSS: %u B eSBT: %d B eSR: %d KB/s nFrag: %u ST: %u"),
+        	timeSinceLastLoop,
+        	timeSinceLastSpent,
+        	(int)bytesToSpend,
+        	(int)realBytesToSpend,
+        	(int)spentBytes,
+        	allowedDataRate / 1024,
+        	theApp.lastCommonRouteFinder->GetUpload() / 1024,
+        	doubleSendSize,
+        	(int)estSendBytesT,
+        	(int)(estSendBytesT / 10240),
+        	nFragments,
+        	sleepTime);
+
+    	nextPrint=time(NULL)+2;
+    }
+
             } else {
                 realBytesToSpend = _I64_MAX;
                 bytesToSpend = _I32_MAX;
@@ -431,9 +477,9 @@
 
  	lastLoopTick = thisLoopTick;
 
+  spentBytes = 0;
+  spentOverhead = 0;
         if(bytesToSpend >= 1) {
-      uint64 spentBytes = 0;
-      uint64 spentOverhead = 0;
     
       sendLocker.Lock();
     
@@ -471,7 +517,7 @@
     
       // Check if any sockets haven't gotten data for a long time. Then trickle them a package.
       for(uint32 slotCounter = 0; slotCounter < (uint32)m_StandardOrder_list.GetSize(); slotCounter++) {
-       ThrottledFileSocket* socket = m_StandardOrder_list.GetAt(slotCounter);
+    ThrottledFileSocket* socket = m_StandardOrder_list.GetAt(( slotCounter + rememberedSlotCounter ) % m_StandardOrder_list.GetSize());
     
        if(socket != NULL) {
         if(thisLoopTick-socket->GetLastCalledSend() > SEC2MS(1)) {
@@ -572,6 +618,36 @@
       m_SentBytesSinceLastCallOverhead += spentOverhead;
     
             sendLocker.Unlock();
+
+    // This one is another important change:
+    // if you send the thread to sleep you will never get good performance.
+    // The cpu is already freed by the system calls (like read and write on the sockets).
+    // With this change you will also get even less cpu usage by the throttling thread,
+    // because switching between sleep and run a lot of times
+    // could have a high impact on the scheduler and paging optimizations of the OS.
+    extraSleepTime = 0;
+
+   } else {
+  if (extraSleepTime == 0)
+  	extraSleepTime=1;
+
+  extraSleepTime = min(extraSleepTime * 5, 1000); // 1s at most
+   }
+
+  estSendBytesT += spentBytes + spentOverhead;
+
+  if (nextEstSBT < thisLoopTick) {
+
+  	timeSinceLastSpent = thisLoopTick - lastSpent;
+  	lastSpent = thisLoopTick;
+
+
+  	sint64 estSendBytesTOld = estSendBytesT;
+  	estSendBytesT -= (sint64)(estSendBytesT*(float)timeSinceLastSpent/(1000*avgPeriod));
+
+  	AddDebugLogLine(DLP_VERYLOW, false, _T("eSBT: %d eSBTO: %d tSLL: %u Rate: %.2f\n"), (int)estSendBytesT, (int)estSendBytesTOld, timeSinceLastSpent, (float)estSendBytesT/(1024*avgPeriod));
+
+  	nextEstSBT = thisLoopTick + 250;
         }
  }
 
@@ -589,4 +665,4 @@
  sendLocker.Unlock();
 
  return 0;
-}
\ No newline at end of file
+}

This post has been edited by lupzz: 20 January 2006 - 02:26 AM
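The trickle-loop hunk changes the index to `( slotCounter + rememberedSlotCounter ) % m_StandardOrder_list.GetSize()`. As I read it, this starts the scan at a rotating offset so the same slot is not always checked first. The index arithmetic in isolation, using a hypothetical helper `RotatedOrder`:

```cpp
#include <cstddef>
#include <vector>

// Visit every slot exactly once, starting at `start` and wrapping around,
// with the same arithmetic as (slotCounter + rememberedSlotCounter) % size.
// Returns the order in which slot indices are visited.
std::vector<std::size_t> RotatedOrder(std::size_t size, std::size_t start)
{
    std::vector<std::size_t> order;
    order.reserve(size);
    for (std::size_t i = 0; i < size; ++i)  // never runs when size == 0
        order.push_back((i + start) % size);
    return order;
}
```

With 4 slots and an offset of 1 the visit order is 1, 2, 3, 0; an empty list is never indexed, just like the loop in the patch.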


#2 zz

  • Group: Debugger
  • Posts: 2,014
  • Joined: 30-November 02

Posted 21 January 2006 - 05:06 PM

Can you zip the changed files and put them up for download somewhere? I've never bothered with patch on Windows.

/zz B)
ZZUL - get control of your uploads: ZZUL Forum

#3 lupzz

Posted 22 January 2006 - 02:11 AM

I've found a one-click free hosting service; I hope it works.

throttler-fix.tar.gz

#4 zz

Posted 22 January 2006 - 09:58 PM

Thanks. Got it.

Am I correct if I summarize the changes like this?

- Set larger send buffers
- Send more fragments/packages the higher the set bandwidth (does this work properly when no ul limit is set, or is it set to highest value then?)
- Don't allow it to Sleep() as long as there's bandwidth to spend (and no sleep at all if no upload limit is set?).
- bandwidth estimation, and overhead calculations based on that, to sometimes choke the bandwidth when going too high. (this doesn't seem to be related to the "send really fast" fixes though?)

/zz B)

This post has been edited by zz: 22 January 2006 - 10:00 PM


#5 SiRoB

  • Retired Morph Dev
  • Group: Members
  • Posts: 1,691
  • Joined: 28-June 03

Posted 22 January 2006 - 11:11 PM

The best thing would be a dynamic mechanism to avoid some of the extra WSAEWOULDBLOCK errors.

I tried without success; I hope you will. ;)
eMule 0.47c MorphXT v9.5 ::binary::source::

#6 lupzz

Posted 22 January 2006 - 11:25 PM

zz, on Jan 22 2006, 10:58 PM, said:

- Set larger send buffers

Sure, though more precisely it raises the limit of the dynamic send buffer, so it has looser bounds.

Because it is dynamic, it grows only when needed.

The same setting also increases the TCP window size, which is the more interesting result.

zz, on Jan 22 2006, 10:58 PM, said:

- Send more fragments/packages the higher the set bandwidth (does this work properly when no ul limit is set, or is it set to highest value then?)

It works anyway: the allowedDataRate variable always holds a reasonable value.

When a limit is set, it equals the limit; when no limit is set, it tunes itself.

zz, on Jan 22 2006, 10:58 PM, said:

- Don't allow it to Sleep() as long as there's bandwidth to spend (and no sleep at all if no upload limit is set?).

It sleeps gracefully when it has no data to send, also when no limit is set.
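The idle back-off in the patch works like this: `extraSleepTime` drops to 0 whenever a loop had bytes to spend, and otherwise grows five-fold up to 1 s. It goes from 0 straight to 5 ms, since it is first set to 1 and then multiplied. A standalone copy of that update rule (the function name `NextExtraSleep` is mine, for illustration):

```cpp
#include <algorithm>
#include <cstdint>

// Same update rule as extraSleepTime in the patch.
//   current:   the previous extraSleepTime in ms
//   hadBudget: true when the loop took the `bytesToSpend >= 1` branch
uint32_t NextExtraSleep(uint32_t current, bool hadBudget)
{
    if (hadBudget)
        return 0;                                  // busy: don't sleep next time
    if (current == 0)
        current = 1;
    return std::min<uint32_t>(current * 5, 1000);  // 5, 25, 125, 625, then 1000 ms
}
```

In the patch this value is used as the sleep time only when the rate is 0 or `_UI32_MAX`, or when at least one byte of budget (`realBytesToSpend >= 1000`) is already available; otherwise the original computed sleep still applies.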

zz, on Jan 22 2006, 10:58 PM, said:

- bandwidth estimation, and overhead calculations based on that, to sometimes choke the bandwidth when going too high. (this doesn't seem to be related to the "send really fast" fixes though?)

Well, I was quite surprised to discover that with my patch the original algorithm didn't limit anything at all... so I implemented a more precise method.

It estimates the bandwidth with an EWMA (exponentially weighted moving average), then limits according to that value.

The EWMA is the same technique many routers use for RED and traffic shaping, so it is very efficient.
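The estimator in the patch keeps a byte accumulator: every loop adds the bytes sent (payload plus overhead), and roughly every 250 ms it subtracts `acc * dt / (1000 * avgPeriod)`. In steady state that settles near `avgPeriod` seconds of traffic, so `acc / avgPeriod` approximates bytes per second. The class below is a restatement in floating point (the patch uses `sint64` and `float`); the name `RateEstimator` is mine, while the 1.05 padding comes from the patch.

```cpp
#include <cstdint>

// Restatement of the patch's estSendBytesT logic.
class RateEstimator {
public:
    explicit RateEstimator(uint32_t avgPeriodSec) : avgPeriod_(avgPeriodSec) {}

    // Called every throttler loop with spentBytes + spentOverhead.
    void AddSent(uint64_t bytes) { acc_ += static_cast<double>(bytes); }

    // Called about every 250 ms; dtMs is the time since the previous call.
    void Decay(uint32_t dtMs) { acc_ -= acc_ * dtMs / (1000.0 * avgPeriod_); }

    // Estimated upload rate in bytes per second.
    double BytesPerSec() const { return acc_ / avgPeriod_; }

    // The patch's choke test: pad the estimate by 5% (uncounted IP overhead
    // plus EWMA underestimation) and compare it with the allowed rate.
    // In the patch, a true result sets realBytesToSpend to 0.
    bool OverLimit(uint32_t allowedBytesPerSec) const {
        return BytesPerSec() * 1.05 > allowedBytesPerSec;
    }

private:
    uint32_t avgPeriod_;
    double acc_ = 0.0;
};
```

Feeding it a constant 100,000 bytes/s in 250 ms steps with `avgPeriod = 10`, it settles at about 97,500 bytes/s, i.e. about 2.5% low, which is in line with the "~ +2%" underestimate mentioned in the patch comment.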

Btw, your summary was precise enough :)

#7 lupzz

Posted 22 January 2006 - 11:30 PM

SiRoB, on Jan 23 2006, 12:11 AM, said:

The best thing would be a dynamic mechanism to avoid some of the extra WSAEWOULDBLOCK errors.

I will try to look at this problem too; btw, with this patch I already get full-speed slots.

In a design with this many layers it is always difficult to get that level of control...
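One possible shape for the dynamic mechanism SiRoB mentions, purely hypothetical (nothing like it is in the patch or in eMule; the class name, step and limits are assumptions): an AIMD controller per socket that grows the send size while send() accepts everything and halves it when send() reports WSAEWOULDBLOCK.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical per-socket send-size controller (additive increase,
// multiplicative decrease), driven by the result of each send() attempt.
class SendSizeController {
public:
    SendSizeController(uint32_t minSize, uint32_t maxSize, uint32_t step)
        : min_(minSize), max_(maxSize), step_(step), size_(minSize) {}

    // Bytes to offer the socket on the next call.
    uint32_t Size() const { return size_; }

    // wouldBlock: true when send() failed with WSAEWOULDBLOCK.
    void OnSendResult(bool wouldBlock) {
        if (wouldBlock)
            size_ = std::max(min_, size_ / 2);      // back off quickly
        else
            size_ = std::min(max_, size_ + step_);  // probe upward slowly
    }

private:
    uint32_t min_, max_, step_, size_;
};
```

Halving on back-pressure and growing slowly is the same idea TCP congestion control uses; whether it fits the throttler's timing is untested.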

I'll let you know anyway.

#8 zz

Posted 23 January 2006 - 08:21 AM

Btw, is the patch based on 0.46b? I get weird change points if I compare to 0.46c.

/zz B)

#9 lupzz

Posted 23 January 2006 - 10:29 AM

@zz
No, it's based on 0.45b; that's actually why I posted it as a patch.

But the patch applies to 0.46c without failures, so here are the patched files. I haven't compiled this version, but I think these are fine too.

throttler-fix-0.46.tar.bz2

#10 zz

Posted 23 January 2006 - 10:52 AM

I'll just diff towards 0.45b then. :) (just did: turns out nothing much happened up to 0.46b from 0.45b. Some changes in 0.46c may logically overlap with your bandwidth estimation though)

I'm trying to figure out why increasing doubleSendSize would make a difference. As I see it (with the normal-sized doubleSendSize), in the first loop it will send 2600 bytes to each socket. Then, when all sockets have gotten a call, if there's bandwidth left, it will loop over all sockets again and give each socket as much as it can take.

With the increase of doubleSendSize it will just allow Send() to give more to each socket in the first loop. But when Send() is called, it does several calls to the socket's send() anyway, so there would be about the same number of calls to each socket's send() method with both approaches.

Because of that, from a max-throughput pov I can't see that the increase of doubleSendSize would make much difference.
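This argument can be checked with a toy model (hypothetical, not eMule's actual Send()): assume each Send(maxBytes) issues one socket send() per packet piece until the budget is used up. Then giving a socket 2600 bytes first and the rest later costs about the same number of send() calls as giving it everything at once.

```cpp
#include <algorithm>
#include <cstdint>

// Toy model, not eMule code: Send(maxBytes) walks an endless queue of
// equally sized packets and issues one socket send() per packet piece.
struct ToySocket {
    uint32_t packetSize;           // bytes per queued packet (must be > 0)
    uint32_t offsetInPacket = 0;   // how much of the current packet is sent
    uint32_t sendCalls = 0;        // number of underlying send() calls

    void Send(uint32_t maxBytes) {
        while (maxBytes > 0) {
            // Send the rest of the current packet, or whatever budget is left.
            uint32_t chunk = std::min(maxBytes, packetSize - offsetInPacket);
            ++sendCalls;
            maxBytes -= chunk;
            offsetInPacket = (offsetInPacket + chunk) % packetSize;
        }
    }
};
```

With 1300-byte packets, 2600 + 23400 bytes and a single 26000-byte call both take 20 send() calls; with larger packets, splitting a budget into two Send() calls adds at most one extra send(), at least in this model.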

Have you verified that increasing doubleSendSize increases throughput compared to the normal-sized doubleSendSize, with your other changes in place? Do you have a theory as to why increasing doubleSendSize increases throughput?

/zz B)

This post has been edited by zz: 23 January 2006 - 10:53 AM


#11 lupzz

Posted 23 January 2006 - 02:56 PM

Well, yes.

Usually the IP MTU is 1500 (1492 over a PPPoE link)... this is quite common if there's no VPN or other unusual equipment in the middle. So the MSS is 1460 (1452).

Even with a very fancy VPN you would only get down to about 1412 (and a really new-age one at that :D).

In eMule the fragment size is set to 1300, though. I think that was chosen for compatibility with unusual network connections (and is a little paranoid too :D)
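The arithmetic behind these numbers, assuming IPv4 and TCP headers without options (20 + 20 bytes) and ignoring link-layer framing: MSS = MTU - 40, and the per-segment header cost is 40 / payload. If every 1300-byte fragment went out as its own segment, headers would cost about 3.1% instead of about 2.7% for full 1460-byte segments; TCP may coalesce small writes, so treat this as the worst case. The function names are illustrative.

```cpp
#include <cstdint>

// IPv4 header (20) + TCP header (20), no options; link-layer framing
// (Ethernet, PPPoE) is not counted.
constexpr uint32_t kIpTcpHeaderBytes = 40;

// Maximum segment size for a given IP MTU.
constexpr uint32_t MssFromMtu(uint32_t mtu) { return mtu - kIpTcpHeaderBytes; }

// Header bytes per payload byte when each segment carries `payload` bytes.
constexpr double HeaderOverhead(uint32_t payload)
{
    return static_cast<double>(kIpTcpHeaderBytes) / payload;
}
```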

But I didn't like the idea of setting it to a different value (1400 would be very compatible anyway).

So I scaled the number of fragments sent at once: if we have to send "a lot", we send more at once... thus reducing the overhead.

This way we can reduce overhead while still keeping a very compatible setting. (Nagle's algorithm should "partly" solve this anyway, but don't be so sure about it :D)

Another benefit of doing it this way is that it "makes things easier" for the OS's TCP scaling algorithm :)
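The scaling described here is the `nFragments` line in the throttler hunk. Pulled out into a standalone function (the name `FragmentsFor` is mine), it goes linearly from 2 fragments at 6 KB/s to 128 at 1200 KB/s, always rounded to an even number; `doubleSendSize` is then 1300 bytes times that. One deviation: the sketch also clamps the low end, because in the patch as posted `allowedDataRate - minScale` is unsigned and, as far as I can tell, wraps below 6 KB/s (the `< 6*1024` branch then overrides doubleSendSize, but the wrapped value still reaches the debug line).

```cpp
#include <algorithm>
#include <cstdint>

// Standalone version of the patch's nFragments computation, with the rate
// clamped on both ends (the patch only clamps the upper end).
uint16_t FragmentsFor(uint32_t allowedDataRate)
{
    const uint32_t maxScale = 1200 * 1024, minScale = 6 * 1024;
    const uint32_t minFragments = 2, maxFragments = 128;
    const float factor =
        static_cast<float>(maxFragments - minFragments) / 2 / (maxScale - minScale);

    const uint32_t rate = std::clamp(allowedDataRate, minScale, maxScale);
    // Linear interpolation in "pairs of fragments", rounded, then doubled
    // so the count stays even.
    const uint16_t pairs =
        static_cast<uint16_t>(factor * (rate - minScale) + minFragments / 2 + 0.5);
    return static_cast<uint16_t>(pairs * 2);
}
```

For example, at 100 KB/s this gives 12 fragments, i.e. a doubleSendSize of 15,600 bytes; from 1200 KB/s upwards (including an unlimited `_UI32_MAX` rate) it stays at 128.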

#12 zz

Posted 23 January 2006 - 05:12 PM

My point is that one call to Send() will become many calls to the socket's send() anyway. So when looking at the socket level, there should be a minimal difference.

/zz B)

#13 lupzz

Posted 23 January 2006 - 06:31 PM

Maybe, but I don't have numbers on this... I think AsyncSocketEx::Send is often called only once.

And keep in mind that the timing of the throttler and of the EMSocket is a little different.

Anyway, it seems cleaner done this way, but a complete redesign of the socket classes would be the best solution :huh:

#14 zz

Posted 23 January 2006 - 06:49 PM

IIRC the Send() method pulls the packets one by one and puts them on the socket. That's why there's more than one send() call per Send() with a large max size.

Have you tested with and without that part, but the other changes in place? Any difference?

I'm experimenting with skipping the sleep call, I'm guessing that and the larger send buffers would have the most effect. But since I have a slower connection I can only regression test them, I can't test that they have any effect on a faster connection. :)

/zz B)

#15 lupzz

Posted 23 January 2006 - 10:05 PM

@zz
Well, in the first tests it gave me good results, but the patch has changed a lot since then, so I really don't know...

I'll do this test once I've organized some testers, and I'll let you know (I don't know when, because it depends on other people too).

#16 zz

Posted 24 January 2006 - 01:50 PM

Please post back here if you get the chance to test it (I'm tracking the thread). I'm very curious about it. :) Sometimes real world behaviour doesn't match up to what's expected when looking at the code, so it's very possible it makes a difference.

/zz B)

#17 leuk_he

  • MorphXT team.
  • Group: Members
  • Posts: 5,975
  • Joined: 11-August 04

Posted 24 January 2006 - 02:16 PM

lupzz, on Jan 23 2006, 03:56 PM, said:

In eMule the fragment size is set to 1300, though. I think that was chosen for compatibility with unusual network connections (and is a little paranoid too :D)


I don't know where the value of 1300 came from, but I believe netf once showed that the overhead was lower if all clients used the same value. Tweaking that value will increase the overhead.

Unfortunately, I can't find the relevant posts.
Download the MorphXT emule mod here: eMule Morph mod

Trouble connecting to a server? Use kad and /or refresh your server list
Strange search results? Check for fake servers! Or download Morph, enable "obfuscated server required", and you'll see far fewer fake servers.

Looking for morphXT translators. If you want to translate the morph strings please come here (you only need to be able to write, no coding required. ) Covered now: cn,pt(br),it,es_t,fr.,pl Update needed:de,nl
-Morph FAQ [English wiki]--Het grote emule topic deel 13 [Nederlands]
If you want to send me a message, I will tell you to open up a topic in the forum. Other forum lurkers might be helped as well.

#18 zz

Posted 24 January 2006 - 02:36 PM

luke ;) here are some threads related to eMule TCP overhead:

http://forum.emule-p...showtopic=42432

http://forum.emule-p...showtopic=51076

/zz B)

#19 zz

Posted 30 January 2006 - 09:40 AM

So, I've been playing with these modifications and some other, and I get a strange result...

The only way for me to test high speeds is to run a private server and two clients on my gigabit network at home. The first test showed an OK result: I got about 3.5 MB/s transfer speed from computer A->B.

But when I tried to transfer the other way around, from B->A, I got between 50 and 150 KB/s at most. :confused:

The eMule settings seem to be the same on both computers, and they use the same executable. They both run an up-to-date Win XP. Don't know what's going on. :) Maybe some TCP settings differ between the computers, but I'm not sure what to look for.

Anyway, one of my eMules seems to be able to reach at least 3.5 MB/s.

/zz B)

#20 lupzz

Posted 30 January 2006 - 10:52 AM

@zz
I think one or more of the following may be the reason:
- Windows without tuned TCP registry settings (RWIN, SACK, timestamps disabled, etc.)
- very low CPU/RAM on one of the machines
- check the download/upload limits on both eMules (once I was testing the mod with a marvellous 36 KB/s limit :-k)
- check that your network device drivers and so on are up to date
- check that you're not running some annoying network software
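On the RWIN point: TCP's window field is only 16 bits, so a receive window above 65,535 bytes (like the 256 KB the patch requests) can only be advertised if both ends negotiate the RFC 1323 window-scale option; on Windows 2000/XP that is governed by the `Tcp1323Opts` registry value. The smallest shift a given window needs, using a hypothetical helper `WindowScaleShift`:

```cpp
#include <cstdint>

// Smallest RFC 1323 window-scale shift that lets `windowBytes` be advertised
// through TCP's 16-bit window field. RFC 1323 caps the shift at 14.
int WindowScaleShift(uint32_t windowBytes)
{
    int shift = 0;
    while (shift < 14 && (uint64_t{65535} << shift) < windowBytes)
        ++shift;
    return shift;
}
```

So the 256 KB buffer needs a shift of 3; without window scaling the advertised receive window tops out at 64 KB no matter what SO_RCVBUF is set to.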