Official eMule-Board: Improve To Kad's Cuint128 - Official eMule-Board

Improve To Kad's Cuint128 toBinaryString & toHexString

#1 User is offline   Avi-3k 

  • hebMule [retired] dev
  • Group: Members
  • Posts: 1,127
  • Joined: 25-June 03

Posted 08 September 2005 - 08:11 PM

i saw some mods use netfinity's code to improve
the performance of the CUInt128::toBinaryString() &
CUInt128::toHexString() functions.

first, i checked the code, and netfinity's change to
CUInt128::toHexString() is wrong: CUInt128 treats the
m_data array as 32-bit values, while EncodeBase16()
treats its input as unsigned char (byte by byte)...
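
To make the byte-order point concrete, here is a minimal sketch. It is not netfinity's code or eMule's EncodeBase16(): hexByBytes and hexByWords are hypothetical names, and std::string stands in for CString. It encodes data once byte by byte and once as 32-bit words. On a little-endian x86 machine the value 0x12345678 sits in memory as the bytes 78 56 34 12, so a byte-wise encoder would print each word with its bytes reversed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical byte-wise encoder: walks the buffer one byte at a time.
std::string hexByBytes(const unsigned char* buf, int nBytes)
{
    assert(buf != nullptr && nBytes >= 0);
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(static_cast<std::size_t>(nBytes) * 2);
    for (int i = 0; i < nBytes; ++i) {
        out += digits[buf[i] >> 4];    // high nibble of this byte
        out += digits[buf[i] & 0x0F];  // low nibble of this byte
    }
    return out;
}

// Hypothetical word-wise encoder: prints each 32-bit value,
// most significant nibble first, regardless of memory layout.
std::string hexByWords(const std::uint32_t* buf, int nWords)
{
    assert(buf != nullptr && nWords >= 0);
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(static_cast<std::size_t>(nWords) * 8);
    for (int i = 0; i < nWords; ++i)
        for (int shift = 28; shift >= 0; shift -= 4)
            out += digits[(buf[i] >> shift) & 0x0F];
    return out;
}
```

For a word holding 0x12345678, hexByWords gives "12345678", while hexByBytes over the byte sequence 78 56 34 12 gives "78563412".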

second, i wanted to use his code, but since it's not
the right solution, i wrote 2 functions to do the job:

Quote

// Avi3k: improve toString
CString kadEncodeBase2(const uint32* pBuffer, int nLen)
{
  static const char base2Chars[3] = "01";
  char str[129] = {0};
  char* p = str;
  for (int i = 0; i < nLen; i++)
  {
    for (int j = 31; j >= 0; j--)
    {
      *p++ = base2Chars[ (pBuffer[i] >> j) & 1 ];
    }
  }
  *p = 0;
  return CString(str);
}

CString kadEncodeBase16(const uint32* pBuffer, int nLen)
{
  static const char base16Chars[17] = "0123456789ABCDEF";
  char str[33] = {0};
  char* p = str;
  for (int i = 0; i < nLen; i++)
  {
    for (int j = 28; j >= 0; j -= 4)
    {
      *p++ = base16Chars[ (pBuffer[i] >> j) & 0xf ];
    }
  }
  *p = 0;
  return CString(str);
}
// end Avi3k: improve toString

function call (within the class):

Quote

kadEncodeBase2/16(reinterpret_cast<const uint32*>(m_data), ARRSIZE(m_data))


as u can see, i didn't change Kad's code yet,
i want to know if this code is really better and also
if it's ok to modify Kad's code with these changes...
if so, i plan on merging the code to CUInt128 class
(will also save a function call :lol: )
fyi, i checked the code more than once in a
separate (VC) project with the functions taken from
eMule to compare the functions' output, it worked 100% :+1:

btw, <OffTopic>
i saw some files are still in the source code and are not used:
ListBoxST.cpp/h, LayeredWindowHelperST.cpp/h, BtnST.cpp/h...
didn't want to open a new topic just for that :P
</OffTopic>

Regards,
Avi3k

This post has been edited by Avi-3k: 10 September 2005 - 07:09 PM

retired developer of hebMule and eMule Skinner...
hebMule site and topic.
hebMule2 unique features: AntiLeech, AntiVirus, Fake Check, ServerFilter, WebSearches, Export Searches, Relative Priority, ModID and much much more...

eMule Skinner is an application to create/edit skins for eMule,
it's multilingual, supports mods, easy-to-use design, integrates to hebMule & Windows and lots more...

code fixes/improvements: #1, #2, #3, #4, #5, #6, #7, #8, #9, #10, #11 (to check/verify: #12, #13).

#2 User is offline   Tuxman 

  • lizzie and prog-rock fanatic
  • Group: Members
  • Posts: 2,692
  • Joined: 26-July 04

Posted 08 September 2005 - 08:22 PM

Avi-3k, on Sep 8 2005, 10:11 PM, said:

i saw some files are still in the source code and are not used:
ListBoxST.cpp/h, LayeredWindowHelperST.cpp/h, BtnST.cpp/h...

minimule.h:
#include "LayeredWindowHelperST.h"


You're right with the other files, though...
(note: I myself use BtnST for some extra buttons in transfer wnd)
[ eMule beba ] :: v2.72 released, v3.00 in the works ...
- feel the lightweight! - featuring Snarl support, the Client Analyzer and tits!
Coded by a Golden eMule Award winner and most people's favorite modder!
..........................................
Music, not muzak:
Progressive Rock :: my last.fm profile
..........................................
eMule user since 0.28 ...
-[ ... and thanks for all the fish! ]-

#3 User is offline   Avi-3k 

  • hebMule [retired] dev
  • Group: Members
  • Posts: 1,127
  • Joined: 25-June 03

Posted 08 September 2005 - 08:59 PM

thanx, i didn't catch that one since all of these files
are from the same code of the old prefs dlg... lol
but i checked the code and it's only used for the transparency percentage
(calling a function from user32.dll, which can easily
be transferred to another place).

Avi3k

#4 User is offline   Tuxman 

  • lizzie and prog-rock fanatic
  • Group: Members
  • Posts: 2,692
  • Joined: 26-July 04

Posted 08 September 2005 - 09:23 PM

Avi-3k, on Sep 8 2005, 10:59 PM, said:

which can easily be transferred to another place

Obviously it was the easiest way to put it into minimule.cpp/.h as it's used only there. :P

#5 User is offline   CiccioBastardo 

  • Doomsday Executor
  • Group: Italian Moderators
  • Posts: 5,541
  • Joined: 22-November 03

Posted 08 September 2005 - 09:29 PM

Why not unroll this loop? It's only eight iterations:
for (int j = 28; j >= 0; j -= 4)
{
  str += base16Chars[ (pBuffer[i] >> j) & 0xf ];
}

unrolled (same order as the loop, most significant nibble first):
str += base16Chars[ (pBuffer[i]>>28) & 0x0f ];
str += base16Chars[ (pBuffer[i]>>24) & 0x0f ];
str += base16Chars[ (pBuffer[i]>>20) & 0x0f ];
str += base16Chars[ (pBuffer[i]>>16) & 0x0f ];
str += base16Chars[ (pBuffer[i]>>12) & 0x0f ];
str += base16Chars[ (pBuffer[i]>>8 ) & 0x0f ];
str += base16Chars[ (pBuffer[i]>>4 ) & 0x0f ];
str += base16Chars[ (pBuffer[i]>>0 ) & 0x0f ]; // >>0 just for symmetry ;)


I don't know whether it's better to do the AND first and then the >>, or to do it as above. I mean:
...
str += base16Chars[ (pBuffer[i] & 0x00000f00) >> 8];
...


/edit: I suppose the compiler can optimize all those appends in the unrolled version, so the programmer doesn't need to write a single sum expression to get faster code.
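
On the shift-then-mask versus mask-then-shift question, a minimal sketch (the function names are hypothetical, not from eMule) shows that both forms select the same nibble. The practical difference is that shifting first keeps the mask constant at 0xF, while masking first needs a mask that moves with the nibble.

```cpp
#include <cassert>
#include <cstdint>

// Shift first, then mask: the mask is always 0xF.
inline unsigned nibbleShiftThenMask(std::uint32_t v, int shift)
{
    assert(shift >= 0 && shift <= 28 && shift % 4 == 0);
    return (v >> shift) & 0xFu;
}

// Mask first, then shift: the mask has to be moved to the nibble's position.
inline unsigned nibbleMaskThenShift(std::uint32_t v, int shift)
{
    assert(shift >= 0 && shift <= 28 && shift % 4 == 0);
    return (v & (0xFu << shift)) >> shift;
}
```

Both return 6 for the value 0x12345678 and a shift of 8, so the choice is mostly about readability.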

This post has been edited by CiccioBastardo: 08 September 2005 - 09:33 PM

The problem is not the client, it's the user

#6 User is offline   Avi-3k 

  • hebMule [retired] dev
  • Group: Members
  • Posts: 1,127
  • Joined: 25-June 03

Posted 09 September 2005 - 07:26 AM

@CB
both functions are linear (O(n)),
so it's not that much of a difference,
but the loop saves us 7 more copies of the same code...

btw, i would keep the order of the ops (shift, then and);
it looks better and won't confuse people, in my opinion...

@Tuxman
maybe a new code for me to work on :+1:

Avi3k

#7 User is offline   Kry 

  • No Support
  • Group: Member_D
  • Posts: 2,018
  • Joined: 27-June 03

Posted 09 September 2005 - 09:27 AM

Unrolling loops was cool many years ago, but nowadays any average compiler does that for you.
Retired aMule developer.
Minister of Strange Operative Systems and Sarcasm (S.O.S & S) in President Birk's New World Order

#8 User is offline   CiccioBastardo 

  • Doomsday Executor
  • Group: Italian Moderators
  • Posts: 5,541
  • Joined: 22-November 03

Posted 09 September 2005 - 10:39 AM

I'm a programmer of old times ;)
Unrolling always saves jumps and variable modification instructions.
If the work is done by the compiler then good. But do you trust it? :P

#9 User is offline   Avi-3k 

  • hebMule [retired] dev
  • Group: Members
  • Posts: 1,127
  • Joined: 25-June 03

Posted 09 September 2005 - 12:47 PM

Microsoft's software...? nope :lol:

since i'm not as experienced as you guys, can u tell
me how VC will optimize Ciccio's code, if at all? thanx...

Avi3k

#10 User is offline   CiccioBastardo 

  • Doomsday Executor
  • Group: Italian Moderators
  • Posts: 5,541
  • Joined: 22-November 03

Posted 09 September 2005 - 02:46 PM

My code is already super-optimized :P

BTW, microshit's compiler would probably create a loop out of it :D

#11 User is offline   gcostanza 

  • Philosopher
  • Group: Members
  • Posts: 805
  • Joined: 10-October 04

Posted 09 September 2005 - 04:06 PM

Why should the compiler roll or unroll anything in this snippet automatically? The CPU can execute most of this code in parallel in both cases. The only variable that changes is the CString (and you have to stop the parallelism at the call to operator+ anyway unless it's inlined). There are no interdependencies between the counters (actually the counter), and no nested loops either. The assembly will be pretty much a word-for-word translation of the code.
"Computer Science is no more about computers than astronomy is about telescopes."
-- E. W. Dijkstra
"Computers are useless. They can only give you answers."
-- Pablo Picasso

#12 User is offline   Avi-3k 

  • hebMule [retired] dev
  • Group: Members
  • Posts: 1,127
  • Joined: 25-June 03

Posted 09 September 2005 - 05:15 PM

thanx gcostanza,
i thought it would copy the code word by word
and i hope i understood the rest of ur post :flowers:

@CB
lol, so does this mean my code is super-not-that-optimized? :lol:

Avi3k

#13 User is offline   CiccioBastardo 

  • Doomsday Executor
  • Group: Italian Moderators
  • Posts: 5,541
  • Joined: 22-November 03

Posted 09 September 2005 - 06:35 PM

Well, parallelism applies to simple instructions.
If you avoid jumps, parallelism works better, hence the unrolling of the loop. Especially conditional jumps, such as the one back to the start of the loop.
Independently of parallelism, unrolling loops has been the old "magic" for making programs run faster. But alas, nowadays we have 4GHz CPUs and 400MB compilers, so you don't really know whether it is still needed.

See the specs for the latest M$ with Vista to know what I mean :-k
(did they unroll all their loops to need so much memory? :P)

#14 User is offline   SlugFiller 

  • The one and only master slug
  • Group: Members
  • Posts: 6,988
  • Joined: 15-September 02

Posted 10 September 2005 - 10:36 AM

On the concept of auto-unrolling, there's a compiler option for that in VS, in case none of you noticed. I think it's active in the release build, not sure about the beta build, definitely off in debug.

That aside, the branch cost for the loop is insignificant compared to the hundreds of branches, calls and other various actions in CString::operator+, not to mention the possibility of going down to the kernel for memory allocation.
So, CB, you failed at the most basic principle of optimizing: optimize at the bottleneck. The loop itself is definitely not it in this case.
Why haven't you clicked yet?

SlugFiller rule #1: Unsolicited PMs is the second most efficient method to piss me off.
SlugFiller rule #2: The first most efficient method is unsolicited eMails.
SlugFiller rule #3: If it started in a thread, it should end in the same thread.
SlugFiller rule #4: There is absolutely no reason to perform the same discussion twice in parallel, especially if one side is done via PM.
SlugFiller rule #5: Does it say "Group: Moderators" under my name? No? Then stop telling me about who you want to ban! I really don't care! Go bother a moderator.
SlugFiller rule #6: I can understand English, Hebrew, and a bit of Japanese(standard) and Chinese(mandarin), but if you speak to me in anything but English, do expect to be utterly ignored, at best.

#15 User is offline   CiccioBastardo 

  • Doomsday Executor
  • Group: Italian Moderators
  • Posts: 5,541
  • Joined: 22-November 03

Posted 10 September 2005 - 01:23 PM

Every little bit helps :P

#16 User is offline   Avi-3k 

  • hebMule [retired] dev
  • Group: Members
  • Posts: 1,127
  • Joined: 25-June 03

Posted 10 September 2005 - 02:46 PM

@CB
double meaning... :lol:

@all
so my code is ok? can i use it or should i wait for
an official dev's reply?

btw, how can i compare the codes' speed?
(a friend of mine suggested an external program,
but i'd prefer an internal code...)

Avi3k

#17 User is offline   tHeWiZaRdOfDoS 

  • Man, what a bunch of jokers...
  • Group: Members
  • Posts: 5,630
  • Joined: 28-December 02

Posted 10 September 2005 - 03:00 PM

An easy test would be something like:

Quote

const DWORD start = ::GetTickCount();

//execute your code with random parameters a few thousand/million times

const DWORD end = ::GetTickCount();

//get time diff


IMHO if it's proven to work as expected there should be no necessity to wait...
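
One caveat with the GetTickCount approach: per Microsoft's documentation its resolution is limited to the system timer, typically in the range of 10 to 16 milliseconds, so a short run can easily measure as 0. A minimal timing sketch follows, using std::chrono::steady_clock as a portable stand-in for the Windows timers. timeMillis and workUnderTest are hypothetical names; swap in the encoder you actually want to measure.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical function under test; replace with the code being measured.
inline std::uint32_t workUnderTest(std::uint32_t x)
{
    return (x >> 4) ^ (x * 2654435761u);
}

// Calls fn `iterations` times with changing input and returns the
// elapsed wall-clock time in milliseconds.
template <typename Fn>
double timeMillis(Fn fn, int iterations)
{
    assert(iterations > 0);
    volatile std::uint32_t sink = 0;  // results are stored here so the calls stay live
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = fn(static_cast<std::uint32_t>(i) + sink);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling timeMillis(workUnderTest, 10000000) for each variant and comparing the results over several runs gives a rough picture; the iteration count should be large enough that the total is well above the timer resolution.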

#18 User is offline   gcostanza 

  • Philosopher
  • Group: Members
  • Posts: 805
  • Joined: 10-October 04

Posted 10 September 2005 - 04:42 PM

Let me play the engineer here and actually test stuff. I had a book somewhere about assembly optimizations but I've misplaced it. The Itanium guide is very confusing like all Itanium things so it's of no help.

Whereas the Intel compiler has a switch to unroll loops, my VS2005 beta2 doesn't have the option anymore (or at least I can't find it). In fact it does not unroll anything in Release. Here's a snippet of the code it produces with optimizations for speed.

Inside the loop:
004018E8  mov         edx,dword ptr [edi] 
004018EA  mov         ecx,esi 
004018EC  sar         edx,cl 
004018EE  lea         ecx,[esp+8] 
004018F2  and         edx,0Fh 
004018F5  movzx       eax,byte ptr [esp+edx+0Ch] 
004018FA  push        eax  
004018FB  call        dword ptr [__imp_ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >::operator+= (40202Ch)] 
00401901  sub         esi,4 
00401904  jns         test1+78h (4018E8h) 


Unrolled function snippet:
004017C8  mov         edx,dword ptr [esi] 
004017CA  sar         edx,0Ch 
004017CD  and         edx,0Fh 
004017D0  movzx       eax,byte ptr [esp+edx+8] 
004017D5  push        eax  
004017D6  lea         ecx,[esp+8] 
004017DA  call        dword ptr [__imp_ATL::CStringT<wchar_t,StrTraitMFC_DLL<wchar_t,ATL::ChTraitsCRT<wchar_t> > >::operator+= (40202Ch)] 


Nothing interesting. The loop has some extra instructions. Measuring speeds using QueryPerformanceCounter, in a similar way to what WiZaRd proposed, gives a tie. I didn't have the patience to wait long, but more or less sometimes one is faster, sometimes the other, according to the random glitches in the Matrix.

Here comes the cool part, though. VS2005 has "Profile guided optimization" option. Let's see...
...
...
The loop is not unrolled and my 2 test functions are inlined. Very much same code is produced.

Of course I agree with SF: the bottleneck is the CString. As for unrolling, the Intel compiler has the option (ICC -unroll), but it is not imperative; the compiler will unroll only what it deems necessary. Go figure.

#19 User is offline   Avi-3k 

  • hebMule [retired] dev
  • Group: Members
  • Posts: 1,127
  • Joined: 25-June 03

Posted 10 September 2005 - 05:14 PM

@Wiz
tried the very same code and it showed 0 to both funcs... :\

@gcostanza
thanx for the explanation :flowers:
if the bottleneck is in the CString object, it can easily be changed
to use a (t)char array and a pointer to go through it. after all,
both functions (my versions) return a string with constant length...

@all
i'm gonna change the code to remove the CString bottleneck
and update the thread later...

and since there's no real difference between rolling and unrolling,
i think i should keep the loop, despite the extra instructions...

Avi3k

#20 User is offline   tHeWiZaRdOfDoS 

  • Man, what a bunch of jokers...
  • Group: Members
  • Posts: 5,630
  • Joined: 28-December 02

Posted 10 September 2005 - 05:23 PM

Avi-3k, on Sep 10 2005, 06:14 PM, said:

@Wiz
tried the very same code and it showed 0 to both funcs... :\


Of course! Be careful NOT to use optimizations or the compiler will most probably remove the unneeded calls.
Also, using static values to test the function should be avoided as the compiler might also just put out the results...

I tried some time ago and found a way to have a delay of exactly the time I needed (for serial programming) - it was something like:

Quote

for(int i = 0; i < 1982; ++i)
  int j = 37*99;

This worked great in debug but in release it was replaced by:

Quote

for(int i = 0; i < 1982; ++i)
  int j = 3663;

which was replaced by

Quote

int j = 3663;

which was finally deleted as it's unnecessary and therefore there was no delay anymore :cry2:
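
If a busy-wait loop like this really has to survive an optimized build, one common approach is to make the loop variable volatile: reads and writes of a volatile object count as observable behavior, so the compiler has to perform every one of them. A minimal sketch (spinDelay is a hypothetical name; the actual delay still depends on the CPU, so a timer-based wait is usually the more reliable choice):

```cpp
#include <cassert>

// Busy-wait sketch: the volatile counter forces a real load and store on
// every iteration, so the loop cannot be folded away as dead code.
inline int spinDelay(int iterations)
{
    assert(iterations >= 0);
    volatile int counter = 0;
    for (int i = 0; i < iterations; ++i)
        counter = counter + 1;
    return counter;
}
```

Calling spinDelay(1982) performs 1982 real increments in both debug and release builds.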

This post has been edited by tHeWiZaRdOfDoS: 10 September 2005 - 05:24 PM
