Official eMule-Board: [patches] The Optimization Thread (Off topic) - Official eMule-Board


[patches] The Optimization Thread (Off topic)

#1 User is offline   .:fl0yd:. 

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 21 February 2003 - 01:27 AM

Loop unrolling?!? Give me a break, script kiddies, this is the year 2003. I'm not exactly sure what year you are stuck in, but if anyone around here knew anything about modern CPUs, they would quickly realize how ridiculous loop unrolling is once your unrolled loop no longer fits within a 32-byte cache line. Two more buzzwords just for the hell of it: branch prediction and the micro-op cache on P4s. You guys make me laugh really hard. Thanks a lot.

#2 User is offline   Avi 

  • Golden eMule
  • Group: Members
  • Posts: 1460
  • Joined: 11-September 02

Posted 21 February 2003 - 03:02 AM

.:fl0yd:., on Feb 20 2003, 08:27 PM, said:

Loop unrolling?!? Give me a break, script kiddies, this is the year 2003. I'm not exactly sure what year you are stuck in, but if anyone around here knew anything about modern CPUs, they would quickly realize how ridiculous loop unrolling is once your unrolled loop no longer fits within a 32-byte cache line. Two more buzzwords just for the hell of it: branch prediction and the micro-op cache on P4s. You guys make me laugh really hard. Thanks a lot.

We are happy today, aren't we? ;)

#3 User is offline   hstink 

  • Member
  • Group: Members
  • Posts: 28
  • Joined: 10-January 03

Posted 21 February 2003 - 04:57 AM

Loop unrolling?!? Give me a break, script kiddies, this is the year 2003. I'm not exactly sure what year you are stuck in, but if anyone around here knew anything about modern CPUs, they would quickly realize how ridiculous loop unrolling is once your unrolled loop no longer fits within a 32-byte cache line. Two more buzzwords just for the hell of it: branch prediction and the micro-op cache on P4s. You guys make me laugh really hard. Thanks a lot.

What's your source for the P4's cache line size being 32 bytes?

-h

#4 User is offline   Maccara 

  • Member
  • Group: Members
  • Posts: 34
  • Joined: 17-September 02

Posted 21 February 2003 - 11:15 AM

@.:fl0yd:.

Relax dude. :P

As I said, I don't know about P4s. Athlons have a 64K instruction cache, which DOES benefit from loop unrolling. (AMD strongly recommends it.)

Of course, this depends entirely on the loop. That's why I asked for someone to test. It's difficult to compare results as we don't have the same setups, which is why I'd like to do instruction pipeline profiling.

B Rgds,
Maccara

#5 User is offline   SunMaster 

  • Premium Member
  • Group: Members
  • Posts: 277
  • Joined: 03-December 02

Posted 21 February 2003 - 11:29 AM

Quote

Loop unrolling?!? Give me a break, script kiddies, this is the year 2003. I'm not exactly sure what year you are stuck in, but if anyone around here knew anything about modern CPUs, they would quickly realize how ridiculous loop unrolling is once your unrolled loop no longer fits within a 32-byte cache line. Two more buzzwords just for the hell of it: branch prediction and the micro-op cache on P4s. You guys make me laugh really hard. Thanks a lot.


Instead of belittling and ridiculing other people's work, which the rest of us eMule users currently benefit from, why don't you share your knowledge with the rest of us and present something even better?

As for terminology and buzzwords, a script kiddie is generally someone who uses other people's code without knowing anything about how it actually works; your typical script kiddie's knowledge is limited to executing ./configure. Tossing buzzwords left and right without having shown anything to prove your self-proclaimed vast knowledge makes you look like a script kiddie, not those who contribute.

I take it it's time for unknown1 to delete some more posts, including this one.

Oh, and feel free to explain to me in another thread why an integer is signed per definition. I'm sure that tidbit will revolutionize computing.

#6 User is offline   .:fl0yd:. 

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 21 February 2003 - 12:06 PM

hstink, on Feb 21 2003, 04:57 AM, said:

What's your source for the P4's cache line size being 32 bytes?

-h

Did I say that? While I didn't state it explicitly, I was talking about the PIII's cache line boundary. It may be 64 bytes on a P4, although I doubt that. In any case, a P4 takes a genuinely different approach to decoding, storing, scheduling and feeding micro-ops to its RISC core, one that doesn't profit from loop unrolling at all. Since P4s store the last decoded opcodes in a ring buffer, the loop will most likely be faster, as it repeatedly executes the same instructions, while the unrolled loop has to decode each and every imul instruction over and over again.
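
For readers who want to see the two shapes being argued about, here is a minimal sketch, not taken from any eMule patch. The function names dot_rolled and dot_unrolled4 are made up for illustration. Both compute the same multiply-accumulate; the second trades fewer loop branches for a loop body four times as large, and which one is faster on a given CPU is exactly what would have to be measured.

```cpp
#include <cstddef>
#include <cstdint>

// Plain loop: one multiply-accumulate and one conditional branch per element.
std::uint32_t dot_rolled(const std::uint32_t* a, const std::uint32_t* b,
                         std::size_t n) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Manually unrolled by four: one branch per four elements, but the loop
// body contains four multiplies that all have to be fetched and decoded.
std::uint32_t dot_unrolled4(const std::uint32_t* a, const std::uint32_t* b,
                            std::size_t n) {
    std::uint32_t sum = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i]     * b[i];
        sum += a[i + 1] * b[i + 1];
        sum += a[i + 2] * b[i + 2];
        sum += a[i + 3] * b[i + 3];
    }
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Note that an optimizing compiler may well unroll the first version on its own, so the source-level difference does not guarantee a difference in the generated code.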

#7 User is offline   Maella 

  • Magnificent Member
  • Group: Members
  • Posts: 410
  • Joined: 27-December 02

Posted 21 February 2003 - 12:15 PM

Hi,

Did I miss something important? I haven't been back to this thread since my last post B) .

@Unknown1

I hope that you won't waste too much time deleting posts.
Let me remind you that you still have a lot of coding work to do ;) !

Maella

PS: Unknown1, you can delete this post

#8 User is offline   .:fl0yd:. 

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 21 February 2003 - 12:23 PM

Maccara, on Feb 21 2003, 11:15 AM, said:

As I said, I don't know about P4s. Athlons have a 64K instruction cache, which DOES benefit from loop unrolling. (AMD strongly recommends it.)

I'm not quite sure where exactly you got that information. Loop unrolling today has its value, but only in very limited scenarios, e.g. matrix multiplication in inner loops, if and only if the matrices are small, i.e. 4x4 or less. Moreover, the code needs to be rewritten and not merely expanded. In the case of matrix multiplication you would use a CPU's vector unit (SSE/SSE2/3DNow!/...), which is capable of performing SIMD instructions -- this being the only noticeable speedup.

So why are loops all that bad? It's conditional jumps that, when mispredicted, cause the pipeline to be flushed. With good branch prediction these days, it will fail twice at most (the first static branch prediction [which a modern compiler will prevent from failing] and the last dynamic branch prediction when the exit condition is met for the first time) for any given loop; only once at most in consecutive calls and very likely not at all.

If you are still convinced that you gain something through loop unrolling, make it at least automatic. As far as I know there are two valid approaches:
  • if you don't mind polluting the global namespace with macros, you could use Duff's device
  • far superior, though, is template metaprogramming
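
The two approaches in the list above can be sketched roughly as follows. This is an illustration only, under a few assumptions: the names copy_duff, Unroll, Dot4 and dot4 are invented here, and Duff's device is written as a plain function rather than the macro form the post has in mind.

```cpp
#include <cstddef>

// Duff's device: the switch jumps into the middle of a do-while loop that
// is unrolled by four, so the remainder (count % 4) is handled on the
// first pass and every later pass copies four elements.
void copy_duff(int* to, const int* from, std::size_t count) {
    if (count == 0) return;
    std::size_t passes = (count + 3) / 4;
    switch (count % 4) {
    case 0: do { *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}

// Template metaprogramming: Unroll<N>::run expands at compile time into
// N consecutive calls f(base), f(base + 1), ..., f(base + N - 1).
template <std::size_t N>
struct Unroll {
    template <typename F>
    static void run(F& f, std::size_t base) {
        Unroll<N - 1>::run(f, base);
        f(base + N - 1);
    }
};

// Recursion anchor: unrolling zero iterations does nothing.
template <>
struct Unroll<0> {
    template <typename F>
    static void run(F&, std::size_t) {}
};

// Small function object accumulating a[i] * b[i].
struct Dot4 {
    const int* a;
    const int* b;
    int sum;
    void operator()(std::size_t i) { sum += a[i] * b[i]; }
};

// Dot product of two 4-element arrays without a runtime loop.
int dot4(const int* a, const int* b) {
    Dot4 f = {a, b, 0};
    Unroll<4>::run(f, 0);
    return f.sum;
}
```

With the template version the unroll factor is a single compile-time constant, so changing it does not mean rewriting the loop body by hand.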

As a side note: if a compiler is 'bad at loop unrolling', it is so on purpose more often than by accident. After all, modern compilers do a lot of calculation during the optimization stage and decide whether something is worthwhile by the power of pure mathematics. Pretty hard to beat with human brains.

#9 User is offline   Maella 

  • Magnificent Member
  • Group: Members
  • Posts: 410
  • Joined: 27-December 02

Posted 21 February 2003 - 12:32 PM

Hi,

Maybe this isn't the best place to post this link, but I think some of you might be interested.

Pentium 4: Round 1 at www.emulators.com

It's a long, in-depth analysis of the Pentium 4 architecture from a coder's point of view.

If anybody else has good links about code optimization, please feel free to share them with the others.

:) Maella

PS: Unknown1, please feel free to move this post to a new thread

This post has been edited by Maella: 21 February 2003 - 03:35 PM


#10 User is offline   .:fl0yd:. 

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 21 February 2003 - 12:36 PM

SunMaster, on Feb 21 2003, 11:29 AM, said:

Oh, and feel free to explain to me in another thread why an integer is signed per definition. I'm sure that tidbit will revolutionize computing.

Glad to be able to revolutionize your little world then. Take a look at the ANSI C++ standard, section 3.9.1, paragraph 2:

Quote

There are four signed integer types: “signed char”, “short int”, “int”, and “long int.”
Have fun with your broadened horizon...
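
The quoted paragraph can be checked mechanically. The sketch below assumes a C++11 compiler for static_assert and <type_traits>, which postdates this thread; the constant plain_char_is_signed is a name made up here.

```cpp
#include <limits>
#include <type_traits>

// Plain short, int and long are signed types.
static_assert(std::is_signed<short>::value, "short is signed");
static_assert(std::is_signed<int>::value, "int is signed");
static_assert(std::is_signed<long>::value, "long is signed");

// "int" and "signed int" name the same type; "unsigned int" is distinct.
static_assert(std::is_same<int, signed int>::value,
              "int and signed int are the same type");
static_assert(!std::is_same<int, unsigned int>::value,
              "unsigned int is a different type");
static_assert(std::is_unsigned<unsigned int>::value,
              "unsigned int is unsigned");

// Plain char is the exception: whether it is signed is left to the
// implementation, so it can only be queried, not asserted.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;
```

If any of the assertions were wrong, the file would simply fail to compile.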

#11 User is offline   SunMaster 

  • Premium Member
  • Group: Members
  • Posts: 277
  • Joined: 03-December 02

Posted 21 February 2003 - 12:50 PM

Maybe you should re-read your own quote.

There is a difference between "default to" and "per definition", at least for most of us.

#12 User is offline   .:fl0yd:. 

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 21 February 2003 - 01:10 PM

Not quite sure what exactly your problem is. An int is signed by definition, as I have already stated twice in this thread. Would you mind reading your request for an explanation, and my answer to it, again? Now, enough of that childish crap, SunMaster. If you have something to say, then do so. If you don't, try not to do it all that loudly then, ok?

#13 User is offline   JustusJonas 

  • Advanced Member
  • Group: Members
  • Posts: 55
  • Joined: 09-November 02

Posted 21 February 2003 - 06:40 PM

@ floyd

unsigned
The unsigned keyword indicates that the most significant bit of an integer variable represents a data bit rather than a sign bit.

[[ unsigned ]] type-qualifier [[ int ]]identifier-name;
Parameters
type-qualifier 
Can be any of char, wchar_t, long, int, short, and small. 
identifier-name 
Specifies a valid MIDL identifier. Valid MIDL identifiers consist of up to 31 alphanumeric and/or underscore characters and must start with an alphabetic or underscore character. 
Remarks
This keyword is optional and can be used with any of the character and integer types char, wchar_t, long, short, and small. You can optionally include the keyword int after the type qualifiers long, short, and small.

When you use the MIDL compiler switch /char, character and integer types that appear in the IDL file without explicit sign keywords can appear with the signed or unsigned keyword in the generated header file. To avoid confusion, specify the sign of the integer and character types.


So if you define uint32 with 'typedef unsigned int uint32;', the highest bit of the 'by default' signed 32-bit integer gets treated as a data bit instead of being treated as a sign bit... so there's NOTHING wrong with a typedef of signed int .... and yeah, an int without any hint about whether it's signed or not is always signed... but if you typedef it as unsigned then it is of course unsigned!
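
To make the "data bit versus sign bit" point concrete, here is a small sketch. It swaps in std::uint32_t and std::int32_t from <cstdint> instead of plain unsigned int so that the width is fixed at 32 bits; the helper name as_signed is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for the typedef discussed above, with a guaranteed 32-bit width.
typedef std::uint32_t uint32;

// Reinterpret the same 32 bits as a signed value. memcpy copies the raw
// bit pattern without any value conversion; std::int32_t is required to
// be two's complement, so the top bit acts as the sign bit.
std::int32_t as_signed(uint32 bits) {
    std::int32_t out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

For example, the pattern 0x80000000 reads as 2147483648 through uint32 but as the most negative value through as_signed, and 0xFFFFFFFF reads as 4294967295 versus -1.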

#14 User is offline   .:fl0yd:. 

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 21 February 2003 - 06:59 PM

@ JustusJonas:

I'm well aware of what unsigned and signed data types are. Let me explain what I find obfuscating about eMule's types.h file. In particular, it is the following two lines of code:
typedef unsigned short  int16;
typedef unsigned int  int32;
While this is not wrong, it is by all means misleading: an int as well as a short int is signed by definition of the ANSI C++ standard, yet those int16/int32 types are typedef'ed to be unsigned. Why not keep a coherent scheme for naming data types?
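
A short sketch of the kind of surprise this naming can cause. The first typedef is the line quoted above; the namespaces, the uint32/sint32 names and both helper functions are made up for illustration and are not eMule code.

```cpp
#include <limits>

namespace misleading {
    // As quoted from types.h: the name suggests a signed 32-bit integer.
    typedef unsigned int int32;
}

namespace coherent {
    // Hypothetical alternative: the name states the signedness.
    typedef unsigned int uint32;
    typedef int          sint32;
}

// A reader who assumes int32 is signed expects decrement(0) to be -1.
// With the unsigned typedef the subtraction wraps around instead.
misleading::int32 decrement(misleading::int32 v) {
    return v - 1;
}

// The same operation on a type whose name matches its behavior.
coherent::sint32 decrement_signed(coherent::sint32 v) {
    return v - 1;
}
```

Here decrement(0) yields the largest unsigned int value rather than -1, which is the behavior the typedef's name hides.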

Btw., I do not like to read about the c++ language from Microsoft's documentation. If it isn't in the ansi standard it is not c++, easy as that.

#15 User is offline   JustusJonas 

  • Advanced Member
  • Group: Members
  • Posts: 55
  • Joined: 09-November 02

Posted 21 February 2003 - 09:31 PM

jeeez... k then show me something about 'signed' and 'unsigned' in your oh so mighty 'c++ ansi standard 3.9.1'

why is it misleading??? and why should it be 'not coherent'?

#16 User is offline   .:fl0yd:. 

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 21 February 2003 - 09:45 PM

@ JustusJonas:

easy: int is signed whereas int32 is not -- now you tell me that code with those typedefs isn't obfuscated?

If I recall correctly I already posted the exact quote of the ansi c++ standard. If it isn't as mighty to you as it is to me then keep on hacking...

#17 User is offline   .:fl0yd:. 

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 21 February 2003 - 09:51 PM

Just for the fun of it, this is what the MSDN has to say about ints:

Quote

The keyword int specifies a 32-bit signed integer on 32-bit platforms.


#18 User is offline   zz 

  • -
  • Group: Debugger
  • Posts: 2014
  • Joined: 30-November 02

Posted 21 February 2003 - 10:57 PM

Most of the patches in this thread have so far been algorithmic optimizations or heuristic enhancements. That kind of optimization often has a bigger effect, means less work, keeps the code readable, and works no matter what CPU you have.

I don't think it is worth obfuscating the code to save a few dozen cycles here and there.

Write clear, readable (and if necessary, inefficient) code to start with, and then optimize the hotspots. Only the hotspots. Please :)
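
Finding the hotspots means measuring first. As a rough sketch, assuming a C++11 compiler and no real profiler at hand, a candidate function can be timed like this; time_us is a made-up helper name, and a proper profiler will give a far better picture than a stopwatch around one call.

```cpp
#include <chrono>
#include <cstdint>

// Runs `work` once and returns the elapsed wall-clock time in
// microseconds, measured with a monotonic clock.
template <typename F>
std::int64_t time_us(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return static_cast<std::int64_t>(
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
            .count());
}
```

Timing the same call before and after a patch, on the same machine and with the same input, gives at least a first indication of whether the patch touched a hotspot at all.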

I've inserted a few of these patches in my test version, and I see no need for more optimization on my hardware (700 MHz & ca 400 MBytes mem).

Except possibly for the patch that separates passive sources from the ones currently being downloaded from. I think that one is a good idea.

/zz B)

This post has been edited by zz: 21 February 2003 - 11:20 PM

ZZUL - get control of your uploads: ZZUL Forum

#19 User is offline   SunMaster 

  • Premium Member
  • Group: Members
  • Posts: 277
  • Joined: 03-December 02

Posted 21 February 2003 - 11:13 PM

This post will self destruct once you've read it!

#20 User is offline   .:fl0yd:. 

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 21 February 2003 - 11:19 PM

zz, on Feb 21 2003, 10:57 PM, said:

I think it's time to move the bickering about CPU-level optimization etc. to another thread.

Looks like Unknown1 was faster than you expected ;) -- this thread has already been partially moved.

Anyway, it's nice to hear from people that actually do have an understanding of when it's worth optimizing.

As a side note: there are ways to improve speed, even at CPU level, without obfuscating the code or making it less readable. Most techniques involve templates though, which are neither trivial nor fully supported by msvc.net.
