[patches] The Optimization Thread (Off topic)
#1
Posted 21 February 2003 - 01:27 AM
#2
Posted 21 February 2003 - 03:02 AM
.:fl0yd:., on Feb 20 2003, 08:27 PM, said:
We are happy today, aren't we?
#3
Posted 21 February 2003 - 04:57 AM
What's your source for the P4's cache line size being 32 bytes?
-h
#4
Posted 21 February 2003 - 11:15 AM
Relax, dude.
As I said, I don't know about P4s. Athlons have a 64K instruction cache which DOES benefit from loop unrolling (AMD strongly recommends it).
Of course, this depends entirely on the loop. That's why I asked for someone to test. Results are difficult to compare since we don't have identical setups; that's why I'd like to do instruction pipeline profiling.
B Rgds,
Maccara
#5
Posted 21 February 2003 - 11:29 AM
Quote
Instead of belittling and ridiculing other people's work, which the rest of us eMule users currently benefit from, why don't you share your knowledge with the rest of us and present something even better?
As for terminology and buzzwords: a script kiddie is generally someone who uses other people's code without knowing anything about how it actually works, and your typical script kiddie's knowledge is limited to executing ./configure. Tossing buzzwords left and right without having shown anything to prove your self-proclaimed vast knowledge makes you look like a script kiddie, not those who contribute.
I take it it's time for unknown1 to delete some more posts, including this one.
Oh, and feel free to explain to me in another thread why an integer is, per definition, signed. I'm sure that tidbit will revolutionize computing.
#6
Posted 21 February 2003 - 12:06 PM
hstink, on Feb 21 2003, 04:57 AM, said:
-h
Did I say that? While I didn't state it explicitly, I was talking about the PIII's cache line boundary. It may be 64 bytes on a P4, although I doubt that. In any case, the P4 takes a genuinely different approach to decoding/storing/scheduling/feeding micro-ops to its RISC core, one that doesn't profit from loop unrolling at all. Since P4s store the last decoded opcodes in a ring buffer, the rolled loop will most likely be faster, as it repeatedly executes the same already-decoded instructions, while the unrolled loop has to decode each and every imul instruction over and over again.
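For reference, the transformation being argued about can be sketched like this. This is a minimal illustration, not eMule code; the function names, the array argument, and the multiply-by-3 body are all made up, and `n` is assumed to be a multiple of 4 in the unrolled variant:

```cpp
#include <cstddef>
#include <cstdint>

// Rolled loop: one conditional jump per iteration, but a small decoded footprint.
std::uint32_t sum_rolled(const std::uint32_t* data, std::size_t n) {
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc += data[i] * 3u;
    return acc;
}

// Manually unrolled 4x: a quarter of the jumps, but four times the code
// that has to be fetched and decoded.
std::uint32_t sum_unrolled(const std::uint32_t* data, std::size_t n) {
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i + 4 <= n; i += 4) {
        acc += data[i]     * 3u;
        acc += data[i + 1] * 3u;
        acc += data[i + 2] * 3u;
        acc += data[i + 3] * 3u;
    }
    return acc;
}
```

Which variant wins depends on exactly the decode/cache behavior discussed above, which is why measuring on the actual target CPU matters.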
#7
Posted 21 February 2003 - 12:15 PM
Did I miss something important? I haven't come back to this thread since my last post.
@Unknown1
I hope you won't waste too much time deleting posts.
I remind you that you still have a lot of coding work to do!
Maella
PS: Unknown1, you can delete this post
#8
Posted 21 February 2003 - 12:23 PM
Maccara, on Feb 21 2003, 11:15 AM, said:
I'm not quite sure where exactly you got that information. Loop unrolling still has its value today, but only in very limited scenarios, e.g. matrix multiplication in inner loops, and only if the matrices are small, i.e. 4x4 or less. Moreover, the code needs to be rewritten, not merely expanded. In the case of matrix multiplication you would use the CPU's vector unit (SSE/SSE2/3DNow!/...), which is capable of performing SIMD instructions -- that being the only noticeable speedup.
So why are loops supposedly so bad? It's conditional jumps that, when mispredicted, cause the pipeline to be flushed. With today's branch prediction, prediction will fail at most twice for any given loop (the first static prediction [which a modern compiler will keep from failing] and the last dynamic prediction, when the exit condition is met for the first time); at most once on consecutive calls, and very likely not at all.
If you are still convinced that you gain something through loop unrolling, at least make it automatic. As far as I know there are two valid approaches:
- if you don't mind polluting the global namespace with macros, you can use Duff's device
- far superior, though, is template metaprogramming
As a side note: if a compiler is 'bad at loop unrolling', it is so on purpose more often than by accident. After all, modern compilers do a lot of calculation during the optimization stage and decide whether something is worthwhile by the power of pure mathematics. That's pretty hard to beat with a human brain.
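The second of those approaches can be sketched with a recursive template. This is a minimal sketch: the name `Unroll` and its interface are made up, and the callable parameter is shown generically (in the C++ of this thread's era you would pass a functor; a lambda works the same way on later compilers):

```cpp
#include <cstddef>

// Compile-time unrolling via template recursion: the compiler expands
// Unroll<N>::apply into N inlined calls to f with no runtime loop counter
// and no conditional jump.
template <std::size_t N>
struct Unroll {
    template <typename F>
    static void apply(F f) {
        Unroll<N - 1>::apply(f);  // expand the first N-1 iterations
        f(N - 1);                 // then emit the body for iteration N-1
    }
};

// Recursion anchor: zero iterations, empty body.
template <>
struct Unroll<0> {
    template <typename F>
    static void apply(F) {}
};
```

Because the iteration count is a template argument, it must be known at compile time -- which is exactly the small-fixed-size scenario (4x4 matrices and the like) where unrolling was argued to pay off at all.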
#9
Posted 21 February 2003 - 12:32 PM
Maybe this isn't the best place to post this link, but I think some of you could be interested.
Pentium 4: Round 1 at www.emulators.com
It's a big, in-depth analysis of the Pentium 4 architecture from a coder's point of view.
If anybody else has good links to share about code optimization, please feel free to share them with the others.
Maella
PS: Unknown1, please feel free to move this post to a new thread
This post has been edited by Maella: 21 February 2003 - 03:35 PM
#10
Posted 21 February 2003 - 12:36 PM
SunMaster, on Feb 21 2003, 11:29 AM, said:
Glad to be able to revolutionize your little world then. Take a look at the ANSI C++ standard, 3.9.1, paragraph 2:
Quote
#11
Posted 21 February 2003 - 12:50 PM
There is a difference between "default to" and "per definition", at least for most of us.
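That distinction can actually be checked mechanically. A sketch (using `<type_traits>` and `static_assert`, which postdate this thread; the point, not the tooling, is what matters): `int` IS `signed int` per definition, while plain `char` merely defaults to a signedness and remains a distinct type:

```cpp
#include <type_traits>

// 'int' and 'signed int' name the same type, per the standard.
static_assert(std::is_same<int, signed int>::value,
              "int is signed int, per definition");

// Plain 'char' only *defaults* to an implementation-chosen signedness;
// it is a distinct type from both signed char and unsigned char.
static_assert(!std::is_same<char, signed char>::value,
              "char is not the same type as signed char");
static_assert(!std::is_same<char, unsigned char>::value,
              "char is not the same type as unsigned char");
```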
#12
Posted 21 February 2003 - 01:10 PM
#13
Posted 21 February 2003 - 06:40 PM
unsigned
The unsigned keyword indicates that the most significant bit of an integer variable represents a data bit rather than a sign bit.
[[ unsigned ]] type-qualifier [[ int ]] identifier-name;
Parameters
type-qualifier: Can be any of char, wchar_t, long, int, short, and small.
identifier-name: Specifies a valid MIDL identifier. Valid MIDL identifiers consist of up to 31 alphanumeric and/or underscore characters and must start with an alphabetic or underscore character.
Remarks
This keyword is optional and can be used with any of the character and integer types char, wchar_t, long, short, and small. You can optionally include the keyword int after the type qualifiers long, short, and small. When you use the MIDL compiler switch /char, character and integer types that appear in the IDL file without explicit sign keywords can appear with the signed or unsigned keyword in the generated header file. To avoid confusion, specify the sign of the integer and character types.
so if you define uint32 as 'typedef unsigned int uint32;', the highest bit of the 'by default' signed 32-bit integer gets treated as a data bit instead of a sign bit... so there's NOTHING wrong with such a typedef... and yeah, an int without any hint about whether it's signed or not is always signed... but if you typedef it as unsigned then it is of course unsigned!
#14
Posted 21 February 2003 - 06:59 PM
I'm well aware of what unsigned and signed data types are. Let me explain what I find obfuscating about eMule's types.h file. In particular, it is the following 2 lines of code:
typedef unsigned short int16;
typedef unsigned int int32;
While this is not wrong, it is by all means misleading: an int as well as a short int is signed by definition of the ANSI C++ standard, yet those int16/int32 types are typedef'ed to be unsigned. Why not keep a coherent scheme in naming data types?
Btw., I don't like reading about the C++ language from Microsoft's documentation. If it isn't in the ANSI standard, it isn't C++, easy as that.
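To make the complaint concrete, here is a minimal sketch of the surprise. The typedefs mirror the two quoted types.h lines; the helper function and its name are made up for illustration:

```cpp
#include <climits>

// The aliases under discussion: names that read like signed fixed-width
// integers, bound to unsigned types.
typedef unsigned short int16;
typedef unsigned int   int32;

// Decrementing an 'int32' below zero wraps to UINT_MAX instead of
// producing -1, which is not what the name "int32" suggests.
bool looks_signed_but_is_not() {
    int32 x = 0;
    --x;                    // unsigned wrap-around: 0 - 1 == UINT_MAX
    return x == UINT_MAX;   // true -- x is huge and positive, never -1
}
```

A reader who trusts the name and writes `if (x < 0)` against an `int32` gets a condition that can never be true, which is exactly the kind of subtle bug the naming complaint is about.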
#15
Posted 21 February 2003 - 09:31 PM
why is it misleading??? and why should it be 'not coherent'?
#16
Posted 21 February 2003 - 09:45 PM
easy: int is signed whereas int32 is not -- now you tell me that code with those typedefs isn't obfuscated?
If I recall correctly, I already posted the exact quote from the ANSI C++ standard. If it isn't as mighty to you as it is to me, then keep on hacking...
#17
Posted 21 February 2003 - 09:51 PM
Quote
#18
Posted 21 February 2003 - 10:57 PM
I don't think it is worth obfuscating the code to save a few dozen cycles here and there.
Write clear, readable (and, if necessary, inefficient) code to start with, and then optimize the hotspots. Only the hotspots. Please.
I've inserted a few of these patches in my test version, and I see no need for more optimization on my hardware (700 MHz & ca. 400 MBytes of memory).
Except possibly for the patch that separates passive sources from the ones currently being downloaded from. I think that one is a good idea.
/zz
This post has been edited by zz: 21 February 2003 - 11:20 PM
#19
Posted 21 February 2003 - 11:13 PM
#20
Posted 21 February 2003 - 11:19 PM
zz, on Feb 21 2003, 10:57 PM, said:
Looks like Unknown1 was faster than you expected -- this thread has already been partially moved.
Anyway, it's nice to hear from people who actually understand when optimizing is worth it.
As a side note: there are ways to improve speed, even at the CPU level, without obfuscating the code or making it less readable. Most techniques involve templates, though, which are neither trivial nor fully supported by msvc.net.