SlugFiller, on Oct 13 2005, 10:13 PM, said:
Quote
We set a global hardlimit of 3000 sources but we could have found 4900. So let's first assign 10 sources to every file
This isn't bandwidth, you can't "assign" sources to files.
Sources arrive, and you either accept or refuse them.
I'm not talking about accepting or refusing them. I'm just calculating individual hardlimits for each file in such a way that they add up to 3000, that's all. Everything else works as it does in plain vanilla eMule. In eMule protocol terms, I'm never refusing any source; I just treat some of them as active and others as passive (see below).
The "rare files protection threshold" might even be infinite. That would mean that the rare files accept as many sources as possible, while the more popular files accept as many sources as the total sources limit allows them to. The algorithm is trivial (excuse me for sounding PERLish...):
1. From your source exchange information, derive the number of potential sources for each downloading file. If their total is smaller than total_hardlimit then {set file_hardlimit [$downloading_file] to that file's potential sources multiplied by the factor total_hardlimit / total_observed; we're done already}; otherwise:
2. hardlimit = 0; file_hardlimit [$downloading_file] = 0 for every file;
3. UNTIL total_hardlimit <= 0 {
3a. hardlimit++;
3b. foreach $downloading_file {
3ba. if potential_sources ($downloading_file) >= hardlimit {
file_hardlimit [$downloading_file]++; total_hardlimit--;
}}}
(This isn't even an efficient implementation, it is just easy to understand.)
Potential sources are as above: 5, 5, 10, 15, 25, 40, 50, 100, 150, 200, 300, 500, 800, 1200, 1500.
Result of the algorithm should be: 5, 5, 10, 15, 25, 40, 50, 100, 150, 200, 300, 500, 534, 534, 534.
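For anyone who wants to check the numbers, here is a minimal Python sketch of the steps above. The function name and the list-based interface are my own assumptions, not anything from eMule. It compares with `>=` so that a file with 5 potential sources can actually reach a hardlimit of 5, which is what the listed result requires. Like the pseudocode, it only checks the budget once per round, so the total can overshoot slightly (3002 instead of 3000 here).

```python
def compute_file_hardlimits(potential_sources, total_hardlimit):
    """Split total_hardlimit across files, never exceeding a file's potential sources.

    potential_sources: list of observed source counts, one per downloading file.
    Returns a list of per-file hardlimits in the same order.
    """
    total_observed = sum(potential_sources)
    if total_observed == 0:
        return [0] * len(potential_sources)

    if total_observed < total_hardlimit:
        # Step 1: fewer sources observed than allowed, so scale every file up.
        factor = total_hardlimit / total_observed
        return [int(p * factor) for p in potential_sources]

    # Steps 2-3: raise a common level one source at a time until the budget is spent.
    file_hardlimit = [0] * len(potential_sources)
    remaining = total_hardlimit
    hardlimit = 0
    while remaining > 0:
        hardlimit += 1
        for i, potential in enumerate(potential_sources):
            if potential >= hardlimit:
                file_hardlimit[i] += 1
                remaining -= 1
    return file_hardlimit


potential = [5, 5, 10, 15, 25, 40, 50, 100, 150, 200, 300, 500, 800, 1200, 1500]
limits = compute_file_hardlimits(potential, 3000)
print(limits)       # the three largest files end up capped at 534
print(sum(limits))  # 3002
```

The one-by-one loop is kept only because it mirrors the pseudocode; a faster version would sort the files and compute the common level directly.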
I don't see any tendency to boost popular files - in fact the opposite is true! The result would be exactly the same as if I configured a hardlimit of 534 manually - it's just that I would have to know this value beforehand. As you see I can compute it from the 4900 "observed" sources easily.
eMule does have to keep all 4900 sources in memory (I need their user hashes to identify duplicates), but it would treat only 3002 of these sources as "active", i.e. visible in the eMule windows, contacted every 28 minutes and such. The other 1898 sources are kept in memory but not used. (Wait - we can of course shift them into "active" status if we lose sources there, which spares us asking servers and/or other clients; preferably we'd activate those passive sources that are within a 60-minute time span, so that they never know they have been passive...) Client source exchange (active as well as passive) is done depending on the computed hardlimit per file, as it is done already in mods that support hardlimits per file (sivka-ish). So far, no additional network overhead.
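A rough Python sketch of that active/passive bookkeeping for a single file, just to illustrate the idea. Everything here is hypothetical: the `Source` record, the `last_seen` timestamp and `promote_passive` are names I made up, and I am assuming "within 60 minutes" means the source was last seen less than an hour ago. For simplicity the sketch only promotes sources inside that window, which is stricter than the "preferably" above.

```python
from dataclasses import dataclass

REASK_WINDOW_SECONDS = 60 * 60  # assumed meaning of the 60-minute time span


@dataclass
class Source:
    user_hash: str      # used as the key to detect duplicates
    last_seen: float    # seconds on any consistent clock (assumption)
    active: bool = False


def promote_passive(sources, file_hardlimit, now):
    """Activate recently seen passive sources until file_hardlimit is reached.

    sources: dict mapping user_hash -> Source for one file.
    Returns the user hashes that were promoted, most recently seen first.
    """
    active_count = sum(1 for s in sources.values() if s.active)
    candidates = sorted(
        (s for s in sources.values()
         if not s.active and now - s.last_seen <= REASK_WINDOW_SECONDS),
        key=lambda s: s.last_seen,
        reverse=True,
    )
    promoted = []
    for s in candidates:
        if active_count >= file_hardlimit:
            break
        s.active = True
        active_count += 1
        promoted.append(s.user_hash)
    return promoted
```

Calling this whenever an active source is lost refills the active set from memory, without another server or client query.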
I'm aware that, compared to the static 534 hardlimit, this "observation" of the 4900 sources (actually setting their hardlimit to infinite for 30 minutes) would cause additional overhead only for those files that reached their computed hardlimit, as I would lose track of their potential sources if I stopped searching for these sources forever (I do know the potential sources for all other files, as I didn't stop searching for these). That's why I suggested making this check only every 12 hours or so, maybe even less frequently. Actually it doesn't matter whether my hardlimits are off the optimum value for a couple of hours; it's still better than if they're completely wrong for months! And note that only 3 of 15 files would require the additional investigation - obviously those with the most sources, unfortunately. Of course, sources are dropped from memory if they're not validated by the next "observation" cycle, so as not to let the number of "observed" sources grow infinitely.
The same applies for changes of any type. Completed files, added files... I don't care. Let new downloads have some standard hardlimit like 200 or so, only to fix it during the next observation cycle. All that counts is that the hardlimit per file is reasonable for the largest part of this download's life span.
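The housekeeping from the last two paragraphs could look roughly like this in Python. Again a sketch under my own assumptions: sources are tracked as sets of user hashes per file, and both function names are invented for illustration.

```python
DEFAULT_NEW_FILE_HARDLIMIT = 200  # "some standard hardlimit like 200 or so"


def prune_unvalidated(known_sources, validated_sources):
    """Keep, per file, only the user hashes seen again in the latest cycle.

    Both arguments map a file id to a set of user hashes.
    """
    return {
        file_id: hashes & validated_sources.get(file_id, set())
        for file_id, hashes in known_sources.items()
    }


def hardlimit_for(file_id, computed_hardlimits):
    """Use the computed hardlimit, or the default for files added since the last cycle."""
    return computed_hardlimits.get(file_id, DEFAULT_NEW_FILE_HARDLIMIT)
```

A new download simply runs with the default until the next observation cycle replaces it with a computed value.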
Also note that a configuration with a hardlimit of 1501 would have produced a lot more overhead, and perhaps even overrun my router. But that's exactly what newbies are doing, as they don't understand how to configure their hardlimit reasonably. Give them a default value of 3000 sources total and they'll be happy.
SlugFiller, on Oct 13 2005, 10:13 PM, said:
Even if you consider dropping sources for even-ness, you still have an issue with picking a source to drop. You could end up dropping "the good source". There's no real measure for which is which.
Note that I never even mentioned the word "dropping"! All that I'm doing is calculating hardlimits per file. What does eMule do if I set the existing hardlimit to a lower value manually? It drops sources, right? Fine - so that's already implemented.
This post has been edited by Devil Doll: 14 October 2005 - 12:17 AM