Feature: Zz Chunk Chooser - Faster completion of chunks during DL
#21
Posted 09 January 2006 - 04:55 PM
/zz
#22
Posted 09 January 2006 - 08:13 PM
#23
Posted 09 January 2006 - 08:21 PM
But I can see the advantage in your strategy as well, if we can be reasonably sure that the first chunk completes without the extra help.
/zz
#24
Posted 18 January 2006 - 04:22 AM
Based on what I have read in this forum, one difference between "official" eMule and zz is that zz attempts to complete a chunk by aggregating several users on it if that chunk is the rarest. Also, several people have pointed out the obvious: the rarest chunk is indeed THE chunk with the fewest sources. Official eMule has claimed to pick the rarest chunk for several releases. However, based on information I have gathered and behaviour I have observed, it does not do a very good job of this (particularly with truly rare chunks, i.e. fewer than 10 sources). My understanding is that official eMule splits the available chunks into rarity "bands" (0-10, 10-200, 200-800, 800+) and somewhat randomly picks chunks from those bands. Therefore, back to my original question, previously posted in the incorrect forum:
What is the difference between zzul rare chunk selection and official rare chunk selection?
This post has been edited by clyde: 18 January 2006 - 04:24 AM
#25
Posted 18 January 2006 - 05:28 AM
You "shocked" me over there by revealing those bands.
But as I understood zz's posting over in the other topic, those bands are used only to determine what is considered most rare:
for files with 0-10 sources: 1 is rarer than 2, which is rarer than 3, ...
for files with 10-200 sources: 1 is as rare as 2, which is rarer than 3, ...
and so on.
If I did not (once again) misunderstand, then those bands have far less impact than you think.
#26
Posted 18 January 2006 - 06:36 PM
How eMule picks the chunk to download
There are three bands: very rare, rare, and normal.
Basically eMule will try to pick the rarest chunk in the "very rare" band if any chunks are very rare; otherwise it will try to pick the rarest chunk in the "rare" band. If there are no "very rare" or "rare" chunks, it will pick a random chunk from the "normal" band (preferring to continue a chunk it has already started, unless there is already a transfer going to that chunk).
The limits for these bands are decided by the total number of sources found for the file. If there are fewer than 200 sources in total for the file, the limits are such that a chunk available at less than 10% of the sources is considered "very rare", at 10%-20% "rare", and otherwise "normal".
The completeness of a chunk is included in the valuation of the rareness of a chunk; in the "very rare" band, if a chunk is > 75% complete, 1 source is subtracted from the number of sources available for the chunk, that is, it's evaluated as if it's slightly rarer than it really is.
For example:
If the file has 200 sources, and you are comparing the two rarest chunks, one having just one source being 0% complete, and the other having 2 sources but being 90% complete, then it may pick the chunk with two sources for download. If it is comparing two chunks that each have two sources, but one of them is 10% complete, it will prefer to continue on the started chunk.
In the "rare" band every 25% of completeness subtracts one source, so if a chunk is > 75% complete, it may be picked instead of a chunk that has three fewer sources.
In the normal band (>20% of the sources have the chunk) it doesn't care about the rareness of the chunk, it will just continue the most complete chunk.
So eMule will try to pick a "very rare" chunk, otherwise a "rare" chunk, and if there's no such chunks, it will pick a random chunk.
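The completeness adjustment described above can be sketched as a tiny valuation function. This is an illustration only; the function name and the idea of folding completeness directly into the source count are mine, not eMule's actual code:

```cpp
// Sketch of the "effective rarity" valuation described above: in the
// "very rare" band a chunk that is more than 75% complete counts as
// having one source fewer; in the "rare" band every 25% of
// completeness subtracts one source.
int EffectiveSources(int sources, int completionPercent, bool veryRareBand)
{
    if (veryRareBand)
        return completionPercent > 75 ? sources - 1 : sources;
    return sources - completionPercent / 25;
}
```

With this valuation the example above falls out directly: a chunk with 2 sources at 90% complete evaluates to 1 effective source, tying with a 0%-complete chunk that has a single source, so either may be picked.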
Problems with this strategy
Basically, I see two problems with the standard eMule implementations, and they are easily fixed without changing much code:
1. The rareness bands are too low. At 10 sources in total, the "very rare" band covers 1 source, and the "rare" band covers 2 sources. When you select a chunk from a remote client, if there are no chunks with only 1-2 sources, eMule will pick a random chunk from the rest of the chunks (they will all have 3-10 sources). I'd prefer it pick the rarest chunk when there are so few sources. This is a change I've made in ZZUL.
2. Standard eMule prefers to transfer from only one client to each chunk. If you have two transfers going on at the same time, they will be for two separate chunks. I think it's better to try to complete a chunk as soon as possible, which would be done by letting those two transfers go to the same chunk. That chunk could then be completed and reshared twice as fast. This is also done in ZZUL.
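The arithmetic behind problem 1 can be seen by extracting the band-limit computation from the standard code (quoted in full further down) into a standalone function; the function name here is mine:

```cpp
#include <cstdint>

// The band-limit computation from standard eMule, extracted for
// illustration. For a file with 10 sources in total this yields a
// "very rare" bound of 1 and a "rare" bound (2 * limit) of 2, so
// chunks held by 3-10 sources all land in the "normal" band and are
// picked randomly.
uint16_t BandLimit(uint32_t sourceCount)
{
    uint8_t modif = 10;
    if (sourceCount > 800)
        modif = 2;
    else if (sourceCount > 200)
        modif = 5;
    uint16_t limit = (uint16_t)(modif * sourceCount / 100);
    if (limit == 0)
        limit = 1;
    return limit;
}
```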
The code
Here's the code for standard eMule. I have marked the important places, for instance where the limits for the "very rare" and the "rare" bands are selected.
Quote
bool CPartFile::GetNextRequestedBlock(CUpDownClient* sender,
Requested_Block_Struct** newblocks,
uint16* count) /*const*/
{
// The purpose of this function is to return a list of blocks (~180KB) to
// download. To avoid a premature stop of the downloading, all blocks that
// are requested from the same source must be located within the same
// chunk (=> part ~9MB).
//
// The selection of the chunk to download is one of the CRITICAL parts of the
// edonkey network. The selection algorithm must ensure the best spreading
// of files.
//
// The selection is based on 4 criteria:
// 1. Frequency of the chunk (availability), very rare chunks must be downloaded
// as quickly as possible to become a new available source.
// 2. Parts used for preview (first + last chunk), preview or check a
// file (e.g. movie, mp3)
// 3. Request state (downloading in process), try to ask each source for another
// chunk. Spread the requests between all sources.
// 4. Completion (shortest-to-complete), partially retrieved chunks should be
// completed before starting to download another one.
//
// The frequency criterion defines three zones: very rare (<10%), rare (<50%)
// and common (>50%). Inside each zone, the criteria have a specific ‘weight’, used
// to calculate the priority of chunks. The chunk(s) with the highest
// priority (highest=0, lowest=0xffff) is/are selected first.
//
// very rare (preview) rare common
// 0% <---- +0 pt ----> 10% <----- +10000 pt -----> 50% <---- +20000 pt ----> 100%
// 1. <------- frequency: +25*frequency pt ----------->
// 2. <- preview: +1 pt --><-------------- preview: set to 10000 pt ------------->
// 3. <------ request: download in progress +20000 pt ------>
// 4a. <- completion: 0% +100, 25% +75 .. 100% +0 pt --><-- !req => completion --->
// 4b. <--- req => !completion -->
//
// Unrolled, the priority scale is:
//
// 0..xxxx unrequested and requested very rare chunks
// 10000..1xxxx unrequested rare chunks + unrequested preview chunks
// 20000..2xxxx unrequested common chunks (priority to the most complete)
// 30000..3xxxx requested rare chunks + requested preview chunks
// 40000..4xxxx requested common chunks (priority to the least complete)
//
// This algorithm usually selects first the rarest chunk(s). However, partially
// complete chunk(s) that is/are close to completion may overtake the priority
// (priority inversion).
// For the common chunks, the algorithm tries to spread the download between
// the sources
//
// Check input parameters
if(count == 0)
return false;
if(sender->GetPartStatus() == NULL)
return false;
// Define and create the list of the chunks to download
const uint16 partCount = GetPartCount();
CList<Chunk> chunksList(partCount);
// Main loop
uint16 newBlockCount = 0;
while(newBlockCount != *count){
// Create a request block structure if a chunk has been previously selected
if(sender->m_lastPartAsked != (uint16)-1){
Requested_Block_Struct* pBlock = new Requested_Block_Struct;
if(GetNextEmptyBlockInPart(sender->m_lastPartAsked, pBlock) == true){
// Keep a track of all pending requested blocks
requestedblocks_list.AddTail(pBlock);
// Update list of blocks to return
newblocks[newBlockCount++] = pBlock;
// Skip end of loop (=> CPU load)
continue;
}
else {
// All blocks for this chunk have been already requested
delete pBlock;
// => Try to select another chunk
sender->m_lastPartAsked = (uint16)-1;
}
}
// Check if a new chunk must be selected (e.g. download starting, previous chunk complete)
if(sender->m_lastPartAsked == (uint16)-1){
// Quantify all chunks (create list of chunks to download)
// This is done only one time and only if it is necessary (=> CPU load)
if(chunksList.IsEmpty() == TRUE){
// Identify the locally missing part(s) that this source has
for(uint16 i = 0; i < partCount; i++){
if(sender->IsPartAvailable(i) == true && GetNextEmptyBlockInPart(i, NULL) == true){
// Create a new entry for this chunk and add it to the list
Chunk newEntry;
newEntry.part = i;
newEntry.frequency = m_SrcpartFrequency[i];
chunksList.AddTail(newEntry);
}
}
// Check if any block(s) could be downloaded
if(chunksList.IsEmpty() == TRUE){
break; // Exit main loop while()
}
// Define the bounds of the three zones (very rare, rare)
// more depending on available sources
uint8 modif=10;
if (GetSourceCount()>800) modif=2; else if (GetSourceCount()>200) modif=5;
uint16 limit = (uint16)(modif*GetSourceCount()/ 100);
if (limit==0) limit=1;
const uint16 veryRareBound = limit;
const uint16 rareBound = 2*limit;
// Cache Preview state (Criterion 2)
const bool isPreviewEnable = (thePrefs.GetPreviewPrio() || thePrefs.IsExtControlsEnabled() && GetPreviewPrio()) && IsPreviewableFileType();
// Collect and calculate criteria for all chunks
for(POSITION pos = chunksList.GetHeadPosition(); pos != NULL; ){
Chunk& cur_chunk = chunksList.GetNext(pos);
// Offsets of chunk
const uint64 uStart = (uint64)cur_chunk.part * PARTSIZE;
const uint64 uEnd = ((GetFileSize() - (uint64)1) < (uStart + PARTSIZE - 1)) ?
(GetFileSize() - (uint64)1) : (uStart + PARTSIZE - 1);
ASSERT( uStart <= uEnd );
// Criterion 2. Parts used for preview
// Remark: - We need to download the first part and the last part(s).
// - When the last part is very small, it's necessary to
// download the two last parts.
bool critPreview = false;
if(isPreviewEnable == true){
if(cur_chunk.part == 0){
critPreview = true; // First chunk
}
else if(cur_chunk.part == partCount-1){
critPreview = true; // Last chunk
}
else if(cur_chunk.part == partCount-2){
// Last chunk - 1 (only if last chunk is too small)
if( (GetFileSize() - uEnd) < (uint64)PARTSIZE/3){
critPreview = true; // Last chunk - 1
}
}
}
// Criterion 3. Request state (downloading in process from other source(s))
const bool critRequested = cur_chunk.frequency > veryRareBound && // => CPU load
IsAlreadyRequested(uStart, uEnd);
// Criterion 4. Completion
uint64 partSize = PARTSIZE;
for(POSITION pos = gaplist.GetHeadPosition(); pos != NULL; ) {
const Gap_Struct* cur_gap = gaplist.GetNext(pos);
// Check if the gap overlaps the chunk limits
if(cur_gap->start < uStart) {
if(cur_gap->end > uStart && cur_gap->end < uEnd) {
partSize -= cur_gap->end - uStart + 1;
}
else if(cur_gap->end >= uEnd) {
partSize = 0;
break; // exit loop for()
}
}
else if(cur_gap->start <= uEnd) {
if(cur_gap->end < uEnd) {
partSize -= cur_gap->end - cur_gap->start + 1;
}
else {
partSize -= uEnd - cur_gap->start + 1;
}
}
}
const uint16 critCompletion = (uint16)(partSize/(PARTSIZE/100)); // in [%]
// Calculate priority with all criteria
if(cur_chunk.frequency <= veryRareBound){
// 0..xxxx unrequested + requested very rare chunks
cur_chunk.rank = (25 * cur_chunk.frequency) + // Criterion 1
((critPreview == true) ? 0 : 1) + // Criterion 2
(100 - critCompletion); // Criterion 4
}
else if(critPreview == true){
// 10000..10100 unrequested preview chunks
// 30000..30100 requested preview chunks
cur_chunk.rank = ((critRequested == false) ? 10000 : 30000) + // Criterion 3
(100 - critCompletion); // Criterion 4
}
else if(cur_chunk.frequency <= rareBound){
// 10101..1xxxx unrequested rare chunks
// 30101..3xxxx requested rare chunks
cur_chunk.rank = (25 * cur_chunk.frequency) + // Criterion 1
((critRequested == false) ? 10101 : 30101) + // Criterion 3
(100 - critCompletion); // Criterion 4
}
else { // common chunk
if(critRequested == false){ // Criterion 3
// 20000..2xxxx unrequested common chunks
cur_chunk.rank = 20000 + // Criterion 3
(100 - critCompletion); // Criterion 4
}
else{
// 40000..4xxxx requested common chunks
// Remark: The weight of the completion criterion is inverted
// to spread the requests over the completing chunks.
// Without this, the chunk closest to completion would
// receive every new source.
cur_chunk.rank = 40000 + // Criterion 3
(critCompletion); // Criterion 4
}
}
}
}
// Select the next chunk to download
if(chunksList.IsEmpty() == FALSE){
// Find and count the chunk(s) with the highest priority
uint16 count = 0; // Number of found chunks with same priority
uint16 rank = 0xffff; // Highest priority found
for(POSITION pos = chunksList.GetHeadPosition(); pos != NULL; ){
const Chunk& cur_chunk = chunksList.GetNext(pos);
if(cur_chunk.rank < rank){
count = 1;
rank = cur_chunk.rank;
}
else if(cur_chunk.rank == rank){
count++;
}
}
// Use a random access to avoid that everybody tries to download the
// same chunks at the same time (=> spread the selected chunk among clients)
uint16 randomness = 1 + (uint16)((((uint32)rand()*(count-1))+(RAND_MAX/2))/RAND_MAX);
for(POSITION pos = chunksList.GetHeadPosition(); ; ){
POSITION cur_pos = pos;
const Chunk& cur_chunk = chunksList.GetNext(pos);
if(cur_chunk.rank == rank){
randomness--;
if(randomness == 0){
// Selection process is over
sender->m_lastPartAsked = cur_chunk.part;
// Remark: this list might be reused up to ‘*count’ times
chunksList.RemoveAt(cur_pos);
break; // exit loop for()
}
}
}
}
else {
// There is no remaining chunk to download
break; // Exit main loop while()
}
}
}
// Return the number of the blocks
*count = newBlockCount;
// Return
return (newBlockCount > 0);
}
/zz
#27
Posted 18 January 2006 - 06:58 PM
Quote
For this issue there has been a well-working patch available for a long time... just a few values to change.
Have a look at the code snippets:
http://forum.emule-p...showtopic=67087
This post has been edited by Xman1: 18 January 2006 - 07:00 PM
#28
Posted 18 January 2006 - 07:55 PM
The problem discussed was that this particular code:
uint8 modif=10;
if (GetSourceCount()>800) modif=2;
else if (GetSourceCount()>200) modif=5;
uint16 limit = (uint16)(modif*GetSourceCount()/ 100);
if (limit==0) limit=1;
is not particularly good: if you work it out for a few cases you'll see those bands overlap. A file with 100 sources has the same limit as one with 201. A file with 200 sources has a higher "very rare" limit than one with 300. And one with 800 sources has a much higher limit than one with 801.
I just need to remember where it was.
/edit: here it is.
To tell you the truth I have never changed it myself hoping someone would have come out with a more clever solution
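The overlaps described above can be checked numerically by wrapping the quoted computation in a function (the wrapper name is mine, for illustration) and evaluating the cases in question:

```cpp
#include <cstdint>

// The quoted limit computation, wrapped so the discontinuities at the
// 200- and 800-source steps can be evaluated directly.
uint16_t VeryRareLimit(uint32_t sourceCount)
{
    uint8_t modif = 10;
    if (sourceCount > 800)
        modif = 2;
    else if (sourceCount > 200)
        modif = 5;
    uint16_t limit = (uint16_t)(modif * sourceCount / 100);
    if (limit == 0)
        limit = 1;
    return limit;
}

// VeryRareLimit(100) == 10 and VeryRareLimit(201) == 10: same threshold.
// VeryRareLimit(200) == 20 but VeryRareLimit(300) == 15: the threshold drops.
// VeryRareLimit(800) == 40 but VeryRareLimit(801) == 16: a large jump down.
```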
This post has been edited by CiccioBastardo: 18 January 2006 - 08:29 PM
#29
Posted 18 January 2006 - 08:36 PM
/zz
#30
Posted 12 September 2006 - 10:34 AM
/zz
This post has been edited by zz: 12 September 2006 - 10:35 AM