Official eMule-Board: Always Download The Rarest Chunks First. - Official eMule-Board

#1 DatHebIkWeer

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 66
  • Joined: 07-July 12

Posted 07 July 2012 - 08:19 PM

ALWAYS DOWNLOAD THE RAREST CHUNKS FIRST.

Of course this is hardly a new feature request. I am pretty sure the first developers of the chunk download system already saw this as the way forward. I think the point needs renewed attention, though, and it is justified to treat it as a new feature because it matters more now than it did before. The reason is the rise in clients that simply start a download at the beginning of the file and work their way to the end. It seems most of those clients have a username with thane in it. It seems a mod is circulating in China that was either developed by really ignorant people or is purposely malicious.

If we all just start downloading a file from the beginning we will all have the same first half of the file when we are halfway. If at that point the initial sharer decides to withdraw the file we will end up with an unfinished file. All of us. Despite the fact that the total upload of the initial host may have been the file size several times over. Needless to say that this is not advantageous for any of us. Our own greed and egotism caused us to get nothing.
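A toy model makes this argument concrete. It is not eMule code: the peer count, the chunk count and the function names `sequential_coverage` and `random_coverage` are assumptions for illustration. Every peer holds half of the file at the moment the original sharer leaves, and random order is used only as a simple stand-in for any strategy that spreads peers over different chunks.

```python
import random

def sequential_coverage(num_chunks, num_peers):
    """Distinct chunks held by the swarm if every peer downloaded front to back."""
    half = num_chunks // 2
    held = set()
    for _ in range(num_peers):
        held.update(range(half))  # every peer has the same first half
    return len(held)

def random_coverage(num_chunks, num_peers, rng):
    """Distinct chunks held by the swarm if every peer picked its half at random."""
    half = num_chunks // 2
    held = set()
    for _ in range(num_peers):
        held.update(rng.sample(range(num_chunks), half))
    return len(held)

rng = random.Random()
print(sequential_coverage(20, 10))   # always 10: the second half is lost for everyone
print(random_coverage(20, 10, rng))  # somewhere between 10 and 20
```

With front-to-back downloading the swarm as a whole never holds more than what a single peer holds, no matter how many peers there are.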

The most fundamental fact of continuity in availability of a file is that we need to always give priority to download of the rarest chunks. This helps ourselves and the community. It helps ourselves because by downloading the rarest chunks first we increase the chances of ever getting the whole file and it helps the community because by saving the rarest chunks we increase the likelihood the file will be available for download in the future.

I think it is now more important than ever that we take this rule very seriously and make an extra effort to download the rarest chunks first.

This post has been edited by DatHebIkWeer: 07 July 2012 - 08:47 PM

#2 DatHebIkWeer

Posted 07 July 2012 - 08:49 PM

How can that be done?
We need an algorithm to determine which chunk is rarest and which needs to be downloaded first. I think the program should go through a number of steps to do that.

Step 1: Determine what the current uploader has to offer.
Obviously we should not refuse chunks if offered. We have to choose from the chunks offered by the host.

Step 2: Determine what other uploaders offer
The other uploaders are the least likely to be leechers, so they are the most likely to distribute chunks to other users as well. It matters what they do or do not have. We can debate whether this group should be the hosts we are actually downloading from right now or the hosts we have downloaded from during this session.
Simply count how many of the other uploaders have each chunk. Make a subset of the chunks that are rarest among them. It may contain several chunks or just 1. If it is just 1, download that one. Otherwise go to step 3.

Step 3: Determine what is available among the other known hosts.
Count the availability of the rare chunk subset of step 2 in this population. The chunks in that subset that are least commonly owned by all hosts form a new subset that we transport to

Step 4: Determine which chunks we are currently downloading from other hosts
If possible, remove them from the subset of step 3. We need to keep at least 1 chunk for download. If the last remaining rarest chunk is already being downloaded from someone else, so be it; this download should then join in. This creates the subset of step 4.

Step 5:
Now choose between the items in the subset of step 4 in a really random way. Be careful not to use a randomization that results in the same choice by every process every time.

Step 6:
Download the resulting chunk.
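
The steps above can be sketched as follows. This is a minimal illustration, not eMule's actual selection code: the function names `rarest` and `select_chunk`, the parameter names, and the representation of each host as a set of chunk numbers are all assumptions. `random.SystemRandom` is used for step 5 so that separate processes do not share a seed and end up making the same choice.

```python
import random

def rarest(candidates, hosts):
    """Keep only the candidates that the fewest of the given hosts hold."""
    counts = {c: sum(c in chunks for chunks in hosts) for c in candidates}
    fewest = min(counts.values())
    return {c for c, n in counts.items() if n == fewest}

def select_chunk(offered, needed, other_uploaders, known_hosts, in_progress, rng=None):
    rng = rng or random.SystemRandom()
    # Step 1: choose only from what the current uploader offers and we still need.
    candidates = set(offered) & set(needed)
    if not candidates:
        return None
    # Step 2: rarest among the other uploaders.
    candidates = rarest(candidates, other_uploaders)
    # Step 3: if several remain, rarest among all other known hosts.
    if len(candidates) > 1:
        candidates = rarest(candidates, known_hosts)
    # Step 4: drop chunks already being downloaded, unless nothing would be left.
    idle = candidates - set(in_progress)
    if idle:
        candidates = idle
    # Step 5: pick one of the remaining chunks at random.
    return rng.choice(sorted(candidates))

# Chunk 3 is the only one left after step 3, so it is chosen even though
# another download is already working on it.
print(select_chunk({0, 1, 2, 3}, {0, 1, 2, 3},
                   other_uploaders=[{0, 1}, {0, 2, 3}],
                   known_hosts=[{1, 2}, {1}, {3}, {2}],
                   in_progress={3}))
```

Swapping the two `rarest` calls gives the variant in which step 3 is taken before step 2.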

The idea is of course that every version or mod uses this way of determining which chunk to download. In the ideal case if everybody does this there will be no really rare chunks since every chunk will be the rarest at one time and be downloaded.

If this happens with a popular file there will not be a real need to keep finished files to download from because all chunks will be available on several hosts in unfinished files.

It is debatable if we should take step 2 first and then step 3 or the other way around.

This post has been edited by DatHebIkWeer: 07 July 2012 - 09:16 PM

#3 DatHebIkWeer

Posted 07 July 2012 - 09:17 PM

I have seen some things happen in eMule 0.50a that violate or counteract the principle of downloading the rarest chunks first.

Violation 1: Download first and last chunk first
This is a very useful feature that IMO should be maintained, even though it slightly decreases the chances of getting the whole file.

Violation 2: Try to finish partial chunks first
At first sight this is a useful feature of eMule, but in practice it biases the download toward the more common chunks. For the common good there is no real reason to share a chunk that is already widely shared. Even if only a few bytes are left to download, the common good and the chances of success are served better by downloading the rarest chunk first. Of course, if an unfinished chunk happens to be in the rarest chunk subset, it may be smart to choose it for download.
This option causes download processes to follow each other. As soon as one download process starts a chunk, that chunk counts as partially finished, and therefore the other download process will start downloading it too. This is about the worst thing that can happen. (I will explain that later.)

Violation 3:
Even if that option is disabled, the download processes seem to follow each other. Usually 2 processes get “interlocked” downloading the same chunks. When the end of the chunk is reached, the slowest process usually starts to reserve a part of a new chunk for download. The fastest process follows that choice and starts the actual download of that chunk first. Then the slowest process follows, and at the end of the chunk the whole thing starts again.
Disabling the finish partial chunks first setting decreases the bias toward more common chunks a bit, but doesn’t eliminate it.

The “following” behaviour of processes is just what should not happen.
An example:
In a theoretical case of 3 chunks (chunk A, B and C) to be downloaded by 2 processes.
Chunk A is available on both hosts, chunk B is only available on host 1 and chunk C is only available on host 2.

Suppose the 2 processes follow each other and both decide to download chunk A. With both doing their best, chunk A is downloaded quickly. That is good. Now each process can move on to the chunk that only its own host has available. But it so happens that we still had quota for 2 chunks on host 1 and quota for only half a chunk on host 2. When host 1 has finished its upload, there is still quota left that cannot be used anymore, since that host has no more chunks we need.
When host 2 runs out of quota, chunk C is only halfway. That is bad.

I think the following is more desirable behaviour:
Suppose instead that the processes do not follow each other and each decides to first download the chunk that the other host does not have. That means that chunks B and C will be downloaded first. The finished chunks arrive a bit later, but they are there. Then both processes can join to download the last available chunk. Very soon process 2 runs out of quota. But no worries, process 1 has plenty of opportunity to totally finish the download.

That is why processes should not follow each other and should make a point of first downloading the chunks the other hosts do not have available.
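
The three-chunk example can be replayed with a small simulation. It is only a sketch under stated assumptions: a chunk is split into 4 equal blocks, each host uploads one block per tick, and the total quotas (10 blocks for host 1, 4 blocks for host 2) are derived from the example by assuming the two hosts shared chunk A evenly. With these numbers host 2 runs out exactly when chunk C completes, slightly earlier than in the description above. The function name `simulate` and the data layout are hypothetical.

```python
CHUNK_BLOCKS = 4  # assumed: one chunk = 4 equal blocks

def simulate(hosts):
    """Each tick, every host with quota left uploads one block of the first
    chunk in its priority order that is still incomplete."""
    progress = {"A": 0, "B": 0, "C": 0}
    quota = {name: host["quota"] for name, host in hosts.items()}
    while True:
        moved = False
        for name, host in hosts.items():
            wanted = [c for c in host["order"] if progress[c] < CHUNK_BLOCKS]
            if quota[name] == 0 or not wanted:
                continue
            progress[wanted[0]] += 1
            quota[name] -= 1
            moved = True
        if not moved:
            return progress, quota

# Host 1 has chunks A and B, host 2 has chunks A and C.
following = {
    "host1": {"order": ["A", "B"], "quota": 10},
    "host2": {"order": ["A", "C"], "quota": 4},
}
rarest_first = {
    "host1": {"order": ["B", "A"], "quota": 10},
    "host2": {"order": ["C", "A"], "quota": 4},
}
print(simulate(following))     # chunk C stops at 2 of 4 blocks, host 1 has 4 blocks of quota left over
print(simulate(rarest_first))  # all three chunks complete
```

The total quota is identical in both runs; only the order in which the chunks are requested differs.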

This post has been edited by DatHebIkWeer: 07 July 2012 - 09:38 PM

#4 Link64

  • Golden eMule
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 2153
  • Joined: 25-January 04

Posted 08 July 2012 - 09:23 AM

You have pretty much described what eMule already does when it decides which chunk to download. eMule has had this feature for years. If some Chinese mod does not, I'd suggest requesting it in their forum; the official devs can't do anything about it. As to the "violations"... eMule has a pretty complex chunk choosing mechanism that always tries to find a good balance between the "violations" and "rare chunk first". There was even a recent feature request to give first chunk first top priority.



DatHebIkWeer, on 07 July 2012 - 11:17 PM, said:

I think the following is more desirable behaviour:
Suppose instead that the processes do not follow each other and each decides to first download the chunk that the other host does not have. That means that chunks B and C will be downloaded first. The finished chunks arrive a bit later, but they are there. Then both processes can join to download the last available chunk. Very soon process 2 runs out of quota. But no worries, process 1 has plenty of opportunity to totally finish the download.

I've actually seen the desired behavior, i.e. not downloading the same chunk, but I think it only works for really rare chunks. I agree that it might need some fine tuning. In many cases eMule also does not think far enough ahead, especially when the two sources have different chunks available. That often leads to one source becoming "no needed parts" before it has uploaded a full chunk, while the other can't complete its chunk because it has used part of its upload session on a chunk that could probably have been downloaded from the first source.
How to post correctly! (pay special attention to point 2)
For everyone who has downloaded something and does not know what to do with it: endun.gen.

BOINC ...and you can always say you're working on a science project.

#5 DatHebIkWeer

Posted 13 July 2012 - 10:07 PM

I feel this is the most important thread I have posted so far, although I know I am hardly unique in posting this. The point to stress is that eMule seems to want to do this but, for one reason or another, in practice does not.

Maybe it’s a bug somewhere that is hard to find, like a + where it should be a – or a 1 where it should be a 2 or something. Maybe it is too many exceptions built in, or different thinking over the years of eMule development.
For instance somebody once built in an option to try to download complete chunks first. That proved to make the situation worse. It leads to downloads being transferred from the rightly selected chunk to another chunk just because another download is busy downloading that chunk. That is actually diametrically opposite to what it should be.

On average eMule does quite well selecting the rarest chunk first if only 1 download thread is running for a file. It seems to me the “following behaviour” of simultaneous threads is the culprit. I think somebody built in an exception in the selection process that causes a thread to favour a partially downloaded chunk, even though the download whole chunks option is not set.
Unfortunately I speak only a tiny bit of C++, by far not enough to easily examine a comprehensive program like eMule with reasonable effort. So I won’t be able to find it myself. Maybe later I will take time and patience to do it. I could make my own mod with just a few basic bugfixes for my own use. But it would be better if the official version did it right.

Trying to finish whole chunks first is in my view undesirable behaviour. I could argue that the chances of success for me and the whole community would increase if I did the opposite: if a rarer chunk becomes available during a download, switch to that rarer chunk, even if it means I finish neither chunk in this download thread.
But it will depend on the reliability of the sources, the difference in rarity and the amount of the less rare chunk I already downloaded what the odds will be. After all downloading the rarest chunk first will give you the best chance of getting those rarest chunks, but finishing a chunk guarantees that you have it. Just to keep switching from chunk to chunk during download sessions without ever finishing one is a big gamble.

Maybe the chunk selection should quantify the rarity of each chunk together with how far its download has progressed. We could then try to maximize the odds of a successful download for the current client and for the community. If a chunk is, for instance, 80% downloaded, finishing it costs less in overall chances than if it is less than 20% downloaded, whereas the gain in the probability of getting the whole file from finishing this particular chunk is the same. So it’s a trade-off.
If we take into account that the standard size of a download ration is about the size of a chunk and that (once you get past the first 100kB) the chances of getting all of it are quite good, it may always be more advantageous to start the rarest chunk at the start of the download, rather than first finishing the more common chunk that is 80% done and then getting the rare one only to 80%. After the rare chunk is downloaded, the remaining ration can be used to continue the almost finished chunk. On the other hand, extra ration often means a whole extra chunk of data, so it may always be more advantageous to spend that on the next rarest chunk.
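
One way to make this trade-off explicit is a score per chunk. The formula below is purely hypothetical and not taken from eMule: rarity is modelled as 1 divided by the number of sources, and `progress_weight` is an invented knob that says how much a partly finished chunk counts. The names `chunk_score` and `pick_chunk` are assumptions too.

```python
def chunk_score(sources, fraction_done, progress_weight=0.5):
    """Higher score = download sooner. sources must be at least 1."""
    rarity = 1.0 / sources
    return rarity + progress_weight * fraction_done

def pick_chunk(chunks, progress_weight=0.5):
    """chunks maps chunk id -> (number of sources, fraction already downloaded)."""
    return max(chunks, key=lambda c: chunk_score(*chunks[c], progress_weight))

# A chunk with a single source beats a common chunk that is 80% done.
print(pick_chunk({"rare": (1, 0.0), "common": (4, 0.8)}))

# When rarity is close, progress decides, unless the weight is set to 0.
close = {"x": (2, 0.0), "y": (3, 0.8)}
print(pick_chunk(close))
print(pick_chunk(close, progress_weight=0.0))
```

Setting `progress_weight` to 0 ignores partial progress entirely and reduces the choice to plain rarest first.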

Now I am inventing exceptions myself. A dumb Mule that just downloads the rarest chunks first without regard to partially finished more common chunks may still be the way to go anytime.

This post has been edited by DatHebIkWeer: 13 July 2012 - 10:21 PM

#6 jannelx

  • Newbie
  • Pip
  • Group: Members
  • Posts: 6
  • Joined: 22-July 13

Posted 22 July 2013 - 03:53 PM

There is nothing worse than chunks being unable to download