Official eMule-Board: How Kad Protocol Estimates The Total Amount Of Users? - Official eMule-Board

Jump to content


Page 1 of 1

How Kad Protocol Estimates The Total Amount Of Users?

#1 User is offline   Fr4nz 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 52
  • Joined: 07-February 03

Posted 19 September 2007 - 11:02 AM

I was just thinking...how the current implementation of the KAD protocol estimates the total amount of users using the network? AND is the estimation reliable?

Thanks for any answer :)

This post has been edited by Fr4nz: 25 May 2008 - 10:27 AM

0

#2 User is offline   Fr4nz 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 52
  • Joined: 07-February 03

Posted 25 May 2008 - 10:22 AM

Bump
0

#3 User is offline   Andu 

  • Morph Team
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 13015
  • Joined: 04-December 02

Posted 25 May 2008 - 02:14 PM

I think it uses some kind of statistical regression. Probably some form of linear regression.
Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them
In the Land of Mordor where the Shadows lie.


Dark Lord of the Forum


Morph your Mule

Need a little help with your MorphXT? Click here

0

#4 User is offline   Fr4nz 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 52
  • Joined: 07-February 03

Posted 25 May 2008 - 02:57 PM

View PostAndu, on May 25 2008, 02:14 PM, said:

I think it uses some kind of statistical regression. Probably some form of linear regression.


Is it reliable in your opinion? It fluctuates a lot...
0

#5 User is offline   Andu 

  • Morph Team
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 13015
  • Joined: 04-December 02

Posted 25 May 2008 - 05:35 PM

Uhm I can't really tell. I'd say it strongly depends on the state the network currently is in. Might even depend on the nodes you are in contact with. I think a dev might be better suited to answer this question.
Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them
In the Land of Mordor where the Shadows lie.


Dark Lord of the Forum


Morph your Mule

Need a little help with your MorphXT? Click here

0

#6 User is offline   Fr4nz 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 52
  • Joined: 07-February 03

Posted 25 May 2008 - 07:23 PM

View PostAndu, on May 25 2008, 06:35 PM, said:

Uhm I can't really tell. I'd say it strongly depends on the state the network currently is in. Might even depend on the nodes you are in contact with. I think a dev might be better suited to answer this question.


Let's hope a dev sees this thread then :)
0

#7 User is offline   PacoBell 

  • Professional Lurker ¬_¬ (so kyoot!)
  • PipPipPipPipPipPipPip
  • Group: Moderator
  • Posts: 7296
  • Joined: 04-February 03

Posted 25 May 2008 - 08:38 PM

//This is used when we find a leaf and want to know what this sample looks like.
//We fall back two levels and take a sample to try to minimize any areas of the
//tree that will give very bad results.
uint32 CRoutingZone::EstimateCount()
{
	if( !IsLeaf() )
		return 0;
	if( m_uLevel < KBASE [b]/*4*/[/b])
		return (UINT)(pow(2.0F, (int)m_uLevel)*K [b]/*10*/[/b]);
	CRoutingZone* pCurZone = m_pSuperZone->m_pSuperZone->m_pSuperZone;
	// Find out how full this part of the tree is.
	float fModify = ((float)pCurZone->GetNumContacts())/(float)(K*2);
	// First calculate users assuming the tree is full.
	// Modify count by bin size.
	// Modify count by how full the tree is.
	// Modify count by assuming 20% of the users are firewalled and can't be a contact.
	return (UINT)((pow(2.0F, (int)m_uLevel-2))*(float)K*fModify*(1.20F));
}


Disclaimer: I'm not an official dev, just a code monkey :P

This post has been edited by PacoBell: 25 May 2008 - 08:41 PM

Sed quis custodiet ipsos custodes
Math is delicious!
MmMm! Mauna Loa Milk Chocolate Toffee Macadamias are little drops of Heaven ^_^
Si vis pacem, para bellum DIE SPAMMERS DIE!

#8 User is offline   dandin1 

  • Newbie
  • Pip
  • Group: Members
  • Posts: 3
  • Joined: 15-July 08

Posted 15 July 2008 - 08:46 PM

From the paper Actively Monitoring Peers in KAD:
"[...] the number of peers is uniformly distributed over the hash space. This property will allow us to estimate the total number of peers in a zone [by doing] a partial crawl of one 256-th of the entire KAD ID space[...]"

A zone is all the nodes with a key starting with the same 8 bits.
0

#9 User is offline   Fr4nz 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 52
  • Joined: 07-February 03

Posted 25 July 2008 - 07:22 PM

View Postdandin1, on Jul 15 2008, 08:46 PM, said:

From the paper Actively Monitoring Peers in KAD:
"[...] the number of peers is uniformly distributed over the hash space. This property will allow us to estimate the total number of peers in a zone [by doing] a partial crawl of one 256-th of the entire KAD ID space[...]"

A zone is all the nodes with a key starting with the same 8 bits.


Wait, the paper says that the crawl they've made generated around 20mb of incoming/outgoing traffic, so I suppose the algorithm they've used is not the same which uses emule! :)

Anyway the amount of users displayed can vary a lot from moment to moment, so I suppose we must take in consideration that there's an interval around the value displayed in which the estimation falls in (and maybe it depends also from the "density" of a branch of the hash tree used for the "survey")...tell me if I'm wrong plz :D

This post has been edited by Fr4nz: 25 July 2008 - 07:26 PM

0

#10 User is offline   fox88 

  • Golden eMule
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 4974
  • Joined: 13-May 07

Posted 25 July 2008 - 08:35 PM

View PostFr4nz, on Jul 25 2008, 11:22 PM, said:

Anyway the amount of users displayed can vary a lot from moment to moment
It does vary; also users with open KAD mules 'see' a lot more nodes than firewalled.
0

#11 User is offline   Fr4nz 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 52
  • Joined: 07-February 03

Posted 15 August 2008 - 11:47 AM

View Postfox88, on Jul 25 2008, 09:35 PM, said:

View PostFr4nz, on Jul 25 2008, 11:22 PM, said:

Anyway the amount of users displayed can vary a lot from moment to moment
It does vary; also users with open KAD mules 'see' a lot more nodes than firewalled.


In your opinion what is the amount of people in Kad, realistically, during peak hours? Between 3-4 millions?
0

#12 User is offline   Some Support 

  • Last eMule
  • PipPipPipPipPipPipPip
  • Group: Yes
  • Posts: 3667
  • Joined: 27-June 03

Posted 15 August 2008 - 03:34 PM

The new counting experimental counting method is fairly independend from any local settings / your routing table and offers a good estimation how many active Kad nodes are in the network (see your network dialog, only works if you share enough files over kad or do enough search queries). The only uncertain point at this time are what part Firewalled clients take in those numbers.
Not only do we not know what excact percentage they have in the Kad network but we also do not know how many of them are still listed in routing tables (and therefore already included in the estimation). This becomes clearer the more 0.49b nodes will spread.

Anyway Kad most likely has about 1,5-1,7kk users in the non-peak times and 2,0-2,7kk users in the peaktimes currently. 4kk is probably too high.

#13 User is offline   Fr4nz 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 52
  • Joined: 07-February 03

Posted 15 August 2008 - 04:47 PM

View PostSome Support, on Aug 15 2008, 04:34 PM, said:

The new counting experimental counting method is fairly independend from any local settings / your routing table and offers a good estimation how many active Kad nodes are in the network (see your network dialog, only works if you share enough files over kad or do enough search queries). The only uncertain point at this time are what part Firewalled clients take in those numbers.
Not only do we not know what excact percentage they have in the Kad network but we also do not know how many of them are still listed in routing tables (and therefore already included in the estimation). This becomes clearer the more 0.49b nodes will spread.

Anyway Kad most likely has about 1,5-1,7kk users in the non-peak times and 2,0-2,7kk users in the peaktimes currently. 4kk is probably too high.


Thanks for the infos, I'm really interested into this...excuse me if I have one last question for you: given that you share a proper number of files, after how much time the kad estimation (the experimental one) of the total population is reliable?
It seems to me that it needs at least 10~20 minutes (maybe 1 hour?) to give a first reliable estimation, because numbers really goes up and down...

This post has been edited by Fr4nz: 15 August 2008 - 05:50 PM

0

#14 User is offline   Some Support 

  • Last eMule
  • PipPipPipPipPipPipPip
  • Group: Yes
  • Posts: 3667
  • Joined: 27-June 03

Posted 15 August 2008 - 05:33 PM

You are probably looking at the wrong number. The experimental one is shown only in the network dialog with a "experimental" tag attached. It shouldn't fluctuate too much as soon as its no longer 0.

#15 User is offline   Fr4nz 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 52
  • Joined: 07-February 03

Posted 15 August 2008 - 05:52 PM

View PostSome Support, on Aug 15 2008, 06:33 PM, said:

You are probably looking at the wrong number. The experimental one is shown only in the network dialog with a "experimental" tag attached. It shouldn't fluctuate too much as soon as its no longer 0.


You're right, it fluctuated a lot less than the normal estimation.

Anyway, after 1.30 hour connected I'm noticing a difference of 1 million of users between the default estimation (2,7kk) and the experimental one (1,5kk)...why this discrepancy?? Maybe because 0.49b isn't propagated like 0.49a (and previous versions) and so the experimental estimation is doing its calculations accounting only 0.49b clients?

PS: What I see is this, I don't think I'm wrong :) :

Users: 2,697,557 (Experimental: 1,507,618)

This post has been edited by Fr4nz: 15 August 2008 - 06:51 PM

0

#16 User is offline   Some Support 

  • Last eMule
  • PipPipPipPipPipPipPip
  • Group: Yes
  • Posts: 3667
  • Joined: 27-June 03

Posted 15 August 2008 - 09:31 PM

Quote

why this discrepancy??


Because they use two different methods to calculate the numbers. The new method is independent from your local routing table and it works with all Kad version. I would say if your eMule is running a while and shares or downloads enough files over kad (because this causes queries and those are needed for the new method) the experimental method should be more trustworthy, assuming that there isn't an implementation or design bug (which is possible as its only experimental) - hard to verify through.

#17 User is offline   Fr4nz 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 52
  • Joined: 07-February 03

Posted 16 August 2008 - 09:04 AM

View PostSome Support, on Aug 15 2008, 09:31 PM, said:

Quote

why this discrepancy??


Because they use two different methods to calculate the numbers. The new method is independent from your local routing table and it works with all Kad version. I would say if your eMule is running a while and shares or downloads enough files over kad (because this causes queries and those are needed for the new method) the experimental method should be more trustworthy, assuming that there isn't an implementation or design bug (which is possible as its only experimental) - hard to verify through.


Thanks for the explanation!

Anyway it's very suprising that there is a difference of 1-1.5kk of users between the old and new estimation methods.

This post has been edited by Fr4nz: 16 August 2008 - 10:23 AM

0

#18 User is offline   Aggravated 

  • Magnificent Member
  • PipPipPipPipPipPip
  • Group: Members
  • Posts: 349
  • Joined: 04-August 07

Posted 25 October 2008 - 02:52 PM

View Postdandin1, on Jul 15 2008, 10:46 PM, said:

A zone is all the nodes with a key starting with the same 8 bits.


Why 8 bits? This is different from the original Kademlia protocol, since the XOR minimum distance is not really applied. A tolerance zone of 8 bits is sufficient for a network with less than 1 million peers. Changing the tolerance zone to at least 14 bits, increases the probability to retrieve more data items. I will also suggest a reduction of the publishing frequency to 12 hours, that would really improve the data retrieving in the network.

This post has been edited by Aggravated: 25 October 2008 - 03:01 PM

๑۩۞۩๑ Aggravated ๑۩۞۩๑
0

#19 User is offline   fox88 

  • Golden eMule
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 4974
  • Joined: 13-May 07

Posted 25 October 2008 - 03:05 PM

View PostAggravated, on Oct 25 2008, 06:52 PM, said:

View Postdandin1, on Jul 15 2008, 10:46 PM, said:

A zone is all the nodes with a key starting with the same 8 bits.


Why 8 bits? This is different from the original Kademlia protocol, since the XOR minimum distance is not really applied. A tolerance zone of 8 bits is sufficient for a network with less than 1 million peers.
Then how KAD works at all with >2M users right now?

Increasing the number of matching bits makes each zone smaller, does not it?
0

#20 User is offline   Aggravated 

  • Magnificent Member
  • PipPipPipPipPipPip
  • Group: Members
  • Posts: 349
  • Joined: 04-August 07

Posted 25 October 2008 - 03:56 PM

View Postfox88, on Oct 25 2008, 05:05 PM, said:

Increasing the number of matching bits makes each zone smaller, does not it?


Correct, increasing the matching bits will reduce the tollerance zone. Eight matching bits is too large for a network with over 1 million peers. If you perform a search with the tollerance zone set to eight matching bits, the data item has to be searched in a subset of over 4000 peers, whereby only the closest peers are retrieved.

This post has been edited by Aggravated: 25 October 2008 - 04:36 PM

๑۩۞۩๑ Aggravated ๑۩۞۩๑
0

  • Member Options

Page 1 of 1

1 User(s) are reading this topic
0 members, 1 guests, 0 anonymous users