Official eMule-Board: Kad: Publications And Reports - Official eMule-Board


Kad: Publications And Reports and maybe some related stuff

#41 User is offline   pier4r 

  • Ex falso quodlibet ; Kad is the major concept behind emule.
  • Group: Members
  • Posts: 588
  • Joined: 31-March 09

Posted 04 November 2010 - 05:17 PM

Nissenice, on 02 November 2010 - 11:12 PM, said:

1. Yes, I do. But I use ordinary Google too, and sometimes other search engines as well. Other times I try to find the authors' homepages, as they usually list their publications, and with a little luck one can find the PDF link to the article in question or to another one that looks interesting. ;)

2. Thanks! OK, I'll think about it. If I do, it'll probably be here on the board in some way or another.


1. Thanks for the hints.
2. Great.

#42 User is offline   omeringen 

  • löl
  • Group: Members
  • Posts: 984
  • Joined: 01-January 06

Posted 17 November 2010 - 01:45 PM

There are Kad activity reports on the JMule page. Interesting...

Quote

To increase the efficiency and stability of our JKad manager (the kad implementation used in JMule) we began to study very deep kad activity. The research forced us to build a crawl (aka JKad bot) capable to move through the kad dht space. It's a highly modified JKad manager capable to collect statistics about active kad nodes.


#43 User is offline   Nissenice 

  • clippetty-clopping...
  • Group: Members
  • Posts: 4231
  • Joined: 05-January 06

Posted 23 September 2011 - 01:32 AM

:thumbup:

Content Pollution Quantification in Large P2P networks : a Measurement Study on KAD. Guillaume Montassier, Thibault Cholez, Guillaume Doyen, Rida Khatoun, Isabelle Chrisment, Olivier Festor. (2011)
http://hal.archives-...tion-Cholez.pdf

Quote

Abstract.—Content pollution is one of the major issues affecting P2P file sharing networks. However, since early studies on FastTrack and Overnet, no recent investigation has reported its impact on current P2P networks. In this paper, we present a method and the supporting architecture to quantify the pollution of contents in the KAD network. We first collect information on many popular files shared in this network. Then, we propose a new way to detect content pollution by analyzing all filenames linked to a content with a metric based on the Tversky index and which gives very low error rates. By analyzing a large number of popular files, we show that 2/3 of the contents are polluted, one part by index poisoning but the majority by a new, more dangerous, form of pollution that we call index falsification.
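For anyone curious what a Tversky-index comparison of filenames could look like, here is a minimal sketch. The abstract does not give the paper's tokenisation or parameters, so the `tokens` helper, the word-splitting rule and the `alpha`/`beta` values below are my own assumptions, not the authors' method. The formula itself is the standard one: |X∩Y| / (|X∩Y| + alpha·|X−Y| + beta·|Y−X|).

```python
import re


def tokens(filename):
    """Lower-case a filename and split it into a set of alphanumeric words.

    Hypothetical tokenisation; the paper may preprocess names differently.
    """
    return set(re.findall(r"[a-z0-9]+", filename.lower()))


def tversky(x, y, alpha=0.5, beta=0.5):
    """Tversky index of two sets.

    alpha and beta weight the elements found only in x and only in y.
    alpha = beta = 1 gives the Jaccard index, alpha = beta = 0.5 the Dice
    coefficient. The defaults here are an assumption for illustration.
    """
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    # Two empty sets are treated as identical to avoid dividing by zero.
    return common / denom if denom else 1.0


if __name__ == "__main__":
    reference = tokens("Ubuntu 10.10 desktop i386.iso")
    for name in ("ubuntu-10.10-desktop-i386.iso", "some_other_movie.avi"):
        print(name, round(tversky(reference, tokens(name)), 2))
```

A low score between a file's published names and a reference name would hint that the names describe different content, which is roughly the idea the abstract describes.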




There is also a new paper about ID repetition in Kad. Haven't found any link to the paper yet so here is only the abstract:

ID Repetition in Structured P2P Networks. Jie Yu, Zhoujun Li, Peng Xiao, Chengfang Fang, Jia Xu, Ee-Chien Chang.
link to abstract

Quote

Abstract. Identity (ID) uniqueness is essential in distributed hash table (DHT)-based systems, as peer lookup and resource searching rely on ID matching. However, many DHT implementations in the wild, such as Kad and Mainline, do not enforce such uniqueness. Most previous works and measurements on DHTs do not take into account that IDs among peers may not be unique. Unfortunately, we observe that a significant portion of peers, i.e. 19.5% of the peers in Kad and 4.0% of the peers in Mainline, do not have unique IDs. These repetitions would mislead the measurements and modeling on those networks. We further focus on investigating the repetition in Kad considering its wider usage and more serious situation of repetition. We observe that there are a large number of peers that frequently change their UDP ports, and there are a few IDs that repeat for a large number of times and all peers with these IDs do not respond to Kad protocol. We also analyze the effects of ID repetitions under simplified settings and find that the current repetition degrades Kad's performance on publishing and searching, but has insignificant effect on lookup process. These measurement and analysis are useful to further determine the sources of repetitions and are also useful for finding suitable parameters in publishing and searching processes in DHT networks without compulsive ID uniqueness.
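To make the "share of peers without a unique ID" figure concrete, here is a rough sketch of how such a number could be computed from crawl data. It assumes a peer is one distinct (Kad ID, IP, port) record, which is my simplification; the paper may define peers and repetitions differently, and the sample data is made up.

```python
from collections import Counter


def repeated_id_fraction(peers):
    """Fraction of peers whose Kad ID is also used by at least one other peer.

    `peers` is a list of (kad_id, ip, port) tuples. Treating each tuple as
    one peer is an assumption made for this illustration.
    """
    if not peers:
        return 0.0
    counts = Counter(kad_id for kad_id, _ip, _port in peers)
    repeated = sum(1 for kad_id, _ip, _port in peers if counts[kad_id] > 1)
    return repeated / len(peers)


if __name__ == "__main__":
    # Made-up sample: two peers share the ID "a1", two have unique IDs.
    sample = [
        ("a1", "10.0.0.1", 4672),
        ("a1", "10.0.0.2", 4672),
        ("b2", "10.0.0.3", 4672),
        ("c3", "10.0.0.4", 5000),
    ]
    print(repeated_id_fraction(sample))
```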

This post has been edited by Nissenice: 23 September 2011 - 01:34 AM


#44 User is offline   Nissenice 

  • clippetty-clopping...
  • Group: Members
  • Posts: 4231
  • Joined: 05-January 06

Posted 23 September 2011 - 12:25 PM

Optimal Peer Identifier in eMule Network. LIU Xiang-Tao, CHENG Xue-Qi, LI Yang, CHEN Xiao-Jun, BAI Shuo, LIU Yue. (2010)
Link to article: http://www.jos.org.c...ter_id=9&falg=1

Quote

Abstract. In recent years, eMule network, a kind of peer-to-peer (P2P) file-sharing network has become more and more popular. Along with its popularity, the demand to accurately determine the peer in eMule has also increased for two reasons: it is a critical step to accurately locate sources of files in P2P file-sharing networks, and the wanton spread of vulgar content makes it necessary to censor eMule. This demand allows everyone to put forward the problem of optimal peer identifier in eMule network. However, since Kad ID (the widely-used identifier in eMule network) can be freely changed by users of eMule, there exists Kad ID aliasing, a single peer may correspond to multiple Kad IDs; reversely, there also exists Kad ID repetition, which are multiple peers corresponding with a single Kad ID. Therefore, it is difficult to accurately determine the peer by using Kad ID. This paper attempts to solve this problem. First, the stability factor (SF) of peer identifier is defined to evaluate candidate identifiers. Then, a crawler named Rainbow is designed and implemented to collect peer information from multiple candidate identifiers’ relationship in real eMule network. Note that Rainbow has been proved to be convergent and has low time and space complexity. Experimental results show that {userID} is the optimal peer identifier in peer identifier set 2^{Kad ID, userID, IP} − {Φ} as {userID} has the largest SF value. Later on, in order to quantify the extent of Kad ID aliasing, the relationship between {userID} and {Kad ID} is discussed. Lastly, the effectiveness of the application of the optimal peer identifier is analyzed. Results show that peers are more accurately determined when using {userID} as the identifier of peers. All in all, the identification of optimal peer identifier provides a basis for future research of eMule network, and Rainbow serves as a useful tool for measuring real eMule network.
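As I read it, the "peer identifier set" in the abstract is the power set of {Kad ID, userID, IP} minus the empty set, which gives seven candidate identifiers. A small sketch to list them (the function name is mine, and how the paper scores each candidate with SF is not shown here because the abstract does not define it):

```python
from itertools import combinations


def candidate_identifiers(fields):
    """Return every non-empty combination of the given fields, as tuples."""
    return [
        combo
        for size in range(1, len(fields) + 1)
        for combo in combinations(fields, size)
    ]


if __name__ == "__main__":
    for candidate in candidate_identifiers(["Kad ID", "userID", "IP"]):
        print(candidate)
```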


Unfortunately, only the abstract is in English and the rest of the article is in Chinese, so this will be fun.

This is most likely a paper from the same study zz0fly is referring to in Weird Kad Nodes Id, post #26.

This post has been edited by Nissenice: 23 September 2011 - 12:27 PM


#45 User is offline   Meuh6879 

  • GoldMember (Yeah, Baby !)
  • Group: Members
  • Posts: 1638
  • Joined: 26-December 02

Posted 23 September 2011 - 08:36 PM

:flowers: Great, specific info is always a good ... sci-fi exercise for me :P
It's complicated :-k (to put it simply).

But if it works, that's all that matters and it's fine!

#46 User is offline   Nissenice 

  • clippetty-clopping...
  • Group: Members
  • Posts: 4231
  • Joined: 05-January 06

Posted 24 September 2011 - 12:31 AM

Thanks!
Yes, information is always good, and it's also good to know what others might know so you can be prepared whenever the need arises.
It may feel like you are staring at hieroglyphs at first, but that's what it is always like when you are learning something new.
To know, well, that's fine, fun and comfortable, but not half as fun as to learn. :thumbup:


Coincidentally, I ran into an article about the Rainbow crawler itself, and in English too! And just before that, another paper describing another crawler. So here are both of them.


------


Rainbow: a Robust and Versatile Measurement Tool for Kademlia-based DHT Networks. Xiangtao Liu, Tao Meng, Kai Cai, Xueqi Cheng. (2011)
http://sourcedb.ict....79288806248.pdf

Quote

Abstract. —In recent years, peer-to-peer (P2P) file sharing applications have dominated the Internet traffic volumes, and among them, BitTorrent and eMule constitute the majority. BitTorrent and eMule deploy their distributed networks based on Kademlia, a robust distributed hash table (DHT) protocol, to facilitate the delivery of content. Kademlia-based DHT networks have intrigued researchers in P2P community to measure and analyze them. However, to the best of our knowledge, there is still not a well-designed crawler to carry out intensive measurement and analysis on them. In this paper, we develop Rainbow, a robust and versatile crawler for Kademlia-based DHT networks. For the first time, we theoretically analyze its convergence (a main issue of robustness), that is, Rainbow can complete the crawling within a limited time. Our analysis can also be applied to other P2P crawlers with the same sampling nature. Finally, we demonstrate that Rainbow can be applied as a versatile measurement tool to identify various characteristics of Kademlia-based DHT networks at a deep level.


------


Advanced Distributed Crawling System for Kad Network. Qi WU, Xingshu CHEN. (2011)
http://www.jofcis.co...7_3_677_684.pdf

Quote

Abstract. Many distributed hash tables have been proposed recently, but only very few of them have been applied actually. KAD, a Kademlia based DHT, is only an exception which is widely used in eMule peer-to-peer system. The measurement of the network will be able to guidance for the designing of such systems. In order to make the results of measurement more accurate, it is essential to develop a high-performance crawler. In this paper, we introduce a new distributed KAD crawler and apply a new condition for stopping crawling, so that the crawler can collect information faster and more completely. On the basis of this, a crawler is developed and full KAD ID space is crawled. Then on the basis of statistical analysis, we have found some interesting phenomena.


------

This post has been edited by Nissenice: 24 September 2011 - 12:43 AM


#47 User is offline   Nissenice 

  • clippetty-clopping...
  • Group: Members
  • Posts: 4231
  • Joined: 05-January 06

Posted 24 September 2011 - 09:22 AM

Identifying P2P Application with DHT behaviors. Lin Ye, Hongli Zhang, Qiang Dai. (2011)
Page at Science Alert: http://scialert.net/....565.572&org=11
PDF version: http://docsdrive.com...011/565-572.pdf

Quote

Abstract. Since, the emergence of peer-to-peer applications, their traffic has gradually become the dominant component on some links, which has a significant impact on the underlying infrastructure, such as internet topology, routing systems and network strategies. It is necessary to make some adjustments in design for future, which first needs accurate identification of P2P traffic. In this study, we focus on the explicit behaviors of Distributed Hash Tables (DHT), which are deployed widely in different P2P applications. Therefore, we develop a systematic methodology to identify P2P hosts in a statistical way, i.e., based on behavior patterns of DHT functions. We analyze four representative behaviors in the view of functions, including bootstrap pattern, routing pattern, diversity pattern and short-session pattern. Actually, it is proved that these behaviors are very different from traditional applications in pattern through a series of detailed experiments. At last a novel algorithm that relies on behavior patterns of DHT is proposed. Our experiment results show that we are able to identify more than 90% of P2P hosts with at least 95% accuracy and also can deal with the silent clients, which figures out the identification in a new aspect.


------


Sub-Second Lookups on a Large-Scale Kademlia-Based Overlay. Raul Jimenez, Flutra Osmani, Björn Knutsson. (2011)
http://people.kth.se...11subsecond.pdf

Quote

Abstract.—Previous studies of large-scale (multimillion node) Kademlia-based DHTs have shown poor performance, measured in seconds; in contrast to the far more optimistic results from theoretical analysis, simulations and testbeds.
In this paper, we unexpectedly find that in the Mainline BitTorrent DHT (MDHT), probably the largest DHT overlay on the Internet, many lookups already yield results in less than a second, albeit not consistently. With our backwards-compatible modifications, we show that not only can we reduce median latencies to between 100 and 200 ms, but also consistently achieve sub-second lookups.
These results suggest that it is possible to deploy latency-sensitive applications on top of large-scale DHT overlays on the Internet, contrary to what some might have concluded based on previous results reported in the literature.


------


eDonkey & eMule’s Kad: Measurements & Attacks. Thomas Locher, Stefan Schmid, Roger Wattenhofer. (2011)
http://www.net.t-lab...stefan/fi11.pdf

Quote

Abstract. This article reports on the results of our measurement study of the Kad network. Although several fully decentralized peer-to-peer systems have been proposed in the literature, most existing systems still employ a centralized architecture. The Kad network is a notable exception. Since the demise of the Overnet network, the Kad network has become the most popular peer-to-peer system based on a distributed hash table. It is likely that its user base will continue to grow in numbers over the next few years due to the system’s scalability and reliability.
The contribution of the article is twofold. First, we compare the two networks accessed by eMule: the centralized paradigm of the eDonkey network and the structured, distributed approach pursued by the Kad network. We re-engineer the eDonkey server software and integrate two modified servers into the eDonkey network in order to monitor traffic. Additionally, we implement a Kad client exploiting a design weakness to spy on the traffic at arbitrary locations in the ID space. The collected data provides insights into the spacial and temporal distributions of the peers’ activity. Moreover, it allows us to study the searched content. The article also discusses problems related to the collection of such data sets and investigates techniques to verify the representativeness of the measured data. Second, this article shows that today’s Kad network can be attacked in several ways. Our simple attacks could be used either to hamper the correct functioning of the network itself, to censor content, or to harm other entities in the Internet not participating in the Kad network, such as ordinary web servers. While there are heuristics to improve the robustness of Kad, we believe that the attacks cannot be thwarted easily in a fully decentralized peer-to-peer system, i.e., without some kind of a centralized certification and verification authority. This result may be relevant in the context of the current debate on the design of a clean-slate network architecture for the Internet which is based on concepts known from the peer-to-peer paradigm.


------

This post has been edited by Nissenice: 24 September 2011 - 09:30 AM
