Yes, information is always good, and it's also good to know what others might know, so you can be prepared whenever the need arises.
It may feel like you are staring at hieroglyphs at first, but that's what it's always like when you are learning something new.
To know, well, that's fine, fun and comfortable, but not half as fun as to learn.
Coincidentally, I ran into an article about the Rainbow crawler itself, and in English too! And just before that, another paper describing another crawler. So here are both of them.
Rainbow: a Robust and Versatile Measurement Tool for Kademlia-based DHT Networks.
Xiangtao Liu, Tao Meng, Kai Cai, Xueqi Cheng. (2011)
Abstract. In recent years, peer-to-peer (P2P) file sharing applications have dominated the Internet traffic volumes, and among them, BitTorrent and eMule constitute the majority. BitTorrent and eMule deploy their distributed networks based on Kademlia, a robust distributed hash table (DHT) protocol, to facilitate the delivery of content. Kademlia-based DHT networks have intrigued researchers in P2P community to measure and analyze them. However, to the best of our knowledge, there is still not a well-designed crawler to carry out intensive measurement and analysis on them. In this paper, we develop Rainbow, a robust and versatile crawler for Kademlia-based DHT networks. For the first time, we theoretically analyze its convergence (a main issue of robustness), that is, Rainbow can complete the crawling within a limited time. Our analysis can also be applied to other P2P crawlers with the same sampling nature. Finally, we demonstrate that Rainbow can be applied as a versatile measurement tool to identify various characteristics of Kademlia-based DHT networks at a deep level.
Advanced Distributed Crawling System for Kad Network.
Qi WU, Xingshu CHEN. (2011)
Abstract. Many distributed hash tables have been proposed recently, but only very few of them have been applied actually. KAD, a Kademlia based DHT, is only an exception which is widely used in eMule peer-to-peer system. The measurement of the network will be able to guidance for the designing of such systems. In order to make the results of measurement more accurate, it is essential to develop a high-performance crawler. In this paper, we introduce a new distributed KAD crawler and apply a new condition for stopping crawling, so that the crawler can collect information faster and more completely. On the basis of this, a crawler is developed and full KAD ID space is crawled. Then on the basis of statistical analysis, we have found some interesting phenomena.
This post has been edited by Nissenice: 24 September 2011 - 12:43 AM