paul perry .   blog   |   about   |   notes

Scalable P2P Search

Despite the negative press on P2P business models, there is some very interesting research now on better performing search in P2P networks.

The basic premise of these networks is that voluntary, ad-hoc, dynamic peers can exchange information (mostly large files) and attain a critical mass that allows most parties to find something they want. The core technology that supports this social dynamic is a decentralized search service that delivers quality results.

There are piles of these file sharing applications , listed in many many many directories. There is no declared winner yet, so take a look at some client reviews and a recent file sharing smackdown favoring AudioGalaxy and Gnotella. I used to use Napster and then moved on to LimeWire (Gnutella), but am now on Morpheus (same as KaZaA minus spyware) and judging from download popularity, these FastTrack based servents are out in front today (9/9/01) as THE most popular downloads of all; even unseating ICQ, the top downloaded application for years.

The P2P space is interesting because these file sharing networks are the largest user created overlay networks to date; sharing disk, bandwidth and compute time. The world's largest supercomputer has fewer than 10,000 processors, and carries a hefty $100 million price tag. But SETI@Home grew to 1.5 million users and Napster to 50M users! It's harder to count the number of active users on a Gnutella network, but clearly interest in file sharing is an order of magnitude larger than for distributed computing. Harnessing this dark matter while serving a diverse constituency is "very powerful".

P2P nets are also interesting because they can form the basic infrastructure to a host of applications, including:

  • building a truly distributed DNS [CAN]
  • content distribution networks like Akamai's [Karger99]
  • better discovery of routing paths than BGP [RON1]
  • scalable event notification infrastructures
  • instantaneous distributed search facility (better than Google)

Gnutella's scalability problem

The imminent collapse of Gnutella networks due to freeloaders didn't materialize. This call is reminiscent of Metcalfe's call for the imminent collapse of the internet. Are the Xerox people going to eat their paper this time? Turns out these networks have very strong reliability properties [Albert00].

However, while Gnutella won't collapse, and it won't collapse the internet, it doesn't scale. Gnutella's file search query propagation radius (the time to live of the search) theoretically tops out at around 10K nodes but in practice looks smaller than that. Search traffic was taking a quarter of the net bandwidth and users had limited visibility of what they could find. Better solutions in this space could win the P2P race.

There have been a couple of suggested approaches to reduce the Gnutella network traffic. The first set focuses on improving the Gnutella flooding protocol by changing the organizational structure of the network and by routing queries. Another set favors routing protocols [Route]. I was expecting some review on this split of approaches at the O'Reilly P2p Conference in November 2001 in DC. I might have missed it, but I expect much more technical details out of the IPTPS'02 held at MIT in March'02.

Self Organizing Networks

How the P2P topology maps to the underlying network topology becomes important when nodes differ in:

  • 3 orders of magnitude of bandwidth (50Kbps-100Mbps)
  • 6 orders of magnitude of latency (10us-10s)
  • 4 orders of magnitude in availability (1%-99.99%)

It is clear that not all nodes are created equal. Designers of these nets really need to decide very carefully which nodes are trusted and what functions are delegated to them [Saroiu01].

Modem users were becoming dead-end nodes on the Gnutella network, creating a modem bandwidth barrier. The first strategy was to move the hosts on slower connections to the edge of the network introducing a hierarchy. Some Gnutella servents (LimeWire) tiered the network by using gateways and connection preferencing: pinging peers who stop sending information, and dropping those who don't respond.

The FastTrack network also takes this strategy by designating some high bandwidth nodes as supernodes and matching new client arrivals to a supernode, looking more like a federated Napster than Gnutella. They claim to be able to scale to 1M nodes, and so far I see Morpheus with 0.5M visible nodes. The FastTrack network is closed but giFT in trying to start an open development interface to the FastTrack stack.

Query Routing and Cacheing

Given that Gnutella has a power-law structure [Javonavic01], with a few nodes having very high degree and many with low, [Adamic01] propose that a host pass queries only to the neighbor with the most connections. An obvious intuition if you are aware of the dynamics of social networks.

Another approach is to route the query to nodes that are likely to have the item for the query term. [Prinkey00] suggests nodes pass a hash summary of their index up the hierarchy and that subsequent nodes combine the summary hashes of their peers and itself and pass that up the hierarchy. With this information a query can be routed to the nodes that are likely to return positive results. A similar approach was used for dramatically improving web proxies [Fan98] and Prinkey00 may want to consider the use of Bloom filters [Bloom70].

Another strategy is to cache the queries of gnutella searches. But even small caches can be a drain on the memory of clients on the network.

Routing Networks

Hypercube, butterfly, and Plaxton tree [Plaxton99] architectures for highly parallel computers have re-emerged in the P2P space in CAN, Chord, Pastry, Tapestry, Past, etc.. These P2P architectures don't flood queries but devise mapping methods between object id spaces and sets of nodes.

These networks have great lookup algorithms (essentially with infinite horizon) with efficient use of memory. Most just need log(N) time and space. But log(N) for 1000 nodes is still 10 hops away, which is isn't insignificant given the dynamic nature of these networks.

But no current network in use today is based on routing architectures. One view is that these networks will have a hard time stabilizing given the heterogeneity (wrt bandwidth, etc.) and availability of peers, requiring lots of maintenance traffic to update the mapping.

But if one of these architectures can be adapted to the needs of file sharing networks available today, we may see a dramatic improvement in query performance and results.

Conclusion

In the same way Google was not first to market but provided a simple interface with better search results, so will go the file sharing network fight, with the added wrinkle that this battle is being fought on the three fronts of user adoption, protocol, and legal level.

P2P Systems

P2P Companies

References

[Adamic01] Lada A. Adamic, Rajan M. Lukose, Amit R. Puniyani and Bernardo A. Huberman Search in Power-Law Networks Internet Ecologies Area Xerox Palo Alto Research Center Palo Alto, CA.

Eytan Adar and Bernardo A. Huberman Free Riding on Gnutella, Frist Monday, Volume 5, Number 10 - October 2000.

[Albert00] Reka Albert, Hawoong Jeong, Albert-Laszlo Barabasi, Error and Attack Tolerance of Complex Networks (pdf) University of Notre Dame, Nature July 2000.
[The authors find that scale-free networks, including the Internet, display an unexpected degree of robustness- the ability of their nodes to communicate being unaffected even by un-realistically high failure rates. However, error tolerance comes at a high price in that these networks are extremely vulnerable to attacks (that is, to the selection and removal of a few nodes that play a vital role in maintaining the network's connectivity)]

[RON01] David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek, and Robert Morris The Case for Resilient Overlay Networks Proc. HotOS VIII, Schloss Elmau, Germany, May 2001.
[Intro to MIT LCS's RON project]

[Route] Cesur Baransel, Wlodek Dobosiewicz, Pawel Gburzynski, Routing in Multi-hop Packet Switching Networks: Gbps Challenge
[An very lucid paper on routing and flooding protocols]

[Bloom70] Bloom, B. H. Space/time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13, 7 (July 1970), 422-426.

Blosky, W. J., Doucher, J. R., Ely, D., and Theimer, M. Feasibility of a Serverless Distributed File System Deployed on an Existing set of Desktop PCs Proceedings of the ACM Sigmetrics 2000 Conference, June 2000.

Johnny Chen, New Approaches to Routing for Large-Scale Data Networks Ph.D. Rice U.

[Freenet] Clarke, I. A Decentralized Information Storage and Retrieval System (pdf). Master's Thesis, University of Edinburgh, 1999.
[The Freenet paper]

[Gnutella] Clip2, The Gnutella Protocol Specification v0.4 (pdf).

[Chord01] Dabek F., E. Brunskill, M. F. Kaashoek, D. Karger, R. Morris, I. Stoica, H. Balakrishnan Building Peer-to-Peer Systems with Chord, a Distributed Lookup Service, In the Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), Schloss Elmau, Germany, May 2001.

[Fan98] Fan L., Cao P., Almeida J. and Broder A., Summary Cache: A scalable wide-area Web Cache Sharing Protocol. In Proceedings of teh ACM SIGCOMM '98 Conference September 1998. pp. 254-265.
[This paper revolutionalized the scaling of caching servers. It 'rediscovered' Bloom filters and is directly applicable to searching in a peer network]

[Gribble01] Steven D. Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, Dan Suciu. What Can Databases do for Peer-to-Peer? (pdf). WebDB Workshop on Databases and the Web, June 2001.

[Gribble00] Steven D. Gribble, Eric A. Brewer, Joseph M. Hellerstein, and David Culler. Scalable, Distributed Data Structures for Internet Service Construction (pdf), Proceedings of the Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000).
[Distributed hash system applied to web server farms but applicable to wide-area P2P nets, but it's not fault-tolerant.]

Gritter M., D. R. Cheriton. An Architecture for Content Routing Support in the Internet. 3rd USENIX Symposium on Internet Technologies and Systems ( USITS'01), March 2001, San Francisco, California.

[Hong02] Hong T., Performance (pdf) In Peer-to-Peer: Harnessing the Power of Disruptive Technologies, ed. by A. Oram. O'Reilly and Associates 2001.
[An early paper on P2P performance and a good introduction to analyzing P2P performance and scalability.]

[Karger99] David Karger, Alex Sherman, Andy Berkheimer, Bill Bogstad, Rizwan Dhanidina, Ken Iwamoto, Brian Kim, Luke Matkins, Yoav Yerushalmi. Web Caching with Consistent Hashing, In WWW8 May 1999.
[The basis for Akamai]

[Karger00] David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin, Rina Panigrahy. Consistent Hashing and Random Trees: Tools for Relieving Hot Spots on the World Wide Web. STOC 1997.
[The more technical version of the paper.]

John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao. OceanStore: An Architecture for Global-Scale Persistent Storage (pdf), Proceedings of the Ninth international Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), November 2000.

[Javonavic01] Mihajlo A. Jovanovic, Fred S. Annexstein, Kenneth A. Berman Scalability Issues in Large Peer-to-Peer Networks - A Case Study of Gnutella University of Cincinnati Technical Report 2001
[Not much on scalability, but observed strong small-world characteristics and a power-law distribution of node degrees in Gnutella]

[Osokine1]S. Osokine, The Flow Control Algorithm for the Distributed 'Broadcast-Route' Networks with Reliable Transport Links. Jan 2001.
[Describes the theoretical approach to flow control in distributed Gnutella-type networks]

[Osokine2] S. Osokine, The Implementation of the Flow Control Algorithm for the Distributed 'Broadcast-Route' Networks in the Finite Message Size Case. Apr 2001.
[ Describes a practical approach to flow control implementation]

Palmer C. R., Steffan J. G., Generating Network Topologies That Obey Power Laws (ps), GlobeComm 2000.

[Plaxton99] Plaxton C. G., R. Rajaraman, and A. W. Richa. Accessing nearby copies of replicated objects in a distributed environment (ps). Theory of Computing Systems, 32:241-280, 1999.
[A key paper describing Plaxton Trees]

[Prinkey00] Michael T. Prinkey, An Efficient Scheme for Query Processing on Peer-to-Peer Networks, Aeolus Research, Inc.

[CAN01] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, A Scalable Content-Addressable Network (CAN)( pdf) In Proceedings of ACM SIGCOMM 2001.
[A scalable architecture for storing and retrieving information in the Internet. The idea is to hash the key into a virtual coordinate space and define a routing procedure and a method to construct and maintain the coordinate space. Simulations explore the performance aspects of the system]

[Ritter00] Ritter, Jordan; Why Gnutella Can't Scale. No, Really.
[Ritter, former CTO of Napster, makes the theoretical argument that Gnutella is a real bandwidth killer app. While the point is clear, it is has many unbacked assumptions. Check In defense of Gnutella and Gnutella: Setting the Record Straight]

[PAST] Rowstron, Antony; Peter Druschel;
Past: Persistent and anonymous storage in a peer-to-peer networking environment (ps). In Proceedings of the 8th Conference on Hot Topics in Operating Systems (HotOS 2001), May 2001.

[Saroiu01] Stefan Saroiu, P. Krishna Gummadi and Steven D. Gribble A Measurement Study of Peer-to-Peer File Sharing Systems University of Washington, July 2001.
[Excellent paper that argues networks must delegate different degrees of responsibility to nodes based on the server and bandwidth characteristics and the degree of trust.]

[Zhao01] Ben Y. Zhao, John Kubiatowicz and Anthony Joseph; Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing (pdf). UCB Tech. Report UCB/CSD-01-1141.

Acknowledgements

Thanks go to Skyris Networks for introducing me to this active area, and people on the Pho list and p2p-hackers list.