The concept of Naive Page rank and its significance

Tuesday, July 14, 2009

Every time I see the Google toolbar to see the check the page rank of a web-site (which is a measure of the importance of that page to Google, a score between 1 - 10), I was curious to know how page rank was calculated, given the huge number of web pages indexed by Google, fortunately I got a chance to explore page rank in-depth during my Stanford classes, how it resulted in the birth of Google, the first search engine ever to rank web-pages based on a reputation score.

Page rank of a web page is simple words means its reputation among other web-pages indexed by Google, here reputation is defined as how other highly reputed people think of you, therefore the more you get rated by other highly reputed people, the better your rank will be, in other words, if a web-page is linked by pages with high page rank, then its page rank will increase substantially, its essentially a concept sharing reputation, in other words, if a highly ranked web-page links to your web-page, then your reputation or rank will be increased as you get a high share of reputation.

One of the important problem addressed by page rank system is it answers the question "How to determine the highly reputed web-pages initially", let me give a very simple example of what is called as Naive Page rank with a simple setup and its significance.

Figure 1: A simple Naive Page rank setup with four nodes

In the above setup, there are four nodes (or web-pages), each with an incoming and an outgoing link, to determine the Naive page rank of each node, we have the following conditions.

Page rank of a node = Sum of Page ranks of nodes linking it

Also, if a node has two links, then its page rank is shared between two nodes, or each outgoing link from this node would have a page rank = Page rank of linking node / 2, i.e if a page A has 'n' links, then the page rank of the 'n' web-pages to which node 'A' links will be increased by Page-Rank(A) / n.

Another constraint in the Naive page rank system is that the sum of the page ranks of all nodes = 1, from which one can determine the relative weights of individual web-pages in a normalized form.

Therefore in the above setup, in accordance with the above conditions,

Page-Rank(A) = Page-Rank(D)
Page-Rank(B) = Page-Rank(A)
Page-Rank(C) = Page-Rank(B)
Page-Rank(D) = Page-Rank(C)

Also, we have Page-Rank(A) + Page-Rank(B) + Page-Rank(C) + Page-Rank(D) = 1, hence

Page-Rank(A) = Page-Rank(B) = Page-Rank(C) = Page-Rank(D) = 1/4.

Hope you got the basics of page rank from the above scenario, even though the Naive Page rank is quite important in understanding how web-pages are assigned reputation scores, its has an inherent weakness as well, i.e its vulnerable to link-spam or collusion where a group of nodes link to each other to improve their page rank which will drastically bring down the page rank of other nodes.

For example, in the above scenario, if node B links to node A instead of node C, then A and B can increase their page rank with circular links to each other, in which case we have,

Figure 2: Link spamming or collusion in the above Naive Page rank setup

Page-Rank(A) = Page-Rank(D) + Page-Rank(B)
Page-Rank(B) = Page-Rank(A)
Page-Rank(C) = 0 (Since no page links to C)
Page-Rank(D) = Page-Rank(C) = 0

Also, Page-Rank(A) + Page-Rank(B) + Page-Rank(C) + Page-Rank(D) = 1

Therefore, Page-Rank(A) = Page-Rank(B), hence Page-Rank(A) + Page-Rank(B) = 1, from which we get

Page-Rank(A) = 1/2
Page-Rank(B) = 1/2
Page-Rank(C) = 0
Page-Rank(D) = 0

Hence two nodes linking to each other can drastically bring down the page rank of the other two nodes in the above setup while doubling their page rank, resulting in what is known as link-spam or collusion due to circular links, this problem is addressed by the Page-rank system developed by Larry Page and Sergey Brin which we will explore in a future section, however the Naive-Page rank gives a basic overview of how web-pages are assigned a reputation score.

2 comments:

thymian said...

Are there any alternative methods for ranking, instead of traditional page-rank and its variations? May be some alternative designs help in reducing link spam by a whole lot.

Prasanna Seshadri said...

There are many, you should first understand how the real page rank algorithm works and the way it deals with circular links.


Copyright © 2016 Prasanna Seshadri, www.prasannatech.net, All Rights Reserved.
No part of the content or this site may be reproduced without prior written permission of the author.