Efficient Crawling Through Dynamic Priority of Web Page in Sitemap

Main Author: Rahul Kumar and Anurag Jain
Format: Journal Article
Language: English
Published: 2014
Subjects:
WWW
Online Access: https://zenodo.org/record/1324011
Contents:
  • A web crawler, or automatic indexer, is used to download updated information from the World Wide Web (WWW) for a search engine. The current size of the Google index is estimated at approximately 8 × 10^9 pages, and a full crawl could cost around 4 million dollars considering network costs alone. Crawlers therefore need to download only the most important pages. To this end, we propose “Efficient crawling through dynamic page priority of web pages in Sitemap”, a query-based approach that informs the web crawler of a site's most important pages through the Sitemap protocol using dynamic page priority. Using these priorities, a web crawler can identify the most important pages on any website and download only those. Experimental results show that our approach performs better than the existing approach.
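The abstract does not say how the dynamic priority is computed. The sketch below therefore rests on an assumption: each page's priority is its count of matching user queries, normalized by the highest count on the site. The function names `build_sitemap` and `select_pages`, the `query_hits` input and the `threshold` cutoff are all hypothetical. Only the `<urlset>`/`<url>`/`<loc>`/`<priority>` elements, the 0.0 to 1.0 priority range and the default of 0.5 come from the Sitemap protocol itself. The sketch shows both sides: the site writing priorities into its sitemap, and a crawler reading them to decide which pages to download.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org, schema 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(query_hits: dict[str, int]) -> str:
    """Hypothetical site-side step: turn per-page query hit counts into a sitemap.

    Assumption (not from the paper): priority = hits / max hits, written with one decimal.
    """
    max_hits = max(query_hits.values(), default=0)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, hits in sorted(query_hits.items()):
        # Fall back to the protocol's default priority of 0.5 if no page has any hits.
        priority = hits / max_hits if max_hits else 0.5
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


def select_pages(sitemap_xml: str, threshold: float) -> list[str]:
    """Hypothetical crawler-side step: keep only URLs whose priority meets the threshold.

    Results are ordered by descending priority, then by URL.
    """
    root = ET.fromstring(sitemap_xml)
    ns = {"sm": SITEMAP_NS}
    selected = []
    for url in root.findall("sm:url", ns):
        loc = url.findtext("sm:loc", namespaces=ns)
        # A <url> entry without <priority> is treated as the protocol default, 0.5.
        priority = float(url.findtext("sm:priority", default="0.5", namespaces=ns))
        if loc and priority >= threshold:
            selected.append((priority, loc))
    selected.sort(key=lambda item: (-item[0], item[1]))
    return [loc for _, loc in selected]


if __name__ == "__main__":
    hits = {
        "https://example.com/": 10,
        "https://example.com/docs": 5,
        "https://example.com/old": 1,
    }
    xml = build_sitemap(hits)
    print(select_pages(xml, threshold=0.5))
    # ['https://example.com/', 'https://example.com/docs']
```

Keeping the priority computation on the site side means the crawler needs no knowledge of the site's query logs. It only parses a standard sitemap and applies its own cutoff. The normalization rule and the threshold value are illustrative choices, not the method evaluated in the paper.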