DESIGN AND IMPLEMENTATION OF A HIGH PERFORMANCE WEB CRAWLER FOR INFORMATION EXTRACTION
*ILO Somtoochukwu F., Victor Onuchi, Akuma Uche and Okah, Paul-Kingsley
ABSTRACT
Broad web search engines, as well as many more specialized search tools, rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a web crawler may interact with millions of hosts over a period of weeks or months, so robustness, flexibility, and manageability are of major importance. In addition, I/O performance, network resources, and operating system limits must be taken into account in order to achieve high performance at a reasonable cost. In this study, we describe the design and implementation of a high performance web crawler that runs on a network of workstations. The crawler scales to (at least) several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications. We present the software architecture of the system, discuss its performance bottlenecks, and describe efficient techniques for achieving high performance. An algorithm was developed for the web crawler to download the web pages that are to be indexed by the search engine, and, based on this algorithm, the source code was written in PHP, a scripting language suited to web development that can be embedded into HTML. The source code was then integrated into an Apache server environment for automatic search engine fetch sequencing. The program's workability was tested at layer 7 of the OSI model using the Hypertext Transfer Protocol (HTTP).
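To illustrate the fetch-and-extract loop the abstract summarizes, the following minimal PHP sketch downloads pages over HTTP and enqueues the links it finds. It is an assumption-laden illustration, not the authors' source code: the seed URL, page limit, and user-agent string are placeholders, and it relies only on PHP's standard cURL and DOM extensions.

<?php
// Illustrative sketch of a breadth-first crawl loop (not the authors' code).

// Fetch a single page over HTTP (OSI layer 7) and return its body, or null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_USERAGENT      => 'ExampleCrawler/1.0', // placeholder identifier
    ]);
    $body = curl_exec($ch);
    $ok   = ($body !== false && curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200);
    curl_close($ch);
    return $ok ? $body : null;
}

// Extract absolute http(s) links from an HTML body.
// Relative URLs are skipped here for brevity; a real crawler would resolve them.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);               // suppress warnings from malformed HTML
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (preg_match('#^https?://#i', $href)) {
            $links[$href] = true;         // array keys deduplicate URLs
        }
    }
    return array_keys($links);
}

// Breadth-first crawl: a queue of frontier URLs and a visited set.
$queue   = ['https://example.com/'];      // placeholder seed URL
$visited = [];
$limit   = 100;                           // stop after 100 pages

while ($queue && count($visited) < $limit) {
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue;
    }
    $visited[$url] = true;
    $html = fetchPage($url);
    if ($html === null) {
        continue;
    }
    // Here the page body would be handed to the indexer; we only enqueue links.
    foreach (extractLinks($html) as $link) {
        if (!isset($visited[$link])) {
            $queue[] = $link;
        }
    }
}
?>

A crawler at the scale the abstract describes would additionally need per-host politeness delays, robots.txt handling, and a persistent frontier queue so that a system crash does not lose crawl state.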