World Journal of Engineering Research and Technology

( An ISO 9001:2015 Certified International Journal )

An International Peer Reviewed Journal for Engineering Research and Technology

An Official Publication of Society for Advance Healthcare Research (Reg. No. : 01/01/01/31674/16)

ISSN 2454-695X

DESIGN AND IMPLEMENTATION OF A HIGH PERFORMANCE WEB CRAWLER FOR INFORMATION EXTRACTION

*ILO Somtoochukwu F., Victor Onuchi, Akuma Uche and Okah, Paul-Kingsley

ABSTRACT

Broad web search engines, as well as many more specialized search tools, rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a crawler may interact with millions of hosts over a period of weeks or months, so robustness, flexibility, and manageability are of major importance. In addition, I/O performance, network resources, and operating system limits must be taken into account to achieve high performance at a reasonable cost. In this study, we describe the design and implementation of a high-performance web crawler that runs on a network of workstations. The crawler scales to at least several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications. We present the software architecture of the system, discuss the performance bottlenecks, and describe efficient techniques for achieving high performance. An algorithm was developed for the crawler to download the web pages that are to be indexed by the search engine. Based on this algorithm, the source code was written in PHP, a scripting language that is suited to web development and can be embedded in HTML. The source code was then integrated into an Apache server environment for automatic web search engine fetch sequencing. The program was test run at layer 7 (the application layer) of the OSI model using the Hypertext Transfer Protocol (HTTP).
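
The download step summarized in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the PHP used by the authors, and it is not their algorithm: it assumes a simple breadth-first traversal with a single in-memory frontier. The names `LinkExtractor`, `fetch_http`, `crawl`, and the `max_pages` limit are hypothetical, and the sketch omits the distributed operation, crash recovery, and per-host politeness that a high-performance crawler would need.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_http(url, timeout=10):
    """Download a page over HTTP(S) and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_urls, fetch=fetch_http, max_pages=100):
    """Breadth-first crawl (an assumed strategy); returns {url: html}."""
    frontier = deque(seed_urls)   # URLs waiting to be downloaded
    seen = set(seed_urls)         # URLs already queued, to avoid repeats
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be downloaded
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            is_web = absolute.startswith(("http://", "https://"))
            if is_web and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages
```

Passing the download function as the `fetch` parameter keeps the traversal logic separate from network access, so the same loop can be exercised against an in-memory set of pages before it is pointed at real hosts.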
