Web Mining Research: Building Intelligent Information Systems by Exploiting Web Usage Regularities and Information Structures in Web Pages


The Web is an immense and dynamic collection of pages. It contains countless hyperlinks and huge volumes of access and usage information, which together provide a rich and unprecedented source for data mining. However, the Web also poses several challenges to effective resource and knowledge discovery. First, web pages are far more complex than the documents in any traditional text collection. Second, the Web is a highly dynamic information source. Not only does it continue to grow rapidly, but the information it holds is also constantly updated, and its linkage information and access records change just as frequently.

The Internet’s rapidly expanding user community connects millions of workstations. These users have markedly different backgrounds, interests, and purposes. Many know little about the structure of the information network and are unaware of how costly a particular search can be. As a result, they often face lengthy waits for search results.

The proposed research aims to exploit web usage regularities and the information structures in web pages to build intelligent information systems. The system should be able to collect and segregate user access information and mine useful knowledge from it. It should also build complete concept models of web users’ information needs based on their access histories.
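Mining "usage regularities" from access records usually starts by splitting raw hits into per-user sessions and then counting recurring navigation patterns. The sketch below is a minimal illustration under stated assumptions, not the proposed system's method: the `Hit` record, the 30-minute inactivity timeout, and the page-to-page transition count used as the "regularity" are all hypothetical choices. Real server logs would first need to be parsed into such records.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical, already-parsed log record: who requested which URL and when.
@dataclass(frozen=True)
class Hit:
    user: str   # e.g. IP address or cookie id (assumption)
    ts: int     # timestamp in seconds
    url: str

SESSION_TIMEOUT = 30 * 60  # assumed inactivity gap that ends a session


def sessionize(hits):
    """Split each user's hits into sessions separated by long idle gaps."""
    by_user = defaultdict(list)
    for h in hits:
        by_user[h.user].append(h)
    sessions = []
    for user_hits in by_user.values():
        user_hits.sort(key=lambda h: h.ts)
        current = [user_hits[0].url]
        for prev, cur in zip(user_hits, user_hits[1:]):
            if cur.ts - prev.ts > SESSION_TIMEOUT:
                sessions.append(current)  # close the session after a long gap
                current = []
            current.append(cur.url)
        sessions.append(current)
    return sessions


def frequent_transitions(sessions, min_support=2):
    """Count page-to-page moves and keep those seen at least min_support times."""
    counts = Counter()
    for s in sessions:
        counts.update(zip(s, s[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_support}


hits = [
    Hit("u1", 0, "/home"), Hit("u1", 60, "/courses"),
    Hit("u1", 5000, "/home"), Hit("u1", 5030, "/courses"),
    Hit("u2", 10, "/home"), Hit("u2", 40, "/contact"),
]
print(frequent_transitions(sessionize(hits)))  # {('/home', '/courses'): 2}
```

Frequent transitions like these are one simple input to the kind of user concept model described above. Sequential pattern mining or clustering of sessions would be natural extensions.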

The system uses information structures such as the incoming and outgoing links of a web page when mining information. The incoming links of a page can be used to classify it concisely, which improves both browsing and querying of web pages. To increase the commercial value and accessibility of their pages, most content sites publish pages with intra-site redundant information, such as navigation panels, advertisements, and copyright notices. This redundant information increases the index size of general search engines and causes page topics to drift. The system should therefore be able to detect these intra-page structures and eliminate the redundant information.
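One simple way to spot intra-site redundancy such as navigation panels and copyright notices is to note that these blocks repeat across many pages of the same site, while real content does not. The following is a minimal sketch of that frequency heuristic, assuming pages have already been segmented into text blocks. The function names and the 50% threshold are hypothetical, and this is not necessarily the technique the proposed system uses.

```python
from collections import Counter


def normalize(block):
    """Collapse whitespace and case so trivially different copies match."""
    return " ".join(block.lower().split())


def remove_redundant_blocks(pages, threshold=0.5):
    """pages: dict mapping URL -> list of text blocks (pre-segmented).

    A block is treated as redundant template content if it appears in at
    least `threshold` fraction of the site's pages.
    """
    doc_freq = Counter()
    for blocks in pages.values():
        doc_freq.update({normalize(b) for b in blocks})  # count once per page
    cutoff = threshold * len(pages)
    return {
        url: [b for b in blocks if doc_freq[normalize(b)] < cutoff]
        for url, blocks in pages.items()
    }


pages = {
    "/a": ["Home | About | Contact", "Web usage mining groups user sessions.", "(c) 2008 Example Site"],
    "/b": ["Home | About | Contact", "Link analysis uses incoming links.", "(c) 2008 Example Site"],
    "/c": ["Home  |  About | Contact", "Page segmentation finds content blocks.", "(c) 2008 Example Site"],
}
print(remove_redundant_blocks(pages)["/a"])  # ['Web usage mining groups user sessions.']
```

Exact-match counting is deliberately crude. Blocks that differ slightly from page to page, such as rotating advertisements, would need fuzzier matching, for example shingling or DOM-path comparison.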

The context of a hyperlink, or link context, is defined as the terms that appear in the text around the hyperlink within a web page. The system should be able to apply link contexts to a variety of web information retrieval and categorization tasks. Using these approaches in web mining should improve information extraction and make the Web friendlier to its users.
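A link context can be extracted by recording which words of a page fall inside each anchor and then taking a fixed window of words on either side. The sketch below uses Python's standard `html.parser` module. The window size of three words and the helper names are hypothetical choices. The extractor is also naive: it keeps text from `<script>` or `<style>` elements and does not handle nested anchors. Because the context of a link describes the page it points to, collecting these terms over all incoming links of a page gives one possible input for the in-link based classification mentioned earlier.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collects page words and the word span covered by each <a href> tag."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []    # (href, start_index, end_index) into self.words
        self._open = None  # (href, start_index) while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = (href, len(self.words))

    def handle_endtag(self, tag):
        if tag == "a" and self._open:
            href, start = self._open
            self.links.append((href, start, len(self.words)))
            self._open = None

    def handle_data(self, data):
        self.words.extend(data.split())


def link_contexts(html, window=3):
    """Return {href: terms}: anchor text plus `window` words on each side."""
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    contexts = {}
    for href, start, end in parser.links:
        lo = max(0, start - window)
        contexts.setdefault(href, []).extend(parser.words[lo:end + window])
    return contexts


html = ('<p>Read our survey of <a href="/mining">web usage mining</a> '
        'techniques for personalised search.</p>')
print(link_contexts(html)["/mining"])
```

For the sample page, the context of `/mining` is the anchor text "web usage mining" together with the three words before it ("our survey of") and after it ("techniques for personalised").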

C. Rajesh Kumar, Lecturer, Sathyabama University

Monday, January 14, 2008

