Web Mining Research

Monday, March 29, 2010

Your invitation from Sanjay Sugumar is about to expire

This is a reminder that on March 3, Sanjay Sugumar sent you an invitation to become part of his or her professional network at LinkedIn.

Your invitation will expire soon! Follow this link to accept Sanjay Sugumar's invitation.

https://www.linkedin.com/e/isd/1117242870/YCTM6hnF/EML-inv_17_rem/

Signing up is free and takes less than a minute.

On March 3, Sanjay Sugumar wrote:

> To: [sanjaysugumar.wmrg@blogger.com]
> From: Sanjay Sugumar [sanjaysugumar@gmail.com]
> Subject: Invitation to connect on LinkedIn

> I'd like to add you to my professional network on LinkedIn.
>
> - Sanjay

The only way to get access to Sanjay Sugumar's professional network on LinkedIn is through the following link:

https://www.linkedin.com/e/isd/1117242870/YCTM6hnF/EML-inv_17_rem/

You can remove yourself from Sanjay Sugumar's network at any time.

--------------

The pending expiration of your invitation is an automatic process triggered by system maintenance. This is the last email message you will receive from LinkedIn about the expiration of this invitation from Sanjay Sugumar.

Tuesday, March 16, 2010

Reminder about your invitation from Sanjay Sugumar

This is a reminder that on March 03, Sanjay Sugumar sent you an invitation to become part of his or her professional network at LinkedIn.

Follow this link to accept Sanjay Sugumar's invitation.

https://www.linkedin.com/e/isd/1117242870/YCTM6hnF/EML-inv_17_rem/

Signing up is free and takes less than a minute.

> To: [sanjaysugumar.wmrg@blogger.com]
> From: Sanjay Sugumar [sanjaysugumar@gmail.com]
> Subject: Invitation to connect on LinkedIn

> I'd like to add you to my professional network on LinkedIn.
>
> - Sanjay

The only way to get access to Sanjay Sugumar's professional network is through the following link:

https://www.linkedin.com/e/isd/1117242870/YCTM6hnF/EML-inv_17_rem/

You can remove yourself from Sanjay Sugumar's network at any time.

--------------

Wednesday, March 3, 2010

Invitation to connect on LinkedIn

I'd like to add you to my professional network on LinkedIn.

- Sanjay


Monday, January 21, 2008

Topics of Interest

Information Extraction Systems
Web Structure Mining
Domain-Specific Web Search
Personalized Web Search
Web Page Classification Methods
Web Caching Methods
Automatic Fragment Detection
Intelligent Data Preparation
Automatic Identification of Informative Sections in Web Pages
Adaptive Information Retrieval
Pattern Discovery on the Web

PhD 4 years chart - Rajesh Kumar

The chart describing the four-year action plan for the PhD course is attached. The chart was prepared by Rajesh Kumar.
Download Chart

Intelligent Preprocessor for Search Engines and Web miners

Download this Abstract

Data preprocessing is the first step of web mining, data mining, information retrieval and pattern recognition [1]. When the data is well prepared, the mined results are more accurate and reliable. The principal aim of data preparation [2],[3],[6] is to provide quality data for the later steps of web mining and information retrieval.

Web page classification [4] is one of the essential techniques for Web mining because classifying Web pages of an interesting class is often the first step of mining the Web. However, constructing a classifier for an interesting class requires laborious preprocessing. Although Yahoo and similar Web directory services use human readers to classify Web documents, lower cost and higher speed make automatic classification highly desirable. Typical classification methods [4] use positive and negative examples as training sets, and then assign each document a class label from a set of predefined topic categories. The proposed research work will identify such preprocessing problems and develop efficient algorithms, based on machine learning and AI techniques, to overcome them and improve the web mining process.
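To make the idea of training on positive and negative examples concrete, here is a minimal sketch of a two-class naive Bayes text classifier. It is only an illustration, not the method proposed in this abstract: the function names (`tokenize`, `train`, `classify`), the labels `"pos"` and `"neg"`, and the choice of naive Bayes with add-one smoothing are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(positive_docs, negative_docs):
    """Count tokens per class and estimate class priors from the example sets."""
    counts = {"pos": Counter(), "neg": Counter()}
    for doc in positive_docs:
        counts["pos"].update(tokenize(doc))
    for doc in negative_docs:
        counts["neg"].update(tokenize(doc))
    vocab = set(counts["pos"]) | set(counts["neg"])
    total_docs = len(positive_docs) + len(negative_docs)
    priors = {
        "pos": len(positive_docs) / total_docs,
        "neg": len(negative_docs) / total_docs,
    }
    return counts, vocab, priors


def classify(text, model):
    """Return "pos" or "neg" using log-probabilities with add-one smoothing."""
    counts, vocab, priors = model
    scores = {}
    for label in ("pos", "neg"):
        total = sum(counts[label].values())
        score = math.log(priors[label])
        for token in tokenize(text):
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[label][token] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

A page would be labelled by calling `classify(page_text, model)` with a model built from hand-labelled example pages. Real pages would first need the preprocessing discussed in this abstract, since navigation text and advertisements would otherwise pollute the token counts.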

Web pages [3], especially dynamically generated ones, contain several items that cannot be classified as the “primary content,” e.g., navigation sidebars, advertisements, copyright notices, etc. Most clients and end-users search for the primary content [5], [7], and largely do not seek the non-informative content. A tool that helps an end-user or application search and process information from Web pages automatically [8] must separate the “primary content sections” from the other content sections. The proposed system should process web pages in such a way that search engines and web miners concentrate only on the primary content and hence retrieve the best information for the users.
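One simple way to approximate this separation is a link-density heuristic: blocks made up mostly of anchor text tend to be navigation, and very short blocks tend to be notices. The sketch below illustrates that heuristic only and is not the system proposed here. The class `BlockCollector`, the function `primary_content`, the list of block tags and the thresholds `max_link_density` and `min_chars` are all hypothetical choices.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries in this sketch (an assumption, not a standard list).
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article", "nav", "footer"}


class BlockCollector(HTMLParser):
    """Split a page into text blocks and count how much of each block is link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None
        self._in_link = 0

    def _start_block(self):
        self._current = {"text": [], "link_chars": 0}
        self.blocks.append(self._current)

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._start_block()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in BLOCK_TAGS:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._current is None:  # text outside any known block tag
            self._start_block()
        self._current["text"].append(text)
        if self._in_link:
            self._current["link_chars"] += len(text)


def primary_content(html, max_link_density=0.3, min_chars=40):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for block in parser.blocks:
        text = " ".join(block["text"])
        if len(text) < min_chars:
            continue  # too short: likely a notice or label
        if block["link_chars"] / len(text) <= max_link_density:
            kept.append(text)
    return kept
```

The thresholds would need tuning per site, and nested blocks are handled only crudely, so this should be read as a starting point for experimentation rather than a finished preprocessor.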

Modeling user preference [10] is one of the challenging issues in intelligent information systems. Extensive research has been performed to automatically analyze user preference and to utilize it. This, too, is a preprocessing step to be carried out before applying web mining and search. It will help web miners present personalized and rich information to the users.

We believe data mining should be integrated with the Web search engine service to enhance the quality of Web searches. To do so, we can start by enlarging the set of search keywords to include a set of keyword synonyms for web search [9]. This preprocessing step will let search engines search on the semantics of a query instead of just its keywords.
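The synonym step described above can be sketched as follows. The synonym table here is a tiny hand-written placeholder, and the names `SYNONYMS`, `expand_query` and `matches` are hypothetical. A real system would draw synonyms from a thesaurus or from mined query logs.

```python
# Hypothetical synonym table, used only for illustration.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Turn each query term into a group holding the term and its synonyms."""
    expanded = []
    for term in query.lower().split():
        group = [term] + [s for s in synonyms.get(term, []) if s != term]
        expanded.append(group)
    return expanded


def matches(document, expanded):
    """A document matches if it contains at least one word from every group."""
    words = set(document.lower().split())
    return all(any(alt in words for alt in group) for group in expanded)
```

With this table, the query "cheap car" also matches a document that only says "affordable automobile", which plain keyword matching would miss.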

Hence, processing web pages, user profiles, user behaviors and access patterns, and bringing them into the format required by web miners and search engines, is an important research area for web intelligence.

Efficient System for Information Extraction from Web Using Soft Computing approaches

Download this Abstract

Information extraction is one of the most frequently performed activities on the web. Searching, comprehending and using the semi-structured information [2] stored on the web poses a significant challenge because this data is more sophisticated and dynamic than the information that commercial database systems store. To supplement keyword-based indexing, which forms the cornerstone of web search engines, researchers have applied data mining to web-page ranking [1]. In this context, data mining helps web search engines find high-quality web pages for the users.

Defining how to design an efficient information extraction system [3] for the web presents a major research challenge. Achieving this requires overcoming two fundamental problems. First, the traditional schemes for accessing the immense amounts of data that reside on the web fundamentally assume a text-oriented, keyword-based view [6] of web pages. Second, we must replace the current primitive access schemes with more sophisticated versions that can exploit the web fully.

Discovering and extracting novel and useful knowledge from web sources call for innovative approaches that draw from a wide range of fields spanning data mining, machine learning, soft computing, statistics, databases, information retrieval, artificial intelligence, and natural language processing [8].

The proposed research work aims at developing algorithms for building an efficient system for information extraction from the web, using suitable soft computing approaches. The preprocessing of web pages requires the content of the web pages to be stored as fragments, which will make it easier to identify the primary content to be delivered. The user access information can be represented using fuzzy sets [5], which will help in making decisions faster [7] and ranking the pages accurately. The system has to learn from the various types of queries it receives and should be capable of performing better and faster for all similar queries. This learning capability [4] can be implemented using neural network techniques. Evolutionary computation techniques such as genetic algorithms can be used to support the searching mechanism so that the required information is retrieved faster. The system will also support a wide range of queries using Natural Language Processing techniques [8].

Thus the use of Soft Computing approaches for Information Extraction systems is an important research thrust in Web Intelligence. These techniques will make it possible to fully use the immense information available on the web and make the web a richer, friendlier, and more intelligent resource that we can all share and explore.
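
As an illustration of how user access information might be represented with fuzzy sets, the sketch below maps visit counts and dwell time to membership degrees and combines them with the minimum operator (a common fuzzy AND). It is not the system proposed in this abstract. The function names (`ramp`, `interest`, `rank_pages`) and the breakpoints (1 to 10 visits, 5 to 60 seconds) are assumptions chosen only for the example.

```python
def ramp(x, low, high):
    """Membership degree rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def interest(visits, dwell_seconds):
    """Degree to which a page is both frequently visited and read at length."""
    frequent = ramp(visits, 1, 10)        # hypothetical breakpoints
    engaged = ramp(dwell_seconds, 5, 60)  # hypothetical breakpoints
    return min(frequent, engaged)         # fuzzy AND


def rank_pages(access_log):
    """Order pages by interest; `access_log` maps page -> (visits, dwell_seconds)."""
    return sorted(
        access_log,
        key=lambda page: interest(*access_log[page]),
        reverse=True,
    )
```

Because each page receives a degree between 0 and 1 rather than a hard yes or no, pages with partial evidence of interest can still be ranked against one another.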
