Text Mining for Automatic Knowledge Acquisition of Expert System

Ardi

Thesis

Master of Computer Science

Budi Luhur University

2008

 

Abstract

In a conventional knowledge acquisition process for developing an expert system, the developers must interview experts in a particular domain and then write the expert system's knowledge base manually. This requires writing a large amount of code and spending a lot of time interviewing the experts.

This research develops a new approach that automates the knowledge acquisition process in expert system development using text mining and artificial neural network techniques.

The automated knowledge acquisition method developed in this research takes documented knowledge of a specific domain, written as English natural-language text, and produces a structured knowledge base in an object-oriented (frame) format. The expert system's inference engine performs its calculations using Dempster-Shafer theory. As a result, the method can quickly and automatically produce expert systems for many different domains from their documented knowledge, without manual coding.
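The abstract does not give the inference engine's formulas. As a hedged sketch, the core operation of Dempster-Shafer theory, Dempster's rule of combination, can be written as below: masses of every pair of focal elements are multiplied and assigned to their intersection, mass that falls on an empty intersection is counted as conflict K, and the remaining masses are normalised by 1 / (1 - K). The topic names, mass values and the helper names `combine` and `belief` are hypothetical and only illustrate how evidence from two answered indication questions could be merged.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    A mass function maps frozensets of hypotheses (focal elements) to masses
    that sum to 1. Mass on an empty intersection counts as conflict K and is
    removed by normalising the rest with 1 / (1 - K).
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two bodies of evidence are in total conflict")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


def belief(m, hypothesis):
    """Bel(A): total mass of the focal elements that are subsets of A."""
    return sum(mass for focal, mass in m.items() if focal <= hypothesis)


if __name__ == "__main__":
    # Hypothetical frame of discernment: three candidate topics.
    theta = frozenset({"paper jam", "toner empty", "driver error"})
    # Evidence from one answered indication question.
    m1 = {frozenset({"paper jam", "toner empty"}): 0.6, theta: 0.4}
    # Evidence from a second answered indication question.
    m2 = {frozenset({"paper jam"}): 0.7, theta: 0.3}
    m12 = combine(m1, m2)
    print(belief(m12, frozenset({"paper jam"})))  # approximately 0.7
```

Mass left on the whole frame `theta` represents ignorance, which is what lets the engine stay uncommitted until enough indications have been answered.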

Overall, the automatic knowledge acquisition process of this method starts with part-of-speech (POS) tagging, then identifies the topics, clusters the relevant topics, extracts the descriptions of the topics, recognises the indication factors of each issue, and culminates in extracting the solutions from the text.

First, the method performs part-of-speech (POS) tagging on all the documents, tagging every verb and noun in each sentence.
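A minimal sketch of the tagging step, assuming Penn Treebank-style tags. The tiny `LEXICON` and the suffix rules are hypothetical stand-ins chosen only to keep the example self-contained; a practical system would use a trained statistical tagger instead.

```python
import re

# Hypothetical toy lexicon with Penn Treebank tags (NN = noun, VB* = verb,
# DT = determiner, IN = preposition, JJ = adjective, RB = adverb).
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "if": "IN", "in": "IN", "of": "IN",
    "is": "VBZ", "are": "VBP", "does": "VBZ",
    "check": "VB", "print": "VB", "replace": "VB",
    "printer": "NN", "paper": "NN", "tray": "NN", "jam": "NN",
    "not": "RB", "empty": "JJ",
}


def tag(sentence):
    """Tag each word: lexicon lookup first, then crude suffix rules."""
    tagged = []
    for token in re.findall(r"[A-Za-z]+", sentence):
        word = token.lower()
        if word in LEXICON:
            pos = LEXICON[word]
        elif word.endswith("ing"):
            pos = "VBG"  # gerund / present participle
        elif word.endswith("ed"):
            pos = "VBD"  # past tense
        elif word.endswith("s"):
            pos = "NNS"  # plural noun
        else:
            pos = "NN"  # unknown words default to singular noun
        tagged.append((token, pos))
    return tagged


def nouns_and_verbs(tagged):
    """Keep only the noun (NN*) and verb (VB*) tokens."""
    nouns = [w for w, pos in tagged if pos.startswith("NN")]
    verbs = [w for w, pos in tagged if pos.startswith("VB")]
    return nouns, verbs


if __name__ == "__main__":
    tagged = tag("The printer does not print if the paper tray is empty.")
    print(nouns_and_verbs(tagged))
    # (['printer', 'paper', 'tray'], ['does', 'print', 'is'])
```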

Second, potential topics are identified by applying a calculation from Information Retrieval theory to every noun in the documents. The relevant topics are then clustered using a Self-Organising Map (SOM) neural network. Only groups with higher values are treated as topic issues. Next, sentences that clearly describe each issue to the users are created by extracting sentences that contain the topic's keywords and match specific patterns that potentially state the issue. Using similar logic, the method then extracts sentences that potentially describe the solutions to the issue; these solution sentences are presented to the users as information on how to solve their problems.
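The abstract does not name the Information Retrieval calculation. Assuming it is a TF-IDF weighting (a common choice, but an assumption here), candidate topic nouns could be scored as sketched below. The function names `noun_tfidf` and `top_topics` are hypothetical, and the input is one list of nouns per document, as produced by the tagging step.

```python
import math
from collections import Counter


def noun_tfidf(docs_nouns):
    """Score each noun by its highest TF-IDF weight over all documents.

    docs_nouns: one list of lower-cased nouns per document.
    """
    n_docs = len(docs_nouns)
    df = Counter()
    for nouns in docs_nouns:
        df.update(set(nouns))  # document frequency: count each document once
    scores = {}
    for nouns in docs_nouns:
        if not nouns:
            continue
        tf = Counter(nouns)
        for noun, count in tf.items():
            weight = (count / len(nouns)) * math.log(n_docs / df[noun])
            scores[noun] = max(scores.get(noun, 0.0), weight)
    return scores


def top_topics(scores, k):
    """The k highest-scoring nouns (ties broken alphabetically)."""
    return sorted(scores, key=lambda noun: (-scores[noun], noun))[:k]


if __name__ == "__main__":
    docs = [["printer", "paper", "jam", "paper"],
            ["printer", "toner", "toner"],
            ["printer", "cable"]]
    print(top_topics(noun_tfidf(docs), 3))  # ['toner', 'cable', 'paper']
```

A noun that appears in every document, such as `printer` here, gets an IDF of log(1) = 0 and therefore zero weight, which keeps domain-wide words from being mistaken for specific topics.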

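The SOM clustering step can be sketched as a small one-dimensional Self-Organising Map in pure Python. How each topic is turned into a vector is not specified in the abstract; the example assumes each noun is described by a short numeric feature vector (for instance, its weight in each document). All names (`train_som`, `cluster`, `rank_groups`) are hypothetical, and `rank_groups` shows only one possible reading of "groups with higher values": clusters ordered by the summed score of their members.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(vectors, n_units, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM and return one weight vector per map unit."""
    rng = random.Random(seed)
    # Deterministic initialisation from evenly spaced input vectors.
    weights = [list(vectors[(j * len(vectors)) // n_units]) for j in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        radius = radius0 * (1.0 - frac) + 1e-3   # neighbourhood shrinks over time
        for x in rng.sample(vectors, len(vectors)):  # shuffled presentation order
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood measured on the 1-D grid of units.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


def cluster(vectors, labels, n_units=2, **kwargs):
    """Group labels by the map unit that wins for their vector."""
    weights = train_som(vectors, n_units, **kwargs)
    groups = {}
    for vec, label in zip(vectors, labels):
        groups.setdefault(best_matching_unit(weights, vec), []).append(label)
    return groups


def rank_groups(groups, scores):
    """Order clusters by the summed score of their members, highest first."""
    return sorted(groups.values(), key=lambda g: -sum(scores[w] for w in g))


if __name__ == "__main__":
    vecs = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
    labels = ["paper", "jam", "tray", "toner", "ink", "cartridge"]
    print(cluster(vecs, labels))
```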
Finally, the method recognises sentence patterns that potentially describe the indications (the symptoms) of a particular issue and transforms those sentences into questions. At this stage the knowledge base has been produced, in a format where one topic has many indications. This knowledge base serves as the basis for the calculations of the expert system's inference engine.
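The pattern-based extraction and the resulting frame (one topic, many indications) might look like the sketch below. The cue patterns (`if`/`when` clauses for indications, `should`/`must`/`try to` for solutions, `is`/`occurs` for descriptions) and the `TopicFrame` layout are hypothetical simplifications, not the patterns used in the thesis. The question transformation only inverts simple `is`/`are` clauses and otherwise falls back to an "Is it true that <clause>?" form.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TopicFrame:
    """Hypothetical frame: one topic with many indications and solutions."""
    topic: str
    description: str = ""
    indications: list = field(default_factory=list)  # yes/no questions
    solutions: list = field(default_factory=list)


CONDITION = re.compile(r"^(?:if|when)\s+(.+?),", re.IGNORECASE)
BE_CLAUSE = re.compile(r"^(.+?)\s+(is|are)\s+(.+)$", re.IGNORECASE)
SOLUTION_CUE = re.compile(r"\b(?:should|must|try to)\b", re.IGNORECASE)
DESCRIPTION_CUE = re.compile(r"\b(?:is|occurs)\b", re.IGNORECASE)


def to_question(clause):
    """Turn a condition clause into a yes/no question."""
    m = BE_CLAUSE.match(clause)
    if m:
        subject, verb, rest = m.groups()
        return f"{verb.capitalize()} {subject} {rest}?"  # "X is Y" -> "Is X Y?"
    return f"Is it true that {clause}?"


def build_frame(topic, sentences):
    """Sort topic sentences into description, indications and solutions."""
    frame = TopicFrame(topic)
    for sentence in sentences:
        s = sentence.strip()
        if topic.lower() not in s.lower():
            continue  # only sentences that mention the topic keyword
        cond = CONDITION.match(s)
        if cond:
            frame.indications.append(to_question(cond.group(1)))
        elif SOLUTION_CUE.search(s):
            frame.solutions.append(s)
        elif not frame.description and DESCRIPTION_CUE.search(s):
            frame.description = s
    return frame


if __name__ == "__main__":
    text = [
        "A paper jam occurs when paper is stuck inside the printer.",
        "If the paper tray is overfilled, a paper jam may happen.",
        "When the printer makes a grinding noise, a paper jam is likely.",
        "You should remove the jammed paper slowly to clear a paper jam.",
    ]
    print(build_frame("paper jam", text).indications)
    # ['Is the paper tray overfilled?', 'Is it true that the printer makes a grinding noise?']
```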

 

Keywords:

Automatic Knowledge Acquisition, Part-of-Speech Tagging, Text Mining, Information Retrieval, Neural Network, Self-Organising Maps, Information Extraction, Expert System, Knowledge Base, Object-Oriented Knowledge Base Representation, Inference Engine, Dempster-Shafer.

 
