We have seen that resolving ambiguity in sentences requires an interpreter to have general knowledge of the world. Understanding the significance of a vague utterance in context also requires knowledge. Thus, a theoretical model of language understanding is not complete without a model of knowledge representation and retrieval, and we cannot construct a computer capable of robust understanding without providing it with an encyclopaedic knowledge of the world. These are rather pessimistic conclusions, but they need not prevent us from continuing with the theoretical study of language or indeed from constructing useful computer programs that operate in limited domains. They do, however, suggest that we must try to classify in a rigorous way the kinds of knowledge that guide a language user and codify enough of it to ensure that our overall models are realistic.

Assimilating information is a complex task that involves scanning and processing huge volumes of written text. One analyst describes the problem this way: “If I read every bit of information that might be important to what I am working on, it would be like reading War and Peace every day.”

Information retrieval (IR) deals with the representation of, and access to, organized information items. What is retrieved depends on the queries submitted by the user, so the result is dynamic and differs from one query to another.
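
The query-dependent nature of retrieval can be illustrated with a minimal sketch. It assumes a toy collection of three made-up documents and a deliberately simple scoring rule (counting occurrences of query terms); the names tokenize, retrieve and DOCUMENTS are hypothetical, and real IR systems use far richer representations and ranking functions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Return ids of the top_k documents containing the most query-term occurrences."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # documents sharing no term with the query are not retrieved
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


# Made-up collection, used only for illustration.
DOCUMENTS = {
    "d1": "The company announced a merger with a rival company.",
    "d2": "Heavy rain caused flooding in the valley.",
    "d3": "Analysts expect the merger to close next year.",
}

print(retrieve("company merger", DOCUMENTS))
print(retrieve("flooding", DOCUMENTS))
```

The same collection yields different results for different queries, which is the sense in which the outcome of retrieval is dynamic.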

Information extraction (IE) is a term applied to the activity of automatically extracting pre-specified types of information from natural language texts, typically short ones. IR and IE together may be seen as the activity of populating a structured information source (or knowledge base) from an unstructured, or free-text, information source.
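
As a rough illustration of extracting a pre-specified type of information into a structured source, the sketch below fills a small list of records from free text. It assumes a single hypothetical template (company acquisitions phrased as "X acquired Y for $N million") matched with a regular expression; the company names, the ACQUISITION pattern and the extract_acquisitions function are invented for this example, and practical IE systems rely on much more robust linguistic analysis.

```python
import re

# Hypothetical template: "<Acquirer> acquired <Target> for $<amount> million".
ACQUISITION = re.compile(
    r"(?P<acquirer>[A-Z][A-Za-z]+) acquired (?P<target>[A-Z][A-Za-z]+)"
    r" for \$(?P<amount>\d+(?:\.\d+)?) million"
)


def extract_acquisitions(text):
    """Return one structured record per acquisition statement found in text."""
    records = []
    for match in ACQUISITION.finditer(text):
        records.append({
            "acquirer": match.group("acquirer"),
            "target": match.group("target"),
            "amount_musd": float(match.group("amount")),  # millions of dollars
        })
    return records


# Made-up free text, used only for illustration.
TEXT = (
    "Acme acquired Globex for $120 million on Monday. "
    "Shares rose after the news. "
    "Initech acquired Hooli for $45.5 million last year."
)

# The "knowledge base" here is simply a list of dictionaries.
KNOWLEDGE_BASE = extract_acquisitions(TEXT)
for record in KNOWLEDGE_BASE:
    print(record)
```

Sentences that do not fit the template, such as the one about share prices, contribute nothing to the structured output.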

The contrast between the aims of IE and IR systems can be summed up as follows: IR retrieves relevant documents from collections, whereas IE extracts relevant information from documents. The two techniques are therefore complementary, and their use in combination has the potential to create powerful new tools in text processing.

