We have seen that resolving ambiguity in sentences
requires an interpreter to have a general knowledge
of the world. Understanding the significance of
a vague utterance expressed in context also requires
knowledge. Thus, a theoretical model of language
understanding is not complete without a model
of knowledge representation and retrieval, and
we cannot construct a computer that understands
language robustly without providing it with encyclopaedic
knowledge of the world. These are rather pessimistic conclusions,
but they need not prevent us from continuing with
the theoretical study of language or indeed from
constructing useful computer programs that operate
in limited domains. They do, however, suggest
that we must try to classify in a rigorous way
the kinds of knowledge that guide a language user
and codify enough of it to ensure that our overall
models are realistic.
Assimilation of information is a complex
task that involves a series of actions such
as scanning and processing huge volumes of written
text. One analyst describes the problem by saying,
"If I read every bit of information that
might be important to what I am working on, it
would be like reading War and Peace every day."
Information Retrieval (IR) deals with
access to representations of organized
information items, where the representation of
the items depends on the queries submitted
by the user. The representation is therefore dynamic
and differs from one query to another.
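
The query-dependent nature of retrieval can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular IR system: the function names `tokenize` and `rank`, the sample documents, and the simple term-frequency times inverse-document-frequency weighting are all assumptions made for the example.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase the text, split on whitespace, and strip simple punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def rank(query, documents):
    """Return (score, document index) pairs, best match first.

    The score sums tf * idf over the query terms. This weighting is an
    assumed, simplified stand-in for what a real IR system would use.
    """
    doc_counts = [Counter(tokenize(d)) for d in documents]
    n = len(documents)
    scores = []
    for i, counts in enumerate(doc_counts):
        score = 0.0
        for term in set(tokenize(query)):
            # Number of documents that contain the term at all.
            df = sum(1 for c in doc_counts if term in c)
            if df:
                score += counts[term] * math.log((n + 1) / df)
        scores.append((score, i))
    # Highest score first; ties are broken by document order.
    return sorted(scores, key=lambda s: (-s[0], s[1]))

# Hypothetical three-document collection.
docs = [
    "The merger between the two banks was announced on Monday.",
    "Heavy rain caused flooding across the region.",
    "Analysts expect the banks to complete the merger next year.",
]

ranking_merger = rank("bank merger", docs)
ranking_flood = rank("flooding rain", docs)
```

The same collection yields a different ordering for each query, which is the sense in which the view of the collection is dynamic. Note that the query word "bank" does not match "banks" here, because the sketch does no stemming.
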
Information Extraction (IE) is the term
applied to the activity of automatically
extracting pre-specified types of information
from short, natural language texts.
IR and IE together may be seen as the activity
of populating a structured information source
(or knowledge base) from an unstructured, or
free-text, information source.
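
As a rough illustration of populating a structured source from free text, the sketch below fills a pre-specified template using a regular expression. Everything specific in it is hypothetical: the acquisition template, the field names, the `extract_acquisitions` helper, and the sample sentences were invented for the example, and practical IE systems rely on far richer linguistic analysis than a single pattern.

```python
import re

# Hypothetical template: who acquired whom, and for how much.
# A "name" is assumed to be one or more capitalized words.
NAME = r"[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*"
ACQUISITION = re.compile(
    rf"(?P<buyer>{NAME}) acquired (?P<target>{NAME}) for "
    r"\$(?P<amount>\d+(?:\.\d+)?) (?P<unit>million|billion)"
)

def extract_acquisitions(text):
    """Return one structured record per acquisition mentioned in the text."""
    records = []
    for match in ACQUISITION.finditer(text):
        multiplier = 1e6 if match.group("unit") == "million" else 1e9
        records.append({
            "buyer": match.group("buyer"),
            "target": match.group("target"),
            "amount_usd": float(match.group("amount")) * multiplier,
        })
    return records

# Hypothetical free-text sources.
texts = [
    "Acme Holdings acquired Widget Works for $12.5 million in cash.",
    "The weather was mild and no deals were reported.",
]

# The "knowledge base" here is simply a list of records.
knowledge_base = []
for text in texts:
    knowledge_base.extend(extract_acquisitions(text))
```

Only the first sentence contributes a record; the second contains nothing that matches the template, so it adds nothing to the knowledge base.
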
The contrast between the aims of IE and
IR systems can be summed up as: IR retrieves
relevant documents from collections, IE extracts
relevant information from documents. The two techniques
are therefore complementary, and their use in
combination has the potential to create powerful
new tools in text processing.
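
One way the two techniques might be combined is sketched below: a retrieval step first narrows the collection to documents relevant to a query, and an extraction step then pulls a pre-specified kind of fact from each retrieved document. The helpers `retrieve` and `extract_dates`, the word-overlap relevance test, the date pattern, and the sample collection are all assumptions chosen to keep the example short.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE = re.compile(rf"\b(\d{{1,2}}) ({MONTHS}) (\d{{4}})\b")

def retrieve(query, documents):
    """IR step: keep documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [d for d in documents
            if terms & set(re.findall(r"[a-z]+", d.lower()))]

def extract_dates(document):
    """IE step: return every 'day Month year' date found in the document."""
    return [" ".join(parts) for parts in DATE.findall(document)]

# Hypothetical collection.
collection = [
    "The treaty was signed on 4 March 1998 after long talks.",
    "The recipe needs 2 eggs and was printed on 9 June 2001.",
    "Negotiators said the treaty takes effect on 1 January 1999.",
]

hits = retrieve("treaty", collection)                 # relevant documents
facts = {doc: extract_dates(doc) for doc in hits}     # information from them
```

The second document contains a date but is never examined by the extraction step, because the retrieval step did not judge it relevant to the query.
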