Natural language processing is a field of computer science, artificial intelligence,
and computational linguistics concerned with the interactions between computers
and human (natural) languages. As such, NLP is related to the area of
human-computer interaction. Many challenges in NLP involve natural language
understanding, that is, enabling computers to derive meaning from human or
natural language input; others involve natural language generation.
Modern NLP algorithms are based on machine learning,
especially statistical machine learning. The paradigm of machine learning is
different from that of most prior attempts at language processing. Prior
implementations of language-processing tasks typically involved the direct hand
coding of large sets of rules. The machine-learning paradigm calls instead for
using general learning algorithms — often, although not always, grounded
in statistical inference — to automatically learn such rules through the analysis
of large corpora of typical real-world examples. A corpus (plural, "corpora") is a
set of documents (or sometimes, individual sentences) that have been hand-
annotated with the correct values to be learned.
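As an illustrative sketch of learning from a hand-annotated corpus, the toy example below trains a naive Bayes classifier on four labeled sentences. The corpus, the labels, and the classifier choice are all hypothetical, invented only to show how general learning algorithms induce rules from annotated examples rather than relying on hand-coded ones.

```python
from collections import Counter, defaultdict
import math

# Toy hand-annotated corpus: each sentence carries the correct value to be learned.
corpus = [
    ("the movie was great", "pos"),
    ("a great fun film", "pos"),
    ("the movie was awful", "neg"),
    ("a dull boring film", "neg"),
]

def train(corpus):
    """Learn per-label word counts from the annotated examples."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in corpus:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest naive Bayes log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for w in text.split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

word_counts, label_counts = train(corpus)
print(classify("a great film", word_counts, label_counts))  # -> pos
```

Nothing here was hand-coded as a rule: the association between "great" and the positive label is learned entirely from the annotated corpus.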
Many different classes of machine learning algorithms have been applied to NLP
tasks. These algorithms take as input a large set of "features" that are generated
from the input data. Some of the earliest-used algorithms, such as decision
trees, produced systems of hard if-then rules similar to the systems of
hand-written rules that were then common. Increasingly, however, research has
focused on statistical models, which make soft, probabilistic decisions based on
attaching real-valued weights to each input feature. Such models have the
advantage that they can express the relative certainty of many different possible
answers rather than only one, producing more reliable results when such a
model is included as a component of a larger system.
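The idea of attaching real-valued weights to features and producing a soft decision can be sketched with a logistic model. The feature names and weight values below are purely illustrative (not learned from any real corpus); the point is that the output is a probability rather than a hard yes/no rule.

```python
import math

# Hypothetical real-valued weights attached to each input feature
# (illustrative values only, not estimated from data).
weights = {"ends_in_ly": 1.5, "follows_verb": 0.8, "capitalized": -2.0}
bias = -0.5

def predict_proba(features):
    """Soft decision: return a probability instead of a hard yes/no."""
    score = bias + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))  # logistic (sigmoid) function

# E.g., the model's certainty that a token is an adverb, given its features.
p = predict_proba({"ends_in_ly", "follows_verb"})
print(round(p, 3))  # -> 0.858
```

Because the output is a real-valued certainty, a downstream component of a larger system can weigh this answer against competing analyses instead of committing to a single hard decision.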
Statistical natural-language processing uses stochastic, probabilistic,
and statistical methods to resolve some of the difficulties discussed above,
especially those which arise because longer sentences are highly ambiguous
when processed with realistic grammars, yielding thousands or millions of
possible analyses. Methods for disambiguation often involve the use
of corpora and Markov models. The ESPRIT Project P26 (1984–1988), led
by CSELT, explored the problem of speech recognition by comparing
knowledge-based approaches with statistical ones; the chosen outcome was a
completely statistical model.[9] One of the first models of statistical natural language
understanding was introduced in 1991 by Roberto Pieraccini, Esther Levin, and
Chin-Hui Lee from Bell Laboratories.[10] Statistical NLP comprises all quantitative
approaches to automated language processing, including probabilistic
modeling, information theory, and linear algebra.[11] The technology for
statistical NLP comes mainly from machine learning and data mining, both of
which are fields of artificial intelligence that involve learning from data.
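The use of Markov models for disambiguation can be sketched as follows. The toy tagged corpus and the two candidate analyses are invented for illustration: transition probabilities between part-of-speech tags are estimated from annotated sequences, and the analysis with the higher Markov-chain score is preferred.

```python
from collections import Counter
import math

# Toy tagged corpus; adjacent-tag counts define a first-order Markov model.
tagged = [
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB", "NOUN"],
]

transitions = Counter()
unigrams = Counter()
for seq in tagged:
    for prev, cur in zip(seq, seq[1:]):
        transitions[(prev, cur)] += 1
        unigrams[prev] += 1

tagset = {t for seq in tagged for t in seq}

def sequence_logprob(tags):
    """Score a candidate analysis by its Markov transition probabilities."""
    logp = 0.0
    for prev, cur in zip(tags, tags[1:]):
        # Add-one smoothing over the tag set for unseen transitions.
        logp += math.log((transitions[(prev, cur)] + 1) /
                         (unigrams[prev] + len(tagset)))
    return logp

# Disambiguation: prefer the analysis with the higher score.
a = sequence_logprob(["DET", "NOUN", "VERB"])
b = sequence_logprob(["DET", "VERB", "NOUN"])
print(a > b)  # -> True: DET->NOUN->VERB is the more typical pattern
```

A realistic tagger would combine these transition probabilities with per-word emission probabilities (a hidden Markov model) and search over all candidate analyses, but the principle of ranking ambiguous analyses by corpus-derived probabilities is the same.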