by Stefano Federici, Simonetta Montemagni, Vito Pirrelli
The role and power of analogy in the acquisition and mastery of language have been largely neglected in recent linguistic literature. The main explanation lies in the inherent difficulty of defining a formal setting in which the power of analogy can be rigorously evaluated; as a result, most formal linguists have dismissed analogy as a woolly and at best unworkable notion. Nowadays, the general availability of computers with huge and cheap storage resources appears to offer an unprecedented opportunity for an algorithmic definition of analogy and for a scientific assessment of its role in Natural Language Processing applications. We discuss recent work in this area, carried out in collaboration with the "Istituto di Linguistica Computazionale" (ILC-CNR), Pisa.
Over the last four years, we have been developing in Pisa a variety of computational tools (e.g. in speech recognition and information retrieval) for the acquisition/analysis of Italian at different levels of linguistic description, all based on a common analogy-based architecture. These tools have also been extended to other languages (in particular English and French). Analogy-based self-learning techniques are competitive: they combine the advantages of language-independent, tractable algorithms with the welcome bonus of being more reliable for real-size applications than traditional systems.
Generalization by analogy can be defined as the inferential process by which an unfamiliar object (the target object) is seen as an analogue of known objects of the same type (the base objects) so that whatever properties are known about the latter are assumed to be transferable to the former. Correspondingly, by analogy-based language learning we mean the entire process of:
i) incremental acquisition of (unselected) base objects through exposure to an available repository of data (e.g. a training corpus);
ii) interpretation/generation of as yet unknown objects through generalization by analogy.
The assumption in i) represents an indispensable requirement for any self-learning algorithm intended to be psycholinguistically plausible: training evidence should not be carefully selected a priori to ensure convergence of the learning algorithm.
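The two steps above can be illustrated with a minimal sketch. It is not the architecture developed in Pisa: the class name `AnalogyLearner`, the methods `observe` and `interpret`, and above all the similarity measure (length of the shared word-final string) are assumptions chosen only to make the idea concrete. Base objects are stored as they arrive, with no prior selection, and an unknown target inherits the property held by the base objects it most resembles.

```python
from collections import Counter


def shared_suffix_len(a: str, b: str) -> int:
    """Length of the longest common suffix of a and b.

    Toy similarity measure, assumed here for illustration only.
    """
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


class AnalogyLearner:
    """Hypothetical analogy-based learner over (form, label) pairs."""

    def __init__(self):
        self.base = {}  # known base objects: form -> label

    def observe(self, form: str, label: str) -> None:
        """Step i): store a base object as it arrives, without selection."""
        self.base[form] = label

    def interpret(self, target: str):
        """Step ii): transfer the label of the closest base objects."""
        if target in self.base:
            return self.base[target]  # already known, no analogy needed
        scores = {f: shared_suffix_len(target, f) for f in self.base}
        best = max(scores.values(), default=0)
        if best == 0:
            return None  # no base object is similar enough
        # Majority vote among the base objects that are most similar.
        votes = Counter(self.base[f] for f, s in scores.items() if s == best)
        return votes.most_common(1)[0][0]


learner = AnalogyLearner()
for form, label in [("parlare", "VERB"), ("gatto", "NOUN"),
                    ("cantare", "VERB"), ("tavolo", "NOUN")]:
    learner.observe(form, label)

print(learner.interpret("mangiare"))  # closest analogues: parlare, cantare
print(learner.interpret("libro"))     # closest analogues: gatto, tavolo
```

In this sketch the training pairs are fed in arbitrary order, reflecting the requirement that evidence should not be selected a priori. A target with no overlap with any base object is left uninterpreted rather than guessed.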
The requirements for an algorithmic definition of linguistic analogy are:
The general properties of our definition of analogy can be summarised thus: