



Description of the project LRE-61-061

Reusability of Grammatical Resources (RGR)


New Paper (July 28, 1995): Extending Unification Formalisms


The Problem of Reusability

All natural language applications need to have some knowledge of how language operates, usually represented in a grammar. A computationally usable grammar states facts about a language in a grammar formalism - a kind of programming language specifically designed for linguistic information. It takes several years of well-trained specialists' labour to develop a grammar that is usable for applications.

Although there has been much progress in grammar formalisms in recent years, there is still a gap between the descriptions linguists use and the expressive means that a grammar formalism offers, so that linguistic concepts must be painstakingly encoded in the formalism. This makes it hard to re-use an existing grammar, since it is almost impossible to adapt it to new requirements.

Project Objectives

The aim of the LRE-61-061 project "The Reusability of Grammatical Resources" is to reduce the gap between linguistic descriptions and computationally usable grammars by enriching the grammar formalism, drawing on the latest developments in computational linguistics and logic programming languages.

The implementation, including documentation, will be made freely available for use to the European scientific and business communities.

Project Description

The project takes the Advanced Linguistic Engineering Platform (ALEP) as its starting point. ALEP is a state-of-the-art feature- and unification-based linguistic formalism, which offers advanced text handling and version management facilities, as well as a configurable environment for developing and debugging grammars. ALEP was designed by the European Community project ET6/1 and implemented under the project ET9.
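
For readers unfamiliar with unification-based formalisms, the basic operation can be sketched roughly as follows. This is only an illustration under simplifying assumptions: feature structures are modelled as nested Python dictionaries with atomic string values, and the names `unify` and `UnificationFailure` are hypothetical. It does not reproduce ALEP's syntax or implementation, and it omits typing, variables and structure sharing.

```python
# Illustrative sketch only; not ALEP's actual formalism or API.
class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting values."""


def unify(a, b):
    """Return the unification of two feature structures.

    Feature structures are nested dicts; atomic values are plain strings.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # copy so that the inputs are left unchanged
        for feature, value in b.items():
            if feature in result:
                # Feature present on both sides: the values must unify too.
                result[feature] = unify(result[feature], value)
            else:
                # Feature present on one side only: simply add it.
                result[feature] = value
        return result
    if a == b:
        return a
    raise UnificationFailure(f"cannot unify {a!r} with {b!r}")


# A singular noun unified with a slot requiring third person singular.
noun = {"cat": "n", "agr": {"num": "sg"}}
subject_slot = {"agr": {"num": "sg", "per": "3"}}
merged = unify(noun, subject_slot)
```

Here `merged` combines the information from both structures, whereas unifying a singular with a plural agreement value would raise `UnificationFailure`.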

RGR aims to provide extensions to ALEP which support descriptions of natural languages as they are used in current linguistic theories. The project is divided into four phases, which largely follow the standard model for software research and development.

1. Problem Analysis Phase

In the first phase, the project surveyed the datatypes that are being used in the most important current linguistic theories, namely Head-Driven Phrase Structure Grammar, Lexical Functional Grammar, Government-Binding Theory and Categorial Grammar.

The following datatypes were selected because they enjoy widespread use in linguistic descriptions, and because they offer additional expressive power over what is available in current formalisms.

2. Design Phase

In the second phase, the work focused on formalising the above datatypes and operations and on specifying the extensions exactly. Some of the extensions were already implemented in this phase.

3. Implementation Phase

The major goal of this phase is the implementation of the extensions according to the specifications by July 1994.

4. Assessment and Consolidation

Testing, evaluation and integration of the extensions into ALEP, as well as documentation, are the main tasks for this last phase. The final version of the extensions will be released at the end of this phase. The implemented extensions will be presented at an end-of-project workshop.

Administrative Information

Funding

The project is supported by the Commission of the European Communities, as part of the Third RTD Framework "Telematic Systems in Areas of General Interest," area 6 "Linguistic Research and Engineering."

Duration

January 1993 to January 1995.

Reviewing

The results of each phase are reported in deliverables, and the progress is regularly evaluated by researchers from other institutions.

Publications

Deliverable A: Selection of Datatypes. July 1993

Deliverable B: Specification of Datatypes. January 1994

Deliverable C will be available in July 1994; deliverable D, together with the extensions developed in the project, will follow in January 1995.

The results are also presented at major international conferences and specialised workshops.

A description of the project has appeared in Elsnews 3.1 (1994), the newsletter of the European Network in Language and Speech.

Participants

The project involves researchers from four leading research centres in Natural Language Processing:

Contact Address

The reports and any other information about the project can be obtained from:
Herbert Ruessink
Stichting Taaltechnologie
Trans 10
NL-3512 JK Utrecht
tel.: +31-(0)30-536369
fax : +31-(0)30-536000
email: Herbert.Ruessink@let.ruu.nl

"Reusability of Grammatical Resources" Workshop (April 6-7, 1995)

Text + Layout: G. Erbach + H. Ruessink / 15.04.94