Saarland University

COLLATE (UdS)

Computational Linguistics and Language Technology for Real Life Applications

Research

Web Corpora for Information Management

We investigate how corpora with connectivity information (hyperlinks) can be used for information management applications in specific domains. We will build a web corpus for the language technology domain, consisting of a database of documents (with a full-text index and meta-information) and a database of the hyperlinks between those documents. As a starting point for collecting the corpus, we use the database of categorised web pages from LT-World. The information management applications we consider include summarisation, categorisation, clustering, information extraction (discovery of relations), information retrieval, terminology extraction, and definition mining.
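
The two-part corpus described above (a document store with a full-text index and meta-information, plus a store of hyperlinks) could be sketched roughly as follows. This is only an illustrative in-memory sketch, not the project's actual implementation: the names `Document`, `WebCorpus`, `add_document`, `add_link`, `search`, `linked_from`, and `linking_to` are all hypothetical, and tokenisation by whitespace is a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """One web page: its URL, full text, and meta-information (hypothetical layout)."""
    url: str
    text: str
    meta: dict = field(default_factory=dict)  # e.g. a category label


class WebCorpus:
    """Toy corpus: a document store with an inverted index, plus a link store."""

    def __init__(self):
        self.documents = {}                # url -> Document
        self.index = defaultdict(set)      # lower-cased token -> urls containing it
        self.out_links = defaultdict(set)  # source url -> target urls
        self.in_links = defaultdict(set)   # target url -> source urls

    def add_document(self, url, text, **meta):
        # Store the page and index every whitespace-separated token.
        self.documents[url] = Document(url, text, meta)
        for token in text.lower().split():
            self.index[token].add(url)

    def add_link(self, source, target):
        # Record the hyperlink in both directions for cheap lookups.
        self.out_links[source].add(target)
        self.in_links[target].add(source)

    def search(self, term):
        # Full-text lookup of a single term, case-insensitive.
        return sorted(self.index.get(term.lower(), set()))

    def linked_from(self, url):
        # Pages that the given page links to.
        return sorted(self.out_links.get(url, set()))

    def linking_to(self, url):
        # Pages that link to the given page.
        return sorted(self.in_links.get(url, set()))
```

With such a structure, a page would be added with `add_document(url, text, category=...)`, hyperlinks recorded with `add_link(source, target)`, and applications such as categorisation or clustering could combine `search` results with the neighbourhoods returned by `linked_from` and `linking_to`. A real corpus of this kind would more plausibly sit in a database with a proper full-text index; the in-memory dictionaries here only illustrate the separation between documents and links.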


last change: 18th September 2002 by bering@coli.uni-sb.de