2.2. Using GATE for PIMO text services

2.2.1. How GATE is used in the Semantic Desktop

In cooperation with the University of Sheffield, the PIMO Server uses GATE (General Architecture for Text Engineering) as a module for its Ontology-based Information Extraction (OBIE) service, with the PIMO serving as the vocabulary for entity recognition. To this end, it relies on GATE’s Gazetteer functionality, which has been slightly modified for the domain and integrated so that updates are passed incrementally to its underlying finite state machine as soon as changes occur in the PIMO (i.e., concept creation, addition of an alternative label, label changes, merging, and deletion).
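To illustrate the idea of incremental vocabulary updates, the following is a minimal sketch in Python. It is not GATE's Gazetteer API and not the PIMO Server's implementation: the class IncrementalGazetteer, its methods, and the concept identifiers such as "pimo:GATE" are hypothetical names chosen for this example, and a simple token trie stands in for the finite state machine. The point is only that a single label can be added or removed without rebuilding the whole structure.

```python
import re
from dataclasses import dataclass, field


@dataclass
class _Node:
    children: dict = field(default_factory=dict)  # token -> _Node
    concepts: set = field(default_factory=set)    # concepts whose label ends here


class IncrementalGazetteer:
    """Token trie updated label by label (hypothetical stand-in, not GATE's API)."""

    def __init__(self):
        self._root = _Node()

    @staticmethod
    def _tokens(text):
        # Case-insensitive word tokens; punctuation is ignored.
        return re.findall(r"\w+", text.lower())

    def add_label(self, concept, label):
        tokens = self._tokens(label)
        if not tokens:
            return  # nothing to index for an empty label
        node = self._root
        for tok in tokens:
            node = node.children.setdefault(tok, _Node())
        node.concepts.add(concept)

    def remove_label(self, concept, label):
        # Remember the path so that branches left empty can be pruned.
        path = [(None, self._root)]
        node = self._root
        for tok in self._tokens(label):
            node = node.children.get(tok)
            if node is None:
                return  # label was never indexed
            path.append((tok, node))
        node.concepts.discard(concept)
        for i in range(len(path) - 1, 0, -1):
            tok, current = path[i]
            if current.children or current.concepts:
                break
            del path[i - 1][1].children[tok]

    def annotate(self, text):
        """Return (matched text, concepts) pairs, preferring the longest match."""
        tokens = self._tokens(text)
        matches, i = [], 0
        while i < len(tokens):
            node, best_end, best = self._root, None, None
            j = i
            while j < len(tokens) and tokens[j] in node.children:
                node = node.children[tokens[j]]
                j += 1
                if node.concepts:
                    best_end, best = j, set(node.concepts)
            if best_end is None:
                i += 1
            else:
                matches.append((" ".join(tokens[i:best_end]), best))
                i = best_end
        return matches


if __name__ == "__main__":
    gaz = IncrementalGazetteer()
    gaz.add_label("pimo:GATE", "GATE")                          # concept created
    gaz.add_label("pimo:Sheffield", "University of Sheffield")  # concept created
    print(gaz.annotate("We met at the University of Sheffield to discuss GATE."))
    gaz.remove_label("pimo:GATE", "GATE")                       # concept deleted
    print(gaz.annotate("We met at the University of Sheffield to discuss GATE."))
```

In this sketch, a label change would be expressed as remove_label followed by add_label, and a merge as re-adding the labels of the merged concept under the identifier of the surviving one.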

The OBIE service is used throughout the Semantic Desktop infrastructure to propose PIMO concepts for textual content. It is used in the FireTag extensions for Mozilla Firefox (for web pages), Mozilla Thunderbird (for e-mails), and the SemanticFileExplorer (for files on the file system). For web pages, we use GATE’s gate.creole.boilerpipe.BoilerPipe to extract only the relevant content of a page; this filters out decoration around the main content, such as headers, footers, and menus.

These extensions now benefit from more robust, more precise, and faster recognition. Besides the usual case of analysing a single text document once to obtain proposals, the PIMO Server now also offers bulk upload of objects, which are then analysed using the OBIE service.