A tutorial walkthrough

2 A tutorial walkthrough

Welcome to the Utool tutorial! In this tutorial, we will walk you through some of the basic operations that Utool supports.

2.1 Installation

Utool is distributed as a Java package with the filename Utool-<version>.jar. After downloading it from the website, you can simply run it as follows:²

$ java -jar Utool-3.1.jar
Usage: java -jar Utool.jar <subcommand> [options] [args]
Type `utool help <subcommand>' for help on a specific subcommand.
Type `utool --help-options' for a list of global options.
Type `utool --display-codecs' for a list of supported codecs.

Available subcommands:
    solve        Solve an underspecified description.
    solvable     Check solvability without enumerating solutions.
    convert      Convert underspecified description from one format to another.
    classify     Check whether a description belongs to special classes.
    display      Start the Underspecification Workbench GUI.
    server       Start Utool in server mode.
    help         Display help on a command.

Utool is the Swiss Army Knife of Underspecification (Java version).
For more information, see www.coli.uni-sb.de/projects/chorus/utool/

This assumes that you have installed Java 5.0 or higher and it is in your path. You can move the Jar to any directory you like and then pass the pathname of the file to Java. You could also define an shell alias for calling Utool more conveniently if you like, e.g. as follows (on a bash shell):

$ alias utool='java -jar /usr/local/Utool-3.1.jar'
$ utool
Usage: java -jar Utool.jar <subcommand> [options] [args]
Type `utool help <subcommand>' for help on a specific subcommand.
Type `utool --help-options' for a list of global options.
Type `utool --display-codecs' for a list of supported codecs.
...

For the rest of this tutorial, we will only write utool for the call to the Utool main program, for easier readability. You can either define the alias, or expand utool to the appropriate java -jar Utool-3.1.jar call yourself.

2.2 The Underspecification Workbench

The most convenient way to work with Utool is via Ubench, the Underspecification Workbench. This is a GUI that will visualise USRs for you and offers access to almost the entire functionality of Utool.

2.2.1 Opening examples

You can start Ubench like so:

$ utool display

This will open a window that looks as follows:

Now let's have a look at an example. Utool comes with a number of built-in examples, which you can access from Ubench via the File/Open Example menu. So go to this menu, and load the example holesemantics-14.hs.pl. This is a Hole Semantics USR for the sentence “Every man in a restaurant knows a woman with a car”, in the Prolog format used in the Blackburn and Bos textbook [[]Blackburn and Bos2005]. Ubench will internally convert this USR into a labelled dominance graph and then draw it as follows:

The window displays some interesting information below the dominance graph. In the lower right corner, you will see the letters “N c L H”. These letters express the membership of the graph in a number of graph classes: If you mouse-over the letters, you will see that the graph is normal, compactifiable, leaf-labelled, and hypernormally connected. (These notions are explained e.g. in [[]Koller2004].) This is nice but unsurprising: The translations from Hole Semantics and MRS require the resulting dominance graph to be leaf-labelled and hypernormally connected, because these are formal prerequisites of the correctness of the translations. If the USR hadn't been of this form, Ubench would have refused to translate it and displayed an error message.

2.2.2 Solving dominance graphs

In addition, the status bar claims that this dominance graph has 14 solved forms. A solved form is an arrangement of the tree fragments (the subgraphs that are connected by solid edges) into a forest, in such a way that if there is a path from node u to node v in the dominance graph (via any kinds of edges), there is still a path from u to v in the solved form. You can look at them by clicking on the “Solve” button in the lower left. The result will look as follows:

As you can see, Ubench opened a new tab “holesemantics-14.hs.pl SF #1”, which displays the first solved form of the graph. In linguistic terms, the solved form represents one of the scope readings, in which the different quantifiers have been assigned one particular scoping. You can see that the solved form may still contain (dotted) dominance edges. But because the original graph was hypernormally connected, you can imagine that they can all be removed and the result is a tree that only consists of tree edges and represents a formula of first-order predicate logic. In technical terms, this is a configuration of the original graph; again, see e.g. [[]Koller2004] for details.

Notice that the status bar of a solved form is different than that of an unsolved dominance graph: It now displays the index of the solved form among all solved forms of this graph, and arrows for moving back and forth between the different solved forms. Go ahead and look at some solved forms.

2.2.3 Exporting graphs and solved forms

As we have said above, Utool works internally with labelled dominance graphs. It uses modules called input codecs to map string representations of USRs in some formalism into labelled dominance graphs. When you opened the example holesemantics-14.hs.pl earlier, Ubench recognised that the filename extension .hs.pl is associated with the holesem-comsem input codec, and used this codec to translate the contents of the file into the graph it displayed for you [[]Koller etal.2003]. Utool is distributed with a number of different input codecs. Some of these (e.g. the ones that deal with various concrete syntaxes of dominance graphs) are rather trivial, whereas others (e.g. the MRS input codecs) are quite sophisticated and needed to be proved correct [[]Fuchss etal.2004].

Utool also comes with several output codecs, which solve the converse task of computing a string representation for a labelled dominance graph. Let's have a look at these for a moment. Go back to the “holesemantics-14.hs.pl” tab, and choose File/Export from the menu. If you open the “file format” dropdown menu, you will see the list of output codecs that Ubench is aware of. One thing you can try here is to select the domgraph-dot output codec, which exports the current dominance graph in the “dot” graph format. Then you can use a graph-drawing tool that can deal with dot files, such as Graphviz, to visualise the graph.

Alternatively, you can make Ubench write all solved forms of the current dominance graph into a file. Choose File/Export Solved Forms from the menu and choose, for instance, the term-prolog output codec. This will save all solved forms of the graph as a list of Prolog terms, as they would e.g. be computed by the Blackburn and Bos USR solver.

You can display a list of all installed codecs by selecting Help/Show All Codecs in the Ubench menu, or by calling utool --display-codecs on the command line. You can also extend Utool/Ubench quite easily by implementing your own codecs. This is described in more detail in Section 428Codecssection.4.

2.2.4 Redundancy elimination

Let's finish off the Ubench section of this tutorial with a more advanced operation. Utool is able to modify an underspecified representation in such a way that readings are removed if they are equivalent to some other reading that is still described by the USR. This can be very useful in practice. Consider the following sentence (this is Sentence 1262 from the Rondane Corpus):

For travellers going to Finnmark there is a bus service from Oslo to Alta through Sweden.

According to the English Resource Grammar (ERG; [[]Copestake and Flickinger2000]), this sentence has got 3960 scope readings. However, this ambiguity only comes from the fact that the ERG analyses proper names as potentially scope-bearing quantifiers, and the only difference between the different readings is the relative scope of these proper-name “quantifiers”. More generally, we can say that all readings of the sentence are semantically equivalent, and it would be desirable to remove 3959 of these equivalent readings and retain only a single representative. And while the redundancy in this particular example comes only from spurious scope ambiguities between proper names, the problem of reducing the number of logically equivalent reading is not restricted to them and also covers ambiguities between e.g. different existential quantifiers or different universal quantifiers.

Utool implements a redundancy elimination algorithm, which will modify a USR in such a way that readings that are semantically equivalent to remaining readings are deleted. It does this without ever enumerating readings, and is thus very efficient [[]Koller and Thater2006]. At the same time, it is very effective: It reduces the median number of readings on the Rondane Treebank from 55 to 4. Let's try it out.

The first thing you will need is a file that defines equivalences. One such file is distributed with Utool. In order to access it, unpack the Utool Jar file by typing the following command in some directory:

$ jar xf Utool-3.1.jar

This will create, among many others, a file erg-examples.xml in the directory examples. This file contains an equivalence definition that is appropriate for the Minimal Recursion Semantics USRs that are computed by the English Resource Grammar [[]Copestake and Flickinger2000]. Remember the pathname of this file for now.

Now go back to Ubench and open the example rondane-1262.mrs.pl in Ubench; the filename extension .mrs.pl is connected to the mrs-prolog input codec, which will read MRS representations in the Prolog format used by the ERG. As you are already familiar with, Ubench will translate the USR into a labelled dominance graph and display it. Select the entry “View/Display Chart” from the menu. This will open a new window that looks as follows:

The new windows displays the dominance chart for the dominance graph. The dominance chart is an internal representation of the set of all solved forms that's more explicit and not quite as compact at the dominance graph itself, but still much smaller than the set of solved forms itself ([[]Koller and Thater2005] – see also Fig. ??). The chart is interesting in itself, and you can play with it a bit (try clicking on the different entries to highlight subgraphs), but for our current purposes, the main point is that it is on the level of charts that we can do the redundancy elimination. So choose Chart/Reduce Chart from the menu of the chart window, and select the file erg-examples.xml that you unpacked from the Jar file earlier. After a moment, most entries will have been deleted from the chart, and the status bar will say that the (smaller) chart represents only a single solved form, rather than the 3960 solved forms that the original chart represented. You can click on the “solve” button to display this single solved form. As you can see, Ubench just modified a USR in order to represent a much smaller set of readings, and it did this without computing the individual readings. From our perspective, this ability to eliminate irrelevant readings that were technically predicted by the grammar but not intended in the particular situation is the main reason why underspecification is important.

If you find that you're working with redundancy elimination a lot, and you're using the same equivalence files repeatedly, you can set a global equivalence file in the Solver menu. You can then use it conveniently from every chart window by using the menu entry “Reduce with global equivalence system”.

2.3 Running Utool from the command line

All the functionality that you have just explored from the GUI is also available on the command line. By way of example, let's solve the example file chain3.clls, which comes with Utool. This is the pure chain of length 3; it is shown in Fig. 111Running Utool from the command linefigure.1), and you can also look at it via Ubench. This graph is solvable, and has five solved forms.

In order to enumerate the solved forms of chain3.clls, you can call Utool as follows:

$ utool solve -O term-prolog ex:chain3.clls
f1(a0,f2(a1,f3(a2,a3)))
f1(a0,f3(f2(a1,a2),a3))
f2(f1(a0,a1),f3(a2,a3))
f3(f2(f1(a0,a1),a2),a3)
f3(f1(a0,f2(a1,a2)),a3)

Figure 1: The chain of length 3 (left), along with one of its five solved forms (middle) and one of its configurations (right).

As you can see, the Utool command line consists of four parts:

utool (or java -jar Utool-3.1.jar): This instructs Java to load the Jar file and run its main class. You can pass further arguments to the Java VM by putting them before the -jar option.
solve: This is the command that Utool should execute. In the example, we have run the solve command, which enumerates all solved forms of the USR and prints them. There are six other commands – solvable, convert, classify, display, server, and help – which perform different tasks. They are described in detail in Section 313Using Utoolsection.3.
-O term-prolog: After the command, you can specify options. The option -O term-prolog (or, equivalently, --output-codec term-prolog) specifies that the solve command should encode the solved forms using the term-prolog output codec. You could have specified the input codec with the -I or --input-codec option, but Utool already inferred that it should use the input codec domcon-oz from the filename extension .clls.
ex:chain3.clls: Finally, you can specify where the input USR should come from. In this example, we have used ex:chain3.clls. This tells Utool that it should look inside its Jar file for a resource with the name examples/chain3.clls and read the USR from there. Alternatively, you can specify an ordinary filename here, and Utool will read the input USR from your file system. You can also pass a hyphen (-) for this argument to make Utool read its input from standard input.

Some USRs have a lot of readings, and you either don't want to see them all, or it takes too long to enumerate them, but are still curious about how many readings the USR has. This is what the solvable command is for. You run it as follows:

$ utool solvable -s ex:thatwould.clls
The input graph is normal.
The input graph is not compact, but I will compactify it for you.

Solving graph ... it is solvable.
Splits in chart: 650
Time to build chart: 105 ms
Number of solved forms: 64764

As you can see, we have now instructed Utool to run the solvable command on the input USR ex:thatwould.clls. In addition, we have specified the -s (or --display-statistics) option to get more informative output. Notice that we didn't have to specify an output codec, because solvable doesn't output USRs or readings.

Utool has computed a dominance chart with 650 entries and established that this chart represents a set of 64764 solved forms. However, it has not actually computed the solved forms themselves. (This is what solve would have done after computing the chart.) It has also set an exit code of 1 because the graph is solvable (has solved forms), so if you are using a bash shell, you can do the following immediately after the utool call:

$ echo $?
1

If you want to give your computer a challenge, we encourage you to run solvable on the input ex:rondane-650.mrs.pl, an MRS USR representing more than two trillion readings. You will have to increase Java's memory limit by passing it the option -Xmx512m to avoid “out of memory” errors.

Two further Utool commands are convert and classify, which allow you to convert USRs into other formalisms (comparable to opening a USR and then exporting it using a different codec in Ubench) and to determine their membership in various graph classes (a more sophisticated version of Ubench's “N C L H” display in the status bar). We will not cover these commands in this tutorial. However, it is worth pointing out that you can always get help on the command-line usage of Ubench using the help command:

$ utool help

will display an overview of possible commands, whereas

$ utool help solve

will display help on how to use the solve command (and similarly for the other commands).

2.4 The Utool Server

The third way of running Utool is in a server mode. In server mode, Utool keeps running indefinitely; it accepts commands via a socket, executes these commands, and sends the results back through the socket. You can start Utool in server mode from Ubench by clicking on the server icon in the top right corner, or from the command line as follows:

$ utool server

This will open a socket on port 2802 of your machine and listen to commands sent to this port. If you have Perl installed on your system, you can test the Utool Server using the demo client we have included in the distribution. Keep the Utool server process running, change to the directory where you unpacked the Jar earlier, and execute the following command:

$ perl tools/client/utool-client.pl solve -I domcon-oz -O term-prolog \
         examples/chain3.clls
f1(a0,f2(a1,f3(a2,a3)))
f1(a0,f3(f2(a1,a2),a3))
f2(f1(a0,a1),f3(a2,a3))
f3(f2(f1(a0,a1),a2),a3)
f3(f1(a0,f2(a1,a2)),a3)

The server supports exactly the same commands as the command-line version (except that you can't use it to start another server). However, it has the advantage that you need to run only a single process to execute any number of commands, whereas you must start a new Java process for each new command when you use the command-line tool. This saves runtime if you need to run a large number of successive commands, such as when classifying all USRs in a corpus: You don't have the overhead for starting Java, and the programme will also become faster over time because the Java system has the opportunity to just-in-time compile the Java bytecode.