4 Codecs
Utool is intended as a “Swiss Army Knife” for working with
underspecified representations, and one of our design goals was
therefore to make it compatible with as many current
underspecification formalisms as possible. This is not a trivial task,
as different formalisms typically differ in crucial details, and naive
translations from one formalism to another are typically incorrect for
pathological inputs.
As we have explained above, Utool internally works with labelled
dominance graphs and uses codecs to translate between these
graphs and the actual USRs that the user sees. An input codec
transforms a USR in a certain format into an equivalent labelled
dominance graph, and an output codec transforms a labelled
dominance graph into a USR in a certain format.
Some codecs are quite simple; for instance, the domcon-gxl
codec simply deals with one particular concrete syntax for labelled
dominance graphs. However, there are also input codecs for MRS and
Hole Semantics that do quite a bit of work to compute correct
dominance graphs, and rely on nontrivial formal results; and there are
output codecs for reading the solutions as terms or translating them
into graph file formats that can be displayed using external viewers.
In addition, the Codec API is quite simple, and writing your own codec
requires almost no knowledge of the rest of the Utool system. As
long as you observe certain rules that codecs have to follow, the rest of the Utool system will simply cooperate with
your new codec. In fact, you don't even have to recompile Utool to
make your new codec available to it – you can simply package your own
codec into a Jar file and add it to the classpath when running Utool.
Below, we will go through the input and output codecs in the current
Utool distribution one by one, explain the formal background and the
concrete syntax we assume, and discuss their limitations. Then we will
explain how to write your own codec.
4.1 The domcon-oz codecs
The domcon-oz
codecs deal with representations of dominance
constraints [[]Egg etal.2001] as lists of terms of the Oz programming
language. An example USR that these codecs can deal with looks as
follows:
[label(x f(x1)) label(y g(y1)) label(z a) dom(x1 z) dom(y1 z)]
As you can see, USRs are lists of terms whose head symbols are either
label
or dom
. Each such term stands for an atom
of a dominance constraint. The labelling atom
label(x f(x1 ... xn))
(where n ≥ 0) expresses that the
node x
should have the label f
and its children in a
tree that satisfies the constraint should be the nodes x1
to
xn
, from left to right. The dominance atom
dom(x y)
expresses that the node x
should be above the
node y
in a satisfying tree. In addition, the constraint
contains an implicit inequality atom x ≠ y for any two symbols
x and y that appear as the first argument of a label
term.
This format was first used in the old CHORUS demo (in 1999 or so),
which was written in Mozart Oz. As we know today, weakly normal
dominance constraints can be seen as dominance graphs quite directly
[[]Koller2004]. If we want to read a domcon-oz
USR as a
weakly normal dominance graph, we can read the term
label(x f(x1 ... xn))
as expressing that the node x
in
the dominance graph has the label f
and the children x1
to xn
over tree edges. The term dom(x y)
then expresses
that there is a dominance edge from x
to y
.
The domcon-oz
input codec considers lines that start with a
percent symbol as comments and ignores them. Both the input and the
output codec will deal with arbitrary labelled dominance graphs, even
if they are not weakly normal. These codecs are associated with the
filename extension .clls
.
4.2 The domgraph-gxl codecs
The domgraph-gxl
codecs deal with representations of a labelled
dominance graph in the GXL graph representation language. GXL
(http://www.gupro.de/GXL/) is a XML-based standard language for
this purpose. A USR in this format looks as follows:
<gxl xmlns:xlink="http://www.w3.org/1999/xlink">
<graph id="utool-graph" edgeids="true" hypergraph="false" edgemode="directed">
<node id="x">
<type xlink:href="root" />
<attr name="label"><string>f</string></attr>
</node>
<edge from="x" to="x1" id="edge0">
<type xlink:href="solid" />
</edge>
<node id="x1">
<type xlink:href="hole" />
</node>
<edge from="x1" to="z" id="edge3">
<type xlink:href="dominance" />
</edge>
<node id="z">
<type xlink:href="leaf" />
<attr name="label"><string>a</string></attr>
</node>
</graph>
</gxl>
The USR specifies nodes (using node
elements), which can have
the type root
(a node with outgoing tree edges), hole
(a
node with incoming but no outgoing tree edges), or leaf
(a node
with not adjacent tree edges). Each node which is not a hole must have
an embedded attr
element which specifies its label. In
addition, the USR specifies edges, which can have the types
solid
or dominance
.
The domgraph-gxl
codecs are intended primarily as a portable exchange
format for labelled dominance graphs. Both the input and the output codec
will deal with arbitrary labelled dominance graphs, even if they are not
weakly normal. These codecs are associated with the filename extension
.dg.xml
.
4.3 The MRS input codecs
Utool supports two input codecs that deal with underspecified
representations based upon Minimal Recursion Semantics (MRS;
[[]Copestake etal.1999]), the standard scope underspecification formalism
used in current HPSG grammars. The mrs-prolog
codec deals with
MRS expressions in a Prolog style term representation. Alternatively,
the mrs-xml
codec deals with MRS expressions based upon an
XML-style syntax. Both codecs are compatible with the concrete syntax
that is used in the LKB system [[]Copestake2002].
Utool doesn't contain any MRS output codecs, because MRS makes some
specific assumptions about the underlying object language, and it is
not clear that a useful class of labelled dominance graphs can indeed
be correctly translated into MRS.
The mrs-prolog input codec.
A concrete example of a Prolog style MRS expression looks as follows:
psoa(h1,e2,
[ rel('prop-or-ques_m',h1,
[ attrval('ARG0',e2),
attrval('MARG',h3),
attrval('PSV',u4),
attrval('TPC',u5) ]),
rel('_every_q',h6,
[ attrval('ARG0',x7),
attrval('RSTR',h9),
attrval('BODY',h8) ]),
rel('_dog_n_1',h10,
[ attrval('ARG0',x7) ]),
rel('_bark_v_1',h11,
[ attrval('ARG0',e2),
attrval('ARG1',x7) ]) ],
hcons([ qeq(h3,h11), qeq(h9,h10) ]))
Here, h1
is the top handle, and e2
an event
variable. Terms of the form
rel(L, H, [attrval(F, V), ...])
represent elementary predications, which pair a quoted string
L
, a handle H
, and a list of feature-value pairs.
Features are represented by (quoted) atoms, and values can be either
handles, object language individual or event variables,
“unspecified” values (see below) and quoted strings. Finally, terms
of the form qeq(H1, H2)
represent handle constraints. Terms of
the form geq(H1, H2)
can also be used to express handle
constraints.
The codec makes the following assumptions:
-
Handles start with a lowercase
h
followed by a sequence
of digits.
- Individual variables start with a lowercase
x
followed by
a sequence of digits.
- Event variables start with a lowercase
e
followed by a
sequence of digits.
The codec also accepts terms starting with lowercase u
or
i
, which correspond to values left unspecified by the
syntax-semantics interface of the grammar used to derive the MRS
expression. These terms are ignored by the input codec.
The codec implements an extended version of the translation of MRS
into normal dominance graphs defined by [[]Niehren and Thater2003]. During the
translation, the codec makes certain constraints that are implicit in
the MRS description explicit by adding dominance edges (this is called normalisation). Most
importantly, it adds a dominance edge from each quantifier to its
bound variables. We assume that quantifiers are elementary
predications which contain exactly the features ARG0
,
RSTR
and BODY
. An individual variable is treated as
bound if it occurs as the value of some other feature.3
In order to guarantee correctness of the translation, the codec makes
two important assumptions about the MRS structures is processes.
First, the qeq
(and geq
) constraints translate simply
into dominance edges. This is a conceptional simplification, which
however seems to be compatible with the way modern grammars make use
of qeq
constraints [[]Fuchss etal.2004]. Second, the
translation is restricted to MRS expressions whose graph satisfies
certain structural restrictions: The MRS must be a net, or
equivalently, the resulting dominance graph must be normal,
hypernormally connected and leaf-labelled.
If the input MRS is not well-formed (see below) or not a net, the
codec reports an error. The error code is obtained by forming the
bitwise OR of 192 and the relevant bits in the following table. The
codec is able to report multiple errors at once, so e.g. if the input
MRS translates into a graph that is neither normal nor leaf-labelled,
the error code would be 201.
code |
symbolic name |
meaning |
|
1 |
NOT_NORMAL |
resulting graph is not normal |
2 |
NOT_WEAKLY_NORMAL |
resulting graph is not weakly normal |
4 |
NOT_HYPERNORMALLY_CONNECTED |
resulting graph is not hypernormally connected |
8 |
NOT_LEAF_LABELLED |
resulting graph is not leaf-labelled |
16 |
NOT_WELLFORMED |
the input MRS is not well-formed |
An MRS is not well-formed if it violates some (in the widest sense)
syntactic constraints, for instance, if it contains a variable
x
but no quantifier that binds x
. We should note that
this terminology departs from [[]Copestake etal.1999]'s terminology: in
that paper, a well-formed MRS must additionally be solvable.
Both MRS codecs have an option normalisation
, which can take the values nets
(this is the default case) or none
. The option value none
makes the codec skip the normalisation step and the well-formedness tests. This means that the dominance graph will typically only be a weakly normal and not a normal graph, and the solved forms of the graph will not necessarily correspond one-to-one to the scopings of the MRS. This means that the codec translates the USR incorrectly. However, it can sometimes be useful for debugging.
The input codec is associated with the filename extension .mrs.pl
.
The mrs-xml input codec.
In addition to the Prolog style term representation, Utool also
supports an XML style syntax. Apart from the concrete syntax (XML as
opposed to Prolog terms), the two codecs are exactly the same. The
Prolog-style MRS expressions shown above is represented in XML as
follows:
<?xml version="1.0"?>
<mrs>
<var vid="h1"/>
<ep>
<pred>prop-or-ques_m_rel</pred>
<var vid="h1"/>
<fvpair><rargname>ARG0</rargname><var vid="e2"/></fvpair>
<fvpair><rargname>MARG</rargname><var vid="h3"/></fvpair>
<fvpair><rargname>PSV</rargname><var vid="u4"/></fvpair>
<fvpair><rargname>TPC</rargname><var vid="u5"/></fvpair>
</ep>
<ep>
<pred>_every_q_rel</pred>
<var vid="h6"/>
<fvpair><rargname>ARG0</rargname><var vid="x7"/></fvpair>
<fvpair><rargname>RSTR</rargname><var vid="h9"/></fvpair>
<fvpair><rargname>BODY</rargname><var vid="h8"/></fvpair>
</ep>
<ep>
<pred>_dog_n_1_rel</pred>
<var vid="h10"/>
<fvpair><rargname>ARG0</rargname><var vid="x7"/></fvpair>
</ep>
<ep>
<pred>_bark_v_1_rel</pred>
<var vid="h11"/>
<fvpair><rargname>ARG0</rargname><var vid="e2"/></fvpair>
<fvpair><rargname>ARG1</rargname><var vid="x7"/></fvpair>
</ep>
<hcons hreln="qeq">
<hi><var vid="h3"/></hi>
<lo><var vid="h11"/></lo>
</hcons>
<hcons hreln="qeq">
<hi><var vid="h9"/></hi>
<lo><var vid="h10"/></lo>
</hcons>
</mrs>
This codec supports the same normalisation
option as mrs-prolog
. It is associated with the filename extension .mrs.xml
.
4.4 The holesem-comsem input codec
The holesem-comsem
input codec can read USRs of the Hole
Semantics formalism. Hole Semantics [[]Bos1996] is a rather popular
underspecification formalism because it is conceptually very
accessible (underspecify formulas by allowing them to have holes which
can be plugged by other formulas). We assume the concrete syntax used
in the Prolog system that accompanies the Computational Semantics
textbook [[]Blackburn and Bos2005]; that is, it
should be possible to use all USRs generated by the book software with
this codec. USRs in this syntax are Prolog terms that look e.g. as
follows:
some(A, some(B, some(C, some(X, and(label(A), and(hole(B),
and(label(C), and(some(A, X, B), and(pred1(C,man,X), leq(C,B))))))))))
Here A
and C
are labels, B
is a hole, and
X
is an object variable of the intended semantic
representation, which will be bound by an existential quantifier. All
four symbols are introduced by the outer some
terms. In
addition, the term some(A,X,B)
expresses that the formula under
the label A
is ∃ X. B
, and the term
pred1(C,man,X)
expresses that the formula under the label
C
is man(X). The term leq(C,B)
says that the label
C
must be below the hole B
.
The codec supports USRs built from the logical symbols hole
,
label
, some
, and
, or
, imp
,
not
, all
, leq
, que
, eq
,
pred1
, pred2
, and arbitrary non-logical symbols. It does
not support the symbols lam
and app
that are used in
intermediate representations during semantics construction by
Blackburn and Bos, as these symbols are beyond the scope of ordinary
Hole Semantics USRs (they don't relate holes and labels, but entire
Hole Semantics USRs).
This codec makes use of the theoretical result that Hole Semantics
USRs that are hypernormally connected and leaf-labelled can be
translated into equivalent labelled dominance graphs
[[]Koller etal.2003]. The translation itself is not that complicated,
but the correctness proof is not trivial. The codec checks whether the
resulting graph is normal, leaf-labelled, and hypernormally
connected. If it isn't, the codec reports this as a semantic error
with one of the following exit codes:
code |
symbolic name |
meaning |
|
193 |
ERROR_GRAPH_NOT_NORMAL |
resulting graph is not normal |
194 |
ERROR_GRAPH_NOT_HNC |
resulting graph is not hypernormally
connected |
195 |
ERROR_GRAPH_NOT_LEAF_LABELLED |
resulting graph is not
leaf-labelled |
196 |
ERROR_MULTIPLE_PARENTS |
graph has a node with more than one
parent |
Although every normal, hypernormally connected, and leaf-labelled
dominance graph can in principle be translated back into an equivalent
Hole Semantics USR [[]Koller etal.2003], there is no corresponding
output codec. This is because the concrete syntax imposes a number of
very inconvenient restrictions on the USRs that it can express. For
instance, all nodes that are not holes must have exactly one or two
children via tree edges, and all non-holes whose children are not
object-level variables must be labelled with a connective of predicate
logic. This has the consequence that only a tiny minority of all
labelled dominance graphs can actually be encoded given this syntax,
which made the encoding more difficult to implement and debug than we
felt it was worth.
The input codec is associated with the filename extension .hs.pl
.
4.5 The domgraph-udraw and domgraph-dot output codecs
As an alternative to the display
command, one can use utool to
convert underspecified representations into formats which can be
further processed by other graph viewers:
-
The
domgraph-udraw
output-codec outputs a dominance graph
in a format that can be used to display the graph with the
uDraw(Graph) tool (formerly daVinci) from the University of Bremen
(http://www.informatik.uni-bremen.de/uDrawGraph/en/index.html);
- the
domgraph-dot
codec outputs a dominance graph in the
“dot” format supported by many graph drawing tools, such as
graphviz
(http://graphviz.org/).
We don't describe these output formats in detail here, but you can
write the output into a file and then load it from the respective
graph viewers. Note that these commands will only produce meaningful
output if you use them from the convert
command, rather than
the solve
command, as neither graph format supports the
representation of multiple graphs (i.e. the multiple solved forms
computed by solve
) at once.
As a further convenience, the domgraph-udraw
accepts the codec
option pipe
. If you use this codec option, you can directly
feed Utool's output to uDraw(Graph) via a pipe instead of writing the
graph to a file and then opening this file from uDraw(Graph), like so:
utool convert foo.clls -O domgraph-udraw --output-codec-options pipe=true \
| uDrawGraph -pipe
The domgraph-dot
codec accepts the codec option enforceEdgeOrder
. If this option is set to true
, the dot representation computed by the codec will contain an instruction that forces the Dot/Graphviz layout algorithm to respect the left-to-right order of edges that come out of the same node. This is a good thing where tree edges are concerned, as their relative order corresponds to e.g. the restriction and scope of a quantifier. However, because of limitations in the dot format, the ordering constraint will also apply to dominance edges. This is not only unnecessary (there is no meaningful order between outgoing dominance edges), but it is also inconvenient, as the graph layout for nontrivial graphs will typically not be pretty. The enforceEdgeOrder
option is thus set to false
by default.
Both codecs are able to encode any labelled dominance graph, even if
it is not weakly normal, thus they never report any semantic
errors. The domgraph-udraw
codec is associated with the
filename extension .dg.udg
, and the domgraph-dot
codec
is associated with the extension .dg.dot
.
4.6 The plugging output codecs
The plugging-oz
and plugging-lkb
output codecs display a
dominance graph in solved form as a list of pairs (hole, root) that
specify the dominance edges in the solved form (in Hole Semantics
terminology, this is a plugging). They differ only in the
concrete syntax:
plugging-oz
is intended as a convenient output format if you
only want to see the dominance edges. On the other hand,
plugging-lkb
is practically relevant because it displays
pluggings in the format that the LKB workbench expects from its own
MRS solver. This means that Utool with the mrs-prolog
input
codec and the plugging-lkb
output codec can be used as a
drop-in replacement for the LKB's own MRS solver (see also
Section 5.343Integration with the LKB Workbenchsubsection.5.3).
These two codecs are only intended to be run on dominance graphs in
solved form. They will also accept any other dominance graph and will
then display its dominance edges, but this is less meaningful than
displaying the dominance edges in a solved form. The
plugging-oz
codec is associated with the filename extension
.plug.oz
, and the plugging-lkb
codec is associated with
the extension .lkbplug.lisp
.
4.7 The term output codecs
The term-oz
and term-prolog
output codecs can be used to
encode labelled dominance graphs in simple solved form, i.e. solved
forms which are trees and in which each hole has exactly one outgoing
dominance edge. They traverse these trees top-down and print the
ground term that corresponds to the tree structure. Their output looks
as follows:
-
term-oz
: an Oz term of the form
f2(f1(a0 a1) a2)
f1(a0 f2(a1 a2))
term-prolog
: a Prolog term of the form
f2(f1(a0,a1),a2)
f1(a0,f2(a1,a2))
The only difference between the two codecs is that the Prolog codec
separates arguments of a term by commas, whereas the Oz codec
separates them with whitespace. These two codecs are intended as
human-readable representations of solved forms.
Both codecs assume that the solved form they encode is simple and
leaf-labelled, and will report an error code of 225 if it
isn't. term-oz
is associated with the filename extension
.t.oz
, and term-prolog
with .t.pl
.
4.8 The chain input codec
The chain
input codec will generate the pure chain
[[]Koller2004] of a given length. A chain is a zig-zag graph
consisting of upper and lower fragments that are connected by
dominance edges; the pure chain of length 3 is shown in
Fig. 111Running Utool from the command linefigure.1. Chains appear frequently as parts of
linguistically motivated USRs, and are therefore a nice basis for
benchmarking (e.g. in [[]Bodirsky etal.2004]).
The “underspecified descriptions” that this codec expects in server
mode are simply string representations of numbers (such as the string
3
). The codec will then generate the pure chain of this
length. When used with the command-line version of Utool, chain
behaves differently than all other codecs discussed so far in that it
doesn't interpret its argument as a filename, but again directly as
the chain length. This means that you can use a Utool call as follows:
$ utool convert -I chain 3 -O domcon-oz
The codec will report a parsing error (code 192) if the chain length
specification isn't a number, and a semantic error with code 193 if
the number isn't positive. Because chain
doesn't read its USRs
from files, it is not associated with any filename extension.
4.9 Writing your own codecs
Although Utool comes with a collection of codecs that cover many
existing popular underspecification formalisms, there are some
formalisms we don't support (yet), and we can expect that other
formalisms will be developed in the future. To this end, it may be
helpful for you to write your own codec. We will now describe how this is done. Notice that you don't have to rebuild Utool just to add codecs.
Codec classes.
A codec is a public, non-abstract class that is derived from one of the abstract base classes
InputCodec
(if it is an input codec) or OutputCodec
(for an output codec), both of which belong to the package de.saar.chorus.domgraph.codec. Codecs
are typically not instantiated directly by the programmer. They are registered in
a codec manager (an object of class CodecManager
),
which has methods for querying and instantiating codecs. The codec
manager separates codecs from the rest of the Utool system and enforces
a uniform interface that makes sure that all components of the system
interact properly with the codecs: Once you have implemented and
registered a new codec, it will automatically be usable from the command
line, the server, and the GUI.
The abstract base class InputCodec
defines a method decode
,
which you must implement when you develop an input codec. The method
takes a Reader
, a DomGraph
, and a NodeLabels
object
as arguments. Its job is to read an USR from the Reader
, translate it
into a labelled dominance graph, and then change the DomGraph
and
the NodeLabels
to this graph. If an error occurs during this decoding
process, it may throw an IOException
, a ParserException
(if the
USR was syntactically not well-formed), or a MalformedDomgraphException
(if the USR could not be translated to a dominance graph for semantic reasons).
The Utool exit codes (see Section 3.416Exit codessubsection.3.4)
corresponding to these errors are 128 for the I/O error,
192 for the parse error, and 192+N for the semantic errors, where N is
an integer between 1 and 31 (inclusive) which you can specify when you construct
the exception.
Conversely, the class OutputCodec
defines a method encode
, which
accepts a DomGraph
and a NodeLabels
, translates them into your
output format, and writes the result to a Writer
. The method may throw
IOException
s and MalformedDomgraphException
s, with the same meaning
as above. The exit codes available to an output codec are 128 for I/O errors
and 224 + N for semantic errors.
In addition, the codec must implement the methods print_header
and
print_footer
, which are called by Utool at the very beginning and end of the
encoding process; the builtin codecs use them to print a version header at the beginning
of the output files. It is a good idea to call flush()
on the writer in the footer method.
If your output codec supports the output of multiple USRs into the same file,
it should be derived from the class MultiOutputCodec
, which is itself
derived from OutputCodec
. Only MultiOutputCodec
s can be used
as the output codec of a solve
command on the command line, or
the “Export Solved Forms” menu entry in Ubench. MultiOutputCodec
inherits
all abstract methods from OutputCodec
and adds three more: The methods print_start_list
and print_end_list
are
called before printing the first solved form (but after print_header
)
and after printing the last solved form (but before print_footer
), and print_list_separator
is called between any two encode
calls for subsequent solved forms. Notice
that print_start_list
and print_end_list
are called only for results of
solve
and “Export Solved Forms”, whereas print_header
and print_footer
are called in these cases and also for the convert
and “Export” commands.
Annotations.
Every codec class must be annotated with a @CodecMetadata
annotation. This
annotation has two required arguments name
and extension
, both of
which are of type String
. The former specifies the codec's name; there may be no
two input codecs and no two output codecs of the same name. The latter enables Utool
to associate filenames with codecs: A file whose name ends with the specified extension
will be automatically processed with this codec. If you pass an empty string for the extension,
your codec will not be associated with any filename. In addition, you may mark your
codec as experimental by passing the option experimental=true
to the metadata
annotation.
Thus, a typical declaration of an input codec would look as follows:
@CodecMetadata(name="mrs-prolog", extension=".mrs.pl")
public class MrsInputCodec extends InputCodec {
...
The next requirement is that each codec must declare exactly one codec constructor,
which will be used whenever the codec manager wants to create a new instance of the
codec.
If the codec class has exactly one public constructor, this constructor is used as the
codec constructor. (If the class declares no constructors at all, the argumentless default
constructor will do.) On the other hand, if the class declares more than
one public constructor, exactly one of them must be annotated with @CodecConstructor
,
and this annotated constructor will be used as the codec constructor.
The codec constructor
must not be declared as throwing any checked exceptions. In addition, each parameter
of the codec constructor must have an @CodecOption
annotation. This annotation
takes a required string argument name
, which gives this particular parameter
a public name. It may also take an argument defaultValue
, which defines a default value
for when the user doesn't specify an explicit value. If you specify no defaultValue
, the
empty string will be used.
Thus, a typical declaration of a codec constructor would look as follows:
@CodecConstructor
public DomgraphUdrawOutputCodec(@CodecOption(name="pipe", defaultValue="false")
boolean usePipe) {
...
This way of declaring codec constructors and codec options looks a bit complicated
at first sight, but is quite convenient in various respects. For one thing, Utool
is aware of the codec options, and will decode them automatically. If you select
the File/Export menu entry in Ubench and choose the domgraph-udraw input codec, you
will be offered a button “Option”, which will reveal an option selection panel.
This panel will contain a checkbox with the label “pipe”, because your
codec constructor has a parameter with name “pipe” and type boolean
;
that is, it uses information about the parameter types to offer the user
appropriate input forms.
Similarly, the Utool command-line tool will be able to parse an argument
--output-codec-options pipe=true
, and the server will likewise deal
with the option; these modes use the parameter types to decode the argument
strings into appropriate values. For this reason, every parameter of the
codec constructor must have one of the following types:
-
any primitive datatype except for
void
and char
; these are
presented as text fields or checkboxes (for boolean) in the GUI, and are decoded
using the valueOf
method of the respective wrapper class on the command
line and in the server;
- any enumeration type; these are presented as dropdown menus in the GUI,
and are decoded by mapping the string representations to the enum values of the
same name;
- the class
String
.
On the other hand, the way that codecs represent metadata stays out of your way
if you don't need it. Every codec class must have a @CodecMetadata
annotation.
But beyond that, there is no need for a @CodecConstructor
annotation if there
is a unique public constructor to begin with; and if the constructor has no parameters,
nothing needs to be annotated with @CodecOption
. The latter situation
accounts for the majority of the builtin codecs.
Registration.
The final step of the
story is to make Utool aware of your new codec. This is handled by
creating a file
de/saar/chorus/domgraph/codec/codecclasses.properties somewhere
in your classpath. In this file, you list the fully qualified
classname(s) of your new codec(s), one per line. Utool will then
automatically read this file and try to register your codecs. If
a codec fails to register (typically because an annotation was missing),
Utool will be terminated with an error message and exit code 142.
For instance, let's say that you have implemented and compiled a codec
class foo.bar.MyCodec. You would then create a
codecclasses.properties file with the following contents:
foo.bar.MyCodec
Then you would start a MyCodec-enhanced version
of Utool as follows:
$ jar cf mycodec.jar foo/bar/MyCodec.class \
de/saar/chorus/domgraph/codec/codecclasses.properties
$ java -cp Utool.jar:mycodec.jar de.saar.chorus.domgraph.utool.Utool -d