

Empty Categories

ALE allows the user to specify certain categories as occurring without any corresponding surface string. These are usually referred to somewhat misleadingly as empty categories, or sometimes as null productions. In ALE, they are supported by a special declaration of the form:
  empty <desc>.
where <desc> is a description of the empty category.

For example, a common treatment of bare plurals is to hypothesize an empty determiner. Consider the contrast between the sentences kids overturned my trash cans and a kid overturned my trash cans: the former, which has a plural subject, has no overt determiner. In our categorial grammar, we might assume an empty determiner with the following entry (shown here together with the definition of the gdet macro it uses):

  empty @ gdet(some).

  gdet(Quant) macro
    synsem:(forward,
            arg:(syn:(n,
                      num:plu),
                 sem:(body:Restr,
                      ind:Ind)),
            res:(syn:(np,
                      num:plu),
                 sem:Ind)),
    qstore:[ (Quant,
              var:Ind,
              restr:Restr) ].
Note that this entry does not match the type system of the categorial grammar in the appendix, since it assumes a number feature on nouns and noun phrases.

Empty categories are expensive to compute under a bottom-up parsing scheme such as the one used in ALE, because an empty category can be used at every position in the chart, as an edge whose begin and end points are the same. If the empty categories cause local structural ambiguities, parsing slows down accordingly, as these structures are computed and then propagated. Consider the empty determiner given above. It is added as an inactive edge at every node in the chart, where it matches the forward application rule scheme and searches every edge to its right for a nominal complement. If a sentence contains relatively few nouns, this rule creates few noun phrases, and thus few structural ambiguities propagate. But in a sentence such as the kids like the toys, there will be an edge spanning kids like the toys, built on an empty-determiner analysis of kids. The corresponding noun phrase spanning toys will not propagate any further, since there is no way to combine a noun phrase with the determiner the.

Now consider the empty slash categories of the form $X/X$ in GPSG. Coupled with the slash-passing rules, these categories would roughly double parsing time, even for sentences that can be analyzed without them. The reason is that these empty categories are highly underspecified and thus have many options for combination. Empty categories should therefore be used sparingly, and preferably in environments where their effects will not propagate.
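To make the cost of the empty determiner concrete, here is a minimal Python sketch of bottom-up chart closure. It is not ALE's parser: the function `chart_edges`, the atomic category names (`det`, `n`, `np`, `tv`, `vp`, `s`), and the three binary rules are hypothetical stand-ins for ALE's feature-structure categories and rule schemes. An empty category is entered as a zero-width edge at every vertex, and the closure then combines it with whatever lies to its right.

```python
from itertools import product


def chart_edges(words, lexicon, rules, empties):
    """Compute every inactive edge (start, end, category) bottom-up.

    words   -- list of input tokens
    lexicon -- dict mapping a word to a set of categories
    rules   -- dict mapping (left_cat, right_cat) to a set of parent categories
    empties -- set of categories that span no input (start == end)
    """
    n = len(words)
    edges = set()
    # Lexical edges span exactly one word.
    for i, w in enumerate(words):
        for cat in lexicon.get(w, ()):
            edges.add((i, i + 1, cat))
    # Empty categories are added at every vertex 0..n, with begin == end.
    for v in range(n + 1):
        for cat in empties:
            edges.add((v, v, cat))
    # Close under the binary rules until no new edge appears (naive fixed point).
    changed = True
    while changed:
        changed = False
        for (i, j, a), (k, l, b) in product(list(edges), repeat=2):
            if j != k:  # daughters must be adjacent in the chart
                continue
            for parent in rules.get((a, b), ()):
                edge = (i, l, parent)
                if edge not in edges:
                    edges.add(edge)
                    changed = True
    return edges


# Hypothetical toy grammar, not ALE syntax.
lexicon = {"the": {"det"}, "kids": {"n"}, "like": {"tv"}, "toys": {"n"}}
rules = {("det", "n"): {"np"}, ("tv", "np"): {"vp"}, ("np", "vp"): {"s"}}
words = "the kids like the toys".split()

plain = chart_edges(words, lexicon, rules, empties=set())
with_empty = chart_edges(words, lexicon, rules, empties={"det"})
print(len(plain), len(with_empty))  # 9 18
print(sorted(with_empty - plain))
```

On the kids like the toys, the declared empty `det` adds six zero-width edges (vertices 0 to 5) plus three derived ones: a noun phrase over kids, a sentence edge from vertex 1 to 5 over kids like the toys, and a noun phrase over toys that combines with nothing further. The naive fixed-point loop is chosen for brevity, not efficiency.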

Another word of caution is in order concerning empty categories: they can occur in constructions with other empty categories. For instance, if we specify categories $C_{1}$ and $C_{2}$ as empty categories, and have a rule that allows a $C$ to be constructed from a $C_{1}$ and a $C_{2}$, then $C$ will act as an empty category as well. ALE computes these combinations of empty categories at compile time, but the sheer number of empty categories produced by this closure can still be a processing burden if they apply too productively at run time. Keep in mind that ALE computes all of the inactive edges that can be produced from a given input string, so there is no way to eliminate the extra work caused by empty categories interacting with other categories, including other empty ones.
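The compile-time closure described above is a fixed-point computation. The following Python sketch assumes atomic category labels and binary rules only; ALE itself works with descriptions and unification, so this illustrates the idea rather than its implementation. The function `empty_closure` and the categories `c1`, `c2`, `c`, `d`, `e`, and `x` are hypothetical, with `c1`, `c2`, and `c` mirroring $C_{1}$, $C_{2}$, and $C$.

```python
def empty_closure(empties, rules):
    """Return every category that can be built without consuming input.

    empties -- set of categories declared empty
    rules   -- dict mapping (left_cat, right_cat) to a set of parent categories
    """
    closure = set(empties)
    changed = True
    while changed:
        changed = False
        for (left, right), parents in rules.items():
            # A binary rule whose daughters are both empty yields an empty mother.
            if left in closure and right in closure:
                new = parents - closure
                if new:
                    closure |= new
                    changed = True
    return closure


# Hypothetical rules: c -> c1 c2, d -> c c2, e -> c x.
rules = {
    ("c1", "c2"): {"c"},
    ("c", "c2"): {"d"},
    ("c", "x"): {"e"},  # x is not empty, so e never becomes empty
}
print(sorted(empty_closure({"c1", "c2"}, rules)))  # ['c', 'c1', 'c2', 'd']
```

Note how `d` becomes empty only because `c` did: each newly empty category can license further empty mothers, which is why the closure can grow larger than the set of declared empty categories.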


TRALE User's Manual