Grammar rules in ALE are of the phrase structure variety, with annotations for both goals that need to be solved and for attribute-value descriptions of categories. The BNF syntax for rules is as follows:
<rule> ::= <rule_name> rule <desc> ===> <rule_body>. <rule_body> ::= <rule_clause> | <rule_clause>, <rule_body> <rule_clause> ::= cat> <desc> | cats> <desc> | sem_head> <desc> | goal> <goal> | sem_goal> <goal>The <rule_name> must be a Prolog atom. The description in the rule is taken to be the mother category in the rule, while the rule body specifies the daughters in the rule along with any side conditions on the rule, expressed as ALE goals. A further restriction on rules, which is not expressed in the BNF syntax above, is that there must be at least one category-seeking rule clause in each rule body.6.2Thus empty productions are not allowed and will be flagged as errors at compile time.
A simple example of such a rule, without any goals, is as follows:
s_np_vp rule (syn:s, sem:(VPSem, agent:NPSem)) ===> cat> (syn:np, agr:Agr, sem:NPSem), cat> (syn:vp, agr:Agr, sem:VPSem).There are a few things to notice about this rule. The first is that the parentheses around the category and mother descriptions are necessary. Looking at what the rule means, it allows the combination of an np category with a vp type category if they have compatible (unifiable) values for agr. It then takes the semantics of the result to be the semantics of the verb phrase, with the additional information that the noun phrase semantics fills the agent role.
Unlike the PATR-II rules, but similar to DCG rules, ``unifications'' are specified by variable co-occurrence rather than by path equations, while path values are specified using the colon rather than by a second kind of path equation. The rule above is similar to a PATR-II rule which would look roughly as follows:
x0 ---> x1, x2 if (x0 syn) == s, (x1 syn) == np, (x2 syn) == vp, (x0 sem) == (x2 sem), (x0 sem agent) == (x1 sem), (x1 agr) == (x2 agr)
Unlike lexical entries, rules are not expanded to feature structures at compile-time. Rather, they are compiled down into structure-copying operations involving table look-ups for feature and type symbols, unification operations for variables, sequencing for conjunction, and choice point creation for disjunction.
The descriptions for cat> and cats> daughters are always evaluated in the order they are specified, from left to right. This is significant when considering goals that might be interleaved with searches in the chart for consistent daughter categories. The order in which the code for the mother's and semantic head's descriptions is executed depends on the control strategy used during parsing or generation. These are described in Sections 6.4.3 and 6.4.4, respectively. In theory, the same grammar can be used for both parsing and generation. In practice, a single grammar is rarely efficient in both directions, and can even exhibit termination problems in one, just as a Prolog program may have these problems with queries that have different argument instantiations. So while it is not necessary to fully understand the parsing or generation algorithms used by ALE to exploit its power for developing grammars, practical implementations will order their procedural attachments and distribute their description-level information with these algorithms in mind.
Within a single description, in the case of feature and type symbols, a double-hashing is performed on the type of the structure being added to, as well as either the feature or the type being added. Additional operations arise from type coercions that adding features or types require. Thus there is nothing like disjunctive normal-form conversion of rules at compile time, as there is for lexical entries. In particular, if there is a local disjunction in a rule, it will be evaluated locally at run time. For instance, consider the following rule, which is the local part of HPSG's Schema 1:
schema1 rule (cat:(head:Head, subcat:), cont:Cont) ===> cat> (Subj, cat:head:( subst ; spec:HeadLoc, )), cat> (HeadLoc, cat:(head:Head, subcat:[Subj]), cont:Cont).Note that there is a disjunction in the cat:head value of the first daughter category (the subject in this case). This disjunction represents the fact that the head value is either a substantive category (one of type subst), or it has a specifier value which is shared with the entire second daughter. But the choice between the disjuncts in the first daughter of this rule is made locally, when the daughter category is fully known, and thus does not create needless rule instantiations.
ALE's general treatment of disjunction in descriptions, which is an extension of Kasper and Round's (1986) attribute-value logic to phrase structure rules, is a vast improvement over a system such as PATR-II, which would not allow disjunction in a rule, thus forcing the user to write out complete variants of rules that only differ locally. Disjunctions in rules do create local choice points, though, even if the first goal in the disjunction is the one that is solvable.6.3This is because, in general, both parts of a disjunction might be consistent with a given category, and lead to two solutions. Or one disjunct might be discarded as inconsistent only when its variables are further instantiated elsewhere in the rule.
A more complicated rule, drawn from the categorial grammar in the appendix is as follows:
backward_application rule (synsem:Z, qstore:Qs) ===> cat> (synsem:Y, qstore:Qs1), cat> (synsem:(backward, arg:Y, res:Z), qstore:Qs2), goal> append(Qs1,Qs2,Qs).Note that the goal in this rule is specified after the two category descriptions. Consequently, it will be evaluated after categories matching the descriptions have already been found, thus ensuring in this case that the variables Qs1 and Qs2 are instantiated. The append(Qs1,Qs2,Qs) goal is evaluated by ALE's definite clause resolution mechanism. goal> attachments are always evaluated in the order they are specified relative to the enforcement of cat> and cats> daughters, from left to right. All possible solutions to the goal are found with the resulting instantiations carrying over to the rule. These solutions are found using the depth-first search built into ALE's definite constraint resolver. In general, goals may be interleaved with the category specifications, giving the user control over when the goals are fired. Also note that goals may be arbitrary cut-free ALE definite clause goals, and thus may include disjunctions, conjunctions, and negations. Cuts may occur, however, within the code for any literal clause specified in a procedural attachment. The attachments themselves must be cut-free to avoid the cut taking precedence over the entire rule after compilation, thus preventing the rule to apply to other edges in the chart or for later rules to apply. Instead, if cuts are desired in rules, they must be encapsulated in an auxiliary predicate, which will restrict the scope of the cut. For instance, in the context of a phrase structure rule, rather than a goal of the form:
goal> (a, !, b)it is necessary to encode this as follows:
goal> cwhere the predicate c is defined by:
c if (a, !, b).This prevents backtracking through the cut in the goal, but does not block the further application of the rule. A similar strategy should be employed for cuts in lexical rules.
As a programming strategy, rules should be formulated like Prolog clauses, so that they fail as early as possible. Thus the features that discriminate whether a rule is applicable should occur first in category descriptions. The only work incurred by testing whether a rule is applicable is up to the point where it fails.
Just as with PATR-II, ALE is RE-complete (equivalently, Turing-equivalent), meaning that any computable language can be encoded. Thus it is possible to represent undecidable grammars, even without resorting to the kind of procedural attachment possible with arbitrary definite clause goals. With its mix of depth-first and breadth-first evaluation strategies, ALE is not strictly complete with respect to its intended semantics if an infinite number of edges can be generated with the grammar. This situation is similar to that in Prolog, where a declaratively impeccable program might hang operationally.
The cats> operator is used to describe a list of daughters, whose length cannot be determined until run-time. Daughters are not parsed or generated as quickly as part of a cats> specification. Note also the interpretation of cats> requires that its argument is subsumed by the type list, which must be defined, along with ne_list, e_list, etc., and the features HD, and TL, which we defined above. This check is not made using unification, so that an underspecified list argument will not work either. If the argument of cats> is not subsumed by list, then the rule in which that argument occurs will never match any string, and a run-time error message will be given. This operator is useful for so-called ``flat'' rules, such as HPSG's Schema 2, part of which is given (in simplified form) below:
schema2 rule (cat:(head:Head, subcat:[Subj])) ===> cat> (cat:(head:Head, subcat:[Subj|Comps])), cats> Comps.Since various lexical items have SUBCAT lists of various lengths, e.g., zero for proper nouns, one for intransitive verbs, two for transitive verbs, cats> is required in order to match the actual list of complements at run-time.
It is common to require a goal to produce an output for the argument of cats>. If this is done, the goal must be placed before the cats>. Our use of cats> is problematic in that we require the argument of cats> to evaluate to a list of fixed length. Thus parsing with the following head-final version of HPSG's Schema 2 would not work:
schema2 rule (cat:(head:Head, subcat:[SubjSyn])) ===> cats> Comps, cat> (cat:(head:Head, subcat:[Subj|Comps])).One way to work around this is to place some finite upper bound on the size of the Comps list by means of a constraint.
schema2 rule (cat:(head:Head, subcat:[SubjSyn])) goal> three_or_less(Comps), cats> Comps, cat> (cat:(head:Head, subcat:[Subj|Comps])). three_or_less() if true. three_or_less([_]) if true. three_or_less([_,_]) if true. three_or_less([_,_,_]) if true.The problem with this strategy from an efficiency standpoint is that arbitrary sequences of three categories will be checked at every point in the grammar; in the English case, the search is directed by the types instantiated in Comps as well as that list's length. From a theoretical standpoint, it is impossible to get truly unbounded length arguments in this way.
The ALE system employs a bottom-up active chart parser that has been tailored to the implementation of attribute-value grammars in Prolog. The single most important fact to keep in mind is that rules are evaluated from left to right, with the mother description coming last. Most of the implementational considerations follow from this rule evaluation principle and its specific implementation in Prolog. In parsing, sem_head> and sem_goal> specifications are treated exactly as cat> and goal> specifications, respectively.
The chart is filled in using a combination of depth- and breadth-first control. In particular, the edges are filled in from right to left, even though the rules are evaluated from left to right. Furthermore, the parser proceeds breadth-first in the sense that it incrementally moves through the string from right to left, one word at a time, recording all of the inactive edges that can be created beginning from the current left-hand position in the string. For instance, in the string The kid ran yesterday, the order of processing is as follows. First, lexical entries for yesterday are looked up, and entered into the chart as inactive edges. For each inactive edge that is added to the chart, the rules are also fired according to the bottom-up rule of chart parsing. But no active edges are recorded. Active edges are purely dynamic structures, existing only locally to exploit Prolog's copying and backtracking schemes. The benefit of parsing from right to left is that when an active edge is proposed by a bottom-up rule, every inactive edge it might need to be completed has already been found. This is actually true as long as the direction of traversal through the string is the opposite of the direction of matching daughter categories in a rule; thus the real reason for the right-to-left parsing strategy is to allow the active edges to be represented dynamically, while still evaluating the rules from left to right. While the overall strategy is bottom-up, and breadth-first insofar as it steps incrementally through the string, filling in every possible inactive edge as it goes, the rest of the processing is done depth-first to keep as many data structures dynamic as possible, to avoid copying other than that done by Prolog's backtracking mechanism. In particular, lexical entries, bottom-up rules, and active edges are all evaluated depth-first, which is perfectly sound, because they all start at the same left point (that before the current word in the right to left pass through the string), and thus do not interact with one another.
ALE computes the closure of its grammar rules under application of the first daughter's description to empty categories at compile-time. This is known as Empty-First-Daughter closure or EFD closure. This closure operation has three advantages. First, given ALE's combination of depth-first and breadth-first processing, it is necessary in order to ensure completeness of parsing with empty categories, because any permutation of empty categories can, in principle, be combined to form a new empty category. Second, it works around a problem that many non-ISO-compatible Prologs, including SICStus Prolog, have with asserted predicates that results in empty category leftmost daughters not being able to combine with their own outputs. Third, it allows the run-time parser to establish a precondition that rules only need to be closed with non-empty leftmost daughters at run-time. As a result, when a new mother category is created and closed under rules as the leftmost daughter, it cannot combine with other edges created with the same left node. This allows ALE, at each step in its right-to-left pass throught the input string, to copy all of the edges in the internal database back onto the heap before they can be used again, and thus reduces edge copying to a constant two times per edge for non-empty categories. Keeping a copy of the chart on the heap also allows for more sophisticated indexing strategies that would otherwise be overwhelmed by the cost of copying edges with large-sized categories in Prolog before the match. The EFD closure algorithm itself is described in Penn (1999).
EFD closure potentially creates new rules, a prefix of whose daughters have matched empty categories, and new empty categories, formed when every daughter of a rule has matched an empty category. The closure in computed breadth-first.
EFD closure may not terminate. As a result, compilation of some grammars may go into infinite loops. This only occurs, however, with grammars for which every parse would go into an infinite loop at run-time if EFD closure were not applied -- specifically, when empty categories alone can produce infinitely many empty categories using the rules of the grammar. Because early versions of ALE did not compute a complete closure of grammar rules over empty categories (even at run-time), some grammars that terminated at run-time under these early versions will not terminate at compile-time under the current version.
Rules can incorporate definite clause goals before, between or after category specifications. These goals are evaluated when they are found. For instance, if a goal occurs between two categories on the right hand side of a rule, the goal is evaluated after the first category is found, but before the second one is. The goals are evaluated by ALE's definite clause resolution mechanism, which operates in a depth-first manner. Thus care should be taken to make sure the required variables in a goal are instantiated before the goal is called. The resolution of all goals should terminate with a finite (possibly empty) number of solutions, taking into account the variables that are instantiated when they are called.
The parser will terminate after finding all of the inactive edges derivable from the lexical entries and the grammar rules. Of course, if the grammar is such that an infinite number of derivations can be produced, ALE will not terminate. Such an infinite number of derivations can creep in either through recursive unary rules or through the evaluation of goals.
ALE now has an optional mechanism for checking edge subsumption (Section B.9). This can be used to prevent the propagation of spurious ambiguities through the parse. A category spanning a given subsequence is said to be spurious if there is another category spanning the same subsequence such that is subsumed by . Only the most general category needs to be recorded to ensure soundness. It can also be used to detect two derivations of the same category. Our experience, however, has been that most unification-based grammars do not have any spurious ambiguity. They normally incorporate some notion of thematic or functional structure representing the meaning of a sentence; and in these cases, most structural ambiguities result in semantic ambiguities. For such grammars, subsumption checking is probably not worth the effort, and should be left disabled.
ALE also contains a generator, based on the Semantic Head-Driven Generation algorithm of van Noord (1989), as extended by Shieber et al. (1990), and adapted to the typed feature logic of Carpenter (1992) by Popescu (1996). Its implementation in ALE is described in Penn and Popescu (1997).
Given a description of a feature structure, ALE's generator will non-deterministically generate all the word strings that correspond to its most general satisfier(s). In other words, the generated word strings, when parsed in ALE using the same grammar, will result in feature structures that unify with a most general satisfier of the initial description (rather than necessarily be identical). That part of the feature structure which represents semantic information drives the generation process.
ALE identifies this part using a user-defined directive, semantics/1. This directive distinguishes a binary user-defined definite clause predicate as the predicate to use to find semantic information. The first argument is always the feature structure whose semantics are being identified; and the second argument is always the semantic information. The example below, taken again from the sample generation grammar, simply says that the semantics of a feature structure is the value of its sem feature:
semantics sem1. sem1(sem:S,S) if true.
In general, the second argument does not need to be a sub-structure of the first -- it could have a special type that is used only for the purpose of collecting semantic information, possibly spread over several unrelated sub-structures. The body can be arbitrarily complex; and there can be multiple clauses for the definition of this predicate. The predicate must, however, have the property that it will terminate when only its first argument is instantiated, and when only its second argument is instantiated. ALE will use this predicate in both ``directions'' -- to find semantics information, and in reverse to build templates to find structures that have matching semantic information. There can be only one predicate distinguished by semantics/1. If there are multiple directives, ALE will only use the first.
Semantic-head-driven generation makes use of the notion of a semantic head of a rule, a daughter whose semantics is shared with the mother. In semantic-head-driven generation, there are two kinds of rules: chain rules, which have a semantic head, and non-chain rules, which lack such a head. These two subsets are processed differently during the generation process.
Given a feature structure, called the root goal, to generate a string for, the generator builds a new feature structure that shares its semantic information (using the user-defined semantics predicate with the second argument instantiated) and finds a pivot that unifies with it. The pivot is the lowest node in a derivation tree that has the same semantics as the root. The pivot may be either a lexical entry or empty category (the base cases), or the mother category of a non-chain rule. Once a pivot is identified, one can recursively generate top-down from the pivot using non-chain rules. Since the pivot must be the lowest, there can be no lower semantic heads, and thus no lower chain-rule applications. Just as in parsing, the daughters of non-chain rules are processed from left to right.
Once top-down generation from the pivot is complete, the pivot is linked to the root bottom-up by chain rules. At each step, the current chain node (beginning with the pivot) is unified with the semantic head of a chain-rule, its non-head sisters are generated recursively, and the mother becomes the new chain node. The non-head daughters of chain rules are also processed from left to right. The base case is where the current chain node unifies with the root.
To avoid the non-termination problem that arises from infinitely long semantic head mother sequences in a grammar, ALE requires the user to specify a bound on the length of chain rule sequences at compile-time. This can be specified with the declaration:
:- chain_length(4).Other values than 4 can be used, including 0. The default value is 4. ALE compiles chains of semantic head and mother descriptions of this length to perform the pivot check more efficiently at run-time.
For the most part, the generator treats procedural attachments as the parser does -- it evaluates them with respect to other daughter specifications in the order given. The one exception to this is sem_goal> attachments. These goals are distinguished as attached to the semantic head, and are therefore evaluated either immediately before or immediately after the sem_head> description. As a result, sem_goal> specification can only occur immediately before or immediately after a sem_head> specification; and thus only in chain rules. sem_goal> attachments are not evaluated during the pivot check described above -- only the sem_head> and mother descriptions. During parsing, sem_goal> specifications are treated exactly the same as goal> specifications, i.e., evaluated in order.
To summarize, the order of execution for a non-chain rule specification during generation is: