Semantic issues in a prescriptive word composition theory


Originally placed at

by Nick Nicholas.

In this essay, I consider semantic issues in the formulation of a prescriptive theory for word compound meaning, and to what extent they run counter to prescription.

The language in which the formulation has occurred is Lojban. Lojban is an artificial language, the offshoot of an earlier project, Loglan (Brown 1960). The declared aim of the Loglan project was to test the Sapir-Whorf hypothesis on a language based on symbolic logic. Lojban supporters are motivated by a more diffuse range of reasons, among them, apparently, a desire for unambiguous communication. The language has been under development for the past five years, and is nearing publication.

The main unambiguities claimed for the language (Cowan 1991) are phonological-graphical, morphological, and syntactic:

"The claim for semantic unambiguity is a limited one only. Lojban contains several constructs which are explicitly ambiguous semantically. The most ambiguous of these are Lojban tanru (so-called 'metaphors') and Lojban names. [...] tanru are binary combinations of predicates, such that the second predicate is the 'head' and the first predicate is a modifier for that head. The meaning of the tanru is the meaning of its head, with the additional information that there is some unspecified relationship between the head and the modifier.

tanru are the basis of compound words in Lojban. However, a compound word has a single defined meaning whereas the meaning of a tanru is explicitly ambiguous, Lojban tanru are not as free as English figures of speech; they are 'analytic', meaning that the components of tanru do not themselves assume a figurative sense. Only the connection between them is unstated." (Cowan 1991:22)

The above can be taken as a kind of manifesto of Lojban attitude to compounding. The components, the morphologically primitive predicates (gismu), are held to have an 'unambiguous' meaning by virtue of the relationship they posit between their arguments. Thus botpi 'bottle' is defined as:

x1 is a bottle/jar/urn/flask/closable container for x2, made of material x3 with lid x4

and, in veridical use of the predicate, the presence of all arguments is entailed. If one asserts that X is a bottle, it is entailed that it is a bottle of something, with some lid. If it lacks either, it is not a botpi, and cannot be called lo botpi (it can be called le botpi, where le is the nonveridical determiner, corresponding somewhat to a definite article: "that which I describe as a bottle".)
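The entailment described here can be pictured as a small data model. The following Python sketch is only an illustration: the `Predicate` class, its `veridical` method, and the place labels are hypothetical names paraphrased from the definition of botpi above, not part of any Lojban tooling. It treats a place marked `None` as one for which no filler exists at all, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical model of a gismu: a name plus ordered argument places."""
    name: str
    places: tuple  # human-readable labels for x1, x2, ...

    def veridical(self, filled: dict) -> bool:
        """True only if every place has some filler (the 'lo' reading)."""
        return all(
            filled.get(i) is not None
            for i in range(1, len(self.places) + 1)
        )


botpi = Predicate("botpi", ("container", "contents", "material", "lid"))

# None marks a place for which no filler exists (an assumption of this sketch).
lidless = {1: "this jar", 2: "jam", 3: "glass", 4: None}
complete = {1: "this jar", 2: "jam", 3: "glass", 4: "a cork"}

print(botpi.veridical(lidless))   # False: not a botpi in the veridical sense
print(botpi.veridical(complete))  # True
```

The nonveridical le botpi reading is deliberately not modelled: it depends on the speaker's description rather than on the place structure.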

This attitude to sense clearly predates any awareness of prototype semantics, and the insistence on positing necessary conditions for sense may prove unworkable. If so, language users will presumably use the words as they are and ignore the entailments. The language definers have belatedly recognised this by introducing the particle zi'o to 'undefine' a predicate argument, but this will probably not be enough.

Notwithstanding the problems inherent to 'unambiguous' base predicates, it is the manner in which they combine I wish to deal with. As we saw, head and modifier are considered unambiguous, but the manner in which they are related is not. Thus gerku zdani, "dog house", is as ambiguous as it is in English, and can have a variety of denotations: a house housing dogs, a house that is also a dog, a dog-shaped house, a house owned by dogs, etc. But once a compound word is formed based on this pair, like gerzda, "doghouse", it is supposed to entail only one of these possible head-modifier relations, and to have a 'unique' meaning.

While talk of unambiguous meanings seems a chimera, the claims being made are not impractical, when properly reinterpreted. The predicate definitions constrain their senses; in a tanru, the different ways the head can relate to the modifier expand the resulting sense; fixing a single relationship constrains sense sufficiently for there to be a single well-defined set of predicate arguments, without implying that there is a single sense or even a single denotation to the resulting compound.

The Lojbanic propensity for disambiguation has manifested itself in the formulation of rules, according to which the compound predicate can be derived from the component predicates. Though this venture was discouraged by the chief language engineer as premature, some preliminary guidelines were outlined by Jim Carter, and formalised by myself (Nicholas 1993). Further language usage, and considerable debate on the issue, have shown that, while these guidelines are useful in deriving compound predicates and augmenting Lojban vocabulary, there are many semantic provisos to be borne in mind while using them. I will list the main such problems below.

It is worth mentioning that a similar attempt to formalise word compound meaning has occurred in a better-known artificial language, Esperanto (see Schubert 1993). To what extent has actual language usage conformed to the prescriptive ideal, and were there any major gaps in the prescriptive model that usage has had to work its way around?

The prescribed rules (based on descriptive analysis of a corpus, to a much greater extent than the analogous work in Lojban) have encouraged productive use of word forms not used in the early language, most notably in the elision (where the rules allow it) of the suffix -ec-, "-ness". Thus laboremeco is already considered an archaism for laboremo, "industriousness", and compounds like rugxo are gaining ground over compounds like rugxeco ("redness"). Arguably the community has picked up and creatively exploited these rules. This does not mean that they are universally adhered to.

The first reason is calques. For example, according to the rules, korekta means not "correct" but "corrective". Schubert claims that "for speakers not yet too familiar with the inherent regularity of Esperanto, the temptation is strong to use korekta as the English adjective correct or its French, German, etc., equivalents." (Schubert 1993:329), and the Full Illustrated Dictionary of Esperanto lists this sense of korekta as evitinda (to be avoided). Though it runs against the prescriptive norm, the use of korekta in this sense is so prevalent that it would be irresponsible for a descriptive account of the language not to list it as its primary meaning. This, I think, has two lessons for Lojbanists. Firstly, languages are not closed systems (though by virtue of its morphology and philosophy Lojban will probably be more closed than Esperanto), and any internally generated account of the language is still subject to disruption from outside forces. Secondly, lexemes in speakers' native languages fulfil certain functional needs; if an artificial language community feels that such a need must be met, the word will be calqued across, no matter what the prescriptivists say. This has been the case for korekta, and I suspect it may often prove the case for Lojban speakers, given the inflexibility of the rules involved.

The second reason is idiomatisation. Disregarding early calques like respondeca ("responsible"), this is most pertinent in Esperanto for compounds where meaning can no longer be compositionally accounted for. In some cases, the compound is a hyponym of the meaning obtained componentially. Thus a lernejo is not just a "place of learning", but specifically a school; "this meaning has been defined by convention. The full compositional meaning can be actualised when a speaker so wishes, but special means are needed for this." (Schubert 1993:354) (The reading obtained, I would add, would be highly marked.) Or, the compound may have a figurative sense, as in the old calque disvolvigxi "to unwind" for "to evolve". (Lojban, being used by a literal-minded, logicist community, has an abhorrence for such compounds.) These semantic shifts may also be amenable to a functionalist analysis, where the de facto readings are somehow more useful, as taxonomically more basic, than the prescribed readings of the compound. A pragmatics of word compounding in lexically productive languages --- what pragmatic circumstances motivate the coining of new lexemes as opposed to the continuing use of periphrases --- would yield some interesting results here. In any case, we can see that, again, if speakers feel they need an expression for a certain concept, or if connotations and the interaction of semantic fields are potent enough, then the meaning of a compound will shift from its original, componential sense; the metaphor underlying the compound will die. This process, some believe (Schubert quotes Ferdinand de Saussure's discussion of artificial languages in this light), is part of creolisation, and is inevitable when a language gains a wide enough speaker community.

Semantic problems

(see Appendix for the prescriptive rules discussed.)

1. If the modifier describes an argument of the head, would a place for this argument in the compound predicate be redundant? For example, given that one says la monrePOS. zdani la spot. "Mon Repos houses Spot", does it make sense for gerzda "doghouse" to still have a thematic role for the entity housed (la monrePOS. gerzda la spot. "Mon Repos dog-houses Spot")? There are intuitive reasons to say no: he is the (*van) driver of a van would indicate a straightforward equivalence between modifier and complement. Is this the case for "doghouse" though? The 'transformation' of the English compound to noun with complement is "house for dogs", where "dogs" is clearly a hypernym of "Spot". This difference in denotation indicates that "Mon Repos dog-houses Spot" is not conveying redundant information, in that the identity of the dog is unspecified by the compound.

But if the proper meaning of "doghouse" is "house for dogs", shouldn't the denotation of the "dog" thematic role be that of a mass noun, rather than a proper name? And shouldn't the usage of gerzda with a specific dog be blocked (Aronoff 1976:43) by the use of zdani? (The former gives the identical relational information, merely imposing the selectional restriction that the entity housed be a dog.) In other words, is there any motivation for the predicate gerzda to mean "X is the doghouse of dog Y", when Y seems to be properly a mass noun, and when zdani is doing the same job? Opinion in the Lojban community is divided. The conclusion I arrived at in formulating the guidelines was that most of the time there was no such motivation, but what takes the place of Y, if anything, is unclear. If Y is also dogs, then it is redundant (*X is a doghouse for dogs); if it is a hyponym for dog, while still a mass noun, it might not be (X is a doghouse for St Bernards), though it can still be blocked by the analogous X houses St Bernards.

2. The resulting compound argument order is counterintuitive. Typically the first argument in Lojban corresponds to the subject, and the second to the direct object. This doesn't work out when the compound is an action verb (agentive causative) and the base predicate is an instrumental state verb (using Fillmore's 1968 Case Grammar terminology: see Cook 1989:15). Thus vreji "X is a record of event Y in medium Z" has the causative veirgau "agent W makes X a record of event Y in medium Z", rather than the expected "agent W records Y in medium Z as X" ('X' is still the object, in case grammar terms). Similarly, the agentive of galfi "process X modifies Y into Z" is gafygau "agent W uses process X to modify Y into Z" rather than "W modifies Y into Z using process X". With an increasing number of base predicates being defined as state rather than action verbs (lacking an agent thematic role), this discrepancy is likely to cause confusion in usage, compared to non-instrumentals (cf. fengu "X is angry about Y" and fegygau "W makes X angry about Y", or glare "X is hot by standard Y" and glagau "W makes X hot (heats X) by standard Y".)
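The mismatch can be made concrete by comparing two orderings. The sketch below is illustrative only: `agentive` encodes the pattern suggested by the examples above (the agent is placed first and the base places follow unchanged), while `english_like` encodes the ordering an English speaker might expect for an instrumental base. Both function names and the place labels are hypothetical.

```python
def agentive(base_places):
    """Order suggested by the examples above: agent first, base places unchanged."""
    return ["agent"] + list(base_places)


def english_like(base_places):
    """Order an English speaker might expect for an instrumental base:
    agent first, then the remaining places, with the base x1 moved to the end."""
    return ["agent"] + list(base_places[1:]) + [base_places[0]]


vreji = ["record", "event recorded", "medium"]
galfi = ["process", "thing modified", "result"]

for name, base in (("veirgau", vreji), ("gafygau", galfi)):
    print(name, "by the guidelines:", agentive(base))
    print(name, "as expected:      ", english_like(base))
```

For veirgau the two functions give `agent, record, event recorded, medium` against `agent, event recorded, medium, record`, which is the discrepancy described above.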

3. The decision on which arguments should be eliminated as "irrelevant to the definition" is arbitrary and non-compositional, and has drawn criticism as an indirect means of calquing: by positing such "irrelevancies", users are said to be matching word meaning to an extraneous (English) model. An example of this is laurba'u for "to bellow"; the decision to semantically restrict "loud utter" to "bellow" is externally motivated. While this is a valid criticism, in the light of Esperanto's experience with calques, and the impoverishment of Lojban's native resources, I think such coinages are inevitable. Unfortunately, they will foreseeably also cause much confusion. For example, in considering the arguments for posydji "to want something" (ponse djica "own want"), I have discarded as irrelevant the argument of ponse denoting law of ownership, while John Cowan has discarded the argument of djica denoting motive for desire. Leaving all such arguments in is unworkable, since the compound's arguments would proliferate unmanageably; but setting rules for which arguments to leave out seems to me implausible.

4. The decision on which arguments to eliminate as redundant is also somewhat arbitrary: it is a matter of deciding which thematic roles should be coindexed. For posydji, the subject of wanting needn't be coindexed with the subject of owning. It would be possible for the predicate to have the form "X wants for Y to own Z", rather than "X wants to own Z". A case could be made, however, based on some notion of iconicity, that a shorter expression like the compound should express a simpler or more frequent concept (by Zipfian metrics). The sentential paraphrase can then be reserved for the more general concept, with more frame roles. An analogy could be made with the English constructs She wants to own it (for She wants herself to own it) versus She wants him to own it. This would mean that as much coindexing as possible would be encouraged between component predicate arguments.
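The effect of the coindexing decision on posydji can be sketched as follows. This is a hedged illustration: the function `posydji_places` and its labels are hypothetical, and the 'law of ownership' and 'motive' places are left out for brevity, which is itself one of the contested choices discussed under problem 3.

```python
def posydji_places(coindex_subjects: bool) -> list:
    """Candidate place structures for posydji (ponse djica, 'own want').

    coindex_subjects=True  -> "X wants to own Z"
    coindex_subjects=False -> "X wants for Y to own Z"
    """
    wanter = "wanter (x1 of djica)"
    owner = "owner (x1 of ponse)"
    owned = "thing owned (x2 of ponse)"
    if coindex_subjects:
        # One place serves both roles, so the compound is shorter.
        return [f"{wanter} = {owner}", owned]
    return [wanter, owner, owned]


print(posydji_places(True))
print(posydji_places(False))
```

Maximal coindexing yields the two-place reading; the three-place reading would, on the argument above, be left to the sentential paraphrase.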

5. There is a concern (pointed out by Mark Shoulson) that language users are unnecessarily coining compounds to match the semantic map of English, when a language as lexically restricted as Lojban (about 1350 morphologically primitive stems) should be content with a less finely divided semantic taxonomy of the world. For example, Shoulson has criticised my coining of djabeipre "waiter" (cidja bevri prenu "food carry person"), arguing that in most contexts the stem bevri "carrier" is sufficient. While this is a perceptive observation, native semantic maps tend to be firmly entrenched in speakers' minds. Esperanto borrowed its own maps from German and Russian (Waringhien 1959), and consolidated them with a large corpus of writing. Lojban can avail itself of neither kind of resource. It remains to be seen to what extent Lojban can find, let alone enforce, a native semantic map, founded on its morphological primitives.

6. Because of the analytic nature of some of Lojban's predicates, there is frequently free choice as to which compound component is the head, and which the modifier. For example, in they killed each other, "killed" would be taken as the head, and "each other" as a complement or modifier, whether it appeared as a noun phrase, a prefix, or an adverb. But Lojban predicate simxu "mutual" is so defined ("set of participants X are reciprocally involved in activity Y"), that it could be taken as the head. simcatra "kill each other" can be transformed to the phrase simxu lenu catra "are mutual in that they kill". This atypical free choice extends to several modifiers: milxe "mildly", carmi "intense", cmalu "small", mabla "derogative", and so on.

In fact, it has become something of a shibboleth to note which element a speaker uses as the head in compounds. Most language users make the natural language modifier the modifier in their compounds, even if they are aware that a Lojban 'deep structure' analysis has it as the head. A minority though (most prominently Jorge Llambias and Jim Carter) do not. Thus the former has used kakymli "clear one's throat" instead of the expected mlikafke, from kafke milxe "cough mild, mild in coughing" rather than milxe kafke "mildly cough". The question here is what properly constitutes a head. Should the judgement be only syntactic/morphological (in which case kafke can be both head and modifier according to individual preference), or should it be semantically motivated? (Which raises the problem of which semantic metalanguage one should use for the judgement. Lojban metalanguage needn't have any validity outside Lojban, and any metalinguistic decision seems entirely arbitrary.)

7. Lojban compounds are not always as compositional as the outline below would have it. Indeed, the merit of the compositional scheme given is that it works as often as it does, not that it can predict the behaviour of all conceivable compounds. The approach typically taken towards compounds like gusfu'i "photocopy" (gusni fukpi "light copy") is that they elide other morphemes in their derivation. Were these morphemes present, they would give a compositional derivation --- one, namely, in which the modifier would be an argument of the head, or in which the modifier and head both describe the same denotatum. It is worth asking how constructive such an account would be, when the current compound, while not detailing how light relates to copying, is sufficiently evocative of its sense. (Though the Chinese calque fragu'i "laser" from frati gusni "reaction light" may show this evocation is a matter of degree.)

The current system also has no account of compounds where the modifier is a modal argument of the head predicate, such as jboselsku "Lojban writings" (lojbo selsku "Lojban expression"). Again, a strict compositionalist approach would posit some elided predicate which would make the modifier a proper argument of the head (like seke lojbo pilno cusku "PASSIVE ((Lojban use) express)"); again, the descriptive adequacy of such an account is questionable. Since modal arguments don't present a problem for compositional accounts not as strict as Lojban's (like Esperanto's), the system clearly needs some broadening to treat such cases explicitly, instead of purporting to reduce away all possible ambiguities.

The 'elision' account is also invoked to treat cases where heads or modifiers are used unmodified, where the rules would predict passivisation phenomena (where the first argument of the predicate involved is exchanged with another argument). Two examples that have drawn much comment are le'avla "loan word" (lebna valsi "borrow word") and xekskapi "dark-skinned" (xekri skapi "black skin"). It has been argued that the former should be selyle'avla, from selylebna valsi "borrowed word", since loan words are both borrowed and words, but not both borrowers and words. I believe that lebna valsi is as evocative a compound as gusni fukpi, and there is no overriding need to saddle the language with longer than necessary compounds, simply to satisfy demands of an overstrict compositionality. I have accepted the criticism made to me, though, that xekskapi is 'incorrect', in that the head is not describing the denotatum, as is expected in Lojban. Someone dark-skinned is not skin, but someone with skin. This has led me to recommend against xekskapi in favour of xekselskapi ("black skinned") in my guidelines, and to frown on unmodified heads more than unmodified modifiers. It remains to be seen whether the prescriptive machinery of the language will enforce this restriction; from my experience, prescriptivism seems to be a potent force in artificial language communities, whose hold on any model of their language is always tenuous.

Appendix: The rules for Lojban compounds (lujvo)

The guidelines I have formulated in Nicholas (1993) can be summarised as follows:

The argument set for a compound is a subset of the union of arguments of its component predicates.

Arguments can be eliminated from this set by conveying redundant information (having the same denotatum as some other argument), or irrelevant information (which is taken as contrary to the definition of the compound's new concept.)

As an example of the former, the predicate for gerzda does not have both a place for the entity housed (x2 of zdani "house") and a place (thematic role) for the dog (x1 of gerku "dog"), since they are presumed to have the same denotatum. As an example of the latter, laurba'u "to bellow" (from cladu bacru "loud utter") does not have a place for the location at which the bellower is loud (x2 of cladu), since a bellower in New York is still "loud-uttering", even if she is quiet relative to an observation point in Melbourne.
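The two elimination steps can be sketched as a filter over the union of component places. This is a minimal illustration under stated assumptions: `compound_places` is a hypothetical helper, places are written as `(predicate, index)` pairs, and only places the text mentions or implies are listed. Treating x1 of cladu as coindexed with x1 of bacru (the bellower is both the loud thing and the utterer) is an assumption of the sketch, not a claim of the guidelines.

```python
def compound_places(union, redundant=None, irrelevant=()):
    """Filter the union of component places.

    union      -- list of (predicate, index) pairs
    redundant  -- dict mapping a dropped place to the place it duplicates
    irrelevant -- places dropped as contrary to the compound's definition
    """
    redundant = redundant or {}
    return [p for p in union if p not in redundant and p not in irrelevant]


# gerzda: the dog (gerku x1) duplicates the entity housed (zdani x2).
gerzda_union = [("zdani", 1), ("zdani", 2), ("gerku", 1)]
gerzda = compound_places(gerzda_union, redundant={("gerku", 1): ("zdani", 2)})

# laurba'u: the observation point (cladu x2) is dropped as irrelevant.
laurbahu_union = [("bacru", 1), ("cladu", 1), ("cladu", 2)]
laurbahu = compound_places(
    laurbahu_union,
    redundant={("cladu", 1): ("bacru", 1)},  # assumption of this sketch
    irrelevant={("cladu", 2)},
)

print(gerzda)    # [('zdani', 1), ('zdani', 2)]
print(laurbahu)  # [('bacru', 1)]
```

In both cases the result is a subset of the union, as the first guideline requires; the choice of what counts as redundant or irrelevant is supplied by hand, which is exactly where problems 3 and 4 arise.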

There are two interpretations of the relation between head and modifier. Either head and modifier are predicates both describing the denotatum (eg. balsoi "warrior" from banli sonci "great soldier"), or the modifier describes an argument of the head predicate (eg. gerzda, where gerku "dog" is an argument of zdani "house"). As a special case of the latter, the modifier may be the predicate of a sentential complement of the head (eg. ctigau "feed" from citka gasnu "eat act").

The arguments of any component predicate should appear in the same order in the compound predicate, though they may be interleaved with the arguments of other component predicates.
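This ordering guideline amounts to a subsequence check, sketched below. The helper `preserves_order` and the place labels (such as `"vreji1"` for x1 of vreji) are hypothetical; the two candidate orders for veirgau are taken from problem 2 above.

```python
def preserves_order(compound, component):
    """True if the component's surviving places keep their relative order."""
    kept = [p for p in component if p in compound]
    positions = [compound.index(p) for p in kept]
    return positions == sorted(positions)


vreji = ["vreji1", "vreji2", "vreji3"]  # record, event recorded, medium

by_guidelines = ["agent", "vreji1", "vreji2", "vreji3"]
english_like = ["agent", "vreji2", "vreji3", "vreji1"]

print(preserves_order(by_guidelines, vreji))  # True
print(preserves_order(english_like, vreji))   # False
```

The English-like order for veirgau fails the check, which shows why the guideline and the intuitions discussed under problem 2 pull in different directions.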


Aronoff, M. 1976. Word Formation in Generative Grammar. Linguistic Inquiry Monograph 1. Cambridge (Mass.): MIT Press.

Brown, J. C. 1960. Loglan. Scientific American 202(6). 53-63.

Cook, W. A. 1989. Case Grammar Theory. Washington (DC): Georgetown University Press.

Cowan, J. 1991. Response to Arnold Zwicky's review of 'Loglan 1': Loglan and Lojban: A Linguist's Questions and an Amateur's Answers. ju'i lobypli 14. 21-29.

Dasgupta, P. 1993. Idiomaticity and Esperanto texts: an empirical study. Linguistics 31. 367-386.

Fanselow, G. 1988. Word Structure and Argument Inheritance: How Much is Semantics? In The Contribution of Word-Structure-Theories to the Study of Word Formation. Linguistische Studien Reihe A Arbeitsberichte 179. Berlin: Akademie der Wissenschaften der DDR. 31-52.

Nicholas, N. 1993. Doing the belenu blues: Lujvo place structure paper, Version 2. Lojban FTP Server (

Schubert, K. 1993. Semantic compositionality: Esperanto word formation for language technology. Linguistics 31. 311-365.

Waringhien, G. 1959. Lingvo kaj Vivo: Esperantologiaj Eseoj. La Laguna