Lojban and Sapir-Whorf

Extracted Network Discussions of Lojban and Sapir-Whorf, mostly 8-9/90.

Copyright, 1990, 1991, by the Logical Language Group, Inc., 2904 Beau Lane, Fairfax VA 22031-1303 USA. Phone (703) 385-0273. lojbab@lojban.org. All rights reserved. Permission to copy granted subject to your verification that this is the latest version of this document, that your distribution be for the promotion of Lojban, that there is no charge for the product, and that this copyright notice is included intact in the copy.

Computer Network Discussions on Loglan/Lojban and Linguistics (and Esperanto and ...) - Recorded Primarily Aug-Sept 1990

Subject: The Sapir/Whorf Hypothesis

Participants: jfl@munnari.oz.au (John Lenarcic) pautler@ils.nwu.edu (David Pautler) dtate@unix.cis.pitt.edu (David M Tate) minakami@Neon.Stanford.EDU (Michael K. Minakami) rjohnson@vela.acs.oakland.edu (Rod Johnson) hullp@cogsci.berkeley.edu dmark@acsu.buffalo.edu (David Mark) colin@cstr.ed.ac.uk (Colin Matheson) swsh@ellis.uchicago.edu (Janet M. Swisher) wdr@wang.com (William Ricker)

1. jfl: Briefly stated, the [Sapir/Whorf] hypothesis is : " Language shapes the way we think, and determines what we can think about."

2. pautler: (responding to 1.) A professor in pragmatics told me this spring that the theory only claims that a given language forces its users to mentally keep track of certain information like time-of-occurrence, etc. that is needed to make correct decisions about tense, etc. that are required to form sentences.

3. dtate: (responding to 2.) I think this understates the hypothesis, at least in Whorf's version. Whorf claimed that, since we think in language, the language in which we think will have enormous impact on the ways in which we think, tending to reinforce certain patterns and undermine others. It could be something as blatant as having the word for "good" being etymologically related to that for "strong", tending to reinforce "might makes right" thinking, or as subtle as the lack of a socially acceptable passive voice encouraging thinking of one's self as an agent and not as an object (or, of course, the converse). There is, to be sure, a "chicken and egg" question here: is it the language that shapes the culture, or the culture that shapes the language? The answer (IMHO) [Net abbreviation: "In my humble opinion"] is "both": the language evolves because of and in accordance with cultural forces, but after a certain point the language develops a momentum of its own, tending to carry the culture in directions already inherent in the language.

4. minakami: (responding to 2.) I think this is only the weak form of the Whorfian hypothesis. The strong version does assert that the structure and lexicon of a language shapes thought. According to J. R. Anderson: "Whorf felt that such a rich variety of terms would cause the speaker of the language to perceive the world differently from a person who had only a single word for a particular category." This stronger version of the hypothesis is generally considered disproved by Rosch's studies of color vision and similar experiments.

5. rjohnson: (responding to 2.) There are various versions of the idea around, which can be attributed to von Humboldt, Sapir, Whorf, and their commentators. The idea that language "determines what we can think about" is a very strong version of the hypothesis, probably stronger than Sapir would have liked, maybe stronger than Whorf. These things were not always stated with perfect clarity and consistency, though, so it's difficult to say. [jfl's version in 1.] is a slightly odd-sounding version of Whorf's thesis. It's hard to say if it's a good rendering of Whorf into modern terms, but it feels rather reductive to me. At any rate, it's too narrow: Whorf was concerned with Hopi versus English way of thinking about time in that particular article, but the thesis in general isn't strictly limited to that. Hopi merely provided (or seemed to provide) a striking illustration of two different ways of thinking. Note that "ways of thinking" is in fact rather sloppy here: Whorf didn't actually investigate the ways Hopis think about time in any detail at all - he merely projected his feeling about the language onto their thinking. In essence, he assumed the truth of what later commentators saw as a "hypothesis". To Whorf, it was almost self-evident.

6. pautler: (continuation of 2.) I believe the comparison S/W used to illustrate this was the bookkeeping required by a Southwest Native American language (Hopi?) regarding the source or validation of information - evidently there are markers performing the function of "FOAF", etc. that are as necessary to well-formedness in that language (which does not mark tense) as tense is to English (which does not mark validation). Of course, the Native American language can express time-of-occurrence if need be, just as English can express source-of-information, but neither is explicitly required by the language itself. I believe the traditional example: (~11 Inuit language words for snow) and (~1 English word for snow) ==> (Inuit language and English users think about snow differently) might not be due to S/W and probably misrepresents their idea. But I am not a linguist, nor have I read their work. I just wanted to suggest that applications of S/W may not be what you actually want to look for.

7. rjohnson: (responding to 6.) Yes. Whorf, though, not Sapir/Whorf. Whorf, though he had had some training, was basically a gifted amateur; Sapir was less inclined to make sweeping claims - he knew how language has a way of stabbing such claims in the back. Boas, in fact, in the Introduction to the "Handbook of American Indian Languages" (1911) [introduces the "snow" example]. (At least this is the point at which it was introduced into linguistics.) Geoff Pullum has recently done a fairly comprehensive study of where this idea comes from and how it has mutated into "50 words for snow", "*100* words for snow," etc. I, and I think many other linguists (though not all), have a gut feeling that somewhere, somehow, deep down, there's a kernel of truth in the idea, but no attempt to frame it as an empirical hypothesis has, to my knowledge, really led anywhere.

8. hullp: (responding to 7.) Actually, several studies have indeed led somewhere. Casagrande's 1950's studies demonstrated a so-called Whorfian effect on children's perception of shape. The comparison was between Navaho speakers (whose language mandates the marking of shape with inflections) and English speakers. There have been a few others (not many, admittedly) that have demonstrated similar effects. The problem is that most of the tests of the hypothesis have been tests of color perception and categorization. Color perception is strongly rooted in physiology and is thus uniform across cultures to a large degree. Any language effects would have to be in a domain for which there is less evidence for a physical basis.

9. dmark: (responding to 8.) In fact, Lakoff (in "Women, Fire, ...") discusses a study by Kay and Kempton that seemed to clearly demonstrate linguistic relativity in color perception. Phillip Hull is correct in pointing out the strong physiological basis of color perception. Thus different color perception due to language seems pretty powerful evidence. (I could describe the experiment, from Lakoff's account, and/or give the full reference, if people want me to.)

10. rjohnson: (responding to 8.) Thanks for this information. I guess I was using "led anywhere" in a somewhat more global sense. That is, I know there have been a smattering of studies that purport to be consistent with ("confirm" is too strong, I think) the S/W hypothesis - but it doesn't seem that any real coherent picture emerges of "thought" as a whole being strongly affected by "language" as a whole; that is, we have little evidence that "Whorfian" effects are of fundamental importance to cognition. Instead we get hints that there may be something there, but the results are mixed and often rather tentative. Does this fit with your perspective on things? (Admittedly, notions like "of fundamental importance" are pretty difficult to assess.) On the other hand, as you say, the best-known disconfirming studies suffer from being in the relatively few areas where there probably are reliable hard-wired universals, as in Berlin and Kay's studies of color terms. In the huge gray area, evidence seems hard to come by. I was briefly involved with a cognitive science team a few years back that was grappling with some of these questions, and it seemed to me that the task of designing experiments was extraordinarily hard - every approach had serious pitfalls. I don't know how their work turned out, though.

11. colin: (responding to 7.) I agree with your gut feeling. I suppose the trouble is, as with many linguistic issues, that the "truth" of the matter lies at such a level of abstraction that it's difficult just to talk about it. However, here's one suggestion of one version of the thesis (count the hedges!). Perhaps it's true that the act of "compressing" abstractions into concepts represented by single lexical items or phrases has a qualitative effect on the kinds of things it is possible to talk about. Thus although it's probably the case that one can express any particular concept in any language periphrastically, it might just be that the ability to encapsulate things in immediately transferable units affects the sorts of transfer that are possible. (Where the transfer is of information between humans.) Is this version of the Sapir/Whorf stuff part of the original, by the way?

12. swsh: (responding to 11.) No, I don't think so. In my understanding, Whorf and Sapir were not interested so much in what "one can express" in a given language, as in the conceptual categories which underlie grammatical ones and which are used by speakers as a guide to experience. Thus, the important thing in their view is not how many words for snow a language has, but what assumptions about things like space, time, form, substance, etc., are implicit in the language's grammatical categories. The controversial part about what they, particularly Whorf, said is the thesis that speakers use these assumptions to guide their habitual beliefs and attitudes, and therefore see them as arising directly from reality, rather than projected on to it. The "Whorfian hypothesis" is often stated as having two forms, a "hard" version (language determines thought) and a "soft" version (language and thought are kinda sorta related). From Whorf's writings, it appears that he himself held views more towards the "soft" end of the spectrum. He shied away from saying there is a "correlation", that being too definite a word, preferring to say that it could be shown that there are cases where linguistic categories are in some way connected to cultural ones, even if it's not universally true. However, it seems to me that it would be mighty odd to find a language whose grammar revealed a categorical system that was otherwise unused by speakers, either in individual cognition, or as part of the attendant culture.

13. wdr: (responding to 11.) If I understood that periphrastic version of the hypothesis, I think it has as a corollary that English is not highly suited to its own transfer. Which, given the context, I suspect may have been Colin's point, but if it wasn't, I'll suggest it more openly. Is a natural language the right language in which to discuss the deficiencies of natural languages? That it was not was one of the original motivations of the Loglan/Lojban successor of Esperanto. Can one of you sci.lang folks translate the various statements of the S/W hypothesis made in this newsgroup lately into Lojban and give us an unbiased account of how manipulable they are in a non-formal yet unnatural language? [ed.: no one has done this yet - any volunteers?]

14. pautler: (wrapping up) Perhaps many of you are tiring of the discussion about the claims made by S/W, but I'm going to take the risk of extending the debate: Does the S/W hypothesis suggest that we view a particular language as a collection of tools used to achieve social (communicative, in particular) goals? The analogy I have in mind is this: our ability to achieve tasks is determined by the tools we have at hand, which forces us to think about solving the task primarily in terms of what subtask each tool can achieve. Of course, we can always attempt to invent new tools if they are needed, but invention is difficult for both language conventions and tools, so the analogy still holds. My claim, then, is this: if this is an accurate analogy, then should the S/W hypothesis be any more surprising than a claim that farmers and stockbrokers think differently about the world due to the different means they have of interacting with it?

Subject: Lojban as seen by the linguistics and cognitive science community

Participants: dan@YOYODYNE.MIT.EDU (Dan Parmenter) cowan@marob.masa.com (John Cowan) kimba@cogsci.ed.ac.uk (Michael Newton) rjohnson@vela.acs.oakland.edu (Rod Johnson) dtate@unix.cis.pitt.edu (David M Tate) harold@ccl.umist.ac.uk (Harold Somers) aronsson@lysator.liu.se (Lars Aronsson) lojbab@snark.thyrsus.com (Bob LeChevalier) lgorbet@hydra.unm.edu (Larry P Gorbet) daryl@oravax.UUCP (Steven Daryl McCullough) daj@beach.cis.ufl.edu (David A. Johns) lee@uhccux.uhcc.Hawaii.Edu (Greg Lee)

1. dan: (starting the debate - several paragraphs below elucidate his opinions further) I have been acquainted with Lojban for a few years now, and have a few thoughts on the matter. My overall impression is that a monumental effort is being made by an astonishingly large group of people, and that while it is quite well-intentioned, its ultimate goals are unattainable at best, and highly suspicious at worst. Some minor and major objections: ONE: The audio-visual isomorphism. Presumably, this is an attempt to address the rather poor way that some written languages reflect the spoken language (such as English). This fails to predict variations of accent, as well as the language-specific biases of speakers - English speakers for instance will probably continue to mark yes-no questions with a rising tone. Of course this isn't indicated in the written form, so already the idea of audio-visual isomorphism is weak at best.

2. lojbab: (responding to 1.) Yes, English speakers probably will. But Hindi speakers probably won't. Thus rising tone (pitch) will not be a significant indication in Lojban. Now, in the English 'dialect' of Lojban, such suprasegmentals will probably be redundant and reinforcing information to the truly significant version of the question contained in the words. And if for some other reason, your voice rises in pitch, if there is no 'xu', it is not a yes/no question. As an advantage, I suspect that it will be a lot easier to get computers voice-processing the Lojban phonemes than the English suprasegmentals. (Anyone have any actual knowledge on this?)

3. dan: (continuation of 1.) Furthermore, the idea of a language that assumes all of its speakers will have precisely the same accent is too terrifying to contemplate, yet Lojban's writing system would seem to depend on this fact.

4. lojbab: (responding to 3.) Lojban's prescription says nothing about 'accent'. Each of the sounds we've defined as phonemic has a certain range wherein it is phonemic. Lojban 'r' can range from a full trill to a simple flap, for example, and we've made no prescription regarding dark 'l' vs. light 'l'. Difference in these phonemes will result in different 'accents'. There will probably be less spread than most natural languages, but there will be some spread.

5. cowan: (responding to 3.) Of course [it's too terrifying to contemplate]! However, this neglects the distinction between "emic" and "etic" features of the language. The claim of audio-visual isomorphism is not that every possible distinction of speech is represented in the written form, but only that all significant distinctions are so represented. For example, true-false questions may be signalled (among English speakers) with a rising tone, but also must be signalled with the prefix word "xu". The "xu" carries the entire content, and will be understood by any fluent Lojbanist from whatever background. The tone is superfluous.

6. dan: (responding to 5.) If every Lojban speaker were a native English speaker, you could just as easily argue that the "xu" is superfluous. But this is circular reasoning. Is the purpose of Lojban to be spoken in a dull monotone? Or do you expect the writing system to evolve to account for any variations in tone that might come along? Suppose some third-generation Lojban speakers always mark yes-no questions with a falling tone accompanied by a series of elaborate hand-jives (gestures are expressive too), will you mark this in the written version as well? How do you determine what a "significant" feature of the language is?

7. cowan: (responding to 6.) We determine significant features by defining them. Again, this is a constructed language, and a posteriori reasoning appropriate to natural (non-constructed) languages doesn't necessarily fit all cases. In the baseline version of Lojban, the way of marking a true-false question is to prefix it with "xu". This is true by definition, a priori. Once the language is baselined, the normal processes of linguistic change may indeed alter the marking system to something involving tone, gesture, or toe-wiggling. At that time, Lojban will be a natural language (defined here as one having native speakers) and will need to be investigated by the methods of ordinary synchronic linguistics. (When Bob LeChevalier, the most fluent speaker at present, speaks in the language, he does tend to talk in a monotone, possibly bending over backwards to avoid influence from English suprasegmentals. He does hesitate longer between sentences than at other mandatory pauses, though.)

8. lojbab: (responding to 6.) That would be a truly odd purpose for a language - to be spoken in a monotone. :-) The writing system would not need to recognize variations in pitch, gestures, or any other feature of spoken language unless these came to convey variations in meaning that were not already reflected (and reflectable) in the written language. In addition, since human-computer interaction using Lojban is intended to be significant in its usefulness, it seems unlikely that there will evolve variations that cannot be easily recognized AND reproduced by a computer listener/speaker. A significant feature of a logical language, of course, is one that affects the truth conditions of its statements. A change or variation in the language would not be 'significant' unless it affected such truth conditions. A change which introduced ambiguity would obviously be significant.

9. cowan: (continuation of 5.) Note also that audio-visual isomorphism cuts both ways. It ensures not only that every "emic" feature of speech is representable in writing, but also that features of text such as paragraphing, structural punctuation, parenthesis, and layout have representations in speech. For example, the word "ni'o" signals a change of subject and is used to separate spoken paragraphs; likewise, non-mathematical parentheses are pronounced "to" for "(" and "toi" for ")".
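[ed. illustration, not part of the original discussion] The text-to-speech direction cowan describes in 9. can be sketched roughly as follows. This is only a minimal sketch: the function name `speakable` is hypothetical, and treating a blank line as the paragraph boundary is an assumption made for the example. Only the three words "ni'o", "to" and "toi" come from the post itself.

```python
def speakable(text: str) -> str:
    """Turn layout features of a written text into spoken words.

    Assumptions (for illustration only): paragraphs are separated by a
    blank line, and every parenthesis is a non-mathematical one.
    """
    # Split on blank lines and drop empty chunks.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    spoken = []
    for paragraph in paragraphs:
        # "(" is pronounced "to" and ")" is pronounced "toi".
        paragraph = paragraph.replace("(", " to ").replace(")", " toi ")
        # Normalize whitespace left over from the replacements.
        spoken.append(" ".join(paragraph.split()))
    # "ni'o" separates spoken paragraphs.
    return " ni'o ".join(spoken)


if __name__ == "__main__":
    # A, B, C, D, E stand in for arbitrary Lojban words.
    print(speakable("A (B) C\n\nD E"))
```

The point of the sketch is only that the mapping is mechanical: nothing about layout has to be guessed from tone of voice.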

10. dan: (continuation of 1., from 3.) TWO: Sapir/Whorf is tacitly assumed by almost everyone that I've talked to in connection to Lojban. This isn't unusual, since it's also assumed by an astonishing portion of the world at large.

11. cowan: (responding to 10.) The Lojban project is founded on assuming the truth of SWH; the falsity of SWH is the null hypothesis. To develop Lojban at all, we must assume SWH. If Lojban turns out to have no effect on thought, i.e. to be a mere code, SWH will not be confirmed. (This is not to say it will be disproved.)

12. lojbab: (responding to 10.) Assumed to be what? True? No. Important enough to test? Yes. If Sapir-Whorf is important enough to test, then Lojban must be designed with features that will likely have a noticeable effect, while being sufficiently culturally neutral that non-Lojban variables can be at least statistically removed. The Lojban design HAS to assume that Sapir-Whorf is true, or that design will be meaningless for experimental purposes. As to whether those working on the language 'tacitly assume' Sapir-Whorf, I doubt it. There are no doubt many who believe SWH true, and a couple I know of who believe it false, but are willing to see. Most are fairly open-minded. In any case, if we are being 'good scientists', our individual opinions on the hypotheses we investigate shouldn't matter, since some degree of professional detachment is expected. When I work on Lojban as a researcher, I try to turn off that part of me that does 'Lojban promotion' (admittedly a bit more biased). I rely on peer review to catch any biases from my personal views that slip into my work. Given the wide disparity of views among Lojban workers, and our sensitivity towards avoiding unnecessary bias, I'm confident that there is no problem. If Sapir-Whorf (or its equivalent - since a lot of people assume it without even knowing it exists) is tacitly assumed by the world, it seems an especially important question to investigate scientifically. If SWH is used by some to justify racism, some concrete data to attack such use is more effective than personal distaste. Just because a scientific question has political ramifications based on its possible outcomes does not mean that the question shouldn't be asked, or moreover, shouldn't be answered.

13. dan: (responding to 12.) Yes, I'd say that a surprisingly large number of people when informed about S/W will automatically assume it to be true. The issue to me is one of putting the cart before the horse: to wit, many people have astonishingly racist attitudes about a wide range of phenomena. Language is no exception. If you read the literature of the whole English First movement, you see thinly veiled racism of the worst sort. Also witness the thinly veiled classism of most of the prescriptivists - the goal is to avoid sounding "low class". Even something as simple as differing accents within a homogeneous speech community can cause people to raise their eyebrows. Human beings seem to have an overwhelming urge to pigeonhole people by any method possible. What does this have to do with S/W? Well, given that nobody seems particularly satisfied either way with the results of actual psycholinguistic tests that have been tried, if someone believes S/W then they can choose to ignore any test results that seem to go against it and start to make some pretty frightening statements.

14. dan: (continuation of 1., from 10.) What I'm getting at is that there is a serious danger that people who believe in the S/W hypothesis will use this belief to make claims about their language being superior to someone else's. The empirical basis for these claims has already been discussed, so I won't get into it, except to say that I remain unconvinced by the S/W hypothesis.

15. cowan: (responding to 10 and 14.) One of the major workers in Lojban [ed.: pc] believes that SWH is in fact false. There is as diverse a variety of views on SWH in the Lojban community as on any other subject.

16. lojbab: (responding to 14.) Yes, there is [a serious danger]. But there is also the chance that if SWH is true, that the reverse will happen. Based on the natural selection paradigm (also perhaps questionable with regard to languages - but the analogy is useful), if one language is 'superior' to another in some small area (such as mathematical thinking - as in the previous example), the fact that the other language survives indicates that it also has some compensating advantages that suit its niche. Thus Sapir-Whorf might help us see the virtue in all languages and cultures. I certainly don't think that if Lojban was proved able to assist or improve logical thinking, that it should displace English or any other language. To borrow someone else's line, Lojban becomes another tool in the linguistic tool chest. You learn it like an English speaker learns French or FORTRAN, to meet a communication need that is not well served by English.

17. dan: (responding to 16.) I am told that among anthropologists, S/W, in some form, is popular.

18. lojbab: (responding to 17.) Indeed. I know that in the Loglan/Lojban community, Reed Riner at Northern Arizona and John Atkins and Carol Eastman at Washington are anthropologists that were/are interested in S/W. In addition, there is another 'related field' that makes heavy use of S/W, either directly, or in an evolved form. Semiotics apparently uses a lot of ideas these days that at least tacitly assume some degree of cultural relativity, and I'm told Umberto Eco is particularly 'Whorfian' in his ideas. I don't know these things directly, having no meaningful exposure to semiotics. My source is Robert Gorsch at St. Mary's College in CA, who teaches English/Semiotics/Linguistics there. He's been developing an introductory course in Semiotics showing the evolution of S/W into current semiotics theories (incidentally relying on Esperanto and Lojban as primary examples). We published his course outline and bibliography in a recent issue of our internal journal, Ju'i Lobypli.

19. dan: (responding to 18.) Eco is interested in a number of theories that are out of vogue among Chomskian linguists. He also seems to have an interest in the so-called "meaning-based" theories of language, posited by people like Schank, in the NLP [natural language processing] community. He devotes some space to Schank's theory of conceptual dependency in several books (titles forgotten ...sorry!). Many fields, related and unrelated to semiotics, also make use of certain Whorfian arguments. Some feminist theorists have an axe to grind about how language is used to oppress women.

20. dan: (continuing 17.) To me, the idea of linguistic equality - that all languages are more or less created equal, is a much more egalitarian view. It jibes well with my notion that all people are created equal. This principle forms the basis for much in the way of my political views. I don't want to get into a debate here about the politics of language, but it's something I feel very strongly about.

21. lgorbet: (responding to 20.) The phrase in Dan's recent posts that confuses me a lot is "all languages are equal". So far as I can see, that may well have (probably has) nothing to do with whether (some version or other of) S/W is true or not. I suspect the most common belief of linguists who think about S/W at all is that (a) S/W is true; and (b) all languages are "equal". AND you seem to be assuming that the truth of S/W entails inequality (in some unstated sense) of languages. All S/W says, even in the strongest versions that I know of anyone competent believing, is that languages are different in ways that lead their speakers to tend to think differently. Thanks to work by lots of folk over the past half century (oops, more than that), it's pretty clear that different languages have lots in common as well as some striking differences. So probably most of us (my wild supposition, I admit) think that the impact of a true S/W would not be all that huge a difference. But a difference in conceptualization and knowledge is not the same thing as inequality. It almost seems to me that to assume that different ways of thinking are unequal ways of thinking plays into the hands of racists even more... This is NOT a flame. You raise some important issues, many of which I agree with, especially about the ways our work can get abused by those with an unsavory agenda. [The discussion of Sapir-Whorf and its possible racist use continued for quite a while, and is omitted.]

22. dan: (continuation of 1., from 14.) This empirical basis is something that I use as a foundation for my personal ideological beliefs with regard to such issues as English-only laws and prescriptivism (by the likes of Safire, Lederle, Simon et al.). It seems to me that the Lojbanists, who are already claiming that the language makes them think more clearly on certain things, are setting themselves up for a type of elitism that I find frightening. THREE: Lojban's allegedly unambiguous syntax. The bottom line is that "plastic cat food can cover" is still ambiguous in Lojban.

23. cowan: (responding to 22.) This English utterance is ambiguous in three different ways. Syntactically, it might be a noun phrase (a kind of cover) or a sentence (asserting that plastic cat food is capable of covering something). Lojban does not have this kind of ambiguity: the first would be "lo slasi mlatu cidja lante gacri" and the second would be "lo slasi mlatu cidja ka'e gacri".

24. harold: (responding to 23.) Well, I think you'll find that syntactically the phrase is MUCH more ambiguous: as a noun phrase, ignoring the semantic ambiguity of any noun+noun pairing (e.g. "cat food" = food for cats, food made of cats, food which looks like a cat; "can cover" = cover for a can, cover made out of a can; "plastic cat" = cat made out of plastic, cat which behaves like plastic, cat which belongs to plastic, etc) it has readings [numbers added for later cross-reference]:
a cover for plastic cat food cans, i.e. a cover for cans which contain plastic cat food, i.e.
   1 a cover for cans which contain food for plastic cats, or
   2 a cover for cans which contain plastic food for cats, or
   3 a cover for plastic cans which contain cat food
or else a can cover for plastic cat food, i.e.
   4 a can cover for food for plastic cats, or
   5 a can cover for plastic food for cats
or else a food can cover for plastic cats, i.e.
   6 a cover for a food can for plastic cats, or
   7 a can cover for food for plastic cats
or else a cat food can cover made of plastic, i.e. a cover, made of plastic, for cat food cans, i.e.
   8 a cover, made of plastic, for cans for cat food, or
   9 a cover, made of plastic, for food cans for cats

25. cowan: (responding to 24.) Let me render each of these forms into Lojban. As a glossary, slasi 'plastic', mlatu 'cat', cidja 'food', lante 'can', and gacri 'cover' take care of all the content words, each of which (luckily for me) has a single-word Lojban equivalent. I will comment on the function words I use as I use them. It should be stated from the start that Lojban interprets dyadic compounds as <modifier> followed by <modificand>, in other words AN [adjective-noun order], although this can be changed with the particle "co". [numbers relate back to English in 24.]
1) "slasi mlatu cidja lante gacri". This form is totally unmarked, and has the meaning of the English 1) because Lojban associates left-to-right. In other words, "slasi mlatu cidja lante" modifies "gacri", "slasi mlatu cidja" modifies "lante", "slasi mlatu" modifies "cidja", and "slasi" modifies "mlatu".
2) "slasi mlatu bo cidja lante gacri". The function word "bo" causes the two content words surrounding it to be most closely associated. So "mlatu" modifies "cidja". Otherwise, left-to-right modification remains intact, so that "slasi" modifies "mlatu bo cidja", etc.
3) "slasi je mlatu bo cidja lante gacri". Here we make two coordinated claims about the "lante", namely that it is of type "mlatu bo cidja" (a cat-food can) and that it is "slasi" (plastic). So we insert the particle "je" which means this type of "and". (There are several Lojban words for "and", but "je" is the one that's grammatical in this context.)
4) "slasi mlatu cidja lante bo gacri". Here "lante" and "gacri" are grouped, so that "slasi mlatu cidja" (food for plastic cats) modifies "lante bo gacri" (can-type-of cover).
5) "slasi mlatu bo cidja lante bo gacri". Here we have three components grouped in left-to-right order: "slasi", "mlatu bo cidja", and "lante bo gacri". Therefore "slasi mlatu bo cidja" modifies "lante bo gacri", making this a plastic cat-food type of can-cover.
6) "slasi bo mlatu cidja bo lante gacri". Here again we have three components, but different ones from those appearing in 5).
8) "slasi je ke mlatu cidja lante ke'e gacri". Here we introduce the new particles "ke" and "ke'e". These group in the same way that "bo" does, but everything between "ke" and "ke'e" is grouped. Wherever "bo" appears between two words, it can be replaced by "ke" before the first and "ke'e" after the second. So 4) can be rewritten as "slasi mlatu cidja ke lante gacri", with elision of "ke'e" at the end of the phrase. This is an example of a general point about Lojban: most things are expressible using both "forethought" and "afterthought" forms, comparable to the difference in English between "both A and B" and "A and B". In this case, we need the whole of "mlatu cidja lante" to group as one modifier, so "bo" is not usable. We also need "je" because again two claims are being made, that the cover is both plastic and for cat-food cans.
9) "slasi je mlatu bo cidja bo lante gacri". Here "bo" serves us again, in contradistinction to 8), because of an additional rule that comes into play when "bo" appears on both sides of an element: it is right-grouping. So whereas "A B C" means that "A B" modifies "C", "A bo B bo C" means that A modifies "B bo C". So here we claim that the cover is both plastic and is of type "cat food-can".
There are other ways to express these ideas if the constraint on ordering the content words is relaxed. There are also lots of other possibilities expressible by the Lojban syntax, such as "slasi bo mlatu bo cidja bo lante bo gacri", which might be a plastic type of food-can cover for use by cats. In addition, "je" (and) can be replaced by "ja" (inclusive or) or "jonai" (exclusive or) or any of the other Boolean relationships, or by various non-logical connectives such as "joi" (mass mixture): "slasi joi mlatu cidja" would be food made from plastic and from cats [mixed together].
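
[Editorial sketch] The grouping rules described in 25. (plain concatenation groups to the left, "bo" binds tightest and groups to the right, "ke"/"ke'e" act as brackets with an elidable "ke'e", and "je"-type connectives sit between concatenation and "bo") can be modelled with a small recursive-descent parser. This is only a toy illustration under those stated rules, not the official Lojban machine grammar; the function names parse_tanru and bracket and the nested-tuple representation are hypothetical choices made for this example.

```python
# Toy model of the grouping rules from post 25; NOT an official Lojban parser.
CONNECTIVES = {"je", "ja", "jonai", "joi"}
PARTICLES = CONNECTIVES | {"bo", "ke", "ke'e"}


def parse_tanru(text):
    """Return nested tuples: (modifier, modificand) or (connective, left, right)."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def unit():
        # A single content word, or a whole ke ... ke'e group.
        nonlocal pos
        tok = peek()
        if tok == "ke":
            pos += 1
            node = sequence()
            if peek() == "ke'e":  # ke'e may be elided at the end of the phrase
                pos += 1
            return node
        if tok is None or tok in PARTICLES:
            raise ValueError(f"expected a content word, got {tok!r}")
        pos += 1
        return tok

    def bo_group():
        # "bo" binds tightest and groups to the right: A bo B bo C = A (B C).
        nonlocal pos
        left = unit()
        if peek() == "bo":
            pos += 1
            return (left, bo_group())
        return left

    def connected():
        # Connectives bind tighter than concatenation, looser than "bo".
        nonlocal pos
        left = bo_group()
        while peek() in CONNECTIVES:
            conn = tokens[pos]
            pos += 1
            left = (conn, left, bo_group())
        return left

    def sequence():
        # Plain concatenation groups to the left: A B C = (A B) C.
        left = connected()
        while peek() is not None and peek() != "ke'e":
            left = (left, connected())
        return left

    tree = sequence()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return tree


def bracket(tree):
    """Render a parse tree with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    if len(tree) == 3:
        conn, left, right = tree
        return f"({bracket(left)} {conn} {bracket(right)})"
    modifier, head = tree
    return f"({bracket(modifier)} {bracket(head)})"


for phrase in ("slasi mlatu cidja lante gacri",
               "slasi mlatu bo cidja lante gacri",
               "slasi je ke mlatu cidja lante ke'e gacri"):
    print(phrase, "=>", bracket(parse_tanru(phrase)))
```

Each Lojban string yields exactly one tree under these rules, which is the sense in which the syntax is claimed to be unambiguous; what each modifier-modificand pair means is left open.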

26. cowan: (continuing 23.) In the English utterance, it is unclear exactly what modifies what.

27. harold: (responding to 26., continuing 24.) I don't think so. Of the above interpretations, there is a more or less clear ranking of preference, notwithstanding some context which promotes an unusual reading (e.g. a story about plastic cats): I find (8) the most plausible, with (3) next best. The least plausible are the ones involving plastic cats or plastic food.

28. cowan: (continuing 23., from 26.) So Lojban's unmarked form is grouped left-to-right unambiguously, and other groupings can be unambiguously marked by the insertion of appropriate structure words.

29. harold: (responding to 28., continuing 27.) It is relatively easy to construct plausible noun phrases consisting of five consecutive nouns for all the above patterns, just by substituting more appropriate nouns: e.g.
1 tabby cat food can cover
2 soya-bean cat food can cover
3 (already plausible)
4 =1
5 =2
6 =1
7 =1
8 (preferred reading)
9 (already plausible)
And of course, we can construct longer sequences of noun phrases, with even larger numbers of ambiguities. Can Lojban handle all of these, and, more important, would we want a language to do so? The point is that most of the readings are implausible for semantic reasons, but all (or most) groupings are possible, given the appropriate words. The same thing happens with PP attachment by the way. The problem is that you cannot tell a priori which grouping will be plausible: NLP [natural language processing] programs have to try all possible groupings and then test them for semantic coherence, a terrible waste of effort with big noun phrases or sequences of ambiguous words like "Gas pump prices rose last time oil stocks fell", in which each word is at least two-ways ambiguous (all are both nouns and verbs, and some are also adjectives).
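
[Editorial sketch] To give a feel for how quickly "all possible groupings" grows, the following enumerates every purely binary bracketing of a word sequence. It is an illustration only: the function name bracketings is hypothetical, and it counts structural groupings alone (the Catalan numbers), ignoring word-class ambiguity and the coordinated "plastic and ..." readings discussed above.

```python
def bracketings(words):
    """Return every binary bracketing of a non-empty list of words, as strings."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Try every split point; combine each left grouping with each right grouping.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"({left} {right})")
    return results


phrase = "plastic cat food can cover".split()
groupings = bracketings(phrase)
print(len(groupings))  # 14 binary groupings for five words
for g in groupings:
    print(g)
```

A program that has to test each candidate for semantic coherence faces 14 candidates for five words, 42 for six and 132 for seven, before any lexical ambiguity is considered.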

30. aronsson: (responding to 28.) What if the intended grouping was "(plastic and ((cat type of food) type of can)) type of cover"? That is a plastic cover for these cans (which are probably made of tin - I would consider this more probable) rather than a generic cover for these plastic cans. Would the sentence still translate into "lo slasi je mlatu bo cidja lante gacri"? Could the same sentence also mean "(((plastic and cat) type of food) type of can) type of cover"? (Never mind why anybody would make plastic food - that is semantics!) If any of the above, Lojban must be considered ambiguous.

31. cowan: (responding to 30.) No. "(plastic and ((cat type of food) type of can)) type of cover" would be "lo slasi je ke mlatu cidja lante ke'e gacri", where "ke" and "ke'e" are logical parentheses. "(((plastic and cat) type of food) type of can) type of cover" would be "lo slasi je mlatu cidja lante gacri" because "je" has higher precedence than concatenation, though lower than "bo".

32. aronsson: (continuing 30.) Or what if both modifiers have a more complex form? In the example above, the modifier plastic has the simplest possible form, but consider a phrase like (I wrote this with Emacs LISP mode!)
((some-special type of plastic) and (((cat or dog) type of food) type of can)) type of cover
Here, parentheses are needed not only for the general grouping, but also to unambiguously determine the precedence of "and" and "or"! IMHO [Net abbreviation: "In my humble opinion"], there are exactly two ways of designing an ambiguity-free language, neither of which will make it look like any human language: 1) Using parentheses as in LISP [see examples above] and 2) Using only very short sentences as in ordinary computer machine language. In case 2, the example would read: Cover. Cover for can. Can for food. Food for cat. Cover made of plastic.

33. cowan: (responding to 32.) The first method (parentheses) is employed, using "ke"/"ke'e" parenthesis marks as needed. This is not supposed to "look like any natural language"; this is precisely the area where Lojban differs from all natural languages, and constitutes the evidence that Lojban is not an "{English, Chinese, etc.}-based code". "And" and "or" have the same precedence and are left associative; simple concatenation is also left associative, whereas "bo" (which semantically is the same as concatenation, i.e. undefined) is high-precedence and right associative.

34. cowan: (continuing 23., from 28.) On a third level, a phrase like "cat food" is ambiguous semantically. Is it food for cats or food consisting of cats? Here Lojban really is ambiguous, but the ambiguity is semantic not syntactic. The three main kinds of ambiguity in Lojban (this kind, ellipsis, and the ambiguity of names (which Sam?)) are all semantic in nature. As in any natural language, any of these ambiguities can be "expanded" on the semantic level by adding more information: "lo mlatu cidja" (a cat type of food) could become "da poi cidja loi mlatu" (something which is-food-for the-mass-of cats).

35. dan: (responding to 34.) Semantic ambiguity is present all over the place. How does Lojban handle issues like quantifier scope ambiguity? In English, a sentence like "Every man loves a fish" is ambiguous. If Lojban merely paraphrases such utterances, to two separate utterances along the lines of: "For all x, There exists a y such that x loves y" "There exists a y for all x such that x loves y" while tolerating some version of the original utterance, then nothing has been accomplished. I can do the same thing in English.

36. cowan: (responding to 35.) 1) Lojban has mechanisms for setting quantifier scopes, involving explicit quantifiers appearing in a prenex. 2) Loglan/Lojban has never claimed to be free of semantic ambiguity. Your original objection 3 [see 22. above] refers to "allegedly unambiguous syntax", but on investigation your objections are to semantic rather than syntactic ambiguity. Our claims are: a) Lojban is free of phonological, morphological, and syntactic ambiguity, and b) Lojban semantic ambiguity is present only in clearly marked places within the language: a Lojbanist knows when he/she is using an ambiguous form, and can replace it as needed with unambiguous ones.

37. lojbab: (responding to 35.) I disagree [with dan]. For one thing, if Lojban can express the multiple meanings better and more clearly than English, and if the expressions can be more easily manipulated logically, this would presumably 'enhance logical thinking' if SWH is true. Lojban doesn't 'tolerate some version of the original' in the sense that the parallel translation to "Every man loves a fish" - "ro nanmu cu prami pa finpe" is not equivalent to both English paraphrases.

38. dan: (responding to 37.) So what's the gloss of the Lojban sentence? Which reading does it correspond to? Is there a quick and easy way to disambiguate?

39. cowan: (responding to 38.) The Lojban rule is that quantifiers are applied in the order in which they appear in the sentence, so "ro nanmu cu prami pa finpe", literally "all man love one fish" means "For all men X, there exists one fish Y, such that X loves Y." The other interpretation could be given by "converting" the predicate with the particle "se". This operation reverses the order of the arguments to a predicate. "pa finpe se prami ro nanmu", literally "one fish be-loved-by all man" means "There exists one fish Y, for all men X, such that X loves Y." Note that conversion is analogous to the passive voice but has no semantic significance other than this inversion of quantifiers. Lojban also has machinery for expressing the quantifiers externally in a prenex, terminated by the word "zo'u". So another set of Lojban paraphrases for your sentences above is "ro da poi nanmu pa de poi finpe zo'u da prami de", literally "all X which is-a-man, one Y which is-a-fish, X loves Y"; and "pa de poi finpe ro da poi nanmu zo'u da prami de", literally "one Y which is-a-fish, all X which is-a-man, X loves Y". Presumably, a transformational grammar of Lojban would derive both of these surface structures (with and without prenex) from the same underlying deep structures. What Lojban does not have is any sentence which means both of your two forms ambiguously.
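
[Editorial sketch] The difference between the two quantifier orders in 39. can be checked against a small made-up "world". Everything here is hypothetical illustration: the names (adam, bob, cod, eel), the loves relation, and the function names are invented, and "pa" is simplified to "there exists at least one" for the purposes of the example.

```python
def every_man_some_fish(men, fish, loves):
    # Order of "ro nanmu cu prami pa finpe":
    # for all men X, there exists a fish Y, such that X loves Y.
    return all(any((m, f) in loves for f in fish) for m in men)


def some_fish_every_man(men, fish, loves):
    # Order of "pa finpe se prami ro nanmu":
    # there exists a fish Y, for all men X, such that X loves Y.
    return any(all((m, f) in loves for m in men) for f in fish)


men = {"adam", "bob"}
fish = {"cod", "eel"}

# World A: each man loves a different fish.
world_a = {("adam", "cod"), ("bob", "eel")}
# World B: both men love the same fish.
world_b = {("adam", "cod"), ("bob", "cod")}

for name, loves in (("A", world_a), ("B", world_b)):
    print(name,
          every_man_some_fish(men, fish, loves),
          some_fish_every_man(men, fish, loves))
```

In world A only the first reading holds, while in world B both do, so the two orders really are distinct claims; the point made in 39. is that each Lojban sentence selects one of them.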

40. lojbab: (continuation of 37, in response to 35.) You cannot 'do the same thing in English'. Even if the two English paraphrases are considered 'standard English' (and many linguists do not, identifying them as a jargon), neither is the same as Dan's original. Fill in 'man' for 'x' and 'fish' for 'y', and the result is ungrammatical: *"For all man, there exists a fish such that man loves fish." *"There exists a fish for all man such that man loves fish." It takes some extensive manipulations to turn these into grammatical sentences, and the results are not 'obviously' the same as the English original. These same manipulations do not suffice for all possible substitutions: if 'x' is 'George' and 'y' is 'fish', or if 'x' is 'George' and 'y' is 'Mary', you have to perform different transforms. In Lojban, the transforms are independent of the value.

41. aronsson: (responding to 34.) I fail to see the difference. When designing an artificial language one could outlaw all use of modifiers without modifier indicators (prepositions or similar). Thus it would have been possible for the Lojban designers to make "cat food" illegal, only allowing "food for cats" or "food made-of cats". If they did not do this, they obviously failed to design an ambiguity-free language.

42. cowan: (responding to 41.) We didn't want to make the language semantically unambiguous. 1) The language is phonologically, morphologically, and syntactically unambiguous; and 2) the language is semantically ambiguous only in specified areas, of which this is one (making open compounds by concatenation).

43. dan: (continuation of 1., from 22.) Natural languages are not unambiguous. From the acquisition side, ambiguous languages are much easier to learn for a child than a logical language would be. The principles of Universal Grammar [UG] do not seem to produce unambiguous languages, and all natural languages are constructed according to the principles of UG.

44. cowan: (responding to 43.) A lot of unproven assumptions here. Common assumptions, yes, but still unproven. We simply don't know whether a child could become competent in Lojban. Maybe when the language is complete and documented, somebody will be inspired to start raising bilingual children. There are native speakers of Esperanto, after all, whose parents have no other language in common.

45. kimba: (responding to 43.) If you're going to get stuck into people for assuming Sapir/Whorf, I think you had better not be so blase about assuming the existence of "the principles of UG". The way you throw it in "jargonwise" I assume you mean the Chomskian notion, which will meet with plenty of disagreement. I suppose you could claim to mean any statements about properties which all/no languages have, but then the 2nd clause is vacuous.

46. dan: (responding to 45.) I do tacitly assume UG. To me, it seems a whole lot easier to swallow than SW, or other theories of linguistic relativism.

47. dtate: (responding to 46.) What a strange comment. As far as I can tell, UG (as a hypothesis about language) and SW (as a hypothesis about language and thought) are independent. Buying into UG wouldn't make me more or less apt to buy into S/W, nor vice versa. They're certainly not competing theories. They address totally different topics. I think the giveaway here is the phrase "linguistic relativism". I can't tell from context exactly what Dan means by this. It looks like the link is something like "S/W says that how you think is influenced by what language you think in; UG says there's an underlying deep structure common to all languages; conflict". But of course there is no conflict; every language has its own grammatical and etymological idiosyncrasies, whether deep structure exists or not, and these idiosyncrasies are the fuel for S/W. The existence of deep structure cannot refute the fact that languages differ in significant ways, any more than a proof of S/W would disprove the existence of deep structure common to all languages.

48. lojbab: (responding to 43.) Whether UG is 'real', a question better discussed by others, I know of no useful evidence for the claim [that UG forbids unambiguous languages]. That there is no unambiguous language today is irrelevant, since nearly all languages evolved from some earlier language, interacting with other languages, etc. Most sources of ambiguity probably can be tied to these evolutionary processes. Lojban might also succumb to such ambiguity, but as an a priori language constructed after the printing press, having (unlike other languages) a complete prescription it has a lot better likelihood of resistance to 'undesirable' change. There is no way to tell if the misuse of 'hopefully' or split infinitives would have entered English if a) there had not already been a tolerance in English for non-standard usages of this type and b) either of these truly resulted in mis-communication. Note that 'misplaced modifiers', which can in some instances cause miscommunication, are a different question, and are probably frowned on by most speakers IF they become aware of the ambiguity. In Lojban, of course, the speaker WILL be more aware of the ambiguity - at least so we hope.

49. dan: (continuation of 1., from 43.) In the unlikely event that a native Lojban speaker ever exists, it will probably actually be speaking its parent native language with some version of Lojban vocabulary.

50. cowan: (responding to 49.) I presume you mean "parents' native language". As I mentioned above, its parents might not have the same native language.

51. dan: (continuation of 1., from 49.) But even that is unlikely since even the phonology (like everything else in the language) is arbitrary, and it is questionable how easy it would be for a child to learn.

52. rjohnson: (responding to 51.) Isn't the phonology of any language arbitrary in this sense? No language avails itself of all the possibilities.

53. dan: (responding to 52.) Yes, but certain combinations are unlikely to occur.

54. cowan: (responding to 53.) I don't understand this claim. The phonology is the least arbitrary thing about the language. Lojban has six vowels and 18 consonants, all of which are exceedingly familiar and found in many languages world-wide: German, for example, has all of them (although Lojban 'j' is rare in German and found mostly in borrowings from French). On the suprasegmental level, Lojban has two levels of stress (primary and weak) and significant pauses; where "pause" may represent either a complete silence or a glottal stop. Tone is not significant, as mentioned above.

55. dan: (responding to 54.) See what I mean about arbitrary? The Lojban engineers have decided that tone isn't important and that pauses are the same as glottal stops. This is lunacy!

56. rjohnson: (responding to 54. and 55, also 1.-8.) By the way, both of you [cowan and dan] are abusing the term "tone". You're talking about pitch. Tone, by definition, involves significant pitch contrasts. You can't have tone be unimportant in a language. If morphemes are systematically contrastive in pitch, the language has tone; if not, there is no tone.

57. dan: (responding to 56.) Guilty as charged. Sorry about that.

58. cowan: (responding to 56.) Thanks for this correction.

59. cowan: (responding to 55.) Of course it's arbitrary in the sense that we select some features of the total human phonological repertoire and not others, but so does every natural language. The phonemes we use are found in many natural languages, and there exists at least one natural language (viz. German) that contains all of them. The consonant clusters and diphthongs we use are also all to be found in natural languages. We go to some pains to prevent difficult clusters like *td or *fz; we also limit which consonant clusters can be used initially to a subset. Pauses and glottal stops are the "same" in Lojban in the sense that they are allophones. In German, the phones [r] and [R] are the "same" in exactly the same sense: they are allophones of /r/ in free variation.

60. lojbab: (responding to 55.) Tone is reflected poorly or not at all in the writing systems of the world, as are pitch and speech rhythm. Audio-visual isomorphism therefore precluded these being critical to disambiguation and we chose better ways to convey the equivalent meanings. In each case where we did so, a similar mechanism is found in some natural languages. For example, in French "est-ce que" almost exactly parallels Lojban 'xu'.

61. dan: (responding to 60.) Which is one of the many reasons that linguists concentrate on spoken language.

62. lojbab: (continuation of 60.) Pause in Lojban is used only to preserve morphological distinctions. For example, you must pause before a [word-initial] vowel to protect against it being absorbed into the previous word either as a final vowel in a consonant-final word or as a diphthong. A glottal stop provides similar separation of sounds; hence it is phonemically equivalent to a pause. In neither case was the decision arbitrary; we had a good reason for each. This is in general true throughout Lojban - a decision to choose one form over many was primarily to achieve unambiguity. In other circumstances, we chose the least restrictive form possible (thus making tense, number, gender, etc. optional and hence more highly marked forms).

63. dan: (continuation of 1., from 51.) In typically blundering fashion, the Lojban engineers have ignored this issue, concentrating entirely on the learnability issue for SECOND language acquisition, that is, adults learning a second language, with no native competence.

64. cowan: (responding to 63.) (You raise an interesting side issue here. Do you argue a priori that persons learning a language as adults cannot achieve competence which is empirically indistinguishable from that of native speakers?)

65. dan: (responding to 64.) I guess I do. A Native French speaker might learn English well enough to be indistinguishable from a native English speaker, but he or she will not have native competence. In other words, you cannot ask that speaker a question regarding something like say, contraction and get a truthful answer.

66. daj: (responding to 65.) Even worse, you would never be able to use this speaker as a guinea pig in a SWH test, since he would be a native speaker of two languages, so his perception of the world would be conditioned by both. This would be true for any bilingual speaker, it seems to me. So you'll never be able to test the SWH until you have a "pure strain" of Lojban speakers.

67. cowan: (responding to 66.) Some Lojbanists agree, and say we will need to wait for a second generation. Another viewpoint is that by having people who speak Lojban+English, Lojban+French, Lojban+Vietnamese, Lojban+Navajo, etc. etc. we will be able to factor out the Lojban contribution when compared with people bilingual in two natural languages. ("Bilingual" here means "bilingual within the acquisition period".)

68. dan: (continuation of 65.) E.g. In English, one can contract words like "he" and "is", but only in particular circumstances. Hence:
He's a nice boy
Isn't he a nice boy? / *Yes, he's
The starred sentence is ungrammatical; the contraction is not acceptable in that position. It is acceptable in the first sentence. A native French speaker who knows English might be able to guess on that, but he or she certainly would NOT have a reliable intuition on the matter.

69. rjohnson: (responding to 68.) I have to agree with Dan here, sort of. I don't think the distinction to be made is between L1 and L2 competence, though, but between critical-period learning and post-critical-period (or "adult") learning. I think it's pretty clear that they're two different processes (though of course they may share some features). An adult learner may indeed learn a language well-enough to pass an operationalist sort of test (i.e., be indistinguishable from a native speaker), but shouldn't be taken as a reliable judge of grammaticalness.

70. cowan: (responding to 63, continuation of 64.) We know that the phonology is learnable by children, because it is a subset of phonologies which children can and do learn. We have every reason to believe that the vocabulary is learnable: the words are similar in morphology to those existing in natural languages, and the consonant clusters and diphthongs are all to be found in natural languages.

71. dan: (responding to 70.) Yes, but if there is a theory of phonological universals, then it is argued that certain combinations simply won't ever occur. Did the Lojban engineers take this into account, except at the most rudimentary level? I doubt it.

72. cowan: (responding to 71.) What do you call "rudimentary"? [Brief summary of Lojban phonology omitted.] The rules are arbitrary, yes, but I should like to be shown wherein they are unlearnable. Furthermore, they need to be known only to people inventing new words: several of them are relaxed for borrowings and names.

73. lojbab: (responding to 71.) An interesting conditional, that first sentence. Is Dan claiming that there is a theory or not? Is he claiming that certain combinations won't occur? He seems to be claiming that Lojban has combinations that cannot occur but gives no examples. He'll have trouble finding them. We did indeed take phonological universals into account in several ways. In the first place, as John Cowan mentions, the set of permitted sounds was selected as a subset of those found in many languages. We constrained consonant clusters by restrictive rules that recognize phonological properties like voiced/voiceless assimilation and included redundancy as a criterion in assigning words, reducing the number of minimal-pair distinctions. We added the apostrophe to prevent unwanted diphthongization; it represents devoicing of the glide between two adjacent vowels. In addition, the frequency of sounds in predicate words should statistically parallel the sum of the corresponding frequencies in our six source languages. (For those unfamiliar, most of Lojban's predicate root words are formed by maximizing the appearance of phoneme patterns found in those source languages weighted by approximate number of speakers.) I would say that more time has been spent overall during Loglan/Lojban's history on the interaction between phonology and morphology than on any other single feature of the language. This is probably because it is the best documented feature of the design and also the most easily compared to other languages.

74. cowan: (responding to 63, continuation of 70.) What we don't know is whether the grammar is learnable by a child. We won't know that until the experiment is tried, first by raising a bilingual or trilingual child, and then eventually as part of a community of monolingual speakers.

75. lojbab: (responding to 63.) We've hardly ignored the question [of learnability by children]. However, from what I've read, children learn languages from adult role models. We therefore need fluent adult speakers in order to teach children. Within the next two decades at least, all such adults will be 2nd language speakers. So why not concentrate now on what we can do something about?

76. dan: (responding to 75.) My point from my first posting on has been that I can't imagine any child being able to acquire something as baroque as Lojban in its current form. My understanding of acquisition is that non-ambiguity is sacrificed in favor of learnability.

77. cowan: (responding to 76.) Maybe so. After all, the English my daughter spoke at the age of two was hardly "acceptable" as a full adult English, although now (at three) her English is clearly acceptable (she seems to be a bit in advance of her age-mates in this respect). There is no reason to think that a Lojban-speaking child would be different. In one respect, some of the simpler Lojban constructions like observatives (bare predicators without arguments) are more analogous to young-child linguistic forms. The English utterance "Dog!" is a bit deviant, in that English-speakers would think it rather odd for an adult to say simply "Dog!" on seeing a dog, but for a child this utterance would be quite acceptable. The exact Lojban translation "gerku", on the other hand, is fully grammatical and not at all deviant.

78. lojbab: (responding to 76.) Baroque? Compared to natural languages, Lojban is incredibly simple, and children acquire natural languages (else they would not be 'natural'). Now whether Lojban will be seen as simple to a child is a valid question, but there is no reason to believe otherwise, and we'll know soon enough. How can non-ambiguity be sacrificed in favor of learnability in natural languages acquisition? They aren't unambiguous in the first place. To whatever extent there IS unambiguity, the sheer complexity and irregularity of most of the language would overwhelm this. Lojban, being so much simpler to express unambiguously, MIGHT be able to be acquired unambiguously or at least relatively so (with the child growing into more accurate usage with age and understanding just as children of the natural languages do).

79. dan: (responding to 78.) I was suggesting that ambiguous languages are easier to learn than unambiguous ones. There aren't any unambiguous natural languages that I know of, so it's difficult to test this. An unambiguous language would require enough additional baggage that it would make learning it unwieldy. An ambiguous language has fewer rules. And just for the record, let's get things straight with regard to our definition of "rules". By rules, I mean rules that are used to characterize the language, not rules in the prescriptive sense. The average child learns his or her language (barring language disorders or highly unusual circumstances) quite rapidly, ambiguity and all. As to whether Lojban is baroque or not, the question is this: If there were hypothetical native speakers of Lojban, how complicated would an abstract characterization of their competence be? If such an abstract characterization were more complicated than a similar characterization of, say, Klamath, then I would stand by my assertion. Of course, one might beg the question and ask whether such abstractions are meaningful at all (as the Schankians do), but that's a whole other ball o' wax (quite interesting too).

80. lee: (responding to 76.) The discussion of irregularity might profit from distinguishing types of irregularity: (1) semantic irregularity - no one-to-one correspondence between form and meaning, as for example when phonological changes produce variations in the form of a stem; (2) morphological irregularity - no uniform way of deriving related words, as in the examples of archaic paradigms; (3) distributional irregularity - certain combinations of forms (or features) are not permitted, for instance when obligatory phonological changes eliminate some phone(me) combinations; (4) form class irregularity - it is not possible to distinguish forms or their categories directly from their pronunciation, as when a phonological change is extended from word-internal to cross word boundaries, making it more difficult to tell where words begin and end. Then it's interesting to catalog the various ways that changes which remedy one sort of irregularity may create others.

81. lojbab: (responding to 80.) Each of these has a corresponding 'ambiguity', as well, in which various degrees of inconsistency and inconstancy exist in the rules for building and interpreting forms of each of these types. Lojban has defined regularity and unambiguity in the last three. We can expect to directly observe the causes and effects that result in changes in these areas.

82. lojbab: (continuation of 75., responding to 63.) There are several Lojbanists that have indicated intent to try to raise their children as bilingual Lojban/natural-language speakers, probably the best that can and should be attempted until/unless Lojban proves its value. I certainly wouldn't ask anyone to raise children solely Lojban-speaking; it would smack of human experimentation to me (an issue I'm fairly sensitive on).

83. dan: Some Lojban propaganda claims that the language has been characterized by a transformational grammar, but this has never actually been demonstrated, and seems quite unlikely, since I would imagine that a native speaker would be required to characterize a Lojban-user's competence. Since there probably will never BE a native Lojban speaker, how can you possibly ask one whether XXXX is an allowable sentence or word of his or her language? Current Lojban speakers are of no use, because they do not have such intuitions about the language any more than a fluent second-language speaker of French (a French speaker whose native language is say Hindi) would have such intuitions about French.

84. cowan: (responding to 83.) This illustrates a confusion between natural and constructed languages. In a natural language, the source of competence is the native speaker's intuition. In a constructed language, during the construction phase (which Lojban is still in, though rapidly coming to the end of it), competence is defined by the constructor. A grammatical Lojban sentence is what we say it is, where "what we say" is defined by the baselined vocabulary lists and machine grammar. The reference for syntactic correctness is a parsing program, and when a Lojbanist utters something the program can't parse, we say that he has made an "error".

85. dan: (responding to 84.) Once again, completely arbitrary. In English, or any other natural language, grammaticalness is also defined by what we can say and understand. "I ain't got none" is perfectly grammatical, because people use and understand it all the time. Only English teachers and guys like John Simon sit around and contemplate (by their own arbitrary standards) whether or not it's okay to split infinitives and use "hopefully" right. The rest of us just do it.

86. cowan: (responding to 85.) Correct, and therefore for a natural language like English, the only way to determine the grammar is by {in,intro}spection. But this has nothing to do with the grammar being in transformational form, i.e. a set of PS rules generating a deep structure with a set of T rules generating the surface structure from them. Such a grammar has not been fully worked out for Lojban, but is clearly not impossible in principle. It also happens to be the case that PS rules are sufficient to generate the whole of the language's surface structure all by themselves (probably not true of English), although the PS-only version of the grammar which we have now baselined does not explain semantic equivalences of different structures.

87. cowan: (continuation of 84.) But this will not always be so. When the language is fully defined and baselined, it will be "launched" and the normal processes of linguistic change will be allowed to operate. We expect that some grammatical forms, vocabulary items, etc. will be "pruned" because nobody uses them. They will remain in the formal language definition, available to all speakers in the same sort of way that archaic grammar or vocabulary forms are available to speakers of natural languages: viz. if they take the trouble to look them up. At that time it will be appropriate to consult human speakers (and AI programs, if any) to investigate correct linguistic behavior a posteriori.

88. dan: (responding to 87.) Org! What a mess! "Correct" linguistic behavior? Lojban will be a linguistic battlefield with prescriptivists running around telling people that they can't say such-and-such a sentence, because it can't be parsed by Lojban's computationally sound grammar (verified by a genuine computer!).

89. cowan: (responding to 88.) Don't be silly. Of course Lojbanists can do that if they want to, just as speakers of English and other languages can if they want to. Again, you are ignoring the difference between a language that is born a priori and one that isn't. After the language is delivered from the womb, anything can and quite probably will happen in the way of changes, which will not be dictated from above.

90. lojbab: (responding to 85.) Not true for English, really, nor for all natural languages. English is of course not even a single language in the sense that there are many dialects spoken around the world [not all 100% mutually understandable]. Many of these do not use constructs found in the 'standard language', even though they are obviously understood by their listeners. But how could we say this if we didn't have a concept of what the 'standard language' is, which is distinct from what we say and understand? (Of course, the definition of standard language varies from country to country, too. British speakers would even less accept some of Dan's Americanisms, and in some cases might misunderstand them. (Actually, there is some variation among 'standard Englishes', as well, as evidenced by differences in the various published style manuals.)) In addition, each language has registers, in some of which certain constructs may be permitted, but which in others are unacceptable. Try using "I ain't got none." in a journal paper. In other languages, such as Japanese, registers are so structured and formalized as to almost make for independent languages. Understanding is not a sufficient criterion for grammaticalness.

91. dan: (responding to 90.) This is where I disagree most strongly. To my mind, grammaticalness is determined solely by whether a member of a speech community finds a given utterance acceptable. Members of my speech community will, if they put their biases aside, admit that "I ain't got none" is a perfectly acceptable sentence.

92. cowan: (responding to 91.) Northrop Frye tells a story about going to a hardware store and asking for something or other, and being told "We haven't got any". The speaker then glanced at Frye and added, "We haven't got none." This remark, says Frye, has what literary critics call texture: it means 1) we haven't got any, and 2) you look to me like a schoolteacher, and nobody's going to catch me talking like one of those. The "bias" in question is part of an English-speaker's competence, which is not limited to separating the intelligible from the unintelligible, but also can separate what kinds of grammatical constructions may be used by what speakers in what situations. *"Lazy the jumps fox quick dog brown over the" is ungrammatical in all situations. *"Me see she" is probably also ungrammatical in all situations, although perfectly intelligible. *"Mama like pretty spoon" is good toddler-English but unacceptable adult-English. *"I ain't got none" is ungrammatical in some dialects (mine, for example) and entirely grammatical in others. *"For all x, for some y, such that x is a man, such that y is a fish, x loves y" is grammatical to me, but many native speakers would reject it as almost as unintelligible as my first example. I have asterisked all of these examples as ungrammatical for some speakers in some situations.

93. lojbab: (continuation of 90.) And of course, for many nations there are academies that dictate the standard language for that nation (I use nations instead of languages since, for example, Brazil has an academy separate from that of Portugal, although both work together at times.) English has no academy, but this is an exception. Therefore we end up with individuals setting themselves up as a self-appointed 'academy'.

94. dan: (responding to 93.) Thank God we don't have such academies. Take a look at how much attention is paid to such academies too. French speakers are constantly being advised to avoid English borrowings like "Pique-Nique" and "Le Weekend" or "Faire du ski", but they use them constantly and of course they should be allowed to if they want to.

95. cowan: (responding to 94.) Discussions of "allowing people to do things" are political, not linguistic. Linguistics as such is silent on the subject of what people "should" do, permit, or forbid. "Does a rock roll down hill because it wants to or because it has to?" An animist would plump for the former reply; most educated Westerners, probably the latter. But a pure operational scientist would reply "Neither. Rocks simply do roll down hill, that's all."

96. lojbab: (continuation of 90.) This does not make 'academies', or language prescription 'wrong'. Dan's libertarian view of language is understandable given his American and English language cultural values. In addition, there is a difference between the prescriptive/descriptive debate from the point of view of linguists as opposed to that of regular speakers. Most people, for example, expect a dictionary to be prescriptive, even though the linguists who write them disagree.

97. dan: (responding to 96.) I prefer "anarchistic" to "libertarian" for personal reasons  :-)

98. lojbab: (continuation of 90.) Lojban has a valid reason (unambiguity) to prescribe its standard form. If Dan chooses to learn Lojban, and then chooses to deviate from those standard forms, he may be expanding the language. Of course, he also may have trouble getting his computer to understand him. Since ideally Lojban's target 'speaker' population may include computers, failure to express himself so that the computer understands him (unambiguously) means Dan is speaking ungrammatically even by his own definition.

99. dan: (responding to 98.) Whaaaat? The goal of Natural Language Understanding should be for the system to understand human languages, not for human speakers to alter their speech so that a computer can understand it. Since we've already established that Lojban isn't unambiguous, any Lojban NLP system is already going to be having a hissy fit over plastic cats.

100. cowan: (responding to 99.) Of course. But such a Lojban NLP can 1) recognize unambiguously that it has detected an ambiguity, 2) ask for help, and 3) get an unambiguous response. If a Lojban computer sees "slasi mlatu" in its input, it can ask "lu slasi mlatu li'u ta'unai pei", literally "quote plastic cat unquote expand-the-metaphor how?" and expect a response such as "lo mlatu poi ke'a cidja lo slasi", literally "a cat such-that it eats plastic", or else "lo mlatu poi zo'e zbasu ke'a lo slasi", literally "a cat such-that something makes it from plastic". And other responses are of course also possible.
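The detect-and-ask loop cowan describes can be sketched roughly as follows. This is only an illustration under stated assumptions: the two-word `BRIVLA` lexicon and the function name `clarification_request` are made up here, and a real system would work from the machine grammar rather than from a word list.

```python
# Hypothetical mini-lexicon of content words (brivla); a real system
# would use the baselined vocabulary lists instead.
BRIVLA = {"slasi", "mlatu"}


def clarification_request(words):
    """Return a Lojban question about the first two-brivla compound, else None."""
    for first, second in zip(words, words[1:]):
        if first in BRIVLA and second in BRIVLA:
            # Quote the compound with lu ... li'u and ask for an expansion,
            # using the wording given in the post above.
            return f"lu {first} {second} li'u ta'unai pei"
    return None


print(clarification_request("slasi mlatu".split()))
```

The point of the sketch is that the ambiguity is detectable from form alone: two adjacent content words are enough to trigger the question, without any guess about which reading was meant.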

101. dan: (continuation of 99.) Besides, many prescriptivists have used the same arguments against various "slang" forms. The argument against "double negatives" is that they are "illogical". The fact that no one seems to have a bit of trouble understanding them doesn't matter I suppose.

102. lojbab: (continuation of 90.) Some other 'natural languages' are indeed defined exactly as Lojban is, by an a priori 'committee' that selected the valid forms. Norse, Modern Hebrew, and several African languages were defined by some nationalists taking features from other languages used by the target population (and in the case of Hebrew, from incomplete knowledge of a dead language), and arbitrary features sometimes where the several languages collided. These all became living natural languages. Why can't Lojban, which is merely doing the same on a grander scale?

103. dan: (responding to 102.) I would imagine that all of them underwent creolization, which seems to be nature's way of smoothing things out, linguistically. If Lojban develops a native speech community, then it will undoubtedly do the same, probably in all of the worst sorts of ways (the moral equivalent of "I ain't got none" in Lojban) and Lojban will be yet another zany, irregular, ambiguous, beautiful language. In other words, what's the point?

104. cowan: (responding to 103.) Well, perhaps you are right. Then we'll have learned something. And perhaps you are wrong. And then we'll have learned something else. That's what makes this experimental linguistics.

105. cowan: (continuation of 87.) There will also be growth in the language: technical terms in all fields will be borrowed and Lojbanized as needed; new compounds will be freely created, and it is even possible that new grammatical constructions will be built by usage, although we have really tried to be quite comprehensive in this domain. I don't understand what the stuff about transformational grammar vs. any other kind has to do with this issue. A transformational grammar is simply a certain kind of formal description. Doubtless many natural languages exist of which no transformational grammar has ever been given: do TG [transformational grammar - a linguistics theory] advocates doubt that such grammars are possible a priori?

106. dan: (responding to 105.) TG is a formal description that requires native speakers to confirm. Even you have admitted that there are no native speakers of the language. How can there be a transformational account of a language without native speakers? Yet Bob LeChevalier told me point blank that such a transformational account did exist.

107. cowan: (responding to 106.) I believe what Bob meant to convey was that an investigation had been made to see whether the semantic equivalence of certain Lojban constructions could be represented by T rules which would transform certain syntax trees into other trees in a meaning-preserving way. Indeed, this can be done, although it has not been done for every detail of the language. Again, I see no difference between TG formal descriptions and others in this respect. Every formal description of a natural language requires speakers of that language to confirm or disconfirm it, but a constructed language is launched with an a priori formal description from which (or from simplified/clarified forms of which) new speakers learn. Think of Lojban as being spoken by people who live so far away that we can't ever go there to talk with them, but they have sent us some of their Lojban as a Second Language materials used for instructing their neighbors in their language. Magically, these materials have been translated into English. Some of us now learn this language and begin to speak it. Our children hear us speaking it and either learn it natively (i.e. as other languages are learned) or else they don't. Either way, a datum for experimental linguistics. A board of psychologists then administers some tests to us and our children to see if either population thinks differently (in some sense) from a matched control group. Another datum for experimental linguistics. Many generations pass and the language undoubtedly changes. All this history is forgotten. A Linguist (capital L) comes on the scene and decides to study this language called Lojban; perhaps he is himself a native speaker. He records, using whatever linguistic theory is current at that time, a model of the grammar (a posteriori) of the language as it is spoken then. An archaeologist digs up a copy of the original Lojban textbook, machine grammar, etc., and historical linguistics goes to work reconstructing the way the language has changed. Why not?

108. rjohnson: (responding to 106.) Dan, you're conflating the formal (mathematical) and the psychological issues here. A transformational grammar is simply a class of formal device for characterizing (generating) sentences. It has nothing to do with competence. You could (and do) have transformational grammars for characterizing computer languages, strings of arbitrary symbols, etc. "Transformational" belongs in the same paradigm as "phrase structure", "finite state", "indexed" and so on; these are classes of grammars, not empirical theories.

109. dan: (responding to 108.) I suppose you're right again, although perhaps my studies in Montague Grammar have made me lose sight of psychological vs. mathematical distinctions :-) Seriously though, one does rely on grammaticalness judgements when trying to determine if a certain movement is viable, for example in the case of "wanna" contraction: (1a) Which movie(t) do you want to see (t)? (1b) Which movie do you wanna see? (2a) Which team(t) do you want (t) to win? (2b) *Which team do you wanna win? The presence of the trace in (2) between "want" and "to" blocks "wanna" contraction.
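The blocking effect in dan's "wanna" examples can be mimicked with a very small sketch. It assumes, purely for illustration, that traces are written out literally as "(t)" in the input string; the function name `contract_wanna` is hypothetical, and this is a toy rewrite rule, not a claim about how any linguistic theory implements contraction.

```python
import re


def contract_wanna(sentence):
    """Contract 'want to' to 'wanna' only when the two words are adjacent."""
    # A trace written as "(t)" between "want" and "to" breaks adjacency,
    # so the pattern fails to match and the sentence is left unchanged.
    return re.sub(r"\bwant to\b", "wanna", sentence)


print(contract_wanna("Which movie do you want to see (t)?"))
print(contract_wanna("Which team do you want (t) to win?"))
```

In the first sentence the trace sits after "see", so contraction applies; in the second it intervenes between "want" and "to", so nothing is rewritten.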

110. rjohnson: (continuation of 108.) The (now moribund) theory of Transformational Grammar, on the other hand, is a set of claims about linguistic competence, largely abandoned by generativists in favor of GB [this, as well as other jargon terms in this paragraph, is a linguistic theory of grammar] and other systems. Among these claims is the idea that the basic data are the grammaticalness judgements of native speakers. But this has nothing to do with the formal notion of transformations, and can be applied in LFG, GPSG, dependency, or just about any other formal framework as well. The original poster [cowan], quite properly, kept the two levels separate.

111. dan: (responding to 110.) Well you're probably right again. I'm not a professional linguist yet - only a Cognitive Science type.

112. rjohnson: (continuation of 110, also responding to 46.) Of course you [assume UG]. You're an MIT student. For most of the rest of the world, however, the jury is still out, and it's a mistake to assume what you're trying to prove.

113. dan: (responding to 112.) I'm not actually, I just post from here :-( I don't want to misrepresent myself as an MIT linguist. I studied cognitive science as an undergrad at Hampshire College, with a strong bias towards linguistics. As you can see, I play fast and loose with some of the terminology. As for assuming what we're trying to prove, isn't that the crux of this argument? Most Chomskian linguists assume UG, and most Lojbanists assume Sapir/Whorf. In the words of The Brady Bunch "I guess we've all learned a valuable lesson".

114. kimba: (responding to 113.) The point was supposed to be, if you are slamming someone else's assumptions, the least you can do is write your own in black ink in a clear and legible hand, rather than saying (effectively) "this is inconsistent with UG and therefore wrong". As I ought, if I were actually saying anything:-) I find neither [UG nor SWH] particularly convincing or illuminating.

115. lojbab: (responding to 106.) The claim I made is that John Parks-Clifford, a linguist involved with Loglan since 1975, told me that he investigated 1970's Loglan using TG techniques during the 70's and was able to demonstrate to his own satisfaction that all features of Loglan were amenable to TG analysis, and that he found no 'unusual' transforms. More recently, a student in Cleveland has been attempting to develop a more formal TG description of the language. This will undoubtedly take a while, but he reported to me earlier this year that not only had he found nothing unusual, he had identified some elegant features of the language using TG techniques. The features he reported are indeed consistent with the language definition, and included aspects that the student had not been taught (i.e. that we had not put into any published documents that the student had received).

116. dan (conclusion of 1., from 63.): Ultimately, the enterprise of Lojban is at best an intellectual puzzle, and perhaps on this level, it is interesting. To learn a "language" (perhaps "code" would be better) like Lojban, based on principles of logic can be seen as the equivalent of a Pig-Latin for intellectuals and engineers.

Subject: Lojban: is it naive?

Participants: cowan@marob.masa.com (John Cowan) daj@beach.cis.ufl.edu (David A. Johns)

1. [The following exchange between cowan and daj began with a one-liner from daj that Lojban was "naive". cowan wrote back privately to ask "Why do you say that?"]

2. daj: Well, the three things that jump out at me right away are: (1) You can't design a culture-free language. Simply the choice of categories to represent in the language (tense, aspect, definite-indefinite, etc.) is culture-bound. In addition, there's a lot of talk in that description about using metaphor to extend the bare bones of the language. Can there be anything more culture-bound than metaphor (not the mechanism, but the choices of images)?

3. cowan: (responding to 2.) Absolutely correct. Lojban is not a culture-free language; every language creates its own culture if the SWH is correct, and we assume it correct (its falsity is the null hypothesis) for purposes of the Lojban experiment. Assuming SWH, then lei lojbo 'the mass of those pertaining to Lojban' will create their own culture, with its own metaphors and characteristic idioms.

4. daj: (responding to 3.) Then what's the point of the language? All you would end up with is a bunch of creolized Lojban daughter languages, wouldn't you?

5. cowan: (responding to 4.) We hope not. Of course in the very long term that can happen to any language: Latin split into lots of daughters, some of which are more or less heavily influenced by other languages (Rumanian being the prime example). The idea is that Lojban ways of thought (assuming there are such things) will influence the creation of Lojbanic culture.

6. cowan: (continuation of 3.) Lojban deals with the category problem (which we refer to as the "metaphysical assumptions" problem) by minimizing required categories. Tense, aspect, and definiteness are optional categories of discourse in the language, but can be represented when needed. We can also represent things like the observational status of assertions, the emotional attitude which goes with them (there is an entire set of paralinguistic grunts for expressing emotions), and so on.

7. daj: (responding to 6.) Since every known language (as far as I know) has a set of required categories, they must fulfill some function. Again, real speakers would make the categories compulsory and create something different from the original design.

8. cowan: (responding to 7.) Maybe, maybe not. Since the non-required categories are expressed by marked forms (using the particles), sentences that don't express categories are always possible. Again, they might come to seem archaic or childish, but that's a second-order effect. When a 2-year-old says "Dog!" we usually consider that a bit deviant, but the Lojban literal translation "gerku" is fully grammatical Lojban - a predicate with all arguments elliptically omitted.

9. daj: (continuation of 7.) Another point. A few weeks ago you posted a list of Lojban pronouns. It struck me then that this paradigm was probably too rich for human language. This is just a gut feeling, but it seems to me that in real languages the number of elements in a contrastive set is pretty severely limited.

10. cowan: (responding to 9.) Depends on what you mean by "contrastive". The 43 Lojban pronouns are indeed contrastive in the sense of being interchangeable in the grammar, but they aren't semantically interchangeable. They fall into several categories: personal, bound-variable, free-variable, question, relativized argument, reflexive, demonstrative, pro-utterance, pro-argument, and indefinite. Within each category there are only a few pronouns (or "anaphora" more technically - "ba'ivla" in Lojban). Grammatically, "do" and "dei" are interchangeable, but no one will confuse "you" (the listener) with "this utterance I am now uttering"!

11. daj: (continuation of 7., from 9.) I can see that it would be possible in some cases to have people speaking different dialects of the same language, where each dialect over-specified some categories from the point of view of other dialects. After all, we don't really have much trouble understanding Chinese speakers of English who simply eliminate the verb tense system and replace it with adverbs. But I don't think this would work with the pronouns, since a listener wouldn't know what any given pronoun meant without knowing the entire set.

12. cowan: (responding to 11.) Correct. On the other hand, it may be that lots of the ba'ivla don't come up much. For example "da'e" meaning "a far future utterance" probably won't be used very often, and someone who doesn't understand it or even recognize it may still be quite a fluent speaker. One can speak English fluently without knowing "thou", for example, although certainly it is a personal pronoun contrasting with "I" and "you" and the rest. The occasions for its use (in Modern English) just aren't that common.

13. daj: (continuation of 2.) (2) If you're going to design a language that people are actually going to speak, you're going to have to deal with whatever it is that leads human languages to be the way they are. One obvious universal of real language is a floating equilibrium between ambiguity and redundancy. If you want to design a language without ambiguity, you'll have to figure out what role ambiguity plays and compensate for the loss. There are many other characteristics like this, such as why semantically external predicates like negation and tense tend to become reduced and attached to internal pieces of a sentence, etc.

14. cowan: (responding to 13.) Lojban is not free of ambiguity, only of phonological and syntactic ambiguity.

15. daj: (responding to 2.) First phonological ambiguity. In your original posting you gave examples which seemed to indicate that Lojban words were polysyllabic, with syllable-initial stress. I assume that your claim that analysis of the input stream into words was unambiguous has to depend on that stress placement - in other words, a word begins where a stress occurs and includes all following unstressed syllables. But in natural languages, there are unstressed words - clitics - plus other uses of stress for phrase boundary identification, discourse function, etc. How are you going to prevent phonological ambiguity from creeping into Lojban?

16. cowan: (responding to 15.) I must have misled you. Lojban stress is as follows: stress on content words ("brivla") is penultimate. All root brivla are two-syllabled, so stress appears to be initial. Structure words ("cmavo") are one or two syllables and may be stressed freely. A structure word with final stress immediately followed by a brivla must have a separating pause (which can be a full pause or just a glottal stop). Thus in "le bridi", "bridi" has penultimate stress; if "le" is unstressed it can be proclitic [sounded together with the following word], whereas if it is stressed a pause is required to forbid the reading "lebri di". Names have free stress, which must be indicated by capitalization in writing when it is not penultimate. Names are always followed by pause, and must be preceded by either pause or one of the cmavo "la", "lai", "la'i", or "doi" (the first three are articles, the last a vocative marker). These same cmavo may not be embedded in names, so "*doil" for "Doyle" is not a valid Lojban name; it would have to be "do'il", roughly "Dough-heel". (The Lojban ' character represents IPA [h], or more accurately a voiceless vowel glide.)
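The name restriction at the end of cowan's post lends itself to a short check. The sketch below assumes the simplest possible reading of the rule, namely that none of the four cmavo may occur anywhere inside a name as a substring; the function names are hypothetical and the full morphology rules may be narrower than this.

```python
# The four cmavo that, per the rule above, may not be embedded in a name.
NAME_MARKERS = ("la", "lai", "la'i", "doi")


def embedded_markers(name):
    """Return the markers that occur as substrings of the name."""
    lowered = name.lower()
    return [marker for marker in NAME_MARKERS if marker in lowered]


def is_valid_name(name):
    """A name passes this check when it embeds none of the markers."""
    return not embedded_markers(name)


print(embedded_markers("doil"))   # the rejected spelling of "Doyle"
print(is_valid_name("do'il"))     # the spelling suggested instead
```

Because the apostrophe counts as a character of its own, "do'il" no longer contains the sequence "doi", which is why the respelling gets around the restriction.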

17. daj: (continuation of 15.) And then there's syntactic ambiguity. Math/logic notation has an extremely powerful device for preventing ambiguity - parentheses. With parentheses you can resolve "old men and women" into either "((old men) and (women))" or "(old (men and women))." It's hard to imagine anything like this in natural language that could operate at more than one or two levels of embedding. Even with all kinds of contrastive stress and artificial intonation breaks we can't read even slightly complicated math formulas so that they can be written down correctly.

18. cowan: (responding to 17.) Lojban has lots of kinds of parentheses: "ke" and "ke'e" for Boolean connective groupings, "vei" and "ve'o" for strictly numerical/mathematical parentheses, "to" and "toi" for discursive parentheses (like these). These can be stacked up as required. Of course, if things get too complicated people may not be able to understand what is said, but English has that problem as well. "The cheese that the mouse that the man that the woman married chased ate rotted" is grammatical, but not intelligible due to stack overflow in the listener. But the words do exist as a regular part of the language: if the worst comes to the worst, the listener could write down what is said verbatim, pass it through a machine parser, and figure out exactly what is bracketed with what. This ability could be quite useful for things like drafting regulations, which are notoriously ridden with unintentional ambiguity: having a parser looking over your shoulder as you write such a thing would help you in seeing ways in which your listener/reader could get confused, and clarifying them.
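The "old men and women" example from this exchange can be made concrete with nested groupings. The sketch below is only an illustration in Python tuples, not Lojban: it shows that the two readings share one word string and differ only in bracketing, which is the information explicit grouping words are meant to carry. The names `render`, `flatten`, `reading_a` and `reading_b` are made up here.

```python
def render(tree):
    """Render a nested tuple as a fully parenthesized string."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(render(part) for part in tree) + ")"


def flatten(tree):
    """Return the plain word sequence, discarding all grouping."""
    if isinstance(tree, str):
        return [tree]
    return [word for part in tree for word in flatten(part)]


reading_a = (("old", "men"), "and", "women")   # only the men are old
reading_b = ("old", ("men", "and", "women"))   # both groups are old

print(render(reading_a))
print(render(reading_b))
print(flatten(reading_a) == flatten(reading_b))
```

Once the grouping is dropped the two readings become indistinguishable, so a listener or parser can only recover the intended one if the grouping is spoken aloud.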

19. daj: (continuation from 15., from 17.) Also, once you allow idiomatization into the language, you're going to have syntactic reanalysis, which will produce syntactic ambiguity. For instance, every language has some way of embedding one sentence inside another, and as far as I know, they all have ways of reducing the information in the embedded sentence. For instance, take a structure like (I like (I swim)), which can be realized as either "I like swimming" or "I like to swim." It's pretty clear that the action indicated by "swim" is subordinate to the main verb "like." On the other hand, I don't think anyone would analyze "I am swimming" as (I am (I swim)). Here we think of "am" as being a marker on the main verb, so that the structure is [something like] (I (am swim)). But both structures are realized in actual speech as V-V sequences, and there are many such sequences that are hard to classify: "am to," "am going to," "am supposed to," etc. This sort of reanalysis is extremely common and probably unavoidable in any real language.

20. cowan: (responding to 19.) I'm not sure how to comment on this. However, I guess the best point I can make is that in Lojban, the "surface structure" is quite close to the "deep structure". We simply do not have things like embedding and tense marking being realized with the same forms. (I like (I swim)) comes out "mi nelci le nu mi limna" which is "I like the event-of I swim". (I (am swim)) comes out "mi ca limna" which is "I now swim". The first form could be collapsed into "mi limna nelci" = "I swimly like", which is one of the forms which is explicitly marked as semantically ambiguous: the exact way in which the liking is a kind of swimming is not indicated. This process of making a "tanru" (Lojban for "open compound") is a kind of Lojban transformation, and the current grammar does not express it - it is a grammar of surface structure alone, but a surface structure that is more like the deep structure of other languages. This is the kind of embedding we call "abstraction": there are also other embeddings, involving description, relativization, metalinguistic comments, etc.

21. cowan: (continuation of 14.) Metaphors (which, as you say, are fundamental - they are Mandarin-type metaphors and really correspond more to nominal compounds in English) are semantically ambiguous, and there is also ambiguity in names and through the extensive use of ellipsis and defaults: the full translation of a simple utterance like mi klama is 'I/we go to somewhere, from somewhere, via some route, by some means'.

22. daj: (responding to 21.) But as soon as you allow these metaphors, you've compromised universal comprehensibility, which I assume is one purpose of the language. Do you think a Mongol tribesman would understand "heart ache," "dog days," etc., or indeed would he have any way of knowing that "back stabber" wasn't to be taken literally?

23. cowan: (responding to 22.) There is a subtle point here. There is a marker for "figurative speech" which would be used on "back stabber" and would signal "There is a culturally dependent construction here!" The intent is not that everything is instantly and perfectly comprehensible to someone who knows only the root words, but rather that non-root words are built up creatively from the roots. Thus "heart pain" would refer to the literal heart and literal pain; what would be ambiguous would be the exact connection between these two. Is the pain in the heart, because of the heart, or what? But "heart pain" would not be a valid tanru for "emotional pain", absent the figurative speech marker. It is "malglico" (#*$@ English).

24. daj: (continuation of 22.) In natural language words exist in paradigmatic sets: "No contrast, no content." The meaning of "mi klama" would be determined in any single dialect by the categories that had become compulsory in that dialect. In other words, "I go" does not mean the same thing as German "ich gehe," because in English it contrasts with "I am going," while in German there is no such tense.

25. cowan: (responding to 24.) Each root word in Lojban expresses an N-place predicate, and its meaning is defined by the significance of the N places. Thus "klama" is a 5-place predicate meaning "A goes to B from C via route D by means E". The Lojban design maintains that these five places are an essential part of the meaning of "klama", and that any state of affairs not involving an agent, a destination, an origin, a route, and a means is not validly captured by the word "klama". Most roots have 1, 2, or 3 places, and 5 is the maximum. Additional places (such as the time, the location, the purpose, etc.) can be expressed as well by an extensible set of tags, but they are not considered essential to meaning. In the case of "klama" there is no word which precisely "contrasts" with it in the sense of having exactly the same five places, although "benji" (A transfers B to C from D via E) and "muvdu" (A moves B from C to D via E) come close - the difference is that "muvdu" and "klama" involve physical objects, whereas "benji" doesn't necessarily. But all Lojban predicates with the same number of places contrast in that they are freely substitutable, although perhaps nonsense-producing.
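The place structure cowan describes, together with the ellipsis mentioned earlier for "mi klama", can be sketched as a lookup that pads omitted places with "zo'e" (glossed as "something" elsewhere in this discussion). The role labels, the `PLACE_STRUCTURES` table and the function name `fill_places` are assumptions made for illustration; only the five-place shape of "klama" comes from the post.

```python
# Place structure of "klama" as given above: A goes to B from C via D by means E.
# The English role labels are illustrative only.
PLACE_STRUCTURES = {
    "klama": ("goer", "destination", "origin", "route", "means"),
}


def fill_places(predicate, *arguments):
    """Map arguments onto the predicate's places, padding omissions with zo'e."""
    roles = PLACE_STRUCTURES[predicate]
    if len(arguments) > len(roles):
        raise ValueError(f"{predicate} has only {len(roles)} places")
    padded = list(arguments) + ["zo'e"] * (len(roles) - len(arguments))
    return dict(zip(roles, padded))


print(fill_places("klama", "mi"))
```

Every call returns all five places, which mirrors the claim that the places remain part of the meaning even when the speaker leaves them unstated.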

26. cowan: (continuation of 14., from 21.) Negation, tense, etc. can be expressed either externally through the semantics or internally through the grammar. Negation in particular has gotten a great deal of attention: we split it into contradictory negation (with na or naku), contrary/polar/scalar negation (with a variety of particles for simple contrary, polar opposite, and "scale neutral"), and metalinguistic negation (with na'i).

27. daj: (responding to 26.) Again, I think the evidence from natural language suggests that people won't tolerate very much paradigmatic indeterminacy. They will boil down all these choices to a few that seem particularly important to them.

28. daj: (continuation of 2., from 13.) (3) You can't design a language "not based on any existing languages." You might be able to choose totally arbitrary vocabulary, since vocabulary IS arbitrary, but interestingly enough, Lojban doesn't do that (words are based on U. N. languages as I remember). But in syntax the choices are limited, and Lojban seems to opt for a word-order language rather than a morphology language like Russian. Lojban is thereby biased toward languages that use word order to indicate structural relationships.

29. cowan: (responding to 28.) You remember correctly. The relevant languages are Mandarin, English, Russian, Hindi, Spanish, and Arabic, weighted according to the numbers of speakers, and using a phoneme-matching algorithm to assign words with the highest figures of merit relative to the six languages. This mechanism is a "marketing device" to make the vocabulary easier to learn for speakers of any of those languages, especially Mandarin and English. Word order plays a fairly limited role in determining meaning: it determines which arguments of predicates are which, but can be overridden. Lojban is really a particle language: almost everything about the grammar is determined by which particles are used and where.
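
[Editorial illustration, not part of the original exchange.] Comment 29 describes scoring candidate words against six source languages, weighted by number of speakers. The discussion does not give the actual matching rule or the weights, so the sketch below substitutes a simple stand-in: similarity is the longest common subsequence of letters divided by the length of the source word. The function names, the weights, and the placeholder strings are all assumptions for illustration, not the project's real procedure or data.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def figure_of_merit(candidate, sources, weights):
    """Weighted sum of per-language similarity (stand-in measure, assumed)."""
    return sum(
        weights[lang] * lcs_length(candidate, word) / len(word)
        for lang, word in sources.items()
    )


def best_candidate(candidates, sources, weights):
    """Pick the candidate with the highest figure of merit."""
    return max(candidates, key=lambda c: figure_of_merit(c, sources, weights))


# Placeholder data: language keys, strings, and weights are hypothetical.
sources = {"lang_x": "abc", "lang_y": "abd"}
weights = {"lang_x": 0.6, "lang_y": 0.4}
winner = best_candidate(["abc", "abd", "zzz"], sources, weights)
print(winner)
```

With these placeholder inputs the more heavily weighted language pulls the result toward its own word, which is the effect the comment attributes to weighting by speaker numbers.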

30. daj: (responding to 29.) My mistake. But how do you come up with a culture-free list of particles?

31. cowan: (responding to 30.) Again, we can't exactly. We attempt to be superinclusive, as I said above. The list of particles is large (~550) and if anybody comes up with a construct which cannot be handled by existing ones, we add one. Hopefully this process is now complete. The last few things to come in included the observationals (which say "how the speaker knows", from Amerind languages), scalar negation, and the tense system, which is quite comprehensive (it covers space location and aspect as well as time). A few more may still need to be added to cover the needs of mathematics.

32. daj: (continuation of 2., from 28.) I could go on. One obvious area is how Lojban indicates discourse functions like old and new information components of a sentence (or clause), whether it is iconic in tense sequences, whether it prefers coordination or subordination, etc., etc. All these factors are going to make it look like particular languages. All of them are going to have to be specified if the language isn't going to break up into dialects based on the way speakers of other languages implement unspecified features in their own speech.

33. cowan: (responding to 32.) Discourse functions are handled by a large set of discursives, each of which has a polar opposite: things like specifically/generally, hypothetically/actually, metaphorically/explicitly, etc.

34. daj: (responding to 33.) These seem more pragmatic than discourse, but I admit the boundaries are fuzzy, and I may be using non-standard divisions. What I had in mind was the universally marked distinction between information that's already part of the conversation and information being introduced for the first time (in this conversation). English does it with articles (the/a) and intonation, Russian and Chinese do it with word order, Japanese does it with particles, etc., etc.

35. cowan: (responding to 34.) The nearest Lojban equivalent to the "the/a" distinction is the "le/lo" distinction. "le finpe" means "the fish, the thing(s) I describe as (a) fish". It may be a whale, or a mermaid, or indeed my cat Freddy: as long as the listener understands what is meant, "le finpe" is correct; "le" is non-veridical. "Lo finpe" on the other hand means "fish, a fish, some fish, the thing(s) that really is-a (are) fish". "Lo" is veridical and makes a claim; sentences containing "lo" are valid only if the thing is as described (they may be vacuously true otherwise, but probably a human listener would consider them ill-formed semantically).

36. cowan: (responding to 32.) I don't understand "iconic in tense sequences." Could you explain further?

37. daj: (responding to 36.) In many languages (Chinese is one, I believe) you can say "After I went home I went to bed" or "I went home before I went to bed," but you can't say "Before I went to bed I went home" or "I went to bed after I went home." Clause sequence has to match time sequence. I think it's even impossible in Chinese to say "I'm staying home because I've got a cold," since the presupposed cause has to precede the consequent. Many other languages, of course, have no such restriction.

38. cowan: (responding to 37.) Lojban has no such restriction. Of course, Chinese-native Lojbanists might be unlikely to construct Lojban sentences which violate this restriction, but they should be able to understand them passively if they are fluent in the language.

39. cowan: (responding to 32.) Coordination and subordination are both fully supported. Lojban features redundant structures: there are often many ways to say "the same thing" semantically. Lojban's specified grammar is not a transformational one, but that is not to say that a transformational grammar cannot exist or is trivial. Lojban has a "deep structure" even though we didn't design it to! Usage will decide, for example, whether the subordinating or coordinating versions of "A is true because B is true" will become dominant.

40. daj: (responding to 39.) But won't different versions become dominant in different areas? And if so, won't that defeat the purpose of Lojban?

41. cowan: (responding to 40.) Remember that the purposes of Lojban are threefold: 1) experimental investigation of the SWH; 2) communications with computers; 3) international communication. Purposes 2) and 3) are effective if everybody can understand every construct (or almost every construct) even if they do not often use them in their own dialect. Purpose 1) probably cannot be satisfied until some people begin to speak Lojban as native bilinguals. There are native Esperanto speakers, whose parents had no other common language. Learning Lojban involves finding out about a rich set of structural resources. Some of these will go over automatically because they match your own language. Some will seem strange because they conflict with your language, and you will have trouble with them, but you will use them anyway because they are the easiest, shortest ways of saying what you mean in Lojban. The simple, unmarked forms of Lojban are the ones least like natural languages: the predicate grammar, the contradictory negation, and the logical (Boolean) connectives. The things that are "in there to emulate natural languages" are more heavily marked and so more difficult to exploit. The best example of this that comes to mind is the form of embedded sentence called abstraction: the (I like (I swim)) above. This is unnatural in English, especially in complex constructions, but is the most painless in Lojban: you wrap an entire predication into "nu"/"kei" brackets (you can omit the "kei" if no ambiguity results) and the result is suitable as an argument for another predication. So you find yourself saying the Lojban for "I like the event of I swim" even though that is not at all natural in English, because Lojban makes it easy.
You can ellipsize it to "mi nelci le nu limna", omitting the second "I" and hoping the listener will reconstruct it correctly if you want, but you know that this is ambiguous (or more accurately, vague) because of the omitted place in the embedded predication. The listener is also aware of this vagueness, and can ask "ma limna" (Who swims?) to get clarification.
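
[Editorial illustration, not part of the original exchange.] The "nu"/"kei" bracketing in comment 41 is mechanical enough to sketch as string assembly. This is a toy, not a Lojban parser: the function names `abstraction` and `sentence` and the `keep_kei` flag are hypothetical, and the code does not check whether dropping "kei" is actually unambiguous in a given sentence.

```python
def abstraction(predication, keep_kei=True):
    """Wrap a predication in nu ... kei so it can serve as an argument.

    keep_kei=False drops the closing 'kei'; the caller is assumed to know
    that no ambiguity results.
    """
    words = ["nu", *predication.split()]
    if keep_kei:
        words.append("kei")
    return " ".join(words)


def sentence(x1, selbri, x2):
    """Assemble a simple two-argument predication as a string."""
    return f"{x1} {selbri} {x2}"


# Full form: "I like the event of I swim", with explicit closing bracket.
full = sentence("mi", "nelci", "le " + abstraction("mi limna"))

# Ellipsized form from the comment: inner "mi" and the "kei" both omitted.
elided = sentence("mi", "nelci", "le " + abstraction("limna", keep_kei=False))

print(full)
print(elided)
```

The ellipsized string is the one quoted in the comment, "mi nelci le nu limna". The omitted first place of "limna" is what leaves the listener free to ask "ma limna".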

42. cowan: (responding to 32.) [Dialectization] is certainly a known problem. All of us speak more or less pidginized versions of Lojban at best: we tend to exploit features that have parallels in English or our own languages. But the fact that the language is not very "large" means that it is possible to exploit the other resources after a modest amount of learning and so prevent Lojban from becoming an English-based code. The Lojban metaphor malglico 'that #*%^ English' is applied to the tendency to copy English-based constructions into Lojban.

43. daj: (responding to 42.) As long as it remains a pidgin language, there should be no problem. But your original posting indicated that speakers should be able to extend the language on their own. They can extend the vocabulary by combining the 1300 (?) basic words, and they can extend the expressive power of the language by improvising on the rather unspecialized grammatical structure. But here is where I think things will necessarily go awry. Speakers who extend Lojban on their own will do it in accordance with their own already established linguistic habits, and they will categorize their vocabulary according to their semantic habits (this is only a weak SWH, by the way). To the extent that Lojban becomes a real vehicle for communication, it will take on the characteristics of existing natural languages. It may be fun to see to what extent this can be resisted, but I really think it's hopeless to think that it can be prevented altogether.

44. cowan: (responding to 43.) I agree about "prevented altogether". We do try to resist, though, sometimes by bending over backwards to avoid "malglico". Consider the following translation of Simonides' epigram at Thermopylae: "ko cusku fi le me la lakedaimon. doi klama do'u fe le nu mi nu tinbe le ri flalu kei morsi". Literally this is: "(Imperative!) You express to what-I-describe-as pertaining to Lakedaimon, O comer/goer, the event-of (we are (the event-of (something) obeys the laws of the-last-mentioned) kind-of dead)." I think you will admit that this slop is not English, and that the grammar underlying this Lojban utterance is sui generis and not something derived from English in the manner of a code. (I know no Greek, by the way, so my translation is from English not from Greek.)

45. daj: (continuation of 43.) The alternative, of course, would be to extend the language by design. But this would produce either a language that looked like some other human language (and therefore unlike most human languages) or a "PL/1" language, so rich in devices that subsets would develop, fragmenting the language into dialects.

46. cowan: (responding to 45.) Indeed, Lojban is comparable to PL/I or Ada in complexity. But its scope is much larger than any programming language's. If English were to be put in purely phrase-structure form, the result would be incomprehensibly large (to say nothing of desperately ambiguous). I don't believe that the entire repertoire of Lojban devices is beyond human learning, although some of the recursive complexities made possible may be beyond human understanding (as is the case in English also).

47. cowan: (continuation of 42.) In translating a story involving dialogue, for example, I found it necessary to make frequent use of the observational particles of the language, which certainly had no counterpart in the English version. These mean things like 'I hear', 'I observe', 'I deduce', 'I know by cultural means', etc. Likewise, in delivering the lines realistically, it was necessary to supply paralinguistic attitudinal indicators, as Lojban makes no use of tones of voice (part of its phonological unambiguity) that an English-speaker would surely use.

48. daj: (responding to 47.) Why? Have these categories become compulsory in your dialect?  :)

49. cowan: (responding to 48.) Of course not! But to make the meaning of the story clear to those who didn't belong to my culture, the observationals were indispensable. We know that when somebody says "It must be the wind" in reference to a sound, this is a conclusion from incomplete evidence: but a Mongol tribesman might not. Hence the observational helps to make the cross-cultural meaning clear. For communication among, say, my own family (if they spoke Lojban), I would probably not need such a thing.

50. daj: (continuation of 2., from 28.) Frankly, I don't think the designers of Lojban knew much about language.

51. cowan: (responding to 50.) Guilty, especially in the beginning. But we've learned a lot, even if we take a non-standard slant on some things. Lojban/Loglan has a "historical" dimension as well, even if the history is only some 35 years old, and there are things in the language that probably would be removed now or changed if an a priori redesign were done. Lojban is not designed to be a "universal notation", just a language. Although it shares many features with other languages, it is clearly not a dialect or a code or a jargon. It has its own feature set and its own characteristic way of exploiting the set: the set is large, but the language is still small because of its high degree of regularity. Whether it is possible to internalize the language, in the sense of gaining Chomsky-competence, is still an open issue. I believe it is possible: I am beginning to think in the language's terms now, and so are several other advanced students; some of the paralinguistics are also becoming internalized.

52. daj: (responding to 51.) I have to apologize for my snotty attitude there. You've obviously done more homework than I thought at first. I still can't help thinking, though, that you're underestimating the incredible complexity of human language, both in its use and in its potential for change. I doubt that you will be able to create a language free of irregularity, ambiguity, etc. On the other hand, you may have a really interesting semi-laboratory experiment in the process of creolization, and that would make the whole thing worthwhile in itself.

53. cowan: (responding to 52.) Well, new purposes always help. These letters are being passed to the president of the Logical Language Group, by the way - I hope you don't mind - for comments.

54. daj: (responding to 53.) I'll try to watch more and snarl less. Thanks for the education.

55. cowan: (responding to 54.) je'e .uicai ("Roger. Happy!!!").

Subject: Why use Lojban for S/W?

Participants: dan@YOYODYNE.MIT.EDU (Dan Parmenter) cowan@marob.masa.com (John Cowan) rjohnson@vela.acs.oakland.edu (Rod Johnson) dtate@unix.cis.pitt.edu (David M Tate) lojbab@snark.thyrsus.com (Bob LeChevalier)

1. dan: S/W is pretty much disavowed by the linguistic orthodoxy in this country. I'm told that anthropologists are still interested in it, but I don't know enough about anthropology to say.

2. rjohnson: (responding to 1.) There is no linguistic orthodoxy in this country (and why do national boundaries enter into this question anyway? There is certainly no linguistic orthodoxy in the world). Linguists are a pretty fractious bunch. There may be a generative orthodoxy (though I doubt it), but they don't speak for me.

3. dan: (responding to 2.) When was the last time you saw an article in any of the journals on Sapir-Whorf?

4. rjohnson: (responding to 3.) Well, I suppose it depends on which journals you look at. I've seen articles fairly recently that are "Whorfian" in some sense here and there. It's certainly not a major topic in the field at present, but there are any number of reasons that could be, including: - it's held to be clearly true; - it's held to be clearly false; - other ideas are exciting people nowadays; - people are stumped as to how to approach it. My guess is that it's all of the above, variously.

5. dan: (continuation of 3.) The introductory textbooks on linguistics that I've looked at seem to cover the topic [of S/W] briefly, if at all, and then as a discredited hypothesis.

6. rjohnson: (responding to 5.) In the totally unscientific sample of textbooks on my desk, Lyons has a fairly sympathetic discussion of it; Finegan and Besnier have only a page or so, mostly sympathetic but critical; Eysenck's cognitive psych textbook gives it an extended but guarded treatment; Bolinger gives it a mild thumbs down ("exaggerated") but is essentially in sympathy with some form of the idea; and Akmajian et al. don't mention it anywhere I can find. Everyone that mentions it finds it attractive but in need of revision or special understanding. Finegan and Besnier, for instance, say: "Today few scholars take the Sapir-Whorf hypothesis literally. Many linguists take the position that language may have some influence on thought but thought may also influence the structure of language" etc. If we strip away the mealymouthedness (which I've spared you most of), they seem to be saying that the influence goes both ways, a position that neither Sapir nor Whorf would have any objection to.

7. dan: (continuation of 3., from 5.) This doesn't disprove anything, but it certainly seems to indicate a lack of interest in the subject currently. I didn't mean to imply that all linguists were of one mind, but on this topic, there seems to be a pretty general agreement, in what I've read.

8. rjohnson: (responding to 7.) I'll agree there's not a whole lot of interest among the people who currently dominate the field. This is not to say that those people are committed to a position on either side of the issue - it's just not relevant to their work. "Exotic" languages are no longer the center of interest that they were in the heyday of Sapir and Whorf. That doesn't mean the issue is resolved, though.

9. rjohnson: (continuation of 2.) No matter how you try to slant the issue, the status of the Sapir-Whorf "hypothesis" is still very unclear. (Personally, I don't think it's even a hypothesis; it's a problematic, it's a topos, it's an ideological litmus test.) But in any event, though there may be unanimity on this point in some linguistics departments dominated by Chomskyans, for the rest of us (and that's most of us) the debate is still alive. (No anti-Chomsky animus expressed or implied.) You don't know enough about linguistics [either]. Anyway, the question of orthodoxy is beside the point. This is not something you vote over. There have been some suggestive studies on both sides; there has been nothing conclusive, and I see little indication that most of the partisans on both sides have really gotten to terms with what the debate is all about.

10. dan: (responding to 9.) I'm calling it as I've seen it. When I was hyped up on Sapir-Whorf myself a few years ago, I went through any number of texts looking for information on it and came to the conclusion that most linguists that I read seem to disavow it. I guess I read the wrong books. Even the anti-Chomsky linguists didn't seem to have much to say on the matter.

11. rjohnson: (responding to 10.) This isn't some kind of insult: you don't know enough about linguistics to say. There are several reasons for this:

1. No one does. The field is too big and too heterogeneous, the social networks too fractured, to be able to gauge consensus adequately.
2. As you just told us, you're not a trained linguist (yet). Pronouncements about what's orthodox are hazardous enough for the most highly trained finger-licker (if you follow the imagery); one's words have a way of coming back and biting one on the ass here.
3. "... but I don't know enough about anthropology to say." But anthropology, and psycholinguistics, and rhetoric, and such areas, are where a lot of the SW work goes on nowadays. These people aren't disqualified from contributing simply because they don't hold down lines in the budget of a linguistics department.

12. dan: (responding to 9., from 10.) I never said anything about "voting" on anything.

13. rjohnson: (responding to 12.) But isn't that what orthodoxy amounts to? Chomsky took a few highly unorthodox positions once, and was roundly "outvoted" by the field. That changed. It's arguments that decide these things, and evidence (and funding, and ...), not which way the wind is blowing in any given decade. Orthodoxy is fickle. 20 years ago everyone was into intrinsic rule ordering, squishes and (trans)derivational constraints. No one talks about them now - but the underlying problems are still there waiting to be explored. Likewise the complex of problems and questions people lump together as "the Sapir-Whorf hypothesis".

14. dan: (responding to 9., from 12.) If I'm missing something, please let me know, rather than telling me I don't know what I'm talking about. As it happens, I have tried to learn about s/w and have considered the issue at great length. I admit, that in the course of this thread, I've made some mistakes, but does that qualify me as an ignorant boob? I don't think so.

15. rjohnson: (responding to 14.) Dan, I thought you didn't take this personally! Of course you're not an ignorant boob, not at all. Still, it would be a lot of fun to handle this this way: >I admit, that in the course of this thread, I've made some mistakes, but does that qualify me as an ignorant boob? Sorry - the weak must die.  :)

16. dan: (responding to 9., from 12.) In several cases, I've misunderstood what people were saying, and been misunderstood in kind. This happens, but I like to think that I'm relatively informed about linguistics, based on my education and my intent to pursue graduate studies in the field.

[... continuing on the same topic later]

17. dan: [SWH] is something I'm rather interested in (as a curiosity, I used to be utterly convinced by it too), and I'm actually glad the Lojbanists have dredged it up for serious discussion again. I question their methods though, why not do psychological tests on existing languages, rather than trying to come up with a whole new one? Presumably, if S/W is confirmed by the Lojban project, no one would assume that it is only true for Lojban itself. This goes back to my feeling that Lojban is at best, an intellectual puzzle. If you can learn it and gain some degree of fluency in it, well that's fine for some people. Not for me.

18. dtate: (responding to 17.) Hey, we agree! Weird... S/W is about natural languages, of which we have lots. Presumably, if S/W is true, then it is true now, for the languages currently being used. The only problem might be if all current natural languages are sufficiently similar in their world-views that S/W doesn't kick in. If this is true, then it would constitute (IMHO) a practical refutation of S/W, since S/W was originally motivated by observation of the divergence among current natural languages. There is theoretical interest in knowing if a constructed language like Lojban has a detectable effect on thought patterns, but not nearly as strong as the interest in whether there is a difference between (say) Korean and Japanese thought patterns, or German vs. French, or Sioux vs. Hopi. I'd go even farther, though, and question what it is that we hope to learn using Lojban that we couldn't learn better (and more easily) using natural languages. There's hardly any chance of Lojban ever becoming a widespread native tongue, so any conclusions we get about people whose primary language is Lojban will include the strong bias of self-selection for Lojban proficiency by the subject or some close relative of the subject...

19. cowan: (responding to 18.) [We hope to learn] the same kinds of things we learn about the mechanics of falling bodies by rolling them down inclined planes rather than dropping them from the Leaning Tower of Pisa. "JCB's [the founder of Loglan] plan was to attempt to build a language tool that would have the major features of natural languages, but would have some strong warping in its structure that was deviant from all other natural languages. This warping would attempt to take normal structures that presumably set limits on thought, and 'push them outward in some predictable dimension'. His language tool would be an extreme case, not a 'typical language' but 'a severely atypical one', in order to enable any Whorfian effects to be more easily seen. He attempted to put 'decisive but non-essential differences' into the language; he still needed the language to be speakable.... "The structural extreme he chose was to model the grammar on the well-understood structures of symbolic logic. There are no natural languages based on a predicate grammar, yet logicians are skilled at analyzing the structural relationships between natural language and formal logic.... The essence of these concepts is that 'it forces on its speakers a reasonably small set of assumptions about the world ... perhaps the smallest possible set'. 'Any speaker, from any culture, should find it possible to express in Loglan what he takes for granted about the world ... without imposing ... or being able to impose these assumptions on his auditor'...." (Outer text by Robert LeChevalier, from Ju'i Lobypli #6. Inner quotations are from James Cooke Brown, Loglan 1, 3rd Edition.)

20. lojbab: (responding to 18.) Psychological and other tests of S/W were performed using natural languages in the 1950's - at least two large studies, though I don't have references handy. They turned up fairly negative results, and this is one reason why S/W went into eclipse. (Other factors included an inability to agree even on what the actual hypothesis was; i.e. how to formulate it, the racial/political issue, attacks on Whorf's scholarly credentials, and the rise of Chomsky's theories which were orthogonal to S/W and soon attracted all the money). The tests were not conclusive, though. One major problem is that with natural languages, you can't ever be sure that hidden cultural features might obscure the results. There are also more variables to control with natural language speakers. (This is NOT the same as saying natural languages are 'too similar'; merely that we don't know how to test for the differences.) How does Lojban improve on this? Being better defined as a language than any natural language allows better monitoring of actual usage vs. some theoretical norm. Having a structure drastically different from any natural language should lead to a much larger S/W effect than between two natural languages. Furthermore, if a S/W effect is found, its nature and manifestation will help experimental design for a new test based on natural languages, when we better understand what we're looking for. Being culture-free (at least initially) makes it much easier to filter out cultural effects. Being different from all language families allows better cross-cultural studies. Because there are several identifiable areas of structural difference, there is a greater likelihood of finding effects that may be constrained by the TYPE of structure (S/W may not be general, only specific to certain types of structures). As to Lojban becoming widely spoken, you have to decide how wide the goal is.
Esperanto managed up to a million speakers in 100 years, and the world population and mass media needed for rapid expansion of a language teaching effort should make Lojban's potential expansion rate significantly higher, if people find a reason to learn it. Right now the primary such reason is as a linguistic toy, as Dan accuses, since there is no obvious financial gain. Thus we indeed have considerable self-selection in the community today. This can easily change:
- development of computer applications could make learning Lojban a necessity external to personal choice in some fields;
- development of cross-cultural/foreign language education applications could lead to more widespread use of Lojban at a low level by large segments of population. Some of these will pursue more advanced study of Lojban.
- identifying any preliminary S/W effects that are perceived as beneficial will greatly heighten interest in learning the language among potential beneficiaries.
- if research using Lojban is funded, some people might actually be paid to learn Lojban as test subjects (and teach it to their children?). These would presumably be chosen to negate self-selection factors, though willingness to accept payment for this sort of thing is itself a kind of selection (all psychological studies of volunteers could be questioned on this basis, but such studies are standard in the field, so presumably there is capability to filter out such bias in the testing methods).
In short, if the language is useful as a tool, it will be used. As the size and diversity of the community grows, self-selection becomes less of a bias factor. However, self-selection isn't an irremediable bias. Nor is the lack of a large community of speakers. In internal discussions, some Loglan/Lojban supporters have argued for preliminary S/W testing using second-language adults, notably language inventor J. C. Brown who proposed in his book on the language (Loglan 1, 4th edition) a study where adults of several cultures are all taught Loglan over a summer and tested before and after for changes in 'the way they think'. (I personally think his design to be flawed and too simplistic, but if Lojban's S/W effects are truly dramatic, they could show up in 2nd language fluent speakers. And such appearance would pretty much guarantee that people would find a way to build a testable 'culture' of 1st language speakers, perhaps by raising children bilingually during the 'critical period', or even from birth.)
Incidentally, current thinking in the community is that 'logical' thought or expression is not necessarily the aspect most likely to generate noticeable S/W effects. The removal of grammatical ambiguity from modification (as exemplified by the much-discussed plastic cat food lid) seems to heighten creative exploration of word combination. This comes from self-observation, and is a linguistic toy feature, but could lead to profound changes in problem-solving in a community speaking Lojban, which ought to qualify as a bona-fide S/W effect. Other areas of possible benefit are (surprisingly in a 'logical' language) emotional expression. Lojban has a fully developed set of metalinguistic and emotional attitude indicators that supplant much of the baggage of aspect and mood found in natural languages, but most clearly separate indicative statements from the emotional communication associated with those statements. This might lead to freer expression and consideration of ideas, since stating an idea can be distinguished from supporting that idea. The set of possible indicators is also large enough to provide specificity and clarity of emotions that is difficult in natural languages. It is easy to imagine enormous changes in communicative activities that involve emotions, and corresponding 'world view' changes as a result. Again, only time will tell.
Time is a significant factor here in evaluating Lojban's relevance to linguistics today. In the next 10 years, there will be ONLY 2nd language adults and perhaps a few children raised by non-fluent adults. For at least a generation after that, immediate self-selection will be a significant potential factor, and Lojban will be at best questionably a 'living language', making its results less than certain. Still, for linguists TODAY, interest in Lojban can be tied to any of several major channels:
- possible use of 2nd language speakers to get preliminary ideas on whether S/W is likely;
- making sure that Lojban's design is as linguistically sound as we can make it given current linguistic knowledge, ensuring that eventual S/W results are meaningful;
- developing tools and techniques for eventual S/W testing; trying to identify what the effects will be and how they can be detected;
- actually participating in the language community, using your linguistic skills to help quickly build a set of initial usage patterns based on the unambiguous language (and vocabulary, idiom, etc.) that when passed on to 'native speakers' in the future provides them with a better, more robust, starting point for evolutionary change;
- developing techniques of teaching the language as a second language, when there is no existing idiom. Related to this is possibly using Lojban's simple structures and culture-free properties to enhance language education.
- preparing other, non-S/W related research based on Lojban's features and its availability as an experimental linguistics platform or alternatively as a totally self-contained 'model' of a language;
- using Lojban for other linguistic research that is not as dependent on a 'native' base, including studies of language learning (1st and 2nd), as a medium for culture-free recording of linguistic information in studies of other languages (translating to English may help an English-native reader of your paper get the gist of what a foreign language is saying, but is subject to all the problems of English cultural usage and ambiguity; there are a lot of non-native English readers who may not be aware of those features). In short, using Lojban as an 'international language of linguistics', much as IPA serves for phonetics.
- and finally, serving as peer reviewers to make sure that those of us working directly on the project don't get our heads too far into the clouds. This of course requires that you know something of what we're trying to do, which is why we keep bombarding this forum with so many long messages :-)


The following are additions to the bibliography of Sapir-Whorf Hypothesis materials compiled during the discussions on the computer networks.

Here are some references to discussions of the Sapir-Whorf Hypothesis. One is recent; the Fishman article, as far as I know, has not really been replied to anywhere. (The first part of the bibliography is courtesy of Alan Munn, University of Maryland, who made these comments.)

Brown, R. (1957) "Linguistic Determinism and Parts of Speech", Journal of Abnormal and Social Psychology 55, pp. 1-5.
Brown, R. and E. Lenneberg (1958) "Studies in Linguistic Relativity", in E. Maccoby, T. H. Newcomb & E. L. Hartley (eds.), Readings in Social Psychology (3rd ed.), New York: Holt, Rinehart & Winston, pp. 9-18. In the same volume, "The Function of Language Classification in Behavior", by John B. Carroll and Joseph B. Casagrande, pp. 18-31.
Fishman, J. (1960) "A Systematization of the Whorfian Hypothesis", Behavioral Science 5, pp. 232-239.
Hoijer, H. (1954) Language in Culture (Comparative Studies of Cultures and Civilizations, No. 3; Memoirs of the American Anthropological Association, No. 79), Chicago: University of Chicago Press.
Kay, P. and W. Kempton (1984) "What is the Sapir-Whorf Hypothesis?", American Anthropologist 86, pp. 65-79.
Whorf, B.L. (1939) "The relation of habitual thought and behavior to language", in B.L. Whorf (1956) The Selected Writings of Benjamin Lee Whorf, Cambridge MA: MIT Press.

These articles are both for and against SWH. The Brown papers and the Kay/Kempton paper are attempts to test the hypothesis. The Fishman article discusses the results of some experiments and where they leave us with respect to various versions of SW.

Other Sapir-Whorf references:

Alford, Danny K. 1978. "The Demise of the Whorf Hypothesis (A Major Revision in the History of Linguistics)", Proceedings of the 4th Annual Meeting of the Berkeley Linguistic Society 4:485-99.
Hymes, Dell, 1968. "Two Types of Linguistic Relativity", in Sociolinguistics: Proceedings of the UCLA Sociolinguistics Conference (1964). Ed. by W. Bright. Janua Linguarum Series Major, 20. Mouton: The Hague. pp. 114-167.
Lucy, John, 1985. "Whorf's View of the Linguistic Mediation of Thought", in E. Mertz and R. J. Parmentier, Semiotic Mediation: Sociocultural and Psychosocial Perspectives, Orlando: Academic Press.
McNeill, David, 1987. "Linguistic Determinism: The Whorfian Hypothesis", Chapter 6 of Psycholinguistics, A New Approach, New York: Harper and Row. pp. 173-209.

Subject: Esperanto and Lojban

Participants: neal@druhi.ATT.COM (Neal D. McBurnett) cowan@marob.masa.com (John Cowan) daj@beach.cis.ufl.edu (David A. Johns) pepke@gw.scri.fsu.edu (Eric Pepke) loren@tristan.llnl.gov (Loren Petrich) dtate@unix.cis.pitt.edu (David M Tate) lojbab@snark.thyrsus.com (Bob LeChevalier)

1. neal: Esperanto is much easier to learn than English or any other ethnic language because it has few irregularities and it has a phonetic writing system. In studies done with English school children it was demonstrated that one year of instruction in Esperanto gave the students the same level of language competence as five years of studying French. Once you learn to conjugate one verb, you know how to conjugate them all!

2. daj: (responding to 1.) I agree 100% that an artificial language is easier to learn as a second language, and as a medium of international communication, something like Esperanto may make more sense than English. In fact, after teaching English as a foreign language for a couple of years, I came to the conclusion that it would make much more sense to teach Pidgin English than real English. But when pidgins become the primary language of a community, they cease to be regular and simple. Why? Is creolization a degenerative process, or do the irregularities have a function in language? I think we need an answer to this question before we assume that we can construct a "logical" language and use it as a real medium of communication.

3. lojbab: (responding to 2.) On the other hand, why not invent a completely regular language, with a 'cultural ethic' that values that regularity, and observe what, if any, irregularities come into existence?

4. dtate: (responding to 3.) Because you can't create a 'cultural ethic' by fiat.

5. lojbab: (continuation of 3.) Lojban is not limited in linguistic research application to testing Sapir-Whorf; I've given a lot of my own effort to ensuring that the design is robust enough to allow other studies. Pidgins and creoles of the world have all evolved from interaction between two or more already irregular and highly complex languages. Variables to watch in analyzing the evolution of the language are too many and too poorly understood. Lojban is both much simpler and highly regular. Presumably as a result, the variables affecting pidginization and creolization, and indeed all other manner of linguistic change will stand out much better. Furthermore, as a fledgling 'international language' that differs structurally from all of the 'first languages' of the world, the studies of evolutionary processes can be conducted over and again as Lojban interacts with each of the languages and cultures in which it is introduced. Other areas of possible Lojban application include language universals (Lojban is relatively neutral on some of these, supporting many competing forms; the ones that survive or spread as the language becomes a 'living' language are thus worth studying to find out why) and universal grammar (if Lojban proves to be acquired by children and adults as easily as natural languages, UG will have to be able to explain it). Note that a small number of Lojban speakers (especially in a specific speaking locale) would be expected to show evolutionary effects more quickly, enhancing the chances of observing such effects during a short research period. We've set an early prescriptive policy towards the language precisely to allow enough of a fluent speaker base to form to preserve some type of linguistic identity to serve as a starting point.

6. pepke: (responding to 2.) "Degenerative" is kind of a loaded term. It may just be the point of view. If you start off with an artificially "perfect" language, just about any change will seem degenerative.

7. lojbab: (responding to 6.) Not in the case of Lojban. ONLY a change that introduces structural ambiguity is automatically 'frowned upon', and I personally doubt there is a major evolutionary force in language that promotes such ambiguity 'for its own sake' - there would have to be some other explanation for an ambiguity to be introduced. Most other types of changes (word formation rules, phonological changes, preference in word order among them) would not be inherently degenerative. No one in the Lojban community thinks that we've created a 'perfect' language, only an 'adequate' one for communication and linguistic research.

8. loren: (later in the discussion) I wonder how Lojban handles (1) words for opposites and (2) verb aspects (if present).

9. cowan: (responding to 8.) The term "opposite" is a bit vague. Among its 1300+ root words, some have "opposites" and some don't. There are words for both "increase" and "decrease"; "beautiful" is a root but "ugly" is not. Since the root words are primarily chosen for ease-of-use in making compounds, this was justified primarily by the desire to make shorter compounds. There is a faction which has argued that there are too many root words (and that opposites in particular should be stripped out); another faction holds that there are too few (that choosing "beautiful" rather than "ugly" is an unwanted bias). In fact, having a list of root words at all is ipso facto a bias, but it is a known bias which can be allowed for. The alternative is having to construct 4-5 million distinct words with no compounding rules at all to cover the vocabulary range of the world's languages. The general Lojban solution lies in the four particles "na'e", "to'e", "no'e", and "je'a", which are four kinds of scalar negation. This is distinct from contradictory negation ("It is not the case that...") which is represented in Lojban by "na" and "naku". "na'e" is nonspecific scalar negation, analogous to English "non-". "lo na'e gerku" means "a non-dog", which in principle could be anything that is not a dog, but probably means some other kind of animal. "to'e" is polar opposite scalar negation, analogous to some uses of English "un-"/"in-". "Beautiful" is "melbi", and "ugly" is "to'e melbi". "barda" ("large") means the same as "to'e cmalu" ("unsmall"), and vice versa. "no'e" is scalar neutral negation. This arises when a scale whose opposing ends are "X" and "to'e X" has a natural midpoint. "no'e melbi" for example might be translated "plain" or "ordinary-looking". "je'a" is affirmation, and has the same meaning as no particle at all. It is chiefly useful to deny one of the other particles in conversation [ed. note, also for emphatic affirmation]. 
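[Illustrative sketch, not part of the original discussion.] The four scalar-negation particles described above can be pictured as a small lookup. The Python below is only a toy model: the particle names and the glosses for "melbi" come from the post, while the `PARTICLES` table, the `ENGLISH` mini-lexicon, the `gloss` function and the "non-" rendering of "na'e" are hypothetical names and simplifications introduced here.

```python
# Toy model of the scalar-negation particles described in the post.
# All names here (PARTICLES, ENGLISH, gloss) are illustrative assumptions.

PARTICLES = {
    "je'a": "affirmation (same meaning as no particle)",
    "no'e": "neutral midpoint of the scale",
    "to'e": "polar opposite",
    "na'e": "nonspecific 'non-' (something else on the scale)",
}

# Glosses taken from the examples in the post; only one root is listed.
ENGLISH = {
    "melbi": {"je'a": "beautiful", "no'e": "plain", "to'e": "ugly"},
}


def gloss(phrase: str) -> str:
    """Return a rough English gloss for 'PARTICLE ROOT' or a bare 'ROOT'."""
    words = phrase.split()
    if len(words) == 2 and words[0] in PARTICLES:
        particle, root = words
    else:
        # A bare root is treated like "je'a ROOT", as the post describes.
        particle, root = "je'a", phrase
    forms = ENGLISH[root]
    if particle == "na'e":
        # Assumption: render nonspecific negation with English "non-".
        return "non-" + forms["je'a"]
    return forms[particle]
```

For example, `gloss("to'e melbi")` gives "ugly" and `gloss("melbi")` gives "beautiful". Contradictory negation ("na", "naku") and metalinguistic negation ("na'i") are deliberately left out, since the post treats them as separate mechanisms.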
(Lojban also has another type of negation called metalinguistic negation, where the adequacy of the utterance is denied due to category mistake or what have you. The particle "na'i" indicates that what precedes it (or the whole last utterance, if nothing precedes in this utterance) is erroneous in some such way. If a Lojbanist asks another: xu do sisti le zu'o do rapdarxi le do fetspe literally: (True or false?) You cease the activity of repeat-hitting your female-spouse? or idiomatically: Have you stopped beating your wife? a good and sufficient answer is "na'i".) The above sentence could be expressed with the aspect grammar rather than with the word "sisti" (cease), but I don't know the language well enough to do so yet. The tense/aspect system of Lojban is one of the most complex parts of the grammar, and I am far from sure that I understand it altogether. Fortunately, it is 100% optional. Everything it can express can also be expressed semantically through the predicate grammar, or just omitted altogether. Rather than trying to explain the whole thing systematically, I will simply give an unsystematic catalogue of the kinds of things that can be expressed. Note: any of these items may be combined either by logical connectives (and, or, xor, etc.) or by non-logical ones (joined with, mixed with, union, intersection, etc.) It is also worth mentioning that Lojban tense is "sticky" and that once set it propagates to all following untensed sentences [ed. note: This is the default pragmatic interpretation for many contexts; however there may be contextual circumstances where tense does not carry over, such as:] In stories, this is modified a bit by the assumption that narrative flows in time, so each sentence may represent a time later than that of the preceding one. One may, however, by proper use of the time offset machinery, tell stories backwards or inside-out as desired.
First, Lojban tense handles both time relations and space relations, where time may be treated either as sui generis or in an Einsteinian way as the fourth spatial dimension. Time and space are formally parallel: for each, there is a way of specifying an origin, one or more offsets from the origin (directions in time or space), and an interval around the point thus determined. In the case of space only, the interval may be specified as 1-, 2-, 3- or 4-dimensional. In addition, there is machinery for representing motions in space, but not in time. Should time travel become practicable, the 4-dimensional facilities of the space motion grammar may become useful. Intervals may also be modified by either or both of two kinds of modifiers. One type is a quantified tense, which may be either objective (corresponding to English "never", "once", "twice", ..., "always" for time, or "nowhere", "in one place", ..., "everywhere" for space) or subjective (things like "habitually" and "continuously"). The other type is an "event contour", handling things like "during", "after the (natural) end of", "after the termination of", etc. There is also a mechanism for specifying the actuality/potentiality status of a predication: things like "can and has", "can but has not", etc. Separate from all this, Lojban prepositions (really case tags) can be used as adverbials also, and are grammatically almost interchangeable with the tenses. Likewise, the tenses can be used prepositionally. "pu" represents the past tense (time direction in the past), but means "earlier than" as a preposition. "bai" on the other hand is the preposition "under the compulsion of" but means "forcedly" when used as an aspectual. This list of prepositions/adverbials/aspectuals/case tags is extensible to any predicate whatsoever by using the particle "fi'o" which makes a predicate into an aspectual.

Subject: Lojban gismu Vocabulary

Participants: iad@chaos.cs.brandeis.edu (Ivan Derzhanski) lojbab@snark.thyrsus.com (Bob LeChevalier)

1. lojbab: [part of a longer discussion on Lojban roots] We wanted to maximize ease of learning, BUT not at the expense of cultural neutrality. Loglan (generic) thus maximizes reflecting the sequences of phonemes in a given word from the corresponding words in the source languages, weighted by speaker population. Thus 'blanu' has the phonemes in order of English 'blue' and Chinese 'lan' (with appropriate tone which I don't have handy). The result is intended to be words that are distinctly different from those of any one language, but which sound 'natural' to speakers of the source languages and also have an indirect cognate value - not one that is necessarily obvious, but one that can be used to learn the word if it is pointed out.

2. ivan: (responding to 1.) If it is pointed out indeed. I speak Russian, English, Spanish and Hindi, and I know some Arabic, but my attempts to analyze some Lojban words and to discover their roots failed almost totally.

3. lojbab: (responding to 2.) At first contact, you WILL need to have the connection pointed out. But I suspect that after the connections are pointed out for a few words, someone with your language experience will begin to see the patterns. One problem, of course, is that we go for aural recognition, NOT visual recognition, and use Lojbanized phonetics. The Procrustean bed of Lojban morphology (all roots are of the pattern CCVCV or CVCCV) also constrains the result enormously. The algorithm we use attempts, within the framework of this morphology, to maximize aural recognition for an active student of the language.
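[Illustrative sketch, not part of the original discussion.] The morphological constraint mentioned above (every root has the shape CCVCV or CVCCV) is easy to check mechanically. The Python below is a minimal sketch under stated assumptions: the consonant and vowel inventories are the ones commonly given for Lojban and are not listed in the post, and the names `cv_pattern` and `is_root_shaped` are made up for illustration. It tests only the consonant/vowel shape, not which consonant clusters are permitted.

```python
# Minimal shape check for Lojban root words (CCVCV or CVCCV).
# Assumption: consonant and vowel inventories as commonly given for Lojban;
# the post itself does not list them.
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")


def cv_pattern(word: str) -> str:
    """Map each letter to 'C', 'V', or '?' for anything unrecognized."""
    return "".join(
        "C" if ch in CONSONANTS else "V" if ch in VOWELS else "?"
        for ch in word
    )


def is_root_shaped(word: str) -> bool:
    """True if the word has one of the two root shapes named in the post."""
    return cv_pattern(word) in {"CCVCV", "CVCCV"}
```

With this, 'blanu' comes out as CCVCV and 'katna' as CVCCV, while a structure word such as "pu" or a longer compound such as 'djarspageti' is rejected by the shape test.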

4. lojbab: (continuation of 1.) Incidentally, once you get used to them, the regularities in Lojban words have their own aesthetic value, just as Nick's portmanteau words from Esperanto do. Lojban words have a lot of medial 'n' and 'r' and initial fricatives 'j', 'c', and 's', all derived from the heavy Chinese weighting. I have a little trouble with the fricatives unless I'm relaxed - I get 'she sells sea shells' type tongue twisters, but I presume the Chinese will find it pleasant.

5. ivan: (responding to 4.) No offence intended, but I'd like to hear the Chinese confirm this. For all you know, they may not. Schleyer went out of his way to put as few "r"s as possible in Volapük words, so that the Chinese will be happy. I hope at least his Chinese find it easy to say "obs" `we' or "coecs" `government officials' (i.e. `judges'), because I don't.  :-)

6. lojbab: (responding to 5.) That of course is the problem with any a priori word-making scheme, especially without strong aid from native speakers. We have had one Chinese speaker look at this question directly, but since she is also fluent in German and English, she isn't necessarily an unbiased observer. The reason for the high sibilant frequencies though, is that several Chinese consonants map into Lojban 'c', 's', and 'j'. Still, there is a balancing act. Chinese is favored by the weighting scheme, but as you point out, we have 'r' and 'l' as phonemes which are much more common in other languages. Still, a high percentage of Lojban roots have syllable ending '-an', making 'n' so common a letter in the language that its frequency exceeds that of most vowels (in a language more vowel rich than English because of all the CV and CVV structure words). We had to make guesses on how to achieve recognizability in other languages (and were also constrained to be consistent with 30 years of prior work by language inventor Brown). Ideally, there would have been scientific testing of our algorithm in native speakers of each language before making the words, but this wasn't possible and indeed wasn't important enough. The important thing was to have a neutral word-making method that did not favor any one language population, and paid at least lip service to recognizing language diversity. We also wanted non-random words, with phonemes occurring in orders that are speakable and familiar, and we got this.

7. lojbab: (continuation of 1., from 4.) Some of the initial consonant clusters look intimidating, but Ivan won't mind them.

8. ivan: (responding to 7.) I certainly don't. I don't take them all for granted, but they are not intimidating in any case.

9. lojbab: (continuation of 7.) (and might prefer them)

10. ivan: (responding to 9.) ... prefer them to what? Not to simple consonant-vowel alternation, no. I wouldn't miss the clusters if they weren't there. But they are, and I won't complain.

11. lojbab: (responding to 10.) One of the most frequent comments about Lojban words is that the consonant clusters look hard to English speakers, and this was more an answer to this criticism than a claim about the aesthetics of Slavic language speakers. Still, it seems a reasonable presumption that most people feel more comfortable with a language that sounds a little like their own. Interestingly, our phonology has a result that several people with experience with a variety of languages have said that Lojban (as I speak it) sounds like a south Slavic language. It will be interesting at some point to have a southern Slavic speaker confirm this. The range of consonant clusters we permit in Lojban was augmented after a Slavic languages expert pointed out that our set was extremely tame and excessively constraining on the words and their recognition. Lojban root words can be recognized as roots by the presence of the consonant cluster - which is never found in structure words and always found in predicate words. We thus constrained the set of clusters in medial position by disallowing voiced/unvoiced mixing of stops and fricatives, doubled consonants, and most mixed sibilants. Permitted initial clusters are a subset of these (48), which are phonetically symmetric (thus, because we allow the unvoiced 'st', we allow the voiced equivalent 'zd', even though it isn't found in English). Languages require a certain amount of redundancy to be understandable. My own comparative examination seems to indicate that most languages have either consonant clusters or tones, and that having one seems to minimize the evolutionary pressure towards the other. Polynesian and Japanese are the only exceptions to this I know of (and Japanese actually has some clusters, though they aren't reflected in the writing system). Can anybody confirm or deny my observation? What other techniques are found in languages that improve redundancy?
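[Illustrative sketch, not part of the original discussion.] The cluster constraints described above can be approximated in a few lines of Python. This is a rough sketch, not the project's actual rule set: the voiced/unvoiced pairing table, the treatment of 'x' as an unvoiced consonant with no voiced partner, and the decision to reject every pair of sibilants (the post says only "most") are all assumptions, and the function names are hypothetical.

```python
# Rough approximation of the medial-cluster constraints named in the post.
# Assumptions: the voicing pairs below, 'x' counted as unvoiced with no
# voiced partner, and ALL sibilant+sibilant pairs rejected (the post says
# "most mixed sibilants", so this is stricter than the source states).
VOICED_OF = {"p": "b", "t": "d", "k": "g", "f": "v", "c": "j", "s": "z"}
UNVOICED = set(VOICED_OF) | {"x"}
VOICED = set(VOICED_OF.values())
SIBILANTS = set("cjsz")


def medial_pair_allowed(pair: str) -> bool:
    """Check a two-consonant medial cluster against the three constraints."""
    first, second = pair
    if first == second:
        return False  # doubled consonant
    if (first in UNVOICED and second in VOICED) or (
        first in VOICED and second in UNVOICED
    ):
        return False  # voiced/unvoiced mixing of stops and fricatives
    if first in SIBILANTS and second in SIBILANTS:
        return False  # mixed sibilants (simplified, see assumptions above)
    return True


def voiced_counterpart(cluster: str):
    """Return the voiced mirror of an all-unvoiced cluster, else None."""
    if all(ch in VOICED_OF for ch in cluster):
        return "".join(VOICED_OF[ch] for ch in cluster)
    return None
```

Under these assumptions `voiced_counterpart("st")` yields "zd", matching the symmetry example in the post, and pairs such as "sd" or "tt" are rejected while "tn" (as in 'katna') and "dj" (as in 'cidja') pass. The sketch does not attempt to enumerate the 48 permitted initial clusters.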

12. lojbab: (continuation of 1., from 7. and 9.) So we end up with a language that has some aesthetic appeal for everyone, but perhaps doesn't satisfy everyone; a pleasant cultural tension/balance.

13. ivan: (responding to 12.) And again, don't stress too much on the aesthetic side. It is too subjective. It is up to the person. Let's talk efficiency and ease.

14. lojbab: (responding to 13.) Aesthetics is enormously important, even though subjective. It determines people's first reactions to the language. Efficiency can be quantified, and is more objective, as you say. But languages need some minimum redundancy and I suspect that we don't know what that minimum is. So pushing too hard in this direction might give a language that is too efficient to be practical (Anyone for Speedtalk - Heinlein's language in 'Gulf'?).

15. lojbab: (continuation of 1.) Thus spaghetti becomes 'djarspageti', with the 'dja' from 'cidja', the word for 'food'.

16. ivan: (responding to 15.) "ci" is the Chinese _shi4_, I presume. What is "dja"?

17. lojbab: (responding to 16.) Ivan Derzhanski asked about the Lojban etymologies, and gave 'cidja' as an example word. It is halfway down this list. The following are rough etymologies of a sampling of Lojban words. These are being assembled for eventual publication as a set, but we have to manually reconstruct what the computer-run algorithm did for each word. It is key to remember that we often ran several words from a single language against words from other languages in order to select the word with the highest score. In some cases, this means that the word from a language that 'won' is not the best word for the concept in the language. Instead, subject to a little educated guesswork, we have words that offer a reasonable cognate-like memory hook between the Lojban word and a related source-language word. A second note is that words are Lojbanized phonetically. This can result in some strange-looking spellings; e.g. English and Russian vowels and final consonants often change. I'll schematically outline the information for the first word:

714c katna 82.00 cut
[Algo run #] [Lojban word] [score (0-100)] [English keyword]
[This line is from a summary file of algorithm outputs, prepared manually at the time we made the words.]

kan kat kat kort kas kata
[Lojbanized phonetic forms of the source language words - the order of words is Chinese, English, Hindi, Spanish, Russian, Arabic. We have not yet manually gone back to our paper originals to get the Romanized natural language spellings. Note: some declensional word endings were systematically removed to get a true root. This was to avoid getting a false recognition score solely from the declension. The stop components of affricates were removed for the same reason. There were a few other systematic a priori modifications to the source language words that I can respond to if anyone has questions about a word. Note that the source word may not be the best word for the concept in the language. We aren't expert in all these languages, and in any case wanted to have a memory hook for the word more than a cognate.]

(cut )
[English keyword from the algorithm output file]

katna 82.00 3 3 3 0 2 4
[Lojban word and score from the output file - there were occasional typos in making the manual summary, which we are only now finding (about 3-4% error rate - we were working quickly and didn't check ourselves well). The 6 digits are scores for the 6 source words, in order. The numbers represent phoneme matches, in order - a score of 1 was considered useless for recognition, and a score of 2 required the phonemes to be adjacent or separated by exactly one phoneme in BOTH source and Lojban. Thus 'kort' from Spanish gets a 0 score even though it has some cognate value.]

714c katna 82.00 cut
  kan kat kat kort kas kata
  (cut ) katna 82.00 3 3 3 0 2 4

714c klaku 60.90 weep (cry)
  ku krai vilap ior plak baka
  (weep ) klaku 60.90 2 2 2 0 3 2

714c krixa 61.30 cry out
  xan krai cila grit kric sarax
  (cry out ) krixa 61.30 2 3 2 2 3 2

714c kulnu 45.20 culture
  uen kalcr sabiat kultur kultur takaf
  uen kalcr sanskrit kultur kultur takaf
  uen kalcr sabiat kultur kultur tarbut
  uen kalcr sanskrit kultur kultur tarbut
  (culture ) kulnu 45.20 2 2 0 4 4 0

714c mitre 89.40 meter
  mi mitr mitar metr mietr mitr
  (meter ) mitre 89.40 2 4 4 3 4 4

714c sanmi 62.90 meal
  san mil bojan sen eda taam
  (meal ) sanmi 62.90 3 2 2 2 0 2

714c sefta 60.00 surface/face 2/2o lower score no conflict [the highest score word was used]
  se srfis satax kostad pavierxnast satxa
  (surface ) sefta 60.00 2 2 3 3 0 3

714d bersa 57.00 son
  er san beta ix sin ibn
  er san beta ix sin najl
  (son ) bersa 57.00 2 2 3 0 0 0

714d pruxi 53.00 spirit
  guei spirit pret espiritu dux rux
  (spirit ) pruxi 53.00 2 3 2 3 2 3

714d suksa 61.20 sudden
  su sadn saxsa subit vdruk faja
  su sadn saxsa subit vdruk bagta
  (sudden ) suksa 61.20 2 2 3 2 2 0

714e cidja 61.45 food/feed
  ci fid bojan komid pic gida
  (food ) cidja 61.45 2 2 2 2 0 3

714e fetsi 62.14 female/fem-
  si fem stri feminin jiensk uncau
  (female ) fetsi 62.14 2 2 2 3 2 0

714e spoja 57.51 explode
  ja iksplod vispot eksplo vzriv fajar
  (explode ) spoja 57.51 2 3 3 3 0 2

714f catlu 45.05 look
  ciau luk dek mir smatr tatala
  ciau luk dek ve smatr tatala
  (look at ) catlu 45.05 3 2 0 0 2 3

714f grake 80.70 gram
  ke gram gram gram gram giram
  (gram ) grake 80.70 2 3 3 3 3 3

714f krefu 57.53 recur 3/3o lower score no conflict affix [the 3rd best word was taken to give the word a short affix]
  fu rikr pir rekur pere takrar
  (recur ) krefu 57.53 2 2 0 3 2 2

714f lijda 42.72 religion (relig-)
  jiau rilij darm relixio religi din
  (religious ) lijda 42.72 2 3 2 2 2 0

714f mlana 54.29 side/lateral 4/4o lower score no conflict affix
  mian latrl satax lad starana janib
  mian latrl bagal lad starana janib
  (side ) mlana 54.29 3 2 2 2 3 2

714f rinju 49.08 restrain
  ju ristrein pratiband refren abuzdiv kabax
  ju ristrein pratiband refren sdierjiv kabax
  (restrain ) rinju 49.08 2 3 3 2 0 0
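[Illustrative sketch, not part of the original discussion.] The per-language digits in the listing above are described as counts of phoneme matches "in order", with 1 treated as useless and 2 accepted only when the two matching phonemes are adjacent or separated by exactly one phoneme in both words. One way to read that is as a longest-common-subsequence count with a closeness test for two-phoneme matches. The Python below is a sketch of that reading only: it is not the project's actual algorithm, it treats each letter of a Lojbanized form as one phoneme, it does not attempt the population-weighted total score (the weights are not given), and all function names are hypothetical.

```python
# One possible reading of the per-language match digits (an assumption,
# not the original program). Each letter is treated as one phoneme.

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def has_close_pair(src: str, loj: str, max_gap: int = 1) -> bool:
    """True if two phonemes match in order and sit at most max_gap apart
    (adjacent, or separated by one phoneme) in BOTH words."""
    for i in range(len(src)):
        for j in range(i + 1, min(i + 2 + max_gap, len(src))):
            for k in range(len(loj)):
                for m in range(k + 1, min(k + 2 + max_gap, len(loj))):
                    if src[i] == loj[k] and src[j] == loj[m]:
                        return True
    return False


def match_score(src: str, loj: str) -> int:
    """Per-language digit: in-order matches, with 1 (and distant 2) scored 0."""
    n = lcs_length(src, loj)
    if n >= 3:
        return n
    if n == 2 and has_close_pair(src, loj):
        return 2
    return 0


def match_row(sources, loj):
    """Digits for a list of source forms, in the order given."""
    return [match_score(s, loj) for s in sources]
```

Under this reading, `match_row(["kan", "kat", "kat", "kort", "kas", "kata"], "katna")` gives `[3, 3, 3, 0, 2, 4]`, which agrees with the 'katna' line in the listing, including the 0 for Spanish 'kort' (its k and t are two phonemes apart). Agreement on a few rows does not show that the original program worked this way.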

Subject: Interlinguistics and Lojban Vocabulary Building

Participants: jsp@milton.u.washington.edu (Jeff Prothero) lojbab@snark.thyrsus.com (Bob LeChevalier) urban@rand.org (Mike Urban)

Jeff Prothero: I've been poking through the Linguistics section of the campus library, and found a book which might interest other Loglanists: Trends in Linguistics - Studies and Monographs 42: Interlinguistics: Aspects of the Science of Planned Languages, Klaus Schubert (Ed.), Mouton de Gruyter 1989, ISBN 3-11-011910-2, 350 pg., $66. "This book ... is an invitation to all those interested in languages and linguistics to make themselves acquainted with some recent streams of scientific discussion in the field of planned languages." The book is a collection of fifteen recent papers in interlinguistics. For folks who (like me) haven't been following the field, the bibliographies provide an up-to-date set of pointers into the literature, plus some overviews of it. I think the table of contents gives an adequate idea of the scope and focus of the book:
-----------------------------------------
Part I: Introductions
  Andre Martinet: The proof of the pudding
  Klaus Schubert: Interlinguistics - its aims, its achievements, and its place in language science.
Part II: Planned Languages in Linguistics
  Aleksandr D Dulicenko: Ethnic language and planned language.
  Detlev Blanke: Planned languages - a survey of some of the main problems.
  Sergej N Kuznecov: Interlinguistics: a branch of applied linguistics?
Part III: Language Design and Language Change
  Dan Maxwell: Principles for constructing Planned Languages
  Francois Lo Jacomo: Optimization in language planning
  Claude Piron: A few notes on the evolution of Esperanto
Part IV: Sociolinguistics and Psycholinguistics
  Jonathan Pool - Bernard Grofman: Linguistic artificiality and cognitive competence
  Claude Piron: Who are the speakers of Esperanto
  Tazio Carlevaro: Planned auxiliary language and communicative competence.
Part V: The Language of Literature
  Manuel Halvelik: Planning nonstandard language
  Pierre Janton: If Shakespeare had written in Esperanto
Part VI: Grammar
  Probal Dasgupta: Degree words in Esperanto and categories in Universal Grammar
  Klaus Schubert: An unplanned development in planned languages.
Part VII: Terminology and Computational Lexicography
  Wera Blanke: Terminological standardization - its roots and fruits in planned languages
  Rudiger Eichholz: Terminics in the interethnic language
  Victor Sadler: Knowledge-driven terminography for machine translation
-----------------------------
I'm not a linguist, and won't attempt to review the book from a linguistics point of view, but I will highlight some points of particular interest to Loglanists: First, there is some mention of Loglan (and the thousand-odd other artificial language projects to date), but the bulk of the focus is on Esperanto, for the simple reason that 99.9% of fluent planned-language users speak Esperanto, and a similar percentage of the written-text corpus from the planned language community is in Esperanto. (Any Loglanists who cannot tolerate mention of That Language are invited to stop reading at this point. :-) Second, I (and perhaps most Loglanists) was unaware of the Distributed Language Translation project, which seems to be of considerable potential interest to Loglanists. The following is quoted for comment: "Distributed Language Translation is the name of a long-term research and development project carried out by the BSO software house in Utrecht with funding from the Netherlands Ministry of Economic Affairs. For the present seven year period (1985-1991) it has a budget of 17 million guilders... Although much larger in size than earlier attempts, DLT started off as just another project of the second stage, using Esperanto as its intermediate language.
Esperanto had been judged suitable for this purpose because of its highly regular syntax and morphology and because its agglutinative nature promised an especially efficient possibility of morpheme-based coding of messages for network transmission. During the course of the first years of the large-scale practical development, however, the role of Esperanto in the DLT system increased substantially. The intermediate language took over more and more processes originally designed to be carried out either in the source or in the target languages of the multilingual system. When I consider the DLT system to be one step more highly developed than the earlier implementations involving Esperanto, it is because the increase in the role of Esperanto was due to intrinsic qualities of Esperanto as a planned language. In other words, Esperanto is in DLT no longer treated as any other language (which incidentally has a somewhat more computer-friendly grammar than other languages), but it is now used in DLT for a large part of the overall translation process because of its special features as a planned language. Some facets of this complex application are discussed by Sadler [in this volume]. "The functions fulfilled in DLT by means of Esperanto are numerous. Generally speaking one can say that since the insight about the usefulness of a planned language's particular features for natural-language processing, the whole DLT system design has tended to move into the Esperanto part of the system all functions that are not specific for particular source or target languages. These are all semantic and pragmatic processes of meaning disambiguation, word choice, detection of semantic deixis and reference relations, etc. So-called knowledge of the world has been stored in a lexical knowledge bank and is consulted by a word expert system. All these applications of Artificial Intelligence are in DLT carried out entirely in Esperanto.
Let it be said explicitly: Esperanto does not serve as a programming language (DLT is implemented in Prolog and C), but as a human language which renders the full content of the source text being translated with all its nuances, disambiguates it and conveys it to the second translation step to a target language."

Obviously, the existence of significant amounts of fully disambiguated, machine-processable Esperanto text in such a translation system opens up the possibility of wholesale mechanical translation into Loglan. This would be, obviously, particularly easy if the (currently poorly-defined) semantics of the Loglan affix system were brought into line with the existing semantics of the Esperanto affix system. In this case, bi-directional mechanical translation between the two languages might become quite easy, possibly producing a sort of "instant literature" for the Loglanist.

Building a simple correspondence between Esperanto and Loglan affixes is not as far-fetched an idea as it might first seem. Esperanto, like Loglan, uses a single root-stock of affixes which may be arbitrarily concatenated to form compound words. Where Loglan assigns two forms to (most) concepts, a pred and an affix, Esperanto uniformly assigns only a single affix (cutting the learning load in half!), but this poses no particular intertranslation problem. Loglan affixes are designed to be uniquely resolvable, and Esperanto affixes are not, but this problem has evidently already been solved, hence again poses no particular problem to bi-directional translation. Again, Loglan has a (putatively) unambiguous grammar which Esperanto lacks, but this problem has apparently already been satisfactorily resolved at the Esperanto end.

----------------------------------------

Elsewhere on the affix front, Loglan has a set of affixes, but has barely begun the enormous task of building the compound-word vocabulary. Loglan could learn from Esperanto on (at least) two levels.
Most obviously, bringing the Loglan affix system into semantic correspondence with the Esperanto affix system would open the door to wholesale borrowing of Esperanto compound metaphors, capitalizing on the planned language community's multi-mega-man-year investment. Unless there are sound engineering concerns to the contrary (I see none), there seems no reason to idly re-invent a wheel of this magnitude. This ain't a DOD project, folks :-) There will be language bigots on both sides opposed in principle to any cooperation, of course...

Less obviously, Loglan may be able to benefit from the design knowledge gained from a century's experience with, and linguistic study of, the Esperanto affix system. Klaus Schubert's paper "An unplanned development in planned languages: A study of word grammar" is suggestive. Zamenhof, like Jim Brown, paid no particular attention to word formation in his original design, simply providing a uniform stock of primitives which could be concatenated at will to create new words. Despite this lack of conscious planning, linguistic study of word formation in Esperanto (started by Rene de Saussure - not to be confused with Ferdinand de Saussure - and continued by Sergej Kuznecov and others) shows that this simple syntactic combination rule has supported the development of a systematic set of semantic combination rules. These (unwritten and unconscious but nevertheless universal) semantic combination rules allow the Esperantist, when faced with an unfamiliar compound word, not only to decompose it into (usually) familiar primitives, but also to deduce the meaning of the word somewhat systematically. Recent decades have apparently seen increasingly free use of these facilities.

I won't attempt a summary of these semantic rules here, but will try to give the flavor.
Even though the primitive stock syntactically forms a single neutral pool, it appears that prims [gismu] are semantically treated in word combination by Esperantists as being divided into noun, verb and modifier (combined adverb/adjective) classes, which combine with distinctively different rules. This distinction provides one dimension for sorting prims. A second, orthogonal dimension sorts prims into the categories independent morpheme, declension morpheme, ending (these first three correspond roughly to Loglan's "little words"), affixoid, affix and root (these final three correspond to the Loglan affix set). These affix types combine according to a word-compounding grammar which allows the listener to distinguish (among other things) those compounds whose meaning is directly deducible from the meaning of the component prims, from those compounds whose meaning is metaphorical and must be learned.

If Loglan were to borrow the Esperanto compound vocabulary wholesale, it would of course, willy-nilly, inherit these semantic regularities as well. Otherwise, it might be well to study these regularities and consciously incorporate them in the Loglan vocabulary.

-------------------------------------

lojbab responds:

1. Of the authors, Detlev Blanke is on our mailing list, but probably too recently to have based anything he wrote on our material.

2. Jeff's quoted description of the Netherlands translation project is useful; we were certainly aware of it.

3. The Netherlands project is based on Esperanto - but with a caveat. It uses a formalized 'written' Esperanto form that may be slightly different from spoken forms, but most importantly has disambiguating information encoded in the way the language is written. For example, grouping of modifiers (our 'pretty little girls school' problem) is solved by using extra SPACES to disambiguate which terms modify which.

4. Esperanto's affix system is similarly ambiguous, though not as bad as 1975 Loglan was. I've been given a few examples. Some handy ones are 'romano', which is either a 'novel' (root + no affix) or 'Roman' (root 'Romo' = Rome plus affix -an-), and 'banano', which is either 'banana' or 'bather' (from 'bano' = bath + -an- again). I've been told there are many others. This type of ambiguity presents no problem to a machine translator, which can store hyphens to separate affixes etc.

5. I have not investigated Esperanto's affix system thoroughly, but it is not compatible with Lojban's. (We did ensure at one point that we had gismu, and therefore rafsi, corresponding to each of the Esperanto affixes, though.) Simply put, Lojban has rafsi for EACH of its gismu. Esperanto has only a couple of dozen affixes, and a MUCH larger root set. Some Esperanto affixes have several Lojban equivalents. For example, we now have "na'e", "no'e" and "to'e" for scalar negation of various sorts to correspond to Esperanto's "mal-". Note that Jeff did not mention the large root set in his comments. Most of these roots are combined by concatenation, as in German. But apparently as often as not a new root is coined rather than built by concatenation, since Esperanto has no stigma attached to borrowing. But it is not true that Lojban has two forms while Esperanto only has one.

6. The Esperanto affix/semantic system is probably even more poorly defined than Lojban's. As Jeff said, it is largely intuitive; this means independent of a rule system. However, there are rules; this was mentioned a few times in the recent JL debates between Don Harlow, Athelstan and myself. A guy named Kalocsay apparently wrote up the rules early in this century; they are some 40-50 pages long and most Esperantists never read them, much less learn them. They also are apparently rather freely violated in actual usage; they were descriptive of the known language, not prescriptive. By the way, I suspect that Lojban's compounding semantics is actually better-defined than it seems. I just don't know enough about semantic theory to attempt to write it up. Jim Carter wrote a paper several years ago, which we can probably offer for distribution (or he can), on the semantics of compound place structures. We haven't adopted what he has said whole-hog, but it certainly has been influential.

7. We will probably make extensive use of Esperanto dictionaries when we start our buildup of the Lojban lujvo vocabulary. We thus will not reinvent the wheel in totality. BUT, we cannot do this freely for a large number of reasons.

a) our root set is different from theirs. Some of their compounds will thus not work. The same is true of old Loglan words. We've been held up on translating Jim Carter's Akira story (the one he uses in all his guaspi examples) from old Loglan to Lojban by this need to retranslate all the compounds (which he used extensively and in ways inconsistent with our current, better defined semantics).

b) as mentioned above, our affixes are not in 1-to-1 correspondence.

c) their compounds undoubtedly have a strong European bias. I doubt if it is as bad as Jim Brown's (who built the compound for 'to man a ship' from the metaphor 'man-do'; i.e. 'to do as a man to'. He also did 'kill' as 'dead-make' where 'make' is the concept 'to make ... from materials ...' Sounds more like Frankenstein to me, folks.) But I suspect Esperanto has a few zingers in there. Indeed, I understand the Ido people criticized Esperanto most significantly for its illogical word building, though I don't have details. I also intend to draw heavily from Chinese, which has a more Lojbanic tanru 'metaphor' system since it doesn't distinguish between nouns, verbs, and adjectives. Esperanto tries to get around this by allowing relatively free conversion between these categories, but the root concepts are taken from European languages that more rigidly categorize words, and their compounds probably reflect European semantics.

d) Most importantly, Esperanto words are not gismu. They do not have place structures. Lojban words do, and the affix semantics and compound semantics must be consistent with those place structures. We've covered this in previous discussions in the guise of warning against 'figurative' metaphors that are inconsistent with the place structures.

e) Nope.
Most important is another reason. Lojban is its own language. It should not be an encoded Esperanto any more than it should be an encoded English. I suspect that just like English words, Esperanto words sometimes have diverse multiple context-dependent meanings (though again perhaps less severely than English). We want to minimize this occurrence in Lojban if not prevent it (we may not succeed, but we can try - the rule that every word created must have a place structure is a good start).

The bottom line is that each Esperanto word must be checked for validity, just like any other lujvo proposal, but must also be translated into its closest equivalent Lojban tanru as well, and have a place structure written, etc. The bulk of dictionary writing is this other work. I can and have made new tanru/lujvo (without working out the place structures) at the rate of several per MINUTE for related concepts. Coranth D'Gryphon posted a couple hundred proposals last December (that no one commented on), which he made based on English definitions. We have perhaps 200 PAGES of word proposals to go through. Nearly all of these have no place structures defined or are defined haphazardly.

Lojban also has a multi-man-year investment behind it, though not 'mega'. No, Jeff, we aren't a DOD project, but in terms of people working on it and time spent, we've far exceeded many such projects. And word-building, whether for better or worse, has received the greatest portion of that effort, since that is all most people have felt competent to work on. (Incidentally, the Netherlands project IS a government sponsored project, if not defense-related. If we had several million dollars, I think we'd be well along the way to a translator ourselves. Sheldon Linker has claimed that he could do a Lojban conversing program with heuristic 'understanding' a la HAL 9000 in 5 man-years. This is, in my mind, of comparable difficulty to a heuristic translation program.)
Any comments out there from those who know more than I do on this subject?

-----------------------------------------

Mike Urban: While I am a dyed-in-the-wool Esperantist, I agree that attempting to modify or extend Lojban in imitation of various features of Esperanto would be a mistake (I also lose patience with reformers who want to Lojbanify aspects of Esperanto).

Esperanto's `affix system is ambiguous' to the extent that the language itself is indeed lexically ambiguous. Not only `affixes' but roots themselves are combinable, and so it is possible to come up with endless puns like the `banano' ones you mentioned (`literaturo' might be a tower of letters, i.e., a `litera turo'). Without the careful, but somewhat restrictive, phonological rules that Loglan or Lojban provides, this kind of collision is inevitable.

The borrowing of words in Esperanto (`neologisms') instead of using a compound form is a controversial topic. Claude Piron, in his recent book, La Bona Lingvo, argues (quite convincingly, I think) that the tendency of some Esperantists to use neologisms, usually from French, English, or Greek, is partly based on pedantry, partly based on Eurocentrism (``you mean, everyone doesn't know what `monotona' means?''), partly a Francophone desire to have a separate word for everything, and largely a failure to really Think IN Esperanto, rather than translating. In any case, the distinction in Esperanto between affixes and root words has always been a thin one (Zamenhof mentioned that you can do anything with an affix that you can do with a root), and has been getting even thinner in recent years. Combining by concatenation is every bit as intrinsic to the language as the use of suffixes.

You asked about Ido and Esperanto.
While I have not looked at Ido in a number of years, I recall that the main gripe of the Idists was not that Esperanto was too European - indeed, one of their reforms was to discard Esperanto's rather a priori `correlative' system of relative pronouns (which works rather as if we used `whus' instead of `how' for parallelism with `what/that, where/there') in favor of a more latinate - but unsystematic - assortment of words. If anything, Idists tended towards a more Eurocentric (or Francocentric) view than Esperantists did.

Ido's affix system, however, attempted to be more like Loglan/Lojban. They took the view that predicates did not have intrinsic parts of speech; thus any conversion of meaning through the use of affixes should be `reversible'. Thus, if `marteli' is `to hammer', then `martelo' must mean an act of hammering, not (as in Esperanto) `a hammer'; or, if `martelo' means `a hammer', then `marteli' must mean `to be a hammer'. One result of this is a somewhat larger assortment of affixes than Esperanto possesses (for example, a suffix that would transform a noun root `martelo' to a root meaning `to hammer'), with rather subtle shades of distinction in some cases. The result is a language that is only slightly more logical than Esperanto, but proportionally harder to learn, and no less Eurocentric.

Linguistic tinkerers like the Idists underestimated the organic quality of Esperanto, or of any living language. Indeed, one of the valuable aspects of Lojban or Loglan, if either ever develops a substantial population of fluent speakers, will be to observe the extent to which the common usages of the language diverge from the prescriptive definitions. Such effects will, I think, be easier to isolate and analyze in a language that was created `from whole cloth' than in an a posteriori language like Esperanto.
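
As a rough illustration of the segmentation ambiguity discussed above (`romano', `banano', `literaturo'), the following Python sketch enumerates every way to split a word into morphemes drawn from a tiny lexicon. The lexicon, the function name `segmentations`, and the rule that a word must begin with a root and finish with a grammatical ending are all assumptions made for this example; it does not describe how DLT or any real Esperanto analyzer works.

```python
# Toy lexicon (an assumption for illustration; glosses follow the examples
# quoted in the discussion, not a real Esperanto dictionary).
ROOTS = {
    "roman": "novel",
    "rom": "Rome",
    "banan": "banana",
    "ban": "bath",
    "literatur": "literature",
    "liter": "letter",
    "tur": "tower",
}
SUFFIXES = {"an": "member or inhabitant of"}
ENDINGS = {"o": "noun ending", "a": "adjective ending"}
MORPHEMES = {**ROOTS, **SUFFIXES, **ENDINGS}


def segmentations(word):
    """Return every split of `word` into known morphemes.

    Hypothetical toy grammar: the first morpheme must be a root and the
    last must be a grammatical ending. Each result is a hyphen-joined string.
    """
    results = []

    def extend(rest, parts):
        if not rest:
            # Accept only analyses that finish on a grammatical ending.
            if parts and parts[-1] in ENDINGS:
                results.append("-".join(parts))
            return
        for cut in range(1, len(rest) + 1):
            piece = rest[:cut]
            if piece not in MORPHEMES:
                continue
            if not parts and piece not in ROOTS:
                continue  # a word has to start with a root
            extend(rest[cut:], parts + [piece])

    extend(word, [])
    return sorted(results)


if __name__ == "__main__":
    for word in ("romano", "banano", "literaturo"):
        print(word, segmentations(word))
```

On this toy lexicon each of the three words comes back with two analyses. A form stored with explicit hyphens, such as rom-an-o, needs no search at all: splitting on the hyphen recovers the intended morphemes, which is the kind of written disambiguation lojbab attributes to the machine translator.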