Talk:BPFK Section: PEG Morphology Algorithm


Posted by PierreAbbat on Wed 30 of Jan., 2008 04:32 GMT posts: 324

On Tuesday 29 January 2008 15:42, arj wrote:
> Re: BPFK Section: PEG Morphology Algorithm
>
> Author: arj
>
> How is the lack of pauses following BY treated in the morphology? How
> should it be treated?
>
> CLL is terribly ambiguous on this point.

Pe'i the lack of a pause following BY should not be an error, but if it results in a brivla it should be parsed as a brivla. If the BY is followed by any number of CV cmavo and then a pause, this cannot be parsed as a brivla, but if BY is followed by CCV or CVV or CV'V, it might. So I suggest that if BY is followed by {bu}, the pause be moved after {bu} to keep the letteral without a pause in it.

If the "Y" is stressed, however, the BY should be treated as a letteral, regardless of whether a pause occurs before the next CCV or CVV. "y" is never stressed in brivla, and someone spelling a word may stress all the letter names so that they can be heard clearly.

Pierre



Posted by Anonymous on Wed 30 of Jan., 2008 11:45 GMT

> Author: arj
>
> How is the lack of pauses following BY treated in the morphology?

It accepts Cy without a pause as long as it is not both preceded directly by CV and followed directly by a brivla or by a CVV cmavo, i.e. as long as it cannot be mistaken for a part of a CVCy-lujvo.

{.y'y} never needs a final pause since it can never be the start of a lujvo.

mu'o mi'e xorxes



Posted by Anonymous on Wed 30 of Jan., 2008 11:59 GMT

On 1/29/08, arj wrote:
> Currently, camxes accepts commas anywhere. However, the CLL says that "The comma is used to indicate a syllable break within a word ...". I take this to mean that commas may only occur within words.

It simply ignores all commas.

It also accepts things like {v,a,,,,l,s,i} even though none of {v}, {s} and {} is a possible syllable.

It could be tweaked to accept commas only at syllable breaks, but I'm not sure it's worth legislating on that. camxes also accepts some symbols like "?" or "!" as spaces, even though I don't think CLL mentions them as alternatives for {.}.
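The behaviour described above can be modelled in a few lines. This is only a sketch of the described effect, not code taken from camxes; the function name `normalize` is made up for illustration, and only the "?" and "!" symbols named in the post are handled.

```python
def normalize(text: str) -> str:
    """Model the described leniency: drop every comma, wherever it occurs,
    and treat '?' and '!' as if they were spaces."""
    without_commas = text.replace(",", "")
    # Hypothetical symbol set: only the two symbols mentioned in the post.
    return without_commas.translate(str.maketrans({"?": " ", "!": " "}))


print(normalize("v,a,,,,l,s,i"))
print(normalize("coi!do"))
```

Under this model, commas carry no information at all, which is why a string with commas at impossible syllable breaks is still accepted.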

mu'o mi'e xorxes



Posted by arj on Wed 30 of Jan., 2008 20:57 GMT posts: 953

On Wed, Jan 30, 2008 at 08:43:52AM -0300, Jorge Llambías wrote:
> > Author: arj
> >
> > How is the lack of pauses following BY treated in the morphology?
>
> It accepts Cy without a pause as long as it is not both
> preceded directly by CV and followed directly by a brivla or by
> a CVV cmavo, i.e. as long as it cannot be mistaken for a part
> of a CVCy-lujvo.
>
> {.y'y} never needs a final pause since it can never be the start of
> a lujvo.

Shortly after I wrote the original question, Stephen Pollei on IRC pointed out that chapter 4 of the CLL says that "A cmavo of the form 'Cy' must be followed by a pause unless another 'Cy'-form cmavo follows."

So the CLL is actually unambiguous, and the machine morphology is incorrect.

-- Arnt Richard Johansen http://arj.nvg.org/ Du klickar bara på en ikon så SER DU DITT LOKALA NÄTVERK. --Z mag@zine lovpriser Win95 i nr. 7/95



Posted by JohnCowan on Wed 30 of Jan., 2008 21:07 GMT posts: 149 Arnt Richard Johansen scripsit:

> Shortly after I wrote the original question, Stephen Pollei on IRC
> pointed out that chapter 4 of the CLL says that "A cmavo of the form
> 'Cy' must be followed by a pause unless another 'Cy'-form cmavo
> follows."
>
> So the CLL is actually unambiguous, and the machine morphology is incorrect.

That statement was not meant by me to be the whole truth. At that time there was no effective mechanization of the morphology, and I put it in as a rule of thumb.

-- John Cowan cowan@ccil.org http://www.ccil.org/~cowan But the next day there came no dawn, and the Grey Company passed on into the darkness of the Storm of Mordor and were lost to mortal sight; but the Dead followed them. --"The Passing of the Grey Company"



Posted by JohnCowan on Fri 01 of Feb., 2008 15:22 GMT posts: 149 Pierre Abbat scripsit:

> Pe'i the lack of a pause following BY should not be an error, but if it
> results in a brivla it should be parsed as a brivla. If the BY is followed by
> any number of CV cmavo and then a pause, this cannot be parsed as a brivla,
> but if BY is followed by CCV or CVV or CV'V, it might. So I suggest that if
> BY is followed by {bu}, the pause be moved after {bu} to keep the letteral
> without a pause in it.
>
> If the "Y" is stressed, however, the BY should be treated as a letteral,
> regardless of whether a pause occurs before the next CCV or CVV. "y" is never
> stressed in brivla, and someone spelling a word may stress all the letter
> names so that they can be heard clearly.

This proposal sounds plausible to me.

-- John Cowan cowan@ccil.org http://www.ccil.org/~cowan Thor Heyerdahl recounts his attempt to prove Rudyard Kipling's theory that the mongoose first came to India on a raft from Polynesia. --blurb for Rikki-Kon-Tiki-Tavi



Posted by arj on Mon 04 of Feb., 2008 02:11 GMT posts: 953

On Fri, Feb 01, 2008 at 10:19:35AM -0500, John Cowan wrote:
> Pierre Abbat scripsit:
>
> > Pe'i the lack of a pause following BY should not be an error, but if it
> > results in a brivla it should be parsed as a brivla. If the BY is followed by
> > any number of CV cmavo and then a pause, this cannot be parsed as a brivla,
> > but if BY is followed by CCV or CVV or CV'V, it might. So I suggest that if
> > BY is followed by {bu}, the pause be moved after {bu} to keep the letteral
> > without a pause in it.
> >
> > If the "Y" is stressed, however, the BY should be treated as a letteral,
> > regardless of whether a pause occurs before the next CCV or CVV. "y" is never
> > stressed in brivla, and someone spelling a word may stress all the letter
> > names so that they can be heard clearly.
>
> This proposal sounds plausible to me.

Maybe, but having just joined the Dot Side, I am wary of morphological rules that need a lot of context to reliably disambiguate.

How sure can we be that it is possible for humans to learn Pierre's system? I am receptive to persuasive arguments, as I have been in the past. :-)

(On a side note, is it really legal to stress y in any circumstance?)

-- Arnt Richard Johansen http://arj.nvg.org/ Jeg er nok verdens sydligste sengevæter. Forutsatt at ingen på basen på Sydpolen driver med slikt, da. --Erling Kagge: Alene til Sydpolen



Posted by JohnCowan on Mon 04 of Feb., 2008 02:18 GMT posts: 149 Arnt Richard Johansen scripsit:

> How sure can we be that it is possible for humans to learn Pierre's
> system? I am receptive to persuasive arguments, as I have been in the
> past. :-)

There's what's legal, and then there's what's advisable. The name mgrvgrvlnmsrpr is legal but not advisable.

> (On a side note, is it really legal to stress y in any circumstance?)

Sure, notably in names: dybYtolsrfrz.

-- John Cowan cowan@ccil.org http://www.ccil.org/~cowan As you read this, I don't want you to feel sorry for me, because, I believe everyone will die someday. --From a Nigerian-type scam spam



Posted by PierreAbbat on Mon 04 of Feb., 2008 04:35 GMT posts: 324

On Sunday 03 February 2008 11:17, Arnt Richard Johansen wrote:
> Maybe, but having just joined the Dot Side, I am wary of morphological
> rules that need a lot of context to reliably disambiguate.
>
> How sure can we be that it is possible for humans to learn Pierre's system?
> I am receptive to persuasive arguments, as I have been in the past. :-)

To learn to *speak* it, or to learn to *hear* it? Take for instance /lEkymoi/. I'm not recommending that anyone say that, but that if someone does, it not be received as an error. What it is parsed as depends on another question: whether "y" is allowed in lujvo where it is not required. If "y" is allowed, /lEkymoi/ is {lekymoi} which is the same as {lekmoi}; if not, it is {le ky moi}. If one wants to say {le ky moi}, one should say /lekYmoi/ or /lekY.moi/ or /leky.moi/, but not /lekymoi/.

Pierre



Posted by Anonymous on Tue 05 of Feb., 2008 10:38 GMT

On 1/30/08, Pierre Abbat wrote:
> Pe'i the lack of a pause following BY should not be an error, but if it
> results in a brivla it should be parsed as a brivla. If the BY is followed by
> any number of CV cmavo and then a pause, this cannot be parsed as a brivla,
> but if BY is followed by CCV or CVV or CV'V, it might. So I suggest that if
> BY is followed by {bu}, the pause be moved after {bu} to keep the letteral
> without a pause in it.

That part is already covered by camxes. In fact, if Cy is followed by {bu} or any other CV cmavo, there is no need to pause at all on account of that Cy.

> If the "Y" is stressed, however, the BY should be treated as a letteral,
> regardless of whether a pause occurs before the next CCV or CVV. "y" is never
> stressed in brivla, and someone spelling a word may stress all the letter
> names so that they can be heard clearly.

This is currently not covered by camxes, which only recognizes A, E, I, O, U (or space after a brivla) as stress markers. So for example both {LOMYmoi} and {lomYmoi} are parsed as lujvo. L, M and Y are never distinguished from l, m and y.
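To make the case handling concrete, here is a small sketch of the folding just described. It is a model of the stated behaviour only, not an excerpt from camxes, and the name `fold_case` is invented for this illustration.

```python
# Capital A, E, I, O, U are the only letters treated as stress markers;
# every other capital (including Y) is folded to lower case.
STRESS_VOWELS = set("AEIOU")


def fold_case(text: str) -> str:
    """Keep capital A/E/I/O/U as stress marks and lower-case everything else."""
    return "".join(ch if ch in STRESS_VOWELS else ch.lower() for ch in text)


for word in ("LOMYmoi", "lomYmoi"):
    print(word, "->", fold_case(word))
```

After folding, a capital Y leaves no trace in either word, so stress on "y" cannot influence how the string is segmented.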

mu'o mi'e xorxes



Posted by Jorge Llambías on Sun 07 of Dec., 2008 14:26 GMT

On Sun, Dec 7, 2008 at 9:59 AM, arj wrote:
> I'm running Robin's test corpus through both camxes and the official parser to check for discrepancies.

Yes, there are quite a few.

> This is one of the first things I found:
>
> Official: PASS camxes: FAIL
> .i,iai,ii,iai,ion.
>
> The official parser thinks this is a cmene; camxes thinks this is a nonLojbanWord.

I seem to remember that CLL doesn't like triphthongs, but I can't find a quote now.

camxes is OK with triphthongs (eight of them: iai, iau, iei, ioi, uai, uau, uei, uoi), but doesn't like a syllable that ends in a semi-vowel to be immediately followed by another that starts with one, so the problems for camxes are "iai,ii" and "iai,ion".

.i,ia,ii,ia,ion should be fine.
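The check described above can be sketched for a name whose syllable breaks are already marked with commas. This is a simplified model, not the camxes grammar: the helper names are hypothetical, and "ends in a semi-vowel" is taken to mean "ends in one of ai, au, ei, oi".

```python
FALLING = ("ai", "au", "ei", "oi")   # diphthongs that end in a semi-vowel
VOWELS = "aeiouy"

# The eight triphthongs are an i/u glide followed by a falling diphthong.
TRIPHTHONGS = [glide + diph for glide in "iu" for diph in FALLING]


def ends_in_glide(syllable: str) -> bool:
    return syllable.endswith(FALLING)


def starts_with_glide(syllable: str) -> bool:
    return len(syllable) >= 2 and syllable[0] in "iu" and syllable[1] in VOWELS


def glide_clashes(word: str) -> list[tuple[str, str]]:
    """Return adjacent syllable pairs where a final glide meets an initial glide."""
    syllables = [s for s in word.strip(".").split(",") if s]
    return [(a, b) for a, b in zip(syllables, syllables[1:])
            if ends_in_glide(a) and starts_with_glide(b)]


print(TRIPHTHONGS)
print(glide_clashes(".i,iai,ii,iai,ion."))
print(glide_clashes(".i,ia,ii,ia,ion."))
```

On the two names from this post, the sketch flags exactly the pairs "iai,ii" and "iai,ion" in the first and nothing in the second.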

> If there is a good reason why this shouldn't be allowed (I don't see any), then we should add it to the CLL errata.

We don't yet have a clear consensus about what the rules for long vowel clusters should be. (Nor for long consonant clusters, for that matter.)

mu'o mi'e xorxes



Posted by Jorge Llambías on Mon 08 of Dec., 2008 15:53 GMT

The camxes rules for vowels can be summarized as follows:

(1) There are exactly ten possible vocalic syllable nuclei: a, e, i, o, u, ai, au, ei, oi, y

(2) Every syllable MUST have an onset. (Including ".", " ' ", "i" and "u" as special non-C onsets.)

(3) A syllable that ends with a diphthong nucleus cannot be directly followed by one with an i/u-onset.

Therefore, there are only 40 possible syllables without a C, namely:

.a, .e, .i, .o, .u, .ai, .au, .ei, .oi, .y
'a, 'e, 'i, 'o, 'u, 'ai, 'au, 'ei, 'oi, 'y
ia, ie, ii, io, iu, iai, iau, iei, ioi, iy
ua, ue, ui, uo, uu, uai, uau, uei, uoi, uy

With these rules, it is possible to have indefinitely long strings of vowels, but the "middle syllables" will always have to be one of ia, ie, ii, io, iu, (iy), ua, ue, ui, uo, uu, (uy). Falling diphthongs can only occur at the end of a string of vowels, and "single" vowels can only occur at the beginning of a string (in fact preceded by . or ').
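The count of 40 follows directly from rules (1) and (2): four non-C onsets times ten nuclei. The sketch below enumerates them and also derives the twelve "middle syllables" from rule (3). The function names are chosen for this illustration only.

```python
from itertools import product

NUCLEI = ["a", "e", "i", "o", "u", "ai", "au", "ei", "oi", "y"]   # rule (1)
NON_C_ONSETS = [".", "'", "i", "u"]                                 # rule (2)
DIPHTHONGS = {"ai", "au", "ei", "oi"}


def consonantless_syllables() -> list[str]:
    """Every onset + nucleus combination that contains no consonant."""
    return [onset + nucleus for onset, nucleus in product(NON_C_ONSETS, NUCLEI)]


def middle_syllables() -> list[str]:
    """Syllables usable inside a vowel string: they need an i/u onset, and by
    rule (3) they cannot end in a diphthong, since another i/u onset follows."""
    return [glide + n for glide in "iu" for n in NUCLEI if n not in DIPHTHONGS]


print(len(consonantless_syllables()))
print(middle_syllables())
```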

mu'o mi'e xorxes



Posted by arj on Mon 08 of Dec., 2008 16:35 GMT posts: 953

On Mon, Dec 08, 2008 at 12:51:01PM -0300, Jorge Llambías wrote:
> The camxes rules for vowels can be summarized as follows:
>
> (1) There are exactly ten possible vocalic syllable nuclei: a, e, i,
> o, u, ai, au, ei, oi, y
>
> (2) Every syllable MUST have an onset. (Including ".", " ' ", "i" and
> "u" as special non-C onsets.)
>
> (3) A syllable that ends with a diphthong nucleus cannot be directly
> followed by one with an i/u-onset.

What happens if we remove 2) and 3)? Especially 2) is as I understand it rare among natural languages.

-- Arnt Richard Johansen http://arj.nvg.org/ "My speech recognition software may have trouble with ordinary words, but not with ketoprofen." --Magnus Itland



Posted by JohnCowan on Mon 08 of Dec., 2008 17:00 GMT posts: 149 Arnt Richard Johansen scripsit:

> > (2) Every syllable MUST have an onset. (Including ".", " ' ", "i" and
> > "u" as special non-C onsets.)
> >
> > (3) A syllable that ends with a diphthong nucleus cannot be directly
> > followed by one with an i/u-onset.
>
> What happens if we remove 2) and 3)? Especially 2) is as I understand
> it rare among natural languages.

Lifting those constraints means a severe threat to stability and audio-visual isomorphism. We really don't want words like aoaoaoa, and things like ai,iu are too easily mistaken for ai,u or a,iu. We introduced ' into Lojban (Loglan didn't have it) precisely to reduce the risk of such problems.

-- John Cowan cowan@ccil.org http://www.ccil.org/~cowan C'est la` pourtant que se livre le sens du dire, de ce que, s'y conjuguant le nyania qui bruit des sexes en compagnie, il supplee a ce qu'entre eux, de rapport nyait pas. --Jacques Lacan, "L'Etourdit"



Posted by Jorge Llambías on Mon 08 of Dec., 2008 19:06 GMT

On Mon, Dec 8, 2008 at 1:59 PM, John Cowan wrote:
> Arnt Richard Johansen scripsit:
>
>> > (2) Every syllable MUST have an onset. (Including ".", " ' ", "i" and
>> > "u" as special non-C onsets.)
>> >
>> > (3) A syllable that ends with a diphthong nucleus cannot be directly
>> > followed by one with an i/u-onset.
>>
>> What happens if we remove 2) and 3)? Especially 2) is as I understand
>> it rare among natural languages.
>
> Lifting those constraints means a severe threat to stability and
> audio-visual isomorphism. We really don't want words like aoaoaoa,

That particular case is not something I mind much. In fact I used "sincrboa" and "tricrbaobao" in the Little Prince translation, which was done before camxes, and which I guess I'll have to change if I want it to comply. CLL also has "gugdrkorea", although in another place it says that there can't be 5-letter fu'ivla without an apostrophe, implicitly forbidding something like "sprea".

The main reason I went with (2) is to disallow aaa, eee, ooo, and also to simplify the rules for i/u, with their double function as vowel and semi-vowel.

> and
> things like ai,iu are too easily mistaken for ai,u or a,iu.

Yes. Disallowing "ai,ia" and "au,ua" can be seen as a simple extension of the "no double consonant" rule, just like an,na or at,ta are disallowed, since i/u can be seen as consonants there, and disallowing "ai,ua" and "au,ia" is similar to an additional forbidden consonant pair.

mu'o mi'e xorxes



Posted by JohnCowan on Mon 08 of Dec., 2008 20:00 GMT posts: 149 Jorge Llambías scripsit:

> That particular case is not something I mind much. In fact I used
> "sincrboa" and "tricrbaobao" in the Little Prince translation, which
> was done before camxes, and which I guess I'll have to change if I
> want it to comply.

Vowel hiatus is often not stable: it tends to turn into diphthongs. Loglan, in fact, uses "ao" for the diphthong written "au" in Lojban; Loglan "aa", "ae", "au", "ea", "ee", "eo", "eu", "oa", "oe", "oo", "ou" are all instances of hiatus. Adding "'" not only eliminated hiatus, but added lots more vowel pairs (meaning more cmavo and rafsi) to play with. In addition, eliminating things like "tiu" in favor of "ti'u" reduced the chance that the "t" would be palatalized into "tcu".

> CLL also has "gugdrkorea",

Something I have regretted *deeply* since then. "gugdrkore'a" would have been appropriate, or something based on a native name.

-- John Cowan cowan@ccil.org http://ccil.org/~cowan Heckler: "Go on, Al, tell 'em all you know. It won't take long." Al Smith: "I'll tell 'em all we *both* know. It won't take any longer."



Posted by arj on Mon 08 of Dec., 2008 20:36 GMT posts: 953

On Mon, Dec 08, 2008 at 11:59:58AM -0500, John Cowan wrote:
> Arnt Richard Johansen scripsit:
>
> > > (2) Every syllable MUST have an onset. (Including ".", " ' ", "i" and
> > > "u" as special non-C onsets.)
> > >
> > > (3) A syllable that ends with a diphthong nucleus cannot be directly
> > > followed by one with an i/u-onset.
> >
> > What happens if we remove 2) and 3)? Especially 2) is as I understand
> > it rare among natural languages.
>
> Lifting those constraints means a severe threat to stability and
> audio-visual isomorphism.

Hang on a minute. This sounds as if those constraints existed in Lojban to begin with, which AFAIK, they did not. We may need to _introduce_ new constraints to maintain stability and AVI, but it isn't immediately obvious what those should be.

I assert that any new morphological rules that we come up with should be so simple that they can be translated into a human-readable form.

> We really don't want words like aoaoaoa, and
> things like ai,iu are too easily mistaken for ai,u or a,iu. We introduced
> ' into Lojban (Loglan didn't have it) precisely to reduce the risk of
> such problems.

The rules also must not be an insurmountable burden to the speaker. Competent Lojbanists could not get the rule against la/lai/doi in cmene right. How can they internalise the rules that rule out *.i,iai,ii,iai,ion?

-- Arnt Richard Johansen http://arj.nvg.org/ Many familiar with Descartes' work are likely to remember him from philosophy courses as that French guy who was wrong a lot. --Daniel Harbour



Posted by Jorge Llambías on Mon 08 of Dec., 2008 20:36 GMT

On Mon, Dec 8, 2008 at 4:59 PM, John Cowan wrote:
> Vowel hiatus is often not stable: it tends to turn into diphthongs.

How is that measured? Everything in phonology is unstable in the long run, but how do we determine whether some feature is especially unstable?

> Loglan, in fact, uses "ao" for the diphthong written "au" in Lojban;

Yes, I never really understood that. I can understand "ao" eventually turning into "au" but not why anyone would choose "ao" as the starting orthography for that diphthong.

> Loglan "aa", "ae", "au", "ea", "ee", "eo", "eu", "oa", "oe", "oo", "ou"
> are all instances of hiatus.

Yes. All of those except "aa" and "ou" occur in Spanish words, so they are not particularly problematic for me. (Even "aa" occurs in some names of Arabic origin. And "ou" can occur between words.)

> Adding "'" not only eliminated hiatus, but
> added lots more vowel pairs (meaning more cmavo and rafsi) to play with.

That can be seen as a bad thing actually. Lojban has an excess, not a lack of cmavo. :-)

> In addition, eliminating things like "tiu" in favor of "ti'u" reduced
> the chance that the "t" would be palatalized into "tcu".

But that would have meant 170 more monosyllabic cmavo to play with! :-) camxes currently does allow Ci and Cu as syllable onsets though.

mu'o mi'e xorxes



Posted by Jorge Llambías on Mon 08 of Dec., 2008 20:45 GMT

On Mon, Dec 8, 2008 at 5:35 PM, Arnt Richard Johansen wrote:
> How can they internalise the rules that rule out *.i,iai,ii,iai,ion?

It's basically the same rule that rules out *.an,nas.

mu'o mi'e xorxes



Posted by JohnCowan on Mon 08 of Dec., 2008 22:29 GMT posts: 149 Arnt Richard Johansen scripsit:

> Hang on a minute. This sounds as if those constraints existed in Lojban
> to begin with, which AFAIK, they did not. We may need to _introduce_
> new constraints to maintain stability and AVI, but it isn't immediately
> obvious what those should be.

There has never been an official formalized morphology algorithm, so the question of what pre-existed really doesn't arise.

> I assert that any new morphological rules that we come up with should
> be so simple that they can be translated into a human-readable form.

I agree.

> The rules also must not be an insurmountable burden to the
> speaker. Competent Lojbanists could not get the no la/lai/doi
> in cmene right. How can they internalise the rules that rule out
> *.i,iai,ii,iai,ion?

As Jorge says, it's just a matter of avoiding double consonant sounds, if we read the i in Vi and iV (and ditto for u) as pseudo-consonants.

-- John Cowan cowan@ccil.org http://www.ccil.org/~cowan We pledge allegiance to the penguin and to the intellectual property regime for which he stands, one world under Linux, with free music and open source software for all. --Julian Dibbell on Brazil, edited


Posted by PierreAbbat on Tue 09 of Dec., 2008 06:27 GMT posts: 324

On Monday 08 December 2008 10:51:01 Jorge Llambías wrote:
> The camxes rules for vowels can be summarized as follows:
>
> (1) There are exactly ten possible vocalic syllable nuclei: a, e, i,
> o, u, ai, au, ei, oi, y
>
> (2) Every syllable MUST have an onset. (Including ".", " ' ", "i" and
> "u" as special non-C onsets.)

I consider " ' " to be ambisyllabic: it can't be assigned to either syllable, but belongs to both or separates them. This exists in natlangs too: in "narrow" the r can't be assigned to either syllable because neither æ nor ær is a valid word ending (except in geekish, where I pronounce "char" as kær, but I've heard ker for that).

> (3) A syllable that ends with a diphthong nucleus cannot be directly
> followed by one with an i/u-onset.
>
> Therefore, there are only 40 possible syllables without a C, namely:
>
> .a, .e, .i, .o, .u, .ai, .au, .ei, .oi, .y
> 'a, 'e, 'i, 'o, 'u, 'ai, 'au, 'ei, 'oi, 'y
> ia, ie, ii, io, iu, iai, iau, iei, ioi, iy
> ua, ue, ui, uo, uu, uai, uau, uei, uoi, uy

As far as the brivla morphology is concerned, there are no triphthongs; thus "mliau" is two syllables, by default "mlia,u". I consider all sequences of "aeiou" (I'm not sure about sequences with "y") to be valid, with commas implicit after every two or between pairs that aren't valid diphthongs.

Pierre



Posted by Jorge Llambías on Tue 09 of Dec., 2008 11:46 GMT

On Tue, Dec 9, 2008 at 2:07 AM, Pierre Abbat wrote:
> As far as the brivla morphology is concerned, there are no triphthongs;
> thus "mliau" is two syllables, by default "mlia,u".

camxes wouldn't accept it as a valid brivla, because it takes it as one syllable. And CLL implicitly doesn't allow it either, when it says (Ch. 4 Sect. 4):

<< The five letter length distinguishes gismu from lujvo and fu'ivla. (It is possible to have fu'ivla like "spa'i" that are five letters long, but they must have "'"; no gismu contains "'".) >>

("spa'i" is a typo for "spra'i", "spa'i" fails the slinku'i test and besides doesn't have five letters as counted in this chapter.)

> I consider all sequences
> of "aeiou" (I'm not sure about sequences with "y") to be valid, with commas
> implicit after every two or between pairs that aren't valid diphthongs.

But is it possible to hear the difference between ai,a and ai,ia? It seems it would have to be just one of length, which is usually not phonemic in Lojban. And although I might be able to tell "a" and "aa" apart, I don't think I could tell "aaa" apart from "aaaa".

mu'o mi'e xorxes



Posted by Jorge Llambías on Tue 09 of Dec., 2008 11:50 GMT

On Tue, Dec 9, 2008 at 8:45 AM, Jorge Llambías wrote:
> On Tue, Dec 9, 2008 at 2:07 AM, Pierre Abbat wrote:
>>
>> As far as the brivla morphology is concerned, there are no triphthongs;
>> thus "mliau" is two syllables, by default "mlia,u".
>
> camxes wouldn't accept it as a valid brivla, because it takes it as
> one syllable.

Correction: it takes "liau" as one syllable, not the whole "mliau". So "am,liau" for example would be accepted.

mu'o mi'e xorxes



Posted by PierreAbbat on Tue 09 of Dec., 2008 14:15 GMT posts: 324 On Tuesday 09 December 2008 06:45:54 Jorge Llambías wrote:

> On Tue, Dec 9, 2008 at 2:07 AM, Pierre Abbat wrote:
> > As far as the brivla morphology is concerned, there are no triphthongs;
> > thus "mliau" is two syllables, by default "mlia,u".
>
> camxes wouldn't accept it as a valid brivla, because it takes it as
> one syllable. And CLL implicitly doesn't allow it either, when it says
> (Ch. 4 Sect. 4):
>
> <<
> The five letter length distinguishes gismu from lujvo and fu'ivla. (It
> is possible to have fu'ivla like "spa'i" that are five letters long,
> but they must have "'"; no gismu contains "'".)
> >>
>
> ("spa'i" is a typo for "spra'i", "spa'i" fails the slinku'i test and
> besides doesn't have five letters as counted in this chapter.)

I think that's an error in the Book. "sprae" is a valid fu'ivla because it's two syllables and not a slinku'i. So is "spae", but "spa'e" is a slinku'i. And there are five-letter fu'ivla that begin with a vowel and have no apostrophe, such as "aizdo". (Of course, "aizdo kumte" must be distinguished from "ai zdokumte".) What distinguishes gismu is that they have CVCCV or CCVCV form.
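The shape criterion in the last sentence is easy to state as a pattern. The sketch below checks only the CVCCV / CCVCV letter shape; it deliberately ignores which consonant pairs are permissible, so it is a necessary condition for a gismu and not a full test. The names are illustrative.

```python
import re

C = "[bcdfgjklmnprstvxz]"   # the 17 Lojban consonants
V = "[aeiou]"               # "y" never occurs in gismu
GISMU_SHAPE = re.compile(f"{C}{V}{C}{C}{V}|{C}{C}{V}{C}{V}")


def has_gismu_shape(word: str) -> bool:
    """True if the word is exactly five letters in CVCCV or CCVCV order."""
    return GISMU_SHAPE.fullmatch(word) is not None


for word in ("gismu", "klama", "aizdo", "sprae", "spa'i"):
    print(word, has_gismu_shape(word))
```

Under this test "aizdo" and "sprae" are told apart from gismu by shape alone, whatever their status as fu'ivla turns out to be.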

> > I consider all sequences
> > of "aeiou" (I'm not sure about sequences with "y") to be valid, with
> > commas implicit after every two or between pairs that aren't valid
> > diphthongs.
>
> But is it possible to hear the difference between ai,a and ai,ia? It
> seems it would have to be just one of length, which is usually not
> phonemic in Lojban. And although I might be able to tell "a" and "aa"
> apart, I don't think I could tell "aaa" apart from "aaaa".

"aaaa" is four syllables, not a long vowel, so you could distinguish them by changing pitch at each syllable boundary. In Spanish, how do you pronounce "ay, a", "haya", and "halla"?

Pierre



Posted by PierreAbbat on Tue 09 of Dec., 2008 14:16 GMT posts: 324

On Tuesday 09 December 2008 06:50:25 Jorge Llambías wrote:
> On Tue, Dec 9, 2008 at 8:45 AM, Jorge Llambías wrote:
> > On Tue, Dec 9, 2008 at 2:07 AM, Pierre Abbat wrote:
> >> As far as the brivla morphology is concerned, there are no triphthongs;
> >> thus "mliau" is two syllables, by default "mlia,u".
> >
> > camxes wouldn't accept it as a valid brivla, because it takes it as
> > one syllable.
>
> Correction: it takes "liau" as one syllable, not the whole "mliau". So
> "am,liau" for example would be accepted.

I take "am,liau" as two words "a mliau". "almiau" is one word.

Pierre



Posted by Jorge Llambías on Tue 09 of Dec., 2008 14:46 GMT

On Tue, Dec 9, 2008 at 11:14 AM, Pierre Abbat wrote:
> "aaaa" is four syllables, not a long vowel, so you could distinguish them by
> changing pitch at each syllable boundary.

Perhaps you could, but that would require introducing a new feature in Lojban, since pitch is never otherwise phonemic.

> In Spanish, how do you
> pronounce "ay, a", "haya", and "halla"?

*I* pronounce "haya" and "halla" identically (something like Lojban "aja" or "aca"; I don't have any voicing distinction there in Spanish, and it's hard to tell which of the two my y=ll is exactly, something in between), and I pronounce "ay, a" as Lojban "aia". Others will pronounce all three the same, and some people in Spain will pronounce "halla" as something close to Lojban "alia".

But none of that helps to distinguish Lojban "ai,ia" from "ai,a". (I can think of some ways of distinguishing them, but none that involve Lojbanic phonemic features).

I remember a discussion I had about the distinction between "bra,ian" and "brai,an". I don't doubt some people can make and hear a difference (for me it's very hard), but in Lojban we are not required to make or hear any difference between those two valid forms of writing the same name.

mu'o mi'e xorxes



Posted by JohnCowan on Tue 09 of Dec., 2008 15:16 GMT posts: 149 Pierre Abbat scripsit:

> I think that's an error in the Book. "sprae" is a valid fu'ivla because it's
> two syllables and not a slinku'i. So is "spae", but "spa'e" is a slinku'i.

Neither "sprae" nor "spae" is a valid word at all, because "ae" is not a valid vowel sequence.

> And there are five-letter fu'ivla that begin with a vowel and have no
> apostrophe, such as "aizdo".

That's an interesting class of fu'ivla; if it's ever been discussed before, I don't remember it.

> What distinguishes gismu is that they have CVCCV or
> CCVCV form.

In fact yes.

-- John Cowan cowan@ccil.org http://www.ccil.org/~cowan Do I contradict myself? Very well then, I contradict myself. I am large, I contain multitudes. --Walt Whitman, Leaves of Grass



Posted by Jorge Llambías on Tue 09 of Dec., 2008 20:05 GMT

On Mon, Dec 8, 2008 at 7:28 PM, John Cowan wrote:
> Arnt Richard Johansen scripsit:
>
>> I assert that any new morphological rules that we come up with should
>> be so simple that they can be translated into a human-readable form.
>
> I agree.

Here is a simple way to state the camxes vowel cluster rule:

V = a, e, i, o, u, y (full vowels)
S = i, u (semi-vowels)

(1) Two full vowels cannot be adjacent.
(2) Two semivowels cannot be adjacent.

That means two of i/u can be adjacent only if one is playing full vowel and the other is playing semi-vowel. Then the only vowel clusters allowed are those of the form: SVSV...S. At least one V, and then any number of alternating S's and V's
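One way to read the two-line rule as a procedure: a vowel cluster is acceptable if its letters can be assigned strictly alternating V and S roles, where only i and u may take the S role. The sketch below implements just that reading; it is an assumption-laden model, not the camxes grammar, and it does not additionally check which V+S pairs are actual Lojban diphthongs. The name `valid_cluster` is invented here.

```python
FULL = set("aeiouy")   # letters that may act as full vowels (V)
SEMI = set("iu")       # letters that may act as semi-vowels (S)


def valid_cluster(cluster: str) -> bool:
    """True if the letters can alternate between V and S roles, with at least one V."""
    if not cluster or any(ch not in FULL for ch in cluster):
        return False
    # Alternation leaves only two candidate assignments: start on V or start on S.
    for pattern in ("VS", "SV"):
        roles = [pattern[k % 2] for k in range(len(cluster))]
        fits = all(role == "V" or ch in SEMI for role, ch in zip(roles, cluster))
        if fits and "V" in roles:
            return True
    return False


for cluster in ("ia", "iai", "aia", "aiia", "aaa", "iaiiiaiio", "iiaiiiaio"):
    print(cluster, valid_cluster(cluster))
```

The last two inputs are the vowel strings of .i,iai,ii,iai,ion and .i,ia,ii,ia,ion with the commas removed; the first is rejected and the second accepted, in line with the earlier posts.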

mu'o mi'e xorxes



BPFK Section: PEG Morphology Algorithm

Posted by JohnCowan on Tue 09 of Dec., 2008 20:25 GMT posts: 149 Jorge Llambías scripsit: > On Mon, Dec 8, 2008 at 7:28 PM, John Cowan wrote: > > Arnt Richard Johansen scripsit: > > > >> I assert that any new morphological rules that we come up with should > >> be so simple that they can be translated into a human-readable form. > > > > I agree. > > Here is a simple way to state the camxes vowel cluster rule: > > V = a, e, i, o, u, y (full vowels) > S = i, u (semi-vowels) > > (1) Two full vowels cannot be adjacent. > (2) Two semivowels cannot be adjacent.

+1

-- John Cowan http://www.ccil.org/~cowan cowan@ccil.org "After all, would you consider a man without honor wealthy, even if his Dinar laid end to end would reach from here to the Temple of Toplat?" "No, I wouldn't", the beggar replied. "Why is that?" the Master asked. "A Dinar doesn't go very far these days, Master. --Kehlog Albran Besides, the Temple of Toplat is across the street." The Profit



BPFK Section: PEG Morphology Algorithm

Posted by arj on Tue 09 of Dec., 2008 20:25 GMT posts: 953 On Tue, Dec 09, 2008 at 05:04:12PM -0300, Jorge Llambías wrote: > On Mon, Dec 8, 2008 at 7:28 PM, John Cowan wrote: > > Arnt Richard Johansen scripsit: > > > >> I assert that any new morphological rules that we come up with should > >> be so simple that they can be translated into a human-readable form. > > > > I agree. > > Here is a simple way to state the camxes vowel cluster rule: > > V = a, e, i, o, u, y (full vowels) > S = i, u (semi-vowels) > > (1) Two full vowels cannot be adjacent. > (2) Two semivowels cannot be adjacent. > > That means two of i/u can be adjacent only if one is playing full > vowel and the other is playing semi-vowel. Then the only vowel > clusters allowed are those of the form: SVSV...S. At least one > V, and then any number of alternating S's and V's

Good.

This has the interesting consequence that vowel sequences that would not otherwise be legal are permissible when one of the vowels is playing semi-vowel. For example, the name of Ioioui, a character from one of Jon Bing's novels, can be Lojbanized as {.ioiouin.}, while {.ioioun.} is not allowed.

(Yes, the no ou rule has been bugging me for quite some time. If Anglophones can keep e and ei apart, they should be able to handle o and ou.)

-- Arnt Richard Johansen http://arj.nvg.org/ Vacuum cleaners suck. Kings rule. Ice is cool.



BPFK Section: PEG Morphology Algorithm

Posted by Jorge Llambías on Tue 09 of Dec., 2008 21:07 GMT On Tue, Dec 9, 2008 at 5:24 PM, Arnt Richard Johansen wrote: > On Tue, Dec 09, 2008 at 05:04:12PM -0300, Jorge Llambías wrote: >> >> Here is a simple way to state the camxes vowel cluster rule: >> >> V = a, e, i, o, u, y (full vowels) >> S = i, u (semi-vowels) >> >> (1) Two full vowels cannot be adjacent. >> (2) Two semivowels cannot be adjacent. >> >> That means two of i/u can be adjacent only if one is playing full >> vowel and the other is playing semi-vowel. Then the only vowel >> clusters allowed are those of the form: SVSV...S. At least one >> V, and then any number of alternating S's and V's > > Good.

I should have added that the S at the very end can only be an "i" after "a", "e", "o", or an "u" after "a". Not just any S after any V.
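As a rough illustration only (this is not the camxes grammar itself), the rule as stated in this thread, including the restriction on the final S, can be sketched in Python. The name `valid_cluster` and the backtracking approach are assumptions made for this example, and the sketch looks only at a bare run of vowel letters, ignoring commas, apostrophes and neighbouring consonants.

```python
FULL = set("aeiouy")                  # letters that may play full vowel (V)
SEMI = set("iu")                      # letters that may play semi-vowel (S)
FALLING = {"ai", "ei", "oi", "au"}    # a final S must close one of these pairs


def valid_cluster(cluster: str) -> bool:
    """True if some assignment of V/S roles satisfies the rule in the thread."""
    n = len(cluster)
    if n == 0 or any(c not in FULL for c in cluster):
        return False

    def walk(i: int, prev_role, seen_v: bool) -> bool:
        if i == n:
            return seen_v             # at least one full vowel is required
        c = cluster[i]
        # Option 1: c plays full vowel; two V's may not be adjacent.
        if prev_role != "V" and walk(i + 1, "V", True):
            return True
        # Option 2: c plays semi-vowel; two S's may not be adjacent.
        if c in SEMI and prev_role != "S":
            if i == n - 1:
                # A cluster-final S is only "i" after a/e/o or "u" after a.
                return prev_role == "V" and cluster[i - 1:i + 1] in FALLING
            return walk(i + 1, "S", seen_v)
        return False

    return walk(0, None, False)
```

Under these assumptions, `valid_cluster("ioioui")` succeeds (reading the letters as S V S V S V), while `valid_cluster("ioiou")` fails because the final "ou" is not one of the permitted falling pairs.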

> This has the interesting consequence that vowel sequences that would not otherwise be legal are permissible when one of the vowels is playing semi-vowel. For example, the name of Ioioui, a character from one of Jon Bing's novels, can be Lojbanized as {.ioiouin.}, while {.ioioun.} is not allowed. > > (Yes, the no ou rule has been bugging me for quite some time. If Anglophones can keep e and ei apart, they should be able to handle o and ou.)

The missing "eu" and "ou" diphthongs are indeed an ugly gap in the Lojban system.

mu'o mi'e xorxes



BPFK Section: PEG Morphology Algorithm

Posted by JohnCowan on Tue 09 of Dec., 2008 21:32 GMT posts: 149 Arnt Richard Johansen scripsit:

> (Yes, the no ou rule has been bugging me for quite some time. If > Anglophones can keep e and ei apart, they should be able to handle o > and ou.)

Blame it on the anglophones, and specifically the caught-cot-merging American ones. We can keep e and ei apart because e is /E/ (as in DRESS) whereas ei is /eI/ (as in FACE). If we had specified that o was /O/ (as in THOUGHT), we could have made ou /oU ~ @U/ (as in GOAT), and there would be sufficient difference to be safe.

But perhaps half of all Americans have /A/ rather than /O/ in THOUGHT words, which would have made Lojban a and o too similar, so o remains /O ~ oU ~ @U/ and ou remains banned.

As for eu, there is nothing like it in any accent of English (the nearest thing is the /æU/ in the Southern Hemisphere, which is the local representation of /aU ~ AU/), so it was never in the running.

-- How they ever reached any conclusion at all is starkly unknowable to the human mind. http://www.ccil.org/~cowan --"Backstage Lensman", Randall Garrett



BPFK Section: PEG Morphology Algorithm

Posted by PierreAbbat on Wed 10 of Dec., 2008 03:38 GMT posts: 324 On Tuesday 09 December 2008 10:15:16 John Cowan wrote: > Pierre Abbat scripsit: > > I think that's an error in the Book. "sprae" is a valid fu'ivla because > > it's two syllables and not a slinku'i. So is "spae", but "spa'e" is a > > slinku'i. > > Neither "sprae" nor "spae" is a valid word at all, because "ae" is not a > valid vowel sequence.

It's not a valid diphthong, but the example "bang,r,kore,a" in the Book shows that vowel sequences that aren't diphthongs can occur in fu'ivla. Commas make no difference to the identity of a word, so "sprae" is the same as "spra,e".

As vowels in a string of vowels are paired from the left, as long as they form valid diphthongs, "i,iai,i,iai,ion" is pronounced the same as "i,ia,i,i,ia,i,ion". There is a list of valid diphthongs in the Book, but no list of valid triphthongs.

> > And there are five-letter fu'ivla that begin with a vowel and have no > > apostrophe, such as "aizdo". > > That's an interesting class of fu'ivla; if it's ever been discussed before, > I don't remember it.

"iglu" has been discussed, and we concluded that such words are valid whether the cluster is a valid initial or only a valid medial. Other fu'ivla with only two consonants are "a'orne", "iedra", "io'imbe", "jboia", "orka", "uiski", "uitki", "ulmu", and "urci".

Pierre



BPFK Section: PEG Morphology Algorithm

Posted by JohnCowan on Wed 10 of Dec., 2008 03:58 GMT posts: 149 Pierre Abbat scripsit:

> It's not a valid diphthong, but the example "bang,r,kore,a" in the Book > shows that vowel sequences that aren't diphthongs can occur in fu'ivla.

No, it shows that John Cowan should never have put in that damned example.

> Commas make no difference to the identity of a word, so "sprae" is > the same as "spra,e".

Agreed. And both are invalid.

> As vowels in a string of vowels are paired from the left, as long as > they form valid diphthongs, "i,iai,i,iai,ion" is pronounced the same as > "i,ia,i,i,ia,i,ion".

Both of them suck.

> There is a list of valid diphthongs in the Book, but > no list of valid triphthongs.

Should have been.

> "iglu" has been discussed, and we concluded that such words are valid > whether the cluster is a valid initial or only a valid medial. Other > fu'ivla with only two consonants are "a'orne", "iedra", "io'imbe", > "jboia", "orka", "uiski", "uitki", "ulmu", and "urci".

I'm good with all of these.

-- But you, Wormtongue, you have done what you could for your true master. Some reward you have earned at least. Yet Saruman is apt to overlook his bargains. I should advise you to go quickly and remind him, lest he forget your faithful service. --Gandalf John Cowan



BPFK Section: PEG Morphology Algorithm

Posted by PierreAbbat on Wed 10 of Dec., 2008 14:43 GMT posts: 324 On Tuesday 09 December 2008 16:32:15 John Cowan wrote: > Blame it on the anglophones, and specifically the caught-cot-merging > American ones. We can keep e and ei apart because e is /E/ (as in DRESS) > whereas ei is /eI/ (as in FACE). If we had specified that o was /O/ > (as in THOUGHT), we could have made ou /oU ~ @U/ (as in GOAT), and there > would be sufficient difference to be safe. > > But perhaps half of all Americans have /A/ rather than /O/ in THOUGHT > words, which would have made Lojban a and o too similar, so o remains /O ~ > oU ~ @U/ and ou remains banned.

We shouldn't base Lojban phonology on what some dialect of English has. I have no trouble pronouncing "cot", "caught", "coat", "cotte", and "côte" distinctly.

I think it's a bad idea to make "re" and "rei" both number words. I have a similar problem with Spanish; I sometimes mishear "doce" and "trece" as "dos" and "tres" (no one I know is a Castilian) or confuse "sesenta" and "setenta".

If we allow "ou" as a diphthong, will it be treated like "ei" in that "Cou" is a possible rafsi?

> As for eu, there is nothing like it in any accent of English (the > nearest thing is the /æU/ in the Southern Hemisphere, which is the > local representation of /aU ~ AU/), so it was never in the running.

It occurs in Spanish (e.g. neutro). When I coined a word for "Basque country", I took the Basque phrase, but changed the initial "eu" to "au" because "e,u" didn't sound as close to the original.

Pierre



BPFK Section: PEG Morphology Algorithm

Posted by JohnCowan on Wed 10 of Dec., 2008 15:41 GMT posts: 149 Pierre Abbat scripsit:

> We shouldn't base Lojban phonology on what some dialect of English has.

I wasn't justifying the past, just explaining it. Loglan/Lojban has had four and only four falling diphthongs for almost half a century. My sense is that that isn't about to change.

> I think it's a bad idea to make "re" and "rei" both number words.

The hex digits had to be squeezed in in order to make the mnemonic pattern work given the existing cmavo assignments. I agree that this was unfortunate.

> If we allow "ou" as a diphthong, will it be treated like "ei" in that > "Cou" is a possible rafsi?

I suppose it would.

-- We do, doodley do, doodley do, doodley do, John Cowan What we must, muddily must, muddily must, muddily must; Muddily do, muddily do, muddily do, muddily do, http://www.ccil.org/~cowan Until we bust, bodily bust, bodily bust, bodily bust. --Bokonon

Earlier

Posted by arj on Tue 29 of Jan., 2008 20:36 GMT posts: 953 Currently, camxes accepts commas anywhere. However, the CLL says that "The comma is used to indicate a syllable break within a word ...". I take this to mean that commas may only occur within words.

-arj


Re: BPFK Section: PEG Morphology Algorithm

Posted by arj on Tue 29 of Jan., 2008 20:42 GMT posts: 953 How is the lack of pauses following BY treated in the morphology? How should it be treated?

CLL is terribly ambiguous on this point.

> Note that the lerfu words ending in "y" were written (in Example 2.1 and Example 2.2) with pauses after them. It is not strictly necessary to pause after such lerfu words, but failure to do so can in some cases lead to ambiguities: ...

> A safe guideline is to pause after any cmavo ending in "y" unless the next word is also a cmavo ending in "y". The safest and easiest guideline is to pause after all of them.

-arj


Re: BPFK Section: PEG Morphology Algorithm

Posted by arj on Sun 07 of Dec., 2008 12:59 GMT posts: 953 I'm running Robin's test corpus through both camxes and the official parser to check for discrepancies. This is one of the first things I found:

Official: PASS camxes: FAIL .i,iai,ii,iai,ion.

The official parser thinks this is a cmene; camxes thinks this is a nonLojbanWord.

If there is a good reason why this shouldn't be allowed (I don't see any), then we should add it to the CLL errata.

-arj

Posted by rlpowell on Thu 16 of Dec., 2004 20:35 GMT posts: 14214

This grammar classifies words by their morphological class (cmene, gismu, lujvo, fuhivla, cmavo, and non-lojban-word). It does not sort them into grammatical classes (CMENE, BRIVLA, A, BAI, BAhE, ..., ZOhU).

Why not? Mine certainly does.

-Robin


Posted by xorxes on Thu 16 of Dec., 2004 23:18 GMT posts: 1912

Robin: > Re: PEG Morphology Algorithm > This grammar classifies words by their morphological class (cmene, gismu, > lujvo, fuhivla, cmavo, and non-lojban-word). It does not sort them into > grammatical classes (CMENE, BRIVLA, A, BAI, BAhE, ..., ZOhU). > > Why not? Mine certainly does.

Mainly to save myself a lot of typing. :-)

Also, because I want to keep separate the two things which are conceptually different. The transition to grammatical classes could be done as follows:

CMENE <- cmene
BRIVLA <- gismu / lujvo / fuhivla
A <- &cmavo (a / e / o / u / j i)
....
ZOhU <- &cmavo z o h u
CMAVO <- !A !BAI ... !ZOhU cmavo
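The same two-stage idea can be sketched outside PEG. The Python below is a hypothetical illustration, not part of either parser: `grammatical_class` and the `SELMAHO` table are names invented here, and the table lists only the two cmavo classes spelled out in the rules above (A and ZOhU), where a real table would cover every selma'o.

```python
from typing import Optional

# Illustrative subset only: the thread spells out just A and ZOhU.
SELMAHO = {
    "A": {"a", "e", "o", "u", "ji"},
    "ZOhU": {"zo'u"},
}


def grammatical_class(morph_class: str, word: str) -> Optional[str]:
    """Map a (morphological class, word) pair to a grammatical class."""
    if morph_class == "cmene":
        return "CMENE"
    if morph_class in {"gismu", "lujvo", "fuhivla"}:
        return "BRIVLA"
    if morph_class == "cmavo":
        for name, members in SELMAHO.items():
            if word in members:
                return name
        # Mirrors the catch-all rule: a cmavo that is in no listed class.
        return "CMAVO"
    return None  # non-lojban-word
```

Keeping the morphological step separate means the second step is a plain table lookup, which matches the stated goal of conceptual clarity rather than efficiency.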

(I am accepting commas anywhere, things like {b,roda}, {co,i} etc. I'm not clear on what the official comma rules outside of cmene are.)

Of course, an actual parser can jump over some steps in order to be more efficient, but this is intended to be conceptually clear rather than efficient.

mu'o mi'e xorxes



Posted by xorxes on Thu 16 of Dec., 2004 23:42 GMT posts: 1912

I have the rule:

CVV-rafsi


Posted by xorxes on Thu 16 of Dec., 2004 23:53 GMT posts: 1912

> Re: PEG Morphology Algorithm > > I have the rule: > > CVV-rafsi

I don't know what happened to the rest of what I had written...

Let's try again:

CVV-rafsi <- consonant vowel h? vowel

which allows for example {voe} as a rafsi. This is like {vo'e}, of rafsi form but not actually assigned.

The alternative would be to make the rule:

CVV-rafsi <- consonant (vowel h vowel / a i / a u / e i / o i)

So for example {voebra} will then be rejected as a lujvo form, but allowed as a fuhivla.
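To make the difference between the two candidate rules concrete, here is a small Python sketch using regular expressions. It is only an approximation for illustration: the names `PERMISSIVE`, `STRICT` and `is_cvv_rafsi` are invented here, the apostrophe stands in for the grammar's `h`, and the consonant and vowel sets are assumed to be the usual Lojban ones (without "y").

```python
import re

C = "[bcdfgjklmnprstvxz]"   # assumed consonant inventory
V = "[aeiou]"               # rafsi vowels; "y" is deliberately excluded

# First rule: consonant vowel h? vowel
PERMISSIVE = re.compile(f"{C}{V}'?{V}")
# Alternative rule: consonant (vowel h vowel / a i / a u / e i / o i)
STRICT = re.compile(f"{C}(?:{V}'{V}|ai|au|ei|oi)")


def is_cvv_rafsi(s: str, strict: bool) -> bool:
    """Check whether s has CVV-rafsi form under the chosen rule."""
    pattern = STRICT if strict else PERMISSIVE
    return pattern.fullmatch(s) is not None
```

With these definitions, `is_cvv_rafsi("voe", strict=False)` holds but `is_cvv_rafsi("voe", strict=True)` does not, while {vo'e} and {vai} pass under both rules.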

The choice has more drastic consequences for other words:

If {voebra} is of lujvo form, then {zvoebra} fails the slinku'i test and so is not a valid fuhivla; otherwise, it is a valid fuhivla.

Conversely, if {voebra} is a valid lujvo, so is {tozvoebra}. Otherwise {tozvoebra} breaks down as {to zvoebra}.

My preference at the moment is to allow these pseudo-rafsi and pseudo-lujvo because that makes the rules simpler. Any other opinions?

mu'o mi'e xorxes



Posted by Anonymous on Fri 17 of Dec., 2004 00:12 GMT

On Thursday 16 December 2004 18:51, Jorge "Llambías" wrote: > --- wikidiscuss@lojban.org wrote: > > Re: PEG Morphology Algorithm > > > > I have the rule: > > > > CVV-rafsi > > I don't know what happened to the rest of what I had written... > > Let's try again: > > CVV-rafsi <- consonant vowel h? vowel > > which allows for example {voe} as a rafsi. This is like > {vo'e}, of rafsi form but not actually assigned. > > The alternative would be to make the rule: > > CVV-rafsi <- consonant (vowel h vowel / a i / a u / e i / o i) > > So for example {voebra} will then be rejected as a lujvo form, > but allowed as a fuhivla. > > The choice has more drastic consequences for other words: > > If {voebra} is of lujvo form, then {zvoebra} fails the slinku'i > test and so is not a valid fuhivla, otherwise, it is a valid fuhivla. > > Conversely, if {voebra} is a valid lujvo, so is {tozvoebra}. > Otherwise {tozvoebra} breaks down as {to zvoebra}. > > My preference at the moment is to allow these pseudo-rafsi and > pseudo-lujvo because that makes the rules simpler. Any other > opinions?

valfendi does not allow {voe} as a rafsi, thus {voebra} is a fu'ivla, {zvoebra} is also a fu'ivla, and {tozvoebra} breaks up. I put the whole table of adjacent characters in it.

The same applies if the second of these three-letter groups has a non-lujvo vowel pair: {kankua} is a fu'ivla, {ckankua} is a fu'ivla (it means "skunk"), and {packankua} breaks up.

phma -- li ze te'a ci vu'u ci bi'e te'a mu du li ci su'i ze te'a mu bi'e vu'u ci


Posted by xorxes on Fri 17 of Dec., 2004 00:12 GMT posts: 1912

I'm allowing a "y" after any CVC-rafsi no matter what consonant follows. So I allow {selyma'o} as well as {selma'o}.

I'm not sure if there was a rule against this, but the restriction is not required for unambiguity, and implementing it would complicate the rules enormously, so I'm not doing it.

mu'o mi'e xorxes


Posted by xorxes on Fri 17 of Dec., 2004 03:13 GMT posts: 1912

> The same applies if the second of these three-letter groups has a non-lujvo > vowel pair: {kankua} is a fu'ivla, {ckankua} is a fu'ivla (it means "skunk"), > and {packankua} breaks up.

OK, I'm modifying my rules to bring them in line with that.

Another question:

CLL says:

(1) "It is always legal to use the apostrophe (IPA h) sound in pronouncing a comma."

(2) "Commas are never required: no two Lojban words differ solely because of the presence or placement of a comma."

(3) "There exist 16 diphthongs in the Lojban language. ... Diphthongs always constitute a single syllable."

This seems to lead to a contradiction: {kanku,a} can be pronounced like {kanku'a} by (1), and the comma is not required by (2), so it is equivalent to {kankua}. How many different words are among {kanku'a}, {kanku,a} and {kankua}, and which are they?

mu'o mi'e xorxes



Posted by Anonymous on Fri 17 of Dec., 2004 03:13 GMT

On Thursday 16 December 2004 19:33, Jorge "Llambías" wrote: > Another question: > > CLL says: > > (1) "It is always legal to use the apostrophe (IPA h) sound in > pronouncing a comma." > > (2) "Commas are never required: no two Lojban words differ solely > because of the presence or placement of a comma." > > (3) "There exist 16 diphthongs in the Lojban language. ... Diphthongs > always constitute a single syllable." > > This seems to lead to contradiction. If {kanku,a} can be > pronounced like {kanku'a} by (1) and the comma is not required > by (2) so it is equivalent to {kankua}. How many different > words are among {kanku'a}, {kanku,a} and {kankua}, and which > are they?

(1) is a leftover from Loglan, or a confusion because of the Loglan compatibility orthography, or something like that, and is incorrect. Diphthongs are pronounced as one syllable, unless there is a comma between the vowels, but the comma makes no difference to whether it's a valid word, or which one (the question of which one is moot with cmene, since they can be polysemous). So {kankua} and {kanku,a} are the same word, though stressed on different syllables, and {kanku'a} is different. I asked the same question when I was developing valfendi.
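Read this way, word identity simply ignores commas while the apostrophe remains a distinct letter. A minimal Python sketch of that reading follows; `same_word` is a name made up for this example, and it compares spellings only, saying nothing about stress or validity.

```python
def same_word(a: str, b: str) -> bool:
    """Compare two spellings, treating commas as irrelevant to identity."""
    return a.replace(",", "") == b.replace(",", "")
```

So `same_word("kankua", "kanku,a")` is true, while `same_word("kankua", "kanku'a")` is false.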

phma -- S Fa1>+/- !TM M-- K H T-- t? AT++ SY Te- SC- FO- D P !Tz E++ L


Posted by xorxes on Fri 17 of Dec., 2004 03:13 GMT posts: 1912

> > > > (1) "It is always legal to use the apostrophe (IPA h) sound in > > pronouncing a comma." > > > > (2) "Commas are never required: no two Lojban words differ solely > > because of the presence or placement of a comma." > > > > (3) "There exist 16 diphthongs in the Lojban language. ... Diphthongs > > always constitute a single syllable." > > (1) is a leftover from Loglan, or a confusion because of the Loglan > compatibility orthography, or something like that, and is incorrect.

OK, good.

> Diphthongs are pronounced as one syllable, unless there is a comma between > the vowels, but the comma makes no difference to whether it's a valid word, > or which one (the question of which one is moot with cmene, since they can be > polysemous). So {kankua} and {kanku,a} are the same word, though stressed on > different syllables, and {kanku'a} is different.

What happens with things like {prua} vs. {pru,a}? {prua} can't be a valid fuhivla because it has only one syllable, but {pru,a} seems to meet all the requirements for valid fuhivlahood.

> I asked the same question > when I was developing valfendi.

Yes, I vaguely remember, but I didn't remember the answers.

mu'o mi'e xorxes



Posted by rlpowell on Fri 17 of Dec., 2004 09:40 GMT posts: 14214

On Thu, Dec 16, 2004 at 04:12:04PM -0800, wikidiscuss@lojban.org wrote: > Re: PEG Morphology Algorithm > > I'm allowing a "y" after any CVC-rafsi no matter what consonant > follows. So I allow {selyma'o} as well as {selma'o}. > > I'm not sure if there was a rule against this, but the restriction > is not required for unambiguity,

*UUHHHH*.

What you just wrote is "se ly ma'o", so far as I can tell.

-Robin


Posted by Anonymous on Sat 18 of Dec., 2004 00:15 GMT

On Friday 17 December 2004 02:54, Robin Lee Powell wrote: > On Thu, Dec 16, 2004 at 04:12:04PM -0800, wikidiscuss@lojban.org > > wrote: > > Re: PEG Morphology Algorithm > > > > I'm allowing a "y" after any CVC-rafsi no matter what consonant > > follows. So I allow {selyma'o} as well as {selma'o}. > > > > I'm not sure if there was a rule against this, but the restriction > > is not required for unambiguity, > > *UUHHHH*. > > What you just wrote is "se ly ma'o", so far as I can tell.

No it's not. {se ly ma'o} requires a pause between {ly} and {ma'o}.

phma -- ..i toljundi do .ibabo mi'afra tu'a do ..ibabo damba do .ibabo do jinga ..icu'u la ma'atman.


Posted by Anonymous on Sat 18 of Dec., 2004 00:15 GMT

On Thursday 16 December 2004 21:29, Jorge "Llambías" wrote: > --- Pierre Abbat wrote: > > Diphthongs are pronounced as one syllable, unless there is a comma > > between the vowels, but the comma makes no difference to whether it's a > > valid word, or which one (the question of which one is moot with cmene, > > since they can be polysemous). So {kankua} and {kanku,a} are the same > > word, though stressed on different syllables, and {kanku'a} is different. > > What happens with things like {prua} vs. {pru,a}. > {prua} can't be a valid fuhivla because it has only one syllable, > but {pru,a} seems to fill all the requisites for valid fuhivlahood.

The commas are ignored, so {pru,a} is invalid, but {prae} is valid. valfendi currently says that {prua} is invalid but {pru,a} is valid, which is a bug.

phma -- Mes règles mensuelles ont lieu une fois par an. -Les Perles de la médecine


Posted by Anonymous on Sat 18 of Dec., 2004 00:15 GMT

wikidiscuss@lojban.org scripsit: > Re: PEG Morphology Algorithm > > I'm allowing a "y" after any CVC-rafsi no matter what consonant > follows. So I allow {selyma'o} as well as {selma'o}. > > I'm not sure if there was a rule against this, but the restriction > is not required for unambiguity, and implementing it would > complicate the rules enormously, so I'm not doing it.

Currently there is no such thing as an optional y-hyphen; all hyphens (both -y- and -n-/-r-) are either required or forbidden. Whether this matters depends on whether your grammar is intended to be definitional (in which case it has to get this right) or only an implementation (in which case it is allowed to have bugs).

-- John Cowan http://www.ccil.org/~cowan "One time I called in to the central system and started working on a big thick 'sed' and 'awk' heavy duty data bashing script. One of the geologists came by, looked over my shoulder and said 'Oh, that happens to me too. Try hanging up and phoning in again.'" --Beverly Erlebacher


Posted by xorxes on Sat 18 of Dec., 2004 00:15 GMT posts: 1912

> > I'm allowing a "y" after any CVC-rafsi no matter what consonant > > follows. So I allow {selyma'o} as well as {selma'o}. > > > > I'm not sure if there was a rule against this, but the restriction > > is not required for unambiguity, > > *UUHHHH*. > > What you just wrote is "se ly ma'o", so far as I can tell.

That requires a pause after ly. Otherwise, {selyli'a} would be {se ly li'a} too. Since CVCy is obligatory with some following consonants, it won't be ambiguous if it's allowed with any following consonant.

mu'o mi'e xorxes



Posted by xorxes on Sat 18 of Dec., 2004 00:15 GMT posts: 1912

> Currently there is no such thing as an optional y-hyphen; all hyphens > (both -y- and -n-/-r-) are either required or forbidden.

That's what I thought I remembered. The rule makes some sense for the n/r-hyphens, but I don't see the point of it for the y-hyphen.

> Whether > this matters depends on whether your grammar is intended to be > definitional (in which case it has to get this right) or only an > implementation (in which case it is allowed to have bugs).

It's intended to be definitional, but not necessarily in accordance with the current official definition. :-)

Since some things in the grammar definition are changing anyway, it's not a big deal to adjust these small details.

One other thing that bothers me in the official definition is the restriction for ntc/nts/ndj/ndz in lujvo but not in cmene or fuhivla.

Currently the rules as I wrote them handle this as in the official prescription, but this lujvo-only restriction is jarring. If they are not pronounceable, then they should not be allowed in cmene and fuhivla either. If they are pronounceable, they should be allowed in lujvo. (My preference would be the latter.)

mu'o mi'e xorxes



Posted by Anonymous on Sat 18 of Dec., 2004 00:15 GMT

Jorge Llambías scripsit:

> CLL says: > > (1) "It is always legal to use the apostrophe (IPA h) sound in > pronouncing a comma."

The rationale for this is that "ae"/"a,e" (for example) is a sequence that occurs only in "foreign words", and that a "native-speaker" Lojbanist should be able to pronounce it with the nearest "native" analogue, namely "a'e". This is not a Loglan hangover as others have speculated, since Loglan does not have "'"; you simply have to know which Loglan vowel-pairs are diphthongs and which are vowel sequences.

I personally would be quite content if all such "foreign" sequences were forbidden altogether. Can someone easily check to see whether we have used them in fu'ivla?

> (2) "Commas are never required: no two Lojban words differ solely > because of the presence or placement of a comma."

This was done because the contrary rule (as in Loglan) led to absurdities like ai,ai,aiaglu being different from a,ia,ia,iaglu (in pre-Lojban Loglan, "aiaiaiaglu" was read as the former, but in current Loglan it's the latter). This difference seemed to us to be too subtle, and to threaten audio-visual isomorphism.

> This seems to lead to contradiction. If {kanku,a} can be > pronounced like {kanku'a} by (1) and the comma is not required > by (2) so it is equivalent to {kankua}. How many different > words are among {kanku'a}, {kanku,a} and {kankua}, and which > are they?

I take the current position to be that "kanku'a" is a lujvo, and "kankua" and "kanku,a" are different spellings of the same fu'ivla; I would be in favor of forbidding "kanku,a" altogether. Note that its stress accent is very different, KANkua vs. kanKU,a, so this is not just a matter of a glide vs. a full vowel.

-- John Cowan www.ccil.org/~cowan jcowan@reutershealth.com www.reutershealth.com Monday we watch-a Firefly's house, but he no come out. He wasn't home. Tuesday we go to the ball game, but he fool us. He no show up. Wednesday he go to the ball game, and we fool him. We no show up. Thursday was a double-header. Nobody show up. Friday it rained all day. There was no ball game, so we stayed home and we listened to it on-a the radio. --Chicolini


Posted by Anonymous on Sat 18 of Dec., 2004 00:16 GMT

Jorge Llambías scripsit:

> That's what I thought I remembered. The rule makes some sense for > the n/r-hyphens, but I don't see the point of it for the y-hyphen.

Optional rules always complicate things for the user.

> One other thing that bothers me in the official definition is > the restriction for ntc/nts/ndj/ndz in lujvo but not in cmene or > fuhivla.

If that's so, it's an error in description; the ban on these should be language-wide, like the ban on "bb" or "pg", and for the same reason: they threaten audio-visual isomorphism. (Which is not to say that speakers of some languages can't make clear distinctions in all these cases.)

I suspect it's specified for lujvo because it came up in the context of lujvo and its extension to fu'ivla and cmene wasn't thought through. This dates back to Loglan days.

-- LEAR: Dost thou call me fool, boy? John Cowan FOOL: All thy other titles http://www.ccil.org/~cowan thou hast given away: jcowan@reutershealth.com That thou wast born with. http://www.reutershealth.com


Posted by Anonymous on Sat 18 of Dec., 2004 00:16 GMT

On Friday 17 December 2004 07:41, Jorge "Llambías" wrote: > One other thing that bothers me in the official definition is > the restriction for ntc/nts/ndj/ndz in lujvo but not in cmene or > fuhivla.

As I understood it, they are forbidden in all words. {mk}, on the other hand, is permitted at the beginning of a cmene, but not a brivla.

phma -- ..i toljundi do .ibabo mi'afra tu'a do ..ibabo damba do .ibabo do jinga ..icu'u la ma'atman.


Posted by xorxes on Sat 18 of Dec., 2004 00:16 GMT posts: 1912

> > (1) "It is always legal to use the apostrophe (IPA h) sound in > > pronouncing a comma." > > The rationale for this is that "ae"/"a,e" (for example) is a sequence > that occurs only in "foreign words", and that a "native-speaker" Lojbanist > should be able to pronounce it with the nearest "native" analogue, > namely "a'e". This is not a Loglan hangover as others have speculated, > since Loglan does not have "'"; you simply have to know which Loglan > vowel-pairs are diphthongs and which are vowel sequences. > > I personally would be quite content if all such "foreign" sequences > were forbidden altogether. Can someone easily check to see whether we > have used them in fu'ivla?

So the rule you would favor would be something more like: "The vowel pairs aa, ae, ao, ea, ee, eo, eu, oa, oe, oo, ou (with or without intervening commas) are equivalent to a'a, a'e, a'o, e'a, e'e, e'o, e'u, o'a, o'e, o'o, o'u respectively."

But, for example, ua = u,a is always different from u'a.
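As a rough illustration of the rule as worded above, the following Python sketch rewrites the eleven listed pairs to their apostrophe forms and leaves every other sequence alone. The name `normalize_foreign_pairs` is hypothetical and not part of any Lojban tool, and the sketch assumes lowercase input; it is one reading of the proposal, not an agreed rule.

```python
import re

# The eleven vowel pairs named in the proposed rule.
FOREIGN_PAIRS = {"aa", "ae", "ao", "ea", "ee", "eo", "eu", "oa", "oe", "oo", "ou"}

def normalize_foreign_pairs(word):
    """Rewrite each listed pair (with or without a comma) to its apostrophe form."""
    def fix(m):
        first, second = m.group(1), m.group(2)
        # Only listed pairs change; "au" matches the pattern but is kept as-is.
        return first + "'" if first + second in FOREIGN_PAIRS else m.group(0)
    # The lookahead leaves the second vowel unconsumed, so chains like "aea" work.
    return re.sub(r"([aeo]),?(?=([aeou]))", fix, word)

print(normalize_foreign_pairs("a,e"))           # a'e
print(normalize_foreign_pairs("stagrleoxari"))  # stagrle'oxari
print(normalize_foreign_pairs("u,a"))           # u,a
```

Pairs beginning with i or u never match the pattern, which is how {u,a} stays distinct from {u'a}.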

> > (2) "Commas are never required: no two Lojban words differ solely > > because of the presence or placement of a comma." > > This was done because the contrary rule (as in Loglan) led to > absurdities like ai,ai,aiaglu being different from a,ia,ia,iaglu > (in pre-Lojban Loglan, "aiaiaiaglu" was read as the former, but > in current Loglan it's the latter). This difference > seemed to us to be too subtle, and to threaten audio-visual isomorphism.

{aiaiaiaglu} is not currently a valid fu'ivla, because it doesn't have a consonant cluster in the first five letters.

That rule for fu'ivla is also quite odd. I would find it more reasonable either to impose no restriction on the number of letters that can precede the cluster, or to restrict it to a maximum of two vowels before the cluster (that's what happens with lujvo). Allowing three vowels but not a consonant with three vowels is weird, and makes that part of the algorithm unreasonably complicated.

> I take the current position to be that "kanku'a" is a lujvo, and > "kankua" and "kanku,a" are different spellings of the same fu'ivla; > I would be in favor of forbidding "kanku,a" altogether. Note that > its stress accent is very different, KANkua vs. kanKU,a, so this > is not just a matter of a glide vs. a full vowel.

So the rule you would want is something like "commas are not allowed to break what otherwise would be a diphthong in brivla"?

(I think that agrees with what Pierre said re prua/pru,a)

mu'o mi'e xorxes



Posted by Anonymous on Sat 18 of Dec., 2004 00:16 GMT

Jorge Llambías scripsit:

> So the rule you would favor would be something more like: > "The vowel pairs aa, ae, ao, ea, ee, eo, eu, oa, oe, oo, ou > (with or without intervening commas) are equivalent > to a'a, a'e, a'o, e'a, e'e, e'o, e'u, o'a, o'e, o'o, o'u > respectively."

The rule I'd favor is that all of these (including the ones with y, like ay and yo) are erroneous, period.

> {aiaiaiaglu} is not currently a valid fu'ivla, because it > doesn't have a consonant cluster in the first five letters.

Right, but that was not a rule in (at least some versions of) Loglan.

> That rule for fu'ivla is also quite odd. I would find more > reasonable to either impose no restriction on the number of > letters that can precede the cluster, or make the restriction > to be a maximum of two vowels before the cluster (that's what > happens with lujvo).

I have no idea what the motivation was for this rule; it was already a given. I would favor no restriction.

> So the rule you would want is something like "commas are not > allowed to break what otherwise would be a diphthong in brivla"?

I'd favor "Commas are garbage and shouldn't be allowed"; but that may be too radical. How about "Commas are used to clarify pronunciation, not to change it."

-- John Cowan jcowan@reutershealth.com http://www.ccil.org/~cowan O beautiful for patriot's dream that sees beyond the years Thine alabaster cities gleam undimmed by human tears! America! America! God mend thine every flaw, Confirm thy soul in self-control, thy liberty in law! — one of the verses not usually taught in U.S. schools


Posted by xorxes on Sat 18 of Dec., 2004 00:16 GMT posts: 1912

> > That's what I thought I remembered. The rule makes some sense for > > the n/r-hyphens, but I don't see the point of it for the y-hyphen. > > Optional rules always complicate things for the user.

Not always. My impression is that simple algorithms in general produce easier to learn rules. The algorithm to forbid some y-hyphens would be rather complicated. A user should have no trouble in understanding what {selyma'o} means, and there is no need for them to ever produce that form. It's more complicated to learn to recognize {selyma'o} as an error.

> > One other thing that bothers me in the official definition is > > the restriction for ntc/nts/ndj/ndz in lujvo but not in cmene or > > fuhivla. > > If that's so, it's an error in description; the ban on these should be > language-wide, like the ban on "bb" or "pg", and for the same reason: > they threaten audio-visual isomorphism. (Which is not to say that > speakers of some languages can't make clear distinctions in all these > cases.)

I probably misinterpreted this:

"Lojbanized names can begin or end with any permissible consonant pair, not just the 48 initial consonant pairs listed above, and can have consonant triples in any location, as long as the pairs making up those triples are permissible."

That would mean the name {santcos} I used in the Quixote translation ages ago is wrong.
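A minimal sketch of the language-wide reading, in Python: it simply looks for the four triples anywhere in a word, whatever its word class. The function name is made up for illustration.

```python
# The four consonant triples under discussion.
FORBIDDEN_TRIPLES = ("ntc", "nts", "ndj", "ndz")

def forbidden_triples(word):
    """Return the forbidden consonant triples that occur in `word`."""
    bare = word.lower().replace(",", "")  # commas do not separate consonants here
    return [t for t in FORBIDDEN_TRIPLES if t in bare]

print(forbidden_triples("santcos"))  # ['ntc']
print(forbidden_triples("broda"))    # []
```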

mu'o mi'e xorxes



Posted by Anonymous on Sat 18 of Dec., 2004 00:16 GMT

Jorge Llambías scripsit:

> "Lojbanized names can begin or end with any permissible consonant > pair, not just the 48 initial consonant pairs listed above, and > can have consonant triples in any location, as long as the pairs > making up those triples are permissible."

Yes, this should certainly say that the forbidden triples are forbidden in names as well.

-- John Cowan jcowan@reutershealth.com http://www.reutershealth.com http://www.ccil.org/~cowan Henry S. Thompson said, / "Syntactic, structural, Value constraints we / Express on the fly." Simon St. Laurent: "Your / Incomprehensible Abracadabralike / schemas must die!"


Posted by xorxes on Sat 18 of Dec., 2004 00:16 GMT posts: 1912

> > "aa, ae, ao, ea, ee, eo, eu, oa, oe, oo, ou" > > The rule I'd favor is that all of these (including the ones with > y, like ay and yo) are erroneous, period.

I think I will implement that.

> > That rule for fu'ivla is also quite odd. I would find more > > reasonable to either impose no restriction on the number of > > letters that can precede the cluster, or make the restriction > > to be a maximum of two vowels before the cluster (that's what > > happens with lujvo). > > I have no idea what the motivation was for this rule; it was already > a given. I would favor no restriction.

I will implement that too. It simplifies things.

> > So the rule you would want is something like "commas are not > > allowed to break what otherwise would be a diphthong in brivla"? > > I'd favor "Commas are garbage and shouldn't be allowed"; but that > may be too radical.

That would be great.

> How about "Commas are used to clarify > pronunciation, not to change it."

Sounds good.

Should something like {co,i} return an error, or should it be allowed but parsed just like {coi}?

mu'o mi'e xorxes



Posted by pycyn on Sat 18 of Dec., 2004 00:16 GMT posts: 2388

Two minor questions:

Why is {ou} not allowed? I suspect the answer is that some people can't distinguish between it and {o} but that doesn't seem to be a very good reason, since it suggests that some people mispronounce {o} (as I know that Lojbab with his diminished vowel set does).

Does the restriction on {ntc/nts/ndj/ndz} mean that NONE of them can occur? This seems excessive; at most the recognition patterns would require that not all of them occur, that some distinctions are neutralized at this point. It would seem that, for example, both {ntc} and {ndz} (except perhaps before {i}) could occur.


Posted by Anonymous on Sat 18 of Dec., 2004 00:16 GMT

On Friday 17 December 2004 07:45, John Cowan wrote: > I personally would be quite content if all such "foreign" sequences > were forbidden altogether. Can someone easily check to see whether we > have used them in fu'ivla?

Looking at the fu'ivla in jbovlaste, and not checking whether these were actually used anywhere, I find the following:

cipnrxuazine
io'imbe
mandioka
spatrleoxari
spatrxapio
stagrleoxari (I have used this one in a recipe)

This last one is pronounced {stagrle,oxari}, which jbovlaste considers to be a different word.

phma -- Without glasses, I can't even distinguish smells... -Les Perles de la médecine


Posted by Anonymous on Sat 18 of Dec., 2004 00:16 GMT

On Friday 17 December 2004 09:53, John E Clifford wrote: > Does the restriction on {ntc/nts/ndj/ndz} mean that NONE of them can occur? This seems excessive; at most the recognition patterns would require that not all of them occur, that some distinctions are neutralized at this point. It would seem that, for example, both {ntc} and {ndz} (except perhaps before {i}) could both occur.

They are forbidden because they can be confused with {nc/ns/nj/nz}. Spend a year in a southern city and you'll find out why.

phma -- GCS/M d- s-: a+ C++ UL++++$ P+ L+++ E- W+++ N+ o? K? w-- O? M- V- Y++ PGP++ t- 5? X? R- !tv b++ DI !D G e++ h+>---- r- y>+++


Posted by xorxes on Sat 18 of Dec., 2004 00:16 GMT posts: 1912

> On Friday 17 December 2004 07:45, John Cowan wrote: > > I personally would be quite content if all such "foreign" sequences > > were forbidden altogether. Can someone easily check to see whether we > > have used them in fu'ivla? > > Looking at the fu'ivla in jbovlaste, and not checking whether these were > actually used anywhere, I find the following: > cipnrxuazine > io'imbe > mandioka > spatrleoxari > spatrxapio > stagrleoxari (I have used this one in a recipe)

iV and uV would still be allowed, because they occur in cmavo. The only ones affected would be spatrleoxari and stagrleoxari, which could become -lexari, -loxari, -le'oxari, -lioxari or something else.
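The claim is easy to check mechanically. Here is a small Python sketch that scans the six fu'ivla from Pierre's list for the eleven "foreign" pairs; the helper name `foreign_pairs` is hypothetical, and the word list is just the one quoted above, not a fresh jbovlaste query.

```python
# The eleven vowel pairs that would be banned outright.
FOREIGN_PAIRS = {"aa", "ae", "ao", "ea", "ee", "eo", "eu", "oa", "oe", "oo", "ou"}

def foreign_pairs(word):
    """Return every adjacent letter pair in `word` that is a banned vowel pair."""
    bare = word.lower().replace(",", "")  # commas are ignored for this check
    return [bare[i:i + 2] for i in range(len(bare) - 1)
            if bare[i:i + 2] in FOREIGN_PAIRS]

FUHIVLA = ["cipnrxuazine", "io'imbe", "mandioka",
           "spatrleoxari", "spatrxapio", "stagrleoxari"]

affected = [w for w in FUHIVLA if foreign_pairs(w)]
print(affected)  # ['spatrleoxari', 'stagrleoxari']
```

The other four words contain only {ua} and {io}, which fall under uV and iV.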

mu'o mi'e xorxes



Posted by Anonymous on Sat 18 of Dec., 2004 00:16 GMT

John E Clifford scripsit:

> Why is {ou} not allowed? I suspect the answer is that some people can't distinguish between it and {o} but that doesn't seem to be a very good reason, since it suggests that some people mispronounce {o} (as I know that Lojbab with his diminished vowel set does).

Essentially all Americans pronounce long "o" as in "so" as ou, so we ban it. (In British English it's @u, more or less "yu" in Lojban orthography.)

> Does the restriction on {ntc/nts/ndj/ndz} mean > that NONE of them can occur?

Correct. The point is that nc and ntc, ns and nts, nj and ndj, nz and ndz are too easily confused by anglophones, so we ban the three-consonant member of each pair. We probably should have added mps to this list, as illustrated by words like "Hampshire", "Thompson", "glimpse"; mpz isn't a problem because mz is already banned for idiosyncratic JCB reasons.

-- John Cowan jcowan@reutershealth.com http://www.reutershealth.com http://www.ccil.org/~cowan Henry S. Thompson said, / "Syntactic, structural, Value constraints we / Express on the fly." Simon St. Laurent: "Your / Incomprehensible Abracadabralike / schemas must die!"


Posted by pycyn on Sat 18 of Dec., 2004 00:16 GMT posts: 2388

Yeah; I heard that when I actually said them -- and I am not even in a southern environment (well, St. Louis is borderline).


> On Friday 17 December 2004 09:53, John E Clifford wrote: > > Does the restriction on {ntc/nts/ndj/ndz} mean that NONE of them can occur? This seems excessive; at most the recognition patterns would require that not all of them occur, that some distinctions are neutralized at this point. It would seem that, for example, both {ntc} and {ndz} (except perhaps before {i}) could both occur. > > They are forbidden because they can be confused with {nc/ns/nj/nz}. Spend a year in a southern city and you'll find out why. > > phma


Posted by Anonymous on Sat 18 of Dec., 2004 00:16 GMT

Jorge Llambías scripsit:

> Should something like {co,i} return an error, or should it be > allowed but parsed just like {coi}?

An error, I'd say; it's an attempt to change pronunciation.

-- John Cowan jcowan@reutershealth.com www.reutershealth.com www.ccil.org/~cowan I am he that buries his friends alive and drowns them and draws them alive again from the water. I came from the end of a bag, but no bag went over me. I am the friend of bears and the guest of eagles. I am Ringwinner and Luckwearer; and I am Barrel-rider. --Bilbo to Smaug


Posted by pycyn on Sat 18 of Dec., 2004 00:17 GMT posts: 2388

> John E Clifford scripsit: > > > Why is {ou} not allowed? I suspect the answer is that some people can't distinguish between it and {o} but that doesn't seem to be a very good reason, since it suggests that some people mispronounce {o} (as I know that Lojbab with his diminished vowel set does). > > Essentially all Americans pronounce long "o" as in "so" as ou, so we ban it. (In British English it's @u, more or less "yu" in Lojban orthography.)

Relevance? Lojban {o} is supposedly the "Italian," "pure," form. Since most Lojbanists are native speakers of American English (which doesn't differentiate much on this issue) who cannot hit that tone, the best solution was and is to match the corresponding solution for {e}, using the lower, "short," form. I see that CLL doesn't do that, creating yet another asymmetry in the phonology and so allowing {ei} but not {ou}. (Of course, there is a matching asymmetry in allowing {oi} but not {eu}, and I wouldn't want to do away with that, even though native speakers of AE can, and do, produce this in paralinguistic contexts, disgust being the typical case.)

Strictly speaking, as a practical matter rather than a theoretic one, {ei} ought to be disallowed as well, since as a matter of fact it and simple {e} are often confused in even fairly clear contexts (the fact that we have a number of word pairs Ce-Cei that go often in the same places doesn't help, of course). But the root is again regular mispronunciation of the vowel (this time in spite of the CLL prescription).


Posted by xorxes on Sat 18 of Dec., 2004 00:17 GMT posts: 1912

> Jorge Llambías scripsit: > > > Should something like {co,i} return an error, or should it be > > allowed but parsed just like {coi}? > > An error, I'd say; it's an attempt to change pronunciation.

Possible comma rules would be:

(1) Allow commas anywhere at all. They don't affect anything.

(2) Allow commas anywhere except in the middle of a diphthong. So for example the name {i,ain} is illegal, because {iain} must parse as {ia,in}.

(3) Allow commas anywhere in names, but nowhere else.

(4) Allow commas only at permissible syllable boundaries.

(5) ...

The problem with (4) is that then we have to define what the permissible syllable boundaries are. (4) might even be the same as (2). Is {brod,a} allowed? Is {b,roda} allowed? Is {b,r,o,d,a} allowed?

Also, should multiple commas be allowed, as in {b,,r,,o,,d,,a}? They could be used to show slow and careful pronunciation, for example.

I'm inclined to go with (1) at the moment, as I don't see any reasonable restriction rule. I don't think we want the comma rules to be the main part of the morphology algorithm, which is what would happen if we imposed complicated (and quite unnecessary) syllable rules.
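To make the difference between (1) and (2) concrete, here is a hedged Python sketch of what rule (2) would have to check. It assumes vowel strings are paired off from the left, which is what makes {iain} come out as {ia,in}; that pairing and the name `comma_splits_diphthong` are assumptions for illustration only. Under rule (1) the whole check reduces to deleting the commas.

```python
import re

def comma_splits_diphthong(word):
    """True if some comma falls between the two vowels of a left-paired diphthong."""
    bare = word.replace(",", "")
    # Offsets in the comma-free string that lie strictly inside a vowel pair.
    inside = set()
    for m in re.finditer(r"[aeiou]+", bare.lower()):
        for start in range(m.start(), m.end() - 1, 2):
            inside.add(start + 1)
    pos = 0  # number of non-comma characters seen so far
    for ch in word:
        if ch == ",":
            if pos in inside:
                return True
        else:
            pos += 1
    return False

print(comma_splits_diphthong("i,ain"))          # True
print(comma_splits_diphthong("ia,in"))          # False
print(comma_splits_diphthong("b,,r,,o,,d,,a"))  # False
```

Even this small check needs a syllabification policy, which is the complication that argues for (1).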

mu'o mi'e xorxes



Posted by Anonymous on Sat 18 of Dec., 2004 00:17 GMT

On Friday 17 December 2004 12:31, John E Clifford wrote: > Yeah; I heard that when I actually said them -- > and I am not even in a southern environment > (well, St. Louis is borderline).

I meant southern Lojbangug ;) actually I meant to say it in Lojban (le nu ko zvati lo nantca cu nanca) but it came out in English.

phma -- Now I need a magnifier to find my eyeglasses! -Les Perles de la médecine


Posted by Anonymous on Sat 18 of Dec., 2004 00:17 GMT

John E Clifford scripsit:

> Relevance? Lojban {o} is supposedly the > "Italian," "pure," form. since most Lojbanists > are native speakers of American English (which > doesn't differentiate much on this issue)who > cannot hit that tone, the best solution was and > is to match the corresponding solution for {e}, > using the lower, "short," form. I see that CLL > doesn't do that, creating yet another asymmetry > in the phonology

CLL definitely does permit the Italian open "o", also used as the Polish "o". However, this does not help Americans that much, since the short version of this (as in "hot", "pot", "top") has been basically eliminated throughout the U.S. (it persists in Canada), and the long version (as in "awl", "law") survives only in those born east of a certain line and before a certain date. The best option for most Americans is to use their native long "o" sound, and disallow "ou" in Lojban.

-- John Cowan jcowan@reutershealth.com www.reutershealth.com www.ccil.org/~cowan "It's the old, old story. Droid meets droid. Droid becomes chameleon. Droid loses chameleon, chameleon becomes blob, droid gets blob back again. It's a classic tale." --Kryten, Red Dwarf


Posted by xorxes on Sat 18 of Dec., 2004 00:47 GMT posts: 1912

I'm implementing stress marking as follows:

1- Commas are ignored always. So for example BRAli,e is identical to BRAlie and is a valid fuhivla.

2- Case of all consonants is ignored. BRoDa = broda

3- Case is ignored in both cmene and cmavo, because stress is irrelevant for them. {PApiPEtis} is a valid cmene and {la'E'Au} is a valid cmavo form.

4- iV, uV, ai, au, ei, oi are the only vowel pairs allowed. Other sequences give "non-lojban-word". Strings like aiaueiaii are allowed as long as every adjacent pair in them is allowed.

5- Vowel strings are broken in pairs from the left for purposes of counting syllables: ai-au-ei-ai-i has five syllables.

6- Stress on a diphthong is shown by capitalizing the first vowel in ai, au, ei, oi, and the second vowel in iV, uV. The other member of the diphthong is treated as a consonant, i.e. its case is ignored. {Ia} is considered an unstressed syllable, just like {Ba}. {iA} is stressed, like {bA}.

7- Words with wrong stress patterns such as {broDA} or {brIvlA} produce "non-lojban-word".
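The rules above can be sketched in a few lines of Python. This is not the PEG itself, only an illustration of rules 1 and 4 to 7 under some loud assumptions: V means one of a, e, i, o, u; {y} and syllabic consonants are ignored; a word with no capitals at all is treated as acceptable; and stress, when marked, may fall only on the penultimate syllable. All function names are hypothetical.

```python
import re

FALLING = {"ai", "au", "ei", "oi"}  # stress is marked on the first vowel
ALLOWED = FALLING | {a + b for a in "iu" for b in "aeiou"}  # plus iV and uV

def syllable_nuclei(word):
    """Rules 1 and 5: drop commas, then break each vowel string into pairs from the left."""
    bare = word.replace(",", "")
    nuclei = []
    for run in re.findall(r"[aeiouAEIOU]+", bare):
        nuclei.extend(run[i:i + 2] for i in range(0, len(run), 2))
    return nuclei

def pairs_allowed(word):
    """Rule 4: every adjacent vowel pair must be iV, uV, ai, au, ei or oi."""
    bare = word.replace(",", "").lower()
    return all(run[i:i + 2] in ALLOWED
               for run in re.findall(r"[aeiou]+", bare)
               for i in range(len(run) - 1))

def is_stressed(nucleus):
    """Rule 6: which vowel carries the capital depends on the diphthong type."""
    if len(nucleus) == 1 or nucleus.lower() in FALLING:
        return nucleus[0].isupper()
    return nucleus[1].isupper()  # iV, uV: the second vowel carries the stress

def brivla_stress_ok(word):
    """Rule 7 (assumed form): marked stress may fall only on the penultimate syllable."""
    flags = [is_stressed(n) for n in syllable_nuclei(word)]
    marked = [i for i, stressed in enumerate(flags) if stressed]
    return marked in ([], [len(flags) - 2])

print(syllable_nuclei("aiaueiaii"))  # ['ai', 'au', 'ei', 'ai', 'i']
print(brivla_stress_ok("BRAli,e"))   # True
print(brivla_stress_ok("broDA"))     # False
```

Consonant case never enters the computation, which is rule 2; for cmene and cmavo (rule 3) one would simply skip `brivla_stress_ok`.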

Comments?

mu'o mi'e xorxes


Posted by rlpowell on Sat 18 of Dec., 2004 02:59 GMT posts: 14214

On Fri, Dec 17, 2004 at 04:47:52PM -0800, wikidiscuss@lojban.org wrote: > Comments?

I have only one, and it's a meta-comment.

I am pleased beyond words that someone other than me is doing this, and I'm going to take this opportunity to largely ignore the entire proceedings. I really don't want to go anywhere near the morphology if I can help it.

Having said that, I'm going to try to set up the program that builds my parser to snarf the morphology from that page, so it will get at least partially tested.

-Robin


Posted by Anonymous on Sat 18 of Dec., 2004 02:59 GMT

I think the defining document for the morphology should be the algorithm, not the PEG code. (The algorithm is in the valfendi tarball and needs editing for clarity.) In an algorithm, and in the C code that implements the algorithm, one can make a copy of a string, modify it in some way, run a test on it, make another copy, modify it in a different way, and run another test. This is not so easy to do in PEG. Thus making the parser simultaneously check that all the y's in a lujvo are valid and that the stress is in the right place given where the commas are makes it a lot bigger than checking each one separately.

I'm talking without much knowledge of PEG, so if there is a way to do two or three tests without multiplying complexity, please let me know.

phma -- li ze te'a ci vu'u ci bi'e te'a mu du li ci su'i ze te'a mu bi'e vu'u ci


Posted by Anonymous on Sat 18 of Dec., 2004 02:59 GMT

On Fri, Dec 17, 2004 at 09:08:02PM -0500, Pierre Abbat wrote: > I think the defining document for the morphology should be the > algorithm, not the PEG code. (The algorithm is in the valfendi > tarball and needs editing for clarity.)

No. No. No.

That is unacceptable.

Totally unacceptable.

The English description of the morphology is a cute toy. Not a formalism. It needs to die. Quickly. The sooner the better. The world will be a better place the instant every extant copy is expunged.

> In an algorithm, and in the C code that implements the algorithm, > one can make a copy of a string, modify it in some way, run a test > on it, make another copy, modify it in a different way, and run > another test. This is not so easy to do in PEG. Thus making the > parser simultaneously check that all the y's in a lujvo are valid > and that the stress is in the right place given where the commas are > makes it a lot bigger than checking each one separately. > > I'm talking without much knowledge of PEG, so if there is a way to > do two or three tests without multiplying complexity, please let me > know.

Parsing Expression Grammars are just another formalism like Context Free Grammars. Once the grammar is written, there won't be any reason to write your own morphology parser. You'll take a parser generator for the desired language, and you'll run it on the grammar. It will produce code you don't have to test.

If you're considering writing your own parsing code to duplicate the effect of what is described by the grammar, well, you can entertain yourself however you please. But don't ask the rest of us to suffer for it.

-- Jay Kominek


Posted by Anonymous on Sat 18 of Dec., 2004 02:59 GMT

On Friday 17 December 2004 21:22, jkominek@miranda.org wrote: > Parsing Expression Grammars are just another formalism like Context > Free Grammars. Once the grammar is written, there won't be any reason > to write your own morphology parser. You'll take a parser generator > for the desired language, and you'll run it on the grammar. It will > produce code you don't have to test. > > If you're considering writing your own parsing code to duplicate the > effect of what is described by the grammar, well, you can entertain > yourself however you please. But don't ask the rest of us to suffer > for it.

I've already written valfendi, and I did not do it to entertain myself.

phma -- ..i toljundi do .ibabo mi'afra tu'a do ..ibabo damba do .ibabo do jinga ..icu'u la ma'atman.


Posted by Anonymous on Sat 18 of Dec., 2004 05:02 GMT

On Fri, Dec 17, 2004 at 09:41:57PM -0500, Pierre Abbat wrote: > On Friday 17 December 2004 21:22, jkominek@miranda.org wrote: > > Parsing Expression Grammars are just another formalism like Context > > Free Grammars. Once the grammar is written, there won't be any reason > > to write your own morphology parser. You'll take a parser generator > > for the desired language, and you'll run it on the grammar. It will > > produce code you don't have to test. > > > > If you're considering writing your own parsing code to duplicate the > > effect of what is described by the grammar, well, you can entertain > > yourself however you please. But don't ask the rest of us to suffer > > for it.

That didn't come out quite the way I wanted, but, shrug.

> I've already written valfendi, and I did not do it to entertain myself.

No, not many people write parsers by hand any more, for fun, or otherwise. Let's stick with describing the morphology in a fashion readily fed into parser generators, so that developers can get guaranteed correctness easily. An English algorithm description condemns them to duplicating effort for potentially dubious and difficult to verify results.

(As an aside, I'm already working on a PEG parser generator which produces C. It mostly works already, and if there is interest, I can also produce a parser generator/parser combo more suited to interactive debugging of the grammar.)

-- Jay Kominek


Posted by Anonymous on Sat 18 of Dec., 2004 05:02 GMT

Here's the comment of one of valfendi's functions, isslinkuhi, which is used in finding the beginning of a brivla (or rejecting a string as not containing a valid brivla) and in checking for a valid fu'ivla rafsi (according to my rule, which allows many arbitrarily long fu'ivla to have rafsi). Below is my attempt at a PEG translation. How is the translation?

phma


/* A slinku'i, as far as word breaking is concerned, is anything that matches the regex

Craf3*(gim?$|raf4?y)

but does not match the regex

raf3*(gim?$|raf4?y)

where
C matches any consonant,
raf3 matches any 3-letter rafsi,
raf4 matches any 4-letter rafsi,
gim matches any gismu.
Anything after the first 'y' is ignored. It has no effect on where to break the word, only on whether the word is valid. */

slinkuhi <- !(3-letter-rafsi* (gismu? space / long-rafsi? y)) consonant 3-letter-rafsi* (gismu? space / long-rafsi? y)

3-letter-rafsi <- CVV-rafsi / CVC-rafsi / CCV-rafsi
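For comparison, the two regexes from the comment can be tried out directly in Python. This sketch is not valfendi: it approximates raf3, raf4 and gim by letter shape (CVV, CV'V, CVC, CCV and so on) instead of using the real rafsi and gismu lists, and it ignores which consonant pairs are permissible. Those substitutions, and the name `is_slinkuhi`, are assumptions made to keep the example self-contained.

```python
import re

C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# Shape-based stand-ins for the real word lists (an approximation).
RAF3 = f"(?:{C}{V}'?{V}|{C}{V}{C}|{C}{C}{V})"      # CVV, CV'V, CVC, CCV
RAF4 = f"(?:{C}{V}{C}{C}|{C}{C}{V}{C})"            # CVCC, CCVC
GIM = f"(?:{C}{V}{C}{C}{V}|{C}{C}{V}{C}{V})"       # CVCCV, CCVCV
TAIL = f"(?:{GIM}?$|{RAF4}?y)"                     # (gim?$|raf4?y)

WITH_C = re.compile(f"{C}{RAF3}*{TAIL}")           # Craf3*(gim?$|raf4?y)
WITHOUT_C = re.compile(f"{RAF3}*{TAIL}")           # raf3*(gim?$|raf4?y)

def is_slinkuhi(text):
    """True if `text` matches the first regex but not the second."""
    return WITH_C.match(text) is not None and WITHOUT_C.match(text) is None

print(is_slinkuhi("slinku'i"))  # True
print(is_slinkuhi("broda"))     # False
```

One point to keep in mind when comparing with the PEG version: a regex engine will backtrack into `raf3*` if the tail fails, whereas repetition in a PEG is greedy and never gives characters back, so the two need not accept exactly the same strings.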

-- ..i le babzba ba zbasu lo jbazbabu lo babjba


Posted by rlpowell on Sat 18 of Dec., 2004 07:34 GMT posts: 14214

On Fri, Dec 17, 2004 at 09:08:02PM -0500, Pierre Abbat wrote: > I think the defining document for the morphology should be the > algorithm, not the PEG code.

Over my rotting corpse.

If you want something that is definably unambiguous, but not PEG, that's negotiable, but the day that the Lojban community votes to define Lojban with a giant English algorithm description instead of something provably unambiguous is the day I find something else to do with my spare time.

-Robin

-- http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/ Reason #237 To Learn Lojban: "Homonyms: Their Grate!" Proud Supporter of the Singularity Institute - http://singinst.org/


Posted by Anonymous on Sat 18 of Dec., 2004 18:03 GMT

On Saturday 18 December 2004 00:53, Robin Lee Powell wrote: > If you want something that is definably unambiguous, but not PEG, > that's negotiable, but the day that the Lojban community votes to > define Lojban with a giant English algorithm description instead of > something provably unambiguous is the day I find something else to > do with my spare time.

Is a C program sufficiently unambiguous?

xorxes's grammar tells which kind of word a word is, but it requires the word to be already delimited with spaces or periods. valfendi does not require this (except in some places such as the end of cmevla), as long as the stress is indicated in brivla. Like BRKWORDS.TXT, it is designed to take a speech stream and break it into words.

The problem I see with implementing this in PEG is that valfendi bites off a piece by counting syllables after the stress, then checks whether, among other things, the hyphens are in the right place. Is there a way to check one PE against the part of a string that matched another PE?

phma -- Now I need a magnifier to find my eyeglasses! -Les Perles de la médecine


Posted by xorxes on Sat 18 of Dec., 2004 18:03 GMT posts: 1912

> Here's the comment of one of valfendi's functions, isslinkuhi, which is used in finding the beginning of a brivla (or rejecting a string as not containing a valid brivla) and in checking for a valid fu'ivla rafsi (according to my rule, which allows many arbitrarily long fu'ivla to have rafsi).

I have not incorporated the concept of fu'ivla rafsi yet in my PEG, but I will try to do so once I understand it well. The idea is that a fu'ivla rafsi can be inserted into a lujvo as long as it can be separated with y hyphens: {other-rafsi y fu'ivla-rafsi y other-rafsi} without ambiguities, right?

> Below is my attempt at a PEG translation. How is the translation? > > phma > --- > /* A slinku'i, as far as word breaking is concerned, is anything that matches the regex

> Craf3*(gim?$|raf4?y)

> but does not match the regex

> raf3*(gim?$|raf4?y)

> where > C matches any consonant > raf3 matches any 3-letter rafsi > raf4 matches any 4-letter rafsi > gim matches any gismu. > Anything after the first 'y' is ignored. It has no effect on where to break the word, only on whether the word is valid. */ > > slinkuhi <- !(3-letter-rafsi* (gismu? space / long-rafsi? y)) consonant 3-letter-rafsi* (gismu? space / long-rafsi? y) > > 3-letter-rafsi <- CVV-rafsi / CVC-rafsi / CCV-rafsi

I can't really tell if they are equivalent because I'm not very familiar with C, but it sounds basically right. This is how I handle slinku'i in my PEG:

fuhivla <- !(consonant lujvo) !(consonant final-rafsi) initial-cluster syllable fuhivla-tail

Any lujvo have already been absorbed, otherwise you can just add !lujvo at the beginning.

"!(consonant lujvo) !(consonant final-rafsi)" will reject any string that consists of a consonant+lujvo or a consonant+final-rafsi (e.g. {slinku'i}, {spe'a} or {zbroda}).

mu'o mi'e xorxes



Posted by stevo on Sat 18 of Dec., 2004 18:03 GMT posts: 381

In a message dated 2004-12-17 11:42:05 AM Eastern Standard Time, jcowan@reutershealth.com writes:


> Correct. The point is that nc and ntc, ns and nts, ndj and nj, ndz and nz > are too easily confused by anglophones, so we ban the first of each pair. > We probably should have added mps to this list, as illustrated by words > like "Hampshire", "Thompson", "glimpse"; mpz isn't a problem because mz > is already banned for idiosyncratic JCB reasons. >

I suspect this should be "mbz", not "mpz", which is disallowed because of the different voicing of 'p' and 'z'. "mz" should be allowed, but isn't.

stevo


Posted by xorxes on Sat 18 of Dec., 2004 18:03 GMT posts: 1912

> xorxes's grammar tells which kind of word a word is, but it requires the word > to be already delimited with spaces or periods.

That was the first version. The current version already handles stress marking with caps.

> The problem I see with implementing this in PEG is that valfendi bites off a piece by counting syllables after the stress, then checks whether, among other things, the hyphens are in the right place. Is there a way to check one PE against the part of a string that matched another PE?

Yes, with "&" and "!".

exp <- &exp1 exp2

will succeed only if exp2 starts with or is the start of exp1

exp <- !exp1 exp2

will succeed only if exp2 doesn't start with nor is the start of exp1
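For readers who have not met these operators, here is a toy Python model of them. It is not a parser generator, and the combinator names (`lit`, `seq`, `and_pred`, `not_pred`) are invented for this sketch; each parser takes a text and a position and returns the new position, or None on failure. The key property is that `&` and `!` test a match at the current position without consuming anything.

```python
def lit(s):
    """Match the literal string `s`."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match each parser in turn, threading the position through."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def and_pred(p):
    """&p: succeed if p matches here, but consume nothing."""
    def parse(text, pos):
        return pos if p(text, pos) is not None else None
    return parse

def not_pred(p):
    """!p: succeed if p does not match here, and consume nothing."""
    def parse(text, pos):
        return pos if p(text, pos) is None else None
    return parse

exp1, exp2 = lit("bro"), lit("broda")
print(seq(and_pred(exp1), exp2)("broda", 0))  # 5
print(seq(not_pred(exp1), exp2)("broda", 0))  # None
```

Note that the lookahead and the following expression both start at the same position but may stop at different ones: with `exp1 = lit("brodax")`, the rule `&exp1 exp2` applied to "brodax" still consumes only the five letters matched by `exp2`.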

mu'o mi'e xorxes



Posted by Anonymous on Sat 18 of Dec., 2004 18:03 GMT

On Saturday 18 December 2004 08:39, Jorge "Llambías" wrote:
> I have not incorporated the concept of fu'ivla rafsi yet in my PEG, but
> I will try to do so once I understand it well. The idea is that a
> fu'ivla rafsi can be inserted into a lujvo as long as it can be
> separated with y hyphens: {other-rafsi y fu'ivla-rafsi y other-rafsi}
> without ambiguities, right?

That is correct. It also has to be unambiguous at the beginning of a word, which is what shot down {skalduna}: {le skaldunynai} lexes as {les-kal-dun-y-nai}.

phma -- ..i le babzba ba zbasu lo jbazbabu lo babjba


Posted by Anonymous on Sat 18 of Dec., 2004 18:03 GMT

On Saturday 18 December 2004 08:45, Jorge "Llambías" wrote:
> --- Pierre Abbat wrote:
> > The problem I see with implementing this in PEG is that valfendi bites
> > off a piece by counting syllables after the stress, then checks whether,
> > among other things, the hyphens are in the right place. Is there a way to
> > check one PE against the part of a string that matched another PE?
>
> Yes, with "&" and "!".
>
> exp <- &exp1 exp2
>
> will succeed only if exp2 starts with or is the start of exp1
>
> exp <- !exp1 exp2
>
> will succeed only if exp2 doesn't start with nor is the start of exp1

But how do you check that exp1 and exp2 are identical?

phma -- ..i toljundi do .ibabo mi'afra tu'a do ..ibabo damba do .ibabo do jinga ..icu'u la ma'atman.


Posted by pycyn on Sat 18 of Dec., 2004 18:03 GMT posts: 2388

Do you really mean to say that only a relatively restricted group of Americans say "law" with a low mid back rounded vowel? Aside from a few people in a narrow band across the upper south who add an "r" and a few (I'm not quite sure what the line is) who collapse "aw" with "ah" -- but generally say it more like "aw" -- I can't remember hearing anyone fail to get this sound right (and even those cases get it right but either add to it or change its role in the overall scheme).


> John E Clifford scripsit:
>
> > Relevance? Lojban {o} is supposedly the "Italian," "pure," form. Since most Lojbanists
> > are native speakers of American English (which doesn't differentiate much on this issue) who
> > cannot hit that tone, the best solution was and is to match the corresponding solution for {e},
> > using the lower, "short," form. I see that CLL doesn't do that, creating yet another asymmetry
> > in the phonology
>
> CLL definitely does permit the Italian open "o", also used as the Polish "o".
> However, this does not help Americans that much, since the short version
> of this (as in "hot", "pot", "top") has been basically eliminated throughout
> the U.S. (it persists in Canada), and the long version (as in "awl", "law")
> survives only in those born east of a certain line and before a certain date.
> The best option for most Americans is to use their native long "o" sound,
> and disallow "ou" in Lojban.
>
> --
> John Cowan jcowan@reutershealth.com www.reutershealth.com www.ccil.org/~cowan
> "It's the old, old story. Droid meets droid. Droid becomes chameleon.
> Droid loses chameleon, chameleon becomes blob, droid gets blob back
> again. It's a classic tale." --Kryten, Red Dwarf


Posted by Anonymous on Sat 18 of Dec., 2004 18:04 GMT

John E Clifford scripsit:
> Do you really mean to say that only a relatively
> restricted group of Americans say "law" with a
> low mid back rounded vowel?

Yes, indeed. Westerners and youngsters have merged /O:/ and /A:/.

> and a few (I'm not quite sure what
> the line is) who collapse "aw" with "ah" -- but
> generally say it more like "aw" -- I can't
> remember hearing anyone fail to get this sound
> right (and even those cases get it right but
> either add to it or change its role in the
> overall scheme).

By no means "a few", but the growing majority.

--
John Cowan jcowan@reutershealth.com http://www.ccil.org/~cowan http://www.reutershealth.com
LEAR: Dost thou call me fool, boy?
FOOL: All thy other titles thou hast given away: That thou wast born with.


Posted by pycyn on Sun 19 of Dec., 2004 07:37 GMT posts: 2388

I know that Lojbab has this feature but I can't find anyone else with it, including a fairly large array of youngsters — from 3 up — and Arizonians of all ages, ditto New Mexicans, Californians and Oregonians. What is the source of your claim?


> John E Clifford scripsit:
> > Do you really mean to say that only a relatively
> > restricted group of Americans say "law" with a
> > low mid back rounded vowel?
>
> Yes, indeed. Westerners and youngsters have merged /O:/ and /A:/.
>
> > and a few (I'm not quite sure what
> > the line is) who collapse "aw" with "ah" -- but
> > generally say it more like "aw" -- I can't
> > remember hearing anyone fail to get this sound
> > right (and even those cases get it right but
> > either add to it or change its role in the
> > overall scheme).
>
> By no means "a few", but the growing majority.
>
> --
> John Cowan jcowan@reutershealth.com http://www.ccil.org/~cowan http://www.reutershealth.com
> LEAR: Dost thou call me fool, boy?
> FOOL: All thy other titles thou hast given away: That thou wast born with.


Posted by rlpowell on Sun 19 of Dec., 2004 07:37 GMT posts: 14214

On Sat, Dec 18, 2004 at 08:13:24AM -0500, Pierre Abbat wrote: > On Saturday 18 December 2004 00:53, Robin Lee Powell wrote: > > If you want something that is definably unambiguous, but not > > PEG, that's negotiable, but the day that the Lojban community > > votes to define Lojban with a giant English algorithm > > description instead of something provably unambiguous is the day > > I find something else to do with my spare time. > > Is a C program sufficiently unambiguous?

Absolutely not. It's not a formalism, it's a piece of code.

-Robin


Posted by Anonymous on Sun 19 of Dec., 2004 07:37 GMT

On Saturday 18 December 2004 18:52, Robin Lee Powell wrote: > On Sat, Dec 18, 2004 at 08:13:24AM -0500, Pierre Abbat wrote: > > Is a C program sufficiently unambiguous? > > Absolutely not. It's not a formalism, it's a piece of code.

Is there a formalism that I can translate valfendi into?

The problem appears to be that you and I think differently. When I wrote valfendi, I separated, as well as I could, the operation of splitting a stream of phonemes into words from the operation of determining whether those words are valid. This is easier for me to understand. To do that, I have to find the string that matches one regular expression (or parsing expression or whatever) at the beginning of the remaining text, then check whether that string, no more and no less, matches another expression. For instance, if the text is /dAmymlongEnavau/, the first expression matches /dAmymlongEna/, which the second does not match, although it does match /dAmymlo/. I could write a PEG with two expressions, called lex-brivla and valid-brivla, and then write "brivla <- &lex-brivla valid-brivla", but that would match /dAmymlongEnavau/ and consume /dAmymlo/, even though lex-brivla matched /dAmymlongEna/. Trying to do both checks at once is confusing to me, though you seem to understand it.
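
The difference between the two approaches can be sketched with toy patterns. Everything below is an assumption for illustration: `LEX` and `VALID` are made-up regular expressions standing in for lex-brivla and valid-brivla, not the real morphology, and Python's `re` engine is a backtracking matcher rather than a PEG. The sketch only shows that "&lex valid" accepts a shorter piece than the one the lexing step bit off, while the two-stage check requires the validator to match exactly that piece.

```python
import re

# Hypothetical stand-ins: LEX bites off a run of letters, VALID accepts
# only whole repetitions of "abc".
LEX = re.compile(r"[abc]+")
VALID = re.compile(r"(?:abc)+")

def peg_style(text, pos=0):
    """brivla <- &lex valid: both must match at pos; what VALID matched is consumed."""
    if LEX.match(text, pos) is None:
        return None
    m = VALID.match(text, pos)
    return m.group(0) if m else None

def two_stage(text, pos=0):
    """Bite off what LEX matches, then require VALID to match that piece exactly."""
    m = LEX.match(text, pos)
    if m is None:
        return None
    piece = m.group(0)
    return piece if VALID.fullmatch(piece) else None
```

With the input "abcabcab", `peg_style` returns "abcabc" even though the lexing step matched all of "abcabcab", whereas `two_stage` rejects the input, which mirrors the /dAmymlongEna/ versus /dAmymlo/ situation described above.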

So, is there something sufficiently programming-language-like that I can check that it's doing the same as valfendi, and sufficiently formal that you can check that it's doing the same as the PEG?

phma -- My monthly periods happen once per year. -Les Perles de la médecine


Posted by rlpowell on Sun 19 of Dec., 2004 07:37 GMT posts: 14214

On Sat, Dec 18, 2004 at 03:52:24PM -0800, Robin Lee Powell wrote: > On Sat, Dec 18, 2004 at 08:13:24AM -0500, Pierre Abbat wrote: > > On Saturday 18 December 2004 00:53, Robin Lee Powell wrote: > > > If you want something that is definably unambiguous, but not > > > PEG, that's negotiable, but the day that the Lojban community > > > votes to define Lojban with a giant English algorithm > > > description instead of something provably unambiguous is the > > > day I find something else to do with my spare time. > > > > Is a C program sufficiently unambiguous? > > Absolutely not. It's not a formalism, it's a piece of code.

This is slightly inaccurate: a C program written to *very* strictly act like a PDA (push down automaton) or FSM (finite state machine) would be acceptable, in that I could turn it into a real formalism easily.

-Robin


Posted by rlpowell on Sun 19 of Dec., 2004 07:37 GMT posts: 14214

On Sat, Dec 18, 2004 at 08:01:57PM -0500, Pierre Abbat wrote: > On Saturday 18 December 2004 18:52, Robin Lee Powell wrote: > > On Sat, Dec 18, 2004 at 08:13:24AM -0500, Pierre Abbat wrote: > > > Is a C program sufficiently unambiguous? > > > > Absolutely not. It's not a formalism, it's a piece of code. > > Is there a formalism that I can translate valfendi into?

I don't know what formalisms you know, so I can't really answer that.

> The problem appears to be that you and I think differently.

Yes. I tried to explain this to you when I was asking you questions about the valfendi algorithm some months ago. I gave up after a while; I couldn't figure out what you were talking about.

> When I wrote valfendi, I separated, as well as I could, the > operation of splitting a stream of phonemes into words from the > operation of determining whether those words are valid. This is > easier for me to understand. To do that, I have to find the string > that matches one regular expression (or parsing expression or > whatever) at the beginning of the remaining text, then check > whether that string, no more and no less, matches another > expression.

That is a fundamentally algorithmic way of thinking, yes.

> For instance, if the text is /dAmymlongEnavau/, the first > expression matches /dAmymlongEna/, which the second does not > match, although it does match /dAmymlo/. I could write a PEG with > two expressions, called lex-brivla and valid-brivla, and then > write "brivla <- &lex-brivla valid-brivla", but that would match > /dAmymlongEnavau/ and consume /dAmymlo/, even though lex-brivla > matched /dAmymlongEna/. Trying to do both checks at once is > confusing to me, though you seem to understand it.

You've taken brivla out of context, so it's hard to go from what you just said to something useful, but I assume that the top level is something like:

morphology <- word*

word <- cmavo / brivla / cmene

Then with your brivla, it only consumes dAmymlo, but that's fine because the next run of "word" will start at "ng" (which will probably cause breakage, but that's fine I hope). I have no idea if this helps you; as with the algorithm itself, I'm not certain I actually have any real idea what you're asking.

One way to do what you want, I suppose, would be to add another stage to the parser. Right now, it's got a morphology stage and a grammar stage. You could break morphology into word-grouping and word-recognition stages. I have not the slightest idea why this approach is useful however.

> So, is there something sufficiently programming-language-like that > I can check that it's doing the same as valfendi,

See, umm, that's impossible. There is no way, in general, to compare two non-trivial programs to show that they are doing the same thing. This would be the whole reason I like formalisms.

-Robin


Posted by rlpowell on Sun 19 of Dec., 2004 07:37 GMT posts: 14214

On Sat, Dec 18, 2004 at 08:53:57AM -0500, Pierre Abbat wrote:
> On Saturday 18 December 2004 08:45, Jorge "Llambías" wrote:
> > --- Pierre Abbat wrote:
> > > The problem I see with implementing this in PEG is that
> > > valfendi bites off a piece by counting syllables after the
> > > stress, then checks whether, among other things, the hyphens
> > > are in the right place. Is there a way to check one
> > > PE against the part of a string that matched another PE?
> >
> > Yes, with "&" and "!".
> >
> > exp <- &exp1 exp2
> >
> > will succeed only if exp2 starts with or is the start of exp1
> >
> > exp <- !exp1 exp2
> >
> > will succeed only if exp2 doesn't start with nor is the start of exp1
>
> But how do you check that exp1 and exp2 are identical?

By "identical" I assume you mean "consume exactly the same input".

Within the strict formalism, I'm not sure you can. I will ponder this. You can add code to do it, but that rather defeats the purpose (the only place this trick is used in the current grammar is with zoi, where it is unavoidable).

I suggested a workaround in another mail, however.

-Robin


Posted by rlpowell on Sun 19 of Dec., 2004 07:37 GMT posts: 14214

On Sat, Dec 18, 2004 at 05:39:07AM -0800, Jorge Llambías wrote:
> --- Pierre Abbat wrote:
> > Below is my attempt at a PEG translation. How is the translation?
>
> I can't really tell if they are equivalent because I'm not very
> familiar with C,

Just for the record, there was no C there, just some regular expressions.

-Robin


Posted by Anonymous on Sun 19 of Dec., 2004 07:38 GMT

John E Clifford scripsit: > I know that Lojbab has this feature but I can't > find anyone else with it, including a fairly > large array of youngsters — from 3 up — and > Arizonians of all ages, ditto New Mexicans, > Californians and Oregonians. What is the source > of your claim?

It's a well-known fact. http://cla.calpoly.edu/~jrubba/phon/ipafaq.html is one source picked at random; http://itre.cis.upenn.edu/~myl/languagelog/archives/000836.html is another.

-- Ambassador Trentino: I've said enough. I'm a man of few words. Rufus T. Firefly: I'm a man of one word: scram! --Duck Soup John Cowan


Posted by pycyn on Sun 19 of Dec., 2004 20:43 GMT posts: 2388

Very interesting though hardly evidence that most people or even most westerners or youths lack "aw." In fact, it seems that most have it, even if only paralinguistically. Oddly — from the point of view of the claims given — the only place where I have heard the collapse regularly is in the extreme Northeast, Maine, where for example, "John" (general American "jahn") is "jawn" or even "jawuhn" (the latter to give length to a normally short vowel, I suspect). Of course, here the collapse goes the opposite way from the American norm, presumably influenced by the Canadian pattern — which does seem to be pretty general in Ontario and the Maritimes (though not in BC and the flyover provinces). None of this seems to me a good case for ignoring "aw" as a preferred pronunciation for Lojban {o}, which was the point here.


> John E Clifford scripsit:
> > I know that Lojbab has this feature but I can't
> > find anyone else with it, including a fairly
> > large array of youngsters -- from 3 up -- and
> > Arizonians of all ages, ditto New Mexicans,
> > Californians and Oregonians. What is the source
> > of your claim?
>
> It's a well-known fact.
> http://cla.calpoly.edu/~jrubba/phon/ipafaq.html
> is one source picked at random;
> http://itre.cis.upenn.edu/~myl/languagelog/archives/000836.html
> is another.
>
> --
> Ambassador Trentino: I've said enough. I'm a man of few words.
> Rufus T. Firefly: I'm a man of one word: scram!
> --Duck Soup
> John Cowan


Posted by xorxes on Sun 19 of Dec., 2004 22:33 GMT posts: 1912

I have now added handling of rafsi fuhivla, so except for some minor adjustments I will probably have to make, the morphology PEG is basically ready. Does anyone want to test it?

mu'o mi'e xorxes


Posted by rlpowell on Mon 20 of Dec., 2004 02:12 GMT posts: 14214

On Sun, Dec 19, 2004 at 02:33:25PM -0800, wikidiscuss@lojban.org wrote: > Re: PEG Morphology Algorithm > > I have now added handling of rafsi fuhivla, so except for some > minor adjustments I will probably have to do, the morphology PEG > is basically ready. Anyone wants to test it?

I'll see what I can do. Pierre, can you point me to your latest test suite again?

Which reminds me: I want to make it clear that I'm very happy about Pierre's work on valfendi. I don't think it's the right approach for language definitional purposes (and I told him that right when he started), but that doesn't mean it's worthless, and I don't want to give the impression that I think that.

Pierre's work on valfendi serves two important purposes: it built up a reservoir of expertise in him on the topic, and it got us something to test *against*. Having valfendi's output to compare my own to means we can debug both methods better.

So, Pierre, thank you. And hook me up with those test cases, please.

-Robin

-- http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/ Reason #237 To Learn Lojban: "Homonyms: Their Grate!" Proud Supporter of the Singularity Institute - http://singinst.org/


Posted by Anonymous on Mon 20 of Dec., 2004 02:12 GMT

On Sunday 19 December 2004 20:01, Robin Lee Powell wrote: > On Sun, Dec 19, 2004 at 02:33:25PM -0800, wikidiscuss@lojban.org > > wrote: > > Re: PEG Morphology Algorithm > > > > I have now added handling of rafsi fuhivla, so except for some > > minor adjustments I will probably have to do, the morphology PEG > > is basically ready. Anyone wants to test it? > > I'll see what I can do. Pierre, can you point me to your latest > test suite again?

http://phma.hn.org/Language/valfendi.html There are currently two known bugs: it calls {pru,a} valid but rejects {prua} (I say they're both invalid), and checking of 'y' in lujvo is broken if the "-r" option is specified. I've fixed the second and found what to fix in the first, and I've added a few test cases to the file in the tarball. The new version should be out in a few days.

> Which reminds me: I want to make it clear that I'm very happy about > Pierre's work on valfendi. I don't think it's the right approach > for language definitional purposes (and I told him that right when > he started), but that doesn't mean it's worthless, and I don't want > to give the impression that I think that. > > Pierre's work on valfendi serves two important purposes: it built up > a reservoir of expertise in him on the topic, and it got us > something to test *against*. Having valfendi's output to compare my > own to means we can debug both methods better. > > So, Pierre, thank you. And hook me up with those test cases, > please.

Glad to hear that. I'll change the blurb at the top of the algorithm.

phma -- My monthly periods happen once per year. -Les Perles de la médecine


Posted by rlpowell on Mon 20 of Dec., 2004 02:13 GMT posts: 14214

On Sun, Dec 19, 2004 at 08:23:02PM -0500, Pierre Abbat wrote: > On Sunday 19 December 2004 20:01, Robin Lee Powell wrote: > > On Sun, Dec 19, 2004 at 02:33:25PM -0800, wikidiscuss@lojban.org > > > > wrote: > > > Re: PEG Morphology Algorithm > > > > > > I have now added handling of rafsi fuhivla, so except for some > > > minor adjustments I will probably have to do, the morphology > > > PEG is basically ready. Anyone wants to test it? > > > > I'll see what I can do. Pierre, can you point me to your latest > > test suite again? > > http://phma.hn.org/Language/valfendi.html

OK, there's a testdata.txt file there, but no indications as to how the various lines should come out (i.e. which ones are valid and which aren't and so on). Is that data around somewhere?

-Robin


Posted by Anonymous on Mon 20 of Dec., 2004 02:13 GMT

On Friday 17 December 2004 07:45, John Cowan wrote: > I personally would be quite content if all such "foreign" sequences > were forbidden altogether. Can someone easily check to see whether we > have used them in fu'ivla?

Checking the IRC log, I find the following: {cafnee}: typo for {cafne}. {skamrmouse}: I'd go with the pronunciation and change it to {skamrmause} if I used that fu'ivla. {xukrteobromino}: Turkeys are not made of chocolate, so this should be {xumrteobromino}, which still has "eo" in it. Checking jboske, I find nothing.

In case anyone else wants to check more data, the command I used is tr \ \\n |sort -u |egrep -i '(aa|ae|ao|ea|ee|eo|eu|oa|oe|oo|ou)'|less followed by the same with commas inserted.
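
For anyone without a Unix shell, the same check can be approximated in Python. This is a sketch under assumptions: the function name and the `separator` parameter are made up here, `str.split()` breaks on any whitespace (the shell pipeline only turns spaces into newlines), and `sorted` orders by code point rather than by locale as `sort -u` would.

```python
import re
from typing import List

# The vowel pairs searched for by the egrep pattern above.
PAIRS = ["aa", "ae", "ao", "ea", "ee", "eo", "eu", "oa", "oe", "oo", "ou"]

def words_with_foreign_pairs(text: str, separator: str = "") -> List[str]:
    """Return the sorted unique words containing one of the vowel pairs.

    Pass separator="," for the second run, with commas inserted
    between the two vowels of each pair.
    """
    pattern = re.compile(
        "|".join(a + re.escape(separator) + b for a, b in PAIRS),
        re.IGNORECASE,
    )
    unique_words = sorted(set(text.split()))  # roughly: tr ' ' '\n' | sort -u
    return [w for w in unique_words if pattern.search(w)]
```

Called on a log as one string, `words_with_foreign_pairs(log)` plays the role of the first command and `words_with_foreign_pairs(log, separator=",")` the role of the second.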

phma -- Maintenant, j'ai besoin d'une loupe pour trouver mes lunettes! -Les Perles de la médecine


Posted by Anonymous on Mon 20 of Dec., 2004 02:13 GMT

On Sunday 19 December 2004 20:32, Robin Lee Powell wrote: > OK, there's a testdata.txt file there, but no indications as to how > the various lines should come out (i.e. which ones are valid and > which aren't and so on). Is that data around somewhere?

No, because that depends on the options. I'll send you the output with the options "-als", which I think corresponds to what xorxes is doing.

phma -- Ils pensent que j'ai un cancer du thé russe... -Les Perles de la médecine


Posted by rlpowell on Mon 20 of Dec., 2004 02:13 GMT posts: 14214

On Sun, Dec 19, 2004 at 09:06:13PM -0500, Pierre Abbat wrote: > On Sunday 19 December 2004 20:32, Robin Lee Powell wrote: > > OK, there's a testdata.txt file there, but no indications as to > > how the various lines should come out (i.e. which ones are valid > > and which aren't and so on). Is that data around somewhere? > > No, because that depends on the options. I'll send you the output > with the options "-als", which I think corresponds to what xorxes > is doing.

I can generate the *output* myself, but the fact that the output is a certain way doesn't mean that it's *right*. A test suite should be marked up to indicate that a human has reviewed it and that result X is expected.

-Robin


Posted by rlpowell on Mon 20 of Dec., 2004 02:16 GMT posts: 14214

One thing that will definitely need to get changed is that "y"-as-space and "y bu" both need to get handled in the morphology section. I'll be putting up a new version soon, though.

-Robin


Posted by Anonymous on Mon 20 of Dec., 2004 02:27 GMT

On Sunday 19 December 2004 21:10, Robin Lee Powell wrote: > I can generate the *output* myself, but the fact that the output is > a certain way doesn't mean that it's *right*. A test suite should > be marked up to indicate that a human has reviewed it and that > result X is expected.

I review it before each release. Should I have someone else review it too? Is Nora available for this?

phma -- le xruki le ginxre xrixruba xu xrula cu xrani?


Posted by rlpowell on Mon 20 of Dec., 2004 02:27 GMT posts: 14214

On Sun, Dec 19, 2004 at 09:23:38PM -0500, Pierre Abbat wrote: > On Sunday 19 December 2004 21:10, Robin Lee Powell wrote: > > I can generate the *output* myself, but the fact that the output > > is a certain way doesn't mean that it's *right*. A test suite > > should be marked up to indicate that a human has reviewed it and > > that result X is expected. > > I review it before each release.

....

You seem to be missing my point.

The fact that you reviewed it does not provide me with said review unless you wrote it down somewhere.

I'm trying to generate automated tests here. I can't do that without knowing what a human thinks the expected results should be.

See http://en.wikipedia.org/wiki/Automated_testing and http://en.wikipedia.org/wiki/Regression_testing.

When you said you had a bunch of test cases, I assumed that you had the (human-) expected results too, otherwise it's not of much use.
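
To make the point concrete, here is a rough sketch of what a marked-up test file and a regression runner could look like. The file layout (one "input, TAB, expected result" pair per line, "#" for comments), the function names and the toy classifier are all assumptions for illustration; none of this describes the format of the existing testdata.txt or the interface of either parser.

```python
from typing import Callable, Iterable, List, Tuple

def load_cases(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read 'input<TAB>expected' pairs, skipping blank lines and # comments."""
    cases = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        text, expected = line.split("\t", 1)
        cases.append((text, expected))
    return cases

def run_cases(
    cases: List[Tuple[str, str]], classify: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every case the classifier gets wrong."""
    failures = []
    for text, expected in cases:
        actual = classify(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

def toy_classify(word: str) -> str:
    """Toy stand-in for a real parser: consonant-final words are cmene."""
    return "cmavo" if word[-1] in "aeiou" else "cmene"
```

The expected column is the part a human has to supply; once it exists, any of the parsers can be plugged in as `classify` and an empty failure list means nothing has regressed.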

-Robin


Posted by Anonymous on Mon 20 of Dec., 2004 04:27 GMT

Pierre Abbat scripsit:

> {xukrteobromino}: Turkeys are not made of chocolate, so this should be > {xumrteobromino}, which still has "eo" in it.

Well, I don't know the context, but there are chocolate candies in the shape of a turkey: a hollow chocolate shell covered with aluminum foil colored to look like a turkey.

-- John Cowan jcowan@reutershealth.com www.reutershealth.com www.ccil.org/~cowan Rather than making ill-conceived suggestions for improvement based on uninformed guesses about established conventions in a field of study with which familiarity is limited, it is sometimes better to stick to merely observing the usage and listening to the explanations offered, inserting only questions as needed to fill in gaps in understanding. --Peter Constable


Posted by rlpowell on Mon 20 of Dec., 2004 05:49 GMT posts: 14214

Woohoo!

The morphology on this page is now auto-snarfed and compiled in with the rest of my grammar, and (after I tweaked some things) actually works.

It's a beauty to see it run. Which reminds me: xorxes, why haven't you installed it? It's just a jar file, should run on any computer made in the last 10 years.

First bug: it thinks "mim" is "mi" (cmavo) + "m" (cmene).

Oh, and "y" handling isn't fixed.

-Robin


Posted by rlpowell on Mon 20 of Dec., 2004 06:19 GMT posts: 14214

Bug number 2:

Morphology pass: text=( CMAVO=( SU=( s=( s ) u=( u ) ) ) nonMorphLojbanMorphWord=( 'i ) )

-Robin


Posted by xorxes on Mon 20 of Dec., 2004 18:53 GMT posts: 1912

Robin: > It's a beauty to see it run. Which reminds me: xorxes, why haven't you > installed it? It's just a jar file, should run on any computer made in the > last 10 years.

It's probably trivial, but I wouldn't know where to begin. I don't really know what "jar file" means.

> First bug: it things "mim" is "mi" (cmavo) + "m" (cmene).

Probably fixed now. I changed cmavo to:

cmavo <- !cmene !gismu !lujvo !fuhivla consonant? vowels

I didn't need the restrictions without the selmaho sorting, because cmavo was tested last:

word <- cmene / gismu / lujvo / fuhivla / cmavo / non-lojban-word
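
A toy illustration of why the guard matters once cmavo is no longer tried last. The two regular expressions are drastic simplifications invented for this sketch (a cmene is taken to be anything ending in a consonant before a space or the end of input, and only the !cmene guard is modelled, not !gismu, !lujvo or !fuhivla); they are not the rules of the actual PEG.

```python
import re

C = "bcdfgjklmnprstvxz"
V = "aeiou"

# Toy cmene: letters ending in a consonant right before a space or end of input.
CMENE = re.compile(rf"[{C}{V}']*[{C}](?= |$)")
# Toy cmavo form: consonant? vowels
CMAVO_FORM = re.compile(rf"[{C}]?[{V}]+")

def cmavo_unguarded(text, pos=0):
    """cmavo <- consonant? vowels (returns the end position or None)."""
    m = CMAVO_FORM.match(text, pos)
    return m.end() if m else None

def cmavo_guarded(text, pos=0):
    """cmavo <- !cmene consonant? vowels"""
    if CMENE.match(text, pos) is not None:
        return None  # the negative predicate fails, so cmavo fails
    return cmavo_unguarded(text, pos)
```

On "mim" the unguarded rule happily consumes "mi" and leaves "m" behind, which is the reported bug; the guarded rule refuses, leaving the whole string to be matched as a cmene.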

> Oh, and "y" handling isn't fixed.

I think:

BY <- Y spaces? BU / &cmavo (j o h o / r u h o / ...

Or we don't allow spaces in {ybu}?

mu'o mi'e xorxes



Posted by xorxes on Mon 20 of Dec., 2004 18:53 GMT posts: 1912

> Re: PEG Morphology Algorithm > Bug number 2: > > Morphology pass: text=( CMAVO=( SU=( s=( s ) u=( u ) ) ) > nonMorphLojbanMorphWord=( 'i ) )

That should be fixed now.

I added "&(space / consonant)" at the end of the cmavo rule.

mu'o mi'e xorxes



Posted by rlpowell on Mon 20 of Dec., 2004 20:55 GMT posts: 14214

This:

cmavo


Posted by rlpowell on Tue 21 of Dec., 2004 00:27 GMT posts: 14214

On Mon, Dec 20, 2004 at 12:55:54PM -0800, wikidiscuss@lojban.org wrote: > Re: PEG Morphology Algorithm > This: > > cmavo

Boy, that sure failed spectacularly.

This:

cmavo <- !cmene !gismu !lujvo !fuhivla consonant? vowels &(spaces / consonant)

can't work, because at least one of those ! productions has stuff out front that calls cmavo (or !cmavo, or whatever). One of them I've found, which is the lujvo !tosmabru test, but there's at least one other. Having cmavo call cmavo to match cmavo is left recursion, and is bad.

-Robin


Posted by xorxes on Tue 21 of Dec., 2004 00:27 GMT posts: 1912

> This:
>
> cmavo <- !cmene !gismu !lujvo !fuhivla consonant? vowels &(spaces / consonant)
>
> can't work, because at least one of those ! productions has stuff
> out front that calls cmavo (or !cmavo, or whatever). One of them
> I've found, which is the lujvo !tosmabru test, but there's at least
> one other.

fuhivla-head started with &cmavo

> Having cmavo call cmavo to match cmavo is left recursion, and is bad.

Changed to:

cmavo <- !cmene !gismu !lujvo !fuhivla cmavo-form

cmavo-form <- consonant? vowels &(spaces / consonant)

lujvo and fuhivla now use cmavo-form.
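
The cycle and its fix can be checked mechanically. In this sketch the grammar is reduced to a hand-written table of which rules each rule can invoke at its leftmost position (predicates included, since they run at the same input position). The tables are assumptions: only the "fuhivla-head starts with &cmavo" edge comes from this thread, the link from fuhivla to fuhivla-head is a guess, and the lujvo tosmabru test is left out.

```python
from typing import Dict, List, Optional

def find_left_cycle(
    leftmost: Dict[str, List[str]], start: str
) -> Optional[List[str]]:
    """Return a path start -> ... -> start through leftmost calls, or None."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        rule, path = stack.pop()
        for callee in leftmost.get(rule, []):
            if callee == start:
                return path + [callee]
            if callee not in seen:
                seen.add(callee)
                stack.append((callee, path + [callee]))
    return None

# Before the change: fuhivla-head begins with &cmavo.
before = {
    "cmavo": ["cmene", "gismu", "lujvo", "fuhivla"],
    "fuhivla": ["fuhivla-head"],   # assumed link
    "fuhivla-head": ["cmavo"],
}

# After the change: the shape check lives in cmavo-form, which calls nothing back.
after = {
    "cmavo": ["cmene", "gismu", "lujvo", "fuhivla", "cmavo-form"],
    "fuhivla": ["fuhivla-head"],   # assumed link
    "fuhivla-head": ["cmavo-form"],
}
```

`find_left_cycle(before, "cmavo")` reports the loop through fuhivla-head, and `find_left_cycle(after, "cmavo")` returns None, which is the effect of splitting out cmavo-form.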

mi'e xorxes



Posted by rlpowell on Tue 21 of Dec., 2004 00:27 GMT posts: 14214

On Mon, Dec 20, 2004 at 04:49:20AM -0800, Jorge Llambías wrote:
> Robin:
> > It's a beauty to see it run. Which reminds me: xorxes, why
> > haven't you installed it? It's just a jar file, should run on
> > any computer made in the last 10 years.
>
> It's probably trivial, but I wouldn't know where to begin. I don't
> really know what "jar file" means.

"Java Archive".

I want to mention in passing that talking to someone who could write that much of a grammar without testing it is like talking to, I dunno, *Einstein* or something. If the singularity comes, I may one day be as smart as you, but not otherwise.

Anyways, assuming you're on Windows, Start -> Run "cmd". Enter "java". If it says command not found, go to http://www.java.com/en/download/manual.jsp

If it simply hangs there until you hit ^c, we're good to go. Run something like:

java -jar lojban_peg_parser.jar test.txt

to process the stuff in test.txt

> > First bug: it things "mim" is "mi" (cmavo) + "m" (cmene). > > Probably fixed now.

Indeed.

Currently, zoi is broken and using zei causes a crash (!).

Working on it.

-Robin


Posted by rlpowell on Tue 21 of Dec., 2004 00:27 GMT posts: 14214

On Mon, Dec 20, 2004 at 01:33:34PM -0800, Jorge Llambías wrote:
> --- Robin Lee Powell wrote:
> > This:
> >
> > cmavo <- !cmene !gismu !lujvo !fuhivla consonant? vowels &(spaces / consonant)
> >
> > can't work, because at least one of those ! productions has
> > stuff out front that calls cmavo (or !cmavo, or whatever). One
> > of them I've found, which is the lujvo !tosmabru test, but
> > there's at least one other.
>
> fuhivla-head started with &cmavo
>
> > Having cmavo call cmavo to match cmavo is left recursion, and is bad.
>
> Changed to:
>
> cmavo <- !cmene !gismu !lujvo !fuhivla cmavo-form
>
> cmavo-form <- consonant? vowels &(spaces / consonant)
>
> lujvo and fuhivla now use cmavo-form.

A couple more places needed to use cmavo-form, but it seems to work now.

-Robin


Posted by xorxes on Tue 21 of Dec., 2004 01:28 GMT posts: 1912

> I want to mention in passing that talking to someone who could write > that much of a grammar without testing it is like talking to, I > dunno, *Einstein* or something. If the singularity comes, I may one > day be as smart as you, but not otherwise.

u'i ki'e

> java -jar lojban_peg_parser.jar test.txt > > to process the stuff it test.txt

That worked, thank you. I found a bug for the lujvo rule with fuhivla rafsi: it accepted fuhivla that start with a vowel as non-initial rafsi, I think that's fixed now. Is it too complicated for me to generate lojban_peg_parser.jar with a modified grammar?

mu'o mi'e xorxes



Posted by rlpowell on Tue 21 of Dec., 2004 02:38 GMT posts: 14214

On Mon, Dec 20, 2004 at 05:14:17PM -0800, Jorge Llambías wrote: > --- Robin Lee Powell wrote: > > java -jar lojban_peg_parser.jar test.txt > > > > to process the stuff in test.txt > > That worked, thank you.

w00t

> I found a bug for the lujvo rule with fuhivla rafsi: it accepted > fuhivla that start with a vowel as non-initial rafsi, I think > that's fixed now. Is it too complicated for me to generate > lojban_peg_parser.jar with a modified grammar?

Very, very much too complicated. It's actually just one command, but you need to be me to run it. :-)

We stomped on each other, but I believe I've fixed it, and a new version is up and seems to be working excellently. I've put in the y+ mod, btw.

Check that your stuff got in.

-Robin

Posted by rlpowell on Tue 21 of Dec., 2004 14:48 GMT posts: 14214

Bug?:

Morphology pass: text=( nonLojbanWord=( KREFU ) )

I'm assuming that caps are always equivalent, as that's what the CLL seems to say, so that should just be {krefu}, yes?

-Robin

Posted by rlpowell on Tue 21 of Dec., 2004 14:48 GMT posts: 14214

It Would Be Nice for digits to be treated as PA, but that's not currently happening and I don't want to fix it right now.

Morphology pass: text=( nonLojbanWord=( 123 ) )

-Robin

Posted by rlpowell on Tue 21 of Dec., 2004 14:48 GMT posts: 14214

Everybody (i.e. the other two parsers) but us likes:

tci'ile and

tci'ilykemcantutra

I'm marking this NOT SURE in test_sentences.txt. If xorxes and/or Pierre could review the NOT SURE lines in that file for what they should actually be, that would be good.

-Robin

Posted by Anonymous on Tue 21 of Dec., 2004 14:48 GMT

On Tuesday 21 December 2004 03:53, Robin Lee Powell wrote: > Everybody (i.e. the other two parsers) but us likes: > > tci'ile > and > > tci'ilykemcantutra > > I'm marking this NOT SURE in test_sentences.txt. If xorxes and/or > Pierre could review the NOT SURE lines in that file for what they > should actually be, that would be good.

Both of those are valid words. Where do I find test_sentences.txt?

phma -- li fi'u vu'u fi'u fi'u du li pa

Posted by xorxes on Tue 21 of Dec., 2004 14:48 GMT posts: 1912

> Bug?: > > Morphology pass: text=( nonLojbanWord=( KREFU ) ) > > I'm assuming that caps are always equivalent, as that's what the CLL > seems to say, so that should just be {krefu}, yes?

{krefu} can't have the last syllable stressed, so {KREFU} or {krEfU} is not a lojban word. Any variant without U should be accepted: {KREFu}, {KREfu}, {KReFu}, {KrEFu}, {kREFu}, {KRefu}, {KrEfu}, {kREfu}, {KreFu}, {kReFu}, {krEFu}, {Krefu}, {kRefu}, {krEfu}, {kreFu}, {krefu}.

I suppose we could make a rule that if only caps are used throughout the text, they are treated as lower case. Not sure how to implement it though.

mu'o mi'e xorxes
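
The capitalization rule described above can be sketched in a few lines. This is only an illustration, not part of the PEG grammar: the function names are made up for the example, and the "all caps means lower case" rule is the tentative idea from the post, implemented here in the simplest way that comes to mind (treat the text as lower case when it contains no lower-case letter at all).

```python
from itertools import product

VOWELS = "aeiou"

def normalize_all_caps(text: str) -> str:
    """Hypothetical rule: text with no lower-case letters is read as lower case."""
    if any(ch.islower() for ch in text):
        return text
    return text.lower()

def final_vowel_stressed(word: str) -> bool:
    """True if the last vowel of the word is written as a capital."""
    for ch in reversed(word):
        if ch.lower() in VOWELS:
            return ch.isupper()
    return False

def acceptable_spelling(word: str, gismu: str = "krefu") -> bool:
    """Accept any capitalization of the gismu that leaves the final vowel unstressed."""
    return word.lower() == gismu and not final_vowel_stressed(word)

def case_variants(word: str):
    """Yield every upper/lower-case spelling of the word."""
    for choice in product(*((c.lower(), c.upper()) for c in word)):
        yield "".join(choice)

accepted = [v for v in case_variants("krefu") if acceptable_spelling(v)]
print(len(accepted))
```

Of the 32 possible capitalizations of {krefu}, this accepts the 16 without a capital U, the same set listed in the post.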


Posted by xorxes on Tue 21 of Dec., 2004 14:48 GMT posts: 1912

> It Would Be Nice for digits to be treated as PA, but that's not > currently happening and I don't want to fix it right now. > > Morphology pass: text=( nonLojbanWord=( 123 ) )

I think this is all it takes:

cmavo <- !cmene !gismu !lujvo !fuhivla cmavo-form / digit

I made the change, and added comma* in front of the digit definition, so that commas are allowed everywhere without disrupting anything.

I thought about replacing:

stressed <- comma* [AEIOU]

with:

stressed <- comma* [áéíóú]

and making the corresponding changes in the letter definitions, and adding:

letter <- comma* [A-Z]

BY <- Y space-chars* BU / &cmavo ( j o h o / ... / x y / z y / letter ) &(spaces / consonant)

but I guess that would be too revolutionary for some people.

mu'o mi'e xorxes


Posted by xorxes on Tue 21 of Dec., 2004 14:48 GMT posts: 1912

> Everybody (i.e. the other two parsers) but us likes: > > tci'ile

I don't understand why this is not accepted.

fuhivla <- !cmene !gismu !lujvo (stressed-fuhivla-head cluster fuhivla-tail / fuhivla-head cluster stressed-fuhivla-tail)

!cmene !gismu !lujvo should be satisfied.

stressed-fuhivla-head starts with &cmavo-form, so that is not satisfied.

fuhivla-head <- !slinkuhi &initial-cluster / &cmavo-form syllable (!consonant syllable)* &non-initial-cluster

slinkuhi <- consonant medial-rafsi* final-rafsi

!slinkuhi is satisfied, because {ci'i} is a medial-rafsi but {le} is not a final-rafsi. &initial-cluster is satisfied.

Then an empty fuhivla-head should be acceptable. Is this a problem?

cluster absorbs {tc}.

stressed-fuhivla-tail <- syllable syllable+ &spaces / syllable* stressed-syllable syllable &(spaces / consonant)

syllable absorbs {i}, syllable+ absorbs {'i} and {le}, and &spaces is satisfied.

So why is {tci'ile} not accepted?

mu'o mi'e xorxes


Posted by rlpowell on Tue 21 of Dec., 2004 16:25 GMT posts: 14214

On Tue, Dec 21, 2004 at 04:41:17AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > > It Would Be Nice for digits to be treated as PA, but that's not > > currently happening and I don't want to fix it right now. > > > > Morphology pass: text=( nonLojbanWord=( 123 ) ) > > I think this is all it takes: > > cmavo <- !cmene !gismu !lujvo !fuhivla cmavo-form / digit

No, they need to be PA, not just cmavo:

Morphology pass: text=( CMAVO=( cmavo=( digit=( 1 ) ) ) CMAVO=( cmavo=( digit=( 2 ) ) ) CMAVO=( PA=( digit=( 3 ) ) ) )

> I thought about replacing: > > stressed <- comma* [AEIOU] > > with: > > stressed <- comma* [?????] > > and making the corresponding changes in the letter definitions, > and adding: > > letter <- comma* [A-Z] > > BY <- Y space-chars* BU / &cmavo ( j o h o / ... / x y / z y / letter ) > &(spaces / consonant) > > but I guess that would be too revolutionary for some people.

I don't understand what this would do?

-Robin

Posted by xorxes on Tue 21 of Dec., 2004 16:25 GMT posts: 1912

> On Tue, Dec 21, 2004 at 04:41:17AM -0800, Jorge Llamb?as wrote: > > I think this is all it takes: > > > > cmavo <- !cmene !gismu !lujvo !fuhivla cmavo-form / digit > > No, they need to be PA, not just cmavo: > > Morphology pass: text=( CMAVO=( cmavo=( digit=( 1 ) ) ) > CMAVO=( cmavo=( digit=( 2 ) ) ) CMAVO=( PA=( digit=( 3 ) ) > ) )

Right, the problem is that PA is followed by space or consonant, not by digit, so PA should end &(space / consonant / digit). Probably every &(space / consonant) should be changed to that.

post-word <- &(space / consonant / digit)

> > I thought about replacing: > > > > stressed <- comma* [AEIOU] > > > > with: > > > > stressed <- comma* [?????] > > > > and making the corresponding changes in the letter definitions, > > and adding: > > > > letter <- comma* [A-Z] > > > > BY <- Y space-chars* BU / &cmavo ( j o h o / ... / x y / z y / letter ) > > &(spaces / consonant) > > > > but I guess that would be too revolutionary for some people. > > I don't understand what this would do?

Use caps to represent lerfu, and use an acute mark on vowels to represent stress.

mu'o mi'e xorxes
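
The lookahead problem with digits can be modelled with a toy classifier. This is a sketch under stated assumptions, not the real grammar: `post_word` and `classify_digits` are hypothetical names, end of input is treated like a space, and a digit that fails the lookahead is simply labelled "cmavo" to mirror the parser output quoted above.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")

def post_word(text: str, i: int, allow_digit: bool) -> bool:
    """Model of the lookahead: space / consonant, optionally also digit."""
    if i >= len(text) or text[i].isspace() or text[i] in CONSONANTS:
        return True
    return allow_digit and text[i].isdigit()

def classify_digits(text: str, allow_digit: bool) -> list:
    """Label each digit PA if the lookahead after it succeeds, else plain cmavo."""
    labels = []
    for i, ch in enumerate(text):
        if ch.isdigit():
            ok = post_word(text, i + 1, allow_digit)
            labels.append("PA" if ok else "cmavo")
    return labels

print(classify_digits("123", allow_digit=False))
print(classify_digits("123", allow_digit=True))
```

Without the digit alternative only the last digit of {123} passes the lookahead; with it, all three digits come out as PA.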


Posted by xorxes on Tue 21 of Dec., 2004 16:27 GMT posts: 1912

I changed cmene-syllaboid to:

cmene-syllaboid

Posted by xorxes on Tue 21 of Dec., 2004 20:28 GMT posts: 1912

> Re: PEG Morphology Algorithm > > I changed cmene-syllaboid to: > > cmene-syllaboid > >

(Posting from the discussion forum doesn't like the "<-".)

I changed cmene-syllaboid to:

cmene-syllaboid <- !doi-la-lai-lahi consonant* vowels / digit

so that things like {la 2005nan.} are allowed.

mu'o mi'e xorxes


Posted by rlpowell on Tue 21 of Dec., 2004 20:28 GMT posts: 14214

On Tue, Dec 21, 2004 at 08:11:07AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > On Tue, Dec 21, 2004 at 04:41:17AM -0800, Jorge Llamb?as wrote: > > > I think this is all it takes: > > > > > > cmavo <- !cmene !gismu !lujvo !fuhivla cmavo-form / digit > > > > No, they need to be PA, not just cmavo: > > > > Morphology pass: text=( CMAVO=( cmavo=( digit=( 1 ) ) ) > > CMAVO=( cmavo=( digit=( 2 ) ) ) CMAVO=( PA=( digit=( 3 ) > > )) ) > > Right, the problem is that PA is followed by space or consonant, > not by digit, so PA should end &(space / consonant / digit). > Probably every &(space / consonant) should be changed to that. > > post-word <- &(space / consonant / digit)

Can't do that, sorry. A non-terminal must contain at least one non-& and non-! element. Removed the & from post-word, changed all calls to it to be &post-word.

> > > I thought about replacing: > > > > > > stressed <- comma* [AEIOU] > > > > > > with: > > > > > > stressed <- comma* [?????] > > > > > > and making the corresponding changes in the letter > > > definitions, and adding: > > > > > > letter <- comma* [A-Z] > > > > > > BY <- Y space-chars* BU / &cmavo ( j o h o / ... / x y / z y > > > / letter ) &(spaces / consonant) > > > > > > but I guess that would be too revolutionary for some people. > > > > I don't understand what this would do? > > Use caps to represent lerfu, and use an acute mark on vowels to > represent stress.

Aaah. The acute marks come through as ?, which should explain well enough why I oppose this. :-)

I'm sure I could figure out how to view them properly, but that's not the point: until nothing but X (where X is probably Unicode) is the sole accepted option for all computer-based text, we need to stick to ascii.

-Robin

Posted by rlpowell on Tue 21 of Dec., 2004 20:28 GMT posts: 14214

On Tue, Dec 21, 2004 at 04:22:32AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > > Bug?: > > > > Morphology pass: text=( nonLojbanWord=( KREFU ) ) > > > > I'm assuming that caps are always equivalent, as that's what the > > CLL seems to say, so that should just be {krefu}, yes? > > {krefu} can't have the last syllable stressed, so {KREFU} or > {krEfU} is not a lojban word. Any variant without U should be > accepted: {KREFu}, {KREfu}, {KReFu }, {KrEFu}, {kREFu}, {KRefu }, > {KrEfu}, {kREfu}, {KreFu }, {kReFu }, {krEFu}, {Krefu }, {kRefu }, > {krEfu}, {kreFu }, {krefu }.

text
selbri3
|- BRIVLA
|  gismu: KREFu
|- BRIVLA
|  gismu: KREfu
|- BRIVLA
|  gismu: KReFu
|- BRIVLA
|  gismu: KrEFu
|- BRIVLA
|  gismu: kREFu
|- BRIVLA
|  gismu: KRefu
|- BRIVLA
|  gismu: KrEfu
|- BRIVLA
|  gismu: kREfu
|- BRIVLA
|  gismu: KreFu
|- BRIVLA
|  gismu: kReFu
|- BRIVLA
|  gismu: krEFu
|- BRIVLA
|  gismu: Krefu
|- BRIVLA
|  gismu: kRefu
|- BRIVLA
|  gismu: krEfu
|- BRIVLA
|  gismu: kreFu
|- BRIVLA
   gismu: krefu

..u'i sai

> I suppose we could make a rule that if only caps are used > throughout the text, they are treated as lower case. Not sure how > to implement it though.

Nah, let's not bother. (although it is from Alice, IIRC, so you may want to look at fixing it).

-Robin

Posted by rlpowell on Tue 21 of Dec., 2004 20:28 GMT posts: 14214

On Tue, Dec 21, 2004 at 05:32:55AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > > Everybody (i.e. the other two parsers) but us likes: > > > > tci'ile > > I don't understand why this is not accepted. > > fuhivla <- !cmene !gismu !lujvo (stressed-fuhivla-head cluster > fuhivla-tail / fuhivla-head cluster stressed-fuhivla-tail) > > !cmene !gismu !lujvo should be satisfied. stressed-fuhivla-head > starts with &cmavo-form, so that is not satisfied. > > fuhivla-head <- !slinkuhi &initial-cluster / &cmavo-form syllable > (!consonant syllable)* &non-initial-cluster > > slinkuhi <- consonant medial-rafsi* final-rafsi > > !slinkuhi is satisfied, because {ci'i} is a medial-rafsi but {le} > is not a final-rafsi.

I have checked thus far and believe I agree.

> &initial-cluster is satisfied.

Heh. No, it's not.

initial-cluster <- initial-consonant+ consonant !consonant

{t} and {c} are both eaten by initial-consonant, leaving nothing for consonant.

You just got bitten by greedy absorption. Welcome to my world. :-)

Fixed:

text
BRIVLA
fuhivla
|- cluster
|  |- consonant
|  |  unvoicedConsonant
|  |  t
|  |  t
|  |- consonant
|     unvoicedConsonant
|     c
|     c
|- stressedFuhivlaTail
   |- syllable
   |  syllableCore
   |  vowel
   |  vowelY
   |  i
   |  i
   |- syllable
   |  |- h: '
   |  |- syllableCore
   |     vowel
   |     vowelY
   |     i
   |     i
   |- syllable
      |- consonant
      |  l
      |  l
      |- syllableCore
         vowel
         vowelY
         e
         e

Isn't it pretty?

The solution is:

initial-cluster <- (initial-consonant &consonant)+ consonant !consonant

-Robin
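
The greedy-absorption failure and its fix can be reproduced with two hand-written matchers. This is a minimal sketch, not the generated parser: it assumes, for simplicity, that every consonant counts as an initial-consonant (the real rule is presumably narrower), and the function names are invented for the example. Each matcher returns the index just past the cluster, or None on failure.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")

def consonant(s, i):
    """Match one consonant at index i; return the next index, or None."""
    return i + 1 if i < len(s) and s[i] in CONSONANTS else None

def initial_cluster_greedy(s, i=0):
    """initial-cluster <- initial-consonant+ consonant !consonant"""
    j = i
    while consonant(s, j) is not None:   # '+' eats every consonant it can
        j += 1
    if j == i:
        return None                      # '+' needs at least one match
    k = consonant(s, j)                  # nothing is left for 'consonant'
    if k is None:
        return None                      # a PEG repetition never gives input back
    return k if consonant(s, k) is None else None   # !consonant

def initial_cluster_fixed(s, i=0):
    """initial-cluster <- (initial-consonant &consonant)+ consonant !consonant"""
    j = i
    while True:
        n = consonant(s, j)
        if n is None or consonant(s, n) is None:    # &consonant must hold too
            break
        j = n
    if j == i:
        return None
    k = consonant(s, j)
    if k is None:
        return None
    return k if consonant(s, k) is None else None   # !consonant

print(initial_cluster_greedy("tci'ile"))
print(initial_cluster_fixed("tci'ile"))
```

On {tci'ile} the greedy version fails (None) because both {t} and {c} are consumed by the repetition, while the fixed version stops one consonant early and returns 2. The fixed version also rejects {ti'ile}, where there is no cluster at all.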

Posted by rlpowell on Tue 21 of Dec., 2004 20:29 GMT posts: 14214

On Tue, Dec 21, 2004 at 05:53:01AM -0500, Pierre Abbat wrote: > On Tuesday 21 December 2004 03:53, Robin Lee Powell wrote: > > Everybody (i.e. the other two parsers) but us likes: > > > > tci'ile and > > > > tci'ilykemcantutra > > > > I'm marking this NOT SURE in test_sentences.txt. If xorxes > > and/or Pierre could review the NOT SURE lines in that file for > > what they should actually be, that would be good. > > Both of those are valid words.

And both are now recognized as such.

> Where do I find test_sentences.txt?

http://www.digitalkingdom.org/~rlpowell/hobbies/lojban/grammar/test_sentences.txt

You already went through some of it at one point, but I don't have it to hand. It would probably be best if you posted here, so we could fight about it. However, note that I haven't finished marking it up yet. I'll post here when I do.

Oh, and I need to roll valfendi into my testing.

-Robin

Posted by xorxes on Tue 21 of Dec., 2004 20:29 GMT posts: 1912

> You just got bitten by greedy absorption. Welcome to my world. :-)

Aaahhrgghhh!!

> The solution is: > > initial-cluster <- (initial-consonant &consonant)+ consonant !consonant

Or how about:

initial-cluster <- initial-consonant+ consonant? !consonant

mi'e xorxes


Posted by rlpowell on Tue 21 of Dec., 2004 20:29 GMT posts: 14214

On Tue, Dec 21, 2004 at 10:08:31AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > > You just got bitten by greedy absorption. Welcome to my world. > > :-) > > Aaahhrgghhh!! > > > The solution is: > > > > initial-cluster <- (initial-consonant &consonant)+ consonant > > !consonant > > Or how about: > > initial-cluster <- initial-consonant+ consonant? !consonant

Nope. That will match {t} alone. What you're trying to do there is:

initial-cluster <- initial-consonant initial-consonant+ consonant? !consonant / initial-consonant+ consonant !consonant

which seems needlessly complicated.

-Robin

Posted by xorxes on Tue 21 of Dec., 2004 20:29 GMT posts: 1912

> On Tue, Dec 21, 2004 at 10:08:31AM -0800, Jorge Llamb?as wrote: > > Or how about: > > > > initial-cluster <- initial-consonant+ consonant? !consonant > > Nope. That will match {t} alone.

How come? Both {t} and {c} are valid initial-consonant, so they should both be grabbed by initial-consonant+. No consonant is found next, so consonant? grabs nothing. And finally !consonant is satisfied because what follows is {i}.

mu'o mi'e xorxes


Posted by rlpowell on Tue 21 of Dec., 2004 20:29 GMT posts: 14214

Our parser (can't call it "my parser" anymore, really) is the only one that believes that {neal} and {daed} are not valid cmene. Bug or feature?

-Robin

Posted by rlpowell on Tue 21 of Dec., 2004 20:29 GMT posts: 14214

On Tue, Dec 21, 2004 at 10:17:12AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > On Tue, Dec 21, 2004 at 10:08:31AM -0800, Jorge Llamb?as wrote: > > > Or how about: > > > > > > initial-cluster <- initial-consonant+ consonant? !consonant > > > > Nope. That will match {t} alone. > > How come? Both {t} and {c} are valid initial-consonant so they > should both be grabbed by initial-consonant+ > No consonant is found next, so consonant? grabs nothing. > And finally !consonant is satisfied because what follows is {i}.

You misunderstand. That will just as easily match {t} in {ti'ile} as {tc} in {tci'ile}. {t} is "one or more" initial consonants.

-Robin
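
The objection to the optional-consonant variant is easy to check with a small matcher. As before this is only a sketch with an invented function name, assuming every consonant qualifies as an initial-consonant; it returns the index just past the match, or None.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")

def is_consonant(s, i):
    """True if there is a consonant at index i."""
    return i < len(s) and s[i] in CONSONANTS

def initial_cluster_optional(s, i=0):
    """initial-cluster <- initial-consonant+ consonant? !consonant"""
    j = i
    while is_consonant(s, j):    # initial-consonant+ takes all the consonants
        j += 1
    if j == i:
        return None              # no consonant at all
    # 'consonant?' matches nothing here, and '!consonant' holds trivially,
    # because the loop only stops at a non-consonant.
    return j

print(initial_cluster_optional("tci'ile"))
print(initial_cluster_optional("ti'ile"))
```

The rule does accept {tc} in {tci'ile}, but it also accepts the single {t} of {ti'ile} as a "cluster", which is the problem pointed out above.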

Posted by xorxes on Tue 21 of Dec., 2004 20:29 GMT posts: 1912

> Our parser (can't call it "my parser" anymore, really) is the only > one that believes that {neal} and {daed} are not valid cmene. Bug > or feature?

It was done on purpose: no non-permissible consonant or vowel pairs are allowed anywhere in lojban words, including cmene. The apostrophe is only allowed between vowels.

I guess those rules could be relaxed, but if we do, we should do it for cmavo and for fuhivla as well as for cmene.

Some restrictions might still be missing for digits, though. I think {1'an} will be accepted.

mu'o mi'e xorxes


Posted by xorxes on Tue 21 of Dec., 2004 20:29 GMT posts: 1912

> You misunderstand. That will just as easily match {t} in {ti'ile} > as {tc} in {tci'ile}. {t} is "one or more" initial consonants.

Oops, right.

mu'o mi'e xorxes


Posted by rlpowell on Tue 21 of Dec., 2004 20:29 GMT posts: 14214

On Tue, Dec 21, 2004 at 10:27:48AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > > Our parser (can't call it "my parser" anymore, really) is the > > only one that believes that {neal} and {daed} are not valid > > cmene. Bug or feature? > > It was done on purpose:

That's all I needed.

> no non-permissible consonant or vowel pairs are allowed anywhere > in lojban words,

I didn't realize {ae} and {ea} counted, hence my asking.

> I guess those rules could be relaxed,

No, that's fine; this is all English contamination in IRC anyways.

> Some restrictions might still be missing for digits, though. > I think {1'an} will be accepted.

Nope:

Morphology pass: text=( CMAVO=( cmavo=( digit=( 1 ) ) ) nonLojbanWord=( 'an ) )

-Robin

Posted by rlpowell on Tue 21 of Dec., 2004 20:29 GMT posts: 14214

Morphology pass: text=( nonLojbanWord=( pravda ) )

Bug or feature?

-Robin

Posted by rlpowell on Tue 21 of Dec., 2004 20:29 GMT posts: 14214

The "commas anywhere" thing allows the camxes parser to accept things like "2, by tirno", which seems to maybe not be what was intended. :-)

Not sure if this should be fixed, but thought it was worth mentioning.

-Robin

Posted by xorxes on Tue 21 of Dec., 2004 20:29 GMT posts: 1912

> Morphology pass: text=( nonLojbanWord=( pravda ) ) > > Bug or feature?

Feature: it fails slinku'i. {le pravda} could be the lujvo lep-ravda.


mi'e xorxes


Posted by rlpowell on Tue 21 of Dec., 2004 20:29 GMT posts: 14214

On Tue, Dec 21, 2004 at 10:51:08AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > > Morphology pass: text=( nonLojbanWord=( pravda ) ) > > > > Bug or feature? > > Feature: it fails slinku'i {le pravda} could be the lujvo > lep-ravda.

Even if there's a pause in there? Seems the slinku'i test is a bit overzealous.

-Robin

Posted by xorxes on Tue 21 of Dec., 2004 20:29 GMT posts: 1912

> > Feature: it fails slinku'i {le pravda} could be the lujvo > > lep-ravda. > > Even if there's a pause in there? Seems the slinku'i test is a bit > overzealous.

We could allow {.slinku'i} as a fu'ivla, but then it would be the only type of brivla that must begin with a pause.

mu'o mi'e xorxes


Posted by rlpowell on Tue 21 of Dec., 2004 20:29 GMT posts: 14214

On Tue, Dec 21, 2004 at 11:04:25AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > > > Feature: it fails slinku'i {le pravda} could be the lujvo > > > lep-ravda. > > > > Even if there's a pause in there? Seems the slinku'i test is a > > bit overzealous. > > We could allow {.slinku'i} as a fu'ivla, but then it would be the > only type of brivla that must begin with a pause.

Ewww. Nevermind, forget I said anything.

-Robin

Posted by arj on Tue 21 of Dec., 2004 20:29 GMT posts: 953

On Tue, 21 Dec 2004, Robin Lee Powell wrote:

>>> Morphology pass: text=( nonLojbanWord=( pravda ) ) >> >> Feature: it fails slinku'i {le pravda} could be the lujvo >> lep-ravda. > > Even if there's a pause in there? Seems the slinku'i test is a bit > overzealous.

The slinku'i test AFAIK applies to single words, not to strings of words.

If we were to allow this, we would have to *enforce* a pause in front of it, which we never otherwise do for consonant-initial brivla.

Isn't that too much of a privilege to give to a lowly fu'ivla?

-- Arnt Richard Johansen http://arj.nvg.org/ The problem is, witchcraft is not fantasy; it is a sinful reality in our world. --christiananswers.net

Posted by xorxes on Tue 21 of Dec., 2004 20:40 GMT posts: 1912

Every fu'ivla that starts with a consonant can be used as the final rafsi of a lujvo.

Given that {i} is permissible after any vowel, and that {iy} is a valid vowel pair, we could give every fuhivla that starts with a consonant a medial rafsi if we use -iy- as the hyphen. (This could be in addition to the privileged fuhivla that have shorter rafsi, so {tci'ile} for example would have both tci'ily- and tci'ileiy- as rafsi, just as {valsi} has valsy- and val-.) This would be easy to implement.

fuhivla that start with a vowel still need to start with a pause, so they can form lujvo non-initially only with zei.

mu'o mi'e xorxes

Posted by Anonymous on Tue 21 of Dec., 2004 23:21 GMT

Robin Lee Powell scripsit: > Our parser (can't call it "my parser" anymore, really) is the only > one that believes that {neal} and {daed} are not valid cmene. Bug > or feature?

Feature, he said firmly. The old (can't really call it "official" any more) parser is completely clueless about morphology: all it can do is break up compound cmavo.

-- "No, John. I want formats that are actually John Cowan useful, rather than over-featured megaliths that http://www.ccil.org/~cowan address all questions by piling on ridiculous http://www.reutershealth.com internal links in forms which are hideously jcowan@reutershealth.com over-complex." --Simon St. Laurent on xml-dev

Posted by arj on Tue 21 of Dec., 2004 23:21 GMT posts: 953

On Tue, 21 Dec 2004 wikidiscuss@lojban.org wrote:

> Re: PEG Morphology Algorithm > > Every fu'ivla that starts with a consonant can be used > as the final rafsi of a lujvo.

I suppose this is a proposed change.

Apart from my general reluctance to fix something that is not broken (which all of you are probably aware of by now) a question:

How can you tell the difference between a lujvo with a final rafsi fu'ivla, and a stage-4 fu'ivla that just happens to have a lujvolike form in the source language?

-- Arnt Richard Johansen http://arj.nvg.org/ XP kjennes ... sprengt. Som om noe har eksplodert der.

Posted by Anonymous on Tue 21 of Dec., 2004 23:49 GMT

On Tuesday 21 December 2004 13:31, Robin Lee Powell wrote: > On Tue, Dec 21, 2004 at 10:27:48AM -0800, Jorge Llamb?as wrote: > > --- Robin Lee Powell wrote: > > > Our parser (can't call it "my parser" anymore, really) is the > > > only one that believes that {neal} and {daed} are not valid > > > cmene. Bug or feature? > > > > It was done on purpose: > > That's all I needed. > > > no non-permissible consonant or vowel pairs are allowed anywhere > > in lojban words, > > I didn't realize {ae} and {ea} counted, hence my asking. > > > I guess those rules could be relaxed, > > No, that's fine; this is all English contamination in IRC anyways.

For fu'ivla, the Book has the example {kuln,r,kore,a}, and I've found {xumrte,obromino} in IRC and used {stagrle,oxari} in a recipe. Cmene can also have "iy" and "uy". If you want to insist that non-diphthong vowel pairs have commas, that's fine, but they are allowed.

phma

-- ..i le babzba ba zbasu lo jbazbabu lo babjba

Posted by xorxes on Tue 21 of Dec., 2004 23:54 GMT posts: 1912

> > > > Every fu'ivla that starts with a consonant can be used > > as the final rafsi of a lujvo. > > I suppose this is a proposed change.

It's in CLL, at least in embryonic form. Pierre worked it out in more detail.

> Apart from my general reluctance to fix something that is not broken > (which all of you are probably aware of by now) a question: > > How can you tell the difference between a lujvo with a final rafsi > fu'ivla, and a stage-4 fu'ivla that just happens to have a lujvolike form > in the source language?

A fu'ivla-rafsi must always be separated with "y" from any other rafsi.

If the thing between two y's (or between the start of the word and y, or between y and the end of the word) is a string of normal rafsi, then it has to be a string of normal rafsi and cannot be a fu'ivla-rafsi.

If the thing between y's is not a string of normal rafsi, and when adding a vowel is a fu'ivla, then it is a fu'ivla rafsi.

If the thing between y's is neither a string of rafsi nor a fu'ivla-rafsi, then we have a non-lojban-word.

mu'o mi'e xorxes
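
The three cases above amount to a small decision procedure over the pieces between y's. The sketch below is an illustration only: `classify_word` is a hypothetical name, the two predicates are passed in as parameters because the real tests for "string of normal rafsi" and "fu'ivla" live in the grammar, and the toy predicates simply look words up in fixed sets. Treating the last piece as a complete fu'ivla (rather than adding a vowel to it) is one reading of the rules, not something stated explicitly.

```python
def classify_word(word, is_rafsi_string, is_fuhivla):
    """Label each y-separated piece of a candidate lujvo."""
    segments = word.split("y")
    labels = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if is_rafsi_string(segment):
            labels.append("rafsi")            # normal rafsi win
        elif is_last and is_fuhivla(segment):
            labels.append("fuhivla-rafsi")    # final piece: the fu'ivla itself
        elif not is_last and any(is_fuhivla(segment + v) for v in "aeiou"):
            labels.append("fuhivla-rafsi")    # adding a vowel gives a fu'ivla
        else:
            labels.append("non-lojban")
    return labels

# Toy stand-ins for the real morphology tests (assumptions for the example).
RAFSI_STRINGS = {"nal", "kemcantutra"}
FUHIVLA = {"tci'ile"}

def toy_rafsi(segment):
    return segment in RAFSI_STRINGS

def toy_fuhivla(segment):
    return segment in FUHIVLA

print(classify_word("tci'ilykemcantutra", toy_rafsi, toy_fuhivla))
print(classify_word("nalytci'ile", toy_rafsi, toy_fuhivla))
```

A word is then a lujvo only if no piece comes out as "non-lojban".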


Posted by Anonymous on Tue 21 of Dec., 2004 23:54 GMT

On Tuesday 21 December 2004 15:40, wikidiscuss@lojban.org wrote: > Re: PEG Morphology Algorithm > > Every fu'ivla that starts with a consonant can be used > as the final rafsi of a lujvo. > > Given that {i} is permissible after any vowel, and that {iy} > is a valid vowel pair, we could give every fuhivla that starts > with a consonant a medial rafsi if we use -iy- as the hyphen. > (This could be in addition to the priviledged fuhivla that > have shorter rafsi, so {tci'ile} for example would have both > tci'ily- and tci'ileiy- as rafsi, just as {valsi} has valsy- and > val-. This would be easy to implement. > > fuhivla that start with a vowel still need to start with a pause, > so they can form lujvo non-initially only with zei.

The way I set up valfendi, there is one class of fu'ivla that have rafsi that can be used anywhere in a lujvo, and all others cannot be used in a lujvo except with zei. "-iy-" was tried and discarded a long time ago.

phma -- S Fa1>+/- !TM M-- K H T-- t? AT++ SY Te- SC- FO- D P !Tz E++ L

Posted by Anonymous on Wed 22 of Dec., 2004 00:23 GMT

On Tuesday 21 December 2004 16:33, Arnt Richard Johansen wrote: > On Tue, 21 Dec 2004 wikidiscuss@lojban.org wrote: > > Re: PEG Morphology Algorithm > > > > Every fu'ivla that starts with a consonant can be used > > as the final rafsi of a lujvo. > > I suppose this is a proposed change. > > Apart from my general reluctance to fix something that is not broken > (which all of you are probably aware of by now) a question: > > How can you tell the difference between a lujvo with a final rafsi > fu'ivla, and a stage-4 fu'ivla that just happens to have a lujvolike form > in the source language?

A lujvo that ends with a rafsi fu'ivla always has 'y' before it (e.g. {nalytci'ile}); a fu'ivla never has 'y' in it.

phma -- ..i toljundi do .ibabo mi'afra tu'a do ..ibabo damba do .ibabo do jinga ..icu'u la ma'atman.

Posted by xorxes on Wed 22 of Dec., 2004 00:23 GMT posts: 1912

> For fu'ivla, the Book has the example {kuln,r,kore,a}, and I've found > {xumrte,obromino} in IRC and used {stagrle,oxari} in a recipe. Cmene can also > > have "iy" and "uy". If you want to insist that non-diphthong vowel pairs have > > commas, that's fine, but they are allowed.

I don't have a problem with allowing non-diphthong vowel pairs in cmene as long as they are allowed in cmavo and fu'ivla as well. All we need to do in order to allow them is eliminate the !a !e !o !u !y at the end of the vowel rules. The syllable counting rules required for fu'ivla are not affected by this.

The current PEG allows iy and uy in cmene and cmavo.

mu'o mi'e xorxes


Posted by xorxes on Wed 22 of Dec., 2004 00:23 GMT posts: 1912

> The way I set up valfendi, there is one class of fu'ivla that have rafsi that can be used anywhere in a lujvo, and all others cannot be used in a lujvo except with zei.

Why do you disallow fu'ivla that start with a consonant as final rafsi?

> "-iy-" was tried and discarded a long time ago.

I remember reading something about it. How was it tried, and why was it discarded?

mu'o mi'e xorxes


Posted by klark on Wed 22 of Dec., 2004 00:27 GMT posts: 10

Jorge, this is just marvelous work; I'm in awe. (I'm also envious of the amount of free time you appear to have. :-) However, I have a concern about the overall approach you're taking: the high-level design, as it were.

The grammar in its current state does four separable things:

1. It partitions the input stream into words.

2. It validates the words, rejecting invalid vowel and consonant patterns.

3. It determines the selma'o of a cmavo.

4. It categorizes brivla into gismu, lujvo and fu'ivla.

As a result, the grammar is fearsomely complex in spots. (OK, the part that recognizes selma'o isn't complex; it's just huge.) And it could be argued that categorizing brivla really belongs to semantic analysis, not parsing.

For the sake of modularity and reducing point-complexity, I think it would be worth considering splitting the job into its components, and writing separate grammars:
1. A partitioning grammar that considers an input string, and accepts a word (cmene, brivla, cmavo or non-Lojban) from its head.
2. A validating grammar that considers a Lojban word, and rejects it (re-categorizing it as non-Lojban?) if it has invalid vowel or consonant patterns.
3. Selma'o determination might be more easily described as a symbol table lookup than as a parsing problem.
4. A grammar that considers a valid Lojban brivla, and categorizes it.
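The symbol-table idea in point 3 could look roughly like the sketch below. It is only an illustration, not part of any existing parser: the name `selmaho_of` and the tiny table are assumptions, and a real table would have to list every assigned cmavo.

```python
# Hypothetical sketch: selma'o determination as a plain table lookup,
# applied after a separate step has already isolated a valid cmavo.
SELMAHO = {
    "le": "LE",
    "lo": "LE",
    "mi": "KOhA",
    "do": "KOhA",
    "cu": "CU",
    "bu": "BU",
    "zei": "ZEI",
}


def selmaho_of(cmavo):
    """Return the selma'o of a cmavo, or None if it is not in the table."""
    # A leading or trailing pause mark (".") is not part of the table key.
    return SELMAHO.get(cmavo.strip(".").lower())
```

With this split, a well-formed but unassigned cmavo shape simply comes back as `None`, and the word-partitioning step never needs to know about selma'o at all.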

Of course this scheme depends on being able to combine multiple PEG-generated parsers into a single program. But if the parser generator takes parameters which can be used to name the input and parser functions, that shouldn't be hard.

Or is there already a consensus that the requirement is for a single grand grammar covering every relevant aspect of the language?

Clark Nelson


Posted by rlpowell on Wed 22 of Dec., 2004 02:56 GMT posts: 14214

On Tue, Dec 21, 2004 at 04:27:01PM -0800, wikidiscuss@lojban.org wrote: > Jorge, this is just marvelous work — I'm in awe. (I'm also > envious of the amount of free time you appear to have. :-) > However, I have a concern about the overall approach you're taking > — the high-level design, as it were.

Actually, the high-level design is mine, not his. See:

http://www.digitalkingdom.org/~rlpowell/hobbies/lojban/grammar/

> The grammar in its current state does four separable things:

Just because they *can* be separated, doesn't mean they should be.

> 1. It partitions the input stream into words. > > 2. It validates the words, rejecting invalid vowel and consonant patterns. > > 3. It determines the selma'o of a cmavo. > > 4. It categorizes brivla into gismu, lujvo and fu'ivla.

In fact, these are not separate actions, so far as I know, in either jbofihe or the current official parser.

I don't consider step 2 to be distinct from step 4, by the way.

> As a result, the grammar is fearsomely complex in spots. (OK, the > part that recognizes selma'o isn't complex; it's just huge.)

Yup. You should see the version in the main grammar.

> And it could be argued that categorizing brivla really belongs to > semantic analysis, not parsing.

Umm, what?

> For the sake of modularity and reducing point-complexity, I think > it would be worth considering splitting the job into its > components, and writing separate grammars:

The problem with this is that we could argue for hours over where the separations lie. I was vehemently opposed to separating out the morphology from the rest of the grammar in the first place, in fact.

> Of course this scheme depends on being able to combine multiple > PEG-generated parsers into a single program.

Already done. What you're describing might result in a noticeable slowdown in processing, but I can't be sure.

> But if the parser generator takes parameters which can be used to > name the input and parser functions, that shouldn't be hard.

It's a pain in the ass, but it's not hard.

> Or is there already a consensus that the requirement is for a > single grand grammar covering every relevant aspect of the > language?

As I said, the grammar is already in two parts: morphology and syntax. The only reason I agreed to that, however, is that it was pointed out that other, completely different, morphologies might want to be used, and that that should be allowed for.

-Robin


Posted by rlpowell on Wed 22 of Dec., 2004 02:56 GMT posts: 14214

Morphology pass: text=( CMAVO=( cmavo=( cuu ) ) )

I assume this is a bug.

-Robin


Posted by Anonymous on Wed 22 of Dec., 2004 02:56 GMT

On Tuesday 21 December 2004 19:05, Jorge "Llambías" wrote:
> --- Pierre Abbat wrote:
> > The way I set up valfendi, there is one class of fu'ivla that have rafsi that can be used anywhere in a lujvo, and all others cannot be used in a lujvo except with zei.
>
> Why do you disallow fu'ivla that start with a consonant as final rafsi?

I don't. Why do you think I do? {nalytci'ile} is valid but {nalyskalduna} is not.

> > "-iy-" was tried and discarded a long time ago. > > I remember reading something about it. How was it tried, and why > was it discarded?

That was before my time.

phma -- ..i toljundi do .ibabo mi'afra tu'a do ..ibabo damba do .ibabo do jinga ..icu'u la ma'atman.


Posted by klark on Wed 22 of Dec., 2004 02:56 GMT posts: 10

>> The grammar in its current state does four separable things:
> Just because they *can* be separated, doesn't mean they should be.

No, of course not.

> In fact, these are not separate actions, so far as I know, in either jbofihe or the current official parser.

And just because they have heretofore been unified, doesn't mean that they should be, either.

>> And it could be argued that categorizing brivla really belongs to >> semantic analysis, not parsing. > > Umm, what?

Simple: for parsing purposes, a brivla is a brivla is a brivla. It's only when you get around to trying to figure out the meaning of a sentence that it begins to matter how it was formed, from which one can determine what it means.

>> For the sake of modularity and reducing point-complexity, I think it would be worth considering splitting the job into its components, and writing separate grammars:
> The problem with this is that we could argue for hours over where the separations lie. I was vehemently opposed to separating out the morphology from the rest of the grammar in the first place, in fact.

Well, of course if one (very influential) participant is "vehemently opposed" to any separation, then any proposal for separation would necessarily either be rejected immediately, or result in hours of argument. :-)

>> Of course this scheme depends on being able to combine multiple >> PEG-generated parsers into a single program. > > Already done. What you're describing might result in a noticeable > slowdown in processing, but I can't be sure.

It might also result in a noticeable speedup. Just for example, with the current grammar for determining selma'o, validation would be done twice: once when &cmavo is evaluated, and again when each of the letters is scanned, because of all the lookahead involved in all the single-letter rules.

>> Or is there already a consensus that the requirement is for a >> single grand grammar covering every relevant aspect of the >> language? > > As I said, the grammar is already in two parts: morphology and > syntax. The only reason I agreed to that, however, is that it was > pointed out that other, completely different, morphologies might > want to be used, and that that should be allowed for.

Like I say, I believe that partitioning, validation and characterization are probably simpler considered separately than together. It takes a genius of Jorge's caliber to write or understand a parser that does all three simultaneously. I strongly suspect that if separate grammars were used to solve pieces of the whole problem, each would be simple enough that many, many more people would be able to understand them. Ideally, they would be simple enough that it would be feasible to see whether the grammar(s) do what the prose description says.

Clark


Posted by Anonymous on Wed 22 of Dec., 2004 02:56 GMT

On Tuesday 21 December 2004 19:47, Robin Lee Powell wrote:
> On Tue, Dec 21, 2004 at 04:27:01PM -0800, wikidiscuss@lojban.org wrote:
> > Jorge, this is just marvelous work -- I'm in awe. (I'm also envious of the amount of free time you appear to have. :-) However, I have a concern about the overall approach you're taking -- the high-level design, as it were.
>
> Actually, the high-level design is mine, not his. See:
>
> http://www.digitalkingdom.org/~rlpowell/hobbies/lojban/grammar/
>
> > The grammar in its current state does four separable things:
>
> Just because they *can* be separated, doesn't mean they should be.

They are separated in valfendi (except it doesn't do 3).

> > 1. It partitions the input stream into words.
> > 2. It validates the words, rejecting invalid vowel and consonant patterns.
> > 3. It determines the selma'o of a cmavo.
> > 4. It categorizes brivla into gismu, lujvo and fu'ivla.
>
> > For the sake of modularity and reducing point-complexity, I think it would be worth considering splitting the job into its components, and writing separate grammars:
>
> The problem with this is that we could argue for hours over where the separations lie. I was vehemently opposed to separating out the morphology from the rest of the grammar in the first place, in fact.

The problem with doing it in PEG is that it appears to be impossible to check that a string matches two different PEs with the same number of characters matched for both. That's why every selma'o PE ends with checking for a space or consonant, even though "cmavo" already checked for that.
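A rough illustration of the point above, using Python regular expressions as a stand-in for parsing expressions. The patterns are simplified assumptions, not rules copied from the actual grammar. A lookahead such as `&cmavo` only reports success or failure, so the rule that follows cannot learn how many characters the lookahead matched and has to re-check the word boundary itself.

```python
import re

# Simplified stand-ins (assumptions), not the rules of the real grammar.
CMAVO = re.compile(r"[bcdfgjklmnprstvxz]?[aeiou]+")  # rough cmavo shape
PU_NO_BOUNDARY = re.compile(r"ca")  # also matches a prefix of "cai"
PU_WITH_BOUNDARY = re.compile(r"ca(?=$|[ bcdfgjklmnprstvxz])")  # re-checks the end


def lookahead(pattern, text):
    """Like PEG's &e: True or False only; the matched length is thrown away."""
    return pattern.match(text) is not None


def parse_pu(text, rule):
    """Apply &cmavo, then the selma'o rule; return the consumed length or None."""
    if not lookahead(CMAVO, text):
        return None
    m = rule.match(text)
    return m.end() if m else None
```

Given the input `cai`, the rule without a boundary check consumes only `ca` even though the lookahead saw a three-letter cmavo; the rule with its own boundary check rejects it, which is the duplication being described.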

phma -- li fi'u vu'u fi'u fi'u du li pa


Posted by rlpowell on Wed 22 of Dec., 2004 02:56 GMT posts: 14214

On Tue, Dec 21, 2004 at 08:36:31PM -0500, Pierre Abbat wrote: > The problem with doing it in PEG is that it appears to be > impossible to check that a string matches two different PEs with > the same number of characters matched for both. That's why every > selma'o PE ends with checking for a space or consonant, even > though "cmavo" already checked for that.

1. I have no idea what that has to do with the rest of the conversation.

2. So what? It works quite well, you'll notice.

-Robin


Posted by klark on Wed 22 of Dec., 2004 02:57 GMT posts: 10

> The problem with doing it in PEG is that it appears to be impossible to > check > that a string matches two different PEs with the same number of characters > matched for both. That's why every selma'o PE ends with checking for a > space > or consonant, even though "cmavo" already checked for that.

Actually, that's the problem with doing it *in a single PEG grammar*, which is what I'm suggesting that we not do.

Clark


Posted by arj on Wed 22 of Dec., 2004 02:57 GMT posts: 953

On Tue, 21 Dec 2004 wikidiscuss@lojban.org wrote:

> Re: PEG Morphology Algorithm — design > The grammar in its current state does four separable things:

> 1. It partitions the input stream into words. ... > 4. It categorizes brivla into gismu, lujvo and fu'ivla.

I believe that it is possible that these two tasks are not separable. In any case, the current approach of the morphology part does it in a way consistent with the traditional (not fully operationalized) method of determining which words are of what kind.

Basically, a fu'ivla is any word that fits the definition of a brivla (consonant cluster in first five letters, not counting y or '), but is not either a gismu or a lujvo. So a fu'ivla is a very open-ended set of words. When cmavo are preceding a fu'ivla, there are some potential ambiguities that we have to handle. This is done via the so-called "slinku'i test", which is explained at:

http://www.lojban.org/tiki/3Dslinku%27i

In order to do the slinku'i test, we have to know what a lujvo is like. To know what a lujvo is like, we have to know what a rafsi is like. Final rafsi can be gismu, so we have to match against that, too. So, only to separate words consistently in the face of fu'ivla, we have to implement all of these concepts. So I believe further modularization is not possible.

-- Arnt Richard Johansen http://arj.nvg.org/

«Når jeg kommer til kloakken, er det for å rense opp - når Zola besøker det samme sted, er det for å bade!» --Henrik Ibsen


Posted by xorxes on Wed 22 of Dec., 2004 02:57 GMT posts: 1912

> The grammar in its current state does four separable things: > 1. It partitions the input stream into words. > 2. It validates the words, rejecting invalid vowel and consonant patterns. > 3. It determines the selma'o of a cmavo. > 4. It categorizes brivla into gismu, lujvo and fu'ivla. > > As a result, the grammar is fearsomely complex in spots.

Yes. Unfortunately, this is unavoidable. Lojban morphology is an ugly monster, that's a fact.

It was me who asked Robin to separate the morphology from the main syntax part of the grammar. The determination of selmaho is not part of what I did, and I agree it belongs in a separate module, but the way it is written now, you can ignore the selmaho part and it works with just "words" at the highest level.

1, 2 and 4 are inextricably linked. You can't do one without the other.

>(OK, the part that > recognizes selma'o isn't complex; it's just huge.) And it could be argued > that categorizing brivla really belongs to semantic analysis, not parsing.

You can't detect valid brivla without categorizing it. Brivla is a collection of gismu, lujvo and fuhivla rather than these being a partition of an initial class brivla, as it were.

> For the sake of modularity and reducing point-complexity, I think it would be > worth considering splitting the job into its components, and writing separate > grammars: > 1. A partitioning grammar that considers an input string, and accepts a word > (cmene, brivla, cmavo or non-Lojban) from its head. > 2. A validating grammar that considers a Lojban word, and rejects it > (re-categorizing it as non-Lojban?) if it has invalid vowel or consonant > patterns. > 3. Selma'o determination might be more easily described as a symbol table > lookup than as a parsing problem. > 4. A grammar that considers a valid Lojban brivla, and categorizes it.

I tried to make the morphology as modular as possible. Validation of consonant and vowel pairs is done at the lowest level.

Then each word class has its own module. You can't put all brivla in a single module. You could say that a brivla is any string that ends in a vowel and whose second consonant is part of a cluster, but then you'd be letting in some cmavo+brivla combinations and also some invalid stuff. It doesn't really advance you much.

> Of course this scheme depends on being able to combine multiple PEG-generated > parsers into a single program. But if the parser generator takes parameters > which can be used to name the input and parser functions, that shouldn't be > hard.

I wouldn't know anything about that. The separation can be done within a single grammar, by making a section take the output of a lower section as its "pseudo-terminals". That's not the problem. The problem is the inherent complexity of the grammar itself. (Indeed, when I asked Robin to separate the morphology part this is all I had in mind.)

> Or is there already a consensus that the requirement is for a single grand > grammar covering every relevant aspect of the language?

Not from my part. I want as much modularity as possible.

mu'o mi'e xorxes



Posted by xorxes on Wed 22 of Dec., 2004 02:57 GMT posts: 1912

> Morphology pass: text=( CMAVO=( cmavo=( cuu ) ) ) > > I assume this is a bug.

Any valid vowel pair is accepted in cmavo: {.uau}, {miau}, {cuu}, {kiy}, etc.

I don't want to restrict them in cmavo unless they are equally restricted in cmene and fu'ivla.

mu'o mi'e xorxes



Posted by xorxes on Wed 22 of Dec., 2004 05:17 GMT posts: 1912

> On Tuesday 21 December 2004 19:05, Jorge "Llambías" wrote:
> > --- Pierre Abbat wrote:
> > > The way I set up valfendi, there is one class of fu'ivla that have rafsi that can be used anywhere in a lujvo, and all others cannot be used in a lujvo except with zei.
> >
> > Why do you disallow fu'ivla that start with a consonant as final rafsi?
>
> I don't. Why do you think I do? {nalytci'ile} is valid but {nalyskalduna} is not.

I meant: Why do you disallow *some* fu'ivla that start with a consonant as final rafsi? Any fu'ivla that starts with a consonant can be used equally unambiguously as a final rafsi.

> > > "-iy-" was tried and discarded a long time ago. > > > > I remember reading something about it. How was it tried, and why > > was it discarded? > > That was before my time.

Well, mine too. Now is our time, so let's consider it. {iy} puts every fu'ivla (at least those that start with a consonant) on an equal footing. Just as with gismu, some are privileged with shorter rafsi, but all have at least the long ones. Why not extend the same benefit to fu'ivla?

mu'o mi'e xorxes



Posted by xorxes on Wed 22 of Dec., 2004 05:17 GMT posts: 1912

> Like I say, I believe that partitioning, validation and characterization are > probably simpler considered separately than together. It takes a genius of > Jorge's caliber to write or understand a parser that does all three > simultaneously.

Thank you for the compliment, but in fact what I tried to do is to separate them as much as I could.

> I strongly suspect that if separate grammars were used to > solve pieces of the whole problem, each would be simple enough that many, > many more people would be able to understand them.

The cmene, gismu and cmavo rules are very easy to understand, I would say.

The lujvo rule is somewhat complicated by the stress rules. Once this one is done, I plan to do a separate parallel grammar that does not handle capital letters, which I believe will be much easier to read. Other than that, the lujvo section is long but straightforward.

fu'ivla is probably the trickiest part to figure out.

> Ideally, they would be > simple enough that it would be feasible to see whether the grammar(s) do > what the prose description says.

Indeed, that's the goal. That's one reason for using terms like "slinkuhi" and "tosmabru" for rule names for example, because they perform the slinkuhi and tosmabru tests respectively.

mu'o mi'e xorxes



Posted by xorxes on Wed 22 of Dec., 2004 05:17 GMT posts: 1912

> Basically, a fu'ivla is any word that fits the definition of a brivla (consonant cluster in first five letters, not counting y or '),

(With John Cowan's consent) I extended the definition of brivla to "second consonant belongs to a cluster" rather than "cluster in first five letters". The restriction in the number of leading vowels is not well motivated. The number five just comes from the particular restrictions on the length of vowel strings of gismu and lujvo, but fu'ivla need not be so restricted.

> In order to do the slinku'i test, we have to know what a lujvo is like. To know what a lujvo is like, we have to know what a rafsi is like. Final rafsi can be gismu, so we have to match against that, too. So, only to separate words consistently in the face of fu'ivla, we have to implement all of these concepts. So I believe further modularization is not possible.

Indeed.

mu'o mi'e xorxes



Posted by Anonymous on Wed 22 of Dec., 2004 05:17 GMT

On Tuesday 21 December 2004 20:46, Robin Lee Powell wrote: > On Tue, Dec 21, 2004 at 08:36:31PM -0500, Pierre Abbat wrote: > > The problem with doing it in PEG is that it appears to be > > impossible to check that a string matches two different PEs with > > the same number of characters matched for both. That's why every > > selma'o PE ends with checking for a space or consonant, even > > though "cmavo" already checked for that. > > 1. I have no idea what that has to do with the rest of the > conversation. > > 2. So what? It works quite well, you'll notice.

In cmavo, yes. But when you consider lujvo and fu'ivla, especially lujvo with fu'ivla rafsi in them, you have to check whether the y-hyphens are necessary and whether something can be decomposed into rafsi with or without a consonant at the front removed, and the end of the brivla is marked, not by something simple such as a consonant, but by the stress, with an unstressed syllable (or two, if there's a 'y' present) after the stressed one.

phma -- Now I need a magnifier to find my eyeglasses! -Les Perles de la médecine


Posted by klark on Wed 22 of Dec., 2004 06:21 GMT posts: 10

From: "Jorge Llambías"
> --- Arnt Richard Johansen wrote:
>> Basically, a fu'ivla is any word that fits the definition of a brivla (consonant cluster in first five letters, not counting y or '),
>
> (With John Cowan's consent) I extended the definition of brivla to "second consonant belongs to a cluster" rather than "cluster in first five letters". The restriction in the number of leading vowels is not well motivated. The number five just comes from the particular restrictions on the length of vowel strings of gismu and lujvo, but fu'ivla need not be so restricted.
>
>> In order to do the slinku'i test, we have to know what a lujvo is like. To know what a lujvo is like, we have to know what a rafsi is like. Final rafsi can be gismu, so we have to match against that, too. So, only to separate words consistently in the face of fu'ivla, we have to implement all of these concepts. So I believe further modularization is not possible.
>
> Indeed.

Hmm. Consider the following passage from CLL (4.3):

All brivla have the following properties:
1) always end in a vowel;
2) always contain a consonant pair in the first five letters, where "y" and apostrophe are not counted as letters for this purpose;
3) always are stressed on the next-to-last (penultimate) syllable; this implies that they have two or more syllables.

I always assumed this to be definitive, rather than descriptive: that any word having all these characteristics is defined as being a brivla; not that this happens to be true as a consequence of other rules. I also believed that any brivla that didn't match the pattern of a gismu or lujvo was defined to be a fu'ivla.

Are these things true or not?

Clark


Posted by Anonymous on Wed 22 of Dec., 2004 06:21 GMT

On Wednesday 22 December 2004 00:24, Clark & Janiece Nelson wrote: > Hmm. Consider the following passage from CLL (4.3): > > All brivla have the following properties: > 1) always end in a vowel; > 2) always contain a consonant pair in the first five letters, where "y" and > apostrophe are not counted as letters for this purpose; > 3) always are stressed on the next-to-last (penultimate) syllable; this > implies that they have two or more syllables. > > I always assumed this to be definitive, rather than descriptive: that any > word having all these characteristics is defined as being a brivla; not > that this happens to be true as a consequence of other rules. I also > believed that any brivla that didn't match the pattern of a gismu or lujvo > was defined to be a fu'ivla. > > Are these things true or not?

There are three kinds of lerpoi that satisfy those properties but aren't brivla: tosmabru, slinku'i, and invalid lujvo. A lerpoi beginning with a consonant cluster and the result of prepending a cmavo to it cannot both be brivla; either the shorter is a slinku'i, or the longer is a tosmabru (or the cmavo has more than three letters). (I am using "tosmabru" loosely.) The set of strings that satisfy those properties and are not tosmabru or slinku'i, I call greater brivla space. Strings in greater brivla space that aren't brivla are invalid lujvo. There are two kinds: errors of hyphenation (lekymoi) and errors of rafsi (lekybumoi).

phma -- Without glasses, I can't even distinguish smells... -Les Perles de la médecine


Posted by rlpowell on Wed 22 of Dec., 2004 10:58 GMT posts: 14214

On Tue, Dec 21, 2004 at 06:51:58PM -0800, Jorge Llambías wrote:
> --- Robin Lee Powell wrote:
> > Morphology pass: text=( CMAVO=( cmavo=( cuu ) ) )
> >
> > I assume this is a bug.
>
> Any valid vowel pair is accepted in cmavo: {.uau}, {miau}, {cuu}, {kiy}, etc.
>
> I don't want to restrict them in cmavo unless they are equally restricted in cmene and fu'ivla.

OK; never mind then.

-Robin


Posted by rlpowell on Wed 22 of Dec., 2004 10:58 GMT posts: 14214

On Tue, Dec 21, 2004 at 06:12:38PM -0800, Clark & Janiece Nelson wrote: > >The problem with doing it in PEG is that it appears to be > >impossible to check that a string matches two different PEs with > >the same number of characters matched for both. That's why every > >selma'o PE ends with checking for a space or consonant, even > >though "cmavo" already checked for that. > > Actually, that's the problem with doing it *in a single PEG > grammar*, which is what I'm suggesting that we not do.

It's not a problem in *either* case. Pierre is stuck on it because he, apparently, can't think in terms of string-wise decomposition rather than algorithmics. I've got 6600 lines of comparison output between his parser and the PEG one, and I see no evidence that this "problem" is a problem.

-Robin


Posted by rlpowell on Wed 22 of Dec., 2004 10:58 GMT posts: 14214

On Tue, Dec 21, 2004 at 10:27:54PM -0500, Pierre Abbat wrote: > On Tuesday 21 December 2004 20:46, Robin Lee Powell wrote: > > On Tue, Dec 21, 2004 at 08:36:31PM -0500, Pierre Abbat wrote: > > > The problem with doing it in PEG is that it appears to be > > > impossible to check that a string matches two different PEs > > > with the same number of characters matched for both. That's > > > why every selma'o PE ends with checking for a space or > > > consonant, even though "cmavo" already checked for that. > > > > 1. I have no idea what that has to do with the rest of the > > conversation. > > > > 2. So what? It works quite well, you'll notice. > > In cmavo, yes. But when you consider lujvo and fu'ivla, especially > lujvo with fu'ivla rafsi in them, you have to check whether the > y-hyphens are necessary and whether something can be decomposed > into rafsi with or without a consonant at the front removed, and > the end of the brivla is marked, not by something simple such as a > consonant, but by the stress, with an unstressed syllable (or two, > if there's a 'y' present) after the stressed one.

Once again, I can't follow what you're saying. Clearly, you and I think in different ways when it comes to this sort of thing.

I see no evidence that the current PEG grammar is doing anything in a substantially incorrect fashion. If such evidence comes along, I'll let you know.

-Robin


Posted by xorxes on Wed 22 of Dec., 2004 13:46 GMT posts: 1912

> Hmm. Consider the following passage from CLL (4.3): > > All brivla have the following properties: > 1) always end in a vowel; > 2) always contain a consonant pair in the first five letters, where "y" and > apostrophe are not counted as letters for this purpose; > 3) always are stressed on the next-to-last (penultimate) syllable; this > implies that they have two or more syllables. > > I always assumed this to be definitive, rather than descriptive:

That is true (I changed 2 slightly to "its second consonant is always part of a cluster", but the point is the same). Those are properties that all brivla have, but not everything with those properties is a brivla.

> that any > word having all these characteristics is defined as being a brivla; not that > this happens to be true as a consequence of other rules.

No, that's not the case. For instance: {tosmabru}, {slinku'i}, {kniku}, {loertu}, {bytumi} all have those properties but are not brivla.
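To make the "necessary but not sufficient" point concrete, here is a small sketch that checks only properties 1 and 2, in both the CLL wording and the "second consonant is part of a cluster" variant. The function names are invented for illustration, stress (property 3) is not checked at all, and nothing here decides whether a string really is a brivla.

```python
VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")


def counted_letters(word):
    """Letters that count for the cluster rule; 'y', apostrophe and commas are skipped."""
    return [ch for ch in word.lower() if ch in VOWELS or ch in CONSONANTS]


def ends_in_vowel(word):
    """Property 1: the last character is one of a, e, i, o, u."""
    return word[-1:].lower() in VOWELS


def has_pair_in_first_five(word):
    """Property 2, CLL wording: a consonant pair within the first five counted letters."""
    first = counted_letters(word)[:5]
    return any(a in CONSONANTS and b in CONSONANTS for a, b in zip(first, first[1:]))


def second_consonant_in_cluster(word):
    """Property 2, modified wording: the second consonant touches another consonant."""
    letters = counted_letters(word)
    positions = [i for i, ch in enumerate(letters) if ch in CONSONANTS]
    if len(positions) < 2:
        return False
    i = positions[1]
    before = letters[i - 1] in CONSONANTS
    after = i + 1 < len(letters) and letters[i + 1] in CONSONANTS
    return before or after


def has_brivla_properties(word):
    """True when properties 1 and 2 (CLL wording) hold; this does not prove brivla-hood."""
    return ends_in_vowel(word) and has_pair_in_first_five(word)
```

All five strings listed above pass `has_brivla_properties`, which is exactly why the properties cannot serve as a definition. The two versions of property 2 differ only on strings with many leading vowels: a made-up string such as `aeiouspa` fails the first-five-letters check but passes the second-consonant check.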

> I also believed > that any brivla that didn't match the pattern of a gismu or lujvo was > defined to be a fu'ivla.

That's true, but only because gismu, lujvo and fu'ivla are all the kinds of brivla there are. You first need to find if it's a gismu, a lujvo or a fu'ivla, and only then can you conclude that it's a brivla. You can't start with brivla without detecting the others first.

mu'o mi'e xorxes



Posted by Anonymous on Wed 22 of Dec., 2004 13:46 GMT

On Wednesday 22 December 2004 07:11, Jorge "Llambías" wrote: > No, that's not the case. For instance: {tosmabru}, {slinku'i}, {kniku}, > {loertu}, {bytumi} all have those properties but are not brivla.

I say {loertu} is a brivla (properly written {lo,ertu}), and the others aren't.

phma -- Mes règles mensuelles ont lieu une fois par an. -Les Perles de la médecine


Posted by xorxes on Wed 22 of Dec., 2004 13:46 GMT posts: 1912

> On Wednesday 22 December 2004 07:11, Jorge "Llambías" wrote: > > No, that's not the case. For instance: {tosmabru}, {slinku'i}, {kniku}, > > {loertu}, {bytumi} all have those properties but are not brivla. > > I say {loertu} is a brivla (properly written {lo,ertu}), and the others > aren't.

Yes, that's still a dubious case. I'm willing to go either way, as long as cmene, cmavo and fu'ivla are all given the same treatment.

mu'o mi'e xorxes



Posted by xorxes on Wed 22 of Dec., 2004 14:20 GMT posts: 1912

Humanly readable algorithm for identifying fu'ivla.

A "syllable" is any permissible consonant cluster, or an apostrophe, or nothing, followed by a diphthong or by a single vowel.

Given a string of characters:

1. Check that it does not start with a cmene, a gismu or a lujvo.

2. Check whether it starts with a fu'ivla-head. A fu'ivla-head is something that looks like a cmavo without any y's. If there is no fu'ivla-head, go straight to 3.

A. If the fu'ivla-head is not followed by a consonant cluster, there is no fu'ivla (the head will fall off as a cmavo).

B. If the fu'ivla-head is followed by a non-initial cluster and one or more syllables, we have a fuhivla. If one of the syllables is stressed, the fu'ivla ends with the next syllable, otherwise it ends after the final syllable.

C. If the fu'ivla-head is followed by a permissible cluster, it may fall off. There is one case where it is saved: if only a single syllable follows the cluster, or if the head has a final stress so that it will accept only one more syllable. In those cases we have a fu'ivla.

3. If there is no fu'ivla-head, that means we have a cluster. If it is not an initial cluster, we don't have a valid word. If it is an initial cluster, it has to be followed by at least two syllables, and you need to check that adding {le} in front (or any other CV cmavo) does not convert it into a lujvo. If that doesn't happen, we have a fu'ivla.

In summary, we have just three types of possible fu'ivla:

1- Head fu'ivla with non-initial cluster plus tail.
2- Head fu'ivla with initial-cluster plus a single syllable ("short-tail").
3- Headless fu'ivla that pass the slinku'i test.
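
The steps above can be read as a small decision table. The following is only a sketch of that table, not the PEG itself: it assumes the head, the cluster type, the number of syllables after the cluster, and the result of the slinku'i test have already been worked out by some other code, and it skips step 1. The function name, the parameter names and the returned labels are all made up for illustration.

```python
from typing import Optional


def fuhivla_type(has_head: bool, cluster: str, tail_syllables: int,
                 head_final_stress: bool = False,
                 passes_slinkuhi: bool = False) -> Optional[str]:
    """Classify a candidate according to steps 2A-2C and 3 above.

    cluster is one of 'none', 'initial' or 'non-initial' (hypothetical labels).
    Returns a label for the fu'ivla type, or None when there is no fu'ivla.
    """
    if has_head:
        if cluster == "none":
            # 2A: the head falls off as a cmavo.
            return None
        if cluster == "non-initial":
            # 2B: needs one or more syllables after the cluster.
            if tail_syllables >= 1:
                return "head + non-initial cluster + tail"
            return None
        # 2C: initial cluster; the head is saved only with a short tail,
        # or when its final stress leaves room for just one more syllable.
        if tail_syllables == 1 or (head_final_stress and tail_syllables >= 1):
            return "head + initial cluster + short tail"
        return None
    # 3: headless; needs an initial cluster, two or more syllables,
    # and must not turn into a lujvo when a CV cmavo is put in front.
    if cluster == "initial" and tail_syllables >= 2 and passes_slinkuhi:
        return "headless"
    return None
```

Deciding the inputs is the hard part and is exactly what the PEG rules do; this only shows how the three types in the summary fall out of the case analysis.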

mu'o mi'e xorxes


Posted by Anonymous on Wed 22 of Dec., 2004 20:05 GMT

Jorge Llambías scripsit:

> Any valid vowel pair is accepted in cmavo: {.uau}, {miau}, {cuu}, {kiy}, etc.
>
> I don't want to restrict them in cmavo unless they are equally restricted in cmene and fu'ivla.

The iy and uy sequences have *always* been restricted to use in cmene, because cmene are a wastebasket category that can't threaten the morphology: the only reason to restrict things in cmene is to preserve audio-visual isomorphism. Other than that, they are reserved for something so important and overriding that we absolutely need them. Using iy as fu'ivla-rafsi glue falls in that category, but allowing them in random user-constructed fu'ivla definitely does not. As for cmavo, we have more than enough long cmavo capability without any need to allow iy and uy there.

It's my considered opinion that vowel glides beyond the standard diphthongs shouldn't exist in Lojban at all, for the same reason that the forbidden consonant clusters are forbidden: glides too threaten audio-visual isomorphism. It's very hard to reliably distinguish between {u,au} and {u,uau}, or between {ua,u} and {ua,uu}. In fact, uu and ii are only tolerable IMHO because they are always preceded by ".".

It's time to tighten up. Sixteen diphthongs only, and of those, "iy" and "uy" in cmene only, unless we can prove that "iy" is really fully usable for fu'ivla-rafsi. The few corpus words that conflict with this should be replaced one way or another.

-- John Cowan jcowan@reutershealth.com http://www.ccil.org/~cowan
But that, he realized, was a foolish thought; as no one knew better than he that the Wall had no other side. --Arthur C. Clarke, "The Wall of Darkness"


Posted by Anonymous on Wed 22 of Dec., 2004 20:05 GMT

Jorge Llambías scripsit:

> Well, mine too. Now is our time, so let's consider it. {iy} puts every fu'ivla (at least those that start with a consonant) on an equal footing. Just as with gismu, some are privileged with shorter rafsi, but all have at least the long ones. Why not extend the same benefit to fu'ivla?

I'm not sure if Nora was able to prove that iy didn't work, or if she was simply unable to prove that it did, but she's the person to ask.

-- John Cowan (who loves Asimov too) jcowan@reutershealth.com http://www.ccil.org/~cowan http://www.reutershealth.com
When I'm stuck in something boring where reading would be impossible or rude, I often set up math problems for myself and solve them as a way to pass the time. --John Jenkins


Posted by xorxes on Wed 22 of Dec., 2004 20:05 GMT posts: 1912

> The iy and uy sequences have *always* been restricted to use in cmene, because cmene are a wastebasket category that can't threaten the morphology: the only reason to restrict things in cmene is to preserve audio-visual isomorphism. Other than that, they are reserved for something so important and overriding that we absolutely need them. Using iy as fu'ivla-rafsi glue falls in that category, but allowing them in random user-constructed fu'ivla definitely does not. As for cmavo, we have more than enough long cmavo capability without any need to allow iy and uy there.

I agree "y" should not be allowed in fu'ivla other than as a hyphen for rafsi. I don't see how allowing anything in cmavo is more or less threatening to audio-visual isomorphism than allowing them in cmene, especially since new cmavo will be extremely rare anyway, whereas new cmene crop up all the time.

> It's my considered opinion that vowel glides beyond the standard diphthongs shouldn't exist in Lojban at all, for the same reason that the forbidden consonant clusters are forbidden: glides too threaten audio-visual isomorphism.

That's how the PEG is set to work now:

a can never be followed by a, e, o or y
e can never be followed by a, e, o, u or y
o can never be followed by a, e, o, u or y
y can never be followed by a, e, i, o or u

Those restrictions are absolute, no matter if there are intervening commas. (An intervening apostrophe allows any pair, the vowels are not adjacent then.)

In gismu: Only a, e, i, o, u. No vowel can be followed by another vowel.
In lujvo: ai, au, ei, oi are the only pairs allowed. y allowed as hyphen.
In fu'ivla: (i/u)(a/e/i/o/u) are added. Possibly iy as hyphen in lujvo.
In cmene and lujvo: iy, uy and yy are added.

fu'ivla, cmene and lujvo allow longer strings of vowels as long as each adjacent pair is allowed.
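
The four absolute restrictions listed above are easy to check mechanically. This is a minimal sketch in Python rather than PEG, under two assumptions taken from the text: commas do not separate vowels, and an apostrophe does. The function name is invented, and the per-word-type rules (gismu, lujvo, fu'ivla, cmene) are not covered.

```python
# For each vowel, the vowels that may never directly follow it.
FORBIDDEN_NEXT = {
    "a": set("aeoy"),
    "e": set("aeouy"),
    "o": set("aeouy"),
    "y": set("aeiou"),
}


def forbidden_vowel_pairs(word):
    """Return the adjacent vowel pairs in word that break the absolute rules.

    Capital letters only mark stress, so the word is lowercased first.
    Commas are dropped because they do not lift the restriction; an
    apostrophe stays in the string, so the vowels around it are not adjacent.
    """
    letters = word.lower().replace(",", "")
    return [
        first + second
        for first, second in zip(letters, letters[1:])
        if second in FORBIDDEN_NEXT.get(first, ())
    ]


print(forbidden_vowel_pairs("miau"))     # []
print(forbidden_vowel_pairs("fi'o"))     # []
print(forbidden_vowel_pairs("lo,ertu"))  # ['oe']
```

Since i and u have no entry in the table, any vowel may follow them as far as these four rules are concerned, which matches {cuu} and {kiy} being accepted.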

> It's very hard to reliably distinguish between {u,au} and {u,uau}, or between {ua,u} and {ua,uu}. In fact, uu and ii are only tolerable IMHO because they are always preceded by ".".

It would be relatively easy to forbid vowel triples everywhere. We just add "!(vowel-y vowel-y)" at the end of the vowel rules.
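
For comparison outside the PEG, the same restriction can be sketched as a plain check for three adjacent vowels. This is not the PEG rule quoted above, only an illustration of its effect; it assumes that commas do not break up a vowel string and that y counts as a vowel, and the function name is invented.

```python
import re


def has_vowel_triple(word):
    """True if three or more vowels are adjacent, ignoring commas."""
    letters = word.lower().replace(",", "")
    return re.search(r"[aeiouy]{3}", letters) is not None


print(has_vowel_triple(".uau"))  # True
print(has_vowel_triple("cuu"))   # False
```

Under such a rule {.uau} and {miau} would be rejected while {cuu} and {kiy} would survive.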

> It's time to tighten up. Sixteen diphthongs only, and of those, "iy" and "uy" in cmene only, unless we can prove that "iy" is really fully usable for fu'ivla-rafsi. The few corpus words that conflict with this should be replaced one way or another.

I don't see any reason to make cmene different from cmavo as far as vowels are concerned. If it's pronounceable in Lojban, it can be allowed in both places without any ambiguity. If it's not pronounceable in Lojban, then obviously it is not pronounceable anywhere. Arbitrarily restricting one but not the other in the absence of ambiguity doesn't seem right.

mu'o mi'e xorxes



Posted by rlpowell on Wed 22 of Dec., 2004 20:05 GMT posts: 14214

On Wed, Dec 22, 2004 at 12:09:30PM -0500, John Cowan wrote:
> Jorge Llambías scripsit:
> > Any valid vowel pair is accepted in cmavo: {.uau}, {miau}, {cuu}, {kiy}, etc.
> >
> > I don't want to restrict them in cmavo unless they are equally restricted in cmene and fu'ivla.
>
> The iy and uy sequences have *always* been restricted to use in cmene, because cmene are a wastebasket category that can't threaten the morphology:

As I've said before, I'm not reading any of the hardcore morphology discussion. I *hope*, though, that someone is writing up anything that might be controversial, for when it comes time to vote, just as I did with the main grammar.

-Robin


Posted by rlpowell on Wed 22 of Dec., 2004 20:05 GMT posts: 14214

On Tue, Dec 21, 2004 at 06:49:01PM -0800, Jorge Llambías wrote:
> > The grammar in its current state does four separable things:
> > 1. It partitions the input stream into words.
> > 2. It validates the words, rejecting invalid vowel and consonant patterns.
> > 3. It determines the selma'o of a cmavo.
> > 4. It categorizes brivla into gismu, lujvo and fu'ivla.
> >
> > As a result, the grammar is fearsomely complex in spots.
>
> Yes. Unfortunately, this is unavoidable. Lojban morphology is an ugly monster, that's a fact.
>
> It was me who asked Robin to separate the morphology from the main syntax part of the grammar. The determination of selmaho is not part of what I did, and I agree it belongs in a separate module, but the way it is written now, you can ignore the selmaho part and it works with just "words" at the highest level.

I could move the selma'o determination to the main grammar, and may very well do so, but it was easier at the time to add it to the morphology.

-Robin


Posted by xorxes on Wed 22 of Dec., 2004 20:05 GMT posts: 1912

> I'm not sure if Nora was able to prove that iy didn't work, or if she was simply unable to prove that it did, but she's the person to ask.

It all depends on how it was supposed to work. fu'ivla-rafsi can't be combined with all normal rafsi: the immediately preceding one has to be CVCy-, CCVCy-, CVCCy- or another fu'ivla-rafsi. This is always achievable because every gismu has a four-letter rafsi. Given that, iy clearly does work: it can be added unambiguously after any fu'ivla and it could never be confused with a normal rafsi. The assumption here is that "i" can be added to any final vowel string. If we forbid vowel triples this won't work.

mu'o mi'e xorxes



Posted by xorxes on Wed 22 of Dec., 2004 20:05 GMT posts: 1912

> I could move the selma'o determination to the main grammar, and may very well do so, but it was easier at the time to add it to the morphology.

I don't have a problem with it as it stands. In fact, I wouldn't want selmaho sorting to be a part of the main grammar. I see three stages: (1) A character string split into words by their form, (2) words sorted into selmaho (3) a string of selmaho parsed as sentences. Each stage isolated from the others as much as possible (not necessarily in different files, just in clearly delimited sections).

Stage (2) is longwinded but trivial: three word forms collapse into one selma'o (BRIVLA), one word form is moved directly into its own selma'o (CMENE) and one word form is split into a hundred and some selmaho (A, BAI, ... ZOhU).
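
To make stage (2) concrete, here is a rough sketch of that sorting step. It assumes stage (1) hands over (word form, word) pairs; the function names are invented, and the cmavo table is a four-entry excerpt rather than the full list of a hundred and some selma'o. The entries mu → PA and la → LA match the parser output quoted in this thread; le → LE and cu → CU are added as further standard examples.

```python
BRIVLA_FORMS = {"gismu", "lujvo", "fu'ivla"}

# Tiny excerpt only; the real table has one entry per cmavo.
CMAVO_SELMAHO = {"mu": "PA", "la": "LA", "le": "LE", "cu": "CU"}


def selmaho_of(form, word):
    """Map one word, already classified by form, to its selma'o."""
    if form in BRIVLA_FORMS:
        return "BRIVLA"   # three word forms collapse into one selma'o
    if form == "cmene":
        return "CMENE"    # one word form keeps its own selma'o
    if form == "cmavo":
        # None means the cmavo is missing from this excerpt.
        return CMAVO_SELMAHO.get(word.lower().strip("."))
    raise ValueError("unknown word form: " + form)


def sort_words(words):
    """Stage (2): turn (form, word) pairs into (selma'o, word) pairs."""
    return [(selmaho_of(form, word), word) for form, word in words]


print(sort_words([("cmavo", "mu"), ("gismu", "STEla"), ("cmene", "VIson")]))
```

Keeping the table as plain data is one way to get the isolation between stages described above: stage (1) never needs to know which selma'o exist, and stage (3) never sees word forms.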

mu'o mi'e xorxes



Posted by rlpowell on Wed 22 of Dec., 2004 20:05 GMT posts: 14214

Cage match between camxes and valfendi, round one:

$ echo "muSTEl,aVIson" | valfendi -a -l -s
>muSTE< -l,a VIson.

(which means "muSTE" is a non-Lojban word, "l,a" is a cmavo, and "VIson" is a cmene)

$ echo "muSTEl,aVIson" | myparser -m
text
|- CMAVO
| PA: mu
|- BRIVLA
| gismu: STEl,a
|- CMENE
cmene: VIson

Seems to me like valfendi's bad here, but I'll let you guys fight it out.

-Robin


Posted by rlpowell on Wed 22 of Dec., 2004 20:06 GMT posts: 14214

On Wed, Dec 22, 2004 at 10:58:13AM -0800, Robin Lee Powell wrote:
> Cage match between camxes and valfendi, round one:
>
> $ echo "muSTEl,aVIson" | valfendi -a -l -s
> >muSTE< -l,a VIson.
>
> (which means "muSTE" is a non-Lojban word, "l,a" is a cmavo, and "VIson" is a cmene)
>
> $ echo "muSTEl,aVIson" | myparser -m
> text
> |- CMAVO
> | PA: mu
> |- BRIVLA
> | gismu: STEl,a
> |- CMENE
> cmene: VIson
>
> Seems to me like valfendi's bad here, but I'll let you guys fight it out.

Heh.

*** Sentence: muSTElaVIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu (STEla) VIson.

*** Sentence: muSTEla.VIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu (STEla) VIson.

*** Sentence: muSTE.laVIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu (STEla) VIson.

*** Sentence: muSTE.la.VIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu (STEla) VIson.

*** Sentence: muSTEl,aVIson 1
MISMATCH!
valfendi: >muSTE< -l,a VIson.
pegbased: -mu (STEl,a) VIson.

These have been normalized to valfendi's format, which is:

>nonLojbanWord< (brivla) -ma'o cmen.

valfendi *really* doesn't want to take that mu.
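
As an illustration of that normalized format, here is a small sketch that renders a list of already classified words the same way. The (kind, text) token shape, the kind names and the function name are assumptions for the example; neither valfendi nor the PEG parser is claimed to expose anything like this.

```python
def to_valfendi_format(tokens):
    """Render (kind, text) pairs as: >nonLojbanWord< (brivla) -ma'o cmen."""
    rendered = []
    for kind, text in tokens:
        if kind == "cmavo":
            rendered.append("-" + text)
        elif kind == "brivla":
            rendered.append("(" + text + ")")
        elif kind == "cmene":
            rendered.append(text + ".")
        else:
            # Anything else is shown as a non-Lojban word.
            rendered.append(">" + text + "<")
    return " ".join(rendered)


pegbased = [("cmavo", "mu"), ("brivla", "STEla"), ("cmene", "VIson")]
valfendi = [("nonlojban", "muSTE"), ("cmavo", "la"), ("cmene", "VIson")]
print(to_valfendi_format(pegbased))  # -mu (STEla) VIson.
print(to_valfendi_format(valfendi))  # >muSTE< -la VIson.
```

The two printed lines reproduce the pegbased and valfendi readings of {muSTElaVIson} from the list above.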

-Robin


Posted by rlpowell on Wed 22 of Dec., 2004 20:06 GMT posts: 14214

On Wed, Dec 22, 2004 at 11:36:40AM -0800, Robin Lee Powell wrote:
> On Wed, Dec 22, 2004 at 10:58:13AM -0800, Robin Lee Powell wrote:
> > Cage match between camxes and valfendi, round one:
> >
> > $ echo "muSTEl,aVIson" | valfendi -a -l -s
> > >muSTE< -l,a VIson.
> >
> > (which means "muSTE" is a non-Lojban word, "l,a" is a cmavo, and "VIson" is a cmene)
> >
> > $ echo "muSTEl,aVIson" | myparser -m
> > text
> > |- CMAVO
> > | PA: mu
> > |- BRIVLA
> > | gismu: STEl,a
> > |- CMENE
> > cmene: VIson
> >
> > Seems to me like valfendi's bad here, but I'll let you guys fight it out.
>
> Heh.
>
> *** Sentence: muSTElaVIson 1
> MISMATCH!
> valfendi: >muSTE< -la VIson.
> pegbased: -mu (STEla) VIson.

Despite my insistence to not get involved, we figured this out in #lojban. They're both full of shit.

camxes is allowing cmene without a preceding pause *or* {la}, which is especially insane if it's supposed to allow cmene with la/lai/doi in them, which modification *requires* that there be a pause before *all* cmene.

valfendi, OTOH, is invalidating potentially valid words on the *left* (mu stela) in favour of valid words on the *right* (la vison), which is so much unlike how a human listener would deal with the issue that I'm quite stunned.

-Robin


Posted by xorxes on Wed 22 of Dec., 2004 20:06 GMT posts: 1912

> *** Sentence: muSTElaVIson 1
> MISMATCH!
> valfendi: >muSTE< -la VIson.
> pegbased: -mu (STEla) VIson.

Hmmm... This is arguable both ways, because names are not supposed to be allowed without an initial pause unless they follow doi/la/lai/la'i. We'd have to decide whether the stress-syllable rule for brivla or the rule for cmene has priority. (I go with left-to-right priority.)

> *** Sentence: muSTE.laVIson 1
> MISMATCH!
> valfendi: >muSTE< -la VIson.
> pegbased: -mu (STEla) VIson.

Oh dear, does PEG really do that? Which rule allows it to absorb that dot?

mu'o mi'e xorxes



Posted by rlpowell on Wed 22 of Dec., 2004 21:23 GMT posts: 14214

On Wed, Dec 22, 2004 at 12:03:57PM -0800, Jorge Llambías wrote:
> --- Robin Lee Powell wrote:
> > *** Sentence: muSTElaVIson 1
> > MISMATCH!
> > valfendi: >muSTE< -la VIson.
> > pegbased: -mu (STEla) VIson.
>
> Hmmm... This is arguable both ways, because names are not supposed to be allowed without an initial pause unless they follow doi/la/lai/la'i. We'd have to decide whether the stress-syllable rule for brivla or the rule for cmene has priority. (I go with left-to-right priority.)

I would as well.

> > *** Sentence: muSTE.laVIson 1
> > MISMATCH!
> > valfendi: >muSTE< -la VIson.
> > pegbased: -mu (STEla) VIson.
>
> Oh dear, does PEG really do that?

As it turns out, no:

text
|- CMAVO
| PA: mu
|- nonLojbanWord: STE
|- spaces: .
|- CMAVO
| LA: la
|- CMENE
cmene: VIson

AKA -mu >STE.< -la VIson.

Let me figure out where my test script is borked.

-Robin


Posted by xorxes on Wed 22 of Dec., 2004 21:23 GMT posts: 1912

> > *** Sentence: muSTElaVIson 1
>
> camxes is allowing cmene without a preceding pause *or* {la}, which is especially insane if it's supposed to allow cmene with la/lai/doi in them, which modification *requires* that there be a pause before *all* cmene.

We don't require a pause before doi/la/lai/la'i. If we did, there would be no point to the !doi-la-lai-la'i restriction. Both camxes and valfendi allow doi/la/lai/la'i when preceded by a consonant or followed by a vowel, that's all, because in those cases there is no ambiguity.

I would vote to abolish the doi-la-lai-la'i restriction on cmene altogether, and always require pauses at both ends of cmene. Then this sentence would not be problematic, it's just a cmene.

mu'o mi'e xorxes



Posted by rlpowell on Wed 22 of Dec., 2004 21:23 GMT posts: 14214

> > > *** Sentence: muSTE.laVIson 1
> > > MISMATCH!
> > > valfendi: >muSTE< -la VIson.
> > > pegbased: -mu (STEla) VIson.
> >
> > Oh dear, does PEG really do that?
>
> As it turns out, no:
snip
> Let me figure out where my test script is borked.

Fixed:

*** Sentence: muSTE.laVIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu >STE.< -la VIson.

The new mismatch list for these words is:

*** Sentence: muSTElaVIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu (STEla) VIson.

*** Sentence: MUstela.VIson 1
MISMATCH!
valfendi: (MUste) -la VIson.
pegbased: (MUste) -la. VIson.

*** Sentence: muSTEla.VIson 1
MISMATCH!
valfendi: -mu (STEla) VIson.
pegbased: -mu (STEla.) VIson.

*** Sentence: MUste.laVIson 1
MISMATCH!
valfendi: (MUste) -la VIson.
pegbased: (MUste.) -la VIson.

*** Sentence: muSTE.laVIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu >STE.< -la VIson.

*** Sentence: MUste.la.VIson 1
MISMATCH!
valfendi: (MUste) -la VIson.
pegbased: (MUste.) -la. VIson.

*** Sentence: muSTE.la.VIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu >STE.< -la. VIson.

*** Sentence: muSTEl,aVIson 1
MISMATCH!
valfendi: >muSTE< -l,a VIson.
pegbased: -mu (STEl,a) VIson.

Unfortunately, valfendi drops . and my parser doesn't, so these don't compare properly in several cases. I'll have to hack my test script more.
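
One way such a test script could cope with the pause marks is to strip them before comparing. The sketch below is a guess at the idea, not Robin's actual script: it assumes both outputs are already in the normalized format, removes "." from cmavo, brivla and non-Lojban tokens, and leaves the trailing "." that marks a cmene. Both function names are invented.

```python
def drop_pauses(normalized):
    """Remove '.' from every token except cmene, which end in '.' by design."""
    cleaned = []
    for token in normalized.split():
        # Tokens for cmavo, brivla and non-Lojban words start with -, ( or >.
        if token[0] in "-(>":
            token = token.replace(".", "")
        cleaned.append(token)
    return " ".join(cleaned)


def mismatch_report(sentence, valfendi, pegbased):
    """Return a report in the style used in this thread, or None if they agree."""
    if drop_pauses(valfendi) == drop_pauses(pegbased):
        return None
    return "\n".join([
        "*** Sentence: " + sentence,
        "MISMATCH!",
        "valfendi: " + valfendi,
        "pegbased: " + pegbased,
    ])


# Differs only in pause marks, so no mismatch is reported.
print(mismatch_report("MUste.la.VIson", "(MUste) -la VIson.", "(MUste.) -la. VIson."))
# A real disagreement about word boundaries is still reported.
print(mismatch_report("muSTElaVIson", ">muSTE< -la VIson.", "-mu (STEla) VIson."))
```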

-Robin


Posted by rlpowell on Wed 22 of Dec., 2004 21:23 GMT posts: 14214

On Wed, Dec 22, 2004 at 12:24:52PM -0800, Robin Lee Powell wrote:
> Unfortunately, valfendi drops . and my parser doesn't, so these don't compare properly in several cases. I'll have to hack my test script more.

Fixed, new list:

*** Sentence: muSTElaVIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu (STEla) VIson.

*** Sentence: muSTE.laVIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu >STE< -la VIson.

*** Sentence: muSTE.la.VIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu >STE< -la VIson.

*** Sentence: muSTEl,aVIson 1
MISMATCH!
valfendi: >muSTE< -l,a VIson.
pegbased: -mu (STEl,a) VIson.

-Robin


Posted by rlpowell on Wed 22 of Dec., 2004 21:23 GMT posts: 14214

On Wed, Dec 22, 2004 at 12:23:19PM -0800, Jorge Llambías wrote:
> --- Robin Lee Powell wrote:
> > > *** Sentence: muSTElaVIson 1
> >
> > camxes is allowing cmene without a preceding pause *or* {la}, which is especially insane if it's supposed to allow cmene with la/lai/doi in them, which modification *requires* that there be a pause before *all* cmene.
>
> We don't require a pause before doi/la/lai/la'i. If we did, there would be no point to the !doi-la-lai-la'i restriction. Both camxes and valfendi allow doi/la/lai/la'i when preceded by a consonant or followed by a vowel, that's all, because in those cases there is no ambiguity.
>
> I would vote to abolish the doi-la-lai-la'i restriction on cmene altogether, and always require pauses at both ends of cmene.

Umm, I thought that you were already doing this? Oh, I seem to have misunderstood valfendi's -a option:

-a cmevla can contain {la'u}, {doie}, etc.

That's a separate issue, isn't it?

> Then this sentence would not be problematic, it's just a cmene.

Right.

-Robin


Posted by Anonymous on Wed 22 of Dec., 2004 21:24 GMT

Jorge Llambías scripsit:

> Hmmm... This is arguable both ways, because names are not supposed to be allowed without an initial pause unless they follow doi/la/lai/la'i. We'd have to decide whether the stress-syllable rule for brivla or the rule for cmene has priority. (I go with left-to-right priority.)

The traditional understanding (as in the 1988 morphology) is that the cmene rule has absolute priority: a cmene is a maximal string of letters ending with consonant+pause and not containing la, lai, or doi (unless preceded by a consonant).
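
Read literally, that rule can be sketched as follows. This is only an illustration under stated assumptions: the input is a single pause-delimited chunk (so its end is followed by a pause), only the three markers la, lai and doi named above are considered, and where both la and lai match at the same place the longer one is taken. The function name is invented, and the sketch takes no side on whether the brivla stress rule should win instead.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")

# 'lai' is listed before 'la' so that the longer marker is preferred.
MARKERS = ("doi", "lai", "la")


def cmene_at_end(chunk):
    """Return the cmene ending a pause-delimited chunk, or None.

    The cmene is the longest final stretch of the chunk that ends in a
    consonant and contains no la/lai/doi, except where such a sequence
    is directly preceded by a consonant.
    """
    lower = chunk.lower()
    if not lower or lower[-1] not in CONSONANTS:
        return None  # not consonant-final, so no cmene ends here
    start = 0
    for i in range(len(lower)):
        if i > 0 and lower[i - 1] in CONSONANTS:
            continue  # a marker after a consonant is harmless
        for marker in MARKERS:
            if lower.startswith(marker, i):
                start = max(start, i + len(marker))
                break
    return chunk[start:]


print(cmene_at_end("muSTElaVIson"))  # VIson
print(cmene_at_end("mersamlas"))     # mersamlas
```

For {muSTElaVIson} this leaves {muSTE} in front of the {la}, which is the valfendi reading shown earlier in the thread.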

-- John Cowan jcowan@reutershealth.com www.reutershealth.com www.ccil.org/~cowan
At the end of the Metatarsal Age, the dinosaurs abruptly vanished. The theory that a single catastrophic event may have been responsible has been strengthened by the recent discovery of a worldwide layer of whipped cream marking the Creosote-Tutelary boundary. --Science Made Stupid


Posted by Anonymous on Wed 22 of Dec., 2004 21:24 GMT

Jorge Llambías scripsit:

> I would vote to abolish the doi-la-lai-la'i restriction on cmene altogether, and always require pauses at both ends of cmene. Then this sentence would not be problematic, it's just a cmene.

Arrgh. Pauses are obnoxious, and the more of them, the worse.

-- John Cowan jcowan@reutershealth.com http://www.ccil.org/~cowan http://www.reutershealth.com
Andrew Watt on Microsoft: "Never in the field of human computing has so much been paid by so many to so few!" (pace Winston Churchill)


Posted by xorxes on Wed 22 of Dec., 2004 21:24 GMT posts: 1912

> The traditional understanding (as in the 1988 morphology) is that the cmene rule has absolute priority: a cmene is a maximal string of letters ending with consonant+pause and not containing la, lai, or doi (unless preceded by a consonant).

Yes, we all agree with that part.

musTElaVIson

produces {VIson} as a cmene. That's the maximal string of letters ending with consonant+pause and not containing la, lai, or doi

The question is: what about cmene that follow a syllable {la} that is not a cmavo? Do they have to begin with a pause or not?

mu'o mi'e xorxes



Posted by rlpowell on Wed 22 of Dec., 2004 21:24 GMT posts: 14214

So far as I know, Lojban syllabification has never been clearly defined.

I assume that both camxes and valfendi make decisions about this. Is there any way to know if they've made the same decisions?

-Robin


Posted by xorxes on Wed 22 of Dec., 2004 21:24 GMT posts: 1912

> > Arrgh. Pauses are obnoxious, and the more of them, the worse.

Don't some languages have the glottal stop as just another phoneme? Are they obnoxious just because we don't happen to have them as phonemes in our native language, or are they obnoxious in a more fundamental naturalistic sense?

mu'o mi'e xorxes



Posted by xorxes on Wed 22 of Dec., 2004 21:24 GMT posts: 1912

> So far as I know, Lojban syllabification has never been clearly defined.
>
> I assume that both camxes and valfendi make decisions about this. Is there any way to know if they've made the same decisions?

There is no need to fully define syllabification. All we need is to agree on what counts as a syllable core, and we all agree about that: Only ai, au, ei, oi, ia, ie, ii, io, iu, ua, ue, ui, uo, uu, a, e, i, o, and u count as syllable cores. Nothing else. And vowel strings of more than two vowels break in pairs from the left, as long as the pair is a valid diphthong.
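
The left-to-right pairing can be sketched in a few lines. This assumes the fourteen diphthongs ai, au, ei, oi and (i/u)(a/e/i/o/u) as the only two-vowel cores, and it drops commas before pairing, which reflects the position that a comma cannot override the pairing. The function name is invented.

```python
DIPHTHONGS = {
    "ai", "au", "ei", "oi",
    "ia", "ie", "ii", "io", "iu",
    "ua", "ue", "ui", "uo", "uu",
}


def split_cores(vowels):
    """Split a vowel string into syllable cores, pairing from the left."""
    letters = vowels.lower().replace(",", "")
    if any(letter not in "aeiou" for letter in letters):
        raise ValueError("expected only the vowels a, e, i, o, u")
    cores = []
    i = 0
    while i < len(letters):
        pair = letters[i:i + 2]
        if pair in DIPHTHONGS:
            cores.append(pair)        # take a valid diphthong as one core
            i += 2
        else:
            cores.append(letters[i])  # otherwise a single vowel is a core
            i += 1
    return cores


print(split_cores("uau"))   # ['ua', 'u']
print(split_cores("uuau"))  # ['uu', 'au']
print(split_cores("u,au"))  # ['ua', 'u']
```

The last line shows the consequence of ignoring commas: {u,au} comes out the same as {uau}.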

The only doubt is whether we can use a comma to override the left to right pairing. We seem to agree that we cannot.

mu'o mi'e xorxes



Posted by Anonymous on Wed 22 of Dec., 2004 22:09 GMT

Jorge Llambías scripsit:

> Don't some languages have the glottal stop as just another phoneme? Are they obnoxious just because we don't happen to have them as phonemes in our native language, or are they obnoxious in a more fundamental naturalistic sense?

Many languages have glottal stop as a phoneme, and it's not rare in some varieties of English as an allophone of /t/. But few languages have consonant clusters involving it, as Lojban does.

-- John Cowan http://www.reutershealth.com http://www.ccil.org/~cowan
Not to perambulate the corridors during the hours of repose in the boots of ascension. --Sign in Austrian ski-resort hotel


Posted by Anonymous on Wed 22 of Dec., 2004 22:09 GMT

Robin Lee Powell scripsit:
> So far as I know, Lojban syllabification has never been clearly defined.

The rule is, AFAIK, that every vowel or diphthong makes a syllable, and that consonant clusters are split between syllables unless they are a permissible initial, in which case they both belong to the following syllable. All such rules are artificial: there is no objective cross-language definition of "syllable".

-- John Cowan jcowan@reutershealth.com www.ccil.org/~cowan www.reutershealth.com
"And it was said that ever after, if any man looked in that Stone, unless he had a great strength of will to turn it to other purpose, he saw only two aged hands withering in flame." --"The Pyre of Denethor"


Posted by Anonymous on Thu 23 of Dec., 2004 00:22 GMT

On Wednesday 22 December 2004 13:11, Jorge "Llambías" wrote:
> That's how the PEG is set to work now:
>
> a can never be followed by a, e, o or y
> e can never be followed by a, e, o, u or y
> o can never be followed by a, e, o, u or y
> y can never be followed by a, e, i, o or u
>
> Those restrictions are absolute, no matter if there are intervening commas. (An intervening apostrophe allows any pair, the vowels are not adjacent then.)
>
> In gismu: Only a, e, i, o, u. No vowel can be followed by another vowel.
> In lujvo: ai, au, ei, oi are the only pairs allowed. y allowed as hyphen.
> In fu'ivla: (i/u)(a/e/i/o/u) are added. Possibly iy as hyphen in lujvo.
> In cmene and lujvo: iy, uy and yy are added.
>
> fu'ivla, cmene and lujvo allow longer strings of vowels as long as each adjacent pair is allowed.

Valfendi is set up like this:
In gismu: Only a, e, i, o, u.
In lujvo: ai, au, ei, oi are the only pairs allowed. y allowed as hyphen.
In fu'ivla: All pairs (a/e/i/o/u)(a/e/i/o/u) are allowed. If the two vowels do not form a diphthong, they are assumed to have a comma between them. Arbitrarily long vowel strings are allowed.
In fu'ivla lujvo: All pairs (a/e/i/o/u)(a/e/i/o/u) are allowed. y is allowed between consonants and must occur at least once.
In cmene: All pairs are allowed. iy and uy are diphthongs. If the two vowels do not form a diphthong, they are assumed to have a comma between them.
In cmavo: ai, au, ei, oi are allowed in arbitrarily long cmavo. (i/u)(a/e/i/o/u) are allowed in two-letter cmavo. No more than two vowels in a row are allowed; a cmavo with more than two vowels must contain an apostrophe.

phma -- Maintenant, j'ai besoin d'une loupe pour trouver mes lunettes! -Les Perles de la médecine


Posted by Anonymous on Thu 23 of Dec., 2004 00:23 GMT

On Wednesday 22 December 2004 09:20, wikidiscuss@lojban.org wrote:
> Re: PEG Morphology Algorithm
>
> Humanly readable algorithm for identifying fu'ivla.
>
> A "syllable" is any permissible consonant cluster, or an apostrophe, or nothing, followed by a diphthong or by a single vowel.
>
> Given a string of characters:
>
> 1. Check that it does not start with a cmene, a gismu or a lujvo.
>
> 2. Check whether it starts with a fu'ivla-head. A fu'ivla-head is something that looks like a cmavo without any y's. If there is no fu'ivla-head, go straight to 3.
>
> A. If the fu'ivla-head is not followed by a consonant cluster, there is no fu'ivla (the head will fall off as a cmavo).
>
> B. If the fu'ivla-head is followed by a non-initial cluster and one or more syllables, we have a fu'ivla. If one of the syllables is stressed, the fu'ivla ends with the next syllable, otherwise it ends after the final syllable.
>
> C. If the fu'ivla-head is followed by a permissible cluster, it may fall off. There is one case where it is saved: if only a single syllable follows the cluster, or if the head has a final stress so that it will accept only one more syllable. In those cases we have a fu'ivla.

There are two cases. The other is that the string following the head is a slinku'i, and the fu'ivla-head is not of CV form (which makes a lujvo). E.g. {suicmardi} is a fu'ivla.

phma -- GCS/M d- s-: a+ C++ UL++++$ P+ L+++ E- W+++ N+ o? K? w-- O? M- V- Y++ PGP++ t- 5? X? R- !tv b++ DI !D G e++ h+>---- r- y>+++


Posted by rlpowell on Thu 23 of Dec., 2004 00:23 GMT posts: 14214

Round two:

valfendi: -la mer. samo,as.
pegbased: -la mer. -sa >mo,as<

-Robin


Posted by Anonymous on Thu 23 of Dec., 2004 00:23 GMT

On Wednesday 22 December 2004 13:58, Robin Lee Powell wrote:
> Cage match between camxes and valfendi, round one:
>
> $ echo "muSTEl,aVIson" | valfendi -a -l -s
> >muSTE< -l,a VIson.
>
> (which means "muSTE" is a non-Lojban word, "l,a" is a cmavo, and "VIson" is a cmene)

On finding {la} before a cmene without a pause between them, it breaks off what's on both sides of {la}. This is the way BRKWORDS did it.

If camxes and valfendi give different output on an invalid string (in this case, {mu stela vison} requires a pause before {vison}, and btw when I try {muSTEla.VIson} I get {mu stela vison}), that is not necessarily a bug.

phma -- Mes règles mensuelles ont lieu une fois par an. -Les Perles de la médecine


Posted by Anonymous on Thu 23 of Dec., 2004 00:23 GMT

On Wednesday 22 December 2004 16:15, Jorge "Llambías" wrote:
> --- Robin Lee Powell wrote:
> > So far as I know, Lojban syllabification has never been clearly defined.
> >
> > I assume that both camxes and valfendi make decisions about this. Is there any way to know if they've made the same decisions?
>
> There is no need to fully define syllabification. All we need is to agree on what counts as a syllable core, and we all agree about that: Only ai, au, ei, oi, ia, ie, ii, io, iu, ua, ue, ui, uo, uu, a, e, i, o, and u count as syllable cores. Nothing else. And vowel strings of more than two vowels break in pairs from the left, as long as the pair is a valid diphthong.
>
> The only doubt is whether we can use a comma to override the left to right pairing. We seem to agree that we cannot.

I say it can, but the only time it matters in valfendi is in a brivla that ends with a diphthong. If you say {draga,u}, it expects the second A to be stressed.

phma -- Without glasses, I can't even distinguish smells... -Les Perles de la médecine


Posted by rlpowell on Thu 23 of Dec., 2004 00:23 GMT posts: 14214

On Wed, Dec 22, 2004 at 06:51:36PM -0500, Pierre Abbat wrote:
> If camxes and valfendi give different output on an invalid string
snip
> that is not necessarily a bug.

That's a good point, but in many of these cases I'm going to need you guys to tell me what is and is not a valid string.

-Robin


Posted by rlpowell on Thu 23 of Dec., 2004 00:23 GMT posts: 14214

On Wed, Dec 22, 2004 at 04:12:08PM -0800, Robin Lee Powell wrote:
> On Wed, Dec 22, 2004 at 06:51:36PM -0500, Pierre Abbat wrote:
> > If camxes and valfendi give different output on an invalid string
> snip
> > that is not necessarily a bug.
>
> That's a good point, but in many of these cases I'm going to need you guys to tell me what is and is not a valid string.

For example, in this case one of you thinks it's invalid, the other does not:

*** Sentence: muSTElaVIson 1
MISMATCH!
valfendi: >muSTE< -la VIson.
pegbased: -mu (STEla) VIson.

Morphologically invalid, I mean. Both cases are grammatically invalid.

I'm pretty sure camxes is wrong on this one.

-Robin

Posted by rlpowell on Thu 23 of Dec., 2004 05:49 GMT posts: 14214

I've got a *lot* of these:

  • Sentence: fi'oricyrAtcu airicyrAtcu ruericyrAtcu ioricyrAtcu 1

MISMATCH!
valfendi: -fi'o (ricyrAtcu) -ai (ricyrAtcu) >rue< (ricyrAtcu) -io (ricyrAtcu)
pegbased: -fi'o (ricyrAtcu) -ai (ricyrAtcu) -rue (ricyrAtcu) -io (ricyrAtcu)

It seems that there are a *bunch* of cases where camxes accepts cmavo that valfendi does not. I could probably find several hundred if I tried. You guys should hash that out.
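To hunt for cases like this in bulk, the two output lines can be compared token by token. The sketch below is a guess at how such a comparison might look, not the script actually used in this thread; `token_mismatches` is a hypothetical name. It assumes that both lexers print whitespace-separated tokens and that, as the examples suggest, `-x` marks a cmavo, `(x)` a brivla and `>x<` something the lexer could not classify.

```python
from itertools import zip_longest


def token_mismatches(valfendi_out, pegbased_out):
    """Return the (valfendi, pegbased) token pairs that differ.

    Tokens are compared position by position; if one output is shorter,
    the missing positions are filled with an empty string.
    """
    pairs = zip_longest(valfendi_out.split(), pegbased_out.split(), fillvalue="")
    return [(a, b) for a, b in pairs if a != b]


valfendi = "-fi'o (ricyrAtcu) -ai (ricyrAtcu) >rue< (ricyrAtcu) -io (ricyrAtcu)"
pegbased = "-fi'o (ricyrAtcu) -ai (ricyrAtcu) -rue (ricyrAtcu) -io (ricyrAtcu)"
print(token_mismatches(valfendi, pegbased))
```

For the sentence above the only differing position is the third cmavo candidate, which valfendi marks as `>rue<` and pegbased as `-rue`.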

-Robin

Posted by rlpowell on Thu 23 of Dec., 2004 05:49 GMT posts: 14214

Ummm...

  • Sentence: .ui mi facki fi le mi mapku 1

MISMATCH!
valfendi: -ui -mi (facki) -fi -le -mi >mapku<
pegbased: -ui -mi (facki) -fi -le -mi (mapku)

That's a *bug*, Pierre.

-Robin

Posted by rlpowell on Thu 23 of Dec., 2004 05:49 GMT posts: 14214

On Wed, Dec 22, 2004 at 04:27:24PM -0800, Robin Lee Powell wrote: > Ummm... > > *** Sentence: .ui mi facki fi le mi mapku 1 > MISMATCH! > valfendi: -ui -mi (facki) -fi -le -mi >mapku< > pegbased: -ui -mi (facki) -fi -le -mi (mapku) > > That's a *bug*, Pierre.

  • Sentence: ti poi ke'a nazbi kapkevna ku'o cu barda 1

MISMATCH!
valfendi: -ti -poi -ke'a (nazbi) >kapkevna< -ku'o -cu (barda)
pegbased: -ti -poi -ke'a (nazbi) (kapkevna) -ku'o -cu (barda)

An aversion to pk, apparently.

-Robin


Posted by xorxes on Thu 23 of Dec., 2004 05:49 GMT posts: 1912

> > C. If the fu'ivla-head is followed by a permissible cluster, it may fall > > off. There is one case where it is saved: if only a single syllable follows > > the cluster, or if the head has a final stress so that it will accept only > > one more syllable. In those cases we have a fu'ivla. > > There are two cases. The other is that the string following the head is a > slinku'i, and the fu'ivla-head is not of CV form (which makes a lujvo). E.g. > {suicmardi} is a fu'ivla.

Ouch! You're very right, I had completely missed that case. I have now modified fuhivla to:

fuhivla <- !cmene !gismu !lujvo
             (stressed-fuhivla-head cluster fuhivla-tail
              / fuhivla-head cluster stressed-fuhivla-tail)
         / !CVC-rafsi &cmavo-form syllable (!consonant syllable)* slinkuhi

The final alternative should take care of that very odd case.

(fuhivla-rafsi was also modified accordingly.)

ki'e mi'e xorxes



Posted by xorxes on Thu 23 of Dec., 2004 05:49 GMT posts: 1912

> Round two: > > valfendi: -la mer. samo,as. > pegbased: -la mer. -sa >mo,as<

That's a known difference. valfendi is more permissive with vowel groups in cmene and fuhivla. If we want to match valfendi in this regard, we have to eliminate the !a !e etc. at the end of the vowel rules.

mu'o mi'e xorxes



Posted by xorxes on Thu 23 of Dec., 2004 05:49 GMT posts: 1912

> On Wednesday 22 December 2004 16:15, Jorge "Llambías" wrote:

> > The only doubt is whether we can use a comma to override > > the left to right pairing. We seem to agree that we cannot. > > I say it can, but the only time it matters in valfendi is in a brivla that > ends with a diphthong. If you say {draga,u}, it expects the second A to be > stressed.

But then, is {DRAga,u} allowed? Is it different from {DRAgau}?

camxes (correctly, I think) rejects {dragA,u}.

mu'o mi'e xorxes



Posted by Anonymous on Thu 23 of Dec., 2004 05:50 GMT

On Wednesday 22 December 2004 19:14, Robin Lee Powell wrote: > On Wed, Dec 22, 2004 at 04:12:08PM -0800, Robin Lee Powell wrote: > > On Wed, Dec 22, 2004 at 06:51:36PM -0500, Pierre Abbat wrote: > > > If camxes and valfendi give different output on an invalid > > > string > > > > snip > > > > > that is not necessarily a bug. > > > > That's a good point, but in many of these cases I'm going to need > > you guys to tell me what is and is not a valid string. > > For example, in this case one of you thinks it's invalid, the other > does not: > > *** Sentence: muSTElaVIson 1 > MISMATCH! > valfendi: >muSTE< -la VIson. > pegbased: -mu (STEla) VIson. > > Morphologically invalid, I mean. Both cases are grammatically > invalid. > > I'm pretty sure camxes is wrong on this one.

It's invalid as an encoding of {mu stela vison} because the cmene is preceded by a brivla without a pause between them. It's invalid as an encoding of {muste la vison} because the accent is on the wrong syllable.

{kybuladjan} is invalid because {ky} needs a pause after it. Both lexers, however, lex this as {ky bu la djan} (or so xorxes claims for camxes). The official rules state that the pause must be between the Cy and the next word that isn't Cy, but I figured out that it can be between the Cy and the next word that contains CVV, CV'V, or CCV, so I say {kybu.ladjan}.

{kymoi}, {kybumoi}, {kybumlatu}, {lekymoi}, {lekybumoi}, and {lekybumlatu} are more phrases with the pause after the lervla missing. valfendi thinks they all contain brivla, but errors out trying to identify it, except for {ky bu mlatu}.

phma -- S Fa1>+/- !TM M-- K H T-- t? AT++ SY Te- SC- FO- D P !Tz E++ L


Posted by xorxes on Thu 23 of Dec., 2004 05:50 GMT posts: 1912

> On Wed, Dec 22, 2004 at 04:12:08PM -0800, Robin Lee Powell wrote: > > On Wed, Dec 22, 2004 at 06:51:36PM -0500, Pierre Abbat wrote: > > > If camxes and valfendi give different output on an invalid > > > string > > snip > > > that is not necessarily a bug. > > > > That's a good point, but in many of these cases I'm going to need > > you guys to tell me what is and is not a valid string.

I don't think that's right. If the string is invalid, both parsers should say it is invalid. If they say anything else, it is a bug.

> For example, in this case one of you thinks it's invalid, the other > does not: > > *** Sentence: muSTElaVIson 1 > MISMATCH! > valfendi: >muSTE< -la VIson. > pegbased: -mu (STEla) VIson. > > Morphologically invalid, I mean. Both cases are grammatically > invalid.

Make it grammatically valid:

lo'u muSTElaVIson le'u lojbo valsi

> I'm pretty sure camxes is wrong on this one.

I'm not so sure. I'm inclined to say it is not wrong, because the rules for identifying cmene are purely *morphological*. They should not rely on identifying the preceding "la" as a gadri. Any syllable "la" will allow the cmene to skip the initial pause.

mu'o mi'e xorxes



Posted by xorxes on Thu 23 of Dec., 2004 05:50 GMT posts: 1912

> I've got a *lot* of these: > > *** Sentence: fi'oricyrAtcu airicyrAtcu ruericyrAtcu ioricyrAtcu 1 > MISMATCH! > valfendi: -fi'o (ricyrAtcu) -ai (ricyrAtcu) >rue< (ricyrAtcu) -io (ricyrAtcu) > pegbased: -fi'o (ricyrAtcu) -ai (ricyrAtcu) -rue (ricyrAtcu) -io (ricyrAtcu) > > It seems that there are a *bunch* of cases where camxes accepts > cmavo that valfendi does not. I could probably find several hundred > if I tried. You guys should hash that out.

That's a known difference: camxes allows the y-hyphen after any CVC-rafsi, whether required or not.

This can probably be changed with some work, but I think this is a feature, not a bug. In fact, I'm inclined to do the same for the r/n-hyphen after CVV.

mu'o mi'e xorxes



Posted by Anonymous on Thu 23 of Dec., 2004 05:50 GMT

On Wednesday 22 December 2004 19:27, Robin Lee Powell wrote: > Ummm... > > *** Sentence: .ui mi facki fi le mi mapku 1 > MISMATCH! > valfendi: -ui -mi (facki) -fi -le -mi >mapku< > pegbased: -ui -mi (facki) -fi -le -mi (mapku) > > That's a *bug*, Pierre.

In pairtable, change the 'p' line to: /*p*/ " + + +=++ =++ + ",

phma -- Ils pensent que j'ai un cancer du thé russe... -Les Perles de la médecine


Posted by xorxes on Thu 23 of Dec., 2004 05:50 GMT posts: 1912

> Valfendi is set up like this: > In gismu: Only a, e, i, o, u. > In lujvo: ai, au, ei, oi are the only pairs allowed. y allowed as hyphen. > In fu'ivla: All pairs (a/e/i/o/u)(a/e/i/o/u) are allowed. If the two vowels > do > not form a diphthong, they are assumed to have a comma between them. > Arbitrarily long vowel strings are allowed. > In fu'ivla lujvo: All pairs (a/e/i/o/u)(a/e/i/o/u) are allowed. y is allowed > between consonants and must occur at least once. > In cmene: All pairs are allowed. iy and uy are diphthongs. If the two vowels > do not form a diphthong, they are assumed to have a comma between them.

All that makes sense to me: maximally permissive.

> In cmavo: ai, au, ei, oi are allowed in arbitrarily long cmavo. > (i/u)(a/e/i/o/u) are allowed in two-letter cmavo. No more than two vowels in > a row are allowed; a cmavo with more than two vowels must contain an > apostrophe.

This, however, which is maximally restrictive, makes no sense to me in conjunction with the above rules. If you are maximally permissive with cmene and fu'ivla I see no reason not to be equally permissive with cmavo.
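The cmavo rule quoted above can be read as a small predicate. The following is a sketch of that reading only, not valfendi's actual code; `cmavo_vowels_ok` is a hypothetical name, and "two-letter cmavo" is assumed here to mean a cmavo that consists of exactly the two vowels, as in {ui} or {io}.

```python
import re

# Pairs the quoted rule allows in cmavo of any length.
FALLING_DIPHTHONGS = {"ai", "au", "ei", "oi"}


def cmavo_vowels_ok(cmavo):
    """Check the vowel runs of a cmavo against the quoted restrictive rule.

    - no run of more than two vowels (longer cmavo need an apostrophe);
    - ai, au, ei, oi are fine anywhere;
    - (i/u) + vowel is fine only when the whole cmavo is those two letters.
    """
    word = cmavo.strip(".").lower()
    for run in re.findall(r"[aeiou]+", word):
        if len(run) > 2:
            return False
        if len(run) == 2 and run not in FALLING_DIPHTHONGS:
            if not (len(word) == 2 and run[0] in "iu"):
                return False
    return True
```

Read this way, the rule accepts {ai}, {io} and {fi'o} but rejects {rue}, which matches the split seen in the mismatch reports earlier in the thread.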

I don't have a strong opinion on which way we should go as far as permissiveness of vowel pairs, but I do think we should be consistent and not have arbitrary restrictions that apply to one class of words and not to others, for no apparent reason.

mu'o mi'e xorxes



Posted by xorxes on Thu 23 of Dec., 2004 05:50 GMT posts: 1912

> > *** Sentence: muSTElaVIson 1 > > MISMATCH! > > valfendi: >muSTE< -la VIson. > > pegbased: -mu (STEla) VIson. > > It's invalid as an encoding of {mu stela vison} because the cmene is preceded > by a brivla without a pause between them.

Is there a rule that says that a cmene can't be preceded by a brivla without a pause between them? That would be odd, because cmene can practically never appear after a brivla. Isn't the cmene morphology rule about the syllable {la} rather than the cmavo {la}? And if it isn't, shouldn't it be? The morphology should not care about what the words mean, only about their form.

> {kybuladjan} is invalid because {ky} needs a pause after it. Both lexers, > however, lex this as {ky bu la djan} (or so xorxes claims for camxes). The > official rules state that the pause must be between the Cy and the next word > that isn't Cy, but I figured out that it can be between the Cy and the next > word that contains CVV, CV'V, or CCV, so I say {kybu.ladjan}.

Right. The official rules are more strict than they need to be here. Both parsers have a bug with respect to the official rules, but this will not be a bug with respect to the new official rules if they are approved.

> {kymoi}, {kybumoi}, {kybumlatu}, {lekymoi}, {lekybumoi}, and {lekybumlatu} > are > more phrases with the pause after the lervla missing. valfendi thinks they > all contain brivla, but errors out trying to identify it, except for {ky bu > mlatu}.

camxes should give:

ky moi
ky bu moi
ky bu mlatu
lekymoi (= lekmoi)
le ky bu moi
le ky bu mlatu

Again, these are bugs with respect to the official rules, which are more strict than required for unambiguity. We could force these to be errors, but that seems pointless.

mu'o mi'e xorxes



Posted by Anonymous on Thu 23 of Dec., 2004 05:50 GMT

Jorge Llambías scripsit:

> The question is: What about cmene that follow a syllable > {la} that is not a cmavo. Do they have to begin with a pause > or not?

I guess the answer is that it doesn't matter much, since such a situation is always going to be a syntax error anyhow.

-- Real FORTRAN programmers can program FORTRAN John Cowan in any language. --Allen Brown jcowan@reutershealth.com


Posted by Anonymous on Thu 23 of Dec., 2004 05:50 GMT

Jorge Llambías scripsit:

> In cmene and lujvo: iy, uy and yy are added.

I assume you mean cmavo rather than lujvo. "yy" should not be allowed; it has no defined pronunciation.

> It would be relatively easy to forbid vowel triples everywhere. We just > add "!(vowel-y vowel-y)" at the end of the vowel rules.

I believe we should do so. This is a break with the past, but a modest one.

> I don't see any reason to make cmene different from cmavo as far > as vowels are concerned.

The point is that iy and uy are reserved for a morphological mechanism. Can arbitrary fu'ivla lujvo be constructed using iy after an initial fu'ivla, iy before and after a medial fu'ivla, and iy before a final fu'ivla? That was the original design.

As I said before, allowing iy and uy in cmene even when they are reserved otherwise is safe because cmene are defined backwards.

-- Schlingt dreifach einen Kreis vom dies! John Cowan Schliesst euer Aug vor heiliger Schau, http://www.reutershealth.com Denn er genoss vom Honig-Tau, http://www.ccil.org/~cowan Und trank die Milch vom Paradies. — Coleridge (tr. Politzer)


Posted by Anonymous on Thu 23 of Dec., 2004 05:50 GMT

Jorge Llambías scripsit:

> It all depends on how it was supposed to work. fu'ivla-rafsi > can't be combined with all normal rafsi: the immediately preceding > one has to be CVCy-, CCVCy-, CVCCy- or another fu'ivla-rafsi.

But when iy is the glue?

-- John Cowan jcowan@reutershealth.com www.reutershealth.com www.ccil.org/~cowan Reversing the apostolic precept to be all things to all men, I usually [before Darwin] defended the tenability of the received doctrines, when I had to do with the evolutionists; and stood up for the possibility of evolution among the orthodox — thereby, no doubt, increasing an already current, but quite undeserved, reputation for needless combativeness. --T. H. Huxley

Posted by rlpowell on Thu 23 of Dec., 2004 09:31 GMT posts: 14214

On Wed, Dec 22, 2004 at 04:57:48PM -0800, Jorge Llambías wrote: > > --- Robin Lee Powell wrote: > > Round two: > > > > valfendi: -la mer. samo,as. > > pegbased: -la mer. -sa >mo,as< > > That's a known difference. valfendi is more permissive with vowel > groups in cmene and fuhivla.

camxes is more permissive in cmene, though.

valfendi: -zoi -fy booz. -fy -co -sa -zoi bar. baz. bar.
pegbased: -zoi -fy >booz< -fy -co -sa -zoi bar. baz. bar.

> If we want to match valfendi in this regard, we have to eliminate > the !a !e etc. at the end of the vowel rules.

I don't much care what you two agree on, so long as you do.

This may not be possible, of course.

-Robin

Posted by rlpowell on Thu 23 of Dec., 2004 09:31 GMT posts: 14214

On Wed, Dec 22, 2004 at 05:17:11PM -0800, Jorge Llambías wrote: > > --- Robin Lee Powell wrote: > > > I've got a *lot* of these: > > > > *** Sentence: fi'oricyrAtcu airicyrAtcu ruericyrAtcu ioricyrAtcu 1 > > MISMATCH! > > valfendi: -fi'o (ricyrAtcu) -ai (ricyrAtcu) >rue< (ricyrAtcu) -io (ricyrAtcu) > > pegbased: -fi'o (ricyrAtcu) -ai (ricyrAtcu) -rue (ricyrAtcu) -io (ricyrAtcu) > > > > It seems that there are a *bunch* of cases where camxes accepts > > cmavo that valfendi does not. I could probably find several hundred > > if I tried. You guys should hash that out. > > That's a known difference: camxes allows the y-hyphen after any > CVC-rafsi, whether required or not.

That has nothing whatever to do with this case that I can see. This is about whether rue is a valid cmavo.

-Robin

Posted by rlpowell on Thu 23 of Dec., 2004 09:31 GMT posts: 14214

How about this:

valfendi: (soindebi) (soindebytsi) (betysoindebytsi) (betysoindebi)
pegbased: (soindebi) (soindebytsi) (betysoindebytsi) -be -ty (soindebi)

-Robin


Posted by xorxes on Thu 23 of Dec., 2004 12:16 GMT posts: 1912

> On Wed, Dec 22, 2004 at 04:57:48PM -0800, Jorge Llambías wrote: > > > valfendi: -la mer. samo,as. > > > pegbased: -la mer. -sa >mo,as< > > > > That's a known difference. valfendi is more permissive with vowel > > groups in cmene and fuhivla. > > camxes is more permissive in cmene, though. > > valfendi: -zoi -fy booz. -fy -co -sa -zoi bar. baz. bar. > pegbased: -zoi -fy >booz< -fy -co -sa -zoi bar. baz. bar.

That's valfendi being permissive again.

camxes is more permissive in cmavo: it should accept {miau} which I believe valfendi rejects.

mu'o mi'e xorxes



Posted by xorxes on Thu 23 of Dec., 2004 12:25 GMT posts: 1912

> How about this: > > valfendi: (soindebi) (soindebytsi) (betysoindebytsi) (betysoindebi) > pegbased: (soindebi) (soindebytsi) (betysoindebytsi) -be -ty (soindebi)

That was a bug in pegbased. medial-rafsi needed a !fuhivla in front. Hopefully that fixes it.

mu'o mi'e xorxes



Posted by xorxes on Thu 23 of Dec., 2004 14:09 GMT posts: 1912

> Jorge Llambías scripsit: > > The question is: What about cmene that follow a syllable > > {la} that is not a cmavo. Do they have to begin with a pause > > or not? > > I guess the answer is that it doesn't matter much, since such a situation > is always going to be a syntax error anyhow.

Not always:

lo'u stela vison le'u cu lojbo valsi

That should not be a syntax error.

I agree it doesn't matter much. I just don't want to call the split of a cmene after {stela} a bug.

mu'o mi'e xorxes



Posted by xorxes on Thu 23 of Dec., 2004 14:09 GMT posts: 1912

> Jorge Llambías scripsit: > > > In cmene and lujvo: iy, uy and yy are added. > > I assume you mean cmavo rather than lujvo.

Yes.

> "yy" should not be allowed; > it has no defined pronunciation.

That's a concession to usage. People tend to write yyyyyy for long hesitations.

We can define that y+ is equivalent to y for all purposes.

> > It would be relatively easy to forbid vowel triples everywhere. We just > > add "!(vowel-y vowel-y)" at the end of the vowel rules. > > I believe we should do so. This is a break with the past, but a modest one.

I tend to agree. valfendi does this for cmavo already, but not for cmene and fuhivla.
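The proposed `!(vowel-y vowel-y)` suffix is a PEG negative lookahead: a vowel only matches if it is not followed by two more vowels. A rough Python analogue using a regex lookahead is sketched below. It assumes that `vowel-y` means "any of a, e, i, o, u or y", which is a guess about the grammar's naming, and `vowels_pass` is a hypothetical helper rather than anything in camxes.

```python
import re

# One vowel, refused if two more vowel-or-y letters follow immediately.
VOWEL_NO_TRIPLE = re.compile(r"[aeiouy](?![aeiouy]{2})")


def vowels_pass(word):
    """Return True if every vowel in `word` satisfies the lookahead rule."""
    for i, ch in enumerate(word):
        if ch in "aeiouy" and not VOWEL_NO_TRIPLE.match(word, i):
            return False
    return True
```

Under this reading {miau} and {uaiu} fail, while {rue} and {mu'ei} pass because the apostrophe or the end of the word interrupts the vowel run.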

> > I don't see any reason to make cmene different from cmavo as far > > as vowels are concerned. > > The point is that iy and uy are reserved for a morphological mechanism. > Can arbitrary fu'ivla lujvo be constructed using iy after an initial > fu'ivla, iy before and after a medial fu'ivla, and iy before a final > fu'ivla? That was the original design.

Yes, for all fuhivla that don't start with a vowel.

(For those that do start with a vowel, we would need to allow iy+vowel which I don't think is acceptable.)

The design I had in mind was for iy to be used only after fuhivla to form its rafsi, but I see that allowing it after normal rafsi would have its advantages too. Another possibility would be to not allow it after normal rafsi but allow it to give a rafsi to every cmavo!

> As I said before, allowing iy and uy in cmene even when they are reserved > otherwise is safe because cmene are defined backwards.

In PEG they are defined forwards, but it doesn't matter anyway.

If iy is allowed after normal rafsi, or to give rafsi to every cmavo, then it should not be allowed in cmavo. That's a good reason.

mu'o mi'e xorxes



Posted by xorxes on Thu 23 of Dec., 2004 14:10 GMT posts: 1912

> Jorge Llambías scripsit: > > > It all depends on how it was supposed to work. fu'ivla-rafsi > > can't be combined with all normal rafsi: the immediately preceding > > one has to be CVCy-, CCVCy-, CVCCy- or another fu'ivla-rafsi. > > But when iy is the glue?

I hadn't thought of allowing {iy} after normal rafsi. We should not allow it after {CVC}. I'm not sure if ambiguities would result, but the whole point of using the "i" is to attach to a vowel. CVC can take y directly. (Same for CVCC and CCVC, of course.)

Instead of allowing iy after CVV rafsi, we could allow it after any cmavo: that way every cmavo that starts with a consonant could be used in lujvo in non-final position.

As a further refinement, if we don't allow vowel triples we can use {'iy} as the hyphen after a diphthong:

pavyseljirna = paiyseljirna
pai zei seljirna = pai'iyseljirna

mu'o mi'e xorxes



Posted by xorxes on Thu 23 of Dec., 2004 14:10 GMT posts: 1912

New idea: Forget about {-iy-} and use {-'y-} as the general fuhivla hyphen. That won't interfere with an eventual no-triple-vowels rule, and it won't introduce diphthongs where there weren't any. Also allow {-'y-} to give rafsi to cmavo:

pavyseljirna = pa'yseljirna

pai zei seljirna = pai'yseljirna

iglu zei xabju = iglu'yxa'u

{iy} could be used to give rafsi to cmene, but then we should not allow it otherwise in cmene.

djan zei zdani = djaniyzda

mu'o mi'e xorxes



Posted by Anonymous on Fri 24 of Dec., 2004 01:20 GMT

Jorge Llambías scripsit:

> That's a concesion to usage. People tend to write yyyyyy > for long hesitations.

No problem.

> We can define that y+ is equivalent to y for all purposes.

No, no, no. We do not want people writing selylujvo as selyyyyyyyyyyyyyyyyylujvo. If you want to say that the word y (or rather .y.) can be written with multiple y's, fine. But not everywhere in the language!

-- John Cowan jcowan@reutershealth.com www.ccil.org/~cowan www.reutershealth.com Linguistics is arguably the most hotly contested property in the academic realm. It is soaked with the blood of poets, theologians, philosophers, philologists, psychologists, biologists and neurologists, along with whatever blood can be got out of grammarians. - Russ Rymer


Posted by Anonymous on Fri 24 of Dec., 2004 01:20 GMT

Jorge Llambías scripsit:

> New idea: Forget about {-iy-} and use {-'y-} as the > general fuhivla hyphen. That won't interfere with > an eventual no-triple-vowels rule, and it won't > introduce diphthongs where there weren't any. > Also allow {-'y-} to give rafsi to cmavo:

snip

> {iy} could be used to give rafsi to cmene, but then > we should not allow it otherwise in cmene.

Even though all this can be made to work, I think we are better off forgetting about it and sticking to zei. For one thing, you can pause around zei to catch your breath.

-- Deshil Holles eamus. Deshil Holles eamus. Deshil Holles eamus. Send us, bright one, light one, Horhorn, quickening, and wombfruit. (3x) Hoopsa, boyaboy, hoopsa! Hoopsa, boyaboy, hoopsa! Hoopsa, boyaboy, hoopsa! — Joyce, Ulysses, "Oxen of the Sun" jcowan@reutershealth.com


Posted by xorxes on Fri 24 of Dec., 2004 01:20 GMT posts: 1912

> If you want to say that the word y (or rather .y.) can be written with > multiple y's, fine. But not everywhere in the language!

OK. The way it is written now, only a single y is accepted for hyphens in lujvo, but any number are accepted in cmavo and cmene.

I'll fix that.
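The restriction John asks for, repeated y only in the hesitation word itself, might be checked along these lines. This is a sketch of the proposal as stated in the quote, not of the grammar change that was eventually made; `y_runs_ok` is a hypothetical name.

```python
import re


def y_runs_ok(word):
    """Allow a run of several y's only when the whole word is the hesitation y.

    Surrounding pause dots (as in ".y.") are ignored. Any other word may
    contain y, but never two in a row.
    """
    bare = word.strip(".")
    if re.fullmatch(r"y+", bare):
        return True
    return "yy" not in bare
```

So {yyy} and {.y.} pass, {selylujvo} passes with its single hyphen, and a stretched {selyyyylujvo} is refused.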

mu'o mi'e xorxes



Posted by xorxes on Fri 24 of Dec., 2004 01:20 GMT posts: 1912

> > Even though all this can be made to work, I think we are better off > forgetting > about it and sticking to zei. For one thing, you can pause around zei > to catch your breath.

I'm not proposing to eliminate zei but just to add more options. Something like {'y} would be much easier to use than having to work out on the fly whether a given fuhivla can have a rafsi or not.

mu'o mi'e xorxes


Posted by rlpowell on Fri 24 of Dec., 2004 01:20 GMT posts: 14214

More fun:

  • Sentence: yspatrkomposIta bYspatrkomposIta YspatrkomposIta lOspatrkomposIta 1

MISMATCH!
valfendi: -y (spatrkomposIta) -bY (spatrkomposIta) -Y (spatrkomposIta) -lO (spatrkomposIta)
pegbased: -y >spatrkomposIta< -bY >spatrkomposIta< -Y >spatrkomposIta< (lOspa) >trkomposIta<

  • Sentence: jAIspatrkomposIta lO'espatrkomposIta mu'eispatrkomposIta uaiuspatrkomposIta 1

MISMATCH!
valfendi: -jAI (spatrkomposIta) -lO'e (spatrkomposIta) -mu'ei (spatrkomposIta) >uaiu< (spatrkomposIta)
pegbased: (jAIspa) >trkomposIta< -lO'e >spatrkomposIta< (mu'eispatrkomposIta) (uaiuspatrkomposIta)

  • Sentence: spatrliliAce ispatrliliAce lespatrliliAce rauspatrliliAce 1

MISMATCH!
valfendi: (spatrliliAce) -i (spatrliliAce) -le (spatrliliAce) -rau (spatrliliAce)
pegbased: >spatrliliAce< (ispatrliliAce) (lespatrliliAce) (rauspatrliliAce)

-Robin

Posted by rlpowell on Fri 24 of Dec., 2004 01:20 GMT posts: 14214

On Wed, Dec 22, 2004 at 05:40:20PM -0800, Jorge Llambías wrote: > > {kymoi}, {kybumoi}, {kybumlatu}, {lekymoi}, {lekybumoi}, and > > {lekybumlatu} are more phrases with the pause after the lervla > > missing. valfendi thinks they all contain brivla, but errors out > > trying to identify it, except for {ky bu mlatu}. > > camxes should give: > > ky moi > ky bu moi > ky bu mlatu > lekymoi (= lekmoi) > le ky bu moi > le ky bu mlatu

$ echo "kymoi kybumoi kybumlatu lekymoi lekybumoi and lekybumlatu" | myparser -m
Simple morphological breakdown requested.
Processing /dev/stdin ...
Morphology pass:
text
|- CMAVO
| BY: ky
|- CMAVO
| MOI: moi
|- spaces:
|- CMAVO
| BY: ky
|- CMAVO
| BU: bu
|- CMAVO
| MOI: moi
|- spaces:
|- CMAVO
| BY: ky
|- CMAVO
| BU: bu
|- BRIVLA
| gismu: mlatu
|- spaces:
|- BRIVLA
| lujvo: lekymoi
|- spaces:
|- CMAVO
| LE: le
|- CMAVO
| BY: ky
|- CMAVO
| BU: bu
|- CMAVO
| MOI: moi
|- spaces:
|- CMENE
| cmene: and
|- spaces:
|- CMAVO
| LE: le
|- CMAVO
| BY: ky
|- CMAVO
| BU: bu
|- BRIVLA
  gismu: mlatu

-Robin


Posted by Anonymous on Fri 24 of Dec., 2004 01:21 GMT

On Thursday 23 December 2004 16:49, Robin Lee Powell wrote: > More fun: > > *** Sentence: yspatrkomposIta bYspatrkomposIta YspatrkomposIta > lOspatrkomposIta 1 MISMATCH! > valfendi: -y (spatrkomposIta) -bY (spatrkomposIta) -Y (spatrkomposIta) -lO > (spatrkomposIta) pegbased: -y >spatrkomposIta< -bY >spatrkomposIta< -Y > >spatrkomposIta< (lOspa) >trkomposIta<

Breaking {lospa} off is not necessarily a bug (valfendi already knows that there's a brivla, not a cmene, in this text and sees that "trk" cannot begin anything but a cmene, so it calls "lO" a secondary stress), but {spatrkomposita} is a valid type-4.

phma -- Ils pensent que j'ai un cancer du thé russe... -Les Perles de la médecine


Posted by xorxes on Fri 24 of Dec., 2004 01:21 GMT posts: 1912

> *** Sentence: yspatrkomposIta bYspatrkomposIta YspatrkomposIta > pegbased: -y >spatrkomposIta< -bY >spatrkomposIta< -Y >spatrkomposIta< > (lOspa) >trkomposIta<

I think it's fixed, but some rules look more ugly than they need to, so I will probably be tinkering some more with it.

mi'e xorxes


Posted by rlpowell on Fri 24 of Dec., 2004 01:21 GMT posts: 14214

On Thu, Dec 23, 2004 at 09:49:49AM -0800, Jorge Llambías wrote: > > --- John Cowan wrote: > > If you want to say that the word y (or rather .y.) can be > > written with multiple y's, fine. But not everywhere in the > > language! > > OK. The way it is written now, only a single y is accepted for > hyphens in lujvo, but any number are accepted in cmavo and cmene. > > I'll fix that.

You broke it! :-)

$ echo "yyy" | myparser -m
text
|- nonLojbanWord: yy
|- spaces: y

-Robin


Posted by klark on Fri 24 of Dec., 2004 01:21 GMT posts: 10

>> All brivla have the following properties: >> 1) always end in a vowel; >> 2) always contain a consonant pair in the first five letters, where "y" >> and >> apostrophe are not counted as letters for this purpose; >> 3) always are stressed on the next-to-last (penultimate) syllable; this >> implies that they have two or more syllables. >> >> I always assumed this to be definitive, rather than descriptive: > > That is true (I changed 2 slightly to "its second consonant is always > part of a cluster", but the point is the same). Those are properties > that all brivla have, but not everything with those properties is > a brivla.

I think I see. The above is the definition of "brivla-form" (not "brivla"). A word that matches "brivla-form" may be a gismu, lujvo, fu'ivla, or invalid. There are some simple validity tests that can be applied to a brivla-form that do not require characterization, but full validation can't be done without characterization.

But I still don't see why a word-partitioning parser, which has to deal with more word-forms and therefore needs additional complexity, couldn't use the simple brivla-form definition, and thereby reduce point-complexity.
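For concreteness, the three quoted properties can be turned into a rough "brivla-form" test. This is a sketch under stated assumptions, not a piece of either parser: `looks_like_brivla_form` is a hypothetical name, stress is not checked at all, and the "two or more syllables" condition is approximated by counting vowel runs.

```python
import re

CONSONANT_PAIR = re.compile(r"[bcdfgjklmnprstvxz]{2}")


def looks_like_brivla_form(word):
    """Apply the quoted surface properties of brivla to a single word.

    1) the word ends in a vowel;
    2) there is a consonant pair within the first five letters, where
       "y" and the apostrophe are not counted as letters;
    3) there are at least two vowel runs (a stand-in for the
       two-syllable requirement that penultimate stress implies).
    """
    w = word.lower()
    if not w or w[-1] not in "aeiou":
        return False
    letters = w.replace("y", "").replace("'", "")
    if not CONSONANT_PAIR.search(letters[:5]):
        return False
    return len(re.findall(r"[aeiouy]+", w)) >= 2
```

A string that passes is only a candidate: it still has to be characterized as a gismu, lujvo or fu'ivla before it can be called a valid brivla.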

Clark


Posted by xorxes on Fri 24 of Dec., 2004 01:21 GMT posts: 1912

> I think I see. The above is the definition of "brivla-form" (not "brivla"). > A word that matches "brivla-form" may be a gismu, lujvo, fu'ivla, or > invalid.

It may also be a cmavo+brivla ("tosmabru").

> The are some simple validity tests that can be applied to a > brivla-form that do not require characterization, but full validation can't > be done without characterization.

Right. And you can't be sure that you don't have a cmavo hiding there either if you don't know what a lujvo is yet.

> But I still don't see why a word-partitioning parser, which has to deal with > more word-forms and therefore needs additional complexity, couldn't use the > simple brivla-form definition, and thereby reduce point-complexity.

There is no way that I know to partition a string into cmavo, brivla and cmene without working out the details of brivla. If your string begins with CV(V), you don't know if that is a cmavo or part of a brivla unless you can figure out the brivla.

mu'o mi'e xorxes



Posted by xorxes on Fri 24 of Dec., 2004 01:21 GMT posts: 1912

> On Thu, Dec 23, 2004 at 09:49:49AM -0800, Jorge Llambías wrote:
> > --- John Cowan wrote:
> > > If you want to say that the word y (or rather .y.) can be written with multiple y's, fine. But not everywhere in the language!
> >
> > OK. The way it is written now, only a single y is accepted for hyphens in lujvo, but any number are accepted in cmavo and cmene.
> >
> > I'll fix that.
>
> You broke it! :-)
>
> $ echo "yyy" | myparser -m
> text
> |- nonLojbanWord: yy
> |- spaces: y

Now I fixed what I broke while fixing the other part, hopefully without breaking anything else. :-)

mi'e xorxes



Posted by rlpowell on Sat 25 of Dec., 2004 09:56 GMT posts: 14214

Please check the last few changes I made. It wasn't compiling. I'm particularly unsure about the change to h.

-Robin


Posted by rlpowell on Sat 25 of Dec., 2004 09:56 GMT posts: 14214

On Fri, Dec 24, 2004 at 12:03:38PM -0800, Robin Lee Powell wrote:
> Please check the last few changes I made. It wasn't compiling. I'm particularly unsure about the change to h.

The fuhivla-rafsi-C change I'm a bit worried about too.

-Robin


Posted by rlpowell on Sat 25 of Dec., 2004 09:56 GMT posts: 14214

On Fri, Dec 24, 2004 at 12:05:33PM -0800, Robin Lee Powell wrote:
> On Fri, Dec 24, 2004 at 12:03:38PM -0800, Robin Lee Powell wrote:
> > Please check the last few changes I made. It wasn't compiling. I'm particularly unsure about the change to h.
>
> The fuhivla-rafsi-C change I'm a bit worried about too.

Current behaviour:

*** Sentence: la kolombias 1
MISMATCH!
valfendi: -la kolombias.
pegbased: -la -ko -lo >mbias<

-Robin


Posted by rlpowell on Sat 25 of Dec., 2004 09:57 GMT posts: 14214

On Fri, Dec 24, 2004 at 12:06:41PM -0800, Robin Lee Powell wrote:
> On Fri, Dec 24, 2004 at 12:05:33PM -0800, Robin Lee Powell wrote:
> > On Fri, Dec 24, 2004 at 12:03:38PM -0800, Robin Lee Powell wrote:
> > > Please check the last few changes I made. It wasn't compiling. I'm particularly unsure about the change to h.
> >
> > The fuhivla-rafsi-C change I'm a bit worried about too.
>
> Current behaviour:
>
> *** Sentence: la kolombias 1
> MISMATCH!
> valfendi: -la kolombias.
> pegbased: -la -ko -lo >mbias<

Oh, this is bad:

*** Sentence: cinkrxomoptErata cinkrxomoptEragau cinkrxomoptEramu'ei cinkrxomoptEracu'i 1
MISMATCH!
valfendi: (cinkrxomoptEra) -ta (cinkrxomoptEra) -gau (cinkrxomoptEra) -mu'ei (cinkrxomoptEra) -cu'i
pegbased: (cinkrxomoptEra) -ta (cinkrxomoptEra) -gau (cinkrxomoptEra) -mu >'ei< (cinkrxomoptEra) -cu'i

-Robin


Posted by xorxes on Sat 25 of Dec., 2004 09:57 GMT posts: 1912

> > > > Please check the last few changes I made. It wasn't compiling.

I made a number of changes and missed a few things, sorry. But I think it's becoming more readable.

> > > > I'm particularly unsure about the change to h.

Hmmm, we still need to allow y after ' for y'y. I forgot to change that when I eliminated vowel-y.

Probably the & is not needed after h, but it won't hurt for now.

> > > The fuhivla-rafsi-C change I'm a bit worried about too.

That was correct, thanks.

> > Current behaviour:
> >
> > *** Sentence: la kolombias 1
> > MISMATCH!
> > valfendi: -la kolombias.
> > pegbased: -la -ko -lo >mbias<

That was kind of on purpose. I'm disallowing i/u vowel except initially, as per John's suggestion. I'm not sure that's such a good idea.

> Oh, this is bad:
>
> *** Sentence: cinkrxomoptErata cinkrxomoptEragau cinkrxomoptEramu'ei cinkrxomoptEracu'i 1
> MISMATCH!
> valfendi: (cinkrxomoptEra) -ta (cinkrxomoptEra) -gau (cinkrxomoptEra) -mu'ei (cinkrxomoptEra) -cu'i
> pegbased: (cinkrxomoptEra) -ta (cinkrxomoptEra) -gau (cinkrxomoptEra) -mu >'ei< (cinkrxomoptEra) -cu'i

Is it still doing it? The cmavo and vowels section was where I was working and I made several saves.

mu'o mi'e xorxes



Posted by JohnCowan on Sat 25 of Dec., 2004 09:57 GMT posts: 149

Jorge Llambías scripsit:

> That was kind of on purpose. I'm disallowing i/u vowel except initially, as per John's suggestion. I'm not sure that's such a good idea.

*scratches head*

I think I said that iV and uV (V = a,e,i,o,u) should only appear in cmavo by themselves (ua but not *kua), whereas they can appear anywhere in cmene or fu'ivla.

--
John Cowan cowan@ccil.org
Income tax, if I may be pardoned for saying so, is a tax on income. --Lord Macnaghten (1901)


Posted by xorxes on Sat 25 of Dec., 2004 23:37 GMT posts: 1912

> Jorge Llambías scripsit:
>
> > That was kind of on purpose. I'm disallowing i/u vowel except initially, as per John's suggestion. I'm not sure that's such a good idea.
>
> *scratches head*
>
> I think I said that iV and uV (V = a,e,i,o,u) should only appear in cmavo by themselves (ua but not *kua), whereas they can appear anywhere in cmene or fu'ivla.

OK, I probably took something you said about cmavo to be about all words. I don't understand the rationale for being so strict with cmavo but not so with fu'ivla and cmene. It can't be about pronounceability because while you disallow {kua broda} you allow the fu'ivla {kuabroda} which is pronounced identically.

I'm going back to allowing (i/u) vowel as syllable core, including in cmavo, unless there is a convincing reason to forbid this. It's not hard to make the formal rule for cmavo different than for fu'ivla and cmene, but it complicates matters for the user: you have to remember that some CVV sequences will break away from a following lujvo and some won't, for example.

mu'o mi'e xorxes



Posted by rlpowell on Sat 25 of Dec., 2004 23:37 GMT posts: 14214

On Fri, Dec 24, 2004 at 12:34:04PM -0800, Jorge Llambías wrote:
> > Oh, this is bad:
> >
> > *** Sentence: cinkrxomoptErata cinkrxomoptEragau cinkrxomoptEramu'ei cinkrxomoptEracu'i 1
> > MISMATCH!
> > valfendi: (cinkrxomoptEra) -ta (cinkrxomoptEra) -gau (cinkrxomoptEra) -mu'ei (cinkrxomoptEra) -cu'i
> > pegbased: (cinkrxomoptEra) -ta (cinkrxomoptEra) -gau (cinkrxomoptEra) -mu >'ei< (cinkrxomoptEra) -cu'i
>
> Is it still doing it?

No, although now it is very much unlike valfendi's answer:

text
|- nonLojbanWord: cinkrxomoptErata
|- spaces:
|- nonLojbanWord: cinkrxomoptEragau
|- spaces:
|- nonLojbanWord: cinkrxomoptEramu'ei
|- spaces:
|- nonLojbanWord: cinkrxomoptEracu'i

-Robin


Posted by rlpowell on Sat 25 of Dec., 2004 23:37 GMT posts: 14214

camxes is really messed up:

*** Sentence: zo'onai li'a 1
MISMATCH!
valfendi: -zo'o -nai -li'a
pegbased: -zo'o -nai >li'a<

*** Sentence: ze'epuku da pinxe lo xalka ze'a le nunjbosla 1
MISMATCH!
valfendi: -ze'e -pu -ku -da (pinxe) -lo (xalka) -ze'a -le (nunjbosla)
pegbased: -ze'e >puku< -da (pinxe) -lo (xalka) -ze'a -le (nunjbosla)

*** Sentence: zi xruti le zdani 1
MISMATCH!
valfendi: -zi (xruti) -le (zdani)
pegbased: >zi< >xruti< -le >zdani<

-Robin


Posted by JohnCowan on Sat 25 of Dec., 2004 23:37 GMT posts: 149

Jorge Llambías scripsit:

> I'm going back to allowing (i/u) vowel as syllable core, including in cmavo, unless there is a convincing reason to forbid this. It's not hard to make the formal rule for cmavo different than for fu'ivla and cmene, but it complicates matters for the user: you have to remember that some CVV sequences will break away from a following lujvo and some won't, for example.

I think you've given precisely the convincing reason. There is no shortage of experimental cmavo, and complicating things for fu'ivla-creators is a Bad Thing, as they are already complicated enough.

--
cowan@ccil.org http://www.reutershealth.com
Long-short-short, long-short-short / Dactyls in dimeter, / Verse form with choriambs / (Masculine rhyme): / One sentence (two stanzas) / Hexasyllabically / Challenges poets who / Don't have the time. --robison who's at texas dot net


Posted by xorxes on Sat 25 of Dec., 2004 23:37 GMT posts: 1912

> |- nonLojbanWord: cinkrxomoptErata
> |- spaces:
> |- nonLojbanWord: cinkrxomoptEragau
> |- spaces:
> |- nonLojbanWord: cinkrxomoptEramu'ei
> |- spaces:
> |- nonLojbanWord: cinkrxomoptEracu'i

Yes, when I put the "i/u vowel" back I messed up. The problem was with the "i"; cenkrxomoptErata parses correctly. I think I've fixed that now.

mu'o mi'e xorxes



Posted by xorxes on Sat 25 of Dec., 2004 23:38 GMT posts: 1912

> I think you've given precisely the convincing reason. There is no shortage of experimental cmavo, and complicating things for fu'ivla-creators is a Bad Thing, as they are already complicated enough.

We seem to have different ideas about what is complicated. Allowing {kuabroda} but not {kaubroda} as a fu'ivla is more complicated than saying that both cases are disallowed for the same reason: they both break as cmavo + gismu.

Pierre, does valfendi accept {kuabroda} as a fu'ivla?

mu'o mi'e xorxes



Posted by xorxes on Sun 26 of Dec., 2004 00:00 GMT posts: 1912

Elidable terminators are sometimes required for disambiguation, but they are always allowed, even when not required.

Pauses between words are sometimes required for disambiguation, but they are always allowed, even when not required.

Marking stress with caps is sometimes required for disambiguation, but it is always allowed, even when not required.

Using long rafsi in lujvo instead of the short ones is sometimes required due to morphology constraints, but it is always allowed, even when not required.

There seems to be a pattern there. What about hyphens?

y- and r-hyphens after CVC and CVV rafsi are sometimes required due to morphology constraints, but they are always allowed, even when not required... NOT! When not required they are not allowed!

This seems to go against the Lojban way of doing things, and it is also a burden for the user. You get used to a lujvo like {tosymabru} and when you try to form a new lujvo by adding an additional rafsi: say {naltosymabru} it turns out it is not valid: it has to be {naltosmabru}. You get used to {ro'inre'o} and when you form {braro'inre'o} it turns out that's not a lujvo, either.

In the PEG grammar, I allowed -y- after any CVC, not only when it is required, mainly because that was easier than disallowing it when it wasn't necessary. It doesn't seem to be worth complicating the grammar for such an unnecessary and bothersome restriction.

I am now allowing the r-hyphen after any non-final CVV as well, because that is more user-friendly. In this case there is a small cost: we take some forms that would otherwise be fu'ivla and put them in lujvo-space, but lujvo have always had priority over fu'ivla so that's not a big deal (and it is not a noticeable chunk of fu'ivla-space anyway).

These are also fairly common mistakes people make when creating lujvo, so this move is actually supported by usage.

mu'o mi'e xorxes


Posted by Anonymous on Sun 26 of Dec., 2004 01:31 GMT

On Saturday 25 December 2004 18:36, Jorge "Llambías" wrote:
> We seem to have different ideas about what is complicated. Allowing {kuabroda} but not {kaubroda} as a fu'ivla is more complicated than saying that both cases are disallowed for the same reason: they both break as cmavo + gismu.
>
> Pierre, does valfendi accept {kuabroda} as a fu'ivla?

No. It breaks {kua} off and calls it an error. >kua< (broda)

phma -- They think I have cancer of the Russian tea... -Les Perles de la médecine


Posted by JohnCowan on Sun 26 of Dec., 2004 04:53 GMT posts: 149

wikidiscuss@lojban.org scripsit:

> In the PEG grammar, I allowed -y- after any CVC, not only when it is required, mainly because that was easier than disallowing it when it wasn't necessary. It doesn't seem to be worth complicating the grammar for such an unnecessary and bothersome restriction.
>
> I am now allowing the r-hyphen after any non-final CVV as well, because that is more user-friendly. In this case there is a small cost: we take some forms that would otherwise be fu'ivla and put them in lujvo-space, but lujvo have always had priority over fu'ivla so that's not a big deal (and it is not a noticeable chunk of fu'ivla-space anyway).

I actually support this, somewhat reluctantly, because we have already so many different allo-lexes for lujvo, what's a few more — and it does make errors less likely.

--
John Cowan cowan@ccil.org http://www.reutershealth.com http://www.ccil.org/cowan
Said Agatha Christie / To E. Philips Oppenheim / "Who is this Hemingway? / Who is this Proust? / Who is this Vladimir / Whatchamacallum, / This neopostrealist / Rabble?" she groused. --author unknown to me; any suggestions?


Posted by rlpowell on Sun 26 of Dec., 2004 17:06 GMT posts: 14214

Still:

*** Sentence: muSTEl,aVIson 1
MISMATCH!
valfendi: >muSTE< -l,a VIson.
pegbased: -mu (STEl,a) VIson.

-Robin


Posted by Anonymous on Sun 26 of Dec., 2004 17:06 GMT

On Sunday 26 December 2004 03:13, Robin Lee Powell wrote:
> Still:
>
> *** Sentence: muSTEl,aVIson 1
> MISMATCH!
> valfendi: >muSTE< -l,a VIson.
> pegbased: -mu (STEl,a) VIson.

What does camxes do with {mustelAvison}?

phma -- Without glasses, I can't even distinguish smells... -Les Perles de la médecine


Posted by xorxes on Sun 26 of Dec., 2004 17:06 GMT posts: 1912

> What does camxes do with {mustelAvison}?

{mu steLAvi son}

Also {mustelalalatiTAtuvison} gives:

{mu stelalalatiTAtu vison}

camxes works left to right, so if it doesn't see a cmene to its right it looks for other things. I suppose I could force requiring a pause before a cmene that does not follow doi/la/lai/la'i, but there doesn't seem to be any point to doing that.

mu'o mi'e xorxes



Posted by xorxes on Mon 27 of Dec., 2004 02:16 GMT posts: 1912

valfendi and camxes will also probably differ in how they handle things like: {zoi STElamrtteladjan STEla} and {zoi djan STElamrtteladjan}.

camxes will parse the first one as {zoi STEla >mrtteladjan< STEla} while the second one needs a closing delimiter {.djan.} for zoi.

I suspect valfendi won't like the first one, but will parse the second one as {zoi djan STElamrtte la djan}.

These are very weird cases. I'm not sure how we should handle strings that contain non-words but are connected, to the left or to the right, without any spaces, with what seem to be lojban words. Are we allowed to break such things from the left (as camxes does) or from the right (as valfendi does)? Or should we require non-words to absorb anything not separated with pauses?

mu'o mi'e xorxes



Posted by Anonymous on Mon 27 of Dec., 2004 02:16 GMT

On Sunday 26 December 2004 12:39, Jorge "Llambías" wrote:
> valfendi and camxes will also probably differ in how they handle things like: {zoi STElamrtteladjan STEla} and {zoi djan STElamrtteladjan}.
>
> camxes will parse the first one as {zoi STEla >mrtteladjan< STEla} while the second one needs a closing delimiter {.djan.} for zoi.
>
> I suspect valfendi won't like the first one, but will parse the second one as {zoi djan STElamrtte la djan}.

valfendi currently lexes them as "-zoi >STElamrtte< -la djan. (STEla)" and "-zoi djan. >STElamrtte< -la djan.". I haven't written any magic word handling yet, but what I think it will do is reject the first (because "stelamrtte" isn't a Lojban word) and reject the second (because finding a {zoi} and delimiter turns off lexing until it sees the delimiter again after a pause).

phma -- My monthly periods come once a year. -Les Perles de la médecine


Posted by Anonymous on Mon 27 of Dec., 2004 02:16 GMT

On Sunday 26 December 2004 11:46, Jorge "Llambías" wrote:
> --- Pierre Abbat wrote:
> > What does camxes do with {mustelAvison}?
>
> {mu steLAvi son}
>
> Also {mustelalalatiTAtuvison} gives:
>
> {mu stelalalatiTAtu vison}

valfendi says {muste LA vison} and {mu stelala la tiTAtuvison}.

> camxes works left to right, so if it doesn't see a cmene to its right it looks for other things. I suppose I could force requiring a pause before a cmene that does not follow doi/la/lai/la'i, but there doesn't seem to be any point to doing that.

I think we should just say that if a string is valid except for a required pause, the two lexers may give valid output, even different valid output. You may want to list where you allow omitting pauses that the official rules require.

phma -- GCS/M d- s-: a+ C++ UL++++$ P+ L+++ E- W+++ N+ o? K? w-- O? M- V- Y++ PGP++ t- 5? X? R- !tv b++ DI !D G e++ h+>---- r- y>+++


Posted by xorxes on Mon 27 of Dec., 2004 02:16 GMT posts: 1912

> > > What does camxes do with {mustelAvison}?
> >
> > {mu steLAvi son}
> >
> > Also {mustelalalatiTAtuvison} gives:
> >
> > {mu stelalalatiTAtu vison}
>
> valfendi says {muste LA vison} and {mu stelala la tiTAtuvison}.

So it will hear the brivla {muste} and {mustelala} even when they are not penultimately stressed?

> > camxes works left to right, so if it doesn't see a cmene to its right it looks for other things. I suppose I could force requiring a pause before a cmene that does not follow doi/la/lai/la'i, but there doesn't seem to be any point to doing that.
>
> I think we should just say that if a string is valid except for a required pause, the two lexers may give valid output, even different valid output.

That's more or less what happens now, yes.

> You may want to list where you allow omitting pauses that the official rules require.

What I would like is for camxes to agree exactly with the official rules, either by modifying camxes or by modifying the official rules, whichever we agree is better.

I have now added !non-lojban-word at the end of post-word. This means that camxes will require a pause before and after any non-lojban word. It won't break {mutt} as {mu >tt<} (which is what it was doing) nor will it break {tteladjan} as {>tte< la djan} which is what valfendi does.
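The effect of a trailing negative lookahead can be sketched outside the grammar. This is only a toy illustration, not the camxes rule itself: `match_cv_cmavo` and `starts_non_lojban` are hypothetical names, the "cmavo" is reduced to a bare CV, and "non-Lojban word" is approximated by a doubled consonant such as `tt` (assumed here because identical adjacent consonants are not permitted in Lojban words).

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")


def starts_non_lojban(text: str, pos: int) -> bool:
    """Toy stand-in for non-lojban-word: a doubled consonant such as 'tt'."""
    return (
        pos + 1 < len(text)
        and text[pos] in CONSONANTS
        and text[pos] == text[pos + 1]
    )


def match_cv_cmavo(text: str, pos: int = 0, lookahead: bool = True):
    """Return the end index of a CV cmavo at pos, or None if it may not break off."""
    if pos + 2 > len(text):
        return None
    if text[pos] not in CONSONANTS or text[pos + 1] not in VOWELS:
        return None
    end = pos + 2
    # Negative lookahead in the PEG sense: peek at what follows without
    # consuming it, and fail the whole match if the peek succeeds.
    if lookahead and starts_non_lojban(text, end):
        return None
    return end


if __name__ == "__main__":
    print(match_cv_cmavo("mutt", lookahead=False))  # old behaviour: breaks off "mu"
    print(match_cv_cmavo("mutt"))                   # with the lookahead: no break
    print(match_cv_cmavo("mu klama"))               # a pause follows, so "mu" matches
```

With the lookahead switched on, `mutt` is no longer split, so the whole string is left for the non-Lojban-word rule to absorb.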

Now I have to figure out how to impose the rule that a pause is required before a cmene unless preceded by the *words* doi/la/lai/la'i, not just the syllables or a brivla containing those syllables, which is the rule it follows now.

mu'o mi'e xorxes



Posted by arj on Mon 27 of Dec., 2004 02:16 GMT posts: 953

On Sun, 26 Dec 2004, Jorge Llambías wrote:

> valfendi and camxes will also probably differ in how they handle things like: {zoi STElamrtteladjan STEla} and {zoi djan STElamrtteladjan}.
>
> camxes will parse the first one as {zoi STEla >mrtteladjan< STEla} while the second one needs a closing delimiter {.djan.} for zoi.
>
> I suspect valfendi won't like the first one, but will parse the second one as {zoi djan STElamrtte la djan}.

IIRC there MUST be a pause before and after delimiter words, so it shouldn't accept any of them.

--
Arnt Richard Johansen http://arj.nvg.org/
Let's have some real examples from a real, non-English language.


Posted by Anonymous on Mon 27 of Dec., 2004 02:16 GMT

On Sunday 26 December 2004 14:34, Jorge "Llambías" wrote:
> --- Pierre Abbat wrote:
> > valfendi says {muste LA vison} and {mu stelala la tiTAtuvison}.
>
> So it will hear the brivla {muste} and {mustelala} even when they are not penultimately stressed?

It breaks before {la}, and finding no stress in the piece assumes that the brivla ends at the piece's end. If the stress is ultimate, as in {musTE}, it calls it an error.

phma -- ..i toljundi do .ibabo mi'afra tu'a do ..ibabo damba do .ibabo do jinga ..icu'u la ma'atman.


Posted by rlpowell on Tue 28 of Dec., 2004 03:27 GMT posts: 14214

*** Sentence: la BALtazar. cu me le ci nolraitru 1
MISMATCH!
valfendi: -la BALtazar. -cu -me -le -ci (nolraitru)
pegbased: -la (BALta) zar. -cu -me -le -ci (nolraitru)

-Robin


Posted by rlpowell on Tue 28 of Dec., 2004 03:27 GMT posts: 14214

Here's the list of differences between valfendi and camxes:

http://www.teddyb.org/~rlpowell/media/regular/morph-auto-test.out.txt

I'll try to keep it up to date as both programs progress. It's 12425 lines.

In an ideal universe, xorxes and Pierre (and Nora, I suppose) would get together and decide which of these were errors and which were acceptable differences due to different styles of processing, out of which would emerge a unified morphology.

Then those differences that remain would be mailed to me so I could mark them for regression testing purposes.
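The comparison itself is simple enough to sketch. The following is a minimal, hypothetical harness, not the script that produced the file above: `find_mismatches` is an invented name, each lexer is assumed to be a plain callable from a sentence to its one-line output, and whether lines starting with `#` are skipped as comments is left as a switch because the two programs currently treat them differently.

```python
def find_mismatches(lines, lexers, skip_comments=True):
    """Run every lexer on every sentence and report the ones that disagree.

    lexers maps a display name (e.g. "valfendi", "pegbased") to a callable
    taking a sentence and returning its lexed form as a string.
    """
    reports = []
    for line in lines:
        sentence = line.strip()
        if not sentence:
            continue
        if skip_comments and sentence.startswith("#"):
            continue
        outputs = {name: lex(sentence) for name, lex in lexers.items()}
        if len(set(outputs.values())) > 1:
            report = [f"*** Sentence: {sentence}", "MISMATCH!"]
            report += [f"{name}: {out}" for name, out in outputs.items()]
            reports.append("\n".join(report))
    return reports


if __name__ == "__main__":
    # Stand-in lexers for demonstration only; the real programs would be
    # wrapped, for instance by calling them as subprocesses.
    demo = {
        "lexer_a": lambda s: " ".join(f"-{w}" for w in s.split()),
        "lexer_b": lambda s: " ".join(f"-{w}" for w in s.replace("'", " ").split()),
    }
    for report in find_mismatches(["# a comment", "mi klama", "zo'onai li'a"], demo):
        print(report)
```

Differences judged acceptable could then be kept in a separate list and filtered out before the report is printed, which is the regression-marking step described above.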

-Robin


Posted by rlpowell on Tue 28 of Dec., 2004 03:27 GMT posts: 14214

On Tue, Dec 21, 2004 at 05:24:50PM -0800, Clark & Janiece Nelson wrote:
> >>For the sake of modularity and reducing point-complexity, I think it would be worth considering splitting the job into its components, and writing separate grammars:
> >
> >The problem with this is that we could argue for hours over where the separations lie. I was vehemently opposed to separating out the morphology from the rest of the grammar in the first place, in fact.
>
> Well, of course if one (very influential) participant is "vehemently opposed" to any separation, then any proposal for separation would necessarily either be rejected immediately, or result in hours of argument. :-)

Indeed.

I feel it's worth stating *why* I'm opposed.

I don't want people to need to understand the divisions we're creating to try to understand how the language works. I think that even "morphology" versus "grammar" is artificial and arbitrary, and I don't think people should have to go to two places to get their questions answered.

It's not really all that important, though.

-Robin


Posted by Anonymous on Tue 28 of Dec., 2004 03:28 GMT

On Monday 27 December 2004 02:01, Robin Lee Powell wrote:
> Here's the list of differences between valfendi and camxes:
>
> http://www.teddyb.org/~rlpowell/media/regular/morph-auto-test.out.txt

First thing I see is that camxes is trying to lex lines beginning with number signs, which in my test file are comments. Second is that you should be running valfendi with the -a option, so that it would recognize cmevla such as {laus} and {doi'as}.

{mi benji le brablolai la laus} is a sentence I made up to try to confuse the lexer. It seems to have succeeded. :-)

More later. Time to sleep.

phma -- Without glasses, I can't even distinguish smells... -Les Perles de la médecine


Posted by rlpowell on Tue 28 of Dec., 2004 03:28 GMT posts: 14214

On Mon, Dec 27, 2004 at 02:16:11AM -0500, Pierre Abbat wrote:
> On Monday 27 December 2004 02:01, Robin Lee Powell wrote:
> > Here's the list of differences between valfendi and camxes:
> >
> > http://www.teddyb.org/~rlpowell/media/regular/morph-auto-test.out.txt
>
> First thing I see is that camxes is trying to lex lines beginning with number signs, which in my test file are comments.

Yeah, I know. I figured it would be a test case like any other, but it looks like valfendi actually *drops* lines that start with #.

That's, umm, a rather idiosyncratic piece of behaviour.

> Second is that you should be running valfendi with the -a option, so that it would recognize cmevla such as {laus} and {doi'as}.

Fixed. Re-running.

-Robin


Posted by rlpowell on Tue 28 of Dec., 2004 03:28 GMT posts: 14214

*** Sentence: coi kOnsept 1
MISMATCH!
valfendi: -coi kOnsept.
pegbased: -coi (kOnse) pt.

*** Sentence: doi kOnsept coi pado 1
MISMATCH!
valfendi: -doi kOnsept. -coi -pa -do
pegbased: -doi (kOnse) pt. -coi -pa -do

-Robin


Posted by xorxes on Tue 28 of Dec., 2004 03:28 GMT posts: 1912

> *** Sentence: la BALtazar. cu me le ci nolraitru 1
> MISMATCH!
> valfendi: -la BALtazar. -cu -me -le -ci (nolraitru)
> pegbased: -la (BALta) zar. -cu -me -le -ci (nolraitru)

I didn't have !cmene at the beginning of gismu, because it wasn't needed for the words rule. That should be fixed.

I also added !cmene at the end of gismu, lujvo and fuhivla. This will force a pause before any cmene not preceded by the *cmavo* doi/la/lai/la'i, not just the syllables.

mu'o mi'e xorxes



Posted by Anonymous on Tue 28 of Dec., 2004 03:28 GMT

On Monday 27 December 2004 02:25, Robin Lee Powell wrote:
> On Mon, Dec 27, 2004 at 02:16:11AM -0500, Pierre Abbat wrote:
> > First thing I see is that camxes is trying to lex lines beginning with number signs, which in my test file are comments.
>
> Yeah, I know. I figured it would be a test case like any other, but it looks like valfendi actually *drops* lines that start with #.
>
> That's, umm, a rather idiosyncratic piece of behaviour.

So how should comments in the test file be indicated?

phma -- ..i toljundi do .ibabo mi'afra tu'a do ..ibabo damba do .ibabo do jinga ..icu'u la ma'atman.


Posted by xorxes on Tue 28 of Dec., 2004 03:28 GMT posts: 1912

> {mi benji le brablolai la laus} is a sentence I made up to try to confuse the lexer. It seems to have succeeded. :-)

camxes has no problem with {brablolai lalaus} or {braBLOlailalaus}.

But {brablolailalaus} is not a valid word. It is not a cmene because it contains forbidden syllables. It does not start with a brivla because there is no stressed syllable followed by an unstressed syllable. I am not taking "la + cmene" as an indicator of stress two syllables back. Should we add that as another official way to represent stress in writing? The only stress representations I accept currently are caps and a following syllable followed by a space.

mu'o mi'e xorxes



Posted by pycyn on Tue 28 of Dec., 2004 03:28 GMT posts: 2388

wrote:

> I don't want people to need to understand the divisions we're creating to try to understand how the language works. I think that even "morphology" versus "grammar" is artificial and arbitrary, and I don't think people should have to go to two places to get their questions answered.

I assume you mean that the distinction between which parts of a parsing process are counted as dealing with morphology and which with grammar (syntax?) is arbitrary. The distinction between morphology and syntax is at least a whole lot less arbitrary.

> It's not really all that important, though.
>
> -Robin


Posted by xorxes on Tue 28 of Dec., 2004 03:28 GMT posts: 1912

*** Sentence: jAIckAnkua lO'eckAnkua mu'eickAnkua uaiuckAnkua
MISMATCH!
valfendi: -jAI (ckAnkua) -lO'e (ckAnkua) -mu'ei (ckAnkua) >uaiu< (ckAnkua)
pegbased: (jAIckA) >nkua< -lO'e (ckAnkua) -mu'ei (ckAnkua) >uaiuckAnkua<

valfendi and camxes have different ideas on how to treat two consecutive stressed syllables not separated by spaces.

camxes strongly favours left-to-right processing, so it will take the first stress it runs into as the determining one.

I presume valfendi and camxes will agree that {jAIckankua} breaks as (jAIcka) >nkua<.
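The "first stress wins" behaviour can be sketched as follows. This is a rough illustration under loud assumptions, not the PEG rules: `cut_after_first_stress` is a hypothetical helper, stress is read from capital letters only, and a "syllable" is crudely taken to be any run of non-vowels followed by a run of vowels. It only locates where the word would end, one syllable after the first stressed one; it does not decide where the word begins or split off leading cmavo.

```python
import re

# Toy syllable: optional onset of non-vowels, then one run of vowels.
# Capitals mark stress, so both cases are listed.
SYLLABLE = re.compile(r"[^aeiouAEIOU]*[aeiouAEIOU]+")


def cut_after_first_stress(text: str):
    """Split text one syllable after the first stressed (capitalised) syllable."""
    syllables = list(SYLLABLE.finditer(text))
    for i, syl in enumerate(syllables):
        stressed = any(ch.isupper() for ch in syl.group())
        if stressed and i + 1 < len(syllables):
            end = syllables[i + 1].end()
            return text[:end], text[end:]
    # No usable stress found: leave the string whole.
    return text, ""


if __name__ == "__main__":
    print(cut_after_first_stress("jAIckAnkua"))  # ('jAIckA', 'nkua')
    print(cut_after_first_stress("jAIckankua"))  # ('jAIcka', 'nkua')
```

A right-to-left approach would instead let the later stress in {jAIckAnkua} decide, which is where the two outputs above part ways.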

mu'o mi'e xorxes



Posted by pycyn on Tue 28 of Dec., 2004 03:28 GMT posts: 2388

The traditional claim that a Lojban speech steam can be uniquely partitioned into Lojban words seems to be in trouble. the difficulties seem to center on the "foreign" parts of the language, cmevla and fuhivla — and lujvo insofar as they impinge on the latter (though these last questions seem to be getting solutions). Fuhivla have always been a bit problematic as have cmene in their relation both to their native languages and to Lojban and various devices have come along to deal with these problems, mainly restricted -- and often very complex — phonological patterns and — for cmene at least — obligatory pauses. These last have seemed impractical in actual speech — people forget to make them and, when they do, others fail to notice them as distinct from nonsignificant pauses.

One possibility for relieving this latest problem is to replace significant absences (pauses) by significant presences, a unique sound or mark. In the discussion of the morphological problems, it turns out that the exact role of /iy/ and /uy/ is up for grabs (assuming they are allowed at all). Thus, /uy/ could replace morphologically obligatory pauses — a minimal utterance (well, longer only than /y/, which is already dealt with) could replace a troublesome pause. (Putting /uy/ at the end of names is remeniscent of Japanese postclitic "wa," though with a different function and a m0re indistinct vowel.)

In the discussion of cmevla, one constant complaint is the peculiar restriction agains /doi/, /la/ and /lai/ occurring in the name even when they are in the native original — poor Lila Doyle! The solution usually made is to allow the prohibited strings but to place an obligatory pause before all names, so that the confusion with the words {la, lai, doi} is prevented. Of course, another obligatory pause (other than the phonologically determined ones between final vowel of one word and initial of the next) merely extends the problem of obligatory pauses. So we might again suggest that this pause become a positive utterance, different from that for the world final version and so /iy/. (The idea here is that /coi iy djan/ is almost exactly the pattern of /hiya John/.) /iy/ and /uy/ are probably elidable in some cases, but that would need to be investigated.

What can go between a /iy/ and a /uy/? Any phonologically legal Lojban speech string, that is, one that contains no illegal vowel or consonant clusters (nor /iy/ and /uy/ of course).

Not even a final consonant need be required, though it might be for continuity's sake. But this opens possibilities beyond dealing with names; any foreign word or phrase suitably Lojbanized can go in this space and be nativized, not merely quoted. What happens between /iy/ and /uy/ is not subject to further analysis beyond whether it is phonologically permitted, and the whole is taken as a block. This block can be used other than as the core of a cmene sumti. In particular, it can be inserted as a unit into what is otherwise a lujvo construction (with some adjustments probably required, but certainly fewer restrictions than now are involved in fuhivla -- apparently just a glue between vowel finals and /iy/ and between /uy/ and vowel initials). I think that the only limitation is that the block cannot be compound-final, which would mean that the pattern of many fuhivla (which went against the usual Lojban modifier-modified order anyhow) would have to be changed to put the category last.
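To make the "block" idea concrete, here is a minimal sketch of pulling /iy/ ... /uy/ blocks out of a stream. It assumes the stream has already been cut into space-separated tokens and that the markers appear as the bare tokens `iy` and `uy`; both are simplifications for illustration, not part of the proposal. The function name `extract_blocks` is hypothetical, and the check for phonological legality of the block contents is left out entirely.

```python
def extract_blocks(tokens):
    """Collect the material between each 'iy' ... 'uy' pair as one opaque block.

    Assumption (not from the proposal): the input is a pre-tokenized list and
    the markers are stand-alone tokens. No phonological check is attempted.
    """
    blocks = []
    current = None  # None means we are outside any block
    for tok in tokens:
        if tok == "iy":
            if current is not None:
                # The proposal excludes /iy/ inside a block
                raise ValueError("/iy/ inside an open block")
            current = []
        elif tok == "uy":
            if current is None:
                raise ValueError("/uy/ without a preceding /iy/")
            blocks.append(" ".join(current))
            current = None
        elif current is not None:
            # Inside a block: contents are kept verbatim, not analysed further
            current.append(tok)
    if current is not None:
        raise ValueError("block opened by /iy/ was never closed by /uy/")
    return blocks


print(extract_blocks("coi la iy lila doil uy".split()))
```

Under these assumptions the name material comes back as a single unit, whatever {la} or {doi} look-alikes it contains.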

This is a radical suggestion, but it carries a load of benefits. It does, on the other hand, require a reworking of existing fuhivla, the cost of which is not very clear at the moment.


Posted by Anonymous on Tue 28 of Dec., 2004 03:28 GMT

On Monday 27 December 2004 10:06, Jorge "Llambías" wrote:
> *** Sentence: jAIckAnkua lO'eckAnkua mu'eickAnkua uaiuckAnkua
> MISMATCH!
> valfendi: -jAI (ckAnkua) -lO'e (ckAnkua) -mu'ei (ckAnkua) >uaiu< (ckAnkua)
> pegbased: (jAIckA) >nkua< -lO'e (ckAnkua) -mu'ei (ckAnkua) >uaiuckAnkua<
>
> valfendi and camxes have different ideas on how to treat
> two consecutive stressed syllables not separated by spaces.
>
> camxes strongly favours left-to-right processing, so it
> will take the first stress it runs into as the determining one.
>
> I presume valfendi and camxes will agree that {jAIckankua}
> breaks as (jAIcka) >nkua<.

Actually, since a word other than a cmene can't begin with "nk", it breaks after {jAI}. Then it calls {ckankua} an error. I'd have to examine the code to see why.

phma -- Now I need a magnifier to find my eyeglasses! -Les Perles de la médecine


Posted by xorxes on Tue 28 of Dec., 2004 03:29 GMT posts: 1912

pc:
> The traditional claim that a Lojban speech stream
> can be uniquely partitioned into Lojban words
> seems to be in trouble. The difficulties seem to
> center on the "foreign" parts of the language,
> cmevla and fuhivla -- and lujvo insofar as they
> impinge on the latter (though these last
> questions seem to be getting solutions).

They aren't so much difficulties as different positions on how strict or permissive the morphology should be. Once we decide that, the uniqueness of the partitioning of the stream is not under threat.

The main differences in criteria seem to be:

1) How do we represent a stressed syllable?

The official prescription has: capital letters or a following syllable followed by a space. valfendi also allows a following syllable followed by doi/la/lai/la'i + cmene, camxes doesn't.

2) Do we allow stress in syllables that shouldn't have it?

valfendi allows some of these "secondary stresses" in brivla. camxes allows the last syllable of a brivla to be marked as stressed.

3) Which vowel combinations are allowed?

camxes allows ai, au, ei, oi, (i/u) vowel in cmavo, cmene and fu'ivla, and no other vowel combinations anywhere.

valfendi allows any combination in cmene and fu'ivla, but only ai, au, ei, oi in cmavo and (i/u) vowel only as a single cmavo by itself.

camxes allows {iy} in cmene as the only vowel combination with y. valfendi allows any combination with y in cmene and fu'ivla.
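The vowel-combination criteria listed under point 3 can be written down as a small lookup. This is only a sketch of the rules as stated in this post, not code taken from camxes or valfendi; the function name `pair_allowed` and its parameters are hypothetical. It assumes `pair` is two adjacent vowel letters with no apostrophe between them, reads "(i/u) vowel" as i or u followed by one of a, e, i, o, u, and covers only the three word classes the post names.

```python
FALLING = {"ai", "au", "ei", "oi"}
# "(i/u) vowel": i or u followed by a, e, i, o or u (y is handled separately)
RISING = {glide + vowel for glide in "iu" for vowel in "aeiou"}
WORD_CLASSES = {"cmavo", "cmene", "fu'ivla"}


def pair_allowed(parser, word_class, pair, is_whole_word=False):
    """Return True if `pair` is accepted under the rules summarised in the post.

    is_whole_word: the pair makes up an entire cmavo by itself (only relevant
    to the valfendi rule for i/u + vowel).
    """
    if word_class not in WORD_CLASSES:
        raise ValueError("the post only states rules for cmavo, cmene and fu'ivla")
    if parser == "camxes":
        if "y" in pair:
            # {iy} in cmene is the only combination with y
            return pair == "iy" and word_class == "cmene"
        return pair in FALLING or pair in RISING
    if parser == "valfendi":
        if word_class in ("cmene", "fu'ivla"):
            # any combination, including those with y
            return True
        # cmavo: the four diphthongs, or i/u + vowel standing alone as a cmavo
        return pair in FALLING or (pair in RISING and is_whole_word)
    raise ValueError("unknown parser: " + parser)


print(pair_allowed("camxes", "cmene", "iy"))    # True
print(pair_allowed("valfendi", "cmavo", "ua"))  # False
```

Laying the two rule sets side by side this way makes the disagreements easy to enumerate: they differ on y combinations outside cmene, on arbitrary vowel pairs in cmene and fu'ivla, and on i/u + vowel inside longer cmavo.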


> One possibility for relieving this latest problem
> is to replace significant absences (pauses) by
> significant presences, a unique sound or mark.
> ....
> What can go between a /iy/ and a /uy/? Any
> phonologically legal Lojban speech string,
> that is, one that contains no illegal vowel or
> consonant clusters (nor /iy/ and /uy/ of course).

{la iy anything uy} is similar to {la'o any-word anything any-word}, although this last one requires pauses.

I suppose the initial {.iy} will require a glottal stop too, otherwise {la iy ...} and {lai iy ...} would be practically indistinguishable.

> whole is taken as a block. This block can be
> used other than as the core of a cmene sumti.
> In particular, it can be inserted as a unit into what
> is otherwise a lujvo construction (with some
> adjustments probably required, but certainly
> fewer restrictions than now are involved in
> fuhivla -- apparently just a glue between vowel
> finals and /iy/ and /uy/ and vowel initials). I
> think that the only limitation is that the block
> can not be compound-final, which would mean that
> the pattern of many fuhivla (which went against
> the usual Lojban modifier-modified anyhow) would
> have to be changed to put the category last.

I am preparing an addition along these lines, by allowing cmene-rafsi, which are just any cmene followed by -iy. So for example the rafsi for {djan} would be {djaniy-}. These rafsi, like fuhivla-rafsi, can only be preceded by y-rafsi (four-letter rafsi or CVC-y rafsi or fuhivla-rafsi or other cmene-rafsi). And y has to be disallowed in cmene.
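A minimal sketch of the cmene-rafsi rule just described, for illustration only: the helper names `cmene_rafsi` and `may_precede_cmene_rafsi` and the labels for the rafsi kinds are hypothetical, and nothing here is taken from camxes. The only cmene check performed is the one this post states, that y is disallowed.

```python
# Kinds of rafsi that may come directly before a cmene-rafsi, per the post.
# The string labels are made up for this sketch.
Y_RAFSI_KINDS = {"four-letter", "CVC-y", "fuhivla-rafsi", "cmene-rafsi"}


def cmene_rafsi(cmene):
    """Form the rafsi of a cmene by appending -iy."""
    if "y" in cmene:
        # y has to be disallowed in cmene so the -iy ending stays unambiguous
        raise ValueError("cmene containing y cannot form a cmene-rafsi")
    return cmene + "iy"


def may_precede_cmene_rafsi(kind):
    """True if a rafsi of this kind may come directly before a cmene-rafsi."""
    return kind in Y_RAFSI_KINDS


print(cmene_rafsi("djan"))  # djaniy
```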

mu'o mi'e xorxes



Posted by Anonymous on Tue 28 of Dec., 2004 03:29 GMT

On Monday 27 December 2004 10:58, John E Clifford wrote:
> The traditional claim that a Lojban speech stream
> can be uniquely partitioned into Lojban words
> seems to be in trouble. The difficulties seem to
> center on the "foreign" parts of the language,
> cmevla and fuhivla -- and lujvo insofar as they
> impinge on the latter (though these last
> questions seem to be getting solutions). Fuhivla
> have always been a bit problematic as have cmene
> in their relation both to their native languages
> and to Lojban and various devices have come along
> to deal with these problems, mainly restricted --
> and often very complex -- phonological patterns
> and -- for cmene at least -- obligatory pauses.
> These last have seemed impractical in actual
> speech -- people forget to make them and, when
> they do, others fail to notice them as distinct
> from nonsignificant pauses.

The test phrases include stressed and unstressed cmavo preceding brivla without a pause. Stressing a cmavo before a brivla without a pause is likely to result in a different word division, even if the brivla is a lujvo: /lojboJBEna/ is {lo jbojbena}, while /LOjboJBEna/ is {lojbo jbena}. That camxes lexes /jAIckAnkua/ differently than valfendi isn't a big problem. I'm more concerned about /LIXtenctain/, which camxes splits as {lixte nctain}.
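The effect of stress on word division can be illustrated with a toy splitter that only finds right edges. It relies on the rule that a brivla carries its stress on the penultimate syllable, so a word closes one syllable after a stressed one, and it scans left to right taking the first stress it meets, as camxes is described as doing earlier in the thread. Everything else is an assumption of this sketch: the syllable lists are split by hand, capitals mark stress, the name `split_on_stress` is hypothetical, and no attempt is made to peel leading cmavo off the front of a chunk.

```python
def split_on_stress(syllables):
    """Greedy left-to-right split: close a chunk one syllable after a stress.

    A syllable counts as stressed if it contains a capital letter. The chunk
    may still begin with cmavo that a real morphology would split off.
    """
    chunks = []
    current = []
    i = 0
    while i < len(syllables):
        syllable = syllables[i]
        current.append(syllable)
        stressed = any(ch.isupper() for ch in syllable)
        if stressed and i + 1 < len(syllables):
            # The next syllable is taken as the final one, even if it is
            # itself marked as stressed.
            current.append(syllables[i + 1])
            chunks.append("".join(current))
            current = []
            i += 2
        else:
            i += 1
    if current:
        chunks.append("".join(current))  # leftover material with no closing syllable
    return chunks


print(split_on_stress(["lo", "jbo", "JBE", "na"]))  # ['lojboJBEna']
print(split_on_stress(["LO", "jbo", "JBE", "na"]))  # ['LOjbo', 'JBEna']
print(split_on_stress(["jAI", "ckA", "nkua"]))      # ['jAIckA', 'nkua']
```

The first chunk still has to be divided as {lo jbojbena} by the actual morphology rules; the sketch only shows why stressing the cmavo moves the first right edge, and why two adjacent stresses leave a leftover like `nkua`.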

phma -- Mes règles mensuelles ont lieu une fois par an. -Les Perles de la médecine


Posted by xorxes on Tue 28 of Dec., 2004 03:29 GMT posts: 1912

> That camxes lexes /jAIckAnkua/ differently than valfendi isn't a big problem. I'm
> more concerned about /LIXtenctain/, which camxes splits as {lixte nctain}.

That has been fixed. The problem was that the gismu rule didn't have !cmene in front, because words didn't need it.

mu'o mi'e xorxes



Posted by pycyn on Tue 28 of Dec., 2004 03:29 GMT posts: 2388

wrote:

> pc:
> > The traditional claim that a Lojban speech stream
> > can be uniquely partitioned into Lojban words
> > seems to be in trouble. The difficulties seem to
> > center on the "foreign" parts of the language,
> > cmevla and fuhivla -- and lujvo insofar as they
> > impinge on the latter (though these last
> > questions seem to be getting solutions).
>
> They aren't so much difficulties as different positions
> on how strict or permissive the morphology should be.
> Once we decide that, the uniqueness of the partitioning
> of the stream is not under threat.

This is of course exactly what puts the whole under threat: the fact that these issues have not been decided — nor do there seem to be principled ways to decide them (except tidy algorithms, which may be enough). The claim appears to have been false for Lojban up to now and the aim is to figure out how best to make it true.

> The main differences in criteria seem to be:
>
> 1) How do we represent a stressed syllable?
>
> The official prescription has: capital letters or a following
> syllable followed by a space. valfendi also allows a following
> syllable followed by doi/la/lai/la'i + cmene, camxes doesn't.

This is not a real problem with the claim, only with how to represent the speech stream. Is it clear that once stress is represented we always know what it signifies?

> 2) Do we allow stress in syllables that shouldn't have it?
>
> valfendi allows some of these "secondary stresses" in brivla.
> camxes allows the last syllable of a brivla to be marked as stressed.
>
> 3) Which vowel combinations are allowed?
>
> camxes allows ai, au, ei, oi, (i/u) vowel in
> cmavo, cmene and fu'ivla, and no other vowel combinations
> anywhere.
>
> valfendi allows any combination in cmene and fu'ivla,
> but only ai, au, ei, oi in cmavo and (i/u) vowel only
> as a single cmavo by itself.
>
> camxes allows {iy} in cmene as the only vowel combination
> with y. valfendi allows any combination with y in cmene
> and fu'ivla.
>
> > One possibility for relieving this latest problem
> > is to replace significant absences (pauses) by
> > significant presences, a unique sound or mark.
> ...
> > What can go between a /iy/ and a /uy/? Any
> > phonologically legal Lojban speech string,
> > that is, one that contains no illegal vowel or
> > consonant clusters (nor /iy/ and /uy/ of course).
>
> {la iy anything uy} is similar to
> {la'o any-word anything any-word}, although this
> last one requires pauses.

Yes; the difference is only in terms of possible roles (and the simpler markers).

> I suppose the initial {.iy} will require a glottal stop
> too, otherwise {la iy ...} and {lai iy ...} would be
> practically indistinguishable.

Of course; the glottal stop between vowels in different words is phonologically automatic (well, should be, though even here some speakers manage to screw up by using variants that their interlocutors don't recognize as pauses).

> > whole is taken as a block. This block can be
> > used other than as the core of a cmene sumti.
> > In particular, it can be inserted as a unit into what
> > is otherwise a lujvo construction (with some
> > adjustments probably required, but certainly
> > fewer restrictions than now are involved in
> > fuhivla -- apparently just a glue between vowel
> > finals and /iy/ and /uy/ and vowel initials). I
> > think that the only limitation is that the block
> > can not be compound-final, which would mean that
> > the pattern of many fuhivla (which went against
> > the usual Lojban modifier-modified anyhow) would
> > have to be changed to put the category last.
>
> I am preparing an addition along these lines, by allowing
> cmene-rafsi, which are just any cmene followed by -iy.
> So for example the rafsi for {djan} would be {djaniy-}. These
> rafsi, like fuhivla-rafsi, can only be preceded by y-rafsi
> (four-letter rafsi or CVC-y rafsi or fuhivla-rafsi or other
> cmene-rafsi). And y has to be disallowed in cmene.

Yes, this would work as well, though it leaves the (merely practical perhaps) cmene problems untouched and is slightly less general otherwise. It may be more feasible since it is less radical. It certainly is desirable in some form or other.


Posted by pycyn on Tue 28 of Dec., 2004 03:29 GMT posts: 2388

> On Monday 27 December 2004 10:58, John E Clifford wrote:
> > The traditional claim that a Lojban speech stream
> > can be uniquely partitioned into Lojban words
> > seems to be in trouble. The difficulties seem to
> > center on the "foreign" parts of the language,
> > cmevla and fuhivla -- and lujvo insofar as they
> > impinge on the latter (though these last
> > questions seem to be getting solutions). Fuhivla
> > have always been a bit problematic as have cmene
> > in their relation both to their native languages
> > and to Lojban and various devices have come along
> > to deal with these problems, mainly restricted --
> > and often very complex -- phonological patterns
> > and -- for cmene at least -- obligatory pauses.
> > These last have seemed impractical in actual
> > speech -- people forget to make them and, when
> > they do, others fail to notice them as distinct
> > from nonsignificant pauses.
>
> The test phrases include stressed and unstressed cmavo preceding brivla
> without a pause. Stressing a cmavo before a brivla without a pause is likely
> to result in a different word division, even if the brivla is a lujvo:
> /lojboJBEna/ is {lo jbojbena}, while /LOjboJBEna/ is {lojbo jbena}. That
> camxes lexes /jAIckAnkua/ differently than valfendi isn't a big problem. I'm
> more concerned about /LIXtenctain/, which camxes splits as {lixte nctain}.

Well, the name problem is one thing I particularly aimed to deal with. I am not sure that this proposal helps with the stress problem, which, like pauses, requires a kind of control over the speech stream that most of us lack a lot of the time. xorxes' suggestion to always use hyphens (well, his is not quite that but...) goes some way to resolving that, however: it works in the given case at least.


Posted by Anonymous on Tue 28 of Dec., 2004 04:08 GMT

Robin Lee Powell scripsit:

> I don't want people to need to understand the divisions we're
> creating to try to understand how the language works. I think that
> even "morphology" versus "grammar" is artificial and arbitrary, and
> I don't think people should have to go to two places to get their
> questions answered.

There's pretty good cross-linguistic and psycholinguistic evidence that people do have separate morphology and syntax modules in their heads (in particular, different kinds of mental problems can impair one but not the other).

-- John Cowan http://www.ccil.org/~cowan jcowan@reutershealth.com To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --The Hobbit


Posted by xorxes on Tue 28 of Dec., 2004 04:08 GMT posts: 1912

I'm trying to figure out what to do with misplaced stress marks.

What does valfendi do with {lobroDABRODA.} and {loBRODABRODA.}?

I think camxes currently does -lo (broDABRO) -DA and -lo (BRODA) (BRODA).

mu'o mi'e xorxes



Posted by rlpowell on Tue 28 of Dec., 2004 04:08 GMT posts: 14214

On Mon, Dec 27, 2004 at 07:58:24AM