cow oak: Difference between revisions

m (Gleki moved page jbocre: cow oak to cow oak without leaving a redirect: Text replace - "jbocre: c" to "c")
 
* Decided Issues
** Point 1, below, is the official viewpoint: ZOI divides the input stream into words before looking for a token. Points 2 and 3 are not carried.

* Outstanding Issues
** The CLL needs to be updated to reflect the behavior of the [[jbocre: CLL PEG Errata EG grammar with regard to ZOI|CLL PEG Errata EG grammar with regard to ZOI]].

This page resulted from a discussion about the way ZOI is handled in camxes and the specification for it in the CLL:

http://groups.google.com/group/lojban/browse_frm/thread/aa27ff25e6dd5201

zoi, as described in the CLL:

http://dag.github.com/cll/19/10/

states, about delimiters:

"...and which is not found in the written text or spoken phoneme stream."

"Within written text, the Lojban written word used as a delimiting word may not appear..."

It then goes on to provide an example that is claimed to be ungrammatical:

mi djuno fi le valsi po'u zoi gy. gyrations .gy.

One could infer, given the example provided, that "gy" could not appear *at all* in the quoted text. The official parser (grammar.300) does not behave this way, however: it breaks the text up into word tokens and so fails to detect that the delimiter might be a substring of a quoted word.

camxes also allows the form above, as quoted by the CLL. It does not allow the following:

mi djuno fi le valsi po'u zoi gy. gyrate .gy.

because "gyrate" parses as three separate Lojban words, and camxes parses the quoted text as if it were Lojban, stopping after matching the first parsed word against the terminator.

Exactly how a parser should parse text between zoi delimiters needs to be decided. There are three extant proposals:

# Consider the PEG grammar as it is written now to be correct and update the CLL to more accurately describe how zoi works.
# Replace the rule for zoi-word to match non-Lojban words (strings of non-whitespace) rather than Lojban words, so 'gyrate' won't be divided into three words. The CLL will need to be updated for this behavior too.
# Replace the PEG grammar with something that reads the stuff between the ZOI delimiters a character at a time, either requiring a pause before the final delimiter or not. This is consistent with the CLL.

===  ZOhOI ===

The proposed selma'o ZOhOI quotes a single word, allowing a non-Lojban word to be quoted without using delimiters. The behavior of ZOhOI is worth considering when deciding on how ZOI should parse between its delimiters.

What ZOhOI considers a single word should probably be what ZOI considers a single word, unless it is decided that ZOI works a character at a time.
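The difference between a word-based and a strict substring reading of the ZOI delimiter rule, as discussed in the ZOI text of the earlier revision, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, words are divided crudely on whitespace and "." (roughly as in proposal 2), and nothing here reproduces camxes or grammar.300, which divide the quoted text into Lojban words instead.

```python
import re


def words(text):
    # Crude stand-in for word division (an assumption, not Lojban morphology):
    # whitespace and "." (pause) both separate words.
    return [w for w in re.split(r"[\s.]+", text) if w]


def quote_word_based(text):
    """Word-based reading: the quote ends at the first whole word equal to
    the delimiter. Raises ValueError if the text contains no 'zoi'."""
    ws = words(text)
    start = ws.index("zoi")
    delim, rest = ws[start + 1], ws[start + 2:]
    if delim not in rest:
        return None  # no closing delimiter found
    return " ".join(rest[:rest.index(delim)])


def quote_substring_strict(text):
    """Strict reading of the CLL wording: reject the quote if the delimiter
    appears anywhere inside the quoted text, even within a longer word."""
    ws = words(text)
    delim = ws[ws.index("zoi") + 1]
    quoted = quote_word_based(text)
    if quoted is None or delim in quoted:
        return None
    return quoted


example = "mi djuno fi le valsi po'u zoi gy. gyrations .gy."
print(quote_word_based(example))
print(quote_substring_strict(example))
```

Under the word-based reading the CLL example is accepted and yields the quoted word, while the strict substring reading rejects it because "gy" occurs inside "gyrations".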

Latest revision as of 11:47, 23 March 2014

From [http://www.lojban.org/lists/lojban-list/msg09535.html]:


~pp~

  • To: lojban-list@lojban.org
  • Subject: The case of the cow oak: a lexicographical anecdote
  • From: Arnt Richard Johansen <axx@xxx.xxx>
  • Date: Thu, 17 Mar 2005 12:35:31 +0100 (CET)
  • Sender: nobody <nobody@digitalkingdom.org>

On Jbovlaste, we (that is, Robin :) ) have imported the contents of
luj1999.ZIP into these pages:

http://jbovlaste.lojban.org/wiki/noralujv%20entries%20with%20no%20definitions?lang=en

This word list has, I believe, been assembled programmatically by Nora
based on occurrences in the corpus.

Now, many of these words are nonce, typos, or simply erroneous
registrations of natlang words. But one thing that caught my eye in the
first few pages of that list was:

{bakcindu} bovine+oak

It *could* simply be an error, I thought, but there are many species of
oak. Maybe one of these species is the "bovine" one in some language the
original author wanted to calque from, possibly Linnean binomials. I
searched for "bakcindu" in the lojban.org search interface (which covers,
among other things, this mailing list since the dawn of time). But no
occurrences were found. I pored over the List of Quercus species on
Wikipedia, but nothing seemed remotely cow-like there.

I then asked Pierre Abbat, our resident taxonomist, to help out. A few
minutes after receiving a response that there was no such thing as cow
oak, I started racking my brain about how it could come to be that that
word had ended up in the list. Did Nora have access to some texts back in
1999 that wasn't searchable on lojban.org yet? Probably not. So it *had*
to be in there somewhere.

Then it hit me. Obviously, Nora's script did canonicalization of the lujvo
forms. Otherwise, there would be at least some alternative forms in the
list. So it was probably registered from a non-canonical form. I tried
searching for "baknycindu", and BINGO!:

http://balance.wiw.org/~jkominek/lojban/9602/msg00128.html

That was, apparently, the only usage ever of that word. And in a really
bizarre metalinguistic discussion, too (read it!).

I feel a bit bad about going to Pierre before thinking through *why* there
was no hits on the search form now.

--
Arnt Richard Johansen http://arj.nvg.org/

As many words as possible per gram was the selection criterion when the
books were to be picked out. --Erling Kagge: Alene til Sydpolen

~/pp~

And now I discover that there actually is a cow oak. See [http://wordnet.princeton.edu/cgi-bin/webwn2.0?stage=1&word=cow%20oak WordNet].