free gismu space: Difference between revisions



mi'e xod .i le kulnu gismu goi ko'a cu toldrani je selsrera ki'u di'e


# '''pa'enai zmanei''' le pu selcuxna kulnu na'ebo vo'e


# ko'a '''na sance simsa''' le fatci kulnu cmene


.iseni'ibo .e'ucai pilno le fu'ivla be le'a li ci


----
 


Hesitatingly I agree that they should indeed be replaced with fu'ivla. Going to produce the list of fu'ivla for us, xod? :) --[[jbocre: Jay Kominek|Jay]]


''mi'e xod .i zo'o .o'u la nitcion. noi dukse selcuntu cu catni la'e di'u le jbogri''


----


I too agree with Xod, regarding both his reasons and his conclusions. -- mi'e [[User:And Rosta|And Rosta]].


----
----


=== The blacklist: 54 cultural gismu ===


''source: [[jbocre: gismu etymology|gismu etymology]], with 27 scientific constants and powers of ten removed''


||


baxso | Malay-Indonesian


bengo | Bengali         


bemro | North American


bindo | Indonesian


brazo | Brazilian


brito | British         


budjo | Buddha         


dadjo | Tao
** See [http://www.lojban.org/files/reference-grammar/chap4.html#s14 Chapter 4, Section 14 of the Book]
*** Unfortunately, the consonant blocking table is illegible in the on-line version.  Here it is:


dotco | German


dzipo | Antarctican


filso | Palestinian


fraso | French


friko | African         


gento | Argentinian


glico | English         


jegvo | Jehovah         


jerxo | Algerian       


jordo | Jordanian       


jungo | Chinese


kadno | Canadian


ketco | South American


kisto | Pakistani       


latmo | Latin/Latium   


libjo | Libyan         


lojbo | Loglandic       


lubno | Lebanese       
----
 
meljo | Malaysian
 
merko | American       
 
mexco | Mexican         
 
misro | Egyptian
 
morko | Moroccan       
 
muslo | Islam/Moslem
 
polno | Polynesian     
 
ponjo | Japanese
 
porto | Portuguese     
 
rakso | Iraqi           
 
ropno | European       
 
rusko | Russian         
 
sadjo | Saudi           
 
semto | Semitic         
 
sirxo | Syrian         
 
skoto | Scottish       
 
softo | Soviet/USSR     
 
spano | Spanish         
 
sralo | Australian     
 
srito | Sanskrit       
 
xazdo | Asiatic         
 
xebro | Hebrew(Israeli) 
 
xelso | Greek


xindo | Hindi


xispo | Hispanic       


xrabo | Arabic         


xriso | Christ         


xurdo | Urdu ||

Revision as of 16:50, 4 November 2013

Total number of possible gismu forms: 96,475

Total number of possible forms excluding the last vowel: 19,295
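The two totals above can be reproduced by enumeration. The following is a minimal sketch, not the method used by whoever computed the figures: it assumes the gismu shapes CCVCV and CVCCV, a list of 48 permissible initial consonant pairs, and the usual restrictions on medial consonant pairs as given in the reference grammar. The names `INITIAL_PAIRS`, `medial_ok` and `all_forms` are made up for this illustration.

```python
from itertools import product

VOWELS = "aeiou"
CONSONANTS = "bcdfgjklmnprstvxz"
UNVOICED = set("cfkpstx")
VOICED = set("bdgjvz")

# Assumption: the 48 permissible initial consonant pairs (used in CCVCV forms).
INITIAL_PAIRS = (
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr "
    "jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st "
    "tc tr ts vl vr xl xr zb zd zg zm zv"
).split()


def medial_ok(a, b):
    """Assumed rules for a consonant pair in the middle of a CVCCV form."""
    if a == b:
        return False  # no doubled consonants
    if (a in UNVOICED and b in VOICED) or (a in VOICED and b in UNVOICED):
        return False  # no mixing of voiced and unvoiced (l, m, n, r are exempt)
    if a in "cjsz" and b in "cjsz":
        return False  # no two consonants both drawn from c, j, s, z
    if a + b in ("cx", "kx", "xc", "xk", "mz"):
        return False  # specifically forbidden pairs
    return True


MEDIAL_PAIRS = [a + b for a, b in product(CONSONANTS, repeat=2) if medial_ok(a, b)]


def all_forms():
    """Return every candidate form of shape CCVCV or CVCCV."""
    forms = []
    for pair, v1, c, v2 in product(INITIAL_PAIRS, VOWELS, CONSONANTS, VOWELS):
        forms.append(pair + v1 + c + v2)
    for c, v1, pair, v2 in product(CONSONANTS, VOWELS, MEDIAL_PAIRS, VOWELS):
        forms.append(c + v1 + pair + v2)
    return forms


forms = all_forms()
print(len(forms))                   # all candidate forms
print(len({f[:4] for f in forms}))  # forms counted without the last vowel
```

Under these assumptions the two printed counts agree with the 96,475 and 19,295 quoted above.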

Total number of official gismu: 1,342

Total number of gismu forms that clash with official gismu: 11,874 (according to the definition of which gismu clash with each other in the book, including the forms of the official gismu themselves)

Percentage of forms actually taken, whether by clashes or by the actual gismu themselves: 12.31% (about 1 in 8)

  • But this is misleading: once you actually assign a new gismu, a variable amount of gismu space becomes used up.
    • Good point. However, a gismu can use up at most 13 forms: 5 for the last vowel plus 2 for each consonant. So there is room for at least 6,507 more gismu. In practice, it seems they use up about 9 forms each, so there is room for about 9,400.

Percentage of forms actually taken (ignoring clashes other than the last vowel): 6.96% (about 1 in 15)
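The percentages and the two estimates of remaining room follow from the quoted totals by simple arithmetic. The sketch below only re-derives them. It assumes that the last-vowel-only percentage counts five forms per official gismu, and it takes the figures of 13 and 9 forms per new gismu as given in the discussion above. The variable names are invented for the example.

```python
total_forms = 96_475   # all possible gismu forms
official = 1_342       # official gismu
clashing = 11_874      # forms that are official gismu or clash with one

# Share of forms taken, counting every clash.
pct_taken = 100 * clashing / total_forms

# Share of forms taken if only the last-vowel rule is counted
# (assumption: each official gismu occupies 5 forms).
pct_last_vowel_only = 100 * official * 5 / total_forms

# Forms not yet blocked, and how many new gismu they could hold.
unblocked = total_forms - clashing
worst_case = unblocked // 13     # each new gismu uses up 13 forms
typical = round(unblocked / 9)   # each new gismu uses up about 9 forms

print(f"{pct_taken:.2f}%")            # 12.31%
print(f"{pct_last_vowel_only:.2f}%")  # 6.96%
print(worst_case)                     # 6507
print(typical)                        # 9400
```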


I get a different analysis, as follows:

  1. There are 19365 legal gisms (4-letter rafsi) by the rules.
  2. There are 1338 gisms in actual use.
  3. The gismu avoidance rules block 3731 more gisms, for an effective total of 5069 in use.
  4. This leaves 14566 gisms available.
  5. On average, each gism blocks 4 other gisms.
  6. In effect, then, we can have about 2900 more gismu, depending on the exact details of assignments.
  • That's not correct. You can't count 4-letter rafsi, because 4-letter rafsi are allowed to be very similar to other 4-letter rafsi. The gismu they come from could differ by one small change in the 4-letter stem and by the final vowel.

If this is correct, then we can't have a distinct gismu for every culture, because the number of languages spoken around the world is 6000-7000, depending on how you count.

  • But we can have an indistinct gismu for every culture. And Rosta
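  • Can you tell me exactly the rules by which gismu block each other, and gisms block each other? -tsali
    • See Chapter 4, Section 14 of the Book (http://www.lojban.org/files/reference-grammar/chap4.html#s14)
      • Unfortunately, the consonant blocking table is illegible in the on-line version. Here it is: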

b blocks p, v

c blocks j, s

d blocks t

f blocks v, p

g blocks k, x

j blocks c, z

k blocks g, x

l blocks r

m blocks n

n blocks m

p blocks b, f

r blocks l

s blocks z, c

t blocks d

v blocks b, f

x blocks g, k

z blocks j, s
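For use in a program, the table above can be written as a mapping from each consonant to the consonants it blocks. This is only a sketch of one possible encoding. The names `BLOCKS` and `is_symmetric` are invented here, and the check merely confirms that the table as transcribed is symmetric: whenever one consonant blocks another, the reverse also holds.

```python
# The consonant blocking table, transcribed row by row from the list above.
BLOCKS = {
    "b": "pv", "c": "js", "d": "t", "f": "vp", "g": "kx", "j": "cz",
    "k": "gx", "l": "r", "m": "n", "n": "m", "p": "bf", "r": "l",
    "s": "zc", "t": "d", "v": "bf", "x": "gk", "z": "js",
}


def is_symmetric(table):
    """True if, whenever a blocks b, b also blocks a."""
    return all(
        a in table.get(b, "")
        for a, blocked in table.items()
        for b in blocked
    )


print(is_symmetric(BLOCKS))  # True
```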


Enough speculation! I have written a program that by brute force actually counts the number of free gismu. I started by generating a list of all 96475 possible gismu forms. We can call 'em "candidate gismu" or "proto-gismu". I then went through the list of 1342 official gismu, one at a time. For each one, I deleted it from the list of proto-gismu. Then I deleted all the proto-gismu that differed only by the final vowel (except for the "brodV" series; are there any other exceptions like this?). Then I deleted all the proto-gismu that were blocked based on consonant similarity as in the table given just above. When I had done that for all the official gismu I counted the remaining proto-gismu. (On my Pentium III 450 running Windoze 98 all this takes less than a minute.)

The answer was rather surprising. I find that there are still 85,536 available gismu forms. Now, I'm sure you'll say, "That's simply not possible!" But think about it. A lot of the forms that would be blocked by the consonant similarity rules aren't even valid proto-gismu. (Don't forget, to be a valid proto-gismu a form still has to conform to certain rules.) So fewer forms are blocked than you might think. Additionally, a number of forms (no, I haven't counted them; maybe later) are blocked multiple ways. For instance, *bajru is blocked by both bacru and bajra. Because of this overlap fewer forms are blocked than you might think.

I'm pretty confident that my program is correct, but it would be a Good Thing if someone were to attempt to verify my results. --mi'e skat
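For anyone attempting that verification, the procedure described above might be sketched as follows. This is not skat's program, which is not shown on this page. It is a reconstruction from the description, and the names `SIMILAR`, `blocked_forms` and `count_free` are hypothetical. It assumes that consonant blocking means replacing exactly one consonant with a similar one from the table while keeping the final vowel. The candidate list in the example is a tiny hand-picked sample, not the full list of 96,475 forms.

```python
VOWELS = "aeiou"

# Consonant similarity table, as listed earlier on this page.
SIMILAR = {
    "b": "pv", "c": "js", "d": "t", "f": "vp", "g": "kx", "j": "cz",
    "k": "gx", "l": "r", "m": "n", "n": "m", "p": "bf", "r": "l",
    "s": "zc", "t": "d", "v": "bf", "x": "gk", "z": "js",
}


def blocked_forms(gismu):
    """Forms used up by one gismu: itself, its final-vowel variants,
    and every variant with one consonant swapped for a similar one."""
    stem = gismu[:4]
    blocked = {stem + v for v in VOWELS}
    for i, ch in enumerate(stem):
        for alt in SIMILAR.get(ch, ""):  # vowels have no entry, so no swaps
            blocked.add(gismu[:i] + alt + gismu[i + 1:])
    return blocked


def count_free(candidates, official):
    """Delete everything blocked by each official gismu, then count the rest."""
    free = set(candidates)
    for gismu in official:
        free -= blocked_forms(gismu)
    return len(free)


# Tiny illustrative sample of candidate forms (hypothetical input).
candidates = ["bacru", "bajra", "bajru", "bajri", "pacru", "kanla", "cipni"]
official = ["bacru", "bajra"]

print(count_free(candidates, official))  # 2: only kanla and cipni survive
print("bajru" in blocked_forms("bacru"), "bajru" in blocked_forms("bajra"))
```

Deleting with set difference means a form blocked in several ways, such as *bajru, is removed only once, and blocked strings that are not valid candidate forms have no effect on the count. In this sketch the "brodV" series needs no special case, since deleting a form that is itself official changes nothing.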

  • You have to consider that those 85,536 are not independently available. Choosing one will block many of the others. Close to 80% of them will be blocked by the last vowel rule. --mi'e xorxes
  • But experimental gismu won't block each other (or at least, we can't say which blocks which). --And Rosta Furthermore, the blocking rules were part of the algorithm for assigning the original gismu forms and are not part of the living morphological rules that constrain, say, fuhivla and cmevla, since the gismu are supposed to form a closed class. Since experimental gismu are inherently unofficial, there is no need to suppose that the blocking rules would apply to them. I accept that one would expect official gismu to conform to the blocking rules.
    • Right, but in that case the relevant number is 96475 - 1342 = 95133 available forms. --xorxes