Why oligosynthesis probably fails

And why taxonomic and arithmographic languages probably fail, too

1. Introduction

In learning a language, most of the time and effort goes into learning the vocabulary, which consists of thousands of (seemingly) arbitrary items that show no relationship to other items with related meanings. Many designers of artificial languages have tried to lighten this burden with an oligosynthetic design, in which there are only about a hundred basic lexical items to learn, and the rest of the lexicon is derived from them by productive morphological combination. At first glance, this is a stroke of genius: learn about a hundred words and some simple rules of grammar, and you can master the language in a few weeks!

2. Problems with oligosynthetic languages

But the truth is: it is not as easy as that. The flip side of the coin is that you need a lot of compounds to express things that are pretty basic and have their own root words in "ordinary" languages. In essence, every time you mention something, you have to define it in your own words. This, of course, can quickly get very clumsy, and it is often very difficult. Just try to define some common words with a vocabulary of only about a hundred items, and you will quickly find how hard this is. You will often need more words than the word you are trying to define has segments ("letters", i.e. individual consonants and vowels); so even in a language of the "speedtalk" type, where every root word is just one segment long, the resulting compound will often be longer than the corresponding word in a language like English. It is often more practical to form an idiomatic compound, but such a compound has to be learned just like a root word.

And there is another problem: proper names, and words derived from them. An oligosynthetic language needs an "escape mechanism" for proper names. In theory, of course, you can translate them, but in practice this does not work, because different names from different languages may have the same meaning, and the original meaning of a name is often unknown. The latter is true for many geographical names, which come from languages once spoken in the region but now completely forgotten. And with proper names, the very arbitrariness that oligosynthetic languages are designed to banish creeps back in.

3. Problems with taxonomic languages

A historical antecedent of the oligosynthetic language is the taxonomic language, of which the 17th-century "philosophical language" projects of Dalgarno and Wilkins are the most famous examples, though the idea has been tried again since. The idea is to arrange all concepts in a general taxonomy of ideas, and then assign words to them by adding more and more suffixes as one moves through the taxonomy from top to bottom.

There are fields where this works well, such as the classification of animals and other life forms, where taxonomies of this kind are in actual use: if you have a word for 'canid', you can derive words like 'dog', 'wolf' or 'fox' from it by adding different suffixes. In chemistry, there is the periodic table, and so on.

But in many fields, such taxonomies are far less self-evident. And even where the approach works, the mental load is not necessarily lighter than in a language with an arbitrary vocabulary. Say you have learned the word for 'canid' and know that the fox is a canid: which of the various suffixes do you add to get 'fox'? To learn the word, you have to learn the whole path through the taxonomy, with all the suffixes along it, which is hardly easier than just learning an arbitrary word.
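A toy model makes the point concrete. The sketch below assumes a tiny taxonomy in which each node contributes a form and a word is the concatenation of the forms along its path; every root and suffix here is invented for illustration and not taken from Dalgarno or Wilkins.

```python
# A toy taxonomic lexicon: each node maps a concept to (form, children).
# All forms are hypothetical, invented only to illustrate the mechanism.
TAXONOMY = {
    "animal": ("zi", {
        "canid": ("ta", {
            "dog": ("p", {}),
            "wolf": ("k", {}),
            "fox": ("s", {}),
        }),
        "felid": ("mo", {
            "cat": ("p", {}),
            "lion": ("k", {}),
        }),
    }),
}


def word_for(concept, tree=TAXONOMY, prefix=""):
    """Return the word for `concept` by concatenating the forms on the path
    from the top of the taxonomy down to it, or None if it is not listed."""
    for name, (form, children) in tree.items():
        word = prefix + form
        if name == concept:
            return word
        found = word_for(concept, children, word)
        if found is not None:
            return found
    return None


print(word_for("canid"))  # zita
print(word_for("fox"))    # zitas
print(word_for("cat"))    # zimop
```

Note that the suffix "p" yields 'dog' under 'canid' but 'cat' under 'felid': knowing the suffix inventory tells you nothing until you also know which branch each concept sits on, which is the learning burden described above.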

4. Problems with arithmographic languages

Finally, there is an idea for which I haven't found a term in general use, so I coined my own: the arithmographic language. The idea comes from Gottfried Wilhelm Leibniz, and it is brilliant in theory but hardly practical. Leibniz's idea was to assign prime numbers to semantic primes (if there are such things at all, a question on which linguists are divided). Then, when you need a word for a concept, you take the prime numbers corresponding to the semantic primes your concept is made of, multiply them, and behold the word you need. Since every whole number has exactly one factorization into primes, each combination of semantic primes gets its own unique number. Elegant, isn't it?
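Here is a minimal sketch of the encoding step, assuming a small inventory of semantic primes; the primes and the numbers assigned to them are made up for illustration and are not Leibniz's own.

```python
from math import prod

# Hypothetical semantic primes, each assigned a distinct prime number.
SEMANTIC_PRIMES = {
    "animal": 2,
    "rational": 3,
    "small": 5,
    "domestic": 7,
    "four-legged": 11,
}


def encode(*components):
    """Form the 'word' for a concept by multiplying the numbers of its
    semantic primes."""
    return prod(SEMANTIC_PRIMES[c] for c in components)


print(encode("rational", "animal"))                           # 6
print(encode("small", "domestic", "four-legged", "animal"))   # 770
```

Because multiplication is commutative, the order of the components does not matter, and unique prime factorization guarantees that different sets of semantic primes never collide.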

The truth is: it is not. Or at least, it is not practical. You will hardly be able to speak such a language without a pocket calculator or a similar gadget. (Leibniz, of course, had invented a mechanical calculator, which he thought would make using such a language easier.) And worse: if you hear a word you haven't heard before, you could in theory find out its meaning by factorizing the number, but in practice this is hardly feasible. Factorizing a number takes a lot of computation; much of modern cryptography relies on the intractability of this operation for large numbers. An arithmographic language with perhaps 100 semantic primes would not produce numbers nearly that large, but factorizing them would still be inconvenient enough.
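To get a feel for the decoding side, the sketch below uses plain trial division, the naive method a listener would have to use by hand (not what cryptanalysts use), and counts how many divisions it attempts. The semantic primes and their numbers are hypothetical, chosen only for illustration.

```python
# Hypothetical semantic primes, each assigned a distinct prime number.
SEMANTIC_PRIMES = {
    "animal": 2,
    "rational": 3,
    "small": 5,
    "domestic": 7,
    "four-legged": 11,
}
MEANING_OF = {p: name for name, p in SEMANTIC_PRIMES.items()}


def factorize(n):
    """Factorize n by trial division.

    Returns (list of prime factors in ascending order, number of trial
    divisions attempted)."""
    factors, trials, d = [], 0, 2
    while d * d <= n:
        trials += 1
        if n % d == 0:
            factors.append(d)
            n //= d
        else:
            d += 1
    if n > 1:
        # Whatever remains has no divisor up to its square root: it is prime.
        factors.append(n)
    return factors, trials


def decode(word):
    """Recover the semantic primes of a word (KeyError for an unknown factor)."""
    factors, _ = factorize(word)
    return [MEANING_OF[p] for p in factors]


print(decode(770))          # ['animal', 'small', 'domestic', 'four-legged']
print(factorize(523 * 541)) # ([523, 541], 522)
```

With about 100 semantic primes, the largest would be 541 (the 100th prime), and a word combining just 523 and 541 already takes this sketch 522 trial divisions to decode: trivial for a computer, but hopeless in the middle of a conversation.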

And of course, you have to memorize which prime number corresponds to which semantic prime, and as we all know, people tend to suck at memorizing numbers. So unless you are a lightning calculator, don't expect to ever become fluent in an arithmographic language. Unless, of course, you just memorize the words as you would in an ordinary language, without caring about their prime factors; but then it no longer matters that the language is arithmographic.

5. Conclusion

So, the idea of an oligosynthetic, taxonomic or arithmographic language, however elegant it appears in theory, is hardly practical. You just cannot exorcize the complexity of an arbitrary vocabulary from a language that way. After all, the information has to go somewhere. The world we live in is simply too complex to pigeonhole into a manageable number of basic concepts from which to build a language that is easier to learn and use than a natural language.

Still, these considerations are theoretical, and I am not aware of any tests of how well or how badly such a language works in practice. One of my projects on the back burner is to design an oligosynthetic language of the "speedtalk" type and try it out. But honestly, I do not expect the words in this language, where each morpheme is to be exactly one segment long (except for proper names, for which an "escape mechanism" will be built into the language), to be any shorter than in an "ordinary" language. Alas, this project has low priority, so don't hold your breath.