Indonesian words are long – why?

This is something I noticed while Umar (our toddler) has been learning to speak. It seems to me that he clearly prefers learning new English words to Indonesian ones. For example, when I tell him something like “it’s a car, Umar”, he repeats “Yes, it’s a car, daddy!” and from then on calls the thing “car”, whereas if I teach him “Itu mobil” he reacts a bit more slowly.

Environment is probably the main factor.[1] But another thing my wife and I have noticed is that English words are almost always shorter than (or at least as short as) their Indonesian counterparts in terms of syllable count. Here are some illustrations, just random words that come to mind right now:

  • Basic colors: blue, red, yellow, green, black, white vs biru, merah, kuning, hijau, hitam, putih
  • Pets and farm animals: cat, dog, cow, goat, chicken vs kucing, anjing, sapi, kambing, ayam
  • Basic verbs: eat, drink, sleep, walk, run, jump, go, come vs makan, minum, tidur/bobok, jalan, lari, lompat, pergi, datang[2]
  • Everyday objects: home, food, toys, car, train, bike, tree, shirt, pants vs rumah, makanan, mainan, mobil, kereta, sepeda, pohon, baju, celana

In the lists above, the English word has fewer syllables in almost every pair. A few pairs are equal (e.g. yellow vs kuning, chicken vs ayam), but none of the Indonesian words above is shorter. If I think harder, I can find some opposite examples (e.g. gajah vs elephant, besok vs tomorrow), but they are much rarer.

Why is it that Indonesian words are longer? One can explain it in various ways, but being an engineer, my favourite explanation uses information theory. Before that term scares you (yes you, non-technical people!), let me show you something.

English has way more vowel sounds[3] than Indonesian. Like tons more. Standard Indonesian (like its twin, Malay) has just 6 vowels (a, i, u, o, e as in kecil, and e as in becak) and 3 diphthongs (ai, au, and oi). According to this chart, English (considering only the standard dialects of the US and Britain) has about 27 vowels and diphthongs[4]:

[Image: chart of English vowels]

To add to this, English also has more consonant clusters. An English syllable can begin with a cluster of up to 3 consonants (e.g. string) and end with a 3-consonant cluster too (e.g. warmth[5]). Indonesian syllables don’t have clusters at all except in loan words, and even then only 2-consonant clusters at the beginning of a syllable, as in “truk”, never at the end[6].

How is this related to the length of words? This is where information theory comes in. Syllables are units of information, like bits or bytes in computer systems. Since an English syllable has more possible vowels and more possible consonant clusters, it has more possible values, which means it packs more information (more ‘bits’). It is just like in computer systems, where a byte packs more information than a hex digit because a byte can take 256 values while a hex digit can take only 16.
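To put rough numbers on this, here is a minimal Python sketch. It assumes every possible value of a unit is equally likely, which is a big simplification for real speech, and the function name bits_per_unit is just my own label. The vowel counts are the ones quoted earlier (about 27 for English, 6 vowels plus 3 diphthongs for Indonesian), and they cover only the vowel part of a syllable, not the whole syllable.

```python
import math


def bits_per_unit(possible_values: int) -> float:
    """Bits carried by one unit, assuming all values are equally likely."""
    return math.log2(possible_values)


print(bits_per_unit(256))           # one byte       -> 8.0
print(bits_per_unit(16))            # one hex digit  -> 4.0
print(round(bits_per_unit(27), 2))  # ~27 English vowel sounds       -> 4.75
print(round(bits_per_unit(9), 2))   # 6 + 3 Indonesian vowel sounds  -> 3.17
```

Under these assumptions the vowel slot alone carries about one and a half times as many bits in English as in Indonesian, before consonant clusters are even counted.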

Because an English syllable packs more information than Indonesian, it follows that to convey the same amount of information, English needs fewer syllables than Indonesian.[7]
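The same idea can be sketched from the other direction: given a message of a fixed size, how many units are needed? The snippet below is only an illustration. The 16-bit message size is a hypothetical number I picked, the name units_needed is my own, and treating a syllable as a single slot with 27 or 9 equally likely choices (the vowel counts from this post) ignores consonants entirely.

```python
def units_needed(total_bits: int, values_per_unit: int) -> int:
    """Smallest number of units whose combinations cover 2**total_bits messages."""
    target = 2 ** total_bits
    count, capacity = 0, 1
    # Each extra unit multiplies the number of distinct messages we can express.
    while capacity < target:
        capacity *= values_per_unit
        count += 1
    return count


MESSAGE_BITS = 16  # hypothetical message size, chosen only for illustration

for label, values in [
    ("byte", 256),
    ("hex digit", 16),
    ("slot with 27 choices", 27),  # English vowel count from the text
    ("slot with 9 choices", 9),    # Indonesian vowel count from the text
]:
    print(f"{label}: {units_needed(MESSAGE_BITS, values)} units")
# byte: 2 units
# hex digit: 4 units
# slot with 27 choices: 4 units
# slot with 9 choices: 6 units
```

The fewer values a unit can take, the more units the same message needs, which is the whole argument in miniature.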

This means Indonesian (at least written Indonesian) is less efficient. You can also see this when you compare Indonesian translations of foreign books like Harry Potter or Dan Brown’s novels, which are usually much thicker than the English originals. As for spoken Indonesian, the theory is that the low information rate per syllable should be counterbalanced by speaking faster. I’m not sure how true this is for Indonesian, but research has shown this effect in other languages.[8]

So, to recap, I think English words are shorter because they contain more information per syllable. In other words, in English more information is encoded in the choice of syllables, while in Indonesian more is encoded in the quantity of syllables.


[1] Since we live in the US, people (including his friends and people on TV) speak English all the time. My wife and I speak mostly Indonesian at home, though. We keep teaching the little one our national language, even though it is mostly less successful than teaching him English.

[2] This doesn’t yet include the prefixes and other affixes that often accompany verbs in Indonesian, as in berjalan.

[3] Vowel sounds, not vowel letters. There are only 5 vowel letters in the alphabet (a, e, i, o, u), in English as in Indonesian.

[4] The number will vary in different sources because English has a lot of variation, see

[5] Yes, I know, r, m, t, h are 4 letters. But they are only 3 sounds, because th here represents only one sound

[6] If you’re interested in this topic, this is an interesting paper discussing Indonesian’s lack of clusters and how it affects Indonesian learners/speakers of English: Final Consonant Clusters Simplification by Indonesian Learners of English and Its Intelligibility in International Context

[7] This can be illustrated, again, using computer systems. Look at how the number of units needed to convey the same information (the letter ‘a’) increases as the number of possible values per unit decreases.

[Image: lowercase a]
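The same illustration can be reproduced in a few lines of Python. This is a sketch under my own assumptions: the letter ‘a’ is taken as its 8-bit ASCII code (97), and the helper name digits_in_base is made up for this example.

```python
def digits_in_base(value: int, base: int, n_digits: int) -> list[int]:
    """Write value as exactly n_digits digits in the given base, most significant first."""
    digits = []
    for _ in range(n_digits):
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits[::-1]


code = ord("a")  # 97 in ASCII

print(digits_in_base(code, 256, 1))  # 1 byte        -> [97]
print(digits_in_base(code, 16, 2))   # 2 hex digits  -> [6, 1]
print(digits_in_base(code, 2, 8))    # 8 bits        -> [0, 1, 1, 0, 0, 0, 0, 1]
```

One byte, two hex digits, or eight bits all carry the same letter; only the number of units changes.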

[8] See Which language is the most efficient? and Across-Language Perspective on Speech Information Rate
