Austronesian Basic Vocabulary Database: Research (2023)

Questions about human origins exert an enduring fascination (for example, where did the Polynesians come from?). Languages, like genes, are archives of history. They provide vital evidence to help unlock the mysteries of our past. Recently there have been major advances in computational methods used to make inferences from genetic data.

Languages evolve in a remarkably similar way to biological species. They split into new languages, mutate and sometimes become extinct. However, despite these parallels, linguists have not routinely used the phylogenetic methods that have revolutionized evolutionary biology over the last twenty years.

The Austronesian language family is one of the largest in the world and one of the most widely dispersed, with around 1,200 languages spoken in the region bounded by Madagascar, Taiwan, Hawaii, Easter Island and Aotearoa/New Zealand. Our research uses phylogenetic methods to test hypotheses about the expansion of the Austronesian language family and the settlement of the Pacific. By putting genetic and linguistic evidence into a common methodological framework, we hope to make more powerful inferences about our past.

What we have done

We analyzed the basic vocabulary of 400 languages in this database using computational phylogenetic methods to construct a set of "family trees" for Pacific languages.

The results clearly show that the entire Austronesian language family originated in Taiwan around 5,200 years ago and spread through Island Southeast Asia, New Guinea and Polynesia.

We show that the settlement of the Pacific proceeded through a series of expansion pulses and settlement pauses. We can link these pulses to the development of new technologies and strategies: better canoes, agriculture, social strategies for coping with the distances between islands in Polynesia, and so on.

Pacific map and linguistic "family tree" showing the settlement of Austronesian peoples in the Pacific. Pauses occurred before the colonization of the Philippines and before the colonization of Western Polynesia. Pulses of rapid expansion occurred in the Philippines, along the coast of New Guinea, in Micronesia and Polynesia.

Full picture of the Austronesian language tree.

How does it work?

  1. Collect data:

    To accurately test hypotheses about prehistory, we needed a huge amount of data. We collected this data over a six-year period from several sources:

    1: Word lists collected by linguists during fieldwork. The main contributors were Robert Blust, John Lynch and Malcolm Ross. Many other linguists have kindly contributed word lists for languages they are familiar with.

    2: Published word lists and dictionaries, including the Polynesian Lexicon Project, POLLEX (Biggs and Clark 2000), and a large collection of Micronesian reconstructions (Bender et al. 2003a, 2003b).

    3: Native speakers who have contributed word lists for their languages through the web interface.

    See the Austronesian Basic Vocabulary Database authors page for a complete list of contributors, and see also our paper on the database.

  2. Group entries into cognate sets:

    To identify relationships between languages we use the linguistic comparative method. This method usually takes a lexical sample and reconstructs systematic sound correspondences across languages to discover historically related "cognate" forms. These correspondences can be used to identify words (and therefore languages) that descend from a common ancestor.

    In the table below, the entries for "hand" show a common sound change from "l" to "r". This is also seen in the entries for "skin", with a consistent correspondence between the Hawaiian "l" and the Tahitian/Maori/Rapanui "r".

    Another systematic correspondence can be seen in the entries for "bone" and "woman". In this case, the forms shaded light blue share a common ancestor.

    In the entries for "spit", there are two cognate sets: the first, "anu/aanu", is present in Samoan and Rapanui and descends from the ancestral Polynesian form *anu, while the second, "tuhu/tutuha", is an innovation in the East Polynesian languages Tahitian and Maori.

    Polynesian language table with color-coded cognate words.

    These cognate judgments have been made by or in consultation with a range of linguistic experts, including Robert Blust (Professor of Linguistics, University of Hawaii at Manoa), Jeff Marck (Associate Researcher, Australian National University), John Lynch (Professor of Pacific Languages and Director of the Pacific Languages Unit at the University of the South Pacific), Laurent Sagart (Senior Scientist, National Center for Scientific Research), Malcolm Ross (Professor of Linguistics, Australian National University) and ourselves.

  3. Convert the cognate sets to a binary matrix:

    To analyze these cognate sets, we encode them as binary characters that show the presence or absence of each cognate set in each language.

    Words meaning "bone" in various Austronesian languages, showing language, cognate set, and binary encoding.
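The binary coding in step 3 can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the input layout (a mapping from each language to the cognate-set labels its words belong to), the function name `encode_cognate_sets` and the language and cognate-set labels are all hypothetical. Each cognate set becomes one column, coded 1 if the language has a word in that set and 0 otherwise.

```python
from typing import Dict, List, Set, Tuple


def encode_cognate_sets(
    judgments: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn cognate judgments into a presence/absence matrix.

    `judgments` maps each language to the set of cognate-set labels that
    its words belong to (a hypothetical input format).
    Returns (languages, cognate_sets, matrix), where matrix[i][j] is 1 if
    language i has a word in cognate set j and 0 otherwise.
    """
    languages = sorted(judgments)
    cognate_sets = sorted(set().union(*judgments.values()))
    matrix = [
        [1 if cs in judgments[lang] else 0 for cs in cognate_sets]
        for lang in languages
    ]
    return languages, cognate_sets, matrix


# Hypothetical example: one meaning ("bone") split into two cognate sets.
example = {
    "LanguageA": {"bone-1"},
    "LanguageB": {"bone-1"},
    "LanguageC": {"bone-2"},
}
langs, sets_, rows = encode_cognate_sets(example)
for lang, row in zip(langs, rows):
    print(lang, row)
```

For simplicity, this sketch codes every absence as 0; a real analysis would also need to distinguish genuine absence from missing data (for example, a meaning with no recorded word in a language).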

  4. Analyze the data:

    The methods we use here are known as Bayesian phylogenetic methods. These are the tools that modern evolutionary biology uses to build family trees from DNA sequences.

    In this framework, the analysis aims to find the most probable set of trees given the data and a stochastic model of lexical evolution. Our model allows rates of change to differ between cognate sets, so that some items can evolve faster than others, and allows rates to vary at different places in the tree.
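To make "a stochastic model of lexical evolution" more concrete, here is a minimal sketch of how one binary cognate character could be scored on a single fixed tree, assuming a simple two-state gain/loss model and Felsenstein's pruning algorithm. The tree, branch lengths, gain and loss rates, and the rate multipliers used to let some cognate sets evolve faster than others are all hypothetical values, and this is not the model specification used in the study. A Bayesian analysis would combine likelihoods like this over all characters with priors and sample trees by MCMC, which the sketch does not do.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

# A tree is either a leaf (a language name) or a list of (subtree, branch length) pairs.
Tree = Union[str, List[Tuple["Tree", float]]]


def transition_probs(t: float, gain: float, loss: float) -> List[List[float]]:
    """P[i][j]: probability of moving from state i to state j (0 = absent,
    1 = present) along a branch of length t, with gain rate 0->1 and loss rate 1->0."""
    total = gain + loss
    decay = math.exp(-total * t)
    p00 = (loss + gain * decay) / total
    p11 = (gain + loss * decay) / total
    return [[p00, 1.0 - p00], [1.0 - p11, p11]]


def partials(node: Tree, states: Dict[str, int], gain: float, loss: float) -> List[float]:
    """Felsenstein pruning: likelihood of the observed states below `node`,
    conditional on `node` being in state 0 or state 1."""
    if isinstance(node, str):  # leaf: the observed presence/absence value
        observed = states[node]
        return [1.0 if observed == 0 else 0.0, 1.0 if observed == 1 else 0.0]
    result = [1.0, 1.0]
    for child, length in node:
        p = transition_probs(length, gain, loss)
        below = partials(child, states, gain, loss)
        for i in (0, 1):
            result[i] *= p[i][0] * below[0] + p[i][1] * below[1]
    return result


def character_likelihood(tree: Tree, states: Dict[str, int], gain: float, loss: float,
                         multipliers: Sequence[float] = (1.0,)) -> float:
    """Likelihood of one binary cognate character, averaged over equally weighted
    rate multipliers so that some characters can evolve faster than others."""
    total = gain + loss
    root_freqs = [loss / total, gain / total]  # stationary frequencies of states 0 and 1
    values = []
    for m in multipliers:
        root = partials(tree, states, gain * m, loss * m)
        values.append(root_freqs[0] * root[0] + root_freqs[1] * root[1])
    return sum(values) / len(values)


# Hypothetical tree ((A:0.1, B:0.1):0.2, C:0.3) and one cognate set present in A and B only.
tree = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.3)]
cognate = {"A": 1, "B": 1, "C": 0}
print(character_likelihood(tree, cognate, gain=0.5, loss=1.0, multipliers=(0.5, 1.0, 1.5)))
```

Averaging over rate multipliers is one simple way to express rate variation among characters; in a full analysis the log-likelihoods of all cognate sets would be summed for each candidate tree.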

  5. Tree dating:

    We can date the trees we find using phylogenetic dating methods. The trees found in the analysis above have branch lengths proportional to the amount of change along each lineage. These branch lengths can be converted to time by adding calibration points. For example, the age of the East Polynesian subgroup can be constrained to around 1,200 to 1,300 years based on the earliest settlement dates. Similarly, the Chamic subgroup can be calibrated using the fact that Chamic speakers are mentioned in Chinese records from around 1,800 years ago and probably entered Vietnam around 2,600 years ago.

    These calibrations allow the method to estimate how quickly the changes measured by branch lengths occur. We can then convert the branch lengths into time estimates by "smoothing" the rates of change in the tree. Rather than assuming a constant retention rate, this allows certain parts of the tree to change faster or slower than others.
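The rate-smoothing idea can be illustrated with the kind of objective function that penalized likelihood methods optimize. This is a minimal sketch under stated assumptions, not the software used in the study: each branch gets its own rate of change; the fit term is a Poisson log-likelihood of the amount of change observed on each branch given rate × duration; and the penalty sums squared rate differences between each branch and its parent branch, plus the variance of rates among branches attached to the root. The branch names, counts, durations (in thousands of years) and the `smoothing` weight are hypothetical. An optimizer would search over rates and node ages, with calibrated nodes held within their bounds; that search is not shown.

```python
import math
from typing import Dict, List, Optional, Tuple

# branch name -> (observed changes, duration in thousands of years,
#                 rate per thousand years, parent branch name or None if attached to the root)
Branch = Tuple[float, float, float, Optional[str]]


def poisson_log_likelihood(changes: float, expected: float) -> float:
    """Log-probability of observing `changes` events when `expected` events are expected."""
    return changes * math.log(expected) - expected - math.lgamma(changes + 1.0)


def penalized_log_likelihood(branches: Dict[str, Branch], smoothing: float) -> float:
    """Fit of rates and durations to the observed change on each branch,
    minus `smoothing` times a penalty on rate differences between neighbouring branches."""
    log_lik = 0.0
    penalty = 0.0
    basal_rates: List[float] = []
    for changes, duration, rate, parent in branches.values():
        log_lik += poisson_log_likelihood(changes, rate * duration)
        if parent is None:
            basal_rates.append(rate)
        else:
            # squared difference between this branch's rate and its parent branch's rate
            penalty += (rate - branches[parent][2]) ** 2
    if basal_rates:
        # variance of rates among the branches attached to the root
        mean_rate = sum(basal_rates) / len(basal_rates)
        penalty += sum((r - mean_rate) ** 2 for r in basal_rates) / len(basal_rates)
    return log_lik - smoothing * penalty


# Hypothetical tree: two branches from the root, one of which splits again.
branches = {
    "root-left": (20.0, 1.0, 20.0, None),
    "root-right": (24.0, 1.2, 20.0, None),
    "left-a": (10.0, 0.5, 20.0, "root-left"),
    "left-b": (11.0, 0.5, 22.0, "root-left"),
}
for weight in (0.0, 1.0, 10.0):
    print(weight, penalized_log_likelihood(branches, weight))
```

A large `smoothing` weight pushes the optimum toward a single clock-like rate, while a weight near zero lets every branch take its own rate. Choosing the weight (for example by cross-validation) is part of the method and is also omitted here.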

Common questions

  1. What are the methodological innovations compared with your study published in Nature in 2000?

    The pulse/pause model makes four important predictions about the origin, age, sequence, and timing of pulses and pauses in the Austronesian expansion. In the 2000 paper we tested only the prediction about the sequence.

    We are able to test the other three predictions in this paper because of four methodological advances.

    1. In this paper we base our analyses on a very large basic vocabulary database that we built (over 34,000 cognate sets for 400 languages). The sample of languages and vocabulary in the 2000 paper was patchy. This meant that we could not obtain accurate estimates of branch lengths and therefore could not date the trees.

    2. We tested the Taiwanese origin by using outgroups to root the trees. The 2000 paper simply rooted the tree in Taiwan (assuming a Taiwanese origin).

    3. We use Bayesian phylogenetic methods rather than a parsimony approach to build the trees. This means that we can explicitly incorporate uncertainty about the trees and their branch lengths into our analyses.

    4. Most significantly, in this paper we date the trees. This means that we can test key predictions about the age of Austronesian and the timing of expansion pulses and pauses.

  2. Isn't this just lexicostatistics?

    No. Lexicostatistics calculates pairwise similarity measures (the proportion of shared cognates) between languages and uses them to link languages into subgroups. Our methods calculate the probability of each cognate set on each tree while searching for the most probable set of trees. Both the data analyzed and the statistical analyses are therefore fundamentally different.

    For more information on how phylogenetic methods work, see the Wikipedia page.
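For contrast with the likelihood-based approach, the pairwise measure that lexicostatistics relies on can be sketched as below. This is an illustrative sketch only: the input layout (each language mapping a meaning to a single cognate-set label), the function names and the toy data are hypothetical, and real lexicostatistical studies differ in details such as how synonyms and missing entries are handled. The output is just a table of pairwise scores; nothing in it models how individual cognate sets change along the branches of a tree.

```python
from itertools import combinations
from typing import Dict, Tuple


def shared_cognate_proportion(lang1: Dict[str, str], lang2: Dict[str, str]) -> float:
    """Fraction of meanings attested in both languages whose words are cognate.

    Each language maps a meaning (e.g. "bone") to a cognate-set label.
    """
    shared_meanings = lang1.keys() & lang2.keys()
    if not shared_meanings:
        return 0.0
    same = sum(1 for m in shared_meanings if lang1[m] == lang2[m])
    return same / len(shared_meanings)


def similarity_table(data: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise shared-cognate proportions for every pair of languages."""
    return {
        (a, b): shared_cognate_proportion(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


# Hypothetical toy data: meaning -> cognate-set label for three languages.
toy = {
    "LanguageA": {"hand": "hand-1", "bone": "bone-1", "spit": "spit-1"},
    "LanguageB": {"hand": "hand-1", "bone": "bone-1", "spit": "spit-2"},
    "LanguageC": {"hand": "hand-1", "bone": "bone-2", "spit": "spit-2"},
}
for pair, score in similarity_table(toy).items():
    print(pair, round(score, 2))
```

A lexicostatistical study would then cluster languages on a table like this one, whereas the phylogenetic approach evaluates every cognate set on each candidate tree.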

  3. Are your date estimates based on glottochronology?

    No. Our methods do not assume a single constant rate of change over time. Instead, we use a method (penalized likelihood rate smoothing) that allows us to "smooth" the observed rates of change across the tree while taking into account historical information such as calibration points. This allows individual languages and subgroups to change at different rates over time. Also, because we calculate these age estimates across a distribution of trees, we obtain a confidence interval around each age estimate.
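Because each tree in the sampled distribution gives its own age for a given node, an interval can be read directly off that set of ages. The sketch below assumes a simple equal-tailed percentile interval with linear interpolation; the study may summarize its distribution differently (for example with a highest posterior density interval), and the ages used here are synthetic values, not results.

```python
from typing import List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of already sorted values, with q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def age_interval(ages: List[float], level: float = 0.95) -> Tuple[float, float, float]:
    """Median and equal-tailed interval of one node's age across many sampled trees."""
    ordered = sorted(ages)
    tail = (1.0 - level) / 2.0
    return (percentile(ordered, 0.5),
            percentile(ordered, tail),
            percentile(ordered, 1.0 - tail))


# Synthetic ages (years before present) for one node across 21 sampled trees.
sampled_ages = [4000.0 + 100.0 * i for i in range(21)]
median, low, high = age_interval(sampled_ages)
print(f"median {median:.0f}, 95% interval {low:.0f}-{high:.0f}")
```

With thousands of sampled trees, the same calculation gives an interval that reflects uncertainty in both the tree topology and the branch lengths.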

  4. Languages borrow from each other: how would that affect your results?

    Our database identifies many loanwords, and we remove them from the analysis. We also avoid using known creole languages or languages with "hybrid" histories.

    More importantly, our methods are actually very robust to borrowing between languages (we are currently working on a study that shows this), and even reasonably high levels of borrowing are unlikely to have substantially affected our results.

  5. Wasn't everyone already convinced that the Wallacean model of origin was wrong?

    It depends on who you talk to. Among linguists and archaeologists it has few advocates, but many geneticists take it very seriously.

  6. What about Thor Heyerdahl's theory that Polynesians came from South America?

    There is no evidence that Polynesians originated in the Americas, and most modern researchers consider this highly unlikely. Instead, genetic, linguistic, and archaeological evidence points to a Taiwanese origin.

  7. 400 languages, 34,000 cognate sets... that's pretty big, isn't it?

    Yes. Most work on mitochondrial genomes uses around 16,000 base pairs of data, and usually no more than 400 species either!

    Thanks to Mark Pagel and Andrew Meade, we were lucky enough to have access to a large computer cluster at the Center for Advanced Computing and Emerging Technologies (ACET) at the University of Reading.

    In total, the analyses we ran took more than a month and a half on this 150-node cluster.

Further reading

See all our publications



What are the 3 most spoken language groups in Austronesian?

Very broadly, one can divide the Austronesian languages into three groups: Philippine-type languages, Indonesian-type languages and post-Indonesian-type languages. The first group includes, besides the languages of the Philippines, the Austronesian languages of Taiwan, Sabah, North Sulawesi and Madagascar.

What are the common languages in Austronesian?

Major Austronesian languages include: Cebuano, Tagalog, Ilocano, Hiligaynon, Bicol, Waray-Waray, Kapampangan and Pangasinan, which are spoken in the Philippines; Malay, Javanese, Sundanese, Madurese, Minangkabau, the Batak languages, Acehnese, Balinese, and Buginese, which are spoken in western Indonesia; and Malagasy, ...

What are 2 Austronesian languages?

Major Austronesian languages include Cebuano, Tagalog, Ilocano, Hiligaynon, Bicol, Waray-Waray, Kapampangan, and Pangasinan of the Philippines; Malay, Javanese, Sundanese, Madurese, Minangkabau, the Batak languages, Acehnese, Balinese, and Buginese of western Indonesia; and Malagasy of Madagascar.

What is the most common Austronesian language?

Major languages
  • Ilokano (8 million native, ~10 million total)
  • Hiligaynon (Ilonggo) (7 million native, ~11 million total)
  • Minangkabau (7 million)
  • Bugis (5 million)
  • Bikol (4.6 million, all dialects)
  • Banjar (4.5 million)
  • Acehnese (3.5 million)
  • Balinese (3 million)

What is the hardest Austronesian language to learn?

Tagalog: An Austronesian language, Tagalog is the language spoken by almost a quarter of the total population of the Philippines. Its grammar and uncommon sentence structure make it pretty difficult to master.

What are the 7 Austronesian languages still spoken in the Philippines?

There are more than 170 known languages in the Philippines including Bikol, Cebuano, Hiligaynon (Ilonggo), Ilocano, Kapampangan, Pangasinan, Tagalog, and Waray. All of these languages, with the exception of Spanish-based creole language Chavacano, belong to the Austronesian language family.

Is Austronesian the same as Polynesian?

Polynesians, including Samoans, Tongans, Niueans, Cook Islands Māori, Tahitian Mā'ohi, Hawaiian Māoli, Marquesans and New Zealand Māori, are a subset of the Austronesian peoples.

Is Hawaiian an Austronesian language?

Hawaiian is a Polynesian member of the Austronesian language family. It is closely related to other Polynesian languages, such as Samoan, Marquesan, Tahitian, Māori, Rapa Nui (the language of Easter Island) and Tongan.

What is the largest Austronesian ethnic group?

The Javanese people of Indonesia are the largest Austronesian ethnic group.

What ethnicity are Austronesian people?

The Austronesian language family has traditionally been associated with the so-called Malay race, and its speakers are among the most widely dispersed peoples in the world. Their range extends from Madagascar off southeast Africa, across the Indian Ocean, to Easter Island in the Pacific Ocean, and from Taiwan in the north to New Zealand in the south.

Are Japanese and Austronesian related?

Several linguists have proposed that the Japonic languages are genetically related to the Austronesian languages. Some linguists think it is more plausible that Japanese was instead influenced by Austronesian languages, perhaps by an Austronesian substratum.

Why is it called Austronesian?

The name "Austronesian" comes from the Latin word for 'south' and the Greek word for 'island'. Austronesia includes Madagascar, Indonesia, the Philippines, Taiwan and the Pacific islands of Melanesia, Micronesia and Polynesia.

What language is not Austronesian?

The Papuan languages are the non-Austronesian and non-Australian languages spoken on the western Pacific island of New Guinea, as well as neighbouring islands in Indonesia, Papua New Guinea, Solomon Islands, and East Timor by around 4 million people.

Are Austronesian languages gendered?

No grammatical gender

Certain language families, such as the Austronesian, Turkic and Uralic language families, usually have no grammatical genders (see genderless language).

Is Samoan an Austronesian language?

Samoan is a Polynesian language and a member of the Eastern or Oceanic subgroup of the Austronesian family of languages.

What's the easiest language in the world to learn?

5 easy languages to learn
  • English. It's the most widely spoken language in the world, making practice possible. ...
  • Spanish. It's heavily influenced by Latin and Arabic, spoken as it's written and has fewer irregularities than other romance languages. ...
  • Italian. ...
  • Swahili.

Are Philippines Polynesian?

No, the Philippines is not a Polynesian island, but is rather an archipelago in Southeast Asia. The Filipinos are of Austronesian ancestry, like the Polynesians are. There are almost 8,000 islands that make up the Philippines.

What is the oldest language in the world?

Sumerian can be considered the first language in the world, according to Mondly. The oldest proof of written Sumerian was found on the Kish tablet in today's Iraq, dating back to approximately 3500 BC.

Are some Filipinos Hispanic?

What about Brazilians, Portuguese and Filipinos? Are they considered Hispanic? People with ancestries in Brazil, Portugal and the Philippines do not fit the federal government's official definition of “Hispanic” because the countries are not Spanish-speaking.

Are Filipinos Pacific Islander?

Southeast Asia: Bruneian, Burmese, Cambodian, Filipino (also regarded as Pacific Islanders), Hmong, Indonesian, Laotian, Malaysian, Mien, Singaporean, Timorese, Thai, Vietnamese. South Asia: Bangladeshi, Bhutanese, Indian, Maldivians, Nepali, Pakistani, Sri Lankan. West Asia is typically referred to as the Middle East.

What are the dead Philippine languages?

According to Ethnologue, a total of 182 native languages are spoken in the nation and four languages have been classified as extinct: Dicamay Agta, Katabaga, Tayabas Ayta and Villaviciosa Agta.

What is the religion of Austronesian?

The focus of Austronesian beliefs ranges from localised ancestral spirits to powerful creator gods. A wide range of practices also exist, such as headhunting, elaborate tattooing, and the construction of impressive monuments.

Are Aboriginals Austronesian?

Taiwanese aborigines have been deemed the ancestors of Austronesian speakers, who are currently distributed throughout two-thirds of the globe.

Is Hawaii considered Polynesian?

Hawai'i is the only archipelago of Polynesia that is north of the equator.

Are Hawaiian people Hispanic?

Latinos make up almost 10 percent of Hawaii's population, and it turns out the Aloha State has deep Spanish and Latino roots. The first Spanish immigrant, Francisco de Paula Marín, arrived in the late 1700s.

Are Maori and Hawaiian related?

Maori and Hawaiian, two Eastern Polynesian languages that are separated by some 5,000 miles of sea, appear to be about as closely related as Dutch and German.

Is Tongan Austronesian?

Tongan is an Austronesian language of the Polynesian branch spoken in Tonga.

Are Austronesians related to East Asians?

Modern Austronesian and Austroasiatic speaking populations of Southeast Asia were found to have mostly East Asian-related ancestry (89% to 96%, with 94% on average).

Is Taiwan an Austronesian?

Indigenous Taiwanese are Austronesian peoples, with linguistic and genetic ties to other Austronesian ethnic groups, such as peoples of the Philippines, Malaysia, Indonesia, Madagascar, and Oceania.

Where does Austronesian originate?

The Austronesian language family, which originated in Taiwan, spans half the world from Madagascar to Easter Island. Despite centuries of colonization and assimilation policies, that 17 indigenous languages survive in Taiwan to this day is a testament to the resiliency of Taiwan's indigenous peoples.

Is Negrito Austronesian?

All the Negrito languages are Austronesian, as are all the native languages of the Philippines. The Negrito languages do not form a subfamily among the Philippine Austronesian languages.

Are Austronesians Africans?

But latest DNA findings establish no relation between Africans and dark-skinned Austronesians. Instead, the brown and black types of Austronesians are closer to each other genetically than to any outside groups.

Are Japanese descended from Chinese?

The study revealed for the Japanese as a whole, some genetic components from all of the Central, East, Southeast and South Asian populations are prevalent in the Japanese population with the major components of ancestry profile coming from the Korean and Han Chinese clusters.

What is Japan's dominant ethnicity?

The Yamato people are the dominant native ethnic group of Japan and because of their numbers, the term Yamato is often used interchangeably with the term Japanese.

Are Ryukyuan people Chinese?

An autosomal DNA analysis of Okinawan samples concluded that they are most closely related to other Japanese and East Asian contemporary populations, sharing on average 80% admixture with mainland Japanese and 19% admixture with Chinese populations, and that they have isolate characteristics.

What is the synonym of Austronesian?

Also called Malayo-Polynesian.

What is the difference between Austronesian and Austroasiatic?

Malaysian indigenous languages are of two entirely different families: Austronesian and Austroasiatic. The former consists of Malay and all the languages of Sabah and Sarawak, while the latter the aboriginal languages found only in Peninsular Malaysia.

What is the oldest language in Southeast Asia?

The oldest and most extensive written language of Southeast Asia is Old Javanese, or Kawi. It is the oldest language in terms of written records, and the most extensive in the number and variety of its texts.

What is the most gender-neutral language?

- Genderless languages (such as Estonian, Finnish and Hungarian), where there is no grammatical gender and no pronominal gender.

Which language has no gender?

There are some languages that have no gender! Hungarian, Estonian, Finnish, and many other languages don't categorize any nouns as feminine or masculine and use the same word for he or she in regards to humans.

What is the most spoken language in the world?

1. English (1,452 million speakers) According to Ethnologue, English is the most-spoken language in the world including native and non-native speakers. Like Latin or Greek at the time, English has become the world's common language.

Why are Samoans so big?

“This high prevalence of obesity among Samoans is a relatively recent phenomenon,” Arslanian notes. It appears to be “heavily influenced by globalization” and “the shift from subsistence agriculture to excess consumption of high calorie, processed foods and sedentary lifestyles.”

What is the DNA of Samoan people?

Samoans have less Papuan admixture (estimated through f4 ratio) than the other Polynesian (Tongans) and Polynesian outlier (Ontong_Java, RenBel, and Tikopia) populations in our dataset, which collectively have an average of 35.38% Papuan ancestry (Fig.

What ethnicity is similar to Samoan?

Samoan is believed to be among the oldest of the Polynesian tongues and is closely related to the Maori, Tahitian, Hawaiian, and Tongan languages.

What are the 3 major language groups?

Glottolog 4.6 (2022) lists the following as the largest families, of 8,565 languages: Atlantic–Congo (1,406 languages) Austronesian (1,271 languages) Indo-European (583 languages)

What are all the Group 3 languages?

Group III Languages:

Including Amharic, Bengali, Burmese, Czech, Finnish, Hebrew, Hungarian, Khmer, Lao, Nepali, Pilipino, Polish, Russian, Serbo-Croatian, Sinhala, Thai, Tamil, Turkish, Vietnamese

What are the most common Austroasiatic languages?

Khmer and Vietnamese are the most important of the Austroasiatic languages in terms of numbers of speakers. They are also the only national languages—Khmer of Cambodia, Vietnamese of Vietnam—of the Austroasiatic stock.

What are the 7 international languages?

These are Arabic, Chinese, English, French, Russian and Spanish.
The days are as follows:
  • Arabic Language Day (18 December)
  • Chinese Language Day (20 April)
  • English Language Day (23 April)
  • French Language Day (20 March)
  • Russian Language Day (6 June)
  • Spanish Language Day (23 April)

What is the easiest language to learn?

We've used data from the Foreign Service Institute (FSI) to rank them from easiest to somewhat more challenging.
  • Frisian. ...
  • Dutch. ...
  • Norwegian. ...
  • Spanish. ...
  • Portuguese. ...
  • Italian. ...
  • French. ...
  • Swedish.
Oct 24, 2021

What is the oldest language family in the world?

The Tamil language is recognized as the oldest language in the world and it is the oldest language of the Dravidian family. This language had a presence even around 5,000 years ago.

What is the 3rd most spoken language?

Which Languages Have the Most Speakers?
Rank | Language | Total speakers
1 | English | 1,132 million
2 | Mandarin Chinese | 1,117 million
3 | Hindi | 615 million
4 | Spanish | 534 million
Feb 15, 2020

What is the hardest language to learn?

Across multiple sources, Mandarin Chinese is the number one language listed as the most challenging to learn. The Defense Language Institute Foreign Language Center puts Mandarin in Category IV, which is the list of the most difficult languages to learn for English speakers.

Does speaking 3 languages make you a polyglot?

Monolingual – Speaks one language. Bilingual – Two different languages. Trilingual – Three different languages. Polyglot – (Three)/Four+ different languages.

How many languages makes you a polyglot?

How many languages do you have to speak to be a polyglot? If you speak even more than three, you might be known as a polyglot. And if you're any of the above, you can also describe yourself as multilingual.

Are Austroasiatic and Austronesian related?

Austroasiatic is an integral part of the controversial Austric hypothesis, which also includes the Austronesian languages, and in some proposals also the Kra–Dai languages and the Hmong–Mien languages.

Is Japanese an Austroasiatic language?

Within the Austroasiatic language family, the Munda languages are a clear-cut case of father tongues, whereas Japanese and Korean are manifestly not. In this study, the cases of Munda and Japanese are juxtaposed.

Where did the Austronesians come from?

They originated from a prehistoric seaborne migration, known as the Austronesian expansion, from pre-Han Taiwan, circa 3000 to 1500 BCE. Austronesians reached the northernmost Philippines, specifically the Batanes Islands, by around 2200 BCE.




Article information

Author: Rubie Ullrich

Last Updated: 06/10/2023
