Common Voice across borderless waters

Dawn has just broken, yet the children are already seated in their classroom. They follow their teacher’s lead on screen and practice reading in their own native language.

🎪 This is a scene from the Council of the Indigenous Peoples’ “Live Broadcasting System Project”. When the time for self-study arrives each morning, the language teacher appears in front of a green studio screen and gives lessons to children on the other side of Taiwan. One of these studios is located in the national Social Innovation Lab.

🌏 Taiwan, an island in the Pacific Ocean, has 16 ethnic languages and 42 dialects among its indigenous peoples. However, these languages face a serious threat of extinction. Ten of them, namely Puyuma, Saisiyat, Sakizaya, Kavalan, Tsou, Hla’alua, Kanakanavu, Mangesle, Mamangele and Donnarukai, have been listed as endangered languages by the United Nations and the Council of the Indigenous Peoples.

🏡 Teaching these languages to the next generation is the only way to ensure their revival. Therefore, when I saw that the studio of the Social Innovation Lab provides this function, I was not only overjoyed but also began thinking: what else can we contribute in addition to the live broadcast?

💬 The Common Voice project, developed by the social enterprise Mozilla, has enabled me to see another possibility.

📲 This project uses crowd participation to improve existing AI speech recognition systems. Voice assistants on mobile phones have already made people’s lives much more convenient. However, speech recognition still requires a massive amount of voice data for machines to learn from through deep learning, and building a computing system or a voice database is extremely expensive. For this reason, current speech recognition technology still focuses on mainstream languages. This further reduces the use of less widely spoken languages and does not help their revitalization.

📖 So Mozilla came up with the Common Voice project. From collection and processing to archiving, it uses the CC0 open license and completely forgoes copyright. Anyone can read the text on the website aloud and record their own voice, thereby adding to a voice collection that spans different languages, accents, genders and ages, and contributing to the construction of the world’s largest open-source voice database.

🏫 Our “Basic Law of the Indigenous Peoples”, our “Basic Law of Hakka” and the “National Language Development Law” now under review all emphasize the right of every person to an equal opportunity to learn the inherent ethnic languages. Therefore, in the future, we will work with Common Voice to combine existing government resources and build a collection of Taiwan’s less widely spoken languages, such as the aboriginal ethnic languages, including the tones and accents of each inherent ethnic language. We are also committed to compiling complete textbooks for all of these languages.

🎨 For many years, I have taken part in the “MoeDict” project in the g0v community to build an online dictionary that integrates Taiwan’s multiple languages. I believe that Common Voice and MoeDict share the same spirit: an “inclusive cultural affirmation” through digital tools and open materials.

As long as everyone works together, every language in Taiwan can become the “common voice” of everyone in the future.
