Should 窩心 be translated as nervous or heartwarming?
When language models are used in a business environment, they often need further fine-tuning and customization to meet that environment's communication standards.
How would you translate the sentence “他讓我很窩心”? Meta's Llama 2 language model translates it as “He made me feel very nervous.”
In Taiwan, “窩心” describes feeling warmed and touched, while “nervous” has a completely opposite meaning in English.
Taiwan's own TAIDE model, however, accurately translates it as “He made me feel very touched.”
One of the ten topics assessed by the AI Evaluation Center is “accuracy.” If an AI model lacks accuracy, it can cause misunderstandings and even affect people's work and lives.
For example, in the medical field, inaccurate translation may lead to misdiagnosis or delayed treatment; in the financial field, it may lead to transaction losses; in the legal field, it may lead to litigation.
Second, if we do not have a language model that fits our cultural environment, or if we cannot accurately evaluate foreign language models when they enter our market, other countries' models may get by with the cheapest, most perfunctory effort, enter the market, and be used by the public, causing adverse effects.
Finally, who should set the standard for “accuracy”? This is another question that many people ask. The AI Evaluation Center does not invent its own standard answers. Instead, it relies on normative sources, such as authoritative educational institutions like the National Institute of Education, as the basis for accuracy.
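As a rough illustration of scoring against a normative source, the sketch below checks model outputs against a small reference set. It is a minimal sketch, not the AI Evaluation Center's actual method: the names REFERENCE, normalize, and accuracy are hypothetical, the only data is the example sentence quoted in this article, and exact matching after light normalization is an assumed, deliberately simple scoring rule.

```python
# Hypothetical reference set: each source sentence maps to the renderings
# that a normative source would accept. Only the article's example is included.
REFERENCE = {
    "他讓我很窩心": {"He made me feel very touched"},
}


def normalize(text: str) -> str:
    """Lowercase, drop a trailing period, and collapse whitespace."""
    return " ".join(text.lower().rstrip(".").split())


def accuracy(outputs: dict, reference: dict) -> float:
    """Fraction of reference sentences whose output matches an accepted rendering."""
    if not reference:
        return 0.0
    hits = 0
    for source, accepted in reference.items():
        accepted_norm = {normalize(a) for a in accepted}
        if normalize(outputs.get(source, "")) in accepted_norm:
            hits += 1
    return hits / len(reference)


# The two translations quoted in this article.
llama2_outputs = {"他讓我很窩心": "He made me feel very nervous"}
taide_outputs = {"他讓我很窩心": "He made me feel very touched."}

llama2_score = accuracy(llama2_outputs, REFERENCE)
taide_score = accuracy(taide_outputs, REFERENCE)
```

The point of the design is that the accepted renderings come from outside the evaluator: changing the normative source changes REFERENCE, not the scoring code.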
Going back to the example at the beginning: Llama 2's initial training data evidently contained many examples in which “窩心” expresses discomfort, which is why the model chose to translate it that way. Is that wrong? It cannot be called wrong, but it does not meet Taiwan's requirements for accuracy.
That is why it is so important to choose the authority against which accuracy is measured. Taking this further, accuracy must also account for the needs of different businesses. When language models are used in a business environment, they often need further fine-tuning and customization to meet that environment's communication standards.
For example, many fields have what we commonly call “jargon,” a specialized vocabulary. People in the same business understand these words at a glance, but people outside the field may not. In addition, the same word can have completely different meanings in different industries.
Official documents written by law firms and government agencies, for instance, use very precise jargon that outsiders may not understand. In such cases, an organization needs to build its own dataset to fine-tune and further adapt a previously trained AI model.
Once every industry can freely fine-tune translation models, translators will be left with the tasks of final proofreading and model tuning. This could happen within the next two to three years.
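To make the idea of an in-house dataset concrete, here is a minimal sketch that turns approved sentence pairs into fine-tuning records, one JSON object per line. The prompt/completion layout, the instruction wording, and the function names are assumptions for illustration; the format a given fine-tuning toolchain expects may differ, and a real dataset would hold the organization's own reviewed terminology rather than this single example from the article.

```python
import json


def to_finetune_records(pairs, instruction="Translate the following sentence into English."):
    """Turn (source, approved translation) pairs into prompt/completion records.

    The record layout is an assumed, generic one, not a specific vendor format.
    """
    return [
        {"prompt": f"{instruction}\n{source}", "completion": target}
        for source, target in pairs
    ]


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


# Hypothetical in-house pair list; only the article's example is included.
pairs = [("他讓我很窩心", "He made me feel very touched.")]

records = to_finetune_records(pairs)
jsonl_text = to_jsonl(records)
```

Keeping each approved pair as one self-contained line makes it easy for domain experts to review, correct, and extend the dataset before any model is tuned on it.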