Indonesian "Large Language Model" Project Announced, Result of Public and Private Sector Collaboration

The initiative is driven by BRIN, KORIKA, and two GDP Venture portfolio companies (Glair.ai and Datasaur.ai), together with AI Singapore

BRIN (National Research and Innovation Agency), KORIKA (Artificial Intelligence Research & Innovation Collaboration), and two GDP Venture portfolio companies (Glair.ai and Datasaur.ai), together with AI Singapore (AISG), have announced a collaborative project to develop an open Large Language Model (LLM) for the Indonesian language that can be widely used by various parties.

“Current LLMs are heavily influenced by Western culture, making it less likely that ChatGPT will behave like people in this region. ASEAN has an important role in the global economy, but we are still underrepresented,” said AI Singapore Head of Strategy, Partnerships & Growth Darius Liu at a press conference on Thursday (30/11).

AISG is the developer of SEA-LION (Southeast Asian Languages in One Network), an open-source LLM built to better understand and represent the diverse contexts, languages, and cultures of Southeast Asia. AISG is a national program supported by the National Research Foundation of Singapore and hosted by the National University of Singapore.

SEA-LION is built on the MPT (MosaicML Pretrained Transformer) architecture and has a vocabulary size of 256k. For tokenization, the model uses SEABPETokenizer, which was designed specifically for Southeast Asian languages to improve model performance on them.
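
For readers unfamiliar with tokenization, the following is a minimal sketch of byte-pair encoding (BPE), the general technique that the name SEABPETokenizer suggests. It is not SEA-LION's actual tokenizer: the function names (`most_frequent_pair`, `merge_pair`, `learn_bpe`) and the two-word toy corpus are assumptions made for illustration only. The idea is that a tokenizer trained on Indonesian text learns frequent Indonesian word pieces as single vocabulary entries.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word count."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    # Start with every word split into individual characters.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


# Hypothetical toy corpus: two related Indonesian words.
merges, words = learn_bpe("makan makanan", 2)
```

On this toy corpus the first learned merge is `("a", "n")`, because that pair occurs three times across the two words. A production tokenizer applies the same idea over a very large corpus until the vocabulary reaches its target size, which the article gives as 256k for SEA-LION.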

An LLM is a type of artificial intelligence model designed to understand and generate human language. LLMs are trained on large amounts of text data and can perform a variety of tasks, such as translating, summarizing, answering questions, and even writing code.

Current LLMs (ChatGPT from OpenAI, Bard from Google) exhibit strong biases in cultural values, political beliefs, and social attitudes. This is because training data, especially data taken from the internet, often skews toward WEIRD (Western, Educated, Industrialized, Rich, Democratic) perspectives.

This phenomenon leaves a void in other language markets and concentrates technological advantages among English-speaking countries. According to Statista data from January 2023, English accounted for 58.8% of web content, while Indonesian's share was only 0.6%. This gap underscores the need for broader research and development to meet Indonesian language needs.

Compared with open-source LLMs from Western countries, SEA-LION is claimed to answer more like a human because its use of language is less rigid. It is also said to handle some local contexts that LLMs such as ChatGPT cannot. Since the initiative began, SEA-LION has been trained most heavily on Indonesian and Thai, followed by Malay and Vietnamese; languages from other countries in the region still need more training.

Collaborative project

On Lee, CTO of GDP Venture and CEO & CTO of GDP Labs, said the project is in line with AISG's vision of creating a dedicated Indonesian-language LLM that can be useful across Southeast Asia. GDP Venture, through its portfolio companies Glair.ai and Datasaur.ai, is adapting the SEA-LION platform to the Indonesian context in order to create a comprehensive, open Indonesian-language LLM.

“This initiative promises benefits such as reduced operational costs, increased revenue and productivity, and effective human and AI collaboration, all of which contribute to economic growth and technological progress in Indonesia and Southeast Asia,” said On Lee.

Meanwhile, for BRIN, adopting an Indonesian LLM can improve the quality and efficiency of research, increase public accessibility, support technological development, and strengthen human resources. It also opens up opportunities to acquire knowledge, both scientific and local-cultural.

Datasaur.ai, Glair.ai, BRIN, and AISG hope that developing this LLM will ultimately encourage the creation of AI platforms similar to ChatGPT. The difference lies in the intended use, which will be more specialized according to the target consumer. “ChatGPT is more general-purpose, so it's hard to compete head-on. We have to be smart about how we can meet our consumers,” added On Lee.
