Building India's own ChatGPT Faces Data Challenges in Terms of Languages

By Consultants Review Team Monday, 25 December 2023

The year 2023 was all about OpenAI's ChatGPT, a chatbot designed to simulate human-like conversation in response to user instructions.

At the end of the year, Bhavish Aggarwal, chief executive officer of Ola Electric, introduced Krutrim, a large language model (LLM) that his company describes as "India's own AI" (artificial intelligence).

Aggarwal is not the first to enter the Indian LLM market. Other comparable efforts include Bhashini, a government of India initiative, AI4Bharat from the Indian Institute of Technology Madras, Sarvam.ai, and Project Vaani.

Developing an LLM for Indian languages is difficult. According to experts, each language in India has its own nuances, which makes building a ChatGPT-like product an ambitious undertaking.

Billionaire Vinod Khosla, an early backer of OpenAI and a pioneer in Silicon Valley AI investments, said of Sarvam's $41 million fundraise, "We need companies like Sarvam AI to develop deep expertise for building AI in and for India."

Three factors are critical to building an LLM: access to data in the local language, computational capacity, and continual training on datasets.

When constructing an Indian LLM, all three of these requirements become obstacles. This contrasts with ChatGPT, which was developed mostly in English and had access to a large amount of data.


 
