
How to get AI to work in 22 languages

"Being on the road is always very stressful and especially in cities like Mumbai," he says.
But when he started out, language barriers were an additional problem.
His first language is Marathi and Mr Sawant speaks "very little" English. "I can understand but it's very difficult to read," he explains.
That caused problems at his new job.
He says: "At first, it was difficult. Everything was in English, and I can understand some of it, but I'm more comfortable in Marathi. I used to ask other delivery guys to help me figure out what to do."
His employer, Zepto, promises "India's Fastest Online Grocery Delivery". So having drivers struggling with delivery instructions was not ideal.
To smooth this process, Zepto partnered with Reverie Language Technologies a year ago to introduce an AI translation service for its drivers.
Since then, its delivery drivers have been able to choose between six languages on the Zepto app.
"I don't have to guess anymore," says Mr Sawant.
"Earlier I would take more time to read and sometimes even made mistakes. Now if the customer writes 'ring bell', I get that instruction in Marathi. So, I don't have to ask or check again. It's all clear."
Mr Sawant's difficulties are common.
"India has 22 official languages and hundreds of dialects," says Professor Pushpak Bhattacharyya, from IIT Bombay, one of India's leading experts in the use of AI in Indian languages.
"Without tech that understands and speaks these languages, millions are excluded from the digital revolution - especially in education, governance, healthcare, and banking," he points out.
The rollout of new generative AI systems, like ChatGPT, has made the task more urgent.
Vast amounts of data, such as web pages, books or video transcripts, are used to train an AI.
For widely spoken languages like Hindi and English, that data is relatively easy to get, but for others it is more difficult.
"The main challenge to create Indian language models is the availability of data. I'm talking about refined data. Coarse quality data is available. But that data is not of very high quality; it needs filtering," says Professor Bhattacharyya.
"The issue in India is for many Indian languages, especially tribal and regional dialects, this data simply doesn't exist or is not digitised."