
Monday, June 21, 2021

Data Analysis for Singlish Texts

Like people in many other non-English-speaking countries, we Sri Lankans are used to typing Sinhala words using English letters. Although many word processing tools and apps are available for Sinhala text, we still see a lot of people using Singlish words.
Not only are these Singlish texts difficult to read, but at the research level it is also difficult to identify these words.
In any text-related research, the first task is to identify the language. We have discussed how to detect a language using Azure Machine Learning in a previous article.
This post looks at whether we can detect Singlish text using Azure Machine Learning. The following is the configured Azure Machine Learning experiment.

You can download the experiment from the Azure AI Gallery. Let us look at some important findings from this experiment.

Out of the 1400+ texts, 35% were identified as English, possibly because the letters are English. The big surprise is that many Singlish texts were identified as Indonesian and Romanian, at 26% and 13% respectively. It is not clear whether there is any relationship between Singlish and the Indonesian and Romanian languages.
Another important finding is that Singlish texts were identified as 40 different languages, such as Malay, Turkish, Polish, and Irish.
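
If you export the detected language for each text from the experiment, percentages like the ones above can be tallied with a few lines of Python. This is only a minimal sketch and not part of the Azure experiment itself: the function name language_shares and the sample labels are made up for illustration and are not the real results.

```python
from collections import Counter


def language_shares(labels):
    """Return (language, percent) pairs, largest share first."""
    counts = Counter(labels)
    total = len(labels)
    # most_common() orders languages by how often they were detected
    return [(lang, 100.0 * n / total) for lang, n in counts.most_common()]


# Hypothetical detected-language labels, one per Singlish text (illustration only)
sample = [
    "English", "Indonesian", "English", "Romanian",
    "Indonesian", "English", "Malay", "English",
]

for lang, pct in language_shares(sample):
    print(f"{lang}: {pct:.1f}%")

# Number of distinct languages the texts were identified as
print("Distinct languages:", len(set(sample)))
```

With the real output, the same tally would show both the share of each detected language and how many different languages the Singlish texts were spread across.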
