In Natural Language Processing, one of the important tasks is identifying context from keywords. Even with songs, we can use keywords to identify what a song is about. Though there are some instances where the music itself helps to convey the context, most of the time it is the keywords in the lyrics that identify the context of a song.
Let us look at how to identify Sinhala songs that describe different aspects of university life. They can be radical or love related. But the important question is: what are the keywords? Before that, let us see which Sinhala songs are related to university life. There are two blog posts, this and this, that contain some sets of songs. Further, this YouTube video also contains university songs.
Now, what are those keywords? If you analyse these songs, you can group the keywords into three categories: Similar Terms (සරසවිය, තක්සලා, සිප් හල, වාසිටි), University Locations (හන්තාන, මහවැලි, රොබරෝසියා, බෙලිහුල් ඔය, දෙසුම් හල, කලාගාර, ලෙක්චර්, ජපුරේ, සරුංගලේ, කැලණි) and University Activities (ශිල්ප, සිව් වසරක, නවක වදය, උපාධි).
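As a rough illustration, the three categories above could be held in a dictionary and matched against lyrics. This is only a minimal sketch: the names `KEYWORDS` and `find_keywords` are made up for this example, the lyric fragment is a made-up test string rather than a real lyric, and plain substring matching is an assumption (it catches inflected forms such as හන්තානේ, but it can also over-match).

```python
# Keyword categories copied from the list above.
KEYWORDS = {
    "similar_terms": ["සරසවිය", "තක්සලා", "සිප් හල", "වාසිටි"],
    "locations": [
        "හන්තාන", "මහවැලි", "රොබරෝසියා", "බෙලිහුල් ඔය", "දෙසුම් හල",
        "කලාගාර", "ලෙක්චර්", "ජපුරේ", "සරුංගලේ", "කැලණි",
    ],
    "activities": ["ශිල්ප", "සිව් වසරක", "නවක වදය", "උපාධි"],
}


def find_keywords(lyrics):
    """Return {category: [keywords found]} using plain substring matching.

    Substring matching is an assumption of this sketch, not a method
    described in the post.
    """
    found = {}
    for category, words in KEYWORDS.items():
        hits = [word for word in words if word in lyrics]
        if hits:
            found[category] = hits
    return found


# Hypothetical lyric fragment, used only to exercise the function.
print(find_keywords("හන්තානේ සරසවියේ"))
```

Real lyrics may need Unicode normalisation before matching, since the same Sinhala word can be typed with different code point sequences.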
We have identified 48 Sinhala songs that are related to Sri Lankan universities. (Note: if you know of any other songs, let me know.) The following is the analysis of these keywords.
Most of the songs contain the keyword සරසවිය. Next is හන්තාන, a location unique to the University of Peradeniya. Further, මහවැලි and රොබරෝසියා are also locations associated with the University of Peradeniya. However, when the මහවැලි keyword appears there is always හන්තාන or සරසවිය alongside it, whereas when රොබරෝසියා appears only the සරසවිය keyword accompanies it. This means that out of the 48 songs collected, 24 songs are from the University of Peradeniya. Meanwhile, there are three other songs about three other universities: the University of Jayawardenapura (හන්තාන නැතිමුත් ජපුරේ), the University of Sabaragamuwa (සමනළ වැව පුරවන්නට) and the University of Kelaniya (කතාකරන්නට හැකි නම් බිත්ති වලට සරුංගලේ).
The next important keyword is සිව් වසරක, which refers to the duration of the university learning period. Except for one instance (තනිවෙන්නට මගේ ලොවේ), this keyword comes with the general keyword සරසවිය.
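The co-occurrence observation above (මහවැලි always appearing with හන්තාන or සරසවිය) is the kind of thing that can be checked mechanically. Here is a small sketch, assuming each song has already been reduced to the set of keywords it contains. The helper name `always_cooccurs` is made up for this example, and the second entry in the sample data is a placeholder, not one of the 48 songs.

```python
def always_cooccurs(song_keywords, keyword, companions):
    """True if every song containing `keyword` also has at least one companion."""
    return all(
        bool(keywords & companions)
        for keywords in song_keywords.values()
        if keyword in keywords
    )


song_keywords = {
    # Keyword set for this song is the one given in the post.
    "හන්තාන කඳුවැටිය": {"සරසවිය", "මහවැලි", "හන්තාන", "සිව් වසරක", "දෙසුම් හල"},
    # Placeholder entry, only to show the රොබරෝසියා + සරසවිය pattern.
    "placeholder song": {"රොබරෝසියා", "සරසවිය"},
}

print(always_cooccurs(song_keywords, "මහවැලි", {"හන්තාන", "සරසවිය"}))
```

Run over the full set of songs, a `False` result would point to a song where මහවැලි appears on its own.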
Another important finding is that there are three Sinhala university-related songs that do not contain any of the keywords. Those songs are මේ උයන් තෙරේ, සැලේ මහද සැලේ සැලේ and ආදරේ මල් අතින් අර ගෙන. However, the song සැලේ මහද සැලේ සැලේ appears in the film හන්තානේ කතාව. In NLP terms, these songs lower Recall, because a keyword-based search will miss them. To improve Precision, we need to use keywords in combination. For example, if you use only මහවැලි or කැලණි, you will end up with songs that are not about the university.
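To make the Recall and Precision points concrete, here is a minimal sketch. The rule in `is_university_song` is an assumption made for illustration: මහවැලි and කැලණි do not count on their own, so a song needs at least one other keyword. The Recall figure assumes that every one of the 48 songs except the three without keywords is matched and that nothing else is matched, so it is a best case, not a measured result.

```python
# Keywords the post names as unreliable on their own.
AMBIGUOUS = {"මහවැලි", "කැලණි"}


def is_university_song(found):
    """Accept a song only if it has at least one non-ambiguous keyword.

    `found` is the set of keywords matched in the lyrics. This rule is an
    assumption of the sketch, not a tested classifier.
    """
    return bool(found - AMBIGUOUS)


def precision_recall(predicted, actual):
    """Compute (precision, recall) from two sets of song identifiers."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# 48 known university songs, numbered 0..47; the last three stand in for
# the songs with no keywords, so keyword matching can find at most 45.
actual = set(range(48))
predicted = set(range(45))
precision, recall = precision_recall(predicted, actual)
print(precision, recall)  # 1.0 0.9375
```

So even in the best case, keyword matching alone reaches a Recall of 45/48, or about 0.94, on this collection.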
Further, there is one song that contains five of these keywords. The song හන්තාන කඳුවැටිය contains the keywords සරසවිය, මහවැලි, හන්තාන, සිව් වසරක and දෙසුම් හල. There are 17 songs that contain one keyword, and 19 songs have two keywords.
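Counts like these can be produced with a simple tally of how many keywords each song contains. The sketch below uses only the four songs whose keyword sets are stated in this post (one with five keywords, three with none), so its output is not the full distribution for all 48 songs. The function name `keyword_count_distribution` is made up for this example.

```python
from collections import Counter

# Keyword sets as stated in the post; the other 44 songs are omitted.
song_keywords = {
    "හන්තාන කඳුවැටිය": {"සරසවිය", "මහවැලි", "හන්තාන", "සිව් වසරක", "දෙසුම් හල"},
    "මේ උයන් තෙරේ": set(),
    "සැලේ මහද සැලේ සැලේ": set(),
    "ආදරේ මල් අතින් අර ගෙන": set(),
}


def keyword_count_distribution(song_keywords):
    """Map 'number of keywords in a song' to 'number of such songs'."""
    return Counter(len(keywords) for keywords in song_keywords.values())


distribution = keyword_count_distribution(song_keywords)
for count in sorted(distribution):
    print(count, "keyword(s):", distribution[count], "song(s)")
```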
What do you think about this analysis? Do you think we can consider any other keywords?