Thursday, September 23, 2021

Recovering Deleted Data in SQL Server Databases


How many times have you come across unexpected data deletion in a production environment and then gone looking for costly tools to recover your data? If you cannot recover the data, there can be situations where you will be thrown out of business.

How do you plan for these accidental or deliberate data deletions? Point in Time Recovery in SQL Server is the option that allows you to recover deleted data. However, you need a better understanding of SQL Server Recovery Models and Transaction Log Use in order to enable Point in Time Recovery.

This is an important configuration that needs to be done in advance; there is no point complaining later.
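
As a rough illustration of what a point-in-time restore involves, the sketch below builds the T-SQL statements in Python. It assumes the database already uses the full recovery model and that one full backup plus the following log backups are available. The database name, file paths and helper names are hypothetical, and the generated statements should be reviewed before they are run against a real server.

```python
from datetime import datetime


def enable_full_recovery(database):
    """T-SQL that switches a database to the full recovery model."""
    return f"ALTER DATABASE [{database}] SET RECOVERY FULL;"


def point_in_time_restore_script(database, full_backup, log_backups, stop_at):
    """Build the T-SQL statements that restore `database` to `stop_at`.

    full_backup and log_backups are backup file paths (hypothetical here);
    log_backups must be listed in the order in which they were taken.
    """
    stamp = stop_at.strftime("%Y-%m-%dT%H:%M:%S")
    statements = [
        # Restore the full backup but keep the database in the restoring
        # state so that log backups can still be applied.
        f"RESTORE DATABASE [{database}] FROM DISK = N'{full_backup}' "
        "WITH NORECOVERY;"
    ]
    for path in log_backups:
        # Roll the log forward, stopping just before the unwanted deletion.
        statements.append(
            f"RESTORE LOG [{database}] FROM DISK = N'{path}' "
            f"WITH NORECOVERY, STOPAT = '{stamp}';"
        )
    # Bring the database online.
    statements.append(f"RESTORE DATABASE [{database}] WITH RECOVERY;")
    return statements


if __name__ == "__main__":
    script = point_in_time_restore_script(
        "Sales",                          # hypothetical database name
        r"D:\Backup\Sales_full.bak",      # hypothetical full backup
        [r"D:\Backup\Sales_log1.trn"],    # hypothetical log backup
        datetime(2021, 9, 23, 10, 15, 0),
    )
    print("\n".join(script))
```

The STOPAT value should be a moment just before the deletion happened, which is why the log backups taken after the full backup are needed.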

Tuesday, September 21, 2021

Article : Building Ensemble Classifiers in Azure Machine Learning

A new article in the series, Building Ensemble Classifiers in Azure Machine Learning, discusses how to combine multiple classifiers.

In ensemble classifiers, we look at how to perform predictions using multiple classification techniques so that the combined model can achieve higher accuracy or avoid overfitting. This is equivalent to a patient consulting multiple specialist doctors to diagnose a disease rather than relying on one doctor.
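
To make the idea concrete, here is a minimal sketch of one common way to combine classifiers, majority voting, in plain Python. It is not the Azure Machine Learning experiment from the article; the three toy "doctor" rules and the patient fields are made up purely for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifiers, sample):
    """Ask every classifier for a label and combine them by majority vote."""
    return majority_vote([classify(sample) for classify in classifiers])


# Three hypothetical "specialists", each looking at a different symptom.
def by_temperature(patient):
    return "ill" if patient["temperature"] > 37.5 else "healthy"


def by_cough(patient):
    return "ill" if patient["cough_days"] >= 3 else "healthy"


def by_pulse(patient):
    return "ill" if patient["pulse"] > 100 else "healthy"


if __name__ == "__main__":
    patient = {"temperature": 38.2, "cough_days": 1, "pulse": 110}
    doctors = [by_temperature, by_cough, by_pulse]
    print(ensemble_predict(doctors, patient))  # two of three vote "ill"
```

A single rule can be wrong about a given patient, but the vote is wrong only when most of the rules are wrong at the same time.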



The complete experiment can be found at Ensemble Classification | Azure AI Gallery 

Introduction to Azure Machine Learning using Azure ML Studio
Data Cleansing in Azure Machine Learning
Prediction in Azure Machine Learning
Feature Selection in Azure Machine Learning
Data Reduction Technique: Principal Component Analysis in Azure Machine Learning
Prediction with Regression in Azure Machine Learning
Prediction with Classification in Azure Machine Learning
Comparing models in Azure Machine Learning
Cross Validation in Azure Machine Learning
Clustering in Azure Machine Learning
Tune Model Hyperparameters for Azure Machine Learning models
Time Series Anomaly Detection in Azure Machine Learning
Designing Recommender Systems in Azure Machine Learning
Language Detection in Azure Machine Learning with basic Text Analytics Techniques
Azure Machine Learning: Named Entity Recognition in Text Analytics
Filter based Feature Selection in Text Analytics
Latent Dirichlet Allocation in Text Analytics
Recommender Systems for Customer Reviews
AutoML in Azure Machine Learning
AutoML in Azure Machine Learning for Regression and Time Series
Building Ensemble Classifiers in Azure Machine Learning


Thursday, September 16, 2021

Grouping the Flags - Image Processing using Orange


If you look at the flags of different countries, you will notice that some flags look similar. This post explains how the Orange Data Mining tool can be used to cluster images into groups. You can get the dataset and the Orange package at ImageProcessing-Orange (github.com).

This is a simple data mining package, and it shows how easily you can perform image processing in Orange.


Let us look at each cluster so that we can see how the grouping is done.

The following does not show all the clusters; only the most distinct clusters are included.

Cluster 2 


Cluster 3

Cluster 9


Cluster 10

Though this was done for fun, you can see how the Orange tool can be used to determine clusters of images.

Monday, September 13, 2021

Top Business Intelligence Trends 2021

If you are in the world of data, you always need to understand the current trends in the field. Top Business Intelligence Trends 2021 has the survey results from 2000 professionals. During these days of the pandemic, trends are changing as more and more organizations move towards becoming data-driven. As you can see from the following figure, Master Data Management and Data Quality Management are the most important data trends. It seems that many users are fed up with master data being duplicated in many systems.


Further, most of the users want a data-driven culture in their organization. However, data catalogues, IoT data and analysis are not of major importance to the users.

After the data trends, let us look at how the BI trends have changed in the last four years.


Data Governance has seen a major increase, while Advanced Analytics and Machine Learning / AI have gone down in the last four years. Data warehouse modernization has also seen a noticeable increase in the same period. Even though cloud for data & analytics is of lesser importance, it has still seen a noticeable increase.

Research Paper : A Language Modelling Approach to Authorship Identification for Online Examinations in Sinhala

With the Covid-19 outbreak, e-learning has become the ‘new normal’ with many universities and institutions adopting online platforms to deliver their programs. One aspect of this that has posed many challenges is conducting written examinations. This is mainly because it has become increasingly difficult to verify the identity of individuals sitting for an examination remotely. The primary objective of this research is to address this problem by developing a Language Model that can be used in authorship identification for online examinations conducted in Sinhala. Essentially, the idea is that by training a language model solely on the writings of a given author, it is possible to determine the likelihood (probability) of an entirely new piece of writing having been written by that author. It was found that a character-level language model can be used to identify the author on whose writings it was trained, using the concept of perplexity.
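
The role of perplexity can be illustrated with a small sketch. The paper's actual model is not described here, so the code below swaps in a much simpler stand-in: a character bigram model with add-one smoothing, trained separately on each author's writing. The author whose model gives the lowest perplexity on a new text is taken as the likely writer. The function names and the toy training strings are hypothetical; Sinhala text can be passed in the same way because Python strings are Unicode.

```python
import math
from collections import Counter


def train_char_bigram_model(text):
    """Count character bigrams in one author's writing."""
    if len(text) < 2:
        raise ValueError("need at least two characters to train")
    pair_counts = Counter(zip(text, text[1:]))   # counts of (char, next char)
    first_counts = Counter(text[:-1])            # counts of the leading char
    return pair_counts, first_counts, set(text)


def perplexity(model, text):
    """Perplexity of `text` under a model; lower means a better fit."""
    if len(text) < 2:
        raise ValueError("need at least two characters to score")
    pair_counts, first_counts, vocab = model
    slots = len(vocab) + 1  # one extra slot shared by all unseen characters
    pairs = list(zip(text, text[1:]))
    log_prob = 0.0
    for a, b in pairs:
        # Add-one smoothing keeps unseen bigrams from getting probability 0.
        log_prob += math.log((pair_counts[(a, b)] + 1) / (first_counts[a] + slots))
    return math.exp(-log_prob / len(pairs))


def most_likely_author(models, text):
    """Pick the author whose model is least 'surprised' by the text."""
    return min(models, key=lambda author: perplexity(models[author], text))


if __name__ == "__main__":
    # Toy stand-ins for two authors' past writings.
    models = {
        "author_a": train_char_bigram_model("abababababab"),
        "author_b": train_char_bigram_model("cdcdcdcdcdcd"),
    }
    print(most_likely_author(models, "ababab"))
```

In an examination setting, the comparison would more likely be against a threshold for the claimed author's own model than a pick among several candidates.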

Tuesday, September 7, 2021

Gartner Hype Cycle for Emerging Technologies, 2021

The hype cycle is a branded graphical presentation developed and used by the American research, advisory and information technology firm Gartner to represent the maturity, adoption, and social application of specific technologies. The hype cycle claims to provide a graphical and conceptual presentation of the maturity of emerging technologies through five phases. Learn more at Understanding Gartner’s Hype Cycles.

The following is the Hype Cycle for Emerging Technologies in 2021.


On the Gartner Hype Cycle for Emerging Technologies, 2021, three overarching trends will drive organisations to explore emerging technologies such as non-fungible tokens (NFT), sovereign cloud, data fabric, generative AI, and composable networks to help secure competitive advantage: engineering trust, accelerating growth, and sculpting change.



This radar includes three overarching trends:

  • Interfaces and Experiences include technologies that are fundamentally changing the way we interact with the world.
  • Business Enablers are technologies and trends that impact enterprises by changing practices, processes, methods, models and/or functions.
  • Productivity Revolution is driven by the confluence of multiple technologies and trends that have resulted in solutions that help organizations quickly, accurately, and in greater volume classify, predict, and solve problems that humans cannot.
Read the entire article at Gartner Blog Network

Sunday, September 5, 2021

NLP Case Study for University Related Songs


In Natural Language Processing, one of the important processes is identifying the context from keywords. Even in songs, we can use keywords to identify the context of the song. Though there are some instances where the music is also used to understand the context, most of the time it is the keywords that are used to identify the context of a song.

Let us look at how to identify Sinhala songs that describe different aspects of university life. They can be radical or love related. But the important question is: what are the keywords? Before that, let us see which Sinhala songs are related to universities. There are two blog posts, this and this, that contain some sets of songs. Further, this YouTube video also contains university songs.

Now, what are those keywords? If you analyse these songs, you can categorise the keywords into Similar Terms (සරසවිය, තක්සලා, සිප් හල, වාසිටි), University Locations (හන්තාන, මහවැලි, රොබරෝසියා, බෙලිහුල් ඔය, දෙසුම් හල, කලාගාර, ලෙක්චර්, ජපුරේ, සරුංගලේ, කැලණි) and University Activities (ශිල්ප, සිව් වසරක, නවක වදය, උපාධි).
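
The keyword categories above can be sketched as a simple lookup, assuming plain substring matching on the lyrics. The category names, the function name and the sample string are hypothetical; the sample is just a few of the keywords joined together, not a real lyric.

```python
# Keyword categories as listed above; the dictionary keys are illustrative.
KEYWORDS = {
    "similar_terms": ["සරසවිය", "තක්සලා", "සිප් හල", "වාසිටි"],
    "locations": [
        "හන්තාන", "මහවැලි", "රොබරෝසියා", "බෙලිහුල් ඔය", "දෙසුම් හල",
        "කලාගාර", "ලෙක්චර්", "ජපුරේ", "සරුංගලේ", "කැලණි",
    ],
    "activities": ["ශිල්ප", "සිව් වසරක", "නවක වදය", "උපාධි"],
}


def find_keywords(lyrics):
    """Return {category: [keywords found]} for one song's lyrics."""
    found = {}
    for category, words in KEYWORDS.items():
        # Substring matching also catches inflected forms that start with
        # the keyword, at the cost of occasional false matches.
        hits = [word for word in words if word in lyrics]
        if hits:
            found[category] = hits
    return found


if __name__ == "__main__":
    sample = "හන්තාන කඳුවැටිය සරසවිය"  # made-up test string, not a lyric
    print(find_keywords(sample))
```

Counting the length of each returned list per song gives the kind of per-keyword totals discussed in the analysis.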

We have identified 48 Sinhala songs that are related to Sri Lankan universities. (Note: If you know any other songs, let me know.) Following is the analysis of these keywords.




Most of the songs contain the keyword සරසවිය. Next is හන්තාන, which is a location unique to the University of Peradeniya. Further, මහවැලි and රොබරෝසියා are also locations unique to the University of Peradeniya. However, when the keyword මහවැලි exists there is always හන්තාන or සරසවිය as well, but when රොබරෝසියා exists there is only the keyword සරසවිය. This means that out of the 48 songs collected, 24 songs are from the University of Peradeniya. Meanwhile, there are three other songs on three other universities: the University of Jayawardenapura (හන්තාන නැතිමුත් ජපුරේ), the University of Sabaragamuwa (සමනළ වැව පුරවන්නට) and the University of Kelaniya (කතාකරන්නට හැකි නම් බිත්ති වලට සරුංගලේ).
The next important keyword is සිව් වසරක, which tells you about the duration of the university learning period. Except for one instance (තනිවෙන්නට මගේ ලොවේ), this keyword comes with the general keyword (සරසවිය).

Another important finding is that there are three songs that are Sinhala university-related songs but do not contain any of the keywords. Those songs are මේ උයන් තෙරේ, සැලේ මහද සැලේ සැලේ and ආදරේ මල් අතින් අර ගෙන. However, the song සැලේ මහද සැලේ සැලේ appears in the film හන්තානේ කතාව. In the NLP world, these will result in a lower value for Recall. To improve Precision, we need to use keywords in combination. For example, if you use only මහවැලි or කැලණි, you will end up with songs that are not about the University.
Further, there is one song that contains five of these keywords: the song හන්තාන කඳුවැටිය contains සරසවිය, මහවැලි, හන්තාන, සිව් වසරක and දෙසුම් හල. There are 17 songs that contain one keyword and 19 songs that have two keywords.
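
The effect on Recall and Precision can be worked through with a short sketch. The recall figure follows from the counts above: 45 of the 48 university songs contain at least one keyword. The number of unrelated songs wrongly picked up by a single keyword is not known, so the five false matches below are a purely hypothetical figure used only to show the calculation.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for a set of retrieved items."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Song ids 0..47 stand for the 48 university songs; ids 0..44 are the
    # 45 that contain at least one keyword.
    university_songs = set(range(48))
    matched_university_songs = set(range(45))

    # Hypothetical: five unrelated songs that happen to contain a keyword
    # such as a place name.
    false_matches = {100, 101, 102, 103, 104}

    retrieved = matched_university_songs | false_matches
    precision, recall = precision_recall(retrieved, university_songs)
    print(f"precision={precision:.4f} recall={recall:.4f}")
```

Requiring a combination of keywords would shrink the set of false matches and raise precision, while the three keyword-free songs still limit recall.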

What do you think about this analysis? Do you think we can consider any other keywords?