
Monday, November 22, 2021

Most Popular Software Programming Languages In Pictures

If you are a developer, you might be wondering which language is best for application development. Let us look at it through some pictures.


Java is clearly the winner, and C and C++ also rank highly. However, it is not clear why Visual Basic .NET ranks higher than C#.
The following is the ranking from IEEE.


Java is the undisputed leader in this ranking as well, and it holds the same position as in the previous ranking. The notable observation is the growth of the R language over the years: it has jumped from rank 9 to rank 6.

Other important parameters are salary and job openings.


Though Java has more job openings than other programming languages, the average salary is higher for other languages such as Python, C++, and Ruby.

Finally, let us compare the different aspects of programming languages. 

Thursday, November 18, 2021

Cluster Validation - Purity Calculation

As we know, clustering is an unsupervised technique. For classification, there are many evaluation measures such as Precision, Recall, F1, and MCC. But what techniques can be used to evaluate clustering? Purity is one of the simplest measures for evaluating your clusters.

In the Purity cluster quality measure, we analyse the cluster distribution with respect to a selected variable. Let us look at how to calculate Purity in text clustering using Orange. The following is the Orange flow.


You can also get the Orange flow from GitHub.
First, let us look at how Purity is calculated.
Let us assume the following clusters and data distribution.


For each cluster, we take the count of the most frequent class in that cluster. For example, the majority class in Cluster 1 is X with three instances, in Cluster 2 it is O with three instances, and in Cluster 3 it is L with four instances. These counts are added up and divided by the total number of instances, which is 16, giving a Purity of (3 + 3 + 4) / 16 = 0.625.
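The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the Orange implementation: the function name purity and the per-cluster label lists are hypothetical. The minority-class instances in each cluster are invented so that the majority counts match the example (X = 3, O = 3, L = 4, out of 16 instances).

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Return the Purity of a clustering.

    Purity = (1 / N) * sum over clusters of the count of the most
    frequent class label in that cluster.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one instance is required")

    # Count how often each class label occurs inside each cluster
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        counts[cluster][label] += 1

    # For every cluster, keep only the size of its majority class
    majority_total = sum(c.most_common(1)[0][1] for c in counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical distribution: 16 instances whose majority classes are
# X (3) in Cluster 1, O (3) in Cluster 2 and L (4) in Cluster 3.
clusters = [1] * 5 + [2] * 5 + [3] * 6
labels = (["X", "X", "X", "O", "L"]
          + ["O", "O", "O", "X", "L"]
          + ["L", "L", "L", "L", "X", "O"])

print(purity(clusters, labels))  # 0.625, i.e. (3 + 3 + 4) / 16
```

A perfect clustering, where every cluster contains only one class, gives a Purity of 1.0.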

Let us look at this example with our popular film review dataset.

After text preprocessing, the Louvain Clustering technique is used. The following is the cluster distribution with respect to the review classification.

So the Purity is (190 + 193 + 158 + 123 + 136 + 112 + 124 + 102 + 11) / 2000 = 1149 / 2000 ≈ 0.57. Ideally, this value should be close to 1. In the multi-class case, we can also calculate the Purity using the minimum count in each cluster instead of the maximum, and that value should be close to 0.
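The film review calculation is simple arithmetic, shown here as a short Python check. It assumes that the nine numbers in the formula are the majority-class counts of the nine clusters, exactly as listed in the text, and that the dataset has 2000 reviews in total.

```python
# Majority-class counts per cluster, as listed in the formula above
majority_counts = [190, 193, 158, 123, 136, 112, 124, 102, 11]
total_reviews = 2000

purity = sum(majority_counts) / total_reviews
print(sum(majority_counts))  # 1149
print(round(purity, 4))      # 0.5745
```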
Entropy is another measure of cluster quality, which we will leave for another day.

Saturday, November 13, 2021

Hasan Ali & Tweets


As they say in cricket, "Catches win matches." This is all the more true when the catch is missed in a World Cup semi-final. During the T20 World Cup, when Hasan Ali dropped a catch, the match turned on its head. Although cricket is a game of great uncertainty, the crowd did not seem to accept that. After the dropped catch, there were many allegations against Hasan Ali, to the extent that his wife and his religion were also dragged into them.
Let us analyse tweets about Hasan Ali using the Tweet Sentiment Visualization App.

Though there were many hateful comments about Hasan Ali on Facebook, Instagram, etc., Twitter users seem to be more professional, as we see a lot of positive tweets about him. Some tweets even wish him success.

When you look at the topics, "catch" and "stay strong" are the most common ones.
Next, let us look at the tag cloud in the different quadrants.







Friday, November 5, 2021

Microsoft SQL Server 2022

After three years, Microsoft is gearing up to release the next version of its flagship database product, Microsoft SQL Server 2022. As with every new release, the obvious question is: what are the new features?


You can get more details from the following references.

Announcing SQL Server 2022 preview: Azure-enabled with continued performance and security innovation - Microsoft SQL Server Blog

SQL Server 2022 | Microsoft

What's new in SQL Server 2022 - YouTube

PASS Data Community Summit November 8-12 2021 

SQL Server 2022 integrates with Azure Synapse Link and Azure Purview, which will enable its users to drive more insights, predictions, and governance from their data at scale. Cloud integration is enhanced with disaster recovery (DR) to Azure SQL Managed Instance, along with no-ETL (extract, transform, and load) connections to cloud analytics, which allow database administrators to manage their data estates with greater flexibility and minimal impact on end users. Performance and scalability are automatically enhanced via built-in query intelligence. There is choice and flexibility across languages and platforms, including Linux, Windows, and Kubernetes.

Thursday, November 4, 2021

Article: Use Replication to improve the ETL process in SQL Server



As we have discussed in many articles, ETL is one of the most challenging tasks in a data warehouse. It is important to extract data from the data sources without impacting their performance. In SQL Server, replication can be used to safeguard the performance of the data sources during ETL. Read the article "Use Replication to improve the ETL process in SQL Server".