Thursday, December 31, 2020
Templates in Draw.IO
Tuesday, December 29, 2020
Power BI End-To-End Features
Power BI is a business analytics framework from Microsoft, which has been the leader in BI for the last 12 years according to Gartner. Since we work in a dynamic environment, it is worth knowing the capabilities and integration options of this tool.
The following link will take you to the end-to-end features of Power BI.
Wednesday, December 23, 2020
Creating your First Azure SQL Database
As the cloud has become something you cannot avoid in the current technology race, it is important to understand the options you have for creating a database on the Azure platform. As shown in the image below, there are different architectural options.
Wednesday, December 16, 2020
Defining Fuzzy Membership Function Using Box Plot
The membership function is the key component in fuzzy techniques. When fuzzy techniques are extended to the data warehouse, so that decisions can be made using fuzzy techniques in a data warehouse, it was identified that many implementations do not have data-driven techniques for defining the fuzzy membership function.
In this research paper, which is part of a research project on Investigation and Development for Fuzzy Data Warehouse, we have used the well-known Box Plot technique to derive a fuzzy function. In this technique, we have mapped the fuzzy function parameters to the Box Plot parameters as shown below.
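The general idea can be sketched in Python. This is only an illustration under stated assumptions, not the mapping used in the paper: it assumes a trapezoidal membership function whose feet sit at the box plot whiskers (Q1 - 1.5 IQR and Q3 + 1.5 IQR) and whose shoulders sit at Q1 and Q3. The function names and the sample values are hypothetical.

```python
from statistics import quantiles


def box_plot_parameters(values):
    """Return the usual box plot parameters for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "lower_whisker": q1 - 1.5 * iqr,
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": q3 + 1.5 * iqr,
    }


def trapezoidal_membership(x, a, b, c, d):
    """Trapezoid with feet at a and d and shoulders at b and c (a <= b <= c <= d)."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def membership_from_box_plot(values):
    """Assumed mapping: whiskers become the feet, quartiles the shoulders."""
    p = box_plot_parameters(values)

    def membership(x):
        return trapezoidal_membership(
            x, p["lower_whisker"], p["q1"], p["q3"], p["upper_whisker"]
        )

    return membership


# Hypothetical sample data
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90]
typical = membership_from_box_plot(sample)
print(typical(50), typical(0), typical(130))
```

Because the parameters come from the quartiles of the data, the membership function changes whenever the underlying data changes, which is the data-driven property described above.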
Saturday, December 12, 2020
Data Warehouse in SQL Server
Data Warehouse is a comprehensive technology that provides the key people within an enterprise with access to any level of the required information within the enterprise. It is an enterprise-wide framework that permits the management of all enterprise information.
Let us see how we can utilise Microsoft technologies at the various stages of a Data Warehouse.
Friday, December 11, 2020
RDBMS -> NoSQL -> NewSQL
Thursday, December 10, 2020
Customized Transaction Log Backups
Transaction Log backups are important in a Production environment. They keep your log file size under control and give you backups in case you need to restore.
I am pretty sure most of you have scheduled transaction log backups. If you have scheduled transaction log backups every 15 minutes, you will see four log backups every hour, which results in nearly 100 backup files a day and around 700 log backups per week. Unlike differential backups, you need all your log backups to recover. Sometimes you might have few or no transactions, but there will still be a log backup.
Now the question is: can we create a transaction log backup only when the log has grown to a sufficient size? Yes, you can, if you are running SQL Server 2017 or later.
The sys.dm_db_log_stats Dynamic Management Function (DMF) has a column called log_since_last_log_backup_mb, which tells you how much log (in MB) has been generated since the last log backup.
Using the following script, you can perform transaction log backups when the log file size is more than a specific size.
-- Back up the transaction log only when enough log has accumulated since the last log backup.
DECLARE @log_since_last_log_backup_mb NUMERIC(9, 2)
DECLARE @ThresholdSize INT = 25                  -- threshold in MB
DECLARE @folderName VARCHAR(30) = 'D:\DBBACKUP'
DECLARE @DatabaseName VARCHAR(30) = 'LB1'

SELECT @log_since_last_log_backup_mb = log_since_last_log_backup_mb
FROM sys.dm_db_log_stats(DB_ID(@DatabaseName))

IF @log_since_last_log_backup_mb > @ThresholdSize
BEGIN
    -- File name: <folder>\<database><yyyymmddhhmm>.bak
    DECLARE @fileName NVARCHAR(400) = @folderName + '\' + @DatabaseName
        + SUBSTRING(REPLACE(CONVERT(VARCHAR, GETDATE(), 111), '/', '')
        + REPLACE(CONVERT(VARCHAR, GETDATE(), 108), ':', ''), 1, 12)
        + '.bak'

    -- Use the variable so the database checked and the database backed up always match.
    BACKUP LOG @DatabaseName
    TO DISK = @fileName
    WITH NOFORMAT, NOINIT, SKIP, NOREWIND, NOUNLOAD, STATS = 10
END
ELSE
    PRINT 'No BACKUP'
Monday, December 7, 2020
Technology Initiatives
Friday, December 4, 2020
Database Design and Modeling with PostgreSQL
Wednesday, December 2, 2020
Epidemic Mathematical Model
Tuesday, December 1, 2020
Hierarchies for Data Analytics in SSAS
Monday, November 30, 2020
Sri Lanka Qualifications Framework (SLQF) for Higher Education
There are various types of courses available: BSc, Postgraduate Diploma, BA, MSc, MA, MBA, MDA, MPhil, PhD, etc. Many of us don't know how these courses are ordered and are unaware that there is a framework for these qualifications.
In 2013, the Ministry of Higher Education, with funding from the World Bank, defined the SLQF for higher education in Sri Lanka. In 2015, it was updated in line with world standards.
The following is the SLQF in summary.
Sunday, November 29, 2020
Linguistic Analytics in Data Warehouse Using Fuzzy Techniques
Thursday, November 26, 2020
Troubleshooting using Wait Stats in SQL Server
Wednesday, November 25, 2020
Different Types of Clustering Techniques
Who are the best players in Meeting Solutions?
During these pandemic times, meeting solutions are playing a huge role by keeping professionals, teachers, and students at home while still letting them work and study, whether in IT or non-IT fields.
You might have your own favourite tool for communicating within your teams and groups, but which is the best of them all? Let us hear Gartner's opinion on these Meeting Solutions. They have published their traditional Magic Quadrants for Unified Communications as a Service and for Meeting Solutions, released in November 2020.
Tuesday, November 24, 2020
Image Classification in Orange
We have discussed how the Orange tool can be used for Image Clustering. Now let us look at how we can perform Classification in Orange.
As before, let us select an image set that is organised into classified folders. The folder names will be taken as the class names.
The following set of images was used for the Image Classification.
Monday, November 23, 2020
Model Comparison in Azure Machine Learning
We are building models in Machine Learning. How do you know these models are correct? What are the accuracy levels of these models? As we know, there are a lot of parameters to verify. In the case of Classification, Recall, Precision, and the F1 measure are the most common evaluation measures apart from accuracy. This article shows how we can compare models that were built in Azure Machine Learning.
The following are the other articles in the series.
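As a minimal sketch of what such a comparison involves, the Python below computes accuracy, precision, recall, and F1 from confusion-matrix counts. It is independent of Azure Machine Learning, and the two sets of counts for "model A" and "model B" are hypothetical numbers chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical confusion-matrix counts for two models on the same 100 test rows
model_a = classification_metrics(tp=40, fp=10, fn=20, tn=30)
model_b = classification_metrics(tp=50, fp=25, fn=10, tn=15)

for name, metrics in (("A", model_a), ("B", model_b)):
    summary = ", ".join(f"{key}={value:.3f}" for key, value in metrics.items())
    print(f"Model {name}: {summary}")
```

With these made-up counts, model A has the higher accuracy and precision while model B has the higher recall and F1, which is why a comparison should look at more than one measure.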
Saturday, November 21, 2020
IPL and Overfitting
After a long wait, the Indian Premier League started and was completed for the thirteenth time, with Mumbai Indians declared the undisputed winner.
We discussed overfitting using the USA and Sri Lankan presidential elections, and this post looks at overfitting from a different perspective. Overfitting is when your machine learning or prediction model is too accurate on the data it was built from. You might wonder how high accuracy can be a problem when we always try to increase the accuracy of prediction models. Overfitting can occur for two reasons.
- Too little data - When you have too little data, it is very likely that the model has high accuracy, but it can go wrong with the next data point. Further, combining many conditions leaves you with even less data for each combination.
- When unnecessary data is collected - We can predict using uncorrelated data. For example, "No left-handed US president has been divorced" or "No Sri Lankan President with a moustache has been re-elected" are classic examples of overfitting.
Now let us look at overfitting criteria in the IPL for the champions over the last thirteen years.
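The effect of too little data can be sketched in a few lines of Python. The example uses the winning captains of the first three IPL seasons as described in the table below (Shane Warne, Adam Gilchrist, then Dhoni); the rule and helper names are hypothetical and only illustrate how a rule can score 100% on past data and still fail on the next data point.

```python
def rule_accuracy(rule, observations):
    """Fraction of observations for which the rule holds."""
    hits = sum(1 for observation in observations if rule(observation))
    return hits / len(observations)


# Winning captains of the first three seasons, as described in the table
seasons = [
    {"year": 2008, "winning_captain_is_indian": False},  # Shane Warne
    {"year": 2009, "winning_captain_is_indian": False},  # Adam Gilchrist
    {"year": 2010, "winning_captain_is_indian": True},   # Dhoni
]


def no_indian_captain_wins(season):
    """Overfitted rule: the champion is never led by an Indian player."""
    return not season["winning_captain_is_indian"]


history = [season for season in seasons if season["year"] < 2010]
accuracy_before_2010 = rule_accuracy(no_indian_captain_wins, history)
accuracy_through_2010 = rule_accuracy(no_indian_captain_wins, seasons)

print(f"Accuracy on 2008-2009: {accuracy_before_2010:.0%}")
print(f"Accuracy on 2008-2010: {accuracy_through_2010:.0%}")
```

The rule is perfect on two data points and drops to two out of three as soon as one more season is added, which is the first reason listed above.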
Year | 2008 | 2009 | 2010 |
Conclusion | No team has won an IPL title | No 4th-placed team in the preliminary round has won the IPL title | No team led by an Indian player has won the championship. The Man of the Series never loses the final |
Event | Until Rajasthan Royals won the championship | Until Deccan Chargers, who were ranked 4th in the preliminary round, won the championship. | Until Dhoni's Chennai Super Kings won the championship. Previously it was won by teams captained by Shane Warne and Adam Gilchrist. In previous editions, Shane Watson (RR) and Adam Gilchrist (DC), both from the winning team, were the Man of the Series. In 2010, Sachin Tendulkar, who was not part of Chennai Super Kings, was the Man of the Series. |
Reason | If predicting with less data is bad, what about predicting with no data? | Predicting with less data is bad. | Though the captain plays a huge role in winning, it makes no difference whether he is an Indian or a foreigner. |
Year | 2011 | 2012 | 2013 |
Conclusion | No team has won the championship in consecutive years | The 4th-ranked team in the preliminary round was never beaten in the final | Chennai Super Kings were never beaten twice in finals. MI had never beaten CSK in a final |
Event | Chennai Super Kings won the championship in 2010 and 2011, becoming the first team to win the championship as defending champions. | Kolkata Knight Riders won the championship by beating Chennai Super Kings, who were 4th in the preliminary round. | In 2012 and 2013, CSK were beaten in the finals by KKR and MI respectively. MI and CSK had met before in the 2010 final, where CSK became the winner. |
Reason | With four data points, what you can predict is very limited, even though historically your accuracy is 100%. | Again, this prediction was correct until 2012; only once before had the 4th-ranked team made it to the final of the IPL. | When you are predicting with combinations of events, it is obvious that your accuracy will be very high, as there are only a handful of events. |
Year | 2014 | 2015 | 2016 |
Conclusion | The team of the highest run-scorer has never won the IPL title | Chennai Super Kings were never beaten twice in finals. MI had never beaten CSK in a final | The third-ranked team in the preliminary stage has never beaten the second-ranked team in the final. |
Event | Robin Uthappa of KKR was the highest scorer in the tournament and was a member of the eventual winners, KKR. On three previous occasions, the highest scorer represented the runner-up team but not the champions. | In 2012 and 2013, CSK were beaten in the finals by KKR and MI respectively. MI and CSK had met before in the 2010 final, where CSK became the winner. | Sunrisers Hyderabad beat Royal Challengers Bangalore in the final, the first time the third-ranked team beat the second-ranked team in the final. In 2010, CSK, then the third-ranked team, became champions by beating the top-ranked team. |
Reason | Again, with only six previous events, there is too little data to make good predictions. | When you are predicting with combinations of events, it is obvious that your accuracy will be very high, as there are only a handful of events. | Too many combinations on too little data should not be used for predictions. |
Year | 2017 | 2018 | 2019 |
Conclusion | Whenever MI won, the highest wicket-taker was never an Indian bowler. The top-ranked team has never beaten the second-ranked team in the final. | CSK has never won by chasing | MI has never beaten CSK when MI was ranked first |
Event | MI won the championship in 2013 and 2015. In both those years, the highest wicket-taker was Bravo from the West Indies. In 2017, the highest wicket-taker was Bhuvneshwar Kumar, who is an Indian. | In 2010 and 2011 CSK won, but by defending. This is the first time that they have won the championship by chasing. | MI became champions in 2017 when they were ranked first in the preliminary stage. However, when they beat CSK in 2013, they were the second-ranked team. In 2019, they were ranked first and were able to defeat second-ranked CSK. |
Reason | Well, this is a combination of little data and uncorrelated data. | Again, by combining the team and the method, you are reducing the data. | Again, by combining multiple teams and rankings, you are reducing the data. |
Year | 2020 | 2021 | 2022 |
Conclusion | MI has never won in an even year. MI has never won chasing. The highest six-hitter was never in the championship-winning team. The Fair Play award winner has never won the title. | ||
Event | MI won the championship in 2013, 2015, 2017 and 2019, and this is the first time that they have won while chasing. Ishan Kishan (MI) was the highest six-hitter, the first time that the highest six-hitter was part of the tournament champions. The Fair Play award was initiated in 2012, and this is the first time that the Fair Play award winner has won the championship. | ||
Reason | Though these are somewhat valid predictions, the accuracy is 100% only because there are so few data points. |