Saturday, February 27, 2021

Into Thin Air to Touching the Void

 

Source: https://www.amazon.co.uk/Into-Thin-Touching-Books-Collection/dp/9123820950

In 1988, the British mountain climber Joe Simpson wrote a book named Touching the Void, the story of climbers in the Peruvian Andes. Although it received good reviews, it was not a sales success.

About a decade later, the world heard of a shocking incident on Mount Everest in which a few climbers from two groups died. After this tragedy, Jon Krakauer wrote a book on the incident named Into Thin Air. Unlike the earlier book, Into Thin Air was a commercial success.

With the help of the Amazon recommendation engine, readers of Into Thin Air started to buy Touching the Void. Eventually, sales of Touching the Void exceeded those of Into Thin Air in the short term.

This example shows how a recommendation engine can help generate revenue and bring products to customers who have never heard of them.
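
The "readers of one book start buying another" effect can be illustrated with a simple co-purchase count. This is only a minimal sketch and not Amazon's actual algorithm. The order history, the placeholder title "Another Title", and the function names are all invented for illustration.

```python
from collections import Counter
from itertools import permutations


def co_purchase_counts(orders):
    """For every item, count how often each other item appears in the same order."""
    counts = {}
    for order in orders:
        # Use a set so a title repeated within one order is counted once.
        for a, b in permutations(set(order), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


# Hypothetical order history, invented purely for illustration.
orders = [
    ["Into Thin Air", "Touching the Void"],
    ["Into Thin Air", "Touching the Void", "Another Title"],
    ["Into Thin Air", "Another Title"],
    ["Into Thin Air", "Touching the Void"],
]

counts = co_purchase_counts(orders)
print(recommend(counts, "Into Thin Air", top_n=1))
```

In this made-up data, Touching the Void appears in three of the four orders that contain Into Thin Air, so it is the first suggestion for readers of that book.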

A recommendation engine has a lot of options, as shown in the figure below.



Tuesday, February 23, 2021

The First-Ever MCT Summit in Sri Lanka

I will be speaking at the first-ever MCT summit in Sri Lanka on Time Series Analytics in Azure Machine Learning. 

Friday, February 19, 2021

Reasons for Software Implementation Failures in Sri Lankan Organizations



This research paper may be more than a decade old, but most of the reasons are still valid. A variety of software projects were considered for this research, based on factors such as the business nature of the organization, the place of implementation, the type of software, the number of users, and the investment in the software. To identify the reasons for software implementation failures, input from both software professionals and end-users was gathered and analyzed. The research concluded that a majority of software projects in Sri Lanka fail at the implementation stage. The major reason for these failures is the difficulty of transition (from old processes to new processes). In fact, 60% of projects suffer from transition issues during software implementation. The transition is more difficult in the agriculture and transportation sectors.

Poor customer support, negative user attitudes and lack of infrastructure are the next most prominent reasons for software implementation failures in Sri Lanka. It was further identified that most of these failures occur in the agricultural sector, and that small-scale organizations do not follow a robust product selection process, which leads to project failure.
Read the full research paper, Reasons for Software Implementation Failures in Sri Lankan Organizations, at https://www.researchgate.net/publication/281230276_Reasons_for_Software_Implementation_Failures_in_Sri_Lankan_Organizations

 

Thursday, February 18, 2021

Sri Lanka MCT Summit 2021


I will be speaking at the first-ever MCT Summit in Sri Lanka. My topic will be Time Series Analysis with Azure Machine Learning. The event will be held on March 6-7, 5:00 PM-11:30 PM IST. If you are free, do register for the event at https://www.technetleaders.com/

Elastic Jobs in Azure SQL Databases



Yesterday it was the presentation at the Sri Lanka Data Community user group, and today it is the article on Elastic Jobs in Azure SQL Databases.
Elastic Jobs, a preview feature, can be used to run the same script against multiple Azure SQL databases on multiple servers in parallel.
This article discusses the feature along with a case. Enjoy reading.

KoBoToolBox: A Data Collection Tool



We are all looking at the prospects of data science, and at descriptive, diagnostic, and predictive analytics. However, before we do all this fancy stuff, we need to collect data. What are the tools that you use to collect data?
KoBoToolBox is a tool that can be used to collect data. It is used by various NGOs, and the following are some examples.

JORDAN -  Testing the quality of services for Syrian refugees
TANZANIA - Monitoring food security
THE SEYCHELLES - Analysing a dengue fever emergency
NIGERIA - Tackling malnutrition
HAITI - Tracking vaccination campaigns
MADAGASCAR - Measuring vanilla yields
INDIA - Calculating climate change impact
NEW ZEALAND - Conducting large digitised surveys

What are the main features of KoBoToolBox?
  • Design forms quickly and easily using an intuitive form builder
  • Reuse existing questions and blocks of questions and manage them in the question library
  • Build complex forms with skip logic and validation 
  • More than 20 different question types available including location, image, video, rating, matrix, etc.
  • Easily share projects with colleagues and set granular permission levels.
  • Import and export XLSForms; import via URL or upload from your computer.
  • Online and Offline data collection
Reporting Options
  • Create summary reports with graphs and tables and fine-tune your report's charts, colours and questions
  • Visualize collected data on a map, including a heatmap, clustering, other base layers, etc.
  • Disaggregate data in reports and maps, e.g. by gender, region or educational level
  • Export all your data at any time. Supported formats: Excel, CSV, KML, ZIP (for media) and SPSS.

Sunday, February 14, 2021

Data Mining Techniques in Prevention and Diagnosis of Non Communicable Diseases



During the pandemic, the entire world is concerned about human health. Non-communicable diseases such as diabetes mellitus, heart disease, hypertension and cancer have been troubling societies for a long time. This research was done to prevent and diagnose non-communicable diseases using data mining techniques. It was carried out using a data sample from a semi-rural area in Sri Lanka.
The major challenge in the health sector in rural, underdeveloped areas is that patients do not attend medical clinics. Non-attendance is even higher among males. Having identified the challenge of getting males to the medical clinics, we used spouse data to predict the health conditions of the other partner.
Logistic regression analysis, Classification and Regression Tree (CART), decision tree, Chi-squared Automatic Interaction Detector (CHAID), exhaustive CHAID, and discriminant analysis techniques were used in this research.
Read the research paper at ResearchGate

Thursday, February 11, 2021

Cheat Sheet for Recommender Systems

Recommender systems have become important in today's competitive world. Mainly, you can use these systems to improve sales by targeting specific customer groups. To identify all the options in a recommender system, the following cheat sheet was developed.


 

Tuesday, February 9, 2021

Investigation and Development of Technology for Fuzzy Data Warehouse

A five-year research project on the topic Investigation and Development of Technology for Fuzzy Data Warehouse was completed with the final presentation today.

You can find the project presentation at ResearchGate; it does not include the final thesis. By definition, a data warehouse is a framework that gives strategic management access to all organizational data for strategic decision making, in order to gain a competitive advantage. It covers comprehensive technology.

A data warehouse covers more technical aspects than data warehouse design alone, as shown below.

Source: [Han J., Kamber M., 2012] 

Nowadays, the data warehouse is used for analysis, but mostly crisp values are used. For example, when it comes to age, we define age groups such as Young, Middle, and Old. When the ranges are defined, they are an approximation, which leads to veracity issues in the data.
Fuzzy logic can be used to handle the veracity aspects, so we have tried to add fuzzy logic to the data warehouse.
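
The difference between crisp age groups and fuzzy ones can be sketched in a few lines of Python. This is an illustrative sketch only, not the membership functions derived in the research. The breakpoints (25, 35, 55, 65) and the function names are assumptions chosen for the example.

```python
def falling(x, full, zero):
    """Membership is 1 up to `full`, then falls linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising(x, zero, full):
    """Membership is 0 up to `zero`, then rises linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def age_memberships(age):
    """Degree (0..1) to which an age belongs to each group.

    The breakpoints are hypothetical, picked only to illustrate the idea.
    """
    return {
        "Young": falling(age, 25, 35),
        # Middle is a trapezoid: rises over 25-35, flat, falls over 55-65.
        "Middle": min(rising(age, 25, 35), falling(age, 55, 65)),
        "Old": rising(age, 55, 65),
    }


print(age_memberships(30))
print(age_memberships(45))
```

With crisp ranges, a 30-year-old falls entirely into one group. Here the same person is partly Young and partly Middle, which is the approximation that fuzzy membership is meant to capture.
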
The following research objectives were set initially.
  1. Review current work on data warehousing, fuzzy data warehousing and fuzzy databases.
  2. Conduct a feasibility study to identify the domains and areas where the fuzzy data warehouse can be implemented.
  3. Introduce a data-driven technique to define fuzzy membership functions for different scenarios and different data warehouse technologies.
  4. Implement Linguistic Analysis of Data warehousing using Fuzzy Techniques. 
  5. Design methodology for dimensions and fact tables in Fuzzy Data Warehouse. 
  6. Design other relative features of the data warehouse to support fuzzy modelling. 
  7. Define non-functional requirements in a fuzzy data warehouse.
  8. Provide Proof of concepts for fuzzy data warehouse implementation.
The heart of this research is a way to derive fuzzy membership functions, the lack of which was the major drawback of previous research. Different types of fuzzy membership functions were introduced as shown below.


We have covered other features such as data warehouse design, ETL and OLAP cubes with fuzzy logic, using a real-world dataset of 2.6 million records.

Multiple research papers were published as follows. 
  1. PPG Dinesh Asanka, Amal Shehan Perera, Design Strategy for Fuzzy Data Warehouses, 2nd International Conference on Innovative Research in Science, Technology & Management, National University Singapore, 29-30 September 2018.   
  2. PPG Dinesh Asanka, Amal Shehan Perera, Defining Fuzzy Membership Function for Fuzzy Data Warehouses, 4th I2CT IEEE Conference, SDMIT Ujire, Mangalore, India, October 2018.
  3. PPG Dinesh Asanka, Amal Shehan Perera, Linguistic Analytics in Data Warehouses Using Fuzzy Techniques, IEEE International Research Conference on Smart Computing and Systems Engineering – 2019, Department of Industrial Management, University of Kelaniya, 28th March 2019.
  4. PPG Dinesh Asanka, Amal Shehan Perera, Feasibility of Fuzzy Data Warehouse, International Journal of Research in Computer Applications and Robotics, ISSN 2320-7345, Vol. 5 Issue 11, November 2017.
  5. PPG Dinesh Asanka, Amal Shehan Perera, Defining Fuzzy Membership Function Using Box Plot, International Journal of Research in Computer Applications and Robotics, ISSN 2320-7345, Vol. 5 Issue 9, September 2017.

Thursday, February 4, 2021

Presentation on Elastic Jobs in Azure SQL Databases

I will be speaking at the Sri Lankan Data Community February 2021 Online Meetup on Elastic Jobs in Azure SQL Databases.

This feature allows you to run scheduled tasks in your Azure SQL Databases. It is similar to SQL Server Agent in on-premises SQL Server. However, with Elastic Jobs you can run scheduled tasks across multiple Azure SQL servers and multiple databases, which is an added advantage over SQL Server Agent. Further, the execution runs in parallel.

Register at https://www.meetup.com/en-AU/sldatacommunity/events/276148796/


 

Tuesday, February 2, 2021

Dynamic Data Masking in SQL Server


Data masking is an important aspect of data security. Though it is not as strong as data encryption, it provides some level of security. The latest article discusses data masking in SQL Server as well as in SQL Azure.

Please find the latest article at SQLShack.

Monday, February 1, 2021

Monitoring Long Running Transactions in TempDB

The TempDB database plays a major role in SQL Server. Therefore, it is extremely important to monitor the health of the TempDB database. One of the major challenges in TempDB is maintaining its log file. If there are long-running transactions that use TempDB, there can be situations where the log file keeps growing. Since these transactions are not closing, log space will not be released, and the entire server will not be able to run queries that use TempDB.

Recently, one of our clients had a similar problem. One query had been running for more than four days and had consumed the TempDB log file. This used up the disk space, and the entire server halted.

In this situation, the easiest and laziest thing to do is to restart the server. A restart will kill all the transactions and return TempDB to its original size. This is not something that you can do for a 24x7 system.

However, we chose not to restart but to identify the long-running query using the following simple query.

-- Open transactions that have used the tempdb log, with their session, SQL text and plan
SELECT  se_tr.session_id,
        sec.login_name,
        trn.database_transaction_begin_lsn,
        trn.database_transaction_begin_time,
        trn.database_transaction_log_record_count,
        trn.database_transaction_log_bytes_used,
        trn.database_transaction_log_bytes_reserved,
        t.text,
        q.query_plan
FROM sys.dm_tran_database_transactions trn
INNER JOIN sys.dm_tran_session_transactions se_tr ON trn.transaction_id = se_tr.transaction_id
INNER JOIN sys.dm_exec_sessions sec ON se_tr.session_id = sec.session_id
INNER JOIN sys.dm_exec_connections con ON con.session_id = sec.session_id
-- A session may have no active request, so keep it with a LEFT JOIN
LEFT OUTER JOIN sys.dm_exec_requests req ON req.session_id = sec.session_id
CROSS APPLY sys.dm_exec_sql_text (con.most_recent_sql_handle) t
OUTER APPLY sys.dm_exec_query_plan (req.plan_handle) q
-- Lower-case name also works on servers with a case-sensitive collation
WHERE trn.database_id = DB_ID('tempdb')

This allowed us to identify the long-running query, and we killed the relevant session. With that, the TempDB log file was emptied, and by shrinking the TempDB log file we were able to regain the disk space.

Further, we took a proactive decision to enable an alert, so that if a query runs for more than 8 hours (configurable), the DBA is alerted and can kill the session straight away.
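
The check behind such an alert can be sketched in Python. This is a hypothetical illustration, not the alert we actually configured. It assumes the rows are (session_id, database_transaction_begin_time) pairs taken from the query above, and the function name and sample values are invented.

```python
from datetime import datetime, timedelta


def long_running(transactions, now, threshold=timedelta(hours=8)):
    """Return session ids whose transaction has been open longer than `threshold`.

    transactions: iterable of (session_id, begin_time) pairs.
    Rows without a begin time (None) are skipped.
    """
    return [
        session_id
        for session_id, begin_time in transactions
        if begin_time is not None and now - begin_time > threshold
    ]


# Sample rows invented for illustration.
now = datetime(2021, 2, 1, 12, 0)
rows = [
    (51, datetime(2021, 1, 28, 9, 0)),   # open for more than four days
    (52, datetime(2021, 2, 1, 11, 30)),  # open for 30 minutes
    (53, None),                          # no begin time recorded
]

print(long_running(rows, now))
```

The threshold is a parameter, so the 8-hour limit can be changed without touching the check itself.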