Thursday, April 29, 2021

How to Build Recommender Systems with Customer Review Data

Text analytics is a complex and challenging task because it deals with a wide variety of data. In a previous blog post, we looked at how Amazon used recommender systems to improve the sales of one book.
This Azure Machine Learning experiment shows how to build recommender systems from customer review data.

References for Datasets
https://jmcauley.ucsd.edu/data/amazon/
Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. R. He, J. McAuley. WWW, 2016.

Image-based recommendations on styles and substitutes. J. McAuley, C. Targett, J. Shi, A. van den Hengel. SIGIR, 2015.


Wednesday, April 28, 2021

Things you Should Avoid when Designing a Data Warehouse

We usually focus on the practices we should follow during data warehouse design. However, we rarely look at the things we should avoid. This article covers those points.

We have previously looked at a few data warehouse concepts in Microsoft technologies. First of all, you need to plan the infrastructure for a data warehouse. During data warehouse design, it is important to include surrogate keys in dimension tables. The date dimension is a special dimension used in data warehouse modelling. Historical data is an important aspect of a data warehouse, and it is handled with Slowly Changing Dimensions (SCD).
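To make the surrogate key and SCD ideas concrete, here is a minimal sketch in Python. It assumes a Type 2 slowly changing dimension, which is only one of several SCD types, and the names `CustomerRow`, `apply_change`, and the sample customer are hypothetical illustrations rather than anything from the linked articles.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_key: int                 # surrogate key generated by the warehouse
    customer_id: str                  # business key from the source system
    city: str                         # tracked attribute
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_change(rows: List[CustomerRow], customer_id: str,
                 new_city: str, change_date: date) -> List[CustomerRow]:
    """Type 2 change: close the current row and add a new version."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return rows               # nothing changed, keep history as-is
        current.valid_to = change_date
    # The surrogate key is independent of the business key (assumed: max + 1).
    next_key = max((r.customer_key for r in rows), default=0) + 1
    rows.append(CustomerRow(next_key, customer_id, new_city, change_date))
    return rows


# Hypothetical sample data
rows: List[CustomerRow] = []
apply_change(rows, "C001", "Colombo", date(2020, 1, 1))
apply_change(rows, "C001", "Kandy", date(2021, 4, 1))
```

After the second call the dimension holds two rows for the same business key. Each row has its own surrogate key, so facts loaded before the change still point to the older version.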

Sunday, April 18, 2021

Latent Dirichlet Allocation in Azure Machine Learning

Latent Dirichlet Allocation (LDA) is a topic modelling technique that is used in text mining. This technique was first introduced in 2003 by this research paper.

LDA can be performed in the Azure Machine Learning platform, as it has a dedicated LDA control. This is the experiment that was created in the Gallery with more than 50 Azure Machine Learning controls.

Thursday, April 15, 2021

Recognizing Hollywood Beauties when they are without Makeup

Less than a month ago, we used the Orange Data Mining tool to see how to recognize Bollywood beauties when they are not wearing their fancy makeup.
Let us do the same exercise with Hollywood beauties.
These are the stars that we are going to match.


These are the images of them without makeup.


We have used the same package in Orange, which is shown in the figure below.


In the Image Embedding widget, we used the OpenFace embedder, and in the Neighbours widget, we used Cosine distance. Four images were ignored during embedding.
The following are perfect matches of those actresses with and without makeup.


Out of 26 images, 8 were matched by their first (nearest) neighbour, and many of the others were matched by the time the third neighbour was considered.
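The neighbour matching described above can be sketched in plain Python. This is only an illustration of ranking by cosine distance, not the Orange implementation. The three-dimensional vectors and the names `actress_a`, `actress_b`, and `actress_c` are made-up placeholders, and real face embeddings are much longer.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_neighbours(query, candidates):
    """Return candidate names ordered from nearest to farthest."""
    return sorted(candidates,
                  key=lambda name: cosine_distance(query, candidates[name]))


def match_rank(query, candidates, true_name):
    """1-based position of the correct person in the neighbour ranking."""
    return rank_neighbours(query, candidates).index(true_name) + 1


# Hypothetical embeddings of the "with makeup" images
with_makeup = {
    "actress_a": [0.9, 0.1, 0.2],
    "actress_b": [0.1, 0.8, 0.3],
    "actress_c": [0.2, 0.3, 0.9],
}
# Hypothetical embedding of a "without makeup" image of actress_a
no_makeup_a = [0.8, 0.2, 0.3]

ranking = rank_neighbours(no_makeup_a, with_makeup)
rank_of_true_match = match_rank(no_makeup_a, with_makeup, "actress_a")
```

A rank of 1 corresponds to a match on the first neighbour. Counting how many images have a rank of 3 or lower gives the kind of "matched by the third neighbour" figure reported above.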

Tuesday, April 13, 2021

Designing Recommender Systems in Azure Machine Learning

Recommender systems are one of the most common applications of machine learning. In a previous blog post, we discussed the cheat sheet for recommender systems. We also discussed how Amazon used a recommender system to increase sales of published books.



This is the 13th article in the Azure Machine Learning series, and it covers recommender systems. The article uses the hybrid recommender technique called Matchbox Recommender, with live data from Azure SQL Database to showcase the features of Azure Machine Learning.

The previous articles are listed below. 

  

Friday, April 9, 2021

Data Mining Templates in RapidMiner 9.9

Templates are very handy: you configure just a few environmental parameters and you are ready to run the predictive analytics template.

The following are some of those templates in RapidMiner.


Download RapidMiner from https://rapidminer.com/get-started/ and start using these templates.

Thursday, April 8, 2021

Multi Language Support for SSAS


When you are asked to define the term data warehouse, you will say that it is a framework that is used to analyse enterprise-level data. If it is a framework, data warehouse users should have the option of using it irrespective of the language they are familiar with.

This new article shows how to analyse your data in multiple languages. It explains the modifications you need to make to the SQL Server stack to incorporate multi-language support into the data warehouse.

There are multiple articles on SQL Server Analysis Services (SSAS).

Sunday, April 4, 2021

Extra Delivery, Free Hit and IPL

We are less than a week away from the Universe Cricket Carnival, IPL 2021. This post does not attempt any predictive analytics, as cricket is the great game of uncertainties.


A research study was carried out in 2014 to find out the outcome of extra deliveries in cricket. There is a common belief that when a no ball or wide is bowled, the extra delivery will cost the fielding team heavily. The research was done to validate this claim, and the claim was rejected. However, since the data was captured from matches played at different venues and times, there were questions about the credibility of the research.
New validation research will be done on IPL 2021 data.
These are the guidelines for the data collection.
1. Only fully completed innings are considered. In the case of rain, the number of overs is reduced, which has a different impact, so those matches/innings are ignored.
2. If there are two extra deliveries in one over, that over will not be considered for the extra delivery analysis but will be considered for the free hit analysis.
3. Half-completed overs will not be considered for the extra delivery analysis but will be considered for the free hit analysis.
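
The three guidelines above can be expressed as a small eligibility check. This is a rough sketch, not the model under development. The `Over` record and its field names are hypothetical, and treating "two extra deliveries" as "two or more" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Over:
    innings_completed: bool   # the innings ran its full course
    rain_reduced: bool        # the match or innings was shortened by rain
    extra_deliveries: int     # no balls plus wides bowled in this over
    over_completed: bool      # False for a half-completed over


def eligible_analyses(over: Over) -> set:
    """Return which analyses ("extra_delivery", "free_hit") may use this over."""
    # Guideline 1: ignore incomplete or rain-reduced innings entirely.
    if not over.innings_completed or over.rain_reduced:
        return set()
    analyses = {"free_hit"}
    # Guidelines 2 and 3: overs with two (assumed: or more) extras and
    # half-completed overs are kept for the free hit analysis only.
    if over.over_completed and over.extra_deliveries < 2:
        analyses.add("extra_delivery")
    return analyses


# Hypothetical examples
normal_over = Over(True, False, 1, True)
double_extra_over = Over(True, False, 2, True)
half_over = Over(True, False, 1, False)
rain_over = Over(True, True, 1, True)
```

Here `eligible_analyses(normal_over)` contains both analyses, `double_extra_over` and `half_over` qualify for the free hit analysis only, and `rain_over` is excluded from both.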

Parameters to Consider
There are many parameters to be considered. The following data will be collected: Name of the Bowler, Team of the Bowler, Name of the Batsman, Team of the Batsman, Runs Scored by the Batsman at the Time of Facing the Delivery, Number of Deliveries Faced by the Batsman at the Time of Facing the Delivery, Ball Number, Runs in the Previous Over, Runs in the Next Over, Runs in the Over, Is Batsman Changed, Score in Free Hit, Score in the Extra Delivery, Power Play, Partnership, Partnership Runs, Partnership Deliveries, Wicket in Same Over, Over Number, Ground, and T20 Stats of the Bowler and the Batsman.
The model is still at the development stage and will be released next week. Is there any other data to be considered? Let me know.