This story is a real story that happened back in 2008. I was working in Project for UK client whose prime business is producing advertisements.
Let me explain the business case first of all. They had few studios in the UK where they have artists working in these studios. They had teams and each team has one or many sales representatives. Sales representatives were the people who bridged between artists and their clients. Sales Reps goes to the clients and get their requirements. Then he comes to his team and explains the requirements. Then the artist comes up with different artwork. Then again busy sales reps go to the clients for their reviews. If he is lucky enough, the trip will end up where the client has selected one artwork with few reviews. However, until the client selects an artwork, the poor sales representative has to run over and over again.
Management has a problem to solve.
What are the qualifications of artist that makes the artwork to select?
By answering this question, management looking at fixing the issue in two ways.
1. Train the existing artists with the relevant qualification.
2. When hiring new artists, make sure they have these qualifications.
Then, I was one member to provide them with a solution with. Now, we required two sets of data, Production & HR which were developed by two different vendors with different technologies and not under any maintenance agreement. If I remember the numbers, there were 121 artists in the production database that needs to be matched in the HR system.
Obviously, the matching key would be Employee ID or Employee Number. Unfortunately, both systems did not have the same global key to match. Then we decided to match the artist with the Employee Name.
out of 121 artists, 101 artists did not give us any trouble as they were exactly matched. out of balance 20 artists, 17 of artist we were able to match with Fuzzy Lookup. This is due to the fact in of the system, there were spaces and dots in the names which were not in the other system. However, after spending some quality time with different configurations of Confidence and Similarity levels in Fuzzy Lookup Configurations, we were able to crack those 17 artists. Now we have three artists left where we could not match.
Further analysis shows that these artists were ladies! Any guesses??
Yes, most of you are correct. After their marriage, they have changed their names and in one system you have the option of changing the name in another system you do not have an option t change the name. (You know what, I have been telling this story for more than 12 years now, every time it is the male who provides the correct reason for this mismatch. Will leave that analysis for a different date).
Since we know the reason for the non-matching for these three artists, we tagged them to UNKNOWN category which is what is our typical technical approach in reporting or in data warehousing. So we thought we crack the Artists "problem" and we were over the moon. So in all of our fancy reports and dashboards, whether it is Sales Rep wise, Artist Groups wise, Studio wise, all of these reports shows three artists work under UNKNOWN category. During our presentation, we presented these reports and dashboards with our head is high, needless to say proudly. We never thought that our proud would be short-lived.
During the question time, Finance Manager who was a lady very quietly asked "guys, who are these unknowns". As you can imagine, I was ready with the answer. I had byhearted those ladies names, as this was expected question. I proudly said those names and the reason for not matching them. Further, I took the opportunity to explain our Fuzzy lookup techniques boasting our technical capabilities.
She first acknowledged our capabilities, "great work guys", but then came the killer questions.
"If you know these ladies names, how can you call them unknowns"
I was looking like a tiger without nails. Though they accept our project, that question has left me to thrive for a solution. I came up with a very simple solution. I used type 3 slowly changing dimension design. I introduced a simple table, where you have two columns with the previous name and current name.
So the final solution was
1. Perform a simple lookup
2. Perform a fuzzy lookup for non-matching names,
3. Lookup new table and find the previous name and perform a lookup.
Though we came up with a solution. This incident has changed our thinking ability. If you further analyse her question, business-wise we knew who these employees are, but technically we did not know them. In approach, we placed our technical knowledge over business knowledge. That was the clinical mistake we did. The lady with less technical knowledge was driven us for a technical solution to achieve success.
Finally, I leave you with this great saying.
"Whether you succeed or not is irrelevant, there is no such thing. Making your unknown known is the important thing"
Georgia O'Keeffe