
Analysis Paralysis in Data Science

Haste makes waste. All of the biggest mistakes I have ever made can be traced back to the fact that I was in a hurry. For example, 15 years ago I was hired by Ozon.ru (now a $10 billion company on NASDAQ) to create an analytical division. Part of my job was to compile a huge weekly summary of the company's activities, and I had no opportunity to verify the figures properly. The pressure from above, combined with the lack of time, meant that the weekly reports were full of mistakes, which took a lot of time and effort to correct.

Today's world moves at breakneck speed, but compiling metrics requires careful attention, and that takes time. Of course, this does not mean that the opposite extreme, "analysis paralysis," when an inordinate amount of time is spent on every figure on the sheet, is desirable either. Sometimes the desire to make the right choice leads to precisely this state, in which it becomes impossible to make any decision at all: the uncertainty about where the decision will lead is inordinately high, or the framework for making it is too rigid. An easy way to fall into analysis paralysis is to approach decision making in a purely rational, logic-only way.

Graeme Simsion's book The Rosie Project illustrates this idea perfectly. The novel tells the story of Don Tillman, a successful young genetics professor who very much wants to find a wife but has never gotten beyond a first date. He concludes that the traditional way of finding a partner is ineffective and decides to take a scientific approach to the problem. The first phase of his so-called "Wife Project" is a detailed 30-page questionnaire designed to screen out unsuitable candidates and identify the ideal match. Obviously, no one can meet such a list of requirements. Don is then introduced to a girl who exhibits none of the traits of his ideal woman. You can guess who Don ends up dating.

Let me give an example from my professional life that also has to do with hypotheses or, more precisely, with testing them. Imagine that you come up with a new recommendation algorithm that you believe is better than the existing one, and you want to test it. You compare the two algorithms on ten sites: four times your new algorithm is better, two times it is worse, and four times there is no clear difference between them. Will you abandon the old algorithm in favor of the new one? It all depends on the decision criteria you set before comparing the two algorithms. Was the new algorithm supposed to be better in all cases? Or only in most cases? In the first case, you are likely to bury yourself in endless iterations of the algorithm, honing it to perfection, especially given that the tests take several weeks to run. This is a classic case of analysis paralysis. The second case seems much simpler, but practice tells us otherwise.
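To make the "better in most cases" criterion concrete, here is a minimal sketch (my own illustration, not a method from the post) that treats the per-site outcomes as a sign test. The win/loss/tie counts mirror the example above; ties are discarded, as is standard for a sign test:

```python
from math import comb

# Hypothetical per-site A/B outcomes matching the example above.
wins, losses, ties = 4, 2, 4

# Sign test: drop the ties and ask how likely it is to see at least
# `wins` successes out of the decisive trials if both algorithms
# were actually equally good (a fair coin, p = 0.5).
n = wins + losses
p_value = sum(comb(n, k) for k in range(wins, n + 1)) / 2**n

print(f"one-sided p-value: {p_value:.3f}")  # 0.344
```

With only six decisive sites, a p-value of about 0.34 is nowhere near significance, which is exactly why the "most cases" criterion is less simple than it looks: with so few comparisons, even a 4-to-2 advantage is indistinguishable from chance.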

I believe that there must always be an element of informed risk in decision making; we must be able to decide even without all the information. These days we cannot afford to deliberate over decisions indefinitely; the world moves too fast for that luxury. If you don't make a decision, someone else, a competitor, will make it for you.

This post is an excerpt from the "Roman's Data Science" book.
2021-09-23 14:51