Why Machine Learning Projects Fail

Long before the term “data science” became a resume item worth goodly sums, before “ML” and “AI” became permanent fixtures on the “must-have” lists of senior management and VCs alike, there was simply data. Consultants toiled away in Microsoft Access databases, exported data begat cuts, pivots, and charts at the behest of managers, and ultimately, there were innumerable PowerPoint decks. In statistics departments, students slogged their way through regressions. In applied mathematics departments, students spent sleepless nights performing matrix operations, learning the simplex algorithm for linear optimization the way seminarians study scripture.

And then some guys in the Bay Area wrote some better search algorithms, a kid in a Harvard dorm found some portraits of classmates, a bunch of people made gazillions of dollars, and the world changed forever. Fine, there’s a little more to the story. Ok, maybe a lot more. Possibly, I’m under-caffeinated and a little snarky. Nonetheless, suddenly, every for-profit, non-profit, or religious-prophet wants ML in its business, to become AI-enabled, and to predict user behavior, conversion rates, and the time of the impending rapture. We are all millenarians now. We all believe in the transformative power of the next great model, the next great algorithm, the next predictive tool. Our fervor borders on religion; our messianic impulses become our downfall.

While algorithms, to some, offer similar elegance and beauty, they are not a proper vessel for prayer or other divine expectations. ML is math, not magic.

Here’s how ML goes awry:

1) The existing solution leaves little room for improvement

Believe me, I wish I could make this argument in the intimidating baritone of a comic hero...

Before committing the time, energy, and expense of ML, first consider the BATMAN (Best Alternative To ML, AI, or NNs). Remember that all that time, energy, and expense had better be better than BATMAN (and most things just aren’t better than BATMAN). Circa 2010, a graduate student friend of mine was attempting to develop a machine learning model to open and close sluice gates beneath the city of Chicago in response to meteorological events. She aspired to avoid combined sewer overflows. Good use of ML? Well, BATMAN in this case took the form of a couple of savvy engineers beneath Gotham (er, Chicago). They’d been keeping the city safe from such overflows for the better part of thirty years. They earned comparatively modest municipal salaries and were extremely adept at executing their duties, so the problem her modeling aspired to prevent was rare in the first place. For my grad student friend to publish papers, complete her doctorate, and get the hell out of dodge (Champaign, IL), she needed to beat BATMAN. It took her eight years...and one might argue that BATMAN was still superior.

AE might not always surpass the capacities of the caped crusader, but we get to know BATMAN early in the process and determine whether we dare tussle with the man in the mask.

2) The problem and its constraints are not well defined

ML projects fail when some well-paid scientist delivers a model that addresses the task they were given but fails to solve the larger problem senior management envisioned at the grown-ups’ table. Jane Q Data Scientist was told to predict benthic hypoxia[1] in Corpus Christi Bay. Jane wrote an excellent model, integrating cutting-edge science, partial differential equations, and all the exciting Matlab code that money could buy circa 2006. Jane solved the problem she was asked to solve...except that the real “problem” was that researching benthic hypoxia in real time means deploying portable sensors to the location when the phenomenon occurs. That means predicting the phenomenon, sure, but it also means mobilizing a bunch of boats carrying professors, sensors, and perhaps grad students (if they misbehave, perhaps not). The decision to launch the boats at 6am the following morning was made at around 3pm in the lab, a mere fifteen hours before launch...and Jane Q’s model required over two days to calibrate and calculate (it was 2006, we didn’t have AWS, ok?). Jane’s tool, however brilliant, delivered predictions for what was ultimately yesterday’s bay. What I learned in 2006, and what AE practices today with modern tools, is that a simple model that makes an accurate prediction with real, accessible data trumps the perfect model on some ethereal plane we mortals shan’t ever visit.
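For the quantitatively inclined, here’s a back-of-the-envelope sketch in Python of the constraint Jane’s model violated. The timestamps are hypothetical, reconstructed loosely from the story rather than from any actual lab log:

```python
from datetime import datetime, timedelta

# Hypothetical timestamps: the launch decision happens at 3pm
# for boats that would leave at 6am the next morning.
decision_time = datetime(2006, 7, 10, 15, 0)
boats_launch = datetime(2006, 7, 11, 6, 0)
model_runtime = timedelta(days=2)  # calibration + computation

# To be on a scientist's screen by 3pm, a two-day model must start two days
# earlier -- so its freshest input data is already two days old at decision time.
latest_usable_data = decision_time - model_runtime

print(f"Forecast horizon needed: {boats_launch - decision_time}")  # 15 hours ahead
print(f"Freshest data the model can use is from: {latest_usable_data}")
```

No amount of modeling brilliance survives that arithmetic; either the runtime shrinks or the prediction describes yesterday’s bay.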

3) There are barriers to implementation

You were expecting, perhaps, another comic-book character? I love birds, what can I tell you?

Even if you beat BATMAN, there’s still ROBIN (Restrictions in Operation Blocking Implementation Now). Circa 2019, a team on which I worked was tasked with exploring a database of user profiles, identifying duplicates, and merging as needed. Over the years, millions of profiles had been accumulated, many of which were almost certainly duplicative, as evidenced by the same name, overlapping job histories, etc. Of course, there might also really be two cardiologists named John Smith in New York. Redundancy is wasteful and leads to misdirected outreach, but merging two distinct people’s profiles into one is assuredly worse (the loss of at least one user, the liability of assuming something about a person that ain’t true, etc.). We built a model. The model identified a few hundred thousand duplicate pairs (remember, if a profile has been created 10 times, there are 45 duplicate pairs[2]). The potential false positive rate (merging two different people) was literally less than one in 100,000.
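For the combinatorially curious, here’s a toy Python sketch of why the pair counts balloon and how candidate pairs might be generated. The fields and matching rule are hypothetical illustrations, not the actual model:

```python
from itertools import combinations
from math import comb

# One person with n profiles yields n-choose-2 duplicate pairs.
print(comb(10, 2))  # 45 pairs from a single ten-fold duplicate

# Toy candidate generation: compare only profiles sharing a crude blocking
# key, then score the survivors. (Real entity resolution is far fussier.)
profiles = [
    {"id": 1, "name": "John Smith", "city": "New York", "specialty": "cardiology"},
    {"id": 2, "name": "John Smith", "city": "New York", "specialty": "cardiology"},
    {"id": 3, "name": "Jane Doe",   "city": "Chicago",  "specialty": "nephrology"},
]
candidates = [
    (a["id"], b["id"])
    for a, b in combinations(profiles, 2)
    if (a["name"], a["city"]) == (b["name"], b["city"])
]
print(candidates)  # [(1, 2)] -- a pair worth scoring, not an automatic merge
```

Note the output: a candidate pair is an invitation to score, not a license to merge, which is precisely where ROBIN enters.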

Legal wouldn’t sign off on merging any pair of profiles in which meaningful activity had occurred on either one. Months of work went to waste because the relevant decision-makers were not consulted when the project began. Apparently, one should talk to lawyers before undertaking such initiatives. At AE, we assess plausibility of implementation on (or before!) day 1.

4) The model provides information no one wants

Once upon a time, I built a model that predicted an important client account would actually cost more money to service[3] than it would generate in revenue. Wanna tell the salesperson who just closed that seven-figure account to cut ’em loose? Wanna tell the regional head whose bonus is revenue-based? How about a model that reveals a long-standing and somewhat unalterable strategy yields a horrific, self-defeating fate? Be careful what you model for. There are worse fates than standing before the C-level, describing the output of a model, and receiving cynical inquiry that reveals erroneous assumptions on your part. Far worse is receiving the same poking and prodding when both parties are 100% convinced that your math is correct, but the folks in power have no idea whatsoever what to do about it.

5) The model provides information everyone wants

The expected exultation from a profusion of digital insight...this ain't what really happens.

No good data science deed goes unpunished. Once, as a consultant, I automated a process, which grew into a model, which generated some interesting outputs. Suddenly, every division in the business wanted the insight. Each wanted its own bespoke version of the output. Before long, fielding requests and addressing edge cases (that were neither part nor parcel of the original design) became a full-time job[4]. Eventually, Dr. Highly-Trained Data Scientist is pulling data, answering one-off requests, and struggling to find the extra hours to systematize, automate, and standardize. Then she quits. And the next one lacks the relevant domain knowledge, reinvents the wheel, the stakeholders are irritated, and fingers are pointed.

At AE, we scope. We adjust scopes as needed, but we recognize that sometimes, heaven forbid, there are limitations to what can be accomplished in a finite number of hours. We have the chutzpah to tell our clients “no.” Would you really prefer the folks who only say yes, underdeliver, and burn out their people in so doing?

6) Only human beings are allowed to be wrong

Sometimes, beating BATMAN isn’t good enough. No amount of objective reasoning addresses the visceral disdain for a model’s error, especially if the model makes an error “no human being would ever make.” Human beings fail constantly. We drive cars recklessly, injuring drivers, pedestrians, and woodland creatures. Risks are everywhere. But BATMAN’s defeat demands more than an algorithm that brings less harm to drivers, pedestrians, and furry things: if the algorithm causes an accident that we presume a human would have avoided, the floodgates will open. In radiography departments, fallible humans make judgments; if an algorithm misses something a human might have seen, the backlash (read: liability risk) could be horrifying. When friends inquired about a baseball model’s prediction on a given evening, a model-driven error somehow elicited a narrative-driven gripe: “why didn’t it know that John Q WildThing struggles against left-handed hitters with facial hair?” When John Q Wagerer is wrong, no such recriminations are possible. Those building models had better recognize and prepare for the venom such errors will draw.

Fine, this is a left-handed pitcher with facial hair, but he is a "wild thing," and he is the architect of one of the more traumatic moments of my childhood in October of 1993. No model needed.

But, if you still want to take on BATMAN and ROBIN, develop a few new acronyms along the way, and hear some war stories about ground-balls-through-legs at inopportune moments, we’d love to talk to you!

[1] “Benthic” means the bottom layer of a body of water. “Hypoxia” means a lack of oxygen. These two words, in concert, mean dead fish and irritated humans whose economic livelihoods require fish to become dead only after being fished.

[2] 10 choose 2, which is 10! / (8! * 2!), or 10 * 9 / 2 = 45. You didn’t think there’d be an entire blog post without any combinatorics, did you?

[3] Large clients have this annoying habit of requiring human attention, additional perks, preferred resource allocations, etc...and either you draw some boundaries in the contract or, well...you can guess.

[4] Actually, “full-time” would have been a marked improvement. Yet nowhere in the corporate lexicon is there a term along the lines of a “hyper-time” or “sleep-time-too” job. Not as catchy as “BATMAN” or “ROBIN,” but definitely worth the attention of intrepid wordsmiths.

No one works with an agency just because it has a clever blog. To work with my colleagues, who spend their days developing software that turns your MVP into an IPO, rather than writing blog posts, click here. (Then you can spend your time reading our content from your yacht / pied-à-terre.) If you can’t afford to build an app, you can always learn how to succeed in tech by reading other essays.