Balls, Strikes, And Other KPIs
Baseball fans know that umpires will not tolerate any debate over balls and strikes. These are immutable features of the national pastime, and as metrics for pitchers, they’re straightforward. Maximize strikes and minimize balls. Right?
This seems like a defensible argument—more balls lead to more walks (bad for pitchers) and more strikes increase the probability of a strikeout (good for pitchers). And then, we dig a little deeper.
Let’s return to 2007, as baseball was beginning to extricate itself from the preceding, steroid-filled decade.1 There were no smartphones with which to tweet2 one’s ire at an umpire’s calls, but nonetheless, games were played.3 About 8.5% of all plate appearances ended in a walk, 17.1% ended in a strikeout, and this yielded about 4.8 runs per game for hitters.
Now let’s consider 2021.4 The analytics cohort discovered that it is more efficient for batters to swing for the fences (lots of home runs and strikeouts) than to attempt a higher rate of contact. That same cohort, in turn, discovered that the optimal pitching strategy would be to amass a stable of monsters who throw 100 mph fastballs and explosive breaking balls, and who avoid risking contact from slugging behemoths.5 Now, 23.2% of plate appearances end in strikeouts, and the number of runs per game has fallen to ~4.5. Seems like a win for pitchers.
But the walk rate has actually risen to 8.7% of all plate appearances.
How can this be? Pitchers throw harder, batters are swinging from their shoe tops, and fewer runs are scored. But one of our metrics is worsening?6
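To make the paradox concrete, the league-wide figures quoted above can be set side by side. What follows is a minimal sketch in Python. The dictionary layout and the helper names `relative_change` and `pitcher_verdicts` are illustrative choices, not anything taken from the original data source.

```python
# League-wide figures as quoted in the text above.
SEASONS = {
    2007: {"walk_rate": 0.085, "strikeout_rate": 0.171, "runs_per_game": 4.8},
    2021: {"walk_rate": 0.087, "strikeout_rate": 0.232, "runs_per_game": 4.5},
}

# From the pitcher's point of view: is a higher value good news?
HIGHER_IS_BETTER = {
    "walk_rate": False,
    "strikeout_rate": True,
    "runs_per_game": False,
}


def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new (0.10 means +10%)."""
    return (new - old) / old


def pitcher_verdicts(before: dict, after: dict) -> dict:
    """Label each metric 'better' or 'worse' for pitchers."""
    verdicts = {}
    for metric, higher_is_better in HIGHER_IS_BETTER.items():
        went_up = after[metric] > before[metric]
        verdicts[metric] = "better" if went_up == higher_is_better else "worse"
    return verdicts


if __name__ == "__main__":
    before, after = SEASONS[2007], SEASONS[2021]
    for metric, verdict in pitcher_verdicts(before, after).items():
        change = relative_change(before[metric], after[metric])
        print(f"{metric:>15}: {change:+.1%} ({verdict} for pitchers)")
```

The narrow metric (walks) moves in the wrong direction while the metric that actually decides games (runs allowed) moves in the right one.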
Divisions and Departments
Perhaps you think this is a silly example from an obsessive baseball fan.7 But if so, I urge you to consider the notion of divisional metrics in corporate America. If baseball teams were run like some modern corporate entities, there would be a department of walk-reduction, whose KPI for the quarter would be to lower the number of walks allowed. This would be wholly achievable by throwing a higher proportion of balls over the plate at lesser velocity. Anyone think this is good for the broader objective (in this case, preventing runs)?
If this sounds silly, consider an individual who is judged solely by a metric focused on minimizing bugs in software, or exclusively by minimizing downtime for a given site. They can achieve their goals at the expense of new features, new revenue streams, and any number of other strategies for which a cost must be paid on their particular KPIs.
In turn, these flawed organizational paradigms diminish the company’s agency, leaving it a slave to layers of bureaucracy and approval, each designed to solve one particular problem without consideration of the broader context.
The Wrong Pocket
Compartmentalization of metrics also leads to what is known as the “wrong pocket problem.” Put simply, it arises when the costs are paid from one pocket but the benefits accrue to another. Malcolm Gladwell illustrated one rather startling example in his “Million-Dollar Murray” essay. Essentially, a chronically homeless inebriate afflicted with countless medical issues becomes a seven-figure liability in medical bills and labor costs to his municipality over roughly a decade on the streets. Buying the guy a condo and hiring someone to look after him would have been much cheaper.
Ah, but that seven-figure cost is distributed among numerous hospitals, prisons, police departments, and housing authorities. They’d all benefit from receiving a magic bullet in the form of a housing subsidy and a temporary, caretaking conservator.
Each hospital, prison, and police department has its own metrics. None of them, I suspect, would be served well by writing the check required to address the problem permanently. None can incur the short-term cost. They all have their walks to minimize.
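The structure of the wrong pocket problem can be sketched with a few lines of arithmetic. Every number below is hypothetical and chosen only to mirror the shape of the story (a seven-figure total spread across several budgets, and a cheaper fix). None of it comes from Gladwell’s essay, and the assumption that the intervention removes the entire cost is a deliberate simplification.

```python
# HYPOTHETICAL figures: a seven-figure total cost split across budgets.
COST_BY_AGENCY = {
    "hospitals": 350_000,
    "prisons": 300_000,
    "police": 200_000,
    "housing_authority": 150_000,
}

# HYPOTHETICAL price of the fix (housing plus a caretaker).
INTERVENTION_COST = 400_000


def system_net_benefit(costs: dict, intervention_cost: int) -> int:
    """Savings across all pockets, assuming the fix removes every cost."""
    return sum(costs.values()) - intervention_cost


def agency_net_benefit(costs: dict, intervention_cost: int) -> dict:
    """Net result for each agency if it alone writes the check."""
    return {name: cost - intervention_cost for name, cost in costs.items()}


if __name__ == "__main__":
    total = system_net_benefit(COST_BY_AGENCY, INTERVENTION_COST)
    print(f"System-wide net benefit: {total:+,}")
    for name, net in agency_net_benefit(COST_BY_AGENCY, INTERVENTION_COST).items():
        print(f"{name:>18} paying alone: {net:+,}")
```

Under these made-up numbers the fix is clearly worthwhile in aggregate, yet every individual budget looks worse if it pays on its own, which is exactly why nobody writes the check.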
There are arguments about the impossibility of remaking a system no one would design from scratch once it exists.8 There are rationales for why problems must be divided and conquered rather than approached in aggregate.
But there are real challenges associated with this compartmentalization. Namely, we end up solving countless small problems downstream of a larger one.
And possibly, wholly ignoring the largest problems of all.
Attacking the Wrong Problem
Someone’s metric somewhere is to minimize the number of automobile fatalities. Perhaps someone working for a large automotive company receives a bonus for diminishing fatalities per mile driven. Perhaps someone working for the federal or a state government assembles a task force for this purpose.
To this end, they fund the development of the most advanced driver-assist program imaginable. They aspire to develop an artificial general intelligence (AGI) that not only drives the vehicle more safely than distractible humans, but also optimizes your seating position, and even the route you select from point A to B in the hopes of safer roadway outcomes.
Though automobile fatalities remain a significant source of all-cause mortality, with lifetime risks on the order of 1%9, the risk of a malevolent AGI ultimately posing an existential risk is greater, given prevailing estimates.10
Thus, the automobile fatality metric (1% lifetime risk) might be improved by 10% (a decrement of 0.1 percentage points in lifetime risk) at the cost of adding several percentage points to an altogether different cause.
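The trade-off is simple enough to write down. The sketch below uses the 1% and 10% figures from the text. The size of the added risk is an assumption: the text says only “several percentage points,” so the value here is a placeholder rather than an estimate, and the risks are treated as simply additive.

```python
# Figures from the text above.
CAR_LIFETIME_RISK = 0.01      # ~1% lifetime risk of an automobile fatality
RELATIVE_IMPROVEMENT = 0.10   # the narrow metric improves by 10%

# ASSUMPTION: hypothetical placeholder for "several percentage points".
ADDED_OTHER_RISK = 0.02


def net_change_in_total_risk(
    base_risk: float, relative_improvement: float, added_other_risk: float
) -> float:
    """Net change in combined lifetime risk, treating risks as additive."""
    reduction = base_risk * relative_improvement
    return added_other_risk - reduction


if __name__ == "__main__":
    reduction = CAR_LIFETIME_RISK * RELATIVE_IMPROVEMENT
    net = net_change_in_total_risk(
        CAR_LIFETIME_RISK, RELATIVE_IMPROVEMENT, ADDED_OTHER_RISK
    )
    print(f"Narrow metric improves by {reduction * 100:.2f} percentage points")
    print(f"Combined risk changes by {net * 100:+.2f} percentage points")
```

With the placeholder above, the narrow metric improves by 0.1 percentage points while the combined figure worsens by 1.9. Any added risk larger than 0.1 percentage points makes the broader metric worse.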
In fact, there’s an argument to be made that your odds of perishing in an asteroid-driven impact event are not as far from those of a car accident as you might expect. Depending on how you define the size of an asteroid whose collision with Earth would yield significant loss of life, the frequency is somewhere between one incident every 100,000 years and one every 1,000,000 years. A human planning to wander the Earth for the next 50-100 years should incur a risk on the order of 1 in 1,000-10,000.
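The conversion from “one incident every N years” to a lifetime risk can be checked directly. This is a rough sketch that assumes impacts arrive as a Poisson process at a constant rate and, like the text, treats the occurrence of such an impact as the risk itself. The function names are illustrative.

```python
import math


def risk_of_at_least_one_event(years_between_events: float, horizon_years: float) -> float:
    """P(at least one event within the horizon), assuming a Poisson process."""
    rate_per_year = 1.0 / years_between_events
    # 1 - exp(-rate * t), computed with expm1 for accuracy at tiny probabilities.
    return -math.expm1(-rate_per_year * horizon_years)


def one_in(probability: float) -> float:
    """Express a probability as the N in '1 in N'."""
    return 1.0 / probability


if __name__ == "__main__":
    # The two extremes implied by the ranges in the text.
    scenarios = [
        ("frequent impacts, long horizon", 100_000, 100),
        ("rare impacts, short horizon", 1_000_000, 50),
    ]
    for label, interval, horizon in scenarios:
        p = risk_of_at_least_one_event(interval, horizon)
        print(f"{label}: about 1 in {one_in(p):,.0f}")
```

Under these assumptions the range runs from roughly 1 in 1,000 to roughly 1 in 20,000, which can be compared against the 1% lifetime figure quoted for automobiles.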
If you’re the type who works from home now, in a safe neighborhood, and expects automobile safety features to improve, you might wish to consider the inordinately large number of other smaller (but more numerous and far less discussed) risks that surround you.
Is the relevant metric automobile accidents or all-cause mortality?
Might an overly specific metric actually cause the broader metric to worsen?
Defining the Game
From baseball players to data scientists, success begins by understanding the nature of the game being played. What is the objective? Is the goal to prevent runs or minimize walks? Act accordingly. Is the goal to live long and live well or to minimize the risk of death in a car accident?11
When assembling an organizational paradigm, and trying to avoid Goodhart’s law while you’re at it, remember that division of responsibility is probably necessary. However, be careful that parochial metrics aren’t causing you to replace the flamethrowing hall-of-famer with the control-obsessed replacement-level gopher-baller. Be careful that in attempting to minimize the risks of fender-benders, you aren’t accelerating the road to a misaligned AGI.
KPIs do matter—rational decisions require data. Insightful strategy, however, requires context.
1 The Mitchell Report was released in December of 2007, though its findings concerned players’ use of performance-enhancing substances in prior years.
2 Technically, Twitter was founded in 2006, since I know you were thinking “wait, there wasn’t Twitter either!” There was. You just needed to use an actual computer… weird, I know!
3 And their stats were recorded at one of my favorite sites, Baseball Reference.
4 Another trip to Baseball Reference!
5 Naturally, these pitchers fatigue more rapidly and must be changed more frequently, and the proportion of balls in play on a per-pitch basis has plummeted. Baseball is unwatchable, despite remaining an illustrative tool for data science philosophy and the romantic obsession of your humble blogger.
6 The intuition here is actually simple, even if the outcome is paradoxical. Back in the olden days (even the ones that still included laptops), when a pitcher was behind in the count (e.g., 2-0, 3-1, or 3-0), he’d often throw a fastball. Recently, the calculus has changed for pitchers. Even when pitchers do capitulate and throw a fastball, that fastball is still often at triple-digit speed and damned difficult to put in play. Thus, where in 2007 that 2-0/3-1 pitch was often put into play, now it often is not. Many of the 2-1/3-2 counts that ensue today therefore occur in plate appearances that would have already concluded in prior years, and some of those plate appearances become walks. This is why many pitchers with obscene K/9 ratios also sport what would once have been considered troublesome walk ratios. Putting the ball in play against these guys is so difficult that they actually walk more batters: plate appearances continue when they trail in the count, where a lesser pitcher might elicit a ball in play. These guys allow fewer runs too, so the strategy works.
7 I mean, you’re not wrong…
8 This is, ostensibly, the conclusion in “Meditations on Moloch” and other analyses.
9 Here are some statistics, but if you want a more nuanced take on your own personal risk, might I suggest this calculator we’ve developed?
10 Effective altruists place the odds around 3%, Nick Bostrom is even less optimistic, and we wrote an entire essay arguing that AGI is likely to be what kills you, citing other harrowing estimates.
11 Stay in your bed all day and never leave… I assure you, your risk of an automobile collision will plummet.