The Difference Between Scientific Progress and Better Software

The year was 2005. Moneyball was a book, not a movie. In a dorm in New Jersey, two college classmates conspired to wager upon baseball games. One of the two completed a PhD in quantum physics, fell in love with a Brazilian woman in California, and roamed the South American countryside for years before embarking on a career at AE in BCI research. The other ate a few hoagies and landed on Wall St. in 2008. But I digress.

Predicting the outcomes of baseball games required a combination of domain knowledge and technical competency. In an age before ubiquitous GPUs, most computation occurred on a laptop. In an age before ready-made Python libraries for deep learning, the choice of algorithm greatly impacted the complexity of the problem. Informed, parsimonious definitions of parameters and architecture required significant a priori knowledge of what drives outcomes on the diamond.
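For the curious, the kind of parsimony that era demanded looks something like the sketch below: a simple logistic model over a handful of hand-picked, domain-informed features. The features and numbers are illustrative placeholders, not our actual 2005 inputs.

```python
# A parsimonious, domain-informed model of the sort a 2005 laptop could
# fit in seconds. Features and data below are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: home starter ERA, away starter ERA, home win pct, away win pct
X = np.array([
    [3.10, 4.50, 0.580, 0.470],
    [4.80, 3.20, 0.490, 0.560],
    [3.90, 3.80, 0.520, 0.510],
])
y = np.array([1, 0, 1])  # 1 = home team won

model = LogisticRegression().fit(X, y)
p_home = model.predict_proba([[3.40, 4.10, 0.555, 0.500]])[0, 1]
```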

Turns out a childhood spent watching ESPN every morning and perusing the baseball encyclopedia, a massive physical tome, was not entirely wasted.

Modernity

Technology has advanced at an exponential pace, especially in the field of machine learning. Now, Keras, TensorFlow, PyTorch, and countless other libraries allow the composition of complex neural networks in short order. Amazon’s Elastic Compute Cloud offers near-unlimited computational power at costs that even a pizza-and-beer-consuming collegian can manage.

This provides opportunities to replace domain knowledge with raw computational power. Were my friend and I to undertake the same problem in 2022, we could simply gather all available data for hundreds of thousands of historical baseball games, train an artificial neural network with as many layers as required, and would likely produce an even more accurate model than the one that made us a few bucks in summers long ago.
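A minimal sketch of that brute-force alternative, assuming a hypothetical games.csv with one row of features per historical game and a binary home_win label (the file, the features, and the architecture are all placeholders):

```python
# Brute force, 2022 style: throw a deep network at every feature available.
import pandas as pd
from tensorflow import keras

df = pd.read_csv("games.csv")                    # hypothetical historical data
X = df.drop(columns=["home_win"]).to_numpy(dtype="float32")
y = df["home_win"].to_numpy()

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),   # "as many layers as required"
    keras.layers.Dense(1, activation="sigmoid"), # P(home team wins)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, validation_split=0.2, epochs=20, batch_size=256)
```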

But this approach has its own limitations.

Robustness

AE is currently developing machine learning algorithms for interpreting neural data. One such application under investigation is “thought-to-text.” A person with chronic paralysis is asked to attempt the motion required to write a letter upon a page. Of course, the paralysis prevents their hand from moving. Meanwhile, data is collected from an array of electrodes implanted in their motor cortex. Finally, an algorithm decodes the patterns in neural data to estimate the velocity of their hand, or more precisely, the intended velocity.
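As a toy illustration, and emphatically not AE’s actual pipeline, a common baseline for this kind of decoding is a linear map from binned spike counts to intended velocity. Everything below is stand-in data:

```python
# Toy linear decoder: binned spike counts from 96 channels -> intended
# 2-D pen velocity. Real calibration data would come from attempted writing.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
spikes = rng.poisson(2.0, size=(5000, 96))   # (time bins, electrode channels)
velocity = rng.normal(size=(5000, 2))        # (time bins, [vx, vy]) targets

decoder = Ridge(alpha=1.0).fit(spikes, velocity)
vx, vy = decoder.predict(spikes[:1]).ravel() # decode one new time bin
```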

Beginning from a specific location, those velocities form lines and curves. Another algorithm attempts to infer, from those lines and curves, the letter the subject intended to draw. Ultimately, large language models can be used to correct the words implied by these characters, revealing the subject’s desired words and sentences.1
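Turning those decoded velocities back into a pen stroke is just numerical integration from the starting location. A sketch with stand-in velocities:

```python
# Integrate decoded velocities into a pen trajectory on the page.
import numpy as np

dt = 0.02                                            # hypothetical 20 ms bins
start_xy = np.array([0.0, 0.0])                      # pen-down location
v = np.random.default_rng(1).normal(size=(150, 2))   # stand-in decoded (vx, vy)

trajectory = start_xy + np.cumsum(v * dt, axis=0)    # position = sum of v * dt
# a character classifier maps this trajectory to a letter; a language model
# then corrects the words those letters imply
```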

But among the challenges in developing algorithms for BCI is robustness across time. Is an algorithm calibrated on a brain today capable of the same level of performance next week? Next month? Next year? Clearly, there are consistent patterns in neural activity that persist over long periods. Knowledge, memories, and recognition are all relatively durable. And yet, BCI users are currently required to calibrate the system, a tedious and time-consuming process, only to repeat it just days later.

Building on the shoulders of academic giants, AE’s recent research suggests that stabilizing BCIs over long periods of time is possible. Putting an electrode in the brain is like putting a nail in a bowl of jello. As the person moves around, so does the brain. The nail does not. The result? Neurons “drift” in and out of view of the electrodes. These perturbations add up; the populations recorded on any given day rarely match those of the previous day.

A highly complex model is likely to reflect the idiosyncrasies of “today’s brain,” which may not reflect “tomorrow’s brain.” However, the population activity doesn’t change entirely. Take a football stadium on Sunday: observing an individual fan at a specific moment might reveal a conversation about a favorite player, or just someone busy eating a snack. When their team scores, however, the population of fans cries out in unified joy. Similarly, neural populations correlate when performing specific tasks. This allows us to align an out-of-date model at the “team level” to the new neural population, and recover much of the BCI’s performance.
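One common way to do that alignment, sketched below with stand-in data (and not necessarily AE’s exact method), is to project each day’s recordings into a shared low-dimensional latent space and find the rotation that best maps the new day’s latents onto the old, so the original decoder keeps working in its familiar coordinates:

```python
# "Team level" alignment sketch: align day-30 latents to day-0 latents.
import numpy as np
from sklearn.decomposition import PCA
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(2)
X_day0 = rng.normal(size=(2000, 96))    # (time bins, channels), stand-in data
X_day30 = rng.normal(size=(2000, 96))   # same task, weeks later, drifted

# project each day's population activity into a 10-D latent space
Z0 = PCA(n_components=10).fit_transform(X_day0)
Z30 = PCA(n_components=10).fit_transform(X_day30)

# rotation that best maps day-30 latents onto day-0 latents; in practice
# the rows must correspond to matched task conditions or time points
R, _ = orthogonal_procrustes(Z30, Z0)
Z30_aligned = Z30 @ R                   # feed this to the day-0 decoder
```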

Progress

A simple method, whether it predicts the winner of tonight’s ballgame or the velocity intended by a paralyzed hand, will be imperfect. As always, the goal is not simply to build models, but to improve models.

My doctoral and postdoctoral work involved the estimation of soil moisture. Dig a hole, install a sensor, receive an hourly reading of moisture levels day after week after month. Calibrate a model with precipitation data (and a few other relevant features). Lather, rinse, repeat across the country.
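The model itself can be remarkably simple; here is a minimal “bucket” water-balance sketch in the spirit of footnote 2, with illustrative parameter values:

```python
# A one-bucket water balance: rain fills the soil, evapotranspiration
# empties it, and anything beyond capacity drains away. Values illustrative.
def simulate_soil_moisture(precip_mm, et_mm, capacity_mm=150.0, s0_mm=75.0):
    s = s0_mm
    saturation = []
    for p, et in zip(precip_mm, et_mm):
        s = s + p - et                       # rain wets; sun and plants dry
        s = max(0.0, min(s, capacity_mm))    # excess water flows downhill
        saturation.append(s / capacity_mm)   # relative saturation, 0..1
    return saturation

hourly = simulate_soil_moisture([0, 12, 3, 0, 0], [4, 2, 3, 4, 5])
```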

A neural network might have offered a higher level of accuracy, but a model leveraging a smaller number of parameters and the knowledge of the underlying hydrology and physics2 offers some huge advantages.

The domain-specific, simpler model builds knowledge. We don’t simply estimate moisture from a black box; we begin to understand its drivers and how those vary geographically. That knowledge is transferable. We can calibrate a model in one location, find another similar3 location, and deploy that model there. We can learn from our mistakes. When a model fails in a specific place at a specific time, we start asking pertinent questions (like, “hmm…there was a ton more vegetation cover this season as a result of crop rotations…maybe our model doesn’t handle that variability properly…I’ll bet we can incorporate that idea and make our model better”).

The black box offers none of these opportunities, but I’ll bet it can produce more accurate estimates.

It Takes All Kinds

Without domain-specific knowledge, the only mechanism by which to improve models is to throw more computational power at the problem. More hidden layers. Better transformers. These tools are valuable, especially when small improvements in performance yield significant value. Moreover, they help illustrate what is, today, the best we can do on any given problem.

AE’s victory in the first round of the Neural Latents Benchmark (NLB) challenge did just that: it improved a benchmark. That benchmark becomes the scaffold from which future researchers will ascend.

And yet, without neuroscientists increasing our understanding of the nature and structure of the brain, progress will be incremental.

Likewise, without industrial-strength machine learning and computational horsepower, we could not discover the potential of current scientific understanding.

Looking Forward

The best of all possible worlds4 involves iterative leapfrogging. Industrial-strength data science pushes modeling results from existing domain knowledge as far as modern computation will allow. In turn, the domain experts uncover new feature spaces and new latent patterns, and research more effective hardware for gathering data. The software engineers then up the ante with ever more impressive tooling for processing and decoding patterns from those data. The domain experts then note the types of conditions that remain unaddressed by those models. This tills fertile ground5 for future research and modeling efforts. And on it goes.

A world without domain research offers only incremental progress. A world without software engineering and data science best practices leaves potential performance on the table, never to be realized.

A world with both, in tandem, yields exponential improvements over time and seemingly unlimited opportunity.

1 Like autocomplete, only helpful and necessary! Read more in the relevant manuscript published in Nature.

2 Not that complicated really. Rain makes the ground wet. The sun and plants suck up the water. Water flows downhill. (Told you it was simple)

3 Being grounded in hydrology and physics means “similar” has tangible meaning like “a climate with this pattern, and a soil of this type and texture, with topography that meets certain criteria.” Are any two brains “similar?” In what way? How would we define when a model calibrated in one brain is useful in another?

4 Which might, for some of you, exclude references to Voltaire, I admit :-).

5 Agricultural pun intended!

No one works with an agency just because they have a clever blog. To work with my colleagues, who spend their days developing software that turns your MVP into an IPO rather than writing blog posts, click here. (Then you can spend your time reading our content from your yacht / pied-à-terre.) If you can’t afford to build an app, you can always learn how to succeed in tech by reading other essays.