AGI Is Probably What Will Kill You
The title is intended literally. Even in a world containing risks from cancer, car accidents, and nuclear holocausts, it is an artificial general intelligence (AGI) with misaligned intentions that poses the largest risk of death or loss of autonomy.1 Why is this not a more common dinner topic than fantasy football or the latest dalliances of the Kardashians? AGI may be the greater risk, but topics of immediacy, such as inflation and geopolitical instability, rule our thoughts.
Putting Money Where One’s Mouth Is
Predicting the future is difficult, but people wagering their hard-earned cash on a prediction are generally more insightful than those bloviating on social media. Metaculus2, a global forecasting community, places the odds that the control problem3 is solved before the creation of weak AGI at ~10%. And to be clear, the same forecasters expect weak AGI to mature into superintelligence within a couple of years of its creation. Finally, these same forecasters suspect that a for-profit corporation is likely to best academic and government organizations in this technological race.
Most importantly, while AGI’s arrival was forecast as more likely than not by 2057 as recently as January 2022, that forecast has since shortened to 2042. (Scott Alexander suggests it might be far sooner, and Eliezer Yudkowsky tends to agree.)4
The probability of a misaligned AGI ultimately ending our lives or thoroughly eliminating our autonomy presents an existential risk worthy of attention. This post to the Alignment Forum assigns that dystopian outcome a probability north of 90%.5 Figures like these6 suggest that our lives (especially those with many remaining decades) are less likely to end with the mundane diseases of aging and more likely to end catastrophically in AGI ruin.7
The folks wagering think you are more likely to be killed by a sober technologist than by a drunk driver, cancer, or heart disease. To wit, there were ~36K US roadway fatalities in 2020, accounting for slightly more than 1% of all deaths; heart disease alone accounted for approximately 20% of deaths that year, with cancer close behind.8 If a global catastrophe occurs by 2100, the odds of AGI being the culprit sit at ~30%, far exceeding the odds attached to these conventional risks.9 Furthermore, our safeguards against roadway deaths, heart disease, and cancer improve every year. Can we say the same about AGI?
Neglected Approaches
Effective altruists argue that efficiently allocating the scarce resources of time and money means focusing on neglected problems.10 The same logic argues for considering not only neglected problems, but neglected approaches to those problems. Current efforts to manage the risks of AGI amount to attempts to build better neural networks. Ideally, those models will contain increasingly complex objectives that protect human freedoms. One of two things is true:
- This is a valid approach that might ultimately save humanity.
- This is not a valid approach, and by pouring their time and money into it, everyone else is neglecting an alternative that might succeed in its stead.
In a world where AGI ruin might actually be more likely to kill you than accidents, disease, or nuclear annihilation, the alignment problem is neglected. Solving the problem of misaligned AGI requires more than investment and resources.11 Society faces risks both from accelerating the pace of technological development and from slowing it. The risks of stagnation are real: fail to develop mitigation strategies for existential risks (asteroids, climate variability, nuclear winters, etc.), and given enough time, one of those risks ends the species. The risks of misalignment are real: develop the technology too rapidly or too carelessly, and AGI will outpace our own intellect, to our perpetual suffering.
This is a call to arms for engineers, scientists, philosophers, venture capitalists, and citizens. Are you thinking about how to address this problem? Is it top of mind, at least competing with the conventional risks of daily life?
Intentions
One particularly compelling neglected approach is neurotech development. Neurotech may enable us to communicate higher-order intentions, aspirations, and ethics to the AGI that will ultimately optimize for whatever objectives it is given. How to specify such human values is still unclear, and the likelihood of succeeding in such an endeavor may be low. However, when one side of the equation contains an existential risk of substantial probability, the expected value of low-probability solutions becomes attractive, especially if they are insufficiently explored. Neurotechnology development draws mostly on fields orthogonal to AI research (biology, biomedical engineering) and is therefore almost purely additive, i.e., it does not dilute current AGI safety efforts.
Neurotechnology could also augment human thinking more directly, helping us devise solutions ourselves. Our working memory might suffice for holding a recipe in mind while we cook, but not for storing the complexity of the ideas AGI research will require, and those limits make it a struggle to express complex thoughts. Transcription fluency12 is still a tremendous concern: most of our ideas are communicated inadequately to our fellow human beings, let alone to a silicon algorithm with a potentially opaque set of templates on which to build.
If we’re going to solve the alignment problem, we need to improve our own clarity of thought first. Reinforcement learning, a common approach to AGI development, hinges on our ability to express which values and behaviors ought to be rewarded! How can we teach an AGI our values if we cannot express them, or even think about them, clearly?
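To make the specification problem concrete, here is a toy sketch (a hypothetical errand-running scenario, not an example from any real system): the reward function encodes only what we managed to articulate, and the optimizer happily exploits everything we left unsaid.

```python
# Toy illustration of reward misspecification (hypothetical scenario).
# Candidate plans: (minutes taken, does it cross the neighbor's flowerbed?)
plans = {
    "sidewalk route": (12, False),
    "shortcut through flowerbed": (7, True),
}

def proxy_reward(minutes, tramples_flowerbed):
    # What we managed to articulate: "be fast."
    # What we meant but never encoded: "...and don't trample the flowerbed."
    return -minutes

best = max(plans, key=lambda name: proxy_reward(*plans[name]))
print(best)  # -> "shortcut through flowerbed": optimal under the stated reward,
             #    misaligned with the intent we failed to express
```

Every value we cannot articulate is, from the optimizer’s point of view, a value we do not have.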
Opening Doors
Lowering the barriers to entry for a field can enable newcomers who bring fresh perspectives and neglected approaches.13 Among our projects here at AE is a suite of tools for developers in the neuroscience research space. One such tool is the Neurotech Development Kit. “NDK” abstracts away the partial differential equation solvers and simulation configurations necessary to test neurotech solutions in silico. NDK is designed to welcome folks with machine learning experience, but without expertise in neurotechnology simulation, to help build the next generation of neurotechnology.
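To give a feel for what that abstraction is meant to buy you, here is a purely illustrative sketch; the class and method names below are hypothetical placeholders, not the actual NDK API. The intended experience is that an ML practitioner describes a scenario at a high level and runs it, while the underlying wave-physics solver stays out of sight.

```python
# Illustrative sketch only: hypothetical names, not the real NDK interface.
from dataclasses import dataclass

@dataclass
class UltrasoundScenario:                     # hypothetical stand-in for a packaged scenario
    name: str
    transducer_frequency_hz: float = 5e5

    def simulate(self) -> dict:
        # In a real toolkit this would dispatch to a pre-configured wave-equation solver;
        # here it simply returns a placeholder result to show the shape of the workflow.
        return {"scenario": self.name, "peak_pressure_pa": None}

scenario = UltrasoundScenario(name="toy-skull-phantom")
print(scenario.simulate())
```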
Our best bet is to encourage participation from across geographies and domains; a solution becomes more probable when numerous approaches are explored.
Research
OpenAI, as it accelerates humanity’s pursuit of AGI, has articulated its approach to alignment research. Its primary direction is recursive reward modeling (RRM): developing models that assist human beings in evaluating the quality of summaries, commentary, and, potentially, even codebases.
This is impressive in terms of its potential practical applications (e.g., summarizing a book, or a model that can critique and improve upon its own outputs), but also troublesome in terms of its implicit limitations.
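For readers unfamiliar with the underlying mechanics, here is a minimal, generic sketch of reward modeling from human comparisons, the building block that recursive reward modeling extends; it is an illustrative simplification, not OpenAI’s implementation. A scalar “quality” score is trained so that the output a human preferred scores higher than the one they rejected.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, embedding_dim: int = 768):
        super().__init__()
        # Simplification: outputs are assumed to arrive as fixed-size embeddings.
        # Real reward models score token sequences with a language-model backbone.
        self.score = nn.Sequential(
            nn.Linear(embedding_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(model: RewardModel, preferred: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the human-preferred output's score
    # above the rejected output's score.
    return -torch.nn.functional.logsigmoid(model(preferred) - model(rejected)).mean()

# Toy usage with random "embeddings" standing in for evaluated summaries or critiques.
model = RewardModel()
preferred, rejected = torch.randn(8, 768), torch.randn(8, 768)
loss = preference_loss(model, preferred, rejected)
loss.backward()
print(float(loss))
```

The limitation the paragraph above gestures at is baked into this setup: the model can only learn values that humans manage to express through their comparisons.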
Thinking (We’re Not Good At It)
As a species, we do not think clearly.14 We are limited in the scope of ideas we can grasp, the levels of abstraction our minds can handle, and our understanding of what it means to be an intelligent agent.
Consider my pet cockatiel. He believes that he is smarter than me. Why wouldn’t he? In terms of the types of intelligence he can grasp, he is. His ability to distinguish nuances in audible sequences is superior. His sense of spatial relations, and his retention thereof, is stunning,15 and his awareness of small movements seen at a distance vastly outstrips that of my wife and me as we gaze out the window. Of course, he cannot write poetry, nor process the mathematical implications of a machine learning algorithm. He also has no template with which to even consider such possibilities. As far as he’s concerned, he’s the more intelligent being, and his protestations and alerts are his charitable attempt to manage his large, dim-witted flock members.
If my parrot cannot process the nature of my intelligence, why do we suppose we can understand the nature of an AGI’s? Its transcendence might prove as invisible to us as my own is to my bird.
Futures
There are design patterns that stand the test of time,16 and there are approaches that seem like prisoners of their age.17 AGI is likely to arrive not only in our lifetimes but within the next couple of decades. Incentives from VCs will continue to encourage accelerated timelines that eschew investments in safety; the monetary incentives are too enormous to resist.
And thus, even if the “longtermists” are correct, there is a shorter-term risk with which to be concerned. Even if we aren’t all fated to become paper clips next year, a diversity of approaches to this issue is mathematically justified. Consider a simple thought experiment. Option 1 is to pursue 100 approaches, each with a 1% chance of saving the world. Option 2 is to pursue two or three well-worn paths, each with a 20% chance of saving the world.18 The former is what happens when barriers fall and new participants enter a field. The latter is where we sit today: just a few approaches garner the majority of funding and energy.
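For concreteness, the arithmetic behind footnote 18 fits in a few lines (treating the approaches as independent, which is of course a simplification):

```python
# Probability that at least one of n independent approaches succeeds,
# given a per-approach success rate p.
def p_any_success(n: int, p: float) -> float:
    return 1 - (1 - p) ** n

print(p_any_success(100, 0.01))  # ~0.634: one hundred long-shot approaches
print(p_any_success(2, 0.20))    #  0.360: two well-worn paths
print(p_any_success(3, 0.20))    # ~0.488: three well-worn paths
```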
And remember when you next fasten your seatbelt—it is AGI, not the erratic driver, that is (depending on whose estimate you believe) 10x, 100x, or even 1,000x more likely to end your life.
1 Assuming you aren’t currently terminally ill or observing Slim Pickens as Major Kong overhead.
2 “Culus” means ‘anus’ in Latin. This would then suggest that “Metaculus” is, potentially, a detailed (perhaps “anal”) discussion of…anuses? Just thought this was worth mentioning.
3 In other words, before we generate a form of general intelligence, what are the odds we’ve devised some method to ensure its ambitions will align with ours, and some mechanism to refine and modify its activities and values before its intelligence exceeds ours? Otherwise, we’re just hoping it has the best of intentions.
4 It seems almost quaint that seven years ago, societal concerns focused on the risks of a Trump presidency and the possibility that the Cubs winning a World Series might qualify as a sign of the apocalypse.
5 Here’s an argument that the probability of AGI ruin in our lifetime exceeds 90%, along with links to other relevant discourse on the topic.
6 William MacAskill estimates the probability of AGI arriving in the next 50 years at roughly 10% and the probability of a misaligned AGI ultimately ending our lives or thoroughly eliminating our autonomy at roughly 3%. Other effective altruists think this number is too low.
7 Even nuclear war presents lifetime risks of an order of magnitude similar to cancer or heart disease according to some researchers.
8 Grabbing some data from the folks who write insurance contracts.
9 Not only might AGI risk exceed those probabilities according to some estimates, but the available medical care for the most common medical conditions will likely improve as we all approach our golden years…can we say the same about AGI?
10 For instance, while cancer and heart disease claim millions of lives each year, there is no shortage of intellectual and monetary capital devoted to improving outcomes. Climate change does not struggle to find attention and research dollars. AGI might threaten a comparable number of lives, but the amount of energy devoted to solutions is orders of magnitude less.
11 Though folks like Sam Altman are doing their best and with sound reasoning.
12 The fluency with which you convey a thought or idea from the mind of its originator to a recipient. I’m doing it now, trying to choose words that will explain this idea to you, who must do some work to process and comprehend. This process is difficult for both of us. Relationships hinge on the distance between what one party intends and the other infers.
13 This is the 5th, and possibly most important, category of alignment research according to this post on LessWrong, labeled (aptly) “field-building.”
14 Yes, this seems troublesome, given that I’m asking you to consume verbiage from a member of that species and consider the merits of that verbiage. Seems unavoidable until a BCI can extract and articulate my thoughts more effectively.
15 Literally, upon his return to a home my wife and I had rented out for two years, he immediately led himself up the stairs to perch on a bathroom shower handle of which he is immensely fond.
16 Literally, books like “The Timeless Way of Building” are devoted to the subject.
17 Like, say, building ever-larger transformers with ever more training data as a pathway to intelligence?
18 The former has odds of success of 1 - 0.99^100, which is roughly 63% (as n grows, the chance of at least one success from n independent trials at 1/n odds each approaches 1 - 1/e, about 63%). The latter has odds of 1 - 0.8^2 = 36% with two paths, or 1 - 0.8^3 ≈ 49% with three.