Key takeaways from our EA and alignment research surveys
by Cameron Berg, Judd Rosenblatt, Florin Pop, AE Studio
Many thanks to Spencer Greenberg, Lucius Caviola, Josh Lewis, John Bargh, Ben Pace, Diogo de Lucena, and Philip Gubbins for their valuable ideas and feedback at each stage of this project—as well as the ~125 alignment researchers who provided the data that made this project possible.
Background
Last month, AE Studio launched a survey for alignment researchers. We got some surprisingly interesting results, and we're excited to share them here.
We set out to better explore population-level dynamics within the alignment research community. We examined everything from demographics and personality traits to community views on specific alignment-related topics. We took on this project because it seemed to be largely unexplored and rife with potentially-very-high-value insights. In this post, we'll present what we think are the most important findings from this project.
We're also publicly releasing a tool we built for analyzing the dataset. The tool has some handy features, including customizable filtering of the dataset, distribution comparisons, automatic classification/regression experiments, LLM-powered custom queries, and more. We're excited for the wider community to use the tool to explore these questions further in whatever manner they desire. There are many open questions about the current psychological and intellectual make-up of the alignment research community that we haven't tackled here and that we hope others will explore further using the dataset.
(Note: if you want to see all results, navigate to the tool, select the analysis type of interest, and click 'Select All.' If you have additional questions not covered by the existing analyses, the GPT-4 integration at the bottom of the page should ideally help answer them. The code running the tool and the raw anonymized data are both also publicly available.)
We incentivized participation by offering to donate $40 per eligible respondent—strong participation enabled us to donate over $3,720 to AI safety organizations including AI Safety Camp ($1480), SERI MATS ($1040), FAR AI ($440), CAIS ($320), FHI ($240), and Catalyze Impact ($200). Thanks again to all of those who participated!
Three miscellaneous points on the goals and structure of this post before diving in:
Our goal here is to share the most impactful takeaways rather than simply regurgitating every conceivable result. This is largely why we are also releasing the data analysis tool, where anyone interested can explore the dataset and the results at whatever level of detail they please.
This post collectively represents what we at AE found to be the most relevant and interesting findings from these experiments. We sorted the TL;DR below by perceived importance of findings. We are personally excited about pursuing neglected approaches to alignment, but we have attempted to be as deliberate as possible throughout this write-up in striking the balance between presenting the results as straightforwardly as possible and sharing our views about implications of certain results where we thought it was appropriate.
This project was descriptive and exploratory in nature. Our goal was to cast a wide psychometric net in order to get a broad sense of the psychological and intellectual make-up of the alignment research community. We used standard frequentist statistical analyses to probe for significance where appropriate, but we still think it is important for ourselves and others to perform more tightly scoped follow-up experiments to replicate and further sharpen the key results presented here.
Seven key results and implications
Here we present each key result, ordered by perceived relevance, as well as what we think are the fundamental implications of that result. We hyperlink each result to the associated sections in this post for easier navigation.
(Please note that there are also a bunch of miscellaneous results that people have found interesting that are not included in this list or in the main body of the piece.)
Alignment researchers don't think the field is poised to solve alignment
Result: Alignment researchers generally do not believe that current alignment research is on track to solve alignment and do not think that current research agendas exhibit strong coverage of the full space of plausible alignment approaches. However, alignment researchers did prove impressively accurate at predicting the research community's overall views on the relative promise of various technical alignment research directions (as systematized by Shallow review of live agendas in alignment & safety).
Implications: Alignment researchers' general models of the field are well-calibrated, but the fact that they don't think the field is on track to solve alignment suggests that additional approaches should be pursued beyond what is currently being undertaken—a view that was also echoed repeatedly throughout alignment researchers' free responses. We think this result lends additional credence to pursuing neglected approaches to alignment.
Capabilities and alignment research not viewed as mutually exclusive
Result: Alignment researchers generally disagree with statements like 'alignment research that has some probability of advancing capabilities should not be done' and 'advancing AI capabilities and doing alignment research are mutually exclusive goals.' Interestingly, researchers also erroneously predicted that the community would generally view safety and capabilities work as incompatible.
Implications: This finding merits a more precise follow-up and discussion to better understand what exactly alignment researchers believe the relationship is and ideally should be between AI alignment and capabilities research—especially given that roughly two-thirds of alignment researchers also seem to support pausing or slowing AI development. Our general interpretation of this cluster of findings is that alignment researchers believe that capabilities research is proceeding so quickly and aggressively that the probability of alignment research being a meaningful contributor to further capabilities speed-ups is actually low—despite mispredicting that other alignment researchers would view this probability as higher. This alignment-versus-capabilities position is potentially quite action-guiding for policy efforts as well as technical alignment work.
Overestimating the perceived value of intelligence, underestimating 'softer' skills
Result: Respondents significantly overestimate how much high intelligence is actually valued in the alignment community. Alignment researchers also tend to underestimate how much the community actually values 'softer' skills like having a strong work ethic, the ability to collaborate, and people skills.
Implications: Those in charge of hiring/funding/bringing people into the alignment research community should consider (at least as a datapoint) what skills and traits are actually most valued within that community. They should probably treat high intelligence as something more like a necessary-but-not-unilaterally-sufficient trait rather than the ultimate criterion. We agree that softer skills like a strong work ethic and a strong ability to collaborate can render highly intelligent individuals dramatically more effective at driving results.
Alignment researchers think AGI >5 years away
Result: Alignment researchers generally do not expect there to be AGI within the next five years—but erroneously predict that the alignment research community does generally expect this.
Implications: Perceived timelines will naturally calibrate the speed and intensity of research being undertaken. If most AI safety researchers think they have >5 years to attempt to solve alignment, it might be worth funding and pursuing additional 'expedited' research agendas in case AGI arrives sooner than expected.
Moral foundations of alignment researchers
Result: Alignment researchers have reasonably distinct moral foundations. We tested a model of moral foundations that uses three factors: traditionalism, compassion, and liberty. While alignment researchers place low value on traditionalism and high value on compassion, the community is fairly normally distributed in how much it values liberty.
Implications: The generally-normally-distributed nature of the population on the moral foundation of liberty suggests that this value is either considered orthogonal to alignment philosophy (which seems less likely to us) or otherwise underexplored in relation to it (which seems more likely to us).
Personality traits and demographics of alignment researchers
Result: Alignment researchers score significantly higher than the general population in neuroticism, openness, conscientiousness, and extraversion (ordered here by the magnitude of the delta). Males outnumber females 9 to 1 in alignment. The community leans left politically and exhibits a diversity of other (albeit nonconservative) political views.
Implications: The community's heightened sensitivity to negative emotion and risk aversion may be part of what motivates interest in AI safety—but these traits may also prevent bold and potentially risky work from being pursued where it might be necessary to do so. Alignment researchers should also probably put explicit effort into recruiting highly qualified female researchers, especially given that current female alignment researchers generally do seem to have meaningfully different views on foundational questions about alignment.
Research direction prioritization
Result: Alignment researchers are most excited about evals and interpretability work, followed by various prosaic alignment approaches, are relatively less excited about 'make the AI solve it' approaches, and are even less excited about more theoretical approaches. Researchers proved impressively accurate at predicting these community views.
Implications: This prioritization should be tempered with the parallel findings that alignment researchers generally think current alignment research is not on track to solve alignment before AGI. We suspect that alignment researchers are most excited about evals and interpretability work because they feel they can make more direct, tangible, measurable, and prestigious progress in them in the short-term—but that these approaches appear to be something of a local optimum in the current research landscape rather than the global best strategy that will solve alignment.
Survey contents and motivation
We launched a survey for technical alignment researchers. The survey had three main sections.
First, we asked for general demographic information, including the extent to which the respondent has engaged with the alignment research community, as well as the nature of the role they currently play.
Next, we had respondents answer a series of Likert scale questions from a set of well-validated psychometric scales, including the Five Factor Model ('Big Five'), an updated version of the Moral Foundations Questionnaire (MFQ), and a number of other miscellaneous scales (probing things like risk-taking, delay discounting, self-control, and communal orientation). We included these questions because we think it is important to better understand the dominant cognitive and behavioral traits at play in the alignment research community, especially with an eye towards how these mechanisms might help uncover what otherwise-promising research directions are currently being neglected.
In the final part of the survey, we asked people to respond on five-point Likert scales (strongly disagree, somewhat disagree, …, strongly agree) to statements related to specific topics in alignment. These items were first framed in the general form 'I think X' (e.g., I think that alignment research is on track to solve alignment) and subsequently framed in the general form 'I think the community believes X' (e.g., I think the alignment research community as a whole believes that alignment research is on track to solve alignment).
Our motivation in this final section was two-fold: (1) we can straightforwardly understand the distribution of the community's views on a given relevant topic, but also (2) we can compare this ground truth distribution against individuals' predictions of the community's views in order to probe for false-consensus-effect-style results. Interestingly, we indeed found that the community significantly mispredicts its own views on key questions.
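For illustration, here is a minimal sketch of how the two framings of a single item could be compared, assuming a hypothetical CSV export with 1-5 Likert codes in columns named self_on_track and pred_on_track (the actual analyses in the tool may differ):

```python
# Minimal sketch of the ground-truth-vs-prediction comparison for a single item.
# The CSV filename and column names are hypothetical; both columns hold 1-5 Likert codes.
import pandas as pd
from scipy.stats import wilcoxon

df = pd.read_csv("alignment_survey_anonymized.csv")  # hypothetical export of the dataset
pairs = df[["self_on_track", "pred_on_track"]].dropna()

# Paired test: does each respondent's own view differ systematically from their
# prediction of the community's view on the same statement?
stat, p = wilcoxon(pairs["self_on_track"], pairs["pred_on_track"])
mean_shift = (pairs["pred_on_track"] - pairs["self_on_track"]).mean()
print(f"Wilcoxon W={stat:.1f}, p={p:.4f}, mean (predicted - own) = {mean_shift:+.2f}")
```

A significantly nonzero shift on an item is exactly the kind of false-consensus-style misprediction described above.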
Who took this survey?
Approximately 125 alignment researchers. We recruited virtually all of these participants by simply posting on LW, where we asked the community to fill out the survey via a simple Google Form.
We found that the sample includes people working across a wide diversity of research orgs at varying levels of seniority. For instance, 18% of the alignment sample self-identifies as actively leading or helping to lead an alignment org.
Here is the full list of the alignment orgs that had at least one researcher complete the survey (and whose researchers elected to share which org they work for): OpenAI, Meta, Anthropic, FHI, CMU, Redwood Research, Dalhousie University, AI Safety Camp, Astera Institute, Atlas Computing Institute, Model Evaluation and Threat Research (METR, formerly ARC Evals), Apart Research, Astra Fellowship, AI Standards Lab, Confirm Solutions Inc., PAISRI, MATS, FOCAL, EffiSciences, FAR AI, aintelope, Constellation, Causal Incentives Working Group, Formalizing Boundaries, AISC.
Of note, the majority of alignment researchers are under 30. Males outnumber females approximately 9 to 1 in alignment. While this gender distribution is not unfamiliar in engineering spaces, it certainly seems worth explicitly highlighting, especially to the degree that male and female alignment researchers do seem to exhibit meaningfully different views about the core aims of alignment research (including, critically, the very question of whether alignment research explicitly requires an engineering-style background).
Overall, we find that approximately 55% of alignment researchers identify as politically progressive to some extent. While there appear to be a negligible number of self-identified conservatives (n=4 in alignment), there does appear to be a diversity of other political views at play (including a significant number of highly unique written-in affiliations/leanings that we somewhat crudely lumped under 'Other'). It is worth noting that the lack of self-identified conservatives could fuel problems similar to those well-documented in academia, especially to the degree that policy advocacy is becoming an increasingly prominent aspect of AI safety work.
Roughly 40% of alignment researchers have been involved in the space for 2 or more years, and the dataset includes researchers at various stages of their careers, including a significant sample of researchers who are actively leading alignment organizations.
(As with each part of this write-up, there are numerous additional results in this section to explore that are not explicitly called out here. We also want to call out that we generally opted to keep the sample intact in subsequent analyses and found that adopting additional exclusion criteria does not statistically affect the key results reported here; the community can easily further filter the dataset however they see fit using the data analysis tool.)
Community views on specific topics (ground truth vs. predictions)
We asked the community to rate the extent to which they agreed with a number of specific claims in the general form, 'I think X' (e.g., I think alignment research is on track to solve alignment). Later on, we asked respondents to predict how their community in general would respond to these same questions in the general form, 'I think the alignment community as a whole believes X' (e.g., I think the alignment community as a whole believes that alignment research is on track to solve alignment). In this way, we position ourselves to be able to address two important questions simultaneously:
What do the ground truth distributions of views on specific field-level topics look like within the alignment community?
How do these ground truth distributions compare to the community's prediction of these distributions? In slightly less statistical language—how well does the community actually know itself?
Research direction prioritization (ground truth vs. predictions)
We asked the community to rate the extent to which they considered various alignment research directions to be promising—and proceeded to compare these distributions to the community's predictions of how others would respond in general.
In contrast to other surveyed communities, the alignment community proved impressively accurate at predicting its own views on the relative promise of various alignment research directions, as captured by the rough factor structure presented in Shallow review of live agendas in alignment & safety:
This result indicates that alignment researchers are most excited about evals and interpretability work, followed by various prosaic alignment approaches (eliminating deception, finetuning/model edits, goal robustness, etc.), are relatively less excited about 'make the AI solve it' approaches (the most prominent example being superalignment), and are even less excited about more theoretical approaches, including provably safe architectures, corrigibility, and the like. This result also clearly demonstrates that alignment researchers are well-calibrated in understanding that the community has this general prioritization.
As an organization that is particularly interested in pursuing neglected approaches (which would likely all fall into the unpopular 'theory work' bin), we certainly think it is worth cautioning (as many others did in free response questions) that this result only tells us what the idiosyncratic set of current alignment researchers think about what should be pursued within the general constraints of the Shallow review framework. We do not think it is valid to conclude from results like this that people should stop doing theory work and all become mechanistic interpretability researchers.
The prioritization here should also be tempered with the parallel findings that alignment researchers generally think (1) that current alignment research (i.e., everything encompassed by the Shallow review framework) is not on track to solve alignment before we get AGI, and (2) that the current research landscape does not demonstrate strong coverage of the space of plausible approaches:
Taken together, these results reinforce to us that additional neglected approaches to alignment are very much worth identifying and pursuing. We suspect that alignment researchers are most excited about evals and interpretability work because they feel they can make more direct, tangible, measurable, and prestigious progress in them in the short-term—but that these approaches appear to be something of a local optimum in the current research landscape rather than the global best strategy that will solve alignment.
Other interesting field-level distributions (ground truth vs. predictions)
In addition to research direction prioritization, we asked the community to share the extent to which they agreed with a number of claims specific to alignment research. All of these distributions are available in the data tool; in this section, we will only highlight and comment on what we think are the most relevant results.
We find that respondents vastly overestimate (≈2.5x) how much high intelligence is actually valued, and underestimate how much the community values other traits like a strong work ethic, the ability to collaborate, and people skills. One plausible interpretation of this finding is that alignment researchers actually believe that high intelligence is necessary but not sufficient for being impactful—but perceive other alignment researchers as thinking high intelligence is basically sufficient. The community coming into alignment on these questions seems of very high practical importance for hiring/grantmaking criteria and decision-making.
We asked alignment researchers multiple questions to evaluate the extent to which they generally view capabilities research and alignment research as compatible. Interestingly, researchers predicted that the community would view progress in capabilities and alignment as fundamentally incompatible, but the community actually skews fairly strongly in the opposite direction—i.e., towards thinking that capabilities and alignment are decidedly not mutually exclusive. As described earlier, our general interpretation of this cluster of findings is that alignment researchers believe that capabilities research is proceeding so quickly that the probability of alignment research being a meaningful contributor to further capabilities speed-ups is actually low—despite mispredicting that other alignment researchers would view this probability as higher.
We find this mismatch particularly interesting for our own alignment agenda and intend to follow up on the implications of this specific development in later work.
Full text of left item: 'Alignment research that has some probability of also advancing capabilities should not be done.' Full text of right item: 'Advancing AI capabilities and doing alignment research are mutually exclusive goals.'
Another relevant misprediction of note relates to AGI timelines. Most alignment researchers do not actively expect there to be AGI in the next five years—but incorrectly predict that other alignment researchers do expect this in general. In other words, this distribution's skew was systematically mispredicted. Similar distributions can be seen for the related item, 'I expect there will be superintelligent AI in the next five years.'
Finally, we share here that the majority of alignment researchers (>55%) agree to some extent that alignment should be a more multidisciplinary field, despite community expectations of a more lukewarm response to this question.
Personality, values, moral foundations
Background on the Big Five
There are many different models of personality (≈ 'broad patterns of behavior and cognition over time'). The Five Factor Model, or 'Big Five,' is widely considered to be the most scientifically rigorous personality model (though it certainly isn't without its own criticisms). It was developed by performing factor analyses on participants' ratings over thousands of self-descriptions, and has been generally replicated cross-culturally and over time. Big Five scores for a given individual are also demonstrated to remain fairly consistent over the lifespan. For these reasons, we used this model to measure personality traits in the alignment sample.
(We show later on that Big Five + Moral Foundations scores can be used to predict alignment-specific views of researchers significantly above chance level, demonstrating that these tools are picking up on some predictive signal.)
The five factors/traits are as follows:
Openness: Creativity and willingness to explore new experiences. Lower scores indicate a preference for routine and tradition, while higher scores denote a readiness to engage with new ideas and experiences.
Conscientiousness: Organization, thoroughness, and responsibility. Individuals with lower scores might tend towards spontaneity and flexibility, whereas those with higher scores demonstrate meticulousness and reliability.
Extraversion: Outgoingness, energy, and sociability. Lower scores are characteristic of introverted, reflective, and reserved individuals, while higher scores are indicative of sociability, enthusiasm, and assertiveness.
Agreeableness: Cooperativeness, compassion, and friendliness. Lower scores may suggest a more competitive or skeptical approach to social interactions, whereas higher scores reflect a predisposition towards empathy and cooperation.
Neuroticism: Tendency and sensitivity towards negative emotionality. Lower scores suggest emotional stability and resilience, whereas higher scores indicate a greater susceptibility to stress and mood swings.
Personality similarities and differences
In general, the results of the Big Five assessment we administered indicate that alignment researchers tend to be fairly extraverted, moderately neurotic, intellectually open-minded, generally industrious, and generally quite compassionate. Compared to the general population, alignment researchers are significantly more extraverted, conscientious, neurotic, and open.
This result is not the first to demonstrate that the psychological combination of intellectualism (≈ openness), competence (≈ conscientiousness), and compassion (≈ agreeableness) corresponds intuitively to the core philosophies of AI alignment.
It is also somewhat unsurprising that two key differentiators between the alignment community and the general population appear to be (1) significantly higher sensitivity to negative emotion and (2) significantly higher openness. It seems clear that individuals attracted to alignment are particularly calibrated towards avoidance of negative long-term outcomes, which seems to be reflected not only in the community's higher neuroticism scores, but also in our measurements of fairly tepid attitudes towards risk-taking in general. Additionally, higher openness should certainly be expected in communities organized around ideas, rationality, and intellectual exchange. However, it also seems likely that alignment researchers may score significantly higher in intellect (often described as 'truth-oriented')—one of the two aspects/constituent factors of trait openness—than openness to experience (often described as 'beauty-oriented'). Pinning down this asymmetry more precisely seems like one interesting direction for follow-up work.
Though it was out of scope for this report, we are also excited about better understanding the extent to which there might be 'neglected' personalities in alignment—i.e., whether there are certain trait configurations that are typically associated with research/organizational success that are currently underrepresented in the community. To give one example hypothesis, it may be the case that consistently deprioritizing openness to experience (beauty-orientedness) in favor of intellect (truth-orientedness) may lead to organizational and research environments that prevent the most effective and resonant possible creative/intellectual work from being done. We are also interested in better understanding whether there is a clear relationship between 'neglected' personalities and neglected approaches to alignment—that is, to what degree including (or not including) specific kinds of thinkers in alignment would have a predictable impact on research directions.
Trait scores were zero-indexed. Note that the general population sample was directly taken from the paper that validated the specific Big Five scales used in this project; the alignment sample was measured directly in our survey.
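As one concrete way to probe these trait-level differences, a hedged sketch follows; the filenames and column names are assumptions, and the specific test shown (a Welch's t-test per trait) is an illustrative choice rather than necessarily the analysis behind the figure:

```python
# Hedged sketch of the trait-level comparison; filenames and column names are assumptions.
import pandas as pd
from scipy.stats import ttest_ind

alignment = pd.read_csv("alignment_big5.csv")          # survey sample (hypothetical file)
general = pd.read_csv("general_population_big5.csv")   # validation-paper sample (hypothetical file)

for trait in ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]:
    t, p = ttest_ind(alignment[trait], general[trait], equal_var=False)  # Welch's t-test
    delta = alignment[trait].mean() - general[trait].mean()
    print(f"{trait:>17}: delta={delta:+.2f}, t={t:+.2f}, p={p:.4f}")
```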
Alignment researchers have distinct moral foundations
Moral foundations theory posits that the latent variables underlying moral judgments are modularized to some extent and are validly captured (like the Big Five) via factor analysis/dimensionality reduction techniques. Our implementation of the Moral Foundations Questionnaire (MFQ) directly operationalizes this paper, which finds three clear factors underlying the original model:
Traditionalism: Values social stability, respect for authority, and community traditions, emphasizing loyalty and social norms. Lower scores may lean towards change and flexibility, whereas higher scores uphold authority and tradition.
Compassion: Centers on empathy, care for the vulnerable, and fairness, advocating for treating individuals based on need rather than status. Lower scores might place less emphasis on individual care, while higher scores demonstrate deep empathy and fairness.
Liberty: Prioritizes individual freedom and autonomy, resisting excessive governmental control and supporting the right to personal wealth. Lower scores may be more accepting of government intervention, while higher scores champion personal freedom and autonomy.
We find in general that alignment researchers score low on traditionalism, high on compassion, and are distributed roughly normally on liberty. Note that Likert items (strongly disagree, somewhat disagree, ..., strongly agree) are represented numerically below, where 1 = strongly disagree, and so on.
Note the slightly truncated x-axis, given that virtually no one in the group scored <2 on this scale.
Considering each of these three results in turn:
It is not very surprising that alignment researchers are low in traditionalism, which is typically associated with conservatism and more deontological/rule-based ethical systems. Worrying about issues like rogue AI might indeed be considered the epitome of 'untraditional' ethics. This result naturally pairs with the finding that there are virtually no conservative alignment researchers, which may have important implications for viewpoint diversity and neglected approaches in the community.
Alignment researchers clearly value compassion from a moral perspective. This result likely can be explained by the fundamental motivation to prevent AI-related catastrophes that could harm humanity.
Interestingly, alignment researchers are generally-normally-distributed on liberty as a moral foundation, with a slight positive skew (towards liberty). It is worth noting that while the philosophy of AI safety has a clear expected relationship to traditionalism (low) and compassion (high), it seems plausibly agnostic to liberty as a moral value, potentially explaining the generally-normally-distributed nature of the population. This finding invites further reflection within the community on how liberty as a moral foundation relates to their work. For example, the implementation details of an AI development pause seemingly have a clear relationship to liberty (as we actually demonstrate quantitatively later on). Given that alignment researchers seem to care both about liberty and AI x-risk, it would be interesting for follow-up work to better understand, for example, how researchers would react to a government-enforced pause.
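As a concrete illustration of how factor scores like these can be derived from the raw Likert responses, here is a small scoring sketch; the item-to-factor mapping and column names are placeholders rather than the actual MFQ item assignment, and each factor score is simply taken as the mean of its 1-5 coded items:

```python
# Illustrative factor scoring; item names are placeholders, not the real MFQ item set.
import pandas as pd

df = pd.read_csv("alignment_survey_anonymized.csv")  # hypothetical export of the dataset

factors = {
    "traditionalism": ["mfq_authority_1", "mfq_loyalty_1", "mfq_purity_1"],
    "compassion":     ["mfq_care_1", "mfq_fairness_1"],
    "liberty":        ["mfq_liberty_1", "mfq_liberty_2"],
}

# Each factor score is the mean of its constituent Likert items (coded 1-5).
scores = pd.DataFrame({name: df[items].mean(axis=1) for name, items in factors.items()})
print(scores.agg(["mean", "std"]))
```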
Free responses from alignment survey
On the alignment survey, we asked respondents three optional free-response questions:
- What, if anything, do you think is neglected in the current alignment research landscape? Why do you think it is neglected?
- How would you characterize the typical alignment researcher? What are the key ways, if any, that you perceive the typical alignment researcher as unique from the typical layperson, the typical researcher, and/or the typical rationalist type?
- Do you have any other insights about the current state of alignment research that you'd like to share that seems relevant to the contents of this survey?
Given the quantity of the feedback and the fact that we ourselves have strong priors about these questions, we elected to simply aggregate responses for each question and pass them to an LLM to synthesize a coherent and comprehensive overview.
Here is that output (note: it is ~60% the length of this post), along with the anonymized text of the respondents.
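For transparency about the shape of that step, here is a rough sketch of the aggregate-then-summarize approach; the model choice, prompt, and client library shown are illustrative assumptions rather than a record of exactly what was run:

```python
# Rough sketch of the free-response synthesis step; model, prompt, and client are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

def synthesize(question: str, responses: list[str], model: str = "gpt-4") -> str:
    joined = "\n\n---\n\n".join(responses)  # aggregate all anonymized responses to one question
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Synthesize a coherent, comprehensive overview of these survey responses."},
            {"role": "user",
             "content": f"Question: {question}\n\nResponses:\n{joined}"},
        ],
    )
    return completion.choices[0].message.content
```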
Our four biggest takeaways from the free responses (consider this an opinionated TL;DR):
- The field is seen as suffering from discoordination and a lack of consensus on research strategies, compounded by a community described as small, insular, and overly influenced by a few thought leaders. It is important to highlight the significant worries about the lack of self-correction mechanisms and objective measures of research impact, which suggest the need for further introspection on how the community evaluates progress and success. Addressing both of these concerns strikes us as a potentially highly impactful neglected 'meta-approach' that would be worthwhile to fund and/or pursue further.
- There were numerous specific calls for interdisciplinary involvement in alignment, including multiple calls for collaboration with cognitive psychologists and behavioral scientists. We were excited to see that brain-like AGI was highlighted as one neglected approach that was construed as both accessible and potentially-high-impact for new promising researchers entering the space.
- The alignment community perceives itself to be distinguished by its members' high intellectual capacity and mathematical ability, specialized technical knowledge, high agency, pragmatic altruism, and excellent epistemic practices. Distinct from typical rationalist types, they're noted for their STEM background, practical engagement with technical AI issues, and a combination of ambition with intrinsic motivation. They also believe they are perceived as less experienced and sometimes less realistic than their peers in cognitive sciences or typical ML researchers.
- The community also shared concerns about the ambiguous standards defining alignment researchers, potentially skewing the field towards rewarding effective communication over substantive research progress. Critiques also extend to the research direction and quality, with some arguing that emphasis on intelligence may overlook creativity and diverse contributions (a finding we replicate in more quantitative terms elsewhere).
Concluding thoughts
Thanks again to the alignment research community for their participation in this survey, which has enabled all of the analysis presented here, as well as over $3,720 in donations to AI safety organizations. We want to emphasize that we perceive this write-up to be a first pass on the dataset rather than the last word, and we'd like to strongly encourage those who are interested to explore the data analysis tool we built alongside this project (as well as the full, anonymized dataset). We suspect that there are other interesting results to be found that we have not yet uncovered and are very excited to see what else the community can unearth (please do share any results you find and we will add them to this post!).
One practical thought: we were most surprised by the community misprediction/false consensus effect results. Accordingly, we think it is probably worth probing alignment between (1) group X's perception of group X's views 'as a whole' and (2) group X's actual views fairly regularly, akin to calibration training in forecasting. Group-level self-misperceptions are a clear coordination problem that should likely be explicitly minimized through some kind of active training or reporting process. (A more precise future tool might enable users to predict the full shape of the distribution to avoid noise in varying statistical interpretations of (1) above.)
To end on a positive note, we highlight one final significant community misprediction:
This demonstrates that alignment researchers are significantly more optimistic about making meaningful alignment progress before AGI is developed than they predicted the community would be. In other words: alignment researchers currently don't think that other alignment researchers are particularly hopeful about making progress, but they actually are! (Or at the very least, they are explicitly not pessimistic.) So we'd like to strongly encourage researchers to go out and continue doing the hard work with this understanding in mind, particularly with respect to the more underexplored areas of the alignment research landscape.
Thanks very much for your engagement with this project, and we are looking forward to seeing what other interesting results the community can discover.
Appendix: other interesting miscellaneous findings (in no particular order)
Using temperament to predict alignment positions
An interesting (though not particularly actionable) classification result:
Predicted alignment positions shown on the y-axis. Predictive accuracy of classifier shown on x-axis. Dotted red line indicates chance-level.
We show that respondents' trait-level scores from the psychometric instruments deployed can be used to predict alignment researchers' positions on the various alignment-specific questions significantly above chance level using a simple Random Forest Classifier (with balanced class weights). Feature importances reveal that many such predictions are based on seemingly sensible features—for instance, for the statement, "I currently support pausing or dramatically slowing AI development," the feature with the single highest importance is one's liberty moral foundation score, which makes a good deal of sense. For the "promise seen in controlling the AI (deception, model edits, value alignment, goal robustness)" question, the single feature with the highest importance is, quite intriguingly, one's own self-control score on the Brief Self-Control Scale.
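For concreteness, a minimal sketch of this kind of experiment is below; the tool's own implementation may differ, and the filename, feature columns, and the binarization of the label are assumptions:

```python
# Minimal sketch: predict a binarized alignment-survey item from psychometric trait scores.
# Filename, column names, and the binarization threshold are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("alignment_survey_anonymized.csv")

trait_cols = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism",
              "traditionalism", "compassion", "liberty", "self_control"]
X = df[trait_cols]
y = (df["support_pause"] >= 4).astype(int)  # 1 = somewhat/strongly agrees with pausing

clf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
print("Cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())

# Which traits carry the signal (e.g., the liberty foundation for the pause item)?
clf.fit(X, y)
for name, importance in sorted(zip(trait_cols, clf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:>17}: {importance:.3f}")
```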
The purpose of this analysis is to demonstrate that, while undoubtedly imperfect, these psychometric tools can indeed be used to help predict real-world psychological variables in sensible and interesting ways—which in turn can yield interesting practical implications for field-building, pursuing novel approaches, and the like.
Gender differences in alignment
We show here that female alignment researchers are slightly less likely to think of alignment as fundamentally related to control rather than coexistence, more likely to think that alignment should be more multidisciplinary, and slightly less likely to think that alignment researchers require a CS, math, physics, engineering, or similar background. Given that female researchers seem to have meaningfully different views on key questions about the nature of alignment research and are dramatically outnumbered by males (9 to 1), it may be worth explicitly attempting to recruit a larger number of well-qualified female alignment researchers into the fold.
Alignment researchers exhibit very low future discounting rates
This plot shows that >70% of alignment researchers exhibit extremely low future discounting.
As additional convergent evidence supporting the they-are-who-they-say-they-are conclusion, alignment researchers demonstrate very low future discounting rates as measured using a subset of questions from the Monetary Choice Questionnaire. (This tool can basically be thought of as a more quantitative version of the famous marshmallow test and has been shown to correlate with a number of real-world variables.) Having very low discounting rates makes quite a lot of sense for rationalist longtermist thinkers.
One particularly interesting finding related to this metric is that k-value correlates modestly (r=0.19, p=0.03) with support for pursuing theory work in alignment. One plausible interpretation of this result is that those who discount the future more aggressively—and who might have a diminished sense of the urgency of alignment research as a result—also think it is more promising to pursue alignment approaches that are less immediately practical (i.e., theory work).
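To make the k-value concrete, here is a hedged sketch of one standard way to estimate a hyperbolic discounting rate from choice data and then correlate it with the theory-work item; the item parameters, column names, and scoring details are illustrative assumptions rather than our exact procedure:

```python
# Hedged sketch: estimate a hyperbolic discounting k per respondent, then correlate it with
# support for theory work. Item parameters and column names are illustrative assumptions.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# (smaller immediate reward, larger delayed reward, delay in days) for each MCQ-style item
items = [(54, 55, 117), (55, 75, 61), (19, 25, 53), (31, 85, 7), (14, 25, 19)]

def estimate_k(choices):
    """choices[i] == 1 if the respondent chose the delayed reward on item i."""
    candidate_ks = np.logspace(-4, 0, 200)  # plausible range of hyperbolic discount rates
    def consistency(k):
        # Under V = A / (1 + k*D), the delayed option is preferred when its discounted value is higher.
        predicted = [ll / (1 + k * d) > ss for ss, ll, d in items]
        return sum(p == bool(c) for p, c in zip(predicted, choices))
    return max(candidate_ks, key=consistency)  # k most consistent with the observed choices

df = pd.read_csv("alignment_survey_anonymized.csv")
choice_cols = [f"mcq_{i + 1}" for i in range(len(items))]    # hypothetical columns, 1 = chose delayed
df["k"] = df[choice_cols].apply(lambda row: estimate_k(row.values), axis=1)

r, p = pearsonr(df["k"], df["theory_work_support"])          # hypothetical 1-5 Likert column
print(f"r = {r:.2f}, p = {p:.3f}")
```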
Alignment researchers aren't huge risk-takers
Example items that comprise the General Risk Propensity Scale.
We show that alignment researchers are generally normally distributed with a slight negative skew on risk-taking as captured by the General Risk Propensity Scale, with less than 15% of individuals displaying a strong risk-taking temperament (≥4 on the scale above). The example item-level responses shown below the scale-level plot illustrate what drives this effect.
Alignment researchers support a pause
It is very clear that alignment researchers generally support pausing or dramatically slowing AI development (>60% agreement), which naturally pairs with the finding that alignment researchers do not think we are currently on track to solve alignment before we get AGI.
Alignment org leaders are highly optimistic by temperament
In blue are respondents who actively lead alignment orgs, and in red are all other alignment researchers. We probed trait optimism (i.e., not optimism about alignment specifically) in the survey using items like "I see myself as someone who is an optimist," "...who has a 'glass-half-full' mentality," etc., and found an interesting pocket of extremely optimistic alignment org leaders! This finding suggests an important (if somewhat obvious) motivating factor of good leaders: genuinely believing that effortfully pushing forward impactful work is likely to yield very positive outcomes.
[Any additional interesting results found by the community will be added here!]