AE and AI Alignment

At AE Studio, we specialize in tackling the most ambitious and important challenges we can get our hands on by focusing on neglected approaches with high potential impact.

Our journey didn't begin with AI alignment. We first applied a similar neglected strategy to Brain Computer Interfaces (BCI): we started by bootstrapping a sustainable software consulting business, developing our own startups, and reinvesting the proceeds into BCI research. This approach quickly led us to collaborations with leading scientists and companies (like Forest Neurotech and Blackrock Neurotech), pushing the boundaries of brain-computer interaction faster than we’d imagined possible.

Today, we're a team of about 160 talented individuals - programmers, product designers, and data scientists - united by our mission to increase human agency. We're profitable, growing, and guided solely by our own standards of excellence.

Our success stems from treating our clients' businesses as if they were our own startups. This mindset has propelled us further than we initially envisioned when we first developed this ambitious plan.

As the world evolves, so do we. With shortening AGI timelines and so much still to be done in AI alignment, we realized that our unique process could be applied to reducing existential risk from AI. Accordingly, we’re now applying our expertise and business model in neglected approaches to one of the most critical challenges of our era: AI alignment. As part of this effort—and beyond our research—we’ve also been accelerating AI alignment startups, working with clients like Goodfire AI and NotADoctor.ai.

You can contact us here.

Why AE Studio

At AE Studio, we have the expertise and the right incentives to solve the problem. Unlike other organizations, the financial incentive to expedite AGI development doesn’t drive us. We have a profitable consulting business that allows us to explore other areas that don’t necessarily attract significant research funding or other sources of revenue.

Our 'Neglected Approaches' Approach

At AE, we believe the space of plausible alignment research directions is vast and largely unexplored. Our 'Neglected Approaches' approach focuses on pursuing a diverse set of promising but overlooked approaches to AI alignment.

Key aspects of our approach include:

We adopt an optimistic, exploratory mindset towards creative and plausible alignment directions.
We leverage our expertise in BCI, neuroscience, and machine learning.
We pursue multiple neglected approaches simultaneously, increasing our chances of breakthroughs.
We actively collaborate with the broader alignment community.

Our technical work includes reverse-engineering prosociality, BCI-enhanced alignment research, and other innovative approaches. Complementing this, we're also engaged in neglected AI policy initiatives. We're advocating for increased alignment funding, exploring ways to empower whistleblowers in the AI industry, and working to bridge political divides in AI alignment discussions. These policy efforts aim to create a more favorable environment for responsible AI development and effective alignment research.

Our goal is to ensure superintelligent AI systems don't pose existential risks while increasing human agency and flourishing. We aim to use our proven project success structures to bring more experts into the field, reducing the talent gap, and rapidly develop impactful ideas into fully testable implementations.

By taking this 'Neglected Approaches' approach, we aim to contribute unique insights to AI alignment, tackling this critical challenge from multiple, often overlooked angles in both technical and policy domains.

Some of our Public Alignment Contributions

Despite being relatively new to the field of AI alignment, we've already made several contributions that we’re excited about

'Neglected Approaches' Approach: We described our alignment research agenda, focusing on neglected approaches, which received significant positive feedback from the community and has updated the broader alignment ecosystem towards embracing the notion of neglected approaches. Notably, some of the neglected approaches we propose could have a negative alignment tax, a concept we elaborate on in our LessWrong post "The case for a negative alignment tax" that challenges traditional assumptions about the relationship between AI capabilities and alignment.
We also discussed our approach to alignment, AI x-risks, and many other topics in a couple of podcasts:
Biologically Inspired AI Alignment: Exploring Neglected Approaches with AE Studio's Judd and Mike @ The Cognitive Revolution
Is Artificial Intelligence a Threat to Humanity? Judd Rosenblatt Discusses AI Safety and Alignment @ Superhuman AI
Attention Schema Theory: We published a paper on the "Unexpected Benefits of Self-Modeling in Neural Systems", where neural networks learn to predict their internal states as an auxiliary task, which changes them in a fundamental way. This work was presented at the Science of Consciousness 2024 and Mila’s NeuroAI Seminar. There are also a lot of interesting discussions on the implications of this work and general public excitement about it in the Twitter thread we released with the paper.
Self-Other Overlap: We published a LessWrong post explaining the concept of self-other overlap, a method inspired by mechanisms fostering human prosociality that aligns an AI’s representations of itself and others. It also shows our initial results with this methodology on a reinforcement learning model. We posted the highlights on this Twitter thread.
“Not obviously stupid on a very quick skim… I rarely give any review this positive… Congrats.” - Eliezer Yudkowsky
A follow-up post on LessWrong, which expanded our research with new experimental scenarios, was met with significant engagement. Highlights can be seen in this X post.
“This is the most constructive version of alignment work I have seen for LLMs so far.” - Emmett Shear (prev. CEO of Twitch, prev. interim CEO of OpenAI)
The Alignment Survey: We ran a comprehensive survey of over 100 alignment researchers and 250+ EA community members. This survey provided valuable insights into the current psychological and epistemological priors of the alignment community. Notably, we found that alignment researchers don’t believe that we are tracking to solve alignment, and relatedly, that current research doesn't adequately cover the space of plausible approaches to alignment, which reinforces our perspective of the importance of pursuing neglected approaches. Along with the results, we released an interactive data analysis tool so everyone can explore the data independently.
SXSW Panel on Conscious AI: We hosted a panel discussion at SXSW about the path to Conscious AI, highlighting the importance of AI consciousness research, and discussed it in a LessWrong post. And we had also hosted two other SXSW panels on BCI in the past. We’re now regularly collaborating with the top thinkers in this space (if this sounds like you, we encourage you to reach out to us).
AI Alignment Startups Initiative: We published a thought piece making the case for more startups in the alignment ecosystem, arguing that the incentives and structures of startups can be particularly effective for alignment work, as well as driving new funding and talent into the space. We’ve also made some Alignment-focused investments, which go into AE's equity plan for team members with the new goal of getting everyone diversified exposure to an exponentially larger post-human economy, while simultaneously advancing AI alignment.
ICLR Paper on Reason-Based Deception: Our paper on LLM reason-based deception was presented at the ICLR AGI workshop and can be found in arXiv.
PromptInject: Vulnerability Study on Language Models: Earlier on, we published a paper on arXiv titled "Ignore Previous Prompt: Attack Techniques For Language Models", which won the best paper award at the 2022 NeurIPS ML Safety Workshop.
Educational Content: We've created accessible content on important alignment concepts, including a DIY implementation of RLHF and a video on Guaranteed Safe AI.
Research Funding: We've sponsored various events focused on AI alignment and whole brain emulation, including Foresight Institute events and a Brainmind event. We've also funded research by Professor Michael Graziano to continue his research on attention schema theory and Joel Saarinen for his value learning research.

And Here's Some Cool Stuff on our Earlier BCI work

Our original theory of change involved enhancing human cognitive capabilities to address challenges like AI alignment. While we're now exploring multiple approaches to AI alignment, we continue to see potential in BCI technology. If AI-driven scientific automation progresses safely, we anticipate increased investment in BCI research. We're also advocating for government funding to be directed towards this approach, as it represents an opportunity to augment human intelligence alongside AI development.

While our emphasis has shifted towards AI alignment, our work in Brain-Computer Interfaces (BCI) remains an important part of our mission to enhance human agency:

Collaboration with top companies in the space: We've joined forces with leading BCI companies like Forest Neurotech and Blackrock Neurotech, helping to bridge the gap between academic research and industry applications.
Neural Latent Benchmark Challenge: We won first place in this challenge to develop the best ML models to predict neural data topping the best research labs in the space.
Open-Source Tools: We've developed and open-sourced several tools for the propel and democratize BCI development, like the Neural Data Simulator that facilitated the development of closed-loop BCIs, and the Neurotech Development Kit to model transcranial brain stimulation technologies. These tools have contributed to lowering barriers in BCI research and development.
Privacy-Preserving ML: We've developed secure methods for analyzing neural data and training privacy-preserving machine learning models, addressing crucial ethical considerations in BCI development.
Neuro Metadata Standards: We led the development of widely accepted neuro metadata standards and tools, supporting open-source neuro-analysis software projects like MNE, OpenEphys, and Lab Streaming Layer.

As we face increasingly urgent AGI timelines, we are intensifying our efforts to identify the most impactful paths forward. Our goal remains to leverage BCI technology to enhance human intelligence, ultimately contributing to solving the alignment problem. While our precise alignment x BCI strategy is still being internally debated and refined, we are committed to ambitious initiatives that push the boundaries of what BCI can achieve. We believe that with substantial funding directed toward AI alignment and BCI research, we can make significant strides. Moreover, in a future where scientific automation progresses safely, we aim to rapidly advance BCI technology to empower humans in tackling alignment challenges more effectively.

You can contact us here.

AE and AI Alignment

Why AE Studio

Our 'Neglected Approaches' Approach

Some of our Public Alignment Contributions

And Here's Some Cool Stuff on our Earlier BCI work

Sound like fun?