AI Safety
A collection of my activity in alignment and AI safety
What Happened to Iterated Amplification?
Key terms: Literature Review, Technical Alignment
Iterated (Distillation and) Amplification (IDA) is a central and interesting concept in AI safety. It seemed odd, however, that little has been published about it since 2018-2019.
We dug into the topic to figure out why: was the research track dropped due to some serious issue, is it no longer needed, or are people still working on it outside the spotlight? Beyond that, what key research questions has IDA led to as of 2024?
We argue that several of the most notable superalignment approaches in 2024 fall under the IDA umbrella, and we summarize the challenges that are repeatedly raised as their most critical.
Make Alignment Easier with a Dual-System AI Framework
Key terms: Governance, Alignment Problem
The full alignment problem seems really hard. By this we mean the situation where, in the future, superintelligent models beyond our understanding are allowed to design and optimize their own architectures in ways that we cannot follow, and to carry out large-scale optimization in society.
However, this more dangerous and unconstrained situation may still be rather far off. As Max Tegmark (TODO) and many others have pointed out, AI may provide revolutionary value by improving lives through research and other developments. We can likely already achieve this, however, with AI that is only slightly more capable than us, rather than needing to go immediately to the most extreme, unrestricted form of optimized systems.
For that reason, we attempt here to define ways in which models can be designed to be safe in the medium term while still providing most of the value that nations, corporations, and individuals seek from them, without needing to tackle the full alignment problem.
We argue that two different kinds of AI systems are needed and that each has its own technical alignment problems, which are strictly easier to address than the full alignment problem. This should make it more feasible to solve the problems that actually need to be solved.
Engagement
Lab Fellow at Apart Research
Conducting research through the fantastically well-run non-profit Apart Research on the question of how to reliably evaluate LLMs.
Program Facilitator at AI Safety Sweden
Assisting the organization AI Safety Sweden in developing courses and programs to introduce more researchers to the field, with a particular focus on how to get people involved in research and help them meet collaborators.
Courses
- AI Safety Fundamentals - Advanced AI Safety technical course with Budapest AI (ongoing)
- AI Safety Fundamentals - Alignment Course (completed 2023)
- AI Safety Fundamentals - Governance Course (completed 2023)