Cover letter: what I did

I'm interested in developing the area of Cooperative AI. I believe that improving our frameworks and tools for interpreting how AI agents form their objectives and model the environment around them, and building sturdy mechanisms that enforce cooperative outcomes with humans, are paramount to the development of safe AGI.

To achieve this goal, I'm working with Prof. Wolfram Barfuss. We are studying the application of Temporal-Difference (TD) learning to replicate reinforcement learning dynamics, and our work extends to investigating the transition from competitive dynamics, like the tragedy of the commons, to cooperative ones under specific environmental conditions. Understanding how to move from scenarios where agents fail to cooperate to scenarios of successful cooperation is central to AI safety.

We're also interested in interpreting agent cognition. Current interpretability efforts only account for today's architectures; as new architectures arise, our understanding of these systems has to be rebuilt from the ground up, which does not scale well. The approach we're pursuing aims to bridge the gap between deterministic learning dynamics and more data-efficient algorithms, targeting a resource-rational account that will hopefully offer a more generic and resilient way to interpret AI systems. In the future, we also hope to formalize the exploration-exploitation balance in relation to agents' experience levels and their outlook on the future, which could lead to novel algorithmic developments. Better cooperative algorithms can help assure safer interaction between future AI systems and humans.
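To illustrate the basic building block involved, here is a minimal sketch of tabular TD(0) value prediction on a toy random-walk environment; the environment, step size, and discount are illustrative placeholders, not our actual research setup:

```python
import numpy as np

# Minimal sketch of tabular TD(0) prediction, assuming a toy 5-state
# random walk; all parameters here are illustrative only.
rng = np.random.default_rng(0)
n_states = 5          # non-terminal states, terminals just beyond both ends
V = np.zeros(n_states)
alpha, gamma = 0.1, 0.99

for episode in range(1000):
    s = n_states // 2                     # start in the middle
    while True:
        s_next = s + rng.choice([-1, 1])  # random policy: step left/right
        r = 1.0 if s_next == n_states else 0.0   # reward only at right end
        terminal = s_next < 0 or s_next >= n_states
        target = r if terminal else r + gamma * V[s_next]
        V[s] += alpha * (target - V[s])   # TD(0) update toward the target
        if terminal:
            break
        s = s_next

print(V)  # values increase toward the rewarding terminal
```

The deterministic learning dynamics we study describe this kind of sample-based update in expectation, which is what lets us connect it to the more data-efficient algorithms mentioned above.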

Besides the research with Prof. Barfuss, I'm also interested in working on two other software projects. The first is a multi-agent reinforcement learning (MARL) scenario to test Nowak's rules for cooperation from a reinforcement learning (RL) perspective (see the sketch after this paragraph). This project focuses on how these classical rules for cooperation respond to various environmental incentives and on which rules most effectively promote agent cooperation. Additionally, I am developing an open-source software layer with a dual purpose: first, it can be integrated with an LLM to prevent the generation of unethical content; second, it can replace the human-in-the-loop in reinforcement learning from human feedback (RLHF). This tool is designed as a mechanism for ethical consensus building in AI, employing a bargaining strategy rooted in a MARL framework. I have discussed both of these ideas with several members of the AI safety community, including Anders Sandberg, who agreed that these are important areas to explore on the path to safe AGI, as he also believes in the importance of understanding multi-agent systems and harnessing their abilities.
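To make the first project concrete, here is a toy sketch of the kind of experiment I have in mind: two independent Q-learners playing an iterated prisoner's dilemma, each conditioning on its opponent's last move (a minimal form of Nowak's direct reciprocity). The payoff matrix and hyperparameters below are placeholders, not the project's actual values:

```python
import numpy as np

# Toy sketch: probing direct reciprocity (one of Nowak's rules) with two
# independent Q-learners in an iterated prisoner's dilemma. Payoffs and
# hyperparameters are illustrative placeholders.
rng = np.random.default_rng(1)
C, D = 0, 1
payoff = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}
alpha, gamma, eps = 0.1, 0.95, 0.1

# State = opponent's previous action; Q[state, action] for each agent.
Q = [np.zeros((2, 2)), np.zeros((2, 2))]
state = [C, C]  # both agents assume initial cooperation

def act(q, s):
    # Epsilon-greedy action selection.
    return int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q[s]))

for step in range(50_000):
    a = [act(Q[i], state[i]) for i in range(2)]
    r = payoff[(a[0], a[1])]
    for i in range(2):
        s, s_next = state[i], a[1 - i]    # next state: opponent's new move
        td_target = r[i] + gamma * Q[i][s_next].max()
        Q[i][s, a[i]] += alpha * (td_target - Q[i][s, a[i]])
        state[i] = s_next

# With a high continuation weight (gamma), reciprocal strategies can
# sometimes emerge; inspect the learned greedy policy per state.
print([np.argmax(q, axis=1) for q in Q])
```

The actual project would vary the payoff structure and the information available to the agents to probe each of Nowak's mechanisms in turn, and measure under which incentives cooperation is sustained.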

This budget will give me the time to either continue my research with an organization interested in hiring me or apply to a university program that will advance my career as a researcher. I will opt for a program in Germany, where university education is free. Being part of a university environment would, I believe, significantly ease my path into research.


Time spent on AI alignment in the last 7 months:

I am a systems engineer with several years of experience who has recently transitioned to a research role in Cooperative AI. I am shifting my career because I want to spend my time in a way that is genuinely uplifting to humankind. I believe I'm a competent software engineer and that I can make a difference in the AI landscape by pursuing a career aimed at aligning AGIs with human purposes.

I came across the area of Cooperative AI in June 2023 and have been pursuing research in this field since then. I consider this area critically important within the AI safety landscape. The likelihood of AGI emerging within the context of multi-agent systems appears quite high, given that cooperative systems are ubiquitous as the most effective outcomes of evolutionary processes.

I've spent about 1,500 hours in the last 7 months studying and working on AI-related topics. Of these, roughly 700 to 800 hours were spent specifically on AI safety. Here is a rough breakdown of my time:

Currently, I'm a researcher in Impact Academy's research track; however, they are unable to assist me with funding.

Having dedicated hundreds of hours to working in this field, I have developed a good understanding of its demands, its objectives, and the outcomes I can realistically achieve. I see myself as a good fit because of my background in software engineering, my continued interest in these topics, and my drive to understand and improve these systems.