Agent Foundations for Aligning Machine Intelligence with Human Interests

A Technical Research Agenda


The mission of the Machine Intelligence Research Institute is to ensure that the creation of smarter-than-human machine intelligence has a positive impact. Although such systems may be many decades away, it is prudent to begin investigations early: the technical challenges involved in safety and reliability work appear formidable and uniquely consequential.

Our technical agenda discusses six research areas where we think foundational research today could make it easier in the future to develop superintelligent systems that are reliably aligned with human interests. Since little is known about the design or implementation details of such systems, the research described below focuses on formal agent foundations for AI alignment research — that is, on developing the basic conceptual tools and theory that are most likely to be useful for engineering robustly beneficial systems in the future.

Our agenda overview paper is supported by six papers, each motivating one of these topics in turn. Many tractable open problems are discussed throughout, which we hope can serve as a guide for researchers eager to do early work on AI alignment. The packet closes with an annotated bibliography summarizing recent research in each area (as of January 2015).


Agent Foundations for Aligning Machine Intelligence with Human Interests
Nate Soares and Benja Fallenstein (2014)


Formalizing Two Problems of Realistic World-Models
Nate Soares (2015)
Toward Idealized Decision Theory
Nate Soares and Benja Fallenstein (2015)
Questions of Reasoning Under Logical Uncertainty
Nate Soares and Benja Fallenstein (2015)
Vingean Reflection: Reliable Reasoning for Self-Modifying Agents
Benja Fallenstein and Nate Soares (2015)
Corrigibility
Nate Soares, Benja Fallenstein, Eliezer Yudkowsky, and Stuart Armstrong (2015)
The Value Learning Problem
Nate Soares (2015)

Aligning Superintelligence with Human Interests:
An Annotated Bibliography

Nate Soares (2015)