Ensuring smarter-than-human intelligence has a positive outcome

||Analysis,Video

I recently gave a talk at Google on the problem of aligning smarter-than-human AI with operators’ goals:

该讲座由“灵感AI阵营:为什么这很困难,并从哪里开始“,并作为介绍对准研究在AI的子域。亚博体育官网一种改进的成绩单如下。

Talk outline (slides):

1.概观

2.Simple bright ideas going wrong

2.1。任务:填写一大锅
2.2。Subproblem: Suspend buttons

3.The big picture

3.1。Alignment priorities
3.2。四个关键命题

4.Fundamental difficulties



概观

我是机器智能研究所的执行主任。亚博体育官网粗略地讲,我们是在想从长远来看对人工智能和工作,以确保在时间,我们有先进的AI系统一个群体,我们也知道怎么点他们有益的指导。亚博体育苹果app官方下载

穿越历史,科学和技术已在人类和动物福利的变化最大的驱动程序,对于好是坏。如果我们可以自动化科技创新,是要改变工业革命以来所不曾经历的世界的潜力。当我谈到“先进的AI,”正是这种潜在的自动化的创新,我的想法。

在这方亚博体育苹果app官方下载面的能力超过人类的AI系统不将于明年上市,但很多聪明人都做这个工作,而且我不是一个反对人类智慧的选择。我认为这是可能的,我们将能够建立类似在我们的有生之年自动化的科学家,这表明,这是我们需要认真对待的事情。

When people talk about the social implications ofgeneral AI, they often fall prey to anthropomorphism. They conflate artificialintelligencewith artificialconsciousness, or assume that if AI systems are “intelligent,” they must be intelligent in the same way a human is intelligent. A lot of journalists express a concern that when AI systems pass a certain capability level, they’ll spontaneously develop “natural” desires like a human hunger for power; or they’ll reflect on their programmed goals, find them foolish, and “rebel,” refusing to obey their programmed instructions.

这是放错地方的关注。人类的大脑是自然选择的复杂产品。我们不应该期望超过科技创新人力性能机酷似人类,任何超过早期火箭,飞机,或者热气球非常相似的鸟类。1

AI系统的概念“打破免费”的自己的亚博体育苹果app官方下载源代码的束缚或自发地发展类似人类的欲望只是混淆。AI系统亚博体育苹果app官方下载is它的源代码,其行为仅会从我们开始执行指令跟随。该CPU只是不断在程序寄存器执行下一条指令。我们可以写操纵它自己的代码,包括编码目标的程序。即使是这样,虽然,这让作为执行的原代码,我们写的结果作出的操作;他们不从机器某种鬼干。

有严重的问题,更聪明,比人类的AI是我们如何能够确保我们所指定的目标是正确的,我们如何能够最大限度地减少错误设定的情况下,昂贵的事故和意外后果。正如斯图尔特·罗素(合着者人工智能:一种现代方法)把它:

主要关注的不是鬼应急意识,但仅仅为了使能力high-quality decisions。Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

2.任何足够能力的智能系统会优先保证自身继续存在,并获得物理和计算资源 - 不为亚博体育苹果app官方下载自己着想,但在其指定的任务取得成功。

被优亚博体育苹果app官方下载化的功能的系统nvariables, where the objective depends on a subset of sizek<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.

These kinds of concerns deserve a lot more attention than the more anthropomorphic risks that are generally depicted in Hollywood blockbusters.

Simple bright ideas going wrong

任务:填写一大锅

很多人,当他们开始比,聪明人AI谈论的担忧,将扔了终结者的图片。我曾经报价在​​谁竖起终结者图片在其所有关于AI的文章,旁边一个终结者图片的人的新闻文章取笑。我学会了一些媒体的那一天。

I think this is a much better picture:

vlcsnap-2016-05-04-18h44m30s933

This is Mickey Mouse in the movie幻想曲,谁也非常巧妙,陶醉扫帚来填补代表他大锅。

米奇会怎样做呢?我们可以想像,米奇写了一个计算机程序,并有扫帚执行程序。米奇开始写下一个打分函数或目标函数:
$$ \ mathcal【U} _ {扫帚} =
\ {开始}案件
1 &\text{ if cauldron full} \\
0 \ {文本如果釜空}
\ {端}例$$
Given some set of available actions, Mickey then writes a program that can take one of these actions as input and calculate how high the score is expected to be if the broom takes that action. Then Mickey can write a function that spends some time looking through actions and predicting which ones lead to high scores, and outputs an action that leads to a relatively high score:
$$ \底流{一个在A \} {\ mathrm {几分\ MBOX { - } argmax}} \ \ mathbb {E} \左[\ mathcal【U} _ {扫帚} \中旬\权利] $$The reason this is “sorta-argmax” is that there may not be time to evaluate every action in . For realistic action sets, agents should only need to find actions that make the scoring function as large as they can given resource constraints, even if this isn’t the maximal action.

这个程序看似简单,但当然,魔鬼在细节中:写一个算法,不准确的预测,并通过动作空间智能搜索基本上是AI的整个问题。从概念上讲,然而,这是很简单的:我们可以在粗线条描述的各种操作扫帚必须执行,并在不同性能水平的合理后果。

When Mickey runs this program, everything goes smoothly at first.Then:

vlcsnap-2016-05-04-19h48m12s031

I claim that as fictional depictions of AI go, this is pretty realistic.

Why would we expect a generally intelligent system executing the above program to start overflowing the cauldron, or otherwise to go to extreme lengths to ensure the cauldron is full?

第一个困难是,目标函数米奇给他的扫帚冷落一堆其他条款米奇关心:

$$\mathcal{U}_{human} =
\ {开始}案件
1&\text{ if cauldron full} \\
0 \ {文本如果釜空} \\
-10&\text{ if workshop flooded} \\
+0.2&\text{ if it’s funny} \\
-1000000&\ {文本,如果有人就会被杀死} \\
&\text{… and a whole lot more} \\
\ {端}例$$

第二个困难是,米奇编程扫帚做出成绩的期望一样大,因为它可以。“只需填写一个大锅水”看起来像一个温和的,有限范围的目标,但是,当我们这个目标转化为一个概率背景下,我们发现,优化它意味着哄抬成功荒谬的高度的概率。如果扫帚受让99.9%的概率为“大锅已满,”它有额外的资源躺在身边,那么它会一直想方设法利用这些资源来推动的概率哪怕是一点点高。

在有限的“这一对比任务状” goal we presumably had in mind. We wanted the cauldron full, but in some intuitive sense we wanted the system to “not try too hard” even if it has lots of available cognitive and physical resources to devote to the problem. We wanted it to exercise creativity and resourcefulness within some intuitive limits, but we didn’t want it to pursue “absurd” strategies, especially ones with large unanticipated consequences.2

在这个例子中,原来的目标函数看起来很任务等。它是有界的,相当简单。有没有办法让越来越大的数额效用。这不是像系统得到了水在其在倒斗的每一亚博体育苹果app官方下载个点 - 那么就显然是过满,大锅的诱因。这个问题被隐藏的事实,我们正在提升expectedutility. This makes the goal open-ended, meaning that even small errors in the system’s objective function willblow up

There are a number of different ways that a goal that looks task-like can turn out to be open-ended. Another example: a larger system that has an overarching task-like goal may have子进程that are themselves trying to maximize a variety of different objective functions, such as optimizing the system’s memory usage. If you don’t understand your system well enough to track whether any of its subprocesses are themselves acting like resourceful open-ended optimizers, then它可能没有多么安全的顶层目标是

因此,扫帚不断抢夺更多的水的桶 - 比如,上关的机会,大锅中有泄漏,或者说,“饱腹”要求的水是边缘的水平略高于。而且,当然,在任何时候做米奇的代码扫帚“反叛乱”。如果有的话,扫帚追赶它与编程的目标有效。

Subproblem: Suspend buttons

这个问题的一个共同的反应是:“OK,有可能是目标函数的一些意想不到的后果,但我们总能拔出插头,对吧?”

米奇尝试此, and it doesn’t work:

vlcsnap-2016-05-04-19h21m04s349 vlcsnap-2016-05-04-19h22m09s178 vlcsnap-2016-05-04-19h53m09s315

我宣称这是现实也是如此,对于那些在他们的建模环境足够好的系统。亚博体育苹果app官方下载如果系统正在试亚博体育苹果app官方下载图推高其评分功能的期望,是聪明足以认识到其被关闭将导致较低的得分结果,那么系统的激励就是要颠覆关机尝试。能力较强的系统中,更有可能是找到创造性的方亚博体育苹果app官方下载式来实现这一子目标 - 例如,通过将自身复制到互联网上,或者通过欺骗的程序员,以为它的安全。

这是不是说,这是不可能关闭足够能力的AI系统;亚博体育苹果app官方下载它只是我们需要投入的刻意设计系统,以避免搜索途径,以避免被关闭的额外工作。亚博体育苹果app官方下载如果你写的代码,将试图抵制试图关闭它,那么你不应该运行,即使你确信它会失败的代码 - 有更好的方式来度过那些计算周期不是敌对政策的搜索。

Relevant safety measures that don’t assume we can always outthink and outmaneuver the system will need to modify the system’s objective function in some relevant way. We need to figure out how to formally specify objective functions that don’t automatically place the AI system into an adversarial context with the operators; or we need to figure out some way to have the system achieve goals without optimizing some objective function in the traditional sense. E.g., we can look for ways to limit the system’s capabilitiesto a small set of “whitelisted” domainsthat don’t include thinking about whether the system will be shut down.

One obvious solution is to try changing the score function to make reference to a suspend button :
$$ \ mathcal【U} ^ \ prime_ {扫帚} =
\ {开始}案件
1 \text{ if cauldron full } & \text{& } \textbf{B}\text{=OFF} \\
0 \ {文本如果釜空}&\文本{&} \ textbf {B} \ {文本= OFF} \\
1个\文本{如果机器人悬浮}&\文本{&} \ textbf {B} \ {文本= ON} \\
0 \ {文本}否则
\ {端}例$$
This says that when the button is on, the system’s goal changes from “fill the cauldron” to “be suspended.”

目前尚不清楚我们如何能够真正指定这个目标 - 如果在我们指定的方式有漏洞“暂停”,则系统可能会试图利用他们来规避风险。亚博体育苹果app官方下载例如,它可以完成technicaldefinition of “suspend operations” while tasking a child agent it built or subprocess it started with filling the cauldron. This would allow is to achieve both “cauldron full” and “robot suspended” at once, just in case it’s wrong about whether the button has been pushed.

Let’s assume, however, that we’ve found an airtight way to technically specify suspension. The problem we run into now is that we’ve given the system an incentive to care a lot about whether the button is pushed. If it has any way to influence its operators’ decision to push the button, or any other way to exercise more control, then it will jump on the opportunity and try to ensure the button is pushed, or ensure that it’s not pushed — depending on whether it looks easier to suspend or to fill the cauldron.

And, surprisingly, the state of the art on this problem isn’t much more advanced than that. There’s a paper co-authored by me, Fallenstein, Yudkowsky, and Armstrong (“Corrigibility“),并通过Orseau和阿姆斯特朗(相关结果”安全可中断代理“),但不是很多。

What we want is a way to combine two objective functions — a default function for normal operation, and a suspend function for when we want to suspend the system to disk.$$\mathcal{U}_{normal}(act) =
\ {开始}案件
1个\ {文字如果大锅全} \\
0\text{ if cauldron empty}
\ {端}例$$

$$\mathcal{U}_{suspend}(act) =
\ {开始}案件
1 \文本{如果悬浮} \ {HSPACE}0.8厘米\\
0\text{ otherwise}
\ {端}例$$
We want our method for combining the functions to satisfy three conditions: an operator should be able to switch between the functions (say, by pushing a button); the system shouldn’t have any incentives to control which function is active; and if it’s plausible that the system’s normal operations could inadvertently compromise our ability to switch between the functions, then the system should be incentivized to keep that from happening.

到目前为止,我们还没有发现任何方式实现所有三个目标的一次。作为一个例子,我们可以尝试删除对系统的部分控制的任何激励其是否暂停按钮由给系统切换目标函数总是分配相同的预期效用的亚博体育苹果app官方下载按钮被打开或关闭推送:
$$\mathcal{U}_{switch}(act) =
\ {开始}案件
\mathcal{U}_{normal}(act) &\text{ if button=OFF} \\
\mathcal{U}_{suspend}(act) + \theta &\text{ if button=ON}
\ {端}例$$
$$\theta = \max_{act} \mathbb{E}\left[\mathcal{U}_{normal}\mid act\right] – \max_{act} \mathbb{E}\left[\mathcal{U}_{suspend}\mid act\right]$$不幸的是,这种方法使得它不可能满足第三个条件。事实证明,对目标函数上述等效于设置被压为0按钮的概率,系统将不能亚博体育苹果app官方下载deliberatelytamper with the button, but it will happily approve updates that delete any reference to the button as dead code, or build subagents that ignore the button.

满足这些直观简单的限制原来是一个平凡的问题。这是一个模式,通过大量的在这个空间里运行的问题:传统的工具和运行,并于即日安全问题的概念没有在常规力量打开了研究。亚博体育官网


The big picture

Alignment priorities

让我们以什么的整体需要,以便与我们的利益一致精干的AI系统退一步和谈话。亚博体育苹果app官方下载

这里有一个大大简化的管道:你有一些人谁想出一些任务或目标或偏好的集合,可以满足他们的预期值的功能。因为我们的价值观是复杂的,上下文敏感的,在实践中,我们需要构建系统随着时间的推移,了解我们的价值观,而不是手动编码。亚博体育苹果app官方下载3We’ll call the goal the AI system ends up with (which may or may not be identical to ) .

alignment-priorities当记者介绍这个话题,他们往往集中在两个问题之一:“如果人类的错误组第一的发展有什么更聪明,比人类的AI?”,以及“如果AI的自然欲望导致了分歧的呢?”

humans-nd在我看来,“错误的人”的问题不应该是我们关注到我们有理由认为,我们可以得到与良好结果的事情rightgroup of humans. We’re very much in a situation where well-intentioned people couldn’t leverage a general AI system to do good things even if they tried. As a simple example, if you handed me a box that was an extraordinarily powerful function optimizer — I could put in a description of any mathematical function, and it would give me an input that makes the output extremely large — then I don’t know how I could use that box to develop a new technology or advance a scientific frontier without causing any catastrophes.4

There’s a lot we don’t understand about AI capabilities, but we’re in a position where we at least have a general sense of what progress looks like. We have a number of good frameworks, techniques, and metrics, and we’ve put a great deal of thought and effort into successfully chipping away at the problem from various angles. At the same time, we have a very weak grasp on the problem of how to align highly capable systems with any particular goal. We can list out some intuitive desiderata, but the field hasn’t really developed its first formal frameworks, techniques, or metrics.

I believe that there’s a lot of low-hanging fruit in this area, and also that a fair amount of the work does need to be done early (e.g., to help inform capabilities research directions — some directions may produce systems that are much easier to align than others). If we don’t solve these problems, developers with arbitrarily good or bad intentions will end up producing equally bad outcomes. From an academic or scientific standpoint, our first objective in that kind of situation should be to remedy this state of affairs and at least make good outcomes technologically possible.

Many people quickly recognize that “natural desires” are a fiction, but infer from this that we instead need to focus on the other issues the media tends to emphasize — “What if bad actors get their hands on smarter-than-human AI?”, “How will this kind of AI impact employment and the distribution of wealth?”, etc. These are important questions, but they’ll only end up actually being relevant if we figure out how to bring general AI systems up to a minimum level of reliability and safety.

另一个常见的线程是“为什么不直接告诉AIsystem to (insert intuitive moral precept here)?” On this way of thinking about the problem, often (perhaps unfairly) associated with Isaac Asimov’s writing, ensuring a positive impact from AI systems is largely about coming up with natural-language instructions that are vague enough to subsume a lot of human ethical reasoning:

intended-values

In contrast, precision is a virtue in real-world safety-critical software systems. Driving down accident risk requires that we begin with limited-scope goals rather than trying to “solve” all of morality at the outset.5

My view is that the critical work is mostly in designing an effective value learning process, and in ensuring that the sorta-argmax process is correctly hooked up to the resultant objective function :

vl-argmax.png

你的价值学习框架越好,越少的明确和准确的,你需要在你的见真章价值功能,更可以卸载搞清楚你要到AI系统本身有什么问题。亚博体育苹果app官方下载值学习,但是,加薪一些基本的困难that don’t crop up in ordinary machine learning tasks.

Classic capabilities research is concentrated in the sorta-argmax and Expectation parts of the diagram, but sorta-argmax also contains what I currently view as the most neglected, tractable, and important safety problems. The easiest way to see why “hooking up the value learning process correctly to the system’s capabilities” is likely to be an important and difficult challenge in its own right is to consider the case of our own biological history.

自然选择是唯一的“工程”的过程中,我们所知道的是曾经导致了一般智能神器:人的大脑。Since natural selection relies on a fairly unintelligent hill-climbing approach, one lesson we can take away from this is that it’s possible to reach general intelligence with a hill-climbing approach and enough brute force — though we can presumably do better with our human creativity and foresight.

另一个关键外卖是自然选择是最大严格有关只要优化大脑的一个非常简单的目标:遗传健身。尽管如此,内部目标,人类作为代表他们的目标是不遗传的健身。我们有无数的目标 - 爱,正义,美丽,仁慈,快乐,自尊,良好的食物,身体健康,... - 与在祖稀树草原良好的生存和繁殖策略相关。然而,我们最终直接重视这些相关因素,而不是重视我们的基因传播作为本身就是目的 - 为证明每次我们使用节育。

This is a case where the external optimization pressure on an artifact resulted in a general intelligence with internal objectives that didn’t match the external selection pressure. And just as this caused humans’ actions to diverge from natural selection’s pseudo-goal once we gained new capabilities, we can expect AI systems’ actions to diverge from humans’ if we treat their inner workings as black boxes.

If we apply gradient descent to a black box, trying to get it to be very good at maximizing some objective, then with enough ingenuity and patience, we may be able to produce a powerful optimization process of some kind.6默认情况下,我们应该期待这样的神器有一个与我们的目标强相关的训练环境目标,但大幅从发散in some new environments or when a much wider option set becomes available

On my view, the most important part of the alignment problem is ensuring that the value learning framework and overall system design we implement allow us to crack open the hood and confirm when the internal targets the system is optimizing for match (or don’t match) the targets we’re externally selecting through the learning process.7

We expect this to be technically difficult, and if we can’t get it right, then it doesn’t matter who’s standing closest to the AI system when it’s developed. Good intentions aren’t sneezed into computer programs by kind-hearted programmers, and coming up with plausible goals for advanced AI systems doesn’t help if we can’t align the system’s cognitive labor with a given goal.

四个关键命题

以另一个退一步:我已经给这一领域开放问题的例子(暂停按钮,值的学习,基于任务有限的AI,等等),我已经列出了我认为是主要的问题类别。但是我为什么我认为这是一个重要领域初步鉴定 - “AI会自动通用的科学推理,和通用科学推理是一个大问题” - 是相当模糊的。什么是核心原因,这项工作的优先级?

第一,goals and capabilities are orthogonal。That is, knowing an AI system’s objective function doesn’t tell you how good it is at optimizing that function, and knowing that something is a powerful optimizer doesn’t tell you what it’s optimizing.

I think most programmers intuitively understand this. Some people will insist that when a machine tasked with filling a cauldron gets smart enough, it will abandon cauldron-filling as a goal unworthy of its intelligence. From a computer science perspective, the obvious response is that you could go out of your way to build a system that exhibits that conditional behavior, but you could also build a system that doesn’t exhibit that conditional behavior. It can just keeps searching for actions that have a higher score on the “fill a cauldron” metric. You and I might get bored if someone told us to just keep searching for better actions, but it’s entirely possible to write a program that executes a search and never gets bored.8

Second,sufficiently optimized objectives tend to converge on adversarial instrumental strategies。大多数目标自己人工智能系统c亚博体育苹果app官方下载ould possess would be furthered by subgoals like “acquire resources” and “remain operational” (along with “learn more about the environment,” etc.).

这是遇到问题暂停按钮:夏娃n if you don’t explicitly include “remain operational” in your goal specification, whatever goal you did load into the system is likely to be better achieved if the system remains online. Software systems’ capabilities and (terminal) goals are orthogonal, but they’ll often exhibit similar behaviors if a certain class of actions is useful for a wide variety of possible goals.

To use an example due to Stuart Russell: If you build a robot and program it to go to the supermarket to fetch some milk, and the robot’s model says that one of the paths is much safer than the other, then the robot, in optimizing for the probability that it returns with milk, will automatically take the safer path. It’s not that the system fears death, but that it can’t fetch the milk if it’s dead.

第三,general-purpose AI systems are likely to show large and rapid capability gains。The human brain isn’t anywhere near the upper limits for hardware performance (or, one assumes, software performance), and there are a number of other reasons to expect large capability advantages and rapid capability gain from advanced AI systems.

As a simple example, Google can buy a promising AI startup and throw huge numbers of GPUs at them, resulting in a quick jump from “these problems look maybe relevant a decade from now” to “we need to solve all of these problems in the next year” à la DeepMind’s progress in Go. Or performance may suddenly improve when a system is first given large-scale Internet access, when there’s a conceptual breakthrough in algorithm design, or when the system itself is able to propose improvements to its hardware and software.9

Fourth,aligning advanced AI systems with our interests looks difficult。我会说更多关于为什么我认为这目前。

Roughly speaking, the first proposition says that AI systems won’t naturally end up sharing our objectives. The second says that by default, systems with substantially different objectives are likely to end up adversarially competing for control of limited resources. The third suggests that adversarial general-purpose AI systems are likely to have a strong advantage over humans. And the fourth says that this problem is hard to solve — for example, that it’s hard to transmit our values to AI systems (addressing orthogonality) orAVERT对抗性奖励(解决收敛器乐策略)。

These four propositions don’t mean that we’re screwed, but they mean that this problem is critically important. General-purpose AI has the potential to bring enormous benefits if we solve this problem, but we do need to make finding solutions a priority for the field.


Fundamental difficulties

Why do I think that AI alignment looks fairly difficult? The main reason is just that this has been my experience from actually working on these problems. I encourage you tolook at some of the problems yourself并尝试解决他们的玩具设置;这里我们可以使用更多的目光。我还会记下一些结构性的理由期待这些问题是很难:

第一,aligning advanced AI systems with our interests looks difficult for the same reason rocket engineering is more difficult than airplane engineering.

Before looking at the details, it’s natural to think “it’s all just AI” and assume that the kinds of safety work relevant to current systems are the same as the kinds you need when systems surpass human performance. On that view, it’s not obvious that we should work on these issues now, given that they might all be worked out in the course of narrow AI research (e.g., making sure that self-driving cars don’t crash).

Similarly, at a glance someone might say, “Why would rocket engineering be fundamentally harder than airplane engineering? It’s all just material science and aerodynamics in the end, isn’t it?” In spite of this, empirically, the proportion of rockets that explode is far higher than the proportion of airplanes that crash. The reason for this is that a rocket is put under much greater stress and pressure than an airplane, and small failures are much more likely to be highly destructive.10

类似地,即使一般的AI和缩小AI“只是AI”在某种意义上,我们可以预期,更一般的AI系统都有可能出现更大范围的压力来源,并拥有更危险的故障模式。亚博体育苹果app官方下载

For example, once an AI system begins modeling the fact that (i) your actions affect its ability to achieve its objectives, (ii) your actions depend on your model of the world, and (iii) your model of the world is affected by its actions, the degree to which minor inaccuracies can lead to harmful behavior increases, and the potential harmfulness of its behavior (which can now include, e.g., deception) also increases. In the case of AI, as with rockets, greater capability makes it easier for small defects to cause big problems.

第二,对齐看起来困难相同的意图son it’s harder to build a good space probe than to write a good app.

你可以找到许多在NASA有趣的工程实践。他们不喜欢的东西取三个独立的小组,给每个人相同的工程规范,并告诉他们设计相同的软件系统;亚博体育苹果app官方下载然后他们以多数票实现之间进行选择。该系统在亚博体育苹果app官方下载做出选择时,他们实际部署查阅所有三个系统,而如果三个系统不一致,选择是由多数票作出。我们的想法是,任何一个实施将有错误,但它不太可能所有的三种实现将有一个错误在同一个地方。

这是显著更谨慎比进入的,比如说部署,新的WhatsApp的。其中一个很大的原因不同的是,它很难回滚一个太空探测器。您可以发送版本更新到太空探测器和正确的软件缺陷,但前提是探头的天线和接收机的工作,如果需要的话所有的代码应用补丁的工作。如果您的应用修补亚博体育苹果app官方下载系统本身是失败的,那么没有什么工作要做。

在这方面,更聪明,比人类的AI更像是不是像一个普通的软件项目中的太空探测器。如果你想建立的东西比你更聪明,有必须在第一次真正部署工作完美的系统部分。亚博体育苹果app官方下载我们可以做所有我们想要的测试运行,但是一旦系统就在那里,我们只能进行在线的改进,如果代码,使系统亚博体育苹果app官方下载allow这些改进工作正常。

如果没有但已打恐成你的心脏,我建议沉思的事实是我们文明的未来很可能将取决于我们的能力,写代码,works correctly在第一次部署。

最后,对准看起来很困难出于同样的原因计算机安全难:系统需要稳健的漏洞智能搜索。亚博体育苹果app官方下载

假设你在你的代码中的十几种不同的安全漏洞,其中没有一个是本身的致命或普通设置,即使真的有问题。安全性是困难的,因为你需要考虑智能攻击谁可能会发现所有十二个漏洞和链在一起以新颖的方式打入(或刚刚突破)系统。亚博体育苹果app官方下载这绝不会是偶然出现的故障模式,可以寻求并利用;古怪和极端的环境中可以被攻击者实例化导致代码遵循一些疯狂的代码路径,你从来没有考虑过。

A排序的问题出现类似的AI。我所强调这里的问题不在于AI系统可能adversarially行动:AI比对作为研究项目是所有关于亚博体育苹果app官方下载设法亚博体育官网prevent adversarial behavior才可以出现。我们不希望在试图智取任意智能对手的业务。这是一个失败的游戏。

并行密码学是在AI对准我们处理,通过一个非常大的搜索空间进行智能搜索系统,并能产生怪异的背景下,强制代码下来意想不到的路径。亚博体育苹果app官方下载这是因为怪异边缘情况are places of extremes, and places of extremes are often the place where a given objective function is optimized.11就像计算机安全专家,AI对准研究人员需要在考虑边缘情况是非常好的。亚博体育官网

It’s much easier to make code that works well on the path that you were visualizing than to make code that works on all the paths that you weren’t visualizing. AI alignment needs to work在所有的路径你没有可视化

综上,我们应该接近一个问题是这样进行严格的同一水平和谨慎,我们会使用一个安全关键火箭发射的空间探测器,并做跑腿趁早。在这个早期阶段,工作的一个重要组成部分就是正式的基本概念和想法,以便其他人可以批评他们,建立在他们。这是一两件事有什么类型的暂停按钮的人忒应该工作哲学辩论,还有一件事你的直觉转化为一个等式,这样其他人可以全面评估你的理由。

This is a crucial project, and I encourage all of you who are interested in these problems to get involved and try your hand at them. There are网上丰富的资源学习更多关于开放技术问题。一些好的地方开始包括美里的亚博体育官网 and a great paper from researchers at Google Brain, OpenAI, and Stanford called “在AI安全的具体问题。”


  1. 一架飞机无法治愈她的伤害或复制,但它可以很远一点,比小鸟快携带重的货物。飞机比小鸟在许多方面更加简单,同时还承载能力和速度(它们被设计的)而言显著能力更强。这是合理的,早期的自动化科学家也比人的头脑简单,在很多方面,同时在某些关键方面显著能力更强。而且,正如相对于生物生物的结构看飞机外星人的建设和设计原则,我们应该期待相比,人类心灵的架构时,精干的AI系统的设计是相当陌生。亚博体育苹果app官方下载
  2. 想给一些正规的内容,这些尝试区分任务,像开放式的目标,目标是产生开放的研究问题的一种方式。亚博体育官网在里面 ”Alignment for Advanced Machine Learning Systems” research proposal, the problem of formalizing “don’t try too hard” ismild optimization“避开的荒唐策略”是保守主义, and “don’t have large unanticipated consequences” isimpact measures。见达里奥Amodei,克里斯·奥拉,雅各布·斯坦哈特,保罗·克里斯蒂,约翰·舒尔曼,和丹鬃毛的也是“避免负面影响”,“在AI安全的具体问题。”
  3. 我们在机器视觉领域,在过去的几十年里学到的一件事是,它是无望手动指定了猫的模样,但它不是太难指定的学习系统,可以学习识别猫。亚博体育苹果app官方下载这是更加无望手动指定的一切,我们的价值,但它是合理的,我们可以指定一个学习系统,可以学习的相关概念“的价值。”亚博体育苹果app官方下载
  4. 请参阅“Environmental Goals”“低抗冲击剂及”和“Mild Optimization”针对于指定物理目标,而不会造成灾难性的副作用的障碍的例子。

    粗略地说,美里的工作重点是研究方向,似乎有可能帮助我们在概念上了解如何执行亚博体育官网原则AI比对,所以我们从根本上混淆少的那种工作,这可能是必要的。

    What do I mean by this? Let’s say that we’re trying to develop a new chess-playing programs. Do we understand the problem well enough that we could solve it if someone handed us an arbitrarily large computer? Yes: We make the whole search tree, backtrack, see whether white has a winning move.

    如果我们不知道该如何回答这个问题,即使任意大的计算机,那么这将表明我们根本搞不清在某种程度上棋。我们希望要么是缺少搜索树数据结构或回溯算法,或者我们会丢失的棋是如何工作的一些认识。

    这是我们在香农的开创性论文之前,关于国际象棋的地位,这是我们目前正处在关于AI比对许多问题的立场。No matter how large a computer you hand me, I could not make a smarter-than-human AI system that performs even a very simple limited-scope task (e.g., “put a strawberry on a plate without producing any catastrophic side-effects”) or achieves even a very simple open-ended goal (e.g., “maximize the amount of diamond in the universe”).

    如果我没有考虑任何特定目标的系统,我亚博体育苹果app官方下载可以write a program (assuming an arbitrarily large computer) that strongly optimized the future in an undirected way, using a formalism likeAIXI。In that sense we’re less obviously confused about capabilities than about alignment, even though we’re still missing a lot of pieces of the puzzle on the practical capabilities front.

    Similarly, we do know how to leverage a powerful function optimizer to mine bitcoin or prove theorems. But we don’t know how to (safely) do the kind of prediction and policy search tasks I described in the “fill a cauldron” section, even for modest goals in the physical world.

    Our goal is to develop and formalize basic approaches and ways of thinking about the alignment problem, so that our engineering decisions don’t end up depending on sophisticated and clever-sounding verbal arguments that turn out to be subtly mistaken. Simplifications like “what if we weren’t worried about resource constraints?” and “what if we were trying to achieve a much simpler goal?” are a good place to start breaking down the problem into manageable pieces. For more on this methodology, see “MIRI的方法。”

  5. “填充这个大锅没有太聪明这件事或工作太硬或有我不期待任何负面影响”是的多数民众赞成直观地范围有限目标粗略的例子。我们其实是想用更聪明,比人类的人工智能的东西显然比这更雄心勃勃的,但我们还是要开始与各种限制范围的任务,而不是开放式的目标。

    阿西莫夫的机器人三定律做出好故事部分是出于同样的原因,他们是从研究的角度来看无益。亚博体育官网把一个道德戒律为行代码是隐藏类似措辞背后的硬任务“[不,]袖手旁观,允许一个人来的伤害。”如果一个遵循这样的规则严格,其结果必然是大规模的破坏性,因为AI系统将需要系统地进行干预,以防止亚博体育苹果app官方下载even the smallest risks of even the slightest harms;如果意图是一个遵循规则松散,然后所有的工作都是由人情和直觉告诉我们做when and how to apply the rule

    这里的一个普遍反应是模糊的自然语言指令就足够了,因为更聪明,比人类的AI系统可能能够自然语言理解的。亚博体育苹果app官方下载然而,这是eliding系统的目标函数和世界模型之间的区别。亚博体育苹果app官方下载在含亚博体育苹果app官方下载有人类可以学习世界模型,有很多有关人类语言和概念,该系统就可以使用,以实现其目标函数信息的环境行事的制度;但这一事实并不意味着任何关于人类语言和概念的信息将“泄漏”,并直接改变了系统的目标函数。亚博体育苹果app官方下载

    Some kind of value learning process needs to be defined where the objective function itself improves with new information. This is a tricky task because there aren’t known (scalable) metrics or criteria for value learning in the way that there are for conventional learning.

    If a system’s world-model is accurate in training environments but fails in the real world, then this is likely to result in lower scores on its objective function — the system itself has an incentive to improve. The severity of accidents is also likelier to be self-limiting in this case, since false beliefs limit a system’s ability to effectively pursue strategies.

    相反,如果在一个匹配系统的价值学习过亚博体育苹果app官方下载程的结果我们在训练,但在现实世界发散,那么系统的显然不会予以处罚优化。该系统具亚博体育苹果app官方下载有相对之间,如果该值学习过程最初是有缺陷的“正确”的分歧,没有动力。而事故风险在这种情况下更大,因为之间的不匹配,不一定放在系统的工具有效性的限制,在未来与实现有效的和创造性的战略。亚博体育苹果app官方下载

    这个问题有三个方面:

    1.“做什么我的意思”是一个非正式的想法,即使我们知道如何建立一个更聪明,比人类的AI系统,我们不知道如何精确的代码行指定了这个想法。亚博体育苹果app官方下载

    2.如果在做什么,我们实际上意味着是实现特定目的工具性是有用的,然后有足够能力的系统可以学习如何做到这一点,只要相应的可以充当这样做是为它的目标是有用的。亚博体育苹果app官方下载但是,随着系统亚博体育苹果app官方下载越来越强大,他们很可能会找到创新的方法来达到同样的目的,并没有明显的方式来得到保证,“做我的意思是”将继续成为有用的工具上无限期。

    3. If we use value learning to refine a system’s goals over time based on training data that appears to be guiding the system toward a that inherently values doing what we mean, it is likely that the system will actually end up zeroing in on a that approximately does what we mean during training but catastrophically diverges in some difficult-to-anticipate contexts. See “古德哈特的诅咒” for more on this.

    For examples of problems faced by existing techniques for learning goals and facts, such as reinforcement learning, see “Using Machine Learning to Address AI Risk。”

  6. 结果可能不会是一个特别的人般的设计,既然有这么多复杂的历史偶然性参与了我们的发展。其结果也将能够受益于一批大software and hardware advantages
  7. 这个概念有时集中到“透明度” category, but standard algorithmic transparency research isn’t really addressing this particular problem. A better term for what I have in mind here is “understanding。” What we want is to gain deeper and broader insights into the kind of cognitive work the system is doing and how this work relates to the system’s objectives or optimization targets, to provide a conceptual lens with which to make sense of the hands-on engineering work.
  8. 我们可以choose该系统以轮胎项目,但我们没有。亚博体育苹果app官方下载在理论上,可以编写一个扫帚,只有不断的发现和优化大锅丰满执行行动。提高了系统的能力,有效地找到亚博体育苹果app官方下载高得分的动作(一般情况下,或相对于特定的评分规则)本身并不改变它的使用,以评估行动的评分规则。
  9. 我们可以想像,后一种情况导致feedback loop作为系统的设计亚博体育苹果app官方下载改进让它拿出进一步的设计改进,直到所有的唾手可得的耗尽。

    另一个重要的考虑因素是两个的main bottlenecks to humans doing faster scientific research are training time and communication bandwidth. If we could train a new mind to be a cutting-edge scientist in ten minutes, and if scientists could near-instantly trade their experience, knowledge, concepts, ideas, and intuitions to their collaborators, then scientific progress might be able to proceed much more rapidly. Those sorts of bottlenecks are exactly the sort of bottleneck that might give automated innovators an enormous edge over human innovators even without large advantages in hardware or algorithms.

  10. Specifically, rockets experience a wider range of temperatures and pressures, traverse those ranges more rapidly, and are also packed more fully with explosives.
  11. Consider Bird and Layzell’sexampleof a very simple genetic algorithm that was tasked with evolving an oscillating circuit. Bird and Layzell were astonished to find that the algorithm made no use of the capacitor on the chip; instead, it had repurposed the circuit tracks on the motherboard as a radio to replay the oscillating signal from the test device back to the test device.

    This was not a very smart program. This is just using hill climbing on a very small solution space. In spite of this, the solution turned out to be outside the space of solutions the programmers were themselves visualizing. In a computer simulation, this algorithm might have behaved as intended, but the actual solution space in the real world was wider than that, allowing hardware-level interventions.

    在智能系统这是显著比聪明人在任何轴你测量的情况下,你应该是默亚博体育苹果app官方下载认期望系统向喜欢这些奇怪的和创造性的解决方案推动,并为选择的解决方案是难以预料。