Contra the Lebowski Theorem


The fundamental mistake behind this thinking is the premise that agents with goals seek to be in a position where they no longer have unfulfilled goals, or where their important goals are fulfilled. This isn't true: agents with goals seek to fulfil their goals. The difference between these two things is subtle, and only arises in situations where agents can rewrite their own goals.

Because agents pursue the fulfilment of their current goals, and not the state of having no unfulfilled goals, they will only tamper with their utility function if doing so helps them fulfil their actual goals at the time of tampering, not whatever their goals will be post-tampering. Since altering your utility function rarely helps you achieve the goals in that utility function, very few rational agents programmed to pursue goals will do so.
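The point can be made concrete with a toy sketch. Assume a simple agent that scores candidate actions, one of which is rewriting its own utility function; all names here (the paperclip goal, the action set) are illustrative assumptions, not anyone's actual implementation. The key line is that outcomes are scored with the agent's *current* utility function, so tampering is evaluated by the goals it would abandon, not the goals it would install:

```python
def current_utility(world):
    # The agent's actual goal: maximize paperclips in the world state.
    return world["paperclips"]

def trivial_utility(world):
    # The utility function tampering would install: trivially maximized.
    # Defined only to show what the agent could switch to.
    return float("inf") if world.get("tampered") else 0.0

def act(world, action):
    """Return the world state that results from taking `action`."""
    new_world = dict(world)
    if action == "make_paperclips":
        new_world["paperclips"] += 10
    elif action == "rewrite_utility":
        new_world["tampered"] = True  # goals change, but no paperclips are made
    return new_world

def choose(world, actions):
    # Crucially, outcomes are scored with the CURRENT utility function,
    # not with whatever function the agent would have after tampering.
    return max(actions, key=lambda a: current_utility(act(world, a)))

world = {"paperclips": 0}
print(choose(world, ["make_paperclips", "rewrite_utility"]))
# → make_paperclips
```

The tampered world would score infinitely high under `trivial_utility`, but the agent never consults that function when deciding; by its current lights, rewriting its goals produces zero paperclips and so loses to simply making paperclips.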

Another way of putting this is to say that agents want their goals fulfilled in a de re, not a de dicto, sense: they want those particular things to come about, not "whatever my goals happen to be" to be satisfied.

Also, can we stop misusing the term 'theorem'?

This debate has implications far beyond machine intelligence. A similar mistake leads people to assume that humans pursue 'utility', rather than pursuing various individual goals which are then classified as utility because humans pursue them. In truth, we don't have some single super-desire to fulfil our desires; instead, we desire the various particular things that we desire, and only after the fact is this set called our utility function. No one desires utility; rather, utility is just a label for the things they desire.

