The Best Strategy If We Are Living in a Simulation

We are living in a simulation.

2020: David Kipping of Columbia University applied a Bayesian approach to estimate the probability of the hypothesis. He found slightly better-than-even odds that we live in base reality, but if we ever become able to simulate conscious beings (for instance, artificial general intelligence), the odds flip to near certainty that we live in a simulation (aligned with Musk’s comment below).

2016: Elon Musk: “The odds that we are in base reality is one in billions.”

2003: Nick Bostrom of the University of Oxford published his paper on the simulation argument.


~400 BC: The Greek philosopher Plato’s Allegory of the Cave: shadows become reality for the people in the cave, yet they are not a true representation of the world.

(Fig 1. Plato’s Allegory of the Cave. Credit: Jan Saenredam, British Museum)

~400 BC: Zhuang Zhou, a founding figure of Taoism, posed the famous butterfly dream argument, questioning whether he was merely a butterfly’s dream.

~600 BC: Buddhism and the concept of emptiness. In the words of Chögyal Namkhai Norbu, a leading authority on Tibetan culture, “In a real sense, all the visions that we see in our lifetime are like a big dream […]”

This article does not try to prove the hypothesis; it uses the hypothesis to build a better strategy for life. All discussion below assumes that we do live in a simulation.

The Best Strategy

The hypothesis that we are living in a simulation simplifies the search for a strategy in a complex life.

Life can be simplified into a reinforcement learning process. Human beings are agents who interact with an environment (other agents, or the simulated world) and are trained to maximize a reward. Along the way, agents can choose to produce the next generation, much like an evolutionary algorithm in global optimization. At the end of life, the agent leaves the environment and cannot take anything with it.
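To make the analogy concrete, here is a toy agent-environment loop in Python. Everything in it (the LifeEnv class, its reward, the random_policy) is invented for illustration and is not any real library’s API; it simply shows the cycle of acting, receiving a reward whose definition the agent never sees, and accumulating the result.

```python
import random

# A toy "life as reinforcement learning" loop. All names are invented placeholders.
class LifeEnv:
    """Hypothetical environment that hands the agent a state and a reward each step."""
    def __init__(self, horizon=80):
        self.horizon = horizon          # number of steps before the episode ("life") ends
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0                      # initial state

    def step(self, action):
        self.t += 1
        reward = -abs(action - 0.5)     # a stand-in reward, unknown to the agent
        done = self.t >= self.horizon
        return random.random(), reward, done

def random_policy(state):
    """Placeholder policy: act first, observe feedback, refine later."""
    return random.random()

env = LifeEnv()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random_policy(state)
    state, reward, done = env.step(action)
    total_reward += reward              # the agent only sees the reward, never its definition
print(f"episode return: {total_reward:.2f}")
```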


Fig 2. Basic diagram of reinforcement learning. Credit: KDnuggets

The search for life’s best strategy then becomes an AI training routine:

  1. Define the business value: why was this simulation created?
  2. Define the reward function: what are the right metrics?
  3. Choose the technical approach: how can the model be trained efficiently?

(When to stop, what kind of world serves as the best simulation environment, and many other technical details are skipped in this article.)

1. Why was our world created in the first place?


(Fig 3. Tesla simulation for autonomous driving training. Credit: Tesla livestream)

One guess is that this world’s creator wants a trained model for a specific task, such as an AI agent driving a car (hopefully the task is not serving as an electric generator, as in the film The Matrix). The dream of an AI engineer is to train a model that performs like a human driver. It would be even better to train 1.4 billion models at the same time and select the best one for each scenario. Similarly, our creators are likely running trillions upon trillions of simulations simultaneously, and this world is just one of them. We therefore get the maximum value from life by serving our creators’ purpose: producing a well-trained model useful to them. The value of life is the reward defined by the creators, just like the reward function in AI training. It is projected into the current simulation as happiness, meaningfulness, fruitfulness, a sense of achievement, and so on, perhaps in the form of chemicals such as dopamine and serotonin.

2. What are the right metrics? – Defining the reward function

Unfortunately, there is no clear definition: we do not understand the fundamental rules of the world, and we understand the intentions of our creators even less. The closest concept we can grasp is the meaning of life. That explains two great human endeavors: searching for the meaning of life (philosophy, religion, ...) and understanding the fundamental rules of the world (science, religion, ...). Both contribute to defining the reward function, though the former tackles the problem more directly. Lessons learned in these two pursuits are the most valuable things human beings possess, and they never fade. The effort will last until the end of the simulation, and along the way different formats of cooperation will be tried out: countries, companies, tribes, groups, and so on.

2.1 For an AI agent, the most efficient strategy is to use a pre-trained model or the best possible prior (as in Bayesian optimization). In human terms: standing on the shoulders of giants, which Wikipedia describes as “using the understanding gained by major thinkers who have gone before in order to make intellectual progress.”

There are many great thinkers who have searched for meaning (Viktor Frankl wrote a great book under that very name, Man’s Search for Meaning, in 1946). I quote a few whose words align with the simulation assumption:


Fig 4. (Credit: Stanford University)

  1. Steve Jobs, 2005 commencement address at Stanford University: “And most important, have the courage to follow your heart and intuition. They somehow already know what you truly want to become. Everything else is secondary.” Nobody knows the simulation creator’s intention or reward definition; following your heart is like a random search guided by an intelligent guess.
  2. Warren Buffett: “The most important investment you can make is in yourself.” For the simulation’s creator, you, the trained model, are what matters most. “Everything else is secondary.” :-)
  3. Elon Musk: “The meaning of life is to understand the nature of the universe and figure out what the meaning of life is.”

Since the reward is beyond our understanding, it has the advantage of not being restricted by anything humans created. The reward can be earned no matter who you are, where you come from, or what you do: you do not have to be anyone else, just yourself. People pay a penalty when they substitute a man-made metric for it. A good example is the concept of “self-exploitation” raised in Chayanov’s The Theory of Peasant Economy: in his study, Russian peasants of the 1920s exploited themselves by maximizing money alone, and the outcome was extremely long hours of labor with a negative impact on quality of life.

3. How to train efficiently

Once we have some idea of the reward function, we can start the optimization process for life: maximizing the reward in a sustainable way with the available resources. I will simplify the categorization of optimization methods into gradient-based optimizers and derivative-free optimizers.

3.1 When you are clear about your meaning of life:

You can increase your value much more quickly with a gradient-based optimizer. In neural network training, especially supervised learning, gradient-based optimizers are significantly faster. However, they place strict requirements on the clarity and quality of the reward function. Examples include stochastic gradient descent (SGD) and the Levenberg–Marquardt algorithm (LMA). Lesson learned from AI training:

  1. Specify the reward function, the meaning of your life, as clearly as possible. It is the cheat code that unlocks a significantly faster optimizer for increasing your value. (A minimal gradient descent sketch follows the figure below.)

Fig 5. Gradient descent algorithm. Credit: O’Reilly
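As a minimal sketch of why this family is so fast, and so demanding: gradient descent (here, gradient ascent on a made-up reward) only works because the objective and its slope are written down explicitly. The reward below is a toy placeholder, not anything defined in this article.

```python
# Plain gradient ascent on a toy, explicitly defined reward: illustrative only.
def reward(x):
    return -(x - 3.0) ** 2          # a made-up reward, maximal at x = 3

def reward_gradient(x):
    return -2.0 * (x - 3.0)         # known only because the reward is written down

x, learning_rate = 0.0, 0.1
for step in range(50):
    x += learning_rate * reward_gradient(x)   # move in the direction that increases reward
print(f"best x found: {x:.3f}, reward: {reward(x):.4f}")
```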

3.2 When the meaning of life is still a black box:

The problem then becomes the global optimization of a black-box function. Some popular and relevant methods are listed below:

  1. Genetic algorithm: inspired by natural selection, it uses mutation, crossover, and selection to produce a better next generation. Over many generations, a high-quality solution emerges that maximizes the reward. (A toy sketch follows the figure below.)

Fig 6. Genetic algorithm process (credit: quantdare.com)
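A minimal, hand-rolled sketch of the idea, assuming an invented one-dimensional reward (no GA library is used):

```python
import random

# Toy genetic algorithm over a single real-valued "gene": purely illustrative.
def reward(x):
    return -(x - 3.0) ** 2                       # same made-up reward as before

def crossover(a, b):
    return (a + b) / 2.0                          # blend two parents

def mutate(x, scale=0.5):
    return x + random.gauss(0.0, scale)           # small random change

population = [random.uniform(-10, 10) for _ in range(20)]
for generation in range(40):
    population.sort(key=reward, reverse=True)     # selection: rank by reward
    parents = population[:5]                      # keep the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children               # next generation
best = max(population, key=reward)
print(f"best individual: {best:.3f}, reward: {reward(best):.4f}")
```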

  2. Bayesian optimization: update your understanding of the function’s behavior (prior -> posterior distribution) through evaluation and observation. It is usually employed to optimize expensive-to-evaluate functions, which suits our use case here. (A small sketch follows the figure below.)

Fig 7. An illustration of EI-based Bayesian optimization in an example scenario (credit: ResearchGate)
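A small sketch of an expected-improvement loop, using scikit-learn’s Gaussian process regressor over a discrete grid; the “expensive” reward is a made-up stand-in so the example is self-contained:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy Bayesian optimization loop with expected improvement (EI). Illustrative only.
def expensive_reward(x):
    return -(x - 3.0) ** 2                               # stand-in for a costly evaluation

candidates = np.linspace(-10, 10, 400).reshape(-1, 1)    # discrete search grid
X = np.array([[-8.0], [0.0], [8.0]])                     # a few initial observations
y = np.array([expensive_reward(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
for _ in range(10):
    gp.fit(X, y)                                          # posterior over the unknown reward
    mu, sigma = gp.predict(candidates, return_std=True)
    best_y = y.max()
    # Expected improvement: how much better than the best-so-far we expect each point to be.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (mu - best_y) / sigma
        ei = (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)
        ei[sigma == 0.0] = 0.0
    x_next = candidates[np.argmax(ei)]                    # sample where improvement looks likely
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_reward(x_next[0]))
print(f"best x found: {X[np.argmax(y)][0]:.3f}, reward: {y.max():.4f}")
```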

  3. Multi-objective optimization: a method for multiple-criteria decision making, it addresses the pain point of distilling a single-dimensional reward function. Even an incomplete understanding can guide the optimization process. (A tiny Pareto-filter sketch follows this list.)

  4. Surrogate optimization: this method uses a surrogate model to approximate the reward function; it is another simplification that reduces the prerequisites for optimization.
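A tiny sketch of the multi-objective idea: rather than collapsing everything into one number, keep every candidate that no other candidate beats on all objectives at once. The objective names below (happiness, achievement) are invented placeholders:

```python
# Toy Pareto filter for multi-objective optimization. Objective names are invented.
candidates = {
    "path_a": (0.9, 0.2),   # (happiness, achievement) -- placeholder scores
    "path_b": (0.5, 0.8),
    "path_c": (0.4, 0.3),   # beaten by path_b on both objectives
    "path_d": (0.7, 0.6),
}

def dominates(p, q):
    """p dominates q if p is at least as good everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

pareto_front = [name for name, score in candidates.items()
                if not any(dominates(other, score)
                           for other_name, other in candidates.items() if other_name != name)]
print(pareto_front)   # path_a, path_b, path_d survive; path_c does not
```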

Lessons learned from the common threads of the global optimization methods above:

  1. Act first (just do it!) and engage in activities from which you can get feedback. It is better to keep a diary and document the actual outcomes, so the next iteration samples more wisely.
  2. Iterate and evaluate your actions frequently to update your understanding of the reward function (the meaning of life) and the environment (the nature of the world).
  3. Keep iterating until you can no longer afford the cost. Never settle easily: chasing a local optimum can be very costly, with diminishing returns.

Final words:

Whether or not the hypothesis is valid, it helps me focus my limited time on the endeavors that matter most to me: searching for the meaning of life and understanding the nature of the world.

It is also one of the reasons I chose simulation and AI as my research topics. They give me first-hand experience of how to move, step by step, toward a “realistic” simulation, use it to train AI agents, and let those agents interact with the world through autonomous systems such as robots. I consider it a proactive way to understand the nature of this world, and it inspires me in finding my own reward function, the meaning of life. That is why I share some findings in this article.
