# Outline

• Bayesian learning

• repeated actions, observe each other
• DeGroot model

• repeated communication, "naive" updating

# Bayesian Learning

• Will society converge?

• Will they aggregate information properly?

# Bala Goyal 98

• n players in an undirected component g

• Choose action A or B each period

• A pays 1 for sure, B pays 2 with probability p and 0 with probability 1-p
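To make the comparison between the two actions explicit, here is the expected per-period payoff of each, computed only from the payoffs stated above. The subscripts $A$ and $B$ on $\pi$ are notation introduced for this illustration.

$$
E[\pi_A] = 1, \qquad E[\pi_B] = 2p + 0\cdot(1-p) = 2p, \qquad \text{so} \quad E[\pi_B] > E[\pi_A] \iff p > \tfrac{1}{2}.
$$

So B is the better action exactly when p > 1/2, and the two actions tie at p = 1/2.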

# Learning

• Each period get a payoff based on choice

• Also observe neighbors' choices

• Maximize discounted stream of payoffs $E[\sum_t \delta^t \pi_{it}]$

• p is unknown and takes on one of a finite set of values
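As a rough illustration of what learning about p looks like, the sketch below applies Bayes' rule after a single observed play of B. The two candidate values (1/4 and 3/4), the uniform prior, and the function names are assumptions made for this example, not part of the model statement.

```python
from fractions import Fraction

def update_belief(belief, b_paid_off):
    """Bayes update of a belief over candidate values of p after one play of B.

    belief maps each candidate value of p to its current probability.
    """
    # Likelihood of the observed outcome under each candidate p.
    likelihood = {p: (p if b_paid_off else 1 - p) for p in belief}
    total = sum(belief[p] * likelihood[p] for p in belief)
    return {p: belief[p] * likelihood[p] / total for p in belief}

def expected_payoff_b(belief):
    """Expected payoff of B under the belief: 2 * E[p]."""
    return 2 * sum(p * w for p, w in belief.items())

# Hypothetical prior: p is 1/4 or 3/4 with equal probability.
prior = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}
after_success = update_belief(prior, True)
after_failure = update_belief(prior, False)

print(expected_payoff_b(prior))          # 1
print(expected_payoff_b(after_success))  # 5/4
print(expected_payoff_b(after_failure))  # 3/4
```

Under these assumptions, one success makes B look better than the sure payoff of 1 from A, and one failure makes it look worse. Exact fractions are used so the arithmetic can be checked by hand.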

# Proposition

• If p is not exactly 1/2, then with probability 1 there is a time such that all agents in a given component play just one action (and all play the same action) from that time onward

# Sketch of Proof

• Suppose the contrary

• Some agent in some component plays B infinitely often

• That agent will converge to true belief by the law of large numbers

• Must be that belief converges to p>1/2, or that agent would stop playing B
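The law-of-large-numbers step can be illustrated with a worked case. Assume, purely for illustration, that p is either 1/4 or 3/4, and let $q_n$ (hypothetical notation) be the probability the agent puts on p = 3/4 after seeing $n$ outcomes of B, of which $k$ paid off. Bayes' rule in odds form gives

$$
\frac{q_n}{1-q_n} = \frac{q_0}{1-q_0}\cdot\frac{(3/4)^{k}\,(1/4)^{n-k}}{(1/4)^{k}\,(3/4)^{n-k}} = \frac{q_0}{1-q_0}\,3^{\,2k-n}.
$$

By the law of large numbers $k/n \to p$, so $(2k-n)/n \to 2p-1$. The odds therefore grow without bound if p = 3/4 and shrink to zero if p = 1/4. An agent who keeps playing B thus ends up almost certain of the true value.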

# Play the right action?

• If B is the right action, then agents play the right action if they converge to B, but they might not converge to it

• If A is the right action, then agents must converge to the right action

Consider the model of observational Bayesian learning on a network that we have discussed, in which action A pays 1 for sure and action B pays 2 with an initially unknown probability p, and 0 with probability 1-p. Suppose that the society is in a connected network, and that all agents start with the same beliefs about p and believe p to be either 1/4 or 3/4.

According to the result we discussed, the following statements are correct:

• If p<0.5, then with probability 1 all agents will play action A from some time onwards.
• With probability 1, there is some time after which all agents will play the same action.

Note that all agents could eventually play A even though B is actually the higher-return action, provided they hold sufficiently pessimistic beliefs about the return to playing B. Such beliefs could come from a sufficiently pessimistic prior or from bad luck in the initial outcomes from playing B.

However, they cannot end up playing B forever when it is the lower-return action, since they would then eventually learn that it has a lower payoff. So in that case they eventually play A, which is why the statement about p<0.5 is also correct.
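The two cases in this explanation can be explored with a small simulation. This is a minimal sketch under simplifying assumptions that are not part of the result itself: agents are myopic (they maximize the current period's expected payoff rather than a discounted stream), each agent sees the realized outcomes of B played by itself and its neighbors, and p is believed to be 1/4 or 3/4. The function name `simulate` and the four-agent ring are hypothetical.

```python
import random

LOW, HIGH = 0.25, 0.75  # assumed candidate values of p

def simulate(neighbors, priors, p_true, periods, seed=0):
    """Myopic learning on a network.

    neighbors: dict agent -> list of neighboring agents
    priors:    dict agent -> prior probability that p equals HIGH
    Returns the actions played in the final period and the final beliefs.
    """
    rng = random.Random(seed)
    belief = dict(priors)
    actions = {}
    for _ in range(periods):
        # Play B only if its expected payoff 2 * E[p] beats the sure payoff 1.
        actions = {i: "B" if 2 * (LOW + (HIGH - LOW) * b) > 1 else "A"
                   for i, b in belief.items()}
        # Realized outcomes of B for the agents who played it this period.
        outcomes = {i: rng.random() < p_true
                    for i, a in actions.items() if a == "B"}
        for i in belief:
            # Update on own outcome and on each neighbor's outcome, if any.
            for j in [i] + list(neighbors[i]):
                if j in outcomes:
                    like_high = HIGH if outcomes[j] else 1 - HIGH
                    like_low = LOW if outcomes[j] else 1 - LOW
                    num = belief[i] * like_high
                    belief[i] = num / (num + (1 - belief[i]) * like_low)
    return actions, belief

# Four agents on a ring, all starting with a pessimistic prior about B.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
pessimistic = {i: 0.4 for i in ring}
actions, beliefs = simulate(ring, pessimistic, p_true=0.75, periods=50)
print(actions)  # every agent plays A, although B is the better action
```

With a uniformly pessimistic prior nobody ever tries B, so no information arrives and the society stays on A even though p = 3/4. Giving some agents optimistic priors, or setting `p_true=0.25`, lets one explore the other cases; those runs depend on the random draws.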

# Probability of Converging to "correct" action

• Arbitrarily high if, for each action, some agent initially has an arbitrarily high prior that it is the best one

# Conclusions

• Consensus action chosen

• Not necessarily consensus belief

• Speed of convergence?

# Limitations

• Homogeneity of actions and payoffs across players

• What if heterogeneity?

• Repeated actions over time

• Stationarity

• Networks are not playing a role here!