Bayesian learning

- repeated actions, observe each other

DeGroot model

- repeated communication, "naive" updating

Will society converge?

Will they aggregate information properly?

n players in an undirected component g

Choose action A or B each period

A pays 1 for sure, B pays 2 with probability p and 0 with probability 1-p

Each period get a payoff based on choice

Also observe neighbors' choices

Maximize discounted stream of payoffs $E[\sum_t \delta^t \pi_{it}]$

p is unknown and takes values in a finite set
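The belief updating implied by this setup can be sketched in a few lines. This is a minimal illustration, not part of the original notes: the function names `posterior` and `expected_payoff_b` are hypothetical, and the two-point support {1/4, 3/4} with a uniform prior is an assumption chosen for concreteness.

```python
def posterior(support, prior, successes, failures):
    """Bayes' rule for p on a finite support.

    support: candidate values of p; prior: matching prior probabilities.
    successes / failures: number of observed B draws that paid 2 / paid 0.
    """
    weights = [
        q * (v ** successes) * ((1 - v) ** failures)
        for v, q in zip(support, prior)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def expected_payoff_b(support, belief):
    """Expected one-period payoff of B: 2 * E[p] under the current belief."""
    return 2 * sum(v * q for v, q in zip(support, belief))


# Assumed example: p is either 1/4 or 3/4, uniform prior, one success observed.
support = [0.25, 0.75]
belief = posterior(support, [0.5, 0.5], successes=1, failures=0)
print(belief)                              # [0.25, 0.75]
print(expected_payoff_b(support, belief))  # 1.25, so B beats A's sure payoff of 1
```

Only plays of B generate information about p, since A pays 1 for sure regardless of p.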

- If p is not exactly 1/2, then with probability 1 there is a time such that all agents in a given component play just one action (and all play the same action) from that time onward

Suppose contrary

Some agent in some component plays B infinitely often

That agent's belief about p converges to the true value by the law of large numbers

Must be that belief converges to p>1/2, or that agent would stop playing B

If B is the right action, then agents play the right action if they converge to B, but they might not (they may converge to A instead)

If A is the right action, then must converge to right action
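The threshold p = 1/2 in this argument comes from comparing per-period expected payoffs once p is (approximately) known. As a short worked step, using only the payoffs defined above:

$$E[\pi_A] = 1, \qquad E[\pi_B \mid p] = 2 \cdot p + 0 \cdot (1-p) = 2p, \qquad\text{so}\qquad E[\pi_B \mid p] > E[\pi_A] \iff p > \tfrac{1}{2}.$$

An agent who plays B infinitely often learns p, so if p < 1/2 that agent eventually sees 2p < 1 and stops playing B, which is the contradiction the argument relies on.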

Consider the model of observational Bayesian learning on a network that we have discussed in which action A pays 1 for sure and action B pays 2 with an initially unknown probability p, and 0 with probability 1-p. Suppose that the society is in a network that is connected and all agents start with the same beliefs over which possible values p could have, and think p to be either 1/4 or 3/4.

According to the result we discussed, the following statement(s) are correct:

- If p<0.5, then with probability 1 all agents will play action A from some time onwards.
- With probability 1, there is some time after which all agents will play the same action.

Notice that all agents could eventually play A even though B is actually the higher-return action, provided they have sufficiently pessimistic beliefs about the return to playing B. Such beliefs could come from a sufficiently pessimistic prior or from bad luck on the initial outcomes from playing B.

However, they cannot end up eventually playing B when it is the lower-return action, since they would then eventually learn that it has a lower payoff. So in that case they eventually play A, and so the statement about p<0.5 is also correct.
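The asymmetry between the two cases can be illustrated with a small simulation. This is a rough sketch under assumptions that go beyond the notes: agents are myopic (they ignore the discount factor and pick the action with the higher current expected payoff, choosing A on ties), they observe the payoffs as well as the choices of their neighbors, and they update only on those directly observed outcomes. The function name `simulate`, the 3-agent line network, and the priors are all hypothetical.

```python
import random

SUPPORT = (0.25, 0.75)  # the two values of p considered in the text


def simulate(true_p, priors_high, neighbors, periods=200, seed=0):
    """Myopic observational learning; returns the list of actions per period.

    priors_high[i]: agent i's prior probability that p = 3/4 (assumed input).
    neighbors[i]:   agents whose actions and payoffs agent i observes.
    """
    rng = random.Random(seed)
    low, high = SUPPORT
    beliefs = list(priors_high)
    history = []
    for _ in range(periods):
        # B is chosen only if its expected payoff 2*E[p] strictly exceeds 1.
        actions = [
            "B" if 2 * (b * high + (1 - b) * low) > 1 else "A"
            for b in beliefs
        ]
        # Only plays of B generate information about p.
        outcomes = {
            i: rng.random() < true_p
            for i, a in enumerate(actions) if a == "B"
        }
        new_beliefs = []
        for i, b in enumerate(beliefs):
            # Bayes update on own outcome and on each neighbor's outcome.
            for j in [i] + list(neighbors[i]):
                if j in outcomes:
                    like_high = high if outcomes[j] else 1 - high
                    like_low = low if outcomes[j] else 1 - low
                    b = b * like_high / (b * like_high + (1 - b) * like_low)
            new_beliefs.append(b)
        beliefs = new_beliefs
        history.append(actions)
    return history


# Line network 0 - 1 - 2 (assumed for illustration).
line = {0: [1], 1: [0, 2], 2: [1]}

# Pessimistic priors: nobody ever tries B, so society stays on A
# even though the true p = 3/4 makes B the better action.
stuck = simulate(true_p=0.75, priors_high=[0.3, 0.3, 0.3], neighbors=line)
print(stuck[-1])  # ['A', 'A', 'A']
```

Running the same function with optimistic priors and `true_p=0.25` is a way to see the other direction: agents who keep playing B accumulate evidence against it, so play of B should die out over time.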

- Probability of converging to the right action: arbitrarily high if each action has some agent who initially has an arbitrarily high prior that the action is the best one

Consensus action chosen

Not necessarily consensus belief

Speed of convergence?

Homogeneity of actions and payoffs across players

What if heterogeneity?

Repeated actions over time

Stationarity

Networks are not playing a role here!