Statistical Exponential Random Graph Models (SERGMs)

Random Network Models:

  • Erdos-Renyi
    • Useful for understanding thresholds and how networks come to exhibit certain features
    • Miss many real-world features: e.g., clustering
  • Other link-by-link models
    • Watts and Strogatz, Barabasi and Albert, Jackson and Rogers...
    • Capture other features: clustering, degree distribution, correlation...
  • Stochastic Block Models
    • Enrich Erdos-Renyi to allow probabilities to depend on node characteristics, i.e., attributes (or on latent, unobserved characteristics)
  • Popular set of models (ERGMs) and new ones (SERGMs/SUGMs)
    • Flexible way to introduce various local features and dependencies
    • Estimated statistically

So what we've been through: we had a set of models that were link-based, and other models, still link-by-link based, which can begin to capture different things like clustering and so forth. We brought in attributes with the stochastic block models. And then we said, okay, now there's a class of models which allows us to capture richer things, where there are dependencies, and tries to estimate those things statistically: Exponential Random Graph Models. But as we saw, there are difficulties in estimating those. So what I want to talk about now is a new class called Statistical Exponential Random Graph Models. That will allow us to keep track of these local features and dependencies and actually do some accurate, fast, and easy estimation.

SERGMs: Introduction

  • ERGMs not accurately estimable in many cases...
  • Too many alternative networks to consider...
  • Way out: Many networks lead to the same statistics
    • Probabilities only depend on statistics
    • So, networks with the same statistics are "equivalent" (equally likely)
  • Collapse all equivalent networks

The Exponential Random Graph Models are not accurately estimable in many cases; there are basically just too many alternative networks to consider. So what's the idea here? What's the way out? The way out is that many networks lead to the same statistics. For instance, in the simulations we did in the last video, we had a network with 45 links, 10 triangles, and 20 isolated nodes. Under an Exponential Random Graph formulation, any network that had exactly those characteristics would be equally likely: they all lead to the same value of the function, and so to the same probability. So we can just say: okay, even though there are many different networks to search over, what we really need to keep track of is the number of networks with each value of the statistics, because all the networks with the same statistics have the same probability. We can treat those as equivalent and collapse all the equivalent networks. That makes the summation, and the area we have to search over, much smaller than it was before.

Sufficient Statistics

  • $Pr(g) = \frac{exp(\beta S(g))}{\sum_{g'}exp(\beta S(g'))}$
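
To connect the formula to the running example, the statistics can be a vector; a purely illustrative sketch (these particular counts come from the simulations discussed above):

$$S(g) = \big(\#\text{links},\ \#\text{triangles},\ \#\text{isolates}\big) = (45,\ 10,\ 20), \qquad \beta = (\beta_L,\ \beta_T,\ \beta_I),$$

so that $\beta S(g) = 45\,\beta_L + 10\,\beta_T + 20\,\beta_I$ in the exponent.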

Let $N(s')$ be the number of networks that would have statistics equal to $s'$:

$N(s')$ = number of networks $g'$ s.t. $S(g') = s'$

So how many different networks are there that have 20 isolates, 10 triangles, and 45 links? That's a set of statistics; that's our $S(g)$.

We let $N(s)$ be the number of different networks that have exactly these statistics. Now, that's not necessarily an easy number to calculate, and in some cases it can actually be complicated even to estimate.

  • $Pr(g) = \frac{exp(\beta S(g))}{\sum_{s'}N(s')exp(\beta s')}$

This rewrites the denominator as a sum over statistics rather than over networks: each value $s'$ contributes $N(s')$ times the weight $exp(\beta s')$. It leads to the probability that a certain set of statistics arises.
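
A minimal sketch of this collapse, on a toy case small enough to enumerate. Here the statistic is simply the link count on $n = 4$ nodes, and $\beta = 0.5$ is an arbitrary illustrative value; both normalizers come out identical.

```python
# Brute-force check, on a tiny toy case, that grouping networks by their
# statistics leaves the ERGM normalizer unchanged. The statistic S(g)
# is just the number of links among n = 4 nodes.
from itertools import combinations, product
from collections import Counter
from math import exp

n = 4
pairs = list(combinations(range(n), 2))  # the 6 possible links
beta = 0.5                               # arbitrary illustrative parameter

# Enumerate all 2^6 = 64 networks as on/off vectors over the possible links;
# the statistic of each network is its link count.
stats = [sum(g) for g in product([0, 1], repeat=len(pairs))]

# Network-by-network normalizer: sum over every network g' of exp(beta*S(g')).
Z_networks = sum(exp(beta * s) for s in stats)

# Collapsed normalizer: N(s') counts the networks sharing statistic s'
# (here N(s) is just the binomial coefficient C(6, s)).
N = Counter(stats)
Z_stats = sum(N[s] * exp(beta * s) for s in N)

print(Z_networks, Z_stats)  # identical: the collapse loses nothing
```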

Statistical Form

  • $Pr(g) = \frac{exp(\beta S(g))}{\sum_{s'}N(s')exp(\beta s')}$

  • $Pr(s) = \frac{N(s)exp(\beta s)}{\sum_{s'}N(s')exp(\beta s')}$

    So ultimately, it's not so much that we're interested in the particular network we saw, as in the fact that it had 20 isolates, 10 triangles, and 45 links. The really important aspect was that it had certain attributes. Those are the kinds of things we're going to want to test as scientists: does the network have certain characteristics? Not that a very particular network was realized, as opposed to some other one. Whatever characteristics we're really interested in, we can put into those statistics.

    • And so this is what we're ultimately interested in.
    • What's the probability that we see a network that has certain characteristics?

    So now we're going to call this the statistical form, because we've gotten rid of the networks and are just talking about the properties of the network. All estimation is now based in the space of statistics, which can be a much smaller space to work in than the original space of networks. That is the idea behind the Statistical Exponential Random Graph Models.

And of course, any one of these representations of an Exponential Random Graph Model has a corresponding representation in this format. If we can estimate the parameters here, that tells us what the parameters would be in the original model. So we can work in this space, do the estimation, map back, and recover the Exponential Random Graph Model.
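
As a worked illustration of the statistical form (a sketch, not from the lecture): take the simplest case where the only statistic is the number of links $s$, out of $M = n(n-1)/2$ possible ones. Then $N(s) = \binom{M}{s}$, and

$$Pr(s) = \frac{\binom{M}{s}\,exp(\beta s)}{\sum_{s'}\binom{M}{s'}\,exp(\beta s')} = \binom{M}{s}\,p^{s}(1-p)^{M-s}, \qquad p = \frac{e^{\beta}}{1+e^{\beta}},$$

which is exactly the binomial distribution of link counts in an Erdos-Renyi random network with link probability $p$.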

Instead of asking what is the probability of a specific network:

Ask: What is the probability of observing a network that has density of links .1, clustering .3, and average path length of 2.7, etc.?
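
These are exactly the kinds of statistics that are cheap to read off an observed network. A minimal sketch using the networkx library (the network here is a simulated stand-in with arbitrary parameters, for illustration):

```python
# Compute the kinds of summary statistics a SERGM works with,
# for one observed network. Requires: pip install networkx
import networkx as nx

g = nx.erdos_renyi_graph(n=50, p=0.1, seed=7)  # stand-in for observed data

density = nx.density(g)                  # fraction of possible links present
clustering = nx.average_clustering(g)    # average local clustering coefficient

# Average path length is only defined on a connected network,
# so measure it on the largest connected component.
giant = g.subgraph(max(nx.connected_components(g), key=len))
avg_path = nx.average_shortest_path_length(giant)

print(f"density={density:.2f}, clustering={clustering:.2f}, "
      f"avg path length={avg_path:.2f}")
```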

SERGMs Include:

  • $Pr(s) = \frac{K(s)exp(\beta s)}{\sum_{s'}K(s')exp(\beta s')}$ SERGMs, where the weights $K(s)$ generalize the counts $N(s)$ above

s can encode many things:

  • Links, cliques, k-stars, subgraphs, friends in common per link, multi-graphs; adapt for degree distributions
  • Can do preference-based models
  • Allow for node characteristics...

Challenge:

What we're usually faced with is one network, right? We don't see many different networks; usually we just see one realization. And now we've got these interdependencies. So we've got problems in working with large numbers, because we've basically only got one observation. But we do have many observations of which links are present and which triangles are present. Even though these aren't quite independent, we can still ask: are they enough? Is there enough information in one of these networks to estimate a model? It's not completely obvious that when you estimate these kinds of models, with all these interdependencies, whatever estimates of the parameters you get are going to be accurate.

So we can go ahead and do, say, maximum likelihood on this kind of model, an Exponential Random Graph Model now in statistical form, and estimate these things. We saw certain statistics come out; we can ask, what are the betas that maximize this function, that make it most likely we saw these statistics rather than some other ones? And then we can ask: when will these betas be accurate estimates of the true parameters? How large does our data set have to be before we're pretty sure we're getting an accurate look at what nature was doing?
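
A minimal sketch of what maximum likelihood looks like in the statistical form, under the toy link-count specification from the worked example above (an assumption for illustration, not the estimator from the paper). There the model is binomial, so the MLE of $\beta$ has a closed form: the log-odds of the observed link fraction.

```python
# Maximum likelihood in the statistical form, for the toy link-count SERGM
# sketched earlier: s = number of links, K(s) = C(M, s), M = n(n-1)/2.
# Pr(s) is then Binomial(M, p) with p = e^beta / (1 + e^beta), so the MLE
# from a single observed network is the log-odds of the observed link share.
import math

def sergm_link_mle(n_nodes: int, observed_links: int) -> float:
    """MLE of beta in the link-count SERGM, from one observed network."""
    M = n_nodes * (n_nodes - 1) // 2        # number of possible links
    p_hat = observed_links / M              # observed share of links present
    return math.log(p_hat / (1.0 - p_hat))  # beta_hat = log(s / (M - s))

# Example: a 50-node network in which 45 links are observed.
print(sergm_link_mle(50, 45))
```

Note that one network already contains $M$ observations of links, and $M$ grows quadratically in $n$; that is the flavor of the "is one network enough?" question above.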

SERGM Estimation

  • Chandrasekhar and Jackson (2012) provide results on
    • Classes of SERGMs for which maximum likelihood estimates converge to the true parameters as n grows
    • Simple ways of estimating those
  • Look at a related set of models