Wednesday, October 20, 2010

The nine phases of a network analytic project

In a bold attempt to tame the network analytic literature, I have boiled down network analytic projects to the following nine phases:

  1. Pose the question;
  2. Build the network G;
  3. Choose a set of subgraphs H;
  4. Design a measure on H in G;
  5. Choose an appropriate null-model, i.e., choose a suitable random graph model G';
  6. Measure significance of H with respect to G';
  7. Design a model M that evolves all significant structures in H;
  8. Design a process P on top of the network;
  9. Analyze the interplay between M and P.
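
To make phases 3 to 6 a bit more concrete, here is a minimal Python sketch using only the standard library. It assumes a graph stored as a dict mapping each vertex to the set of its neighbours. It takes all triangles as the subgraph set H and their number as the measure, and it uses G(n,p) with p matched to the observed edge density as the null-model G'. The ring lattice standing in for the real network of phase 2, the helper names, and the z-score as the significance test are my own illustrative choices, not part of the scheme itself.

```python
import random
import statistics

def ring_lattice(n, k):
    """Phase 2 stand-in: each vertex linked to its k/2 nearest neighbours on each side of a ring."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    return adj

def triangles(adj):
    """Phases 3 and 4: H = all triangles, measured by how many occur in the graph."""
    count = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                count += len(adj[u] & adj[v])
    return count // 3  # each triangle is found once via each of its three edges

def gnp(n, p, rng):
    """Phase 5: one sample of the null-model G' = G(n,p)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def significance(adj, measure, samples=200, seed=0):
    """Phase 6: z-score of measure(G) against density-matched G(n,p) samples."""
    rng = random.Random(seed)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    p = m / (n * (n - 1) / 2)  # same expected number of edges as G
    null = [measure(gnp(n, p, rng)) for _ in range(samples)]
    mu, sigma = statistics.mean(null), statistics.stdev(null)
    observed = measure(adj)
    z = (observed - mu) / sigma if sigma > 0 else None  # undefined without variation
    return observed, mu, z

observed, expected, z = significance(ring_lattice(30, 4), triangles)
print(observed, round(expected, 1), z)
```

For this lattice the observed count is 30. A density-matched G(n,p) graph is expected to contain only about 10.7 triangles (C(30,3) * p^3 with p = 60/435), so the z-score comes out clearly positive.
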
You don't find this scheme in your favourite network analytic paper? Then let's check it against the classic papers by Watts and Strogatz and by Barabási and Albert.

A first example: Collective Dynamics of 'Small-World' Networks by D. Watts and S. Strogatz

  1. The authors start with the question of whether real-world networks are closer to the ordered, locally connected type or to the purely random type.
  2. To answer this question, they construct three networks: the known neural connections of the worm Caenorhabditis elegans; the actors who were cast together in any film in the IMDB at that time; and the U.S. power grid.
  3. They define two sets of subgraphs: one contains all shortest paths in G, the other contains, for each node, its so-called ego-network, i.e., the set of edges to and between its neighbors.
  4. The measures they define are the average length of these shortest paths and the clustering coefficient of each ego-network (the fraction of possible edges between a node's neighbors that actually exist), averaged over all nodes.
  5. The null-model they choose is the classic random graph model G(n,p). 
  6. Although they do not directly assess the significance of the values in G, they compare the observed values with those of a random graph with the same number of nodes and the same average degree. The differences are so large that the significance of these values is obvious even without a formal test.
  7. The small-world model captures the newly found significant structure, namely the combination of a high clustering coefficient and a low average path length. Moreover, this model has only one parameter, the rewiring probability p, which lets them interpolate between a strongly ordered graph and an essentially random one.
  8. They also define some processes on networks, namely a version of an iterated prisoner's dilemma, synchronization of coupled oscillators, and a simple voting model.
  9. In a last step, they analyze how the outcome of each of the three processes depends on the chosen parameter p.
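
The two measures and the small-world model of this example can be sketched in a few lines of standard-library Python. This is my own minimal illustration, not the authors' code. It assumes the same dict-of-sets graph representation, averages the local clustering coefficients over all vertices (vertices of degree below 2 count as 0), and averages path lengths over connected pairs only. The rewiring follows the ring-lattice procedure described in the paper, while the function names and the parameter values (n = 200, k = 10) are hypothetical choices.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ordered starting point: each vertex linked to its k/2 nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    return adj

def watts_strogatz(n, k, p, seed=0):
    """Phase 7: rewire each lattice edge with probability p, lap by lap around the ring."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for j in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + j) % n
            if w in adj[v] and rng.random() < p:
                # new endpoint chosen uniformly, avoiding self-loops and duplicate edges
                choices = [u for u in range(n) if u != v and u not in adj[v]]
                if choices:
                    u = rng.choice(choices)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(u)
                    adj[u].add(v)
    return adj

def clustering(adj):
    """Phases 3 and 4: ego-network clustering coefficient, averaged over all vertices."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # counts as 0 in the average
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)

def avg_path_length(adj):
    """Phases 3 and 4: mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

for p in (0.0, 0.01, 0.1, 1.0):
    g = watts_strogatz(200, 10, p, seed=1)
    print(f"p={p}: C={clustering(g):.3f}, L={avg_path_length(g):.2f}")
```

Sweeping p should reproduce the qualitative picture of the paper: L drops sharply already for small p, while C stays close to its lattice value 3(k-2)/(4(k-1)), i.e., 2/3 for k = 10. Only for p near 1 does C fall towards the random-graph level of roughly k/n.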

A second example: Emergence of Scaling in Random Networks by A.-L. Barabási and R. Albert

  1. The authors pose the question of which kind of degree distribution is typically observed in real-world networks.
  2. They use the same actor and power grid networks as Watts and Strogatz and add a network constructed from a partial crawl of the web at that time.
  3. The set of subgraphs they define consists of each vertex together with its incident edges.
  4. The measure on this set is simply the number of edges in each subgraph, i.e., the degree of each vertex. This time, however, the values are aggregated into the degree distribution and viewed as a property of the whole graph.
  5. Again, the authors use G(n,p) as the null-model, together with the newly introduced small-world model.
  6. They compare the degree distributions expected under both null-models with those observed in the real networks: the real networks show heavy-tailed, power-law-like distributions, while both null-models predict rapidly decaying ones. The deviation is obvious, although it is not formally assessed.
  7. The preferential attachment model is then introduced to capture these newly found significant structures.
  8. In a second paper (Error and Attack Tolerance of Complex Networks) they define two processes on networks, namely the random removal of nodes (error/random failure) and the deliberate removal of the best connected nodes (attack).
  9. They then analyze the behavior of these two processes with respect to the underlying graph model, namely the fully random graph from the G(n,p) model and the newly designed preferential attachment model.
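
The steps of this second example can be sketched the same way. Again this is an illustration under my own assumptions, not the authors' code: growth starts from a small clique on m + 1 vertices, each new vertex attaches m distinct edges with probability proportional to degree, and robustness is measured as the fraction of all vertices that remain in the largest connected component after removal. The function names and parameters (n = 2000, m = 2, 10% removal) are hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=0):
    """Phase 7: grow a graph; each new vertex links to m vertices chosen by degree."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(m + 1):  # seed graph: clique on m + 1 vertices
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # every vertex occurs in `ends` once per incident edge, so a uniform draw
    # from `ends` picks a vertex with probability proportional to its degree
    ends = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends.extend((new, t))
    return adj

def degree_distribution(adj):
    """Phases 3 and 4: size of each vertex's star, aggregated over the whole graph."""
    return Counter(len(nbrs) for nbrs in adj.values())

def giant_fraction(adj, removed):
    """Largest connected component after deleting `removed`, as a fraction of all vertices."""
    seen, best = set(removed), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack, size = [s], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / len(adj)

def removal(adj, fraction, attack, seed=0):
    """Phase 8: error (random removal) or attack (best connected vertices first)."""
    k = int(fraction * len(adj))
    if attack:
        order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    else:
        order = list(adj)
        random.Random(seed).shuffle(order)
    return giant_fraction(adj, order[:k])

g = preferential_attachment(2000, 2, seed=1)
dist = degree_distribution(g)
print(max(dist), removal(g, 0.1, attack=False), removal(g, 0.1, attack=True))
```

On such a graph the attack should shrink the largest component clearly more than random failures of the same number of vertices. For phase 9, the same `removal` function can be run on a G(n,p) graph of equal density for comparison.
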
You see that the phases capture these papers quite well. So far, of course, we have restricted ourselves to network analytic projects on static networks. I am convinced that most of our network analytic pet methods, like centrality measures, clustering, network motifs, and position/role assignment, can be meaningfully described within this framework. Let me know your opinion!
