Saturday, November 12, 2016

Note 16: On distances, edge weights, and other modeling decisions

(Graph theoretic) distance is certainly one of the best understood
concepts in network analysis. However, not every weighted graph
should be used to compute the distance between all pairs of nodes.
This figure is under CC:BY with a reference to
Prof. Dr. Katharina A. Zweig or to this blogpost.
The distance between any two nodes in an undirected, unweighted graph is defined as the minimal number of edges one has to traverse to get from one node to the other. A natural generalization of this concept to weighted graphs defines the distance as the minimal sum of edge weights over all paths between the two nodes.
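As a minimal sketch of the two definitions above: the first function counts edges with a breadth-first search, the second sums weights with Dijkstra's algorithm, which assumes non-negative weights. The function names and the small `streets` graph (weights read as street lengths in km) are made up for illustration.

```python
import heapq
from collections import deque


def hop_distance(adj, source, target):
    """Minimal number of edges between source and target (weights ignored)."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return float("inf")  # target is not reachable


def weighted_distance(adj, source, target):
    """Minimal sum of edge weights (Dijkstra; assumes non-negative weights)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # outdated heap entry
        for neighbor, weight in adj[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf")  # target is not reachable


# Hypothetical undirected street network; weights are lengths in km.
streets = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 1.0},
    "C": {"A": 5.0, "B": 1.0},
}

print(hop_distance(streets, "A", "C"))
print(weighted_distance(streets, "A", "C"))
```

In this toy graph, A and C are one edge apart, but the shortest route by length runs through B. The two notions of distance can therefore disagree even when the weights are plain lengths.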

For weights that represent the length of streets, this makes perfect sense. But of course, weights in complex networks can represent almost anything. Consider the number of hours two people spent calling each other in the last two weeks, or the probability of surfing from one webpage to another via a direct link from the first page to the second.

The distance between two nodes in a complex network is used for many things, foremost for so-called centrality indices like betweenness centrality or closeness centrality. In general, for most centrality indices, a low distance to many other nodes makes a node more central. However, the length of calls between two persons is a measure of their closeness rather than their distance. Thus, summing these values actually favors pairs of nodes that are linked by paths of people who do not talk to each other for long. It can help here to invert the weights to make the meaning of distance more intuitive. However, even in this case: what does it actually mean if I am connected to another person by a path of two links, each standing for two hours of calls? Then my distance to that person is 1/2 + 1/2 = 1. Does that make me closer to that person than to someone I am directly connected to but only call for 10 minutes (1/6 of an hour), i.e., with a "distance" of 6?
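The arithmetic of the phone-call example can be checked with a few lines. This is only a sketch: the function name is made up, and call durations are assumed to be given in hours.

```python
def inverted_path_length(call_hours):
    """Sum of reciprocal call durations along a path (hours -> 'distance')."""
    return sum(1.0 / hours for hours in call_hours)


# Path of two links, two hours of calls on each link.
two_link_path = inverted_path_length([2.0, 2.0])

# Direct link with ten minutes of calls (10/60 of an hour).
direct_link = inverted_path_length([10 / 60])

print(two_link_path, direct_link)
print(two_link_path < direct_link)
```

Under the inverted weights, the indirect acquaintance ends up "closer" than the direct contact. Whether that ranking is meaningful is a modeling decision, not something the computation can settle.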

With probabilities, summation obviously does not make much sense. Here, multiplying the weights along a path is probably the most meaningful way to obtain interpretable results.
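A small sketch of why the product is easier to interpret than the sum. The link probabilities below are made up, and the product is only the probability of following the whole path under the assumption that the individual steps are independent.

```python
import math


def path_probability(step_probabilities):
    """Probability of following every link on a path, assuming independent steps."""
    return math.prod(step_probabilities)


# Hypothetical probabilities of following each link on a three-step path.
steps = [0.5, 0.4, 0.3]

summed = sum(steps)                   # exceeds 1, so it is not a probability
multiplied = path_probability(steps)  # stays within [0, 1]

print(summed, multiplied)
```

The sum grows with every additional link and can exceed 1, while the product shrinks with path length and remains a valid probability.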

Note 16. If network representation and network analytic measure are
well matched, the measure’s value can be interpreted with respect to
the functionality of the network for the complex system of interest.
(Zweig, 2016)

Read more on this under the general keyword "trilemma of complex network analysis" and in Chapter 5, "Network representations of complex systems", of my book.




Reference:

 (Zweig2016) Katharina A. Zweig: Network Analysis Literacy, ISBN 978-3-7091-0740-9, Springer Vienna, 2016
