Sunday, June 26, 2016

Notes 8-11: What is the difference between social network analysis and network science?

The following four notes are from my book "Network Analysis Literacy" (Zweig2016).

Summary of the differences between social network analysis and network science. Of course, this is a generalization and will not apply to every single network analytic project from either field.
"Note 8. The first big difference between social and complex network analysis as a part of network science, however, is that the underlying data is not restricted to social systems but comprises all relationships between any kind of entities in any given complex system."
"Note 9. A second important difference between network science and social network analysis is that (in general) the first induces micro-behavior from observed macro-behavior while (in general) the second predicts macro-behavior from hypothesized micro-behavior."
"Note 10. Social network analysis tries to capture many details from the social system of interest. Often, additional parameters of the persons under observation are requested and used for the analysis. The approach is thus a contextual approach that takes the context into account. In network science, the abstraction level is in most cases much higher and individual properties of the entities are much less often taken into account. The approach can be characterized as being largely context-free."
"Note 11. In summary (and a bit bold), social network analysis is a theory-driven, bottom-up approach that carefully models additional social information where available and takes it into account when interpreting the results. Network science follows a data-driven, top-down approach that tries to clean the data from all detail to compare the core structure of different complex networks."


 (Zweig2016) Katharina A. Zweig: Network Analysis Literacy, ISBN 978-3-7091-0740-9, Springer Vienna, 2016

Note 7: What is the difference between graph theory and network analysis?

What is the difference between graph theory and network analysis?
This figure is under CC:BY with a reference to
Prof. Dr. Katharina A. Zweig or to this blogpost.
Graph theory is often seen as one building block of network analysis. But what exactly are the differences between the two fields? Graph theory is a very abstract science that defines different graph classes and tries to understand their specific properties. Furthermore, it is concerned with so-called graph problems. Let me give you the most classic example of such a graph problem:

Given: A graph G
Wanted: The maximum clique C in G, i.e., the largest subset of nodes in G such that any two nodes in C are connected by an edge (complete subgraph)
Figure 1: A set of intervals (above) and the corresponding interval graph (below). This figure is under CC:BY with a reference to Prof. Dr. Katharina A. Zweig  or to this blogpost.
 A graph problem is always described by an input ("Given: XX") and a definition of the wanted output ("Wanted: YY") with resepct to the input. Finding a maximum clique in a given graph is problem that takes a long time to solve. The problem is that, intuitively, all possible subsets of nodes have to be tested of whether they constitute a clique. Until now, there is no other solution process that is substantially faster for all possible inputs. However, for some graph classes, it can be done much faster. One of those graph classes is a so-called interval graph (see Figure 1). Given a set I of intervals, the interval graph represents the intervals as node and connects any two nodes with each other, if the corresponding intervals overlap. In such a graph, we just need to find a spot where the maximum number of intervals overlap. It can be shown that we just have to look at all the endpoints of the intervals to find that spot. This can be done very fast, even if the network is extremely large.

Thus, graph theory is concerned with different graph classes and their relationship to graph problems and the runtime necessary to find the solutions to the graph problems.

"Note 7. In general, graph theory is concerned with the relationships between different graph classes and the relationship between certain graph structures, a graph problem, and its algorithmic solution." (Zweig2016)

In contrast, network analysis is interested in the connection between a certain graph structure and the function this graph structure has in the complex system of interest. For this, we use graph theoretic insights. For example, in random graph theory, we know how likely it is that a certain random graph contains a clique of order k (order = number of nodes in it). If a clique of order k is found in a complex network but it would also be very likely to find one of those in a random graph of a corresponding size and order, network analysis would deduce that the clique is not likely to be functional for the complex system of interest.

In summary: network analysis builds on (some of the) insights of graph theory, but is not interested so much in the abstract behavior of graph classes, but rather in the relationship between structure and function of a graph and the complex system it represents.

Saturday, June 25, 2016

Note 4 and Note 6: What does a measure measure?

Network analysis promises to give insight into the inner
workings of a complex system by representing
it as a complex network, by analyzing this representation,
and by interpreting the results of the analysis.
But: what does a measure actually measure?
How is the result mapped to the inner workings of the system?
This figure is under CC:BY with a reference to
Prof. Dr. Katharina A. Zweig  or to this blogpost.
Network analysis as a framework is mainly seen as a set of functions applied to the network's structure. These functions are often called "measures" but from a mathematical standpoint, a measure is a very specific function, that would need to fulfill a set of properties. However, since it is common to call them "measures", I will stick to that term.

Now, what does a network analytic measure like the eccentricity measure? Mathematically, the eccentricity of a node is defined as its maximal distance to any other node in the graph. So, this is what the eccentricity measures. However, when we apply network analysis, we hope for the following:

"Note 4. The promise of network analysis is that the abstraction of a complex system as represented by a complex network and its underlying graph still allows to infer something about the complex system of interest. That is actually a strong assumption [and there are] preconditions to enable this transfer." (Zweig2016)

 So, what is the insight that a measure like the eccentrictiy can give? It kind of determines the centrality of a node, but under very strong assumptions:
  1. The graph theoretic distance between v and w, i.e., the minimal number of edges to be traversed to get from v to w, is of interest in the complex system to be investigated. For example, if the complex network represents a street network as an unweighted graph, then the graph theoretic distance which only counts the number of edges (i.e., streets) to be traversed, is hardly of interest.
  2. The eccentricity looks at the maximal distance. If we identify the node with the lowest eccentricity, this is the node that can, in principle, send a message to all nodes and it will reach even the farthest node in the minimal time possible. This again under some assumptions:
    1. The message is sent at the same time to all neighbors.
    2. They send it to all their neighbors within one time step.
    3. There is such a thing as a time step, which all nodes know.
  3. In other words: there is a process in the real-world complex system which uses the relationship represented by the complex network modeling the system. This process needs to be well described by the implicit assumptions of the network measure (cf. Borgatti2005).
 This is a first indication, that network analytic measures implicitly contain a model of a network flow process, and that the measure needs to match the network flow process of interest.

"Note 6. While it is absolutely true that the result of a formula is never wrong in the sense of 'diļ¬€erent than what it is supposed to be', the application of the formula might be a mismatch with the intention of what is to be measured." (Zweig2016)


(Borgatti2005) Borgatti, S. P.: Centrality and Network Flow, Social Networks, 2005, 27, 55-71
(Zweig2016) Katharina A. Zweig: Network Analysis Literacy, ISBN 978-3-7091-0740-9, Springer Vienna, 2016

Thursday, June 23, 2016

Note 5: Can visualization replace analysis?

What is the most central node in the graph on the left,
what is the most central node in the graph on the right?

I always show this figure in my lecture. Then I ask: "What is the most central node in the left graph? What is the most central node in the right graph?" For the first question, it is almost impossible to suppress the urge to shout out: "The one in the middle is!". I think that this is because we humans are used to put the most important thing in the middle, where our (physical) focus is. We believe that if someone puts a thing in the middle, there must be meaning assigned to this. However, in a network visualization, it does not need to be. However, looking closely at the two visualizations, you will find that they display the same graph, i.e., the connections are absolutely the same. And on the right hand visualization, it becomes obvious that every node is interchangeable with every other node (in graph theoretical terms: the nodes are in an automorphism class).  All nodes thus have the same centrality in the network.

Of course, most algorithms try to put nodes in the middle of their neighbors. This already indicates that a node which is at the center of a network (in the sense of smallest closeness) might also be in the center of the visualization of the network. However, there are also other aesthetic considerations, such as the overall ratio of the resulting figure, or edge crossing minimization.

Looking at a visualization of a network is always a good idea. Being inspired by the visualization and creating a new hypothesis about the structure of it, is absolutely helpful in the first stages of any network analytic project. However, the hypothesis needs to be tested by a quantifiable method - not by another visualization.

"Note 5. A visualization of a network can be both revealing and deceiving. This is why Gephi, yEd, and other visualization tools are perfect for exploration and hypothesis building; it is also the reason why statistical software packages or self-tailored applications are needed to collect quantifiable evidence that a given hypothesis is true." (Zweig2016)


(Zweig2016) Katharina A. Zweig: Network Analysis Literacy, ISBN 978-3-7091-0740-9, Springer Vienna, publication expected Dec 2016

Sunday, June 5, 2016

Note 3: What is actually the differece between a graph and a complex network?

A graph is a mathematical structure that is composed of a set of
elements and a relation defined on that set. It does not know about
the set of real-world entities and their relationship it
(might) represent. It is important to note that a graph does,
of course, not need to represent any real-world situation.
The 'complex network' defines the relationship between
the set of entities in the real-world and their representation
in the graph.This figure is under CC:BY with a
reference to Prof. Dr. Katharina A. Zweig or to this blogpost.
When I started to do network analysis, I was confused by the terms 'graph' and 'complex network', and I wondered whether they can be used totally interchangeably or not.

"Note 3. What is the difference between a (complex) network and a graph? The quick answer is that a graph is the abstract representation of a relation between entities while a network combines the graph with additional information about the entities and their relationship represented by the graph."

A graph is, mathematically, just the combination of any set of elements V and a relation E defined on it. A relation is just a subset of all the possible pairs of elements in the set. By definition, these pairs have an order, i.e., it makes a difference whether the pair $(x,y)$ or the air $(y,x)$ is included. If for all pairs both directions are included in the relation, we speak of a symmetric relation. If there is at least one pair that is only included in one direction, it is an asymmetric relation. The elements of the relation are called edges, when they are part of a graph, and the elements of V are called nodes or vertices.
A graph can be associated with functions, that assign values to nodes or edges. You see, on this mathematical level, everything is pretty abstract.

A complex network fills these things with meaning: the nodes suddenly represents a set of real-world entities, e.g., persons. The edges represent a relationship between the nodes. The mathematical property of the relation called 'symmetry' suddenly represents an undirected relationship, while an asymmetric relation represents a directed relationship. Functions associated with the edges are weights that capture an important aspect of the relationship, and functions associated with the nodes capture important properties of the nodes.

In most network analytic publications, you will see an identification of the nodes with their entities and the edges with the relationship they represent. This is in most cases unproblematic and saves a lot of text. Instead of writing "Two nodes are connected by an edge if the corresponding street corners are connected by a street", it is much faster to write: "In the network, street corners are the nodes and streets are the edges.".

However, such a formulation also indicates there would be a clear one-to-one-mapping. As will be seen in later blog posts, this is almost never the case: there are multiple modeling decisions to be made to come from a heap of raw data to a network representation. Here is how Brandes et al. phrase the problem in their editorial of the first issue of their journal "Network Science":

"As representation is usually defined via an isomorphism, i.e., a one-to-one mapping
between structures preserving relations, a phenomenon cannot be represented
directly but needs to be conceptualized first.
Of course, this is by no means an unusual division in science or other areas of
knowledge. Possibly because of the graphic and metaphoric connotations of the
term network, the implications of a preceding abstraction step are often overlooked
or blurred. Sometimes this may be on purpose for terminological convenience.
More often, there appears to be a lack of awareness. We feel, however, that this
distinction is crucially important for serious applications of network science to the
understanding of substantive phenomena as it points to the delicacy of interpreting
the results of network data analysis.
Interpretation essentially reverses the process of abstraction and representation
to get back to the phenomenon so that substantive theory is required to secure
Defining the isomorphism between the elements in the real-world and their counterpart in the graph is a step that is often also called operationalization and later blog post will have a lot to say about this step.

In summary: there is an important distinction between a complex network and the graph it contains. And at least in the description of how the real-world phenomenon is turned into a network representation, it is good practice to differentiate between the two different layers---the real-world and the graph representation of it. In later parts of the text, however, it might be cumbersome to differentiate between the elements of the graph and the real-world entities they represent.


(Brandes2013) Brandes, U.; Robins, G.; McCranie, A. & Wasserman, S.: "What is Network Science?", Network Science, 2013, 1, Editorial
(Zweig2016) Katharina A. Zweig: Network Analysis Literacy, ISBN 978-3-7091-0740-9, Springer Vienna, publication expected Dec 2016 

Saturday, June 4, 2016

Note 2: Minimal requirements to represent data as a "complex network"

Not all well-defined relations lend
themselves to a meaningful representation
as a complex network. This figure is under CC:BY
with a reference to Prof. Dr. Katharina A. Zweig
or to this blogpost.
Have you ever asked yourself what the minimal requirement is to turn something into a "complex network"? Well, mathematically seen, it is the following:

Note 2. Mathematically, a relation R on a given set
of entities or objects is just an arbitrary choice of pairs
of these entities (objects), denoted by R ⊆ O × O. In
principle, any relation can be represented as a graph.

So, the minimal requirements are actually - minimal. While mathematically possible, not all relations gain from being represented as a graph and by being treated as a "complex network". Look, for example at the set of all living humans that own at least one ID card and connect any two of them if their oldest ID-card's ID number shares the last digit. This is surely a relation, but it is also surely a relationship between humans that will not be any better analyzed by turning it into a complex network.

Why is this so? The whole idea of complex network analysis is to understand the interaction structure of entities in a complex system. Complex systems are those with emergent phenomena. Emergence often - well - emerges, when interactions between pairs of entities change the interactions of other pairs of entities because of the interactions between the pairs, i.e., when indirect effects are transferred via the interactions. Brandes et al. express it like this:

By postulating a friendship network in (say) a school class-
room of 25 students, we have taken a theoretical step that
is non-trivial. We have supposed that separate individ-
uals are not an adequate representation, moreover that
even separate dyads are insufficient; rather, that there
is a unity within the classroom that makes it proper to
talk of “a” network, not 25 children or 300 dyads. To con-
ceptualize the classroom in network terms is an implicit
(and strong) claim that connectedness across individual
elements is fundamentally important, so that the class-
room can be thought of as one “system”. (Brandes2013)
I believe that this feature that turns a set of pairs (or dyads) into "one system" is a network process that induces indirect effects via the relationship that binds the pairs together in one network. Thus, the answer to the question is: while mathematically any relation defined on a finite set of entities is good enough for a representation as a graph, semantically, not all relations make sense to be represented as a complex network. And since a relation can once represent a meaningful relationship and the very same relationship can also represent a meaningless relationship (like the one above), it is not the relation itself that decides about its "networkability". It is the relationship the relation represents.


(Brandes2013) Brandes, U.; Robins, G.; McCranie, A. & Wasserman, S.: "What is Network Science?", Network Science, 2013, 1, Editorial
(Zweig2016) Katharina A. Zweig: Network Analysis Literacy, ISBN 978-3-7091-0740-9, Springer Vienna, publication expected Dec 2016

Note 1: Trilemma of Complex Network Analysis

I started my doctoral studies over 13 years ago - and it seemed that complex network analysis is the framework to use whenever you can define a meaningful relationship between a set of entities: proteins interacting with each other, airports connected by scheduled flights, people connected by pressing a 'retweet' button in their browser to express their opinion on someone else's tweet.

The main hypothesis and first Note in my upcoming book "Network Analysis Literacy" (Zweig2016) condenses my findings of these more than a dozen years:

Note 1. "To interpret the values of a distance-based measure, the way of calculating the distance must be matched to the process of interest. To interpret any walk-based measure, the set of walks used by the measure needs to be closely adapted to the process. (K.A. Zweig: Network Analysis Literacy, (c) by Springer Verlag, Heidelberg, to be published)

It refers to the so-called trilemma of complex network analysis, a term Isadory Dorn, Andreas Lindenblatt, and I developed in 2012 (Dorn2012). It summarizes the interdependencies between raw data, relationship of interest, network process of interest, research question, and methods used to analyze the latter. The research question determines a network process that uses a (set of) relationships to exert indirect effects on entities connected to each other by this relationship. By representing this relationship as a complex network, all classic network analytic methods can be applied---in principle.
Stephen P. Borgatti  was the first to show that centrality indices, one of the most classic and widely used set of methods, have an inbuilt model of a network process they are associated with (Borgatti2005): they secretely determine the paths on which indirect effects are induced! Take your beloved betweenness centrality, defined as follows:

where $\delta_v(s,t)$ refers to the number of shortest paths between s and t, containing v, and $\delta(s,t)$ refers to the number of all shortest paths between s and t. There will be another dedicated blog entry to the implicit assumptions the betweenness centrality makes, but here it suffices to say that it assumes the following: all entities want to interact with each other in the same intensity (all pairs of s and t are treated equally) and all of them interact on shortest paths.

If your network process of interest does not follow these two assumptions, the betweenness centrality might not be the best centrality index to answer your research question. 

This is just the first example of how network analysis literacy, e.g., knowing the implicit models behind your favourite network analytic measure or the relationship between research question, network process, and relationship, may help you to make well-grounded choices, 


(Borgatti2005) Borgatti, S. P.: "Centrality and Network Flow", Social Networks, 2005, 27, 55-71

(Dorn2012) Dorn, I.; Lindenblatt, A. & Zweig, K. A.: "The Trilemma of Network Analysis", Proceedings of the 2012 IEEE/ACM international conference on Advances in Social Network Analysis and Mining, Istanbul, 2012

(Zweig2016) Katharina A. Zweig: Network Analysis Literacy, ISBN 978-3-7091-0740-9, Springer Vienna, publication expected Dec 2016