
Network analysis has made heavy use of statistics, and it seems that statistics is not humankind's strength. That does not make network analysis any easier.
This figure is under CC BY with a reference to Prof. Dr. Katharina A. Zweig or to this blog post.

A main part of network analysis is computing and interpreting statistical numbers. Unfortunately, statistics is not the most intuitive part of mathematics, and it is well known that even trained scientists have problems interpreting statistical results correctly. Consider the following test:
A group of young students without any symptoms of sickness make a blood donation. Routinely, their blood is checked for an HIV infection. The test is very sensitive. For simplicity, assume that it detects 99.99% of all infected persons and that non-infected persons get a negative test with a probability of 99.99%. If a person's first test now returns a positive result, what is the probability that she is actually infected?
If you think it is 99.99%, you are in very good company (but wrong):
Note 14. Gigerenzer and his team showed in various
studies that almost none of the experts was able to give
the correct answer. Most answered that, as the test is
so specific and so sensitive, the probability that a person
is infected if the test says so is 99.99%. (Zweig, 2016)
Actually, the question cannot really be answered without knowing the chance that a person without any symptoms is infected. This probability can be approximated by the so-called incidence rate, the number of new infections per year. For Germany, this is around 3,000 in a nation with about 80 million inhabitants (for young people, it might actually be higher than for the general population, but as an approximation that is fine).
We now want to know the probability that a person is infected if her test turns out to be positive. There are two ways to get a positive test: the person is infected and detected, or the person is not infected but falsely flagged. If we tested all of Germany, we would find essentially all of the 3,000 newly infected persons. However, of the remaining (still roughly) 80 million people, we would falsely flag 0.01%, i.e., 1 person in 10,000. Thus, we additionally flag 8,000 people as positive. Of all 11,000 people with a positive test, 8,000 would not actually be infected. That is, the probability that a person with a positive test is infected is less than 50%, namely around 27% (3,000 out of 11,000). Surprised?
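The arithmetic above can be retraced in a few lines of Python. This is only a minimal sketch using the approximate numbers from the text; the variable names are chosen here for illustration and do not come from the original post.

```python
# Approximate numbers taken from the text; variable names are illustrative.
population = 80_000_000   # inhabitants of Germany (rough figure)
infected = 3_000          # new infections per year (incidence, rough figure)
sensitivity = 0.9999      # P(positive test | infected)
specificity = 0.9999      # P(negative test | not infected)

# Two ways to get a positive test:
true_positives = infected * sensitivity                        # infected and detected
false_positives = (population - infected) * (1 - specificity)  # not infected but flagged

# Share of positive tests that belong to infected persons
posterior = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:.0f}")
print(f"false positives: {false_positives:.0f}")
print(f"P(infected | positive test) = {posterior:.1%}")
```

With these inputs the share of truly infected persons among all positive tests comes out at roughly 27%, matching the estimate of 3,000 out of 11,000.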
This computation has a very important consequence. Let our 'null hypothesis' be that any given person is not infected. Now, we know that the probability that a person who is not infected gets a positive test is very small (0.01%); this value is called the p-value (the probability of observing the data given the assumption in the null hypothesis). In particular, it is smaller than p=0.05, the classic threshold value to 'reject the null hypothesis'. However, as we have seen, we need to compute the probability that the person is infected given a positive test result. And this probability can differ strongly from the other one when the ratio of the two classes (infected vs. not infected) is not around 0.5. Thus, rejecting a null hypothesis just because, given the assumption ("not infected"), the observation of the data ("positive test") is unlikely, is the wrong way.
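To see how strongly the two probabilities can diverge, the following sketch applies Bayes' rule for several prior probabilities of infection. The function name `posterior_infected` and the chosen priors (other than the incidence-based one from the text) are assumptions made for illustration only.

```python
def posterior_infected(prior, sensitivity=0.9999, specificity=0.9999):
    """P(infected | positive test) via Bayes' rule (illustrative helper)."""
    true_pos = prior * sensitivity               # infected and flagged
    false_pos = (1 - prior) * (1 - specificity)  # not infected but flagged
    return true_pos / (true_pos + false_pos)


# P(positive test | not infected): the quantity the text calls the p-value.
# It does not depend on the prior at all.
p_positive_given_healthy = 1 - 0.9999

# Balanced classes, a 1% prior, and the incidence-based prior from the text
for prior in (0.5, 0.01, 3_000 / 80_000_000):
    print(
        f"prior={prior:.6f}  "
        f"P(positive | not infected)={p_positive_given_healthy:.4f}  "
        f"P(infected | positive)={posterior_infected(prior):.4f}"
    )
```

Only when the two classes are balanced (prior 0.5) does the probability of being infected given a positive test equal 99.99%; for the incidence-based prior it falls to roughly 27%, although the probability of a positive test for a non-infected person stays the same throughout.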
Note 15. The only correct verbal descriptions of a p-value need to
contain the words "given that the null hypothesis is true" as
the p-value conditions on that. As the p-value does not
say anything about the probability of the hypothesis to
be true, given the observed data, it cannot be used as a
basis for rejecting the null hypothesis. (Zweig, 2016)
The p-value is just the first step towards updating our probability of the assumption, given the observed data. This will be important, e.g., for identifying network motifs.
If statistics is already hard, then this makes network analysis no easier! Read more about statistical hypothesis testing on Wikipedia. Or join my Mendeley group on "Good statistics papers for non-statisticians".
References:
(Gigerenzer, 2007) Gerd Gigerenzer, Wolfgang Gaissmaier, Elke Kurz-Milcke, Lisa M. Schwartz, and Steven Woloshin: Helping doctors and patients make sense of health statistics. Psychological Science in the Public Interest, 8(2), 2007.
(Zweig, 2016) Katharina A. Zweig: Network Analysis Literacy, ISBN 9783709107409, Springer Vienna, 2016.