Network Analysis Literacy: Visualizing and Coloring of Induced Subgraphs

If the network is too large to visualize it as a whole, it might be a good idea to choose some nodes at random and to visualize their direct surroundings, also called their ego network. The ego network comprises the direct neighbors of a node and the relationship between these neighbors. In this example we will combine all edges between 10 randomly chosen nodes and their direct neighbors. The data set we are working on is freely available from Alex Arena's homepage. We use his data on the email contact network of the University Rovira y Virgili and we will assume that it is stored as 'email_Arenas.txt'. (For all readers who are fluent in latex, here is the according Sweave-file which will create all figures in eps/pdf/png format. It assumes you have set the working directory to where itself and the data is located. )

> library(network)
> setwd("path-to-wherever-you-stored-the-data")
> tableEmail <- read.table("email_Arenas.txt") 

# create a network from the data frame
> emailNW <- network(tableEmail, directed = T)

# choose 10 random vertex IDs without replacement
> randomSample <- sample(1:network.size(emailNW), 10, replace = FALSE) 

# show the randomly drawn vertex IDs
> randomSample
[1]  843  548  691  871  524  858  290 1059  485  593

#initialize a new vector
> neighs <- vector() 

#for all vertex IDs in the random sample
> for (x in randomSample) {

# add themselves and their direct neighborhood to the vector 'neighs'
>  neighs <- c(neighs, x, get.neighborhood(emailNW, x, type = "combined")) 
> } 

#create the induced subgraph
> igraph <- get.inducedSubgraph(emailNW, neighs)

Now, we would like to see this random sample from the graph:

> plot(igraph)

An induced subgraph based on 10 randomly drawn seed nodes and their direct neighborhoods.

The induced subgraphs gives an overall impression on how dense the local neighborhoods and the connections between 10 randomly drawn neighborhoods are. It would, however, be helpful to identify the 10 seeds. This can be done by coloring them accordingly. This was my first try:

#make a vector of size n (=#number of nodes), assigning color 2 as default
> color <- rep(2, times = network.size(emailNW))
#for the seed nodes, assign color number 3 
> color[network.vertex.names(emailNW) %in% randomSample] = 3
#plot the graph and assign the color vector
> plot(igraph, vertex.col = color[network.vertex.names(igraph)])

Interestingly, this does not give the wanted results; it seems as if there were no seed vertices at all:

A test plot in which all seed nodes should have been colored in green. Did not work, obviously.

If you would run the same code again and again, you would sometimes see one or two or even more green vertices. So, what happens (or seems to happen) is the following: the induced subgraph creates a new graph with obviously fewer vertices than the original graph. The vertex names themselves are maintained, and their order is maintained as well - you can check that by typing network.vertex.names(igraph). But only the first n entries in vertex.col are used as an assigment to the colors. Thus, we need to reduce the color-vector to those entries which are contained in igraph:

> plot(igraph ,vertex.col=color[network.vertex.names(igraph)])

This will finally produce the graph with all 10 seed nodes colored in green:

The final result with all 10 seed nodes colored in green.

Network Analysis Literacy

Monday, July 25, 2011

Visualizing and Coloring of Induced Subgraphs

1 comment: