Saturday, July 23, 2011

Importing Data into the network Package in R (II)

Okay, let's continue with the previous example to look at another interesting behavior of the network and igraph packages in R. Again, we will assume that you downloaded the autonomous system network as compiled by Jure Leskovec, named it "edges.txt", and saved it in your favourite directory "favDir". We will now import it:

> setwd("path-to-favDir")
> library(network)
> myEdges <- read.table("edges.txt", header=TRUE)
We now look at rows 500 to 600 of the table and create a network from these rows:
> myEdges[500:600,]
> network <- network(myEdges[500:600,])
This part of the table contains exactly 101 edges, all of which share the source node with ID 1 and lead to 101 distinct other nodes. We would therefore expect the resulting network to have 102 nodes. So, let's check:
> network.size(network)
[1] 568
So, essentially, the network() function creates a node for each ID from 1 up to the maximum ID, 568. This side effect (bug or feature?) arises because the IDs are numeric, i.e., of type integer in the data.frame myEdges. If you again tell R to read the nodes' IDs as characters, the resulting network has the expected 102 nodes:
> myEdges <- read.table("edges.txt", header=TRUE, colClasses=c("character", "character"))
> network2 <- network(myEdges[500:600,])
> network.size(network2)
[1] 102
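
If the table is already loaded, re-reading the file is probably not necessary: the ID columns can be converted to character in place. The following is a minimal base-R sketch on a made-up toy edge list (toyEdges and its values are invented for illustration, not taken from edges.txt). It also shows the two numbers that diverge in the example above: the count of distinct IDs and the largest ID.

```r
# Toy edge list with made-up values: node 1 is linked to three nodes with sparse IDs.
toyEdges <- data.frame(from = c(1L, 1L, 1L), to = c(7L, 42L, 568L))

# Number of distinct node IDs that actually occur in the edge list.
nDistinct <- length(unique(c(toyEdges$from, toyEdges$to)))

# Largest ID; with integer IDs this is the node count described above.
maxId <- max(toyEdges$from, toyEdges$to)

# Convert every column to character in place; toyEdges stays a data.frame.
toyEdges[] <- lapply(toyEdges, as.character)

# Column classes after the conversion.
idTypes <- sapply(toyEdges, class)
```

Here nDistinct is 4 while maxId is 568, which mirrors the gap between 102 and 568 seen with the real data. Whether network() then treats the character columns as node names the way it does in the session above may depend on the installed version of the package, so it is worth confirming with network.size().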

So, if converting a data.frame to a network ever takes suspiciously long, check whether your IDs have a numeric type and whether the largest ID is much higher than the number of nodes in the network.
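
This check can be sketched as a small helper in base R. The function name hasSparseNumericIds and the threshold argument ratio are hypothetical and not part of the network package; the sketch assumes the first two columns of the data.frame hold the node IDs.

```r
# Hypothetical helper: TRUE if the IDs are numeric and the largest ID exceeds
# `ratio` times the number of distinct IDs (the threshold is an arbitrary choice).
hasSparseNumericIds <- function(edges, ratio = 2) {
  # Collect the IDs from the first two columns (assumed to be source and target).
  ids <- c(edges[[1]], edges[[2]])
  # Character IDs do not trigger the behavior described above.
  if (!is.numeric(ids)) {
    return(FALSE)
  }
  max(ids) > ratio * length(unique(ids))
}

# Made-up examples: sparse integer IDs, dense integer IDs, and character IDs.
sparseEdges <- data.frame(from = c(1L, 1L), to = c(7L, 568L))
denseEdges <- data.frame(from = c(1L, 2L), to = c(2L, 3L))
charEdges <- data.frame(from = c("1", "1"), to = c("7", "568"),
                        stringsAsFactors = FALSE)

sparseResult <- hasSparseNumericIds(sparseEdges)
denseResult <- hasSparseNumericIds(denseEdges)
charResult <- hasSparseNumericIds(charEdges)
```

Only sparseResult is TRUE here. For the real data, one would call the helper on the rows of interest, e.g. myEdges[500:600,], before building the network.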
