The following paper gives a very nice example of a biological network analysis in which many of the nine phases of a network analytic project can be found:
Christine Vogel, Sarah A. Teichmann, and Jose Pereira-Leal
"The Relationship between Domain Duplication and Recombination", J. Mol. Biol. 346(1):355-65, 2005
0) Pose the question: Are domain combinations just determined by random combinations of domains with different abundance or is there a further step of selection?
1) Build the network: Domains are characterized into different superfamilies and two superfamilies are connected by a directed edge if they are directly adjacent in the N->C direction of a gene in at least one gene.
2) Choose subgraph H: ego-network
3) Choose measure: size of ego-network (versatility) vs. external measure, namely the abundance of the node (number of times the domain appears in the genome).
4) Null-model: random combinations of domains while maintaining the abundance of the node
First result: the relationship between versatility and abundance is very stable, independent of how the data set is divided up into different categories. Tested were: division by structural class, phylogeny, protein's function, or the process it contributes to. This result thus shows that the null-model of random combination cannot be rejected.
5) Designing the model: the null-model maintains the abundance of each domain, the number of domains per gene, and their total number per genome.
6) Comparison observation/model: The versatility of the nodes in this model is much higher than in reality, i.e., in the model domains have fewer adjacent domains than in reality.
7) Design of a new model that suits the findings better: here, domains are shuffled and then duplicated until one of the component domains is used up.
Processes were not tested on any of the models, thus steps 8/9 are non-existing.
No comments:
Post a Comment