Network motif

Network motifs are recurrent and statistically significant subgraphs or patterns of a larger graph. All networks, including biological networks, social networks, technological networks (e.g., computer networks and electrical circuits) and more, can be represented as graphs, which include a wide variety of subgraphs.

Network motifs are sub-graphs that repeat themselves in a specific network or even among various networks. Each of these sub-graphs, defined by a particular pattern of interactions between vertices, may reflect a framework in which particular functions are achieved efficiently. Indeed, motifs are of notable importance largely because they may reflect functional properties. They have recently^[when?] gathered much attention as a useful concept to uncover structural design principles of complex networks.^[1] Although network motifs may provide a deep insight into the network's functional abilities, their detection is computationally challenging.

Definitions

Let $G = (V, E)$ and $G' = (V', E')$ be two graphs. Graph $G'$ is a sub-graph of graph $G$ (written as $G' \subseteq G$ ) if $V' \subseteq V$ and $E' \subseteq E \cap (V' \times V')$ . If $G' \subseteq G$ and $G'$ contains all of the edges $⟨u, v⟩ \in E$ with $u, v \in V'$ , then $G'$ is an induced sub-graph of $G$ . We call $G'$ and $G$ isomorphic (written as $G' \leftrightarrow G$ ), if there exists a bijection (one-to-one correspondence) $f:V' \to V$ with $⟨u, v⟩ \in E' \Leftrightarrow ⟨f(u), f(v)⟩ \in E$ for all $u, v \in V'$ . The mapping $f$ is called an isomorphism between $G$ and $G'$ .^[2]

When $G″ \subset G$ and there exists an isomorphism between the sub-graph $G″$ and a graph $G'$ , this mapping represents an appearance of $G'$ in $G$ . The number of appearances of graph $G'$ in $G$ is called the frequency $F G$ of $G'$ in $G$ . A graph is called recurrent (or frequent) in $G$ when its frequency $F G (G')$ is above a predefined threshold or cut-off value. We use terms pattern and frequent sub-graph in this review interchangeably. There is an ensemble $Ω(G)$ of random graphs corresponding to the null-model associated to $G$ . We should choose $N$ random graphs uniformly from $Ω(G)$ and calculate the frequency for a particular frequent sub-graph $G'$ in $G$ . If the frequency of $G'$ in $G$ is higher than its arithmetic mean frequency in $N$ random graphs $R i$ , where $1 \leq i \leq N$ , we call this recurrent pattern significant and hence treat $G'$ as a network motif for $G$ . For a small graph $G'$ , the network $G$ , and a set of randomized networks $R(G) \subseteq Ω(R)$ , where $R(G) = N$ , the Z-score of the frequency of $G'$ is given by

$Z(G^{\prime })={\frac {F_{G}(G^{\prime })-\mu _{R}(G^{\prime })}{\sigma _{R}(G^{\prime })}}$

where $μ R (G')$ and $σ R (G')$ stand for the mean and standard deviation of the frequency in set $R(G)$ , respectively.^[3]^[4]^[5]^[6]^[7]^[8] The larger the $Z(G')$ , the more significant is the sub-graph $G'$ as a motif. Alternatively, another measurement in statistical hypothesis testing that can be considered in motif detection is the p-value, given as the probability of $F R (G') \geq F G (G')$ (as its null-hypothesis), where $F R (G')$ indicates the frequency of G' in a randomized network.^[6] A sub-graph with p-value less than a threshold (commonly 0.01 or 0.05) will be treated as a significant pattern. The p-value for the frequency of $G'$ is defined as

$P(G^{\prime })={\frac {1}{N}}\sum _{i=1}^{N}\delta (c(i))\quad c(i):F_{R}^{i}(G^{\prime })\geq F_{G}(G^{\prime })$

where $N$ indicates the number of randomized networks, $i$ is defined over an ensemble of randomized networks, and the Kronecker delta function $δ(c(i))$ is one if the condition $c(i)$ holds. The concentration^[9]^[10] of a particular n-size sub-graph $G'$ in network $G$ refers to the ratio of the sub-graph appearance in the network to the total n-size non-isomorphic sub-graphs' frequencies, which is formulated by

$C_{G}(G^{\prime })={\frac {F_{G}(G^{\prime })}{\sum _{i}F_{G}(G_{i})}}$

where index $i$ is defined over the set of all non-isomorphic n-size graphs. Another statistical measurement is defined for evaluating network motifs, but it is rarely used in known algorithms. This measurement is introduced by Picard et al. in 2008 and used the Poisson distribution, rather than the Gaussian normal distribution that is implicitly being used above.^[11]

In addition, three specific concepts of sub-graph frequency have been proposed.^[12] As the figure illustrates, the first frequency concept $F 1$ considers all matches of a graph in original network. This definition is similar to what we have introduced above. The second concept $F 2$ is defined as the maximum number of edge-disjoint instances of a given graph in original network. And finally, the frequency concept $F 3$ entails matches with disjoint edges and nodes. Therefore, the two concepts $F 2$ and $F 3$ restrict the usage of elements of the graph, and as can be inferred, the frequency of a sub-graph declines by imposing restrictions on network element usage. As a result, a network motif detection algorithm would pass over more candidate sub-graphs if we insist on frequency concepts $F 2$ and $F 3$ .

History

The study of network motifs was pioneered by Holland and Leinhardt^[13]^[14]^[15]^[16] who introduced the concept of a triad census of networks. They introduced methods to enumerate various types of subgraph configurations, and test whether the subgraph counts are statistically different from those expected in random networks.

This idea was further generalized in 2002 by Uri Alon and his group ^[17] when network motifs were discovered in the gene regulation (transcription) network of the bacteria E. coli and then in a large set of natural networks. Since then, a considerable number of studies have been conducted on the subject. Some of these studies focus on the biological applications, while others focus on the computational theory of network motifs.

The biological studies endeavor to interpret the motifs detected for biological networks. For example, in work following,^[17] the network motifs found in E. coli were discovered in the transcription networks of other bacteria^[18] as well as yeast^[19]^[20] and higher organisms.^[21]^[22]^[23] A distinct set of network motifs were identified in other types of biological networks such as neuronal networks and protein interaction networks.^[5]^[24]^[25]

The computational research has focused on improving existing motif detection tools to assist the biological investigations and allow larger networks to be analyzed. Several different algorithms have been provided so far, which are elaborated in the next section in chronological order.

Most recently, the acc-MOTIF tool to detect network motifs was released.^[26]

Motif discovery algorithms

Various solutions have been proposed for the challenging problem of network motif (NM) discovery. These algorithms can be classified under various paradigms such as exact counting methods, sampling methods, pattern growth methods and so on. However, motif discovery problem comprises two main steps: first, calculating the number of occurrences of a sub-graph and then, evaluating the sub-graph significance. The recurrence is significant if it is detectably far more than expected. Roughly speaking, the expected number of appearances of a sub-graph can be determined by a Null-model, which is defined by an ensemble of random networks with some of the same properties as the original network.

Until 2004, the only exact counting method for NM detection was the brute-force one proposed by Milo et al..^[3] This algorithm was successful for discovering small motifs, but using this method for finding even size 5 or 6 motifs was not computationally feasible. Hence, a new approach to this problem was needed.

Here, a review on computational aspects of major algorithms is given and their related benefits and drawbacks from an algorithmic perspective are discussed.

Classification of algorithms

The table below lists the motif discovery algorithms that will be described in this section. They can be divided into two general categories: those based on exact counting and those using statistical sampling and estimations instead. Because the second group does not count all the occurrences of a subgraph in the main network, the algorithms belonging to this group are faster, but they might yield in biased and unrealistic results.

In the next level, the exact counting algorithms can be classified to network-centric and subgraph-centric methods. The algorithms of the first class search the given network for all subgraphs of a given size, while the algorithms falling into the second class first generate different possible non-isomorphic graphs of the given size, and then explore the network for each generated subgraph separately. Each approach has its advantages and disadvantages which are discussed below.

The table also indicates whether an algorithm can be used for directed or undirected networks as well as induced or non-induced subgraphs. For more information refer to the provided web links or lab addresses.

Classification of Motif Discovery Algorithms
Counting Method	Basis	Name	Directed / Undirected	Induced / Non-Induced
Exact	Network-Centric	mfinder	Both	Induced
		ESU (FANMOD)	Both	Induced
		Kavosh (used in CytoKavosh)	Both	Induced
		G-Tries	Both	Induced
		PGD	Undirected	Induced
	Subgraph-Centric	FPF (Mavisto)	Both	Induced
		NeMoFinder	Undirected	Induced
		Grochow–Kellis	Both	Both
		MODA	Both	Both
Estimation / Sampling	Color-Coding Approach	N. Alon et al.'s Algorithm	Undirected	Non-Induced
	Other Approaches	mfinder	Both	Induced
	Other Approaches	ESU (FANMOD)	Both	Induced

Zdroj:https://en.wikipedia.org?pojem=Network_motif
Text je dostupný za podmienok Creative Commons Attribution/Share-Alike License 3.0 Unported; prípadne za ďalších podmienok. Podrobnejšie informácie nájdete na stránke Podmienky použitia.

Navigácia: Veda >

Analytika
Antropológia
Aplikované vedy
Bibliometria
Dejiny vedy
Encyklopédie
Filozofia vedy
Forenzné vedy
Humanitné vedy
Knižničná veda
Kryogenika
Kryptológia
Kulturológia
Literárna veda
Medzidisciplinárne oblasti
Metódy kvantitatívnej analýzy
Metavedy
Metodika

Metodológia vedy
Náboženstvo a veda
Náučná literatúra
Podvody vo vede
Popularizácia vedy
Potravinárstvo
Prírodné vedy
Pseudoveda
Scientometria
Spoločenské vedy
Teórie
Teatrológia
Technické vedy
Technika
Terminológia
Umenie
Výskum

Veda
Veda a technika podľa štátu
Veda a technika podľa kontinentu
Veda a technika podľa roka
Veda v kozme
Vedci
Vedecká literatúra
Vedecké databázy
Vedecké experimenty
Vedecké konferencie
Vedecké metódy
Vedecké ocenenia
Vedecké organizácie
Vedecké parky
Vedeckí spisovatelia
Vzdelávanie
Záhady

Príbuzné výrazy:

Text je dostupný za podmienok Creative Commons Attribution/Share-Alike License 3.0 Unported; prípadne za ďalších podmienok.
Podrobnejšie informácie nájdete na stránke Podmienky použitia.

Addresses of Designers of Algorithms
Algorithm	Lab / Dept. Name	Department / School	Institute	Address	E-Mail
mfinder	Uri Alon's Group	Department of Molecular Cell Biology	Weizmann Institute of Science	Rehovot, Israel, Wolfson, Rm. 607	urialon at weizmann.ac.il
FPF (Mavisto)	----	----	Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK)	Corrensstraße 3, D-06466 Stadt Seeland, OT Gatersleben, Deutschland	schreibe at ipk-gatersleben.de
ESU (FANMOD)	Lehrstuhl Theoretische Informatik I	Institut für Informatik	Friedrich-Schiller-Universität Jena	Ernst-Abbe-Platz 2,D-07743 Jena, Deutschland	sebastian.wernicke at gmail.com
NeMoFinder	----	School of Computing	National University of Singapore	Singapore 119077	chenjin at comp.nus.edu.sg
Grochow–Kellis	CS Theory Group & Complex Systems Group	Computer Science	University of Colorado, Boulder	1111 Engineering Dr. ECOT 717, 430 UCB Boulder, CO 80309-0430 USA	jgrochow at colorado.edu

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]