Fowlkes–Mallows index

The Fowlkes–Mallows index is an external evaluation measure used to quantify the similarity between two clusterings (the partitions obtained from a clustering algorithm); it can also be computed from a confusion matrix. The comparison may be between two hierarchical clusterings or between a clustering and a benchmark classification. A higher value of the Fowlkes–Mallows index indicates greater similarity between the clusters and the benchmark classification. It was introduced by Bell Labs statisticians Edward Fowlkes and Colin Mallows in 1983.^[1]

Preliminaries

When the result of one clustering is evaluated against the result of another (or against a benchmark classification), the Fowlkes–Mallows index is defined as^[2]

FM={\sqrt {PPV\cdot TPR}}={\sqrt {{\frac {TP}{TP+FP}}\cdot {\frac {TP}{TP+FN}}}}

where $TP$ is the number of true positives, $FP$ is the number of false positives, and $FN$ is the number of false negatives. $TPR$ is the true positive rate, also called sensitivity or recall, and $PPV$ is the positive predictive value, also known as precision.

The minimum possible value of the Fowlkes–Mallows index is 0, which corresponds to the worst possible binary classification, in which every element is misclassified. The maximum possible value is 1, which corresponds to the best possible binary classification, in which every element is classified correctly.
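As an illustration of the formula above, the following minimal Python sketch computes the index from the three counts. The function name `fowlkes_mallows_from_counts` and the choice to return 0.0 when $TP=0$ (which avoids a division by zero) are assumptions made for this example, not part of the original definition.

```python
from math import sqrt


def fowlkes_mallows_from_counts(tp: int, fp: int, fn: int) -> float:
    """Geometric mean of precision (PPV) and recall (TPR)."""
    if tp == 0:
        # Assumed convention: with no true positives the index is 0,
        # which also avoids dividing by zero when tp + fp or tp + fn is 0.
        return 0.0
    precision = tp / (tp + fp)  # PPV = TP / (TP + FP)
    recall = tp / (tp + fn)     # TPR = TP / (TP + FN)
    return sqrt(precision * recall)


# Example: precision 6/8 and recall 6/9 give an index of roughly 0.707.
print(fowlkes_mallows_from_counts(tp=6, fp=2, fn=3))
```

With no false positives and no false negatives the function returns 1, and with no true positives it returns 0, matching the two extremes described above.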

Definition

Consider two hierarchical clusterings, labeled $A_{1}$ and $A_{2}$ , of the same $n$ objects. Each of the trees $A_{1}$ and $A_{2}$ can be cut to produce $k=2,\ldots ,n-1$ clusters (either by selecting clusters at a particular height of the tree or by setting a different strength of the hierarchical clustering). For each value of $k$ , the following table can then be created:

M=[m_{i,j}]\qquad (i=1,\ldots ,k{\text{ and }}j=1,\ldots ,k)

where $m_{i,j}$ is the number of objects common to the $i$ th cluster of $A_{1}$ and the $j$ th cluster of $A_{2}$ . The Fowlkes–Mallows index for the specific value of $k$ is then defined as

B_{k}={\frac {T_{k}}{\sqrt {P_{k}Q_{k}}}}

where

T_{k}=\sum _{i=1}^{k}\sum _{j=1}^{k}m_{i,j}^{2}-n

P_{k}=\sum _{i=1}^{k}(\sum _{j=1}^{k}m_{i,j})^{2}-n

Q_{k}=\sum _{j=1}^{k}(\sum _{i=1}^{k}m_{i,j})^{2}-n

$B_{k}$ can then be calculated for every value of $k$ and the similarity between the two clusterings can be shown by plotting $B_{k}$ versus $k$ . For each $k$ we have $0\leq B_{k}\leq 1$ .
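The computation of $B_{k}$ for a single cut can be sketched in Python as follows. This is a minimal illustration: the helper names `contingency_table` and `b_k`, the representation of each clustering as a flat list of cluster labels, and the convention of returning 0.0 when $P_{k}$ or $Q_{k}$ is zero are all assumptions made for the example.

```python
from math import sqrt


def contingency_table(labels_1, labels_2):
    """Build M = [m_ij]: objects shared by cluster i of A1 and cluster j of A2."""
    rows = sorted(set(labels_1))
    cols = sorted(set(labels_2))
    row_index = {c: i for i, c in enumerate(rows)}
    col_index = {c: j for j, c in enumerate(cols)}
    m = [[0] * len(cols) for _ in rows]
    for a, b in zip(labels_1, labels_2):
        m[row_index[a]][col_index[b]] += 1
    return m


def b_k(m):
    """Fowlkes-Mallows index B_k = T_k / sqrt(P_k * Q_k) for one table M."""
    n = sum(sum(row) for row in m)
    t_k = sum(v * v for row in m for v in row) - n    # sum of m_ij^2, minus n
    p_k = sum(sum(row) ** 2 for row in m) - n         # squared row sums, minus n
    q_k = sum(sum(col) ** 2 for col in zip(*m)) - n   # squared column sums, minus n
    if p_k == 0 or q_k == 0:
        # Assumed convention for the degenerate case (all singleton clusters).
        return 0.0
    return t_k / sqrt(p_k * q_k)


# Two hypothetical cuts of six objects into k = 2 clusters each.
a1 = [0, 0, 0, 1, 1, 1]
a2 = [0, 0, 1, 1, 1, 1]
m = contingency_table(a1, a2)
print(m)       # [[2, 1], [0, 3]]
print(b_k(m))  # T_k = 8, P_k = 12, Q_k = 14, so roughly 0.617
```

Repeating this for each $k$ and collecting the values gives the points of the $B_{k}$ versus $k$ plot described above.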

The Fowlkes–Mallows index can also be defined in terms of the number of pairs of points that are grouped together or apart in the two hierarchical clusterings. Define

$TP$ as the number of pairs of points that are present in the same cluster in both $A_{1}$ and $A_{2}$ ;

$FP$ as the number of pairs of points that are present in the same cluster in $A_{1}$ but not in $A_{2}$ ;

$FN$ as the number of pairs of points that are present in the same cluster in $A_{2}$ but not in $A_{1}$ ;

$TN$ as the number of pairs of points that are in different clusters in both $A_{1}$ and $A_{2}$ .

It can be shown that the four counts have the following property

TP+FP+FN+TN=n(n-1)/2
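The identity holds because every one of the $n(n-1)/2$ unordered pairs of points falls into exactly one of the four categories. The following Python sketch counts the pairs by brute force and checks the property on a small example; the function name `pair_counts` and the representation of each clustering as a flat list of cluster labels are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_1, labels_2):
    """Classify every unordered pair of points as TP, FP, FN, or TN."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_1)), 2):
        same_1 = labels_1[i] == labels_1[j]  # together in A1?
        same_2 = labels_2[i] == labels_2[j]  # together in A2?
        if same_1 and same_2:
            tp += 1
        elif same_1:
            fp += 1
        elif same_2:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Two hypothetical clusterings of n = 6 points.
a1 = [0, 0, 0, 1, 1, 1]
a2 = [0, 0, 1, 1, 1, 1]
tp, fp, fn, tn = pair_counts(a1, a2)
n = len(a1)

print((tp, fp, fn, tn))                       # (4, 2, 3, 6)
print(tp + fp + fn + tn == n * (n - 1) // 2)  # True
# Index from the pair counts, using FM = sqrt(PPV * TPR); roughly 0.617.
print(sqrt(tp / (tp + fp) * tp / (tp + fn)))
```

The loop visits all pairs, so it takes time quadratic in $n$; it is meant as a check of the definitions rather than an efficient implementation.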


[1]

[2]