When making Venn diagrams to look at the overlap of sets, I often wonder how significant a given amount of overlap is. What is the likelihood of seeing a given amount of overlap between two sets simply by chance?

One way to assess this is to use the hypergeometric distribution. The R language has a nice function for calculating the p-value, but the explanation of how to use it involves an urn of black and white balls.

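phyper(q, m, n, k, lower.tail=FALSE)

q = the number of white balls drawn from the urn

m = the number of white balls in the urn

n = the number of black balls in the urn

k = the number of balls drawn from the urn (sample size)

With lower.tail=FALSE the function returns the probability of drawing more than q white balls, so pass one less than the observed count to get the probability of seeing at least that many.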
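To translate the urn into a Venn diagram, one possible mapping is: the urn is the universe that both sets were picked from, the white balls are the members of set A, the black balls are everything else, the balls drawn are the members of set B, and the white balls drawn are the overlap. The sketch below assumes that mapping; the numbers and variable names are made up purely for illustration.

```r
# Hypothetical numbers, chosen only for illustration
total   <- 20000  # size of the universe both sets come from (white + black balls)
size_a  <- 500    # size of set A (white balls in the urn)
size_b  <- 300    # size of set B (balls drawn from the urn)
overlap <- 20     # items found in both A and B (white balls drawn)

# Overlap expected by chance alone
expected <- size_a * size_b / total   # 7.5

# Probability of an overlap of at least `overlap` items by chance.
# lower.tail = FALSE gives P(X > q), so q is overlap - 1 to get P(X >= overlap).
p_value <- phyper(overlap - 1, size_a, total - size_a, size_b,
                  lower.tail = FALSE)

print(expected)
print(p_value)
```

The result depends heavily on the size of the universe, so it is worth thinking about what could realistically have ended up in either set before settling on that number.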

Making Venn Diagrams

* I wrote a utility for making Venn diagrams: venn diagrams

* But someone else wrote a better one recently: venny

VennSignificance (last edited 2011-08-29 17:54:13 by ChrisSeidel)