R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
> d1 = c(3,2,0,5,0,0,0,2,0,0)
> d2 = c(1,0,0,0,0,0,0,1,0,2)
>
> # our cosine similarity function
>
> cos.sim = function(p,q) {
+
+ return(sum(p*q) / sqrt(sum(p^2)*sum(q^2)))
+
+ }
>
> print(cos.sim(d1,d2))
[1] 0.3149704
>
> # cosine similarity is not exactly the same
> # as correlation
>
> print(cor(d1,d2))
[1] 0.01814885
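
A minimal sketch of why the two measures differ, assuming the same cos.sim, d1 and d2 as in the session (repeated here so the block runs on its own). Both measures are unchanged when a vector is rescaled, but only correlation is unchanged when a constant is added. The factor of 10 and the shift of 1 are arbitrary illustrative choices.

```r
# Repeated from the session so this block is self-contained
d1 = c(3,2,0,5,0,0,0,2,0,0)
d2 = c(1,0,0,0,0,0,0,1,0,2)
cos.sim = function(p,q) sum(p*q) / sqrt(sum(p^2)*sum(q^2))

# Rescaling one vector leaves both measures unchanged
scale.cos.same = isTRUE(all.equal(cos.sim(10*d1, d2), cos.sim(d1, d2)))
scale.cor.same = isTRUE(all.equal(cor(10*d1, d2), cor(d1, d2)))

# Adding a constant leaves correlation unchanged ...
shift.cor.same = isTRUE(all.equal(cor(d1 + 1, d2), cor(d1, d2)))
# ... but changes cosine similarity, because the vector's direction changes
shift.cos.same = isTRUE(all.equal(cos.sim(d1 + 1, d2), cos.sim(d1, d2)))

print(c(scale.cos.same, scale.cor.same, shift.cor.same, shift.cos.same))
```

Only the last of the four checks should come out FALSE.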
>
> # but correlation is the same as cosine similarity
> # after subtracting the mean from each vector
>
> print(cos.sim(d1-mean(d1),d2-mean(d2)))
[1] 0.01814885
>
> # other measures of correlation:
> # Spearman and Kendall use the ranks of the data
>
> print(cor(d1, d2, method='kendall'))
[1] 0.1936008
> print(cor(d1, d2, method='spearman'))
[1] 0.2160396
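
To make "uses ranks" concrete, here is a rough sketch that recomputes both rank-based measures by hand for the same d1 and d2 (repeated so the block runs on its own). The helper names spearman.by.hand and kendall.by.hand are hypothetical, not part of R. The Kendall version assumes the tie-corrected form based on the signs of pairwise differences, which agrees with the value printed above for this data.

```r
# Repeated from the session so this block is self-contained
d1 = c(3,2,0,5,0,0,0,2,0,0)
d2 = c(1,0,0,0,0,0,0,1,0,2)

# Spearman: ordinary (Pearson) correlation of the ranks.
# rank() gives tied values their average rank by default.
spearman.by.hand = function(p, q) cor(rank(p), rank(q))

# Kendall: compare every pair of observations and ask whether
# p and q move in the same direction (+1), opposite (-1), or tie (0).
kendall.by.hand = function(p, q) {
  sp = sign(outer(p, p, "-"))   # sign of p[i] - p[j] for all i, j
  sq = sign(outer(q, q, "-"))   # sign of q[i] - q[j] for all i, j
  sum(sp*sq) / sqrt(sum(sp^2)*sum(sq^2))
}

print(all.equal(spearman.by.hand(d1, d2), cor(d1, d2, method='spearman')))
print(all.equal(kendall.by.hand(d1, d2), cor(d1, d2, method='kendall')))
```

Both measures depend only on the ordering of the values, so any strictly increasing transformation of d1 or d2 leaves them unchanged.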
>
> # here is a generative example of correlation
> # that reproduces a plot similar to the notes
>
> correlated.data = function(n=50, rho=0) {
+ x = rnorm(n)
+ y = rho*x + sqrt(1-rho^2)*rnorm(n)
+ return(data.frame(x,y))
+ }
>
> rho = seq(-1,1,length.out=9)
> par(mfrow=c(3,3))
>
> for (r in rho) {
+ d = correlated.data(rho=r)
+ plot(d, main=r)
+ }
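
A quick check that the recipe above does what it claims: since x and the noise term are independent standard normals, y = rho*x + sqrt(1-rho^2)*noise has variance rho^2 + (1-rho^2) = 1 and covariance rho with x, so cor(x,y) should be close to rho. The sketch below repeats correlated.data so it runs on its own. The seed, the sample size of 1e5 and the grid of rho values are arbitrary assumptions chosen for illustration.

```r
# Repeated from the session so this block is self-contained
correlated.data = function(n=50, rho=0) {
  x = rnorm(n)
  y = rho*x + sqrt(1-rho^2)*rnorm(n)
  return(data.frame(x,y))
}

set.seed(1)                       # arbitrary seed, only for repeatability
rho.grid = c(-1, -0.5, 0, 0.5, 1) # illustrative target correlations

# Sample correlation for each target rho, using a large n to reduce noise
est = sapply(rho.grid, function(r) {
  d = correlated.data(n=1e5, rho=r)
  cor(d$x, d$y)
})

print(data.frame(rho=rho.grid, estimate=est))
```

With the default n=50 used for the plots, the sample correlation in each panel will scatter more widely around its target.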
>
> proc.time()
   user  system elapsed
  0.636   0.028   0.806