Interfacing R and GXNA ---------------------- The R package Bioconductor is commonly used for microarray analysis, hence an R interface for GXNA would be very helpful. In principle it is possible to integrate the GXNA C++ code into an R package and write an R wrapper function around it. We plan to do this in the near future; due to issues such as memory allocation, it is not as straightforward as one would hope. In the meanwhile, it is still possible to use R and GXNA with a little more effort. The idea is to export the R data into plain text files and run GXNA on these. It is not elegant but it works. Here are some functions that make it relatively painless. You can simply cut and paste the relevant code into R. 1 Creating GXNA input files --------------------------- GXNA needs three files to run: the expression file, the phenotype file, and the array annotation file. You only need to create these files once. 1.1 The phenotype and expression files -------------------------------------- The phenotype will probably be stored in R as a vector. You can just paste this into the phenotype file, or use the code below. If you are using the limma Bioconductor package, normalized expression levels will probably be stored into an MAList object. The following code outputs expression levels into a file. Numbers are trimmed to three decimals to make the file smaller and faster to read/write. Currently GXNA cannot handle missing expression values. This will be fixed soon, but for now the recommended workaround is to set all NA values to 0. This can be done in R before running the code below, or afterwards by editing the expression file. # R code starts here trimNumber = function(x) { return(sprintf("%.3f", x)) # keep only three decimals } writeArray = function(arrayName, fileName) { write.table(arrayName, file = fileName, quote=FALSE, row.names=FALSE, col.names=FALSE) } writeExpression = function(MAListName, fileName) { writeArray(cbind(MAListName$genes$ProbeName, apply(MAListName$M, 2, trimNumber)), fileName) } prepareGXNA = function(MAListName, phenotype, projectName) { expFile = paste(projectName, "exp", sep = ".") phenoFile = paste(projectName, "phe", sep = ".") writeExpression(MAListName, expFile) writeArray(t(phenotype), phenoFile) # must transpose phenotype array } # R code ends here Example: prepareGXNA(myMA, pheno, "cells") will use expression values in the MAList myMA and phenotype values in the vector pheno to create files cells.exp and cells.phe. If you are not using an MAList, you probably will still store probe names into some vector, and expression values into some array. Then you can just replace in the above code MAListName$genes$ProbeName by the probe vector, and MAListName$M by the expression array. 1.2 The array annotation file ----------------------------- If the map file from probe name to EntrezGene ID is not yet available online, you will need to create it. One easy way to do it is to use the Bioconductor annotation package for that array, assuming it exists. The following code will do the job (you may need to install Bioconductor first): # R code starts here writeProbeMap = function(packageName, arrayName = packageName) { library(packageName, character.only=TRUE) # load package map = as.list(get(paste(packageName, "LOCUSID", sep = ""))) mapFile = paste(arrayName, "ann", sep = ".") write.table(t(rbind(names(map), map)), file = mapFile, row.names=FALSE, col.names=FALSE) } # R code ends here Example: writeProbeMap("hgu95av2") will create an annotation file for the Affy U95 array named hgu95av2.ann. The file name can be changed by supplying a second argument to the function. 2 Running GXNA -------------- Once the files are created you can run GXNA as many times as you want. It is easiest to do it from the DOS/Unix command line, though you can also use the R system() function. Example: ./gxna -name cells -mapFile hgu95av2.map will run GXNA using the input files created in section 1, and output the results into the file cells_000.res. 3 Reading the output into R --------------------------- This can be done using read.table(), at least when output lines have a fixed number of words, which is guaranteed for single-gene analysis and for fixed size adapted search. Note that the output columns have changed in GXNA version 2.0. For example, for a standard single gene analysis, you can use: gxnaResults = read.table("cells_000.res", row.names = 1, col.names = c("rank","name","id","id2","score","rawp","adjp"))