Data formats for the PPC R package



The input data can be either:

Raw spectra: one spectrum in each .csv (comma-separated values) file, with lines of the form (m/z, intensity)

OR

Peaks: the peaks from one spectrum in each .txt (text) file, with lines of the form (m/z, peak intensity).
Extra information in comment lines, and any additional entries after (m/z, peak intensity), are ignored. Hence one can use Ciphergen output files directly.
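
As an illustration of the line format described above, here is a minimal sketch of a reader that keeps the first two numeric entries on each line and ignores everything else. It is written in Python purely for illustration and is not part of the PPC R package. The function name parse_spectrum_lines is hypothetical, and the rule used to recognise comment lines (any line whose first two fields are not numbers is skipped) is an assumption; this document does not say how PPC itself identifies comments.

```python
import re

def parse_spectrum_lines(lines):
    """Return a list of (mz, intensity) pairs from an iterable of text lines."""
    pairs = []
    for line in lines:
        # Split on commas and/or whitespace and drop empty fields.
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) < 2:
            continue  # blank or too short to hold (m/z, intensity)
        try:
            mz, intensity = float(fields[0]), float(fields[1])
        except ValueError:
            # Assumption: non-numeric lines are comments or headers.
            continue
        # Any entries after the first two are ignored.
        pairs.append((mz, intensity))
    return pairs

example = ["# sample 17", "1000.5, 12.0", "2000.25 7.5 extra 3"]
print(parse_spectrum_lines(example))
```

To read a file, pass an open file handle, for example parse_spectrum_lines(open(path)).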




With either format, the data are arranged in folders (directories), one for each outcome class (e.g. control or diseased).

In addition, PPC can handle batches, e.g. samples run on different chip surfaces. The folders then have the form control/batch1, control/batch2, etc.
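
The folder layout can be checked before running PPC with a short script. The following is a sketch in Python (for illustration only, not part of the PPC R package) that lists every sample file with its outcome class and batch. The function name list_samples and its suffix argument are hypothetical, and the sketch assumes that the class folders sit directly under one root folder.

```python
from pathlib import Path

def list_samples(root, suffix=".csv"):
    """Return (outcome_class, batch, path) tuples found under root.

    batch is None when files sit directly in the class folder.
    """
    samples = []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        # No-batch layout: root/control/sample.csv
        for path in sorted(class_dir.glob("*" + suffix)):
            samples.append((class_dir.name, None, path))
        # Batch layout: root/control/batch1/sample.csv
        batch_dirs = sorted(p for p in class_dir.iterdir() if p.is_dir())
        for batch_dir in batch_dirs:
            for path in sorted(batch_dir.glob("*" + suffix)):
                samples.append((class_dir.name, batch_dir.name, path))
    return samples
```

For peak files, call list_samples(root, suffix=".txt").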

See

data.raw.nobatch.tar.gz
data.raw.batch.tar.gz
data.peaks.nobatch.tar.gz
data.peaks.batch.tar.gz

for example datasets covering all four combinations (raw spectra or peaks, with or without batches).

NOTE: When raw spectra are used as input, PPC applies its built-in peak finder. This peak finder is crude: it simply looks for local maxima in a window whose intensity exceeds the background by a pre-specified amount. Better results can often be obtained by applying a more sophisticated peak-finding procedure and then using the extracted peaks as input to PPC.
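
To make the idea of a windowed local-maximum peak finder concrete, here is a minimal sketch in Python. It is not PPC's actual implementation. The function name find_peaks, the parameters half_width and min_above_background, and the use of the window median as the background estimate are all assumptions made for illustration; this document does not specify how PPC estimates background.

```python
from statistics import median

def find_peaks(mz, intensity, half_width=2, min_above_background=5.0):
    """Return (mz, intensity) for points that are window maxima and
    exceed the window's background by at least min_above_background."""
    peaks = []
    n = len(intensity)
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = intensity[lo:hi]
        if intensity[i] < max(window):
            continue  # not the highest point in its window
        if any(intensity[j] == intensity[i] for j in range(lo, i)):
            continue  # on a plateau, keep only the first point
        # Assumption: background is the median intensity in the window.
        background = median(window)
        if intensity[i] - background >= min_above_background:
            peaks.append((mz[i], intensity[i]))
    return peaks
```

The output is a list of (m/z, peak intensity) pairs, which matches the peak file format described earlier in this document.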