Raw spectra: one spectrum in each .csv (comma-separated values) file, with lines of the form (m/z, intensity)
OR
Peaks: the peaks from one spectrum in each .txt (text) file,
with lines of the form (m/z, peak intensity).
Extra information in comment lines, or additional entries after
(m/z, peak intensity), is ignored. Hence Ciphergen output files
can be used directly.
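As an illustration, both input formats could be read with a short parser like the one below. This is a minimal Python sketch, not PPC's own reader: the function names are hypothetical, and it assumes that comment lines start with "#" (or are otherwise non-numeric) and that fields are separated by commas and/or whitespace.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: fields are separated by commas and/or whitespace.
_SEP = re.compile(r"[,\s]+")


def parse_spectrum_lines(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Return (m/z, intensity) pairs from raw-spectrum or peak-file lines.

    Hypothetical helper: blank lines, lines starting with '#', and lines whose
    first two fields are not numbers (e.g. headers) are skipped as comments.
    Any fields after the first two are ignored.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f for f in _SEP.split(line) if f]
        if len(fields) < 2:
            continue
        try:
            mz, intensity = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # non-numeric line: treat as a comment
        pairs.append((mz, intensity))
    return pairs


def read_spectrum(path: str) -> List[Tuple[float, float]]:
    """Read one .csv (raw spectrum) or .txt (peak list) file."""
    with open(path) as fh:
        return parse_spectrum_lines(fh)
```

For example, parse_spectrum_lines(["m/z,intensity", "1000.5,12.0", "2500.1 40.2 extra"]) returns [(1000.5, 12.0), (2500.1, 40.2)]: the header line is skipped and the trailing "extra" field is dropped.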
With either format, the data are arranged in folders (directories), one for each outcome class (e.g., control
or diseased).
In addition, PPC can handle batches, e.g., samples run on different chip surfaces. The folders then have the form control/batch1, control/batch2, etc.
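The folder layout described above can be walked with a sketch like the following, which labels each file with its outcome class and, when present, its batch. The helper names are hypothetical and this is not how PPC itself reads the folders; the suffix argument would be ".csv" for raw spectra or ".txt" for peak files.

```python
from pathlib import Path, PurePath
from typing import List, Optional, Tuple


def classify(rel_path) -> Tuple[str, Optional[str]]:
    """Map a path relative to the data root to (class, batch).

    class/file        -> (class, None)    no batches
    class/batch/file  -> (class, batch)   e.g. control/batch1/sample.csv
    """
    parts = PurePath(rel_path).parts
    if len(parts) == 2:
        return parts[0], None
    if len(parts) == 3:
        return parts[0], parts[1]
    raise ValueError(f"unexpected folder layout: {rel_path}")


def collect_samples(root: str, suffix: str) -> List[Tuple[str, Optional[str], Path]]:
    """Return (class, batch, file) for every file ending in suffix under root."""
    base = Path(root)
    return [
        (*classify(f.relative_to(base)), f)
        for f in sorted(base.rglob("*" + suffix))
        if f.is_file()
    ]
```

Returning batch as None for the unbatched layout lets the same code handle both folder arrangements.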
See the following example datasets, which cover all of these possibilities:
data.raw.nobatch.tar.gz
data.raw.batch.tar.gz
data.peaks.nobatch.tar.gz
data.peaks.batch.tar.gz
NOTE: When raw spectra are used as input, PPC applies its built-in peak finder. This peak finder is crude: it simply looks for local maxima within a window whose intensity exceeds the background by a pre-specified amount. Better results can often be obtained by applying a more sophisticated peak-finding procedure and then using the extracted peaks as input to PPC.
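To make the description of the built-in peak finder concrete, here is a rough Python sketch of a window-based local-maximum search. It is not PPC's code: the parameter names and defaults are hypothetical, and using the window median as the background estimate is an assumption, since the text does not say how background is measured.

```python
from statistics import median
from typing import List, Sequence, Tuple


def find_peaks(mz: Sequence[float], intensity: Sequence[float],
               half_window: int = 5, min_height: float = 10.0) -> List[Tuple[float, float]]:
    """Crude local-maximum peak finder (hypothetical sketch).

    A point is reported if it is the largest intensity within +/- half_window
    points and exceeds the local background by at least min_height.
    Assumption: background = median intensity of that window.
    """
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    n = len(intensity)
    peaks = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = list(intensity[lo:hi])
        if intensity[i] < max(window):
            continue  # not a local maximum within the window
        if intensity[i] - median(window) >= min_height:
            peaks.append((mz[i], intensity[i]))
    return peaks
```

A flat-topped peak makes several adjacent points tie for the window maximum, so this sketch can report the same peak more than once; a more careful peak finder would merge such duplicates.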