This is a web tool to apply, in a user-friendly way, the lpda package, available at https://cran.r-project.org/web/packages/lpda/.
Global classification
Examples:
4 available examples.
Upload data:
A txt file with samples in rows and variables in columns. The first column must contain the class.
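As an illustration of the expected layout, the following minimal sketch reads such a file into class labels and numeric rows. It assumes whitespace-separated values and no header line; neither detail is stated on this page, and `read_samples` is a hypothetical helper, not part of lpda.

```python
import io

def read_samples(handle, sep=None):
    """Parse lines of 'class value value ...' into (labels, rows).

    Assumption: fields are separated by whitespace (sep=None) and the
    file has no header line.
    """
    labels, rows = [], []
    for line in handle:
        fields = line.split(sep)
        if not fields:
            continue  # skip blank lines
        labels.append(fields[0])                     # first column: class
        rows.append([float(v) for v in fields[1:]])  # remaining columns: variables
    return labels, rows

# Small in-memory example standing in for an uploaded txt file.
example = io.StringIO("A 5.1 3.5\nA 4.9 3.0\n\nB 6.3 3.3\n")
labels, rows = read_samples(example)
```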
PCA analysis:
for dimension reduction. Recommended for high-dimensional data.
Training & test sets
Test set:
Row numbers of the samples that form the test set, separated by commas.
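The sketch below shows one way such a comma-separated entry could be turned into training and test indices. It assumes the row numbers are 1-based (as in R) and that every row not listed goes to the training set; `split_by_rows` is a hypothetical helper used only for illustration.

```python
def split_by_rows(n_rows, test_rows_text):
    """Return (train_idx, test_idx) as 0-based indices.

    Assumption: the text holds 1-based row numbers separated by commas,
    e.g. "2, 5,6"; rows not listed are used for training.
    """
    test = sorted({int(tok) for tok in test_rows_text.split(",") if tok.strip()})
    bad = [r for r in test if not 1 <= r <= n_rows]
    if bad:
        raise ValueError(f"rows out of range: {bad}")
    test_idx = [r - 1 for r in test]
    chosen = set(test_idx)
    train_idx = [i for i in range(n_rows) if i not in chosen]
    return train_idx, test_idx

train_idx, test_idx = split_by_rows(6, "2, 5,6")
```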
Validation
%data test:
Percentage of the data used to compute the number of individuals in the test set.
Times evaluated:
Number of times the model is evaluated.
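The two validation settings can be read as a repeated random hold-out. The sketch below is an illustration under stated assumptions, not the tool's actual procedure: the test-set size is taken as the rounded percentage of the samples, the splits are plain random splits, and `majority_class` is a stand-in classifier (not LPDA) used only so that the example runs.

```python
import random
from collections import Counter

def majority_class(train_x, train_y, test_x):
    """Stand-in classifier (NOT LPDA): predicts the most common training class."""
    top = Counter(train_y).most_common(1)[0][0]
    return [top] * len(test_x)

def repeated_holdout(x, y, pct_test, times, fit_predict, seed=0):
    """Average test error rate over `times` random train/test splits."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, round(n * pct_test / 100))  # assumed rounding rule
    errors = []
    for _ in range(times):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        pred = fit_predict([x[i] for i in train],
                           [y[i] for i in train],
                           [x[i] for i in test])
        wrong = sum(p != y[i] for p, i in zip(pred, test))
        errors.append(wrong / n_test)
    return sum(errors) / times, n_test

# Toy data: 10 samples of a single class, so the stand-in is never wrong.
x = [[float(i)] for i in range(10)]
y = ["A"] * 10
mean_error, n_test = repeated_holdout(x, y, pct_test=20, times=5,
                                      fit_predict=majority_class)
```

Any function with the same three arguments can be passed as `fit_predict`, so the classifier can be swapped without touching the validation loop.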
Number of PCs:
Number of Principal Components to compute.
%Explained Variance:
The PCs that explain this percentage of the variance will be selected.
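One common reading of this option is to keep the smallest number of leading PCs whose cumulative explained variance reaches the chosen percentage. The sketch below assumes that rule and takes the per-component variances as given; `n_components_for_variance` is a hypothetical name.

```python
def n_components_for_variance(variances, pct):
    """Smallest number of leading PCs whose cumulative share of the
    total variance is at least `pct` percent (assumed selection rule)."""
    total = sum(variances)
    cumulative = 0.0
    for k, v in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += v
        if 100 * cumulative / total >= pct:
            return k
    return len(variances)

# Component variances 4, 3, 2, 1 explain 40%, 70%, 90%, 100% cumulatively.
k = n_components_for_variance([4.0, 3.0, 2.0, 1.0], 80)
```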
Confusion matrix comparing predicted with real classes. Predicted in rows / real in columns.
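As a small worked example of this layout, the sketch below builds a confusion matrix with predicted classes in rows and real classes in columns, and derives the prediction error rate as the share of off-diagonal counts. The function names are illustrative only.

```python
def confusion_matrix(predicted, real, classes):
    """Counts with predicted classes in rows and real classes in columns."""
    pos = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for p, r in zip(predicted, real):
        matrix[pos[p]][pos[r]] += 1
    return matrix

def error_rate(matrix):
    """Proportion of samples that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total

predicted = ["A", "A", "B", "B", "B"]
real = ["A", "B", "B", "B", "A"]
matrix = confusion_matrix(predicted, real, ["A", "B"])
rate = error_rate(matrix)
```

Here three of the five samples lie on the diagonal, so two are misclassified.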
Predicted classes
Download
Predicted classes for test set
Prediction error rate for current model
Prediction error rate for a model with a specific number of PCs
Prediction error rate for a model with PCs that explain a specific % of variance
The examples, except iris, are part of the data supplied by the lpda package.
palmdates data
This data set contains scores of 21 palm dates, including their respective Raman spectra
and the concentrations of five compounds covering a wide range of values: fibre,
glucose, fructose, sorbitol and myo-inositol [2]. The
first 11 dates are Spanish (from Elche, Alicante) with no well-defined variety, and the last 10
are from other countries and varieties, mainly Arabian. The data set has two data.frames:
conc with 5 variables and spectra with 2050 variables.
RNAseq data
This data set was simulated from a Negative Binomial distribution and transformed to
RPKM (reads per kilobase per million mapped reads). It contains 600 genes (in columns)
and 60 samples (in rows), 30 from each of the two experimental groups. The first 30 samples are from
the first group and the remaining samples from the second one. It was simulated with few
variables (genes) that discriminate between groups. There is little correlation and a lot of
noise.
iris data
This is the famous (Fisher’s or Anderson’s) iris data set
that is available in base R. The iris data set gives the measurements in centimeters of the
variables sepal length and width and petal length and width, respectively, for 50 flowers from
each of 3 species of iris. The species are Iris setosa, versicolor and virginica.
References
[1] Nueda MJ, Gandia C, Molina MD (2022) LPDA: A new classification method based on linear programming.
PLoS ONE 17(7): e0270403. https://doi.org/10.1371/journal.pone.0270403
[2] Abdrabo SS, Gras L, Grindlay G, Mora J (2021) Evaluation of Fourier Transform-Raman Spectroscopy
for palm dates characterization. Journal of Food Composition and Analysis. Submitted.