lpda: Linear Programming Discriminant Analysis

The lpda R package (https://cran.r-project.org/web/packages/lpda/) implements the LPDA classification method [1]. This web tool provides a user-friendly interface to it.

Global classification

Examples: choose one of the 4 available example data sets.

Upload data: a txt file with samples in rows and variables in columns. The first column must contain the class.

PCA analysis: reduces the dimension of the data with principal component analysis. Recommended for high-dimensional data.
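
The Upload data format can be illustrated with a small parser that separates the class column from the variables. This is a sketch under assumptions the tool does not state: values separated by spaces or tabs, no header row, and one sample per line. `parse_samples` is a hypothetical helper, not part of lpda.

```python
def parse_samples(text):
    """Split a samples-by-variables table whose first column is the class.

    Assumed (hypothetical) format: values separated by spaces or tabs,
    no header row, one sample per line.
    """
    labels, rows = [], []
    for line in text.strip().splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # ignore blank lines
        labels.append(fields[0])                     # first column: class
        rows.append([float(v) for v in fields[1:]])  # other columns: variables
    # every sample must have the same number of variables
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have different numbers of variables")
    return labels, rows


data = "A\t5.1\t3.5\nA\t4.9\t3.0\nB\t6.3\t3.3"
labels, X = parse_samples(data)
print(labels)  # ['A', 'A', 'B']
print(X[2])    # [6.3, 3.3]
```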

Training & test sets

Test set: the row numbers of the samples in the test set, separated by commas.
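
A minimal sketch of how a comma-separated list of row numbers splits the data into training and test sets. It assumes 1-based row numbers, as in R; `split_by_rows` is a hypothetical helper for illustration only.

```python
def split_by_rows(samples, test_rows_text):
    """Split samples into (training, test) from a comma-separated list of
    row numbers such as "2, 4". Row numbers are assumed to be 1-based, as in R."""
    test_idx = {int(tok) - 1 for tok in test_rows_text.split(",") if tok.strip()}
    if any(i < 0 or i >= len(samples) for i in test_idx):
        raise ValueError("row number out of range")
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test


samples = ["s1", "s2", "s3", "s4", "s5"]
train, test = split_by_rows(samples, "2, 4")
print(train)  # ['s1', 's3', 's5']
print(test)   # ['s2', 's4']
```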

Validation

%data test: percentage of the data used as the test set; it determines the number of individuals evaluated in each test set.

Times evaluated: number of times the model is evaluated.

Number of PCs: Number of Principal Components to compute.

%Explained Variance: the PCs that together explain this percentage of the variance are selected.
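
The validation options can be sketched as repeated random hold-out plus a variance threshold for choosing PCs. The rounding rule for the test-set size, the use of independent random splits, and picking the smallest number of PCs that reaches the threshold are assumptions made for illustration, not documented behavior of the tool; all function names are hypothetical. With Number of PCs, one would simply keep the first k components instead of applying the threshold.

```python
import random


def holdout_size(n, pct_test):
    """Individuals held out for testing; rounding to the nearest integer is an assumption."""
    return max(1, round(n * pct_test / 100))


def repeated_holdout(n, pct_test, times, seed=0):
    """Hypothetical scheme: `times` independent random training/test splits of n samples."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    size = holdout_size(n, pct_test)
    splits = []
    for _ in range(times):
        test = set(rng.sample(range(n), size))  # drawn without replacement
        train = [i for i in range(n) if i not in test]
        splits.append((train, sorted(test)))
    return splits


def n_components_for_variance(variances, pct):
    """Smallest number of leading PCs whose cumulative share of the total
    variance reaches pct (0-100). `variances` are the PC variances
    (eigenvalues), assumed sorted in decreasing order."""
    total = sum(variances)
    cumulative = 0.0
    for k, v in enumerate(variances, start=1):
        cumulative += v
        if 100.0 * cumulative / total >= pct:
            return k
    return len(variances)


splits = repeated_holdout(n=60, pct_test=20, times=5)
print(len(splits), len(splits[0][1]))  # 5 12
print(n_components_for_variance([4.0, 2.5, 1.5, 1.0, 1.0], 80))  # 3
```

In this example the five PCs explain 40%, 25%, 15%, 10% and 10% of the variance, so the first three reach the 80% threshold.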

Confusion matrix: predicted classes compared with real classes (predicted in rows, real in columns).
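
As an illustration of this layout, a minimal sketch that tabulates hypothetical predicted and real labels with predicted classes in rows and real classes in columns. Sorting the classes alphabetically is an assumption, and `confusion_matrix` is an illustrative name, not part of lpda.

```python
def confusion_matrix(predicted, real):
    """Counts with predicted classes in rows and real classes in columns.
    Classes are sorted alphabetically (an assumption for this sketch)."""
    classes = sorted(set(predicted) | set(real))
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for p, r in zip(predicted, real):
        matrix[index[p]][index[r]] += 1  # row = predicted, column = real
    return classes, matrix


predicted = ["A", "A", "B", "B", "A"]
real = ["A", "B", "B", "B", "A"]
classes, matrix = confusion_matrix(predicted, real)
print(classes)  # ['A', 'B']
print(matrix)   # [[2, 1], [0, 2]]
```

The diagonal counts correct predictions; the 1 in row A, column B is a sample whose real class is B but was predicted as A.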

Predicted classes

Predicted classes for test set

Prediction error rate for current model

Prediction error rate for a model with a specific number of PCs

Prediction error rate for a model with the PCs that explain a specific % of variance
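
The error rates reported in these panels can be read as the proportion of misclassified test samples. A minimal sketch, assuming that definition and assuming (hypothetically) that the results of the repeated evaluations set by Times evaluated are averaged; the function names are illustrative only.

```python
def error_rate(predicted, real):
    """Proportion of test samples whose predicted class differs from the real one."""
    if len(predicted) != len(real) or not real:
        raise ValueError("need two non-empty label sequences of equal length")
    wrong = sum(p != r for p, r in zip(predicted, real))
    return wrong / len(real)


def mean_error_rate(runs):
    """Average error over repeated evaluations (averaging is an assumption).
    `runs` is a list of (predicted, real) pairs, one per evaluation."""
    rates = [error_rate(p, r) for p, r in runs]
    return sum(rates) / len(rates)


runs = [
    (["A", "A", "B", "B", "A"], ["A", "B", "B", "B", "A"]),  # 1 of 5 wrong
    (["A", "B"], ["A", "B"]),                                # none wrong
]
print(error_rate(*runs[0]))   # 0.2
print(mean_error_rate(runs))  # 0.1
```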

All examples except iris are data sets supplied with the lpda package.

palmdates data

A data set with measurements of 21 palm dates, including their Raman spectra and the concentrations of five compounds covering a wide range of values: fibre, glucose, fructose, sorbitol and myo-inositol [2]. The first 11 dates are Spanish (from Elche, Alicante) with no well-defined variety, and the last 10 come from other countries and varieties, mainly Arabian. The data set consists of two data.frames: conc, with 5 variables, and spectra, with 2050.

RNAseq data

This data set was simulated from a Negative Binomial distribution and transformed to RPKM (reads per kilobase per million mapped reads). It contains 600 genes (in columns) and 60 samples (in rows), 30 from each of the two experimental groups: the first 30 samples belong to the first group and the remaining 30 to the second. Only a few of the simulated variables (genes) discriminate between the groups. There is little correlation and a lot of noise.
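
For reference, RPKM normalizes a gene's read count by gene length (in kilobases) and by sequencing depth (in millions of mapped reads): RPKM = 10^9 × C / (L × N), where C is the number of reads mapped to the gene, L the gene length in base pairs and N the total number of mapped reads in the sample. The sketch applies this standard definition to hypothetical counts and lengths; it is not the code used to build the example data, and using the sum of the listed counts as N when no total is given is a simplifying assumption.

```python
def rpkm(counts, lengths_bp, total_mapped=None):
    """Reads per kilobase per million mapped reads:
    RPKM = counts * 1e9 / (gene length in bp * total mapped reads).
    If total_mapped is not given, the sum of counts is used (a simplification)."""
    total = sum(counts) if total_mapped is None else total_mapped
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


counts = [500, 1500, 3000]      # hypothetical reads per gene in one sample
lengths_bp = [1000, 2000, 500]  # hypothetical gene lengths
print(rpkm(counts, lengths_bp))  # [100000.0, 150000.0, 1200000.0]
```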

iris data

This is the famous (Fisher’s or Anderson’s) iris data set that is available in base R. The iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor and virginica.

References

[1] Nueda MJ, Gandia C, Molina MD (2022) LPDA: A new classification method based on linear programming. PLoS ONE 17(7): e0270403. https://doi.org/10.1371/journal.pone.0270403

[2] Abdrabo SS, Gras L, Grindlay G, Mora J (2021) Evaluation of Fourier Transform-Raman Spectroscopy for palm dates characterization. Journal of Food Composition and Analysis. Submitted.

Maintainer: Maria J. Nueda