The HotellingEllipse package helps draw the Hotelling ellipse on a PCA or PLS score scatterplot. It computes the Hotelling’s T\(^2\) value and the two semi-axes of the ellipse (denoted a and b), along with the x-y coordinates for drawing a confidence ellipse based on Hotelling’s T\(^2\). Two functions are available:
ellipseParam() calculates the Hotelling’s T\(^2\) and the semi-axes of an ellipse at the 99% and 95% confidence levels.
ellipseCoord() returns the x and y coordinates of a confidence ellipse at a user-defined confidence level, which is set to 95% by default.
library(HotellingEllipse)
data("specData")
In this example, we use FactoMineR::PCA() to perform a Principal Component Analysis (PCA) on specData, a LIBS (laser-induced breakdown spectroscopy) spectral dataset, and then convert the PCA scores to a tibble with tibble::as_tibble().
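# Packages providing %>%, select(), where(), PCA(), pluck() and as_tibble()
library(dplyr)
library(FactoMineR)
library(purrr)
library(tibble)

set.seed(123)
# Run an unscaled PCA on the numeric columns only
pca_mod <- specData %>%
  select(where(is.numeric)) %>%
  PCA(scale.unit = FALSE, graph = FALSE)
# Pull the individual scores out of the PCA object
pca_scores <- pca_mod %>%
  pluck("ind", "coord") %>%
  as_tibble()
pca_scores
#> # A tibble: 171 x 5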
#> Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 144168. -36399. 2228. -670. 13805.
#> 2 118520. -31465. 16300. -20686. -13872.
#> 3 90303. -28356. 31340. -60615. 15157.
#> 4 107107. -38209. 24897. -60366. 19449.
#> 5 74350. -2148. 29814. -8351. 494.
#> 6 97511. -17932. 22254. -15406. -4195.
#> 7 82142. 19297. -34299. -12498. -648.
#> 8 76261. 16566. -34382. -16293. 137.
#> 9 73705. 31091. -22577. -17182. 2438.
#> 10 68042. 25124. -26063. -19389. 6051.
#> # … with 161 more rows
To add a confidence ellipse, we use the function ellipseParam(). We want to compute the lengths of the ellipse semi-axes for bivariate data within the PC1-PC2 subspace. To do this, we set the number of components, k, to 2, and set the pcx and pcy inputs to 1 and 2, respectively.
res <- ellipseParam(data = pca_scores, k = 2, pcx = 1, pcy = 2)
str(res)
#> List of 4
#> $ Tsquare : tibble[,1] [171 × 1] (S3: tbl_df/tbl/data.frame)
#> ..$ value: num [1:171] 2.28 2.65 8 8.63 1.05 ...
#> $ Ellipse : tibble[,4] [1 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ a.99pct: num 319536
#> ..$ b.99pct: num 91816
#> ..$ a.95pct: num 256487
#> ..$ b.95pct: num 73699
#> $ cutoff.99pct: num 9.52
#> $ cutoff.95pct: num 6.14
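The two cutoff values printed above can be reproduced with the usual F-distribution limit for Hotelling’s T\(^2\), \(k(n-1)/(n-k) \cdot F_{\alpha;\,k,\,n-k}\). The sketch below is a base-R illustration, not the package's source code: `t2_limit()` is a hypothetical helper, the relation between the limit and the semi-axis length is an assumption, and `scores_pc1` is synthetic data used only to show the call.

```r
# Hypothetical helper: T^2 limit = k (n - 1) / (n - k) * F quantile (k, n - k df)
t2_limit <- function(conf, n, k) {
  k * (n - 1) / (n - k) * qf(conf, df1 = k, df2 = n - k)
}

n <- 171  # number of observations in the example above
k <- 2    # number of components

cutoff_95 <- t2_limit(0.95, n, k)
cutoff_99 <- t2_limit(0.99, n, k)
round(c(cutoff_95, cutoff_99), 2)
#> [1] 6.14 9.52

# Assumption: semi-axis = sqrt(limit * variance of the scores on that axis).
# scores_pc1 is synthetic, so a_95 will not match the a.95pct value above.
set.seed(1)
scores_pc1 <- rnorm(n, mean = 0, sd = 1e5)
a_95 <- sqrt(cutoff_95 * var(scores_pc1))
```

Under that assumption, the ratio of the 95% to the 99% semi-axis depends only on the two cutoffs, which agrees with the a.95pct / a.99pct ratio in the output above (about 0.80).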
We can extract parameters for further use:
a1 <- pluck(res, "Ellipse", "a.99pct")
b1 <- pluck(res, "Ellipse", "b.99pct")
a2 <- pluck(res, "Ellipse", "a.95pct")
b2 <- pluck(res, "Ellipse", "b.95pct")
Tsq <- pluck(res, "Tsquare", "value")
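One possible use of the extracted semi-axes is a purely geometric check of which points fall outside the 95% ellipse. The sketch below assumes an ellipse centred at the origin with its axes aligned to the two components. `outside_ellipse()` is a hypothetical helper, not a package function, and the last point is invented to show a positive case. This test is separate from the T\(^2\) values returned by ellipseParam().

```r
# Hypothetical helper: TRUE when (x, y) lies outside an origin-centred,
# axis-aligned ellipse with semi-axes a (along x) and b (along y).
outside_ellipse <- function(x, y, a, b) {
  (x / a)^2 + (y / b)^2 > 1
}

# 95% semi-axes copied from the str(res) output above
a2 <- 256487
b2 <- 73699

# First three Dim.1 / Dim.2 scores printed earlier, plus one invented far point
pc1 <- c(144168, 118520, 90303, 300000)
pc2 <- c(-36399, -31465, -28356, 0)

flag <- outside_ellipse(pc1, pc2, a2, b2)
flag
#> [1] FALSE FALSE FALSE  TRUE
```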
Another way to add a Hotelling ellipse is to use the function ellipseCoord(). This function provides the x and y coordinates of the confidence ellipse at a user-defined confidence level. The confidence level, conf.limit, is set to 95% by default. Below, the x-y coordinates are estimated from data projected into the PC1-PC3 subspace.
xy_coord <- ellipseCoord(data = pca_scores, pcx = 1, pcy = 3, conf.limit = 0.95, pts = 500)
str(xy_coord)
#> tibble[,2] [500 × 2] (S3: tbl_df/tbl/data.frame)
#> $ x: num [1:500] 256487 256466 256405 256304 256161 ...
#> $ y: num [1:500] -1.73e-12 7.93e+02 1.59e+03 2.38e+03 3.17e+03 ...
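The coordinates above look like a standard parametric ellipse, x = a cos(θ) and y = b sin(θ), sampled at 500 angles. The sketch below shows that construction in base R. It is an illustration under that assumption, not the package's implementation: `ellipse_xy()` and its `center` argument are hypothetical, and the semi-axes are the PC1-PC2 values at 95% from the earlier output.

```r
# Hypothetical helper: sample `pts` points on an axis-aligned ellipse
# with semi-axes a and b, centred at `center`.
ellipse_xy <- function(a, b, pts = 500, center = c(0, 0)) {
  theta <- seq(0, 2 * pi, length.out = pts)
  data.frame(
    x = center[1] + a * cos(theta),
    y = center[2] + b * sin(theta)
  )
}

a <- 256487  # a.95pct from the str(res) output
b <- 73699   # b.95pct from the str(res) output
xy <- ellipse_xy(a, b, pts = 500)

# Every sampled point should satisfy the ellipse equation (x/a)^2 + (y/b)^2 = 1
max_dev <- max(abs((xy$x / a)^2 + (xy$y / b)^2 - 1))
```

A data frame shaped like this can be drawn over a score scatterplot with, for example, lines(xy$x, xy$y) in base graphics.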