Course Home Page – STOR 881 Object Oriented Data Analysis – Spring 2022

Instructor:    J. S. Marron

Email:   marron@unc.edu

Office:   352 Hanes Hall

SyllabusSTOR881Fall2021

 

Course Notes

  1.   STOR881-01-11-2022.pptx:  Organizational Matters, OODA Book, What is OODA?, Taste of OODA Examples (including Spanish Male Mortality, Amplitude – Phase, Shapes, Sounds, Faces), 3 Major Phases of OODA.
  2.   STOR881-01-13-2022.pptx:  Visualization, Scatterplot Matrix Views, Principal Component Analysis (PCA), Object Space – Trait Space, Scree Plots.
  3.   STOR881-01-18-2022.pptx:  Define Modes of Variation,  Prob. Dist’ns as data objects, PCA Toy & Real Examples, Shifted Parabolas Data, Lung Cancer Data, Limitation of PCA: Apple, Banana, Pear.
  4.   STOR881-01-20-2022.pptx:  Limitations of PCA: NCI-60 Data, OODA Terminology, Caution about DWD, Inference using DiProPerm, Marginal Distribution Plots.
  5. STOR881-01-25-2022.pptx:  Marginal Distribution Plot Analysis of Drug Discovery Data, Normalization and Correlation PCA, Transformations.
  6. STOR881-01-27-2022.pptx:  Melanoma Data, Automatic Shifted Log Transformation, ROC Curve to Quantify Impact of Transformation on Gene Expression Data, Heatmap Data Visualization.
  7. STOR881-02-01-2022.pptx:  Finish Heat Maps, Lung Cancer Data, Other Directions for Scatterplot Views, Centering, Details of PCA, Review Linear Algebra, Covariance Matrices, PCA as Optimization.
  8. STOR881-02-03-2022.pptx:  Finish PCA as Optimization, Alternate Viewpoints of PCA (Data Representation, Distribution of Energy, Simulation, Comparison to SVD), Distance Methods.
  9. STOR881-02-08-2022.pptx: Distance Methods, Distance Based Centers, Multidimensional Scaling, Shapes as Data Objects, Shape Representations (Landmark, Boundary, Shape), Male Pelvis Data.
  10. STOR881-02-10-2022.pptx:  Male Pelvis Data, Manifold Data, Directional Data, S-rep Analysis.
  11. STOR881-02-15-2022.pptx:  Principal Geodesic Analysis, Polysphere PCA, Torus PCA.  Taebin Kim:  Stain Normalization, Sherry Shuxian Wang: reconstruction and modeling from colonoscopy video 
  12. STOR881-02-17-2022.pptx:   Scaled Torus PCA, Principal Nested Spheres, Nonnegative Nested Cone Analysis, Principal Curves and Surfaces.  SooHyun Kim: Reading the Stars of Online Reviews: A Look into the Boundary Conditions of Helpfulness
  13. STOR881-02-22-2022.pptx: General Motivation for Backwards Methods, Curve Registration, Shifted Betas Example, Amplitude and Phase Modes of Variation.  Panos Andreou: Estimating the Transition Matrix of a Markov Chain
  14. STOR881-02-24-2022.pptx:  Amplitude and Phase Modes of Variation, Fisher-Rao Curve Estimation.  Joseph Lavond: Fast Adversarial Training,  Sam Ehrenstein:  Non-Contrast Ultrasound Imaging of Blood Microvessels using SVD
  15. STOR881-03-01-2022.pptx:  Principal Nested Spheres on SRVF Sphere, Juggling Data, Classification – Discrimination: Basics (k-NN, Linear Classifiers), Mean Difference.   Younghoon Kim: Simultaneous Component Analysis for Joint and Individual Models, Hong Dang: Single-cell RNA-seq from SARS-CoV-2 infected human airway epithelial cells
  16. STOR881-03-03-2022.pptx: Linear Discriminant Analysis (non-parametric and likelihood derivations), HDLSS Classification.  Kenya Vazquez Martinez: Kippenhahn curves of some tridiagonal matrices
  17. STOR881-03-08-2022.pptx: HDLSS Classification, Maximal Data Piling, Kernel Methods.  Rui Liu: Hi-C Data Analysis, Nick Tapp-Hughes: Introduction to Skeletal Representations
  18. STOR881-03-10-2022.pptx:  Kernel Methods, Kernel PCA, t-SNE Visualization, Support Vector Machine.  Brian White: extreme value analysis of sea-level time series data, John Lin:  Acute Leukemia Classification using Nanopore Sequencing Data
  19. STOR881-03-15-2022.pptx:  Spring Break
  20. STOR881-03-17-2022.pptx:  Spring Break 
  21. STOR881-03-22-2022.pptx:  Revisit SVM Optimization, SVM Tuning and Extensions, Distance Weighted Discrimination, DWD Simulations, Compare DWD, SVM and MD Visualizations, DWD Batch Adjustment,  Alice Peng: Studying Epstein-Barr with qPCR and PCA, Darius Bost: Meta-analysis of Predicted Expression on Addiction Traits
  22. STOR881-03-24-2022.pptx:  DWD Source & Batch Adjustment, NCI-60 Data, Why not Adjust by Means?, Radial DWD, Data Integration,  Jordan Valone: Context specific effects of genetic variation in response to Wnt pathway activation, Darius Bost: Meta-analysis of Predicted Expression on Addiction Traits
  23. STOR881-03-29-2022.pptx:  Partial Least Squares, Canonical Correlation Analysis, Angle Based Joint and Individual Variation Explained (AJIVE), FMRI Data,  Minji Kim: Step Data Clustering via Thick Pen Transform, Parvathi Meyyappan: Classifying Fuel Type from Chemical Compounds in Wildfire Smoke Samples
  24. STOR881-03-31-2022.pptx: Finish FMRI Data, AJIVE Algorithm and Diagnostics, Breast Cancer Images and Genomics,  Xianwen He: County-level GDP estimation around Chinese Mainland via CNN from remote sensing data,  Andy Ackerman: Data Integration for MRI
  25. STOR881-04-05-2022.pptx:  Multiple Genomics in Breast Cancer, Amplification Adjustment in Single Cell RNAseq, Data Integration Via Analysis of Subspaces (DIVAS), DIVAS Toy Example, DIVAS on TCGA Data, High Dimension Low Sample Size (HDLSS) Analysis,  Ben Brown: Binary Expansion Testing,  Adrian Allen: A Tree-based Method for Parcellating the Brain Based on Its Structural and Functional Connectivity, Izzy Wiesenthal: Eliminating Acoustic Feedback in Hearing Aids
  26. STOR881-04-07-2022.pptx:  DIVAS on TCGA Data, Introduction to High Dimension Low Sample Size Asymptotics, Explanation of DWD Visualization, Technical Assumptions, Zero Covariance is Not Independence, Mixing Conditions,   Jeff Ayers: Solutions of the Quantum Differential Equation,  Hank Flury: Bivariate Estimation for Spatial Processes
  27. STOR881-04-12-2022.pptx:  HDLSS Analysis of PCA, HDLSS Explanation of Earlier Observations,  Yang Luo: Halpern-Type Accelerated Algorithms For Monotone Inclusions,  Madison Lindsay: Analysis of a State-Dependent Markovian Queuing System
  28. STOR881-04-14-2022.pptx:  Wellness Day
  29. STOR881-04-19-2022.pptx: Introduction to Random Matrix Theory, Marčenko-Pastur Distribution, K-Means Clustering,  Bongsoo Yi:  Adversarial Attack on Object Detection, Fengyu Yang: Method of moments for protein structuring from cryo-electron microscopy, Wan Zhang: Nonparametric prediction distribution from resolution-wise regression with heterogeneous data
  30. STOR881-04-21-2022.pptx:  Hierarchical Clustering, SigClust,  Liubov Arbeeva:  Accelerometer Data Analysis,  Shaleni Kovach:  Generative models for brain imaging data, Puyao Ge: Vaccine Distribution Policy
  31. STOR881-04-26-2022.pptx:  SigClust, Smoothing, SiZer,  Jimin Choi: Human trait prediction using brain connectome data, Kyungjin Sohn: Brain age prediction using deep learning uncovers associated sequence variants, Emma Mitchell: Using a Dirichlet Mixture Model to Detect Concomitant Changes in Allele Frequencies, Elyse Borgert:  Topological Data Analysis

 

References

Ahn, J. (2006) High dimension, low sample size data analysis. PhD Dissertation, University of North Carolina, Chapel Hill (cited 3/22/22)

Ahn, J. (2010). A stable hyperparameter selection for the Gaussian RBF kernel for discrimination. Statistical Analysis and Data Mining: The ASA Data Science Journal, 3(3), 142-148.

Ahn, J., Lee, M. H., & Yoon, Y. J. (2012). Clustering high dimension, low sample size data using the maximal data piling distance. Statistica Sinica, 443-464 (cited 3/8/22)

Ahn, J., Marron, J. S., Muller, K. M., & Chi, Y. Y. (2007) The high-dimension, low-sample-size geometric representation holds under mild conditions. Biometrika, 94(3), 760-766 (cited 4/7/22)

Ahn, J., & Marron, J. S. (2010) The maximal data piling direction for discrimination. Biometrika, 97(1), 254-259 (cited 3/8/22)

Aitchison, J. (1982) The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B (Methodological), 44, 139-160 (cited 1/18/22)

Aizerman, A., Braverman, E. M., & Rozoner, L. I. (1964) Theoretical foundations of the potential function method in pattern recognition learning. Automation and remote control, 25, 821-837 (cited 3/8/22)

Alter, O., Brown, P. O., & Botstein, D. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97, 10101-10106 (cited 3/22/22)

Amari, S. I. (2012). Differential-geometrical methods in statistics (Vol. 28). Springer Science & Business Media (cited 2/22/22)

Anderson, T. W., & Darling, D. A. (1952) Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics, 193-212 (cited 1/27/22)

Aoshima, M., Shen, D., Shen, H., Yata, K., Zhou, Y. H., & Marron, J. S. (2018). A survey of high dimension low sample size asymptotics. Australian & New Zealand Journal of Statistics, 60(1), 4-19 (cited 4/12/22)

Bai, Z. D., & Saranadasa, H. (1996) Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6(2), 311-329 (cited 1/18/22)

Basser, P. J., Mattiello, J., & LeBihan, D. (1994). MR diffusion tensor spectroscopy and imaging. Biophysical journal, 66(1), 259-267 (cited 2/22/22)

Benito, M., Parker, J., Du, Q., Wu, J., Xiang, D., Perou, C. M., & Marron, J. S. (2004) Adjustment of systematic microarray data biases. Bioinformatics, 20(1), 105-114 (cited 1/11/22, 3/22/22)

Bickel, P. J. and Levina, E. (2004) Some theory for Fisher’s Linear Discriminant function, “naive Bayes”, and some alternatives when there are many more variables than observations, Bernoulli, 10, 989-1010 (cited 3/3/22)

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.  (cited 3/10/22)

Bookstein, F. L. (1991). Morphometric Tools for Landmark Data, Cambridge: Cambridge University Press (cited 2/8/22)

Borland, D., & Taylor, R. M. (2007). Rainbow color map (still) considered harmful. IEEE computer graphics and applications, 27(2), 14-17 (cited 2/1/22)

Borysov, P., Hannig, J., Marron, J. S., Muratov, E., Fourches, D., & Tropsha, A. (2016). Activity prediction and identification of mis‐annotated chemical compounds using extreme descriptors. Journal of Chemometrics, 30(3), 99-108 (cited 1/25/22)

Boser, B. E., Guyon, I. and Vapnik, V. (1992) A Training Algorithm for Optimal Margin Classifiers, in Fifth Annual Workshop on Computational Learning Theory, ACM (cited 3/10/22)

Box, G. E., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 211-252 (cited 1/25/22)

Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv. 2 107–144 (electronic). (Update of, and a supplement to, the 1986 original.)  (cited 4/7/22)

Brooks, J. P., Dulá, J. H., & Boone, E. L. (2013). A pure L1-norm principal component analysis. Computational statistics & data analysis, 61, 83-98 (cited 2/17/22)

Bullitt, E., & Aylward, S. R. (2002). Volume rendering of segmented image objects. IEEE Transactions on Medical Imaging, 21(8), 998-1002. (cited 1/11/22)

Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2(2), 121-167 (cited 3/10/22)

Cabanski, C. R., Qi, Y., Yin, X., Bair, E., Hayward, M. C., Fan, C., Li, J., Wilkerson, M. D., Marron, J. S., Perou, C. M. and Hayes, D. N. (2010) SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements, PLoS ONE, 5(3): e9905, doi:10.1371/journal.pone.0009905, PMCID: PMC2845619 (cited 4/19/22)

Cai, T., Liu, W., & Xia, Y. (2014) Two‐sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2), 349-372 (cited 1/18/22)

Carmichael, I., Calhoun, B. C., Hoadley, K. A., Troester, M. A., Geradts, J., Couture, H. D., … & Marron, J. S. (2021). Joint and individual analysis of breast cancer histologic images and genomic covariates. The Annals of Applied Statistics, 15(4), 1697-1722 (cited 3/31/22)

Carmichael, I., & Marron, J. S. (2021). Geometric insights into support vector machine behavior using the KKT conditions. Electronic Journal of Statistics, 15(2), 6311-6343 (cited 3/22/22)

Cates, J., Fletcher, P. T., Styner, M., Shenton, M., & Whitaker, R. (2007, July). Shape modeling and analysis with entropy-based particle systems. In Biennial International Conference on Information Processing in Medical Imaging (pp. 333-345). Springer, Berlin, Heidelberg (cited 1/20/22, 2/10/22)

Cattell, R. B. (1966) The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245-276 (cited 1/13/22, 2/1/22, 4/19/22)

Chaney, E. L., Pizer, S., Joshi, S., Broadhurst, R., Fletcher, T., Gash, G., … & Tracton, G. (2004). Automatic male pelvis segmentation from CT images via statistically trained multi-object deformable m-rep models. International Journal of Radiation Oncology, Biology, Physics, 60(1), S153-S154. (cited 1/11/22)

Chang, T. (1988). Estimating the relative rotation of two tectonic plates from boundary crossings. Journal of the American Statistical Association, 83(404), 1178-1183 (cited 2/8/22)

Chaudhuri, P. and Marron, J. S. (1999) SiZer for exploration of structure in curves, Journal of the American Statistical Association, 94, 807-823 (cited 4/26/22)

Chaudhuri, P., & Marron, J. S. (2000). Scale space view of curve estimation. Annals of Statistics, 408-428 (cited 4/26/22)

Chen, M., & Zhou, X. (2016). Single Cell Partial Least Squares, unpublished manuscript (cited 3/29/22)

Chen, S. X., & Qin, Y. L. (2010) A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 808-835  (cited 1/18/22)

Cootes, T. F., Hill, A., Taylor, C. J. and Haslam, J. (1993) The use of active shape models for locating structures in medical images, Information in Medical Imaging, H. H. Barret and A. F. Gmitro, eds. Lecture Notes in Computer Science 687, 33-47, Springer Verlag, Berlin (cited 2/10/22)

CRAN-DWD (2014). https://cran.r-project.org/package=DWD (cited 3/22/22)

Cristianini, N. and Shawe-Taylor, J. (2000) An Introduction to Support Vector Machines, Cambridge University Press (cited 3/10/22)

Damon, J., & Marron, J. S. (2014). Backwards principal component analysis and principal nested relations. Journal of Mathematical Imaging and Vision, 50(1-2), 107-114 (cited 2/22/22)

DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44, 837-845 (cited 1/27/22)

Diaconis, P., Goel, S., & Holmes, S. (2008). Horseshoes in multidimensional scaling and local kernel methods. The Annals of Applied Statistics, 2(3), 777-807 (cited 2/3/22)

Dobriban, E. (2015). Efficient computation of limit spectra of sample covariance matrices. Random Matrices: Theory and Applications, 4(04), 1550019 (cited 4/19/22)

Domingos, P. & Pazzani, M. (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103–137 (cited 3/1/22)

Dryden, I.L., Mardia, K.V. (2016) Statistical Shape Analysis with applications in R, Wiley, Chichester (cited 2/8/22)

Duda, R. O. and Hart P. E. (1973) Pattern Classification and Scene Analysis, Wiley, New York (cited 3/1/22)

Duda, R. O., Hart P. E. and Stork, D. G. (2001) Pattern Classification, Wiley, New York (cited 3/1/22)

Duin, R. P., & Pekalska, E. (2005). The Dissimilarity Representation for Pattern Recognition: Foundations and Applications (Vol. 64). World Scientific. (cited 2/8/22)

Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1(3), 211-218 (cited 2/8/22)

El Karoui, N. (2010). The spectrum of kernel random matrices. The Annals of Statistics, 38(1), 1-50 (cited 4/19/22)

Eltzner, B., Jung, S., & Huckemann, S. (2015). Dimension reduction on polyspheres with application to skeletal representations. In International Conference on Networked Geometric Science of Information (pp. 22-29). Springer, Cham. (cited 2/15/22)

Eltzner, B., Huckemann, S., & Mardia, K. V. (2018). Torus principal component analysis with applications to RNA structure. The Annals of Applied Statistics, 12(2), 1332-1359 (cited 2/15/2022)

Feng, Q., Hannig, J., & Marron, J. S. (2016). A note on automatic data transformation. Stat, 5(1), 82-87 (cited 1/25/22, 1/27/22)

Feng, Q., Jiang, M., Hannig, J., & Marron, J. S. (2018). Angle-based joint and individual variation explained. Journal of multivariate analysis, 166, 241-265 (cited 3/24/22)

Fisher, R.A. (1936) The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, 7, 179-188  (cited 3/1/22)

Fisher, N. I. (1995). Statistical analysis of circular data. Cambridge University Press (cited 2/10/22)

Fletcher, P. T. (2004) Statistical variability in nonlinear spaces: Application to shape analysis and DT-MRI, University of North Carolina at Chapel Hill  (cited 2/10/22)

Fletcher, P. T., Lu, C., Pizer, S. M., & Joshi, S. (2004). Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE transactions on medical imaging, 23(8), 995-1005 (cited 2/15/2022)

Fréchet, M. (1948) Les éléments aléatoires de nature quelconque dans un espace distancié, Annales de l’institut Henri Poincaré, 10, 215-310 (cited 2/8/22, 2/10/22)

Gaydos, T. L., Heckman, N. E., Kirkpatrick, M., Stinchcombe, J. R., Schmitt, J., Kingsolver, J., & Marron, J. S. (2013). Visualizing genetic constraints. The Annals of Applied Statistics, 7(2), 860-882 (cited 2/1/2022)

Gersho, A. and Gray, R. M. (1991) Vector Quantization and Signal Compression, Springer, New York  (cited 4/19/22)

Godtliebsen, F., Marron, J. S., & Chaudhuri, P. (2002). Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11(1), 1-21 (cited 4/26/22)

Godtliebsen, F., Marron, J. S., & Chaudhuri, P. (2004). Statistical significance of features in digital images. Image and Vision Computing, 22(13), 1093-1104 (cited 4/26/22)

Godtliebsen, F., Marron, J. S., & Pizer, S. M. (2002). Significance in scale-space for clustering. Spatial clustering modeling. Chapman and Hall/CRC, 24-36 (cited 4/26/22)

Good, I. J., & Gaskins, R. A. (1980). Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data. Journal of the American Statistical Association, 75(369), 42-56 (cited 4/26/22)

Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53(3-4), 325-338 (cited 2/8/22)

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics, Wiley (cited 1/27/22)

Grün, D., Kester, L., & Van Oudenaarden, A. (2014). Validation of noise models for single-cell transcriptomics. Nature methods, 11(6), 637-640 (cited 3/29/22)

Hall, P., Marron, J. S., & Neeman, A. (2005). Geometric representation of high dimension, low sample size data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(3), 427-444. (cited 4/7/22)

Hannig, J., & Marron, J. S. (2006). Advanced distribution theory for SiZer. Journal of the American Statistical Association, 101(474), 484-499 (cited 4/26/22)

Hannig, J., Marron, J. S., & Riedi, R. (2001). Zooming statistics: Inference across scales. Journal of the Korean Statistical Society, 30(2), 327-345 (cited 4/26/22)

Hartigan, J. A. (1975) Clustering Algorithms, Wiley, New York  (cited 4/19/22)

Hastie, T., & Stuetzle, W. (1989). Principal curves. Journal of the American Statistical Association, 84(406), 502-516 (cited 2/17/22)

Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. New York, NY: Springer, 115-163 (cited 3/10/22)

Hastie, T., Tibshirani, R., Friedman, J., & Franklin, J. (2005). The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2), 83-85 (cited 3/10/22)

Hotelling, H. (1933) Analysis of a Complex of Statistical Variables Into Principal Components. Journal of Educational Psychology, 24, 417-441 (cited 1/13/22, 2/1/22)

Hotelling, H. (1936) Relations between two sets of variates. Biometrika, 28, 321-377 (cited 3/29/22)

Hron, K., Menafoglio, A., Templ, M., Hrůzová, K. & Filzmoser, P. (2016) Simplicial principal component analysis for density functions in Bayes spaces. Computational Statistics & Data Analysis, 94, 330-350  (cited 1/18/22)

Hsu, C.-W. and Lin, C.-J. (2002) A comparison of methods for multiclass support vector machines, IEEE Transactions on Neural Networks, 13, 415-425 (cited 3/22/22)

Huang, H., Liu, Y., Yuan, M. and Marron J.S. (2014) Statistical Significance of Clustering Using Soft Thresholding, Journal of Computational and Graphical Statistics, DOI:10.1080/10618600.2014.948179 (cited 4/21/22)

Huckemann, S., Hotz, T., & Munk, A. (2010). Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric lie group actions. Statistica Sinica, 1-58 (cited 2/15/22)

Inselberg, A. (1985) The Plane with Parallel Coordinates, Visual Computer 1: 69–91 (cited 1/18/22)

Inselberg, A. (2009) Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications. Springer, New York (cited 1/18/22)

Izem, R., & Kingsolver, J. G. (2005). Variation in continuous reaction norms: quantifying directions of biological interest. The American Naturalist, 166(2), 277-289 (cited 2/1/22)

Izem, R., & Marron, J. S. (2007). Analysis of nonlinear modes of variation for functional data. Electronic Journal of Statistics, 1, 641-676 (cited 2/1/22)

Izenman, A. J., & Sommer, C. J. (1988). Philatelic mixtures and multimodal densities. Journal of the American Statistical association, 83(404), 941-953 (cited 4/26/22)

Jammalamadaka, S. R., & Sengupta, A. (2001). Topics in circular statistics (Vol. 5). World Scientific (cited 2/10/22)

Jeong, J.-Y. (2009) Estimation of Probability Distributions on Multiple Anatomical Objects and Evaluation of Statistical Shape Models, Ph.D. Thesis, Department of Computer Science, University of North Carolina (cited 2/10/22)

Joachims, T. (2000). Estimating the Generalization Performance of an SVM Efficiently. In Proc. 17th International Conf. on Machine Learning, 431-438 (cited 3/22/22)

John, S. (1972) The distribution of a statistic used for testing sphericity of normal distributions. Biometrika, 59(1), 169-173 (cited 4/7/22)

Johnstone, I. M. (2008). Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy–Widom limits and rates of convergence. Annals of statistics, 36(6), 2638 (cited 4/19/22)

Jolliffe, I. T. (2002) Principal Component Analysis, Springer, New York, 2nd Edition, ISBN 978-0-387-95442-4   (cited 2/1/22)

Jung, S., & Marron, J. S. (2009). PCA consistency in high dimension, low sample size context. The Annals of Statistics, 37(6B), 4104-4130 (cited 4/12/22)

Jung, S., Foskey, M., & Marron, J. S. (2011). Principal arc analysis on direct product manifolds. The Annals of Applied Statistics, 578-603 (cited 2/15/22)

Jung, S., Dryden I. L., & Marron, J. S., (2012) Analysis of Principal Nested Spheres, Biometrika, doi: 10.1093/biomet/ass022 (cited 2/15/22)

Jung, S., Sen, A. and Marron, J. S. (2012), Boundary behavior in high dimension, low sample size asymptotics of PCA, The Journal of Multivariate Analysis, 109, 190–203 (cited 4/12/22)

Karcher, H. (2014). Riemannian center of mass and so called karcher mean. arXiv preprint arXiv:1407.2087 (cited 2/22/22)

Kaufman, L. and Rousseeuw, P. J. (2005) Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York  (cited 4/14/22)

Kelemen, A., Székely, G. and Gerig, G. (1997 & 1999) Three dimensional model-based segmentation, TR-178 Technical Report Image Science Lab, ETH Zurich & Elastic model-based segmentation of 3-D neuroradiological data sets, IEEE Transactions on Medical Imaging, 18, 828-839 (cited 2/10/22)

Kendall, D.G., Barden, D., Carne, T.K. and Le, H. (1999) Shape and Shape Theory, Wiley, Chichester (cited 2/8/22)

Kim, B. (2018). Small sphere distributions and related topics in directional statistics, Doctoral dissertation, University of Pittsburgh (cited 2/15/22)

Kimes, P. K., Cabanski, C. R., Wilkerson, M. D., Zhao, N., Johnson, A. R., Perou, C. M., Makowski, L., Marron, J. S. & Hayes, D. N. (2014) SigFuge: single gene clustering of RNA-seq reveals differential isoform usage among cancer samples, Nucleic Acids Research (2014): gku521 (cited 1/18/22)

Kimes, P. K., Liu, Y., Neil Hayes, D., & Marron, J. S. (2017). Statistical significance for hierarchical clustering. Biometrics, 73(3), 811-821 (cited 4/26/22)

Kingsolver, J. G., Heckman, N., Zhang, J., Carter, P. A., Knies, J. L., Stinchcombe, J. R., & Meyer, K. (2015). Genetic variation, simplicity, and evolutionary constraints for function-valued traits. The American Naturalist, 185(6), E166-E181 (cited 2/1/22)

Klein, R. J., Zeiss, C., Chew, E. Y., Tsai, J. Y., Sackler, R. S., Haynes, C., … & Bracken, M. B. (2005). Complement factor H polymorphism in age-related macular degeneration. Science, 308, 385-389 (cited 4/10/22)

Koch, I., Hoffmann, P., & Marron, J. S. (2014). Proteomics profiles from mass spectrometry. Electronic Journal of Statistics, 8(2), 1703-1713 (cited 2/24/22)

Kruskal, J. B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29(2), 115-129 (cited 2/1/22, 4/19/22)

Lam, X. Y., Marron, J. S., Sun, D., & Toh, K. C. (2018). Fast algorithms for large-scale generalized distance weighted discrimination. Journal of Computational and Graphical Statistics, 27(2), 368-379 (cited 3/22/22)

LeBlanc, M., & Tibshirani, R. (1996). Combining estimates in regression and classification. Journal of the American Statistical Association, 91(436), 1641-1650 (cited 2/17/22)

Ledoux, M. (2001). The concentration of measure phenomenon (No. 89). American Mathematical Soc. (cited 4/7/22)

Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788-791 (cited 2/17/22)

Lee, Y., Lin, Y. and Wahba, G. (2004) Multicategory Support Vector Machines, Theory, and Application to the Classification of Microarray Data and Satellite Radiance Data, Journal of the American Statistical Association, 99, 67-81 (cited 3/22/22)

Lindeberg, T. (1994) Scale Space Theory in Computer Vision, Kluwer (cited 4/26/22)

Liu, X. (2007). New statistical tools for microarray data and comparison with existing tools. The University of North Carolina at Chapel Hill (cited 4/12/22)

Liu, X., Parker, J., Fan, C., Perou, C. M., & Marron, J. S. (2009). Visualization of cross-platform microarray normalization. Batch Effects and Noise in Microarray Experiments: Sources and Solutions. Wiley, New York, 167-181 (cited 1/20/22, 3/24/22)

Liu, Y., Hayes, D. N., Nobel, A. and Marron, J. S. (2008) Statistical Significance of Clustering for High Dimension Low Sample Size Data, Journal of the American Statistical Association, 103, 1281-1293  (cited 4/21/22)

Lock, E. F., Hoadley, K. A., Marron, J. S., & Nobel, A. B. (2013). Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. The annals of applied statistics, 7(1), 523 (cited 3/29/22)

Lu, X., & Marron, J. S. (2014). Analysis of juggling data: Object oriented data analysis of clustering in acceleration functions. Electronic Journal of Statistics, 8(2), 1842-1847 (cited 2/24/22)

Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning research, 9(Nov), 2579-2605 (cited 3/10/22)

MacQueen, J. B. (1967) Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, 281-297, University of California Press, Berkeley  (cited 4/19/22)

Maggiora, G. M. (2006). On outliers and activity cliffs why QSAR often disappoints (cited 1/25/22)

Marčenko, V. A., & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1(4), 457 (cited 4/19/22)

Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. Probability and mathematical statistics. Academic Press Inc (cited 2/10/22)

Marron, J. S. & Alonso, A. M. (2014) Overview of object oriented data analysis, Biometrical Journal, 56, 732-753 (cited 1/11/22)

Marron, J. S. & Dryden, I. L. (2021) Object Oriented Data Analysis, CRC Press (cited 1/11/22)

Marron, J. S., Ramsay, J. O., Sangalli, L. M., & Srivastava, A. (2014). Statistics of time warpings and phase variations. Electronic Journal of Statistics, 8(2), 1697-1702 (cited 3/1/22)

Marron, J. S., Ramsay, J. O., Sangalli, L. M., & Srivastava, A. (2015). Functional data analysis of amplitude and phase variation. Statistical Science, 30(4), 468-484 (cited 3/1/22)

Marron, J. S., Todd, M. J., & Ahn, J. (2007). Distance-weighted discrimination. Journal of the American Statistical Association, 102(480), 1267-1271 (cited 3/22/22)

Marron, J. S., & Wand, M. P. (1992). Exact mean integrated squared error. The Annals of Statistics, 712-736 (cited 4/26/22)

McLachlan, G. J. (2004) Discriminant Analysis and Statistical Pattern Recognition, Wiley-Interscience (cited 3/1/22)

Miao, D. (2015) Class-Sensitive Principal Components Analysis, UNC PhD Dissertation, https://cdr.lib.unc.edu/record/uuid:853d8c52-5b4a-4607-afff-9554b68bb6f5 (cited 3/8/22)

Miedema, J., Marron, J. S., Niethammer, M., Borland, D., Woosley, J., Coposky, J. & Thomas, N. E. (2012) Image and statistical analysis of melanocytic histology. Histopathology, 61(3), 436-444 (cited 1/27/22)

Menafoglio, A., Grasso, M., Secchi, P. & Colosimo, B.M. (2018) Profile monitoring of probability density functions via simplicial functional PCA with application to image data, Technometrics, 60, 497-510  (cited 1/18/22)

Morton, J. T., Toran, L., Edlund, A., Metcalf, J. L., Lauber, C., & Knight, R. (2017). Uncovering the horseshoe effect in microbial analyses. Msystems, 2(1), e00166-16 (cited 2/8/22)

Owen, S. J. (1998) A survey of Mesh Generation Technology, http://www.imr.sandia.gov/papers/imr7/owen_meshtech98.ps.gz (cited 2/10/22)

Parzen, E. (2004) Quantile probability and statistical data modeling, Statistical Science, 19, 652-662. (cited 1/18/22)

Patrangenaru, V., & Ellingson, L. (2019). Nonparametric statistics on manifolds and their applications to object data analysis. CRC Press. (cited 1/20/22, 2/10/22, 2/22/22)

Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17(4), 1617 (cited 4/12/22)

Pearson, K. (1901) On Lines and Planes of Closest Fit to Systems of Points in Space, Philosophical Magazine, 2, 559-572 (cited 1/13/22, 2/1/22)

Perou, C. M., Sorlie, T., Eisen, M. B., & Van De Rijn, M. (2000). Molecular portraits of human breast tumours. Nature, 406(6797), 747 (cited 3/22/22, 4/21/22)

Pigoli, D., Pantelis, Z. H., Coleman, J. S. & Aston, J. A. D. (2015) The analysis of acoustic phonetic data: exploring differences in the spoken romance languages. arXiv preprint arXiv:1507.07587 (cited 1/11/22)

Pizer, S. M., Jung, S., Goswami, D., Vicory, J., Zhao, X., Chaudhuri, R., … & Marron, J. S. (2013). Nested sphere statistics of skeletal models. In Innovations for Shape Analysis (pp. 93-115). Springer, Berlin, Heidelberg (cited 2/15/2022)

Pizer, S. M., & Marron, J. S. (2017). Object statistics on curved manifolds. In Statistical Shape and Deformation Analysis (pp. 137-164). Academic Press (cited 2/15/22)

Pizer, S. M., Hong, J., Vicory, J., Liu, Z., Marron, J. S., Choi, H. Y., … & Wang, J. (2020). Object shape representation via skeletal models (s-reps) and statistical analysis. In Riemannian Geometric Statistics in Medical Image Analysis (pp. 233-271). Academic Press. (cited 2/15/22)

Qiao, X., Zhang, H. H., Liu, Y., Todd, M. J., & Marron, J. S. (2010). Weighted distance weighted discrimination and its asymptotic properties. Journal of the American Statistical Association, 105(489), 401-414 (cited 3/22/22, 4/12/22)

Ramsay, J. O. & Silverman, B. W. (2002) Applied Functional Data Analysis, Springer, N.Y. ISBN 0-387-95414-7 (cited 1/11/22)

Ramsay, J. O. & Silverman, B. W. (2005) Functional Data Analysis, 2nd Edition, Springer, N.Y. ISBN 0-387-40080-X (cited 1/11/22)

Ramsay, J. O. (2005) Functional Data Analysis Web Site, http://psych.mcgill.ca/misc/fda/  (cited 1/11/22)

Ramsay, J. O., Gribble, P., & Kurtek, S. (2014). Description and processing of functional data arising from juggling trajectories. Electronic Journal of Statistics, 8(2), 1811-1816 (cited 2/24/22)

Rao, C. R. (1945). Information and the accuracy attainable in the estimation of statistical parameters, Bull. Calcutta Math. Soc.,  37, 81-91. (cited 2/22/22)

Rao, C. R. (1958). Some statistical methods for comparison of growth curves. Biometrics, 14(1), 1-17  (cited 2/1/22)

Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326 (cited 2/17/22)

Royer, J.-Y. and Chang, T. (1991) Evidence for relative motions between the Indian and Australian Plates during the last 20 m.y. from plate tectonic reconstructions: Implications for the deformation of the Indo-Australian Plate, Journal of Geophysical Research, 96(B7), 11,779–11,802, doi:10.1029/91JB00897 (cited 2/8/22)

Sarle, W. S., and Kuo, A. H. (1993), The MODECLUS Procedure, Technical Report P-256, SAS Institute Inc., Cary  (cited 4/21/22)

Schmitz, H. P. and Marron, J. S. (1992) Simultaneous estimation of several size distributions of income, Econometric Theory, 8, 476-488 (cited 4/26/22)

Schölkopf, B., Smola, A., & Müller, K. R. (1997) Kernel principal component analysis. In International conference on artificial neural networks (pp. 583-588). Springer, Berlin, Heidelberg (cited 3/10/22)

Shabalin, A. A., Tjelmeland, H., Fan, C., Perou, C. M., & Nobel, A. B. (2008) Merging two gene-expression studies via cross-platform normalization. Bioinformatics, 24(9), 1154-1160 (cited 3/24/22)

Shabalin, A. A., & Nobel, A. B. (2013). Reconstruction of a low-rank matrix in the presence of Gaussian noise. Journal of Multivariate Analysis, 118, 67-76 (cited 3/31/22)

Shen, H., & Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6), 1015-1034 (cited 4/12/22)

Shen, D., Shen, H., & Marron, J. S. (2013) Consistency of sparse PCA in high dimension, low sample size contexts. Journal of Multivariate Analysis, 115, 317-333 (cited 4/12/22)

Shen, D., Shen, H., Zhu, H., & Marron, J. S. (2016) The statistics and mathematics of high dimension low sample size asymptotics. Statistica Sinica, 26(4), 1747 (cited 4/12/22)

Siddiqi, K. and Pizer, S. M. (2007) Medial Representations Mathematics Algorithms and Applications, Springer, New York (cited 2/10/22)

Srivastava, A., Wu, W., Kurtek, S., Klassen, E., & Marron, J. S. (2011). Registration of functional data using Fisher-Rao metric. arXiv preprint arXiv:1103.3817 (cited 2/22/22)

Srivastava, M. S., Katayama, S., & Kano, Y. (2013) A two sample test in high dimensional data. Journal of Multivariate Analysis, 114, 349-358 (cited 1/18/22)

Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730-737 (cited 1/27/22)

Talagrand, M. (1991). A new isoperimetric inequality and the concentration of measure phenomenon. In Geometric Aspects of Functional Analysis (pp. 94-124). Springer, Berlin, Heidelberg (cited 4/7/22)

Talagrand, M. (1995). Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l’Institut des Hautes Etudes Scientifiques, 81(1), 73-205 (cited 4/7/22)

Tenenbaum, J. B., De Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319-2323 (cited 2/17/22)

Toh, K. C., Todd, M. J. & Tutuncu, R. H. (1999) www.math.nus.edu.sg/~mattohkc/sdpt3.html  (cited 3/22/22)

Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4), 401-419 (cited 2/8/22)

Torgerson, W. S. (1958). Theory and methods of scaling (cited 2/8/22)

Tracy, C. A., & Widom, H. (1994). Level-spacing distributions and the Airy kernel. Communications in Mathematical Physics, 159(1), 151-174 (cited 4/19/22)

Tukey, J. W. (1977) Exploratory data analysis, Pearson, N.Y. ISBN 978-0201076165 (cited 1/11/22)

Vapnik, V. N. (1982) Estimation of dependences based on empirical data, Springer (Russian version, 1979) (cited 3/10/22)

Vapnik, V. N. (1995) The nature of statistical learning theory, Springer (cited 3/10/22)

Venables, W.N. & Ripley, B.D. (2013) Modern applied statistics with S-PLUS. Springer Science & Business Media (cited 1/20/22)

Vidal, R., Ma, Y., & Sastry, S. (2016). Generalized principal component analysis, Springer (cited 3/10/22)

Wahba, G., Lin, Y., & Zhang, H. (1999). Generalized approximate cross validation for support vector machines, or, another way to look at margin-like quantities. (cited 3/22/22)

Wahba, G., Lin, Y., Lee, Y., & Zhang, H. (2003). Optimal properties and adaptive tuning of standard and nonstandard support vector machines. In Nonlinear estimation and classification (pp. 129-147) Springer, New York, NY (cited 3/22/22)

Wand, M. P., & Jones, M. C. (1994). Kernel smoothing. CRC Press (cited 4/26/22)

Wang, B., & Zou, H. (2016). Sparse Distance Weighted Discrimination. Journal of Computational and Graphical Statistics, 25(3), 826-838 (cited 3/22/22)

Wang, H. & Marron, J. S. (2007) Object oriented data analysis: sets of trees, Annals of Statistics, 35, 1849-1873 (cited 1/11/22)

Wegelin, J. A. (2000). A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case, http://www.stat.washington.edu/www/research/reports/2000/tr371.ps (cited 3/29/22)

Wei, S., Lee, C., Wichers, L., & Marron, J. S. (2015) Direction-projection-permutation for high dimensional hypothesis tests. Journal of Computational and Graphical Statistics (cited 1/18/22, 4/12/22)

Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A., Ellrott, K., … & Cancer Genome Atlas Research Network. (2013). The cancer genome atlas pan-cancer analysis project. Nature genetics, 45(10), 1113 (cited 3/24/22)

White, H. (2014). Asymptotic theory for econometricians. Academic Press (cited 4/7/22)

Wilkinson, L., & Friendly, M. (2009). The history of the cluster heat map. The American Statistician, 63(2), 179-184 (cited 2/1/22)

Wold, H. (1975). Soft modelling by latent variables: the non-linear iterative partial least squares (NIPALS) approach. Journal of Applied Probability, 12(S1), 117-142 (cited 3/29/22)

Wold, H. O. A. (1985). Partial least squares. In Kotz, S. & Johnson, N. L. (eds.), Encyclopedia of Statistical Sciences, Wiley, New York, 581-591 (cited 3/29/22)

Xiong, J., Dittmer, D. P., & Marron, J. S. (2015). “Virus hunting” using radial distance weighted discrimination. The Annals of Applied Statistics, 9(4), 2090-2109 (cited 3/24/22)

Yao, J., Zheng, S., & Bai, Z. D. (2015). Sample covariance matrices and high-dimensional data analysis. Cambridge University Press (cited 4/19/22)

Yata, K., & Aoshima, M. (2009) PCA consistency for non-Gaussian data in high dimension, low sample size context. Communications in Statistics—Theory and Methods, 38(16-17), 2634-2652 (cited 4/7/22)

Yata, K., & Aoshima, M. (2010a) Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix. Journal of Multivariate Analysis, 101(9), 2060-2077 (cited 4/7/22)

Yata, K., & Aoshima, M. (2010b) Intrinsic dimensionality estimation of high-dimension, low sample size data with d-asymptotics. Communications in Statistics—Theory and Methods, 39(8-9), 1511-1521 (cited 4/7/22)

Yata, K., & Aoshima, M. (2012) Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. Journal of Multivariate Analysis, 105(1), 193-215 (cited 4/7/22)

Yata, K., & Aoshima, M. (2013) PCA consistency for the power spiked model in high-dimensional settings. Journal of Multivariate Analysis, 122, 334-354 (cited 4/7/22)

Young, G., & Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3(1), 19-22 (cited 2/8/22)

Yu, Q., Lu, X., & Marron, J. S. (2017). Principal Nested Spheres for Time-Warped Functional Data Analysis. Journal of Computational and Graphical Statistics, 26(1), 144-151 (cited 2/24/22)

Yushkevich, P., Pizer, S. M., Joshi, S., and Marron, J. S. (2001) Intuitive, localized analysis of shape variability, Information Processing in Medical Imaging (IPMI), eds. Insana, M. F. and Leahy, R. M. 402-408 (cited 2/10/22)

Zhang, J., Heckman, N., Cubranic, D., Kingsolver, J. G., Gaydos, T., & Marron, J. S. (2014). Prinsimp. R Journal, 6(2) (cited 2/1/22)

Zhang, L., Marron, J. S., Shen, H., & Zhu, Z. (2007). Singular value decomposition and its visualization. Journal of Computational and Graphical Statistics, 16(4), 833-854 (cited 2/3/22)

Zhang, L., Lu, S., & Marron, J. S. (2015). Nested nonnegative cone analysis. Computational Statistics & Data Analysis, 88, 100-110 (cited 2/17/22)

Zoubouloglou, P., García-Portugués, E., & Marron, J. S. (2021). Scaled torus principal component analysis. arXiv preprint arXiv:2110.04758 (cited 2/15/22)