Course Home Page – STOR 881 Object Oriented Data Analysis – Spring 2024

Instructor:    J. S. Marron

Email:   marron@unc.edu

Office:   352 Hanes Hall

SyllabusSTOR881Spring2024

 

Course Notes

  1.     STOR881-01-11-2024.pptx:  Organizational Matters, OODA Book, What is OODA?, Taste of OODA Examples (including Spanish Male Mortality, Amplitude – Phase, Shapes, Sounds, Faces), 3 Major Phases of OODA, Projections, Scatterplot Matrices, Trait by Trait Views.
  2.    STOR881-01-16-2024.pptx:  Principal Component Analysis (PCA), Object Space – Trait Space, Scree Plots, Define Modes of Variation, Prob. Dist’ns as data objects, PCA Toy & Real Examples, Shifted Parabolas Data.
  3.   STOR881-01-18-2024.pptx:   Lung Cancer Data, Limitations of PCA: Apple, Banana, Pear, NCI-60 Data, Caution about DWD, Inference using DiProPerm.
  4.   STOR881-01-23-2024.pptx:  OODA Terminology, Marginal Distribution Plots, Marginal Distribution Plot Analysis of Drug Discovery Data, Normalization and Correlation PCA, Transformations.
  5.  STOR881-01-25-2024.pptx:  Transformations, Melanoma Data, Automatic Shifted Log Transformation, ROC Curve to Quantify Impact of Transformation on Gene Expression Data, Heatmap Data Visualization.
  6.  STOR881-01-30-2024.pptx:  Other Directions for Scatterplot Views, Centering, Details of PCA, Review Linear Algebra, Covariance Matrices, PCA as Optimization.
  7.  STOR881-02-01-2024.pptx:  Alternate Viewpoints of PCA (Data Representation, Distribution of Energy, Simulation, Comparison to SVD), Distance Methods, Distance Based Centers, Multidimensional Scaling, Shapes as Data Objects, Shape Representations (Landmark, Boundary, Shape).
  8.  STOR881-02-06-2024.pptx: Male Pelvis Data, Manifold Data, Directional Data, S-rep Analysis, Principal Geodesic Analysis.  Sara Peterson – Joint Dimension Reduction for Integrated Tumor/Model Pairs
  9.  STOR881-02-08-2024.pptx: Principal Nested Spheres, Polysphere PCA, Scaled Torus PCA, Nonnegative Nested Cone Analysis, Principal Curves and Surfaces, General Motivation for Backwards Methods.  Teresa McGhee – QTL Mapping With Mediation
  10.  STOR881-02-13-2024.pptx:  No Class, Wellness Day
  11.  STOR881-02-15-2024.pptx:  Principal Curves and Surfaces, General Motivation for Backwards Methods, Curve Registration, Shifted Betas Example, Amplitude and Phase Modes of Variation.  Victoria Sagasta Pereira – MMM
  12.  STOR881-02-20-2024.pptx: Amplitude and Phase Modes of Variation, Fisher-Rao Curve Estimation, Principal Nested Spheres on SRVF Sphere.  Michael Nisenzon –
  13.  STOR881-02-22-2024.pptx: Principal Nested Spheres on SRVF Sphere, Juggling Data, Data Integration, Partial Least Squares.  Hyeon Lee – Tree PCA
  14.  STOR881-02-27-2024.pptx:  Canonical Correlation Analysis, Angle Based Joint and Individual Variation Explained (AJIVE), FMRI Data, AJIVE Algorithm.  Katelyn McInerney – Object Oriented Perspective on Genome Types
  15.  STOR881-02-29-2024.pptx: AJIVE Algorithm and Diagnostics, Breast Cancer Images and Genomics, Multiple Genomics in Breast Cancer, Amplification Adjustment in Single Cell RNAseq, Data Integration Via Analysis of Subspaces (DIVAS).  Andrew Walker – Multiplex IF Image Analysis
  16.  STOR881-03-05-2024.pptx:  Data Integration Via Analysis of Subspaces (DIVAS), DIVAS Toy Example, DIVAS on TCGA Data, High Dimension Low Sample Size (HDLSS) Analysis.  Enes Kelestemur – Drug Discovery
  17.  STOR881-03-07-2024.pptx:   High Dimension Low Sample Size (HDLSS) Analysis, Explanation of DWD Visualization, Technical Assumptions, Zero Covariance is Not Independence, Mixing Conditions, HDLSS Analysis of PCA, HDLSS Explanation of Earlier Observations.  Shiying Li – Optimal transport-based embeddings
  18.  STOR881-03-12-2024.pptx:  No Class, Spring Break
  19.  STOR881-03-14-2024.pptx:  No Class, Spring Break
  20.  STOR881-03-19-2024.pptx:  Introduction to Random Matrix Theory, Marčenko-Pastur Distribution, K-Means Clustering.  Gilbert Giri – Evolution of Gene Regulation
  21.  STOR881-03-21-2024.pptx:  K-Means Clustering, Hierarchical Clustering, SigClust.  Tianzhu Liu – Topics Related to Optimization
  22.  STOR881-03-26-2024.pptx:  SigClust, Smoothing, Kernel Density Estimation.  Kendall Thomas – Comprehensive Female Athlete Health & Performance Monitoring
  23.  STOR881-03-28-2024.pptx:  No Class, Wellness Day
  24.  STOR881-04-02-2024.pptx: Scatterplot Smoothing, SiZer, P-P & Q-Q Plots.  Kyung Rok Kim – ?
  25.  STOR881-04-04-2024.pptx: Q-Q Envelopes, Outliers in OODA, Robustness, Multivariate Median, Spherical PCA.  Ameer Qaqish – Gaussian Processes
  26.  STOR881-04-09-2024.pptx:  Cornea Data, Elliptical PCA, GWAS Data, Classification – Discrimination.
  27.  STOR881-04-11-2024.pptx: Linear Discriminant Analysis (non-parametric and likelihood derivations), HDLSS Classification.  Qichen Wang – Breast Cancer Epidemiology Study
  28.  STOR881-04-16-2024.pptx:  Maximal Data Piling, Kernel Embedding.  Yuhao Zhou – Causal Graphical Models
  29.  STOR881-04-18-2024.pptx: Kernel Embedding and PCA, t-SNE Visualization, Support Vector Machine, SVM Tuning and Extensions, Distance Weighted Discrimination.  Jason Hu – ???
  30.  STOR881-04-23-2024.pptx: Distance Weighted Discrimination, DWD Simulations, Compare DWD, SVM and MD Visualizations, DWD Batch Adjustment, DWD Source & Batch Adjustment, NCI-60 Data, Why not Adjust by Means?, Robust DWD.  Charles Zhao – ???
  31.  STOR881-04-25-2024.pptx:  Generalized DWD, Radial DWD, Independent Component Analysis.  Yu Chen – FMRI Clustering
  32.  STOR881-04-30-2024.pptx:  Tree Structured Data Objects, Topological Data Analysis.  Ashley Buck – phenotyping gait biomechanics in knee osteoarthritis

 

References

Ahn, J. (2006) High dimension, low sample size data analysis. PhD Dissertation, University of North Carolina, Chapel Hill (cited 4/23/24)

Ahn, J. (2010). A stable hyperparameter selection for the Gaussian RBF kernel for discrimination. Statistical Analysis and Data Mining: The ASA Data Science Journal, 3(3), 142-148 (cited 4/16/24)

Ahn, J., Lee, M. H., & Yoon, Y. J. (2012). Clustering high dimension, low sample size data using the maximal data piling distance. Statistica Sinica, 443-464 (cited 4/11/24, 4/16/24)

Ahn, J., Marron, J. S., Muller, K. M., & Chi, Y. Y. (2007) The high-dimension, low-sample-size geometric representation holds under mild conditions. Biometrika, 94(3), 760-766 (cited 3/5/24)

Ahn, J., & Marron, J. S. (2010) The maximal data piling direction for discrimination. Biometrika, 97(1), 254-259 (cited 4/16/24)

Aitchison, J. (1982) The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B (Methodological), 44, 139-160 (cited 1/18/24)

Aizerman, A., Braverman, E. M., & Rozoner, L. I. (1964) Theoretical foundations of the potential function method in pattern recognition learning. Automation and remote control, 25, 821-837 (cited 4/11/24, 4/16/24)

Alter, O., Brown, P. O., & Botstein, D. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97, 10101-10106 (cited 4/23/24)

Amari, S. I. (2012). Differential-geometrical methods in statistics (Vol. 28). Springer Science & Business Media (cited 2/20/24)

Anderson, T. W., & Darling, D. A. (1952) Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics, 193-212 (cited 1/25/24)

Aoshima, M., Shen, D., Shen, H., Yata, K., Zhou, Y. H., & Marron, J. S. (2018). A survey of high dimension low sample size asymptotics. Australian & New Zealand journal of statistics, 60(1), 4-19 (cited 3/7/24)

Bai, Z. D., & Saranadasa, H. (1996) Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6(2), 311-329 (cited 1/18/24)

Basser, P. J., Mattiello, J., & LeBihan, D. (1994). MR diffusion tensor spectroscopy and imaging. Biophysical journal, 66(1), 259-267 (cited 2/15/24)

Benito, M., Parker, J., Du, Q., Wu, J., Xiang, D., Perou, C. M., & Marron, J. S. (2004) Adjustment of systematic microarray data biases. Bioinformatics, 20(1), 105-114 (cited 4/23/24)

Bernardi, M., Sangalli, L. M., Secchi, P., & Vantini, S. (2014). Analysis of proteomics data: Block k-mean alignment. Electronic Journal of Statistics, 8(2), 1714-1723, DOI: 10.1214/14-EJS900A (cited 2/20/24)

Bickel, P. J. and Levina, E. (2004) Some theory for Fisher’s Linear Discriminant function, “naive Bayes”, and some alternatives when there are many more variables than observations, Bernoulli, 10, 989-1010 (cited 4/11/24)

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer (cited 4/18/24)

Borysov, P., Hannig, J., Marron, J. S., Muratov, E., Fourches, D., & Tropsha, A. (2016). Activity prediction and identification of mis‐annotated chemical compounds using extreme descriptors. Journal of Chemometrics, 30(3), 99-108 (cited 1/23/24)

Boser, B. E., Guyon, I. and Vapnik, V. (1992) A Training Algorithm for Optimal Margin Classifiers, in Fifth Annual Workshop on Computational Learning Theory, ACM (cited 4/18/24)

Bottai, M., Kim, T., Lieberman, B., Luta, G., & Peña, E. (2022) On Optimal Correlation-Based Prediction. The American Statistician, 76(4), 313-321 (cited 1/30/24)

Box, G. E., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 211-252 (cited 1/25/24)

Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv. 2 107–144 (electronic). (Update of, and a supplement to, the 1986 original.)  (cited 3/5/24)

Brooks, J. P., Dulá, J. H., & Boone, E. L. (2013). A pure L1-norm principal component analysis. Computational statistics & data analysis, 61, 83-98 (cited 2/15/24, 4/9/24)

Bullitt, E., & Aylward, S. R. (2002). Volume rendering of segmented image objects. IEEE Transactions on Medical Imaging, 21(8), 998-1002. (cited 1/11/24)

Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2(2), 121-167 (cited 4/18/24)

Cabanski, C. R., Qi, Y., Yin, X., Bair, E., Hayward, M. C., Fan, C., Li, J., Wilkerson, M. D., Marron, J. S., Perou, C. M. and Hayes, D. N. (2010) SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements, PLoS ONE, 5(3): e9905. doi:10.1371/journal.pone.0009905, PMCID: PMC2845619 (cited 3/21/24)

Cai, T., Liu, W., & Xia, Y. (2014) Two‐sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2), 349-372 (cited 1/18/24)

Carmichael, I., Calhoun, B. C., Hoadley, K. A., Troester, M. A., Geradts, J., Couture, H. D., … & Marron, J. S. (2021). Joint and individual analysis of breast cancer histologic images and genomic covariates. The Annals of Applied Statistics, 15(4), 1697-1722 (cited 2/27/24, 2/29/24)

Carmichael, I., & Marron, J. S. (2021). Geometric insights into support vector machine behavior using the KKT conditions. Electronic Journal of Statistics, 15(2), 6311-6343 (cited 4/23/24)

Cates, J., Fletcher, P. T., Styner, M., Shenton, M., & Whitaker, R. (2007, July). Shape modeling and analysis with entropy-based particle systems. In Biennial International Conference on Information Processing in Medical Imaging (pp. 333-345). Springer, Berlin, Heidelberg (cited 1/18/24)

Cattell, R. B. (1966) The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245-276 (cited 1/16/24, 1/30/24, 3/7/24)

Chaney, E. L., Pizer, S., Joshi, S., Broadhurst, R., Fletcher, T., Gash, G., … & Tracton, G. (2004). Automatic male pelvis segmentation from CT images via statistically trained multi-object deformable m-rep models. International Journal of Radiation Oncology, Biology, Physics, 60(1), S153-S154. (cited 1/11/24)

Chaudhuri, P. and Marron, J. S. (1999) SiZer for exploration of structure in curves, Journal of the American Statistical Association, 94, 807-823 (cited 4/2/24)

Chaudhuri, P., & Marron, J. S. (2000). Scale space view of curve estimation. Annals of Statistics, 408-428 (cited 4/2/24)

Chen, M., & Zhou, X. (2016). Single Cell Partial Least Squares, unpublished manuscript (cited 2/29/24)

Chen, S. X., & Qin, Y. L. (2010) A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 808-835  (cited 1/18/24)

Clarke, B. R. (2018). Robustness theory and application. John Wiley & Sons (cited 4/4/24)

CRAN-DWD (2014). https://cran.r-project.org/package=DWD (cited 4/23/24)

Cristianini, N. and Shawe-Taylor, J. (2000) An Introduction to Support Vector Machines, Cambridge University Press (cited 4/16/24)

Dai, W., & Genton, M. G. (2016). Directional outlyingness for multivariate functional data. arXiv preprint arXiv:1612.04615 (cited 4/4/24)

Dai, W., & Genton, M. G. (2017). Multivariate Functional Data Visualization and Outlier Detection. arXiv preprint arXiv:1703.06419. (cited 4/4/24)

Damon, J., & Marron, J. S. (2014). Backwards principal component analysis and principal nested relations. Journal of Mathematical Imaging and Vision, 50(1-2), 107-114 (cited 2/15/24)

DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44, 837-845 (cited 1/25/24)

Dobriban, E. (2015). Efficient computation of limit spectra of sample covariance matrices. Random Matrices: Theory and Applications, 4(04), 1550019 (cited 3/7/24)

Domingos, P. & Pazzani, M. (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103–137 (cited 4/4/24)

Duda, R. O. and Hart P. E. (1973) Pattern Classification and Scene Analysis, Wiley, New York (cited 4/9/24)

Duda, R. O., Hart P. E. and Stork, D. G. (2001) Pattern Classification, Wiley, New York (cited 4/9/24, 4/16/24)

El Karoui, N. (2010). The spectrum of kernel random matrices. The Annals of Statistics, 38(1), 1-50 (cited 4/23/24)

Eltzner, B., Jung, S., & Huckemann, S. (2015). Dimension reduction on polyspheres with application to skeletal representations. In International Conference on Networked Geometric Science of Information (pp. 22-29). Springer, Cham. (cited 2/8/24)

Eltzner, B., Huckemann, S., & Mardia, K. V. (2018). Torus principal component analysis with applications to RNA structure. The Annals of Applied Statistics, 12(2), 1332-1359 (cited 2/8/24)

Erästö, P., & Holmström, L. (2005). Bayesian multiscale smoothing for making inferences about features in scatterplots. Journal of Computational and Graphical Statistics, 14(3), 569-589 (cited 4/2/24)

Fan, J., & Gijbels, I. (1996). Local Polynomial Modelling and Its Applications, Chapman and Hall, London (cited 4/2/24)

Feng, Q., Hannig, J., & Marron, J. S. (2016). A note on automatic data transformation. Stat, 5(1), 82-87 (cited 1/25/24)

Feng, Q., Jiang, M., Hannig, J., & Marron, J. S. (2018). Angle-based joint and individual variation explained. Journal of multivariate analysis, 166, 241-265 (cited 2/22/24)

Fisher, N. I. (1983) Graphical Methods in Nonparametric Statistics: A Review and Annotated Bibliography, International Statistical Review, 51, 25-58  (cited 4/2/24)

Fisher, N. I. (1995). Statistical analysis of circular data. Cambridge University Press (cited 2/6/24)

Fisher, R.A. (1936) The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, 7, 179-188  (cited 4/4/24)

Fletcher, P. T. (2004) Statistical variability in nonlinear spaces: Application to shape analysis and DT-MRI, University of North Carolina at Chapel Hill  (cited 2/6/24)

Fletcher, P. T., Lu, C., Pizer, S. M., & Joshi, S. (2004). Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE transactions on medical imaging, 23(8), 995-1005 (cited 2/6/24)

Fréchet, M. (1948) Les éléments aléatoires de nature quelconque dans un espace distancié, Annales de l’institut Henri Poincaré, 10, 215-310 (cited 2/1/24, 2/6/24)

Gaydos, T. L., Heckman, N. E., Kirkpatrick, M., Stinchcombe, J. R., Schmitt, J., Kingsolver, J., & Marron, J. S. (2013). Visualizing genetic constraints. The Annals of Applied Statistics, 7(2), 860-882 (cited 1/30/24)

Gersho, A. and Gray, R. M. (1991) Vector Quantization and Signal Compression, Springer, New York  (cited 3/19/24)

Godtliebsen, F., Marron, J. S., & Chaudhuri, P. (2002). Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11(1), 1-21 (cited 4/2/24)

Godtliebsen, F., Marron, J. S., & Chaudhuri, P. (2004). Statistical significance of features in digital images. Image and Vision Computing, 22(13), 1093-1104 (cited 4/2/24)

Godtliebsen, F., Marron, J. S., & Pizer, S. M. (2002). Significance in scale-space for clustering. Spatial clustering modeling. Chapman and Hall/CRC, 24-36 (cited 4/2/24)

Good, I. J., & Gaskins, R. A. (1980). Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data. Journal of the American Statistical Association, 75(369), 42-56 (cited 3/26/24)

Gower, J. C. (1974). Algorithm AS 78: The mediancentre. Journal of the Royal Statistical Society. Series C (Applied Statistics), 23(3), 466-470 (cited 4/4/24)

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics, Wiley (cited 1/25/24)

Grün, D., Kester, L., & Van Oudenaarden, A. (2014). Validation of noise models for single-cell transcriptomics. Nature methods, 11(6), 637-640 (cited 2/29/24)

Haldane, J. B. S. (1948) Note on the median of a multivariate distribution, Biometrika, 35, 414-415 (cited 4/4/24)

Hall, P., Marron, J. S., & Neeman, A. (2005). Geometric representation of high dimension, low sample size data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(3), 427-444. (cited 3/5/24)

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (2011) Robust Statistics: the Approach Based on Influence Functions, Wiley, New York (cited 4/4/24)

Hannig, J., & Marron, J. S. (2006). Advanced distribution theory for SiZer. Journal of the American Statistical Association, 101(474), 484-499 (cited 4/2/24)

Hannig, J., Marron, J. S., & Riedi, R. (2001). Zooming statistics: Inference across scales. Journal of the Korean Statistical Society, 30(2), 327-345 (cited 4/2/24)

Hartigan, J. A. (1975) Clustering Algorithms, Wiley, New York  (cited 3/19/24)

Hastie, T., & Stuetzle, W. (1989). Principal curves. Journal of the American Statistical Association, 84(406), 502-516 (cited 2/15/24)

Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. New York, NY: Springer, 115-163 (cited 4/18/24)

Hastie, T., Tibshirani, R., Friedman, J., & Franklin, J. (2005). The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2), 83-85 (cited 4/18/24)

Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola (2008). Kernel methods in machine learning. Annals of Statistics, 36, 1171-1220.

Hotelling, H. (1933) Analysis of a Complex of Statistical Variables Into Principal Components. Journal of Educational Psychology, 24, 417-441 (cited 1/16/24, 1/30/24)

Hotelling, H. (1936) Relations between two sets of variates.  Biometrika,  28, 321-377 (cited 2/27/24)

Hron, K., Menafoglio, A., Templ, M., Hrůzová, K. & Filzmoser, P. (2016) Simplicial principal component analysis for density functions in Bayes spaces. Computational Statistics & Data Analysis, 94, 330-350  (cited 1/18/24)

Hsu, C.-W. and Lin, C.-J. (2002) A comparison of methods for multiclass support vector machines, IEEE Transactions on Neural Networks, 13, 415-425 (cited 4/18/24)

Huang, H., Liu, Y., Yuan, M. and Marron J.S. (2014) Statistical Significance of Clustering Using Soft Thresholding, Journal of Computational and Graphical Statistics, DOI:10.1080/10618600.2014.948179 (cited 3/21/24, 3/26/24)

Huber, P. (2011) Robust Statistics. Wiley, New York (cited 4/4/24)

Huckemann, S., Hotz, T., & Munk, A. (2010). Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Statistica Sinica, 1-58 (cited 2/6/24)

Inselberg, A. (1985) The Plane with Parallel Coordinates, Visual Computer 1: 69–91 (cited 1/16/24, 4/4/24)

Inselberg, A. (2009) Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications. Springer, New York (cited 1/16/24, 4/4/24)

Izem, R., & Kingsolver, J. G. (2005). Variation in continuous reaction norms: quantifying directions of biological interest. The American Naturalist, 166(2), 277-289 (cited 1/30/24)

Izem, R., & Marron, J. S. (2007). Analysis of nonlinear modes of variation for functional data. Electronic Journal of Statistics, 1, 641-676 (cited 1/30/24)

Izenman, A. J., & Sommer, C. J. (1988). Philatelic mixtures and multimodal densities. Journal of the American Statistical association, 83(404), 941-953 (cited 3/26/24)

Jammalamadaka, S. R., & Sengupta, A. (2001). Topics in circular statistics (Vol. 5). World Scientific (cited 2/6/24)

Jeong, J.-Y. (2009) Estimation of Probability Distributions on Multiple Anatomical Objects and Evaluation of Statistical Shape Models, Ph.D. Thesis, Department of Computer Science, University of North Carolina (cited 2/6/24)

Joachims, T. (2000). Estimating the Generalization Performance of an SVM Efficiently. In Proc. 17th International Conf. on Machine Learning, 431-438 (cited 4/18/24)

John, S. (1972) The distribution of a statistic used for testing sphericity of normal distributions. Biometrika, 59(1), 169-173 (cited 3/5/24)

Johnstone, I. M. (2008). Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy–Widom limits and rates of convergence. Annals of statistics, 36(6), 2638 (cited 3/7/24)

Jolliffe, I. T. (2002) Principal Component Analysis, Springer, New York, 2nd Edition, ISBN 978-0-387-95442-4   (cited 1/30/24)

Jones, M. C., Marron, J. S., & Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91(433), 401-407 (cited 4/2/24)

Jung, S., & Marron, J. S. (2009). PCA consistency in high dimension, low sample size context. The Annals of Statistics, 37(6B), 4104-4130 (cited 3/7/24)

Jung, S., Foskey, M., & Marron, J. S. (2011). Principal arc analysis on direct product manifolds. The Annals of Applied Statistics, 578-603 (cited 2/6/24)

Jung, S., Dryden I. L., & Marron, J. S., (2012) Analysis of Principal Nested Spheres, Biometrika, doi: 10.1093/biomet/ass022 (cited 2/6/24)

Jung, S., Sen, A. and Marron, J. S. (2012), Boundary behavior in high dimension, low sample size asymptotics of PCA, The Journal of Multivariate Analysis, 109, 190–203 (cited 3/7/24)

Karcher, H. (2014). Riemannian center of mass and so called Karcher mean. arXiv preprint arXiv:1407.2087 (cited 2/20/24)

Kaufman, L. and Rousseeuw, P. J. (2005) Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York  (cited 3/19/24)

Kim, B. (2018). Small sphere distributions and related topics in directional statistics, Doctoral dissertation, University of Pittsburgh (cited 2/6/24)

Kimes, P. K., Cabanski, C. R., Wilkerson, M. D., Zhao, N., Johnson, A. R., Perou, C. M., Makowski, L., Marron, J. S. & Hayes, D. N. (2014) SigFuge: single gene clustering of RNA-seq reveals differential isoform usage among cancer samples, Nucleic Acids Research (2014): gku521 (cited 1/18/24)

Kimes, P. K., Liu, Y., Neil Hayes, D., & Marron, J. S. (2017). Statistical significance for hierarchical clustering. Biometrics, 73(3), 811-821 (cited 3/26/24)

Kingsolver, J. G., Heckman, N., Zhang, J., Carter, P. A., Knies, J. L., Stinchcombe, J. R., & Meyer, K. (2015). Genetic variation, simplicity, and evolutionary constraints for function-valued traits. The American Naturalist, 185(6), E166-E181 (cited 1/30/24)

Klein, R. J., Zeiss, C., Chew, E. Y., Tsai, J. Y., Sackler, R. S., Haynes, C., … & Bracken, M. B. (2005). Complement factor H polymorphism in age-related macular degeneration. Science, 308, 385-389 (cited 3/5/24)

Koch, I., Hoffmann, P., & Marron, J. S. (2014). Proteomics profiles from mass spectrometry. Electronic Journal of Statistics, 8(2), 1703-1713 (cited 2/20/24)

Kruskal, J. B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29(2), 115-129 (cited 1/30/24, 3/7/24)

Lam, X. Y., Marron, J. S., Sun, D., & Toh, K. C. (2018). Fast algorithms for large-scale generalized distance weighted discrimination. Journal of Computational and Graphical Statistics, 27(2), 368-379 (cited 4/23/24)

LeBlanc, M., & Tibshirani, R. (1996). Combining estimates in regression and classification. Journal of the American Statistical Association, 91(436), 1641-1650 (cited 2/15/24)

Ledoux, M. (2001). The concentration of measure phenomenon (No. 89). American Mathematical Soc. (cited 3/5/24)

Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788-791 (cited 2/8/24)

Lee, Y., Lin, Y. and Wahba, G. (2004) Multicategory Support Vector Machines, Theory, and Application to the Classification of Microarray Data and Satellite Radiance Data, Journal of the American Statistical Association, 99, 67-81 (cited 4/18/24)

Li, G. and Chen, Z. (1985) Projection pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo, Journal of the American Statistical Association, 80, 759-776 (cited 4/4/24)

Lin, L. (1989), A Concordance Correlation Coefficient to Evaluate Reproducibility, Biometrics, 45, 255–268 (cited 1/30/24)

Lindeberg, T. (1994) Scale Space Theory in Computer Vision, Kluwer (cited 4/2/24)

Liu, R. Y. (1990). On a notion of data depth based on random simplices. The Annals of Statistics, 18(1), 405-414 (cited 4/4/24)

Liu, X. (2007). New statistical tools for microarray data and comparison with existing tools. The University of North Carolina at Chapel Hill (cited 4/25/24)

Liu, Y., Hayes, D. N., Nobel, A. and Marron, J. S. (2008) Statistical Significance of Clustering for High Dimension Low Sample Size Data, Journal of the American Statistical Association, 103, 1281-1293  (cited 3/21/24)

Liu, X., Parker, J., Fan, C., Perou, C. M., & Marron, J. S. (2009). Visualization of cross-platform microarray normalization. Batch Effects and Noise in Microarray Experiments: Sources and Solutions. Wiley, New York, 167-181 (cited 1/18/24, 4/23/24)

Locantore, N., Marron, J. S., Simpson, D. G., Tripoli, N., Zhang, J. T., Cohen, K. L., … & Fan, J. (1999). Robust principal component analysis for functional data. Test, 8(1), 1-73 (cited 4/4/24)

Lock, E. F., Hoadley, K. A., Marron, J. S., & Nobel, A. B. (2013). Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. The Annals of Applied Statistics, 7(1), 523 (cited 2/22/24)

Lu, X., & Marron, J. S. (2014). Analysis of juggling data: Object oriented data analysis of clustering in acceleration functions. Electronic Journal of Statistics, 8(2), 1842-1847 (cited 2/22/24)

Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning research, 9(Nov), 2579-2605 (cited 4/18/24)

Maggiora, G. M. (2006). On outliers and activity cliffs why QSAR often disappoints (cited 1/23/24)

Marčenko, V. A., & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1(4), 457 (cited 3/7/24)

Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979) Multivariate analysis, Probability and mathematical statistics. Academic Press Inc. (cited 4/4/24)

Mardia, K. V., & Jupp, P. E. (2009). Directional statistics. John Wiley & Sons (cited 2/6/24)

Marron, J. S. & Alonso, A. M. (2014) Overview of object oriented data analysis, Biometrical Journal, 56, 732-753 (cited 1/11/24)

Marron, J. S. & Dryden, I. L. (2021) Object Oriented Data Analysis, CRC Press (cited 1/11/24)

Marron, J. S., Ramsay, J. O., Sangalli, L. M., & Srivastava, A. (2014). Statistics of time warpings and phase variations. Electronic Journal of Statistics, 8(2), 1697-1702 (cited 2/22/24)

Marron, J. S., Ramsay, J. O., Sangalli, L. M., & Srivastava, A. (2015). Functional data analysis of amplitude and phase variation. Statistical Science, 30(4), 468-484 (cited 2/22/24)

Marron, J. S., Todd, M. J., & Ahn, J. (2007). Distance-weighted discrimination. Journal of the American Statistical Association, 102(480), 1267-1271 (cited 4/23/24)

Marron, J. S., & Wand, M. P. (1992). Exact mean integrated squared error. The Annals of Statistics, 712-736 (cited 4/2/24)

MacQueen, J. B. (1967) Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, 281-297, University of California Press, Berkeley  (cited 3/19/24)

McLachlan, G. J. (2004) Discriminant Analysis and Statistical Pattern Recognition, Wiley-Interscience (cited 4/4/24)

Menafoglio, A., Grasso, M., Secchi, P. & Colosimo, B.M. (2018) Profile monitoring of probability density functions via simplicial functional PCA with application to image data, Technometrics, 60, 497-510  (cited 1/18/24)

Miao, D. (2015) Class-Sensitive Principal Components Analysis, UNC PhD Dissertation, https://cdr.lib.unc.edu/record/uuid:853d8c52-5b4a-4607-afff-9554b68bb6f5 (cited 4/16/24)

Miedema, J., Marron, J. S., Niethammer, M., Borland, D., Woosley, J., Coposky, J. & Thomas, N. E. (2012) Image and statistical analysis of melanocytic histology. Histopathology, 61(3), 436-444 (cited 1/25/24)

Milasevic, P., & Ducharme, G. R. (1987). Uniqueness of the spatial median. The Annals of Statistics, 15(3), 1332-1333 (cited 4/4/24)

Möttönen, J., & Oja, H. (1995). Multivariate spatial sign and rank methods. Journal of Nonparametric Statistics, 5(2), 201-213 (cited 4/4/24)

Oja, H. (2010). Multivariate nonparametric methods with R: an approach based on spatial signs and ranks. Springer Science & Business Media (cited 4/4/24)

Patrangenaru, V., & Ellingson, L. (2019). Nonparametric statistics on manifolds and their applications to object data analysis. CRC Press. (cited 1/23/24, 2/6/24, 2/15/24)

Parzen, E. (2004) Quantile probability and statistical data modeling, Statistical Science, 19, 652-662. (cited 1/18/24)

Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17(4), 1617 (cited 3/7/24)

Pearson, K. (1901) On Lines and Planes of Closest Fit to Systems of Points in Space, Philosophical Magazine, 2, 559-572 (cited 1/16/24, 1/30/24)

Perou, C. M., Sorlie, T., Eisen, M. B., & Van De Rijn, M. (2000). Molecular portraits of human breast tumours. Nature, 406(6797), 747 (cited 3/21/24, 4/23/24)

Pigoli, D., Pantelis, Z. H., Coleman, J. S. & Aston, J. A. D. (2015) The analysis of acoustic phonetic data: exploring differences in the spoken romance languages. arXiv preprint arXiv:1507.07587 (cited 1/11/24)

Pizer, S. M., Jung, S., Goswami, D., Vicory, J., Zhao, X., Chaudhuri, R., … & Marron, J. S. (2013). Nested sphere statistics of skeletal models. In Innovations for Shape Analysis (pp. 93-115). Springer, Berlin, Heidelberg (cited 2/6/24, 2/8/24)

Pizer, S. M., & Marron, J. S. (2017). Object statistics on curved manifolds. In Statistical Shape and Deformation Analysis (pp. 137-164). Academic Press (cited 2/6/24)

Pizer, S. M., Hong, J., Vicory, J., Liu, Z., Marron, J. S., Choi, H. Y., … & Wang, J. (2020). Object shape representation via skeletal models (s-reps) and statistical analysis. In Riemannian Geometric Statistics in Medical Image Analysis (pp. 233-271). Academic Press. (cited 2/6/24)

Prothero, J. B., Hannig, J., & Marron, J. S. (2021). New perspectives on centering. arXiv preprint arXiv:2103.12176 (cited 2/22/24)

Prothero, J., Jiang, M., Hannig, J., Tran-Dinh, Q., Ackerman, A., & Marron, J. S. (2022). Data integration via analysis of subspaces (DIVAS). arXiv preprint arXiv:2212.00703 (cited 2/29/24)

Qiao, X., Zhang, H. H., Liu, Y., Todd, M. J., & Marron, J. S. (2010). Weighted distance weighted discrimination and its asymptotic properties. Journal of the American Statistical Association, 105(489), 401-414 (cited 4/23/24)

Ramsay, J. O. & Silverman, B. W. (2002) Applied Functional Data Analysis, Springer, N.Y. ISBN 0-387-95414-7 (cited 1/11/24)

Ramsay, J. O. & Silverman, B. W. (2005) Functional Data Analysis, 2nd Edition, Springer, N.Y. ISBN 0-387-40080-X (cited 1/11/24)

Ramsay, J. O. (2005) Functional Data Analysis Web Site, http://psych.mcgill.ca/misc/fda/  (cited 1/11/24)

Ramsay, J. O., Gribble, P., & Kurtek, S. (2014). Description and processing of functional data arising from juggling trajectories. Electronic Journal of Statistics, 8(2), 1811-1816 (cited 2/22/24)

Rao, C. R. (1945). Information and the accuracy attainable in the estimation of statistical parameters, Bull. Calcutta Math. Soc.,  37, 81-91. (cited 2/20/24)

Rao, C. R. (1958). Some statistical methods for comparison of growth curves. Biometrics, 14(1), 1-17  (cited 1/30/24)

Rondonotti, V., Marron, J. S., & Park, C. (2007). SiZer for time series: a new approach to the analysis of trends. Electronic Journal of Statistics, 1, 268-289 (cited 4/2/24)

Rousseeuw, P. J., & Leroy, A. M. (2005). Robust regression and outlier detection (Vol. 589). John Wiley & Sons (cited 4/4/24)

Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326 (cited 2/15/24)

Sarle, W. S., and Kuo, A. H. (1993), The MODECLUS Procedure, Technical Report P-256, SAS Institute Inc., Cary  (cited 3/21/24)

Schmitz, H. P. and Marron, J. S. (1992) Simultaneous estimation of several size distributions of  income, Econometric Theory, 8, 476-488  (cited 4/2/24)

Schölkopf, B., Smola, A., & Müller, K. R. (1997) Kernel principal component analysis. In International conference on artificial neural networks (pp. 583-588). Springer, Berlin, Heidelberg (cited 4/16/24)

Schölkopf, B., & Smola, A. J. (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press (cited 4/18/24)

Shabalin, A. A., Tjelmeland, H., Fan, C., Perou, C. M., & Nobel, A. B. (2008) Merging two gene-expression studies via cross-platform normalization. Bioinformatics, 24(9), 1154-1160 (cited 4/23/24)

Shabalin, A. A., & Nobel, A. B. (2013). Reconstruction of a low-rank matrix in the presence of Gaussian noise. Journal of Multivariate Analysis, 118, 67-76 (cited 3/5/24)

Shen, H., & Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6), 1015-1034 (cited 3/7/24)

Shen, D., Shen, H., & Marron, J. S. (2013) Consistency of sparse PCA in high dimension, low sample size contexts. Journal of Multivariate Analysis, 115, 317-333 (cited 3/7/24)

Shen, D., Shen, H., Zhu, H., & Marron, J. S. (2016) The statistics and mathematics of high dimension low sample size asymptotics. Statistica Sinica, 26(4), 1747 (cited 3/7/24)

Siddiqi, K. and Pizer, S. M. (2007) Medial Representations Mathematics Algorithms and Applications, Springer, New York (cited 2/6/24)

Srivastava, A., Wu, W., Kurtek, S., Klassen, E., & Marron, J. S. (2011). Registration of functional data using Fisher-Rao metric. arXiv preprint arXiv:1103.3817 (cited 2/15/24)

Srivastava, A., & Klassen, E. P. (2016). Functional and shape data analysis (Vol. 1). New York: Springer (cited 2/15/24)

Srivastava, M. S., Katayama, S., & Kano, Y. (2013) A two sample test in high dimensional data. Journal of Multivariate Analysis, 114, 349-358 (cited 1/18/24)

Staudte, R. G. and Sheather, S. J. (2011) Robust Estimation and Testing, Wiley, New York (cited 4/4/24)

Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730-737 (cited 1/25/24)

Talagrand, M. (1991). A new isoperimetric inequality and the concentration of measure phenomenon. In Geometric Aspects of Functional Analysis (pp. 94-124). Springer, Berlin, Heidelberg (cited 3/5/24)

Talagrand, M. (1995). Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l’Institut des Hautes Etudes Scientifiques, 81(1), 73-205 (cited 3/5/24)

Tenenbaum, J. B., De Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319-2323 (cited 2/6/24, 2/15/24)

Toh, K. C., Todd, M. J. & Tutuncu, R. H. (1999) www.math.nus.edu.sg/~mattohkc/sdpt3.html  (cited 4/23/24)

Tracy, C. A., & Widom, H. (1994). Level-spacing distributions and the Airy kernel. Communications in Mathematical Physics, 159(1), 151-174 (cited 3/7/24)

Tucker, Derek J. (2024) fdasrvf software. https://research.tetonedge.net/software (cited 2/20/24)

Tukey, J. W. (1977) Exploratory data analysis, Pearson, N.Y. ISBN 978-0201076165 (cited 1/11/24)

Vapnik, V. N. (1982) Estimation of dependences based on empirical data, Springer (Russian version, 1979) (cited 4/16/24)

Vapnik, V. N. (1995) The nature of statistical learning theory, Springer (cited 4/16/24)

Venables, W.N. & Ripley, B.D. (2013) Modern applied statistics with S-PLUS. Springer Science & Business Media (cited 1/23/24)

Vidal, R., Ma, Y., & Sastry, S. (2016). Generalized principal component analysis, Springer (cited 4/18/24)

Wahba, G., Lin, Y., & Zhang, H. (1999). Generalized approximate cross validation for support vector machines, or, another way to look at margin-like quantities. (cited 4/18/24)

Wahba, G., Lin, Y., Lee, Y., & Zhang, H. (2003). Optimal properties and adaptive tuning of standard and nonstandard support vector machines. In Nonlinear estimation and classification (pp. 129-147) Springer, New York, NY (cited 4/18/24)

Wand, M. P., & Jones, M. C. (1994). Kernel smoothing. CRC Press (cited 3/26/24)

Wang, B., & Zou, H. (2016). Sparse Distance Weighted Discrimination. Journal of Computational and Graphical Statistics, 25(3), 826-838 (cited 4/23/24)

Wang, B., & Zou, H. (2018). Another look at distance-weighted discrimination. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(1), 177-198 (cited 4/25/24)

Wang, H. & Marron, J. S. (2007) Object oriented data analysis: sets of trees, Annals of Statistics, 35, 1849-1873  (cited 1/11/24)

Wegelin, J. A. (2000). A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case, http://www.stat.washington.edu/www/research/reports/2000/tr371.ps (cited 2/22/24)

Wei, S., Lee, C., Wichers, L., & Marron, J. S. (2015) Direction-projection-permutation for high dimensional hypothesis tests. Journal of Computational and Graphical Statistics, (cited 1/18/24, 3/7/24)

White, H. (2014). Asymptotic theory for econometricians. Academic Press (cited 3/5/24)

Wilkinson, L. (2017). Visualizing Big Data Outliers through Distributed Aggregation. IEEE Transactions on Visualization and Computer Graphics (cited 4/4/24)

Wold, H. (1975). Soft modelling by latent variables: the non-linear iterative partial least squares (NIPALS) approach. Journal of Applied Probability, 12(S1), 117-142 (cited 2/22/24)

Wold, H. O. A. (1985). Partial least squares. Kotz S, Johnson N L. Encyclopedia of Statistical Sciences. New York: Wiley, 581-591 (cited 2/22/24)

Wright, F. A., Strug, L. J., Doshi, V. K., Commander, C. W., Blackman, S. M., Sun, L., … & Corey, M. (2011). Genome-wide association and linkage identify modifier loci of lung disease severity in cystic fibrosis at 11p13 and 20q13.2. Nature Genetics, 43(6), 539-546 (cited 4/4/24)

Xiong, J., Dittmer, D. P., & Marron, J. S. (2015). “Virus hunting” using radial distance weighted discrimination. The Annals of Applied Statistics, 9(4), 2090-2109 (cited 4/23/24)

Yang, X., Hannig, J., Hoadley, K. A., Carmichael, I., & Marron, J. S. (2023). Measure of Strength of Evidence for Visually Observed Differences between Subpopulations. Journal of Computational and Graphical Statistics, (just-accepted), 1-14.  (cited 1/18/24, 3/7/24)

Yao, J., Zheng, S., & Bai, Z. D. (2015). Sample covariance matrices and high-dimensional data analysis. Cambridge University Press (cited 3/7/24)

Yata, K., & Aoshima, M. (2009) PCA consistency for non-Gaussian data in high dimension, low sample size context. Communications in Statistics—Theory and Methods, 38(16-17), 2634-2652 (cited 3/5/24)

Yata, K., & Aoshima, M. (2010a) Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix. Journal of Multivariate Analysis, 101(9), 2060-2077 (cited 3/5/24)

Yata, K., & Aoshima, M. (2010b) Intrinsic dimensionality estimation of high-dimension, low sample size data with d-asymptotics. Communications in Statistics—Theory and Methods, 39(8-9), 1511-1521 (cited 3/5/24)

Yata, K., & Aoshima, M. (2012) Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. Journal of Multivariate Analysis, 105(1), 193-215 (cited 3/5/24)

Yata, K., & Aoshima, M. (2013) PCA consistency for the power spiked model in high-dimensional settings. Journal of Multivariate Analysis, 122, 334-354 (cited 3/5/24)

Yu, Q., Lu, X., & Marron, J. S. (2017). Principal Nested Spheres for Time-Warped Functional Data Analysis. Journal of Computational and Graphical Statistics, 26(1), 144-151 (cited 2/20/24)

Yu, Q., Risk, B. B., Zhang, K., & Marron, J. S. (2017). JIVE integration of imaging and behavioral data. NeuroImage, 152, 38-49 (cited 2/22/24)

Yushkevich, P., Pizer, S. M., Joshi, S., and Marron, J. S. (2001) Intuitive, localized analysis of shape variability, Information Processing in Medical Imaging (IPMI), eds. Insana, M. F. and Leahy, R. M. 402-408 (cited 2/6/24)

Zhang, L., Marron, J. S., Shen, H., & Zhu, Z. (2007). Singular value decomposition and its visualization. Journal of Computational and Graphical Statistics, 16(4), 833-854 (cited 1/30/24)

Zhang, L., Lu, S., & Marron, J. S. (2015). Nested nonnegative cone analysis. Computational Statistics & Data Analysis, 88, 100-110 (cited 2/8/24)

Zhang, J., Heckman, N., Cubranic, D., Kingsolver, J. G., Gaydos, T., & Marron, J. S. (2014). Prinsimp. R Journal, 6(2) (cited 1/30/24)

Zhou, Y. H., & Marron, J. S. (2015). High dimension low sample size asymptotics of robust PCA. Electronic Journal of Statistics, 9(1), 204-218 (cited 4/9/24)

Zhou, Y. H., & Marron, J. S. (2016). Visualization of robust L1PCA. Stat, 5(1), 173-184 (cited 4/9/24)

Zoubouloglou, P., García-Portugués, E., & Marron, J. S. (2023). Scaled torus principal component analysis. Journal of Computational and Graphical Statistics, 32(3), 1024-1035 (cited 2/8/24)