Course Home Page – STOR 881 Object Oriented Data Analysis – Fall 2019

Instructor:    J. S. Marron

Email:   marron@unc.edu

Office:   352 Hanes Hall

 

Course Notes

  1.   STOR881-08-20-2019.pptx:  Organizational Matters, OODA Book, What is OODA?, Taste of OODA Examples (including Spanish Male Mortality), 3 Major Phases of OODA, Visualization, Scatterplot Matrix Views
  2.   STOR881-08-22-2019.pptx: Principal Component Analysis (PCA), Object Space – Feature Space, Scree Plots, Define Modes of Variation, PCA Toy Examples
  3.   STOR881-08-27-2019.pptx:  PCA Toy Examples, RNAseq Data, Cancer Gene Expression (PCA, DWD, Loadings), Caution About DWD
  4.   STOR881-08-29-2019.pptx:  Inference using DiProPerm,  Prob. Distn’s as Data Object, OODA Terminology, Matlab Software
  5.   09-03-2019:  Class Cancelled, Marron Sick
  6.   09-05-2019:  Class Cancelled, Marron Sick
  7.   STOR881-09-10-2019.pptx:  Marginal Distribution Plots, Drug Discovery Data, Correlation PCA
  8.   STOR881-09-12-2019.pptx:  Correlation PCA, General Transformations, Melanoma Data, Automatic Shifted Log Transformation, ROC curves
  9.   STOR881-09-17-2019.pptx:  Heatmap Views, Other Directions for Projection, Distance Methods (Fréchet Mean, Multidimensional Scaling), Landmark Based Shape Analysis, Equivalence Relations
  10.    STOR881-09-19-2019.pptx:  Landmark Based Shape Analysis, Shape Representations, Bladder-Prostate-Rectum Data, Skeletal Representations, Manifold Feature Spaces – Benjamin Leinwand: Longitudinal Brain Scans, Keerthi Anand:  Blind Source Separation
  11.   STOR881-09-24-2019.pptx:  Manifold Feature Spaces, PCA for s-reps, Principal Nested Spheres – Kevin O’Connor: Distributions as Data Objects, Jonghwan Yoo: JIVE on Methylation and Images
  12.   STOR881-09-26-2019.pptx:  Backwards PCA, Nonnegative Matrix Factorization, Manifold Learning, PCA Basics, History, Computation – Benjamin Langworthy: PCA for Multivar. Survival
  13.   STOR881-10-01-2019.pptx:  PCA as Optimization, Redistribution of Energy – Siyao Liu: Clustering Single Cell RNAseq Data
  14.   STOR881-10-03-2019.pptx: PCA Data Representation, PCA Simulation, PCA Graphical Representation, Classification - Discrimination, Fisher Linear Discrimination – Kentaro Hoffman: Time Frequency Analysis
  15.   STOR881-10-08-2019.pptx:  FLD Likelihood Approach, Gaussian Likelihood Ratio, HDLSS Discrimination, Maximal Data Piling – Yifeng Shi: Dealing with Histological Images Using Set-structured Method
  16.   STOR881-10-10-2019.pptx: Maximal Data Piling, Kernel Embedding – Chalmer Tomlinson: Longitudinal Autism, Samuel Rosin: Horseshoe Effect in Microbiome Data
  17.   STOR881-10-15-2019.pptx:  Kernel Embedding, Kernel PCA, t-SNE Visualization, Support Vector Machines – Alexander Murph: Visual Disturbances, Haodong Wang: Comparing t-SNE and UMAP for visualizing scRNA-seq data
  18.   October 17, 2019:   Fall Break, No Class
  19.   STOR881-10-22-2019.pptx:  Support Vector Machines, Distance Weighted Discrimination, Batch and Source Adjustment, Breast Cancer Data – Bryce Rowland: NMF for Hi-C Data, Thomas Keefe: Haar Wavelet Bases, Feng Cheng: How to normalize: a challenge in accelerating Magnetic Resonance Fingerprinting
  20.   STOR881-10-24-2019.pptx:  Distance Weighted Discrimination, NCI 60 Data, SVM & DWD Tuning, Outliers and Robust Methods – Seoyoon Cho: Relationship Between Mullen Score and Nutritions
  21.   STOR881-10-29-2019.pptx:  Spherical PCA, GWAS Data, VL1PCA, Start HDLSS Asymptotics – Richard Sizelove:  Integrative Analysis for Brain Functional Networks, Sumit Kar: Community structure in biological networks
  22.   STOR881-10-31-2019.pptx:  HDLSS Paradoxes and Explanations, Zero Covariance is Not Independence – Jose Sanchez: Stability of filtrations in Topological Data Analysis, Yonggang Sha: Statistical analysis methods for Multiple Myeloma Microarray data
  23.   STOR881-11-05-2019.pptx:  Gaussian Scale Mixture and Zero Covariance, Mixing Conditions, HDLSS Analysis of PCA, Sparsity and DiProPerm – Yue Pan: Data Visualization for single cell RNA-seq data, Andrew Hamilton: Time Series Prediction
  24.   STOR881-11-07-2019.pptx: HDLSS Analysis of DWD Batch Adjustment, Radial DWD, Random Matrix Theory – David Bang: Catboost: Handling Large Categorical Variables, Carson Mosso: Manifold Learning, Katelyn Heath: Using ViSR Ultrasound breast data to diagnose malignancy in patients
  25.   STOR881-11-12-2019.pptx: Random Matrix Theory, High d Kernel Methods are Linear, PCA to Find Clusters, Smoothing – Dhruv Patel: Connections between Persistence Homology and Curvature, Hang Su: Characterization of Collaborative Cross Genomes
  26.   STOR881-11-14-2019.pptx:  Density Estimation, Scatterplot Smoothing, Inference Using SiZer – Wei Gu: A Heuristic Approach to Portfolio Optimization with Cardinality Constraints, Nicole Kramer: Calling DNA Loops in Hi-C Data
  27.   STOR881-11-19-2019.pptx:  Finish SiZer, Q-Q & PP Plots – Wongkyung Jang: OODA in Human-Computer Interaction, Bohan Li: Introduction to Word Segmentation, Andrew Marron: Kidney Function Methylation Data
  28.   STOR881-11-21-2019.pptx:  K-Means Clustering, SWISS, Hierarchical Clustering – Ram Basak:  FDA on Health Outcomes, Siqi Xiang: Analysis of Knee Osteoarthritis Data: Auto transformation and BET, Nicolas Wolczynski: Urban Sound Classification, Mingyi Wang: Symbolic Data principal component analysis
  29.   STOR881-11-26-2019.pptx:  Hierarchical Clustering, SigClust – Pavlos Zoubouloglou: Geodesic PCA in the Wasserstein Space, Taylor Petty: Forensic DNA Testing
  30.   November 28, 2019:    Thanksgiving, No Class
  31.   STOR881-12-03-2019.pptx:  SigClust, JIVE

 

References

Ahn, J. (2006) High dimension, low sample size data analysis. PhD Dissertation, University of North Carolina, Chapel Hill (cited 10/22/19)

Ahn, J., Marron, J. S., Muller, K. M., & Chi, Y. Y. (2007) The high-dimension, low-sample-size geometric representation holds under mild conditions. Biometrika, 94(3), 760-766 (cited 10/31/19, 11/5/19)

Ahn, J., & Marron, J. S. (2010) The maximal data piling direction for discrimination. Biometrika, 97(1), 254-259 (cited 10/8/19, 10/10/19)

Aitchison, J. (1982) The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B (Methodological), 44, 139-160 (cited 8/27/19)

Aizerman, A., Braverman, E. M., & Rozoner, L. I. (1964) Theoretical foundations of the potential function method in pattern recognition learning. Automation and remote control, 25, 821-837 (cited 10/10/19)

Alter, O., Brown, P. O., & Botstein, D. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97, 10101-10106 (cited 10/22/19)

Anderson, T. W., & Darling, D. A. (1952) Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics, 193-212 (cited 9/12/19)

Aoshima, M., Shen, D., Shen, H., Yata, K., Zhou, Y. H., & Marron, J. S. (2018). A survey of high dimension low sample size asymptotics. Australian & New Zealand journal of statistics, 60(1), 4-19 (cited 11/7/19)

Bai, Z. D., & Saranadasa, H. (1996) Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6(2), 311-329 (cited 8/29/19)

Benito, M., Parker, J., Du, Q., Wu, J., Xiang, D., Perou, C. M., & Marron, J. S. (2004) Adjustment of systematic microarray data biases. Bioinformatics, 20(1), 105-114 (cited 10/22/19)

Benito, M., García-Portugués, E., Marron, J. S., & Peña, D. (2017) Distance-weighted discrimination of face images for gender classification, Stat, 6, 231-240 (cited 8/20/19, 10/10/19)

Bickel, P. J. and Levina, E. (2004) Some theory for Fisher’s Linear Discriminant function, “naive Bayes”, and some alternatives when there are many more variables than observations, Bernoulli, 10, 989-1010 (cited 10/1/19, 10/3/19)

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.  (cited 10/15/19)

Bookstein, F. L. (1991). Morphometric Tools for Landmark Data, Cambridge: Cambridge University Press (cited 9/17/19)

Borland, D., & Taylor, R. M. (2007). Rainbow color map (still) considered harmful. IEEE computer graphics and applications, 27(2), 14-17, (cited 9/17/19)

Borysov, P., Hannig, J., Marron, J. S., Muratov, E., Fourches, D., & Tropsha, A. (2016). Activity prediction and identification of mis‐annotated chemical compounds using extreme descriptors. Journal of Chemometrics, 30(3), 99-108 (cited 9/10/19)

Boser, B. E., Guyon, I. and Vapnik, V. (1992) A Training Algorithm for Optimal Margin Classifiers, in Fifth Annual Workshop on Computational Learning Theory, ACM (cited 10/15/19)

Box, G. E., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 211-252 (cited 9/12/19)

Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv. 2 107–144 (electronic). (Update of, and a supplement to, the 1986 original.)  (cited 11/5/19)

Brooks, J. P., Dulá, J. H., & Boone, E. L. (2013). A pure L1-norm principal component analysis. Computational statistics & data analysis, 61, 83-98 (cited 10/29/19)

Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2(2), 121-167 (cited 10/15/19)

Cabanski, C. R., Qi, Y., Yin, X., Bair, E., Hayward, M. C., Fan, C., Li, J., Wilkerson, M. D., Marron, J. S., Perou, C. M. and Hayes, D. N. (2010) SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements, PLoS ONE, 5(3): e9905. doi:10.1371/journal.pone.0009905, PMCID: PMC2845619 (cited 11/21/19)

Cai, T., Liu, W., & Xia, Y. (2014) Two‐sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2), 349-372 (cited 8/29/19)

Carmichael, I., & Marron, J. S. (2017). Geometric insights into support vector machine behavior using the KKT conditions. arXiv preprint arXiv:1704.00767 (cited 10/22/19, 10/24/19)

Cates, J., Fletcher, P. T., Styner, M., Shenton, M., & Whitaker, R. (2007, July). Shape modeling and analysis with entropy-based particle systems. In Biennial International Conference on Information Processing in Medical Imaging (pp. 333-345). Springer, Berlin, Heidelberg (cited 9/19/19)

Cattell, R. B. (1966) The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245-276 (cited 8/22/19, 10/1/19, 11/7/19)

Chang, T. (1988). Estimating the relative rotation of two tectonic plates from boundary crossings. Journal of the American Statistical Association, 83(404), 1178-1183 (cited 9/24/19)

Chaudhuri, P. and Marron, J. S. (1999) SiZer for exploration of structure in curves, Journal of the American Statistical Association, 94, 807-823 (cited 11/14/19, 11/19/19)

Chaudhuri, P., & Marron, J. S. (2000). Scale space view of curve estimation. Annals of Statistics, 408-428 (cited 11/19/19)

Chen, S. X., & Qin, Y. L. (2010) A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 808-835  (cited 8/29/19)

Clarke, B. R. (2018). Robustness theory and application. John Wiley & Sons (cited 10/24/19)

Cootes, T. F., Hill, A., Taylor, C. J. and Haslam, J. (1993) The use of active shape models for locating structures in medical images, Information Processing in Medical Imaging, H. H. Barrett and A. F. Gmitro, eds. Lecture Notes in Computer Science 687, 33-47, Springer Verlag, Berlin (cited 9/19/19)

CRAN-DWD (2014). https://cran.r-project.org/package=DWD (cited 10/22/19)

Cristianini, N. and Shawe-Taylor, J. (2000) An Introduction to Support Vector Machines, Cambridge University Press (cited 10/15/19)

Dai, W., & Genton, M. G. (2016). Directional outlyingness for multivariate functional data. arXiv preprint arXiv:1612.04615 (cited 10/24/19)

Dai, W., & Genton, M. G. (2017). Multivariate Functional Data Visualization and Outlier Detection. arXiv preprint arXiv:1703.06419 (cited 10/24/19)

Damon, J., & Marron, J. S. (2014). Backwards principal component analysis and principal nested relations. Journal of Mathematical Imaging and Vision, 50(1-2), 107-114 (cited 9/26/19)

DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44, 837-845 (cited 9/12/19)

Dobriban, E. (2015). Efficient computation of limit spectra of sample covariance matrices. Random Matrices: Theory and Applications, 4(04), 1550019 (cited 11/7/19)

Domingos, P. & Pazzani, M. (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–137 (cited 10/1/19)

Dryden, I.L., Mardia, K.V. (2016) Statistical Shape Analysis with applications in R, Wiley, Chichester (cited 9/17/19)

Duda, R. O. and Hart P. E. (1973) Pattern Classification and Scene Analysis, Wiley, New York (cited 10/3/19)

Duda, R. O., Hart P. E. and Stork, D. G. (2001) Pattern Classification, Wiley, New York (cited 10/3/19, 10/8/19)

Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1(3), 211-218 (cited 9/17/19)

El Karoui, N. (2010). The spectrum of kernel random matrices. The Annals of Statistics, 38(1), 1-50 (cited 10/15/19, 11/12/19)

Eltzner, B., Jung, S., & Huckemann, S. (2015). Dimension reduction on polyspheres with application to skeletal representations. In International Conference on Networked Geometric Science of Information (pp. 22-29). Springer, Cham. (cited 9/24/19)

Fan, J., & Gijbels, I. (1996). Local Polynomial Modelling and Its Applications, Chapman and Hall, London (cited 11/14/19)

Feng, Q., Hannig, J., & Marron, J. S. (2016). A note on automatic data transformation. Stat, 5(1), 82-87 (cited 9/10/19, 9/12/19)

Feng, Q., Jiang, M., Hannig, J., & Marron, J. S. (2018). Angle-based joint and individual variation explained. Journal of multivariate analysis, 166, 241-265 (cited 12/03/19)

Fisher, N. I. (1995). Statistical analysis of circular data. Cambridge University Press (cited 9/19/19)

Fisher, R.A. (1936) The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, 7, 179-188  (cited 10/3/19)

Fletcher, P. T. (2004) Statistical variability in nonlinear spaces: Application to shape analysis and DT-MRI, University of North Carolina at Chapel Hill  (cited 9/19/19, 9/24/19)

Fréchet, M. (1948) Les éléments aléatoires de nature quelconque dans un espace distancié, Annales de l’institut Henri Poincaré, 10, 215-310 (cited 9/17/19, 9/19/19)

Gavish, M., & Donoho, D. L. (2014). The optimal hard threshold for singular values is 4/sqrt(3). IEEE Transactions on Information Theory, 60(8), 5040-5053 (cited 12/03/19)

Gaydos, T. L., Heckman, N. E., Kirkpatrick, M., Stinchcombe, J. R., Schmitt, J., Kingsolver, J., & Marron, J. S. (2013). Visualizing genetic constraints. The Annals of Applied Statistics, 7(2), 860-882 (cited 9/17/19)

Gersho, A. and Gray, R. M. (1991) Vector Quantization and Signal Compression, Springer, New York  (cited 11/21/19)

Godtliebsen, F., Marron, J. S., & Chaudhuri, P. (2002). Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11(1), 1-21 (cited 11/19/19)

Godtliebsen, F., Marron, J. S., & Chaudhuri, P. (2004). Statistical significance of features in digital images. Image and Vision Computing, 22(13), 1093-1104 (cited 11/19/19)

Godtliebsen, F., Marron, J. S., & Pizer, S. M. (2002). Significance in scale-space for clustering. Spatial clustering modeling. Chapman and Hall/CRC, 24-36 (cited 11/19/19)

Good, I. J., & Gaskins, R. A. (1980). Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data. Journal of the American Statistical Association, 75(369), 42-56 (cited 11/12/19)

Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53(3-4), 325-338 (cited 9/17/19)

Gower, J. C. (1974). Algorithm as 78: The mediancentre. Journal of the Royal Statistical Society. Series C (Applied Statistics), 23(3), 466-470 (cited 10/24/19)

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics, Wiley (cited 9/12/19)

Haldane, J. B. S. (1948) Note on the median of a multivariate distribution, Biometrika, 35, 414-415 (cited 10/24/19)

Hall, P., Marron, J. S., & Neeman, A. (2005). Geometric representation of high dimension, low sample size data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(3), 427-444. (cited 10/29/19, 11/5/19)

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (2011) Robust Statistics: the Approach Based on Influence Functions, Wiley, New York (cited 10/24/19)

Hannig, J., & Marron, J. S. (2006). Advanced distribution theory for SiZer. Journal of the American Statistical Association, 101(474), 484-499 (cited 11/14/19, 11/19/19)

Hannig, J., Marron, J. S., & Riedi, R. (2001). Zooming statistics: Inference across scales. Journal of the Korean Statistical Society, 30(2), 327-345 (cited 11/19/19)

Hartigan, J. A. (1975) Clustering Algorithms, Wiley, New York  (cited 11/21/19)

Hastie, T., & Stuetzle, W. (1989). Principal curves. Journal of the American Statistical Association, 84(406), 502-516 (cited 9/24/19)

Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. New York, NY: Springer, 115-163 (cited 10/15/19)

Hastie, T., Tibshirani, R., Friedman, J., & Franklin, J. (2005). The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2), 83-85 (cited 10/15/19)

Hotelling, H. (1933) Analysis of a Complex of Statistical Variables Into Principal Components. Journal of Educational Psychology, 24, 417-441 (cited 8/20/19, 10/1/19)

Hron, K., Menafoglio, A., Templ, M., Hrůzová, K. & Filzmoser, P. (2016) Simplicial principal component analysis for density functions in Bayes spaces. Computational Statistics & Data Analysis, 94, 330-350  (cited 8/27/19)

Hsu, C.-W. and Lin, C.-J. (2002) A comparison of methods for multiclass support vector machines, IEEE Transactions on Neural Networks, 13, 415-425 (cited 10/22/19)

Huang, H., Liu, Y., Yuan, M. and Marron J.S. (2014) Statistical Significance of Clustering Using Soft Thresholding, Journal of Computational and Graphical Statistics, DOI:10.1080/10618600.2014.948179 (cited 11/26/19, 12/03/19)

Huber, P. (2011) Robust Statistics. Wiley, New York (cited 10/24/19)

Huckemann, S., Hotz, T., & Munk, A. (2010). Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric lie group actions. Statistica Sinica, 1-58 (cited 9/24/19)

Inselberg, A. (1985) The Plane with Parallel Coordinates, Visual Computer 1: 69–91 (cited 8/22/19)

Inselberg, A. (2009) Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications. Springer, New York (cited 8/22/19)

Izem, R., & Kingsolver, J. G. (2005). Variation in continuous reaction norms: quantifying directions of biological interest. The American Naturalist, 166(2), 277-289 (cited 9/17/19)

Izem, R., & Marron, J. S. (2007). Analysis of nonlinear modes of variation for functional data. Electronic Journal of Statistics, 1, 641-676 (cited 9/17/19)

Izenman, A. J., & Sommer, C. J. (1988). Philatelic mixtures and multimodal densities. Journal of the American Statistical association, 83(404), 941-953 (cited 11/12/19)

Jammalamadaka, S. R., & Sengupta, A. (2001). Topics in circular statistics (Vol. 5). World Scientific (cited 9/19/19)

Jeong, J.-Y. (2009) Estimation of Probability Distributions on Multiple Anatomical Objects and Evaluation of Statistical Shape Models, Ph.D. Thesis, Department of Computer Science, University of North Carolina (cited 9/19/19)

Joachims, T. (2000). Estimating the Generalization Performance of an SVM Efficiently. In Proc. 17th International Conf. on Machine Learning, 431-438 (cited 10/22/19).

John, S. (1972) The distribution of a statistic used for testing sphericity of normal distributions. Biometrika, 59(1), 169-173 (cited 10/31/19)

Johnstone, I. M. (2008). Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy–Widom limits and rates of convergence. Annals of statistics, 36(6), 2638 (cited 11/12/19)

Jolliffe, I. T. (2002) Principal Component Analysis, Springer, New York, 2nd Edition, ISBN 978-0-387-95442-4   (cited 10/1/19)

Jones, M. C., Marron, J. S., & Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91(433), 401-407 (cited 11/19/19)

Jung, S., & Marron, J. S. (2009). PCA consistency in high dimension, low sample size context. The Annals of Statistics, 37(6B), 4104-4130 (cited 11/5/19)

Jung, S., Liu, X., Marron, J. S., & Pizer, S. M. (2010). Generalized PCA via the backward stepwise approach in image analysis. In Brain, Body and Machine (pp. 111-123). Springer Berlin Heidelberg (cited 9/24/19)

Jung, S., Foskey, M., & Marron, J. S. (2011). Principal arc analysis on direct product manifolds. The Annals of Applied Statistics, 578-603 (cited 9/24/19)

Jung, S., Dryden I. L., & Marron, J. S., (2012) Analysis of Principal Nested Spheres, Biometrika, doi: 10.1093/biomet/ass022 (cited 9/24/19)

Jung, S., Sen, A. and Marron, J. S. (2012), Boundary behavior in high dimension, low sample size asymptotics of PCA, The Journal of Multivariate Analysis, 109, 190–203 (cited 11/5/19)

Kaufman, L. and Rousseeuw, P. J. (2005) Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York  (cited 11/21/19)

Kelemen, A., Székely, G. and Gerig, G. (1997 & 1999) Three dimensional model-based segmentation, TR-178 Technical Report Image Science Lab, ETH Zurich & Elastic model-based segmentation of 3-D neuroradiological data sets, IEEE Transactions on Medical Imaging, 18, 828-839 (cited 9/19/19)

Kendall, D.G., Barden, D., Carne, T.K. and Le, H. (1999) Shape and Shape Theory, Wiley, Chichester (cited 9/17/19)

Kim, B. (2018). Small sphere distributions and related topics in directional statistics, Doctoral dissertation, University of Pittsburgh (cited 9/24/19)

Kimes, P. K., Cabanski, C. R., Wilkerson, M. D., Zhao, N., Johnson, A. R., Perou, C. M., Makowski, L., Marron, J. S. & Hayes, D. N. (2014) SigFuge: single gene clustering of RNA-seq reveals differential isoform usage among cancer samples, Nucleic Acids Research (2014): gku521 (cited 8/22/19)

Kingsolver, J. G., Heckman, N., Zhang, J., Carter, P. A., Knies, J. L., Stinchcombe, J. R., & Meyer, K. (2015). Genetic variation, simplicity, and evolutionary constraints for function-valued traits. The American Naturalist, 185(6), E166-E181 (cited 9/17/19)

Klein, R. J., Zeiss, C., Chew, E. Y., Tsai, J. Y., Sackler, R. S., Haynes, C., … & Bracken, M. B. (2005). Complement factor H polymorphism in age-related macular degeneration. Science, 308, 385-389 (cited 11/5/19)

Kruskal, J. B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29(2), 115-129 (cited 10/1/19, 11/7/19)

Lam, X. Y., Marron, J. S., Sun, D., & Toh, K. C. (2018). Fast algorithms for large-scale generalized distance weighted discrimination. Journal of Computational and Graphical Statistics, 27(2), 368-379 (cited 10/22/19)

LeBlanc, M., & Tibshirani, R. (1996). Combining estimates in regression and classification. Journal of the American Statistical Association, 91(436), 1641-1650 (cited 9/24/19)

Ledoux, M. (2001). The concentration of measure phenomenon (No. 89). American Mathematical Soc. (cited 10/31/19)

Lee, Y., Lin, Y. and Wahba, G. (2004) Multicategory Support Vector Machines, Theory, and Application to the Classification of Microarray Data and Satellite Radiance Data, Journal of the American Statistical Association, 99, 67-81 (cited 10/22/19)

Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788-791 (cited 9/24/19)

Li, G. and Chen, Z. (1985) Projection pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo, Journal of the American Statistical Association, 80, 759-776 (cited 10/24/19)

Lindeberg, T. (1994) Scale Space Theory in Computer Vision, Kluwer (cited 11/14/19)

Liu, R. Y. (1990). On a notion of data depth based on random simplices. The Annals of Statistics, 18(1), 405-414 (cited 10/24/19)

Liu, X. (2007). New statistical tools for microarray data and comparison with existing tools. The University of North Carolina at Chapel Hill (cited 11/7/19)

Liu, X., Parker, J., Fan, C., Perou, C. M., & Marron, J. S. (2009). Visualization of cross-platform microarray normalization. Batch Effects and Noise in Microarray Experiments: Sources and Solutions. Wiley, New York, 167-181 (cited 8/29/19, 10/24/19)

Liu, Y., Hayes, D. N., Nobel, A. and Marron, J. S. (2008) Statistical Significance of Clustering for High Dimension Low Sample Size Data, Journal of the American Statistical Association, 103, 1281-1293  (cited 11/26/19)

Locantore, N., Marron, J. S., Simpson, D. G., Tripoli, N., Zhang, J. T., Cohen, K. L., … & Fan, J. (1999). Robust principal component analysis for functional data. Test, 8(1), 1-73 (cited 10/24/19)

Lock, E. F., Hoadley, K. A., Marron, J. S., & Nobel, A. B. (2013). Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. The annals of applied statistics, 7(1), 523 (cited 12/03/19)

Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning research, 9(Nov), 2579-2605 (cited 10/15/19)

MacQueen, J. B. (1967) Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, 281-297, University of California Press, Berkeley  (cited 11/21/19)

Maggiora, G. M. (2006). On outliers and activity cliffs: why QSAR often disappoints. Journal of Chemical Information and Modeling, 46(4), 1535 (cited 9/10/19)

Marčenko, V. A., & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1(4), 457 (cited 11/7/19)

Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. Probability and mathematical statistics. Academic Press Inc (cited 10/8/19)

Mardia, K. V. (2014). Statistics of directional data. Academic press (cited 9/14/19)

Marron, J. S. & Alonso, A. M. (2014) Overview of object oriented data analysis, Biometrical Journal, 56, 732-753 (cited 8/20/19)

Marron, J. S., Todd, M. J., & Ahn, J. (2007). Distance-weighted discrimination. Journal of the American Statistical Association, 102(480), 1267-1271 (cited 10/22/19)

Marron, J. S., & Wand, M. P. (1992). Exact mean integrated squared error. The Annals of Statistics, 712-736 (cited 11/14/19)

McLachlan, G. J. (2004) Discriminant Analysis and Statistical Pattern Recognition, Wiley-Interscience (cited 10/1/19)

Menafoglio, A., Grasso, M., Secchi, P. & Colosimo, B.M. (2018) Profile monitoring of probability density functions via simplicial functional PCA with application to image data, Technometrics, 60, 497-510  (cited 8/27/19)

Miao, D. (2015) Class-Sensitive Principal Components Analysis, UNC PhD Dissertation, https://cdr.lib.unc.edu/record/uuid:853d8c52-5b4a-4607-afff-9554b68bb6f5 (cited 10/10/19)

Miedema, J., Marron, J. S., Niethammer, M., Borland, D., Woosley, J., Coposky, J. & Thomas, N. E. (2012) Image and statistical analysis of melanocytic histology. Histopathology, 61(3), 436-444 (cited 9/12/19)

Milasevic, P., & Ducharme, G. R. (1987). Uniqueness of the spatial median. The Annals of Statistics, 15(3), 1332-1333 (cited 9/24/19)

Möttönen, J., & Oja, H. (1995). Multivariate spatial sign and rank methods. Journal of Nonparametric Statistics, 5(2), 201-213 (cited 10/29/19)

Oja, H. (2010). Multivariate nonparametric methods with R: an approach based on spatial signs and ranks. Springer Science & Business Media (cited 10/29/19)

Owen, S. J. (1998) A survey of Mesh Generation Technology, http://www.imr.sandia.gov/papers/imr7/owen_meshtech98.ps.gz (cited 9/19/19)

Parzen, E. (2004) Quantile probability and statistical data modeling, Statistical Science, 19, 652-662.

Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17(4), 1617 (cited 11/5/19)

Pearson, K. (1901) On Lines and Planes of Closest Fit to Systems of Points in Space, Philosophical Magazine, 2, 559-572 (cited 8/22/19, 10/1/19)

Pennec, X. (2015). Barycentric subspaces and affine spans in manifolds. In International Conference on Networked Geometric Science of Information (pp. 12-21). Springer, Cham. (cited 9/26/19)

Perou, C. M., Sorlie, T., Eisen, M. B., & Van De Rijn, M. (2000). Molecular portraits of human breast tumours. Nature, 406(6797), 747 (cited 10/22/19, 11/19/19)

Pigoli, D., Pantelis, Z. H., Coleman, J. S. & Aston, J. A. D. (2015) The analysis of acoustic phonetic data: exploring differences in the spoken romance languages. arXiv preprint arXiv:1507.07587 (cited 8/20/19)

Pizer, S. M., Jung, S., Goswami, D., Vicory, J., Zhao, X., Chaudhuri, R., … & Marron, J. S. (2013). Nested sphere statistics of skeletal models. In Innovations for Shape Analysis (pp. 93-115). Springer Berlin Heidelberg (cited 9/24/19)

Pizer, S. M., & Marron, J. S. (2017). Object statistics on curved manifolds. In Statistical Shape and Deformation Analysis (pp. 137-164). Academic Press (cited 9/19/19)

Qiao, X., Zhang, H. H., Liu, Y., Todd, M. J., & Marron, J. S. (2010). Weighted distance weighted discrimination and its asymptotic properties. Journal of the American Statistical Association, 105(489), 401-414 (cited 10/22/19, 11/5/19)

Ramsay, J. O. & Silverman, B. W. (2002) Applied Functional Data Analysis, Springer, N.Y. ISBN 0-387-95414-7 (cited 8/20/19)

Ramsay, J. O. & Silverman, B. W. (2005) Functional Data Analysis, 2nd Edition, Springer, N.Y. ISBN 0-387-40080-X (cited 8/20/19)

Ramsay, J. O. (2005) Functional Data Analysis Web Site, http://psych.mcgill.ca/misc/fda/  (cited 8/20/19)

Rao, C. R. (1958). Some statistical methods for comparison of growth curves. Biometrics, 14(1), 1-17  (cited 10/1/19)

Rondonotti, V., Marron, J. S., & Park, C. (2007). SiZer for time series: a new approach to the analysis of trends. Electronic Journal of Statistics, 1, 268-289 (cited 11/19/19)

Rousseeuw, P. J., & Leroy, A. M. (2005). Robust regression and outlier detection (Vol. 589). John Wiley & Sons (cited 10/24/19)

Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326 (cited 9/24/19)

Royer, J.-Y. and Chang, T. (1991) Evidence for relative motions between the Indian and Australian Plates during the last 20 m.y. from plate tectonic reconstructions: Implications for the deformation of the Indo-Australian Plate, Journal of Geophysical Research, 96(B7), 11,779–11,802, doi:10.1029/91JB00897 (cited 9/24/19)

Sarle, W. S., and Kuo, A. H. (1993), The MODECLUS Procedure, Technical Report P-256, SAS Institute Inc., Cary  (cited 11/19/19)

Schmitz, H. P. and Marron, J. S. (1992) Simultaneous estimation of several size distributions of  income, Econometric Theory, 8, 476-488  (cited 11/14/19)

Schölkopf, B., Smola, A., & Müller, K. R. (1997) Kernel principal component analysis. In International conference on artificial neural networks (pp. 583-588). Springer, Berlin, Heidelberg (cited 10/15/19)

Schölkopf, B., & Smola, A. J. (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press (cited 10/15/19)

Shabalin, A. A., Tjelmeland, H., Fan, C., Perou, C. M., & Nobel, A. B. (2008) Merging two gene-expression studies via cross-platform normalization. Bioinformatics, 24(9), 1154-1160 (cited 10/22/19)

Shabalin, A. A., & Nobel, A. B. (2013). Reconstruction of a low-rank matrix in the presence of Gaussian noise. Journal of Multivariate Analysis, 118, 67-76 (cited 12/03/19)

Shen, D., Shen, H., & Marron, J. S. (2013) Consistency of sparse PCA in high dimension, low sample size contexts. Journal of Multivariate Analysis, 115, 317-333 (cited 11/5/19)

Shen, D., Shen, H., Zhu, H., & Marron, J. S. (2016) The statistics and mathematics of high dimension low sample size asymptotics. Statistica Sinica, 26(4), 1747 (cited 11/5/19)

Shen, H., & Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6), 1015-1034 (cited 11/5/19)

Siddiqi, K. and Pizer, S. M. (2007) Medial Representations Mathematics Algorithms and Applications, Springer, New York (cited 9/24/19)

Srivastava, M. S., Katayama, S., & Kano, Y. (2013) A two sample test in high dimensional data. Journal of Multivariate Analysis, 114, 349-358 (cited 8/29/19)

Staudte, R. G. and Sheather, S. J. (2011) Robust Estimation and Testing, Wiley, New York (cited 10/24/19)

Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730-737  (cited 9/12/19)

Talagrand, M. (1991). A new isoperimetric inequality and the concentration of measure phenomenon. In Geometric Aspects of Functional Analysis (pp. 94-124). Springer, Berlin, Heidelberg (cited 10/31/19)

Talagrand, M. (1995). Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l’Institut des Hautes Etudes Scientifiques, 81(1), 73-205 (cited 10/31/19)

Tenenbaum, J. B., De Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319-2323 (cited 9/24/19)

Toh, K. C., Todd, M. J. & Tutuncu, R. H. (1999) www.math.nus.edu.sg/~mattohkc/sdpt3.html  (cited 10/22/19)

Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4), 401-419 (cited 9/17/19)

Torgerson, W. S. (1958). Theory and methods of scaling (cited 9/17/19)

Tracy, C. A., & Widom, H. (1994). Level-spacing distributions and the Airy kernel. Communications in Mathematical Physics, 159(1), 151-174 (cited 11/12/19)

Tukey, J. W. (1977) Exploratory data analysis, Pearson, N.Y. ISBN 978-0201076165 (cited 8/20/19)

Vapnik, V. N. (1982) Estimation of dependences based on empirical data, Springer (Russian version, 1979) (cited 10/15/19)

Vapnik, V. N. (1995) The nature of statistical learning theory, Springer (cited 10/15/19)

Venables, W.N. & Ripley, B.D. (2013) Modern applied statistics with S-PLUS. Springer Science & Business Media (cited 8/27/19)

Vidal, R., Ma, Y., & Sastry, S. (2016). Generalized principal component analysis, Springer (cited 10/15/19)

Wahba, G., Lin, Y., & Zhang, H. (1999). Generalized approximate cross validation for support vector machines, or, another way to look at margin-like quantities. (cited 10/22/19)

Wahba, G., Lin, Y., Lee, Y., & Zhang, H. (2003). Optimal properties and adaptive tuning of standard and nonstandard support vector machines. In Nonlinear estimation and classification (pp. 129-147) Springer, New York, NY (cited 10/22/19)

Wand, M. P., & Jones, M. C. (1994). Kernel smoothing. CRC Press (cited 11/14/19)

Wang, B., & Zou, H. (2016). Sparse Distance Weighted Discrimination. Journal of Computational and Graphical Statistics, 25(3), 826-838 (cited 10/22/19)

Wang, H. & Marron, J. S. (2007) Object oriented data analysis: sets of trees, Annals of Statistics, 35, 1849-1873  (cited 8/20/19)

Wei, S., Lee, C., Wichers, L., & Marron, J. S. (2015) Direction-projection-permutation for high dimensional hypothesis tests. Journal of Computational and Graphical Statistics (cited 8/29/19, 11/5/19)

Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A., Ellrott, K., … & Cancer Genome Atlas Research Network. (2013). The cancer genome atlas pan-cancer analysis project. Nature Genetics, 45(10), 1113 (cited 12/03/19)

White, H. (2014). Asymptotic theory for econometricians. Academic Press (cited 11/5/19)

Wilkinson, L., & Friendly, M. (2009). The history of the cluster heat map. The American Statistician, 63(2), 179-184 (cited 9/17/19)

Wilkinson, L. (2017). Visualizing Big Data Outliers through Distributed Aggregation. IEEE Transactions on Visualization and Computer Graphics (cited 10/24/19)

Xiong, J., Dittmer, D. P., & Marron, J. S. (2015). “Virus hunting” using radial distance weighted discrimination. The Annals of Applied Statistics, 9(4), 2090-2109 (cited 11/7/19)

Wright, F. A., Strug, L. J., Doshi, V. K., Commander, C. W., Blackman, S. M., Sun, L., … & Corey, M. (2011). Genome-wide association and linkage identify modifier loci of lung disease severity in cystic fibrosis at 11p13 and 20q13.2. Nature Genetics, 43(6), 539-546 (cited 10/29/19)

Yao, J., Zheng, S., & Bai, Z. D. (2015). Sample covariance matrices and high-dimensional data analysis. Cambridge University Press (cited 11/7/19)

Yata, K., & Aoshima, M. (2009) PCA consistency for non-Gaussian data in high dimension, low sample size context. Communications in Statistics—Theory and Methods, 38(16-17), 2634-2652 (cited 11/5/19)

Yata, K., & Aoshima, M. (2010a) Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix. Journal of Multivariate Analysis, 101(9), 2060-2077 (cited 11/5/19)

Yata, K., & Aoshima, M. (2010b) Intrinsic dimensionality estimation of high-dimension, low sample size data with d-asymptotics. Communications in Statistics—Theory and Methods, 39(8-9), 1511-1521 (cited 11/5/19)

Yata, K., & Aoshima, M. (2012) Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. Journal of Multivariate Analysis, 105(1), 193-215 (cited 11/5/19)

Yata, K., & Aoshima, M. (2013) PCA consistency for the power spiked model in high-dimensional settings. Journal of Multivariate Analysis, 122, 334-354 (cited 11/5/19)

Young, G., & Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3(1), 19-22 (cited 9/17/19)

Yu, Q., Risk, B. B., Zhang, K., & Marron, J. S. (2017). JIVE integration of imaging and behavioral data. NeuroImage, 152, 38-49 (cited 12/03/19)

Yushkevich, P., Pizer, S. M., Joshi, S., and Marron, J. S. (2001) Intuitive, localized analysis of shape variability, Information Processing in Medical Imaging (IPMI), eds. Insana, M. F. and Leahy, R. M. 402-408 (cited 9/19/19)

Zhang, J., Heckman, N., Cubranic, D., Kingsolver, J. G., Gaydos, T., & Marron, J. S. (2014). Prinsimp. R Journal, 6(2) (cited 9/17/19)

Zhang, L., Lu, S., & Marron, J. S. (2015). Nested nonnegative cone analysis. Computational Statistics & Data Analysis, 88, 100-110 (cited 9/24/19)

Zhang, L., Marron, J. S., Shen, H., & Zhu, Z. (2007). Singular value decomposition and its visualization. Journal of Computational and Graphical Statistics, 16(4), 833-854 (cited 10/1/19)

Zhou, Y. H., & Marron, J. S. (2015). High dimension low sample size asymptotics of robust PCA. Electronic Journal of Statistics, 9(1), 204-218 (cited 11/5/19)

Zhou, Y. H., & Marron, J. S. (2016). Visualization of robust L1PCA. Stat, 5(1), 173-184 (cited 10/29/19)

 

Software:

Link to Marron’s Matlab Software (.zip file, expand to 4 directories, and put those in Matlab Path)

LungCancer2011.m, for Analysis of 2011 RNAseq Lung Cancer Data

counts, for 2011 RNAseq Lung Cancer Data

exonsMarron, for 2011 RNAseq Lung Cancer Data

Single .zip file with above 3, plus generated graphics