Research across the disciplines increasingly requires the integration of data science, statistics and machine learning to make cutting-edge advancements. Princeton University is dedicated to playing a vital role in preparing students to lead in these areas, and the Center for Statistics and Machine Learning (CSML) is a campus focal point for fulfilling this commitment.
The center’s mission is threefold: to foster and support a community of scholars addressing the challenges of data-driven research; to educate students in the foundations of modern data science including computation, machine learning, and statistics, along with specific application domains; and to develop innovative methodologies for extracting information from data.
In addition, the center supports innovations in the theoretical foundations of data science, including advanced algorithms for big-data problems, machine learning, optimization, and statistics.
Established in July 2014, the Center for Statistics and Machine Learning continues Princeton University’s rich and influential history in data science. Pioneers such as Samuel Wilks, John Tukey, William Feller, Alonzo Church, Alan Turing, and John von Neumann played key roles in advancing the use of statistics, probabilistic models, and computers to solve real-world problems. The Cooley–Tukey FFT algorithm (1965) and the initiation of the ImageNet database (2009) are two prominent examples of Princeton’s important contributions to data science.
The Graduate Certificate in Statistics and Machine Learning is designed to formalize the training of students who contribute to, or make use of, statistics and machine learning as a significant part of their research. In addition, it serves to recognize the accomplishments of graduate students across the University who go beyond the requirements of their own degree programs to acquire additional training in statistics and machine learning.
The graduate certificate comprises three components: (a) completion of three appropriate graduate courses, (b) a relevant research contribution, and (c) a research seminar. We expect that the core courses can be taken as graduate electives, in partial fulfillment of the various course requirements in home departments, and that the research component will naturally form part of the student's thesis or other research paper. Each enrolled student who completes the requirements will be awarded a certificate and recognized on the CSML website.
This certificate program is open to Princeton students currently enrolled in a PhD or master's program at the University. Students may enroll by completing an online application form on the CSML website. The application will include a tentative plan and timeline for completing all the course requirements. Students are encouraged to sign up as soon as possible, preferably in their second or third year, but no later than one semester prior to graduation.
Students are required to take a total of three courses from the approved lists and earn at least a B+ in each course: one Core Machine Learning, one Core Statistics and Probabilistic Modeling, and one Elective. With the permission of the program director, the elective course can be selected from a core category, provided it does not significantly overlap with the other course selected from that category. At least one of the three courses must be outside the student's home department, and at most one course can be below the 500 level. Students may not count courses that are used to satisfy core requirements in their home department concentration toward this certificate; however, they may count up to two electives that were taken for their degree requirements.
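The course rules in the preceding paragraph can be read as a short checklist. The Python sketch below is purely illustrative and is not an official CSML tool. The `Course` record, the `check_plan` function, the category labels, the grade ordering, and the use of three-letter course prefixes as department codes are all assumptions made for this example. It checks only the count, category, grade, department, and course-level rules. It does not model director permission for core-category electives or the restrictions on courses counted toward home department requirements.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed grade ordering, best to worst; only "at least a B+" comes from the text.
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"]
MIN_GRADE = "B+"


@dataclass(frozen=True)
class Course:
    """Hypothetical record for one course in a student's plan."""
    code: str        # e.g. "COS 511"
    counted_as: str  # "core_ml", "core_stats", or "elective"
    grade: str       # letter grade, e.g. "A-"


def level(code: str) -> int:
    """Return the first three-digit number in a course code ('ELE 538B' -> 538)."""
    return int(re.search(r"\d{3}", code).group())


def dept(code: str) -> str:
    """Return the leading three-letter prefix, used here as a department code."""
    return re.match(r"[A-Z]{3}", code).group()


def check_plan(courses: list[Course], home_dept_code: str) -> list[str]:
    """Return a list of rule violations; an empty list means the plan passes."""
    problems = []
    if len(courses) != 3:
        problems.append("exactly three courses are required")
    counts = Counter(c.counted_as for c in courses)
    for category in ("core_ml", "core_stats", "elective"):
        if counts[category] != 1:
            problems.append(f"need exactly one course counted as {category}")
    for c in courses:
        # A larger index in GRADE_ORDER means a lower grade.
        if GRADE_ORDER.index(c.grade) > GRADE_ORDER.index(MIN_GRADE):
            problems.append(f"{c.code}: grade {c.grade} is below {MIN_GRADE}")
    if all(dept(c.code) == home_dept_code for c in courses):
        problems.append("at least one course must be outside the home department")
    if sum(level(c.code) < 500 for c in courses) > 1:
        problems.append("at most one course may be below the 500 level")
    return problems


plan = [
    Course("COS 511", "core_ml", "A"),
    Course("ORF 524", "core_stats", "A-"),
    Course("ELE 477", "elective", "B+"),
]
print(check_plan(plan, "COS"))  # prints []
```

Returning a list of messages rather than a single pass/fail flag lets a student see every rule a tentative plan misses at once.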
List of Approved Courses
Core Machine Learning
COS 402: Machine Learning and Artificial Intelligence
ELE 535: Machine Learning and Pattern Recognition
COS 424: Fundamentals of Machine Learning
COS 485: Neural Networks: Theory and Applications
COS 511: Theoretical Machine Learning
Core Statistics and Probabilistic Modeling
ECO 513: Time Series Econometrics
ECO 519: Advanced Econometrics: Nonlinear Models
ORF 524: Statistical Theory and Methods
COS 513: Foundations of Probabilistic Modeling
ELE 530: Estimation and Detection
POL 572: Quantitative Analysis II
QCB 508: Foundations of Applied Statistics and Data Science
APC 527: Random Graphs and Networks
ELE 477: Kernel-Based Machine Learning
ORF 505: Statistical Analysis of Financial Data
POL 573: Quantitative Analysis III
POP 507: Generalized Linear Statistical Models
ORF 522: Linear and Nonlinear Optimization
ECO 515: Econometric Modeling
ELE 538B: Sparsity, Structure, and Inference
ELE 538C: Large-Scale Optimization in Data Science
MAT 585/APC 520: Mathematical Analysis of Massive Data Sets
ORF 523: Convex and Conic Optimization
ORF 525: Statistical Learning & Nonparametric Estimation
POL 574: Quantitative Analysis IV
SOC 504: Advanced Social Statistics
To ensure that an important component of the student's thesis involves rigorous data analysis and/or mathematical or computational modeling of data or machine learning problems, one of the thesis or research paper readers must be a participating graduate certificate faculty member (see the CSML website for the list). This reader will be required to send either a letter or their reader's report to the program director to verify that the research satisfies this requirement.
The CSML graduate seminar course serves as a venue for reporting current results and discussing the integration of different research approaches to data analysis. Enrollment, attendance, and participation in the CSML graduate seminar for at least one semester help teach students how to communicate their research to a broad audience and encourage the development of skills for interacting with other students, postdoctoral fellows, and faculty who are investigating data analysis problems. The seminar also serves to build a supportive community of young scholars with shared interests.
Peter J. Ramadge, Electrical Engineering
Jianqing Fan, Operations Research and Financial Engineering
Elad Hazan, Computer Science
Kosuke Imai, Politics
Jonathan Pillow, Psychology
Matthew Salganik, Sociology
H. Sebastian Seung, Computer Science
Christopher Sims, Economics
Amit Singer, Mathematics
Mona Singh, Computer Science
John D. Storey, Genomics
Olga Troyanskaya, Genomics