Statistics and Machine Learning
Research across the disciplines increasingly requires the integration of data science, statistics, and machine learning to achieve cutting-edge advances. Princeton University is dedicated to playing a vital role in preparing students to lead in these areas, and the Center for Statistics and Machine Learning (CSML) is a campus focal point for fulfilling this commitment.
The center’s mission is threefold: to foster and support a community of scholars addressing the challenges of data-driven research; to educate students in the foundations of modern data science, including computation, machine learning, and statistics, along with specific application domains; and to develop innovative methodologies for extracting information from data.
The center supports and collaborates on research and teaching that combine insights from computation, machine learning, and statistics with specific application domains. To encourage the exchange of ideas, CSML welcomes connections with faculty, departments, centers, and institutes across the Princeton campus. In addition, the center supports innovations in the theoretical foundations of data science, including advanced algorithms for big-data problems, machine learning, optimization, and statistics.
Established in July 2014, the Center for Statistics and Machine Learning continues Princeton University’s rich and influential history in data science. Pioneers such as Samuel Wilks, John Tukey, William Feller, Alonzo Church, Alan Turing, and John von Neumann played key roles in advancing the use of statistics, probabilistic models, and computers to solve real-world problems. The Cooley–Tukey FFT algorithm (1965) and the initiation of the ImageNet database (2009) are two prominent examples of Princeton’s contributions to data science.
The Graduate Certificate in Statistics and Machine Learning is designed to formalize the training of students who contribute to or make use of statistics and machine learning as a significant part of their research. In addition, it serves to recognize the accomplishments of graduate students across the University who go beyond the requirements of their own degree programs to acquire additional training in statistics and machine learning.
The certificate is administered by the Center for Statistics and Machine Learning under the direction of the certificate director (currently Prof. Peter Ramadge). The center’s academic program coordinator provides administrative support.
The certificate program is open to Princeton University students currently enrolled in a Ph.D. or master’s program at the University. Students may enroll by completing an online application form on the CSML website; the application includes a tentative plan and timeline for completing the requirements. We encourage eligible students to sign up as soon as possible, preferably in their second or third year, and no later than one semester before graduation. Because Ph.D. students in Dissertation Completion Enrollment (DCE) status are not eligible to enroll in courses, Ph.D. students must enroll in the CSML graduate certificate program early enough to complete the course requirements within their regular degree program length. Ph.D. students must also identify an adviser or reader for their research component who is a participating faculty member in the program.
Upon graduation, enrolled students who complete the certificate requirements have the certificate recorded on their official transcript and are recognized on the CSML website.
The core curriculum is intended to provide training in the foundations of statistics and machine learning while ensuring that certificate students gain breadth across the core areas of both fields.
Students must take for credit, and earn a grade of B+ or better in, three courses from an approved list with three categories: core machine learning, core statistics and probabilistic modeling, and electives. One course must be selected from each category. With the permission of the program director, the elective may instead be selected from a core category, provided it does not significantly overlap with the other course selected from that category. At least one of the three courses must be outside the student's home department, and at most one may be below the 500 level.
Guiding Principles for Course Selection:
1. If your department requires degree students to take the same set of core courses, then none of these courses can count towards the CSML certificate.
2. If your department requires degree students to take a certain number of core courses distributed across designated areas, then none of the courses you select to meet this requirement can count towards the CSML graduate certificate.
3. If your department requires you to select a given number of courses as core courses that indicate preparation for research, none of these courses can count towards the CSML certificate.
4. Beyond “core courses”: if your department requires a designated number of electives, you may use these electives to meet the course requirement for the CSML certificate.
Additionally, before graduation, students must enroll in the CSML graduate seminar (SML 510) for at least one full semester and meet its attendance, participation, and other requirements. The seminar serves as a venue for discussing current methods and results and for integrating different research approaches to data analysis. It helps teach students how to communicate technical ideas to a broad audience and develops skills for interacting with other students, postdoctoral fellows, and faculty who are investigating data analysis problems. It also builds a supporting community of young scholars with shared interests.
For students completing a thesis or dissertation as part of their degree, the thesis or dissertation should include a significant component that contributes to statistics or machine learning, or that makes rigorous use of such methods in an application domain. To ensure that an important component of a Ph.D. student's dissertation involves rigorous data analysis, mathematical or computational modeling of data, or machine learning problems, one of the dissertation readers or FPO committee members must be a participating graduate certificate faculty member. This reader or committee member must send either a letter or the reader's report to the program director verifying that the dissertation satisfies this requirement. Master's students who complete a thesis follow the same requirement.
For non-thesis master's degree students, the original research requirement can be satisfied by completing a relevant graded research project while enrolled in FIN 561 - Master's Project II, or through applied research performed in a professional setting as part of an approved internship. Research performed in a professional setting must be submitted as a technical presentation, which the certificate director reviews for approval.
The original research requirement can also be met by any student through a publishable research paper that is approved by the certificate director.
Ryan P. Adams
Sits with Committee
Daisy Yan Huang
Courses listed below are graduate-level courses that have been approved by the program’s faculty as well as the Curriculum Subcommittee of the Faculty Committee on the Graduate School as permanent course offerings. Permanent courses may be offered by the department or program on an ongoing basis, depending on curricular needs, scheduling requirements, and student interest. Not listed below are undergraduate courses and one-time-only graduate courses, which may be found for a specific term through the Registrar’s website. Also not listed are graduate-level independent reading and research courses, which may be approved by the Graduate School for individual students.