Statistics and Machine Learning
Academic Year 2024 – 2025

General Information
Address: 26 Prospect Ave
Phone: 609-258-2047
Website: Center for Statistics and Machine Learning
Program Offerings: Certificate
Director of Graduate Studies: Jonathan Cohen
Graduate Program Administrator: Susan Johansen

Overview
As the field of data science grows and opens new opportunities in many different disciplines, Princeton University’s Center for Statistics and Machine Learning (CSML) has kept pace by fostering cutting-edge research and engaging in deep collaboration with faculty members, centers and departments across campus and with outside practitioners in varied industries. Graduate students are well positioned at Princeton due to the center’s strong interdisciplinary nature and the wide array of exciting research happening on campus.

Princeton graduate students can now earn a Graduate Certificate in Statistics and Machine Learning as a complement to their departmental graduate studies. This certificate is overseen by CSML. The certificate has three requirements: completing appropriate course work, engaging in research involving statistics or machine learning, and participating in the CSML graduate seminar.

Data-driven research increasingly involves large-scale complex data that needs to be analyzed using innovative methods from the fields of statistics and machine learning. This approach, coupled with advancing data science’s underlying methods and algorithms, has become an essential component of modern scientific discovery. Princeton University is committed to playing a vital role in preparing students to lead in these areas, and the certificate should deepen and enhance students’ understanding and application of data science techniques.
Program Offerings
Program Offering: Certificate

Program description
The Graduate Certificate Program in Statistics and Machine Learning is designed to formalize the training of students who contribute to or make use of statistics and machine learning as a significant part of their degree program. In addition, it serves to recognize the accomplishments of graduate students across the University who acquire additional training in statistics and machine learning, going beyond the requirements of their own degree programs.

This certificate program is open to Princeton University students currently enrolled in a Ph.D. or master’s program at the University. Students must enroll by completing an online application form on the CSML website. The application will include a tentative plan and timeline for completing all the course requirements. Students are encouraged to sign up as soon as possible, and no later than one semester prior to graduation. Because Ph.D. students who have entered Dissertation Completion Enrollment (DCE) status are not eligible to enroll in courses, Ph.D. students must enroll in the CSML graduate certificate program in time to complete the course requirements while they are still within their regular degree program length.

For enrollment, please use this form: Graduate Certificate Enrollment Form
For questions, contact us at [email protected]

For students enrolled in a graduate degree program with a thesis or dissertation requirement, the certificate comprises three components: (a) completion of three appropriate graduate courses, (b) a relevant research contribution, and (c) a research seminar. We expect that the core courses can be taken as graduate electives, in partial fulfillment of the various course requirements in home departments, and that item (b) will naturally form part of the student’s thesis or dissertation. For non-thesis master’s students, item (b) is replaced by an appropriate technical presentation.
The certificate will appear on a student’s official transcript after all requirements for the certificate have been fulfilled and a graduate degree has been awarded. Students who earn the certificate will also be recognized on the CSML website.

Courses
Take three courses for credit from the approved list and earn an average GPA of B+ (3.3) or better across them. The approved list has three categories: core machine learning, core statistics and probabilistic modeling, and electives. One course must be selected from each category. With the permission of the certificate director, the elective course can be selected from a core category provided it does not significantly overlap with the other course selected from that category. At least one of the three courses must be outside the student’s home department, and at most one course can be below the 500 level.

The core curriculum is intended to provide training in the foundations of statistics and machine learning while ensuring that certificate students have some breadth across the core of statistics and machine learning. A list of approved core courses in the two areas is included below. In addition, a certificate student selects the third course from a listed set of elective courses that expands on the core courses. These electives delve more deeply into supporting material (e.g., optimization) or focus on applications in a specific domain.

Students may not count courses that are used to satisfy core requirements in their home department concentration toward this certificate; however, they may count up to two electives that were taken for their degree requirements.

Additional requirements

Seminar series (SML 510)
The CSML graduate seminar, SML 510, serves as a venue for discussing current methods and results and the integration of different research approaches to data analysis. Attendance and participation in the CSML graduate seminar for at least one semester is required.
It helps teach students how to communicate technical ideas to a broad audience and encourages the development of skills for interacting with other students, postdoctoral fellows, and faculty who are investigating data analysis problems. It also serves to build a supporting community of young scholars with shared interests.

Original Research
For students completing a thesis or dissertation as part of their degree, the thesis or dissertation should include a significant component making contributions to statistics or machine learning, or rigorous use of such methods in an application domain. To ensure that an important component of a Ph.D. student's dissertation involves rigorous data analysis and/or mathematical or computational modeling of data or machine learning problems, one of the dissertation readers or FPO committee members must be a participating graduate certificate faculty member. This reader or committee member is required to send either a letter or the reader's report to the program director to verify that the dissertation satisfies this requirement. Master's students who complete a thesis follow the same requirement.

For non-thesis master's degree students, the original research requirement can be satisfied by completing a relevant graded research project while enrolled in FIN 561 - Master's Project II, or through applied research performed in a professional setting as part of an approved internship. Such research performed in a professional setting must be submitted as a technical presentation and is reviewed for approval by the certificate director.

The original research requirement can also be met by any student through a publishable research paper that is approved by the certificate director.

Faculty

Director
Sarah-Jane Leslie

Executive Committee
Ryan P. Adams, Computer Science, ex officio
Sarah-Jane Leslie, Philosophy
Peter M. Melchior, Astrophysical Sciences
Brandon M. Stewart, Sociology
Ellen Zhong, Computer Science

Associated Faculty
Sigrid M. Adriaenssens, Civil and Environmental Engineering
Amir Ali Ahmadi, Operations Research and Financial Engineering
Sanjeev Arora, Computer Science
Yacine Aït-Sahalia, Economics
Matias D. Cattaneo, Operations Research and Financial Engineering
Danqi Chen, Computer Science
Jonathan D. Cohen, Psychology
Jia Deng, Computer Science
Jianqing Fan, Operations Research and Financial Engineering
Jaime Fernandez Fisac, Electrical and Computer Engineering
Filiz Garip, Sociology
Tom Griffiths, Psychology
Boris Hanin, Operations Research and Financial Engineering
Elad Hazan, Computer Science
Bo E. Honoré, Economics
Niraj K. Jha, Electrical and Computer Engineering
Chi Jin, Electrical and Computer Engineering
Jason Matthew Klusowski, Operations Research and Financial Engineering
Michal Kolesár, Economics
Sanjeev R. Kulkarni, Electrical and Computer Engineering
Jason D. Lee, Electrical and Computer Engineering
Naomi E. Leonard, Mechanical and Aerospace Engineering
Sarah-Jane Leslie, Philosophy
John B. Londregan, School of Public and International Affairs
Anirudha Majumdar, Mechanical and Aerospace Engineering
William A. Massey, Operations Research and Financial Engineering
Reed M. Maxwell, Civil and Environmental Engineering
Peter M. Melchior, Astrophysical Sciences
Ulrich K. Mueller, Economics
Karthik Narasimhan, Computer Science
Jonathan W. Pillow, Psychology
H. Vincent Poor, Electrical and Computer Engineering
Yuri Pritykin, Computer Science
Olga Russakovsky, Computer Science
Matthew J. Salganik, Sociology
Amit Singer, Mathematics
Mona Singh, Computer Science
Bartolomeo Stellato, Operations Research and Financial Engineering
Brandon M. Stewart, Sociology
John D. Storey, Integrative Genomics
Michael A. Strauss, Astrophysical Sciences
Rocío Titiunik, Politics
Jeroen Tromp, Geosciences
Olga G. Troyanskaya, Computer Science
Mark W. Watson, School of Public and International Affairs
Michael A. Webb, Chemical and Biological Engineering

For a full list of faculty members and fellows, please visit the department or program website.
Permanent Courses
Courses listed below are graduate-level courses that have been approved by the program’s faculty as well as the Curriculum Subcommittee of the Faculty Committee on the Graduate School as permanent course offerings. Permanent courses may be offered by the department or program on an ongoing basis, depending on curricular needs, scheduling requirements, and student interest. Not listed below are undergraduate courses and one-time-only graduate courses, which may be found for a specific term through the Registrar’s website. Also not listed are graduate-level independent reading and research courses, which may be approved by the Graduate School for individual students.

COS 513 - Foundations of Probabilistic Modeling (also SML 513)
A study of the essential tools for analyzing the vast amounts of data that have become available in modern scientific research. Mathematical foundations of the field will be studied, along with the methods underlying the current state of the art. Probabilistic graphical models and a unifying formalism for describing and extending previous methods from statistics and engineering will be considered. Prerequisites: COS 402 or COS 424. Undergraduates by permission only.

PHI 543 - Machine Learning: A Practical Introduction for Humanists and Social Scientists (also SML 543)
Machine learning, especially deep learning, is opening new horizons for research in the humanities and social sciences. This course offers a practical introduction to deep learning for graduate students, without assuming calculus/linear algebra or prior experience with coding. By the end of the course, students are able to code a variety of models themselves, including language and image recognition models, and gain an appreciation for the uses of ML in the humanities/social sciences. The course thus aims to support graduate students' professional development and is correspondingly offered in partnership with GradFUTURES.

SML 505 - Modern Statistics (also AST 505)
The course provides an introduction to modern statistics and data analysis. It addresses the question, "What should I do if these are my data and this is what I want to know?" The course adopts a model-based, largely Bayesian, approach. It introduces the computational means and software packages to explore data and infer underlying parameters from them. An emphasis will be put on streamlining model specification and evaluation by leveraging probabilistic programming frameworks. The topics are exemplified by real-world applications drawn from across the sciences.

SML 510 - Graduate Research Seminar
This course is for graduate students enrolled in the CSML Graduate Certificate Program and is part of the certificate requirements. Students enrolled in the certificate must enroll, attend and present their research during at least one semester. Each week features a presentation by a student, invited faculty or external visitors. All students are required to read materials prior to the workshop and come prepared to engage in conversation. Each week, one student presents, a second student introduces the speaker and gives background on the work, and a third student moderates the post-presentation discussion.

SML 515 - Topics in Statistics and Machine Learning (also SOC 516)
The course provides an introduction to modern data analysis and data science. It addresses the central question, "What should I do if these are my data and this is what I want to know?" The course covers basic and advanced statistical descriptions of data. It also introduces the computational means and software packages to explore data and infer underlying structural parameters from them. The topics are exemplified by real-world applications. Prerequisites are linear algebra, multivariate analysis, and a familiarity with basic statistics and programming (ideally in Python).