Ph.D. Student, Computer Science
About Me
I'm a Ph.D. student in Computer Science at Carnegie Mellon University, where I work with Prof. Zico Kolter on steering frontier generative AI models toward greater safety, robustness, and efficiency. I work at the intersection of theoretical insight and practical scalability. My research connects the mathematics of high-dimensional learning (differential geometry, stochastic differential equations, optimal transport) with methods for training and steering generative models (spanning pretraining, fine-tuning, reinforcement learning, and controlled decoding), and brings these ideas to life at scale using PyTorch, JAX, CUDA, Triton, and modern distributed systems (DeepSpeed, FSDP, Megatron).
If you're interested in discussing new ideas or collaborating, feel free to drop me an email or schedule a meeting with me here!
Education
-
Ph.D. in Computer Science.
Aug 2021 - Current
Carnegie Mellon University, Pittsburgh, PA
-
M.S. in Statistics.
Mar 2015 - Jun 2017
Stanford University, Palo Alto, CA
-
B.S. in Computer Science.
Sep 2013 - Jun 2017
Stanford University, Palo Alto, CA
Conference Publications
(For the most up-to-date list, see my Google Scholar page.)
-
2025
-
Antidistillation Sampling.
Y. Savani*, A. Trockman*, Z. Feng, Y. E. Xu, A. Schwarzschild, A. Robey, M. Finzi, J. Z. Kolter.
Advances in Neural Information Processing Systems (NeurIPS) 2025.
-
Safety Pretraining: Toward the Next Generation of Safe AI.
P. Maini, S. Goyal, D. Sam, A. Robey, Y. Savani, Y. Jiang, A. Zou, M. Fredrikson, Z. C. Lipton, J. Z. Kolter.
Advances in Neural Information Processing Systems (NeurIPS) 2025.
-
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning.
Y. E. Xu*, Y. Savani*, F. Fang, J. Z. Kolter.
arXiv preprint.
-
-
2024
-
Diffusing Differentiable Representations.
Y. Savani, M. Finzi, J. Z. Kolter.
Advances in Neural Information Processing Systems (NeurIPS) 2024.
-
-
2022
-
Deep Equilibrium Optical Flow Estimation.
S. Bai, Z. Geng, Y. Savani, J. Z. Kolter.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
-
-
2021
-
NAS-Bench-x11 and the Power of Learning Curves.
S. Yan, C. White, Y. Savani, F. Hutter.
Advances in Neural Information Processing Systems (NeurIPS) 2021.
-
Exploring the Loss Landscape in Neural Architecture Search.
C. White, S. Nolen, Y. Savani.
Conference on Uncertainty in Artificial Intelligence (UAI). PMLR, 2021.
-
BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search.
C. White, W. Neiswanger, Y. Savani.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2021.
-
-
2020
-
Intra-Processing Methods for Debiasing Neural Networks.
Y. Savani, C. White, N. S. Govindarajulu.
Advances in Neural Information Processing Systems (NeurIPS) 2020.
-
A Study on Encodings for Neural Architecture Search.
C. White, W. Neiswanger, S. Nolen, Y. Savani.
Advances in Neural Information Processing Systems (NeurIPS) 2020.
-
Workshop Publications
-
2020
-
A Study on Encodings for Neural Architecture Search.
C. White, W. Neiswanger, S. Nolen, Y. Savani.
-
Local Search is State of the Art for Neural Architecture Search Benchmarks.
C. White, S. Nolen, Y. Savani.
-
-
2019
-
Neural Architecture Search via Bayesian Optimization with a Neural Network Prior.
C. White, W. Neiswanger, Y. Savani.
-
Deep Uncertainty Estimation for Model-based Neural Architecture Search.
C. White, W. Neiswanger, Y. Savani.
-
Experience
-
I worked on improving RL methods to fine-tune flow-based models.
-
I conducted research in AutoML / NAS and fairness in ML; this work led to five papers.
-
I designed and implemented scalable deep learning architectures, including LSTM forecasting models, AutoML / NAS regression and classification models, GAN data augmentation models, and VAE anomaly detection models.
-
I worked on improving contemporary statistical learning and applied graph theory models for natural language applications. The machine intelligence algorithms I developed help decipher global news data.
-
Research Assistant at Andrew Ng's Lab
Stanford University, CA
Jul 2015 - Sep 2015
I worked on the system infrastructure and CUDA code for a hybrid CNN and LSTM architecture designed to instantly detect and semantically segment images and videos with multiple stimuli.
-
Cofounder (CTO) at Ebotic, Inc.
Palo Alto, CA
Jul 2014 - Dec 2015
I worked with an international team to develop an intelligent drone platform that applied advanced flight technologies, SLAM, and deep learning for improved flight stability and awareness.
-
Research Assistant at Sebastian Thrun's Lab
Stanford University, CA
Jun 2014 - Aug 2014
I improved the performance of machine learning algorithms for smart home applications by adding thermal image descriptors into a robotics pipeline.
Teaching
-
2025
-
2023
-
2020
-
2019
Skills
-
Computer Languages
Python, Julia, C / C++, CUDA, JavaScript, R, Java, MATLAB, Haskell, LaTeX, SQL, NoSQL, HTML5 / CSS3
-
Frameworks / Tools
PyTorch, JAX, TensorFlow, NumPy, Matplotlib, Jupyter, spaCy, NLTK, AllenNLP, Linux, AWS, GCP, Docker, Git, Cursor, Codex, Visual Studio Code, Vim, uv, React, Redux, Webpack, Flask, Blender, Photoshop, Figma
-
Other Interests
Analysis, algebra, topology, incentive theory, economics, cognitive science, neuroscience, videography, scuba diving, rock climbing, fitness