Featured Content
Data Science & Analytics Projects
Report to Congress, Vehicle Safety Recall Completion Rates 2021
A report examining trends in vehicle safety recall completion rates and identifying risk factors associated with lower completion rates. (A recall completion rate is the share of recalled vehicles that have been repaired or otherwise remedied.)
I conducted the modeling and drafted sections IIIc, IIId, V, and VI. In addition to the Williams-adjusted fixed-effects logistic regression described in the report, I built decision trees and generalized linear models using LASSO, stepwise selection, and multi-fold cross-validation. NHTSA implemented my champion model to identify low-performing recalls for follow-up.
An Analysis of Recent Improvements to Vehicle Safety
A study of improvements to vehicle safety, using negative binomial, log-linear, logistic, generalized logistic, and cumulative logistic models.
My study showed that these improvements collectively prevented over 700,000 crashes in a single year and prevented or mitigated over one million injuries.
Kaggle Predicting Optimal Fertilizers Challenge
Data science solution developed for a Kaggle prediction challenge involving agricultural fertilizer optimization, illustrating my skills in feature engineering, model development, and evaluation on complex real-world data.
Machine Learning Deployments & APIs
Fraud Detection Supervised Learning Blog
In this series of posts, I explore advanced machine learning techniques in fraud detection, focusing on business objectives, visualizations, model deployments, and the underlying math.
I investigate how to optimize the investigative resources that supplement hourly model runs targeting a given precision and recall. I also build two deployments with a tuned XGBoost model: an interactive Databricks dashboard for monitoring fraud analytics and a Streamlit-based fraud detection API on Hugging Face Spaces.
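One common way to run a model at a target precision is to sweep the score threshold and keep the lowest cutoff whose precision still meets the target, since lower cutoffs flag more cases and therefore raise recall. The plain-Python sketch below illustrates that idea under stated assumptions: labels are 0/1, cases with a score at or above the cutoff are flagged, and the function name `threshold_for_precision` and the toy scores are hypothetical, not taken from the blog's code.

```python
def threshold_for_precision(scores, labels, target_precision):
    """Hypothetical helper: lowest cutoff whose precision meets the target.

    Cases with score >= threshold are flagged. Returns a tuple
    (threshold, precision, recall), or None if no cutoff reaches the target.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    best = None
    for i, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Tied scores share one cutoff, so evaluate only at the end of a tie group
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue
        precision = tp / (tp + fp)
        if precision >= target_precision:
            # Lower cutoffs flag more cases; keep overwriting to maximize recall
            best = (score, precision, tp / total_pos)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(threshold_for_precision(scores, labels, 0.75))  # (0.6, 0.75, 1.0)
```

Because precision is not monotonic in the threshold, the loop checks every cutoff rather than stopping at the first one that misses the target.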
Databricks Fraud Detection Dashboard
An interactive dashboard built on Databricks for monitoring fraud analytics, featuring real-time data visualizations and model performance tracking.
Fraud Detection API on Hugging Face Spaces
A Streamlit API deployed on Hugging Face Spaces that lets users input transaction features and generate fraud predictions from a deployed XGBoost model.
Statistical Analyses
Designing Samples to Satisfy Many Variance Constraints, 2001 FCSM
This paper presents and proves an algorithm that finds optimal sample sizes meeting nested univariate constraints on the coefficients of variation of a Horvitz-Thompson estimator under stratified simple random sampling.
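For context, under stratified simple random sampling the Horvitz-Thompson estimator of a population total is Ŷ = Σ_h N_h ȳ_h, with variance Σ_h N_h²(1 − n_h/N_h)S_h²/n_h, and its coefficient of variation is the square root of that variance divided by the total Y. The sketch below computes this CV and, purely as a stand-in, meets a single CV target with a simple greedy heuristic that adds one unit at a time to the stratum giving the largest variance reduction. It is not the paper's algorithm, which handles many nested constraints; the function names, the minimum of two units per stratum, and the example inputs are hypothetical.

```python
import math


def stratified_total_cv(N, S, n, Y):
    """CV of the stratified SRS (Horvitz-Thompson) estimator of a total.

    N: stratum population sizes, S: stratum standard deviations,
    n: stratum sample sizes, Y: population total (assumed known here).
    """
    # Var = sum_h N_h^2 (1 - n_h/N_h) S_h^2 / n_h, including the finite population correction
    var = sum(Nh ** 2 * (1 - nh / Nh) * Sh ** 2 / nh for Nh, Sh, nh in zip(N, S, n))
    return math.sqrt(var) / Y


def greedy_allocation(N, S, Y, cv_target, n_min=2):
    """Hypothetical greedy heuristic for a single CV constraint (not the paper's method)."""
    n = [min(n_min, Nh) for Nh in N]
    while stratified_total_cv(N, S, n, Y) > cv_target:
        best_h, best_gain = None, 0.0
        for h, (Nh, Sh) in enumerate(zip(N, S)):
            if n[h] >= Nh:
                continue  # stratum already fully enumerated
            # Variance drop from adding one unit to stratum h
            gain = Nh ** 2 * Sh ** 2 * (1 / n[h] - 1 / (n[h] + 1))
            if gain > best_gain:
                best_h, best_gain = h, gain
        if best_h is None:
            raise ValueError("CV target cannot be reached")
        n[best_h] += 1
    return n


N = [1000, 500, 200]
S = [10.0, 20.0, 40.0]
Y = 114_000.0
n = greedy_allocation(N, S, Y, cv_target=0.01)
print(n, sum(n), round(stratified_total_cv(N, S, n, Y), 4))
```

Each added unit goes where it buys the most variance reduction, so the allocation leans toward strata with large N_h S_h, much as Neyman allocation does.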
Estimating the Lives Saved by Safety Belts and Air Bags, 2003 ESV
This paper, presented at the 2003 International Technical Conference on the Enhanced Safety of Vehicles (ESV), describes changes to how the lives saved by safety belts and air bags are calculated.
It also discusses alternative methods for attributing a life saved to either the safety belt or the air bag when an occupant is protected by both devices.
NHTSA's Review of the National Automotive Sampling System, Report to Congress
I conducted the analysis in Chapter 8 of this report, which calculates the recommended numbers of investigations, crash reports, and data collection sites for NHTSA's two premier crash databases (now called the Crash Report Sampling System and the Crash Investigation Sampling System). This chapter, which I drafted, also presents the analyses that could be conducted and the conclusions that could be reached with the recommended sample sizes.
The Relationship between Occupant Compartment Deformation and Occupant Injury
This report, which I cowrote with an NHTSA engineer, analyzes the relationship between occupant compartment deformation and occupant injury.
Mathematics Research
While the results of these papers do not directly apply to data science, statistics, or machine learning, they showcase my mathematical acumen and my ability to approach and communicate complex mathematical ideas with rigor and clarity.
Resume