Projects
Simulation Studies on Optimal Experimental Design under Budget Constraints

Keywords: Simulation Studies, Optimal Experimental Design

This study investigates optimal experimental designs under budget constraints through comprehensive simulation studies, focusing on clustered study designs where treatments are assigned at the cluster level. Using the ADEMP framework, we systematically examined how the number of clusters (G) and observations per cluster (R) influence estimation accuracy under various parameter configurations and cost constraints. Our simulations explored both normal and Poisson-distributed outcomes, incorporating key parameters including treatment effect size (beta), between-cluster variance (gamma squared), within-cluster variance (sigma squared), and the ratio of cluster to individual-level costs (c1/c2). Results show that while the mean parameters (alpha, beta) had negligible impact on optimal designs, variance components significantly influenced design efficiency. Higher between-cluster variance generally favored designs with more clusters and fewer observations per cluster, particularly for Poisson-distributed outcomes. The cost ratio emerged as a crucial determinant, with higher c1/c2 ratios leading to fewer but larger clusters. These findings provide practical guidance for researchers designing cluster-randomized trials under budget constraints, demonstrating how to optimize resource allocation between clusters and within-cluster observations based on distributional assumptions, variance components, and cost structures.
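The trade-off between G and R can be made concrete with a closed-form approximation. The sketch below is not the project's simulation code. It assumes a normal random-intercept model with treatment assigned to half of the clusters, for which Var(β̂) ≈ 4(γ² + σ²/R)/G. It also assumes a hypothetical cost model in which the first observation in a cluster costs c1 and each additional observation costs c2. The function names `approx_var_beta` and `best_design` are illustrative.

```python
def approx_var_beta(G, R, gamma2, sigma2):
    """Approximate Var(beta_hat) for a balanced two-arm cluster design.

    Assumes Y_ij = alpha + beta * X_i + u_i + e_ij with u_i ~ N(0, gamma2),
    e_ij ~ N(0, sigma2), and G/2 clusters per arm. Each arm mean then has
    variance (gamma2 + sigma2 / R) / (G / 2); their difference has twice that.
    """
    return 4.0 * (gamma2 + sigma2 / R) / G


def best_design(budget, c1, c2, gamma2, sigma2, max_R=200):
    """Grid-search R and spend the budget on as many clusters as fit.

    Hypothetical cost model: the first observation in a cluster costs c1,
    each additional observation in the same cluster costs c2.
    Returns (G, R, approximate variance) for the best feasible design.
    """
    best = None
    for R in range(1, max_R + 1):
        cost_per_cluster = c1 + (R - 1) * c2
        G = int(budget // cost_per_cluster)
        G -= G % 2  # keep G even so both arms get G/2 clusters
        if G < 4:   # require at least two clusters per arm
            continue
        v = approx_var_beta(G, R, gamma2, sigma2)
        if best is None or v < best[2]:
            best = (G, R, v)
    return best


# Same budget and variance components, cheap vs. expensive clusters
print(best_design(10_000, c1=20, c2=10, gamma2=1, sigma2=10))    # G=250, R=3
print(best_design(10_000, c1=1000, c2=10, gamma2=1, sigma2=10))  # G=8, R=26
```

Under this toy model, raising c1/c2 pushes the optimum toward fewer, larger clusters. Raising γ² relative to σ² pushes it back toward more clusters with fewer observations each. Both shifts run in the same direction as the findings summarized above.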

[Github][Report]

Analysis on the Cessation Effect on Smoking among Patients with MDD with the Combination Treatment of Behavioral Activation and Varenicline

Keywords: Exploratory Data Analysis, Linear Regression Models

This study investigates the effectiveness of behavioral activation and varenicline for smoking cessation among individuals with current or past Major Depressive Disorder (MDD). Using data from a randomized, placebo-controlled, 2×2 factorial trial of 300 adult smokers, we employed regularized regression approaches (LASSO, Ridge, and Elastic Net) to examine treatment effects and their moderators. The analysis focused on identifying baseline characteristics that might influence treatment outcomes and exploring potential interactions between treatments and patient characteristics.
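The three penalties differ only in how they shrink coefficients. The sketch below fits the elastic-net objective by coordinate descent, where `alpha = 1` gives the LASSO and `alpha = 0` gives ridge. For simplicity it uses a squared-error loss on simulated data. If the outcome is binary (for example, abstinent or not), a penalized logistic model would be the natural choice instead. The predictors and coefficients here are invented, not taken from the trial.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward 0 by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
        (1 / 2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||^2).

    Assumes standardized columns of X and a centered y, so no intercept is fitted.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) x_j^T x_j, ~1 after standardizing
    r = y - X @ b                       # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # partial residual excluding predictor j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            r -= X[:, j] * b[j]
    return b


# Simulated data: hypothetical, only the first two predictors matter
rng = np.random.default_rng(0)
n, p = 300, 10
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_b = np.zeros(p)
true_b[:2] = [2.0, -1.5]
y = X @ true_b + rng.standard_normal(n)
y = y - y.mean()

lasso = elastic_net(X, y, lam=0.2, alpha=1.0)
ridge = elastic_net(X, y, lam=0.2, alpha=0.0)
print(np.round(lasso, 2))
print(int((lasso == 0).sum()), int((ridge == 0).sum()))  # exact zeros: LASSO vs ridge
```

With the same `lam`, the LASSO sets most noise coefficients exactly to zero, while ridge only shrinks them. This is why the L1 penalty is useful for screening candidate moderators. Treatment-by-moderator interactions would enter simply as extra columns of `X`.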

[Github][Report]

Marathon Performance Analysis: Impact of Different Weather Conditions on Runners

Keywords: Exploratory Data Analysis, Linear Regression Models, Data Visualization

Purpose: The purpose of this report is to analyze the impact of various weather conditions on marathon performance, focusing on how age, gender, and environmental factors influence race completion times. The study examines data from five major marathons to identify which factors have the most significant effects on marathon performance and whether these effects differ across age and gender.

Methods: The analysis used both data visualization and linear regression models to examine the impact of intrinsic factors (age, gender) and extrinsic factors (weather variables such as temperature, humidity, wind, and solar radiation) on marathon completion times. Two datasets (Marathon Data and Course Record Data) were combined. Exploratory plots, including boxplots, scatter plots, and regression lines, were then used to visualize trends. Linear regression models were then fitted to quantify the effect of each variable and their interactions, with particular attention to how the influence of weather differs by age and gender.

Results: The study found that weather conditions, particularly the Wet Bulb Globe Temperature (WBGT), significantly affect marathon performance, with higher WBGT values associated with longer completion times. Age and gender also influence results: males generally outperform females, and younger runners generally perform better than older ones. The effect of weather does not differ significantly between genders but does vary with age, with older runners being more affected.
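An age-dependent weather effect corresponds to an age × WBGT interaction term in a linear regression. The sketch below fits such a model by ordinary least squares on simulated data. The variable names, the coefficient values, and the choice to center age and WBGT are assumptions for illustration, not the project's dataset or estimates.

```python
import numpy as np

# Simulated runners: all values are invented for illustration
rng = np.random.default_rng(42)
n = 2000
age = rng.uniform(20, 70, n)
wbgt = rng.uniform(5, 25, n)          # Wet Bulb Globe Temperature, deg C
male = rng.integers(0, 2, n)          # 1 = male, 0 = female

# Centering keeps the main effects interpretable at age 45 and WBGT 15
age_c, wbgt_c = age - 45, wbgt - 15
finish = (240 + 1.0 * age_c + 1.5 * wbgt_c - 20 * male
          + 0.02 * age_c * wbgt_c + rng.normal(0, 5, n))   # minutes

# Design matrix: intercept, main effects, and the age x WBGT interaction
X = np.column_stack([np.ones(n), age_c, wbgt_c, male, age_c * wbgt_c])
coef, *_ = np.linalg.lstsq(X, finish, rcond=None)
print(np.round(coef, 3))  # [intercept, age, WBGT, male, age x WBGT]
```

With centered predictors, the WBGT coefficient is the per-degree effect for a 45-year-old. The effect at another age a is `coef[2] + coef[4] * (a - 45)`. A positive interaction coefficient means that heat costs older runners more time.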

[Github][Report]

Minimalistic Portfolio Template for Academics: A Modern and Easy Way

Academic portfolios are essential for showcasing research, publications, and projects to potential employers, collaborators, and the academic community. However, creating a professional and visually appealing portfolio can be challenging, especially for individuals with limited web development experience. This project aims to provide a minimalistic portfolio template designed specifically for academics, researchers, and students.

The template features a clean, modern design that is easy to customize and update with personal information, research interests, publications, and projects. Users can easily modify sections and add new images and text. The template is built with Next.js, a popular React framework, and Tailwind CSS, a utility-first CSS framework, which together provide a responsive, mobile-friendly layout. By offering a simple and intuitive way to create academic portfolios, the template helps individuals present their work effectively and professionally.

[Github][Demo]

Leukemia Cancer Treatment Survival Analysis from a Bayesian Perspective

Keywords: Bayesian Survival Analysis, MCMC, Cox Model, Stan

Leukemia accounts for a significant proportion of cancer cases and mortality worldwide, prompting extensive research to improve patient survival. Traditional survival analysis using the Cox proportional hazards model faces limitations with complex data structures and restrictive assumptions. This study extends the Cox model with a Bayesian framework, incorporating hierarchical priors to enhance model flexibility and reliability. Utilizing a leukemia treatment dataset from Kaggle, the Bayesian Cox model demonstrates superior parameter estimation, model fit, and inference compared to traditional methods. Through Markov Chain Monte Carlo (MCMC) sampling with Stan, the study estimates posterior distributions of parameters, revealing significant treatment effects on survival outcomes. The results indicate that the new treatment significantly improves survival time, supporting the model’s effectiveness in handling censored and uncensored data in survival analysis. The findings underscore the potential of Bayesian methods in clinical research for more accurate and informative survival predictions.

What I did:
  • Applied an adjusted Bayesian Cox model with hierarchical priors to improve the flexibility and reliability of survival analysis for leukemia cancer treatment data;
  • Utilized Markov Chain Monte Carlo (MCMC) algorithms in Stan to draw posterior samples and estimate parameters of the Bayesian Cox model, ensuring accurate parameter estimation and model fit;
  • Compared survival probabilities between treatment and placebo groups, demonstrating the effectiveness of the new treatment in improving patient survival times.
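The MCMC sampler explores a posterior built from the Cox partial likelihood. As a minimal stand-in for the Stan model, the sketch below runs random-walk Metropolis on a single treatment coefficient with a N(0, 10²) prior. It uses simulated data without tied event times and omits the hierarchical priors described above. The data-generating values and function names are hypothetical.

```python
import numpy as np


def cox_partial_loglik(beta, x, event, at_risk):
    """Cox partial log-likelihood for one covariate, assuming no tied event times.

    at_risk[i, j] = 1 if subject j is still at risk at subject i's time.
    """
    eta = beta * x
    log_denom = np.log(at_risk @ np.exp(eta))
    return np.sum(event * (eta - log_denom))


def metropolis_cox(x, time, event, n_iter=5000, step=0.2, prior_sd=10.0, seed=0):
    """Random-walk Metropolis draws of beta under a N(0, prior_sd^2) prior."""
    rng = np.random.default_rng(seed)
    at_risk = (time[None, :] >= time[:, None]).astype(float)

    def log_post(b):
        return cox_partial_loglik(b, x, event, at_risk) - 0.5 * (b / prior_sd) ** 2

    beta, lp = 0.0, log_post(0.0)
    draws, accepted = np.empty(n_iter), 0
    for t in range(n_iter):
        prop = beta + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept with prob min(1, ratio)
            beta, lp = prop, lp_prop
            accepted += 1
        draws[t] = beta
    return draws, accepted / n_iter


# Simulated trial: hypothetical values, not the Kaggle leukemia data
rng = np.random.default_rng(1)
n = 200
x = np.repeat([0.0, 1.0], n // 2)           # 0 = placebo, 1 = new treatment
true_beta = -0.8                             # invented log hazard ratio
t_event = rng.exponential(scale=10 * np.exp(-true_beta * x))
t_cens = rng.uniform(0, 30, n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)    # 1 = event observed, 0 = censored

draws, acc = metropolis_cox(x, time, event)
post = draws[1000:]                          # discard burn-in
print(round(post.mean(), 2), round(np.exp(post.mean()), 2), round(acc, 2))
```

A negative posterior mean for β corresponds to a hazard ratio exp(β) below 1, meaning a lower hazard in the treated group. Censored subjects contribute only through the risk sets. Stan's gradient-based sampler scales far better to many coefficients and hierarchical priors. Metropolis is used here only because it fits in a few lines.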

[Github][Thesis]

A Complete Introduction to ResNet

Keywords: Deep Learning, ResNet, Model Optimization

ResNet, short for Residual Network, is a deep learning model that was proposed by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in 2015. ResNet is a type of Convolutional Neural Network (CNN) that is widely used in computer vision tasks, such as image classification, object detection, and image segmentation.

ResNet was a breakthrough in the field of deep learning. It addressed the degradation problem, in which the accuracy of plain networks saturates and then drops as more layers are added. It also showed that deeper networks can indeed achieve better performance. ResNet has been widely used in computer vision tasks and serves as the basis of many later deep learning models. Although Transformers have become the dominant deep learning architecture in recent years, understanding the basic principles of ResNet is still worthwhile.
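The key idea is that each block learns a residual F(x) and adds it back to its input, so the block outputs F(x) + x. The sketch below simplifies this by using fully connected layers in NumPy instead of convolutions, and it omits batch normalization. The function names and shapes are illustrative, not taken from the original implementation. When the residual branch's weights are zero, the block reduces to the identity for non-negative inputs. Stacking more blocks therefore need not make the network worse.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, W1, W2, W_proj=None):
    """Return relu(F(x) + shortcut(x)) with F(x) = W2 @ relu(W1 @ x).

    The shortcut is the identity unless W_proj is given, in which case it is
    a linear projection used when F(x) and x have different sizes.
    """
    f = W2 @ relu(W1 @ x)
    shortcut = x if W_proj is None else W_proj @ x
    return relu(f + shortcut)


x = np.array([0.5, 1.0, 2.0, 0.0])   # non-negative, like a post-ReLU activation

# 50 blocks whose residual branches are all zero: the output is still x
h = x
for _ in range(50):
    h = residual_block(h, np.zeros((8, 4)), np.zeros((4, 8)))
print(np.allclose(h, x))  # True

# A block that changes the width from 4 to 3 needs a projection shortcut
rng = np.random.default_rng(0)
y = residual_block(x, rng.normal(size=(8, 4)), rng.normal(size=(3, 8)),
                   W_proj=rng.normal(size=(3, 4)))
print(y.shape)  # (3,)
```

A plain stack of layers would have to learn the identity mapping through its nonlinear layers. A residual block only has to drive F toward zero, which is the intuition for why residual networks avoid the degradation problem.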

This project provides a comprehensive introduction to ResNet, including its basic theory, structure, and significance in deep learning. It also discusses advanced applications and variations of ResNet, offering insights into its practical uses and potential improvements. By presenting experimental results and analysis, this project showcases the performance and advantages of ResNet over traditional neural networks.

What I did:
  • Provided a comprehensive overview of the basic theory behind Residual Networks (ResNet), including its structure and significance in deep learning;
  • Discussed advanced applications and variations of ResNet, offering insights into its practical uses and potential improvements;
  • Presented experimental results and analysis, showcasing the performance and advantages of ResNet over traditional neural networks.

[Github][Site]

Opioid Overdose Problems in the United States: Insights from Prescribing & Overdose Death Rates

Keywords: Data Mining, Data Visualization, Practical Data Analysis

The opioid crisis remains a significant public health challenge in the United States, characterized by high prescribing rates and overdose deaths. This analysis examines the trends and patterns of opioid prescribing and overdose deaths from 2015 to 2021. Utilizing datasets from Medicaid, the Opioid Treatment Program (OTP) Providers, and the National Vital Statistics System, we explore the geographical distribution of prescribing rates, the availability of treatment programs, and the specific types of opioids contributing to overdose deaths. The results reveal critical insights into the relationship between opioid prescribing practices and overdose mortality, highlighting the regional disparities and the effectiveness of treatment programs. This comprehensive study provides valuable information for policymakers and public health officials to develop targeted interventions to combat the opioid epidemic.

What I did:
  • Conducted a comprehensive analysis of Medicaid opioid prescribing rates, identifying significant variations in opioid prescriptions across different geographic regions and plan types;
  • Performed an in-depth study of data from opioid treatment program providers, focusing on the availability and distribution of treatment resources;
  • Analyzed the provisional drug overdose death counts to uncover emerging trends and patterns in opioid-related fatalities, highlighting key areas of public health intervention.
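Comparisons of prescribing rates across regions and plan types depend on how claims are aggregated. The sketch below assumes the rate is defined as opioid claims per 100 total claims, and it pools claims within a group before dividing. That definition, the field layout, and the toy numbers are all invented for illustration and should be checked against the Medicaid dataset's documentation.

```python
from collections import defaultdict

# Hypothetical record layout: (state, year, plan_type, opioid_claims, total_claims)
FIELDS = ("state", "year", "plan_type", "opioid_claims", "total_claims")

# Invented toy records, not real Medicaid figures
records = [
    ("State A", 2015, "FFS", 1200, 20000),
    ("State A", 2015, "MC", 3000, 60000),
    ("State A", 2021, "FFS", 800, 22000),
    ("State A", 2021, "MC", 2100, 70000),
]


def prescribing_rates(rows, by=("state", "year")):
    """Opioid claims per 100 total claims for each group defined by `by`.

    Claims are summed within a group before dividing, so larger plans carry
    more weight than they would if per-row rates were simply averaged.
    """
    idx = [FIELDS.index(f) for f in by]
    i_op, i_tot = FIELDS.index("opioid_claims"), FIELDS.index("total_claims")
    opioid, total = defaultdict(int), defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        opioid[key] += row[i_op]
        total[key] += row[i_tot]
    return {k: 100.0 * opioid[k] / total[k] for k in opioid}


by_year = prescribing_rates(records)
print(by_year[("State A", 2015)], round(by_year[("State A", 2021)], 2))  # 5.25 3.15
print(prescribing_rates(records, by=("plan_type",)))
```

For the toy 2015 rows, averaging the two per-plan rates (6% and 5%) would give 5.5 instead of the pooled 5.25. This gap is why the aggregation rule matters when comparing regions or plan types.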

[Report]

A Comparative Analysis of Interval Estimation Techniques in Small Sample Research

Keywords: Wald Interval, Agresti-Coull Interval, Simulation

In statistical analysis, interval estimation for a binomial proportion is important in fields ranging from clinical trials to market research, and in practice several estimation methods are available. In this report, we compare two interval estimators for an unknown binomial proportion: the Wald interval and the Agresti-Coull interval. Using simulations, we evaluated both estimators by comparing their actual coverage with the nominal 95% level. We found that the Agresti-Coull interval performs better than the Wald interval. We then analyzed the reasons for this result from a Bayesian perspective.

What I did:
  • Conducted a comprehensive statistical analysis, focusing on Wald and Agresti-Coull Intervals, to evaluate their efficacy in various scenarios;
  • Utilized simulations to compare interval performances against a 95% nominal coverage, emphasizing practical applications in clinical trials and market research;
  • Provided an interpretation of the results from a Bayesian perspective.
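Both intervals can be written in a few lines. For a fixed n and p, their coverage can be computed exactly by summing the binomial probabilities of the outcomes whose interval contains p. The sketch below uses that exact calculation as a deterministic alternative to Monte Carlo simulation. The function names and the example values n = 10, p = 0.1 are illustrative choices, not the report's settings.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval


def wald(k, n, z=Z):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = k / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def agresti_coull(k, n, z=Z):
    """Agresti-Coull: add z^2 pseudo-trials, half of them successes, then use Wald's form."""
    n_t = n + z ** 2
    p_t = (k + z ** 2 / 2) / n_t
    half = z * sqrt(p_t * (1 - p_t) / n_t)
    return p_t - half, p_t + half


def coverage(interval, n, p):
    """Exact coverage: total Binomial(n, p) probability of outcomes whose interval contains p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = interval(k, n)
        if lo <= p <= hi:
            total += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total


print(round(coverage(wald, 10, 0.1), 3), round(coverage(agresti_coull, 10, 0.1), 3))  # 0.65 0.93
```

At n = 10 and p = 0.1, the Wald interval covers p only about 65% of the time. This is largely because k = 0 yields the degenerate interval [0, 0]. The Agresti-Coull interval reaches about 93%. Its center (k + z²/2)/(n + z²) is close to (k + 2)/(n + 4), the posterior mean under a Beta(2, 2) prior, which is one common Bayesian reading of the adjustment.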

[Report]