ALS Clinical Data ETL & Harmonization (Answer ALS, ALS TDI, CPath)

Directed the ETL of ALS clinical and survey data from multiple organizations into the OMOP Common Data Model. Built automated transformation scripts, implemented terminology mapping workflows in Usagi with clinician input, and conducted rigorous quality checks. Wrote documentation for both technical and non-technical users so the harmonized datasets could be reused broadly through the NeuroMine portal.

Specimen Management System Migration (TissueMetrix → OpenSpecimen)

Led the migration of records for 250,000+ biospecimens and 14,000 participants from TissueMetrix to OpenSpecimen. Oversaw vendor evaluation, system customization, and technical preparation for the migration. After the migration, optimized workflows and data structures so research staff could access, manage, and analyze specimen data more reliably.

COVID-19 Biorepository Data Infrastructure

Developed and deployed process automations to handle a surge of biospecimen data during the pandemic, replacing error-prone manual entry with reliable digital workflows. Designed electronic tracking systems to monitor COVID-19 test results, specimen metadata, and later vaccine cold-chain logistics, giving large-scale testing and vaccination efforts accurate, real-time data reporting.

Biorepository Informatics Expansion Planning

Served on an informatics working group to design a next-generation data environment for the University of Arizona Biorepository. Proposed systems for integrating biospecimen records, electronic medical records, genomic data, and imaging into a HIPAA-compliant computing environment, laying the groundwork for interoperable, multi-modal biomedical research platforms.

Computational Medicine & Informatics Lab (University of Arizona)

Conducted large-scale analyses of clinical and neuroimaging datasets. Preprocessed Alzheimer’s fMRI scans for deep learning pipelines, processed OMOP medical records from 10M+ patients using Spark SQL, and built machine learning models to predict COVID-19 outcomes. Developed phenotyping algorithms to classify patient treatments, turning raw biomedical data into structured, analyzable formats and predictive models.