Workflow Automation & Data Architecture
I design the digital backbone of research: sophisticated ETL pipelines and FAIR-compliant data structures. I specialize in modeling complex scientific datasets so that they are optimized for high-performance querying, analysis, and cloud deployment.
Scientific ETL & Pipeline Orchestration
I build robust data pipelines using Apache Airflow, Prefect, and Dagster. My expertise lies in automating the extraction, transformation, and loading of massive scientific datasets, including large-scale image processing and web-scraped research data.
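As a minimal sketch of this pattern, the hypothetical Prefect flow below wires extract, transform, and load steps into a single orchestrated pipeline with automatic retries. The synthetic records, field names, and retry settings are illustrative assumptions, not a production configuration; an Airflow or Dagster version would follow the same shape.

```python
from prefect import flow, task


@task(retries=3, retry_delay_seconds=60)
def extract() -> list[dict]:
    # Placeholder for pulling raw records from an instrument, API, or scrape;
    # synthetic data keeps this sketch runnable offline.
    return [{"sample_id": i, "intensity": i * 0.5} for i in range(10)]


@task
def transform(records: list[dict]) -> list[dict]:
    # Normalize intensities into [0, 1]; a stand-in for real QC and cleaning.
    peak = max(r["intensity"] for r in records)
    return [{**r, "intensity": r["intensity"] / peak} for r in records]


@task
def load(records: list[dict]) -> None:
    # In production this would write to a database or warehouse; here we report.
    print(f"Loaded {len(records)} cleaned records")


@flow(log_prints=True)
def scientific_etl() -> None:
    # Each task run is tracked and retried independently by the orchestrator.
    load(transform(extract()))


if __name__ == "__main__":
    scientific_etl()
```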
High-Performance Data Modeling
I re-engineer existing data systems (such as the COSMIC database) to bypass performance bottlenecks, enabling complex SQL querying and deep data mining. I design both SQL and NoSQL architectures (PostgreSQL, MongoDB, Cassandra) tailored for the unique requirements of "Big Data" in the life sciences.
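To illustrate the kind of remodeling involved, here is a minimal SQLAlchemy sketch of a normalized mutation schema with a composite locus index: the sort of structure that turns full-table scans into indexed range queries. The table names, columns, and index are hypothetical simplifications for illustration, not the actual COSMIC schema.

```python
from sqlalchemy import Column, ForeignKey, Index, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Gene(Base):
    __tablename__ = "genes"
    id = Column(Integer, primary_key=True)
    symbol = Column(String(32), unique=True, nullable=False)
    mutations = relationship("Mutation", back_populates="gene")


class Mutation(Base):
    __tablename__ = "mutations"
    id = Column(Integer, primary_key=True)
    gene_id = Column(Integer, ForeignKey("genes.id"), nullable=False)
    chromosome = Column(String(5), nullable=False)
    position = Column(Integer, nullable=False)
    consequence = Column(String(64))
    gene = relationship("Gene", back_populates="mutations")

    # Composite index: region queries (chromosome + position range) hit the
    # index instead of scanning every row, which is the bottleneck bypassed.
    __table_args__ = (Index("ix_mutation_locus", "chromosome", "position"),)


# SQLite in memory keeps the sketch self-contained; swap the URL for a
# PostgreSQL DSN in a real deployment.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
```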
Infrastructure as Code (IaC)
I apply DevOps principles to research environments, using Docker, Kubernetes, and Terraform to ensure that scientific workflows are containerized, scalable, and deployable across cloud infrastructure.
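As one concrete expression of this approach, the sketch below uses the official Kubernetes Python client to submit a containerized pipeline step as a Job. The image name, namespace, and command are assumptions for illustration; in practice the same resources would typically live in version-controlled manifests or Terraform rather than an ad hoc script.

```python
from kubernetes import client, config


def submit_etl_job() -> None:
    # Load credentials from the local kubeconfig (use load_incluster_config()
    # when running inside the cluster).
    config.load_kube_config()

    # Hypothetical container image and entry point for one pipeline step.
    container = client.V1Container(
        name="etl-step",
        image="registry.example.org/lab/etl:1.0",
        command=["python", "run_etl.py"],
    )

    job = client.V1Job(
        metadata=client.V1ObjectMeta(name="nightly-etl"),
        spec=client.V1JobSpec(
            backoff_limit=2,  # retry the pod at most twice on failure
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    containers=[container],
                    restart_policy="Never",
                )
            ),
        ),
    )

    client.BatchV1Api().create_namespaced_job(namespace="research", body=job)


if __name__ == "__main__":
    submit_etl_job()
```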