Research
My research focuses on 3D human avatar reconstruction, 3D Gaussian splatting, and vision–language reasoning for human and hand understanding. I am especially interested in approaches that remain robust under partial observations — for example, monocular video where the body is only partially visible — by combining geometric priors (SMPL-X / FLAME), differentiable rendering, and diffusion-based generative completion.
Research interests
- 3D Gaussian Splatting & Neural Rendering — visibility-aware optimization, real-time rendering, memory-efficient avatars.
- 3D Human Avatar Reconstruction & Animation — animatable avatars from monocular video across full-body, upper-body, and head-only settings.
- Diffusion Models for 3D Generation — diffusion-based texture completion and view synthesis for unobserved regions.
- Parametric Body Models — occlusion-robust SMPL-X tracking and FLAME-based facial initialization.
- Hand Pose Estimation & 3D Hand–Object Interaction — real-time two-hand manipulation, text-conditioned hand–object mesh generation.
- Vision–Language Models for Spatial Reasoning — large-scale benchmarks for diagnosing fine-grained spatial reasoning failures.
- Continual Learning — generative replay for class-incremental object detection.
Selected research projects
Visibility-Aware 3D Gaussian Human Avatar Reconstruction
Vision & Learning Lab, UNIST · Jan 2025 – Present
First-author work submitted to ECCV 2026 (under review).
- Designed a unified 3D Gaussian splatting pipeline for animatable avatar reconstruction from monocular video across full-body, upper-body, and head-only inputs.
- Developed visibility-aware optimization using Otsu’s method to prune unobserved Gaussians, reducing memory by up to 50% and improving rendering speed by ~34% (see the pruning sketch after this list).
- Implemented occlusion-robust SMPL-X co-registration with FLAME-based facial initialization and part-specific residual MLPs for high-frequency face and hand refinement.
- Integrated diffusion-based video generation to synthesize auxiliary 360° views for texture completion of unobserved regions.
- Code: GitHub
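The pruning step can be sketched independently of the full pipeline. The snippet below is a minimal illustration, assuming each Gaussian carries a scalar visibility score accumulated over training views (e.g. from per-view blending weights); Otsu's method then picks a data-driven cutoff, and everything below it is dropped. The function names and the dict-of-arrays parameter layout are hypothetical, not the submission's actual code.

```python
import numpy as np

def otsu_threshold(scores: np.ndarray, bins: int = 256) -> float:
    """Otsu's method: choose the cut that maximizes the
    between-class variance of a 1-D score histogram."""
    hist, edges = np.histogram(scores, bins=bins)
    prob = hist.astype(np.float64) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])

    w0 = np.cumsum(prob)            # mass of the "low visibility" class
    w1 = 1.0 - w0                   # mass of the "high visibility" class
    mu = np.cumsum(prob * centers)  # cumulative first moment
    mu_total = mu[-1]

    valid = (w0 > 0) & (w1 > 0)     # guard against empty classes at the ends
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return float(centers[np.argmax(sigma_b)])

def prune_unobserved(gaussians: dict, visibility: np.ndarray) -> dict:
    """Keep only Gaussians whose accumulated visibility clears
    the Otsu cutoff (hypothetical dict-of-arrays storage)."""
    keep = visibility >= otsu_threshold(visibility)
    return {name: values[keep] for name, values in gaussians.items()}
```

A data-driven cutoff avoids hand-tuning a fixed constant per capture setting, which matters when the same pipeline must cover full-body, upper-body, and head-only inputs.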
Accelerating Inference Speed for 3D Face Reconstruction
Vision & Learning Lab, UNIST · Aug 2023 – Aug 2024
- Implemented post-training sparsity-aware quantization for a 3D face reconstruction model to reduce inference overhead.
- Adopted and optimized a Vision Transformer backbone for the face reconstruction task.
- Designed lightweight student CNNs via knowledge distillation, compressing the model while preserving accuracy (see the distillation sketch after this list).
- Optimized MobileNetV3-Small for 3D face reconstruction, achieving 50% faster inference and a 25% accuracy improvement.
- Code: GitHub
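A minimal PyTorch sketch of the distillation step referenced above: the frozen teacher's predictions act as soft regression targets, blended with the ground-truth loss. In this project the teacher would be the larger ViT-based regressor and the student a MobileNetV3-Small-scale network, but the loss weight and the L1/MSE split below are illustrative assumptions, not the exact recipe.

```python
import torch
import torch.nn as nn

def distillation_loss(student_out: torch.Tensor,
                      teacher_out: torch.Tensor,
                      target: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend the ground-truth regression loss with an output-matching
    term that pulls the student toward the frozen teacher."""
    task = nn.functional.l1_loss(student_out, target)         # supervised term
    mimic = nn.functional.mse_loss(student_out, teacher_out)  # distillation term
    return (1 - alpha) * task + alpha * mimic

# One hypothetical training step: the teacher is frozen and only the
# student receives gradient updates.
def train_step(student, teacher, images, target, optimizer):
    teacher.eval()
    with torch.no_grad():
        teacher_out = teacher(images)   # soft targets, no gradient
    student_out = student(images)
    loss = distillation_loss(student_out, teacher_out, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```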
Advanced Mesh Reconstruction & Visualization on the ARCTIC Dataset
Vision & Learning Lab, UNIST · Feb 2023 – Jul 2023
- Implemented the FastInst architecture on the ARCTIC dataset for efficient hand–object mesh visualization.
- Deployed rendering pipelines to enable precise model evaluation on hand–object manipulation sequences (see the rendering sketch after this list).
- Code: GitHub
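The visualization side can be approximated with an off-the-shelf offscreen renderer. Below is a minimal sketch using pyrender and trimesh to composite a predicted hand mesh and object mesh into one frame for qualitative evaluation; the file paths, camera pose, and lighting are placeholders rather than the project's actual setup, which would use ARCTIC's calibrated cameras.

```python
import numpy as np
import trimesh
import pyrender

def render_hand_object(hand_path: str, obj_path: str,
                       width: int = 640, height: int = 480) -> np.ndarray:
    """Offscreen-render a hand mesh and an object mesh into one RGB frame."""
    scene = pyrender.Scene(bg_color=[1.0, 1.0, 1.0, 1.0])
    for path in (hand_path, obj_path):
        tm = trimesh.load(path, force='mesh')
        scene.add(pyrender.Mesh.from_trimesh(tm))

    # Simple frontal camera and light; a real evaluation script would
    # substitute the dataset's calibrated camera parameters here.
    cam_pose = np.eye(4)
    cam_pose[2, 3] = 0.6  # pull the camera back along +z (placeholder distance)
    scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=cam_pose)
    scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)

    renderer = pyrender.OffscreenRenderer(width, height)
    color, _depth = renderer.render(scene)
    renderer.delete()
    return color  # H x W x 3 uint8 image
```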
Earlier research experience
Machine Learning, Vision & Language Lab, UNIST — Research Assistant (Sep 2022 – Dec 2022)
Mentor: Prof. Taehwan Kim.
- Led a computer vision research initiative on astronomical image clarity enhancement using GANs, improving image detail by 60%.
- Evaluated deep learning models using standard metrics including accuracy, sensitivity, specificity, and precision.
Bio-Optics & Computational Imaging Lab, UNIST — Research Assistant (Sep 2021 – Dec 2021)
Mentor: Prof. Jung-Hoon Park.
- Conducted research on non-line-of-sight imaging using Ghost Imaging techniques.
- Applied machine learning algorithms to extract scattering resistance modes, achieving 95% accuracy.
Selected research highlights
- First author, ECCV 2026 (under review): Visibility-aware 3D Gaussian avatar framework — state-of-the-art across full-body, upper-body, and head-only settings (~3% PSNR gain, up to 50% memory reduction).
- Co-author, CVPR 2026: HandVQA — 1.6M-scale benchmark for diagnosing spatial reasoning failures in vision–language models.
- Co-author, CVPR 2026 Findings: THOM — text-conditioned generative framework for physically plausible hand–object meshes.
- Co-author, AAAI 2025: QORT-Former — real-time 3D two-hand manipulation modeling (53.5 FPS; +27.2% on H2O, +10.4% on FPHA).
- Co-author, CVPR 2024 Highlight (top 2.8%): SDDGR — Stable-Diffusion-based generative replay for class-incremental object detection.
A complete publication list is available on the Publications page.
