Thanh V. T. Tran

I am Thanh Tran, an AI Research Resident at the FPT Software – AI Center, working under the supervision of Dr. Van Nguyen and Professor Truong-Son Hy. I’m starting my Ph.D. at Nanyang Technological University (NTU) in Fall 2026, advised by Professor Woon-Seng Gan.

I’m always open to collaborations, discussions, and new opportunities. Feel free to reach out if you’re interested in my research or would like to discuss potential projects.

Research

My research focuses on three areas of artificial intelligence: multimodal AI, generative models, and AI for scientific discovery.

1. Multimodal AI and Audio-Visual Learning. I develop deep learning models for audio-visual understanding and generation, including video-to-audio synthesis, automated video dubbing, and speech reconstruction from silent videos.

2. Generative Models for Speech and Audio. I work on flow models for text-to-speech and audio generation, aiming to build efficient, low-latency systems for real-world deployment.

3. AI for Scientific Discovery. Inspired by evolutionary algorithms, I optimize protein sequences using black-box optimization methods in discrete and latent spaces.

News

May 01, 2026 DiFlowDubber was accepted to CVPR Findings 2026. DiFlowDubber and Flowley were also accepted to the Sight and Sound Workshop at CVPR 2026.
Jan 10, 2026 Honored to receive the Best Performance Award 2025, ranking in the top 3 out of 100+ AI engineers and researchers at FPT Software – AI Center.
May 20, 2025 RESOUND was accepted to Interspeech 2025.
Dec 21, 2024 ConxGNN was accepted to ICASSP 2025.
Nov 17, 2024 GROOT was accepted to KDD 2025.

Selected Publications

  1. DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization
    Ngoc-Son Nguyen, Thanh V. T. Tran, Jeongsoo Choi, and 3 more authors
    Computer Vision and Pattern Recognition Findings, 2026
  2. Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling
    Long-Khanh Pham, Thanh V. T. Tran, Minh-Tan Pham, and 1 more author
    Interspeech, 2025
  3. Effective Context Modeling Framework for Emotion Recognition in Conversations
    Cuong Tran Van*, Thanh V. T. Tran*, Van Nguyen, and 1 more author
    International Conference on Acoustics, Speech, and Signal Processing, 2025
  4. GROOT: Effective Design of Biological Sequences with Limited Experimental Data
    Thanh V. T. Tran*, Nhat Khang Ngo*, Viet Anh Nguyen, and 1 more author
    Conference on Knowledge Discovery and Data Mining, 2025