Keynote Speakers

Prof. Xudong Jiang (IEEE Fellow)
Nanyang Technological University, Singapore

Speech Title: How Deep CNN and Transformer Solve Machine Learning Problems of Traditional ANN

Abstract: The power of machine learning was demonstrated more than 30 years ago during the neural network boom, but its successful application to the real world has come only in the last 10 years, after deep convolutional neural networks (CNNs) were developed. This is because machine learning alone can only solve problems on the training data, whereas the system is designed for unknown data outside the training set. This gap can be bridged by regularization: human knowledge guiding, or intervening in, the machine learning. This speech will analyze these concepts and ideas, from traditional neural networks such as the MLP to deep CNNs and the Transformer. It will answer two questions: why traditional neural networks failed to solve real-world problems even after more than 30 years of intensive research and development, and how deep CNNs and the Transformer solve the problems of traditional neural networks and are now very successful in solving various real-world AI problems.

Biography: Xudong Jiang (Fellow of IEEE) received the B.E. and M.Eng degrees from the University of Electronic Science and Technology of China (UESTC) and the PhD degree from Helmut Schmidt University, Hamburg, Germany. From 1998 to 2004, he was with the Institute for Infocomm Research, A*STAR, Singapore, as a lead scientist and the head of the Biometrics Laboratory. He joined Nanyang Technological University (NTU), Singapore, as a faculty member in 2004, where he served as the director of the Centre for Information Security from 2005 to 2011. He is currently a professor with the School of EEE, NTU, and serves as the director of the Centre for Information Sciences and Systems of the School of EEE, NTU. He has authored over 300 papers, including over 80 papers in IEEE journals, among them 15 T-PAMI papers and over 20 T-IP papers. Dr. Jiang has presented over 50 papers at the top AI conferences CVPR, NeurIPS, ICML, ICCV, ECCV, ICLR, and AAAI. His papers have been cited over 18,000 times, with an H-index of 73. He served as an IFS TC member of the IEEE Signal Processing Society from 2015 to 2017, an associate editor for IEEE Signal Processing Letters from 2014 to 2018, and an associate editor for IEEE Transactions on Image Processing from 2016 to 2020. Currently, he is an IEEE Fellow and serves as a senior area editor for IEEE Transactions on Image Processing and the editor-in-chief of IET Biometrics. He has also served as an area chair for AAAI, NeurIPS, and IEEE ICIP. His current research interests include image processing, pattern recognition, computer vision, machine learning, and biometrics.

Prof. Ruili Wang (Fellow of Engineering New Zealand, Stanford/Elsevier Top 2% Scientists List (2021-2025))
Massey University, New Zealand

Speech Title: Multimodal Data Processing

Abstract: In this presentation, we will introduce our recent progress in Multimodal Data Processing, especially in video captioning and infrared–visible image fusion.
Video captioning bridges computer vision and natural language processing and plays an essential role in various knowledge-driven systems in the streaming media era. Recent video captioning methods have achieved promising performance by leveraging external textual knowledge to better understand video content and generate more informative captions. Nevertheless, existing methods that rely excessively on knowledge graphs still suffer from several inherent limitations. To address these issues, we propose a novel knowledge enhancement and disentanglement learning framework for video captioning.
Image fusion is an important technique in computer vision and image processing. It integrates multiple images of the same scene — captured by different sensors, at different times or from different viewpoints — into a single high-quality composite image. Current text-driven infrared–visible image fusion methods mainly adopt sentence-level textual guidance. This paradigm easily introduces semantic noise caused by text redundancy and fails to fully exploit the deep semantic value of textual cues. To overcome these drawbacks, we propose a novel fusion approach dubbed Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion.

Biography: Professor Ruili Wang, Fellow of Engineering New Zealand, graduated from Huazhong University of Science and Technology, Northeastern University, and Dublin City University, where he obtained his B.E., M.E., and Ph.D., respectively. His research areas include AI, machine learning, computer vision, and speech and language processing. His research has been funded by multiple grants from the New Zealand government. Professor Wang is an associate editor/editorial board member of the following journals: IEEE Transactions on Multimedia (TMM), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Computational Intelligence Magazine, IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), etc.

Prof. Leida Li (National Young Talent)
Xidian University, China

Speech Title: Fine-grained Visual Quality Assessment

Abstract: Visual quality assessment measures the perceptual quality of images by simulating the characteristics of the Human Visual System (HVS). As a general-purpose technology, it has important applications in many fields, such as low-level vision, imaging optimization, smart photography, and AIGC. After more than 20 years of rapid development, a large number of algorithms have been proposed. However, existing methods typically suffer from insufficient discrimination ability when used in real-world environments. This talk focuses on the key differences between coarse-grained and fine-grained visual quality assessment, the main research progress in fine-grained visual quality assessment, and applications such as camera tuning and aesthetic recommendation.

Biography: Leida Li is a Full Professor at Xidian University, recognized as a National Young Talent. His research interests include computer vision, visual quality assessment, and computational aesthetics. He has published over 100 papers in top-tier journals and conferences such as IEEE TPAMI, IEEE TIP, CVPR, ICCV, and AAAI, with about 10,000 citations. He has led five projects supported by the National Natural Science Foundation of China and has actively engaged in industry-academia collaborations with top companies such as Huawei, OPPO, and Tencent. He received the "Outstanding Industry-Academia Collaboration Partner" award from OPPO, and his research outcomes have been applied in smartphone and live-streaming cameras. He is an Associate Editor of IEEE Transactions on Image Processing (TIP) and the Journal of Visual Communication and Image Representation (Best Associate Editor Award 2021/2023), and serves as an Area Chair/Senior Program Committee member for top international conferences such as AAAI, IJCAI, and ACM MM. He is a Senior Member of IEEE/CCF/CSIG.

Prof. Jiantao Zhou
University of Macau, China

Speech Title: Towards Robust Learning-Based Multimedia Forensics

Abstract: In recent years, the proliferation of sophisticated multimedia generation and manipulation technologies, such as deepfakes and advanced image/video editing tools, has significantly blurred the line between authentic and fabricated content. As multimedia plays an increasingly crucial role in information dissemination, legal evidence, and social interactions, ensuring its integrity has become a pressing concern. How can we effectively distinguish genuine media from skilfully crafted forgeries, especially when traditional forensic techniques struggle to keep pace with rapidly evolving tampering methods? Moreover, the challenges are compounded by the degradation of forensic features during media transmission and the vulnerability of detection models to adversarial attacks. In the realm of multimedia forensics, learning-based approaches offer a promising avenue for tackling these complex issues. However, there is a pressing need to enhance their robustness against various distortions, obfuscation strategies, and dynamic threats. This talk explores the latest advancements in robust learning-based multimedia forensics, delving into novel methodologies designed to fortify detection capabilities. From developing innovative feature extraction techniques that can withstand transmission-induced degradation to creating resilient models that can counter adversarial manipulations, the presentation aims to outline a comprehensive research direction for achieving reliable and trustworthy multimedia forensics in an increasingly digital and deceptive world.

Biography: Dr. Jiantao Zhou is a Full Professor at the Department of Computer and Information Science and the State Key Laboratory of Internet of Things for Smart City, University of Macau, where he also serves as the Director of the Research Services and Knowledge Transfer Office. He graduated from the Hong Kong University of Science and Technology in 2009 with a PhD in Electrical and Computer Engineering. He was a Fulbright Junior Scholar at the University of Illinois at Urbana-Champaign (UIUC). Professor Zhou’s research focuses on AI security, multimedia information privacy protection and forensics, and intelligent multimedia information processing. He has published more than 200 papers in top journals such as IEEE T-PAMI, IEEE T-IP, IEEE T-SP, IEEE T-IFS, and IEEE T-AC, and in top conferences such as CVPR, ICCV, ICML, and AAAI. He currently serves as an Associate Editor for IEEE Trans. Multimedia and IEEE Trans. Dependable and Secure Computing, top journals in the field of multimedia information processing and security, and was the Editor-in-Chief of APSIPA Newsletters. He is the Chair of the Multimedia Systems and Applications Technical Committee in the IEEE Circuits and Systems Society, and was the TPC Co-Chair of ICME 2023 and the General Chair of APSIPA ASC 2024. He received the 2022 Macau Science and Technology Award (Third Prize, Natural Science Award) and the 2023 Alibaba Outstanding Academic Cooperation Project Award.