
Prof. Xudong Jiang (IEEE Fellow)
Nanyang Technological University, Singapore
Speech Title: How Deep CNN and Transformer Solve Machine Learning Problems of Traditional ANN
Abstract:
The power of machine learning was demonstrated more than 30 years
ago during the neural network boom, but its successful application
to the real world has come only in the last 10 years, after the
development of deep convolutional neural networks (CNNs). This is
because machine learning alone can only solve problems on the
training data, whereas the system is designed for unknown data
outside the training set. This gap can be bridged by
regularization: guidance from, or intervention by, human knowledge
in the machine learning process. This speech will analyze these
concepts and ideas, from traditional neural networks such as the MLP
to deep CNNs and the Transformer. It will answer why traditional
neural networks failed to solve real-world problems even after more
than 30 years of intensive research and development, and how deep
CNNs and the Transformer solve the problems of traditional neural
networks and are now very successful in solving various real-world
AI problems.
Biography: Xudong Jiang (Fellow of IEEE) received the B.E. and
M.Eng. degrees from the
University of Electronic Science and Technology of China (UESTC),
and the PhD degree from Helmut Schmidt University, Hamburg, Germany.
From 1998 to 2004, he was with the Institute for Infocomm Research,
A*STAR, Singapore, as a lead scientist, and the head of the
Biometrics Laboratory. He joined Nanyang Technological University
(NTU), Singapore, as a faculty member, in 2004, where he served as
the director of the Centre for Information Security from 2005 to
2011. He is currently a professor with the School of EEE, NTU, and
serves as the director of the Centre for Information Sciences and
Systems of the School of EEE, NTU. He has authored over 300 papers,
with over 80 papers in IEEE journals, including 15 T-PAMI papers and
over 20 T-IP papers. Dr. Jiang has presented over 50 papers at top AI
conferences (CVPR/NeurIPS/ICML/ICCV/ECCV/ICLR/AAAI). His papers have
been cited over 18,000 times, with an H-index of 73. He served as an
IFS TC member of the IEEE Signal Processing Society from 2015 to 2017,
associate editor for IEEE Signal Processing Letters from 2014 to 2018,
and associate editor for IEEE Transactions on Image Processing from
2016 to 2020. Currently, he is an IEEE Fellow and serves as senior area
editor for IEEE Transactions on Image Processing and editor-in-chief
for IET Biometrics. He has also served as an area chair for AAAI,
NeurIPS, and IEEE ICIP. His current research interests include image
processing, pattern recognition, computer vision, machine learning,
and biometrics.

Prof. Ruili Wang (Fellow of Engineering New Zealand, Stanford/Elsevier Top 2% Scientists List (2021-2025))
Massey University, New Zealand
Speech Title: Multimodal Data Processing
Abstract: In this presentation, we will introduce our recent
progress in Multimodal Data Processing, especially in video
captioning and infrared–visible image fusion.
Video captioning bridges computer vision and natural language
processing and plays an
essential role in various knowledge-driven systems within the
streaming media era. Recent video captioning methods have achieved
promising performance by leveraging external textual knowledge to
better understand video content and generate more informative
captions. Nevertheless, existing methods that rely excessively on
knowledge graphs still suffer from several inherent limitations. To
address these issues, we propose a novel knowledge enhancement and
disentanglement learning framework for video captioning.
Image fusion is an important technique in computer vision and image
processing. It integrates multiple images of the same scene —
captured by different sensors, at different times or from different
viewpoints — into a single high-quality composite image. Current
text-driven infrared–visible image fusion methods mainly adopt
sentence-level textual guidance. This paradigm easily introduces
semantic noise caused by text redundancy and fails to fully exploit
the deep semantic value of textual cues. To overcome these
drawbacks, we propose a novel fusion approach dubbed Entity-Guided
Multi-Task Learning for Infrared and Visible Image Fusion.
Biography: Professor Ruili Wang, Fellow of Engineering New Zealand,
graduated from Huazhong University of Science and Technology,
Northeastern University, and Dublin City University, where he
obtained his B.E., M.E., and Ph.D., respectively. His research areas
include AI, machine learning, computer vision, and speech and
language processing. His research has been funded by multiple grants
from the New Zealand government. Professor Wang is an associate
editor/editorial board member of several journals, including IEEE
Transactions on Multimedia (TMM), IEEE Transactions on Circuits and
Systems for Video Technology (TCSVT), IEEE Computational
Intelligence Magazine, IEEE Transactions on Emerging Topics in
Computational Intelligence (TETCI), and ACM Transactions on Multimedia
Computing, Communications, and Applications (TOMM).

Prof. Leida Li (National Young Talent)
Xidian University, China
Speech Title: Fine-grained Visual Quality Assessment
Abstract: Visual quality assessment measures the
perceptual quality of images by simulating the characteristics of
the Human Visual System (HVS). As a common technology, it has
important applications in many fields such as low-level vision,
imaging optimization, smart photography, and AIGC. After more than
20 years of rapid development, a large number of algorithms have
been proposed. However, existing methods typically suffer from
insufficient discriminative ability when used in real-world
environments. This talk focuses on the key differences between
coarse-grained and fine-grained visual quality assessment, the main
research progress in fine-grained visual quality assessment, as well
as applications in camera tuning and aesthetic recommendation, etc.
Biography: Leida Li is a Full Professor at Xidian University, recognized as a National Young Talent. His research interests include computer vision, visual quality assessment, and computational aesthetics. He has published over 100 papers in top-tier journals and conferences such as IEEE TPAMI, IEEE TIP, CVPR, ICCV, and AAAI, with about 10,000 citations. He has led five projects supported by the National Natural Science Foundation of China and has actively engaged in industry-academia collaborations with top companies such as Huawei, OPPO, and Tencent. He received the "Outstanding Industry-Academia Collaboration Partner" award from OPPO, and his research outcomes have been applied in smartphone and live-streaming cameras. He is an Associate Editor of IEEE Transactions on Image Processing (TIP) and the Journal of Visual Communication and Image Representation (Best Associate Editor Award 2021/2023), and serves as Area Chair/Senior Program Committee member for top international conferences such as AAAI, IJCAI, and ACM MM. He is a Senior Member of IEEE/CCF/CSIG.

Prof. Jiantao Zhou
University of Macau, China
Speech Title: Towards Robust Learning-Based Multimedia Forensics
Abstract: In recent years, the
proliferation of sophisticated multimedia generation and
manipulation technologies, such as deepfakes and advanced
image/video editing tools, has significantly blurred the line
between authentic and fabricated content. As multimedia plays an
increasingly crucial role in information dissemination, legal
evidence, and social interactions, ensuring its integrity has become
a pressing concern. How can we effectively distinguish genuine media
from skilfully crafted forgeries, especially when traditional
forensic techniques struggle to keep pace with rapidly evolving
tampering methods? Moreover, the challenges are compounded by the
degradation of forensic features during media transmission and the
vulnerability of detection models to adversarial attacks. In the
realm of multimedia forensics, learning-based approaches offer a
promising avenue for tackling these complex issues. However, there
is a pressing need to enhance their robustness against various
distortions, obfuscation strategies, and dynamic threats. This talk
explores the latest advancements in robust learning-based multimedia
forensics, delving into novel methodologies designed to fortify
detection capabilities. From developing innovative feature
extraction techniques that can withstand transmission-induced
degradation to creating resilient models that can counter
adversarial manipulations, the presentation aims to outline a
comprehensive research direction for achieving reliable and
trustworthy multimedia forensics in an increasingly digital and
deceptive world.
Biography: Dr. Jiantao Zhou is a Full
Professor at the Department of Computer and Information Science, and
the State Key Laboratory of Internet of Things for Smart City,
University of Macau, where he also serves as the Director of the
Research Services and Knowledge Transfer Office. He graduated from
the Hong Kong University of Science and Technology in 2009 with a
PhD in Electrical and Computer Engineering. He was a Fulbright
Junior Scholar at the University of Illinois at Urbana-Champaign
(UIUC). Professor Zhou’s research focuses on AI security, multimedia
information privacy protection and forensics, and intelligent
multimedia information processing. He has published more than 200
papers in top journals such as IEEE T-PAMI, IEEE T-IP, IEEE T-SP,
IEEE T-IFS, and IEEE T-AC, and in top conferences such as CVPR, ICCV,
ICML, and AAAI. He currently serves as an Associate Editor for IEEE
Trans. Multimedia and IEEE Trans. Dependable and Secure Computing,
top journals in the field of multimedia information processing
and security, and was the Editor-in-Chief of APSIPA Newsletters. He
is the Chair of the Multimedia Systems and Applications Technical
Committee of the IEEE Circuits and Systems Society and was the TPC
Co-Chair of ICME 2023 and the General Chair of APSIPA ASC 2024. He
received the 2022 Macau Science and Technology Award (Third Prize,
Natural Science Award) and the 2023 Alibaba Outstanding Academic
Cooperation Project Award.