Program at a Glance

1 December, 2021

Beijing UTC +8	AEST (BNE) UTC +10	AEDT (SYD) UTC +11	Session	Details (Paper List)
9:00 - 9:10 a.m.	11:00 - 11:10 a.m.	12:00 - 12:10 p.m.	Opening	-
9:10 - 10:10 a.m.	11:10 - 12:10 p.m.	12:10 - 13:10 p.m.	Keynote 1: Mohan Kankanhalli	Privacy-aware Multimedia Analytics
5 mins break
10:15 - 11:45 a.m.	12:15 - 13:45 p.m.	13:15 - 14:45 p.m.	Session 1: Video Understanding in Multimedia (Chair: Hailin Shi)	Motion = Video - Content: Towards Unsupervised Learning of Motion Representation from Videos
				Blindly Predict Image and Video Quality in the Wild
				Hierarchical Deep Residual Reasoning for Temporal Moment Localization
				Video Saliency Prediction via Deep Eye Movement Learning
				Conditional Extreme Value Theory for Open Set Video Domain Adaptation
				Intra- and Inter-frame Iterative Temporal Convolutional Networks for Video Stabilization
11:45 - 12:30 p.m.	13:45 - 14:30 p.m.	14:45 - 15:30 p.m.	Session 2: Best Paper Candidates	Language Based Image Quality Assessment
				Towards Discriminative Visual Search via Semantically Cycle-consistent Hashing Networks
				Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning
12:30 - 14:00 p.m.	14:30 - 16:00 p.m.	15:30 - 17:00 p.m.	Session 3: Deep Learning for Multimedia (Chair: Lu Sheng)	A Local-Global Commutative Preserving Functional Map for Shape Correspondence
				Differentially Private Learning with Grouped Gradient Clipping
				Structural Knowledge Organization and Transfer for Class-Incremental Learning
				Improving Hyperspectral Super-Resolution via Heterogeneous Knowledge Distillation
				Patch-Based Deep Autoencoder for Point Cloud Geometry Compression
				Score Transformer: Generating Musical Score from Note-level Representation
14:00 - 15:30 p.m.	16:00 - 17:30 p.m.	17:00 - 18:30 p.m.	Session 4: Multimodality Learning in Multimedia (Chair: Hongyuan Zhu)	BRUSH: Label Reconstructing and Similarity Preserving Hashing for Cross-modal Retrieval
				Local Self-Attention on Fine-grained Cross-media Retrieval
				Self-Adaptive Hashing for Fine-Grained Image Retrieval
				Hierarchical Composition Learning for Composed Query Image Retrieval
				Few-shot Egocentric Multimodal Activity Recognition
				Inter-modality Discordance for Multimodal Fake News Detection
10 minutes break for transition from Zoom to Gather.Town
15:40 - 16:30 p.m.	17:40 - 18:30 p.m.	18:40 - 19:30 p.m.	Lightning Talk Session 1	Brave New Idea	Discovering Social Connections using Event Images
					SangeetXML: An XML Format for Score Retrieval for Indic Music
					Holodeck: Immersive 3D Displays Using Swarms of Flying Light Specks
				Demo Papers	RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management
					Private-Share: A Secure and Privacy-Preserving De-Centralized Framework for Large Scale Data Sharing
					An Efficient Bus Crowdedness Classification System
				Short Papers - Part 1	*Paper list is shown under Short Paper List - Part 1 below
16:30 - 17:30 p.m.	18:30 - 19:30 p.m.	19:30 - 20:30 p.m.	Workshop: Multi-Modal Embedding and Understanding	Focusing Attention across Multiple Images for Multi-Modal Event Detection
				Adaptive Cross-stitch Graph Convolutional Networks
				Generation of Variable-Length Time Series from Text using Dynamic Time Warping-Based Method
				Hierarchical Graph Representation Learning with Local Capsule Pooling
				Deep Adaptive-Attention Triple Hashing

2 December, 2021

Beijing UTC +8	AEST (BNE) UTC +10	AEDT (SYD) UTC +11	Session	Details
9:00 - 9:10 a.m.	11:00 - 11:10 a.m.	12:00 - 12:10 p.m.	Best Paper Award Announcement	-
9:10 - 10:00 a.m.	11:10 - 12:00 p.m.	12:10 - 13:00 p.m.	Keynote 2: Yong Rui	Artificial Intelligence: Paving a Path to Digital Economy Transformation
10:00 - 10:30 a.m.	12:00 - 12:30 p.m.	13:00 - 13:30 p.m.	Grand Challenges	Introduction to two Grand Challenges
10:00 - 10:30 a.m.	12:00 - 12:30 p.m.	13:00 - 13:30 p.m.	Grand Challenges	Paper: Hybrid Improvements in Multimodal Analysis for Deep Video Understanding
10:30 - 12:00 p.m.	12:30 - 14:00 p.m.	13:30 - 15:00 p.m.	Session 5: Vision and Language in Multimedia (Chair: Jing Zhang)	Semantic Enhanced Cross-modal GAN for Zero-shot Learning
				TS2TD: A Tree-Structured Decoder for Image Paragraph Captioning
				Entity Relation Fusion for Real-Time One-Stage Referring Expression Comprehension
				Visual Storytelling with Hierarchical BERT Semantic Guidance
				Efficient Proposal Generation with U-shaped Network for Temporal Sentence Grounding
				Zero-shot Recognition with Image Attributes Generation using Hierarchical Coupled Dictionary Learning
5 mins break
12:05 - 13:35 p.m.	14:05 - 15:35 p.m.	15:05 - 16:35 p.m.	Session 6: Computer Vision in Multimedia (Chair: Tiesong Zhao)	Source-Style Transferred Mean Teacher for Source-data Free Object Detection
				Improving Camouflaged Object Detection with the Uncertainty of Pseudo-edge Labels
				MIRecipe: A Recipe Dataset for Stage-Aware Recognition of Changes in Appearance of Ingredients
				Learning to Decompose and Restore Low-light Images with Wavelet Transform
				Hard-Boundary Attention Network for Nuclei Instance Segmentation
				A Model-Guided Unfolding Network for Single Image Reflection Removal
5 minutes break for transition from Zoom to Gather.Town
13:40 - 14:55 p.m.	15:40 - 16:55 p.m.	16:40 - 17:55 p.m.	Workshop: Multi-Model Computing of Marine Big Data	Deep Reinforcement Learning and Docking Simulations for autonomous molecule generation in de novo Drug Design
				Joint label refinement and contrastive learning with hybrid memory for Unsupervised Marine Object Re-Identification
				Prediction of transcription factor binding sites using deep learning combined with DNA sequences and shape feature data
				A reinforcement learning-based reward mechanism for molecule generation that introduces activity information
				A Fine-Grained River Ice Semantic Segmentation based on Attentive Features and Enhancing Feature Fusion
				Multi-Scale Graph Convolutional Network and Dynamic Iterative Class Loss for Ship Segmentation in Remote Sensing Images
5 mins break
15:00 - 16:00 p.m.	17:00 - 18:00 p.m.	18:00 - 19:00 p.m.	Special Session	Women in Multimedia Roundtable
5 mins break
16:05 - 17:05 p.m.	18:05 - 19:05 p.m.	19:05 - 20:05 p.m.	Lightning Talk Session 2 - Short Papers - Part 2	*Paper list is shown under Short Paper List - Part 2 below
17:05 - 17:40 p.m.	19:05 - 19:40 p.m.	20:05 - 20:40 p.m.	Social Connections on Gather.Town	Posters and Q&A for all tracks

3 December, 2021

Beijing UTC +8	AEST (BNE) UTC +10	AEDT (SYD) UTC +11	Session	Details
9:00 - 9:10 a.m.	11:00 - 11:10 a.m.	12:00 - 12:10 p.m.	Introduction to ACM Multimedia Asia 2022	-
9:10 - 10:00 a.m.	11:10 - 12:00 p.m.	12:10 - 13:00 p.m.	Keynote 3: Divesh Srivastava	How to do Research for Fun and Profit
10:00 - 11:00 a.m.	12:00 - 13:00 p.m.	13:00 - 14:00 p.m.	HDR Lightning Talks	TBA
11:00 - 12:00 p.m.	13:00 - 14:00 p.m.	14:00 - 15:00 p.m.	Keynote 4: Klara Nahrstedt	Navigation Models for Interactive 360-Degree Video Streaming Systems
5 mins break
12:05 - 14:05 p.m.	14:05 - 16:05 p.m.	15:05 - 17:05 p.m.	Tutorial 1: Recent Advances in Video Summarization: Conventional and Deep Learning based Approaches	Zhiyong Wang (USyd), Zhou Zhao (ZJU), Xi Li (ZJU), Kun Kuang (ZJU), and Fei Wu (ZJU)
10 mins break
14:15 - 16:00 p.m.	16:15 - 18:00 p.m.	17:15 - 19:00 p.m.	Tutorial 2: Modeling User Behavior for Vertical Search: Images, Apps and Products	Xiaohui Xie (THU), Jiaxin Mao (THU), Yiqun Liu (THU), and Maarten de Rijke (UvA)
16:00 - 16:45 p.m.	18:00 - 18:45 p.m.	19:00 - 19:45 p.m.	Applied Research Track	Goldeye: Enhanced Spatial Awareness for the Visually Impaired using Mixed Reality and Vibrotactile Feedback
				Convolutional Neural Network-Based Pure Paint Pigment Identification Using Hyperspectral Images
				CFCR: A Convolution and Fusion Model for Cross-platform Recommendation
5 minutes break for transition from Zoom to Gather.Town
16:50 - 17:50 p.m.	18:50 - 19:50 p.m.	19:50 - 20:50 p.m.	Workshop: Visual Tasks and Challenges under Low-quality Multimedia Data	Local-enhanced Multi-resolution Representation Learning for Vehicle Re-identification
				Dedark+Detection: A Hybrid Scheme for Object Detection under Low-light Surveillance
				Making Video Recognition Models Robust to Common Corruptions With Supervised Contrastive Learning
				Visible-Infrared Cross-Modal Person Re-identification based on Positive Feedback
17:50 p.m.	19:50 p.m.	20:50 p.m.	Closing	-

*Short Paper List - Part 1

CMRD-Net: An Improved Method for Underwater Image Enhancement
Deep Multiple Length Hashing via Multi-task Learning
Color Image Denoising via Tensor Robust PCA with Nonconvex and Nonlocal Regularization
Conditioned Image Retrieval for Fashion using Contrastive Learning and CLIP-based Features
PBNet: Position-specific Text-to-image Generation by Boundary
An Embarrassingly Simple Approach to Discrete Supervised Hashing

*Short Paper List - Part 2

Towards Transferable 3D Adversarial Attack
Delay-sensitive and Priority-aware Transmission Control for Real-time Multimedia Communications
Impression of a Job Interview training agent that gives rationalized feedback: Should Virtual Agent Give Advice with Rationale?
A Coarse-to-fine Approach for Fast Super-Resolution with Flexible Magnification
Automatically Generate Rigged Character from Single Image
Flat and Shallow: Understanding Fake Image Detection Models by Architecture Profiling
Multi-branch Semantic Learning Network for Text-to-Image Synthesis
Attention-based Dual-Branches Localization Network for Weakly Supervised Object Localization
Pose-aware Outfit Transfer between Unpaired in-the-wild Fashion Images
Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation
Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages
PLM-IPE: A Pixel-Landmark Mutual Enhanced Framework for Implicit Preference Estimation
Adaptive Viewport Margins Using Head Motion for Improving User Experience in Immersive Video
Chinese White Dolphin Detection in the Wild
BAND: A Benchmark Dataset for Bangla News Audio Classification
A comparison study: the impact of age and gender distribution on age estimation
Spherical Image Compression Using Spherical Wavelet Transform
FQM-GC: Full-reference Quality Metric for Colored Point Cloud Based on Graph Signal Features and Color Features
Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels