Program at a Glance

1 December, 2021

Beijing UTC +8 | AEST (BNE) UTC +10 | AEDT (SYD) UTC +11 | Session | Details (Paper List)
9:00 - 9:10 a.m. | 11:00 - 11:10 a.m. | 12:00 - 12:10 p.m. | Opening | -
9:10 - 10:10 a.m. | 11:10 - 12:10 p.m. | 12:10 - 13:10 p.m. | Keynote 1: Mohan Kankanhalli | Privacy-aware Multimedia Analytics
5 mins break
10:15 - 11:45 a.m. | 12:15 - 13:45 p.m. | 13:15 - 14:45 p.m. | Session 1: Video Understanding in Multimedia (Chair: Hailin Shi) | Motion = Video - Content: Towards Unsupervised Learning of Motion Representation from Videos
Blindly Predict Image and Video Quality in the Wild
Hierarchical Deep Residual Reasoning for Temporal Moment Localization
Video Saliency Prediction via Deep Eye Movement Learning
Conditional Extreme Value Theory for Open Set Video Domain Adaptation
Intra- and Inter-frame Iterative Temporal Convolutional Networks for Video Stabilization
11:45 - 12:30 p.m. | 13:45 - 14:30 p.m. | 14:45 - 15:30 p.m. | Session 2: Best Paper Candidates | Language Based Image Quality Assessment
Towards Discriminative Visual Search via Semantically Cycle-consistent Hashing Networks
Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning
12:30 - 14:00 p.m. | 14:30 - 16:00 p.m. | 15:30 - 17:00 p.m. | Session 3: Deep Learning for Multimedia (Chair: Lu Sheng) | A Local-Global Commutative Preserving Functional Map for Shape Correspondence
Differentially Private Learning with Grouped Gradient Clipping
Structural Knowledge Organization and Transfer for Class-Incremental Learning
Improving Hyperspectral Super-Resolution via Heterogeneous Knowledge Distillation
Patch-Based Deep Autoencoder for Point Cloud Geometry Compression
Score Transformer: Generating Musical Score from Note-level Representation
14:00 - 15:30 p.m. | 16:00 - 17:30 p.m. | 17:00 - 18:30 p.m. | Session 4: Multimodality Learning in Multimedia (Chair: Hongyuan Zhu) | BRUSH: Label Reconstructing and Similarity Preserving Hashing for Cross-modal Retrieval
Local Self-Attention on Fine-grained Cross-media Retrieval
Self-Adaptive Hashing for Fine-Grained Image Retrieval
Hierarchical Composition Learning for Composed Query Image Retrieval
Few-shot Egocentric Multimodal Activity Recognition
Inter-modality Discordance for Multimodal Fake News Detection
10 minutes break for transition from Zoom to Gather.Town
15:40 - 16:30 p.m. | 17:40 - 18:30 p.m. | 18:40 - 19:30 p.m. | Lightning Talk Session 1 | Brave New Idea:
Discovering Social Connections using Event Images
SangeetXML: An XML Format for Score Retrieval for Indic Music
Holodeck: Immersive 3D Displays Using Swarms of Flying Light Specks
Demo Papers:
RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management
Private-Share: A Secure and Privacy-Preserving De-Centralized Framework for Large Scale Data Sharing
An Efficient Bus Crowdedness Classification System
Short Papers - Part 1:
*Paper list is shown in the Short Paper List below
16:30 - 17:30 p.m. | 18:30 - 19:30 p.m. | 19:30 - 20:30 p.m. | Workshop: Multi-Modal Embedding and Understanding | Focusing Attention across Multiple Images for Multi-Modal Event Detection
Adaptive Cross-stitch Graph Convolutional Networks
Generation of Variable-Length Time Series from Text using Dynamic Time Warping-Based Method
Hierarchical Graph Representation Learning with Local Capsule Pooling
Deep Adaptive-Attention Triple Hashing

2 December, 2021

Beijing UTC +8 | AEST (BNE) UTC +10 | AEDT (SYD) UTC +11 | Session | Details
9:00 - 9:10 a.m. | 11:00 - 11:10 a.m. | 12:00 - 12:10 p.m. | Best Paper Award Announcement
9:10 - 10:00 a.m. | 11:10 - 12:00 p.m. | 12:10 - 13:00 p.m. | Keynote 2: Yong Rui | Artificial Intelligence: Paving a Path to Digital Economy Transformation
10:00 - 10:30 a.m. | 12:00 - 12:30 p.m. | 13:00 - 13:30 p.m. | Grand Challenges | Introduction to two Grand Challenges
Paper: Hybrid Improvements in Multimodal Analysis for Deep Video Understanding
10:30 - 12:00 p.m. | 12:30 - 14:00 p.m. | 13:30 - 15:00 p.m. | Session 5: Vision and Language in Multimedia (Chair: Jing Zhang) | Semantic Enhanced Cross-modal GAN for Zero-shot Learning
TS2TD: A Tree-Structured Decoder for Image Paragraph Captioning
Entity Relation Fusion for Real-Time One-Stage Referring Expression Comprehension
Visual Storytelling with Hierarchical BERT Semantic Guidance
Efficient Proposal Generation with U-shaped Network for Temporal Sentence Grounding
Zero-shot Recognition with Image Attributes Generation using Hierarchical Coupled Dictionary Learning
5 mins break
12:05 - 13:35 p.m. | 14:05 - 15:35 p.m. | 15:05 - 16:35 p.m. | Session 6: Computer Vision in Multimedia (Chair: Tiesong Zhao) | Source-Style Transferred Mean Teacher for Source-data Free Object Detection
Improving Camouflaged Object Detection with the Uncertainty of Pseudo-edge Labels
MIRecipe: A Recipe Dataset for Stage-Aware Recognition of Changes in Appearance of Ingredients
Learning to Decompose and Restore Low-light Images with Wavelet Transform
Hard-Boundary Attention Network for Nuclei Instance Segmentation
A Model-Guided Unfolding Network for Single Image Reflection Removal
5 minutes break for transition from Zoom to Gather.Town
13:40 - 14:55 p.m. | 15:40 - 16:55 p.m. | 16:40 - 17:55 p.m. | Workshop: Multi-Model Computing of Marine Big Data | Deep Reinforcement Learning and Docking Simulations for autonomous molecule generation in de novo Drug Design
Joint label refinement and contrastive learning with hybrid memory for Unsupervised Marine Object Re-Identification
Prediction of transcription factor binding sites using deep learning combined with DNA sequences and shape feature data
A reinforcement learning-based reward mechanism for molecule generation that introduces activity information
A Fine-Grained River Ice Semantic Segmentation based on Attentive Features and Enhancing Feature Fusion
Multi-Scale Graph Convolutional Network and Dynamic Iterative Class Loss for Ship Segmentation in Remote Sensing Images
5 mins break
15:00 - 16:00 p.m. | 17:00 - 18:00 p.m. | 18:00 - 19:00 p.m. | Special Session | Women in Multimedia Roundtable
5 mins break
16:05 - 17:05 p.m. | 18:05 - 19:05 p.m. | 19:05 - 20:05 p.m. | Lightning Talk Session 2 - Short Papers - Part 2 | *Paper list is shown in the Short Paper List below
17:05 - 17:40 p.m. | 19:05 - 19:40 p.m. | 20:05 - 20:40 p.m. | Social Connections on Gather.Town | Posters and Q&A for all tracks

3 December, 2021

Beijing UTC +8 | AEST (BNE) UTC +10 | AEDT (SYD) UTC +11 | Session | Details
9:00 - 9:10 a.m. | 11:00 - 11:10 a.m. | 12:00 - 12:10 p.m. | Introduction to ACM Multimedia Asia 2022 | -
9:10 - 10:00 a.m. | 11:10 - 12:00 p.m. | 12:10 - 13:00 p.m. | Keynote 3: Divesh Srivastava | How to do Research for Fun and Profit
10:00 - 11:00 a.m. | 12:00 - 13:00 p.m. | 13:00 - 14:00 p.m. | HDR Lightning Talks | TBA
11:00 - 12:00 p.m. | 13:00 - 14:00 p.m. | 14:00 - 15:00 p.m. | Keynote 4: Klara Nahrstedt | Navigation Models for Interactive 360-Degree Video Streaming Systems
5 mins break
12:05 - 14:05 p.m. | 14:05 - 16:05 p.m. | 15:05 - 17:05 p.m. | Tutorial 1:
Recent Advances in Video Summarization: Conventional and Deep Learning based Approaches
Zhiyong Wang (USyd), Zhou Zhao (ZJU), Xi Li (ZJU), Kun Kuang (ZJU), and Fei Wu (ZJU)
10 mins break
14:15 - 16:00 p.m. | 16:15 - 18:00 p.m. | 17:15 - 19:00 p.m. | Tutorial 2:
Modeling User Behavior for Vertical Search: Images, Apps and Products
Xiaohui Xie (THU), Jiaxin Mao (THU), Yuqun Liu (THU), and Maarten de Rijke (UvA)
16:00 - 16:45 p.m. | 18:00 - 18:45 p.m. | 19:00 - 19:45 p.m. | Applied Research Track | Goldeye: Enhanced Spatial Awareness for the Visually Impaired using Mixed Reality and Vibrotactile Feedback
Convolutional Neural Network-Based Pure Paint Pigment Identification Using Hyperspectral Images
CFCR: A Convolution and Fusion Model for Cross-platform Recommendation
5 minutes break for transition from Zoom to Gather.Town
16:50 - 17:50 p.m. | 18:50 - 19:50 p.m. | 19:50 - 20:50 p.m. | Workshop: Visual Tasks and Challenges under Low-quality Multimedia Data | Local-enhanced Multi-resolution Representation Learning for Vehicle Re-identification
Dedark+Detection: A Hybrid Scheme for Object Detection under Low-light Surveillance
Making Video Recognition Models Robust to Common Corruptions With Supervised Contrastive Learning
Visible-Infrared Cross-Modal Person Re-identification based on Positive Feedback
17:50 p.m. | 19:50 p.m. | 20:50 p.m. | Closing | -

*Short Paper List - Part 1

  • CMRD-Net: An Improved Method for Underwater Image Enhancement
  • Deep Multiple Length Hashing via Multi-task Learning
  • Color Image Denoising via Tensor Robust PCA with Nonconvex and Nonlocal Regularization
  • Conditioned Image Retrieval for Fashion using Contrastive Learning and CLIP-based Features
  • PBNet: Position-specific Text-to-image Generation by Boundary
  • An Embarrassingly Simple Approach to Discrete Supervised Hashing

*Short Paper List - Part 2

  • Towards Transferable 3D Adversarial Attack
  • Delay-sensitive and Priority-aware Transmission Control for Real-time Multimedia Communications
  • Impression of a Job Interview training agent that gives rationalized feedback ~Should Virtual Agent Give Advice with Rationale
  • A Coarse-to-fine Approach for Fast Super-Resolution with Flexible Magnification
  • Automatically Generate Rigged Character from Single Image
  • Flat and Shallow: Understanding Fake Image Detection Models by Architecture Profiling
  • Multi-branch Semantic Learning Network for Text-to-Image Synthesis
  • Attention-based Dual-Branches Localization Network for Weakly Supervised Object Localization
  • Pose-aware Outfit Transfer between Unpaired in-the-wild Fashion Images
  • Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation
  • Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages
  • PLM-IPE: A Pixel-Landmark Mutual Enhanced Framework for Implicit Preference Estimation
  • Adaptive Viewport Margins Using Head Motion for Improving User Experience in Immersive Video
  • Chinese White Dolphin Detection in the Wild
  • BAND: A Benchmark Dataset for Bangla News Audio Classification
  • A comparison study: the impact of age and gender distribution on age estimation
  • Spherical Image Compression Using Spherical Wavelet Transform
  • FQM-GC: Full-reference Quality Metric for Colored Point Cloud Based on Graph Signal Features and Color Features
  • Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification
  • NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels

