Program at a Glance
1 December, 2021
Beijing UTC +8 | AEST (BNE) UTC +10 | AEDT (SYD) UTC +11 | Session | Details (Paper List) | |
9:00 - 9:10 a.m. | 11:00 - 11:10 a.m. | 12:00 - 12:10 p.m. | Opening | - | |
9:10 - 10:10 a.m. | 11:10 - 12:10 p.m. | 12:10 - 13:10 p.m. | Keynote 1: Mohan Kankanhalli | Privacy-aware Multimedia Analytics | |
5 mins break | |||||
10:15 - 11:45 a.m. | 12:15 - 13:45 p.m. | 13:15 - 14:45 p.m. | Session 1: Video Understanding in Multimedia (Chair: Hailin Shi) | Motion = Video - Content: Towards Unsupervised Learning of Motion Representation from Videos | |
Blindly Predict Image and Video Quality in the Wild | |||||
Hierarchical Deep Residual Reasoning for Temporal Moment Localization | |||||
Video Saliency Prediction via Deep Eye Movement Learning | |||||
Conditional Extreme Value Theory for Open Set Video Domain Adaptation | |||||
Intra- and Inter-frame Iterative Temporal Convolutional Networks for Video Stabilization | |||||
11:45 - 12:30 p.m. | 13:45 - 14:30 p.m. | 14:45 - 15:30 p.m. | Session 2: Best Paper Candidates | Language Based Image Quality Assessment | |
Towards Discriminative Visual Search via Semantically Cycle-consistent Hashing Networks | |||||
Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning | |||||
12:30 - 14:00 p.m. | 14:30 - 16:00 p.m. | 15:30 - 17:00 p.m. | Session 3: Deep Learning for Multimedia (Chair: Lu Sheng) | A Local-Global Commutative Preserving Functional Map for Shape Correspondence | |
Differentially Private Learning with Grouped Gradient Clipping | |||||
Structural Knowledge Organization and Transfer for Class-Incremental Learning | |||||
Improving Hyperspectral Super-Resolution via Heterogeneous Knowledge Distillation | |||||
Patch-Based Deep Autoencoder for Point Cloud Geometry Compression | |||||
Score Transformer: Generating Musical Score from Note-level Representation | |||||
14:00 - 15:30 p.m. | 16:00 - 17:30 p.m. | 17:00 - 18:30 p.m. | Session 4: Multimodality Learning in Multimedia (Chair: Hongyuan Zhu) | BRUSH: Label Reconstructing and Similarity Preserving Hashing for Cross-modal Retrieval | |
Local Self-Attention on Fine-grained Cross-media Retrieval | |||||
Self-Adaptive Hashing for Fine-Grained Image Retrieval | |||||
Hierarchical Composition Learning for Composed Query Image Retrieval | |||||
Few-shot Egocentric Multimodal Activity Recognition | |||||
Inter-modality Discordance for Multimodal Fake News Detection | |||||
10 minutes break for transition from Zoom to Gather.Town | |||||
15:40 - 16:30 p.m. | 17:40 - 18:30 p.m. | 18:40 - 19:30 p.m. | Lightning Talk Session 1 | Brave New Idea | Discovering Social Connections using Event Images |
SangeetXML: An XML Format for Score Retrieval for Indic Music | |||||
Holodeck: Immersive 3D Displays Using Swarms of Flying Light Specks | |||||
Demo Papers | RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management | ||||
Private-Share: A Secure and Privacy-Preserving De-Centralized Framework for Large Scale Data Sharing | |||||
An Efficient Bus Crowdedness Classification System | |||||
Short Papers - Part 1 | *Papers are listed under Short Paper List - Part 1 below | ||||
16:30 - 17:30 p.m. | 18:30 - 19:30 p.m. | 19:30 - 20:30 p.m. | Workshop: Multi-Modal Embedding and Understanding | Focusing Attention across Multiple Images for Multi-Modal Event Detection | |
Adaptive Cross-stitch Graph Convolutional Networks | |||||
Generation of Variable-Length Time Series from Text using Dynamic Time Warping-Based Method | |||||
Hierarchical Graph Representation Learning with Local Capsule Pooling | |||||
Deep Adaptive-Attention Triple Hashing |
2 December, 2021
Beijing UTC +8 | AEST (BNE) UTC +10 | AEDT (SYD) UTC +11 | Session | Details | |
9:00 - 9:10 a.m. | 11:00 - 11:10 a.m. | 12:00 - 12:10 p.m. | Best Paper Award Announcement | ||
9:10 - 10:00 a.m. | 11:10 - 12:00 p.m. | 12:10 - 13:00 p.m. | Keynote 2: Yong Rui | Artificial Intelligence: Paving a Path to Digital Economy Transformation | |
10:00 - 10:30 a.m. | 12:00 - 12:30 p.m. | 13:00 - 13:30 p.m. | Grand Challenges | Introduction to two Grand Challenges | |
Paper: Hybrid Improvements in Multimodal Analysis for Deep Video Understanding | |||||
10:30 - 12:00 p.m. | 12:30 - 14:00 p.m. | 13:30 - 15:00 p.m. | Session 5: Vision and Language in Multimedia (Chair: Jing Zhang) | Semantic Enhanced Cross-modal GAN for Zero-shot Learning | |
TS2TD: A Tree-Structured Decoder for Image Paragraph Captioning | |||||
Entity Relation Fusion for Real-Time One-Stage Referring Expression Comprehension | |||||
Visual Storytelling with Hierarchical BERT Semantic Guidance | |||||
Efficient Proposal Generation with U-shaped Network for Temporal Sentence Grounding | |||||
Zero-shot Recognition with Image Attributes Generation using Hierarchical Coupled Dictionary Learning | |||||
5 mins break | |||||
12:05 - 13:35 p.m. | 14:05 - 15:35 p.m. | 15:05 - 16:35 p.m. | Session 6: Computer Vision in Multimedia (Chair: Tiesong Zhao) | Source-Style Transferred Mean Teacher for Source-data Free Object Detection | |
Improving Camouflaged Object Detection with the Uncertainty of Pseudo-edge Labels | |||||
MIRecipe: A Recipe Dataset for Stage-Aware Recognition of Changes in Appearance of Ingredients | |||||
Learning to Decompose and Restore Low-light Images with Wavelet Transform | |||||
Hard-Boundary Attention Network for Nuclei Instance Segmentation | |||||
A Model-Guided Unfolding Network for Single Image Reflection Removal | |||||
5 minutes break for transition from Zoom to Gather.Town | |||||
13:40 - 14:55 p.m. | 15:40 - 16:55 p.m. | 16:40 - 17:55 p.m. | Workshop: Multi-Model Computing of Marine Big Data | Deep Reinforcement Learning and Docking Simulations for autonomous molecule generation in de novo Drug Design | |
Joint label refinement and contrastive learning with hybrid memory for Unsupervised Marine Object Re-Identification | |||||
Prediction of transcription factor binding sites using deep learning combined with DNA sequences and shape feature data | |||||
A reinforcement learning-based reward mechanism for molecule generation that introduces activity information | |||||
A Fine-Grained River Ice Semantic Segmentation based on Attentive Features and Enhancing Feature Fusion | |||||
Multi-Scale Graph Convolutional Network and Dynamic Iterative Class Loss for Ship Segmentation in Remote Sensing Images | |||||
5 mins break | |||||
15:00 - 16:00 p.m. | 17:00 - 18:00 p.m. | 18:00 - 19:00 p.m. | Special Session | Women in Multimedia Roundtable | |
5 mins break | |||||
16:05 - 17:05 p.m. | 18:05 - 19:05 p.m. | 19:05 - 20:05 p.m. | Lightning Talk Session 2 - Short Papers - Part 2 | *Papers are listed under Short Paper List - Part 2 below | |
17:05 - 17:40 p.m. | 19:05 - 19:40 p.m. | 20:05 - 20:40 p.m. | Social Connections on Gather.Town | Posters and Q&A for all tracks |
3 December, 2021
Beijing UTC +8 | AEST (BNE) UTC +10 | AEDT (SYD) UTC +11 | Session | Details | |
9:00 - 9:10 a.m. | 11:00 - 11:10 a.m. | 12:00 - 12:10 p.m. | Introduction to ACM Multimedia Asia 2022 | - | |
9:10 - 10:00 a.m. | 11:10 - 12:00 p.m. | 12:10 - 13:00 p.m. | Keynote 3: Divesh Srivastava | How to do Research for Fun and Profit | |
10:00 - 11:00 a.m. | 12:00 - 13:00 p.m. | 13:00 - 14:00 p.m. | HDR Lightning Talks | TBA | |
11:00 - 12:00 p.m. | 13:00 - 14:00 p.m. | 14:00 - 15:00 p.m. | Keynote 4: Klara Nahrstedt | Navigation Models for Interactive 360-Degree Video Streaming Systems | |
5 mins break | |||||
12:05 - 14:05 p.m. | 14:05 - 16:05 p.m. | 15:05 - 17:05 p.m. | Tutorial 1: Recent Advances in Video Summarization: Conventional and Deep Learning based Approaches | Zhiyong Wang (USyd), Zhou Zhao (ZJU), Xi Li (ZJU), Kun Kuang (ZJU), and Fei Wu (ZJU) | |
10 mins break | |||||
14:15 - 16:00 p.m. | 16:15 - 18:00 p.m. | 17:15 - 19:00 p.m. | Tutorial 2: Modeling User Behavior for Vertical Search: Images, Apps and Products | Xiaohui Xie (THU), Jiaxin Mao (THU), Yiqun Liu (THU), and Maarten de Rijke (UvA) | |
16:00 - 16:45 p.m. | 18:00 - 18:45 p.m. | 19:00 - 19:45 p.m. | Applied Research Track | Goldeye: Enhanced Spatial Awareness for the Visually Impaired using Mixed Reality and Vibrotactile Feedback | |
Convolutional Neural Network-Based Pure Paint Pigment Identification Using Hyperspectral Images | |||||
CFCR: A Convolution and Fusion Model for Cross-platform Recommendation | |||||
5 minutes break for transition from Zoom to Gather.Town | |||||
16:50 - 17:50 p.m. | 18:50 - 19:50 p.m. | 19:50 - 20:50 p.m. | Workshop: Visual Tasks and Challenges under Low-quality Multimedia Data | Local-enhanced Multi-resolution Representation Learning for Vehicle Re-identification | |
Dedark+Detection: A Hybrid Scheme for Object Detection under Low-light Surveillance | |||||
Making Video Recognition Models Robust to Common Corruptions With Supervised Contrastive Learning | |||||
Visible-Infrared Cross-Modal Person Re-identification based on Positive Feedback | |||||
17:50 p.m. | 19:50 p.m. | 20:50 p.m. | Closing | - |
*Short Paper List - Part 1
- CMRD-Net: An Improved Method for Underwater Image Enhancement
- Deep Multiple Length Hashing via Multi-task Learning
- Color Image Denoising via Tensor Robust PCA with Nonconvex and Nonlocal Regularization
- Conditioned Image Retrieval for Fashion using Contrastive Learning and CLIP-based Features
- PBNet: Position-specific Text-to-image Generation by Boundary
- An Embarrassingly Simple Approach to Discrete Supervised Hashing
*Short Paper List - Part 2
- Towards Transferable 3D Adversarial Attack
- Delay-sensitive and Priority-aware Transmission Control for Real-time Multimedia Communications
- Impression of a Job Interview training agent that gives rationalized feedback: Should Virtual Agent Give Advice with Rationale?
- A Coarse-to-fine Approach for Fast Super-Resolution with Flexible Magnification
- Automatically Generate Rigged Character from Single Image
- Flat and Shallow: Understanding Fake Image Detection Models by Architecture Profiling
- Multi-branch Semantic Learning Network for Text-to-Image Synthesis
- Attention-based Dual-Branches Localization Network for Weakly Supervised Object Localization
- Pose-aware Outfit Transfer between Unpaired in-the-wild Fashion Images
- Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation
- Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages
- PLM-IPE: A Pixel-Landmark Mutual Enhanced Framework for Implicit Preference Estimation
- Adaptive Viewport Margins Using Head Motion for Improving User Experience in Immersive Video
- Chinese White Dolphin Detection in the Wild
- BAND: A Benchmark Dataset for Bangla News Audio Classification
- A comparison study: the impact of age and gender distribution on age estimation
- Spherical Image Compression Using Spherical Wavelet Transform
- FQM-GC: Full-reference Quality Metric for Colored Point Cloud Based on Graph Signal Features and Color Features
- Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification
- NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels