Multimedia Grand Challenges

 

Multimedia grand challenge papers will go through a single-blind review process. Submitted papers (.pdf format) must use the ACM Article Template https://www.acm.org/publications/proceedings-template, as used by regular ACMMM submissions. Please limit each grand challenge paper submission to 4 pages, with 1-2 extra pages for references only. Papers will be handled via OpenReview (see the submission link below).

 

Submission: https://openreview.net/group?id=acmmm.org/ACMMM/2022/Track/Grand_Challenges

Deadline: June 25, 2022, 11:59 p.m. AoE

 

The purpose of the Multimedia Grand Challenge is to engage the multimedia research community by establishing well-defined and objectively judged challenge problems intended to exercise state-of-the-art methods and inspire future research directions. The key criteria for Grand Challenges are that they should be useful and interesting, and that their solution should involve a series of research tasks over a long period of time, with pointers towards longer-term research.

 

The Multimedia Grand Challenge proposals accepted for the ACM Multimedia 2022 edition are the following:

Social Media Prediction (SMP) Challenge

Bo Wu, Wen-Huang Cheng, Bei Liu, Jiebo Luo, Jia Wang, Zhaoyang Zeng, Peiye Liu, Qiushi Huang

https://smp-challenge.com

 

The SMP Challenge seeks novel methods for forecasting problems that can meaningfully improve people’s social lives and business scenarios. While the enormous amount of online content leads to overconsumption, online word-of-mouth helps us efficiently discover interesting news, emerging topics, and amazing products in an ocean of information. We therefore formulated the Social Media Popularity Prediction task, which asks systems to predict the online popularity of posts on social media platforms, and provide the SMPD benchmark with about half a million posts.
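
To make the shape of the task concrete, below is a minimal, hypothetical baseline sketch in Python: popularity prediction framed as regression over simple post features. The feature names, toy data, and model choice are illustrative assumptions, not the official SMPD schema or evaluation kit.

```python
# Minimal, hypothetical baseline for Social Media Popularity
# Prediction: regress a post's popularity score from a handful of
# numeric features. Features, labels, and model are illustrative
# assumptions, not the official SMPD benchmark or evaluation kit.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy stand-ins for SMPD-style features: log follower count,
# mean popularity of past posts, posting hour, caption length.
X = rng.normal(size=(1000, 4))
# Toy popularity labels (e.g., log-scaled engagement counts).
y = X @ np.array([0.8, 1.2, 0.1, 0.3]) + rng.normal(scale=0.2, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```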

Short Video Streaming Challenge

Yong Cui, Wei Tsang Ooi, Jiangchuan Liu, Kai Zheng, Junchen Jiang, Xinggong Zhang

https://www.aitrans.online/MMGC2022/

 

Short videos are a new form of user-generated video shared on online social platforms. However, short-video companies spend heavily on bandwidth, so saving bandwidth overhead without reducing users’ quality of experience (QoE) has become an important issue. The key to reducing bandwidth waste is to match the video download mechanism to user viewing behaviour and network conditions.
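
As a toy illustration of this trade-off (a sketch under assumed inputs, not the challenge’s simulator or API), a scheduler might spend its download budget on the queued videos the user is most likely to actually watch:

```python
# Toy prefetch scheduler for short-video streaming: allocate the
# available download budget across queued videos according to the
# estimated probability the user will watch each one. All numbers
# and the watch-probability inputs are illustrative assumptions;
# the real challenge provides its own simulator and traces.

def plan_prefetch(queue, bandwidth_budget_kb):
    """queue: list of (video_id, watch_probability, chunk_size_kb)."""
    # Prefer videos the user is most likely to watch next.
    ranked = sorted(queue, key=lambda v: v[1], reverse=True)
    plan, remaining = [], bandwidth_budget_kb
    for video_id, p_watch, chunk_kb in ranked:
        # Skip videos unlikely to be watched: downloading them
        # would waste bandwidth without improving QoE.
        if p_watch < 0.2 or chunk_kb > remaining:
            continue
        plan.append(video_id)
        remaining -= chunk_kb
    return plan

queue = [("v1", 0.9, 800), ("v2", 0.5, 600), ("v3", 0.1, 700)]
print(plan_prefetch(queue, bandwidth_budget_kb=1500))  # ['v1', 'v2']
```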

 

Deep Video Understanding Challenge

George Awad, Keith Curtis, Shahzad Rajput, Ian Soboroff

https://sites.google.com/view/dvuchallenge2022/

 

The DVU Challenge seeks new techniques and approaches that enable an automatic system to understand and comprehend a full movie in terms of entities (characters, locations, concepts, relationships, interactions, sentiments, etc.). The challenge provides development data in the form of whole movies annotated at the movie and scene levels, while systems are tested on new, unseen movies licensed from a professional movie distribution platform (KinoLorberEdu). Movie-level and scene-level queries include sentiment classification, matching text summaries with the correct scenes, finding the next/previous interaction between specific characters, finding unique scenes, and filling in missing parts of a graph describing how entities are related. Systems may choose to participate in movie-level queries, scene-level queries, or both.

Computational Paralinguistics ChallengE

Björn W. Schuller, Anton Batliner, Christian Bergler, Shahin Amiriparian

http://www.compare.openaudio.eu/

 

The Computational Paralinguistics ChallengE (ComParE) series is an open Challenge in the field of computational paralinguistics, dealing with states and traits of individuals as manifested in their speech and further signal properties. The Challenge has taken place annually since 2009. Every year we introduce new tasks, as there remains a multiplicity of highly relevant paralinguistic phenomena not yet covered. At the same time, new baseline methods and challenge types are introduced.

Detecting CheapFakes

Shivangi Aneja, Cise Midoglu, Duc-Tien Dang-Nguyen, Sohail Ahmed Khan, Michael Riegler, Pal Halvorsen, Chris Bregler, Balu Adsumilli

https://detecting-cheapfakes.github.io/

 

Cheapfake is a recently coined term that encompasses non-AI (“cheap”) manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulation, or even without any software, by simply altering the context of an image/video, i.e., sharing the media alongside misleading claims. This alteration of context is referred to as out-of-context (OOC) misuse of media. OOC media is much harder to detect than fake media, since the images and videos are not tampered with. In this challenge, we focus on detecting OOC images, and more specifically the misuse of real photographs with conflicting image captions in news items. The aim of this challenge is to develop and benchmark models that can detect whether given samples (a news image and associated captions) are OOC, based on the recently compiled COSMOS dataset.
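
To illustrate only the task interface (a hypothetical sketch, not the COSMOS baseline), a detector receives one image with two captions and outputs an OOC decision; the caption-overlap heuristic below is a crude stand-in for real image-text grounding:

```python
# Hypothetical sketch of the cheapfake/OOC task interface: given one
# image and two captions, predict whether the pair is out-of-context.
# The caption-overlap heuristic is only a toy stand-in for real
# image-text grounding of the kind the COSMOS baseline performs.

def caption_overlap(c1: str, c2: str) -> float:
    """Jaccard word overlap between two captions (toy similarity)."""
    w1, w2 = set(c1.lower().split()), set(c2.lower().split())
    return len(w1 & w2) / max(len(w1 | w2), 1)

def predict_ooc(image_path: str, caption1: str, caption2: str) -> bool:
    """Flag OOC when two captions for the same image disagree strongly.

    A real system would check whether each caption is grounded in the
    image content; here we only compare the captions to each other.
    """
    del image_path  # unused by this toy heuristic
    return caption_overlap(caption1, caption2) < 0.25

print(predict_ooc("photo.jpg",
                  "Flood damage in coastal town after 2020 storm",
                  "Protesters gather outside parliament building"))
```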

Facial Micro-Expression Grand Challenge

Jingting Li, Moi Hoon Yap, Wen-Huang Cheng, John See, Xiaopeng Hong, Xiaobai Li, Su-Jing Wang

https://megc2022.github.io

 

Facial micro-expressions (MEs) are involuntary facial movements that occur spontaneously when a person experiences an emotion but attempts to suppress or repress the facial expression, typically in a high-stakes environment. Unfortunately, the small-sample problem severely limits the automation of ME analysis. Furthermore, due to the brief and subtle nature of MEs, ME spotting is a challenging task and its performance is still not satisfactory. To address these issues, this challenge focuses on two tasks: the ME generation task and the ME and macro-expression spotting task.

GigaTracking: When Multi-Object Tracking Meets Gigapixel Videography

Lu Fang, David Brady, Mengqi Ji, Feng Yang

http://www.gigavision.cn/ACMMM2022_main.html

 

In the past decades, digital cameras have witnessed the emergence of smart imaging solutions driven by AI and computer vision algorithms. While the development of deep learning has radically improved the capacity of computational imaging, how high-performance imaging can inspire and promote more capable computer vision solutions has not been investigated extensively. When it comes to gigapixel-level images and videos, state-of-the-art tracking algorithms perform unsatisfactorily due to characteristics such as a wide field of view (FoV), high resolution, and intricate object interactions. Our GigaTracking challenge therefore aims to find new solutions for accurate multi-object tracking (MOT) in gigapixel-level videos. In particular, participants are encouraged to investigate questions including, but not limited to, novel representations, network architectures, and paradigms, so as to develop advanced solutions that benefit the community.

MultiMediate challenge

Philipp Müller, Dominik Schiller, Dominike Thomas, Michael Dietz, Hali Lindsay, Patrick Gebhard, Elisabeth André, Andreas Bulling

https://multimediate-challenge.org

 

Artificial mediators are a promising approach to support group conversations, but at present, their abilities are limited by insufficient progress in group behaviour sensing and analysis. The MultiMediate challenge is designed to work towards the vision of effective artificial mediators by facilitating and measuring progress on key group behaviour sensing and analysis tasks. This year, the challenge focuses on backchannel detection and agreement estimation from backchannels, but also continues last year’s tasks of eye contact detection and next speaker prediction.

Conversational Head Generation

Yalong Bai, Mohan Zhou, Tong Shen, Wei Zhang, Ting Yao, Xiaodong He, Tao Mei

https://vico-challenge.github.io/

 

Conversational head generation aims to synthesize head dynamics during a conversation, covering both the talking and listening roles. This task is critical for applications such as telepresence, digital humans, virtual agents, and social robots. Current talking-head generation covers only one-way information flow, still miles away from the full sense of “communication”. This challenge is based on the ViCo dataset, the first video corpus featuring conversations between real humans. Our challenge includes two tracks: talking-head video generation (speaker) and responsive listening-head video generation (listener).

Pre-training for Video Understanding Challenge

Yingwei Pan, Zhaofan Qiu, Yehao Li, Fuchen Long, Ting Yao, Tao Mei

http://auto-video-captions.top/2022/

 

The goal of this challenge is to offer fertile ground for designing pre-training techniques that facilitate a series of downstream video understanding tasks (e.g., video captioning and video categorization this year). Hence, this grand challenge involves two tracks: Pre-training for Video Captioning and Pre-training for Video Categorization. Meanwhile, to further motivate and challenge the multimedia community, we provide two large-scale video pre-training datasets, i.e., Auto-captions on GIF (ACTION) and the Weakly-Supervised dataset, for contestants to use in tackling the challenging but emerging task in each track.

Important Dates

All submission deadlines are at 11:59 p.m. Anywhere on Earth (AoE) on the stated date.

Submission of Grand Challenge Solutions:

25 June, 2022

Grand Challenge Paper Notification of Acceptance:

16 July, 2022 (updated from 13 July, 2022)

Grand Challenge Paper Camera Ready:

23 July, 2022 (updated from 20 July, 2022)

Contacts

For questions regarding the Grand Challenges, please email the Multimedia Grand Challenge Chairs at <mm22-grand-challenge@sigmm.org>:

 

Miriam Redi

Georges Quénot