Visual-audio correspondence and its effect on video tipping: Evidence from Bilibili vlogs

Posted: 2023-04-25

Authors: Li, B (Li, Bu) [1]; Zhao, JC (Zhao, Jichang) [1]

Volume 60, Issue 3

Article number: 103347

DOI: 10.1016/j.ipm.2023.103347

Published: MAY 2023

Published online: MAR 2023

Indexed: 2023-04-20

Document type: Article

Abstract

Video tipping takes a remarkable share in the income of online streaming platforms such as Bilibili. There are some specific mappings between the audio and visual signals that viewers can sense (e.g., congruency of pitch and size), which is generally called visual-audio correspondence (VAC). And it is believed to influence viewer satisfaction with video clips. The way to automatically measure VAC, however, still remains missing and its possible effect on video tipping is rarely examined in previous efforts. In this study, a deep neural network with two sub-networks, namely VAC-Net, is established to map both visual and audio stimuli into a shared embedding space. And the Euclidean distance between visual and audio representations in this space is accordingly presented to be the indicator of VAC. Pre-trained models of both modalities and the triplet loss are further leveraged to train the VAC-Net and it competently evaluates VAC of video clips with a test accuracy of 68.37% by outperforming alternative baselines and even exceeding humans on the similar task. Lab-experiments further show that the VAC measurement of VAC-Net conforms to human cognition. Second, considering that viewers' tipping behavior (TIP) on videos is consistent with the pricing strategy Pay What You Want (PWYW), it is hypothesized that VAC would indirectly influence TIP by reshaping viewer satisfaction (VS). Regression models are thus built to test the hypotheses and it is found that VAC can promote TIP by enhancing VS significantly. Additional tests also demonstrate the robustness of this mechanism by considering various controls and measurement errors. Our results supplement PWYW in streaming videos with a new motive of VAC for viewer tipping and provide streaming practitioners with an automatic tool to estimate the tips videos will receive.
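The two quantities the abstract names, the Euclidean distance between visual and audio embeddings and the triplet loss used in training, can be illustrated with a minimal sketch. It is not the authors' implementation. The 3-dimensional embeddings, the margin of 1.0, the choice of the visual embedding as the anchor, and the function names are all assumptions made for illustration; the abstract specifies none of them.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The margin value is an assumption; the abstract does not report one.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical embeddings in a shared 3-dimensional space (made-up values).
visual = [0.0, 0.0, 0.0]            # visual embedding, used as the anchor
matched_audio = [3.0, 4.0, 0.0]     # audio from the same clip
mismatched_audio = [6.0, 8.0, 0.0]  # audio from a different clip

# Distance used as the VAC indicator for the matched pair.
vac_distance = euclidean_distance(visual, matched_audio)

# The matched audio is closer than the mismatched audio by more than the
# margin, so this triplet contributes no loss.
loss = triplet_loss(visual, matched_audio, mismatched_audio)

# Swapping the two audio embeddings violates the margin and yields a positive loss.
violated_loss = triplet_loss(visual, mismatched_audio, matched_audio)
```

Under this kind of training objective, a smaller distance would plausibly correspond to stronger visual-audio correspondence, although the abstract does not state the direction of the indicator explicitly.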

Keywords

Author keywords: Streaming videos, Visual-audio correspondence, Deep neural network, Pay What You Want, Consumer satisfaction, Tipping behavior

Corresponding author address

Zhao, Jichang

(corresponding author)

Beihang Univ, Sch Econ & Management, Beijing, Peoples R China

Addresses

1 Beihang Univ, Sch Econ & Management, Beijing, Peoples R China

Email address: jichang@buaa.edu.cn

Link to original article:

https://linkinghub.elsevier.com/retrieve/pii/S0306457323000845