Nguyen Le Thanh, Hyunhwan “Aiden” Lee, Joseph Johnson, Mitsunori Ogihara, Gang Ren, and James W. Beauchamp (2019). “A new auditory image for social media: Moving towards correlation of spectrographic analysis and interpretation with audience perception.”
The Journal of the Acoustical Society of America, 146(4), p. 2846.
Spectrograms and other time-frequency analysis methods transform an audio file into an auditory image. When signal processing-based analysis and interpretation are performed on these auditory images instead of on the audio signal, spectrographic analyses can identify interesting patterns that focus on very different aspects of the signal than an audio-based analysis does. To facilitate such an auditory image-based study, a quantitative analysis and interpretation framework is implemented for exploring spectrographic images at multiple time and frequency scales and for automatically identifying image features that are relevant to human auditory perception. This analysis framework is applied to two social media datasets: (1) soundtracks from video commercials and “hit” music excerpts from social media platforms, and (2) soundtracks from television and film. Analysis results from social media are also compared with audience subjective evaluations to validate the perceptual relevance of the identified spectrographic patterns.
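The multi-scale auditory image described above can be sketched in a few lines: computing the same signal's magnitude spectrogram at several window lengths trades time resolution against frequency resolution. This is a minimal illustration, not the authors' implementation; the window lengths and test tone are arbitrary choices.

```python
import numpy as np

def stft_magnitude(x, win_len, hop):
    """Magnitude spectrogram of a 1-D signal x with a Hann window."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    # One row per frame; columns are frequency bins up to Nyquist.
    return np.abs(np.fft.rfft(frames, axis=1))

# Multi-scale "auditory images": short windows resolve time,
# long windows resolve frequency.
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t)  # 1 s, 440 Hz test tone
images = {n: stft_magnitude(signal, n, n // 2) for n in (256, 1024)}
```

Each image in `images` is one time-frequency rendering of the same audio; pattern analysis can then operate on these 2-D arrays rather than on the waveform.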
Modern human-computer interaction systems use multiple perceptual dimensions to enhance the user’s intuition and efficiency by improving situational awareness. A signal processing and interaction framework is proposed for auralizing signal patterns to augment the visualization-focused tasks of social media content analysis and annotation, with the goal of assisting the user in analyzing, retrieving, and organizing relevant information for marketing research. In this auralization framework, audio signals are generated from video/audio signal patterns, for example, by frequency-modulating an audio tone to follow the magnitude contours of video color saturation. The integration of visual and aural presentations benefits user interaction by reducing fatigue and sharpening the users’ sensitivity, thereby improving work efficiency, confidence, and satisfaction.
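The frequency-modulation example in the abstract can be sketched as follows: a per-frame color-saturation contour drives the instantaneous frequency of a tone, so rising saturation is heard as rising pitch. The base frequency, modulation depth, and frame rate below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def auralize_saturation(saturation, fs=16000, dur_per_frame=0.05,
                        f_base=220.0, f_depth=440.0):
    """Map a per-frame color-saturation contour (values in [0, 1]) to a
    frequency-modulated tone: higher saturation -> higher pitch."""
    n = int(round(len(saturation) * dur_per_frame * fs))
    # Upsample the frame-rate contour to audio rate by linear interpolation.
    contour = np.interp(np.linspace(0, len(saturation) - 1, n),
                        np.arange(len(saturation)), saturation)
    inst_freq = f_base + f_depth * contour         # instantaneous frequency (Hz)
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs  # integrate frequency -> phase
    return np.sin(phase)

# Hypothetical saturation contour for a 1-second clip at 20 video frames/s.
tone = auralize_saturation(np.linspace(0.0, 1.0, 20))
```

Integrating the frequency contour into a phase (rather than evaluating `sin(2*pi*f*t)` per sample) keeps the waveform continuous as the frequency changes, which avoids audible clicks.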
The rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exist which analyze user features and browsing patterns to recommend appealing advertisements to users. In this work, we study the attributes that characterize an effective advertisement and recommend a useful set of features to aid the design and production of commercial advertisements. We analyze the temporal patterns in the multimedia content of advertisement videos, including the auditory, visual, and textual components, and study their individual roles and synergies in the success of an advertisement. The objective is to measure the effectiveness of an advertisement and to recommend to advertisement designers a set of features that make it more successful and approachable to users. Our proposed framework employs cross-modality feature learning, where data streams from the different components are used to train separate neural network models that are then fused to learn a shared representation. A neural network model trained on this joint feature embedding is then used as a classifier to predict advertisement effectiveness. We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric, the ratio of Likes to Views received by each advertisement on an online platform.
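The fusion architecture described above, per-modality encoders whose embeddings are concatenated into a joint representation and fed to a classifier, can be sketched with plain NumPy. This is a structural sketch only: the encoders are one-layer stand-ins with random weights, the feature dimensions are invented for illustration, and no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(in_dim, emb_dim):
    """A one-layer stand-in for a per-modality neural network encoder."""
    W = rng.standard_normal((in_dim, emb_dim)) * 0.1
    return lambda x: np.tanh(x @ W)

# Separate encoders for each modality (dimensions are illustrative).
audio_enc = make_encoder(40, 16)   # e.g. 40 audio features
video_enc = make_encoder(64, 16)   # e.g. 64 visual features
text_enc  = make_encoder(30, 16)   # e.g. 30 textual features

def predict_effectiveness(audio, video, text, w, b):
    """Fuse modality embeddings into a shared representation, then classify."""
    joint = np.concatenate([audio_enc(audio), video_enc(video), text_enc(text)])
    return 1.0 / (1.0 + np.exp(-(joint @ w + b)))  # sigmoid -> probability

w = rng.standard_normal(48) * 0.1  # classifier weights over the 3x16-dim joint embedding
p = predict_effectiveness(rng.standard_normal(40),
                          rng.standard_normal(64),
                          rng.standard_normal(30), w, 0.0)
```

In a trained system the encoders and classifier would be learned jointly, but the data flow, encode each modality, concatenate, classify, is the same.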
Temporal contour shapes are closely linked to the narrative structure of multimedia content and provide important reference points in content-based multimedia timeline analysis. In this paper, a multimedia timeline is extracted from content as time-varying video and audio signal features. A temporal contour representation based on a sequential pattern discovery algorithm is implemented for modeling the variation contours of multimedia features. The proposed representation extracts repetitive temporal patterns across a hierarchy of time resolutions and across synchronized video/audio feature dimensions. The statistically significant contour components, which depict the dominant timeline shapes, serve as a structural and analytical representation of the timeline. The performance of the proposed temporal modeling framework is demonstrated through empirical validation and subjective evaluations.
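One simple way to make contour-based pattern discovery concrete, not the paper's algorithm, just a minimal sketch, is to quantize a time-varying feature into up/down/flat symbols and count the repeated fixed-length subsequences; frequently recurring subsequences correspond to dominant contour shapes.

```python
from collections import Counter
import numpy as np

def contour_symbols(feature, eps=1e-3):
    """Quantize a time-varying feature into up (+1), down (-1), flat (0) moves."""
    d = np.diff(feature)
    return tuple(np.where(d > eps, 1, np.where(d < -eps, -1, 0)))

def repeated_patterns(symbols, length=3, min_count=2):
    """Count repeated fixed-length symbol subsequences (a toy stand-in
    for sequential pattern discovery)."""
    counts = Counter(symbols[i:i + length]
                     for i in range(len(symbols) - length + 1))
    return {p: c for p, c in counts.items() if c >= min_count}

# A toy feature contour with a recurring rise-rise-fall shape.
curve = np.array([0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 1], float)
patterns = repeated_patterns(contour_symbols(curve))
```

Running the symbolization at several smoothing or downsampling levels of the same feature would give the hierarchy of time resolutions mentioned in the abstract.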