Parametric Analysis and Modeling of Video Signals
Supervisor(s) and Committee member(s): Shahrokh Ghaemmaghami (supervisor), Shervin Shirmohammadi (co-supervisor)
Video modeling and analysis have long been of great interest to the video research community, owing to their essential contribution to improvements across a wide range of video processing techniques. Parametric modeling and analysis of video provide appropriate means for processing the signal and for mining the information needed for its efficient representation. Video comparison, human action recognition, video retrieval, video abstraction, video transmission, and video clustering are some of the applications that can benefit from video modeling and analysis.
In this thesis, the parametric analysis and modeling of the video signal are studied through two schemes. In the first scheme, spatial parameters are extracted from video frames and the temporal evolution of these parameters is investigated. The spatial parameters are selected based on the statistics of the 2D wavelet transform of the video frames, since the wavelet transform provides a sparse representation of the signal and structurally conforms to the frequency sensitivity distribution of the human visual system. To analyze the temporal relations and progression of these spatial parameters, three methods are considered: inter-frame distance measurement, temporal decomposition, and autoregressive (AR) modeling. In the first method, the temporal evolution of the spatial features is studied using the Kullback–Leibler (KL) distance between spatial parameters as the similarity measure. This analysis is used to determine shot boundaries, segment shots into clusters, and select keyframes based on both similarity and dissimilarity criteria, applied within and outside the corresponding cluster, respectively. In the second method, the video signal is assumed to be a sequence of overlapping independent visual components, called events: temporally overlapping compact functions that describe the temporal evolution of a given set of spatial parameters of the video signal. This event-based temporal decomposition technique is used for video abstraction, where no shot boundary detection or clustering is required. In the third method, the video signal is treated as a collection of spatial-feature time series that are temporally approximated by an AR model, which describes each spatial feature vector as a linear combination of the previous vectors within a reasonable time interval. Shot boundaries are detected reliably from the AR prediction errors, and at least one keyframe is then extracted from each shot.
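The third method above can be illustrated with a minimal sketch: each feature vector is approximated as a linear combination of its predecessors, and a large prediction error signals a shot boundary. This is not the thesis's exact formulation; the per-frame least-squares fit and the mean-plus-k-sigma threshold are illustrative assumptions.

```python
import numpy as np

def ar_prediction_errors(features, order=3):
    """For each frame, fit coefficients so the current feature vector is a
    linear combination of the previous `order` vectors (least squares),
    and return the norm of the residual as the prediction error."""
    T = features.shape[0]
    errors = np.zeros(T)
    for t in range(order, T):
        X = features[t - order:t]            # (order, d): previous vectors
        y = features[t]                      # (d,): current vector
        # Least-squares AR coefficients a such that y ~= a @ X.
        a, *_ = np.linalg.lstsq(X.T, y, rcond=None)
        errors[t] = np.linalg.norm(y - a @ X)
    return errors

def detect_shot_boundaries(errors, k=3.0):
    """Flag frames whose prediction error exceeds mean + k * std
    (an illustrative threshold rule, not the thesis's criterion)."""
    mu, sigma = errors.mean(), errors.std()
    return np.where(errors > mu + k * sigma)[0]
```

On a synthetic feature sequence where the dominant feature direction changes abruptly between two "shots", the residual spikes at the transition frame, which is the behavior the AR-based detector exploits.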
To evaluate these models, subjective and objective tests are conducted on the TRECVID and Hollywood2 datasets; the simulation results indicate the high accuracy and effectiveness of these techniques.
In the second scheme, video spatio-temporal parameters are extracted from the 3D wavelet transform of the natural video signal, based on an analysis of the statistical characteristics of this transform. Joint and marginal statistics are studied, and the extracted parameters are applied to human action recognition and video activity level detection. Subjective and objective test results on the popular Hollywood2 and KTH datasets confirm the high efficiency of this analysis method compared to current techniques.
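A rough sketch of the second scheme's feature extraction: decompose a frame volume with a 3D wavelet transform and compute marginal statistics of the detail subbands. The one-level separable Haar transform and the choice of variance and kurtosis as statistics are simplifying assumptions for illustration, not the thesis's pipeline (a full implementation would use a deeper decomposition, e.g. via PyWavelets' `wavedecn`).

```python
import numpy as np

def haar_pairs(x, axis):
    """Orthonormal Haar split along one axis: pairwise sums (low-pass)
    and differences (high-pass), each scaled by 1/sqrt(2)."""
    x = np.moveaxis(x, axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def haar3d(vol):
    """One-level separable 3D Haar transform of a (t, y, x) volume,
    returning the 8 subbands keyed by 'L'/'H' per axis (e.g. 'LLH')."""
    bands = {'': vol}
    for axis in range(3):
        nxt = {}
        for key, b in bands.items():
            lo, hi = haar_pairs(b, axis)
            nxt[key + 'L'] = lo
            nxt[key + 'H'] = hi
        bands = nxt
    return bands

def subband_stats(bands):
    """Marginal statistics (variance, kurtosis) of each detail subband,
    concatenated into one spatio-temporal feature vector."""
    feats = []
    for key in sorted(bands):
        if key == 'LLL':
            continue  # skip the approximation band
        c = bands[key].ravel()
        var = c.var()
        kurt = ((c - c.mean()) ** 4).mean() / (var ** 2 + 1e-12)
        feats.extend([var, kurt])
    return np.array(feats)
```

Because the Haar pair is orthonormal, the transform preserves signal energy across the 8 subbands, and the 7 detail bands yield a 14-dimensional feature vector in this sketch.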
While the thesis is written in Farsi, the following English-language papers cover its main technical contributions:
M. Omidyeganeh, S. Ghaemmaghami, and S. Shirmohammadi, “Video Keyframe Analysis Using a Segment-Based Statistical Metric in a Visually Sensitive Parametric Space,” IEEE Transactions on Image Processing, vol. 20, no. 10, pp. 2730–2737, 2011.
M. Omidyeganeh, S. Ghaemmaghami, and S. Shirmohammadi, “Group Based Spatio-Temporal Video Analysis and Abstraction Using Wavelet Parameters,” Signal, Image and Video Processing, Springer, Accepted 2011, to appear.
M. Omidyeganeh, S. Ghaemmaghami, and S. Shirmohammadi, “Application of 3D-Wavelet Statistics to Video Analysis,” Multimedia Tools and Applications, Springer, Accepted 2012, to appear.
Distributed and Collaborative Virtual Environment Research Lab (DISCOVER Lab), University of Ottawa, Canada
Research at the DISCOVER Lab is directed towards enhancing next-generation human-to-human communication through advanced multimedia technology and virtual environments. Through our many projects, we are developing new ideas and technologies that will make easy-to-use virtual environments and mobile computing a reality. Research projects at the DISCOVER Lab typically fall into the following categories:
- Networked Games and Collaborative Virtual Environments
- Mobile, 3D, and Multiview Video
- Haptics and Teleoperation
- Multimedia Systems and Applications
- 3D Physical Modelling and Animation
- Intelligent Sensor Networks and Ubiquitous Computing
- Multimedia-Assisted Rehabilitation Engineering