Generating Stories From Large Scale Image Collections
Abstract
Making sense of the ever-growing amount of visual data available on the web is one of the biggest challenges we face today. As a step towards this goal, this study tackles a relatively understudied problem, namely generating structured summaries of large photo collections in a purely unsupervised manner. Our methodology relies on the notion of a story graph, which captures the main narratives in the data and their complex relationships by means of a directed graph with a set of (possibly intersecting) paths. Our proposed method identifies coherent visual story lines and exploits submodularity to select the subset of these lines with maximum coverage. Various experiments and user studies demonstrate that the approach outperforms previous methods.
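To make the selection step concrete, the following is a minimal sketch of greedy maximization of a monotone submodular coverage objective, the standard device such submodularity-based selection exploits; the data layout and function names are illustrative assumptions, not the thesis's actual implementation.

```python
import itertools

def coverage(selected_lines):
    """Coverage objective: number of distinct images touched by the
    selected story lines. This set-cover-style function is monotone and
    submodular, so greedy selection enjoys the classic (1 - 1/e)
    approximation guarantee."""
    return len(set(itertools.chain.from_iterable(selected_lines)))

def greedy_select(story_lines, k):
    """Greedily pick up to k story lines, each step adding the line
    with the largest marginal gain in coverage."""
    selected = []
    remaining = list(story_lines)
    for _ in range(min(k, len(remaining))):
        best = max(remaining,
                   key=lambda line: coverage(selected + [line]) - coverage(selected))
        if coverage(selected + [best]) == coverage(selected):
            break  # no remaining line adds new images; stop early
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: story lines as sequences of (hypothetical) image ids.
lines = [("a", "b", "c"), ("b", "c", "d"), ("e", "f"), ("a", "e")]
print(greedy_select(lines, 2))  # picks two lines that jointly cover the most images
```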
Furthermore, this study explores the role of visual attention and image semantics in understanding image memorability. In particular, we present an attention-driven spatial pooling strategy and show that drawing image features from the salient parts of images improves upon previous models. We also investigate the semantic properties of images through an analysis of a diverse set of semantic features encoding meta-level object categories, scene attributes, and invoked feelings. We show that these features, which are extracted from images automatically, provide memorability predictions nearly as accurate as those derived from human annotations.
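As a hedged illustration of attention-driven spatial pooling, the sketch below weights local descriptors by a precomputed saliency map instead of pooling uniformly; the pooling scheme and names are assumptions for exposition, not the thesis's exact formulation.

```python
import numpy as np

def attention_weighted_pool(features, saliency):
    """Pool a spatial feature map using a saliency map as attention weights.

    features : (H, W, D) array of local descriptors
    saliency : (H, W) non-negative attention map

    Each location contributes in proportion to its normalized saliency,
    so features from the salient parts of the image dominate the pooled
    descriptor.
    """
    weights = saliency / (saliency.sum() + 1e-8)  # normalize to a distribution
    return np.tensordot(weights, features, axes=([0, 1], [0, 1]))  # (D,)

# Toy usage: an 8x8 grid of 16-dim local features with a random saliency map.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 16))
sal = rng.uniform(size=(8, 8))
print(attention_weighted_pool(feats, sal).shape)  # (16,)
```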
Finally, we incorporate memorability, together with aesthetics, into the story graph generation framework and explore the effects of these intrinsic image properties on the resulting story graphs. Experiments utilizing these memorable and aesthetic story graphs as a prior knowledge base show further improvements.
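One way such intrinsic properties could be folded into the selection step, shown purely as an illustrative assumption rather than the thesis's method, is a linear mixture of per-line memorability and aesthetics scores; the weighting and names below are hypothetical.

```python
def intrinsic_score(line, memorability, aesthetics, alpha=0.5):
    """Score a story line by averaging per-image memorability and
    aesthetics predictions, mixed by a hypothetical weight alpha."""
    mem = sum(memorability[img] for img in line) / len(line)
    aes = sum(aesthetics[img] for img in line) / len(line)
    return alpha * mem + (1 - alpha) * aes

# Toy usage: dictionaries mapping image ids to predicted scores.
memorability = {"a": 0.8, "b": 0.6, "c": 0.7}
aesthetics = {"a": 0.5, "b": 0.9, "c": 0.4}
print(intrinsic_score(("a", "b", "c"), memorability, aesthetics))
```

A score like this could, for instance, be added to the coverage objective in the earlier sketch so that the greedy selection prefers story lines that are both broadly covering and intrinsically memorable and aesthetic.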