A unified computational framework for visual attention dynamics.


University of Florence, Florence, Italy; University of Siena, Siena, Italy. Electronic address: [Email]


Eye movements are an essential part of human vision, as they drive the fovea and, consequently, selective visual attention toward a region of interest in space. Free visual exploration is an inherently stochastic process that depends not only on image statistics but also on individual variability in cognitive and attentional state. We propose a theory of free visual exploration formulated entirely within the framework of physics and based on the general Principle of Least Action. Within this framework, differential laws describing eye movements emerge in accordance with bottom-up functional principles. In addition, we integrate top-down semantic information captured by deep convolutional neural networks pre-trained to classify common objects. To stress-test the model, we used a wide collection of images containing both basic low-level features and high-level semantic content. Results on a saliency-prediction task validate the theory.
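The abstract does not state the Lagrangian used in the paper, so the following is only an illustrative sketch of how a differential law for gaze can follow from a least-action principle. It assumes a damped (Caldirola–Kanai) Lagrangian L = e^{γt}(½ m |ẋ|² − V(x)) for a focus-of-attention point x, with a hypothetical potential V = −S, where S is a saliency-like attraction map. The Euler–Lagrange equation then gives m ẍ + γ m ẋ = ∇S(x), so the focus is pulled uphill on S and slowed by friction. The names `attraction_map`, `sample_grad`, and `simulate_focus`, the single-Gaussian map, and all parameter values are assumptions made for illustration. None of them are the authors' model, which also includes top-down semantic terms not reproduced here.

```python
import numpy as np


def attraction_map(h, w, center, sigma):
    """Hypothetical saliency-like map S: a single Gaussian blob (not the paper's potential)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def sample_grad(gy, gx, pos):
    """Nearest-pixel lookup of the gradient field (d/drow, d/dcol) at a continuous position."""
    h, w = gy.shape
    r = int(np.clip(round(pos[0]), 0, h - 1))
    c = int(np.clip(round(pos[1]), 0, w - 1))
    return np.array([gy[r, c], gx[r, c]])


def simulate_focus(S, x0, mass=1.0, gamma=0.2, dt=0.1, steps=3000):
    """Integrate m x'' + gamma m x' = grad S(x), the Euler-Lagrange equation of the
    assumed damped Lagrangian with V = -S. Returns the (steps + 1, 2) trajectory in (row, col)."""
    gy, gx = np.gradient(S)  # central differences along rows (y) and columns (x)
    x = np.asarray(x0, dtype=float)
    v = np.zeros(2)
    path = [x.copy()]
    for _ in range(steps):
        force = sample_grad(gy, gx, x)  # -grad V = grad S
        a = force / mass - gamma * v    # acceleration including the friction term
        v = v + dt * a                  # semi-implicit (symplectic) Euler update
        x = x + dt * v
        path.append(x.copy())
    return np.array(path)
```

For example, `simulate_focus(attraction_map(64, 96, (32, 60), 10.0), (32.0, 45.0))` starts the focus 15 pixels to the left of the blob and lets it settle near the blob's peak. In this toy setting, γ is chosen close to critical damping for the blob's curvature, so the focus approaches the peak without large oscillations.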


Convolutional neural networks, Principle of least action, Saliency, Scanpath, Visual attention, Visual features