Simultaneous perception of audio and visual stimuli often conceals or misrepresents information actually contained in these stimuli. Such effects are called the "image proximity effect" or the "ventriloquism effect" in the literature. Until recently, most research into their nature was based on subjective assessments. The authors of this paper propose a methodology based on both subjective and objectively acquired data. In this methodology, the objective data reflect the screen areas that attract the most attention. The data were collected and processed by an eye-gaze tracking system. To support the proposed methodology, two series of experiments were conducted: one with the commercial Tobii T60 eye-gaze tracking system, and another with the Cyber-Eye system developed at the Multimedia Systems Department of the Gdańsk University of Technology. In most cases, the visual-auditory stimuli were presented using 3D video. The eye-gaze tracking system was found to make the experimental results more objective. Moreover, the tests revealed a strong correlation between the location of the visual stimulus on which a participant's gaze focused and the value of the "image proximity effect". It was also shown that gaze tracking may be useful in experiments that aim to evaluate the proximity effect when the presented visual stimuli are stereoscopic.