Vision and Language

Language is used to convey meaning, but how does it come to have meaning? One important source of meaning is visual information.

To learn the meaning of “the girl chased the boy”, we need to see events in which girls chase boys. Critically, in this sentence the girl is doing the chasing, not the boy, and this is signalled by the girl's position as the subject before “chased”. But how do we learn that the subject of this verb is the doer?

In this project, we developed simulations of how people understand visual events (Samanta & Chang, 2018, 2018). For example, for a chasing event, we created videos of circles that moved around randomly, except that one of the circles (“the wolf”) was chasing another circle (“the sheep”). We then developed a computer model of how a person understands this type of scene. The model had to keep track of all of the circles on the screen (otherwise it would lose track of which object was the wolf). It also had to track how the objects moved relative to one another, because the wolf can be detected by watching which circle moves towards another circle. We found that this simulation could explain how people recognise the doer in events like pushing and chasing (Jessop & Chang, 2020). We now have a better understanding of some of the basic primitives used to derive meaning from visual scenes.
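The project's model is not reproduced here. As a rough illustration of the relative-motion cue described above, the Python sketch below generates a toy display in which every circle takes random steps except a "wolf" that steps straight toward a "sheep", and then picks out the chaser as the ordered pair (i, j) in which object i most consistently moves toward object j. It assumes tracking is already solved (each object keeps the same index in every frame), and the function names and parameters (`simulate_chase`, `approach_score`, `find_chaser`, `speed`) are hypothetical, not taken from Samanta & Chang (2018).

```python
import math
import random


def simulate_chase(n_objects=5, n_frames=200, wolf=0, sheep=1, speed=1.0, seed=0):
    """Hypothetical toy display: circles random-walk, except `wolf`, which chases `sheep`.

    Returns a list of frames; each frame is a list of (x, y) positions, one per object.
    """
    rng = random.Random(seed)
    frames = [[(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n_objects)]]
    for _ in range(n_frames - 1):
        prev = frames[-1]
        nxt = []
        for i, (x, y) in enumerate(prev):
            if i == wolf:
                # Chaser: step straight toward the target's position in the previous frame.
                tx, ty = prev[sheep]
                dist = math.hypot(tx - x, ty - y)
                if dist > 0:
                    x, y = x + speed * (tx - x) / dist, y + speed * (ty - y) / dist
            else:
                # Everyone else (including the target): one step in a random direction.
                angle = rng.uniform(0, 2 * math.pi)
                x, y = x + speed * math.cos(angle), y + speed * math.sin(angle)
            nxt.append((x, y))
        frames.append(nxt)
    return frames


def approach_score(frames, i, j):
    """Mean component of object i's frame-to-frame displacement pointing at object j."""
    total = 0.0
    for t in range(len(frames) - 1):
        xi, yi = frames[t][i]
        xj, yj = frames[t][j]
        nx, ny = frames[t + 1][i]
        dx, dy = xj - xi, yj - yi
        dist = math.hypot(dx, dy)
        if dist == 0:
            continue  # direction to j is undefined when the two objects coincide
        total += ((nx - xi) * dx + (ny - yi) * dy) / dist
    return total / (len(frames) - 1)


def find_chaser(frames):
    """Return the ordered pair (doer, target) with the strongest approach motion."""
    n = len(frames[0])
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return max(pairs, key=lambda p: approach_score(frames, *p))


if __name__ == "__main__":
    frames = simulate_chase(n_objects=5, wolf=2, sheep=4, seed=1)
    print(find_chaser(frames))  # (2, 4): circle 2 is chasing circle 4
```

Scoring ordered pairs rather than single objects is a deliberate choice in this sketch: for the thematic-role question, the heuristic should return not only which circle is the doer but also which circle it is acting on.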

Notable Outputs:

Samanta, S. and Chang, F. (2018). Modelling Human Understanding of Thematic Roles with Motion Heuristics. Proceedings of the 17th IEEE International Conference on Machine Learning and Applications. Orlando, USA.

Samanta, S. and Chang, F. (2018). A Computational Model for Thematic Roles Identification. Poster presented at the 51st Annual Meeting of the Society for Mathematical Psychology, Madison, USA.

Jessop, A. and Chang, F. (2020). Thematic role information is maintained in the visual object tracking system. Quarterly Journal of Experimental Psychology.

Project Team: Franklin Chang (Lead) and Soumitra Samanta

Dates: September 2016 - August 2019

(Work Package 1)