Vision and Language

Girl pushing boy on swing. Image source: https://www.flickr.com/photos/kokotron/5880634416

Language is used to convey meaning, but how does it come to have meaning? One source of meaning is visual information.

To learn the meaning of “the girl pushed the boy”, we need to see events in which girls push boys. Critically, in this sentence the girl is doing the pushing, not the boy, and this is signalled by the fact that “the girl” occupies the subject position before “pushed”. But how do we learn that the subject of this verb is the doer?

This project examines how visual cues in a scene can be used to identify who did what to whom in events. It will build on a large literature showing that infants can use simple visual heuristics to understand events. It will also make contact with work in adult vision, which shows that humans have powerful object-tracking and relational-meaning systems. We hope to identify the basic primitives that are used to extract meaning from visual scenes, and this work could yield computational systems that can understand events and describe them in language.

Project Team: Franklin Chang (Lead), Anna Theakston and Soumitra Samanta

Start Date: September 2016

Duration: 3 years

(Work Package 1)