Challenges for AI visual recognition

What happens when the driverless car approaches a stop sign sprayed with graffiti? Does the car stop?

Now, Philip Kellman at UCLA has similar doubts about robots, especially about how they perceive objects.

Well, these systems have gotten quite a bit better at recognizing objects. And in certain standardized tests, they classify images in terms of a thousand object categories about as well as humans do.

But we set out to figure out how they're doing it and whether they make good models for the human visual system. And a specific place that we jumped in was shape. Because we believe that shape required certain abstraction capabilities that these devices aren't capable of. And it turns out that when probed for the shapes of objects in the absence of other cues, these devices fail miserably.

So you might imagine a glass ornament on your desk that's a horse. The devices will say it's a hook or a corkscrew or a website but they don't know anything about 'horse' just from the shape. Real horses in pictures they get because there's texture, there's context. But the idea of actually seeing shape is not something that's in these networks.

And how come we can do it and they can't?

This is a great question and it'll be very helpful going forward in making better artificial systems. We have specialized routines for segmenting figure from ground: knowing where one object ends and where a surface passes behind another object. We give shape descriptions to the bounding contours of objects. And so we sort all those things out, and that's done by processing routines that have considerable complexity, and obviously, biological vision systems have devoted a lot of resources to those things.

So, I'm reminded very much of—I know it's a bit of a leap—but face recognition where you'd imagine that you could be quite good at that because you have to. You have to recognize faces very, very quickly, especially in modern society. But, there's no variety of shape in a face. What is missing in a brain that can't recognize faces?

Well, that's a very hard question. For starters, it looks like face recognition has specialized features in human perception. So it's not exactly the same as object perception and classification. There's still some dispute about that. But then the question is, what's missing when someone can't do this? It's easier right now for an artificial system to find a face in a scene and say: "well, that's a face." But the task you're talking about is, "whose face is it, you know, person one or person two?" And the problem is harder and obviously uses different computations. And I think people are still working on just what is missing from the brains of humans who have this difficulty.

One fascinating fact is, we have in neuroscience, as a result of functional magnetic resonance imaging work, one of the best-established findings, which is we have in our brains a special place for recognizing faces. It lights up when people see faces and it doesn't light up to houses or things like that. It turns out that when you test prosopagnosics, people who have a deficit for recognizing faces, their face area in the brain lights up just the same way as in normal people. There's no deficit there at all, which is... it's a sobering lesson about using physiological measures or neural measures to try to infer processes, because it's a loose correlation on the best day. And that just makes it all the more mysterious what differs in terms of information processing.

Going back to A.I. and recognizing shapes, do you think they ever will?

Oh, I think they don't have a will. And I think that's one of the funnest conclusions from this kind of work. When we hear on the news that the artificial intelligence system recognizes objects we kind of import a lot of assumptions about what that would mean for us. If you look at a scene and you see a horse and you say: "there's a horse," and the artificial system also does that, we kind of think it sees the horse, it sees the shape, it sees that it's separable from the background, and maybe can retrieve some other information about horses. But it's not doing most of those things.

Okay well, let's see whether this matters at all. Take examples, such as driverless cars. Will they work eventually when you've got to take so many things into account at once?

This is a great question. I hope what will happen from research such as ours and many others', is that we'll continue to advance the work and incorporate deep learning systems which have great value with more routines that get at the kind of abstraction and symbolic coding that human perceptual systems have. And I believe if you did the work properly, you could eventually get systems that work much like ours do. But right now, the deep learning systems which are by far the most successful for recognition, and are used in some aspects in driverless cars... they have problems that you could imagine, such as if somebody spray painted graffiti on a stop sign. It might not be seen as a stop sign. So, I would say right now I'd be very wary of driving in driverless cars.

You wouldn't go in one?

No, no I wouldn't. And they get a lot of latitude because new ideas and progress are always intriguing, and down the road there'll be some great benefit. But for now, I think it's still in transition. An advantage of driverless cars is they can use radar to see the distance of objects and see whether there's a solid object. Of course, that's going to interact with whether it's raining and so forth. So, there are problems in every direction.

Well, one of the examples, a prosaic one that occurs to me as you speak, is of robots which have failed to be good at emptying a dishwasher or loading it. In other words, you've got all these different shapes that we put automatically into a dishwashing machine, the glasses, the plates, all different shapes and different textures. And they can't do it, which is astounding!

I would have to read more about that. But, assuming they have the problems on the motor control side—that is, reaching and grasping—solved, I would assume the problem is in object perception. And, one of the things that's a big issue in object perception, that we've studied in our laboratory for years, is perceiving objects despite partial occlusion. Most objects in the world are not sitting out there as the only object in the scene, but they're partially occluded by other objects. So, you might give an artificial system a nice description of the shape of a plate, but when you put it in among the other plates in the dishwasher, you're not getting light reflected from all of that shape and you get parts of it. Again, human vision has amazing routines for taking the partial information and completing the contours and filling in the surfaces, because our brain seems to know that the world has three dimensions and objects get in each other's ways. So, we've found some ways to solve those problems under lots of circumstances. But, it's as yet an unsolved problem in artificial intelligence.

Philip Kellman is Distinguished Professor in Psychology at UCLA. And another professor, a colleague of his who talked about ants on The Science Show, has a suggestion I think Sydney Brenner would have liked: that we use ants to solve the AI problem for cars.

Well, you know, the population of Australia is 25 million. But there are some cities in the world that are 25 million. Is there anything we humans can learn from these creatures in terms of space?

One thing I think a lot about, living in Los Angeles, is the commute and highways, and so, thinking about these individual-based rules of robots, can we take these and look at the interactions between space and these agents and apply that to humans? And, yes, something I have to deal with every day is a commute, and I'm hoping that one day, maybe ten years from now hopefully, self-driving cars will solve this. And so, people can use some of these local rules and these interactions that animals have with their environment to maybe create smart algorithms for how automated cars will talk with each other on these highways. So, you know, these self-driving cars will have to deal with spatial constraints because they can't fly yet, right? They'll have to be on these highways that exist, and there will be other cars around them, and the cars will vary in their behavior because they'll be of different types: potentially they'll be trucks, and there will be other smaller cars and bigger cars. And, potentially, as people we might be able to use the interactions of ants with their environment to learn how to get good algorithms for how to teach these automated cars to drive us safely to work.

Professor Noa Pinter-Wollman at UCLA. Yes, we keep going back to basics to find ways to leap into the future. The Science Show, on RN.

Philip Kellman The Science Show ABC

Philip Kellman

Distinguished Professor
Adjunct Professor of Surgery
Ph.D., University of Pennsylvania
Area Chair: Cognitive Psychology
Primary Area: Cognitive Psychology

Robyn Williams

Science Journalist and Broadcaster

UCLA

University of California — Los Angeles