The strange, the off-kilter and the not-quite-right

The release of Mufasa, Disney’s photorealistic prequel to The Lion King, occasioned this essay for the Telegraph on the biota of Uncanny Valley

In 1994 Disney brought Shakespeare’s Hamlet, or something like it, to the big screen. In turning the gloomy Dane into an adorable lion cub, and his usurping uncle into Scar (arguably its most terrifying villain ever), the company created the highest-grossing movie of the year. Animators sat up and marvelled at the way the film combined hand-drawn characters with a digitally rendered environment and thousands of CGI animals. This new technology could aid free expression, after all!

Well, be careful what you wish for.

When, in 2019, Disney remade its beloved The Lion King (1994), it swapped the original’s lush hand-drawn animation for naturalistic computer-generated imagery. The 2019 reboot had a budget of $260 million (£200 million) and took more than $1.5 billion (£1.1 billion) at the box office, making it one of the most expensive, and highest-grossing, films of all time – and the focus of a small but significant artistic backlash. Some critics voiced discomfort with the fact that it looked more like an episode of Planet Earth than a high-key musical fantasy. Its prequel Mufasa: The Lion King (directed by Moonlight’s Barry Jenkins), released this month, deepens the trend. For Disney, it’s a show of power, I suppose: “Look at our animation, so powerful, you’ll mistake it for the world itself!” In time, though, the paying public may well regret Disney’s loss of faith in traditional animation.

What animator would want to merely reflect the world through an imaginary camera? The point of the artform, surely, is to give emotion a visual form. But while a character drawn in two dimensions can express pretty much anything (Felix the Cat, Wile E Coyote and Popeye the Sailor are not so much bodies as containers for gestures), drawing expressively in 3D is genuinely hard to do. Any artist with Pixar on their CV will tell you that. All that volumetric precision gets in the way. Adding photorealism to the mix makes the job plain impossible.

Disney’s live-action remake of The Jungle Book (2016) at least used elements of motion capture to match the animals’ faces to the spoken dialogue. In 2024, even that’s not considered “realistic” enough. Mufasa, Simba, Rafiki the mandrill and the rest simply chew on air while dialogue arrives from somewhere off-screen, in the manner of Italian neorealist cinema (which suggests, incidentally, that, along with the circle of life, there’s also a circle of cinema).
Once you get to this point, animation is a distant memory; you’ve become a puppeteer. And you confront a problem that plagues not only Hollywood films, but the latest advances in robotic engineering and AI: “the uncanny valley”.

The uncanny valley describes how the closer things come to resembling real life, the more on guard we are against being fooled or taken in by them. The more difficult they are to spot as artificial, the stronger our self-preserving hostility towards them. It is the point in the development of humanoid robots when their almost-credible faces might send us screaming and running out of the workshop. Or, on a more relatable level, it describes the uneasiness some of us feel when interacting with virtual assistants such as Apple’s Siri and Amazon’s Alexa.

The term was coined in 1970 by the Japanese roboticist Masahiro Mori – at a time when real anthropomorphic robots didn’t even exist – to warn designers that the more their inventions came to resemble real life-forms, the creepier they would look.

Neurologists seized on Mori’s idea because it suggested an easy and engaging way of studying how our brains see faces and recognise people. Positron emission tomography arrived in clinics in the 1970s, and magnetic resonance imaging about twenty years later. Researchers now had a way of studying the living human brain as it saw, heard, smelled and thought. The uncanny valley concept got caught up in a flurry of very earnest, very technical work about human perception, to the point where it was held up as a profound, scientifically-arrived-at insight into the human condition.

Mori was more guarded about all the fuss. Asked to comment on some studies using slightly “off” faces and PET scans, he remarked: “I think that the brain waves act that way because we feel eerie. It still doesn’t explain why we feel eerie to begin with.” And these days the scientific community is divided on how far to push the uncanny valley concept – or even whether such a “valley” (which implies a happy land beyond it, one in which we would feel perfectly at ease with lifelike technology) exists at all.

Nevertheless, the uncanny valley does suggest a problem with the idea that in order to make something lifelike, you just need to ensure that it looks like a particular kind of living thing – a flaw that is often cited in critical reviews of Disney’s latest photorealist animations. Don’t they realise that the mind and the eye are much more attuned to behaviour than they are to physical form? Appearances are the least realistic parts of us. It’s by our behaviour that you will recognise us. So long as you animate their behaviour, whatever you draw will come alive. In 1944 psychologists Fritz Heider and Marianne Simmel made a charming 90-second animation, full of romance and adventure, using two triangles, a circle and a rectangle with a door in it.

There are other ways to give objects the gift of life. A few years ago, I met the Tokyo designer Yamanaka Shunji, who creates one-piece walking machines from 3D vinyl-powder printers. One, called Apostroph (a collaboration with Manfred Hild in Paris), is a hinged body made up of several curving frames. Leave it alone, and it will respond to gravity, and try to stand. Sometimes it expands into a broad, bridge-like arch; at other times it slides one part of itself through another, curls up and rolls away.

Engineers, by associating life with surface appearances, are forever developing robots that are horrible. “They’re making zombies!” Yamanaka complained. Artists, on the other hand, know how to sketch. They know how to reduce, and abstract. “From ancient times, art has been about the right line, the right gesture. Abstraction gets at reality, not by mimicking it, but by purifying it. By spotting and exploring what’s essential.”

This, I think, gets to the heart of the uncanny valley phenomenon: we tend to associate life with particular outward forms, and when we reproduce those things, we’re invariably disappointed and unnerved, wondering what sucked the life out of them. We’re looking for life in all the wrong places. Yamanaka Shunji’s Apostroph is alive in a way Mufasa will never be.

***

We’re constantly trying to differentiate between the living and the non-living. And as AI and other technologies blur the lines between living things and artefacts, we will grapple with the challenge of working out what our moral obligations are towards entities – chatbots, robots, and the like – that lack a clear social status. In that context, the “uncanny valley” can be a genuinely useful metaphor.

The thing to keep in mind is that the uncanny is not a new problem. It’s an evolutionary problem.

Decades ago I came across a letter to New Scientist magazine in which a reader recalled taking a party of blind schoolchildren to London Zoo. He wanted the children to feel and cuddle the baby chimps, learning about their hair, hands, toes and so on, by touch. The experiment, however, proved to be a disaster. “As soon as the tiny chimps saw the blind children they stared at their eyes… and immediately went into typical chimpanzee attack postures, their hair standing upright all over their bodies, their huge mobile lips pouting and grimacing, while they jumped up and down on all fours uttering screams and barks.”
Even a small shift in behaviour – having your eyes closed, say, or not responding to another’s gaze – was enough to trigger the chimpanzees’ fight-or-flight response. Primates, it seems, have their own idea of the uncanny.

Working out what things are is not a straightforward business. When I was a boy I found a hedgehog trying to mate with a scrubbing brush. Dolphins regularly copulate with dead sharks (though that might just be dolphins being dolphins). Mimicry compounds the problem: beware the orchid mantis that pretends to be a flower, or the mimic octopus that’ll shape-shift into just about anything you put in front of it.

In social species like our own, it’s especially important to recognise the people you know.
In a damaged brain, this ability can be lost, and then our nearest and our dearest, our fathers, mothers, sons, daughters, spouses, best friends and pets become no more in our sight than malevolent simulacra. For instance, Capgras syndrome is a psychiatric disorder that occurs when the internal portion of our representation of someone we know becomes damaged or inaccessible. This produces the impression of someone who looks right on the outside, but seems different on the inside – you believe that your loved one has been taken over by an imposter.

Will Mufasa trigger Capgras-like responses from movie-goers? Will they scream and bark at the screen, unnerved and ready to attack?

Hopefully not. With each manifestation of the digital uncanny comes the learning necessary for us not to be freaked out by it. That man is not really on fire. That alien hasn’t really vanished down the actor’s throat. After all, the rise of deepfakes and chatbots shows no sign of slowing. But is this a good thing?

I’m not sure.

When push comes to shove, the problem with photorealist animation is really just a special case of the problem with blockbuster films in general: the closer it comes to the real, the more it advertises its own imposture.

Cinema is, and always has been, a game of sunk costs. The effort grows exponentially, to satisfy the appetites of viewers who have become exponentially more jaded.

And this raises a more troubling thought – that beyond the uncanny valley’s lairs of the strange, the off-kilter and the not-quite-right is a barren land marked, simply, “Indifference”.

The uncanny valley seemed deep enough, in the 1970s, to inspire scientific study, but we’ve had half a century to acclimatise to not-quite-human agents. And not just acclimatise to them: Hanson Robotics’ wobbly-faced Sophia generated more scorn than terror when Saudi Arabia granted her citizenship in 2017. The wonderfully named Abyss Creations of Las Vegas turned out their first sexbot in 1996. RealDoll now has global competition, especially from east Asia.

Perhaps we’ve simply grown in sophistication. I hope so. The alternative is not pretty: that we’re steadily lowering the bar on what we think is a person.

 

It’s coming at you!

Exploring volumetric capture for New Scientist, 13 December 2017

OUTSIDE Dimension Studios in Wimbledon, south London, is one of those tiny wood-framed snack bars that served commercial travellers in the days before motorways. The hut is guarded by old shop dummies dressed in fishnet tights and pirate hats. If the UK made its own dilapidated version of Westworld, the cyborg rebellion would surely begin here.

Steve Jelley orders us breakfast. Years ago he left film production to pursue a career developing new media. He’s of the generation for whom the next big thing is always just around the corner. Most of them perished in the dot-com bust of 2001, but Jelley clung to the dream, and now Microsoft has come calling.

His company, Hammerhead, makes 360-degree videos for commercial clients. Its partner in this current venture, Timeslice Films, is best known for volumetric capture of still images – the business of photographically recording forms in three dimensions – a practice that goes back to founder Tim MacMillan’s art-school experiments of the early 1980s.

Steve Sullivan, director of the Holographic Video initiative at Microsoft, is fusing both companies’ technical expertise to create volumetric video: immersive entertainment that’s indistinguishable from reality.

There are only three studios in the world that can do this with any degree of conviction, and Wimbledon is the only one outside the US. Still, I’m sceptical. It has been clear for a while that truly immersive media won’t spring from a single “light-bulb” moment. The technologies involved are, in conceptual terms, surprisingly old. Volumetric capture is a good example.

MacMillan is considered the godfather of this tech, having invented the “bullet time” effect central to The Matrix. But The Matrix is 18 years old, and besides, MacMillan reckons that pioneer photographer Eadweard Muybridge got to the idea years before him – before cinema itself was even invented.

Then there’s motion capture (or mocap): recording the movement of points attached to an actor, and from those points, constructing the performance of a three-dimensional model. The pioneering Soviet physiologist Nikolai Bernstein invented the technique in the early 1920s, while developing training programmes for factory workers.
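
To get a feel for the geometry underneath all this, here is a minimal sketch – using two idealised pinhole cameras with invented calibration values, standing in for anything Bernstein or a modern rig actually used – of how one marker, seen from two viewpoints, is triangulated back into a 3D point:

```python
import numpy as np

def project(P, X):
    """Project world point X through a 3x4 camera matrix P to pixel (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, uv1, uv2):
    """Recover a 3D point from its pixel positions in two calibrated
    cameras, by direct linear transformation (least squares)."""
    rows = []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        # Each observed coordinate gives one linear constraint on the
        # homogeneous world point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The best X is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenise

# Two hypothetical cameras: shared intrinsics K, the second shifted 1 m along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

marker = np.array([0.3, -0.2, 4.0])   # a "marker" 4 m in front of the rig
print(triangulate(P1, P2, project(P1, marker), project(P2, marker)))
# -> approximately [ 0.3 -0.2  4. ]
```

A production system does the same for dozens of markers, frame after frame, then fits a skeleton to the moving cloud of points.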

Truly immersive media will be achieved not through magic bullets, but through thugging – the application of ever more computer power, and the ever-faster processing of more and more data points. Impressive, but where’s the breakthrough?

“Well,” Jelley begins, handing me what may be the largest bacon sandwich in London, “you know this business of the ‘uncanny valley’…?” My heart sinks slightly.

Most New Scientist readers will be familiar with Masahiro Mori’s concept of the uncanny valley. It’s a curiously anglophone obsession. In the three decades after the Japanese engineer published his paper in 1970, it was referred to in Japanese academic literature only once. Mori himself says the idea was never meant to be taken scientifically. He was merely warning robot designers, at a time when humanoid robots didn’t exist, that the more closely their works came to resemble people, the creepier we would find them.

In the West, discussions of the uncanny valley have grown into a sizeable cottage industry. There have been expensive studies done with PET scans to prove the existence of the effect. But as Mori commented in an interview in 2012: “I think that the brainwaves act that way because we feel eerie. It still doesn’t explain why we feel eerie to begin with.”

Our discomfort extends beyond encounters with physical robots to include some cinematic experiences. Many are the animated movies that have employed mocap to achieve something like cinematic realism, only to plummet without trace into the valley.

Elsewhere, actor Andy Serkis famously uses mocap to transform himself into characters like Gollum in The Lord of the Rings, or the chimpanzee Caesar in Rise of the Planet of the Apes, and we are carried along well enough by these films. The one creature this technology can’t emulate, however, is Serkis himself. Though mocap now renders human body movement with impressive realism, the human face remains a machine far too complex to be seamlessly emulated even by the best system.

Jelley reckons he and his partners have “solved the problem” of the uncanny valley. He leads me into the studio. There’s a small, circular, curtained-off area – a sort of human-scale birdcage. Rings of lights and cameras are mounted on scaffolds and hang from a moveable and very heavy-looking ceiling rig.

There are 106 cameras: half of them recording in the infrared spectrum to capture depth information, half of them recording visible light. Plus, a number of ultraviolet cameras. “We use ultraviolet paint to mask areas for effects work,” Jelley explains, “so we record the UV spectrum, too. Basically we use every glimmer of light we can get short of asking you to swallow radium.”

The cameras shoot between 30 and 60 times a second. “We have a directional map of the configuration of those cameras, and we overlay that with a depth map that we’ve captured from the IR cameras. Then we can do all the pixel interpolation.”
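
Stripped of the hardware, the first step Jelley describes reduces to something like the following sketch – with invented intrinsics for a single depth camera, not the studio’s actual calibration – in which every pixel of a depth map is lifted into a world-space point, ready for the colour cameras to be projected onto:

```python
import numpy as np

# Assumed intrinsics for one depth camera (focal lengths and principal
# point, in pixels) - illustrative values only, not the rig's calibration.
FX, FY, CX, CY = 365.0, 365.0, 256.0, 212.0

def unproject(depth):
    """Lift an H x W depth map (metres) into an (H*W, 3) point cloud
    in the camera's own coordinate frame, using the pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def to_world(points, R, t):
    """Apply one entry of the rig's 'directional map': the rotation R
    and translation t placing this camera in the shared world frame."""
    return points @ R.T + t

# A flat synthetic depth map 2 m away, and an identity camera pose.
cloud = to_world(unproject(np.full((424, 512), 2.0)),
                 np.eye(3), np.zeros(3))
print(cloud.shape)  # (217088, 3): one world-space point per depth pixel
```
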

This is a big step up from mocap. Volumetric video captures real-time depth information from surfaces themselves: there are no fluorescent sticky dots or sliced-through ping-pong balls attached to actors here. As far as the audience is concerned, volumetric video is essentially just that, video, and as close to a true record as anything piped through a basement full of computers is ever going to get.

So what kind of films are made in such studios? Right now, the education company Pearson is creating virtual consultations for trainee nurses. Fashion brands and car companies have shot adverts here. TV companies want to use them for fully immersive and interactive dramas.

On a table nearby, a demo is ready to watch on a Vive VR headset. There are three sets of performances for me to observe, all looping in a grey, gridded, unadorned virtual space: the digital future as a filing cabinet. Two are experiments from Sullivan’s early days at Microsoft. Thomas Jefferson is pure animatronic; the two Maori haka dancers are engaging, if unhuman. The circus gymnast swinging on her hoop is different. I recognise her, or think I do. My body language must be giving the game away, because Jelley laughs.

“Go up to her,” he says. I can’t place where I’ve seen her before. I try and catch her eye. “Closer.”

I’m invading her space, and I’m not comfortable with this. I can see the individual threads securing the sequins to her costume. More than that: I can smell her. I can feel the heat coming from her skin.

I know she’s not real, but my body doesn’t. Every bit of me that might have rejected a digitised face as uncanny has fallen hook, line and sinker for this super-real gymnast. And this, presumably, is why the bit of my mind that enables me to communicate freely and easily with my fellow humans is in overdrive, trying to plug the gaps in my experience, as if to say, “Of course her skin is hot. Of course she has a scent.”

Mori’s uncanny valley effect is not quantifiable, and I don’t suppose my experience is any more measurable than the one Mori identified. But I’d bet the farm that, had you scanned me, you would have seen all manner of pretty lights. This hasn’t been an eerie experience. Quite the reverse. It’s terrifyingly ordinary. Almost, I might say, human.

Jelley walks me back to the main road. Neither of us says a word. He knows what he has. He knows what he has done.

Outside the snack shack, three shop dummies in pirate gear wobble in the wind.