Attack of the Vocaloids

Marrying music and mathematics for The Spectator, 3 August 2019

In 1871, the polymath and computer pioneer Charles Babbage died at his home in Marylebone. The encyclopaedias have it that a urinary tract infection got him. In truth, his final hours were spent in an agony brought on by the performances of itinerant hurdy-gurdy players parked underneath his window.

I know how he felt. My flat, too, is drowning in something not quite like music. While my teenage daughter mixes beats using programs like GarageBand and Logic Pro, her younger brother is bopping through Helix Crush and My Singing Monsters — apps that treat composition itself as a kind of e-sport.

It was ever thus: or at least, it has been ever since 18th-century Swiss watchmakers twigged that musical snuff-boxes might make them a few bob. And as each new mechanical innovation has emerged to ‘transform’ popular music, so the proponents of earlier technology have gnashed their teeth. This affords the rest of us a frisson of Schadenfreude.

‘We were musicians using computers,’ complained Pete Waterman, of the synthpop hit factory Stock Aitken Waterman in 2008, 20 years past his heyday. ‘Now it’s the whole story. It’s made people lazy. Technology has killed our industry.’ He was wrong, of course. Music and mechanics go together like beans on toast, the consequence of a closer-than-comfortable relation between music and mathematics. Today, a new, much more interesting kind of machine music is emerging to shape my children’s musical world, driven by non-linear algebra, statistics and generative adversarial networks — that slew of complex and specific mathematical tools we lump together under the modish (and inaccurate) label ‘artificial intelligence’.

Some now worry that artificially intelligent music-makers will take even more agency away from human players and listeners. I reckon they won’t, but I realise the burden of proof lies with me. Computers can already come up with pretty convincing melodies. Soon, argues venture capitalist Vinod Khosla, they will be analysing your brain, figuring out your harmonic likes and rhythmic dislikes, and composing songs made-to-measure. There are plenty of companies attempting to crack it: Popgun, Amper Music, Aiva, WaveAI, Amadeus Code, Humtap, HumOn and AI Music are all closing in on the composer-less composition.

The fear of tech taking over isn’t new. The Musicians’ Union tried to ban synths in the 1980s, anxious that string players would be put out of work. The big disruption came with the arrival of Kyoko Date. Released in 1996, she was the first seriously publicised attempt at a virtual pop idol. Humans still had to provide Date with her singing and speaking voice. But by 2004 Vocaloid software — developed by Kenmochi Hideki at the Pompeu Fabra University in Barcelona — enabled users to synthesise ‘singing’ by typing in lyrics and a melody. In 2016 Hatsune Miku, a Vocaloid-powered 16-year-old artificial girl with long, turquoise twintails, went, via hologram, on her first North American tour. It was a sell-out. Returning to her native Japan, she modelled Givenchy dresses for Vogue.

What kind of music were these idoru performing? Nothing good. While every other component of the music industry was galloping ahead into a brave new virtualised future — and into the arms of games-industry tech — the music itself seemed stuck in the early 1980s which, significantly, was when music synthesizer builder Dave Smith had first come up with MIDI.

MIDI is a way to represent musical notes in a form a computer can understand. MIDI is the reason discrete notes that fit in a grid dominate our contemporary musical experience. That maddening clockwork-regular beat that all new music obeys is a MIDI artefact: the software becomes unwieldy and glitch-prone if you dare vary the tempo of your project. MIDI is a prime example (and, for that reason, made much of by internet pioneer-turned-apostate Jaron Lanier) of how a computer can take a good idea and throw it back at you as a set of unbreakable commandments.
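For readers who have never looked inside a MIDI message, the following minimal Python sketch shows what ‘discrete notes that fit in a grid’ means in practice. It packs a MIDI 1.0 Note On message into three bytes: a status byte of 0x90 plus the channel, then a note number and a velocity, each a whole number from 0 to 127. It also converts a count of timing ticks into seconds. The helper names are hypothetical. The defaults of 480 ticks per quarter note and a tempo of 500,000 microseconds per quarter note (120 beats per minute) are illustrative assumptions, not settings taken from any particular sequencer.

```python
# A minimal sketch of MIDI 1.0 note data: pitch is an integer 0-127,
# and time is counted in whole ticks. Helper names are hypothetical.

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte Note On message: status 0x90 plus channel, note, velocity."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")
    return bytes([0x90 | channel, note, velocity])

def ticks_to_seconds(ticks: int, ppq: int = 480, tempo_us: int = 500_000) -> float:
    """Convert a tick count to seconds.

    ppq is ticks per quarter note; tempo_us is microseconds per quarter
    note (500,000 corresponds to 120 beats per minute). Both defaults
    are illustrative assumptions.
    """
    return ticks * tempo_us / (ppq * 1_000_000)

# Middle C (conventionally note 60) on channel 0, moderately loud
middle_c = note_on(channel=0, note=60, velocity=100)
print(middle_c.hex())          # 903c64
print(ticks_to_seconds(480))   # 0.5 (one quarter note at 120 bpm)
```

Pitch lives on a ladder of whole numbers, so a note that falls between two keys needs a separate Pitch Bend message. Every event is also timed in whole ticks on the same grid.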

For all their advances, the powerful software engines wielded by the entertainment industry were, as recently as 2016, hardly more than mechanical players of musical dice games of the sort popular throughout western Europe in the 18th century.

The original games used dice to generate music at random from precomposed elements. They came with wonderful titles, too: witness C.P.E. Bach’s A method for making six bars of double counterpoint at the octave without knowing the rules (1758). One 1792 game produced by Mozart’s publisher Nikolaus Simrock in Berlin (it may have been Mozart’s work, but we’re not sure) used dice rolls to select bars at random, producing a potential 46 quadrillion waltzes.
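The dice games are simple enough to sketch in a few lines of Python. This version assumes the scheme usually described for the 1792 game. There are sixteen bars, and each is chosen by the total of two six-sided dice. That gives eleven possible totals (2 to 12), and so eleven precomposed options per bar. The lookup table here is a hypothetical stand-in that holds labels rather than real music. Multiplying out the choices, 11 to the power 16 gives the ‘46 quadrillion’ figure.

```python
import random

BARS = 16             # assumed length of the waltz, in bars
ROLLS = range(2, 13)  # possible totals of two six-sided dice: 2..12

# Hypothetical lookup table: for each bar position, one precomposed
# fragment per dice total. Here each fragment is just a text label.
TABLE = [{total: f"bar{pos + 1}-roll{total}" for total in ROLLS} for pos in range(BARS)]

def roll_two_dice(rng: random.Random) -> int:
    """Return the sum of two fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def compose_waltz(rng: random.Random) -> list[str]:
    """Pick one precomposed fragment per bar by rolling the dice."""
    return [TABLE[pos][roll_two_dice(rng)] for pos in range(BARS)]

# 11 possible totals per bar, chosen independently for each of 16 bars
possible_waltzes = len(ROLLS) ** BARS

rng = random.Random(1792)
waltz = compose_waltz(rng)
print(len(waltz))               # 16
print(f"{possible_waltzes:,}")  # 45,949,729,863,572,161
```

Two dice do not make every total equally likely: a 7 turns up six times as often as a 2. So although every one of those waltzes is possible, some are far more probable than others.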

All these games relied on that unassailable but frequently disregarded truth: that all music is algorithmic. If music is recognisable as music, then it exhibits a small number of formal structures and aspects that appear in every culture: repetition, expansion, hierarchical nesting, the production of self-similar relations. It’s as Igor Stravinsky said: ‘Musical form is close to mathematics — not perhaps to mathematics itself, but certainly to something like mathematical thinking and relationship.’

As both a musician and a mathematician, Marcus du Sautoy, whose book The Creativity Code was published this year, stands to lose a lot if a new breed of ‘artificially intelligent’ machines live up to their name and start doing his mathematical and musical thinking for him. But the reality of artificial creativity, he has found, is rather more nuanced.

One project that especially engages du Sautoy’s interest is Continuator by François Pachet, a composer, computer scientist and, as of 2017, director of the Spotify Creator Technology Research Lab. Continuator is a musical instrument that learns and interactively plays with musicians in real time. Du Sautoy has seen the system in action: ‘One musician said, “I recognise that world, that is my world, but the machine’s doing things that I’ve never done before and I never realised were part of my sound world until now.”’

The ability of machine intelligences to reveal what we didn’t know we knew is one of the strangest and most exciting developments du Sautoy detects in AI. ‘I compare it to crouching in the corner of a room because that’s where the light is,’ he explains. ‘That’s where we are on our own. But the room we inhabit is huge, and AI might actually help to illuminate parts of it that haven’t been explored before.’

Du Sautoy dismisses the idea that this new kind of collaborative music will be ‘mechanical’. Behaving mechanically, he points out, isn’t the exclusive preserve of machines. ‘People start behaving like machines when they get stuck in particular ways of doing things. My hope is that the AI might actually stop us behaving like machines, by showing us new areas to explore.’

Du Sautoy is further encouraged by how those much-hyped ‘AIs’ actually work. And let’s be clear: they do not expand our horizons by thinking better than we do. Nor, in fact, do they think at all. They churn.

‘One of the troubles with machine-learning is that you need huge swaths of data,’ he explains. ‘Machine image recognition is hugely impressive, because there are a lot of images on the internet to learn from. The digital environment is full of cats; consequently, machines have got really good at spotting cats. So one thing which might protect great art is the paucity of data. Thanks to his interminable chorales, Bach provides a toe-hold for machine imitators. But there may simply not be enough Bartók or Brahms or Beethoven for them to learn on.’

There is, of course, the possibility that one day the machines will start learning from each other. Channelling Marshall McLuhan, the curator Hans Ulrich Obrist has argued that art is an early-warning system for the moment true machine consciousness arises (if it ever does arise).

Du Sautoy agrees. ‘I think it will be in the world of art, rather than in the world of technology, that we’ll see machines first express themselves in a way that is original and interesting,’ he says. ‘When a machine acquires an internal world, it’ll have something to say for itself. Then music is going to be a very important way for us to understand what’s going on in there.’

Praying to the World Machine

In late spring this year, the Barbican Centre in London will explore the promise and perils of artificial intelligence in a festival of films, workshops, concerts, talks and exhibitions. Even before the show opens, however, I have a bone to pick: what on earth induced the organisers to call their show AI: More than Human?

More than human? What are we being sold here? What are we being asked to assume, about the technology and about ourselves?

Language is at the heart of the problem. In his 2007 book, The Emotion Machine, computer scientist Marvin Minsky deplored (although even he couldn’t altogether avoid) the use of “suitcase words”: his phrase for words conveying specialist technical detail through simple metaphors. Think what we are doing when we say metal alloys “remember” their shape, or that a search engine offers “intelligent” answers to a query.

Without metaphors and the human tendency to personify, we would never be able to converse, let alone explore technical subjects, but the price we pay for communication is a credulity when it comes to modelling how the world actually works. No wonder we are outraged when AI doesn’t behave intelligently. But it isn’t the program playing us false, rather the name we gave it.

Then there is the problem outlined by Benjamin Bratton, director of the Center for Design and Geopolitics at the University of California, San Diego, and author of cyber bible The Stack. Speaking at Dubai’s Belief in AI symposium last year, he said we use suitcase words from religion when we talk about AI, because we simply don’t know what AI is yet.

For how long, he asked, should we go along with the prevailing hype and indulge the idea that artificial intelligence resembles (never mind surpasses) human intelligence? Might this warp or spoil a promising technology?

The Dubai symposium, organised by Kenric McDowell and Ben Vickers, interrogated these questions well. McDowell leads the Artists and Machine Intelligence programme at Google Research, while Vickers has overseen experiments in neural-network art at the Serpentine Gallery in London. Conversations, talks and screenings explored what they called a “monumental shift in how societies construct the everyday”, as we increasingly hand over our decision-making to non-humans.

Some of this territory is familiar. Ramon Amaro, a design engineer at Goldsmiths, University of London, drew the obvious moral from the story of researcher Joy Buolamwini, whose facial-recognition art project The Aspire Mirror refused to recognise her because of her black skin.

The point is not simple racism. The truth is even more disturbing: machines are nowhere near clever enough to handle the huge spread of the normal distribution on which virtually all human characteristics and behaviours lie. The tendency to exclude is embedded in the mathematics of these machines, and no patching can fix it.

Yuk Hui, a philosopher who studied computer engineering and philosophy at the University of Hong Kong, broadened the lesson. Rational, disinterested thinking machines are simply impossible to build. The problem is not technical but formal, because thinking always has a purpose: without a goal, it is too expensive a process to arise spontaneously.

The more machines emulate real brains, argued Hui, the more they will evolve – from autonomic response to brute urge to emotion. The implication is clear. When we give these recursive neural networks access to the internet, we are setting wild animals loose.

Although the speakers were well-informed, Belief in AI was never intended to be a technical conference, and so ran the risk of all such speculative endeavours – drowning in hyperbole. Artists using neural networks in their practice are painfully aware of this. One artist absent from the conference, but cited by several speakers, was Turkish-born Memo Akten, based at Somerset House in London.

His neural networks make predictions on live webcam input, using previously seen images to make sense of new ones. In one experiment, a scene including a dishcloth is converted into a Turneresque animation by a recursive neural network trained on seascapes. The temptation to say this network is “interpreting” the view, and “creating” art from it, is well-nigh irresistible. It drives Akten crazy. Earlier this year in a public forum he threatened to strangle a kitten whenever anyone in the audience personified AI, by talking about “the AI”, for instance.

It was left to novelist Rana Dasgupta to really put the frighteners on us as he coolly unpicked the state of automated late capitalism. Today, capital and rental income are the true indices of political participation, just as they were before the industrial revolution. Wage rises? Improved working conditions? Aspiration? All so last year. Automation has made their obliteration possible, by reducing to virtually nothing the costs of manufacture.

Dasgupta’s vision of lives spent in subjection to a World Machine – fertile, terrifying, inhuman, unethical, and not in the least interested in us – was a suitcase of sorts, too, containing a lot of hype and no small amount of theology. It was also impossible to dismiss.

Cultural institutions dabbling in the AI pond should note the obvious moral. When we design something we decide to call an artificial intelligence, we commit ourselves to a linguistic path we shouldn’t pursue. To put it more simply: we must be careful what we wish for.