The noise around AI
AI is being touted as a transformative technology for the media space, but audio was an early adopter, writes Kevin Hilton.
In many ways the audio business was an early adopter of what could be described as artificial intelligence (AI). Sure, sound engineers like to be able to control and manipulate sounds, sometimes making minute adjustments in level and equalisation according to their ears and experience rather than what a meter or oscilloscope might be telling them. But there are some jobs and processes that are time-consuming and dull, so anything that makes life easier by taking over the more mundane or repetitive tasks has been embraced.
Total Recall and Flying Faders became an integral part of music recording in the 1970s, allowing EQ settings and fader positions to be loaded into a computer attached to the mixing desk and recalled later when required. Less flashy and obvious is the work of the ubiquitous compressor, a unit that featured in every studio rack and now appears in every digital audio workstation as a plug-in. It is used in transmission to control peak volumes, and in music or broadcast it can limit the dynamic range of a vocal or instrument.
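To make the principle concrete, the sketch below shows a minimal feed-forward downward compressor of the kind described: levels above a threshold are reduced by a set ratio, with attack and release times smoothing the gain changes. It is illustrative only, not the design of any particular studio unit or plug-in; the threshold, ratio and time-constant values are arbitrary example parameters.

```python
import numpy as np

def compress(signal, sr=48000, threshold_db=-20.0, ratio=4.0,
             attack=0.01, release=0.1):
    """Simple feed-forward dynamic range compression of a mono signal.

    Levels above threshold_db are reduced by the given ratio; attack and
    release (in seconds) smooth the gain changes to avoid audible clicks.
    """
    # Per-sample smoothing coefficients for the envelope follower
    att = np.exp(-1.0 / (attack * sr))
    rel = np.exp(-1.0 / (release * sr))

    out = np.empty_like(signal)
    env = 0.0  # running amplitude envelope
    for i, x in enumerate(signal):
        level = abs(x)
        # Fast rise (attack) when the level jumps, slow fall (release)
        coeff = att if level > env else rel
        env = coeff * env + (1.0 - coeff) * level

        level_db = 20.0 * np.log10(max(env, 1e-9))
        # Apply gain reduction only to the portion above the threshold
        over = max(level_db - threshold_db, 0.0)
        gain_db = -over * (1.0 - 1.0 / ratio)
        out[i] = x * 10.0 ** (gain_db / 20.0)
    return out
```

With a 4:1 ratio, a signal that rises 8 dB above the threshold emerges only 2 dB above it, which is the sense in which the unit "limits the dynamic range" without a human hand on the fader.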
“Controlling dynamic range and the use of compressors could be a broadcast definition of AI because it is data driven,” comments Christopher Hicks, director of engineering at CEDAR Audio. “It removes the need for a human brain. Anything that is automated can be described as AI.” While the term AI has solid scientific and academic foundations, it has also become something of a nebulous, catch-all buzz phrase, which was not helped by the Steven Spielberg film of the same name.
“We prefer the term machine learning because it has connotations of being able to look at the data that has been extracted directly,” Christopher Hicks, CEDAR Audio
There is also some confusion between AI and the associated terms machine learning and deep neural networks (DNN). All three are often used interchangeably but are not synonymous. “AI as a term has been around for decades and in computer science is a big umbrella category,” says Hicks. “We prefer to use the term machine learning because it has connotations of being able to look at the data that has been extracted directly, without working to a prescribed model of what to expect.”
In this way machine learning differs from DNN because it works with loose statistical models rather than more rigid parameters. A DNN is built on large amounts of training data derived from specific examples, such as particular types of voices and accents, or sounds such as air conditioning or traffic. The drawback is that the program is not effective when presented with something it has not been trained to recognise.
“DNN is one way of implementing machine learning but it is too rigid,” observes Hicks. CEDAR Audio is a long-standing developer and manufacturer of sound restoration systems. Initially based on hardware devices, its technology is now also available in software and is widely used to clean up and repair location recordings in TV and film post-production, as well as to restore archive material.
CEDAR first employed aspects of machine learning in 2012, when it launched the DNS 8 Live multi-channel dialogue noise suppressor. This is able to remove background noise from speech and, although designed for concert halls and conference venues, is also used in broadcast studios, post facilities and for sports anchoring. It was followed by the DNS 2, a portable unit designed for location recording that can deal with traffic noise, wind and rain, and make allowances for poorly positioned microphones.
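CEDAR's exact methods are proprietary, but the general idea of data-driven noise suppression can be illustrated with a generic spectral-subtraction approach: estimate the noise floor from the signal itself, then attenuate each frequency band towards it. The sketch below is a simplified example of that family of techniques, not CEDAR's algorithm; the frame size, gain floor and the use of a separate noise-only clip are assumptions made for the example.

```python
import numpy as np
from scipy.signal import stft, istft

def suppress_noise(audio, sr, noise_clip):
    """Generic spectral-subtraction noise suppression (illustrative only).

    Estimates a per-band noise floor from a noise-only clip, then reduces
    the gain of each STFT bin of the programme audio towards that floor.
    """
    # Short-time Fourier transforms of programme audio and noise sample
    f, t, S = stft(audio, sr, nperseg=1024)
    _, _, N = stft(noise_clip, sr, nperseg=1024)

    # Per-band noise floor: mean magnitude across the noise clip's frames
    noise_floor = np.mean(np.abs(N), axis=1, keepdims=True)

    # Subtract the floor from each frame's magnitude; keep a small minimum
    # gain so bins are never fully muted, which softens "musical noise"
    mag = np.abs(S)
    gain = np.clip((mag - noise_floor) / np.maximum(mag, 1e-9), 0.1, 1.0)

    _, clean = istft(S * gain, sr, nperseg=1024)
    return clean
```

A box like the DNS 8 Live has to do this adaptively and in real time, tracking a changing noise floor without a separate noise-only reference, which is where the statistical modelling Hicks describes comes in.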