What would a video game be without sound effects heightening the immersion, or a score letting the player know when something dreadful or exciting is about to happen? The answer: pretty flat and not very entertaining. Video game sound design is a field that isn’t explored very often, but it plays an integral part in the gamer’s experience. With continued advances in VR (virtual reality) and with platforms such as Amazon Alexa becoming more lifelike as technology improves, the emphasis on music and sound design to make these experiences as real as possible has become even more crucial. We spoke with an expert in video game audio, Richard Ludlow (founder of Hexany Audio), to learn how the creative process for this field works.
Can you tell us a little bit about Hexany Audio and what the company specializes in? How is Hexany different from other game audio companies?
At Hexany we specialize in audio production for video games, VR, theme parks, and interactive experiences. A team of eight sound designers and composers, we produce original music, sound design, and voiceover [VO] for a wide variety of interactive projects.
Why does Hexany Audio solely focus on video games and not film or TV?
While games are absolutely the core of our business, game engines and similar technologies are now being used to produce a much wider variety of experiences, so we’ve found our own work expanding to these new mediums as well. Sometimes this is interactive VR, theme park attractions, Amazon Alexa skills, etc. But underneath all of these, like games, most of the projects we work on are interactive entertainment.
There are a few reasons for this – first off, we’re all avid gamers at the studio and it’s simply one of our favorite mediums to consume as well as produce for. But at our core, we’re a studio that is focused around integrating technology into our work. We aren’t just an audio post house – we have an in-house Technical Sound Designer / Audio Programmer (Nick Tomassetti) who has developed some amazing tech for us, and my business partner and our Lead Composer (Matthew Carl Earl) is frequently working with audio middleware solutions like Wwise to not only compose music but also design adaptive music systems for our projects.
All of our team works heavily with technology – it allows us to be more creative and tackle new challenges in ways that traditional linear mediums like film don’t. Don’t get me wrong – I absolutely love film and TV, but we find ourselves getting to do a lot more R&D and experimentation in emerging interactive mediums.
When you are tasked with developing a game’s sound effects, voiceover work, and score, how do you decide what order to do things? What would normally be the order and is it important for one angle to get completed before doing another?
That’s a great question and something that often actually depends on the visuals. Most projects we work on start off with way too much dialogue written for them. The dialogue writers pack the game with too much voiceover, thinking the players need to be told the entire story rather than figuring things out and experiencing it through the rest of the audio as well. By the end, that’s usually resolved, but I’m always advocating that temporary voiceover go in as soon as possible so we can start getting a feel for how it plays. So in a way, I think getting temporary VO in sooner rather than later is critical – but it’s also one of the things that should have the final takes with the final actors recorded toward the end, since it will be iterated on the most during development. Of course, depending on the experience and how dependent other portions are on voiceover performance timings, recording earlier can also be advantageous.
The other thing it’s nice to start early on is finding the musical character and style for a project. This is something we can do pretty early on, but again it’s then nice to wait until things are more fully developed with at least concept art before diving deep into the composing process. The one nice thing about games vs. film though is that the music composition process can absolutely come earlier on because it’s (usually) not as dependent on picture sync. It’s more about numerous granular tracks and layers that all then come together based on player interactions.
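As a rough illustration of how "granular tracks and layers" can come together based on player interaction, here is a minimal Python sketch. It is not Hexany's system or the Wwise API; the `Layer` class, the `layer_gains` function, the stem names, and the single 0.0 to 1.0 "intensity" value are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    fade_in_start: float  # intensity at which the layer starts to be heard
    fade_in_end: float    # intensity at which the layer reaches full volume


def layer_gains(layers, intensity):
    """Return {layer name: gain in 0.0-1.0} for a 0.0-1.0 intensity value."""
    intensity = max(0.0, min(1.0, intensity))
    gains = {}
    for layer in layers:
        span = layer.fade_in_end - layer.fade_in_start
        if span <= 0:
            # No fade range: the layer simply switches on at its threshold.
            gain = 1.0 if intensity >= layer.fade_in_start else 0.0
        else:
            # Linear fade between the two thresholds.
            gain = (intensity - layer.fade_in_start) / span
        gains[layer.name] = max(0.0, min(1.0, gain))
    return gains


# Hypothetical stems for one music cue (names are illustrative only).
STEMS = [
    Layer("pads", 0.0, 0.0),          # always playing
    Layer("percussion", 0.25, 0.75),  # fades in as the action picks up
    Layer("brass", 0.5, 1.0),         # only at high intensity
]

if __name__ == "__main__":
    for value in (0.0, 0.5, 1.0):
        print(value, layer_gains(STEMS, value))
```

Because every stem is the same length and plays in parallel, only the gains change as the game state changes, which is one reason this kind of score does not depend on picture sync.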
While you can absolutely start stubbing in temp sound effects early on to get a feel for things and an idea of potential future memory budget issues, I think this is generally one of the last things that should be tackled. Animations are always changing, and it becomes an extreme waste of time to redo sounds numerous times as animation timings and particle effects are iterated on.
When first starting a project do you come up with an initial palette of sounds to draw from? How do you decide what the musical vibe is going to be?
This is a little bit different for every project, but it always starts with a conversation with the developer. Finding out what they love, what their sources of inspiration are, what the lore or story of their game is, etc. These all play a factor, and then usually we’ll come up with a concept pitch for the music or sound. We’ll sometimes create a soundscape or design the audio for a small slice of the game and make sure we’re all on the same page.
Then from there, on projects where we have full creative rein (as opposed to an existing IP), we often like to think about three guiding keywords. We try to distill the mood and emotion that the audio should embody into three keywords, and ask ourselves as we’re designing and producing content: is everything we’re making true to the ideas and emotions that we’ve established through these keywords?
Amazon Alexa, Google Home, and Apple HomePod are a few of the new emerging platforms on the market right now. On platforms like these, where not much is known about them before they are released, what sort of research do you do to find out formatting, etc.?
New mediums like this are cropping up more and more, and these 100% audio experiences are especially challenging. Amazon Alexa (at the time of this writing) is by far the most robust, with numerous ‘skills’ (games or applications). We were actually brought on for our first Alexa game by the company Ground Control, which has produced a number of very well crafted games for the platform and is absolutely a leader in this new medium.
They’ve approached us to handle the music, sound, and VO production for a number of their games, and we’ve slowly been figuring out how to best approach this new platform. A lot of it has been trial and error. It’s almost like radio, where you’re telling little stories with sonic landscapes that play out little sonic scenes. If it’s a basketball game you’ll hear music, crowds, the announcer, the sounds of the players traveling up the court, making a dunk, etc.
But like with all games, we’ll often design one of these scenes, put it in game, and realize that it just doesn’t feel good – things take too long to respond or don’t feel natural, etc. So there is definitely some back and forth once you’re actually able to hear how things are working on the device and responding to the user’s own voice.
When working on a specific game, is your approach different if it is for Amazon Alexa, Google Home, or Apple HomePod?
For this new medium as a whole, absolutely. Part of it is that audio scenes we design need to be very short, quick, and digestible. But there’s also some inherent lag in the devices depending on a user’s connection speed. Or if you are disconnected from WiFi you have to take all of that into account. Ground Control had the brilliant idea to always have our voice talent record funny error messages for these moments – so if things aren’t working quite right you still get a laugh and it doesn’t take you out of the experience too much. It’s not just Alexa telling you she’s having technical difficulties.
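The idea of recorded error messages can be sketched as a simple lookup with a fallback. This is only an illustration in Python: the failure categories, the clip file names, and the `pick_error_clip` function are hypothetical and are not taken from Ground Control's games or from any Alexa API.

```python
import random

# Hypothetical pre-recorded clips, grouped by an assumed failure category.
ERROR_CLIPS = {
    "timeout": ["timeout_joke_01.mp3", "timeout_joke_02.mp3"],
    "offline": ["offline_joke_01.mp3"],
}

# Hypothetical catch-all clip used when no category-specific take exists.
GENERIC_CLIP = "generic_error.mp3"


def pick_error_clip(reason, rng=random):
    """Pick a recorded in-character clip for a failure, or the generic one."""
    clips = ERROR_CLIPS.get(reason)
    if not clips:
        return GENERIC_CLIP
    return rng.choice(clips)


if __name__ == "__main__":
    print(pick_error_clip("timeout"))
    print(pick_error_clip("something_unexpected"))
```

Choosing at random among several takes keeps a repeated failure from playing the same line every time.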
As far as differences between the platforms, one thing that is immediately apparent is the difference in audio quality. Amazon’s Echo is beautiful because it’s small, highly portable, and extremely affordable. It’s very accessible, which is what has made it so popular. But audio streams at a very, very low bitrate, and if you’re using the Echo Dot it’s almost like a mobile speaker. This tradeoff for the size, portability, and flexibility doesn’t bother me as a consumer; in fact, I prefer it. But it means that we have to be very conscious of how we’re mixing things for the platform. Things need to be crisp and clear, and we always have to check how things are going to sound once converted. Our volume spec also has to match Alexa’s own voice volume, which is key, so the volume spec varies between the platforms as well.
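To make the volume-matching point concrete, here is a small Python sketch that computes the gain needed to bring a clip to a target level. It uses plain RMS level as a stand-in; production loudness specs are usually based on perceptual measures such as LUFS (ITU-R BS.1770), and the actual target for any given platform is not stated in this interview. The function names and the sample values are assumptions for illustration.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples (-1.0 to 1.0) in dB relative to full scale."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def gain_to_match(samples, target_dbfs):
    """Linear gain that moves the clip's RMS level to target_dbfs."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return 1.0  # silence: nothing to scale
    return 10.0 ** ((target_dbfs - current) / 20.0)


if __name__ == "__main__":
    # Toy data: a "reference voice" clip and a louder sound effect.
    reference_voice = [0.25, -0.25, 0.25, -0.25]
    sound_effect = [0.5, -0.5, 0.5, -0.5]
    gain = gain_to_match(sound_effect, rms_dbfs(reference_voice))
    matched = [s * gain for s in sound_effect]
    print(round(gain, 3), round(rms_dbfs(matched), 2))
```

In this toy case the effect is twice the amplitude of the reference, so the computed gain halves it.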
Also, at launch the HomePod doesn’t offer experiences like these the way Alexa does, so we’ve been focused on Alexa while still making sure we retain future-proof, platform-agnostic files in case we do develop for these additional platforms later.
There was a recent announcement that Alexa is going to be integrated in Sonos speakers, Garmin navigation, and BMW vehicles. Does this affect your job at all in terms of precision?
As of now, we haven’t really been focused on these new emerging platforms. We’ve definitely worked knowing that we’ll have to support future speaker types with varying audio quality as these platforms mature, so the biggest part of planning for a future like that is asset organization. These games have a lot of assets, and making sure you have archives every step of the way so you can fork your project for different SKUs (or platforms) is very important when we need to go back and revisit a project.
I think making things come together naturally with the flow of all the different sounds and voices happening in the game was one of the biggest challenges. You have Alexa herself speaking some of the time, Buster Posey narrating other portions, another sports announcer we recorded taking you through the blow-by-blow of the game, all paired with music underscore, in-game source music, and sound effects happening… all one right after the other (or at the same time).
That’s three separate narrators, and frequently all three will happen within the span of a single interaction. These interactions need to be quick and to the point to feel good, so finding how to glue it all together with the right timings is probably the most challenging. People just don’t want to listen to things for long stretches of time, so you have to work within that attention span constraint.
You have worked on Buzzer Beater Basketball Trivia, Full Count Baseball Trivia, and Fourth Down Trivia, which are all sports games, but also on an upcoming Gordon Ramsay game, which is completely different. How different is working on sports games from working on, let’s say, a cooking game?
It’s definitely a very different experience between the genres. In the sports games, we’ve got announcers, sound effects for the game, in-game music like chants, and background score, much like what you might hear watching a game live on television. But you don’t always need all of those things in every game. For Gordon Ramsay’s skill, I think Ground Control wanted things to be as realistic as possible; it’s supposed to feel like Gordon is in the kitchen with you, and music and other sound effects take you out of that experience and make it a bit less personal. So Gordon’s game is 100% his voice; we had a blast working with him and recording him for the project, and I think ultimately the experience is quite a hoot!
You can learn more about Hexany Audio here: https://hexanyaudio.com/.