
Ambisonic Microphones

ajocular Posts: 83
edited December 2015 in Audio Development
I'd love to hear people weigh in on ambisonics. I'm looking to pick up an ambisonic mic pretty soon, but I'd like to know who else is using them for VR already and what you think about them vs. other gear.

My background with 3D audio is more on the omni-binaural side, but lately I've been feeling like I want to capture the sound field neutrally in addition to binaurally. Ambisonic mics make it easier to isolate elements from the sound field and play with them in post.

With binaural, you want to leave the track alone as much as possible, but maybe sometimes I want more verb or a flanger or whatever, and maybe I want it on one object but not the rest of the field.

Anybody have recommendations for the best ambisonic mics out there?

My primary concern is this: if ambisonic mixes aren't panned correctly, they can really gum up spatialization in post. Ambisonics are excellent at isolating unique elements of the sound field, but they're not as good at direction cues as binaural, so I think a lot of people will try to mix the two, and that's where you run into trouble. There are ways to avoid the pitfalls, and I hope we as a community can all help each other get up to speed on the strengths and weaknesses.

I'm worried that if ambisonics become super popular in VR, it will lead to blurry spatialization in lots of VR experiences (unless we heed the pitfalls). Ambisonics are a fascinating concept - you can position the sound anywhere within the volume of a sphere around your head. In VR though, we'll want to stay on the surface of that sphere. You have to crank your X, Y, and Z channels all the way to one side (whichever side you want to isolate).

That limits the versatility of the format because it prevents the mic from providing any of the distance cues it's famous for. It's the graphics equivalent of only being allowed to render at infinity. However, it's the only way to get a binaural plugin properly into the mix later in the chain. Those distance cues are virtually simulated by the hardware anyway, so you may as well simulate them using any number of other methods later in the chain, and your results should be just as good.
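To make that concrete, here's a minimal sketch of the "crank it all the way to one side" idea: encoding a mono track at a single point on the surface of the sphere. This is my own toy code, assuming FuMa-style channel ordering and the -3 dB W weight; it's not any particular tool's API:

```python
import numpy as np

def encode_b_format(mono, azimuth_deg, elevation_deg):
    """Pan a mono signal to one point on the surface of the ambisonic sphere."""
    theta = np.radians(azimuth_deg)    # 0 = front, positive = counter-clockwise
    phi = np.radians(elevation_deg)    # 0 = ear height, positive = up
    w = mono / np.sqrt(2.0)                   # omni component (FuMa -3 dB weight)
    x = mono * np.cos(theta) * np.cos(phi)    # front/back
    y = mono * np.sin(theta) * np.cos(phi)    # left/right
    z = mono * np.sin(phi)                    # up/down
    return np.stack([w, x, y, z])

# Example: a source hard-panned 90 degrees to the left, at ear height.
signal = np.random.randn(48000)  # one second of noise at 48 kHz
b_format = encode_b_format(signal, azimuth_deg=90, elevation_deg=0)
```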

Anyone have experience with this? It's hard to find people who are breaking ground here.

Comments

  • OlivierJT Posts: 212
    Lawnmower Man (or Woman)
    Hello Ajocular,

    So the thing with these microphones is how you render them in your VR experience.
    They make sense for a static head, similar to today's surround sound systems.
    > Move your head and the sound doesn't follow.

    From my experience so far: you need mono audio sources...
    No need for complex mics and binaural things.
    Why?
    The engine will do all that for you:
    -evaluating the sound in relation to you
    -and making it sound right (binaural)

    In today's engines (UE4 is the one I know), to get sound spatialization you need: mono audio.
    If it's stereo, the engine will render it without spatial panning; it will be attached to your head.

    I have been waiting and shouting about the importance of 3D audio since I began my work in VR (2+ years ago), and we are close: Oculus should release an Audio SDK very, very soon.
    Then we'll be able to figure this out exactly...

    It's possible today to get Wwise and FMOD with a 3D audio plugin, but I haven't tried them (making content with them, I mean). Since the Oculus Audio SDK is nearly here, I can't waste my time on things that may end up not being Oculus compatible...

    Don't invest in a complicated microphone yet... Mono audio is very probably all you need!
    And a good pair of stereo headphones, of course.
  • BrianHook Posts: 102 Oculus Staff
    Olivier, you're correct that mono is the way to go for spatialization (especially with head tracking); however, there is still the issue of live VR capture (not CGI) and how to capture (and replace!) those sounds for later head-tracked spatialization.
  • Regarding capturing surround environments, there's an approachable but short paper available here:
    http://www.adrianofarina.it/Files/paper ... a_2012.pdf

    which uses the MH Acoustics "Eigenmike" (http://www.mhacoustics.com/products)

    Personally, I'm not convinced Ambisonics are high-enough order to really capture enough detail to allow good localization
    (in fact, this paper suggests very strongly that you need better-than-Ambisonics-level resolution: http://www.ncbi.nlm.nih.gov/pubmed/23654379).
  • ajocular wrote:
    I'd love to hear people weigh in on ambisonics. I'm looking to pick up an ambisonic mic pretty soon, but I'd like to know who else is using them for VR already and what you think about them vs. other gear.

    Sorry, no practical working experience with the Soundfield microphone (or other ambi systems), but I do have experience delivering surround for broadcast and games. I'm interested in what you intend to record using an ambisonic mic and how you would play that recording back in the context of a game engine.
  • It seems to me that the difference between an A-format mic and something like the Eigen is a lot like the difference between a 360 camera that has 14 lenses vs. one that has 30. As long as there's 100% overlap across the field, then it seems to me like it shouldn't be that big a deal how many mics you use because you can go higher quality, bulkier, and fewer units, or you can go lower quality, smaller, and more units and end up with similar results across the board. Please correct me if it's not fair to assume audio hardware works the same as video in this regard, and please give us some details on that.

    No matter how many mics you include in the array, as long as you have enough to get the whole field with no dead zones (which I think A-format manufacturers would argue is exactly what they do) then the rest of the mics feel like overkill to some of us who aren't hardware gurus. We can't know for ourselves what subjective difference having more mics will make unless we've A/Bed the same field with both an A-format and an Eigen (an experiment I'd very much like to try).

    I imagine the difference in quality would be noticeable, but I find it hard to believe it would create that essence of "undeniably twice as good," which is exactly the essence I get from binaural processing. Full disclosure, I'm obsessed with binaural cues, so much so that I wrote one of the few 3D audio Unity plugins available on the Asset Store, but I get that essence from almost every binaural plugin, not just mine.

    My guess is that going from an A-format mic to an Eigen would give me an essence of stepping the quality up from 95% to 100%. It's in that realm of diminishing returns for the amount of extra hardware needed. I haven't heard an A/B of the same field from one to the other, but I have heard recordings from both and I don't think I'd be able to tell which is which every time in a double-blind test.

    However, price and convenience weigh in as well. Converting A-format to B-format and then hard-panning everything to prep it for binaural processing is a pain, so if the Eigen and others have a simpler process, that's important info for us to know. I don't know how much the Eigen costs, but if it's more than 25% higher than a good A-format mic, that's beyond the realm of "worth it" in my book (unless I'm contracting with someone for whom price is no object) to eke out that last ounce of quality.

    I think the problem is that no matter how high the order is, the goal is accurate spatialization, and binaural processing just takes the ball so much farther down the field than any manipulation of the order. Thus, it's way more important for me to get the binaural cues correct rather than worrying too much about whether the field I captured was a perfect sphere. Sure, I want it to be perfect, but at what cost? Without a limitless budget, it strikes me as a nice-to-have, not a must-have.

    That's especially true because, at the end of the day, the field recording is really just going to be there for an ambience bed. We're all just scratching the surface of pro audio for VR, so most people in this space are looking at this whole-field capture thing as an "all-you-need" solution, but foley and ADR will be coming in to sweeten the deal in a big way very soon.

    We don't have a set of software tools for it yet, so game devs are the only ones who can really take advantage of this early on, but I have no doubt someone out there is building a "Final Cut of VR" app that will allow non-developers to drop foley and ADR in anywhere and sync it to an object's position on the sphere. At that point, the challenge inverts. It'll no longer be "which mic is best at isolating any spot in the sound field and bringing it out?" The challenge will become "which mic can isolate any sound in the field and get rid of it so that we can replace it with foley and/or ADR?"

    I think it's going to become standard practice in VR to do an ambience pass with no dialog in every scene (I've done that on one shoot so far), and then ADR almost EVERYTHING in post. Filmmakers the world over are seeing the words "ADR everything" and cringing. :) If you're a filmmaker reading this, trust me. This bullet can't be avoided. Any other way will result in noticeably lower quality audio. No matter how good these full-field mics are at isolating, they'll never be able to pull a sound completely out of the mix, so if the sound is in the field at all, then it'll be near impossible to dub foley or ADR on top of it.
  • the5souls Posts: 21
    Lawnmower Man (or Woman)
    I love the informative post, ajocular. I just graduated from college with an unrelated degree, but I am very interested in the audio aspects of gaming. Games just never had that... immersive audio I've been searching for. The closest was probably the newer Battlefield and Battlefield: Bad Company games.
  • Happy to share knowledge. We need as many people as possible to be up to speed because the challenges in doing audio well in VR are similar to those on the graphics side. The bar is much higher in VR than anything we've seen before.

    There actually have been people working hard on 3D audio in gaming since the 90s (university research dates all the way back to the 70s). Ambisonic mics actually predate practical HRTFs, but they were originally intended as an alternative to surround sound in film, whereas HRTFs found a home in the PC gaming boom in the 90s.

    The trouble has been (aside from IP battles between the big players) that most 3D audio rendering methods are performance hogs even by today's standards. To render a sound in 3D, you have to interpolate three primary components of a signal: interaural time difference (ITD), interaural level difference (ILD), and spectral color (EQ variations across all bands caused by anatomy). Doing all that interpolation multiple times per frame (every time the physics engine updates) for multiple sound sources at once definitely moves the latency needle even for a souped-up gaming rig.
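    To give a feel for the first two of those components, here's a back-of-envelope toy model (mine, not code from any shipping plugin; the spherical-head ITD formula is Woodworth's classic approximation, and the broadband ILD curve is a crude stand-in for what real frequency-dependent HRTF data provides):

```python
import numpy as np

HEAD_RADIUS = 0.0875    # meters, roughly an average adult head
SPEED_OF_SOUND = 343.0  # m/s

def itd_seconds(azimuth_rad):
    """Woodworth's spherical-head approximation of interaural time difference."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (azimuth_rad + np.sin(azimuth_rad))

def ild_db(azimuth_rad, max_shadow_db=20.0):
    """Crude broadband level difference from head shadowing (real ILD is
    strongly frequency dependent; this ignores the spectral-color cue)."""
    return max_shadow_db * np.sin(azimuth_rad)

# A source 60 degrees to one side, recomputed every physics tick in a real engine:
az = np.radians(60)
print(f"ITD: {itd_seconds(az) * 1e6:.0f} microseconds, ILD: {ild_db(az):.1f} dB")
```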

    The audio people on AAA projects sometimes really want to render everything in 3D, but whichever plugin they choose usually gets the axe from the designer or the project manager in order to meet performance benchmarks. 3D audio has always been viewed by the powers that be as this nice-to-have-but-not-essential thing, but VR is flipping the script on that perspective (we hope).
  • > A-format mic and something like the Eigen is a lot like the difference between a 360 camera that has 14 lenses vs. one that has 30

    Ok, now imagine that each camera only has a single blurry pixel and you're closer to how a microphone captures sound. In this case, you are much better off with 30 than 14 (as long as each microphone has a tight-enough directional pattern).
  • ajocular Posts: 83
    I'm not sure I follow that update to the comparison. Do you mean that the difference in the number of mics has a much larger effect on recognizable fidelity than I implied with the original comparison?

    If so, it's hard for me to get on board with the extremity you're describing. Do you not agree that's an impossible exaggeration for audio? Or was it intended as hyperbole?

    Going from 14 to 30 pixels, the extra fidelity would be immediately visible to a layperson's naked eye and obviously recognized as more than twice as good. On the other hand, I have heard samples from both mics, and the signal certainly does not become 2X better when you switch from one to the other. I don't hear significant quality differentiation, and I think if double-blind tests were done with laypeople, the numbers would back up my perspective.

    Even if the quality were noticeably twice as good, I still don't agree with the premise of optimizing adjacent isolation. The mic patterns on the Eigen would obviously enable better adjacent isolation compared to A-format. I won't argue against that, but it seems to me that's not important for VR because spatializing a signal that has a little adjacent bleed versus one that has a lot of adjacent bleed will yield results with a negligible difference in spatial fidelity post-HRTF.

    The spatial blurriness I mentioned in a previous post comes from ILD and ITD discrepancies which only exist across opposite sides of the sound field. In other words, bleed from one mic to another doesn't matter much if the mics are next to each other because the HRTF takes over and sharpens the spatialization to a controllable point in either case. Bleed from one mic to the one on exactly the opposite side of the field IS a problem, but the remedy for that is correct mixing for both A-format and Eigen. Pan hard to the correct side of the field, and the problem is gone. It's gone for A-format. It's gone for Eigen. Results would be very similar either way as long as spatial filters are applied, and failing to apply spatial filters is never correct in VR except for underscore (which only exists outside the field anyway).

    If there are A/B samples of A-format versus Eigen (with HRTFs applied) that can prove me wrong, I want to hear them. I don't want to shoot down any hardware undeservedly. I want everyone to have a fair shot, but let's have every manufacturer bring out the biggest guns they've got, and let's hear everything apples to apples.
  • Hi AJ,

    You already know my attitude about ambisonics vs. omni-binaural. First off, you'll want to join the sursound mailing list if you're not already a member. That's where the internet's foremost ambisonics experts chat.

    My personal feeling is you are over-thinking the problem. As others have stated, object sound rendered in real-time will yield the best possible results in a game situation when you're concerned with localizing point sources. Ambisonics provides the best possible solution for ambient soundfield capture, and provides the benefit of allowing for customized HRTF processing at playback. In a game, I think an ambisonic environmental field recording would make a fantastic complement to object sound. For VR video of course the field recording is a 1-to-1 analog to the panoramic array, but for high-end productions we are using Dolby Atmos via their new Dolby VR technology.

    At Jaunt we use CoreSound TetraMics. They are inexpensive and made with obsessive attention to detail, sourcing excellent components and undergoing rigorous capsule matching & calibration. Paired with a Tascam DR-680 the whole setup costs around $2k.

    An Eigenmike will set you back at least $22k. BTW the Eigenmike is not an A-format soundfield microphone. True ambisonics arrays either natively capture spherical harmonics (such as the Nimbus/Halliday configuration), or consist of a regular polyhedron, of which the tetrahedral variety (first order) is the only possible solution. MH Acoustics does however provide an Eigenmike-to-B-format (third-order) conversion, but band-limited at 8 kHz due to spatial aliasing caused by the spacing between capsules. It is true that directional cues originate predominantly from lower frequencies, so a solution involving mixed-order ambisonics is probably the way to go when using the Eigenmike.
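    For reference, the idealized first-order A-to-B conversion for a tetrahedral array is just a sum-and-difference matrix. A rough sketch follows (the capsule naming and the 0.5 gain are my assumptions, and real converters also apply frequency-dependent filters to compensate for capsule spacing, which this skips):

```python
import numpy as np

def a_to_b(flu, frd, bld, bru):
    """Idealized tetrahedral A-to-B matrix.

    flu/frd/bld/bru: capsule signals as numpy arrays, named by orientation
    (front-left-up, front-right-down, back-left-down, back-right-up).
    """
    w = 0.5 * (flu + frd + bld + bru)  # omni
    x = 0.5 * (flu + frd - bld - bru)  # front minus back
    y = 0.5 * (flu - frd + bld - bru)  # left minus right
    z = 0.5 * (flu - frd - bld + bru)  # up minus down
    return w, x, y, z

# Example with four dummy capsule tracks:
caps = [np.random.randn(48000) for _ in range(4)]
w, x, y, z = a_to_b(*caps)
```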

    Due to our large volume of content we have developed a proprietary B-format processing pipeline for asset management, A-to-B conversion, and playback. There are off-the-shelf tools available for all of these things, however, including plugins from Blue Ripple Sound, VVAudio, and Harpex. These are best used in the Reaper DAW due to its flexible channel handling.

    If you really want to understand ambisonics you'll be well-served reading Aaron Heller's papers.
  • ajocular Posts: 83
    Reading Heller's work now. I appreciate the link.

    I'm wondering which part I'm overthinking, though. I think that if two different methods of audio localization (ambisonics and HRTFs) inherently prevent each other from working as intended under certain conditions when both are employed within the same signal, we should all be careful to watch out for those conditions. I think there's no such thing as overthinking it. If you do it wrong, you can't hear the cues as well. One reason I want to get my hands on an ambisonic mic is so that we can have some objective examples that demonstrate this. I think we should think about it as much as necessary to ensure the cues are as clear as possible.

    Omni binaural is a whole other can of worms. Y'all will be seeing a new thread on that topic before too long.
  • ajocular Posts: 83
    Also, I had a long conversation with Marc from Dolby at GDC, and he explained their approach for Dolby VR in detail. Not sure if he intended any part of that conversation to stay between us, so I'll try to be as diplomatic as possible.

    In general, I got the impression that the folks at Dolby won't be chiming in on this forum any time soon because their VR solution is intended to be exclusively available to established cinema production companies. They clearly had no desire to get into any pissing contests with competitors over fidelity or performance, which means lowly developers like myself are not allowed to objectively measure their solution against any other options. Curious.

    I told Marc I'll try to refrain from disparaging their contribution to the VR community until I have a chance to see how they measure up, but it seems to me that they wouldn't have anything to hide if their solution really was tops. Their brand is often seen as "the best there is" in audio, and I can't help feeling like that brand is becoming too easy to hide behind. Exclusive back-room deals only help to propagate the "he-said she-said" nonsense that is so prevalent in pro audio. Most of this stuff is objectively measurable, and the quality differences are easily perceptible to laypeople.
  • jlangford Posts: 11
    I'm very much interested in this also. Within the next year, I'll be doing sound design on an open cockpit vehicle simulator and am already planning on placing a tetramic in the location of the virtual driver's head as part of the microphone array when I go to record.

    On the VR side, what is the technical feasibility of transcoding from b-format to binaural in real time to allow dynamic "panning" of the 3D sound via HMD rotational input? Are there any plans to support anything like this?
  • henkSPOOK Posts: 6
    Virtual Boy (or Girl)
    jlangford wrote:
    On the VR side, what is the technical feasibility of transcoding from b-format to binaural in real time to allow dynamic "panning" of the 3D sound via HMD rotational input? Are there any plans to support anything like this?

    --> You can use 3Dception as a Unity plugin for this. First turn your B-format recording into two equal-length stereo tracks (W&X and Y&Z), which can easily be done with Audacity, for example, or any other DAW that supports multichannel audio files. Then you load them into the 3Dception AmbiArray component. If you use a Standard Assets first-person character controller and freeze its x, y, and z position (so it cannot move around but only look around), you have it set up in no time. Works like a charm. You can then also add sound design elements as mono sources. I have done some tests and this works really, really well. The drawback, of course, is that this is only suitable for when you have a character that does not move around, as with 360 film.
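    The core head-tracking trick such tools perform is simple enough to sketch. This is my own minimal version with sign conventions assumed, not 3Dception's actual code: W is rotation-invariant, while X/Y/Z transform like a vector, so compensating a pure yaw turn is one 2x2 rotation:

```python
import numpy as np

def rotate_b_format_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate the sound field so it stays world-locked as the
    listener turns their head (pitch/roll would mix in Z the same way)."""
    a = -head_yaw_rad  # rotate the field opposite to the head movement
    x_r = x * np.cos(a) - y * np.sin(a)
    y_r = x * np.sin(a) + y * np.cos(a)
    return w, x_r, y_r, z  # W and Z are unchanged for pure yaw

# Example: re-"pan" a B-format frame for a 45-degree head turn, every update.
w, x, y, z = (np.random.randn(512) for _ in range(4))
w, x, y, z = rotate_b_format_yaw(w, x, y, z, np.radians(45))
```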
  • jlangford Posts: 11
    henkSPOOK wrote:
    --> You can use 3Dception as a Unity plugin for this. [...]
    I was actually not expecting this to be a thing already! Great, I will do some test recordings and experiment. Thank you.

    Regarding the spatial movement issue, you are right - traditionally I would not think of using this technique for nearfield sound, although I think in this particular scenario all will be ok; there will of course be scope in the simulation for the driver to lean around in the seat to some extent, but in the real world, this type of engine has so much aural volume/presence that such small positional changes *should* have a negligible effect in all but the most extreme cases. That is the theory anyway! Will see how it plays out in practice. If it doesn't work, hey - the B-format recording will be a great base for a fixed speaker surround mix :)
  • ajocular Posts: 83
    TwoBigEars has an ambi input? That's pretty cool. I didn't know that.

    Though it's a pain, you can mix any component signal from your ambi tracks down to mono and then feed the output into any 3D audio plugin for binaural rendering (I covered some of the rigors and pitfalls of that above). This is what the Jaunt guys are doing, if I remember right. Not sure what binaural plugin they use, but my personal favorite is the AstoundSound plugin. TwoBigEars is cool too.

    There is no restriction on motion when you do it this way, as long as you track the component motion from your ambi mix appropriately (which of course can be quite difficult if the source was moving quickly). Alternatively, you could just point a bunch of binaural instances of the field equidistantly in all directions. The more you use, the better it'll sound, but it'll render in 3D better if you're able to follow the exact motion of the source in the mixdown, prior to feeding it into the DSP. Also, you should not do that with more than 6 instances if you're running on a mobile processor.

    I hope someone automates tracking sources and syncing them to video one day for ambisonic mixes, but that's a pie-in-the-sky dream on my wishlist. It would make our lives much easier in post. I've actually spoken with several of the 360 video gurus about this, but most of them have enough battles on their hands with visuals alone right now.

    When I really want to capture all of the unique characteristics of a live physical environment, I often opt for my omni-binaural mic, especially in situations where I'm in control of the sound field (i.e. I can isolate each source). This is also the highest performance option by a factor of 10, so if you need LOTs of instances to render simultaneously, accept no substitute. If you're on Unity3D, you can use my VRSFX plugin to drop in omni-binaural output such that it will automatically track dynamically when you put it in motion digitally. I think my plugin is the only way you can do this with binaural hardware, so if you're using Unreal, I'm sorry. Still haven't had time to port it.

    If for some reason I preferred to use a DSP plugin instead, I probably still wouldn't capture with an ambisonic mic. I'd just use a dynamic mic unless the isolated source couldn't be successfully tracked for some reason. The tetra is great for full-field capture, but that's really the only good use of it. At the end of the day, it's a condenser array that acts like a good chameleon of other patterns; you'd be using it as a dynamic wannabe, so just use a dynamic instead if you can. If the field is out of your control (i.e., a film shoot with dialog and source SFX), go tetra if you need to do a lot of post isolation, filtering, etc. In the studio, there are better options.

    If you go dynamic, as with omni-binaural, you can avoid B-format. Your mono output is ready to go straight off the mic and into your binaural rendering plugin of choice.
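    For anyone curious what that last step looks like under the hood, it's just a convolution of the mono take with a measured head-related impulse response pair for the desired direction. A minimal sketch (mine, not any plugin's API), with dummy HRIRs standing in for whatever dataset your plugin actually ships:

```python
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono source binaurally for one fixed direction.

    mono: 1-D signal; hrir_left/right: measured impulse responses for
    the left and right ear at the desired source direction.
    """
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right])  # shape: 2 x (len(mono) + len(hrir) - 1)

# Dummy 256-tap HRIRs stand in for a real measured pair:
out = binauralize(np.random.randn(48000),
                  np.random.randn(256), np.random.randn(256))
```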
  • JaeT Posts: 1
    ajocular wrote:
    I'd love to hear people weigh in on ambisonics. I'm looking to pick up an ambisonic mic pretty soon, but I'd like to know who else is using them for VR already and what you think about them vs. other gear. [...]

    @ajocular : I have just created an account with the hope of contacting you regarding your posts on this topic via a private message, but the forum isn't letting me do so... If you wouldn't mind, could you please message me or reply to this post with another form of contact? I would really appreciate a moment of your time.

    Many thanks,

    J
  • nosys70 Posts: 384
    Art3mis
    As far as the subject is gaming, I do not see the relation with recording, since most gaming sound comes from prerecorded sources (mostly basic sounds like explosions, squeaks, and laser gun shots).
    Unless you are creating a game that relies heavily on real-life simulation, I doubt you need to fiddle with 3D sound at this level.
  • BonzoDog Posts: 1
    ajocular said:
    I'd love to hear people weigh in on ambisonics. I'm looking to pick up an ambisonic mic pretty soon, but I'd like to know who else is using them for VR already and what you think about them vs. other gear.

    Anybody have recommendations for the best ambisonic mics out there?

    I don't think there is a "best" ambisonic mic.

    I have rented both the Soundfield MK5 and the Core TetraMic from Audio Rents in Burbank, CA.

    The MK5 is a great studio mic, with large-diameter condenser capsules, very flat, very low noise, but it's a large mic, and has all the problems that go along with that.

    The Core TetraMic has small electret capsules, so it has a higher noise floor, and it isn't as flat. But it's great for field work.

    So, if I am going for audiophile quality in a studio, I reach for the MK5. If I am in the field recording ambiences or effects, I grab the TetraMic.
  • I am a musician interested in helping developers create music in the 3D sphere. I have lots of ideas about immersion and the musical knowledge to create something like no one's ever heard before. If you're interested in discussing this or know who I could contact to work in a development office, do let me know.
  • afarina Posts: 2
    Virtual Boy (or Girl)
    I am sorry to say this, but this thread contains really A LOT OF MISINFORMATION...
    I'll try to provide some clarification, but of course one has to study acoustics to really understand the spatial properties of sound fields.
    Let's start with physics. In an acoustical sound field, two physical quantities are involved: sound pressure and particle velocity. Pressure is a scalar quantity and does not carry any "spatial" or "directional" information. A sensor sensitive to sound pressure is omnidirectional.
    Particle velocity, instead, is a vector quantity, and carries the information about the direction of propagation of a sound wave. Hence, a directional microphone is always at least partially sensitive to particle velocity. For example, a cardioid microphone is half sensitive to pressure, half to particle velocity.
    A generic sound field, indeed, is not composed of just ONE wave travelling in ONE direction; it is the superposition of hundreds or thousands of waves, each travelling in a different direction.
    If these sound waves are plane, progressive waves, then for each of them sound pressure and particle velocity are perfectly in phase, with an amplitude ratio (p/v) given by the acoustic impedance of air. But whenever these waves are not plane-progressive, pressure and particle velocity get out of phase, and the ratio of their amplitudes can assume any value.
    When a microphone system has to capture the complete spatial information, so that it can be recreated artificially at playback, it should be able to separate the sound field into all the elementary waves that constitute it. And, for each wave, it should be able to capture separately the sound pressure signal and the three Cartesian components of the particle velocity signal.
    And here comes Ambisonics: in Gerzon's original formulation, an Ambisonics microphone does exactly this, capturing in 4 separate channels the sound pressure and the three Cartesian components of particle velocity. However, it does not separate the single waves creating the whole spatial scene; it just separates the sound pressure from the Cartesian components of velocity "of the whole". This 4-channel signal is called B-format. At least 4 capsules are required for this, but having more than 4 provides better results, in terms of wider frequency range and less noise.
    Indeed, a traditional Ambisonics microphone provides very little spatial resolution, as all the waves are still mixed together. Furthermore, in Ambisonics it is assumed that all these waves are plane waves. Some advanced software does exist, namely Harpex-B, which can process a B-format signal and separate all the plane waves constituting it. Once the elementary waves are separated, Harpex can render them in many ways, providing enhanced spatial resolution, which is not obtainable by traditional linear processing of the B-format signals (a.k.a. Ambisonics decoding).
    But this is not the only possibility for getting better resolution than traditional Ambisonics (which is really poor).
    Here in Parma, ITALY, we are using at least three different alternative approaches:
    1) High Order Ambisonics
    2) Spatial PCM Sampling
    3) 3DVMS (3D Virtual Microphone System)
    High Order Ambisonics is the equivalent, in space, of performing the Fourier analysis of a waveform with a higher number of sinusoids to represent it. In traditional Ambisonics, the spatial distribution of sound is represented by spherical harmonic functions of order 0 (omni) and 1 (figure-of-eight). It is like attempting to emulate a generic waveform with a DC component (order 0) plus a single sinusoid (order 1). Of course the emulation of a complex sound will not be very good if we stop at 1st order...
    But by adding more and more sinusoids, the emulation can become quite faithful. HOA does the same, adding spherical harmonics of order 2, 3, 4, etc... Currently, from a microphone array such as the Eigenmike it is possible to get up to 25 spherical harmonic signals (going up to order 4).
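    Those channel counts follow directly from counting spherical harmonics: a full-sphere ambisonic signal of order N has (N+1)^2 components. A quick check:

```python
def hoa_channel_count(order: int) -> int:
    """Number of spherical harmonic signals for full-sphere order-N ambisonics."""
    return (order + 1) ** 2

for n in range(5):
    print(f"order {n}: {hoa_channel_count(n)} channels")
# order 1 -> 4 (classic B-format), order 2 -> 9, order 4 -> 25 (Eigenmike output)
```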
    Traditional linear decoding of HOA recordings provides much better resolution and localization than traditional 1st-order Ambisonics, without requiring the "parametric" tricks played by Harpex. And we now have a free 3rd-order Ambisonics VR player, supporting 360 video (stereo or mono, up to 4096x4096) and 16-channel 3rd-order Ambisonics. It is called Jump Inspector and it has been released by Google for the Cardboard and Daydream platforms. It works great! On our web site a number of 3rd-order Ambisonics VR videos are available:
    http://www.angelofarina.it/Public/Jump-Videos/
    Facebook also employs HOA, namely 2nd-order Ambisonics, which is the native format employed by Two Big Ears. The FB 360 Spatial Audio Workstation allows for encoding, processing and decoding in their proprietary TBE format, which is in fact a variant of standard 2nd-order Ambisonics.
    But there are other alternatives. Let's talk about SPS (Spatial PCM Sampling). The idea is to provide a PCM representation of the spatial distribution of sound, using a number of equally spaced superdirective microphones sampling the whole sphere. From an Eigenmike, a 32-channel SPS signal (P-format) can easily be obtained, providing better spatial resolution and a wider frequency range than HOA.
    This results in 32 "sound objects" at known values of azimuth and elevation, which can easily be transmitted with formats such as MPEG-H or Dolby Atmos, or any other "object-based" format. These 32 channels are then rendered as 32 virtual sources surrounding the listener in his VR environment.
    And finally, let's talk about 3DVMS. This is a format which was developed by RAI (Italian Radiotelevision) and is being used not just for cinematic VR, but also for normal broadcasting. The idea is to derive from a microphone array (such as the Eigenmike, or the RAI cylindrical microphone array, called CMA-32) a small number of significant sound tracks, obtained by pointing at each sound source a very directive virtual microphone, which follows the source when it moves, also recording its angular coordinates. Currently the RAI 3DVMS app manages up to 7 virtual microphones in real time, and the source tracking can be manual or automatic (with face recognition, for example). The RAI 3DVMS system makes use of a panoramic video camera mounted over the microphone array for capturing the video scene, allowing the operator to "see" where to point the microphones manually, or to use the automatic face-tracking algorithm.
    These 7 microphone signals are then transmitted, together with the angular metadata, and rendered at the receiver using methods such as VBAP, Dolby Atmos, or virtual High Order Ambisonics (the latter is particularly efficient for head-tracked reproduction, as when using a VR visor).
    So it is really a pity that, with all these powerful possibilities, most people are still creating panoramic videos with first-generation spatial audio such as the crappy "quad binaural", or the basic 1st-order Ambisonics supported by YouTube. We can do better (even when starting from panned mono tracks, or from a traditional B-format recording).
    At the very minimum, guys, please use the Facebook Spatial Workstation; you will see how 2nd-order Ambisonics is definitely worth the increase in channel count (from 4 to 8)...
    Of course, for recording an 8-channel TBE soundtrack you need to use a microphone array with at least 8 capsules, or to expand the spatial information of a traditional 4-channel B-format soundtrack using Harpex.