I don’t really follow your logic. How else would you propose to shape the audio in a way that is not “just an effect”?
Your analogy to real life does not take into account that the audio source itself is moving, so there is an extra variable beyond the plain stereo signal, which is what spatial audio is modelling.
And your muffling example sounds a bit oversimplified, maybe? My understanding is that the spatial stuff is produced by phase shifting the L and R signals slightly.
Finally, why not go further? “I don’t listen to speaker audio because it’s all just effects and mirages to sound like a real sound, what with only 2^16 discrete positions the diaphragm can be in” :p
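To make the phase-shifting idea above concrete, here is a rough sketch of the simplest version of it: play the same mono signal in both ears, but delay one channel by a fraction of a millisecond. This is only an illustration under my own assumptions (the function name `pan_by_delay`, the 44.1 kHz sample rate and the 26-sample delay are all made up for the example), and real spatial audio does a lot more than this, such as frequency-dependent filtering.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (assumed for this example)


def pan_by_delay(mono, delay_samples):
    """Return (left, right) where the right channel lags by delay_samples.

    Hypothetical helper: the sound reaches the left ear first, which
    listeners tend to hear as a source off to the left.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    pad = [0.0] * delay_samples
    left = list(mono) + pad   # undelayed, padded so both channels match in length
    right = pad + list(mono)  # same signal, shifted later in time
    return left, right


# 10 ms of a 440 Hz tone.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(441)]

# 26 samples at 44.1 kHz is about 0.59 ms, roughly the scale of the
# arrival-time difference between two human ears (an approximate figure).
left, right = pan_by_delay(tone, 26)
```

Both channels carry identical content; the only difference is timing, which is the sense in which it is "just" a phase shift.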
Hmm, not so sure. He produced a digital signal whose spectrogram happened to be an image, and then played that digital signal to a bird. Dunno if an analogue spectrogram really even makes sense as a concept. The only analogue part of the chain would be the bird’s vocalisations, right?
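For what it's worth, here is one way a "digital signal whose spectrogram is an image" could be built. This is a guess at the general technique, not a description of what he actually did: treat each image row as a frequency, each column as a time slice, and add up sine waves weighted by pixel brightness. The function name `image_to_signal` and all the parameter values are assumptions for the sketch.

```python
import math


def image_to_signal(image, sample_rate=8000, column_seconds=0.05,
                    f_low=500.0, f_high=3500.0):
    """Turn a grayscale image into audio samples (hypothetical sketch).

    image: list of at least two rows, top row first, pixel values in 0..1.
    The top row maps to f_high and the bottom row to f_low.
    """
    rows = len(image)
    cols = len(image[0])
    if rows < 2:
        raise ValueError("need at least two rows")
    samples_per_col = round(sample_rate * column_seconds)
    # One sine frequency per image row, spaced evenly from f_high down to f_low.
    freqs = [f_high - (f_high - f_low) * r / (rows - 1) for r in range(rows)]
    signal = []
    for c in range(cols):
        for n in range(samples_per_col):
            t = (c * samples_per_col + n) / sample_rate
            # Brightness of each pixel sets the loudness of that row's sine.
            s = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                    for r in range(rows))
            signal.append(s / rows)  # scale so |sample| never exceeds 1
    return signal


# A 2x2 "image": bright top-left and bottom-right pixels, so a high
# tone in the first time slice followed by a low tone in the second.
diagonal = [[1.0, 0.0],
            [0.0, 1.0]]
samples = image_to_signal(diagonal)
```

Everything in that chain is a list of numbers, so the image only "exists" once you compute a spectrogram of the samples.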