Just a general question really.
Does anyone know why it seems so hard for a computing device to hear a keyword over, or through, other sound or music?
Is it down to physical things like microphone quality, or to the fact that, unlike the human brain, a computer is poor at telling different types of sounds apart?
I mean, you could be listening to, say, a pop tune, quite LOUD, and if your wife speaks to you, you still notice her and can probably hear her, because in your brain her voice stays quite separate from the music, even though the music may be louder.
It's all just pressure waves through the air hitting your eardrums, of course, yet we seem able to easily pick out and focus on sound from one source even when, in theory, it's muddled up with other sounds.
Perhaps it's that we understand the other sounds, while a computer cannot?
If I hear, say, a Queen track, I know it's a Queen track: the sound of the instruments, the beat, the rhythm and so on.
If someone behind you says your name, you instantly turn, because that sound isn't part of the music.
Perhaps that's the biggest problem to solve: just as working out which word a speaker meant requires understanding the sentence and its context, picking a keyword out of other sounds may be a similarly hard thing to crack. The mic and computer are just receiving sound waves without understanding them, so how do they detect something that isn't part of what's coming out of the speakers or of the other noise in the room?
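For the device's own music, at least, one well-known trick is acoustic echo cancellation: the device already knows the exact signal it is sending to its speakers, so it can adaptively estimate how that signal arrives back at the microphone and subtract it, leaving the voice behind. Here is a minimal sketch of that idea in Python (the function name, parameters and toy signals are all my own made-up example, not how any particular product actually does it):

```python
import numpy as np

def nlms_echo_cancel(mic, ref, taps=128, mu=0.5, eps=1e-6):
    """Subtract an adaptively filtered copy of ref (the known playback signal)
    from mic (what the microphone hears); the residual is mostly the voice."""
    w = np.zeros(taps)                      # adaptive filter coefficients
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = ref[n - taps + 1:n + 1][::-1]   # latest playback samples, newest first
        echo_est = w @ x                    # estimate of the music echo at the mic
        e = mic[n] - echo_est               # residual = mic minus estimated echo
        w += mu * e * x / (x @ x + eps)     # NLMS coefficient update
        out[n] = e
    return out

# Toy demo: "music" plus a quieter "voice" arriving at the same mic.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
music = rng.standard_normal(fs)                   # stand-in for the loud pop tune
voice = 0.3 * np.sin(2 * np.pi * 220 * t)         # stand-in for the quieter voice
echo = np.convolve(music, [0.6, 0.3, 0.1])[:fs]   # crude "room" echo of the music
mic = echo + voice

cleaned = nlms_echo_cancel(mic, music)
print("music power at the mic: ", np.mean(echo[128:] ** 2))
print("music power after NLMS: ", np.mean((cleaned[128:] - voice[128:]) ** 2))
```

That only helps with the sound the device itself is playing, of course. A stranger's voice, a TV, or anything else in the room has no "reference" signal to subtract, which is presumably why that part of the problem stays hard.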