It's funny how quickly our expectations of speech recognition as a consumer technology have climbed so high. Just a few years ago the idea of controlling your home, retrieving all manner of useful information, initiating entertainment, etc., simply by speaking to a machine...and having it speak back (in a pleasant, natural-sounding voice, no less) was still mostly perceived as the stuff of science fiction. That's because analyzing human vocalizations in order to recognize words is a complex and computationally intensive process. Until fairly recently the state of the art consisted of the ability to recognize individual spoken words for fairly simple purposes, like converting them to text for dictation, or combining one or two of them into rudimentary software control commands (e.g., "Save file", "Launch Excel"). And that required a fair amount of training before the software could reliably recognize a single person's voice, pronunciation and inflection. Even then, relatively minor variations in the way you said the same words could completely befuddle the software.
Now we have cell phones, automotive head units and household appliances that come from the factory able to accurately and reliably recognize words spoken by just about anyone who is capable of relatively clear enunciation. And not only that, they layer on top of that the ability to make sense of combinations of words, in the form of phrases and even whole sentences spoken in a natural, casual and sometimes even grammatically incorrect manner...and THEN do something useful (or at least entertaining) with it.
I like to remember this from time to time...not only because it's nice to recognize how far this stuff has come, but also because it keeps me from becoming frustrated on those occasions when things don't immediately...or maybe ever...work like I think they should.