The way I see it, this should not really involve the cloud that much. But for some darn reason Amazon and Philips seem to be interested in how often I turn my lights on and off, and we're paying for it.
How are we paying for it?
Through added complexity and extra points of failure that reduce utility, not to mention the often outrageously priced "cloud"-based hardware and services. The OP had it right.
It's crazy! This should really be just a straight shot from Alexa to the lightbulb.
There are several very good reasons for the cloud-based architecture, none of which have anything at all to do with Amazon, Philips or anyone else being interested in how often you turn your lights on/off... and the fact is that we all actually benefit from that architecture. If it were done the way you propose, then Echos/Dots/Taps would be bulkier, more complicated and significantly more expensive. They also wouldn't be nearly as flexible and capable in terms of their ability to interface with the long (and getting longer all the time) list of HA products that they can work with now (and will work with in the future).
This is all based on a false premise. Alexa uses the "cloud" to process voice, then communicates with supported HA hardware via the local network (usually by HTTP). It finds the hardware using UPnP. The OP had it half right.
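For anyone curious what "finds the hardware using UPnP" looks like on the wire, here is a minimal sketch of the discovery step (SSDP, the discovery part of UPnP) in Python. It is not Amazon's code, just the standard multicast search any client can send. The function names are made up for illustration, and `discover()` assumes you are on a LAN where multicast is allowed.

```python
import socket

# Standard SSDP multicast group and port used by UPnP discovery.
SSDP_ADDR = "239.255.255.250"
SSDP_PORT = 1900


def build_msearch(search_target="ssdp:all", mx=2):
    """Build the M-SEARCH datagram that asks UPnP devices to announce themselves."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",             # max seconds a device may wait before replying
        f"ST: {search_target}",  # which kind of device we are looking for
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_ssdp_response(data):
    """Turn one SSDP reply into a dict of upper-cased header names to values."""
    text = data.decode("utf-8", errors="replace")
    headers = {}
    for line in text.split("\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().upper()] = value.strip()
    return headers


def discover(timeout=3.0, search_target="ssdp:all"):
    """Send one search and collect replies until the timeout; keyed by LOCATION URL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.settimeout(timeout)
    found = {}
    try:
        sock.sendto(build_msearch(search_target), (SSDP_ADDR, SSDP_PORT))
        while True:
            try:
                data, _addr = sock.recvfrom(65507)
            except socket.timeout:
                break
            headers = parse_ssdp_response(data)
            if "LOCATION" in headers:
                found[headers["LOCATION"]] = headers
    finally:
        sock.close()
    return found
```

Calling `discover()` on a home network should list the description URLs of whatever UPnP devices answer. Everything after that point (fetching the description, sending commands) is plain HTTP on the local network.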
I'm a software engineer, and I've been designing, developing, implementing and maintaining complex computer-based systems for ~30 years now. Trust me, doing it the way you propose would be more cumbersome, complicated and expensive...and yield less functionality and flexibility...than the way it's being done now.
Always leery of arguments that hinge on "trust me". Having worked with HA (and X10) for decades, I can assure you that adding the "cloud" to the process of turning lights on and off is hardly a step forward for property owners. It is a step forward for companies looking to sell "modern" HA systems that require use of their services and allow them to collect and sell "big data".
All Amazon needs to do is put a Zigbee radio in with it.
Actually, there's a LOT more involved than simply slapping an additional RF transceiver into the device. A LOT. And what about Z-Wave? Or Insteon? Or HomeKit? Or Thread? Or LightwaveRF? Or devices that only support cloud-based APIs?
This is the only part the OP had entirely wrong. Alexa simply downloads new software to support new hardware. This surely involves little more than a white list of devices that it will accept via UPnP. After that, it's down to sending HTTP directly to the bulbs, switches, plugs, outlets, etc. You can even set up virtual devices that spoof the ones it accepts.
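To make the "white list plus HTTP" idea concrete, here is a rough Python sketch of what that step could look like, if it works the way described above. Everything specific in it is an assumption for illustration: the marker string in `ALLOWED_MARKERS`, the `/api/lights/<id>/state` path and the `{"on": true}` payload are hypothetical, not any vendor's real API.

```python
import json
import urllib.request

# Hypothetical allow list: substrings that must appear in a device's SERVER header.
ALLOWED_MARKERS = ("example-bridge",)


def is_supported(headers):
    """Accept a discovered device only if its SERVER header matches the allow list."""
    server = headers.get("SERVER", "").lower()
    return any(marker in server for marker in ALLOWED_MARKERS)


def build_switch_request(base_url, device_id, on):
    """Build (but do not send) an HTTP request that turns one device on or off.

    The URL path and JSON body are made up for this example.
    """
    url = f"{base_url.rstrip('/')}/api/lights/{device_id}/state"
    body = json.dumps({"on": bool(on)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def switch(base_url, device_id, on, timeout=3.0):
    """Send the request over the local network and return the HTTP status code."""
    request = build_switch_request(base_url, device_id, on)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A "virtual device" in this picture is just a small local program that answers discovery with headers that pass `is_supported()` and then accepts the same kind of HTTP request.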