How Alexa Skills and Google Actions Work

Hey there! I’m writing this article to give you an overview of how Alexa Skills and Google Actions work. The diagram above shows the flow, with each step numbered. I’ll describe each step in detail below.

1. User makes a request using her voice

The user makes a request to the Amazon Echo (or Google Home) device using her voice. These devices are equipped with Far-Field Voice Input Processing (FFVIP), which lets them pick up speech from across a room. Examples of requests include:

  • Asking for information (weather, travel schedules, etc.)
  • Instructing a smart home device to do something

Users can also make requests from other devices that are compatible with Amazon Alexa or Google Assistant, such as smart home devices, tablets, and smartphones.

2. Amazon/Google device sends request to Amazon/Google cloud

Then the device sends the request to the Amazon (or Google) cloud. This is depicted as a blue cloud in the diagram – and is where most of the magic happens!

The Amazon (and Google) cloud uses technologies such as Automatic Speech Recognition (ASR), which converts speech to text, Natural Language Understanding (NLU), which works out what the user wants, and machine learning. This lets the user interact with the device using her voice and natural language (i.e., ordinary spoken language, just like talking to another person).
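To give a feel for what the NLU step produces, here is a toy sketch. It assumes ASR has already turned the audio into text, and it uses simple regular expressions in place of the trained models that the real platforms use. The intent names (`GetWeatherIntent`, `TurnOnDeviceIntent`) and slot names (`city`, `device`) are made up for illustration.

```python
import re

# Hypothetical patterns for two made-up intents. Real platforms learn from
# sample utterances with statistical models instead of matching regexes.
INTENT_PATTERNS = {
    "GetWeatherIntent": re.compile(r"what(?:'s| is) the weather in (?P<city>[a-z ]+)"),
    "TurnOnDeviceIntent": re.compile(r"turn on the (?P<device>[a-z ]+)"),
}


def understand(transcript: str) -> dict:
    """Map transcribed text to an intent name plus slot values (toy NLU)."""
    # Normalize case and drop surrounding whitespace and trailing punctuation.
    text = transcript.lower().strip().rstrip("?.!")
    for intent_name, pattern in INTENT_PATTERNS.items():
        match = pattern.fullmatch(text)
        if match:
            return {"intent": intent_name, "slots": match.groupdict()}
    # Nothing matched: report a fallback so the caller can ask the user to rephrase.
    return {"intent": "FallbackIntent", "slots": {}}


if __name__ == "__main__":
    print(understand("What's the weather in Seattle?"))
    print(understand("Turn on the kitchen lamp"))
```

The key idea is the output shape: free-form speech goes in, and a structured "intent plus slots" result comes out for the next step to use.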

3. Amazon/Google cloud sends data to Skill Developer cloud

The Amazon/Google cloud processes the user’s request and sends the relevant data to the Skill Developer’s cloud.

This cloud (depicted as a yellow/golden cloud in the diagram) runs the code written by the Skill Developer to implement the functionality specific to their skill. This code is usually hosted on a public cloud (such as AWS) or in the developer’s own data center.

When you develop an Alexa Skill or Google Action, most of your time is spent developing this code and deploying it to your cloud.
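To make step 3 more concrete, here is a minimal sketch of the first thing the developer's code typically does: read the incoming data. It assumes an Alexa-style JSON body with a `request` object that has a `type` and, for intent requests, an `intent` with a `name` and `slots`. The sample payload is trimmed down, and the intent `GetWeatherIntent` and slot `city` are hypothetical. Google Actions use a different payload, so treat this as an illustration rather than a reference.

```python
import json

# Trimmed, illustrative request body; real requests carry more fields
# (session, context, timestamps, and so on).
SAMPLE_REQUEST = json.loads("""
{
  "version": "1.0",
  "request": {
    "type": "IntentRequest",
    "intent": {
      "name": "GetWeatherIntent",
      "slots": {"city": {"name": "city", "value": "Seattle"}}
    }
  }
}
""")


def parse_request(event: dict) -> tuple:
    """Return (intent_name, slot_values) from an Alexa-style request body."""
    request = event["request"]
    if request["type"] != "IntentRequest":
        # Other request types (for example a launch request) carry no intent,
        # so report the request type with empty slots.
        return request["type"], {}
    intent = request["intent"]
    slots = {name: slot.get("value") for name, slot in intent.get("slots", {}).items()}
    return intent["name"], slots


if __name__ == "__main__":
    print(parse_request(SAMPLE_REQUEST))
```

Notice that the skill code never sees audio. It only receives the structured result of the processing done in step 2.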

4. Skill Developer cloud sends response to Amazon/Google cloud

The Skill Developer’s cloud then processes the data received from the Amazon/Google cloud and sends a response back. The response contains the data or information specific to the skill.
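As a rough sketch of step 4, the code below builds a response in the Alexa-style JSON shape, with an `outputSpeech` object and a `shouldEndSession` flag. The helper names (`build_response`, `handle_weather`) and the hard-coded forecast table are my own assumptions for illustration; a real skill would call a weather service, and a Google Action would return a different structure.

```python
def build_response(speech_text: str, end_session: bool = True) -> dict:
    """Wrap plain text in an Alexa-style response body."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def handle_weather(city: str) -> dict:
    """Hypothetical skill logic: look up a forecast and phrase the answer."""
    # Stand-in data; a real skill would query a weather service here.
    forecast = {"Seattle": "rainy", "Phoenix": "sunny"}.get(city, "unknown")
    return build_response(f"The weather in {city} is {forecast}.")


if __name__ == "__main__":
    print(handle_weather("Seattle"))
```

The skill returns text, not audio. Turning that text into speech is left to the Amazon/Google cloud.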

5. Amazon/Google cloud sends response to Amazon/Google device

The Amazon/Google cloud takes the data/information from the Skill Developer’s cloud and converts it into a format that is compatible with the Amazon Echo (or Google Home) device. Then the cloud sends this information to the device.

This information can include audio that will be played by the device to the user, visual information to display, or commands to issue to a smart home device (such as a lamp or a printer).

6. Amazon/Google device shares response with user (or smart home device)

The Amazon Echo (or Google Home) then shares the response with the user by playing back the audio (and displaying visual information, if applicable).

Alternatively, the Amazon Echo (or Google Home) causes a smart home device (such as a lamp or a printer) to take a specific action in response to the user’s request.
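Steps 5 and 6 can be pictured with a small simulation. The payload format below (`audio`, `display`, and `command` keys) is entirely made up for this sketch and is not the real protocol between the cloud and the device. It only illustrates the three kinds of output described above.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Stand-in for an Echo/Home device; it records what it would do."""
    log: list = field(default_factory=list)

    def deliver(self, payload: dict) -> None:
        """Act on each kind of output present in the (hypothetical) payload."""
        if "audio" in payload:
            self.log.append(f"play: {payload['audio']}")
        if "display" in payload:
            self.log.append(f"show: {payload['display']}")
        if "command" in payload:
            cmd = payload["command"]
            self.log.append(f"send '{cmd['action']}' to {cmd['target']}")


if __name__ == "__main__":
    device = Device()
    device.deliver({
        "audio": "Turning on the lamp.",
        "command": {"target": "lamp", "action": "turn_on"},
    })
    print(device.log)
```

A single response can mix outputs, for example a spoken confirmation together with a command for a lamp.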

I hope you found this high-level overview helpful. Post a comment if you have any questions!


About the Author:

Hey, I'm Michael, and I live in California. I've worked in the tech industry for about 20 years, mostly in Silicon Valley. My other interests include video games, sci-fi, and hot sauces! Thanks for stopping by.
