Building a Custom Alexa Skill — Order Me a Coffee

Part 1: The complete solution with the challenges faced

Echo Dot — An Alexa-enabled device

This is going to be a multi-part series where I will be sharing my experiences of tackling challenges like:

  1. Making a multilingual skill that supports locales across different languages, in case you are thinking of extending your skill’s customer base
  2. Using live location for reverse geocoding to make a location-sensitive skill
  3. Resolving latency issues for an improved user experience, and much more

This part is going to be all about the base architecture for a custom Alexa skill.

Well, let’s dive in. Enjoy!

Let’s start with the basics: what is an Alexa skill?

Just as there are applications in the Google Play Store, there are skills available in the Alexa skill store, specific to an Alexa device’s region. Enabling these skills lets your device perform additional sets of operations through voice commands. That’s it.

For example, assume there is a skill named “Coffee Toffee” in my region’s Alexa skill store. I can go to the skill, enable it, and check out how it can be invoked and used. It can be invoked by saying, “Alexa, is there any coffee toffee store nearby?” and you can order a coffee on the go by saying, “Order a cappuccino.”

Here are the steps you can follow to create your own Alexa skill (we’re going to do the same):

  1. Plan and design your skill
  2. Set up the skill in the developer console
  3. Use the voice design to build your interaction model
  4. Code and test your skill
  5. Beta test your skill
  6. Submit your skill for certification

First, determine the value proposition for your skill: what value will your skill provide customers? Our skill is going to be “coffee toffee,” so we are going to help our customers order their favorite coffee on the go.

Then, heading straight to design, create a voice user interface. This maps out how your users will interact with your skill.

To get started, create a new skill in the developer console. I am using the following:

  • “Coffee toffee” as my skill name
  • Primary locale English (IN)
  • Custom model for my skill
  • Alexa-hosted (Python) resources for my skill’s backend resources
  • Hosting region EU (Ireland)
  • No template (start from scratch)

You may face problems when going with Alexa-hosted resources, since they are limited. I would suggest creating your own AWS account and using it for the backend resources, since that gives you much more flexibility and as many resources as you want.

The interaction model refers to your invocation name, collection of intents, sample utterances, and slots.

  • Invocation name — The name you use to invoke the skill in Alexa, like coffee toffee
  • Intents — The requests your skill can handle, like ordering a coffee drink
  • Sample utterances — The words and phrases users can say to interact with an intent, like “I would like to order a cappuccino,” “Order a cappuccino for me,” “A cappuccino for me,” and so on
  • Slots — Your intent can optionally have arguments called slots. For instance, across “A cappuccino for me,” “An espresso for me,” and “A cafe latte for me,” the coffee drink is an argument, i.e., a slot, in this particular intent.

I am using the developer console (Build → Interaction Model → Intents) and adding the intent, sample utterances, and slots in curly braces {}. For the slot type, you can use a pre-defined type like AMAZON.Person, which matches a person’s name, or create a custom one like we are doing: define a new slot type under Slot Types and then assign it to the slot in the intent.

Adding Intent, sample utterances, and slots
Defining a custom slot and its values
Specifying a slot and its corresponding type

You can head over to Interaction Model → JSON Editor to check the generated JSON for your corresponding voice interaction model.

JSON Editor
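As a rough sketch, the generated JSON for our model might look like the following; the intent name OrderCoffeeIntent and the slot type name CoffeeDrinkType are illustrative, and the built-in intents Alexa adds (cancel, help, stop, and so on) are omitted for brevity:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "coffee toffee",
      "intents": [
        {
          "name": "OrderCoffeeIntent",
          "slots": [
            { "name": "coffeedrink", "type": "CoffeeDrinkType" }
          ],
          "samples": [
            "I would like to order a {coffeedrink}",
            "Order a {coffeedrink} for me",
            "A {coffeedrink} for me"
          ]
        }
      ],
      "types": [
        {
          "name": "CoffeeDrinkType",
          "values": [
            { "name": { "value": "cappuccino" } },
            { "name": { "value": "espresso" } },
            { "name": { "value": "cafe latte" } }
          ]
        }
      ]
    }
  }
}
```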

You can use the JSON editor, the developer console, or the CLI to create your interaction model; it all depends on what is convenient for you.

Now let’s head to the Code section in the developer console.

We are provided an AWS Lambda function (an Amazon Web Services offering), which lets us run code in the cloud without managing servers.

Your primary coding task for your skill is to create a service that can accept requests from the Alexa service and send back responses, which we achieve with the help of classes.

We create a class for every type of request we can receive from the Alexa service, depending on the intents we have specified for our skill. A few classes also come predefined for the launch request and the built-in intents supporting our skill.

Each class has two main functions, namely:

  1. can_handle — which checks whether the incoming request matches the intent the class is defined for
  2. handle — which uses the request and its attributes sent from the Alexa service to build an appropriate response for the user

For our order coffee intent handler, the flow is as follows:

  1. can_handle checks whether the request is for the intent the class is defined for, i.e., the order coffee intent, using is_intent_name
  2. Then, in handle, it gets the slot value for the “coffeedrink” slot using handler_input.request_envelope.request.intent.slots['coffeedrink'].value
  3. Using that value from the user input, it builds the output for the user and simultaneously ends this particular session of the skill via handler_input.response_builder.speak(speak_output).set_should_end_session(True)
  4. Finally, it returns the built response

Here’s the code:
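Treat this as a minimal sketch with the ASK SDK for Python: the intent name OrderCoffeeIntent and the response wording are assumptions, while the coffeedrink slot matches the utterances above.

```python
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_core.utils import is_intent_name
from ask_sdk_model import Response


class OrderCoffeeIntentHandler(AbstractRequestHandler):
    """Handler for the order coffee intent."""

    def can_handle(self, handler_input):
        # type: (HandlerInput) -> bool
        # Fire only for the intent this class is defined for
        return is_intent_name("OrderCoffeeIntent")(handler_input)

    def handle(self, handler_input):
        # type: (HandlerInput) -> Response
        # Pull the spoken slot value, e.g. "cappuccino"
        coffee_drink = (
            handler_input.request_envelope.request.intent
            .slots["coffeedrink"].value
        )

        speak_output = "Your {} has been ordered. Enjoy!".format(coffee_drink)

        # Speak the output and end this session of the skill
        return (
            handler_input.response_builder
            .speak(speak_output)
            .set_should_end_session(True)
            .response
        )
```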

If you’re wondering how we are accessing these values, you can look at the sample request being sent by the Alexa service. Check out how we reach the slot value by following request.intent.slots['coffeedrink'].value through the JSON request.

Here’s more code:
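For reference, here is an abridged sketch of such an IntentRequest; a real request also carries version, session, and context objects, plus fields like requestId and timestamp:

```json
{
  "request": {
    "type": "IntentRequest",
    "locale": "en-IN",
    "intent": {
      "name": "OrderCoffeeIntent",
      "confirmationStatus": "NONE",
      "slots": {
        "coffeedrink": {
          "name": "coffeedrink",
          "value": "cappuccino",
          "confirmationStatus": "NONE"
        }
      }
    }
  }
}
```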

Then we add every request handler to the skill at the end; that’s very important. I have written the code for my own intent, but I haven’t touched the built-in intents.

AWS Lambda code
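The registration step might look like the following sketch; OrderCoffeeIntentHandler is the class from above, and the other handler class names are assumed from the predefined classes in the Alexa-hosted Python template:

```python
from ask_sdk_core.skill_builder import SkillBuilder

sb = SkillBuilder()

# Handlers are tried in registration order: the first one whose
# can_handle returns True handles the request.
sb.add_request_handler(LaunchRequestHandler())        # predefined in the template
sb.add_request_handler(OrderCoffeeIntentHandler())    # our custom handler from above
sb.add_request_handler(HelpIntentHandler())           # built-in AMAZON.HelpIntent
sb.add_request_handler(CancelOrStopIntentHandler())   # built-in cancel/stop intents
sb.add_request_handler(SessionEndedRequestHandler())  # session cleanup
sb.add_exception_handler(CatchAllExceptionHandler())  # generic error response

# The entry point the Alexa service invokes via AWS Lambda
lambda_handler = sb.lambda_handler()
```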

If you are working with custom AWS resources in the developer console, update your skill with your endpoint (for instance, the Lambda ARN).
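You can paste the ARN under the Endpoint section of the console, or, if you manage the skill package yourself, set it in the skill manifest (skill.json); here is a sketch with a placeholder ARN:

```json
{
  "manifest": {
    "apis": {
      "custom": {
        "endpoint": {
          "uri": "arn:aws:lambda:eu-west-1:123456789012:function:coffee-toffee"
        }
      }
    }
  }
}
```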

Then we head straight to the Test tab, where we check whether our skill is working.

It’s working fine for us.

Once your skill is finished, you have the option of setting up a beta test for your skill. With a beta test, you can make your skill available to a limited group of testers you have personally selected rather than the general public.

Let’s head to the Distribution section for that, in the developer console itself.

In the skill preview for our locale, add a one-sentence description, a detailed description, example phrases (intents), a small skill icon, a large skill icon, a category, keywords, and a privacy policy URL. Do the same for the Privacy and Compliance section of the skill.

Then, in the Availability section, give public access. Under beta test, add the beta test administrator’s email address, then add the testers’ email addresses separated by commas. After entering the testers’ email addresses, enable beta testing.

If you skip any of these fields, you may not be able to enable beta testing for your skill, since all of them are required for the Alexa skill store.

The testers then receive an invitation email for beta testing the skill, with the option to enable the skill from the email itself. Once that’s done, the beta testing can begin.

If you tap “more” and then head over to the Activity section in your Alexa app, you can see your voice history. It would look something like this:

Then you continue to add more features to your skill. When you think your skill is ready to be published to the general public, you submit your skill for certification. There is a checklist for that; you can find more about it here.
