Top news stories for Episode 1 (July 6, 2017):
1) Reviews for Echo Show are in, and mostly positive, including this one from CNBC.
3) Samsung is producing its own smart home speaker, powered by Bixby, Samsung's voice assistant.
4) Alexa passes 15,000 skills, more than doubling since the beginning of the year.
4) Google fined 2.4 billion euros by the European Union for "anti-competitive" practices in written search.
5) Google Home just released in the UK market.
6) Alibaba is preparing a "voice hub" within the Chinese market.
7) Adobe is launching a voice analytics product that collects data across all the major players (Alexa, Google Assistant, Cortana, Siri, Bixby, etc.) to surface actionable insights into customer behavior.
Panel for Episode 1 (July 6, 2017):
Dr. Ahmed Bouzid is Founder and CEO of Witlingo. Dr. Bouzid is also co-founder and Director of the Ubiquitous Voice Society, a non-profit organization dedicated to the mission of evangelizing the emerging voice interface, and author of two books on Voice User Interface design. (Dr. Bouzid's recent article on discovery of voice skills, referenced in Episode 1 of This Week In Voice, is here.)
Brian Roemmele just published issue number 6 of Multiplex Magazine, called The Enchanted Loom. He explores a new AI concept for Voice First systems called Artificial Understanding. Get the Read Multiplex App at the iOS store and subscribe for this and the entire catalog of magazines.
Mark Tucker is an Alexa Champion and co-organizer of both the Phoenix Alexa Meetup and the Phoenix Chapter of The Ubiquitous Voice Society and hopes that you will join with them each month to learn about and promote Voice-First technology in Arizona.
Bradley Metrock: [00:00:07] Hi and welcome to the very first episode of This Week in Voice - the weekly show that will examine all the news in this emerging exciting field of voice technology.
Bradley Metrock: [00:00:20] We're pleased today to have three amazing guests. Ahmed Bouzid...Ahmed, say hello.
Ahmed Bouzid: [00:00:27] Hello! Hello everyone.
Bradley Metrock: [00:00:31] Ahmed is CEO of Witlingo. Witlingo is a B2B2C software-as-a-service company focused on enabling companies of all sizes to launch and host highly usable, value-delivering voice experiences on far-field voice-first platforms, such as Alexa, Google Assistant, and Microsoft Cortana. Witlingo is a certified preferred partner with Amazon, Google, and Microsoft.
Bradley Metrock: [00:00:57] Our next guest is Mark Tucker. Mark, say hello.
Mark Tucker: [00:01:00] Hello!
Bradley Metrock: [00:01:01] Mark Tucker is an Alexa champion and co-organizer of both the Phoenix Alexa meetup and the Phoenix chapter of the Ubiquitous Voice Society and hopes that you will join with them each month to learn about and promote voice-first technology in Arizona. And even if you're not a local designer, developer, entrepreneur, or enthusiast, Mark would still like to connect with you on LinkedIn, and his LinkedIn profile will be provided in the show notes of this podcast.
Bradley Metrock: [00:01:28] Our third guest is Brian Roemmele. Brian, say hello.
Brian Roemmele: [00:01:32] Hello!
Bradley Metrock: [00:01:35] Brian is a tech analyst, researcher, and entrepreneur, and he just published issue number six of Multiplex magazine, called The Enchanted Loom. He explores a new AI concept for voice-first systems called artificial understanding. Get the Read Multiplex app at the iOS store and subscribe for this and his entire catalog of magazines.
Bradley Metrock: [00:01:55] Gentlemen, thank you very much for setting the time aside to come on.
Bradley Metrock: [00:01:59] My name is Bradley Metrock. I'm CEO of Score Publishing, a company based in Nashville, Tennessee. And with that, let's get to the news.
Bradley Metrock: [00:02:11] So the very first story this week is the Echo Show. Now the Echo Show came out last week. But the reviews have been coming in. And most of them are positive, including one from CNBC that we note in the show notes.
Bradley Metrock: [00:02:26] Ahmed, I'll start with you. What is your reception to the Echo Show, positive or negative, and where do you think Amazon's taking this thing?
Ahmed Bouzid: [00:02:38] So I am one of those people who is very skeptical about everything. For example, when the Tap came out, or before that, when I was at Amazon and we were launching the Tap, I was very skeptical of it. But like any other product, you can't really judge it until you have it in front of your eyes, or you start using it. I was skeptical of the Show. I was thinking that it was going to be a step backwards, where now that you can touch the screen, voice would take a secondary role. I received it last week - it's amazing. I love it. I love the fact that I'm able to do what the Echo does but also have the additional screen.
Ahmed Bouzid: [00:03:21] I think we are...just myself, I'm just starting to explore it for real, and thinking of use cases, and there are limits to our thinking; thinking is enhanced and helped when we have something in front of us. So my reaction really is I love the product. I think it's going to go very far. It's going to open up a whole new set of use cases for multi-modality. And I'm looking forward to having it be part of my daily life.
Mark Tucker: [00:03:49] I received the Echo Show the day before our last series of meetups and took it to the meetup. And it got some good positive feedback from the people over there. I have a series of different devices. I had an Echo, a Dot, a Dash Wand, and an Echo Show there, and this particular venue had a network where you had to enter credentials, or at least click a button when you were redirected to a web page. None of the other Echo devices were able to connect, but the Echo Show actually brought up the web page, where I could click the confirmation, and I was able to use it and demo it. So that was impressive. So looking at it from a developer perspective, I'm excited. Documentation's just barely coming out. It's definitely going to be a "voice first, screen second" platform. It's not a tablet that's hooked on to the Show by any means, and in fact I watched a review on YouTube where the reviewer was giving it bad marks because it wasn't more of an interactive tablet experience. That's definitely not what Amazon's going for here.
Brian Roemmele: [00:05:03] Well, I've got to echo what Ahmed and Mark have said. You know, I've always thought of this as voice first, not voice only, and Echo Show is exactly the type of system that I ultimately think we're all going to be having, but not necessarily with the screen tied in to the device. I've always thought of it as an ephemeral screen, a situational screen that presents itself when it's needed, and Echo Show is the first generation of that. And it is a wonderful product, actually very robustly designed. I got inside of it and the speakers are very hefty. Extremely powerful amplifier - I think it beats the original Echo as far as sound quality and fidelity. The dynamic range, I would think, is slightly better, but the power of the speaker, especially in front of it, is better than the original Echo. As far as the integration of the screen, I think there is going to be a learning curve, especially for developers and even Amazon. I think a lot of people are sort of trying to understand: when do you use the screen? And are we just going to get lazy and not really use the voice interface? And I think that's what Ahmed was being skeptical about. And I believe the conservatism Amazon has shown in its screen usage is just the right balance. But I think in the future, while you're talking, you may wind up wanting to see some images in real time rather than waiting for it to complete. I guess that's the best way I can say it. And you know, I have demonstrations of how I do that. I've been working on voice-first devices for a couple of decades, so ephemeral screens are not new to me. That's the only thing I was able to do because of the technology. So I'm hoping to see some of that come about. But I think it's a wonderful device and it's going to do exceedingly well, in my view.
Bradley Metrock: [00:07:02] Given the positive reviews across the board...put a number on it. From zero being "this is absolute garbage" to 10 being "this is the best possible product that Amazon...this is the best possible version of itself that Amazon could have released with the Echo Show." Each of you, give me a number, and Brian, you can start first.
Brian Roemmele: [00:07:27] 7.5. And it has nothing to do with the quality or Amazon. It just has to do with where we are in the arc of development.
Mark Tucker: [00:07:38] Mark here. I'll go ahead and give it an 8. I'll give you an example: it was able to surprise me. Not only did I like having the lyrics show when I'm listening to music, but there's a cool feature - I don't know if you've tried it yet - you can hold a product like a bag of Cheetos to the Show and say "Alexa, buy this," and these sparkly things appear on the screen and, all of a sudden, it comes up with the product listing that you can purchase on Amazon.
Ahmed Bouzid: [00:08:08] Yeah, I think 7.5 or 8 is probably the right grade. Just to echo - we're going to be using the word 'echo' many times I guess, right? - the sentiments of both Brian and Mark. I think the product is rich enough for us at this point to understand that we are going to discover as we go along what to do with it, which is what a great product is all about. You don't know exactly the limits of what you can do with it unless you use it. And so because of that, I think this is an MVP, a minimum viable product. I think, for sure, this embodiment of the screen and speaker in one place is only the first iteration. I think we will definitely be living in a world later on where voice is ubiquitous and images and touch are also ubiquitous, so that you can freely, in your normal life, as a three-dimensional human being, call upon technology to help you in real time. So you can go on with your three-dimensional life, as opposed to being trapped by a device or interface.
Brian Roemmele: [00:09:16] You know, I've got to circle back around. I fully believe everything Ahmed is saying, perfect...Mark also. Mark brought up a really interesting point, and I really think that the fundamental use case is going to be voice commerce for the Echo Show. There is absolutely no doubt it's a magical experience. Like Mark said, you hold something up, it knows what it is, barcode especially, and boom, it's ready to go. And that's always been the hidden agenda with Amazon. It's also, if we get into it, Alibaba - both very large e-commerce companies moving into voice commerce very efficiently. And finally, you know, I don't know if anybody got to try the video calling feature.
Bradley Metrock: [00:10:01] Yes I did.
Brian Roemmele: [00:10:02] It's magical. You know, we had dinner with grandma at the table and she's eating, we're eating. My kids are interacting. All I can say is it was a magical experience, and I could see it really dominating among families that are separated. It really touches people.
Bradley Metrock: [00:10:18] Yeah. Brian, I did, and that's actually why I'd give it a slightly higher score; I'd probably make it an eight and a half or a 9. The video calling was phenomenal. I called my parents using it, and it's very easy to see how you could have this thing sitting on your desk in your office, and you just tell it to call somebody and it calls. And yeah, you know, you could do it on your phone, but it's very underrated how great it is to have it be hands-free. So that's story number one. And I appreciate those perspectives on that.
Mark Tucker: [00:10:53] So I just want to say one last thing about that...
Bradley Metrock: [00:10:57] Sure.
Mark Tucker: [00:10:59] I haven't been able to do the calling feature on the Show yet. I'm in this sub-class of people that doesn't have an Android phone or an iPhone. And right now, that's what you need to be able to set up the Show. I guess I have a concern that there may be a direction from Amazon that's going to require these certain things. There are different ways to set up the application - you can go through the web, you can do it through a tablet or through a phone - and right now you have to have one of those two phones that I mentioned before, an iPhone or an Android phone, to even set this up. And so I'm thinking of people who don't have smartphones, maybe parents or grandparents that you want to set this up for. So there's a certain sub-class of people that can't use these features right now, and I'm not sure if it's a trend from Amazon or if the technology just hasn't caught up yet.
Ahmed Bouzid: [00:11:59] Are you sure you can't do it through a web browser?
Mark Tucker: [00:12:04] Yeah, I tried that - the option's not even there. So I've got a call out to an evangelist from Amazon that's going to be checking in for me, but I haven't heard back yet.
Brian Roemmele: [00:12:17] I can tell you that that's already being worked on. With the design, I think they overlooked a couple of things, and I think in the next iteration it won't be more than a software download where you'll just be able to drop in. It will detect the Wi-Fi, and you're ready to go.
Bradley Metrock: [00:12:35] And Mark, I encountered that myself, where it surprised me that I needed to get my smartphone out. I was looking around, like, "is there some other way to do this?" And it didn't appear like there was. But it's good to know that they are working on that, which in my opinion, I agree with you, they should.
Bradley Metrock: [00:12:51] The second story we've got is that Samsung is producing its own smart home speaker, powered by Bixby, which is Samsung's voice assistant. And since the third story we've got deals with Alexa, Brian, I'm gonna let you take this one first.
Brian Roemmele: [00:13:09] Well, thank you. Bixby is an interesting creation by Samsung. And a lot of people have made the mistake of thinking that when we look at Bixby, we're actually looking at Viv, and we aren't. And you know, I feel kind of bad for the Viv team, because a lot of what Bixby has presented thus far, and its implementations, have not really been taken as a serious attempt to equal Siri or Google Assistant. Bixby was originally designed primarily for Korean and, later on, a few other Asian languages, and really has not done extremely well with English, whereas Viv is a completely different platform. Now the question that remains is how quickly the Bixby team can assimilate what the Viv team has produced. I am not certain that's going to take place in a very fast, you know, accelerated way. The problem with Bixby, I think, ultimately is going to be: when they move onto an independent platform, a voice-first platform, will it be compelling enough for people to want to move in that direction? I'm not convinced yet.
Brian Roemmele: [00:14:32] I have seen some improvements in Bixby. I got to see some very early alpha tests, and they've gotten much further down the line. And I do like the fact that they're enriching and expanding the voice-first platforms. I mean, it has radically expanded in 2017 and it's going to quadruple in 2018. So I think Bixby needs to be there. I think they need to, hopefully, integrate Viv in a much deeper way. But let's also shift to one other concept, and that is the real reason why Samsung acquired Viv: Samsung is looking at, let's call it, "voice-ifying" everything that they make. And this is a radical departure in user interface. I believe it's going to be the best radical departure we've ever seen as humanity. I have appliances - I have a washer and dryer I really don't know how to use. And here's, Bradley, here's the thing that really is crazy. I bought these things because they had really sophisticated operating systems, apparently. Really nice touch screens. They could do everything. But you just don't even want to do it. You kind of...the nerd in you says "yeah, let's do this." And then it's like, just get me clean white socks. The kids are walking in the streets with these socks! And so the subtext to all this is that Bixby and Viv, in some form, are going to reach out to you through millions and millions of appliances, door locks, garbage cans, TVs, appliances, nuclear power plants. Let's not get those words wrong, right? Because they do have machinery that interacts with that. And most specifically, medical machinery. Samsung is working very aggressively for a number of medical operations to be performed purely by voice. And it sounds crazy until you see what somebody has to do in a medical environment to get something done manually. You'll see that it dramatically changes medical experiences. So they've got to get this stuff wired and they've got to get it right. So it's good news, but it's just the beginning for Samsung.
Ahmed Bouzid: [00:16:44] I just want to echo Brian's remarks and say that I would be disappointed if their next iteration, as far as voice-first goes, were just an Echo-like device, meaning a speaker that can talk to you from far away. One use case that I would love for somebody to solve - not that I watch a lot of TV - is being able to just tell the television "I want to watch baseball," and baseball shows up. I want to watch the Nationals, or I want to watch something. Just that, and just make the remote control completely obsolete. I'd love for that use case to be solved. I think if they were to pick an MVP, that would be an amazing MVP to pick, and solve the problem of the TV's own audio while you're speaking, so that it nullifies the signal coming out, and it's just like magic and it works. But again, I think the reception of Bixby would be hugely positive if it came embedded within some device that exists in every home, whether it's a washing machine or - what I think would be a compelling case, especially in the United States, where there are like four or five TVs on average per home - a smart TV you can speak with, that can do things like change the channel, turn off, or go to some content that you're looking for.
Brian Roemmele: [00:18:13] You know, Ahmed, I've got to echo in here. That's what Apple TV is trying to do. And that's what Siri and Apple have been working on, and they haven't gotten quite there yet. The latest iteration of Apple TV's OS - tvOS - is getting close to the point where you can just name, you know, the ontology of what you're looking for...baseball, football...and you can kind of get there. But you know, Apple really needs to fix the remote. I mean, it needs to be far-field, or, you know, medium-field. And it needs to be on demand. But yeah, that's going to fundamentally change all consumption. And that's a great use case. So I absolutely agree: if Samsung can do that, they're gonna change television.
Mark Tucker: [00:18:57] Yeah, a couple of points that I thought of when I read this article: Bixby was supposed to come out with the new Samsung phone, and it's delayed. Amazon and Google have just made it look easy. We just think "it's easy to go ahead and do a smart speaker." And I think the visibility of what Samsung is doing with Bixby is just showing us that it's actually difficult. We've gotten a little bit spoiled by Amazon and Google.
Mark Tucker: [00:19:30] The other point that I wanted to make - this is something that Leor Grebler said in yesterday's VoiceFirst Roundtable - was that any new entrants are going to need to "wow" him. And I think that's kind of what it is with me. Go ahead, Samsung, wow me. I don't know if this smart speaker's going to do it, but other products probably will.
Bradley Metrock: [00:19:51] Mark, I completely agree with that, and the only thing I have to add is that, you know, it's starting to get crowded for me. I'm already hesitant as I look toward the HomePod coming out. You know, I'm...let's be honest, I'm probably going to have to buy that. I'm going to do it kind of angrily.
Brian Roemmele: [00:20:12] Angry buy that! Angry buy it!
Bradley Metrock: [00:20:13] I know! I'll show them, right?
Brian Roemmele: [00:20:21] Yeah.
Bradley Metrock: [00:20:21] But you know for anybody else, other than the "four horsemen" - Google, Amazon, Microsoft, Apple - it better be phenomenal, so we'll see what they have for us.
Brian Roemmele: [00:20:32] Bradley, I've got to ask...I am not being facetious about it, but as a researcher: why?
Bradley Metrock: [00:20:38] Go ahead.
Brian Roemmele: [00:20:39] Why feel that it's just another one? Do you feel like there are too many? Does that make you...on what level? I mean, is it an emotional level? Is it a financial level? All of those?
Bradley Metrock: [00:20:56] For most people, financial aspects of this are definitely going to be a limiting factor. And I think you'll see it with Apple. You know Amazon is going to make every effort to try to ruin the launch of the HomePod by expediting every possible feature for Alexa. And $350 is a lot of money - I don't care who you are - and so that will be interesting to watch.
Bradley Metrock: [00:21:26] But for me personally, it's just a time slash attention span factor. It's "my God, do I really want to learn another one of these things?" And so there needs to be...we're reaching a point, we may not be there quite yet, because all of us are sort of on the vanguard here, but we're going to reach a point where there's just too many of these things competing for too little attention that the layman has to offer.
Brian Roemmele: [00:21:56] You know, I hear you. I'm a little biased because I'm surrounded by 77 voice-first systems; many of them I made myself, but some of them are experimental ones from stealth companies. Some of them are out, obviously. And I've gotten to the point where I like the richness. Here's the way I see it: you're going to have your best friends, and then you're going to have the rest of the crowd, and some of the rest of the crowd may be your appliances and other things. You're going to be fine talking to them because you already know the operating system. It's called speaking. And you already know the arcane aspects of that. As these devices get more intelligent, or as the OSes become more intelligent - it's what I talked about in the magazine this month, you know, artificial understanding - artificial understanding, as I define it (it's kind of a new term anyway), means that you're really not going to know who you're talking to, ultimately. There are going to be many devices, and some of those devices will be preferential and execute on what you want, and mediate on the back end who's going to respond and how they're going to get it done. And it sounds kind of crazy, but really it comes down to resources, and whether Google can answer a question better or maybe Amazon can answer that question better. So it's really this idea of a personal assistant that's going to stand between you and these voice-first speakers, because none of these speakers, these voice-first systems, are personal assistants. Not even close. Siri's not even close, and they're the closest, and Amazon certainly isn't. And it has to do with context.
Mark Tucker: [00:23:36] I thought it was interesting that you said "learn another one of these." And that's what voice is supposed to promise, right? That we don't have to learn something new - we just talk. I guess it's still a reminder to us just how early we are in this world of voice-first. That we're still feeling like we have to learn the device, as opposed to the device adapting to us.
Ahmed Bouzid: [00:24:03] Yeah, I think even though we think that this has gone mainstream, we're still living in a bubble. The vast majority of the people that I encounter in my daily life do not have an Echo, or have even heard of it. Even people who are in the tech industry. So I think Apple does have a shot, big time, because most people who have an iPhone don't have an Echo, and don't have anything that's voice, and they'll just see it as a natural thing to buy the latest and greatest that Apple is offering. So by no means should one count Apple out, even though the price point is higher. But then again, you know, the people who buy iPhones and watches are not very price sensitive. So next year I think the pie chart, or the market share, is going to look a little bit different. One should never underestimate Apple, even though a lot of criticism has been levied against them in terms of innovation. So I think the map will be different next year, which is exciting. I think we need to have competition amongst the giants.
Brian Roemmele: [00:25:10] I agree, Ahmed. I've been fortunate enough to actually hear the sound quality, and again, as an audiophile, I was stunned by the quality. A lot of people think in terms of music, right? But also think in terms of the fidelity of that voice, and the ability for that voice to project, and just how...in a sense it's a magical experience, because a voice literally feels like it's coming out of the room. And especially the new Siri, which has a much more emotive, much more expressive voice; it sounds like it was built for the new HomePod. And obviously Apple is also playing upon the near-field and the far-field. Apple owns the near-field voice-first world. There's nobody else coming close to AirPods. Seven years ago I was talking about Apple eliminating the 3.5 millimeter jack. And it was like, "that's crazy! People are going to get mad." And what happened was, once people got to understand the AirPod experience - and those of us who are cutting-edge, using it for voice-first - you never go back. Most of the things I write I dictate into AirPods. I mean, why do I need to type when I can walk around? I can jog. I mean, I wrote half the magazine while walking in the hills! I didn't type most of it. And you can tell by the way it looks, because I need an editor. But those functionalities you just can't give up. And Apple's got an advantage there.
Ahmed Bouzid: [00:26:47] And Bradley, so I don't forget the thought...
Bradley Metrock: [00:26:50] Sure.
Ahmed Bouzid: [00:26:51] One of these editions of the Roundtable, we can do maybe a section on the pluses and minuses of voice-first, because I think we're all champions...but I think it'd be useful to all of us to think about what is the minus of voice-first, because there are minuses.
Brian Roemmele: [00:27:09] I'd be all in on that.
Ahmed Bouzid: [00:27:11] To help us refine our thinking, as opposed to...I think right now it's great for us to be evangelists. We have to be, to push the field forward. But I think it would be good for us to think about what the minuses are...and one minus that I see, just to trigger the thinking for whenever we next talk about this, is being cloistered. So walking around with these earbuds, and talking all the time, and living in your shell, so to speak. "The mind in its own place" kind of a thing, as opposed to being a thing that lives and interacts with the outside world. Anyway, just hopefully an idea that will turn into something.
Bradley Metrock: [00:27:52] I'm in complete agreement on that.
Brian Roemmele: [00:27:54] I love it too. Yeah I think it's a great idea, because I hear people all the time talking about it.
Bradley Metrock: [00:28:00] Yeah. And trust me, there are plenty of people who ask me "what is this VoiceFirst.FM thing?" or "how do you know this is going to stick?" Some variation of that question. And you know, all the people in this podcast right now, we know that there's no turning back. But we also know that there are still a lot of hurdles in front of the sector, and it would be good to dissect those further.
Bradley Metrock: [00:28:30] And one final thing: Brian, some of us are still very salty about the removal of the headphone jack. I won't name names, but...
Brian Roemmele: [00:28:43] I didn't do it!
Bradley Metrock: [00:28:44] You know, part of it too - and I don't mean to open a can of worms here - is that the AirPods communicate with each other with Bluetooth that passes through your brain. And I'm not incredibly comfortable with that. And that's by far not the only reason, but it certainly has not gotten as much attention as it will when someone sues Apple for that, which absolutely will happen. So, you know, it'll be interesting to see how that plays out. I think wireless is the future, but there's still some ground yet to be settled with that too.
Brian Roemmele: [00:29:21] Bradley, I've got to say, you know, I'm an evangelist of technology, but I'm also a student of where humans are going and how it is affecting us medically and psychologically, like what Ahmed has been saying. And I can tell you that it is not a good thing to be surrounded by Wi-Fi 24 hours a day. It is not a good thing to have a Bluetooth device on your left or right arm, and it's not a good thing to have Bluetooth or any type of cellular radio frequency going through your cranium, because once it gets in there, through the ear canals, it literally reverberates inside your skull. Some of the frequencies, not all of them. The higher frequencies just pass through, and literally radiate you. We don't know what the impact will be, and it might be very much like "nine out of ten doctors prefer Lucky Strikes." It might be that we look back in time and say "why were they so crazy to irradiate themselves with this whole spectrum of frequencies? Didn't they know better?" And I would say this, just because you brought it up, and I've got to get on my soapbox about it.
Bradley Metrock: [00:30:34] Go ahead.
Brian Roemmele: [00:30:34] If you have young children and you have them in front of devices that are actively using cellular, specifically in those early developmental moments where brain functionality is being, you know, formed (and for pregnant women, absolutely), it is abundantly important to read up on the literature, to really understand that some of the studies were done by cellular companies, and to realize that the model they used was the skull of a 25-year-old military individual. They used those tests. Those tests are the defining tests for how cellular frequencies pass through the cranium. And you know, that does not really match a developing fetus, or the skull of a child that hasn't even been crystallized and formed. So there's a lot to be said about this, and I see children when I'm outside - and again, I'm an evangelist of technology - and parents have their kids with phones up to their heads, listening to things talking, or holding an active cellular phone for hours at a time. And it sort of breaks my heart, because, you know, we brought this technology about and we don't even know what the impacts are going to be. So that's my public service announcement.
Bradley Metrock: [00:31:57] Yeah I completely agree with you and I think you know that's eloquently stated. I'll let that sort of stay. And yeah, I mean you're right, there's a lot more that has to be litigated and I mean literally and metaphorically but also you know there's a lot more that society has to find out about all this technology and it'll be interesting to watch as it happens. We'll continue to have a front row seat.
Bradley Metrock: [00:32:23] I'm going to move on to story number three here, which is that Alexa has passed 15,000 skills, more than doubling since the beginning of the year. So I wrote about this myself earlier in the year that there seemed to be an exponential increase in what we're seeing with Alexa skills, and apparently we're still seeing it, because this was news this week. Mark, I'll start with you: what are your thoughts on the plethora of Alexa skills, and the growth that we're seeing?
Mark Tucker: [00:32:59] Yeah, that sounds good. Just a little bit about my background: I'm actually a fairly new arrival to the Alexa platform. I haven't even hit my one-year mark; I've only been doing this for about 10 months now. I have a long background programming, mostly in Microsoft technologies. When I started ten months ago, there were 3,000 skills, and you'd get a T-shirt if you were able to certify a skill and deliver it. Now, on average, we're getting about a thousand new skills a month. I'm excited because it tells me that the platform is very approachable to developers, but there are a lot of novice skills up there, to put it nicely: a lot of very basic skills out on the market. And if you're part of the Alexa Slack group - I encourage you to join if you're not - there was quite a bit of eye-rolling last month. June was the first month where Amazon gave you a Dot if you certified a skill, and this month, it's a Dot plus socks. So, like I say, a bunch of eye-rolling...I'll quote somebody from the Slack group who said: "Yeah, twenty-five hundred free Dots means another twenty-five hundred hastily-written, narrow-focused, easily-certified minimal skills bloating the skill store. Anyone want to beta-test mine?"
Bradley Metrock: [00:34:32] Yikes.
Mark Tucker: [00:34:34] So there is a little bit of...on one side, I see it, because I think the novice skill developer of today might just be the next brilliant skill developer who pushes the platform. So, in a way, I'm excited to see so many developers joining. But I do see that there's going to be a problem with, you know, what do you do with all these skills?
Ahmed Bouzid: [00:34:56] Yeah, I definitely agree. I think Amazon needs to move on from the numbers game. Amazon has made the point, or has validated the point, that there is energy and interest in voice. I think the next thing they need to do, and they need to do this as soon as possible, is to enable people to monetize their skills. If you want to build a great skill, it is going to cost a lot of money. It's not a developer game. It is a product. It needs to be researched, designed, built, tested, beta-tested, launched, monitored, marketed, and so forth. Just as we know that there were companies that did in fact turn into unicorns once there was a model for making money with mobile apps, I think we need to move on to the next stage. If we keep promoting numbers and saying "Now we have 20,000 skills, or 25,000 skills...", at some point it is going to be a game of diminishing returns, and the narrative is going to change from "look at how many skills we have" to "look at how much junk we have out there." Here's the last point I want to make: I believe that if there were no skill store, the numbers (in terms of people purchasing Echo) would not change one iota. People are not buying these things because of the skills; they're buying them for the 'out of the box' experience. So that should be something that Amazon worries about. Are our 50,000 skills adding value? Are they compelling people to buy the Echo? I'm probably wrong, in the sense that I'm being extreme.
There may be a couple of skills that are really compelling, but I really do believe that if we took the Alexa skill store and completely wiped it out, the impact on purchases of Alexa devices, and on the adoption of Alexa, would be minimal. So we need to move on to the next stage of enabling people to deliver high-quality experiences.
Mark Tucker: [00:36:56] Yeah, I agree with that, Ahmed. I like to play a game called "the brand game." If you pick some top, well-known brands and then go look for them in the Alexa store, most of them, maybe all of them, won't be there. For example, there's no Nike or Adidas, no Pepsi or Coca-Cola, no McDonald's, Burger King, or Wendy's, no Walgreens, Walmart, or Weight Watchers. So there's a lot of opportunity, and I think that's going to be the next phase. It's going to require not just developers, but voice user interface designers and a whole slew of new talent that's not out there today.
Ahmed Bouzid: [00:37:38] If anybody from those brands is listening, please contact Witlingo, we'll take care of you.
Brian Roemmele: [00:37:46] You know, my view on this...I start with the fundamental reality of all this. I echo what Ahmed and Mark said, and Ahmed brought it up very eloquently: you need monetization. But before monetization, you have to solve discovery. That's the biggest problem with every voice-first platform, and it will continue until it's solved. There are ways to solve it; there are about twenty-one ways, maybe twenty-seven if you really want to stretch it, that solve discovery. But I can tell you what does not solve discovery: having to go to a visual store, look at an icon using this old iOS app modality, and then somehow activate or download a skill or an app. That is fundamentally incorrect. It's fundamentally wrong. The moment that changes and becomes more human-like is the moment these devices become orders of magnitude more powerful.
Brian Roemmele: [00:38:47] I'll give a hint on one of them. We as humans build our skills as neurons. Neurons don't sit in a silo, isolated - otherwise we wouldn't even know they exist. What neurons do is interconnect and interact with other neurons, and they form a view of the world. They form a view of a memory, and maybe there is an activation of that memory, and maybe that memory is always active. It doesn't matter. But in a very simple sense, it's a neuron. Right now, skills or apps are not neurons. They stand alone in their world, in their little sandbox, very much like an iOS app would. And I'm not being critical of the designers here. They are just using their old modalities and saying "OK, this is how we're going to do it." Follow me through, I'll be a little long on this. If we build neurons - if we build use cases where these systems all interconnect, very much like the way the Internet does, but I don't want to get too iterative - then all of a sudden the value of each individual neuron is of course great, but the sum total is even greater.
Brian Roemmele: [00:39:58] And then there's discovery: does this app, quote unquote, exist? And then invocation: how do I remember how to get to it? This whole idea of using certain keywords to try to activate a skill is abundantly insane, because there are only so many domain names, only so many ways you're going to be able to take...there might be a kid who takes over the ability to activate Coke! And then Amazon might say "well, this is a better skill," or maybe Coke gets to own it. But after a while there are going to be some generic, quote unquote, domain names, if you will, inside the Alexa lexicon where there's going to be a fork in the road: there are 9,000 skills that can be activated with it. I did a study on this: at the current rate, in four years you're basically going to run out of invocation words. You're not going to have them. And then you've hit the hard wall. So the fundamental problem is that there aren't enough people like Ahmed, and I include myself and a few other people I know in the world, who are sitting down, thinking this through, and saying "Hold it. We've gotten here, this is great, we've proven that voice is going to work. But, oh my God, we've painted ourselves into a corner. Let's tiptoe around the edges before we get our feet in red paint, and let's find a way to solve this." This is a huge problem for everybody involved.
Brian Roemmele: [00:41:32] Some of the folks on the Viv team were looking at this. This is something I've been studying for the better part of the last 10 years. It's been keeping me up at night obviously as you can tell. So the answer is we literally have to rethink discovery, invocation, and then monetization. All these other things are irrelevant. I don't care if there's a hundred million skills...if you can't remember how to get to them, there's never going to be an invocation of them, and how do I find that they exist?
Ahmed Bouzid: [00:42:01] Yeah. Agreed, Brian. Absolutely. I think you and I are definitely in agreement, and I think the full team here agrees that discovery is the primary problem right now. I think what Amazon did was good as an MVP. They had to piggyback a little on a concept that everybody understood, which is a store: you go there. But I think - and I've written a short piece on this - Google is definitely going about it the right way. There is a way for you to say "Hey Google, ask 'blah' for this," right? But they're also building a layer where you just say "Hey Google, I want to find out what the stock price is for Facebook," or whatever it is, meaning the intelligence that's mediating between you and the plethora of actions - they call them Actions on Google - sort of figures out which among the actions is the best one for you, and so forth. That way you don't have to remember to say the name of a brand, or some kind of invocation. There is a semantic analysis of what you said, in context. Again, I'm just theorizing about how they are probably going about it, because it's sort of a black box. But in essence, the problem they're trying to solve, which is a very hard problem, is letting you say things naturally while the system discovers the service in its ecosystem that solves the problem you're trying to solve. The burden - and this is what makes the interface very compelling - is completely on the artificial interface, let's call it, as opposed to you. You don't have to remember how to invoke something; you just speak naturally. And since it's a neural network, hopefully it will learn how to answer you versus somebody else, and give you what you want, given the way you speak or how people in general talk.
So, directionally, I think we're pointing in the right direction, and I think Google is consciously not playing Amazon's game, which is the numbers game. They are going about it in a different way, which, again, is great for us. Competition is great, and they're trying to solve the problem of discovery in the right way, which is to take over the burden.
Brian Roemmele: [00:44:36] Ahmed, I've got to ask: my idea is that everything is interconnected like neurons, right? Once we start building these actions, they become interdependent, and I think this interdependency is a really important part. What if one of the neurons leaves? What if I go into your brain and pull out a memory, and all of a sudden other memories are impacted because of this interdependency? What I'm essentially saying is that once you go down this path, you're committed to the idea that you literally have to build a highly contextually aware, true personal assistant that is unique to that person. Do you see that as the ultimate direction?
Ahmed Bouzid: [00:45:17] Yeah, absolutely. You have definitely thought about this a lot more than I have, and you understand it a lot better than I do, but I agree with the sentiment: we need something that is highly integrated, that is personal, that is deep. Once we dip into the world of language, we have to embrace the full complexity of how we do things as human beings, because it's really, really, really complex!
Brian Roemmele: [00:45:50] I love that analogy of that complexity, right?
Ahmed Bouzid: [00:45:54] It's hugely complex, yeah.
Brian Roemmele: [00:45:56] Yeah, but it's solvable in your mind, right? I mean, we're not talking about trying to climb an artificial mountain, right? You've been down this road...
Ahmed Bouzid: [00:46:04] What I do think is that we need to keep in mind what we are trying to do, which is to make people's lives better, right? As opposed to replicating a human being. To the extent that we can make people's lives better - and for me a better life is a natural life, not one that is infested by gadgets and so forth, but one that allows you to have a conversation with human beings in 3D. You hear their voice, you touch them, you do things... I think you wrote an article, or I read some of your tweets, where you compared the amount of time it took for us to evolve with the point at which we started to interact with technology, and the latter is no more than a dot on a long line. So there is hubris in thinking that we can create technology that is going to overcome our innate need to be fulfilled human beings, to engage with other human beings in a natural way, which is voice, conversation, face-to-face, touch. And if we don't do that, then we are in a state of anomie, where we're not completely ourselves. So...I'm getting on my soapbox again.
Brian Roemmele: [00:47:26] That's actually beautiful. And I've got to say this about the idea that humanity was designed to type: in this last issue of Multiplex magazine, I went into Broca's area. When you understand that the brain is actually telling you what to type and you have to transcribe it...most people in the research community are not thinking in these terms, and most of us in the technology world just take it as a priori that "oh, when you type, you type." No. What you're doing is transcribing an inner voice.
Brian Roemmele: [00:47:59] So, in a sense, when you are activating your motor neurons, and they are activating your nervous system to type one painful letter at a time to form words, you realize that there's a throughput problem in your Broca's area - Wernicke's area interacts with it - but Broca's area is basically your inner voice, or that region of it. If I cut that out of your brain, you can never communicate again. You can't type, you can't write, you can't speak. If I give you Broca's area back, you can. If I partially damage it, you might only be able to use a few words. Broca discovered this through a patient who had a lesion in the area that was later named after him.
Brian Roemmele: [00:48:40] But when you understand that humanity took a big step backwards so that computers could understand us...when you really understand that, you see it cannot stand, because the mind is not going to evolve to fit it. Evolution doesn't work this way. You don't start working a muscle and then all of a sudden new babies come out with bigger brains. You have to die. Certain people have to die, and others have to live, for evolution to work. And I don't see anybody dying, or living longer, because they have better typing skills. That certainly is not what's going on in the world. So when you ask why voice-first, I say this: go back to Broca, study his area, and study how we're literally transcribing everything from a voice in our mind. Think about when you type. As you're typing, you're saying words in your head. There's no other way around that.
Brian Roemmele: [00:49:33] When you're writing by hand, you're fortunately using both parts of the brain, and that's why writing is always better than typing. As a programmer 10 or 20 years ago, I would have really wanted to argue; I'd have gotten mad at me for saying this stuff. But it's the reality: we were designed to speak. And that's why voice-first is going to dominate, one way or the other.
Mark Tucker: [00:50:00] I love this. So we start from Alexa skills, and we end up on neuroscience!
Bradley Metrock: [00:50:06] Well, yeah, and you know the voice in my head is telling me it's time to move to the next news story. I'm glad you guys had nothing to say on this one though.
Bradley Metrock: [00:50:15] I will say that I look forward to contributing to the garbage in the Alexa skills store with a skill coming out very soon for This Week In Voice, where you will be able to say "Alexa, please play This Week In Voice," and it will play it for you, just like you can, conveniently, for VoiceFirst Roundtable. You can say "Alexa, please play VoiceFirst Roundtable," and it will begin playing. You'll find this funny, though: working with Fourthcast, which is our sponsor for a couple of the other podcasts, we tried to do an Alexa skill for The Alexa Podcast. I knew it was going to get rejected, but it was funny to watch the developer in charge of it deal with that. So yeah, Amazon was not willing to have an Alexa skill where you say "Alexa, please play the Alexa Podcast." And Brian, to your point, we're going to hit this wall with invocation words; it's just not going to work for much longer, so your point is well taken.
Bradley Metrock: [00:51:26] Moving on to story number four, in the news: Google fined 2.4 billion euros by the European Union for "anti-competitive practices in written search." And my question - I'll start with you, Mark, again - will this impact acceptance, adoption, etc. for Google Home and Google Assistant, or is this just sort of irrelevant?
Mark Tucker: [00:51:55] This definitely isn't my specialty. I do personally find it an issue if Google preaches "consumer first" and then ranks its own products higher. I do have a problem with that, but as for long-term fallout, I don't know that there's going to be any. And I don't know that it's going to affect voice. That's my short analysis.
Brian Roemmele: [00:52:19] Well, you know, it's interesting you bring this up, because this has been something I've been thinking about for a very long time. Anti-trust actions go through cycles. I grew up in Central Jersey when the divestiture of AT&T took place, and there's good and bad about that. Some of it's very bad, as far as the way research has declined in America. AT&T led that for a very, very long time. Anyway, we have history to show us what this looks like, with what happened with Microsoft. It turns out all of that work of trying to regulate Microsoft really didn't do anything. Ultimately, Microsoft lost their dominance because technology passed them by. Does Google sometimes not do the right thing? I would say that's probably very true. Is it the province of regulators to regulate? Yeah. And do companies need to be threatened sometimes? Absolutely. But when it comes to voice, it's going to be a critically important problem. Problem number one is: what is going to dominate if in fact invocations, like we talked about, go away? Who's going to get to control that in your voice-first system? How much is somebody going to pay, perhaps, to take over certain words? And is that in the best interest of the user when they're asking for things? That's problem number one.
Brian Roemmele: [00:53:53] But it gets even more complex when we start talking about privacy. I believe this whole thing with the anti-trust actions and fines is going to set the stage for privacy actions, and I believe that over the next 10 years we're going to see a war over privacy like we've never seen before. The generation that said "hey, I don't care, I put my whole life up there, big deal" is going to start getting older. They're going to start changing a lot of their world views, and they're going to start saying "I want my privacy back. You don't have the right to know this." And to hand it to Google, they finally stopped fishing through Gmail. I think that's a very important beginning. I think it was overlooked by a lot of people, and maybe it was a reaction to other regulation that was coming, or to the political landscape of 2016. I'm not sure. But the bottom line is that regulators are there to regulate. They're going to do things that are in the interest of the people and, unfortunately, also in the interest of perpetuating a more contentious regulatory environment. That's the best way I can say it.
Ahmed Bouzid: [00:55:00] Very briefly, I'm always sympathetic to any entity that challenges big monied interests. Here a government is challenging a massive entity that has infiltrated our lives - Google has probably infiltrated our lives a lot more than Apple, for example, because everybody uses Gmail, and Gmail is where most of our lives happen in a recorded and searchable way. So I'm sympathetic, and I love the fact that Europe has challenged Microsoft and has challenged Google. They need to be kept in check. That's number one.
Ahmed Bouzid: [00:55:38] Number two, I think voice, especially the voice interface, has pluses and minuses. The minus of voice is that it is very constraining. If you search for something, you don't have in front of you the leisure of a screen where you can see multiple results, click on this or that, and do your search on your own terms to find what you're looking for. So you are a lot more susceptible to manipulation. If Google, for example, is promoting certain results, and the answer it provides is not a clean rendering of the best result out there but is influenced by some monied interest, you as a user are trusting, as most users are. Most users are not skeptical; they don't think things through or wonder what interest is behind this and that. They believe the technology is actually delivering the best outcome out there, and therefore they are a lot more susceptible to manipulation. So I think the problem is more acute with voice, just because of the nature of the interface. I'm watching this particular challenge very closely to see how we're going to resolve it. And like all problems worth solving, I don't think it's an easy one. So I'm anxious to see what's going to happen next.
Bradley Metrock: [00:57:09] Okay. And that's a segue into our fifth story, which is a positive story about Google: Google Home was just released in the United Kingdom. One thing I'll throw in here before tossing this to y'all is that I was unaware that Google Home recognizes up to six different voices, and I think that's really, really cool. I don't own a Google Home - that will change soon - but I've naturally been wondering what the differences between Google Home and Amazon's hardware are, and I found that to be a fascinating one. So Google's doing a lot of good things, including going into this new market. So Ahmed, I'll start with you...your thoughts?
Ahmed Bouzid: [00:58:04] Yeah, absolutely. As I was saying before, I'm very happy that Google is in the mix. I'm happy also that Cortana is in the mix. But for sure Google is trying very consciously to differentiate itself from the front runner, which is Alexa. I think distinguishing between voices is a huge one, for many reasons. The fact that they are in Canada, and in the UK, although Alexa is in the UK as well, is great. From a developer's perspective, I love the fact that those of us interacting with the Google Cloud through their SDK are able to get not only the intent and the slots but the full transcription of what the customer said. That is hugely important for discovery, and it is something you cannot do today with Alexa (hopefully the Alexa team is listening to this). That's a big minus, because you go out with a skill or action as an MVP based on some assumptions, and by discovering what people are actually asking for, by doing something like a word cloud of their requests, you discover gaps in your action and you can iterate. So getting the full transcription is a big plus. The other thing, which I already mentioned, is surfacing skills without having to do an incantation like "Hey Google, ask 'x'." You can just say "Hey Google, I want to find out how the Facebook stock is doing," and it surfaces the Motley Fool action, for example. That's a big plus. So Google is definitely doing some great things, and I'm a big champion of them as well as of the other ecosystems.
Brian Roemmele: [00:59:43] Yeah, and I've got to echo what Ahmed said. Google is trying very hard to extend developers' ability to really understand how individuals are interacting with their designs, and a lot of the folks I work with really appreciate that. I've got to say - and Alibaba, which we're going to get to, is doing voice-prints - it's fundamentally important that we do identity recognition and understand who is speaking to the system and how they're interacting with it, and also that we sandbox what an open mic is capable of activating. We already saw what happened when Burger King did it with Google, and others have done it with Alexa, inadvertently or on purpose, just by having a commercial playing in the background, right? So we need to understand that once there is voice commerce attached to these systems, or somebody can do something that potentially impacts somebody's privacy or finances, you have to have some form of identity recognition. The voice-print technology that Alibaba is using, for example, is pretty good, pretty darn good, though not perfect. And I don't think any voice-print technology is going to be perfect. I think Apple's approach is ultimately going to use biometrics at a distance; that's one of the reasons they acquired PrimeSense. Microsoft had the opportunity to own that company. PrimeSense made the system for Xbox many years ago, which Microsoft has discarded now. But the idea is: who's speaking to me, and in what context are they speaking to me? That's what's most important. And Bradley, you surfaced an interesting artifact that most people, even in our community, don't understand: Google is capable of doing some of these things. And we're going to see Alexa do it very soon, too, I can tell you from an inside track.
Mark Tucker: [01:01:39] One of the things I learned from this article that I really enjoyed was that they call them "digital butlers." I just like that image. Congratulations to Google, they are showing that, on a global scale, the race is still early if you think about this as a marathon. Not only for dominance in the United States, or in the world, but also early in the race as far as sophistication of our devices. So if we were to think about this as a marathon, I think we're really just barely past the starting line. It's a global race, and Google's in the US, UK, Canada, and by the end of the year, Australia, France, Germany, and Japan. Amazon had an early head start out there: US, UK, Germany. Looking forward, Japan and India are maybe the next two stops for them. So. I'm excited for the race.
Bradley Metrock: [01:02:41] So we'll move onto story number six, which is Alibaba is preparing a voice hub within the Chinese market. And, for this, Brian, I will start with you. What are your thoughts?
Brian Roemmele: [01:02:54] I think it's absolutely astounding. Alibaba is equivalent to Amazon and in some ways maybe larger, when you look at some of the raw number of items that they move. One of the things that defined the web revolution was pay per click advertising, and that was the fundamental monetization unit that made the web as we know it. If it didn't exist, we wouldn't have had it. The fundamental unit that's going to really drive the voice-first revolution is going to be voice commerce. It's really not going to be an equivalent of pay per click, or even paid for placement that we were talking about prior with Google. Fundamentally what's going to happen is we're going to find ways to buy things in very creative ways. That's what humans do. And it sounds commercial and crass but unfortunately that's what people do with their time, or fortunately, however you want to look at it.
Brian Roemmele: [01:03:50] So Alibaba recognizes this fact. And getting back to what they have, it's called the Tmall Genie, named after their extremely popular online commerce system. In fact, I would say that in some ways China, through Alibaba, is many times more advanced than the rest of the world in making purchases through this type of platform. You can buy live chickens, live pigs, all sorts of things from Alibaba that Amazon's not shipping, as far as I know; I haven't tried saying "Hey Alexa, can you get me a pig?" But with Alibaba, you can get that delivered to you in three hours or less, dressed or undressed, with an apple in its mouth.
Bradley Metrock: [01:04:34] Are you serious?
Brian Roemmele: [01:04:35] Absolutely. Yeah. So this is a commerce platform; it's the real deal. And it's made up of millions of entrepreneurs, just like Amazon. Most people think Amazon is this monolithic company that sells everything, but sixty-eight percent or more is sold by independent entrepreneurs through the Amazon discovery platform, because that's really what Amazon.com is: a discovery platform. Alibaba is the equivalent of that. Some say they copy; some say Amazon copies. It's irrelevant. What's really going on is that two of the largest commerce companies in the world now have voice-first platforms. If you've been sleeping through this revolution, it's time to wake up. And if you're a brand and you're selling something, really wake up and say "Hold it. Voice commerce is happening."
Brian Roemmele: [01:05:26] I know all the existential debates people have - "I can't see it, I can't buy it" - yeah, that's the same debate I had in 1994 with people saying "I can't buy on the web because I can't touch the clothes." And we know where that went. Amazon dominated the web because they knew they could sell things ephemerally, things that people don't touch. Sound familiar? Yeah, that's called voice commerce today, and that's why it's hard for people to understand. If they understood it, they'd already be on the train, taking advantage of this ride, because the ride is going to lift a whole lot of people.
Brian Roemmele: [01:06:03] So Alibaba, with the Tmall Genie, is creating an open platform. It's a far-field device with seven microphones, a really robust processor, and a deep speaker with deep movement; I think it's got almost two inches of movement, so it's going to move a lot of sound. It has voice-print technology, which I mentioned before. It's very important to understand why that is big in China: a lot of people there are very paranoid about their roommates being able to order things without their permission. Alibaba knew that, and they built it into their system as an a priori. This is going to be released at $75 for the first 1,000 units, with maybe a street price of $99. I'm predicting that within three years this will be one of the largest voice-first platforms on the planet, just because of how big Alibaba is and how aggressively they're going to move it.
Ahmed Bouzid: [01:07:05] Yeah. And just to piggyback on that, I think the way the market has evolved in China is very instructive in the following sense: that what they took care of, the last multiple years, is the 'plumbing.' Emergence of people being able to text anyone or any little business, chicken or wood or buying of food. And that infrastructure is what is now enabling Alibaba to build on top of. So they are way way ahead of the whatever chatbots that we have right now. It's interesting that the chatbots - let's call it the chatbot space - has been tackled from the UI perspective in the United States. Basically people are thinking about how does one talk to a robot, and so on and so forth, and how do we make the conversation natural? As opposed to what happened in China, which is they didn't care too much about the the interface. What they cared about is the plumbing with the interface being between a human and a human. Once you have that interface then you can add on top of that and the artificial layer of somebody talking to a non-human. And that's why I think we have something formidable to look at with Alibaba, and the fact that now that you have this plumbing on top of that voice interface where you can actually do useful things like buy something from, I don't know, the down-the-street grocer and you just make the order and they can deliver to you, you can pick it up, and there - now you have an actual value-based interaction through voice that has an infrastructure that has been being built for multiple years before that, to this point where we are today.
Mark Tucker: [01:08:54] What's interesting about this is it's not really an Amazon competition. Alibaba's not really going up against Amazon, because if you look in China, Alibaba's got 57 percent of the share, and the next largest retailer has 25 percent. And if you look at the top three internet giants, two out of the three already had some sort of a voice device. So this is actually Alibaba doing a little bit of catch-up. But excitingly, they have a developer SDK, so it sounds like it's gonna be extensible.
Brian Roemmele: [01:09:27] One of the fundamental things that I think - and Ahmed brought this home really brilliantly - is that Ali really built the plumbing in a way that nobody else around the world has done. And on top of that, they added Alipay, and it's so important to understand how voice payments and voice commerce are going to interact with each other. There is no credit card in the voice-first world. So the selection of payment, and how you remunerate various components inside this new paradigm, is going to become absolutely stunning. And I think Amazon is in a unique position in that they've built Amazon Pay. I think Amazon Pay is going to become fundamentally important, not just for voice payments, but for web payments, and it's going to be quite a disruptive thing. That one-click experience is what they're trying to bring around the world.
Mark Tucker: [01:10:27] Yeah, I just wanted to say really quickly...I was really new to this topic, and I want to do a shout-out to Bret and Eva at voicebot.ai. They have a good article today, and yesterday, on this Alibaba subject.
Brian Roemmele: [01:10:42] I also wrote an article on ReadMultiplex.com, you know, kind of covering some of the deeper details. And voicebot.ai is great - great folks over there. The thing that we have to understand though, before we put a head on this, is 450 million daily users. You've just got to think about that mass scale, and that is growing, you know, regularly. I believe that probably, by the end of the year, we're going to be approaching almost a billion users by the growth structure that they have. It's a massive platform.
Ahmed Bouzid: [01:11:15] And also massive data, which will take their voice recognition to a whole new level. Perhaps the audience doesn't know this, but the speech recognition problem in Mandarin and Cantonese is a lot easier than it is for English or French or all the other languages. So on top of that, you have this layer of billions, probably an order of magnitude more than what Amazon is able to collect, and then you probably will have a superhuman system, meaning a system that is able to understand human speech better than a human being can, which takes us to a whole new level of amazingness, as far as technology is concerned.
Bradley Metrock: [01:12:07] Excellent. So I'm calling an audible for the last question. We had a seventh question selected - it involves Adobe and stuff they're doing with analytics. If you want to read about that, if you're listening, feel free - it's on ThisWeekInVoice.com.
Bradley Metrock: [01:12:21] I want to conclude this podcast, which has just been phenomenal - this is just going to be a fun show every week - with each of the three of you just simply stating what you think the most important story or most important aspect or angle on voice technology right now is, starting with you, Ahmed.
Ahmed Bouzid: [01:12:42] So for me right now the biggest thing in voice, or the issue that I think we need to solve, is to get off the notion that what we need are more developers...and to get into the world of more designers and product managers and so forth. Meaning, I would love it if we started thinking about how to make sure that people understand that a voice experience is a complex thing to build and design for, as opposed to just coding up skills. That's the direction I would love for us to move in.
Bradley Metrock: [01:13:18] OK Mark, what's your biggest news story or the biggest thing in voice to you right now?
Mark Tucker: [01:13:26] The thing that I'm focusing on right now is just evangelizing voice-first. Phoenix is a big enough market - it's a top city in the United States - but there is a lot to learn. There's not a lot of people that know much about voice-first. We had a great meetup last month where we had UX people coming in, and they were asking questions about "what does it mean to be a voice user interface designer?" So I'm trying to figure out how to take what I've learned over the last 10 months, and even over the last seven months of doing this meetup, and how I can encourage other people to start meetups in their own communities, just so that this knowledge gets out there to developers, to designers, to entrepreneurs, academics, enthusiasts, anybody...because the people that I'm talking with today are going to be those that are doing it, pushing the voice-first revolution tomorrow.
Bradley Metrock: [01:14:28] Very cool. And Brian, you've got such interesting and diverse...such an interesting vantage point on all of this. What is your biggest story right now in voice?
Brian Roemmele: [01:14:40] Well, thank you, Bradley. You know, in Multiplex magazine, I've been trying to slowly introduce the world to some of the ideas. I mean, I've been thinking about this since the 1980s, and I wrote a manifesto about it, and you know, slowly but surely I'm getting some of these protocols out there. And the very latest protocol is really based upon how Broca discovered how the human mind really works. A lot of people think this is very esoteric, but back in the early days, when we were using the command lines of DOS and Unix and CP/M, we might have thought graphic design was an esoteric study, because the computer would never ever possibly display graphics on the screen. That was the belief system.
Brian Roemmele: [01:15:23] And today, just like Ahmed was saying, the mechanics of actually building skills - it's beautiful, and I want to evangelize that. But what we need is to draw in people with psychological backgrounds, with anthropology backgrounds, people who are lyricists, people who are poets. We're going to be talking to these devices through the same mechanism that we've used to interact with other human beings. Artificial understanding is not about trying to make the computer become a human. What it's about is trying to have the computer understand the human in a much better sense. And so I'm really promoting it, since I've finally released this term - I'd been holding it back for a while - and I think it's time that we start talking about it. This whole "artificial understanding" is about how a human brain comes to a thought. How does that thought become a word, or a sentence, or an idea, you know, transmitted to another person? And if the computer can understand that, if our machine learning and artificial intelligence can understand that better, our throughput in interacting with these devices will increase by an order of magnitude. There won't be a need for invocation. There won't be a need for all these other types of things. What will happen is we'll have a rich and diverse interaction with a personal assistant, and what I'm really talking about is moving away from Q&A - Question: what's the weather? Answer: it's going to rain - to dialogues. And a dialogue means that it needs to know more context about you. And when it needs to know more context, it becomes much more invasive in your life. And there's a whole lot of elements that come with that. So my biggest thing is we've got to start with what I believe is ground zero: understanding the human brain. Understand that if we don't build these things around a sense of how humans truly interact with each other, we're wasting our time.
And that's one of the reasons why, fundamentally, chatbots have failed: they were single silos amongst themselves, with the developer trying to guess at every possible way that somebody is going to state something. That's ridiculous. It's redundant, and it's ridiculous. We're sort of still doing that right now with our voice-first development. We've got to sit there and think "what invocations will people use? How will they ask?" This is absolutely ridiculous, and it's a mechanical process. Technology exists today where we can kind of sidestep that. It's just not being implemented. And so the revolution continues. But I think the real revolution is going to be in artificial understanding, and in how we wind up really becoming a reflection of ourselves in our technology in a very meaningful way.
Ahmed Bouzid: [01:18:06] Yeah, and one last item, just to amplify on all that Brian has said there: I think it's becoming clear, beyond the initial core circle on this, the people on the panel, and the community, that what we need right now is for people from the humanities to step into the fold.
Brian Roemmele: [01:18:33] Absolutely.
Ahmed Bouzid: [01:18:35] Exactly. And so there is an article...I want to point out an article that Harvard Business Review published in its latest issue, July/August, and the title of the article is "Liberal arts majors are the future of the tech industry." I believe that one hundred percent. I think we need people who think about the full complexity of human existence, and how we go about living our lives, as opposed to 'push button and something happens,' which is more of an engineering kind of world view. So read that article - it's fascinating - and it's great that it's now coming into the mainstream consciousness that we need people who think about humanity, on top, obviously, of the entrepreneurs and engineers and technologists out there.
Brian Roemmele: [01:19:27] You know, I would cap it off like this. If he were still alive, Steve Jobs would, well, I know as a fact he would be championing voice to a level that we've never seen before. One of the reasons...one of his last acts as an executive at Apple was to acquire Siri. And what Steve gave us is the smartphone and the graphic user interface, and that was at the crossroads of engineering, mechanical arts, computer science, and liberal arts. And we have drifted away from that quite a bit. We've gotten back into the mechanized way of looking at things, and that article that Ahmed pointed out is beautifully written, very well-documented, and you know, a lot of us engineers...I come from an engineering background, but I obviously bifurcated a little bit over my life. You know, we really need to understand what this technology is really going to do in our lives. Why are we heads down? It breaks my heart to see teenagers walking down the street, especially in San Francisco, for maybe a mile and never lifting their heads up. And wishing that there was a camera to show them that they're in a crosswalk, so they still don't need to lift their heads up. You know what I'm saying? That's not what we evolved for. So I always like to balance everything out. It's like, we're gung-ho about this technology, but like Ahmed said, it's got to be humanistic. It's got to be able to make us greater, not make us a slave.
Bradley Metrock: [01:21:00] Gentlemen, thank you very much for setting the time aside. This was phenomenal. Thank you to everyone listening to this. We will aim each week to bring you the most relevant, most insightful news of the week. So guys, thank you very much for being part of our first-ever podcast!
Ahmed Bouzid: [01:21:23] Thank you so much.
Brian Roemmele: [01:21:24] Thank you very much.
Bradley Metrock: [01:21:27] Absolutely. So for the first episode of This Week In Voice, thank you for listening. And until next time.