(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2015/0138333 A1
DeVaul et al.
(43) Pub. Date: May 21, 2015

(54) AGENT INTERFACES FOR INTERACTIVE ELECTRONICS THAT SUPPORT SOCIAL CUES
(75) Inventors: Richard Wayne DeVaul, Mountain View, CA (US); Daniel Aminzade, Mountain View, CA (US)
(73) Assignee: Google Inc., Mountain View, CA (US)
(21) Appl. No.: 13/407,159
(22) Filed: Feb. 28, 2012

Publication Classification
(51) Int. Cl.: G06F 3/01 (2006.01); G06F 3/16 (2006.01); H04N 5/225 (2006.01); G06K 9/00 (2006.01)
(52) U.S. Cl.: CPC G06F 3/013 (2013.01); H04N 5/2257 (2013.01); G06F 3/16 (2013.01); G06K 9/00288 (2013.01)

(57) ABSTRACT
An anthropomorphic device, perhaps in the form factor of a doll or toy, may be configured to control one or more media devices. Upon reception or detection of a social cue, such as movement and/or a spoken word or phrase, the anthropomorphic device may aim its gaze at the source of the social cue. In response to receiving a voice command, the anthropomorphic device may interpret the voice command and map it to a media device command. Then, the anthropomorphic device may transmit the media device command to a media device, instructing the media device to change state.

[Front-page drawing: anthropomorphic device 104 with microphone(s), camera(s), and speaker.]

[FIG. 1 (Sheet 1 of 8): distributed computing architecture including anthropomorphic devices, server device 110 with server data storage, television 105, and stereo system 107.]
[FIG. 2A (Sheet 2 of 8): server device 200, including a user interface, processor 206, and data storage 208.]
[FIG. 2B (Sheet 2 of 8): cloud-based server system of server clusters, each with server devices 200a-200c, cluster data storage 222a-222c, and cluster routers 224a-224c, connected to network 108.]
[FIG. 3A (Sheet 3 of 8): anthropomorphic device hardware and software, including a communication interface, user interface, processor, and data storage holding programs, applications, and application data.]
[FIG. 3B (Sheet 4 of 8): example form factors of anthropomorphic devices, including microphone(s).]
[FIG. 4 (Sheet 5 of 8): message flow among a user, an anthropomorphic device, and a media device: the camera aims at the user, the user gives a voice command, the device interprets the voice command, and a media device command is sent to the media device.]
[FIG. 5 (Sheet 6 of 8): another message flow: the user gives a voice activation command, the camera aims at the user, the user gives a voice command, the device determines that both commands come from the same user, interprets the voice command, and sends a media device command to the media device.]
[FIG. 6 (Sheet 7 of 8): flow chart 600: detect a social cue, namely the camera of the anthropomorphic device (which includes a camera and a microphone) detecting a gaze directed toward the device; aim the camera and the microphone based on the direction of the gaze; while the gaze is directed toward the device, receive an audio signal via the microphone; based on receiving the audio signal while the gaze is directed toward the device, (i) transmit a media device command, based on the audio signal, to a media device and (ii) provide an acknowledgement of the audio signal.]
[FIG. 7 (Sheet 8 of 8): flow chart 700: detect a first audio signal via the microphone array of the anthropomorphic device (which includes a camera and a microphone array); determine that the first audio signal encodes at least one pre-determined activation keyword; in response, (i) process the first audio signal to determine its source direction and (ii) aim the camera at that direction; while the camera is so aimed, receive a second audio signal via the microphone array; based on input from the camera and/or the second audio signal, determine that the two audio signals are from a common source; in response, (i) transmit a media device command, based on the second audio signal, to a media device and (ii) provide an acknowledgement of the second audio signal.]

AGENT INTERFACES FOR INTERACTIVE ELECTRONICS THAT SUPPORT SOCIAL CUES

BACKGROUND

[0001] With the rise of Internet Protocol (IP) based networking, the use of media technologies continues to expand and diversify. Modern televisions, digital video recorders (DVRs), Digital Video Disc (DVD) players, stereo components, home automation components, MP3 players, cell phones, and other devices can now communicate with one another via IP.
This advent, in turn, has brought about dramatic changes in how these media devices are used.

SUMMARY

[0002] In an example embodiment, an anthropomorphic device may detect a social cue. The anthropomorphic device may include a camera and a microphone, and detecting the social cue may comprise the camera detecting a gaze directed toward the anthropomorphic device. The anthropomorphic device may aim the camera and the microphone based on the direction of the gaze. While the gaze is directed toward the anthropomorphic device, the anthropomorphic device may receive an audio signal via the microphone. Based on receiving the audio signal while the gaze is directed toward the anthropomorphic device, the anthropomorphic device may (i) transmit a media device command to a media device, and (ii) provide an acknowledgement of the audio signal. The media device command may be based on the audio signal.

[0003] A further example embodiment may involve an article of manufacture including a non-transitory computer-readable medium. The computer-readable medium may have stored thereon program instructions that, upon execution by an anthropomorphic computing device, cause the anthropomorphic computing device to perform operations. These operations may include detecting a social cue at the anthropomorphic computing device, wherein the anthropomorphic computing device includes a camera and a microphone, and wherein detecting the social cue comprises the camera detecting a gaze directed toward the anthropomorphic computing device. The operations may also include aiming the camera and the microphone based on the direction of the gaze, and, while the gaze is directed toward the anthropomorphic computing device, receiving an audio signal via the microphone.
Additionally, the operations may include, based on receiving the audio signal while the gaze is directed toward the anthropomorphic computing device, (i) transmitting a media device command to a media device, and (ii) providing an acknowledgement of the audio signal, wherein the media device command is based on the audio signal.

[0004] Another example embodiment may involve an anthropomorphic device comprising a camera, a microphone, and a processor. The anthropomorphic device may also include data storage containing program instructions that, upon execution by the processor, cause the anthropomorphic device to (i) detect a social cue, wherein detecting the social cue comprises the camera detecting a gaze directed toward the anthropomorphic device, (ii) direct the camera and the microphone based on the direction of the gaze, (iii) while the gaze is directed toward the anthropomorphic device, receive an audio signal via the microphone, and (iv) based on receiving the audio signal while the gaze is directed toward the anthropomorphic device, (a) transmit a media device command to a media device and (b) provide an acknowledgement of the audio signal, wherein the media device command is based on the audio signal.

[0005] In still another example embodiment, an anthropomorphic device may detect a first audio signal. The anthropomorphic device may include a camera and a microphone array, and detecting the first audio signal may comprise the microphone array detecting the first audio signal.
The anthropomorphic device may determine that the first audio signal encodes at least one pre-determined activation keyword. In response to determining that the first audio signal encodes the at least one pre-determined activation keyword, the anthropomorphic device may (i) process the first audio signal to determine a source direction of the first audio signal, and (ii) aim the camera at the source direction of the first audio signal. While the camera is aimed at the source direction of the first audio signal, the anthropomorphic device may receive a second audio signal via the microphone array. Based on at least one of input from the camera and the second audio signal, the anthropomorphic device may determine that the first audio signal and the second audio signal are from a common source. In response to determining that the first audio signal and the second audio signal are from the common source, the anthropomorphic device may (i) transmit a media device command to a media device, and (ii) provide an acknowledgement of the second audio signal. The media device command may be based on the second audio signal.

[0006] These as well as other aspects, advantages, and alternatives will become apparent to those of ordinary skill in the art by reading the following detailed description with reference where appropriate to the accompanying drawings. Further, it should be understood that the description provided in this summary section and elsewhere in this document is intended to illustrate the claimed subject matter by way of example and not by way of limitation.

BRIEF DESCRIPTION OF THE FIGURES

[0007] FIG. 1 depicts a distributed computing architecture including anthropomorphic devices, in accordance with an example embodiment.
[0008] FIG. 2A is a block diagram of a server device, in accordance with an example embodiment.
[0009] FIG. 2B depicts a cloud-based server system, in accordance with an example embodiment.
[0010] FIG. 3A depicts a block diagram of anthropomorphic device hardware and software, in accordance with an example embodiment.
[0011] FIG. 3B depicts example form factors of anthropomorphic devices, in accordance with example embodiments.
[0012] FIG. 4 is a message flow diagram, in accordance with an example embodiment.
[0013] FIG. 5 is another message flow diagram, in accordance with an example embodiment.
[0014] FIG. 6 is a flow chart, in accordance with an example embodiment.
[0015] FIG. 7 is another flow chart, in accordance with an example embodiment.

DETAILED DESCRIPTION

1. Overview

[0016] In the past, the vast majority of media consumed by users was based either on broadcasts that users had no direct control over, or on physical media that the users purchased or borrowed. Today, many users are eschewing broadcast and physical media in favor of on-demand media streaming or digital-only downloaded media. For example, movies can now be streamed on demand, over IP, to a television, DVR, DVD player, cell phone, or computer. Additionally, users may purchase and download media, and store it digitally on their computers. This media may be accessed either on that computer or via another device.

[0017] Consequently, in some homes, these various media devices may be integrated, either via wireless or wireline networks, into one or more home entertainment systems. However, with the greater flexibility and power of these new media technologies comes the possibility that some users might find using such systems to be too daunting or complex. For example, if a user wants to watch a movie, he or she may have to decide which device displays the movie (e.g., a television or computer), which device streams the movie (e.g., a television, DVR, or DVD player), and whether the movie is streamed from a local or remote source (e.g., from a home media server or an online streaming service). If the media is streamed from a remote source, the user may also need to decide which of several content providers to use.
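The decision space in [0017] can be made concrete. The following is a minimal Python sketch, offered as an illustration rather than as part of the disclosed embodiments: it enumerates the playback configurations named above (display device, streaming device, local or remote source, and content provider) and shows how a stored per-user preference profile, of the kind the anthropomorphic device is described as consulting, could collapse them into a single plan. The names `PlaybackPlan`, `all_plans`, and `resolve`, the profile keys, and the provider names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional

# Options named in the text; the provider names are hypothetical placeholders.
DISPLAYS = ["television", "computer"]
STREAMERS = ["television", "DVR", "DVD player"]
SOURCES = ["local", "remote"]
PROVIDERS = ["provider_a", "provider_b"]


@dataclass(frozen=True)
class PlaybackPlan:
    display: str
    streamer: str
    source: str
    provider: Optional[str]  # only set when the source is remote


def all_plans() -> List[PlaybackPlan]:
    """Enumerate every configuration a user might otherwise have to choose among."""
    plans = []
    for display, streamer, source in product(DISPLAYS, STREAMERS, SOURCES):
        if source == "remote":
            # A remote source adds the further choice of content provider.
            plans.extend(PlaybackPlan(display, streamer, source, p) for p in PROVIDERS)
        else:
            plans.append(PlaybackPlan(display, streamer, source, None))
    return plans


def resolve(profile: Dict[str, str]) -> PlaybackPlan:
    """Pick one plan from stored preferences (hypothetical keys), else defaults."""
    source = profile.get("source", "local")
    provider = profile.get("provider", PROVIDERS[0]) if source == "remote" else None
    return PlaybackPlan(
        display=profile.get("display", DISPLAYS[0]),
        streamer=profile.get("streamer", STREAMERS[0]),
        source=source,
        provider=provider,
    )


if __name__ == "__main__":
    print(len(all_plans()))  # 18
    print(resolve({"display": "computer", "source": "remote"}))
```

With only the options listed in [0017], the user already faces 18 combinations; a profile lookup is one way a single spoken request could be mapped to exactly one of them. The sketch does not check profile values against the devices actually present, which a practical controller would likely need to do.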
[0018] Further, in recent years the use of home automation systems has also proliferated. These systems allow the centralized control of lighting, HVAC (heating, ventilation, and air conditioning), appliances, and/or window curtains and shades of residential, business, or commercial properties. Thus, from one location, a user can turn the property's lights on or off, change the property's thermostat settings, and so on. Further, the components of a home automation system may communicate with one another via, for example, IP and/or various wireless technologies. Some home automation systems support remote access so that the user can program and/or adjust the system's parameters from a remote control or from a computing device.

[0019] Thus, it may be desirable to simplify the management and control of a variety of media devices that may comprise a home entertainment system or a home automation system. However, the embodiments disclosed herein are also applicable to other types of media devices used in other environments. For example, office communication and productivity tools, including but not limited to audio and video conferencing systems, as well as document sharing systems, may benefit from these embodiments. Also, the term "media device" is used herein for the sake of convenience. It should be interpreted generically, to refer to any type of device that can be controlled. Thus, a media device may be a home entertainment device that plays media, a home automation device that controls the environmental aspects of a location, or some other type of device.

[0020] A function typically intended to simplify management and control of media devices is remote control. Particularly, the diversity of media devices has led to the popularity of so-called "universal" remote controls that can be programmed to control virtually any media device.
Typically, these remote controls use line-of-sight infrared signaling. More recently, media devices that are capable of being controlled via other wireless technologies, such as Wifi or BLUETOOTH, have become available.

[0021] Regardless of the wireless technology supported, remote controls, especially universal remote controls, generally have a large number of buttons, and it is not always clear which remote control button affects a given media device function. Thus, modern remote controls often add to, rather than reduce, the complexity of home entertainment and home automation systems.

[0022] One possible way of mitigating this complexity is to have a remote control that responds to voice commands and/or social cues. However, there are challenges with getting such a mechanism to operate in a robust fashion. Particularly, the remote control may not be able to determine whether an audio signal that it receives is a voice command or background noise. For instance, in a noisy room, the remote control might not be able to properly recognize voice commands. Further, some individuals may find it intuitive to communicate with a remote control in a way that simulates human interaction.

[0023] Some aspects of the embodiments disclosed herein address controlling multiple media devices in a robust and intuitive fashion. For example, an anthropomorphic device may serve as an intelligent remote control. The anthropomorphic device may be a computing device with a form factor that includes human-like characteristics. For example, the anthropomorphic device may be a doll or toy that resembles a human, an animal, a mythical creature, or an inanimate object. The anthropomorphic device may have a head (or a body part resembling a head) with objects representing eyes, ears, and a mouth. The head may also contain a camera, microphone, and/or speaker that correspond to the eyes, ears, and mouth, respectively.

[0024] Additionally, the anthropomorphic device may respond to social cues.
For instance, upon detecting the presence of a user, the anthropomorphic device may adjust the position of its head and/or eyes to simulate looking at the user. By making "eye contact" with the user, the anthropomorphic device presents the user with a familiar form of social interaction in which two parties look at each other while communicating.

[0025] If the user speaks a command while gazing back at the anthropomorphic device, the anthropomorphic device may access a profile of the user to determine, based on the user's preferences encoded in the profile, how to interpret the command. The anthropomorphic device may also access a remote, cloud-based server to access the profile and/or to assist in determining how to interpret the command. Then, the anthropomorphic device may control one or more media devices, perhaps through Wifi, BLUETOOTH, infrared, or some other wireless or wireline technology. In response to accepting the command, the anthropomorphic device may make an audio (e.g., a spoken phrase or particular sound) or non-audio (e.g., a gesture and/or another visual signal) acknowledgement to the user.

[0026] In other embodiments, the anthropomorphic device may respond to verbal social cues. For example, the anthropomorphic device might have a "name," and the user might address the anthropomorphic device by its name. In response to "hearing" its name, the anthropomorphic device may then engage in eye contact with the user in order to receive further input from the user.

2. Communication System and Device Architecture

[0027] The methods, devices, and systems described herein can be implemented using so-called "thin clients" and "cloud-based" server devices, as well as other types of client and server devices.
Under various aspects of this paradigm, client devices (e.g., anthropomorphic devices) may offload some processing and storage responsibilities to remote server devices. At least some of the time, these client devices are able to communicate, via a network such as the Internet, with the server devices. As a result, applications that operate on the client devices may also have a persistent, server-based component. Nonetheless, it should be noted that at least some of the methods, processes, and techniques disclosed herein may be able to operate entirely on a client device or a server device.

[0028] In the embodiments herein, anthropomorphic devices may include client device functions. Thus, the anthropomorphic devices may include one or more communication interfaces, with which the anthropomorphic devices communicate with one or more server devices to carry out anthropomorphic device functions. For the sake of convenience, throughout this section, anthropomorphic devices may be referred to generically as "client devices" and may have hardware and software components similar to those of other types of client devices.

[0029] This section describes general system and device architectures for both client devices and server devices. However, the methods, devices, and systems presented in the subsequent sections may operate under different paradigms as well. Thus, the embodiments of this section are merely examples of how these methods, devices, and systems can be enabled.

[0030] A. Communication System

[0031] FIG. 1 is a simplified block diagram of a communication system 100, in which various embodiments described herein can be employed. Communication system 100 includes client devices 102, 104, and 106, which represent a desktop personal computer (PC), an anthropomorphic device in the shape of a rabbit, and an anthropomorphic device in the shape of a teddy bear, respectively.
Each of these client devices may be able to communicate with other devices via a network 108 through the use of wireline or wireless connections.

[0032] Client device 102 may be a general purpose computer that can be used to carry out computing tasks and may communicate with other devices in FIG. 1. Anthropomorphic device 104 may be based on general purpose computing technology and may be able to communicate with and/or control television 105. Anthropomorphic device 106 may also be based on general purpose computing technology, and may be able to communicate with and/or control stereo system 107.

[0033] Devices that display and/or play media, such as television 105 and stereo system 107, may be referred to as media devices. Other types of media devices include DVRs, DVD players, Internet appliances, and general purpose and special purpose computers. However, as noted above, "media device" is a generic term also encompassing home automation components and other types of devices.

[0034] In some possible embodiments, client devices 102, 104, and 106 and media devices 105 and 107 may be physically located in a single residential or business location. For example, client devices 102 and 104, as well as media device 105, may be located in one room of a residence, while client