- There are over 100 million phones that can tell whether you're using your knuckle or your finger to touch the screen, as well as whether you're lifting the device to your ear. They're examples of projects that started here at the Future Interfaces Group lab at Carnegie Mellon University in Pittsburgh, Pennsylvania. The lab has been around since 2014 and counts Google, Intel, and Qualcomm among its sponsors. Every year they develop hundreds of speculative ideas, all to do with how we communicate with machines beyond the keyboard, touchscreen, mouse, or even voice. We came here to see some of their latest ideas and what they might have to say about the future of human-computer interaction.

CHRIS HARRISON: I came to CMU as faculty about five years ago and founded the Future Interfaces Group. We set up shop in this building a little off campus so we'd have lots of space to build crazy prototypes and put things together. I wanted to build on my PhD thesis research, which looked at how to use the human body as an interactive computing surface.
And so we extended a lot of those themes, and I took on master's students, undergraduates, and PhD researchers to extend that vision and help them explore new frontiers in human-computer interaction. A grand vision that the whole lab has bought into is the notion of intelligent environments. Right now, if you have a Google Home or an Alexa or one of these smart assistants sitting on your kitchen countertop, it's totally oblivious to what's going on around it. And that's true of your smartwatch, and that's true of your smartphone. If we want to make them truly assistive, so that they can fill in a lot of context like a good human assistant would, they need to have that awareness.

GIERAD LAPUT: When humans communicate, there are these verbal and nonverbal cues that we use, like gaze and gesture and all these different things, to enrich the conversation. In human-computer interaction, you don't really have that. A lot of my current work is about increasing implicit input bandwidth.
What I mean by that is increasing the ability of these devices to have contextual understanding of what's happening around them.

CHRIS HARRISON: A good example of this is sound. We have a project called Ubicoustics that listens to the environment and tries to guess what's going on. If I teleported you into my kitchen but blindfolded you, and I started blending something or chopping vegetables, you'd be able to tell that Chris is chopping vegetables, or running the blender, or turning on the stove, or running the microwave. So we asked ourselves: if sound is so distinctive that humans can do this, can't we train computers to use the microphones that almost all of them already have? Whether it's a smart speaker or even a smartwatch, you have all these sensors that other people have created at your disposal. The question is, how do you put them together in a low-cost and practical way?

- You have 12 messages and a meeting in 12 minutes.

GIERAD LAPUT: I think of smartwatches as really capable computers.
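The sound-recognition idea Chris describes, guessing an activity from audio alone, can be sketched minimally. Ubicoustics itself relies on a trained deep model; the toy below is not the lab's code — all signals and names are illustrative stand-ins — and simply featurizes a clip with a coarse log spectrum, then picks the nearest labeled example:

```python
import numpy as np

def spectral_features(waveform, n_bands=16):
    """Coarse log-magnitude spectrum: a crude stand-in for the richer
    spectrogram features a real sound classifier would use."""
    spectrum = np.abs(np.fft.rfft(waveform))
    bands = np.array_split(spectrum, n_bands)
    return np.log1p(np.array([band.mean() for band in bands]))

def classify(waveform, centroids):
    """Return the label whose example feature is nearest to the clip's."""
    feat = spectral_features(waveform)
    return min(centroids, key=lambda label: np.linalg.norm(feat - centroids[label]))

# Synthetic stand-ins for real kitchen recordings (1 second at 8 kHz).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000, endpoint=False)
blender = np.sin(2 * np.pi * 300 * t)                 # steady low hum
chopping = rng.normal(size=8000) * (t % 0.25 < 0.02)  # sparse broadband hits

centroids = {name: spectral_features(sig)
             for name, sig in [("blender", blender), ("chopping", chopping)]}

# A new hum near 300 Hz lands closest to the blender example.
print(classify(np.sin(2 * np.pi * 310 * t), centroids))  # → blender
```

A production system would swap the hand-rolled features and nearest-centroid match for a learned model, but the pipeline shape — featurize, compare against labeled sounds — is the same.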
They should be able to almost transform the hand into an arm 2.0, as opposed to being just extensions of the phone. Typically, the accelerometers in a watch run at around 100 hertz. So what we did here is overclock the accelerometer on the watch so that it becomes high speed. You can see that when I interact with this coffee grinder, you can actually see the micro-vibrations propagating from my hand to the watch. You can't see that effect with the 100-hertz accelerometer because it's too coarse. The vibrations when I tap here and when I tap here are actually quite different, so I can basically transform the area around the watch into an input platform. You can also combine this with motion data. So when I snap, I can snap to turn on the lights, then do this gesture and twist to adjust the lighting in the house. And I can do a clap gesture to turn on the TV, and gestures like these to navigate up and down.

- These are only a few of the hundreds of ideas that pop up at the lab every year. A couple of them turn into real startups.
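Gierad's point that 100 hertz is "too coarse" comes down to the Nyquist limit: a sensor sampling at 100 Hz can only represent vibrations below 50 Hz, while the micro-vibrations of motors and taps live far above that. A small sketch — the 4 kHz "overclocked" rate and the 730/1340 Hz vibration frequencies here are illustrative assumptions, not the lab's numbers:

```python
import numpy as np

FS_HIGH = 4000  # assumed "overclocked" accelerometer rate, in Hz
FS_LOW = 100    # typical smartwatch accelerometer rate

def dominant_freq(signal, fs):
    """Frequency (Hz) of the strongest spectral component."""
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum[0] = 0.0                # ignore any DC offset
    bin_width = fs / len(signal)     # Hz per FFT bin
    return np.argmax(spectrum) * bin_width

# One second of fake micro-vibration from two different objects.
t = np.arange(4000) / FS_HIGH
grinder = np.sin(2 * np.pi * 730 * t)   # pretend 730 Hz motor vibration
drill = np.sin(2 * np.pi * 1340 * t)    # pretend 1340 Hz motor vibration

# At 4 kHz the two objects are trivially separable by their spectra.
print(dominant_freq(grinder, FS_HIGH), dominant_freq(drill, FS_HIGH))  # → 730.0 1340.0

# A 100 Hz accelerometer (every 40th sample) can't represent either:
# both vibrations sit far above its 50 Hz Nyquist limit, so the
# 730 Hz hum aliases down and masquerades as a meaningless 30 Hz.
grinder_low = grinder[::FS_HIGH // FS_LOW]
print(dominant_freq(grinder_low, FS_LOW))  # → 30.0
```

At the high sample rate the objects' vibration signatures are distinct; at the stock rate the signal is aliased beyond recovery, which is why the demo needs the faster sensor.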
One such startup is Qeexo, which is behind the touchscreen technology we saw at the beginning. Another, newer one is a computer-vision startup called Zensors.

CHRIS HARRISON: One of the technologies we developed for smart environments was a camera-based approach. We noticed that in a lot of settings, like restaurants or libraries or airports or even out in the street, there are a lot of cameras these days. And we asked: could we turn these into a sensor feed, so you don't need someone in a back room staring at 50 screens, but can somehow make that feed actionable? That's why we built Zensors. Here's an example of how we create a question. We have a camera; it's actually right above us, and you can see us here right now. It updates once every 30 seconds or once every minute. The first thing we do is select a region of interest, in this case these two sofas. It's going to be, let's say, a "how many" question. So now I'm literally just going to ask: how many people are here? That's it.
And right now it's saying there are three people here. We're not limited to these sofas. I could ask: is there a laptop or phone on this table? Is there food on this table? Anything you can ask, you can do. I think the motto of the company is: if you can see it, we can sense it.

We're doing a real-time parking pilot with the city right now, using existing cameras along a stretch of road to basically count cars. We can use that as a real-time model, potentially for real-time parking, but also just to help people find parking spots. If you can direct them to adjacent parking, it can be much more efficient and reduce congestion, air pollution, and so on. Deploying that sort of technology at city scale normally requires a huge capital investment. At the end it's a number; it doesn't matter whether it's produced by a video camera or by physical sensors in the pavement. For technologies to be adopted downstream, past the research phase into the engineering and commercialization phase, they have to be practical.
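The parking pilot Chris describes reduces to asking a "how many" question of fixed camera regions. A real deployment would answer with trained vision models; this toy sketch — all names, regions, and thresholds are hypothetical — just flags a spot as occupied when its pixels differ from a reference frame of the empty lot:

```python
import numpy as np

def count_occupied(frame, empty_frame, spots, threshold=30.0):
    """Count parking spots whose pixels differ enough from an
    empty-lot baseline frame. Each spot is a (row0, row1, col0, col1)
    region of interest in the camera image."""
    occupied = 0
    for (r0, r1, c0, c1) in spots:
        diff = np.abs(frame[r0:r1, c0:c1].astype(float)
                      - empty_frame[r0:r1, c0:c1].astype(float))
        if diff.mean() > threshold:
            occupied += 1
    return occupied

empty = np.zeros((100, 300), dtype=np.uint8)  # fake frame of the empty lot
frame = empty.copy()
frame[40:60, 10:50] = 200                     # a "car" parked in spot 0
frame[40:60, 210:250] = 180                   # a "car" parked in spot 2

spots = [(40, 60, 0, 100), (40, 60, 100, 200), (40, 60, 200, 300)]
print(count_occupied(frame, empty, spots))    # → 2
```

Background subtraction like this breaks under lighting changes and shadows, which is exactly why the production system leans on learned models; the point is only that the existing camera feed, plus a region and a question, yields the same number a pavement sensor would.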
Feasibility is obviously critical. We like to tackle problems we know we can make progress on, and we balance that with impact and value.

- The research is undoubtedly exciting. But what happens when a security camera doesn't just see but understands? Any technology can be misused. What happens to an idea after it leaves the lab?

CHRIS HARRISON: It is a gray area, sort of like cars. You're never going to make the 100% safe car, but that doesn't mean we should eliminate all cars. We should think about that for technology too: no technology is ever going to be 100% secure or 100% privacy-preserving. So we always try to think about how to design these technologies to make the right trade-offs. Because we have a vision of how they're going to exist, we can think, oh, this would be so cool if I had it in my kitchen. But we're too close to that domain; we think everything is cool. So all of the technologies we build are put in front of users.
And if you can get people to buy into the vision, then maybe they'll accept that, oh, there's a microphone on this thing that could be listening to me in my kitchen. If you get that value proposition right, they'll accept it. If you get it wrong, the technology will just falter and won't be adopted.

[MUSIC PLAYING]