Video: The future of generative AI

Dec 4, 2024 |RapidScale |21 Minute Read

Artificial intelligence has transformed from a simple tool to a powerhouse of creativity and innovation, with generative AI (GenAI) leading the charge. But what does the future of generative AI look like, exactly?

In this video, David Lee (AVP, AI Center of Excellence at Cox Communications) explores the pivotal moments shaping AI's evolution. From the rapid advancements in large language models (LLMs) like GPT-4 to the fascinating concept of latent space, you’ll uncover the mechanisms behind AI's increasing capabilities.

Finally, you’ll dive into AI’s broader impact: reshaping human knowledge, redefining the future of work, and edging closer to the realm of artificial general intelligence. Dive in to discover how this technology is not just evolving—it’s transforming our world.

[video src="https://player.vimeo.com/video/1036112707" aspect_ratio="1/1"]

Transcript of Video: The Future of Generative AI

What I wanted to spend some time talking to you guys today about was my team. We've been building applications with generative AI and large language models for almost two years at this point.

And so we feel like we're pretty far along the learning journey. The first thing I'd like to start with is this idea that the world did change when ChatGPT came out. It's catapulted us into this world of generative AI. But I think it's important for people to separate the world of predictive AI versus generative AI, right?

So predictive AI basically started around 2012. It's the creation of neural networks that allows us to make predictions on what videos people will watch, what products people will buy, what movies you will like. All of that stuff left the lab and then became products from 2012 to 18-ish.

And then around 2017, a team at Google invented a thing called the Transformer, and then that changed the world in terms of AI's ability to create brand new things. When you hear this word AI, it really is both of these things together. There's this world of predictive stuff, and then this new world of generative stuff.

I'm not going to get too into details, but I think it's actually super important that everybody have a mental model on what this is. So this is a visual representation of a neural network. And basically speaking, this is how machines think. A guy named Geoffrey Hinton won the Nobel Prize in Physics.

And the reason why he won it is because his team figured out how to make this thing useful. It existed for about 10 years prior to that, but what they did is they said, I'm going to feed a bunch of pictures of cats into a neural network, and then I'm gonna give it a certain number of hidden layers and outputs.

This is five output nodes, and I think their model was just two, is this a cat or is this not a cat? And so they fed it 20,000 pictures of cats and the neural network set weights and biases so it could figure out what combinations of inputs would lead to it being a cat or not. And they created a thing called AlexNet, which could tell you this is a cat or this is not a cat.

That happened in 2012, and that's how this whole thing got started. But, this is 35 nodes on it. A cat, not cat image generator probably could be built on 35 nodes. But for reference, GPT-4 has 1.8 trillion nodes in it, right? So they have learned a ton of stuff and put it all into the neural networks.
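To make the weights-and-biases idea concrete, here is a minimal sketch of a tiny network's forward pass in plain Python. It is an illustration only, not AlexNet: the layer sizes, the sigmoid squashing function, and every number below are hypothetical and hand-picked rather than trained, and each node here holds one weight per incoming connection plus a single bias.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed into the range (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def forward(inputs, layers):
    """Pass inputs through each layer; a layer is a list of (weights, bias) pairs."""
    activations = inputs
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations

# Hypothetical, hand-picked numbers (a trained network would learn these).
hidden_layer = [([0.8, -0.4, 0.3], 0.1), ([-0.5, 0.9, 0.2], -0.2)]
output_layer = [([1.2, -0.7], 0.05)]

# Three made-up input features for one picture.
cat_score = forward([0.9, 0.1, 0.4], [hidden_layer, output_layer])[0]
print("cat" if cat_score > 0.5 else "not cat", round(cat_score, 3))
```

Training is the process of adjusting those weights and biases until the output matches the labels on the example pictures; the sketch only shows how a prediction is computed once the numbers are set.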

I actually got the chance to hear Geoffrey Hinton speak and it was really interesting because he was basically talking about these arguments that he gets into. So you guys have probably heard the criticism that these large language models are largely just stochastic parrots. They're really very good next word predictors.

And he was talking about the arguments that he gets in with everyone and how he explains it's just not really. And the one that really hit me was, like I said, 1.8 trillion nodes in GPT-4. Each of these things, not this layer, but everything else, is just two numbers. It's a weight and a bias of the combinations of things that came before it.

And there are no words in this neural network. It is not "let me go through the dictionary and pick out words." We have shoved billions and trillions of words into these models, and they've figured out patterns in these things, and you can put words in. It figures out what the meaning is, gives you an answer, and spits words out.

And that is the miracle of artificial intelligence, right? It is not just a next word predictor. What you're looking at here is a chart of how well the AIs did at various high school level standardized exams. This blue line, these are like AP Calculus, AP English, AP Chemistry, and the sort.

And the blue is how well GPT-3.5, about two years ago, performed on these exams. And you can see roughly, on average, it's about a 40th percentile performer. Best in class AI two years ago was about as good as a 40th percentile high school student.

This green line is how well it improved when it went to GPT-4, right? One of the interesting things here is from GPT-1, 2, 3, 4, the amount of data that was fed into it, the number of words, got bigger by 10 times. So 10 times more data went into GPT-2, and then 10 times more into 3, and 10 times more into 4.

And each time, the AI got about 30 percent smarter. And that has been a pretty surprising result. With most AI, it asymptotes. As you get a certain amount of data, it doesn't keep getting smarter at the same ratio. But the power law for large language models has not tapered out yet. Every time you give it 10 times more data, it's getting 30 percent smarter.
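The pattern of "10 times more data, about 30 percent smarter" can be written as a simple power-law relationship. The sketch below is only arithmetic on the talk's rough figure; the function name and the idea of a single "capability" number are assumptions for illustration, not a measured scaling law.

```python
import math

GAIN_PER_10X = 1.30  # the talk's rough figure: about 30 percent better per 10x data

def relative_capability(data_multiplier):
    """Capability relative to a baseline, if every 10x of data multiplies it by 1.3."""
    exponent = math.log10(GAIN_PER_10X)  # the implied power-law exponent
    return data_multiplier ** exponent

for multiplier in (1, 10, 100, 1000):
    print(multiplier, round(relative_capability(multiplier), 3))
```

Under this assumption, three successive 10x jumps compound to roughly 2.2 times the baseline, which is why each further gain requires so much more data.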

The problem the industry has run into, is that GPT-4 has read the entire internet. So until we make 9 more internets, we don't have 10 times more data to give it, perhaps. But from GPT-4, you get to these green bars, and then we're talking about an AI that can do about 80th percentile in US standardized testing scores.

The idea here that I want everybody to get grounded is that every time we give these AIs more data or new methods, they get about 30 percent smarter. OpenAI released a model called o1, and this thing is a leap forward. So what you're looking at here is all of the leading AI models and how they perform on IQ tests.

What you will see here with Claude 3.5, Claude 2, GPT-4o, is that they've been stacked up in this sort of 90 percentage point, 95 percentage point IQ. And o1 used a lot of synthetic data to get to that sort of fake next nine internets. They created a bunch of fake data, it trained on fake data, and then they also created this thing called Chain of Thought. What you'll notice is in all of the prior AIs, you ask it a question, it just starts answering you right away.

With o1, it does this thing called Chain of Thought, where it takes your question, answers it, asks itself some more questions, answers those questions, talks to itself for 5 to 25 seconds, and then starts to answer you. And so this chain of thought is what has led to this increase from a 90 IQ point to 120 IQ point over the course of 18 months.

So this idea that these AIs are gonna keep getting smarter for a bunch of different reasons is something very important for people to be grounded in. 120 is a pretty smart person. And if the next leap is to 150, we're getting into genius territory.

And that probably is less than 24 months away, if not sooner. So one of the ways we've been thinking about AI in our group is, and especially for business leaders who are non-technical, I asked them the following: What would you do if I could give you the world's smartest intern?

This intern speaks many languages. It knows every single topic on the internet. It's read everything on Reddit, everything on Wikipedia, everything on whatever. It's knowledgeable at every topic. It can answer every question in 30 seconds. It's unbelievably cheap. And then happily will take all feedback when you go, "Oh, you wrote that wrong, too generic. Please do this again." It doesn't get mad. It will just redo that 30 seconds worth of work. But also it has some drawbacks like an intern and since this AI is doing 40 million conversations an hour, probably more than that, it doesn't understand the context of what you are asking.

Literally in one hundredth of a second, it is helping you write a press release at the same time it's helping somebody else plan a vacation in Italy. And it is able to do all these things in hundredths of a second. But it doesn't understand the context of what you were talking about.

It doesn't do anything unless you explicitly ask. It requires a ton of guidance. And also it has no idea when it's wrong, right? It will answer you and sometimes it thinks it's right and sometimes it is not right. I'm gonna introduce another idea that I think is important.

It's called latent space, right? So I showed you that big neural network at the beginning. In that little 35 node one, it's not that big of a deal. You're going to have a couple of paths through it. But if I give you a 1.8 trillion node one, what ends up happening is there's a ton of ways through it, like an incalculably large number of ways through that model.

And that space is this idea called latent space. And so these AIs that have been trained on the entire internet have all of the good information and bad information in them at the same time. And if you just ask it a generic question, like what's the best way to get from point A to point B, or, give me a strategy for this purpose, it's going to give you an average answer.

And what you have to do as a user is that you have to help, through the prompt, navigate that answer to latent space. One of my favorite examples of this is a story where somebody says, "Okay, AI, I need you to put yourself in two mental places. I want you to answer every question for the rest of this conversation, one, as a college professor in philosophy, writing a paper for other PhD caliber thinkers. And then, the second time you answer the question, I want you to answer it as an 11 year old middle school student boy who has no interest in being in this conversation." And the user asks a question, and it writes three paragraphs, and goes, blah, blah, blah, super smart words.

And then for the 11 year old boy answer, and this is the same conversation, one AI interaction, it goes, "Bro, I don't even want to be here." And so you can use the language in your prompting to push into good things. So for example, if you ask AI to write code, it will write you code. But if then you say, write me secure and well documented code, it will then do it better.

And then if you say, write me secure and well documented code that is good enough for production at a place like Google, it will write you even better code. So it doesn't automatically jump to the right side of the curve. It's jumping to the middle and as practitioners and as users of it, we need to push it towards the type and caliber of answer that we want.
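The progression from a bare request to a more specific one can be sketched as a small prompt builder. This is a hypothetical helper, not part of any library: `build_prompt` and its parameters are made up for illustration, and it only assembles text; it does not call a model.

```python
def build_prompt(task, persona=None, qualities=(), bar=None):
    """Assemble a prompt that adds a persona, quality words, and a quality bar."""
    parts = []
    if persona:
        parts.append(f"Answer as {persona}.")
    request = task
    if qualities:
        request = f"{request} that is {' and '.join(qualities)}"
    if bar:
        request = f"{request}, good enough for {bar}"
    parts.append(request + ".")
    return " ".join(parts)

task = "Write code to parse a CSV file"
generic = build_prompt(task)
better = build_prompt(task, qualities=("secure", "well documented"))
best = build_prompt(
    task,
    qualities=("secure", "well documented"),
    bar="production at a place like Google",
)
for prompt in (generic, better, best):
    print(prompt)
```

Each added phrase is a clue that steers the model away from the average answer and toward the part of latent space the user actually wants.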

So this mental model of an intern, I think really helps because in terms of thinking, "what should we have these AI do?", think of it as an intern. And in particular, when we had been building applications, instead of having one intern, what we are doing is writing applications that have multiple interactions with the AIs. So when you are using GPT at home, you type a thing and it gives you an answer and then you respond to that thing.

That's you talking to one intern. So what we are doing is building applications that take in large datasets and then have dozens of conversations with AI, take those answers as inputs and outputs to a system, and then continue to do more with that information. It's like working with 10,000 interns at any given time.

A couple quick things, one of the things we've done is content generation, where we have it write copy for the website. And so we have essentially three different agents, one that's a first draft writer, and then, we say, write content for this, and it takes that, and then inside the program, it then takes the first draft, sends it to a sort of brand voice agent that we have trained, and it rewrites that thing and brings it back in sort of brand compliant voice. And then takes that draft and sends it to a legal agent, and then it returns a draft.

And then we have that thing sent to a SEO optimization engine. And so you get four sort of rewrites all within about 40 seconds. And then you have essentially what was like a two week human process happening in about 40 seconds. We still got to go do the legal stuff. But for the most part, you're getting multiple interactions because essentially we've trained four different interns to take a different part of the process.
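The chain of rewrites described above can be sketched as a loop that hands each stage's output to the next. This is a minimal sketch under stated assumptions: `run_content_pipeline`, the stage instructions, and `fake_model` are all hypothetical, and `fake_model` is an offline stand-in for whatever LLM call a real application would make.

```python
def run_content_pipeline(brief, ask_model):
    """Send a brief through four specialist stages, each rewriting the last output."""
    stages = [
        ("first draft writer", "Write website copy for this brief."),
        ("brand voice editor", "Rewrite this draft in our brand voice."),
        ("legal reviewer", "Revise this draft to remove risky claims."),
        ("SEO editor", "Optimize this draft for search engines."),
    ]
    text = brief
    history = []
    for role, instruction in stages:
        text = ask_model(role, instruction, text)
        history.append((role, text))  # keep every intermediate draft for review
    return text, history

def fake_model(role, instruction, text):
    """Hypothetical stand-in for a real LLM call so the sketch runs offline."""
    return f"{text} [{role}]"

final, history = run_content_pipeline("Launch copy for a new phone plan", fake_model)
print(final)
```

Keeping the intermediate drafts in `history` is a design choice that lets a human reviewer see what each stage changed, which matters because the legal step still needs a person.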

Bespoke answers, knowledge base, there's a ton of stuff around that. The supply chain analysis one is an interesting one, where we have 70,000 items in our parts catalog, and we were trying to figure out how to reorganize them. You cannot just send them all and say, please organize those. So we did it in two different steps, where we say, hey, here's the problem we were doing and we want to create new high level classifications.

We asked five different AIs to give us back those things. We gave them all then to a team of human beings, who said, oh, I like this one, I don't like this one, I like this one, I don't like this one. They picked what the final sort of categorization should be and then we wrote another program that would say here's the classifications.

Now, step by step, AI intern, go through every single line item and then put them into the right buckets. And by the way, then tell me how accurate you think you are. And, because the AIs will make mistakes, as I've said before, the AI will go, I'm really sure this one is in the right place, or I want to make sure it's in the right place.
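A classification step like this, where the model also reports how sure it is so that uncertain items can go to people, might be sketched as follows. Everything here is an assumption for illustration: the function names, the 0.8 threshold, the sample items, and `fake_classifier`, which stands in for a real model call that returns a category and a self-reported confidence.

```python
def route_by_confidence(items, classify, threshold=0.8):
    """Split items into auto-accepted results and ones that need human review."""
    accepted, needs_review = [], []
    for item in items:
        category, confidence = classify(item)
        record = {"item": item, "category": category, "confidence": confidence}
        if confidence >= threshold:
            accepted.append(record)
        else:
            needs_review.append(record)
    return accepted, needs_review

def fake_classifier(item):
    """Hypothetical stand-in for an LLM that returns (category, confidence)."""
    known = {
        "coax cable": ("cabling", 0.95),
        "wall bracket": ("mounting hardware", 0.9),
    }
    return known.get(item, ("unknown", 0.3))

accepted, needs_review = route_by_confidence(
    ["coax cable", "wall bracket", "mystery part 7"], fake_classifier
)
print(len(accepted), "accepted;", len(needs_review), "sent to reviewers")
```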

And as its confidence level goes down, we hand only the unconfident ones back to a team of human reviewers, which will then classify those appropriately. This one is my favorite. I think this is a really good example of what AIs can do that we couldn't do before. So this was a project where we were trying to understand future customer behavior.

And because of the structure of the problem, we took individual transcripts. So we have something on the magnitude of 200,000 calls a month of people calling to ask about purchasing various products from us. Of that, we had an AI listen to all those calls and extract just the ones that are about cell phones, and that's about 13 percent of calls.

Then we took those 13 percent of calls and took them in batches of people who purchased cell phones from us in the next 60 days or 30 days post that phone call. And so we could separate all the calls into these did buy, did not buy. Then we took all of these calls and sent them to an AI and said, here are three examples of phone calls where the customer did not buy and here's one where it did buy. Please come up with a question that separates these groups of calls. And then it came up with a question. When you do that on a large data set, we end up with hundreds of thousands of questions. Then we took that list. Send it to another AI that then starts to deduplicate.

Is this the same question or is this a new question? And then we simplify that list of hundreds of thousands of questions down to 25 questions. And then we take those 25 questions, run every single call we have for an entire month to answer those 25 questions. And then we feed those answers. And there are things like, did they mention one of the following competitors, did they mention a family plan.

And then we take those answers, feed it into a custom trained neural network. And now we can predict with impressively high accuracy, whether or not people will buy a cell phone from us 30 days from a single call interaction. And there's like a seven step interaction that's going on here. This is a combination of both the generative AI and the predictive AI.
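The middle of that pipeline, deduplicating the generated questions and turning each call into a row of yes/no answers for a predictive model, can be sketched in a few lines. This is a simplified sketch, not the team's implementation: in the talk both steps are done by an AI, whereas here deduplication is plain text normalization and `fake_answer` is a hypothetical keyword check standing in for the model.

```python
def deduplicate(questions):
    """Collapse questions that differ only in case, spacing, or punctuation."""
    seen, unique = set(), []
    for question in questions:
        cleaned = "".join(ch for ch in question.lower() if ch.isalnum() or ch.isspace())
        key = " ".join(cleaned.split())
        if key not in seen:
            seen.add(key)
            unique.append(question)
    return unique

def featurize(transcript, questions, answer):
    """One 0/1 feature per question, as answered for this transcript."""
    return [1 if answer(transcript, question) else 0 for question in questions]

def fake_answer(transcript, question):
    """Hypothetical stand-in for an LLM: is the question's last word in the call?"""
    keyword = question.rstrip("?").split()[-1].lower()
    return keyword in transcript.lower()

raw_questions = [
    "Did the caller mention a family plan?",
    "did the caller mention a family plan",
    "Did the caller mention a competitor?",
]
questions = deduplicate(raw_questions)
features = featurize("I want a family plan for four lines", questions, fake_answer)
print(questions, features)
```

The resulting feature rows, paired with the did-buy or did-not-buy label for each call, are the kind of input a conventional predictive model can be trained on.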

Everybody's got AI POCs going on. But here are some of the sort of guidelines of things that I think are useful and very easy to do today, things like content generation, like here's a bunch of example documents, please make stuff like that. Or here's 14,000 help files; instead of asking my customer service agents to find the right one, just let them type in any question verbatim. And then there's a retrieval process and a summarization process the AI can do that gives you custom answers. We all use, if you use Google or anything nowadays, it's a huge knowledge management problem.

That's actually relatively easy to solve with the tools that are available today. A document summary is, I've got 300 documents or one really long document. Please tell me what's in it. The transcription analysis is one that I was talking about. So back to what's a good problem is something where it would be cool if you could hire 10,000 interns to go do this one task 40 million times, but you would never really do that in the real world.

But if you can do it for 5 dollars with AI, okay. There's probably some insights in there that are worth doing. And then programming, if you haven't seen or talked to your development teams about what a difference it means when you have AI help you program, it will take your breath away.

It has taken my breath away on a monthly basis about what you can do with a 99th percentile programmer and 99th percentile AI. This is the same stuff, but just more generalized. The kinds of work that are really good for AI are straightforward single tasks.

You can't just say, here's all my data, please tell me who's gonna buy my product and who is not. You have to structure everything down so that an 80th percentile college intern could answer that question realistically, but then you can use that information for the next steps. So this is back to everything in one shot.

Think about things in multiple step processes because that really is where you can use the power that's existing in today's LLMs and then turn it into something more impressive. It's also best when inputs are in language, input output. Also reducing the cognitive burden of existing jobs.

I've seen a ton of examples where AI can basically take somebody who just started a job and get them to be about as good as somebody who's been doing it for nine months or longer within a day or two, because the sort of insights of having to do it, that tribal knowledge, that experiential stuff of relatively simple tasks can be embedded in the AI, and the AI can just make everybody's job a lot easier.

Here are the biggest lessons for us as developers from year one. The first is that a lot of what we've been doing is automating conversations. So we have Python scripts that pull data from one place, do a transformation, send it to an AI, take that answer, do something else with it, and then have these multiple interactions, right?

So instead of just a one shot conversation, it's about systematically creating interactions with AIs. One of the aha moments is that as we write these programs, a lot of the development work is much more like teaching a person how to do a job than it is writing programs. A lot of what we build is done in Word and then Word is then pasted into IDEs like VS Code or something like that.

And that's weird but that is the way it works. One of the interesting things, and this is changing, but in the early days, a lot of times, the AI would just forget. So we had some workflows where you would say, hey, analyze this thing and then give me back an answer in these fake HTML tags that say answer and then another one on workspace and the sort.
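Because the model can forget part of the requested format, a program that consumes tagged output has to check it before moving on. Here is a minimal sketch of such a check with a retry. The tag names `answer` and `workspace` come from the talk; the function names, the retry limit, and the canned replies are hypothetical.

```python
import re

def extract_tag(text, tag):
    """Return the text between <tag> and </tag>, or None if either tag is missing."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None

def ask_until_tagged(ask_model, prompt, tags=("answer", "workspace"), max_attempts=3):
    """Re-ask the model until every expected tag is present in the reply."""
    for _ in range(max_attempts):
        reply = ask_model(prompt)
        fields = {tag: extract_tag(reply, tag) for tag in tags}
        if all(value is not None for value in fields.values()):
            return fields
    raise ValueError(f"no well-formed reply after {max_attempts} attempts")

# Canned replies standing in for a model: the first one forgets the tags.
replies = iter([
    "The answer is 42.",
    "<workspace>6 * 7</workspace><answer>42</answer>",
])
result = ask_until_tagged(lambda prompt: next(replies), "What is 6 times 7?")
print(result)
```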

And then one out of every hundred times, which is pretty high from a system standpoint, the AI would just forget to put the brackets there. And so we would have to write a follow up script that went after it that said, you were supposed to get a piece of data that looks like this. It has workspace at the beginning and the end. If it didn't, please send this back to the agent before you and have them do it again. And that's wild that you have computer programs that just forget to do part of the instructions. The last two is this need to actively navigate latent space.

This is about trying to give it clues to say, act like this when you're doing this. And the last one is that we have found, and this may be different at different organizations, is that the programmers and tinkerers seem to be more comfortable with this than the data scientists. Like a lot of people have pushed it to their data science departments because those are the people who you were asking to do ML work before.

But what I have found is there's a level of precision and accuracy that's default in the mindset of a data scientist that actually slows them down a lot here because they will say the AI is probabilistic. It's not a deterministic system. And so it's going to give you a slightly different answer each time.

And I don't like that as a quantitative person that makes me uncomfortable. And so we found that programmers are just much more comfortable with sort of the randomness that is associated with these AI systems. I want to share three big ideas that inform the way that I've been thinking about the transformation that the world is going to experience because of these generative AI tools.

The first is this. This is an idea that I learned from Walter DeBauer so it's not mine. But if you think about language, like not just English, but all languages, language serves two purposes. One, it allows us to communicate with each other. So I'm telling you some stuff that I've learned.

Hopefully, if this is interesting, you'll tell some of your friends and coworkers tomorrow, I heard this Asian guy who's losing his hair talk about these things. But then the second thing is that it's also used to store knowledge. So you might be taking notes now, or if you went to school or you're in an important meeting, you write stuff down, right?

So language by itself is used for us to talk to each other as humans, and also for us to store knowledge. And these large language models were almost by accident trained on words. They didn't need to be. When the transformer was invented, you can use the transformer on any kind of input data and output data.

But all the large language models are just that. They're large models based around huge bodies of words. And because they learned the entire internet, because they've read tens of thousands, if not millions, of textbooks, what has happened is that we train these massively large neural networks on patterns of words, and patterns of words is actually how we store knowledge.

And so by accident, the training process takes petabytes worth of words, compresses it down to gigabytes worth of neural networks, and that means we are literally having these portable databases of human knowledge that are accessible via web browser, by phone, that run on laptops today, and will run effectively on cell phones over the next couple of years.

They run today, but they're going to get better. And so the thing that is really happening is that these large language models are not just like this miracle of data input output. It's actually a way to compress human knowledge and make it easily accessible and transportable to everything. And I think that is amazing.

Imagine that everyone who has a cell phone will have a 99th percentile expert on every single topic on their shoulder or embedded in their glasses. And I think that is going to change the human relationship with knowledge. And that's the like crazy big picture of something I want people to be thinking about.

The second thing here is that these AIs are getting 30 percent smarter every 18 months. And there's a famous mathematician named Terence Tao. I don't know if anybody knows him or has heard of him, but he's literally the highest IQ on Earth. It's something like 225.

So Terence Tao's had early access to o1 for six months. And he described o1 as a mildly incompetent grad student. But a mildly incompetent grad student for Terence Tao is way smarter than I am, right? And so what that means is that is generally available now. And if you fast forward another 18 to 24 months, we're going to be talking about a pretty competent grad student at the Terence Tao level, sitting on everyone's cell phone and sitting on everyone's shoulder.

And so that's how I think people need to be thinking about what is the future of AI? I think I have two other ideas here. I talked about better data already. The internet is a so-so data source. It's got good stuff and bad stuff in there. Microsoft released a model last year called Phi-3, where instead of training on the whole internet, it trained on 10,000 high quality textbooks.

And it was about as smart as GPT-3.5, but using much, much less data. So there's a real evidence that shows better data also leads to smarter AIs, not just more data. Agentic systems, have you guys heard of that term yet? Yeah, this is the new hotness. Like this is the thing everybody will be talking about over the next year and a half.

This is AIs talking to other AIs, giving assignments to other AIs, and then taking and grading each other's work. So the simplest idea that I think is easy to get your head around is, imagine if your software development teams, instead of eight people and a scrum master and a product owner and a couple of designers and a couple of testers and a couple of programmers, imagine each of them was just a different AI with a set of interactions and controls.

And you can just say, "Build me a thing that does X." And then it will go, "okay," and then it breaks it into a bunch of tickets and sends those tickets to each other. And then each one of them will do these things. So teams of AIs working together is going to be the thing. Every large foundational model has been releasing new toolkits to make agentic AI easier.

And so that's the thing that's really changing the world. Multimodal, that just means it's not just text. It means sound can come in, pictures can come in, but also sound can come out and pictures can come out. Larger and smaller models. This is like distilling and shrinking models so that they fit small enough to run on your phone.

To be clear, as the models get smaller, they're less smart, but they become more portable. And so there's this trade off of where do you want to run the AI. And then how smart is it? And so the smartest ones will probably always run in data centers but the convenience of having it on your watch or on your glasses is probably gonna outweigh a couple of things over the next decade.

And then this thing of chain of thought, which is relatively new, and this is the thing that o1 does, which is instead of answering right away, it talks to itself for a little while and then gives you the answer. Okay, so the last idea. This is an idea from Marc Andreessen, and I had never heard of it before, so probably in the popular press you've heard of this thing called AGI.

Oh, AGI is coming. And I think AGI is a weird term because we don't really know what that means. You've probably heard of the Turing test. Raise your hand if you know the Turing test.

Most of the room does, but I'll do it super quick. So for 80 ish years, the sort of measure of whether or not artificial intelligence was real was whether or not a human being sitting at a terminal talking to a piece of software could determine whether or not it was a person on the other side of their software, and that was a good test for 80 years, and then the last two years, all of the AIs blew by it, and suddenly nobody cares about it. "Oh, that was a dumb test. That was a stupid test. Why anyone thought that was a good idea, I don't know."

So I think the same thing is gonna happen with AGI, which is it'll show up at some point and people will go, "Well that's not really AGI because it needs to do this," and I think that bar is gonna keep changing. I showed you the chart at the very beginning of this talk, which was at every model we've seen over the last two years, the AIs are getting 20 to 30 percent smarter.

And if you measure it in terms of percentile against standardized tests, we went from 50th percentile to 80th percentile to 90th percentile. In a year and a half, we'll probably be looking at 95, year after that, 99. And then we're at 99.9999, where any general model AI outperforms even the Terence Tao's of the world, right?

And in Marc Andreessen's vocabulary, that means we will have reached this thing called artificial human intelligence, right? Where the AIs are as smart as our very smartest human beings. And then also in his parlance, there's another layer of knowledge and understanding that will go past that, where even our smartest human beings cannot solve certain problems that the AIs will be able to figure out, and that to me is artificial general intelligence, where the systems, working together or working independently, can address problems that even our smartest human beings cannot solve, and to me that's that line, right?

So we'll see how that happens. I do not have a strong point of view on when that will happen, but I am certain that it will happen in our lifetimes. We can take bets on when it happens, but it's definitely gonna happen within our lifetimes. Gonna do one more. So this is an idea from... is it on here?

I forget his name. He works at Andreessen, as well. And he, talking about the sort of impact of generative AI, says that generative AI is the third epoch of compute. And whether there's 10 or 12 or 4, it doesn't really matter. But in his model, he says, look, when the world invented microprocessors in the 60s and 70s, what it meant is that the incremental cost of compute went to zero.

So if you remember those old movies like Hidden Figures, where we sent Apollo to the moon, you had literally rooms full of women doing math to be the calculators, right? And then the microprocessor showed up and then we stopped that. We stopped thinking about the cost of compute because it just happened instantaneously.

When the internet came to be in the 80s and 90s, the cost of distribution for digital assets started to go to zero and we all saw the disruption in the media industry, in the music industry, in the movie industry because of that thing. The idea here is that generative AI is going to take the incremental cost of the creation of digital assets toward zero.

So I'm seeing it in software. Like you see, there are a lot of people who are just graduating this year with CS degrees who are not finding jobs because all the software companies are starting to see, oh, AI coding makes our best coders way more efficient. And so the scarce asset of people who know how to build software is suddenly not scarce because you have a 90th, 95th percentile programmer sitting on everybody's web browser at this point.

So one of the big things, and again, I don't know where this is going to go. But I will say that I think it's important for people to get their heads around. The AIs are going to keep getting smarter. They're going to be able to learn from each other and train. And then one of these things that's happening is that the incremental cost of digital output, which includes words, movies, video clips, software, the incremental cost of that is going to plummet over the next 10 years.

And so that's the real business impact thing to be thinking about. So I think that's my last slide. I appreciate you guys giving me a chance to chat with you for a little while, and I hope it's a good afternoon. Thank you for coming.