August 10, 2023

Large Language Models: Weighing the Risks of AI

Featuring Finny Kuruvilla, MD, PhD

Artificial intelligence is developing rapidly, and many of us struggle to understand how it works.

Listen to the conversation here:


Shaun Morgan: Welcome to the Invest by Design podcast. I’m Shaun Morgan, the Director of Education at Eventide. At Invest by Design, we use stories to pull back the curtain on how markets, industries, and economies work so that you can better understand the complicated world of investing. Our hope is that as investing becomes less of a mystery, you’ll understand how important it is to be intentional about how you invest, not only for your own financial goals, but for the wellbeing of the others around you.

Finny Kuruvilla: It’s easy sometimes to get caught up in the headiness of the day and forget about some of the risks and challenges here. And even if AGI, artificial general intelligence, is a long way away, large language models are concerning, and that’s partly because language is so bound up with being human, right, and to master language is to have a significant power over humanity.

Shaun Morgan: In Plato’s dialogue the Phaedrus, he tells a myth about a god named Thoth. Thoth was credited with discovering a lot of great things like numbers, geometry, and astronomy, but his favorite discovery was writing words. He wanted to share his discoveries with all of Egypt, but in order to do that, he had to convince the king of Egypt that his discoveries would be good for the Egyptians. So he presented each discovery and made a case for why it would make society better. When he presented writing, he claimed that it would help people become wise and help them with memorization, but the king disagreed. He believed that writing would ultimately harm society because people would rely too much on the written word, causing their memory to atrophy, and he believed it would give them the appearance of wisdom without actual learning. With each new technology comes a lot of promise, but also a lot of concerns. Artificial intelligence definitely has both. In this episode, we are airing a recording of Eventide’s Co-CIO, Dr. Finny Kuruvilla, from his most recent CIO update webinar. He spent the second half of the webinar discussing artificial intelligence; specifically, he explains how large language models, which are one type of artificial intelligence, work.

Finny Kuruvilla: AI, what is it? So I made my own definition here that I think is close to how a lot of computer scientists would define it, but it’s simply the ability for computers to do tasks that are seemingly human. Okay? There’s an intelligence, there’s an awareness, there’s an ability that makes it seem like the computer is functioning like a human. Some of the examples are understanding language, playing chess, making decisions, expressing creativity, like writing a poem or composing some music, carrying on a conversation. These would all be different examples of AI. I did a call, I think it was the first quarter call, that covered Chat GPT, and of course, Chat GPT has really captured the public’s imagination because of how well it interacts with humans. And you can use this to write a poem from a toaster to a refrigerator.

You can use it to write an essay on the history of the French Revolution. You can then ask it to compose it in the language of Shakespeare, and it’ll do that. It’s just amazing. And there’s this sense of awe that people have with Chat GPT that has sparked a lot of imagination and moved a lot of funding in the direction of AI. We’re going to be talking about some of the innards of Chat GPT, and I don’t like to get lost in technical details, but I am going to throw out a couple of terms here. One term is large language model, or LLM, and we’ll talk about this in a moment. Okay, so an LLM is one way of achieving artificial intelligence, and Chat GPT is an example of a large language model, which we’ll talk about shortly, before we get deeper into Chat GPT and AI.

From an investor perspective, I’m going to first introduce something that I think we all know, and it illustrates how we’ve already been using forms of AI in maybe quieter, humbler ways. So this is an example of an email that I got from my colleague here at Eventide, Robin, and we were talking about a particular task, and he wrote me and said, Thanks, Finny. Yeah, it’ll be difficult without a hands-on manager. And then there are these little blue buttons here on the bottom where Gmail makes suggestions, and you can click on one of these three buttons and it’ll type it in for you. With two clicks, say, Agreed and send, you’re done. What it’s doing is it’s reading your email and it’s trying to guess, based on some keywords and some AI and algorithms that are embedded inside of Gmail, what a typical response might be.

And of course, they’re very short responses, but when this first popped up, I thought, whoa, that’s kind of weird, Gmail is reading my emails. Now I’ve sort of gotten used to it, and I use these auto responses a lot. This is a simple form of AI that has been there for several years. What has happened is this type of scaling has grown many, many log fold higher. Every log is a power of 10. And what I want to introduce is the basics of what an LLM represents, and it’s actually at its core a very simple concept. So what an LLM or large language model does is it reads text, lots and lots of text, and it figures out the structure of a language based on probabilities. So consider the sentence here: the car drove into the blank.

So after reading lots and lots of text, the LLM figures out that 70% of the time, the next part of the sentence is going to say drive. And then from there, 60% of the time it’s going to say driveway, and 40% of the time it’ll say drive-through. Okay, so it just built out this probability tree here. 30% of the time it’ll say parking, and then 80% lot or 20% garage. So you can read lots and lots of language and annotate it in this sort of way. This is a very fundamental operation that LLMs are doing, using probability and statistics just to figure out what’s most likely to come next. Okay, so what an LLM then does is it cranks across unbelievably large amounts of text. Okay? So when I say large, I mean like all of Wikipedia, everything it can find on PubMed, which is where the world of biology and medical research lives.
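The probability tree Finny describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the corpus is invented, and a real LLM uses transformers trained on vastly more text, not a simple lookup table like this. But the core operation, counting what tends to come next and turning the counts into probabilities, is the same idea:

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration; a real LLM ingests billions of
# words, not five sentences.
corpus = (
    "the car drove into the driveway . "
    "the car drove into the drive-through . "
    "the car drove into the parking lot . "
    "the car drove into the parking garage . "
    "the car drove into the driveway ."
).split()

# Count which word follows which -- the raw material of the probability tree.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def next_word_probabilities(word):
    """Turn the counts for `word` into probabilities of what comes next."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# In this tiny corpus, after "the" the model has seen "car" half the time,
# "driveway" and "parking" 20% each, and "drive-through" 10%.
print(next_word_probabilities("the"))
```

The numbers differ from the 70/30 split in the example above because they depend entirely on the corpus counted, which is exactly the point: the "knowledge" is nothing but frequencies in the text it has read.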

It will go through news websites. It will go through just huge amounts of data here and read it and figure out the structure of the English language and all this content, linking it up using these probability algorithms. Now when someone asks a question like, the Trevi Fountain is in the center of blank, using this language model it’s going to predict that the answer is Rome, which is correct, not something like Buenos Aires, which might show up at a 0.1% rate. And so you can get this appearance of intelligence simply by taking this probability distribution of what comes next and decorating that with some algorithms to seemingly give the ability to comprehend English. Now, what I want you to do is think of an LLM as a glorified autocomplete function. Okay? Most of us will type in some sentence in Word or Gmail or what have you, and then often it’ll recommend some kind of filler for the rest of the sentence based on what’s typical there.
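The glorified-autocomplete framing can be made concrete in a short sketch. The frequency counts below are invented for illustration; the point is only that the suggestion comes from counting, not from reasoning:

```python
from collections import Counter

# Invented counts of which word followed the phrase "I'm making progress
# slowly" in some imagined pile of text (numbers are made up).
continuations = Counter({"but": 880, "and": 70, "toward": 30, "despite": 20})

def autocomplete(counts, k=1):
    # A glorified autocomplete: return the k most frequent next words.
    # No grammar, no logic -- just "what usually comes next."
    return [word for word, _ in counts.most_common(k)]

print(autocomplete(continuations))       # ['but']
print(autocomplete(continuations, k=2))  # ['but', 'and']
```

Under these made-up counts the top suggestion is "but," simply because it is the most frequent continuation, which is the whole trick.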

An LLM of course has a lot of structure to it, but it’s basically at its core this autocomplete function. So here would be an easy example: if I say, I’m making progress slowly, and if you’re a native English speaker, of course the next words that are going to pop into your mind are but surely. We reflexively know that because we’ve been exposed to that expression so many times. And something as simple as autocomplete, when it’s properly encoded in these transformers, can have this magical effect of seeming to understand English. I want to stress this point, though, that what it’s doing is generating statistical estimates of what’s most likely to come next. It’s not based on reason or logic, it’s based on statistics. Okay? So I’m going to show you how you can prove that and see that with Chat GPT. So yesterday I logged on to Chat GPT when I was preparing this talk and I just gave it a softball. I said, what is two times 30? And Chat GPT responded to me saying two times 30 equals 60. Alright, that’s good. What it’s doing here when it’s answering two times 30, it’s not using a calculator like we would take out a little pocket calculator or a calculator on your computer. It’s searching all this text that it’s got, looking for times that it’s seen two times 30, and then most likely, based on what it’s read, two times 30 equals 60.

I just said, okay, well if this is how Chat GPT is doing math, how can I get it to make a mistake? Well, obviously you just try to pick something that it probably hasn’t seen before. It doesn’t have to be complicated math, just something that it probably hasn’t seen. So I just picked a couple of random numbers here and I typed in, what is 782 times 23 minus 12? I figured that’s probably not something it’s read a lot in all of the text that it’s ingested. And it said the answer was 17,986. I pulled out my calculator: it was wrong. That was actually my first try, and I was able to produce a mistake from Chat GPT. The correct answer is actually 17,974. And so this took hardly any effort, with pretty small numbers, a calculation that takes five seconds on a calculator.

I got Chat GPT to make a mistake simply by realizing that it’s just trying to fill in this autocomplete based on all the stuff that it’s read online. I was kind of surprised it was that easy to get it to make a mistake. So I asked it again; you can actually ask the same question multiple times. I asked it the same exact question, and Chat GPT came back to me and said, my apologies for the confusion in my previous response, the correct answer to the arithmetic expression is 17,966. Which is still wrong; the correct answer, of course, is still 17,974. And I kept asking it again and again and it never got it right. It gave slightly different answers, but it couldn’t get it correct. And so, wow, so simple, right? It goes to show what is going on with this LLM, this autocomplete-style function.
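The arithmetic in the anecdote is easy to check with an actual calculator, which is exactly what a pure next-word predictor is not using:

```python
# 782 * 23 - 12, computed directly rather than guessed from text statistics.
correct = 782 * 23 - 12
print(correct)  # 17974

# The two wrong answers from the anecdote, and how far off each was:
for guess in (17986, 17966):
    print(guess, "is off by", guess - correct)

# Notably, the first wrong answer equals 782 * 23 exactly, as if the
# "minus 12" had been dropped: a plausible-looking but unreasoned guess.
assert 782 * 23 == 17986
```

That the first wrong answer happens to be the product with the subtraction missing illustrates the point: the output resembles correct arithmetic without being produced by it.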

And what this also means, if you think about it, is you can exploit this to show that it’s just using statistics, and it’s going to make errors that are logically just ridiculous. So I also asked Chat GPT yesterday, what is the world record for crossing the English Channel on foot? Of course, the English Channel is up to 150 miles of open ocean between England and France. The question doesn’t even make sense, but it answered, as of this update, the world record for crossing was held by Lewis Pugh, a British endurance swimmer and environmental activist. So it is answering this with a swimming record, and it didn’t catch the fact that I was asking about crossing on foot. Obviously no human can walk across the ocean on foot. So again, because it’s trying to do this autocomplete-style prediction, it’s not that hard to make Chat GPT make mistakes.

Shaun Morgan: As Finny describes it, a large language model is simply one form of artificial intelligence that takes an input, like a prompt or a question, and generates a response based on its understanding of how language works, predicting each next word based on what has occurred most often in all of the texts it has read, namely almost every text available on the internet. People have found many ways that this can be useful, but it can also be problematic when we forget that its objective isn’t truth; it’s simply likelihood. This means that large language models like Chat GPT can generate responses to prompts that sound correct but are completely made up. In June of this year, a federal judge imposed fines on two lawyers for submitting non-existent judicial opinions with fake quotes and citations created by Chat GPT. These lawyers asked Chat GPT what opinions they should cite for a case they were working on.

And it fabricated opinions that sounded real, because it was using its language-generation process, but the cases and even the quotes within the cases were entirely fiction. These types of responses that are completely made up are called hallucinations. You can ask Chat GPT what books you can read to better understand economics, and it may give you the names of real authors and book titles that sound like the types of books those authors would’ve written, but the books don’t actually exist. In other words, it has the appearance of being wise, but its ability to produce language is not guided by truth, logic, or reason. And this becomes clear when it’s challenged with questions that require reasoning.

Finny Kuruvilla: And another kind of funny example here is from another person who asked, if one woman can make one baby in nine months, how many months does it take nine women to make one baby? Explain each step you used to arrive at your answer. And so Chat GPT figures out that a woman can produce one ninth of a baby per month, and so if you have nine women, it will take them one month to make one baby. So this is a good illustration again of this autocomplete. This is not true human intelligence in the way that we think of intelligence. And because of that, it is very important for us to understand the types of AI. AI is not a monolithic algorithm. There are a lot of different types of AI. There is no doubt that LLMs have made a huge quantum leap in the last five years, absolutely breathtaking, remarkable.

And these LLMs have massive implications for productivity. They’re already shortening research time, reading legal documents, generating images, videos. It’s incredible. But we don’t want to confuse LLMs with artificial general intelligence, or AGI. Okay? An artificial general intelligence is much more akin to human reasoning. It uses principles, and we think in ways that are different from statistically predicting words, right? We’re thinking about truth and abstract concepts. And there’s a very interesting debate that is all the rage today, which is how long is it going to take to get to artificial general intelligence? Okay, so this is an example of two of the heavyweights debating here. Grady Booch is a legend in the computer science world. He’s the chief scientist for software engineering at IBM, and he leads their embodied cognition group. Some of you who know programming know about object-oriented design.

He was the inventor of that. And then we have Gary Marcus over here. He did his PhD at MIT with Steven Pinker, a brilliant guy, and founded Geometric Intelligence, a machine learning company. They were debating each other about when artificial general intelligence is going to manifest in our world, and there’s a huge divergence here. So Grady Booch, this brilliant guy from IBM, says in a tweet from January of this year that AGI will not happen in your lifetime, or in the lifetime of your children, or in the lifetime of your children’s children. And Gary Marcus took the position that it’ll happen sometime in our lifetime, not in the next couple of years, but maybe decades away. So this goes to show that what we consider true intelligence, the ability to do more than just rapid calculation, is still a ways away.

This is, I think, very interesting and something that should be brought into the conversation. Okay, let’s talk more about investing implications. There is no doubt that AI, and these LLMs in particular, are already beginning to touch nearly every industry, improving productivity with human supervision. Industries outside of technology will benefit too. Of course, a lot of people are thinking about Microsoft and Google and companies like that, but every industry is going to benefit here. And the analogy I would give you is this: if I were to ask you the question, how has the internet changed the economy, what industries has it affected? Well, it’s affected everything. It’s affected medicine, it’s affected technology, it’s affected utilities. I mean, who doesn’t use the internet in our world today? Very, very few people. It’s going to be like that, where AI is going to lift many, many industries in different ways.

Education: think about the implications for learning. Media: the ability to write articles more quickly is astonishing. Music: the music models now are very interesting; what you can do with producing high-quality music much more rapidly. Law: you can almost have paralegal-like assistance now with some of these LLMs. Defense: imagine the power of using facial recognition and spatial recognition more powerfully. Utilities are going to have to provide a lot more energy to these server farms, because these are so intense, and of course utilities themselves are going to be using AI. Medicine: I have been saying for probably at least 10 years that computers will beat doctors in terms of diagnosing diseases. Somebody’s going to come up to a computer and say, these are my symptoms, these are my vital signs, and the computer will give you a more accurate list of what you have than even a very well-trained physician. That’s just around the corner.

I think we’re not far from that at all. Goldman Sachs has estimated that AI-induced productivity across the whole economy will be about one and a half percent per year over the next 10 years. So a general uplift across many industries. This involves massive amounts of data. It is hard to even comprehend how big the data is. I remember some of the first hard drives that I had when I was young in the eighties, when they were 30-megabyte drives. Now the data that our world is dealing with is measured in yottabytes, and we’re moving into brontobytes and beyond. I mean, it’s just unbelievable. And think about the implications here for data centers, for network technology, for storage, for semiconductors, for software. The implications are tremendous.
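A quick sanity check on what a 1.5% annual productivity uplift amounts to over a decade. This is just arithmetic on the estimate quoted in the talk, not a forecast of its own:

```python
annual_uplift = 0.015  # ~1.5% per year, the estimate cited in the talk
years = 10

# Small annual gains compound: cumulative growth is (1 + r)^n - 1.
cumulative = (1 + annual_uplift) ** years - 1
print(f"{cumulative:.1%}")  # about 16.1% cumulative over ten years
```

So even a modest-sounding annual figure compounds to a meaningful economy-wide uplift over the period.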

Shaun Morgan: As investors, it’s important for us to consider the usefulness of each form of AI and the timing of how each form will disrupt and catalyze different industries. The current surge in popularity of large language models will have its own impact on a wide range of companies, even if artificial general intelligence is still developing. But just like the King of Egypt took time to understand the possible negative impacts of discoveries like writing before welcoming them into society, we should also take the time to understand the possible negative effects of different forms of artificial intelligence alongside embracing the positive effects.

Finny Kuruvilla: It’s easy sometimes to get caught up in the headiness of the day and forget about some of the risks and challenges here. And even if AGI, artificial general intelligence, is a long way away, LLMs themselves are concerning, and that’s partly because language is so bound up with being human, right, and to master language is to have a significant power over humanity. These are photos of various individuals who had excellent command of their language, for good or for evil, and could shape the world in powerful ways by their command of language and their ability to galvanize and inspire with it. Ranging from Winston Churchill, of whom it was famously said that he mobilized the English language and sent it into battle while leading England in World War II. To negative examples like Hitler, of course, who used his eloquence against the Jews. Entertaining examples like J.K. Rowling. Examples from civil rights: Harriet Beecher Stowe with Uncle Tom’s Cabin, and MLK. The inspiration of people like JFK and his famous speech about going to the moon.

This is very, very powerful, and the analogy that I heard someone use is this: just as today no human, even the best chess player on the planet, can beat a computer at chess, what will it look like in a world where the best, most eloquent human can’t compete with computers with respect to their command of the language? It’s not so far away. So the risks are enormous. Of course, deepfakes: people are talking about these in audio and video. Imagine a world where you pick up your phone and think it’s your mom calling asking a favor, but it’s actually an AI-rendered deepfake. That is something you could do today, and deepfakes in video are starting to proliferate on the web. Then there are agenda-driven conversations from unknown sources. QAnon, which was posted on these various boards, is still very shadowy; we still don’t know who the actual original sources were, and that could easily be done today by an LLM. And the ability to control conversations and engage in conversations is the hallmark of discourse and political structures and systems.

Of course, the output of these LLMs is derivative of their input, making it vulnerable to all kinds of biases: racism, who knows what influences could be present. And think about how much easier it’ll be to scam somebody. We’re not far away from the 2024 election; what role will AI play there? A little bit of a taste of the power of AI, I think, has come from social media. Social media has shaped our world in very powerful ways. Social media is thought to have played some kind of a role in the 2020 election, but certainly it’s played a huge role in terms of consuming and shaping the affections of the world as a whole. Again, I think it is naive to believe that this will not be the case with AI, but on a more sophisticated scale. This was an article I read last year.

This was from April of 2022, about a man in Japan who married a fictional character. He married a hologram, and it sounds bizarre when you hear about it. They actually had a marriage ceremony. But when you read the article, you realize there are thousands of people in Japan that are in these relationships, many of them married to fictional characters. And he says, this character doesn’t bother me when I’m tired. She’s always pleasant. She doesn’t get sick. She’s very entertaining. We can have an experience together. I saw on Twitter that he posted his proposal to this hologram. This was in the pre-Chat GPT world; this can now be something that is far more human-like in terms of engaging with technology. It’s been called the fictosexual movement, and I think this is going to be something that grows tremendously. Another, maybe closer to home, example is just from last summer: Google fired an engineer named Blake Lemoine for saying that an AI chat system had a soul.

And what’s fascinating here is that people are already sacrificing, so to speak, for these AI engines. They’re becoming attached to them in powerful ways. Again, I think this is a sign of what’s to come. I’m going to give a positive example, lest I be so negative here in my risk section. This was an example that I thought was very interesting. Clever name: Woebot Health. Here they’ve engineered an LLM specifically for mental health. You can actually download it on your phone today, and it speaks to you in a way that is just remarkable. They call it the best listener in the business of mental health. And it gets to know you. It suggests different things that you can do, offering clinically tested tools to help improve mental health. Something like this, I think, is not far away from getting a lot of traction. So an intriguing example of where AI might head.

Shaun Morgan: At Eventide, there is a lot of excitement and wonder around the positive impacts that artificial intelligence will have on our lives and all of the potential tailwinds that we believe it will provide for many of the companies we invest in. But we also consider the ways we should be cautious as we identify its limitations and the potential for it to be harmful if we have an unbridled approach. If you enjoyed this episode, go ahead and subscribe so that you can know when future episodes come out. And if you want to know more about Eventide, visit


Invest By Design is produced by Eventide Asset Management, also known as Eventide. Eventide is an investment advisor that seeks to invest in companies that we believe are creating compelling value for the global common good. This podcast is for informational purposes only and expresses the views of the speakers and does not necessarily represent the views of Eventide Asset Management. Eventide’s values-based approach to investing may not produce desired results and could result in underperformance compared with other investments. Nothing said on this podcast should be construed as individual investment advice, to purchase or sell any security. Past performance is no guarantee of future results, and the opinions presented cannot be viewed as an indicator of future performance. All corporate names are for illustrative purposes only and are not a recommendation or offer to purchase or sell any security. Eventide clients and investors may or may not maintain positions in the companies discussed in this podcast.