AOCOPM 2024 Midyear Educational Conference
346719 - Video 11
Video Transcription
Good morning, everyone. My name is Dr. Philpotts. Welcome to the Public Health and Preventive Medicine session. At this time, I'm going to go ahead and present Dr. David Shumway. He's a third-year internal medicine resident and active-duty Air Force captain practicing in coastal Mississippi. He's a principal and founding columnist for the Resident Notebook column in The DO, but writes on a wide variety of topics, including recently on applying artificial intelligence in the clinic. He's a member of the AOA Bureau of Emerging Leaders, a Training in Policy Studies fellow, and an inaugural class participant in the AOA Leadership Academy. He's married with two children and has a tricolor Pembroke Corgi. Welcome, Dr. Shumway.

Good morning. Thank you very much for that introduction. Good to see everyone this morning, and thank you for being here. I know it's early. 8 a.m. is what we internal medicine residents call almost lunchtime. So it's nice to see everyone here. All right. So hello, everybody. For those of you that don't know me, I am David Shumway. For those of you that do, excellent to see you again. And I'm very glad to be here in Conroe, Texas. This school building is absolutely gorgeous. I'm always impressed by these newer osteopathic medical schools, just how pretty they are. And it always fills me with delight to see that our profession is growing so fast and is so shiny and new.

One thing that is growing fast and is also shiny and new is artificial intelligence. So as I said, we're going to talk a little bit more about AI today. As an extension of and update to my introductory talk at OMED, we're going to focus a little bit more on the precise clinical applications of AI rather than the higher-level philosophical stuff. Like I was saying, the osteopathic community's interest in this topic has really exploded. As Chris can probably tell you, I was recently honored when Kathleen Creason asked me to deliver the keynote speech at the Board of Trustees Mid-Year Meeting for their strategic discussion with the board and all the affiliate heads, specifically on AI. Apparently they did a poll, and that was far and away what everyone wanted to hear about. And for some reason, she thought this third-year medicine resident would be the appropriate person to talk about that. Well, I had a lot of fun there, and I learned a lot. There are some super exciting things coming down the pipeline, which hopefully we'll get a chance to talk about in a little bit.

So without any further ado, let's get started. All right. So here's our agenda. You know, in the Air Force, they always say you pre-brief the brief, brief the brief, debrief the brief, and then re-brief the brief. And that's funny the first dozen times you hear it. So we're going to retread. Oh, yes. Okay. I'll add that one in. Yeah. So we're going to retread some steps for those who might be new and discuss the background and history of clinical and large language model AI today. Next, we're going to discuss the current and future clinical applications of AI, which is a practice update since last October, since the models themselves have actually improved a lot and there are new things out there, even in that short, short period of time. Then, when we're all done, if we have some time, I'd love to hear some thoughts that everyone has on this technology. And I managed to work out the kinks of my live demonstration of ambient dictation.
So we can get the actual scribe on the screen now, and maybe we'll get a chance to play with that too. But we've got a lot to get through, so we'll get started. This is my AI use statement. I did use some AI to create this presentation, and we'll talk about a couple of those tools. I have no financial connections, but like I said last time, if you're trying to start an AI company and you want someone, give me a call. All right. There are our learning objectives. We'll cover all of those in this talk.

All right, a little about me. As I was introduced, I'm an active-duty Air Force internal medicine resident in Biloxi, Mississippi. I've got a couple months left of residency. Woo! Yeah. Or, you know, thank goodness, as I've been saying a lot recently. Sometimes I feel like I'm an A-10 with one engine and one wing, just trying to, you know, coast to a stop here. But next year, I'm proud to say, I'll officially be going off to the UK, a lot closer to my wife's family. I have an assignment at Lakenheath to work with the operational medicine and flight medicine squadron there. I don't know if they'll let me near the birds or not, but I do hope to fly very fast at some point in the future.

Some of you may know that I do a lot of stuff with the AOA. I'm a graduate of the AOA's Leadership Academy and the TIPS program, which were excellent, excellent programs that, if you have students or residents, I highly, highly encourage you to encourage them to do. I'm also on the editorial advisory board of The DO, the C3DO advisory committee, the Council of OMED now, the BEL previously, an Osteopathic Future in Education Initiative panelist for the AOF, NBOME national faculty, all sorts of things, pretty much anything anyone will let me do. Despite all of that, I'm still best known for my Disney karaoke. Whenever I meet anyone in the AOA, they always say, hey, aren't you that guy that sang that Frozen song and jumped up on the piano in Boston? And to that I have to say, yes, yes I am. What can I say? There was a lady dressed like, I think, Glinda the Good Witch from The Wizard of Oz, and she had a dress made of champagne. I'm not responsible for my actions.

Okay. It's a little quirk of my residency program that whenever we do any academic presentation, we always have a picture of our family and or animals. Since we're about to spend a lot of time talking about robots, I figured I'd show you some humans and one corgi over there. So that's my wife at OMED, wife and family, eldest daughter doing a sassy pose, and then the aforementioned corgi.

All right. Another defining characteristic about me, I think I can say definitively at this point, is that I'm a bit of an AI enthusiast. I've been writing, speaking, doing pretty much anything anyone will let me do, and talking to anyone who will listen about AI and medicine, because I really believe in this technology. For me, it has actually already put joy back into medicine, which is a little bleak considering I've been doing this for such a short period of time. I really do believe it has the ability to make our lives better and produce better outcomes for our patients. I hope in the next 45 minutes to an hour I can show you why, and get you on the bandwagon.

All right. So why does this matter? Why are we here? Artificial intelligence represents one of the most profound developments in healthcare in decades, with the potential to create revolutionary and seismic changes in the practice of medicine as we know it.
Integration and adoption of large language model AI in particular has progressed rapidly over the past months, driven by the technology's potential to improve patient outcomes, mitigate labor shortages, reduce costs, and ease documentation burdens for healthcare providers. So I'm here today to give an update on some of the leaps and bounds that AI has already taken since I talked to some of you at OMED, and also to talk a little bit more about some specific use cases and products in the clinical AI space that you might be able to put to use today in your practice. We didn't really get that focused at OMED, so this is more of a practice update. As I said at OMED, if you are not clinically impacted by AI in your practice today, within five years I predict every single one of you will be. It's just impossible to avoid. So that's why we're here.

All right. Introduction and background. Like I said, I presented some of this at OMED, so I'm rehashing it for those who might have missed it or are completely new to AI, but it's important that we cover it. Absolute basics: what is artificial intelligence? It's the simulation of human intelligence in machines, which allows them to perform tasks that typically require human-like cognitive function. Or, as I like to say, this is the only technology I know of that can talk back to you. AI algorithms can learn, reason, problem-solve, perceive, understand, and produce language. Despite being in quiet, steady development since the mid-20th century, artificial intelligence was once relegated to the farthest reaches of science fiction. But no more. AI is a real, tangible thing that's already highly integrated into our daily lives and our clinical practice.

AI in my clinic? That's right, folks. You heard that right. Contrary to popular belief, clinical AI is nothing new. In fact, some of you are probably using some version of it in your practice today. If you've ever clicked out of one of those annoying pop-ups in Epic or your EMR, if you use one, or received an automated billing query, or interacted with a health insurance company at all, you've probably interacted with AI in health care. The very earliest decision support algorithms for clinical practice were actually in use as early as the 1970s with the MYCIN system, which was developed to diagnose bloodstream infections and recommend antibiotics. And that's a long time ago, the 1970s.

The majority of this discussion is going to be devoted to the elephant in the room, which is the advent of large language models and the first true general AI systems. But we should give credit to some of these examples of AI that are already with us. You can see there was always a natural home for AI in acquisition devices, which is where it took some of its first real steps, simply because a lot of the measurements that you get with these devices, like echocardiograms, and the applications in radiology, are very measurement- and computation-heavy, and computers can improve on humans there. They're already doing this a lot in echocardiography; multiple studies, presentations, and demos have shown that AI-enabled echo machines, for instance, are much better at getting echo measurements than human techs are. So that's something that's already changing. And then in endoscopy, they've done a lot of tests on these polyp-identifying AI programs, which are pretty cool when you consider that that's real-time video analysis. We'll talk about that a little bit as well.
But what do I mean by general versus narrow AI? Let's define those terms. For narrow AI, some of you may remember a chess-playing program back in 1997 that beat a human grandmaster. And I don't see anyone who might get this reference in the room, but when I played Super Smash Bros. in 1999 as a kid, the computer playing Samus on maximum difficulty used to kick me all over the map. I could never beat that thing. But you probably couldn't teach either program to interpret Shakespeare or make up a cooking recipe. And that's the difference between narrow and general AI, which is sometimes referred to as weak versus strong AI. I don't really like that term, because that computer playing Samus absolutely destroyed me; I don't think of it as a weak AI at all. And I don't think Garry Kasparov would think the chess-playing program that beat him was weak either. So, narrow versus general.

Other terms: we have machine learning, which is a term you'll hear all the time. It's an umbrella term that is employed alongside AI a lot, and it's really how you teach these algorithms to do what they do. Deep learning and reinforcement learning are subtypes of machine learning that are used to train AI systems. Deep learning utilizes neural networks to analyze and learn from large amounts of data, often without explicit human understanding, which is why you'll sometimes hear the term "black box" applied to AI. Probing this black box is sometimes impossible, but more often than not it just requires a lot of extra steps on the developer's part, which are expensive and hard to implement. So this has been probably one of the biggest areas of early policy intervention, and something President Biden was pushing with his executive order is more algorithmic transparency, which means looking inside the black box. Reinforcement learning utilizes human influence to guide AI development using rewards or penalties based on its performance. We use reinforcement learning when we do aftermarket prompt-based training of an AI, which we'll talk about a little bit as well.

Natural language processing and OCR are very important when we're talking about the applications of AI. Natural language processing, or NLP, is basically the process that allows a computer to read, understand, and produce language, whereas optical character recognition, or OCR, is how a computer can recognize handwritten or printed static text off of a scan or picture, turn it into editable text, and actually read it. Ambient dictation, which we may have a chance to demonstrate today, is a combination of NLP, voice recognition, and the generative capabilities of an AI: the AI transcribes a conversation and records it like a scribe, and then you can re-enter that transcript as a prompt and have the AI do things with it if you're working with a generative AI.
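To make the OCR piece concrete, here is a minimal sketch of turning a scanned form into editable text, assuming the open-source pytesseract wrapper around the Tesseract engine; the file name is a placeholder, not anything from the talk.

```python
# Turning a scanned or handwritten form into editable text with OCR, using
# the pytesseract wrapper (requires the Tesseract engine to be installed).
from PIL import Image
import pytesseract

scanned_form = Image.open("scanned_intake_form.png")  # hypothetical scan
text = pytesseract.image_to_string(scanned_form)      # OCR: image -> string
print(text)
```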
So now, on to my favorite topic. You've probably heard a lot about these large language models recently, and no less from me; I've probably said the phrase about 50 times already. They are the new kid on the block, and in only a very short time they've completely reinvigorated development in AI and launched a wave of enthusiasm across multiple sectors. I hope you'll appreciate that when people talk about AI these days, they're really talking about these things. And the difference here is that large language models are really the first true general AI systems. They are the first AI that can rightfully claim to have passed Turing's test, found to be indistinguishable from humans in one study of 4,000 virtual therapy patients, and found in a different study to actually exceed humans in performance on emotional awareness and understanding when rated by objective raters. So this is powerful stuff, and the wide availability of these apps, apps like ChatGPT and others like it, has really catapulted us into a whole new era of truly independently intelligent machines.

If you were at OMED, you probably heard me talking about the weird, wild world of neural nets, transformers, and how LLMs like ChatGPT actually interpret and produce language based on their training. We're not going down that road again. Honestly, we lost a couple people. I hope they're doing okay out there. Hopefully they learned their wilderness survival skills. Basically, what you need to take away about how ChatGPT works is that it's a very large neural net with billions and billions of weights and nodes that is explicitly and very, very powerfully applied to language specifically. It has this element called a transformer, which allows it to apply focus to different parts of its neural net and take on different roles, putting on a different face based on what its task is for you, rather than engaging the entire model in its entirety. Like I said at OMED, basically the way ChatGPT works is that when you prompt it, it creates a string of words based on the probability that each word will appear in that string. If you listen very closely to a three-year-old's language acquisition process, you'll see that many of the things that occur when we're learning language as humans are very, very similar. My three-year-old will start a sentence, and then start the same sentence again, and then start it again until she finds just the right one. That's basically what ChatGPT does.
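To make that next-word idea concrete, here is a toy sketch of probability-based text generation. The vocabulary and all the probabilities are invented for illustration; a real LLM scores every word in its vocabulary with a huge neural network rather than a lookup table.

```python
import random

# Toy next-word model: for each current word, a probability distribution
# over possible next words. All numbers here are made up for illustration.
next_word_probs = {
    "the":     {"patient": 0.5, "doctor": 0.3, "clinic": 0.2},
    "patient": {"reports": 0.6, "denies": 0.4},
    "reports": {"pain": 0.7, "nausea": 0.3},
    "denies":  {"pain": 0.8, "fever": 0.2},
}

def generate(start, max_words=4):
    """Build a sentence one word at a time, sampling each next word in
    proportion to its probability, loosely like an LLM does."""
    words = [start]
    for _ in range(max_words):
        choices = next_word_probs.get(words[-1])
        if not choices:  # no known continuation; stop here
            break
        words.append(random.choices(list(choices),
                                    weights=list(choices.values()))[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the patient reports pain"
```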
This diagram shows the multi-step production and training of a specialized LLM, which is really what we're talking about in high-acuity fields like medicine. It's from an excellent recent publication in the Annals of Internal Medicine. Much of the spooky neural network learning stuff that we talked about last time occurs in the first box. Like I said, we're not going to go back into it. Once you have a baby large language model, like an eager child or medical student, you have to train it to think like you want it to. This is really how and where we can take these models and start applying them to specific tasks. This involves fine-tuning with specialized data sets and prompt learning. You may be able to pick out the tiny little stethoscope there. Basically, what this means is we inherit the model here. It's been trained. It's been exposed to all of the works of Chaucer and Shakespeare and everything anyone has ever written in humanity, and then you need to take it and say, well, now I want you to interpret patient charts. We have a more narrow data set, and then after that there's human feedback learning in this part of the model, where they specifically train it to do medical things. Then afterwards you have prompt-based learning, which is where the users of the LLM, or whatever product it is, can give feedback to the model and make it better after it's already basically been taken out of the box. That's one of the things that makes these models special: their ability to learn and improve as they go.

All right, here's another chart. This one is from the folks who created Open Evidence, which I will discuss in a bit. LLMs with applications in medicine vary widely in how they were trained. One example is BiomedLM, which is based on GPT and trained specifically on PubMed articles, whereas another model, called ClinicalBERT, was trained specifically on electronic health records. Those are the narrow data sets I was talking about. For those of you who might work at an institution or a place where you're in the position of thinking about applying one of these things to your organization or what you do, on a higher level there are basically a couple options for how you do this. This is what I briefed the board about. The cheapest and easiest way is to start off with a general LLM, like ChatGPT, and then basically train it with aftermarket prompt learning. You take this thing that's already fully baked and you say, okay, I'm going to train you to interpret X. That's the cheapest way to do it, but it doesn't necessarily always get you the result you want. The most time- and expense-heavy way to do this is to start with essentially the bones of an LLM and train it specifically on your data.

Do they train new data into it? Right. Yeah. Is that what they do? So, the people that created this chart, they have a model called Open Evidence, which we'll talk about in a second, and it does exactly that. It's a proprietary LLM, so one of these things they made from the bones, the absolute fundamentals, of a large language model artificial intelligence. And what they trained it on is specifically open-access, peer-reviewed studies that met certain criteria, like an impact factor above a certain level, et cetera, et cetera. Then, when new things are published, it incorporates them into its data. So that's an example of that from-scratch approach. It was developed by Harvard researchers and backed by Mayo Clinic, so obviously a lot of expertise and effort go into doing something like that.

I think, ultimately, most of us are probably going to fall somewhere in the middle, where engineers will produce us a model, then we'll give it specific data, like clinical notes, it'll be fine-tuned, and then we'll use prompt learning, so kind of a combination of everything. But this is just to give you an example of how you could potentially apply one of these things and how they're trained. Part of the reason for covering it is so that you'll have the ability to evaluate the things developers advertise, or be prepared if you're ever in the position of shopping for one of these models. And we'll talk about some specific products, so we can simulate some window shopping here today as well.
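As a rough sketch of that middle path, here is what fine-tuning an inherited model on your own narrow data set might look like, assuming the Hugging Face transformers and datasets libraries; the base model, file name, and training settings are placeholders, not a production recipe for any product named in the talk.

```python
# Fine-tune an inherited causal language model on a narrow data set
# (e.g., de-identified clinical notes, one note per line in a text file).
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

notes = load_dataset("text", data_files={"train": "clinical_notes.txt"})
tokenized = notes.map(lambda x: tokenizer(x["text"], truncation=True),
                      batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="notes-model", num_train_epochs=1),
    train_dataset=tokenized["train"],
    # mlm=False makes the collator set up next-word (causal) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```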
Here is a map of the currently existing major LLMs that are being used in medicine today, which is from the review in the Annals of Internal Medicine. The circle sizes reflect the model size and the number of parameters used to build the model. You can see some of these, like PaLM, which Google developed, are pretty big: 540 billion parameters. These are the general-purpose LLMs that people are using for clinical purposes with prompt training, like I was talking about. Then here are the specifically trained ones that were trained on more narrow data sets. This is the key here, and then you have some others that don't really fit in any of these boxes.

I include this information mainly, like I said, so you can evaluate these things if you're looking at them. As identified on the previous bubble slide, the most common generative LLMs are GPT- and PaLM-based. There's another tool called BERT, which is very powerful, but it's not really a generative AI; it's really for NLP processes. So with ClinicalBERT, for instance, what they'll do is apply it to thousands and thousands of patient notes, looking for certain terms and context in the note, and that has to do with coding and upcoding and stuff like that. So that one behaves a little bit differently. But because it's the best known and, frankly, the one I know best, we'll use GPT as the model to discuss all LLMs going forward.

So let's talk about the nuts and bolts of actually using an LLM in practice. ChatGPT is a chatbot built on the GPT engine, which stands for Generative Pre-trained Transformer. That's the transformer we mentioned earlier. ChatGPT, as of today, is capable of direct upload of photos, code snippets, and text files like PDFs. When I talked to you in October, you had to do this kind of weird workaround where you had to use the code interpreter and put a PDF in. Now it can take photos directly. Another thing that's happened is they've combined it with DALL·E, which is their image creation model. DALL·E is a portmanteau of the Pixar robot WALL-E and the artist Salvador Dalí, which I think is kind of funny. It's part of a family of models which, instead of words, use pixels as their weights. Remember I was saying that when ChatGPT creates a sentence, it picks the next word that it thinks is most likely to be there? Well, DALL·E does that, but with pixels. It looks at a pixel and says, if you're asking me to make a picture of a rhinoceros, which pixel is more likely to be next to which other pixel so that, cumulatively, it comes up with a picture of a rhinoceros? That's pretty crazy to me. I don't even want to think about that.

Since October, OpenAI has also added an app store for ChatGPT, so you can look at specific apps that people have made. It's kind of an extension of the plug-ins that were offered earlier, but with a little higher reliability. For some of the plug-ins you had to register on third-party sites, and it got a little sketchy, but that's improved a lot. And then, really meaningful to us, ChatGPT has an iPhone app with a wickedly powerful Whisper voice recognition engine. The app now also offers a Siri-style voice conversational UI, which I think is pretty cool. I would like to train that to do OSCEs or something like that.

So now, knowing a little about what ChatGPT can do, let's see how to work with it. Oh, ChatGPT? It was developed by OpenAI, which is a company that is now, I believe, backed by Microsoft. The online participants are complaining that they're not hearing questions, so if you don't mind, just kind of summarize and repeat the question. Right, okay. Yeah, that's right, because not everyone in the room is mic'd. So the question was, who developed ChatGPT? It was developed by a company called OpenAI, which I believe is now backed by Microsoft. I mention this one just for those of you who happen to work for the government: this is a CUI-approved program called AskSage. Has anyone used this? No? Has anyone used AskSage in here? Anyone work for the government specifically? Okay. I don't.
I was just going to say, your previous slide had a deal where it said that ChatGPT is both free and paid. And I want to make sure everybody understands that, because there is a free version that's very, very, very good. It's downloadable, and it's open to anyone. You're absolutely right. Yeah, the downside of the free version is it uses the 3.5 engine, and you lose out on some features. But 3.5 is pretty good as well. Yeah, that's right. The biggest downside I've seen with using the free version is that you get traffic-gated, and you're sometimes limited in the number of queries you can do. So for $20 a month, honestly, if you're really interested in using this and you want the unlimited queries and stuff, it can be worth it.

All right. Yeah, I just mentioned AskSage because it exists. I've used it a couple times; the developer gave me a free trial. I tried to use it for generative AI in the clinic for ambient dictation, and it didn't really work that well. The free version of this is absolutely unusable; you have to pay for it to work. So I hope we get more options in the DoD, is what I'm saying. Some of its features are listed there, based on its website. Just something to be aware of if you happen to have those restrictions.

Also, more importantly, within the past year an extremely powerful new AI solution has come onto the scene that addresses many of the issues that I feel limit the full capabilities of ChatGPT. Microsoft Copilot is built on GPT and DALL·E, but it has the serious benefit of being backed and produced by Microsoft. Will this eventually absorb ChatGPT completely? Honestly, I think so. And the reason is that the biggest benefit of Copilot is that it is directly integrated into the Office 365 apps. You can use it in Microsoft Word. You can use it in Excel. Whereas ChatGPT is basically this external dialogue window that you have to port things through. Having the ability to have ChatGPT in your email inbox and say, I don't want to receive any more emails from these moonlighting companies or whatever, I want you to automatically filter these out based on an intelligent review of what's in them: that's pretty powerful. I've only just started using this, and there's a strong inclination I may move over to it entirely, but I don't have as much experience with it. So going forward, since Copilot is built on GPT, let's just assume that the nuts and bolts of using them are kind of the same.

So, the key to using an LLM is really the prompt. The prompt is how you communicate with the LLM. It's similar to a query in a search engine, but if you're using an LLM like a search engine and expecting it to read your mind, you're going to have a bad time. The best way to think of prompts is really like a genie or a curiosity store in a Stephen King novel: the more specific you are, the more likely you are to get exactly what you want. If you're vague, the model will find the easiest way to answer your question, and that might not be what you want. There are a million ways to do prompts, but if you're just getting started from the very beginning, what I recommend is thinking in a pretty simple three-part format called RTF: act as a Role, perform a Task, show as a Format. That tells the LLM what you want it to do, how you want it done, and what form the output is going to take. Role setting is how ChatGPT and LLMs like it get into character.
It's the R part of RTF, and we were talking about the transformer, which allows the model to focus its knowledge base depending on the role you prompt it to take. That's the role. So, being in the medical field, it's probably useful to have the LLM talk to you at least with the tone of a peer. Like I said, setting the role allows ChatGPT to use its transformer more effectively and home in on specific information rather than approaching everything from a general perspective. And there are some examples of tone modifiers there.

All right. Can anyone throw out how they might ask ChatGPT to do something using the RTF format? Just any ideas? Interpret an EKG. Right, okay, so that's your task. The way I would reframe that is: as a board-certified cardiologist, interpret this EKG and report it in a standard billable report per the EKG coding guidelines. That would be how you RTF that, interpret this EKG, and get a better output. Anyone have any other ideas? So that basically illustrates it. And what's the format? I was going to say, using Harrison's; I just blocked on the name of the text. Using Harrison's, right. Yeah, so that's very good. The only thing we're missing is the format. So, do you want it in patient language? Okay, great. Yeah. You could also use, like, in a SOAP note, or in a consult note, or something like that. And that really dictates what response you get back.

Here are some other examples of prompt frameworks. We already talked about RTF. You can also use TAG, which is task, action, goal; BAB, which is before, after, bridge; CARE: context, action, result, example; and RISE: role, input, steps, expectation. There are some examples in very small text there. This chart comes from a general internet search on how to do prompt engineering; it comes up from a ChatGPT expert site. So lots of material about this is out there.

We talked a little bit about prompt feedback learning. It's important to understand that the output of an LLM obviously isn't always perfect, but these models' ability to learn is what makes them really special. By refining and making edits to an output, the model will regenerate a better response. Over time, this is how the model improves and masters your style. So when I was using ChatGPT specifically to do clinical notes, which I have to do when I'm rotating at the civilian sites, I would have a single context, basically, where I put in at the beginning examples of how I wrote my SOAP notes, and then I would just put patient transcripts into that one LLM context. Over time, the responses started to sound way more like me and less like a generic person writing something, and I got a better response. So if ChatGPT puts out something that you don't want, you have to tell it that you don't like it, and here's why, and here's how to do better. And it usually gets there.
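As an illustration of that RTF structure plus one round of prompt feedback in code, here is a minimal sketch using the OpenAI Python client; the model name, the EKG findings, and the critique text are placeholders, and this is a toy example, not a validated clinical tool.

```python
# An RTF-style prompt (Role, Task, Format) followed by one feedback turn.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{
    "role": "user",
    "content": (
        "Act as a board-certified cardiologist. "            # Role
        "Interpret the following EKG findings. "             # Task
        "Report them as a standard billable EKG report. "    # Format
        "Findings: sinus rhythm at 72 bpm, normal axis, no ST changes."
    ),
}]
first = client.chat.completions.create(model="gpt-4o", messages=messages)
print(first.choices[0].message.content)

# Prompt feedback: say what you didn't like and why, in the same context,
# so later outputs drift toward your style.
messages += [
    {"role": "assistant", "content": first.choices[0].message.content},
    {"role": "user", "content": "Too verbose. Use the terse style of my "
                                "previous reports: one line per finding."},
]
second = client.chat.completions.create(model="gpt-4o", messages=messages)
print(second.choices[0].message.content)
```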
All right, now that we've learned the nuts and bolts of how to use an LLM AI, let's talk about use cases and specific AI products. Based on my own experience and an extensive review of the growing body of medical literature in the AI space, this slide lists most of the current practical applications of generative LLM AI in medicine today. I covered some of these at OMED, but the technology has since been refined, and I do want to go over them again for those of you that might have missed it. You can kind of see those on the screen. We're going to talk about some specific ones. Do I have to be done by 9, or... I'm just trying to time myself. Okay, all right, we'll get there.

Here are some specific clinical applications for AI in our field. Fitness-for-duty evaluations and screening exams: AI can process questionnaires, simplify documentation, and its clinical reasoning functions can assist determinations. Same thing for the DOT MRBs and the FAA exam AME guide, which I've been told is about 600 pages long. AI can provide the AME with exactly the needed references based on a more nuanced question. Rather than just searching it, you can say, hey, you've read this 600-page guide in less than three seconds; what do you think of this specific question, and where is the supporting evidence for it?

All right, point-of-care decision support. Everyone uses point-of-care resources like UpToDate and DynaMed. I was told that we're giving students apps now to make all the clinical decisions. How much more would you benefit from an intelligent chatbot that could use these databases to give you fast, accurate, and personalized answers to your questions, much like a curbside consult in your pocket? I have some companies up there that are already doing this, and I want to talk specifically about Open Evidence. So huge thanks to Cole Zanetti, who is like the big AI guru for the AOA. He and Sameer Sood are sort of leading this whole effort, and I've been really grateful to get to work with them on several things. He showed this one to me. We talked about it a little bit already. It's developed by Harvard researchers, backed by Mayo Clinic and Elsevier. It's nonprofit and free for use by physicians; you only need an NPI to register. And that's why I'm okay mentioning it, because it's free and nonprofit. This is an LLM that only uses peer-reviewed, open-access studies. When a new one comes out that meets its eligibility requirements, it automatically incorporates it into its knowledge base. And the best thing: we talked a little at OMED about issues with hallucinations, which is when these LLMs basically create something that's false, like a false source, if you ask them for a source. This one automatically creates a bibliography with every response it gives you and links to the papers. So of all of them, I feel like this one is the easiest to self-verify the output.

Yes. Right, so the question is, does this one always get the most current version? So I know that this uses guidelines from the medical societies. I don't know specifically if it uses guides like the AME guide, for instance, but in terms of the medical societies, they just published a whole new guideline set for diabetes, and it would absolutely have access to that. And like I said, it would link it, and then you could look at it yourself and very quickly self-verify the output. So that's one I use all the time. Whenever I have a clinical question that I think is really nuanced, and I'm trying to see, hey, what should I do with this patient, what do the guidelines say, and I don't have the time or expertise to really search through those guidelines specifically, I'll use this one, and it will interpret my question and give me an answer based on what's out there. I've found it really useful. When you say self-verified, does that mean I take their output and I have to go look up all their literature? Does that mean that this one is protected against hallucinations? I don't know if you all can hear that.
The question was, is this protected against hallucinations? I have not experienced one with this one yet. Because of my experience, I'll usually self-verify what comes out of it: if there was a specific part of the question that linked to a specific source, I'll almost always look at the source and just make sure. And I personally have not experienced a hallucination with this one yet. But that's what you mean by self-verify: when it gives you a bibliography, you actually look at at least the abstracts of those articles to make sure that they actually exist. That's right. And currently, with the way AI is in the world today, I would recommend that everyone do that. At this point, that's the responsible thing to do.

All right. So probably the clinical application that's the most interesting to people is the medical scribe. I use this every day in my clinic, and it's just amazing sitting there and talking to a patient directly and not having to write things. Right now, I've been using this app called Nabla Copilot, which is great because it's free to residents and students. For physicians, I think it's like $90 a month or $119 a month; in comparison, a self-license for Dragon Medical is like $70 a month, and that one doesn't do anything generative, so this is pretty cost-effective. Anyway, I use that one in clinic, and it is purely a medical scribe. It doesn't really do any of the generative or clinical reasoning stuff; you'd have to put the output back into something like ChatGPT for that. But even so, what I really like about Nabla is that it does a time-stamped transcript, as opposed to ChatGPT with Whisper, which does the entire transcript as one block and can sometimes get overloaded if you have a patient who talks for 45 straight minutes. That never happens with Nabla, because it's time-stamped and it's processing the transcript one line at a time. So I haven't had that issue with it. But it's great. I mean, we talked about ambient transcription. Yes. Thank you.

I know the point is this does more, but you're sitting there talking to your patient. How does this differ from dictating? Like, I will dictate into a note. So what will this do for me when I'm talking with my patient beyond just keeping a dictation? That's a great question, and I think that despite the fact that we're crunched for time, we need to do this. So I've put Nabla on the screen here. I'm not going to have time to simulate a full encounter, but this is basically what it looks like. You can set your encounter type as in-person or phone call and test your microphone. You can put a patient context in; say I've got a 65-year-old female with a past medical history of GERD and chronic diarrhea, et cetera. Then you start the encounter. So see, basically, I'm just talking here. And I'm like, so this is so-and-so. What brings you in today? Oh, my stomach is hurting again, et cetera, right? And it's me talking, and despite my falsetto, the program knows it's me talking. But it will also differentiate between speakers, so it'll have a line break. And this is basically it. The difference between this and dictation is that I'm talking, the patient's talking, and then at the end of it I will say something like, okay, Mrs. So-and-so, this is what you've told me, these were my exam findings, and this is what we're going to do about it. Do you have any questions? And she's like, no, that sounds good.
And then I hit End Encounter, and it creates a note. So a lot of the actions that I'm already taking to dictate are just ambiently being incorporated into this technology. Previously, and I'm ashamed to say I tried this once because I was really going for efficiency, I literally took my computer on wheels into a patient room, had my Dragon, and I was like, so what brings you in today? Oh, my stomach hurts. "Patient states stomach hurts, onset three days ago." Describe the pain for me, right? Nobody wants that. This is what this is doing: it's basically writing as you go. It removes the secondary step of you then having to dictate, and it allows you to practice medicine in a very natural and fluid way and have these conversations.

Yeah, it's better than a scribe, because the other thing is, what you can't do with a scribe is go back. Once it generates the note, and you see it's basically picking up everything that I'm saying here in real time, you can go back to the transcript and look at specific timestamps. You're like, I think around 15 minutes in we talked about that; I can't remember what they said; for whatever reason it was missing from the note, and I thought it was important. You can go back, and there's an actual transcript of what you talked about, which I use all the time. So that's Nabla. That's an example of just one of these medical scribes. Aside from ChatGPT, it's probably the cheapest one; ChatGPT at $20 a month is the cheapest. But again, with ChatGPT I've had issues where, if you go past 15 minutes, there's a risk the transcript overloads, basically, and you lose the whole thing. That's never happened with this, because it does it line by line.

I got asked if I use Claude much. Claude is an LLM that has become very, very popular, but it's a general LLM, like ChatGPT. Currently I haven't found any specific, focused medical applications of Claude, but chances are they're working on something, I would imagine. I looked at this a couple weeks ago, because I'm on a bunch of AI newsletters, and this one that I read called The Neuron talks about Claude all the time, and they love Claude. But as far as I know, Claude is a general LLM; it doesn't have specific biomedical applications.

Was that cool? I mean, did you like that? Is that something you think you're going to use? Okay, awesome. All right, I'm going to try to cover the highest-yield stuff going on here. So yeah, here we go. These are the AI scribes. Dragon, of course, has one that they're working on. In demonstrations, that is the best one that I've seen, because it does what Nabla just did, but it also has generative abilities, and it's built right into Epic, so it can look at the rest of the patient's chart in context. I cannot figure out how much it costs. When I asked the rep, he said, we bill on an enterprise level, which means, I think, very expensive. Yeah, so.
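For a sense of what sits under these ambient scribes, here is a minimal sketch of the same two-step pipeline: transcribe the room audio, then draft a note from the transcript. It assumes the open-source whisper package and the OpenAI Python client; the file and model names are placeholders, and this is an illustration, not how Nabla or Dragon is actually implemented.

```python
# Step 1: speech-to-text with the open-source whisper package.
# Step 2: ask an LLM to draft a SOAP note from the transcript.
import whisper
from openai import OpenAI

# Transcribe the recorded encounter (file name is hypothetical).
transcript = whisper.load_model("base").transcribe("encounter.wav")["text"]

client = OpenAI()  # reads OPENAI_API_KEY from the environment
note = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": ("Act as the treating physician. Turn this visit "
                     "transcript into a SOAP note. Do not invent findings.")},
        {"role": "user", "content": transcript},
    ],
)
print(note.choices[0].message.content)
```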
Sir, I don't know if you're going to be speaking about this in a minute, but my concern is, as you talked about at OMED, I'm wondering what fears many of the physicians in the audience might voice relative to AI in their practice. I appreciate that what you've described is coming, and I certainly agree with you, but the resistance we're finding might be a bit daunting.

Right, and that is a topic that could eat up a lot of time. In Clearwater, it ate up like two and a half straight hours of me taking questions about that. To summarize everything: people are afraid AI will replace them. My argument is that AI is a tool, and it can never replace us. Certainly, if we do OMM, it can never replace us, because it doesn't have hands yet. Yes. Right, but yeah, so people are afraid it will replace them. People are afraid it is not reliable; there's a lot of policy action to address that. And people at the board level were afraid that osteopathy, essentially, was at risk of being forgotten about as these things pressed forward. So one of the arguments that I made to the board is that a really important thing for us to do right at the start is to be aware of that, and I feel it's very important that we approach the developers of these technologies that are being used for clinical decision support from the beginning and say, we want to make sure that osteopathic physical findings and osteopathic principles and practice are incorporated into these algorithms so that DOs can use them. So I can say, you know, these are my structural findings, I found these Chapman's points, et cetera, et cetera, and the model would know how to help with that. Those were all things that came up.

On the "this will replace us" front, Sarah Wolf brought up this really interesting idea: whatever level you can train an LLM to, it wouldn't be able to surpass a physician, because ultimately you still need a pilot for the aircraft, right? But what LLM AI might do is reduce the role of mid-level practitioners in the hierarchy, because you would have this robot, basically, that could do a lot of the things that we use mid-levels to do now. So there was this idea that maybe AI would improve the value statement for physicians as team leaders, with the AI as a virtual team member. Those were the kinds of things that came up at the board meeting. And I can tell you, without revealing too much, that the AOIA is working on a multi-pronged strategy to shepherd AI and other emerging digital health technologies and be kind of the beacon for our profession. I'm really, really excited about the things that they're doing, and I can't wait for them to share them with the community. I think we're going to hear more about this at the House of Delegates, without revealing too much.

All right, I'll try to get to the highest-yield stuff going on. Scanned document processing with optical character recognition: you can put in a handwritten form, and it turns it into text. I do this a lot too. There are specific companies that do this as well. For whatever reason, if you log onto their websites, they're always talking about how this is for insurance companies and lawyers, but they never mention health systems and physicians. So maybe this is a gap in the industry. These people are working for the enemies, right, but they're not working for us. Not yet.

Okay. So, the potential of AI. I want to talk really briefly about how what we've covered so far really just scratches the surface of what AI can do. The best way to think about AI is that this is a nascent technology, but it's one that's going to grow, almost like a child into an adult, and its capabilities will grow with it.
And there are not a lot of other technologies out there that have that kind of growth potential. So here are a couple of things that, in my opinion and the general consensus, we can expect AI to do in the future. You may have a full-scope medical practice AI, an AI that runs every aspect of your outpatient clinic: it schedules, deals with staff, answers patient questions, sends letters to insurance companies, and basically creates this co-signed document that you look at for 20 minutes in the morning with your coffee. And then you go into the room with your first patient, and it follows you along, takes notes as your medical scribe, writes your notes, reminds you of the preventive medicine thing that you forgot, and goes on. So basically, a full scope of medical practice. This is something AI is capable of. That kind of includes the personal assistant.

Also, there's this idea called scalable privilege, where you take the entire aggregated expertise of a rare subspecialty at a tertiary medical center, and you make it so that someone in rural Oklahoma can say, I need to consult an expert, like a neurologist who deals with rare motor disorders, about this patient in front of me who is, you know, a thousand miles away from Stanford, right? That's the idea of taking the expertise of the highest levels of medicine and making it available to everyone with this pocket curbside consult. That's something AI can do.

They're also talking about remote patient monitoring. I saw a bunch of stuff in the program about the prison system and the prison medical system. Maybe that's an area for remote patient monitoring, which is basically an AI functioning up to the level of a mid-level inside a camera, being fed biomedical data from an automated sensor or something like that, always watching a patient. So you can simulate the inpatient experience outside of the hospital.

And then live video analysis. I don't know if anyone's involved in accident reports or mishaps or anything like that. One summer I worked for a law firm, and my entire job was to interview hundreds of survivors of a train crash to look for the one specific thing that would make the case for the law firm that was suing the train company. Well, the NEC Corporation has already come out with an LLM-based system that watches dash cam videos from trucks and the like to create accident reports. It can review hundreds of hours of dash cam footage, edit it down to specifically what you're looking for, and then produce a report. So this stuff is already happening, and that's where it's going in the future.

All right. Okay. And we actually got there; considering we started a little bit late, we're actually not too far off. You know, Dr. Shumway, for those of you that do academic and technical writing, it's a tremendous help. Ten years ago, it was no trade secret that if you were working on a dissertation, you hired somebody to proofread it and do your bibliography, et cetera. Well, now you just upload your dissertation and you say, is this APA 7 compliant? And it'll come back and give you all the edits you need to make. It'll correct your bibliography, and it'll find all the passive voice for you.
So it becomes your grad-student English major. And it's phenomenal. And now I'm taking that same content and submitting it to a journal, which uses Chicago, not APA. So it's going to redo my bibliography for me. It's a wonderful tool, not to come up with your intellectual property, but to reformat it into their desired, you know, methodology. So it's been great for me.

Absolutely. And, Jeff, I think that really hits at the core of what I feel the real promise of AI is. That soul-crushing busywork we used to have to do by hand, like reformatting an entire paper: when I get to the end of writing one of my papers and realize that I need to move the sources around and change the footnotes, the amount of mental energy that goes into those menial tasks is enormous. AI can do that, and it can free us to think about higher-level things and creative things and just make us better people. One of the quotes from an expert at the National Academy of Medicine who was working on AI, one that really resonated with me, is that ultimately, going forward, we need to shift the paradigm from "doctor knows best" or "patient knows best" to "person powered by AI knows best," because ultimately AI allows us to be the best version of ourselves. And people harp on, like, the second brain or whatever. I don't think that's a problem. I would love for AI to hold all of that stuff and produce the things that I've forgotten and the things that are important, so I can think about the nuances of the problem in front of me rather than dealing with all of the busywork.

It helps, too, as we do convention planning. You could take a 29-page vita and say, I need a half-page summary introduction. It'll do that for you. It'll create a needs assessment. I mean, it's been phenomenal for us. The other thing it'll do: I got a new dissertation chair last year, and she insisted that every chapter of the dissertation have a summary of the entire dissertation. Well, I'm not creative enough to restate the same thing five different ways. So I use different tools; I use Claude, and I use ChatGPT, and I use Bing. They'll come up with the summary, but then you're duty-bound to re-edit and make sure it's written in your own way. But it'll give you a different way to restate the same thing if you need it.

Sometimes, just to get over a writer's block, having something in front of you that you can edit is such a powerful thing, even if it's garbage. I mean, they used to say that putting the pen to paper is the only way to get past writer's block, right? You put something there, because then that's a building block to build on. AI is pretty good for that. Well, we have just a couple minutes left, and thank you so much to the following speaker. I really do just want to hear if there's anyone else out there that has any thoughts about AI, because honestly, when I was addressing the board, I felt like the most fruitful stuff came out of the audience comments, stuff I hadn't thought about.
Video Summary
Dr. David Shumway, a third-year internal medicine resident and active-duty Air Force captain, delivered a presentation on artificial intelligence (AI) during the Public Health and Preventive Medicine session. Shumway emphasized the growing importance of AI in the medical field, outlining its potential to revolutionize healthcare by improving patient outcomes, addressing labor shortages, reducing costs, and easing documentation burdens for providers. The focus was on the practical applications of AI, especially large language models (LLMs) like ChatGPT, which have begun to show significant promise in clinical settings. Shumway discussed various AI applications, such as point-of-care decision support, medical scribing, and document processing. He highlighted specific AI tools and software, including ChatGPT, NABLA Copilot, and Microsoft's Copilot, which can be integrated into existing clinical workflows to enhance efficiency and reduce administrative burdens. Furthermore, Shumway addressed common concerns about AI, such as reliability and the potential to replace medical professionals. He concluded with an optimistic outlook on AI's role in healthcare, suggesting it could free healthcare providers from mundane tasks, allowing them to focus more on patient care and complex problem-solving.
Keywords
artificial intelligence
healthcare
large language models
ChatGPT
clinical workflows
medical scribing
AI tools
patient outcomes
David Shumway