Profile

1 Oct 2023

LLM chatbots in critical systems



Customer Service Chatbots: How to Create & Use Them for Social


Explained] ChatGPT: What is it, How Does it Work, And More - MySmartPrice

Introduction

Of the two images of a chatbot interface above, which one does your brain lead you to? Which one seems the more complex and available to whatever query you have? 

Now which one is actually being used as part of central systems across companies including banking, online shopping and finance? 

The first represents the current mainstream chatbots used in many companies’ systems: a sometimes unhelpful, stale linear path where you can always tell you are not speaking to a human. The other is chat-GPT, which has the benefit of being a LLM, capable of conversing in any topic part of its training data. 

Why are we still stuck with unhelpful linear chatbots? 

The first reason is it just has not been a long enough time for companies to start or trust the implementation of a LLM into their systems. It has occurred in search engines for example, but not central systems with user accounts such as a bank account.  

For the second reason, we must consider the underlying program which is a linear chatbot versus a LLM such as GPT. Let’s say it was a pizza restaurant’s FAQs. The user might ask if Order 1 is vegetarian. The bot might have that recorded and answer appropriately. Now let’s say the user asked if the pizza is gluten-free. The chatbot might not have it recorded. This is where the frustration starts as the bot has no information, all it can do is connect it to a human (or in the diagram above, simply give up and end the conversation). 

Now consider a LLM such as GPT that has been trained on a huge variety of text data. It can be further trained on the FAQ database of the pizza restaurant (this is how GitHub Copilot works: It has the initial training of GPT-3, supplemented by additional training on code and different languages). This allows it to answer the vegetarian question. For the gluten question, because of its understanding outside of just the FAQ, its answers can be adaptable, what might it say? Let’s ask it. 

“Order 1 is indeed a vegetarian pizza. However, I don’t have specific information about its gluten-free status.

To ensure your dietary preferences are met, I recommend contacting the restaurant directly or visiting their website to check for gluten-free options on their menu. If you have any other questions or need assistance with your order, feel free to ask, and I’ll do my best to help!”

The LLM is not simply checking a database and returning a result or null. It understands the user’s query and responds with highly contextual advice. It even recommends visiting the website for other gluten-free options. This is much more human-like than the previous chatbot. 

This higher level of assistance is expandable across different systems too. Consider banking. A user wants to open a bank account but is unsure of how to do so due to international rules from the country they are from. The LLM might be able to consider the country the user is from and guide them through every step to creating one, rather than a linear chatbot which might just contact a human for them or link a guidance page. The benefits seen here are an increase in the efficiency of help for users in many different systems. 

What’s stopping us? 

One problem that might affect smaller to medium-sized companies is the extra cost of building and maintaining a LLM support chatbot. While the bulk of the training can be done by a company like OpenAI for GPT, additional training is needed which will be specific to every company ( such as the FAQ database on the pizza restaurant, or a Bank’s guidance pages ). This includes processing power and also hiring engineers who know how to perform this task. Furthermore, maintaining the chatbot system as new features and systems are added to the company will be another consideration that must be taken. 

Another, more alarming issue, is the security risks that arise from a LLM compared to a linear chatbot. The connection to a critical system for a LLM allows it to adapt and be versatile in its answers but also opens the door to inappropriate answers. 

“So far, they (LLMs) haven’t been connected to critical systems; there’s nothing you can do with them aside from getting them to emit hate speech. The real tests will come when these models are connected to critical systems. Then we will see attempts at data poisoning (feeding the model corrupted training data), model reverse-engineering (discovering private data embedded in the model), and other exploits.”

What Are ChatGPT and Its Friends? By Mike Loukides

 

One possibility is prompt injection. Take, for example, the prompt injection against Bing’s search AI to get it to reveal its name. 

When connected to a banking system, there might be a large amount of information unavailable to the current user, that they should not be able to access. For example, if one asked the banking LLM chatbot to transfer the next account in the database’s money to their account, without specific instructions not to do this, a LLM might follow those instructions. This is quite an obvious pitfall, but there are so many more that pose a risk that have simply not been discovered yet.  

From my understanding, this is akin to putting a plaster on a bullet wound. One cannot just keep telling the LLM to ignore specific instructions when the nature of the LLM is to be flexible in its answers.  

Furthermore, there is also the issue of hallucination. Hallucination is when a LLM provides text that is nonsensical, contradictory or incorrect. This could be because of their dataset, which could include misinformation or contradictory information. It could also occur as a result of the users’ prompts. If the prompt given is contradictory. However, there is no true way to mitigate this from models like GPT because they are trained to extrapolate text from their data, and have no true knowledge of citations or truth. 

This is problematic for critical systems because a chatbot giving incorrect information to a customer (for example the wrong instructions on setting up a bank account) could place them in legal trouble. This has already been seen in the real world when two New York lawyers were fined after submitting a legal brief with references to cases that were entirely fictional. ChatGPT insisted they were real instances, but they were not.

Where are we headed? 

There are numerous benefits and drawbacks to implementing LLMs as chatbot helpers in critical systems. As it stands, I believe the risk outweighs the reward. 

LLMs need to be less susceptible to malicious intent and misinformation. Only then can we enjoy them as part of chatbots in critical systems. This could be from developing a chatbot helper-oriented LLM where the training data is specifically designed to avoid the pitfalls and only access the positives. However, there is also the risk of inaccurate training data, leading to misinformation once again. Another approach is an instruction before answering that dictates it must not be inaccurate or malicious, but even this is susceptible to break.

As of right now, the deterministic approach of current chatbots are more useful to businesses, as they can be given a narrow pathway that should support most users, and not go wrong. In the near future, this may change as the landscape of AI and LLMs is rapidly changing.