ChatGPT: Powerful Tool, Unreliable Business Partner
Large language models (LLMs) like the ones that power ChatGPT are taking the world by storm. Their ability to understand and respond in a human-like way to almost any topic, including more structured applications, makes them versatile enough to belong in any professional's arsenal of tools.
But how far should they be leveraged beyond knowledge support? Can they handle enterprise-level solutions?
To provide some clarity: our team has been developing a new AI-driven application for a client, and that solution did not leverage ChatGPT in any way. After reaching a functional state, we were curious how OpenAI's models (the technology behind ChatGPT) would perform. We built the same application again, this time using OpenAI to process the training data and prompting it with the same requests. Several rounds of prompt engineering and revision followed to make sure we were asking in the most effective way. Even so, we encountered several issues with the OpenAI variant. It would frequently 'make up' answers to queries, responding with whatever it guessed was correct whenever it did not clearly understand the question. We also found that the more we changed the prompt, the more it got stuck on earlier queries; new prompts were futile, returning the same results or ones that were even more random. Our problem was a classification task, and in every test scenario our purpose-built application was more accurate.
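To make the comparison concrete, here is a minimal sketch of what a prompt-based classification call of this kind can look like. The article does not include the actual prompts or labels, so the model name, label set, and wording below are illustrative assumptions, not the configuration used in the project.

```python
# Minimal sketch of a prompt-based classifier built on the OpenAI API.
# The model name, labels, and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["billing", "technical", "account", "other"]  # hypothetical classes

def classify(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # reduces randomness, but does not eliminate hallucination
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a strict classifier. Reply with exactly one label from: "
                    + ", ".join(LABELS)
                    + ". If unsure, reply with 'other'."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()

print(classify("I was charged twice for my subscription this month."))
```

Even with the temperature at zero and an explicit instruction to fall back to 'other', nothing in this setup guarantees the model stays within the label set, which is the failure mode described above.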
Could we have done more to limit the hallucinations, or spent more time on prompt engineering? Perhaps. It is better, though, to acknowledge that different machine-learning strategies suit different problems and requirements. OpenAI/ChatGPT is not the answer to everything. In some cases, a solution doesn't require artificial intelligence at all, despite every company's recent push to include it as a marketing strategy.
Revisiting the concerns: OpenAI's models have a tendency to fabricate information, or 'hallucinate', in their answers. Instead of returning an error, the model prefers to at least try to provide some explanation, even when that answer is incorrect and regardless of any configuration intended to curb hallucination. For enterprise-level applications where accuracy is critical (healthcare, legal, financial), these inaccuracies and the blind use of the responses could lead to poor business decisions or, worse, the wrong diagnosis for a patient. ChatGPT also struggles with context: if a prompt needs more clarity, it may produce an answer that doesn't address the actual issue. This raises concerns about the exact value on offer.
Imagine relying on a system that might answer a question with made-up information or give up when asked to clarify its reasoning. Businesses rely on systems that deliver reliable and explainable results. Unfortunately, OpenAI's reasoning process remains opaque. You can prompt the model to explain its work in specific scenarios such as formulas, problem-solving, or code. What we found, however, is that while it produces a compelling set of explanations, there are recurring flaws: simple conditions stated in the original prompt are neglected, once again leading to the wrong answer. The deeper issue is that when the problem becomes more complex, there is no visibility into how an answer is derived. Returning to our own solution for the classification problem, we have full control over the algorithms used, the training of the model, and more. That control makes support and maintenance far easier, whereas OpenAI/ChatGPT solutions remain a black-box approach.
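For contrast, here is a brief sketch of the kind of conventional, fully inspectable classification pipeline this paragraph alludes to. The real project's data, features, and algorithms are not described in this article, so the scikit-learn pipeline and toy dataset below are assumptions chosen only to illustrate the point about control and transparency.

```python
# Sketch of a conventional, fully controllable text-classification pipeline,
# the kind of in-house approach contrasted with a black-box LLM above.
# The dataset, labels, and model choice are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    "refund my payment", "charged twice this month",
    "update my billing address", "invoice is wrong",
    "app crashes on login", "error when uploading a file",
    "the page will not load", "button does nothing",
]
labels = ["billing"] * 4 + ["technical"] * 4  # hypothetical training data

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0
)

# Every stage is inspectable: the vectorizer's vocabulary, the model's
# weights, and the training procedure are all under the team's control.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, pipeline.predict(X_test)))
print("predicted:", pipeline.predict(["my card was charged twice"]))
```

Because every stage is explicit, the team can swap algorithms, inspect learned weights, and retrain on corrected data, which is exactly the kind of support and maintenance a black-box API cannot offer.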
This brings us to the many online services and companies offering AI-based solutions. Look deeper into their offerings and many turn out to be nothing more than GPT-wrappers. These products don't provide any AI of their own; they simply pass requests on to OpenAI (or an equivalent AI-as-a-Service provider). It is the equivalent of going to a restaurant without its own kitchen: the staff take your order, buy the dish from a real restaurant, wait for it to be delivered, and plate it before serving it back to you. If there's something wrong with the dish, or you have questions about how it was made, good luck with that.
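For readers unfamiliar with the term, a 'GPT-wrapper' is often little more than the following: a thin endpoint that relays the user's input to OpenAI and returns whatever comes back. This is a hypothetical sketch (the endpoint name, framework, and model are assumptions), not any specific vendor's code.

```python
# Sketch of what a typical "GPT-wrapper" amounts to: a thin web endpoint that
# forwards the user's input to OpenAI and returns the response unchanged.
from flask import Flask, jsonify, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

@app.route("/analyze", methods=["POST"])
def analyze():
    user_text = request.get_json()["text"]
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_text}],
    )
    # No in-house model, no training data, no explainability:
    # the "product" is the plating around someone else's kitchen.
    return jsonify({"answer": completion.choices[0].message.content})

if __name__ == "__main__":
    app.run(port=5000)
```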
It's important to note that these findings don't cover every aspect of enterprise AI adoption. Security, scalability, and data management are additional factors companies must consider. Even so, these insights offer valuable food for thought, reminding us that even the most powerful AI tools may be ready for only some business challenges. While ChatGPT is undeniably a powerful tool for generating text, its limitations become significant hurdles in the enterprise environment. Businesses seeking reliable AI solutions must look beyond the initial hype and carefully consider a model's ability to deliver accurate and transparent results.