Avoiding the Pitfalls of Hallucinating GPT Models

GPT models have significantly impacted HR technology but can produce ‘hallucinations’—misleading or incorrect information—highlighting the need for HR professionals to understand and mitigate these errors. A scenario involving a model misclassifying an employee’s potential illustrates this challenge. To address such issues, grounding techniques like prompt engineering, integrating external databases, fine-tuning with domain-specific data, Retrieval Augmented Generation (RAG), and Reinforcement Learning from Human Feedback (RLHF) are essential. These methods enhance the model’s accuracy and relevance, improving its application in HR for talent development. As AI technology evolves, its role in HR will become more critical, making it vital for HR professionals to grasp and apply these techniques to navigate the future of talent management effectively.

Welcome to the next installment in my series, AI Buyers Guide for Human Resources (HR) Professionals; this is the third article in the series. My objective is to arm HR professionals responsible for selecting, deploying, and managing AI-based HR Tech solutions in the enterprise with the knowledge they need to perform these tasks confidently. While written for HR professionals, the information here applies to any buyer of AI-based software. I hope you find it helpful, and I welcome your feedback and comments.

GPT (Generative Pre-trained Transformer) models have been game changers in the rapidly evolving landscape of HR technology. However, applications that use these models can sometimes suffer from ‘hallucinations’ — instances where the model generates misleading or incorrect information. Understanding and overcoming this challenge is crucial for HR professionals, especially generalists who are not deeply entrenched in AI technology.

Imagine a scenario in which a GPT-based model is tasked with plotting an employee on the company’s 9-box grid. The HR team carefully crafted a prompt supplying the model (in this case, ChatGPT using the GPT4-Instruct model) with details about the employee, such as recent performance scores and feedback from managers and peers. At the two extremes of the 9-box, employees who fall on the bottom left of the grid (low performance, low potential) are transitioned out of the organization, while those who fall on the upper right (high performance, high potential) are ready for promotion.

One of the company’s employees, Sarah, a recent college graduate, received an outstanding performance appraisal after her first year on the job. Feedback from Sarah’s manager and several of her peers was overwhelmingly positive. The model classified Sarah as an employee “ready for promotion.”

The HR team at this company, aware of LLMs’ shortcomings, sent the employee’s data and the model’s output to a team tasked with validating the recommendation. In this case, the reviewers concluded that the model had incorrectly classified Sarah’s potential: it didn’t account for the many factors that HR experts have identified as necessary to evaluate employee potential. This type of mistake is referred to as a model “hallucination.”

Conversely, it could just as easily have classified Sarah as an employee with low potential. In either case, the model doesn’t actually “know,” in the human sense of the word, which answer is correct. It doesn’t understand employee potential the way an HR expert does, and it knows nothing about the factors used to evaluate it, yet it confidently identified Sarah as high potential. How is that possible? The model has learned that, given a particular sequence of words (the prompt), it should generate the most probable sequence of words to follow; the response we read as a statement of Sarah’s potential is simply that continuation. This is, in essence, all a language model does: it completes a sequence of tokens (here, words) given a starting sequence.
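To make that concrete, here is a minimal sketch of next-token prediction using the small, openly available GPT-2 model via the Hugging Face transformers library (my choice purely for illustration; the GPT-4-class models behind ChatGPT do the same thing at vastly larger scale). The prompt is an invented fragment of the Sarah scenario.

```python
# pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small open model, used only to illustrate how next-token prediction works.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "After an outstanding first-year performance review, Sarah is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, sequence_length, vocabulary_size]

# Probability distribution over the single token that would come next.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

print(f"Prompt: {prompt!r}")
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r:>12}  p={prob.item():.3f}")
```

Whatever continuation the model produces, it is the statistically likely next text, not a judgment grounded in HR criteria; that gap between fluency and understanding is exactly where hallucinations come from.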

While the model may have been trained on vast amounts of data from various sources, its knowledge of the world and its ability to perform such tasks depend on the data it was trained on and what it learned during training, among other factors. Model hallucinations are a significant challenge in AI: they can spread misinformation, reduce the reliability of a model’s responses or predictions, erode trust, and, at worst, produce recommendations that adversely affect someone’s job or career. Addressing hallucinations is crucial for improving the accuracy and trustworthiness of AI systems.

To reiterate, LLMs don’t understand the task they are being asked to perform or the implications of making a mistake. These mistakes, or hallucinations, can be mitigated (though not eliminated) by applying grounding techniques, which anchor the model’s output in additional context, data, or training rather than in plausible-sounding guesses. Ultimately, the goal is to give the model relevant new information and have it generate a result we can confidently trust. There are many grounding techniques; I’ve presented a handful of the more popular ones below, each illustrated by a short, simplified Python sketch after the list.

  • Prompt Engineering: Crafting effective prompts involves creating specific, clear, and relevant questions or statements that guide the model to generate useful responses. The quality of these prompts directly impacts the relevance and accuracy of the model’s output. In the context of employee development, crafting clear prompts that focus on specific aspects of employee performance or development needs can help the model generate more actionable and relevant insights (see the first sketch after this list).
  • External Databases: Integrating with external databases allows the model to access additional data sources outside its initial training set. Access to rich and relevant data enhances the model’s ability to provide accurate and context-aware responses, ensuring that its output is more aligned with real-world data and current information. For HR applications, linking the model to comprehensive employee databases can provide deeper insights into workforce dynamics, leading to more informed decisions regarding employee development and management (second sketch below).
  • Fine-tuning: Fine-tuning involves training a pre-existing model with specific, domain-relevant data to tailor its responses to particular needs. This process adapts the model to focus on and prioritize the new, specific data it has been exposed to, improving its performance in specialized tasks. By fine-tuning a GPT model with HR-specific data, such as company protocols and employee performance records, the model can provide more precise and contextually relevant responses for HR-related tasks (third sketch below).
  • Retrieval Augmented Generation (RAG): RAG is a technique that combines the model’s generative capabilities with the retrieval of relevant documents or data from a large corpus. This approach allows the model to generate responses that are not only based on its training data but also enriched with up-to-date and context-specific information from external sources. In employee development, RAG can enable the model to pull in relevant documents, policies, or records while generating recommendations or insights, ensuring that the outputs are well-informed and contextually accurate (fourth sketch below).
  • Reinforcement Learning from Human Feedback (RLHF): RLHF is a technique where the model is iteratively improved based on feedback provided by human users. This process helps the model understand the nuances and complexities of real-world applications by learning from human evaluations of its performance. It enhances the model’s ability to generate more accurate and useful responses. In HR contexts, using RLHF allows the model to refine its evaluation techniques and recommendations based on continuous feedback from HR professionals, ensuring that its outputs align more closely with human expectations and needs (final sketch below).
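To make the techniques above more tangible, the sketches that follow are deliberately simplified, hypothetical illustrations rather than production code. First, prompt engineering: the prompt below spells out the evaluation criteria, asks for evidence against each one, and explicitly permits an “insufficient evidence” answer, which narrows the model’s room for confident guessing. The criteria, the employee summary, and the model name are all placeholders of my own.

```python
# pip install openai  (assumes an OPENAI_API_KEY environment variable is set)
from openai import OpenAI

# Hypothetical potential criteria; substitute your organization's own framework.
POTENTIAL_CRITERIA = [
    "learning agility demonstrated across multiple projects",
    "leadership behaviors observed by more than one manager",
    "performance sustained over at least two review cycles",
]

employee_summary = (
    "Sarah, first-year employee. Annual performance rating: outstanding. "
    "Manager and peer feedback: overwhelmingly positive."
)

system_prompt = (
    "You are assisting an HR analyst with a 9-box assessment. "
    "Judge potential only against the criteria provided. If the evidence does not "
    "address a criterion, answer 'insufficient evidence' rather than guessing."
)

user_prompt = (
    f"Employee data:\n{employee_summary}\n\n"
    "Potential criteria:\n- " + "\n- ".join(POTENTIAL_CRITERIA) + "\n\n"
    "For each criterion, cite the supporting evidence or state 'insufficient evidence'. "
    "Then give an overall potential rating: low, medium, high, or undetermined."
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever chat model your vendor provides
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)
print(response.choices[0].message.content)
```

With only one year of history, a prompt structured this way should steer the model toward “insufficient evidence” on the tenure-related criterion rather than the confident “ready for promotion” call from the earlier scenario.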
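Next, external databases. In this sketch an in-memory SQLite table stands in for the HRIS (the schema and records are invented): the application queries the system of record and places the retrieved facts directly in the prompt, so the model reasons over real data instead of whatever it happens to “remember” from training.

```python
import sqlite3

# In-memory stand-in for an HRIS database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (employee TEXT, cycle TEXT, rating TEXT, tenure_years REAL)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?, ?)",
    [("Sarah", "2023-annual", "outstanding", 1.0)],
)

def build_grounded_prompt(employee: str) -> str:
    """Pull the employee's actual review history and embed it in the prompt."""
    rows = conn.execute(
        "SELECT cycle, rating, tenure_years FROM reviews WHERE employee = ?",
        (employee,),
    ).fetchall()
    history = "\n".join(
        f"- cycle {cycle}: rating {rating}, tenure {tenure} years"
        for cycle, rating, tenure in rows
    )
    return (
        f"Review history for {employee} (from the HR system of record):\n{history}\n\n"
        "Using only the records above, state whether there is enough history to rate "
        "this employee's potential. If not, say so explicitly."
    )

print(build_grounded_prompt("Sarah"))
```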
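Third, fine-tuning. The sketch below prepares a couple of training examples in the JSON Lines chat format that hosted fine-tuning services such as OpenAI’s accept at the time of writing; the examples and labels are invented. In practice you would need hundreds or thousands of expert-reviewed, bias-checked examples before launching a job.

```python
import json

SYSTEM = "You assess employee potential for a 9-box grid."

# Invented examples pairing inputs with the answers HR experts expect.
training_examples = [
    {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "One outstanding annual review, one year of tenure, strong peer feedback."},
            {"role": "assistant", "content": "Insufficient history to assess potential; recommend reassessment after a second review cycle."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Three consecutive 'exceeds' ratings, led two cross-functional projects, completed the leadership program."},
            {"role": "assistant", "content": "Evidence supports a high-potential rating across performance, leadership, and learning-agility criteria."},
        ]
    },
]

# One JSON object per line -- the format fine-tuning endpoints expect.
with open("hr_potential_finetune.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")

# Uploading the file and launching the job is vendor-specific; with OpenAI's Python
# SDK it currently looks roughly like this (model name is a placeholder):
#   client = OpenAI()
#   training_file = client.files.create(file=open("hr_potential_finetune.jsonl", "rb"), purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
```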
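Fourth, Retrieval Augmented Generation. Production RAG pipelines typically use embeddings and a vector database; to stay dependency-free, this sketch substitutes a naive word-overlap score, but the shape is the same: retrieve the most relevant policy text for the question, then ask the model to answer from that text alone. The policy excerpts are invented.

```python
# Tiny corpus standing in for company policy documents (content is invented).
DOCUMENTS = {
    "promotion_policy": (
        "Promotions require at least two full review cycles and a calibrated "
        "rating of 'exceeds' or higher in the most recent cycle."
    ),
    "nine_box_guide": (
        "Potential ratings must consider learning agility, leadership behaviors, "
        "and sustained performance over time."
    ),
    "pto_policy": "Employees accrue 1.5 days of paid time off per month of service.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive word overlap (a stand-in for embedding search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        DOCUMENTS.values(),
        key=lambda text: len(query_words & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_rag_prompt(question: str) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return (
        f"Company policy excerpts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the excerpts above; if they do not contain the answer, say so."
    )

print(build_rag_prompt("Is Sarah ready for promotion after one outstanding review cycle?"))
```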
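Finally, RLHF. Real RLHF trains a neural reward model on human preference data and then optimizes the language model against that reward with a reinforcement-learning algorithm such as PPO, which is well beyond a short example. The sketch below shows only the most intuitive piece, under heavily simplified and invented assumptions: HR reviewers compare pairs of model responses, and a Bradley-Terry-style logistic regression over toy hand-crafted features learns to score responses the way the reviewers do.

```python
# pip install numpy scikit-learn
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(response: str) -> np.ndarray:
    """Toy hand-crafted features; a real reward model learns features from text."""
    r = response.lower()
    return np.array([
        1.0 if "insufficient evidence" in r or "cannot assess" in r else 0.0,  # hedges when unsure
        1.0 if "ready for promotion" in r else 0.0,                            # makes a strong claim
        len(r.split()) / 50.0,                                                 # crude length signal
    ])

# Invented preference data: (response HR reviewers preferred, response they rejected).
preference_pairs = [
    ("Insufficient evidence after one review cycle; reassess next year.",
     "Sarah is clearly ready for promotion."),
    ("Cannot assess potential without leadership or learning-agility evidence.",
     "High potential; ready for promotion immediately."),
]

# Bradley-Terry-style setup: learn weights w such that
# P(preferred beats rejected) = sigmoid(w . (features(preferred) - features(rejected))).
X = np.array([features(a) - features(b) for a, b in preference_pairs])
X = np.vstack([X, -X])  # add the mirrored comparisons as negative examples
y = np.array([1] * len(preference_pairs) + [0] * len(preference_pairs))

reward_model = LogisticRegression().fit(X, y)

def reward(response: str) -> float:
    """Score a single response with the learned linear reward."""
    return float(reward_model.decision_function([features(response)])[0])

print(reward("Insufficient evidence after a single review cycle."))
print(reward("Sarah is ready for promotion."))
```

In a full RLHF loop, scores like these would then be used to nudge the language model toward responses HR reviewers rate highly; the key idea is simply that human judgments become a training signal.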

Integrating GPT models into HR technology holds immense potential for transforming talent management and employee development. However, the inherent risk of ‘hallucinations’—where models produce inaccurate or misleading information—must be carefully managed. By employing grounding techniques such as prompt engineering, integrating external databases, fine-tuning with domain-specific data, Retrieval Augmented Generation (RAG), and Reinforcement Learning from Human Feedback (RLHF), HR professionals can significantly enhance the accuracy and reliability of AI applications. As AI technology continues to evolve, HR professionals need to stay informed and adept at utilizing these tools, ensuring they effectively navigate the future landscape of HR and harness the full potential of AI to make informed, strategic decisions.

Frank Ginac
Co-Founder and Chief Technology Officer at TalentGuard

Frank Ginac is a leading figure in the intersection of artificial intelligence (AI) with talent management. As the Co-Founder and Chief Technology Officer of TalentGuard, he has been pivotal in advancing AI applications to address complex challenges within large enterprises. Frank also serves as the head of TalentGuard Labs, where he drives innovation in AI to enhance employee growth, organizational growth, and operational efficiency. Frank has also made significant contributions to the academic field by providing instructional support for graduate-level courses in AI and related subjects at the Georgia Institute of Technology. His work in education reflects his dedication to nurturing the next generation of tech innovators, ensuring ongoing engagement with the forefront of technological research and development. He holds a Master of Science in Computer Science from the Georgia Institute of Technology and a Bachelor of Science in Computer Science with Honors from Fitchburg State University. His academic and professional achievements underline his role as a visionary in applying AI technologies to enhance talent management systems and his commitment to educating future leaders in technology. A published author, he has contributed to AI, software development, and quality assurance through his books and numerous articles. He can be reached at frank.ginac@talentguard.com.
