Asking the right questions

Justin Yu and Ruchik Patel

In the era of generative artificial intelligence (AI), the ability to craft effective questions directed at AI systems like ChatGPT (OpenAI) can significantly influence the quality and relevance of the content generated by these systems. Similar to conversing with colleagues, posing well-thought-out questions to modern-day AI systems (e.g., chatbots) enhances the probability of receiving pertinent responses. In this article, we aim to outline best practices for prompt engineering (the process of communicating with AI systems to steer their responses toward meeting specific user needs) and will illustrate these methods with a recent case study where AESARA successfully employed generative AI in a client project. Unlike previous articles on prompt engineering, this article will also focus on the application of prompt engineering in the context of health economics and outcomes research (HEOR) and market access.

Prompt engineering has recently emerged as a new discipline that fundamentally changes how we interact with computers. In just a few seconds, for example, one can now prompt an AI system to generate or summarize large amounts of text like never before. Writing the background section for a protocol can be done at least 10-20% (or more) faster than before the existence of generative AI, and one can generate value messages from scientific articles with only several examples from an existing Academy of Managed Care Pharmacy (AMCP) or global value dossier. There are general but no one-size-fits-all rules for prompt engineering, and methods can vary substantially in effectiveness between the various large language models (LLMs) that form the basis of different AI systems. Experimentation is often the best way to learn about and harness the power of generative AI and to determine which prompt (or LLM) is best suited for a particular application (e.g., we currently recommend Claude [Anthropic] over ChatGPT for summaries of scientific articles and to obtain accurate page number references).

The basic principles of prompt engineering involve four S’s:

Single:

Focus each prompt on a single instruction or question that is well-defined.

Specific:

Each prompt must be clear, detailed, and unambiguous. Be direct and avoid prompts that would invite additional questions due to lack of clarity. Consider providing step-by-step instructions and even specify the desired output format and output length (e.g., count of words, sentences, paragraphs, bullet points), if needed. This approach will make it more likely that the LLM will generate a meaningful response.

Short:

Keep each prompt concise and to the point. The shorter, the better, as long as no critical information is missing. Consider using bullet points if appropriate.

Surround:

Provide the appropriate context around each prompt, which may include ordering of information, examples (e.g., “gold standard” versions of documents such as study protocols, study reports, and plain language summaries of manuscripts), asking the LLM to adopt a persona (e.g., healthcare research expert), choice of words (e.g., telling an LLM what to do instead of what not to do), reasons for the task/instruction, and uploaded materials. In the case of reference text provided to the LLM, ask the LLM to answer with citations or simply to take into consideration information from the text.

Including examples in a prompt can help provide additional context for an AI system. Examples are also relevant because the choice of prompt format, training examples, and the order of the examples can lead to significantly different performances from LLMs. Describing difficult or unusual cases can also provide much-needed context if there are occasional but consistent failures in an AI system’s response. Finally, similar to the process of developing a predictive model, consider testing a prompt on content in which there is a known correct answer or “gold standard” response.
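The four S's can be sketched as a simple prompt-assembly helper. This is a minimal illustration, not code from any library or from AESARA's workflow; the `build_prompt` function and its fields are hypothetical names chosen to mirror the principles above.

```python
# Illustrative sketch: assemble a prompt that follows the four S's.
# The function name and all field names are hypothetical.

def build_prompt(instruction, output_format="", context="", examples=None):
    """Combine a single, specific instruction with surrounding context."""
    parts = []
    if context:  # Surround: background, persona, or reference material
        parts.append(f"Context: {context}")
    for i, example in enumerate(examples or [], start=1):
        parts.append(f"Example {i}: {example}")  # few-shot examples
    parts.append(f"Instruction: {instruction}")  # Single: one well-defined task
    if output_format:  # Specific: desired output format and length
        parts.append(f"Output format: {output_format}")
    return "\n\n".join(parts)  # Short: nothing beyond what is needed

prompt = build_prompt(
    instruction="Summarize the attached study protocol in plain language.",
    output_format="3 bullet points, each under 25 words",
    context="You are a health economics and outcomes research expert.",
)
print(prompt)
```

The resulting string would then be sent to whichever LLM is being used; testing it against content with a known "gold standard" response, as suggested above, helps confirm the prompt behaves as intended.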

In addition to the four S’s, how a prompt is actually structured can significantly impact the results. The CO-STAR framework, for example, can be used to ensure that all key aspects that influence the relevance and effectiveness of an LLM’s response are represented in a prompt. Note that there are elements of the framework that overlap with the four S’s, but the framework differs in that it is an actual template for creating a prompt.

Context (C):

Provide background information to help the LLM better understand the instruction or question.

Objective (O):

Specify the task to be performed by the LLM.

Style (S):

Specify the writing style (e.g., complete sentences, like an expert in a certain field).

Tone (T):

Specify the attitude of the response to ensure it fits within the intended sentiment or emotional context (e.g., formal, objective).

Audience (A):

Specify who the response is intended for (e.g., patients, providers, payers).

Response (R):

Specify the response format (e.g., list, JSON, report, custom format based on an existing study protocol template).
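Because CO-STAR is a template, it lends itself to a reusable function that fills in each section. The sketch below is illustrative only; the `costar_prompt` function and the sample HEOR-flavored values are hypothetical, not from the framework's authors.

```python
# Illustrative sketch: fill in a CO-STAR prompt template.
# Function name, section labels, and sample values are hypothetical.

def costar_prompt(context, objective, style, tone, audience, response):
    """Render the six CO-STAR sections, in acronym order, as one prompt."""
    sections = [
        ("CONTEXT", context),
        ("OBJECTIVE", objective),
        ("STYLE", style),
        ("TONE", tone),
        ("AUDIENCE", audience),
        ("RESPONSE", response),
    ]
    return "\n\n".join(f"# {label}\n{text}" for label, text in sections)

prompt = costar_prompt(
    context="Phase 3 trial results for drug X are now available.",
    objective="Draft three value messages from the trial results.",
    style="Write like a market access expert.",
    tone="Formal and objective.",
    audience="US payers.",
    response="A bulleted list.",
)
print(prompt)
```

Keeping the sections in a fixed order makes prompts easier to compare across iterations, which matters given how sensitive LLM output can be to prompt structure.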

Additional prompt frameworks can be developed that reflect one’s needs for a particular task, mirroring the multiple ways that various tasks can be completed in the real world (e.g., use of SAS vs Python vs R for data science tasks). Just as there often exists an optimal programming approach for a given task, prompt engineering is as much about finding a method that will generate a desired response for a given task as it is about finding the optimal prompt for the task (which may be just one sentence with the “right” combination of words in the “right” sequence). When it comes to generative AI, always be ready to experiment with a prompt and iterate as many times as necessary to achieve the desired result. Prompt engineering is an empirical science, and with practice we can learn to make the most of generative AI in nearly any situation or for any task.

Case study

At AESARA, we are experts at gathering and processing information so that we wholly understand and can diagnose a client’s issues. A critical tool for collecting this information is the stakeholder interview, which has traditionally been conducted person to person. The inherent advantage of this interaction is that we can interject or modify questions in real time as a client provides information and insights. This process is not dissimilar to the rapid iterative prompting and experimentation suggested above. The challenge lies in where, when, and how to integrate AI systems and their capabilities into the problem-solving work we do daily.

Given that AI systems are well-equipped to handle information, at AESARA we have experimented with using them to help us process and then quickly analyze or retrieve elements from multiple sources of information. Interviews combined with various supporting project documents are a rich source of qualitative data, and the natural language processing capabilities of an AI system can be harnessed to query insights from repositories of disparate information. We recently found success with transcribing audio files (hours in length) to text (thousands of lines) using Microsoft Word, minimally cleaning the processed data with AI-prompted suggestions (also within Word), providing context using other project documents, and then prompting Word’s built-in AI system, Copilot, for specific information such as “What was the total count of interviewees who expressed they found additive value in topic A and related concepts (e.g., B, C, and D)?” As the capabilities of these systems evolve, we can anticipate deriving more value from AI in quickly and accurately describing specific insights or answering specific questions from large amounts of materials such as playbooks, SOPs, guidance documents, study reports, presentation slides, and more.
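For intuition, the counting question posed to Copilot above can be approximated in plain code. The sketch below is a transparent stand-in, not the method used in the case study: it counts how many interview transcripts mention a topic or any related concept via simple keyword matching, whereas an LLM would also catch paraphrases. All names and sample data are illustrative.

```python
# Illustrative stand-in for the LLM query in the case study:
# count interviewees whose transcript mentions any of the given concepts.
# Sample transcripts and all names are hypothetical.

def count_mentions(transcripts, concepts):
    """Count transcripts containing at least one concept (case-insensitive)."""
    concepts = [c.lower() for c in concepts]
    return sum(
        any(c in text.lower() for c in concepts)
        for text in transcripts.values()
    )

interviews = {
    "interviewee_1": "We saw additive value in topic A for our formulary.",
    "interviewee_2": "No strong opinion on the new endpoints.",
    "interviewee_3": "Concept B was clearly useful in payer discussions.",
}
print(count_mentions(interviews, ["topic A", "concept B"]))  # → 2
```

The gap between this literal matching and what an LLM can infer from context is precisely where the natural language processing capabilities described above add value.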
