Recommendations for creating effective LLM prompts

Prompt engineering can significantly improve the quality of the LLM output. However, it can also be challenging, as it requires understanding the model's capabilities and limitations, as well as the domain and task at hand. Below are some recommendations for prompt engineering when using large language models.

  • Provide clear instructions – make sure the prompt is specific about what you want the model to generate, such as the expected format, length, and tone of the completion. For example, if you want the model to generate a summary of a news article, you can specify the number of sentences, the main points to cover, and the style of writing.
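
    For instance, the news-summary case might look like the sketch below, which only builds the prompt string; the article text and anything you send it to are placeholders.

```python
# Placeholder article text; in practice this comes from your application.
article = "The city council voted on Tuesday to expand the bike-lane network..."

# Be explicit about length, format, and tone.
prompt = (
    "Summarize the following news article in exactly 3 sentences. "
    "Cover who was involved, what happened, and why it matters. "
    "Use a neutral, factual tone.\n\n"
    f"Article:\n{article}"
)
```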

  • Consider delineating key components of the prompt or desired completion format with keywords, symbols, or tags - this can help the LLM more clearly identify meaningful information and disambiguate detailed instructions. You can use a combination of strategies in a single prompt, e.g. indicating the start and end of the completion with phrases like "Summary:" and "<|end|>". You can also instruct the LLM to look for context provided in a certain format, which can help with more complex prompt templating - e.g. "Summarize the text delimited with triple quotes. Use fewer than 25 words. '''<text to summarize>''' ".
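
    One possible way to express the triple-quote pattern above in code; the function and variable names are only illustrative.

```python
def build_summary_prompt(text: str) -> str:
    # Triple quotes delimit the content so the model can separate the
    # instructions from the text it should act on.
    return (
        "Summarize the text delimited with triple quotes. "
        "Use fewer than 25 words.\n"
        f"'''{text}'''"
    )
```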

  • Offer context – provide some context about the application, the domain, and the user's intention. This context can help the model to generate more relevant and coherent outputs. For example, if you want the model to generate code, you can include some import statements to focus the model on the relevant libraries and modules. You can also provide some comments or descriptions to explain the purpose and logic of the code. For conversational agents, you can provide some background information about the user, the topic, and the goal of the conversation.
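
    For the code-generation case, one way to supply that context is to seed the prompt with imports and a comment describing the goal; the library choices and task below are purely illustrative.

```python
# Prompt for a code-completion request: the imports and comment steer the
# model toward pandas/matplotlib and describe the intended logic.
code_prompt = (
    "# Python 3\n"
    "# Load a CSV of daily sales and plot total sales per month.\n"
    "import pandas as pd\n"
    "import matplotlib.pyplot as plt\n"
    "\n"
    "def plot_monthly_sales(csv_path: str) -> None:\n"
)
```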

  • Start with zero-shot learning (ZSL) and then few-shot learning (FSL) – ZSL can be useful when you have no or limited data for your task, or when you want to test the model's creativity and diversity. FSL can be useful when you have some data for your task, or when you want to guide the model's behavior and quality. You can try different numbers of examples as needed, and select those examples dynamically based on the input or on feedback. For example, if you want the model to generate a product review, you can start with ZSL and see what the model produces. If the result isn't what you need, provide a few examples of positive or negative reviews to steer the model's sentiment and style.
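
    A sketch of moving from zero-shot to few-shot for the product-review example; the example reviews are made up for illustration.

```python
task = "Write a short, enthusiastic review for a pair of wireless headphones."

# Zero-shot: just the instruction.
zero_shot_prompt = task

# Few-shot: prepend examples that demonstrate the sentiment and style you want.
examples = [
    "Review: These running shoes are fantastic - light, comfortable, and durable.",
    "Review: The blender is a kitchen workhorse; smoothies in seconds, easy to clean.",
]
few_shot_prompt = "\n".join(examples) + "\n\n" + task
```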

  • Try rearranging your prompt – the order of the elements in your prompt can affect how strongly each one influences the completion. Few-shot examples or other information near the bottom of the prompt tend to bias the completion more than the same content placed at the top. You can experiment with different arrangements of your prompt to see how the model responds and what works best for your task. For example, if you want the model to generate a headline for a news article, you can try putting the article text at the top or the bottom of the prompt and see how well the model captures the main idea and tone of the article.
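
    For the headline example, you can compare both orderings; which works better depends on the model, so treat this as an experiment.

```python
article = "Text of the news article goes here."

instructions = "Write a one-sentence headline that captures the main idea and tone of the article."

# Variant A: instructions first, article last (the article is the most recent context).
prompt_a = f"{instructions}\n\nArticle:\n{article}\n\nHeadline:"

# Variant B: article first, instructions last (the instructions are the most recent context).
prompt_b = f"Article:\n{article}\n\n{instructions}\nHeadline:"
```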

  • Vary history length – in multi-turn user applications, such as chatbots, including enough conversation history in the prompt gives the model the context it needs to generate natural and consistent responses. However, you also need to monitor the model's behavior and clear or limit the history when you notice drift. Drift is when the model does one of the following:

    • Deviates from the topic or the goal of the conversation
    • Repeats itself
    • Contradicts itself

    You can do the following to identify and prevent drift:

    • Vary the history length based on the complexity and coherence of the conversation
    • Use topic modeling to detect when the conversation strays from its topic or goal
    • Use sentiment analysis to detect unwanted shifts in tone
    • Use repetition detection to flag near-duplicate responses
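
    A minimal sketch of two of these checks, capping the history length and flagging repeated responses; the window size and similarity threshold are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher

MAX_TURNS = 10          # how many past turns to keep in the prompt
REPEAT_THRESHOLD = 0.9  # similarity above this is treated as repetition

def trim_history(history: list[str]) -> list[str]:
    # Keep only the most recent turns to limit prompt size and drift.
    return history[-MAX_TURNS:]

def is_repetition(new_reply: str, history: list[str]) -> bool:
    # Flag the reply if it is nearly identical to a recent turn.
    return any(
        SequenceMatcher(None, new_reply, past).ratio() > REPEAT_THRESHOLD
        for past in history[-MAX_TURNS:]
    )
```
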
  • Optimize few-shot selection – the quality and relevance of the few-shot examples you provide to the model can have a significant effect on the model's performance and generalization. You can train or fine-tune semantic similarity models on your data to select more relevant few-shot samples. Semantic similarity models measure the similarity between two texts based on their meaning and content, rather than their surface form or syntax. You can use them to rank and filter your data and select the examples most similar to your input or task. For example, if you want the model to generate a recipe, you can use a semantic similarity model to select a few examples of recipes that have similar ingredients, cuisines, or steps to your input.
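
    One common way to implement this is with an embedding model and cosine similarity. The sketch below assumes the sentence-transformers package and a particular model name; any embedding model that produces comparable vectors works.

```python
from sentence_transformers import SentenceTransformer, util

def select_few_shot(query: str, candidates: list[str], k: int = 3) -> list[str]:
    # Embed the query and candidate examples, then keep the k candidates
    # whose embeddings are most similar to the query.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    query_emb = model.encode(query, convert_to_tensor=True)
    cand_embs = model.encode(candidates, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, cand_embs)[0]
    top = scores.argsort(descending=True)[:k]
    return [candidates[int(i)] for i in top]
```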

  • Instruct the model how to reason – you can improve the LLM's ability to reason with techniques such as chain-of-thought and self-ask. For instance, chain-of-thought prompting enables LLMs to decompose multi-step problems into intermediate steps, which lets them solve complex reasoning problems that standard prompting methods can't.
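
    In its simplest form, chain-of-thought prompting is just an instruction (or a worked example) asking the model to reason step by step before answering, for example:

```python
question = (
    "A store sells pens in packs of 12. "
    "If a teacher needs 150 pens, how many packs must she buy?"
)

cot_prompt = (
    f"{question}\n"
    "Think through the problem step by step, showing each intermediate calculation, "
    "then give the final answer on its own line prefixed with 'Answer:'."
)
```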

  • Help the LLM more easily handle ambiguous context or queries by instructing it to rephrase questions in a way that is easier for it to answer clearly. The Rephrase and Respond paper provides starting prompts and guidance on how this technique lets the LLM reframe questions posed by humans into a form that is easier for it to reason about, as well as suggest ways for users to reformulate queries to be more effective. This technique can be composed with other reasoning methods, such as chain-of-thought, to improve reasoning performance and provide a more conversational and user-friendly chat experience.
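
    A minimal one-step version of this technique asks the model to restate the question before answering it; the wording below is an illustrative paraphrase, not the exact prompt from the paper.

```python
user_question = "Was Abraham Lincoln born in an even month?"

rar_prompt = (
    f"{user_question}\n"
    "Rephrase and expand the question to remove any ambiguity, "
    "then answer the rephrased question."
)
```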