How to compare two blocks of code in Azure OpenAI.

Kyle Jenko 0 Reputation points

I want to build a tool that will tell me about the differences between two versions of the same program. It will be used to automatically tack and log changes between the older and newer versions of software. Is it possible to do this with embeddings? If not, how should I approach this task?

Azure OpenAI Service
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
2,446 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Pramod Valavala 20,606 Reputation points Microsoft Employee

    @Kyle Jenko Like any solution using OpenAI, there are a few things that you will need to consider

    • Tokens: This is how you are charged for each request and an area where you could consider optimizing the amount of the code that you may use. Since you are primarily interested in changes in code, you should consider using just the diffs to generate the difference summary
    • Prompt: This is the main part of your solution that defines how the OpenAI model should respond to each prompt. The Prompt Engineering doc is a good place to start to understand how a good prompt can be authored. Ensure your prompt includes statements that
      • Defines what the input looks like
        • Provide examples that highlight what type of differences you are looking for
    • Fine Tuning (optional): This is a step that you could experiment with to see if it improves the output and ensures it is more consistent. While not strictly necessary, fine tuning a model would allow you to better control the output and reduce the number of tokens required in each request.

    Finally, the embeddings model alone would not really help in this use case directly if you are looking to document but one way it could be used for a more advanced solution is to index your codebase and search for semantically similar code, adding to the context of the prompt that could be used to ensure coding conventions are followed.

    This of course would be a more advanced use case that would be dependent on the quality of code (code comments would help in finding semantically similar code), verbosity of function/variable names, etc. You could also consider using static code analysis in addition to semantic search alone, which might make for better context as well.

    0 comments No comments