Creating a Recommendation Engine

David Thielen 3,121 Reputation points
2024-05-20T17:56:04.0433333+00:00

Hi all;

Sorry, this is a giant question asking about everything around this. But if I'm on the right track here, and you all can help me with these questions, then I think I'll have it.

Use Case

I have written an app that manages events for political parties and campaigns. Its most valuable feature is presenting volunteers with prospective events ranked so that the events they will find most interesting are at the top, much as Google became compelling by putting what you were looking for first in the results 90% of the time.

There are a couple of significant constraints for this use case. The first is that events expire (once they've happened, they're no longer available). A website showing concerts faces the same issue. I can use signups for past events for the modeling, but then have to recommend similar events that are in the future.

The second is that I will have little to no "purchase history" for most volunteers. For a new volunteer it's incredibly important to recommend events they find interesting; otherwise they leave. But they have no history of signing up. What I do have is a fair number of properties they have set, which I can use as features to find nearest-neighbor (NN) volunteers and see what they signed up for.

There are two cases where this will be called. One is a search where there is a search text entry and I need to find the best matches. The other is a carousel showing upcoming events of interest - there's no search string for this case.

I need to walk before I run and this is all new to me. So I want to create a simple straightforward recommendation engine, not one that is using the latest/greatest algorithms. And to keep this to as few steps as possible. I also want to keep the costs low.

My application is written in Blazor and it runs on Azure. So I need to do all this in C# and I'd prefer to use Azure services.

Questions

Q1: I think I need to create vectors of every user and every event. Both have a fair number of boolean and numeric properties (that's straightforward). And for the text properties I need to create an embedding for each - correct?

Q2: Each event has 1 Interest and 0-N Tags. The correct way to do this is to have a feature for each Interest & Tag and the event sets it to true/false for each - correct? If I take this approach over half the features will be all these booleans - will that then weigh those values stronger than everything else? And am I making my model way too complex having these 50 boolean features?

Q3: Every event has text properties (name, description, parent organization name, etc.). I assume I convert each of these into an embedding - correct? Is there an example anywhere showing how to get embeddings from Azure using C#? I've only found Python examples.

Q4: Once I've generated these vectors, where do I save them?

Q5: To find recommendations via similar volunteers, is my approach to find the NN volunteers that have signed up for events (maybe the closest 5 - 10). And then from the events they signed up for, find the NN future events? And if so, is there an example of how to do this in C# calling Azure?

Q6: For the case of a search text string, how do I apply that to find the best match? I still want events they are going to like, but in that set, the subset that matches the search string. This text should match all of the embedded text features in the event vectors. And if so, is there an example of how to do this in C# calling Azure?

Q7: I think I need Hybrid Search because, along with the vectors, distance from the user and datetime (how soon is it) matter. Those are straightforward SQL where clauses and putting them in vectors would require generating event vectors every day and a set for every user. And if so, is there an example of how to do this in C# calling Azure?

And the giant question - is this the right approach? Am I missing anything?

thanks - dave

Tags: Azure AI Search, Azure AI services, Azure Startups

Accepted answer
  1. Grmacjon-MSFT 18,451 Reputation points
    2024-05-21T23:38:09.6+00:00

    Hello @David Thielen. We highly recommend looking into "What are Intelligent Recommendations?" (Microsoft Cloud for Retail | Microsoft Learn) for what you're trying to accomplish.

    However, here are answers to your setup questions:

    Q1: I think I need to create vectors of every user and every event. Both have a fair number of boolean and numeric properties (that's straightforward). And for the text properties I need to create an embedding for each - correct?

    -Vectors are useful for semantic similarity and are defined per field, not per entity, since you will want to keep some fields as keyword-only. Good candidates for vectors: descriptions, options where synonyms matter, etc. Good candidates for keyword search: names, product IDs (IDs in general), organization names, etc.

    Q2: Each event has 1 Interest and 0-N Tags. The correct way to do this is to have a feature for each Interest & Tag and the event sets it to true/false for each - correct? If I take this approach over half the features will be all these booleans - will that then weigh those values stronger than everything else? And am I making my model way too complex having these 50 Boolean features?

     -This is not necessarily a good or bad approach; it depends on how you'd like to manage it and which fields you consider useful. Alternatively, you can have a single field holding the list of tags the event has, and then filter on those tags (OData search.in function reference - Azure AI Search | Microsoft Learn).
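
As a rough illustration of the tag filter mentioned above, here is a minimal C# sketch that builds the OData expression as a plain string. The field name `Tags` is an assumption (it is not from the thread), and the sketch assumes the field is defined as `Collection(Edm.String)`, so `search.in` is applied to each element through `any`. The `|` delimiter is chosen so that tags may contain spaces or commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagFilter
{
    // Builds an OData filter matching events that carry at least one of the given tags.
    // "Tags" is a hypothetical index field of type Collection(Edm.String).
    public static string AnyOf(IEnumerable<string> tags, string field = "Tags")
    {
        var list = tags.Where(t => !string.IsNullOrWhiteSpace(t))
                       .Select(t => t.Trim())
                       .ToList();
        if (list.Count == 0) return string.Empty;

        // A tag containing the delimiter would be split into two values, so reject it.
        if (list.Any(t => t.Contains('|')))
            throw new ArgumentException("Tags must not contain '|'.", nameof(tags));

        // Single quotes are doubled, which is how OData escapes them inside string literals.
        var joined = string.Join("|", list).Replace("'", "''");
        return $"{field}/any(t: search.in(t, '{joined}', '|'))";
    }
}
```

For example, `TagFilter.AnyOf(new[] { "canvassing", "phone bank" })` returns `Tags/any(t: search.in(t, 'canvassing|phone bank', '|'))`. The resulting string would then be supplied as the filter of a search request.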

    Q3: Every event has text properties (name, description, parent organization name, etc.). I assume I convert each of these into an embedding - correct? Is there an example anywhere showing how to get embeddings from Azure using C#? I've only found Python examples.

    -Please refer to the Q1 answer for vector candidates. Here are C# samples: azure-search-vector-samples/demo-dotnet at main · Azure/azure-search-vector-samples (github.com)

    Q4: Once I've generated these vectors, where do I save them?

    - In the AI Search index: Vector search - Azure AI Search | Microsoft Learn

    Q5: To find recommendations via similar volunteers, is my approach to find the NN volunteers that have signed up for events (maybe the closest 5 - 10). And then from the events they signed up for, find the NN future events? And if so, is there an example of how to do this in C# calling Azure? 

    -Here are C# samples for vectors: 

    azure-search-vector-samples/demo-dotnet at main · Azure/azure-search-vector-samples (github.com)

    Here is the documentation of how to work with time offsets: OData language overview - Azure AI Search | Microsoft Learn

    OData comparison operator reference - Azure AI Search | Microsoft Learn
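
To make the two-step idea in Q5 concrete, here is a small in-memory C# sketch: find the nearest volunteers, gather the past events they signed up for, then rank future events by similarity to those past events. It is only an illustration under stated assumptions. All names (`Neighbors`, `Recommend`, the dictionary shapes) are hypothetical, cosine similarity is one possible choice of metric, and in a real deployment the nearest-neighbor lookups would be vector queries against the search index rather than loops over dictionaries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Neighbors
{
    // Cosine similarity of two equal-length vectors; 0 if either vector is all zeros.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Vector lengths differ.");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            na += (double)a[i] * a[i];
            nb += (double)b[i] * b[i];
        }
        return na == 0 || nb == 0 ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Ids of the k candidates most similar to the query vector.
    public static List<string> TopK(float[] query, IReadOnlyDictionary<string, float[]> candidates, int k) =>
        candidates.OrderByDescending(c => Cosine(query, c.Value))
                  .Take(k)
                  .Select(c => c.Key)
                  .ToList();

    // Step 1: the k volunteers nearest to the target volunteer.
    // Step 2: the past events those volunteers signed up for ("seed" events).
    // Step 3: future events ranked by their best similarity to any seed event.
    public static List<string> Recommend(
        float[] target,
        IReadOnlyDictionary<string, float[]> volunteers,
        IReadOnlyDictionary<string, string[]> signups,
        IReadOnlyDictionary<string, float[]> pastEvents,
        IReadOnlyDictionary<string, float[]> futureEvents,
        int k, int take)
    {
        var seeds = TopK(target, volunteers, k)
            .Where(v => signups.ContainsKey(v))
            .SelectMany(v => signups[v])
            .Distinct()
            .Where(e => pastEvents.ContainsKey(e))
            .Select(e => pastEvents[e])
            .ToList();
        if (seeds.Count == 0) return new List<string>();

        return futureEvents
            .OrderByDescending(f => seeds.Max(s => Cosine(s, f.Value)))
            .Take(take)
            .Select(f => f.Key)
            .ToList();
    }
}
```

The sketch assumes volunteer vectors and event vectors are each comparable within their own space; it never compares a volunteer vector directly with an event vector.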

    Q6: For the case of a search text string, how do I apply that to find the best match? I still want events they are going to like, but in that set, the subset that matches the search string. This text should match all of the embedded text features in the event vectors. And if so, is there an example of how to do this in C# calling Azure? 

    -You can check hybrid search for keyword + vector approach: Hybrid search - Azure AI Search | Microsoft Learn

    Also take a look at:

    Query types - Azure AI Search | Microsoft Learn

    Semantic ranking - Azure AI Search | Microsoft Learn
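
For intuition about how a hybrid query combines a keyword ranking with a vector ranking, the hybrid search documentation describes merging them with Reciprocal Rank Fusion (RRF). The C# sketch below is a minimal, stand-alone version of that formula, not the service's implementation. The class and method names are hypothetical, and the constant `k = 60` is an assumed default commonly quoted for RRF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RankFusion
{
    // Reciprocal Rank Fusion: every ranking adds 1 / (k + rank) to the score of each id it
    // contains, where rank starts at 1 for the top result. Ids found in several rankings
    // accumulate score and rise in the fused list.
    public static List<(string Id, double Score)> Fuse(
        IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var ranking in rankings)
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                scores.TryGetValue(ranking[i], out var current);
                scores[ranking[i]] = current + 1.0 / (k + i + 1);
            }
        }

        // Highest fused score first; ties are broken by id so the order is deterministic.
        return scores.OrderByDescending(p => p.Value)
                     .ThenBy(p => p.Key, StringComparer.Ordinal)
                     .Select(p => (Id: p.Key, Score: p.Value))
                     .ToList();
    }
}
```

With a keyword ranking of `e1, e2, e3` and a vector ranking of `e3, e1, e4`, the fused order is `e1, e3, e2, e4`: the two events that appear in both lists come out ahead of the events that appear in only one.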

     

    Q7: I think I need Hybrid Search because, along with the vectors, distance from the user and datetime (how soon is it) matter. Those are straightforward SQL where clauses and putting them in vectors would require generating event vectors every day and a set for every user. And if so, is there an example of how to do this in C# calling Azure? 

    -Please see the documentation linked above.
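
As a sketch of how the date and distance conditions from Q7 could be written as an OData filter instead of being baked into vectors, the C# helper below builds the filter string. The field names `StartTime` (assumed to be `Edm.DateTimeOffset`) and `Location` (assumed to be `Edm.GeographyPoint`) are hypothetical. In the OData syntax used by Azure AI Search, `geo.distance` returns kilometers and the point literal takes longitude before latitude.

```csharp
using System;
using System.Globalization;

public static class EventFilter
{
    // Builds a filter for events starting at or after `from` and within `maxKm` of the user.
    // "StartTime" and "Location" are hypothetical field names for this sketch.
    public static string UpcomingNear(DateTimeOffset from, double latitude, double longitude, double maxKm)
    {
        var inv = CultureInfo.InvariantCulture;

        // OData date-time literals are unquoted ISO 8601 values; normalize to UTC.
        string start = from.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", inv);

        // Longitude first, then latitude. Invariant culture keeps '.' as the decimal separator.
        string point = string.Format(inv, "geography'POINT({0} {1})'", longitude, latitude);

        return string.Format(inv,
            "StartTime ge {0} and geo.distance(Location, {1}) le {2}",
            start, point, maxKm);
    }
}
```

For example, `EventFilter.UpcomingNear(new DateTimeOffset(2024, 5, 21, 0, 0, 0, TimeSpan.Zero), 40.01, -105.27, 50)` returns `StartTime ge 2024-05-21T00:00:00Z and geo.distance(Location, geography'POINT(-105.27 40.01)') le 50`. Because the filter is computed per request, event vectors would not need to be regenerated daily or per user.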

    Hope that helps. Let us know if you have further questions.

    Best,

    Grace

