Catalog data entities
This article provides guidance on how to configure catalog data entities in the Intelligent Recommendations data contract.
Data entities review
A data entity is a set of one or more data text files, each having a list of columns (also called attributes) and rows containing the actual data values.
Intelligent Recommendations defines logical groups of data entities, each with its own purpose.
Note
Data entities are optional, unless explicitly stated otherwise, which means that their data can be empty or missing.
Go to the full list of data entities
Introduction to catalog data entities
The catalog data entity represents all items and item variants that are candidates for appearing in recommendations results. Candidates are determined by applying availabilities to items: date ranges that tell the system when to include an item in the recommendations results. Items without a valid availability are ignored when results are returned.
Intelligent Recommendations supports the following features and scenarios:
Items with multiple variations (for example, a shirt in different sizes or colors) or no variations at all. We refer to these variations as variants. Items that have no variants are called standalone items, while items with at least one variant are called item masters.
Assigning filter values to items (for example, category, color, or size). Later, when querying for recommendations, you can filter by these filter values.
Assigning images to items.
Items may be available within different logical entities within the organization. Intelligent Recommendations supports two levels of hierarchies:
Channel: Items can be assigned to a channel, allowing Intelligent Recommendations to provide recommendations scoped to only products included in a specific channel. All items are automatically associated with the default channel, which uses the string 0 (zero) as the reserved channel ID.
Example:
In this example, the dataset contains only three items: X, Y, and Z. These three items are automatically assigned to the default channel (Channel=0). You can also assign these items to your own custom channels. For example, you can assign items X and Y to Channel=C1 and items Y and Z to Channel=C2.
So, when requesting recommendations, you can pass these other query parameters:
- No Channel parameter (equals default channel): All three items can be returned in the response
- Channel=0: Same as no parameter since this channel is the default
- Channel=C1: Only items that belong to C1 channel (items X and Y) may be returned in the response
- Channel=C2: Only items that belong to C2 channel (items Y and Z) may be returned in the response
- Channel=SomethingElse: Empty response because this channel wasn’t defined and no items are assigned to it
Catalog: A catalog is another, finer level of availability granularity. It allows you to define multiple catalogs within a channel and get recommendations for specific catalogs. Similar to a channel, all items are automatically associated with the default catalog within a channel, which uses the string 0 (zero) as the reserved catalog ID.
Example:
Continuing with the Channel example, you have items X, Y, and Z. You assigned items X and Y to channel C1, and they're automatically assigned to the default catalog in the channel (using Catalog=0). You can have further granularity by assigning these items to custom catalogs within the channel. Let’s assign item X to Catalog=A and items X and Y to Catalog=B.
So, when requesting recommendations, you can pass these other query parameters:
- Channel=C1: No catalog parameter, equals default catalog. Both items X and Y can be returned in the response.
- Channel=C1&Catalog=0: Same as no catalog parameter because this catalog is the default.
- Channel=C1&Catalog=A: Only items that belong to A catalog in channel C1 (item X only) may be returned in the response.
- Channel=C1&Catalog=B: Only items that belong to B catalog in channel C1 (items X and Y) may be returned in the response.
- Channel=C1&Catalog=SomethingElse: Empty response because this catalog wasn’t defined in channel C1 and no items are assigned to it.
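The channel and catalog scoping described in the two examples can be sketched as a lookup keyed by channel and catalog. This is a minimal illustration in Python, not the service's implementation: the `candidates` function and the in-memory dictionary are hypothetical names introduced here, and only the item, channel, and catalog IDs come from the examples.

```python
DEFAULT_ID = "0"  # reserved ID for both the default channel and the default catalog

ALL_ITEMS = {"X", "Y", "Z"}

# Explicit assignments from the examples: (channel, catalog) -> item IDs.
# Items assigned to a channel also land in that channel's default catalog.
EXPLICIT = {
    ("C1", DEFAULT_ID): {"X", "Y"},
    ("C2", DEFAULT_ID): {"Y", "Z"},
    ("C1", "A"): {"X"},
    ("C1", "B"): {"X", "Y"},
}


def candidates(channel=None, catalog=None):
    """Return the item IDs that may appear in a response for this scope."""
    channel = channel or DEFAULT_ID
    catalog = catalog or DEFAULT_ID
    if channel == DEFAULT_ID and catalog == DEFAULT_ID:
        # Every item is automatically associated with the default channel.
        return set(ALL_ITEMS)
    # Undefined channels or catalogs have no items assigned to them.
    return set(EXPLICIT.get((channel, catalog), set()))


print(sorted(candidates()))                       # no parameters
print(sorted(candidates("C1")))                   # Channel=C1
print(sorted(candidates("C1", "A")))              # Channel=C1&Catalog=A
print(sorted(candidates("C1", "SomethingElse")))  # undefined catalog
```

Unknown channel or catalog IDs miss the lookup and yield an empty set, which mirrors the empty responses in the examples.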
Declare item availabilities:
- Availability start/end dates: Items outside of their availability time range will be excluded from the recommendation response.
- Fine granularity of availability: Define the start/end dates within specific channel/catalog IDs.
The catalog is composed of several data entities. All are optional (depending on which features you want to use) and can remain empty in, or be missing from, the Intelligent Recommendations root folder. If you don't want to provide the Reco_ItemsAndVariants data entity, follow the guidelines in its section later in this article.
List of Catalog data entities
The following data entities are part of the catalog:
- Items and variants
- Item categories
- Item and variant images
- Item and variant filters
- Item and variant availabilities
Go to the full list of data entities
Items and variants
Data entity name: Reco_ItemsAndVariants
Description: All items and item variants
Attributes:
Name | Data type | Mandatory | Default value | Invalid value behavior | Comments |
---|---|---|---|---|---|
ItemId | String(16) | Yes | | Drop entry | See Required data entities per recommendations scenario for item ID. |
ItemVariantId | String(16) | No | | Drop entry | See Required data entities per recommendations scenario for item variant ID. |
Title | String(256) | No | | Trim value | Length limited to 256 characters. |
Description | String(2048) | No | | Trim value | Length limited to 2048 characters. |
ReleaseDate | DateTime | No | 1970-01-01T00:00:00.000Z | Drop entry | See Required data entities per recommendations scenario for DateTime values. |
Guidelines:
Item variants inherit the attributes of their item master. For example, if an item variant has no title, it inherits the title of its item master (that is, the row with the same ItemId but with an empty ItemVariantId) if it exists.
ItemIds may have a one-to-many relationship with ItemVariantIds. A single ItemId can be mapped to more than one ItemVariantId to capture the relationship from an item master to its item variants. It's also possible to have a single entry for a specific ItemId and ItemVariantId combination without specifying other ItemId to ItemVariantId combinations.
The ReleaseDate attribute represents the date at which the item was released (published, introduced) on the market. This attribute is different from the availability of an item (when an item/product can be returned in an API call), but ReleaseDate might be used in scenarios like New and Trending, which rely on dates for item ordering.
If this data entity is empty (or missing), Intelligent Recommendations automatically uses all items and item variants found in the Reco_Interactions data entity as the set of catalog items and assigns each item and item variant the default title, description, and release date. These items are considered always available unless they were assigned explicit availabilities in the Reco_ItemAndVariantAvailabilities data entity.
Intelligent Recommendations can use the Title and Description attributes to provide textual-based recommendations. Because Intelligent Recommendations currently supports only the en-us locale for textual recommendations, providing the Title and Description in any other locale might degrade the textual recommendations quality.
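The attribute inheritance described in the first guideline can be sketched as a two-step lookup: use the variant's own value if present, otherwise fall back to the row with the same ItemId and an empty ItemVariantId. The rows and the `effective_title` helper below are hypothetical and only illustrate the rule for the Title attribute.

```python
# Hypothetical rows: (ItemId, ItemVariantId, Title). An empty ItemVariantId
# marks the item master row.
rows = [
    ("Item1", "", "Sunglasses"),
    ("Item1", "Item1Var1", "Black sunglasses"),
    ("Item1", "Item1Var2", ""),  # no title of its own
]


def effective_title(rows, item_id, variant_id):
    """Return the variant's title, or the item master's title if it has none."""
    titles = {(item, variant): title for item, variant, title in rows}
    own_title = titles.get((item_id, variant_id), "")
    return own_title or titles.get((item_id, ""), "")


print(effective_title(rows, "Item1", "Item1Var1"))  # the variant's own title
print(effective_title(rows, "Item1", "Item1Var2"))  # inherited from the master
```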
Sample data:
Headers appear for convenience only and shouldn't be part of the actual data.
ItemId | ItemVariantId | Title | Description | ReleaseDate |
---|---|---|---|---|
Item1 | | | | 2018-05-15T13:30:00.000Z |
Item1 | Item1Var1 | Black sunglasses | Black sunglasses for children | 2018-08-01T10:45:00.000Z |
Item1 | Item1Var2 | Brown sunglasses | Brown sunglasses for adults | |
Item2 | | Glasses cleaning cloth | | 2019-09-20T18:00:00.000Z |
Item3 | Item3Var1 | | | |
Return to the list of catalog data entities
Item categories
Data entity name: Reco_ItemCategories
Description: All item categories
Attributes:
Name | Data type | Mandatory | Default value | Invalid value behavior | Comments |
---|---|---|---|---|---|
ItemId | String(16) | Yes | | Drop entry | See Required data entities per recommendations scenario for item ID. |
Category | String(64) | Yes | | Trim value | Length limited to 64 characters. |
Guidelines:
Each ItemId can have multiple categories, meaning it can appear in multiple entries in the data.
If your data is constructed using category trees, you need to supply the full set of categories (flattened) for each item.
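Flattening a category tree means emitting one entry per ancestor level for each item. The sketch below assumes each item carries a root-to-leaf category path and joins the levels with an underscore, which mirrors the naming in the sample data but is only an assumption. The `flatten_categories` helper is hypothetical.

```python
def flatten_categories(item_id, category_path):
    """Return one (ItemId, Category) row for every level of the path."""
    rows = []
    levels_so_far = []
    for level in category_path:
        levels_so_far.append(level)
        # Joining with "_" is an illustrative naming choice, not a requirement.
        rows.append((item_id, "_".join(levels_so_far)))
    return rows


for row in flatten_categories("Item1", ["Category1", "subCategoryX"]):
    print(" | ".join(row))
```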
Sample data:
Headers appear for convenience only and shouldn't be part of the actual data.
ItemId | Category |
---|---|
Item1 | Category1 |
Item1 | Category1_subCategoryX |
Item1 | Category1_subCategoryY |
Item2 | Category1_subCategoryX |
Return to the list of catalog data entities
Item and variant images
Data entity name: Reco_ItemAndVariantImages
Description: All item and item variant images
Attributes:
Name | Data type | Mandatory | Default value | Invalid value behavior | Comments |
---|---|---|---|---|---|
ItemId | String(16) | Yes | | Drop entry | See Required data entities per recommendations scenario for item ID. |
ItemVariantId | String(16) | No | | Drop entry | See Required data entities per recommendations scenario for item variant ID. |
ImageFullUrl | String(2048) | Yes | | Drop entry | Must be an absolute URL. The URL should be properly encoded (using percent-encoding). Length limited to 2048 characters. |
IsPrimaryImage | Bool | Yes | | See guidelines | See Required data entities per recommendations scenario for Boolean values. |
Guidelines:
You must explicitly assign images to an ItemId and to each relevant ItemVariantId. Images assigned to an item aren't automatically assigned to its item variants, and images assigned to an item variant aren't automatically assigned to the item master of that variant.
If more than one primary image is specified for the same <ItemId, ItemVariantId> combination, only one of these images will be used for the visual recommendations inference step and the others are used only when training the entire visual model.
For any image that Intelligent Recommendations failed to access, the image URL is ignored and not used for the recommendation model.
If the IsPrimaryImage value is invalid, a value of false is used (that is, the image is treated as a nonprimary image).
If only nonprimary images were specified for an item or item variant, Intelligent Recommendations uses one of the specified images as a primary image to still provide visual recommendations for that item or item variant.
There are two types of supported URLs:
- Publicly available HTTPS URLs: These URLs don't require an Authorization header. This type doesn't include URLs of Azure blobs that are publicly/anonymously available, which aren't supported.
- Azure blob storage URLs that require authentication: These URLs aren't publicly/anonymously available. Permissions for reading the image blobs should be granted to Intelligent Recommendations, as explained in Deploy Intelligent Recommendations. Blob URLs must start with the prefix:
https://<StorageAccountName>.blob.core.windows.net/
The maximum supported size for a single image is 512 KB. Any image larger than 512 KB will be ignored by the system.
The image must be served with an image content type (the ContentType value should start with image). This requirement applies to all images, both those available via HTTPS and image blobs (via the blob ContentType property).
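Several of the image constraints above can be checked before the data is uploaded. The following is a rough pre-check sketch, not part of Intelligent Recommendations: the `image_problems` helper is hypothetical, it assumes you already know each image's content type and size, and it assumes 1 KB means 1,024 bytes.

```python
from urllib.parse import urlparse

MAX_URL_LENGTH = 2048
MAX_IMAGE_BYTES = 512 * 1024  # assumption: 1 KB = 1,024 bytes


def image_problems(url, content_type, size_bytes):
    """Return a list of reasons why an image entry might be dropped or ignored."""
    problems = []
    parsed = urlparse(url)
    if not (parsed.scheme and parsed.netloc):
        problems.append("URL is not absolute")
    if len(url) > MAX_URL_LENGTH:
        problems.append("URL is longer than 2048 characters")
    if not content_type.lower().startswith("image"):
        problems.append("content type does not start with 'image'")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("image is larger than 512 KB")
    return problems


print(image_problems("https://my.server.org/images/Item1_primary.jpg", "image/jpeg", 100_000))
print(image_problems("/images/Item1.jpg", "text/html", 600 * 1024))
```

The check only covers the URL shape, content type, and size. It doesn't verify that the service can actually reach the image or that blob permissions were granted.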
Sample data:
Headers appear for convenience only and shouldn't be part of the actual data.
ItemId | ItemVariantId | ImageFullUrl | IsPrimaryImage |
---|---|---|---|
Item1 | | https://my.server.org/images/Item1_primary.jpg | True |
Item1 | | https://my.server.org/images/Item1_secondary.jpg | False |
Item1 | Item1Var1 | https://my.server.org/images/Item1Var1.jpg | True |
Item2 | | https://my.server.org/images/Item2.jpg | True |
Return to the list of catalog data entities
Item and variant filters
Data entity name: Reco_ItemAndVariantFilters
Description: Item and item variant properties used for runtime results filtering
Attributes:
Name | Data type | Mandatory | Default value | Invalid value behavior | Comments |
---|---|---|---|---|---|
ItemId | String(16) | Yes | | Drop entry | See Required data entities per recommendations scenario for item ID. |
ItemVariantId | String(16) | No | | Drop entry | See Required data entities per recommendations scenario for item variant ID. |
FilterName | String(64) | Yes | | Trim value | |
FilterValue | String(64) | Yes | | Trim value | Length limited to 64 characters. |
FilterType | String | Yes | | Drop entry | Possible values include: Textual, Numeric. |
Guidelines:
Items and item variants have a parent-child relationship, which means that item variants inherit the filters of their item master. For example, if the "Color" filter was declared for a certain ItemId, all item variants of that ItemId get the same "Color" filter value, unless a different "Color" value was specified for the item variant.
Textual filter types support the "equals" filtering operation. For example, API requests can filter items with "Color"="Blue".
Numeric filter types support "range" filtering operations. For example, API requests can filter items with "Size" > 40.
You can assign multiple filter values to the same filter. For example, for the "Color" filter, you can provide multiple values, like "Green" and "Blue". In this example, the relevant item has two values for the "Color" filter and will be returned when you filter for either "Green" items or "Blue" items. To assign multiple values to the same filter, add an entry for each filter value you want to assign, using the same FilterName and FilterType values.
For each FilterName, an item variant can either inherit its parent filter values or override them. Merging the two isn't supported. By default, if the variant has no values assigned to a filter, it inherits the parent item filter values. If at least one filter value is assigned to a filter for an item variant, then override mode is switched on and only the variant filter values are effective (for the specific filter only). This means that to achieve a "merge" behavior, the item variant must repeat its parent filter values. For example, an item supports two colors, Blue and Green. If a variant supports another color, Red, then the variant must list all three colors assigned to the variant ID: Blue, Green, and Red. In this example, the item variant has overridden the values for the "Color" filter, but it can still inherit the values for other filters from its parent item.
Entries with unsupported filter types will be ignored.
You can provide up to 20 different FilterName values.
Providing multiple entries with the same FilterName but a different FilterType will fail the Intelligent Recommendations data ingestion process.
Items or item variants can have no filters specified. If you specify any filter in the API request, the items or item variants without the specified filter will be filtered out.
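The inherit-or-override rule for filters can be sketched as follows: start from the item master's values and, for each FilterName where the variant has at least one value, replace the master's values entirely. The rows and both helper functions are hypothetical and only model the guidelines above for Textual filters.

```python
from collections import defaultdict

# Hypothetical rows: (ItemId, ItemVariantId, FilterName, FilterValue).
rows = [
    ("Item1", "", "Color", "Blue"),
    ("Item1", "", "Color", "Green"),
    ("Item1", "", "Style", "Rectangular"),
    ("Item1", "Item1Var1", "Color", "Red"),
]


def effective_filters(rows, item_id, variant_id):
    """Resolve filter values per FilterName: variant values override, never merge."""
    master = defaultdict(set)
    variant = defaultdict(set)
    for row_item, row_variant, name, value in rows:
        if row_item != item_id:
            continue
        if row_variant == "":
            master[name].add(value)
        elif row_variant == variant_id:
            variant[name].add(value)
    resolved = {name: set(values) for name, values in master.items()}
    # Any filter the variant defines replaces the master's values for that filter.
    resolved.update({name: set(values) for name, values in variant.items()})
    return resolved


def matches_textual(filters, name, value):
    """'Equals' filtering: entries without the filter are filtered out."""
    return value in filters.get(name, set())


variant_filters = effective_filters(rows, "Item1", "Item1Var1")
print(sorted(variant_filters["Color"]))  # only the variant's own colors
print(sorted(variant_filters["Style"]))  # inherited from the item master
```

To get a merged result for Item1Var1, the data would need to repeat Blue and Green as additional Color entries for that variant.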
Sample data:
Headers appear for convenience only and shouldn't be part of the actual data.
ItemId | ItemVariantId | FilterName | FilterValue | FilterType |
---|---|---|---|---|
Item1 | Color | Red | Textual | |
Item1 | Item1Var1 | Color | Burgundy | Textual |
Item1 | Item1Var2 | Style | Rectangular | Textual |
Item2 | Size | 38 | Numeric | |
Item2 | Color | Blue | Textual | |
Item2 | Color | Green | Textual |
Return to the list of catalog data entities
Item and variant availabilities
Data entity name: Reco_ItemAndVariantAvailabilities
Description: All item and item variant availabilities
Attributes:
Name | Data type | Mandatory | Default value | Invalid value behavior | Comments |
---|---|---|---|---|---|
ItemId | String(16) | Yes | | Drop entry | See Required data entities per recommendations scenario for item ID. |
ItemVariantId | String(16) | No | | Drop entry | See Required data entities per recommendations scenario for item variant ID. |
StartDate | DateTime | No | 0001-01-01T00:00:00.000Z | See guidelines | See Required data entities per recommendations scenario for DateTime values. |
EndDate | DateTime | No | 9999-12-31T23:59:59.999Z | See guidelines | See Required data entities per recommendations scenario for DateTime values. |
Double Attribute | Double | No | | | A double attribute that can be used according to the business' needs and doesn't affect the modeling process. |
Channel | String(64) | No | 0 | Trim value | Length limited to 64 characters. |
Catalog | String(64) | No | 0 | Trim value | Length limited to 64 characters. |
Guidelines:
Reminder: availabilities tell the system what items or item variants are considered candidates for recommendations results.
The availability of an item variant is the union of availabilities of its item master with the availability of the item variant itself. Even item variants that have no entries inherit their item master availabilities.
An item that is missing from this data entity is considered always available in the default channel and catalog. More specifically, Intelligent Recommendations behaves exactly as if that item appears in the data with default values for all attributes.
ItemIds have a one-to-many relationship with ItemVariantIds. While an ItemId isn't required to have an ItemVariantId, it's possible that more than one ItemVariantId can be mapped to a single ItemId. For example, you can add an entry for a specific ItemId and ItemVariantId combination without also explicitly adding another entry for the ItemId (and an empty ItemVariantId). When determining whether item variants have valid availabilities, only the specified item variants are considered as available (at the specified time intervals per each variant).
A catalog is relevant only in the context of a channel (catalogs are a subset of a channel). For example, catalog=MySale in channel=Europe is a different catalog than catalog=MySale in channel=Asia.
If your dataset contains multiple channels and catalogs, you need to add an entry for each relevant channel and catalog combination for each relevant item and item variant.
Availability dates are relevant only for the specific channel and catalog specified. If you want to specify the same availability dates for different channels and catalogs, you need to explicitly add an entry for each channel and catalog.
If there's an invalid value for either of the attributes StartDate or EndDate, the entire entry is modified to represent an unavailable item. Both StartDate and EndDate values are overridden with DateTime values that are in the past.
The 'Double Attribute' can be left empty.
Don't use "0" as a value for "Channel". This value is reserved for the system. Using "0" will result in a processing error.
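The union rule for variant availabilities can be sketched as a date-range check over both the item master's entries and the variant's own entries. This is a simplified model under stated assumptions: it ignores channels and catalogs (everything is in the default scope), the rows and the `is_available` helper are hypothetical, and missing dates fall back to the default values from the attributes table.

```python
from datetime import datetime, timezone

UTC = timezone.utc
DEFAULT_START = datetime(1, 1, 1, tzinfo=UTC)
DEFAULT_END = datetime(9999, 12, 31, 23, 59, 59, 999000, tzinfo=UTC)

# Hypothetical rows: (ItemId, ItemVariantId, StartDate, EndDate).
rows = [
    ("Item1", "", datetime(2020, 8, 20, 10, tzinfo=UTC), None),
    ("Item1", "Item1Var1", datetime(2020, 8, 1, 12, tzinfo=UTC), None),
]


def is_available(rows, item_id, variant_id, when):
    """Check availability at `when` in the default channel and catalog."""
    if not any(row[0] == item_id for row in rows):
        # Items missing from the data entity are considered always available.
        return True
    for row_item, row_variant, start, end in rows:
        # A variant's availability is the union of its own entries and
        # the entries of its item master (empty ItemVariantId).
        if row_item != item_id or row_variant not in ("", variant_id):
            continue
        if (start or DEFAULT_START) <= when <= (end or DEFAULT_END):
            return True
    return False


august_10 = datetime(2020, 8, 10, tzinfo=UTC)
print(is_available(rows, "Item1", "Item1Var1", august_10))  # variant's own range applies
print(is_available(rows, "Item1", "", august_10))           # master starts later
```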
Sample data:
Headers appear for convenience only and shouldn't be part of the actual data.
ItemId | ItemVariantId | StartDate | EndDate | Double Attribute | Channel | Catalog |
---|---|---|---|---|---|---|
Item1 | 2020-08-20T10:00:00.000Z | |||||
Item1 | Item1Var1 | 2020-08-01T12:00:00.000Z | ||||
Item2 | 2020-04-01T10:00:00.000Z | 2020-04-15T23:59:59.999Z | 15.0 | |||
Item2 | 2020-04-01T10:00:00.000Z | 9.76 | ||||
Item3 | 2020-05-01T12:00:00.000Z | Europe | MySale |
Return to the list of catalog data entities
See also
Data contract overview
Data entities mapping table
Interactions data entities
Reco configuration data entities
Opted-out users data entities
External lists data entities
Recommendations enrichment data entities
Image to item mapping data entities
Intelligent Recommendations API
Quick start guide: Set up and run Intelligent Recommendations with sample data