Data Science Toolkit - Logistic regression model API resources

2024-02-26

The purpose of this document is to serve as a temporary API resource for the Data Science Toolkit services that allow for the creation of Logistic Regression models, Lookup tables, and Hashed Table Predictors.

Descriptor list

Scalar descriptor

Type:

"scalar_descriptor"

Features:

appnexus``_audited
cookie_age
estimated_average_price
estimated_clearing_price
predicted_iab_view_rate
predicted_video_completion_rate
self_audited
size
creative_size
spend_protection
uniform
user_age

Note

The size descriptor is represented as a string in your models ("300x250", for instance), though converted to a scalar in our bidder. Any size is technically valid in our system, hence this feature being treated as a scalar rather than a categorical feature.

Example:

{
    "type": "scalar_descriptor",
    "feature_keyword": "cookie_age",
    "default_value": 0,               //Value returned by the descriptor if no match is found
    "initial_range_log": 4,           //Used for log bucketing, initial range
    "bucket_count_log_per_range": 2   //used for log bucketing, # of buckets per range
}

Segment descriptor

Type:

"segment_descriptor"

Features:

segment_value
segment_age
segment_presence

Example:

{
    "type": "segment_descriptor",
    "feature_keyword": "segment_age",
    "segment_id": 2,                  //ID of referenced segment
    "default_value": 0,               //Value returned by the descriptor if no match is found
    "initial_range_log": 4,           //Used for log bucketing, initial range
    "bucket_count_log_per_range": 2   //used for log bucketing, # of buckets per range
}

Frequency/Recency descriptor

Type:

"frequency_recency_descriptor"

Features:

frequency_life
frequency_daily
recency

Available object types for this descriptor:

advertiser
line_item
campaign

Example:

{
    "type": "frequency_recency_descriptor",
    "feature_keyword": 'frequency_life',
    "object_type": 'advertiser',
    "object_id": 1,                   //ID of the referenced advertiser, 
    "default_value": 0,               //Value returned by the descriptor if no match is found
    "initial_range_log": 4,           //Used for log bucketing, initial range
    "bucket_count_log_per_range": 2   //used for log bucketing, # of buckets per range
}

Categorical descriptor

Type:

"categorical_descriptor"

Features:

country
region
city
dma
postal_code
user_day
user_hour
os_family
os_extended
browser
language
user_gender
domain
ip_address
position
placement
placement_group
publisher
seller_member_id
supply_type
device_type
device_model
carrier
mobile_app
mobile_app_instance
mobile_app_bundle
appnexus``_intended_audience
seller_intended_audience
spend_protection
user_group_id
advertiser_id
brand_category
creative
inventory_url_id
media_type

Example:

{
    "type": "categorical_descriptor",
    "feature_keyword": "city"
}

Hash table descriptor

Type:

"hashed"

Example:

{
    "type": "hashed",
    "keys": [ array of one to 5 descriptors in this list:
        scalar_descriptor,
        custom_model_descriptor,
        freq_rec_descriptor,
        segment_descriptor,
        categorical_descriptor
    ],
    "hash_seeds": [42, 42, 42, 42, 42, 42],  //Seeds used when passed to Murmurhash3_x64_128 function, only first one is used for now, array is for planned future hash functions that need more than one seed
    "hash_id": <existing hash table ID>,
    "default_value": 0,                      //Value returned by the descriptor if no match is found in your hash table
    "hash_table_size_log": 20                //log of maximum value for a key of your table. Values larger than 2^hash_table_size_log will be rejected. Max for hash_table_size_log is 64 (no bucketing)
}

Hash tables

This endpoint is to submit a pre-hashed table. bucket_index0 and bucket_index1, each 64 bits long, are there to support hashing algorithms that produce long values as keys. Currently, we only support one hashing algorithm: MurmurHash3_x64_128, which will create two 64 bit integers but we only use the lower 64 bits of the hash.

Values in bucket_index0 must always be smaller than (2 ^ hash_table_size_log) or they will get rejected.

Currently, the values in bucket_index1 are ignored as this is to be used for future expansion. If a value is sent for bucket_index1, it must be 0. The parameter is optional.

Hash table keys

For each of your Hash Table keys, you will need a uint32 value. These values should be the ID of respective object that you are referencing from our system - domain_id, for instance, rather than the domain string value. These uint32 keys are then transformed into a byte array (little-endian), and hashed.

Python example

hash_bucket = (mmh3.hash64(bytes, seed)[0]) % table_size

Logit function

Model creation and update are similar, same request format is to be used for both.

Method	Endpoint	Purpose
`GET`	`/custom-model-logit`	Retrieve a Logit function associated with the parameters provided.
`PUT`	`/custom-model-logit`	Update an existing Logit function with data in the JSON payload.
`POST`	`/custom-model-logit`	Create a new Logit function from data in the JSON payload.
`DELETE`	`/custom-model-logit`	Delete an existing Logit function matching the parameters provided.

Parameters

Name	Data Type	Parameter Type	Required On	Example
`id`	int	Query	`GET`, `PUT`, `DELETE`	`?id=1`
`member_id`	int	Query	`PUT`, `POST`	`?member_id=1`

Example POST

{"custom-model-logit": {
  "member_id": 1,
  "beta0": 1.2,
  "max": 5,
  "min": 0,  //optional, will be set to 0 if not passed
  "name": "Sample LRE model",
  "offset": 0.3, //optional, will be set to 0 if not passed
  "scale": 1.5, //optional, will be set to 1 if not passed
  "predictors": [
    {
      "coefficient": 0.2,
      "feature_descriptor": {
        "bucket_count_log_per_range": 31,
        "default_value": 0,
        "feature_keyword": "size",
        "initial_range_log": 31,
        "type": "scalar_descriptor"
      },
      "type": "scalar"
    },
    {
      "coefficient": 0.3,
      "feature_descriptor": {
        "bucket_count_log_per_range": 31,
        "custom_model_id": 2,
        "default_value": 0,
        "feature_keyword": "custom_model",
        "initial_range_log": 31,
        "type": "custom_model_descriptor"
      },
      "type": "scalar"
    },
    {
      "coefficient": 0.4,
      "feature_descriptor": {
        "bucket_count_log_per_range": 31,
        "default_value": 0,
        "feature_keyword": "frequency_life",
        "initial_range_log": 31,
        "object_id": 1,
        "object_type": "advertiser",
        "type": "frequency_recency_descriptor"
      },
      "type": "scalar"
    },
    {
      "coefficient": 0.5,
      "feature_descriptor": {
        "bucket_count_log_per_range": 31,
        "default_value": 0,
        "feature_keyword": "segment_age",
        "initial_range_log": 31,
        "segment_id": 2,
        "type": "segment_descriptor"
      },
      "type": "scalar"
    },
    {
        "type": "hashed",
        "keys": [
            {
                "type": "categorical_descriptor",
                "feature_keyword": "advertiser_id"
            },
            {
                "type": "scalar_descriptor",
                "feature_keyword": "user_age",
                "default_value": 0
            }
        ],
        "hash_seeds": [42, 42, 42, 42, 42, 42],
        "default_value": 0,
        "hash_table_size_log": 20,
        "coefficients": [
            {"bucket_index0": 0, "bucket_index1": 0, "weight": 1.3},
            {"bucket_index0": 1, "bucket_index1": 0, "weight": 0.7},
            {"bucket_index0": 2, "bucket_index1": 0, "weight": 1.5},
            {"bucket_index0": 3, "bucket_index1": 0, "weight": 0.9}
        ]
    },
    {
        "type": "lookup",
        "default_value": 0.1,
        "features": [
            {
                "type": "categorical_descriptor",
                "feature_keyword": "advertiser_id"
            },
            {
                "type": "scalar_descriptor",
                "feature_keyword": "user_age",
                "default_value": 0
            }
        ],
        "coefficients": [
            {'weight': 1.1, 'key': [1, 1]},
            {'weight': 1.3, 'key': [2, 2]},
            {'weight': 1.2, 'key': [3, 3]},
        ]
    }
  ]
}}

Logistic Regression Custom Model Service

Share via

Data Science Toolkit - Logistic regression model API resources

Descriptor list

Scalar descriptor

Segment descriptor

Frequency/Recency descriptor

Categorical descriptor

Hash table descriptor

Hash tables

Logit function

Related topic

Additional resources