Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
The purpose of this document is to serve as a temporary API resource for the Data Science Toolkit services that allow for the creation of Logistic Regression models, Lookup tables, and Hashed Table Predictors.
Descriptor list
Scalar descriptor
Type:
"scalar_descriptor"
Features:
appnexus``_audited
cookie_age
estimated_average_price
estimated_clearing_price
predicted_iab_view_rate
predicted_video_completion_rate
self_audited
size
creative_size
spend_protection
uniform
user_age
Note
The size
descriptor is represented as a string in your models ("300x250", for instance), though converted to a scalar in our bidder. Any size is technically valid in our system, hence this feature being treated as a scalar rather than a categorical feature.
Example:
{
"type": "scalar_descriptor",
"feature_keyword": "cookie_age",
"default_value": 0, //Value returned by the descriptor if no match is found
"initial_range_log": 4, //Used for log bucketing, initial range
"bucket_count_log_per_range": 2 //used for log bucketing, # of buckets per range
}
Segment descriptor
Type:
"segment_descriptor"
Features:
segment_value
segment_age
segment_presence
Example:
{
"type": "segment_descriptor",
"feature_keyword": "segment_age",
"segment_id": 2, //ID of referenced segment
"default_value": 0, //Value returned by the descriptor if no match is found
"initial_range_log": 4, //Used for log bucketing, initial range
"bucket_count_log_per_range": 2 //used for log bucketing, # of buckets per range
}
Frequency/Recency descriptor
Type:
"frequency_recency_descriptor"
Features:
frequency_life
frequency_daily
recency
Available object types for this descriptor:
advertiser
line_item
campaign
Example:
{
"type": "frequency_recency_descriptor",
"feature_keyword": 'frequency_life',
"object_type": 'advertiser',
"object_id": 1, //ID of the referenced advertiser,
"default_value": 0, //Value returned by the descriptor if no match is found
"initial_range_log": 4, //Used for log bucketing, initial range
"bucket_count_log_per_range": 2 //used for log bucketing, # of buckets per range
}
Categorical descriptor
Type:
"categorical_descriptor"
Features:
country
region
city
dma
postal_code
user_day
user_hour
os_family
os_extended
browser
language
user_gender
domain
ip_address
position
placement
placement_group
publisher
seller_member_id
supply_type
device_type
device_model
carrier
mobile_app
mobile_app_instance
mobile_app_bundle
appnexus``_intended_audience
seller_intended_audience
spend_protection
user_group_id
advertiser_id
brand_category
creative
inventory_url_id
media_type
Example:
{
"type": "categorical_descriptor",
"feature_keyword": "city"
}
Hash table descriptor
Type:
"hashed"
Example:
{
"type": "hashed",
"keys": [ array of one to 5 descriptors in this list:
scalar_descriptor,
custom_model_descriptor,
freq_rec_descriptor,
segment_descriptor,
categorical_descriptor
],
"hash_seeds": [42, 42, 42, 42, 42, 42], //Seeds used when passed to Murmurhash3_x64_128 function, only first one is used for now, array is for planned future hash functions that need more than one seed
"hash_id": <existing hash table ID>,
"default_value": 0, //Value returned by the descriptor if no match is found in your hash table
"hash_table_size_log": 20 //log of maximum value for a key of your table. Values larger than 2^hash_table_size_log will be rejected. Max for hash_table_size_log is 64 (no bucketing)
}
Hash tables
This endpoint is to submit a pre-hashed table. bucket_index0
and bucket_index1
, each 64 bits long, are there to support hashing algorithms that produce long values as keys. Currently, we only support one hashing algorithm: MurmurHash3_x64_128
, which will create two 64 bit integers but we only use the lower 64 bits of the hash.
Values in bucket_index0
must always be smaller than (2 ^ hash_table_size_log)
or they will get rejected.
Currently, the values in bucket_index1
are ignored as this is to be used for future expansion. If a value is sent for bucket_index1
, it must be 0
. The parameter is optional.
Hash table keys
For each of your Hash Table keys, you will need a uint32 value. These values should be the ID of respective object that you are referencing from our system - domain_id
, for instance, rather than the domain string value. These uint32 keys are then transformed into a byte array (little-endian), and hashed.
Python example
hash_bucket = (mmh3.hash64(bytes, seed)[0]) % table_size
Logit function
Model creation and update are similar, same request format is to be used for both.
Method | Endpoint | Purpose |
---|---|---|
GET |
/custom-model-logit |
Retrieve a Logit function associated with the parameters provided. |
PUT |
/custom-model-logit |
Update an existing Logit function with data in the JSON payload. |
POST |
/custom-model-logit |
Create a new Logit function from data in the JSON payload. |
DELETE |
/custom-model-logit |
Delete an existing Logit function matching the parameters provided. |
Parameters
Name | Data Type | Parameter Type | Required On | Example |
---|---|---|---|---|
id |
int | Query | GET , PUT , DELETE |
?id=1 |
member_id |
int | Query | PUT , POST |
?member_id=1 |
Example POST
{"custom-model-logit": {
"member_id": 1,
"beta0": 1.2,
"max": 5,
"min": 0, //optional, will be set to 0 if not passed
"name": "Sample LRE model",
"offset": 0.3, //optional, will be set to 0 if not passed
"scale": 1.5, //optional, will be set to 1 if not passed
"predictors": [
{
"coefficient": 0.2,
"feature_descriptor": {
"bucket_count_log_per_range": 31,
"default_value": 0,
"feature_keyword": "size",
"initial_range_log": 31,
"type": "scalar_descriptor"
},
"type": "scalar"
},
{
"coefficient": 0.3,
"feature_descriptor": {
"bucket_count_log_per_range": 31,
"custom_model_id": 2,
"default_value": 0,
"feature_keyword": "custom_model",
"initial_range_log": 31,
"type": "custom_model_descriptor"
},
"type": "scalar"
},
{
"coefficient": 0.4,
"feature_descriptor": {
"bucket_count_log_per_range": 31,
"default_value": 0,
"feature_keyword": "frequency_life",
"initial_range_log": 31,
"object_id": 1,
"object_type": "advertiser",
"type": "frequency_recency_descriptor"
},
"type": "scalar"
},
{
"coefficient": 0.5,
"feature_descriptor": {
"bucket_count_log_per_range": 31,
"default_value": 0,
"feature_keyword": "segment_age",
"initial_range_log": 31,
"segment_id": 2,
"type": "segment_descriptor"
},
"type": "scalar"
},
{
"type": "hashed",
"keys": [
{
"type": "categorical_descriptor",
"feature_keyword": "advertiser_id"
},
{
"type": "scalar_descriptor",
"feature_keyword": "user_age",
"default_value": 0
}
],
"hash_seeds": [42, 42, 42, 42, 42, 42],
"default_value": 0,
"hash_table_size_log": 20,
"coefficients": [
{"bucket_index0": 0, "bucket_index1": 0, "weight": 1.3},
{"bucket_index0": 1, "bucket_index1": 0, "weight": 0.7},
{"bucket_index0": 2, "bucket_index1": 0, "weight": 1.5},
{"bucket_index0": 3, "bucket_index1": 0, "weight": 0.9}
]
},
{
"type": "lookup",
"default_value": 0.1,
"features": [
{
"type": "categorical_descriptor",
"feature_keyword": "advertiser_id"
},
{
"type": "scalar_descriptor",
"feature_keyword": "user_age",
"default_value": 0
}
],
"coefficients": [
{'weight': 1.1, 'key': [1, 1]},
{'weight': 1.3, 'key': [2, 2]},
{'weight': 1.2, 'key': [3, 3]},
]
}
]
}}