AksWebservice Class
Represents a machine learning model deployed as a web service endpoint on Azure Kubernetes Service.
A deployed service is created from a model, script, and associated files. The resulting web service is a load-balanced HTTP endpoint with a REST API. You can send data to this API and receive the prediction returned by the model.
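The following minimal sketch shows what sending data to this REST API involves, using only the Python standard library to build a JSON POST request for a scoring endpoint. It assumes the endpoint accepts a JSON body and that either a service key (key auth) or an access token (token auth) is sent as `Authorization: Bearer <credential>`. The URI, the credential, the `{"data": ...}` payload shape, and the build_scoring_request helper are hypothetical placeholders. The request is only built here, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, payload, credential=None):
    """Build (but do not send) a JSON POST request to a scoring endpoint.

    credential: a service key or access token, assumed to travel as a
    bearer token. Pass None when auth is disabled on the service.
    """
    headers = {"Content-Type": "application/json"}
    if credential:
        headers["Authorization"] = "Bearer " + credential
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Hypothetical placeholders: use the service's scoring_uri and its key or token.
request = build_scoring_request(
    "http://example.invalid/api/v1/service/my-service/score",
    {"data": [[1.0, 2.0, 3.0]]},
    credential="<service-key-or-token>",
)
# Sending it, assuming the service replies with JSON, would look like:
#   with urllib.request.urlopen(request) as response:
#       prediction = json.loads(response.read())
```

In practice, the scoring_uri variable of a deployed AksWebservice supplies the URI, and the run method performs the call for you.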
AksWebservice deploys a single service to one endpoint. To deploy multiple services to one endpoint, use the AksEndpoint class.
For more information, see Deploy a model to an Azure Kubernetes Service cluster.
Initialize the Webservice instance.
The Webservice constructor retrieves a cloud representation of a Webservice object associated with the provided workspace. It will return an instance of a child class corresponding to the specific type of the retrieved Webservice object.
Inheritance: azureml.core.webservice.webservice.Webservice → AksWebservice
Constructor
AksWebservice(workspace, name)
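Parameters
- workspace
- Workspace
The workspace object containing the Webservice object to retrieve.
- name
- str
The name of the Webservice object to retrieve.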
Remarks
The recommended deployment pattern is to create a deployment configuration object with the deploy_configuration method and then use it with the deploy method of the Model class, as shown below.
```python
from azureml.core.webservice import AksWebservice

# Set the web service configuration (using default here)
aks_config = AksWebservice.deploy_configuration()

# Enable token auth and disable (key) auth on the webservice
# aks_config = AksWebservice.deploy_configuration(token_auth_enabled=True, auth_enabled=False)
```
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb
There are several ways to deploy a model as a webservice, including:
- the deploy method of the Model class, for models already registered in the workspace.
- the deploy_from_image method of Webservice.
- the deploy_from_model method of Webservice, for models already registered in the workspace. This method creates an image.
- the deploy method of Webservice, which registers a model and creates an image.
For information on working with webservices, see
The Variables section lists attributes of a local representation of the cloud AksWebservice object. These variables should be considered read-only. Changing their values will not be reflected in the corresponding cloud object.
Variables
- enable_app_insights
- bool
Whether or not AppInsights logging is enabled for the Webservice.
- autoscaler
- AutoScaler
The Autoscaler object for the Webservice.
- compute_name
- str
The name of the ComputeTarget that the Webservice is deployed to.
- container_resource_requirements
- ContainerResourceRequirements
The container resource requirements for the Webservice.
- liveness_probe_requirements
- LivenessProbeRequirements
The liveness probe requirements for the Webservice.
- data_collection
- DataCollection
The DataCollection object for the Webservice.
- max_concurrent_requests_per_container
- int
The maximum number of concurrent requests per container for the Webservice.
- max_request_wait_time
- int
The maximum request wait time for the Webservice, in milliseconds.
- num_replicas
- int
The number of replicas for the Webservice. Each replica corresponds to an AKS pod.
- scoring_timeout_ms
- int
The scoring timeout for the Webservice, in milliseconds.
- azureml.core.webservice.AksWebservice.scoring_uri
- str
The scoring endpoint for the Webservice.
- is_default
- bool
Whether the Webservice is the default version for the parent AksEndpoint.
- traffic_percentile
- int
The percentage of traffic to route to the Webservice in the parent AksEndpoint.
- version_type
- VersionType
The version type for the Webservice in the parent AksEndpoint.
- token_auth_enabled
- bool
Whether or not token auth is enabled for the Webservice.
- environment
- Environment
The Environment object that was used to create the Webservice.
- models
- list[Model]
A list of Models deployed to the Webservice.
- deployment_status
- str
The deployment status of the Webservice.
- namespace
- str
The AKS namespace of the Webservice.
- azureml.core.webservice.AksWebservice.swagger_uri
- str
The swagger endpoint for the Webservice.
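To relate num_replicas and max_concurrent_requests_per_container to an expected workload, a rough sizing estimate is: concurrent requests equal the target request rate times the time per request (Little's law), padded by the autoscaler's target utilization, then divided by the requests one replica can serve at once. The sketch below follows that approach, as described in the Azure Machine Learning guidance on AKS autoscaling. The estimate_replicas helper and its inputs are hypothetical; it is a planning aid, and the autoscaler itself reacts to observed utilization at run time.

```python
from math import ceil


def estimate_replicas(target_rps, request_seconds, target_utilization=70,
                      max_requests_per_container=1):
    """Estimate replicas needed to serve target_rps at the given utilization.

    target_rps: requests per second the service should sustain.
    request_seconds: average time to process one request, in seconds.
    target_utilization: percent out of 100, as in autoscale_target_utilization.
    max_requests_per_container: concurrent requests one replica serves
        (replica_max_concurrent_requests, default 1).
    """
    if not 0 < target_utilization <= 100:
        raise ValueError("target_utilization must be in (0, 100]")
    # Requests in flight at the target rate (Little's law), padded so that
    # replicas run at the target utilization rather than at 100 percent.
    concurrent_requests = target_rps * request_seconds * 100 / target_utilization
    return ceil(concurrent_requests / max_requests_per_container)


print(estimate_replicas(20, 10))  # 286
```

With 20 requests per second at 10 seconds each, the estimate far exceeds the deploy_configuration default of autoscale_max_replicas=10, which signals that either the limit or the per-request latency needs attention.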
Methods
add_properties | Add key value pairs to this Webservice's properties dictionary.
add_tags | Add key value pairs to this Webservice's tags dictionary. Raises a WebserviceException.
deploy_configuration | Create a configuration object for deploying to an AKS compute target.
get_access_token | Retrieve auth token for this Webservice.
get_token | DEPRECATED. Use the get_access_token method instead. Retrieve auth token for this Webservice.
remove_tags | Remove the specified keys from this Webservice's dictionary of tags.
run | Call this Webservice with the provided input.
serialize | Convert this Webservice into a JSON serialized dictionary.
update | Update the Webservice with provided properties. Values left as None will remain unchanged in this Webservice.
add_properties
Add key value pairs to this Webservice's properties dictionary.
add_properties(properties)
Parameters
- properties
- dict[str, str]
The dictionary of properties to add.
add_tags
Add key value pairs to this Webservice's tags dictionary.
Raises a WebserviceException.
add_tags(tags)
Parameters
- tags
- dict[str, str]
The dictionary of tags to add.
Exceptions
WebserviceException
deploy_configuration
Create a configuration object for deploying to an AKS compute target.
static deploy_configuration(autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, auth_enabled=None, cpu_cores=None, memory_gb=None, enable_app_insights=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, primary_key=None, secondary_key=None, tags=None, properties=None, description=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, namespace=None, token_auth_enabled=None, compute_target_name=None, cpu_cores_limit=None, memory_gb_limit=None, blobfuse_enabled=None)
Parameters
- autoscale_enabled
- bool
Whether or not to enable autoscaling for this Webservice. Defaults to True if num_replicas is None.
- autoscale_min_replicas
- int
The minimum number of containers to use when autoscaling this Webservice. Defaults to 1.
- autoscale_max_replicas
- int
The maximum number of containers to use when autoscaling this Webservice. Defaults to 10.
- autoscale_refresh_seconds
- int
How often the autoscaler should attempt to scale this Webservice. Defaults to 1.
- autoscale_target_utilization
- int
The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this Webservice. Defaults to 70.
- collect_model_data
- bool
Whether or not to enable model data collection for this Webservice. Defaults to False.
- auth_enabled
- bool
Whether or not to enable key auth for this Webservice. Defaults to True.
- cpu_cores
- float
The number of CPU cores to allocate for this Webservice. Can be a decimal. Defaults to 0.1. Corresponds to the pod core request, not the limit, in Azure Kubernetes Service.
- memory_gb
- float
The amount of memory (in GB) to allocate for this Webservice. Can be a decimal. Defaults to 0.5. Corresponds to the pod memory request, not the limit, in Azure Kubernetes Service.
- enable_app_insights
- bool
Whether or not to enable Application Insights logging for this Webservice. Defaults to False.
- scoring_timeout_ms
- int
A timeout (in milliseconds) to enforce for scoring calls to this Webservice. Defaults to 60000.
- replica_max_concurrent_requests
- int
The maximum number of concurrent requests per replica to allow for this Webservice. Defaults to 1. Do not change this setting from the default value of 1 unless instructed by Microsoft Technical Support or a member of the Azure Machine Learning team.
- max_request_wait_time
- int
The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. Defaults to 500.
- num_replicas
- int
The number of containers to allocate for this Webservice. There is no default; if this parameter is not set, the autoscaler is enabled by default.
- properties
- dict[str, str]
Dictionary of key value properties to give this Webservice. These properties cannot be changed after deployment; however, new key value pairs can be added.
- gpu_cores
- int
The number of GPU cores to allocate for this Webservice. Defaults to 0.
- period_seconds
- int
How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.
- initial_delay_seconds
- int
The number of seconds after the container has started before liveness probes are initiated. Defaults to 310.
- timeout_seconds
- int
The number of seconds after which the liveness probe times out. Defaults to 2 seconds. Minimum value is 1.
- success_threshold
- int
The minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.
- failure_threshold
- int
When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.
- namespace
- str
The Kubernetes namespace in which to deploy this Webservice: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters cannot be hyphens.
- token_auth_enabled
- bool
Whether or not to enable Token auth for this Webservice. If this is enabled, users can access this Webservice by fetching an access token using their Azure Active Directory credentials. Defaults to False.
- cpu_cores_limit
- float
The maximum number of CPU cores this Webservice is allowed to use. Can be a decimal.
- memory_gb_limit
- float
The maximum amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.
- blobfuse_enabled
- bool
Whether or not to enable blobfuse for model downloading for this Webservice. Defaults to True.
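The liveness probe parameters combine into a rough figure for how long a container that never becomes healthy runs before Kubernetes restarts it. The sketch below assumes standard Kubernetes probe behaviour: the first probe fires initial_delay_seconds after the container starts, later probes follow every period_seconds, each probe can take up to timeout_seconds to fail, and a restart follows failure_threshold consecutive failures. It ignores kubelet scheduling jitter, and the seconds_until_restart helper is hypothetical.

```python
def seconds_until_restart(initial_delay_seconds=310, period_seconds=10,
                          timeout_seconds=2, failure_threshold=3):
    """Approximate time from container start to a liveness-triggered restart.

    Defaults mirror the deploy_configuration defaults listed above.
    Assumes every probe fails by timing out, the slowest way to fail.
    """
    for name, value in (("period_seconds", period_seconds),
                        ("timeout_seconds", timeout_seconds),
                        ("failure_threshold", failure_threshold)):
        if value < 1:
            raise ValueError(f"{name} must be at least 1")
    # Probes start at initial_delay, initial_delay + period, ...; the restart
    # follows the failure of probe number failure_threshold.
    last_probe_start = initial_delay_seconds + (failure_threshold - 1) * period_seconds
    return last_probe_start + timeout_seconds


print(seconds_until_restart())  # 310 + 2 * 10 + 2 = 332
```

With the defaults, an unhealthy container is restarted roughly 332 seconds after it starts, so a model that takes longer than initial_delay_seconds to load may need a larger value to avoid a restart loop.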
Returns
A configuration object to use when deploying an AksWebservice.
Return type
AksServiceDeploymentConfiguration
Exceptions
WebserviceException
get_access_token
Retrieve auth token for this Webservice.
get_access_token()
Returns
An object describing the auth token for this Webservice.
Return type
Exceptions
get_token
DEPRECATED. Use the get_access_token method instead.
Retrieve auth token for this Webservice.
get_token()
Returns
The auth token for this Webservice and when to refresh it.
Return type
Exceptions
remove_tags
Remove the specified keys from this Webservice's dictionary of tags.
remove_tags(tags)
Parameters
- tags
- list[str]
The list of keys to remove.
run
Call this Webservice with the provided input.
run(input_data)
Parameters
- input_data
- varies
The input to call the Webservice with.
Returns
The result of calling the Webservice.
Return type
Exceptions
serialize
Convert this Webservice into a JSON serialized dictionary.
serialize()
Returns
The JSON representation of this Webservice.
Return type
dict
update
Update the Webservice with provided properties.
Values left as None will remain unchanged in this Webservice.
update(image=None, autoscale_enabled=None, autoscale_min_replicas=None, autoscale_max_replicas=None, autoscale_refresh_seconds=None, autoscale_target_utilization=None, collect_model_data=None, auth_enabled=None, cpu_cores=None, memory_gb=None, enable_app_insights=None, scoring_timeout_ms=None, replica_max_concurrent_requests=None, max_request_wait_time=None, num_replicas=None, tags=None, properties=None, description=None, models=None, inference_config=None, gpu_cores=None, period_seconds=None, initial_delay_seconds=None, timeout_seconds=None, success_threshold=None, failure_threshold=None, namespace=None, token_auth_enabled=None, cpu_cores_limit=None, memory_gb_limit=None, **kwargs)
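As a rough mental model of the documented update rules (arguments left as None keep their current values, tags passed to update replace the existing tags, and properties are added to the existing ones), here is a sketch over a plain dict. The apply_update helper and the dict shape are hypothetical and do not reflect how the SDK implements update. Keeping an existing property key unchanged is an assumption drawn from the note that properties cannot be changed after deployment.

```python
def apply_update(current, tags=None, properties=None, **settings):
    """Return a copy of `current` with update-style semantics applied.

    current: hypothetical dict model of a service, with "tags" and
        "properties" sub-dicts plus scalar settings such as "cpu_cores".
    settings: scalar settings to change; a value of None means "leave as is".
    """
    updated = dict(current)  # never mutate the caller's dict
    for key, value in settings.items():
        if value is not None:
            updated[key] = value
    if tags is not None:
        # Tags passed to update replace the existing tags wholesale.
        updated["tags"] = dict(tags)
    if properties is not None:
        # Properties are only added; existing keys are kept (assumption,
        # since properties cannot be changed after deployment).
        merged = dict(current.get("properties", {}))
        for key, value in properties.items():
            merged.setdefault(key, value)
        updated["properties"] = merged
    return updated


service = {"cpu_cores": 0.1, "num_replicas": 2, "tags": {"team": "a"}, "properties": {"owner": "x"}}
# num_replicas becomes 3; cpu_cores stays 0.1 because None means "unchanged".
print(apply_update(service, num_replicas=3, cpu_cores=None))
```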
Parameters
- autoscale_min_replicas
- int
The minimum number of containers to use when autoscaling this Webservice.
- autoscale_max_replicas
- int
The maximum number of containers to use when autoscaling this Webservice.
- autoscale_refresh_seconds
- int
How often the autoscaler should attempt to scale this Webservice.
- autoscale_target_utilization
- int
The target utilization (in percent out of 100) the autoscaler should attempt to maintain for this Webservice.
- collect_model_data
- bool
Enable or disable model data collection for this Webservice.
- cpu_cores
- float
The number of CPU cores to allocate for this Webservice. Can be a decimal.
- memory_gb
- float
The amount of memory (in GB) to allocate for this Webservice. Can be a decimal.
- enable_app_insights
- bool
Whether or not to enable Application Insights logging for this Webservice.
- scoring_timeout_ms
- int
A timeout (in milliseconds) to enforce for scoring calls to this Webservice.
- replica_max_concurrent_requests
- int
The maximum number of concurrent requests per replica to allow for this Webservice.
- max_request_wait_time
- int
The maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error.
- tags
- dict[str, str]
Dictionary of key value tags to give this Webservice. These replace the existing tags.
- properties
- dict[str, str]
Dictionary of key value properties to add to the existing properties dictionary.
- inference_config
- InferenceConfig
An InferenceConfig object used to provide the required model deployment properties.
- period_seconds
- int
How often (in seconds) to perform the liveness probe. Defaults to 10 seconds. Minimum value is 1.
- initial_delay_seconds
- int
Number of seconds after the container has started before liveness probes are initiated.
- timeout_seconds
- int
Number of seconds after which the liveness probe times out. Defaults to 1 second. Minimum value is 1.
- success_threshold
- int
Minimum consecutive successes for the liveness probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.
- failure_threshold
- int
When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.
- namespace
- str
The Kubernetes namespace in which to deploy this Webservice: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters cannot be hyphens.
- token_auth_enabled
- bool
Whether or not to enable Token auth for this Webservice. If this is enabled, users can access this Webservice by fetching an access token using their Azure Active Directory credentials. Defaults to False.
- cpu_cores_limit
- float
The maximum number of CPU cores this Webservice is allowed to use. Can be a decimal.
- memory_gb_limit
- float
The maximum amount of memory (in GB) this Webservice is allowed to use. Can be a decimal.
- kwargs
- varies
Includes parameters that support migrating an AKS web service to a Kubernetes online endpoint and deployment: is_migration=True|False, compute_target=.
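The namespace naming rule above can be checked on the client before calling deploy_configuration or update. The regular expression in this sketch encodes exactly the stated constraints (1 to 63 characters; lowercase letters, digits, and hyphens; no leading or trailing hyphen). The is_valid_namespace helper is hypothetical, and the service remains the authority on what it accepts.

```python
import re

# One to 63 characters from [a-z0-9-], beginning and ending with a
# lowercase letter or digit, as the namespace rule above states.
_NAMESPACE_PATTERN = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_namespace(name):
    """Return True if name satisfies the documented namespace constraints."""
    return _NAMESPACE_PATTERN.fullmatch(name) is not None


print(is_valid_namespace("prod-scoring"))  # True
print(is_valid_namespace("-staging"))      # False: starts with a hyphen
print(is_valid_namespace("Prod"))          # False: uppercase letter
```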
Exceptions