Best Practice / Integration Architectural Pattern allowing On-premises Windows Apps (VB6 / old .NET) to talk to Azure AI/ML Models

Question

Kaushik Dutta

Hello Team,

I'm working on a new business initiative where we want to connect our legacy Windows applications (VB6- and .NET-based apps backed by SQL Server) to Azure AI/ML models (our own custom model in Azure ML Studio) to generate forecast / lead-time data for our business, that is, prediction data produced when a user keys in certain data points in the Windows system. The business wants the predictions from the AI/ML model captured in near real time. What are the best practices and integration areas to consider? Which key components should we choose to make this solution low-latency, reliable, and scalable? Any guidelines would be helpful.

Anonymous

2026-03-04T09:56:41.53+00:00

Hi Kaushik Dutta

Did you get a chance to review the above response?

Thank you!

Answer accepted by question author

AI answer

For low-latency, reliable, and scalable integration between on-premises Windows apps (VB6 / .NET) and Azure ML models, use a service-oriented, event-driven, and secure architecture rather than calling the model directly from each client.

Key architectural patterns and components:

Expose the ML model via a managed API layer
- Deploy the Azure ML model as a web service (online endpoint) in Azure.
- Front it with Azure API Management to provide:
  - A stable, versioned API surface for legacy apps.
  - Centralized authentication, throttling, and monitoring.
- This follows common Azure guidance to use API Management for secure, scalable access to AI services and data, instead of exposing the model endpoint to every client.
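
As a minimal sketch of what the client side sends to an API Management front end, the Python snippet below only builds the HTTPS request and does not send it. The gateway URL, the `/forecast/v1/predict` path, and the `{"inputs": ...}` body shape are hypothetical placeholders; `Ocp-Apim-Subscription-Key` is API Management's default subscription-key header. Python is used purely as a compact illustration; a .NET `HttpClient` call would have the same shape.

```python
import json
import urllib.request

# Hypothetical API Management gateway URL with a versioned path (assumption).
APIM_URL = "https://contoso-apim.azure-api.net/forecast/v1/predict"


def build_forecast_request(url, subscription_key, inputs):
    """Build (but do not send) a POST request to an APIM-fronted model API."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Default header name API Management checks for subscription keys.
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
    )


req = build_forecast_request(APIM_URL, "<subscription-key>", {"item_id": "A100", "qty": 25})
# Sending it would be: urllib.request.urlopen(req, timeout=5)
```

Because the legacy app only knows the versioned APIM URL, the model deployment behind it can change without touching the client.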
Use asynchronous, event-driven integration where possible
- For near real-time but not strictly synchronous scenarios, use asynchronous integration patterns:
  - Message queues or publish-subscribe topics for prediction requests and responses.
- Benefits:
  - Decouples legacy apps from the ML service.
  - Improves resiliency and scalability under load.
- Typical flow:
  1. Legacy app sends a prediction request message (with key data) to a queue/topic.
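  2. A cloud service (for example, an Azure Function, a container, or a web app) consumes the message, calls the Azure ML endpoint, and writes the result to another queue, to a shared store, or directly to SQL Server (writing to an on-premises SQL Server from Azure requires a hybrid network path such as a site-to-site VPN or ExpressRoute).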
  3. The legacy app either polls for the result or is notified via a lightweight mechanism.
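
The three-step flow above can be sketched with in-memory stand-ins. In this hedged Python sketch, `queue.Queue` plays the role of a cloud queue (for example Azure Service Bus), a dict plays the shared result store, and `score` is a hypothetical placeholder for the Azure ML endpoint call. The point it shows is the correlation ID that lets the legacy app match its polled result to its request.

```python
import queue
import uuid

# In-memory stand-ins (assumption): in Azure these would be a Service Bus queue/topic
# for requests and a shared store (table, cache, or SQL) for results.
request_queue = queue.Queue()
result_store = {}


def score(features):
    """Hypothetical stand-in for the call to the Azure ML online endpoint."""
    return 2.0 * features["qty"] + 3.0


def submit(features):
    """Step 1: the legacy app enqueues a request and keeps the correlation ID."""
    correlation_id = str(uuid.uuid4())
    request_queue.put({"id": correlation_id, "features": features})
    return correlation_id


def process_one():
    """Step 2: the cloud worker consumes one message, scores it, and stores the result."""
    message = request_queue.get()
    result_store[message["id"]] = score(message["features"])
    request_queue.task_done()


def poll(correlation_id):
    """Step 3: the legacy app polls; None means the result is not ready yet."""
    return result_store.get(correlation_id)


cid = submit({"qty": 10})
print(poll(cid))  # None: the worker has not run yet
process_one()
print(poll(cid))  # 23.0
```
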
Event-driven AI application design
- Treat the prediction as an event-driven operation: user input in the on-premises app triggers an event that leads to a prediction.
- Event-driven architecture is recommended for AI workloads to keep components decoupled and responsive.
MLOps and model lifecycle management
- Implement MLOps practices so that the integration remains stable as models evolve:
  - Use model versioning so the API layer can route to specific model versions while maintaining backward compatibility for legacy apps.
  - Monitor model performance and data drift; retrain and redeploy models without changing the client integration contract.
- Together, these practices support compliance, reproducibility, and controlled rollouts.
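
One common way to put a number on "data drift" is the Population Stability Index (PSI), which compares the distribution of an input feature at training time with what production is sending now. The Python sketch below is a plain-stdlib illustration of PSI, not Azure ML's built-in drift monitoring; the bin edges and sample values are made up, and the 0.1 / 0.25 thresholds often quoted for PSI are rules of thumb rather than fixed standards.

```python
import math


def population_stability_index(expected, actual, edges):
    """PSI between a baseline sample (e.g. training data) and recent production inputs.

    `edges` are sorted bin boundaries; values outside them are clipped into the end bins.
    A small floor on each proportion avoids log(0) for empty bins.
    """
    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            idx = 0
            # Walk right while v reaches the next boundary (last bin catches everything above).
            while idx < len(counts) - 1 and v >= edges[idx + 1]:
                idx += 1
            counts[idx] += 1
        return [max(c / len(values), 1e-6) for c in counts]

    p_base, p_recent = proportions(expected), proportions(actual)
    return sum((r - b) * math.log(r / b) for b, r in zip(p_base, p_recent))


baseline = [5, 7, 9, 12, 15, 18, 20, 22, 25, 30]   # e.g. lead-time inputs seen at training time
recent = [15, 18, 20, 22, 25, 28, 30, 32, 35, 40]  # e.g. inputs seen this week
edges = [0, 10, 20, 30, 50]
psi = population_stability_index(baseline, recent, edges)
print(f"PSI = {psi:.2f}")  # well above 0.25, so this would be flagged as drift
```

A check like this can run on a schedule against the logged requests and trigger retraining without any change to the client contract.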
Data security and access control
- Ensure data security for sensitive business data sent from on-premises apps:
  - Use TLS for all calls to Azure.
  - Enforce authentication and authorization at the API management layer.
  - Apply encryption and access controls on any intermediate storage used for requests/responses.
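
For the TLS point, the client-side detail that matters is that certificate and hostname verification stay on and old protocol versions are refused. A minimal Python sketch using only the standard `ssl` module (the TLS 1.2 floor is an assumption about your policy; .NET clients control the same thing through their own TLS settings):

```python
import ssl


def make_tls_context():
    """Client-side TLS context: server certificate and hostname are verified,
    and protocol versions older than TLS 1.2 are refused."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED + check_hostname, system trust store
    if ctx.minimum_version < ssl.TLSVersion.TLSv1_2:
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


# Usage (not executed here): urllib.request.urlopen(request, context=make_tls_context(), timeout=5)
```
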
Hybrid integration considerations
- Treat the solution as a hybrid system: on-premises clients with cloud-based AI.
- Use established integration patterns for hybrid architectures (for example, message-based integration and clear service boundaries) to keep the legacy apps thin and the cloud side responsible for heavy compute.
Scalability and reliability practices
- Scale the ML endpoint and the API layer independently based on load.
- Use retry policies and circuit breakers in the integration layer (not in each VB6/.NET client) to handle transient failures.
- For high-volume scenarios, batch prediction requests where acceptable to reduce per-call overhead.
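
The retry and circuit-breaker advice can be made concrete with a small sketch. This is a hand-rolled Python illustration, assuming it lives in the cloud-side integration service; `call_ml_endpoint` is hypothetical, and production code would more likely use an established resilience library (for example Polly on .NET) than this minimal version.

```python
import random
import time


def retry_with_backoff(fn, attempts=3, base_delay=0.2,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn(); on a transient error, wait with exponential backoff plus jitter and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the endpoint while the breaker is open."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures and rejects calls for
    `reset_timeout` seconds; after that, one trial call is let through (half-open)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.opened_at = None  # half-open: allow a trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open; a failed trial reopens immediately
            raise
        self.failures = 0  # success closes the circuit
        return result


# Usage in the integration service (call_ml_endpoint is hypothetical):
# breaker = CircuitBreaker()
# prediction = breaker.call(retry_with_backoff, call_ml_endpoint)
```

Keeping this logic in one service means every VB6/.NET client benefits from it without carrying its own copy.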
Client-side integration strategy (VB6 / .NET)
- Encapsulate all Azure calls in a small .NET integration library or local service (a .NET library can be made callable from VB6 through COM interop) that:
  - Exposes simple methods (e.g., GetForecast(inputs)) to VB6/.NET apps.
  - Handles serialization, authentication, and communication with the API management endpoint or message queue.
- This minimizes changes in legacy code and centralizes integration logic.
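
The shape of such a wrapper is easy to show in a language-neutral way. In this hedged Python sketch, `ForecastClient`, its `get_forecast` method, and the `{"forecast": ...}` response field are hypothetical; in the real solution this would be a .NET class (COM-visible if VB6 needs it). The transport is injected, so the same facade can sit on top of an HTTPS call, a queue publisher, or a fake in tests.

```python
import json


class ForecastClient:
    """Hypothetical integration facade: legacy code calls get_forecast(inputs) and never
    sees URLs, auth headers, or JSON."""

    def __init__(self, transport, api_key):
        # transport(body_bytes, headers_dict) -> response_bytes (HTTPS, queue, or fake)
        self._transport = transport
        self._api_key = api_key

    def get_forecast(self, inputs):
        body = json.dumps({"inputs": inputs}).encode("utf-8")
        headers = {
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": self._api_key,
        }
        response = json.loads(self._transport(body, headers))
        return float(response["forecast"])  # assumed response field name


def fake_transport(body, headers):
    """Test double standing in for the real API call."""
    payload = json.loads(body)
    return json.dumps({"forecast": payload["inputs"]["qty"] * 1.5}).encode("utf-8")


client = ForecastClient(fake_transport, api_key="<key>")
print(client.get_forecast({"qty": 4}))  # 6.0
```
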

Summary pattern:

On-premises Windows apps → call a local integration component → send request (sync or async) to an API/queue in Azure → integration service calls Azure ML endpoint → returns prediction via API/queue → legacy app reads and displays result.
Govern the ML lifecycle with MLOps, secure the data path, and use event-driven patterns for resilience and scalability.

Answer 1

Hi Kaushik Dutta

The business objective is to connect legacy Windows‑based applications (VB6 and .NET, backed by SQL Server) with Azure‑hosted AI/ML models to generate forecasts, lead‑time predictions, or decision insights based on user‑keyed inputs. These predictions must be returned to the Windows system in near real time and captured reliably for downstream business use, auditability, and analytics.

The recommended enterprise pattern is API‑based synchronous inference. Legacy Windows applications should not directly call Azure ML SDKs or models. Instead, they should invoke a stable HTTP API layer, which in turn calls Azure ML online inference endpoints. This pattern decouples legacy client logic from ML lifecycle changes and allows the AI layer to evolve independently.

High‑Level Flow Overview: The end‑to‑end flow is Windows (VB6 / .NET) application → HTTPS API layer → Azure ML managed online endpoint → model inference → response back to the API → response to the Windows app. This ensures minimal changes in legacy code while enabling enterprise‑grade scalability, observability, and security.

Azure ML Inference Choice (Critical Decision): For near real‑time predictions, Azure ML Managed Online Endpoints are the correct and supported choice. They are designed for synchronous, low‑latency inference and provide built‑in autoscaling, health monitoring, versioned deployments, and traffic splitting. Batch endpoints or offline scoring pipelines are not suitable for interactive Windows applications.

API Layer as a Mandatory Integration Boundary: An intermediate API layer is strongly recommended. This can be implemented using Azure App Service (ASP.NET Web API), Azure Functions (HTTP‑triggered), or Azure Container Apps. The API layer handles input validation, schema versioning, authentication, retry logic, and response normalization, shielding the legacy client from ML‑specific complexities.
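
To illustrate two of those API-layer responsibilities, input validation and response normalization, here is a minimal Python sketch. The v1 field names, the assumption that the model returns a JSON list (or a `prediction` field), and the `lead_time_days` output name are all hypothetical; the idea is that the desktop client sees one stable contract regardless of how the model's raw output changes.

```python
import json

# Hypothetical v1 input contract agreed with the legacy apps.
REQUIRED_FIELDS = {"item_id": str, "quantity": (int, float), "warehouse": str}


def validate_v1(payload):
    """Reject malformed input before it reaches the ML endpoint."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            return False, f"missing field: {name}"
        value = payload[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return False, f"wrong type for field: {name}"
    if payload["quantity"] <= 0:
        return False, "quantity must be positive"
    return True, ""


def normalize_response(raw_model_output, model_version):
    """Map the model's raw JSON output into the stable shape the desktop client expects."""
    parsed = json.loads(raw_model_output)
    # Assumes either a one-element list or a {"prediction": ...} object; adjust to your model.
    value = parsed[0] if isinstance(parsed, list) else parsed["prediction"]
    return {"lead_time_days": round(float(value), 2), "model_version": model_version}


print(validate_v1({"item_id": "A100", "quantity": 25, "warehouse": "W01"}))  # (True, '')
print(validate_v1({"item_id": "A100", "warehouse": "W01"}))  # (False, 'missing field: quantity')
print(normalize_response("[12.4567]", "lead-time-v3"))
```
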

Low‑Latency Design Guidelines: To achieve near real‑time response, the API and Azure ML endpoint must be deployed in the same Azure region. Payloads should be compact and numeric where possible, avoiding heavy JSON or dynamic schema expansion. The ML model should be pre‑loaded (no cold model initialization per request), and the endpoint should maintain at least one warm replica at all times.
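
"Pre-loaded" maps directly onto the Azure ML scoring-script convention: `init()` runs once when a deployment replica starts, and `run(raw_data)` runs per request. The Python sketch below follows that convention but, to stay self-contained, uses a hypothetical linear model instead of loading a real artifact (a real script would typically load from the `AZUREML_MODEL_DIR` directory, for example with joblib). The `{"data": [[...]]}` input shape is also an assumption.

```python
import json

model = None  # loaded once per replica in init(), reused by every run() call


def init():
    """Called once when the deployment container starts."""
    global model
    # Real script: load the artifact from the AZUREML_MODEL_DIR directory here.
    # Hypothetical stand-in: a linear model with fixed coefficients.
    model = {"coef": [0.8, 1.5], "intercept": 2.0}


def run(raw_data):
    """Called per request with the raw JSON body; no model loading happens here."""
    rows = json.loads(raw_data)["data"]  # compact numeric rows, e.g. [[qty, distance_km], ...]
    return [
        model["intercept"] + sum(c * x for c, x in zip(model["coef"], row))
        for row in rows
    ]


init()
print(run(json.dumps({"data": [[10, 4]]})))  # [16.0]
```

Keeping per-request work to parsing and a single prediction is what makes the "no cold model initialization per request" guideline hold.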

Compute and Scaling Considerations: CPU‑based SKUs are usually sufficient for forecasting and lead‑time prediction models. GPU should only be used if the model architecture explicitly requires it. Autoscaling should be configured conservatively to avoid cold starts, with minimum replicas set to ensure predictable latency under normal business load.
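
A rough way to choose the minimum replica count is Little's law: requests in flight ≈ arrival rate × latency. The Python sketch below applies it with a headroom factor; the traffic numbers are invented, and "concurrent requests per replica" is an assumption you would align with your deployment's request settings (Azure ML exposes a per-instance concurrency setting for online deployments). Treat the result as a starting point to validate with load tests, not a sizing guarantee.

```python
import math


def min_replicas(peak_rps, p95_latency_s, concurrency_per_replica, headroom=1.3, floor=1):
    """Rough replica estimate from Little's law: requests in flight ~= arrival rate x latency.

    headroom > 1 leaves spare capacity for bursts; floor keeps at least one warm replica
    so the first request of the day does not pay a cold-start penalty.
    """
    in_flight = peak_rps * p95_latency_s
    needed = math.ceil(in_flight * headroom / concurrency_per_replica)
    return max(floor, needed)


# 40 requests/s at peak, p95 latency 0.25 s, 4 concurrent requests per replica:
# 10 requests in flight, x1.3 headroom = 13, / 4 = 3.25, rounded up to 4 replicas.
print(min_replicas(40, 0.25, 4))  # 4
```
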

The synchronous path (API → ML endpoint) should use strict timeouts and limited retries for transient failures. For business‑critical workflows, an optional asynchronous fallback can be implemented using Azure Service Bus or Storage Queues. In this pattern, if real‑time inference fails, the request is queued and processed asynchronously, ensuring no data loss.
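
A minimal Python sketch of that fallback, assuming an in-memory `queue.Queue` as a stand-in for Azure Service Bus or a Storage queue and a hypothetical `call_endpoint(payload, timeout=...)` function: the real-time call gets one attempt under a strict timeout, and on a network failure the request is queued with an ID the client can use later instead of being dropped.

```python
import queue
import uuid

# Stand-in (assumption) for an Azure Service Bus or Storage queue drained by a background worker.
fallback_queue = queue.Queue()


def predict_or_enqueue(call_endpoint, payload, timeout_s=3.0):
    """Try the real-time path once with a strict timeout; if it fails, queue the request
    so it is processed later instead of being lost."""
    try:
        prediction = call_endpoint(payload, timeout=timeout_s)
        return {"status": "ok", "prediction": prediction}
    except OSError:
        # OSError covers TimeoutError, ConnectionError and urllib.error.URLError.
        # A real client should not queue non-transient errors such as an HTTP 400.
        request_id = str(uuid.uuid4())
        fallback_queue.put({"request_id": request_id, "payload": payload})
        return {"status": "queued", "request_id": request_id}


def healthy_endpoint(payload, timeout):
    return 12.5  # hypothetical prediction


def slow_endpoint(payload, timeout):
    raise TimeoutError(f"no response within {timeout}s")


print(predict_or_enqueue(healthy_endpoint, {"qty": 25}))  # {'status': 'ok', 'prediction': 12.5}
print(predict_or_enqueue(slow_endpoint, {"qty": 25})["status"])  # queued
```
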

Security and Identity Best Practices: Legacy Windows applications should authenticate to the API layer using client certificates, Entra ID (for .NET apps), or API keys as a last resort. The API layer should authenticate to Azure ML using Managed Identity, never hard‑coded keys. Direct exposure of Azure ML endpoints to VB6 or desktop clients is not recommended.

SQL Server Integration and Data Capture: Predictions should be treated as business records, not transient responses. Store prediction requests and responses in SQL Server with metadata such as request ID, model version, timestamp, and confidence scores. This enables traceability, audit compliance, model comparison, and post‑hoc analysis.
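
The record-keeping described above can be sketched as a single log table. The Python example below uses the standard-library `sqlite3` module only so that it runs anywhere; the table and column names are hypothetical, and on SQL Server you would use types such as UNIQUEIDENTIFIER, DATETIME2, NVARCHAR(MAX), and FLOAT (pyodbc uses the same `?` parameter placeholders).

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical prediction log table; sqlite3 stands in for SQL Server in this sketch.
DDL = """
CREATE TABLE IF NOT EXISTS PredictionLog (
    RequestId      TEXT PRIMARY KEY,
    ModelVersion   TEXT NOT NULL,
    RequestedAtUtc TEXT NOT NULL,
    InputJson      TEXT NOT NULL,
    Prediction     REAL NOT NULL,
    Confidence     REAL
)
"""


def log_prediction(conn, model_version, inputs, prediction, confidence=None):
    """Persist one request/response pair as a business record and return its ID."""
    request_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO PredictionLog VALUES (?, ?, ?, ?, ?, ?)",
        (request_id, model_version, datetime.now(timezone.utc).isoformat(),
         json.dumps(inputs), prediction, confidence),
    )
    conn.commit()
    return request_id


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
rid = log_prediction(conn, "lead-time-v3", {"item_id": "A100", "qty": 25}, 12.4, 0.87)
```

Storing the model version alongside each prediction is what later allows comparing old and new models on the same business requests.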

End‑to‑end observability is critical. Application Insights should capture API latency, error rates, and payload validation issues. Azure ML metrics should be monitored for endpoint health, scaling events, and inference failures. Centralized Log Analytics enables correlation between user requests, predictions, and system behavior.

The architecture supports independent horizontal scaling of the API layer and ML inference layer. Azure ML traffic‑splitting allows new model versions to be deployed safely using blue‑green or canary strategies without changing client code. This is essential for continuous model improvement.
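
In Azure ML the traffic split is configured on the endpoint itself (for example with `az ml online-endpoint update --name <endpoint> --traffic "blue=90 green=10"`), so no client code is involved. Purely to make "90/10" concrete, the Python sketch below maps request IDs onto deployments by percentage weight. It is a conceptual illustration with hypothetical deployment names, not a reproduction of Azure ML's internal routing; hashing is used here only so the sketch is deterministic.

```python
import hashlib


def pick_deployment(request_id, traffic):
    """Map a request ID to a deployment name according to percentage weights."""
    if sum(traffic.values()) != 100:
        raise ValueError("traffic weights must sum to 100")
    # Hash the ID into a bucket 0..99; the same ID always lands in the same bucket.
    bucket = int(hashlib.sha256(request_id.encode("utf-8")).hexdigest(), 16) % 100
    cumulative = 0
    for name, weight in sorted(traffic.items()):
        cumulative += weight
        if bucket < cumulative:
            return name
    raise AssertionError("unreachable when weights sum to 100")


split = {"blue": 90, "green": 10}  # 90% current model, 10% canary
print(pick_deployment("order-1001", split))
```
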

Avoid calling ML models directly from VB6 or WinForms applications. Do not embed ML SDKs, secrets, or model logic in desktop clients. Avoid long‑running (>5 seconds) synchronous inference. Do not reuse training pipelines for online inference or perform heavy feature engineering at request time.

Reference Architecture (Summary): VB6 / .NET Application → HTTPS → API Layer (App Service / Functions) → Managed Identity → Azure ML Online Endpoint → Forecast Model → Response → SQL Server Persistence.

For batch forecasting, use Azure Data Factory or Azure ML Batch Endpoints. For extremely high request volumes, consider AKS‑backed online endpoints. For disconnected or unreliable clients, use queue‑based asynchronous inference with eventual consistency.

Reference list

Architecture best practices for Azure Machine Learning https://learn.microsoft.com/azure/well-architected/service-guides/azure-machine-learning
Many-models architecture approach to scale ML models https://learn.microsoft.com/azure/architecture/ai-ml/idea/many-models-machine-learning-azure-machine-learning
Consume an Azure ML model as a web service (C#, Go, Java, Python, Power BI) https://learn.microsoft.com/azure/machine-learning/how-to-consume-web-service
Execute Azure ML pipelines in Azure Data Factory (batch scenarios) https://learn.microsoft.com/azure/data-factory/transform-data-machine-learning-service
Azure Machine Learning + Event Grid + Logic Apps integration https://learn.microsoft.com/azure/machine-learning/how-to-use-event-grid
Connecting Azure ML to on-prem SQL Server (blog) https://www.sqlservercentral.com/blogs/using-azure-machine-learning-with-an-on-premises-database

Thank you.
