If you purchased from the Microsoft Store, there is a 60-day return policy (https://www.microsoft.com/en-us/store/b/returns); if you purchased elsewhere, the retailer may have a different policy. I referred you to https://www.microsoft.com/en-au/microsoftfeedback/ in your other question as the proper venue for this type of complaint.
Why does the Surface Pro 11 (Snapdragon X Elite) drastically underperform on AI workloads compared to expectations and marketing—what are my options for compensation or a refund?
In-Depth Benchmark Report: Llama 3.2 3B Chat on Surface Pro 11 (Snapdragon X Elite)
Introduction & Motivation
I tested Qualcomm's claims that the Snapdragon X Elite NPU accelerates large language model (LLM) inference, using my Surface Pro 11 (11th Gen) with 16 GB RAM and a 512 GB SSD. Qualcomm markets this NPU as significantly faster than the CPU for AI workloads, but my benchmarks show otherwise. This report documents my testing and analysis.
Hardware and Software Setup
Device: Microsoft Surface Pro 11 (11th Gen)
Processor: Qualcomm Snapdragon X Elite
RAM: 16 GB LPDDR5
Storage: 512 GB NVMe SSD
OS: Windows 11 ARM64
Software: AnythingLLM v1.8.4
Model Tested: Llama 3.2 3B Chat (8K context window, ~2.49 GB, 3 billion parameters)
Benchmark Test Procedure
Two tests were run with the same prompt ("List benefits of renewable energy") to compare NPU-accelerated inference against CPU-only inference. Metrics recorded: total response time and tokens-per-second throughput. Output quality was checked for completeness and coherence.
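The procedure above amounts to timing one generation call and dividing tokens by elapsed seconds. A minimal sketch of that harness is below; `generate` is a stand-in for whatever call drives the model (this is not AnythingLLM's actual API), and whitespace splitting is only a rough proxy for the model's real token count.

```python
import time

def measure_throughput(generate, prompt):
    """Time one generation call and return (tokens, seconds, tokens/sec).

    `generate` is any callable taking a prompt string and returning the
    generated text. Token counting via str.split() is approximate.
    """
    start = time.perf_counter()
    output = generate(prompt)
    elapsed = time.perf_counter() - start
    tokens = len(output.split())
    return tokens, elapsed, tokens / elapsed

# Stub generator standing in for an NPU- or CPU-backed model call.
def fake_generate(prompt):
    return "solar wind hydro geothermal biomass"

tokens, secs, tps = measure_throughput(
    fake_generate, "List benefits of renewable energy")
print(tokens)
```

The same harness can be pointed at both backends so that only the inference path differs between runs.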
Raw Benchmark Results and Model Outputs
NPU Mode:
Total Time: 123.333 seconds
Tokens Generated: ~325
Throughput: 2.64 tokens/second
Output: Detailed and coherent benefits list.
CPU-only Mode:
Total Time: 10.277 seconds
Tokens Generated: ~265
Throughput: 25.88 tokens/second
Output: Concise and coherent benefits list.
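As a sanity check, the reported figures reproduce the stated throughputs and a roughly twelvefold wall-clock gap; the small difference from the reported 25.88 tokens/second likely comes from the approximate "~265" token count.

```python
# Figures as reported in the two runs above
npu_time, npu_tokens = 123.333, 325
cpu_time, cpu_tokens = 10.277, 265

npu_tps = npu_tokens / npu_time   # ~2.64 tokens/second
cpu_tps = cpu_tokens / cpu_time   # ~25.79 tokens/second
slowdown = npu_time / cpu_time    # ~12x longer wall-clock time

print(round(npu_tps, 2), round(cpu_tps, 2), round(slowdown, 1))
```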
Model Response Quality Observations
Both outputs were accurate and coherent. However, the NPU run took roughly 12 times longer in wall-clock time (123.3 s vs. 10.3 s), making it practically unusable. The CPU run was far faster and perfectly adequate.
Technical Analysis of NPU Underperformance
The Snapdragon X Elite NPU is clearly not well suited to transformer-based LLM inference as deployed here, despite Qualcomm's aggressive marketing. The software runtimes and drivers fail to leverage the NPU effectively, adding significant overhead and bottlenecks, and the CPU's high-performance cores outperform it by a wide margin.
My Position and Demand
This device was marketed as having cutting-edge AI acceleration thanks to the Snapdragon X Elite NPU. My testing shows that claim to be misleading: the Surface Pro 11's advertised AI capabilities are grossly overstated, and the device does not deliver on its promises for real AI workloads.
Given this failure, I expect Microsoft to acknowledge this serious shortcoming and provide appropriate compensation. A refund or equivalent remedy is warranted because the product does not meet the performance standards it was sold on.
This is not just a minor issue; it impacts the device's core value proposition for developers and users relying on AI features. Ignoring this problem is unacceptable.