I think I have finally tracked down the problem: it's the GPU time.
Testing on NV32as_v4 resulted in a 47% performance improvement over my own Ryzen 5 PC. That is still not ideal scaling, considering it has 5x the threads (but a 30% lower clock speed), but it is a real improvement.
Armed with this result, I tried a more traditional D32as_v4, switching from hardware rendering to Arnold rendering (Arnold is a CPU renderer), and I got a 40% improvement vs my PC.
Testing with D64as_v4 resulted in a 47% improvement, so scaling is really bad: doubling the vCPUs from 32 to 64 only moved the result from 40% to 47%.
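To put a number on how bad that scaling is, here is a rough sketch of the arithmetic. It is only an illustration: the function names are made up for this post, and since "X% improvement" can be read either as X% more throughput or as X% less render time, it computes both readings rather than assuming one.

```python
def speedup_from_improvement(improvement_pct, as_time_reduction=False):
    """Convert an 'X% improvement' figure into a speedup factor vs the baseline PC."""
    frac = improvement_pct / 100.0
    if as_time_reduction:
        # Reading 1: X% less render time, so new_time = (1 - frac) * old_time
        return 1.0 / (1.0 - frac)
    # Reading 2: X% more throughput, so new_speed = (1 + frac) * old_speed
    return 1.0 + frac


def scaling_efficiency(speedup_small, speedup_large, vcpus_small, vcpus_large):
    """Fraction of the ideal gain actually obtained when growing the vCPU count."""
    actual_gain = speedup_large / speedup_small
    ideal_gain = vcpus_large / vcpus_small
    return actual_gain / ideal_gain


# 40% on D32as_v4 and 47% on D64as_v4 are the figures measured above.
for as_time in (False, True):
    d32 = speedup_from_improvement(40, as_time)
    d64 = speedup_from_improvement(47, as_time)
    eff = scaling_efficiency(d32, d64, 32, 64)
    label = "time reduction" if as_time else "throughput gain"
    print(f"{label}: D32 {d32:.2f}x, D64 {d64:.2f}x, efficiency {eff:.0%}")
```

Under either reading, going from 32 to 64 vCPUs gives only a little over half of what perfect scaling would deliver, which fits the idea that the workload is not using the extra threads.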
The best option would be to find an instance with the highest sustained CPU clock at the same thread count, so I will also try the F and L types which, if I remember correctly, are (slightly) better than the D type. At least I can get some real improvement from using Azure cloud.
In terms of cost/performance I think it might not be the best option: I would probably reach the same improvement by upgrading my PC to a Ryzen 9 CPU (the best balance is probably the 700 USD 5900X, which is also just a drop-in replacement on my current motherboard), considering the CPU clock advantage of the latest 5000 series. But I will first run more tests and then decide.
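For the cost side, a simple break-even estimate may help frame the decision. This is a sketch under stated assumptions: the 700 USD figure is the 5900X price mentioned above, while the hourly rate is a made-up placeholder and not a real Azure price, so substitute the actual rate of the instance being tested. It also assumes the upgrade really gives the same improvement as the cloud instance, which is not yet measured.

```python
def break_even_hours(upgrade_cost_usd, cloud_rate_usd_per_hour):
    """Hours of cloud rendering that cost as much as the one-off CPU upgrade."""
    if cloud_rate_usd_per_hour <= 0:
        raise ValueError("hourly rate must be positive")
    return upgrade_cost_usd / cloud_rate_usd_per_hour


UPGRADE_COST_USD = 700.0                # 5900X price quoted above
HYPOTHETICAL_RATE_USD_PER_HOUR = 1.50   # placeholder, NOT a real Azure price

hours = break_even_hours(UPGRADE_COST_USD, HYPOTHETICAL_RATE_USD_PER_HOUR)
print(f"Break-even after about {hours:.0f} hours of cloud rendering")
```

If the expected render time over the life of the project is well below the break-even figure, the cloud instance is cheaper; above it, the upgrade pays for itself.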
Maya Bifrost is definitely not benefiting from multiple CPUs, so what I need is the highest performance from a single CPU, in terms of both single-thread and multithread. That is a very specific need, so for sure not the focus of a public cloud infrastructure.