Azure performance issues using Maya Bifrost

davide445 1 Reputation point
2021-06-01T05:43:11.383+00:00

Starting from a local 6-core/12-thread machine that needs 6 hours to run a 48-frame simulation in Maya Bifrost, I decided to try speeding things up on Azure.

So far I have installed and tested Maya Bifrost on 32-, 64- and even 120-vCPU instances of the Dv4, Fv2, Lv2 and HB series. None of these instances has a dedicated GPU, but I was able to get the Maya UI working, and since the Bifrost simulation runs entirely on the CPU, in theory I should get a strong performance boost.

So far, in the best case I got a 10% speedup, and on average a 50% slowdown. Storage doesn't seem to be an issue (the output is only a 1 MB image per frame). The CPU is working, although it stays almost idle for some time between the various simulation computations.

I'm now setting up NVv4 GPU-powered instances to check whether having an actual dedicated GPU makes any difference. In theory it shouldn't, since once launched the simulation is all CPU time, with just a little graphics activity for the final image rendering.

I wanted to ask how I can troubleshoot this to find the bottleneck, or whether there is any other variable I'm not considering (e.g. VM overhead compared with a dedicated machine).
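One way to narrow down the bottleneck is to sample per-core CPU utilization while the simulation runs and check how much of the time the machine is idle, busy on only a few cores, or running broadly in parallel. The sketch below only classifies samples you have already collected (for example once per second with the third-party psutil package's `psutil.cpu_percent(interval=1, percpu=True)`, or exported from Windows Performance Monitor). The function names and thresholds (`idle_below`, `parallel_from`) are my own assumptions, not anything from Maya or Azure.

```python
from collections import Counter

def classify_sample(per_core_percent, idle_below=0.5, parallel_from=4.0):
    """Label one sample of per-core CPU utilization (each value 0-100 %).

    The thresholds are arbitrary assumptions, expressed in "effective busy
    cores": four cores at 50% count the same as two cores at 100%.
    """
    effective_cores = sum(per_core_percent) / 100.0
    if effective_cores < idle_below:
        return "idle"
    if effective_cores < parallel_from:
        return "low_parallelism"  # e.g. a single-threaded phase
    return "parallel"

def summarize(samples, **thresholds):
    """Return the fraction of samples spent in each phase."""
    counts = Counter(classify_sample(s, **thresholds) for s in samples)
    total = len(samples)
    return {phase: counts[phase] / total
            for phase in ("idle", "low_parallelism", "parallel")}

if __name__ == "__main__":
    # Made-up samples for a 32-vCPU VM, not real measurements.
    samples = ([[100.0] + [0.0] * 31] * 3   # one core busy
               + [[90.0] * 32] * 5          # all cores busy
               + [[1.0] * 32] * 2)          # nearly idle
    print(summarize(samples))  # {'idle': 0.2, 'low_parallelism': 0.3, 'parallel': 0.5}
```

If a large share of the samples lands in `idle` or `low_parallelism`, adding vCPUs is unlikely to help, since those phases do not use the extra cores; that would point to serial parts of the simulation (or to waiting on something else) rather than to VM overhead.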

Azure Virtual Machines

5 answers

  1. davide445 116 Reputation points
    2021-06-02T04:59:15.707+00:00

    I think I have finally traced the problem down: it's the GPU time.

    Testing on an NV32as_v4 resulted in a 47% performance improvement over my own Ryzen 5 PC. That's still not ideal scaling, considering it has 5x the threads (but a 30% lower clock), but it is a real improvement.

    Armed with this result, I tried a more traditional D32as_v4, switching from hardware rendering to Arnold rendering (Arnold is a CPU renderer), and got a 40% improvement over my PC.
    Testing with a D64as_v4 resulted in a 47% improvement, so the scaling is really poor.
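    The D32as_v4 vs. D64as_v4 numbers can be turned into a rough estimate of how much of the run actually parallelizes, using Amdahl's law, T(n) = T(1) * ((1 - p) + p / n), where p is the parallel fraction. This is a sketch under several assumptions: that "40%" and "47% improvement" mean 1.40x and 1.47x throughput relative to the PC, that both VM sizes have the same per-thread speed, and that Amdahl's simple model fits Bifrost at all. Azure vCPUs are also often hyperthreads rather than full cores, which by itself flattens scaling.

```python
def amdahl_parallel_fraction(n1, n2, speedup_ratio):
    """Estimate the parallel fraction p from runs on n1 and n2 threads.

    speedup_ratio is T(n1) / T(n2), i.e. how much faster the n2-thread run was.
    Solves (1 - p) + p / n1 = speedup_ratio * ((1 - p) + p / n2) for p.
    """
    r = speedup_ratio
    return (r - 1.0) / (r - 1.0 + 1.0 / n1 - r / n2)

def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n threads relative to one thread."""
    return 1.0 / ((1.0 - p) + p / n)

if __name__ == "__main__":
    # 1.47x (D64as_v4) vs 1.40x (D32as_v4), both relative to the same PC.
    p = amdahl_parallel_fraction(32, 64, 1.47 / 1.40)
    print(f"parallel fraction p = {p:.2f}")  # parallel fraction p = 0.77
    print(f"speedup ceiling over 1 thread = {1.0 / (1.0 - p):.1f}")
```

    With those numbers, p comes out at roughly 0.77, so almost a quarter of the work would behave as serial, which would be consistent with the poor 32-to-64 scaling. Two data points are a thin basis for a fit, so treat this as an indication only.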

    The best option would be to find an instance with the highest sustained CPU clock at the same thread count, so I will also try the F and L types, which, if I remember correctly, are (slightly) better than the D type. At least I can get some real improvement using the Azure cloud.

    In terms of cost/performance, this might not be the best option: I could probably reach the same improvement by upgrading my PC to a Ryzen 9 CPU (probably the best balance is the 700 USD 5900X, which is also a drop-in replacement for my current motherboard), considering the CPU clock advantage of the latest 5000 series. But I will run more tests first and then decide.

    Maya Bifrost definitely doesn't benefit much from multiple CPUs, so what's needed is the highest per-CPU performance in both single-threaded and multithreaded work: a very specific need, and certainly not the focus of a public cloud infrastructure.

    1 person found this answer helpful.

  2. davide445 116 Reputation points
    2021-06-02T08:28:11.607+00:00

    Looking at the CPU specs of all the series, I can't find a better one than the simple Dv4 series: its Xeon Platinum 8272CL at 3.2 GHz is by far the highest frequency I can get with enough threads.

    The Standard H16, using the Xeon E5-2667 v3, has the same base frequency but only 8 threads, and the same goes for the DCsv2 with the Xeon E-2288G. All the others have a 2.7 or 2.5 GHz base clock, which is probably too much of a penalty.

    The problem is that the Azure pricing table states the Xeon Platinum 8272CL also runs at 2.5 GHz, so I will probably get about the same performance on all of them, with only the price differing.

    So, based on the price of the D32as_v4 I'm using, I would need to run some 150-200 simulations to justify the cost of upgrading my PC's CPU. Even though with a 5900X I could probably get a 70% performance improvement versus the 40% one, I will stick with the Azure cloud for now as the most cost-effective and flexible solution in the near term.
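    The break-even estimate can be reproduced with a few lines of arithmetic: divide the one-off upgrade cost by the VM cost of one simulation. In this sketch the 700 USD upgrade, the 6-hour local run and the 40% cloud improvement come from this thread; the hourly VM price is a hypothetical placeholder, not the actual D32as_v4 rate, so substitute the current price from the Azure pricing page. Reading "40% improvement" as 1.4x throughput is also an assumption.

```python
def break_even_sims(upgrade_cost_usd, vm_price_per_hour_usd,
                    local_hours_per_sim, cloud_speedup):
    """Number of cloud simulations whose VM cost equals a one-off PC upgrade.

    cloud_speedup is throughput relative to the current PC (1.4 = 40% faster),
    so one simulation occupies the VM for local_hours_per_sim / cloud_speedup hours.
    """
    vm_hours_per_sim = local_hours_per_sim / cloud_speedup
    cost_per_sim = vm_hours_per_sim * vm_price_per_hour_usd
    return upgrade_cost_usd / cost_per_sim

if __name__ == "__main__":
    HYPOTHETICAL_PRICE = 1.00  # USD per hour; placeholder, not a real Azure quote
    n = break_even_sims(700, HYPOTHETICAL_PRICE, 6.0, 1.4)
    print(f"break-even after about {n:.0f} simulations")
```

    With the placeholder price this gives about 163 simulations; the real figure scales inversely with the VM price. The sketch ignores local power costs and the fact that the upgraded PC would itself run faster.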

    1 person found this answer helpful.

  3. davide445 116 Reputation points
    2021-06-02T22:15:05.63+00:00

    After some tests, I found that the best performer is the Fsv2 series, which is also the least expensive; it is only marginally outperformed by the Ddsv4, which has the same CPU.
    It seems that with 32 vCPUs you already get 80% of the maximum benefit; I will probably use the 48-vCPU size just to at least halve my simulation times.
    Attached is a graph of the results for the F series, showing the performance improvement relative to my R5 CPU:
    [Graph: Fsv2 series, performance improvement vs. Ryzen 5 CPU (101796-fs-v2.jpg)]
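    One way to read "32 vCPUs already give 80% of the maximum benefit" is: find the smallest VM size whose improvement over the baseline reaches 80% of the best improvement measured. The sketch below does that; `smallest_size_reaching` is a hypothetical helper, and the speedup values in the example are made up for illustration, since the graph's numbers are not in the text.

```python
def smallest_size_reaching(speedups, fraction=0.8):
    """Smallest vCPU count whose gain reaches `fraction` of the best gain.

    speedups maps vCPU count -> speedup over a baseline machine (1.0 = same speed);
    the "gain" of a size is speedup - 1.0.
    Returns None if no size qualifies (possible when every size is slower than the baseline).
    """
    best_gain = max(s - 1.0 for s in speedups.values())
    for vcpus in sorted(speedups):
        if speedups[vcpus] - 1.0 >= fraction * best_gain:
            return vcpus
    return None

if __name__ == "__main__":
    # Hypothetical Fsv2 results relative to a Ryzen 5; not the values from the graph.
    measured = {8: 1.10, 16: 1.30, 32: 1.45, 48: 1.50, 64: 1.52}
    print(smallest_size_reaching(measured))  # 32
```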

    1 person found this answer helpful.

  4. davide445 116 Reputation points
    2021-06-06T07:24:23.34+00:00

    It's a pity I can't complete the curve with a 24-vCPU point, since another researcher's experience suggests diminishing returns start after 24 threads for the specific algorithm used in Maya Bifrost.
    But I didn't find any suitable VM with 24 vCPUs except the N series, which isn't really what I need.
    Maybe others following this thread could test on their machines, if they have at least a 12-core CPU, and post the results.


  5. kobulloc-MSFT 26,811 Reputation points Microsoft Employee Moderator
    2021-06-01T23:39:46.567+00:00

    Identifying the bottleneck
    The best place to determine the simulation bottleneck is likely going to be the Maya Bifrost forums, but I'll help as best I can on the Azure and hardware side. In terms of Azure bottlenecks, nothing immediately comes to mind, as you are working on a single VM and it doesn't sound like file read or write times are significant.

    Increasing VM performance
    While I don't have a copy of Maya that I can use to compare simulation times, I can help map out the different VM hardware that is available.

    It's unclear how much Maya Bifrost benefits from multiple CPUs (link), so I would experiment with high-performance single-core and high-performance multi-core configurations, with varying amounts of RAM depending on the complexity of your project. I agree that storage doesn't appear to be a major concern with 1 MB image output unless read time is important.

    https://learn.microsoft.com/en-us/azure/virtual-machines/sizes
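    To compare single-core-heavy and many-core sizes fairly, it helps to time the same batch run several times on each VM and on the local PC and compare medians, since run-to-run variation on shared cloud hardware can blur small differences. A minimal Python sketch follows; how Maya/Bifrost is launched in batch mode is not covered in this thread, so the command is a placeholder to replace, and `median_wall_time` is a hypothetical helper name.

```python
import statistics
import subprocess
import sys
import time

def median_wall_time(cmd, repeats=3):
    """Run `cmd` (a list of arguments) `repeats` times; return the median wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit code
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)

if __name__ == "__main__":
    # Placeholder command: replace with however you launch the simulation in batch mode.
    print(f"{median_wall_time([sys.executable, '-c', 'pass']):.3f} s")
```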

    Using what you've experimented with so far:

    • Dv4-series: General purpose - balanced CPU-to-memory ratio (remote storage only)
    • Fsv2-series: Compute optimized - high CPU-to-memory ratio (SSD temp storage)
    • Lsv2-series: Storage optimized - high disk throughput and IO (high memory and IOPS).
    • HB-series: High performance compute - our fastest and most powerful CPU virtual machines (HB is recommended for memory-bandwidth-bound workloads such as fluid dynamics; see the full list).

    I think you're on the right path experimenting with Fsv2 and HB, although I'm curious whether the GPU series offer any improvement (especially the NC or NV series that you are trying now).

    Benefits of different hardware
    I am by no means a Maya expert, but my understanding is that the following hardware is useful for specific purposes (this may contain errors):

    CPU: Used for the majority of tasks, including simulation; however, some tasks use only a single thread.
    GPU: Primarily affects the viewport FPS.
    RAM: Stores information on a per-frame basis.
    Storage: Read and write speed affects load and save times.

    Additional reading:

    Software and settings improvements
    On the Maya side, it sounds like there are a number of settings that can be adjusted (max time steps, max transport steps) that may give you significantly better performance without sacrificing too much output quality. The Maya forums and the Maya Bifrost forums will be good places to check those settings.

    I hope that helps! Please update us with your results as I'm curious what the best configuration ends up being.

