How to increase Azure RTOS GUIX performance

S.A.M 56 Reputation points
2021-07-06T11:51:16.773+00:00

Hello,

I designed my demo with Azure RTOS GUIX and am running it on an STM32H750-Disco board.
The problem is that I'm getting only around 15-18 fps from GUIX, which is not ideal. How can I improve this?

Some extra information that would help:

  • I have 4 circular gauges in my demo.
  • Code executes from external flash (QSPI), and the pixel maps are in that section as well.
  • So far I have tried moving the gauges' pixel maps to SDRAM/internal flash, but that didn't affect performance much.

I would appreciate any help, and I'll try to answer ASAP if anybody has a question.
Thanks in advance,
Best regards.

Azure RTOS
An Azure embedded development suite including a small but powerful operating system for resource-constrained devices.

Accepted answer
  1. Ken Maxwell 706 Reputation points Microsoft Employee
    2021-09-14T14:36:42.19+00:00

    Hello @S.A.M ,

    Regarding the flickering, that sounds a lot like a memory bandwidth issue: the DMA2D may not be getting enough bandwidth to perform the background rendering operation. I'm wondering if something is fundamentally wrong, for example the main CPU clock is not configured correctly or the memory cache is not enabled.

    My colleague put some time into further optimizing this for you. I have copied her reply below:

    ----------

    I modified gx_circular_gauge_background_draw to make it work for the case of multiple circular gauges; see attachment.
    Changes:
    Call _gx_icon_background_draw after calling gx_display_driver_callback_assign to set the wait function, so that the needle rotation can be processed while the hardware is drawing the circular gauge background.

    I also created a demo with 4 circular gauges for STM32H753I-EVAL board, see attachment.

    1. Use LCD refresh interrupt as timer source to increase frame rate of the animation.
    2. Replace gx_circular_gauge_background_draw with the modified version.

    Performance:

    1. gx_display_driver_callback_assign NOT set: frame rate is around 49.4 frames/second.
    2. gx_display_driver_callback_assign set: frame rate is around 55.6 frames/second.

    ----------

    She found that the order of setting the display driver callback and drawing the background image was wrong (a bug), and fixed it in the attached gx_circular_gauge_background_draw.c. However, as you can see, running the needle rotation code in parallel with background drawing does not make a dramatic difference: 49 fps versus 55 fps. Either case is much better than the performance you are reporting, which again makes me wonder if something very fundamental is incorrect in your CPU/memory configuration. I copied links below for the modified source file and the example project which she created for the STM32H753I eval board. This change will be in our next GUIX update release.
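
As a rough worked example, the two frame rates quoted above can be converted to per-frame times to see how much time the parallel needle rotation saves. This is a minimal sketch; the function names are illustrative and not part of GUIX.

```c
#include <assert.h>

/* Convert a frame rate (frames/second) to a per-frame time in milliseconds. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Per-frame time saved when the frame rate rises from fps_before to fps_after. */
double frame_time_saved_ms(double fps_before, double fps_after)
{
    return frame_time_ms(fps_before) - frame_time_ms(fps_after);
}
```

With the reported figures, 49.4 fps is about 20.2 ms per frame and 55.6 fps is about 18.0 ms per frame, so the overlap saves roughly 2.3 ms per frame on that test project.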

    Modified GUIX lib source file: https://expresslogic.sharefile.com/d-s3b26ec2edbc64b4787859810230b3a85
    Circular gauge test project: https://expresslogic.sharefile.com/d-sa9bdeffed18a47faa8a0c09aac531191

    Best Regards,

    Ken


6 additional answers

  1. Ken Maxwell 706 Reputation points Microsoft Employee
    2021-07-14T13:08:07.77+00:00

    A couple of basic things to look at:

    1) Is the ChromeArt graphics accelerator enabled? This is enabled/disabled via a #define in the display driver. Make sure it is enabled.
    2) Is the timer running fast enough? The default setting is usually 20 ms, which gives you at most 50 FPS if the animation interval is set to 1 tick. You can redefine the timer source if you want to. For example, you can drive the GUIX timer from the vertical sync interrupt of the LCD controller, which gives you a faster and higher-resolution timer on which the animations are based.
    3) Turn off the RLE encoding (i.e. compressed) option for the pixelmaps that need to be fast. RLE encoding saves some space, but it also adds some time and prevents using the ChromeArt engine for pixelmap rendering. Turn off this option for the gauge pixelmaps and see what effect this has.
    4) Are you manually invoking the canvas refresh, or just setting the gauge parameters and allowing the gauge to invalidate and refresh naturally? If you force things, which is allowed, you can accidentally cause extra buffer toggle operations, which slows things down. It is best to just use the gauge API and let it refresh itself as needed.
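
The timer arithmetic in point 2 can be sketched as follows. This is only an illustration of the relationship between tick period, animation interval, and the frame-rate ceiling; the function name is hypothetical and not a GUIX API.

```c
#include <assert.h>

/* Upper bound on the animation frame rate when one frame is produced every
 * `interval_ticks` timer ticks and each tick lasts `tick_ms` milliseconds. */
double max_animation_fps(double tick_ms, unsigned interval_ticks)
{
    assert(tick_ms > 0.0);
    assert(interval_ticks > 0u);
    return 1000.0 / (tick_ms * (double)interval_ticks);
}
```

For the default 20 ms tick and an interval of 1 tick this gives 50 FPS. If the timer were instead driven by a vertical sync interrupt on a panel refreshing at 60 Hz (an assumed figure, not one stated in this thread), the tick would be about 16.7 ms and the ceiling would rise to about 60 FPS.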

    Let me know if you need any more details, and if these suggestions help.

    Best Regards,

    Ken


  2. Ken Maxwell 706 Reputation points Microsoft Employee
    2021-07-15T12:46:49.543+00:00

    Can you tell me which GUIX source code version you are using? We have done some work on making it easy to use an external timer source, but that work is very recent so I need to know your GUIX release to give you the correct advice here.

    For images, if you use the "Compress Output" option, the image is saved in your resource file as an RLE-encoded pixelmap, which is not compatible with DMA2D. At runtime, the driver checks the image format, and if the format is not compatible with DMA2D the image is rendered using our generic software rendering.

    In addition, if the pixelmap is encoded with an alpha channel and 16 bpp format, this format is also not compatible with DMA2D (GUIX saves the alpha channel information in an auxiliary data chunk when configured for 16 bpp 565 format with alpha). The ST 565 format display driver looks to me like it supports images saved in ARGB 8:8:8:8 format and sending those through DMA2D, but I just tried it and GUIX Studio won't let me select that image format. This looks like a bug to me and I have entered a task to get that fixed.
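
The two incompatibility rules described above (RLE compression, and alpha combined with 16 bpp) can be summarized as a small decision function. This is a simplified model of the rules as stated in this thread only: the struct, enum, and function names are hypothetical stand-ins, not GUIX definitions, and a real driver may check additional conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a pixelmap; NOT a GUIX type. */
typedef struct {
    bool     rle_compressed;  /* saved with the "Compress Output" option */
    bool     has_alpha;       /* carries an alpha channel */
    unsigned bits_per_pixel;  /* e.g. 16 for 565, 32 for ARGB 8:8:8:8 */
} demo_pixelmap_info;

typedef enum { DEMO_RENDER_DMA2D, DEMO_RENDER_SOFTWARE } demo_render_path;

/* Pick the render path using only the two rules stated in the thread. */
demo_render_path demo_choose_render_path(const demo_pixelmap_info *info)
{
    assert(info != NULL);
    if (info->rle_compressed) {
        return DEMO_RENDER_SOFTWARE;  /* RLE data cannot go through DMA2D */
    }
    if (info->has_alpha && info->bits_per_pixel == 16u) {
        return DEMO_RENDER_SOFTWARE;  /* 16 bpp with separate alpha data */
    }
    return DEMO_RENDER_DMA2D;
}
```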

    If you have enough memory available and are not already doing so, you could try running in 24 bit xrgb format. It takes more memory for the display frame buffer(s), but it can be faster if there is a lot of alpha blending going on.
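
To get a feel for the memory cost, the frame buffer size can be estimated from the resolution and pixel size. The sketch below makes two assumptions that are not stated in this thread: a 480x272 panel, and 24 bit xrgb pixels stored in 32-bit words (4 bytes per pixel). Substitute the values for your own display.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one frame buffer for the given resolution and pixel size. */
size_t frame_buffer_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0u);
    return width * height * bytes_per_pixel;
}
```

Under those assumptions, one 565 buffer (2 bytes per pixel) is 261,120 bytes and one xrgb buffer is 522,240 bytes; a double-buffered configuration doubles both figures.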


  3. Ken Maxwell 706 Reputation points Microsoft Employee
    2021-09-10T14:26:50.417+00:00

    Hello @S.A.M ,

    The logic to rotate the gauge needle image can consume some CPU time. We've worked hard to optimize this logic, and actually in a head-to-head performance test with ST's own graphics package we came out on top. But there is a key feature here that needs to be enabled. In the function gx_circular_gauge_background_draw(), we try to do two things in parallel:

    1) Fire off DMA2D to draw the gauge background image and
    2) Calculate the rotated needle image.

    In order to do these two things in parallel, the display driver needs to provide a callback-assignment function. With it in place, when we trigger the DMA2D operation to draw the gauge background, we get a callback in which we can do other work while DMA2D is rendering the background image. In our example drivers for ST, we initialize the callback assignment function like this:

    display -> gx_display_driver_callback_assign = gx_display_wait_function_set_24xrgb;

    which is the key thing to enable this parallel execution. Is this being done in your display driver?
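
The overlap pattern can be illustrated with a host-side simulation. Everything in this sketch is a stand-in: none of the `demo_` names are GUIX or HAL APIs, and the "DMA transfer" is just a flag. It only shows why the work callback has to be registered before the background draw is started, otherwise the CPU work runs serially afterwards.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_wait_callback)(void);

static demo_wait_callback demo_wait_cb = NULL; /* work to run during a transfer */
static int demo_dma_busy = 0;      /* 1 while the simulated transfer is in flight */
static int demo_needle_ready = 0;  /* 1 once the needle has been "rotated" */
static int demo_overlapped = 0;    /* 1 if CPU work ran during the transfer */

/* Register work to overlap with the next transfer. */
void demo_callback_assign(demo_wait_callback cb) { demo_wait_cb = cb; }

/* CPU-side work: stand-in for computing the rotated needle image. */
void demo_rotate_needle(void)
{
    demo_needle_ready = 1;
    if (demo_dma_busy) {
        demo_overlapped = 1;
    }
}

/* Stand-in for starting the background draw and waiting for completion. */
void demo_background_draw(void)
{
    demo_dma_busy = 1;            /* transfer started */
    if (demo_wait_cb != NULL) {
        demo_wait_cb();           /* CPU works while the "hardware" is busy */
        demo_wait_cb = NULL;      /* one-shot callback */
    }
    demo_dma_busy = 0;            /* transfer finished */
}

/* Draw one gauge; returns 1 if the needle work overlapped the transfer. */
int demo_draw_gauge(int assign_before_draw)
{
    demo_needle_ready = 0;
    demo_overlapped = 0;
    demo_wait_cb = NULL;
    if (assign_before_draw) {
        demo_callback_assign(demo_rotate_needle);
        demo_background_draw();
    } else {
        demo_background_draw();
        demo_rotate_needle();     /* serial: runs after the transfer is done */
    }
    assert(demo_needle_ready);
    return demo_overlapped;
}
```

Calling `demo_draw_gauge(1)` reports that the needle work overlapped the transfer, while `demo_draw_gauge(0)` reports that it did not.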

    The only other thing we can think of is basic CPU configuration. Is the data cache enabled?
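
One way to verify the cache state at runtime is to inspect the Cortex-M7 Configuration and Control Register, where bit 16 (DC) enables the data cache and bit 17 (IC) enables the instruction cache. The helper names below are illustrative; on the target the value would typically be read from `SCB->CCR` via the CMSIS core header, which is left out here so the sketch compiles on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cortex-M7 CCR cache enable bits. */
#define DEMO_CCR_DC_BIT (1UL << 16) /* data cache enable */
#define DEMO_CCR_IC_BIT (1UL << 17) /* instruction cache enable */

/* True if the data cache enable bit is set in the given CCR value. */
bool demo_dcache_enabled(uint32_t ccr)
{
    return (ccr & DEMO_CCR_DC_BIT) != 0u;
}

/* True if the instruction cache enable bit is set in the given CCR value. */
bool demo_icache_enabled(uint32_t ccr)
{
    return (ccr & DEMO_CCR_IC_BIT) != 0u;
}
```

If either check fails after startup, the CMSIS functions `SCB_EnableICache()` and `SCB_EnableDCache()` are the usual way to turn the caches on.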

    Best Regards,

    Ken


  4. S.A.M 56 Reputation points
    2021-09-11T07:12:06.457+00:00

    Hello @Ken Maxwell,
    Thanks for your detailed answer.

    I do appreciate your work, and I'm actually surprised that the FPS is so low considering how rich your library is.
    I have tried many different embedded GUI libraries and platforms so far, and this FPS is a bit unusual.

    I understood your explanation.
    I'm actually using the display driver from the STM32F746G-DK IAR samples, modified a bit to port it to the STM32H750B-DK.
    And I just realized that the line of code you mentioned above has been commented out in the display driver, as you can see below:

        #if defined(GX_CHROMEART_ENABLE)
            /* override those functions that can be accelerated with DMA2D */
            display -> gx_display_driver_horizontal_line_draw = gx_chromeart_horizontal_line_draw;
            display -> gx_display_driver_vertical_line_draw   = gx_chromeart_vertical_line_draw;
            display -> gx_display_driver_canvas_copy          = gx_chromeart_canvas_copy;
            display -> gx_display_driver_pixelmap_draw        = gx_chromeart_pixelmap_draw;
            display -> gx_display_driver_pixelmap_blend       = gx_chromeart_pixelmap_blend;
            display -> gx_display_driver_8bit_glyph_draw      = gx_chromeart_glyph_8bit_draw;
            //display -> gx_display_driver_callback_assign    = gx_display_wait_function_set;

            //display -> gx_display_driver_canvas_blend       = _gx_display_driver_24xrgb_canvas_blend;
            //display -> gx_display_driver_4bit_glyph_draw    = _gx_display_driver_generic_glyph_4bit_draw;
            //display -> gx_display_driver_1bit_glyph_draw    = _gx_display_driver_24bpp_glyph_1bit_draw;
        #endif
    

    If I uncomment the line "//display -> gx_display_driver_callback_assign = gx_display_wait_function_set;", the gauge needle misbehaves: it flickers a lot, and most of the time it is not rendered at all, so you can't see it on the screen (only the needle is affected, not the background).

    Now the question is: to enable that feature (having a callback assignment function and doing the two tasks you mentioned in parallel), are there any further steps?
    As I said, it is not working properly, and it seems I need to modify some other places in the code as well.

    I'm trying to solve it myself as well, but here is the display driver I'm using, if you want to take a look: 131203-display-driver.txt


    As for the data cache, I didn't modify that part of the sample project; the functions that initialize the I and D caches are called from "common_hardware_code.c -> hardware_setup()".


    Thanks for your time once again,

    Kind Regards.
