  • skhan4059

Running Final Tests with the Auto-vectorizer

Here is a link to the last blog:



This time we are going to look at exactly where the autovectorizer made its changes and why. Last time we built the auto-vectorized version of the software, which worked successfully, so now I will check what actually changed within the .c files themselves.

One good sign shown above is that even though the number of vectorized loops seems fairly low, they are all in files that seem to do the actual decoding and encoding of the data. Hopefully they are the most impactful loops.


To find where the sve2 instructions are being used, I changed the CFLAGS through the configure script so that the compiler reports where it vectorized and why. I then copied that output to a log file.

The command I used was:


./configure CFLAGS="-g -O3 -march=armv8-a+sve2 -fopt-info-vec-all"

The -fopt-info-vec-all option makes the compiler show its vectorization reasoning. After setting that, I used make to build and then copied the output to a make.log file.
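As a rough sketch of that step, the build output can be captured and filtered in one go. This assumes GCC prints its -fopt-info reports on stderr (the default when no output file is given) and marks successful cases with the phrase "loop vectorized"; the exact wording can differ between GCC versions. The helper names build_with_log and vectorized_sites are made up for illustration.

```shell
# build_with_log (hypothetical helper): run make and keep both stdout and
# stderr in make.log, since the -fopt-info reports arrive on stderr.
build_with_log() {
    make 2>&1 | tee make.log
}

# vectorized_sites (hypothetical helper): list the unique "file:line"
# locations that a log reports as vectorized.
# Assumes report lines look like "file.c:LINE:COL: ... loop vectorized ...".
vectorized_sites() {
    grep 'loop vectorized' "$1" | cut -d: -f1,2 | sort -u
}
```

Running build_with_log and then vectorized_sites make.log would give a short list of file and line pairs to look up in the source.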

As shown above, the vectorized loops are primarily in the important files: lzma_decoder.c and lzma_encoder.c. To figure out whether the important loops have been vectorized, I looked at the functions that contain those loops, checked how often they are called, and read what each function is meant to do. This may not be a perfect method, but I think it should give a good idea of how much the vectorization is affecting the software.



find . -type f -name "*.c" | while read -r F; do grep -H "function_name" "$F"; done

This little command lets me check every .c file for any given string (in this case a function name, which replaces function_name in the match string). We are looking for the functions that contain the vectorized loops, which, again, we found by looking at the line numbers of the .c files in the make.log that we saved. As a result, we find that the 5 functions with vectorized for loops are:


length_encoder_reset

lzma_lzma_encoder_reset

lzma_decoder_reset

fill_align_prices

is_format_lzma


Using the command above I searched for all instances of each of these functions in the entire build. I was planning to search for every function that called them, but after a little searching I found that very few functions call any of them, and the ones that do only call them for very menial tasks. Reading the function descriptions showed that in general they play a very small part within the xz software. So although my intention was to show the autovectorizer in a positive light, this shows that in this instance it did not have the desired outcome.
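The caller search described above could be repeated for all five functions with a small loop. This is only a sketch: count_mentions is a hypothetical helper, and it assumes a grep that supports -r, -w and --include (GNU grep does).

```shell
# count_mentions (hypothetical helper): for each function name given,
# print how many lines in the .c files under DIR mention it as a whole word.
# The count includes the definition itself, so a very low number suggests
# the function has few call sites.
count_mentions() {
    dir=$1
    shift
    for fn in "$@"; do
        # -w avoids matching longer names that merely contain this one
        n=$(grep -rw --include='*.c' -- "$fn" "$dir" | wc -l | tr -d ' ')
        echo "$fn $n"
    done
}
```

From the top of the source tree this could be called as count_mentions . length_encoder_reset lzma_lzma_encoder_reset lzma_decoder_reset fill_align_prices is_format_lzma. A low count is only a hint about how often a function runs, so reading the callers is still needed.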


So, the result of our effort was not as impactful as I thought it would be. There are several possible reasons for this. The autovectorizer itself may not be as effective as it could be, and some development may be required there. Alternatively, this software may simply not be a good fit for the autovectorizer, or rather, it may have been written in a way that does not lend itself to these changes.

Another lesson here is that even though our intention was to improve the software's optimization, we ended up showing otherwise, and from this we now know several things. We know that the autovectorizer could be improved for this type of program. We know that programs could be written with the autovectorizer in mind, which could increase efficiency. Finally, we know that this software has very little need for the autovectorizer: there is no need for xz to use this method, as the effects are minimal and it could make the software unstable, since armv9 isn't even out yet.

Maybe in the future this will be a more viable option, when everything is tested and all the kinks are worked out, but for now I see no need to use the autovectorizer here. I do want to mention that some of the other blogs in my class have seen massive improvements using the autovectorizer, which shows significant promise for the future of sve2, so I do believe that this holds massive potential and I look forward to the possibilities. This is the last part of the analysis of autovectorization using gcc. I thank everyone for reading and hope you learned something from it, as I have learned a lot.
