  • skhan4059

64-Bit Assembler Lab

Previously we used 6502 assembler to practice how assembly language works:


This time we step it up and use aarch64 assembly to write a simple program. 64-bit assembler is much more complex, and its instructions are quite different from the 6502's, but with the help of the internet I think we should be able to make everything work. A link to the lab is here:



We start with a sample package called hello on the dedicated server for this course. After copying it and unpacking it with tar, the directory tree should look like this:


spo600
└── examples
    └── hello                     # "hello world" example programs
        ├── assembler
        │   ├── aarch64           # aarch64 gas assembly language version
        │   │   ├── hello.s
        │   │   └── Makefile
        │   ├── Makefile
        │   └── x86_64            # x86_64 assembly language versions
        │       ├── hello-gas.s   # ... gas syntax
        │       ├── hello-nasm.s  # ... nasm syntax
        │       └── Makefile
        └── c                     # Portable C versions
            ├── hello2.c          # ... using write()
            ├── hello3.c          # ... using syscall()
            ├── hello.c           # ... using printf()
            └── Makefile

The objective is to write a program that loops and prints a single message 10 times. We are given the following template for the loop (it ships with max = 30; we change that to 10 in our own version):


.text
.global _start

min = 0                          /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 30                         /* loop exits when the index hits this number (loop condition is i<max) */

_start:

    mov     x19, min

loop:

    /* ... body of the loop ... do something useful here ... */

    add     x19, x19, 1
    cmp     x19, max
    b.ne    loop

    mov     x0, 0           /* status -> 0 */
    mov     x8, 93          /* exit is syscall #93 */
    svc     0               /* invoke syscall */

The important part goes in the body of the loop, which in this case is the code that writes the word "Loop" 10 times. I looked through the template file hello.s in the aarch64 directory and combined its write call with the loop template, which resulted in:



.text
.global _start

min = 0                          /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 10                         /* loop exits when the index hits this number (loop condition is i<max) */

_start:
    sub sp, sp, 16
    mov     x19, min

loop:
        mov     x0, 1           /* file descriptor: 1 is stdout */
        adr     x1, msg         /* message location (memory address) */
        mov     x2, len         /* message length (bytes) */

        mov     x8, 64          /* write is syscall #64 */
        svc     0               /* invoke syscall */

    add     x19, x19, 1
    cmp     x19, max
    b.ne    loop

    add sp, sp, 16

    mov     x0, 0           /* status -> 0 */
    mov     x8, 93          /* exit is syscall #93 */
    svc     0               /* invoke syscall */

.data
msg:    .ascii      "Loop\n"

len=    . - msg

which gives us the result of:

Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop

Just as expected.
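
For comparison, here is a minimal C sketch of the same loop, in the spirit of the package's hello2.c, which uses write(). The helper name write_loop and its parameters are made up for illustration; they are not part of the lab or the sample package.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `msg` (len bytes) to `fd`, `count` times.
   Returns the total number of bytes written, or -1 if a write() fails
   or writes fewer bytes than requested. */
long write_loop(int fd, const char *msg, size_t len, int count)
{
    long total = 0;

    assert(msg != NULL && count >= 0);
    for (int i = 0; i < count; i++) {       /* i plays the role of x19 */
        ssize_t n = write(fd, msg, len);    /* the write syscall (#64 on aarch64 Linux) */
        if (n < 0 || (size_t)n != len)
            return -1;
        total += n;
    }
    return total;
}
```

Calling write_loop(1, "Loop\n", 5, 10) would send the message to stdout ten times. One difference from the assembly: the C for loop checks its condition before the first pass, while the cmp / b.ne pair checks it after the body has already run once.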


The next part is quite a bit more difficult: it requires us to print the loop index, from 0 through 9, on each line. We need to convert the value in register x19 to its ASCII character equivalent. To do that we add 48 to it, because 48 is the ASCII code for the character '0', so the digits 0 to 9 map to the codes 48 to 57. After adding 48 we still need a place to store the result, because the write syscall takes a memory address rather than a register value, so we save it on the conveniently available stack.
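
The same two steps (add 48, then write one byte from memory) can be sketched in C. This is only an illustration; the helper name write_digit is hypothetical and does not come from the lab.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: print one decimal digit (0..9) to `fd`.
   Returns 1 (the number of bytes written) on success, or -1 on error. */
int write_digit(int fd, unsigned digit)
{
    assert(digit <= 9);                /* only single digits map this way */
    char c = (char)(digit + 48);       /* 48 is the ASCII code of '0' */
    return (int)write(fd, &c, 1);      /* write() needs an address, so c must live in memory */
}
```

Taking the address of c forces the compiler to keep it in memory, typically on the stack, which is the job the strb to [sp] does in the assembly version.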


One thing to note here is that, compared to 6502 assembly, this is more complex but also a lot more intuitive. We no longer need to put each character at a specific address to show it on a screen; we can just write it out to standard output using system calls.


Here is the code that we get:



.text
.global _start

min = 0                          /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 10                         /* loop exits when the index hits this number (loop condition is i<max) */

_start:
    sub sp, sp, 16
    mov     x19, min

loop:
        add x13, x19, 48 /* loop variable + 48 goes into register 13 (or whichever one you choose) */
        strb w13, [sp] /* register 13 stored into memory */

        mov     x0, 1           /* file descriptor: 1 is stdout */
        adr     x1, msg         /* message location (memory address) */
        mov     x2, len         /* message length (bytes) */

        mov     x8, 64          /* write is syscall #64 */
        svc     0               /* invoke syscall */

        mov x0, 1 /* printing code */
        mov x1, sp
        mov x2, 1
        mov x8, 64
        svc 0

        mov x13, 10 /* stores ascii value of newline to x13 */
        strb w13, [sp] /* put value on stack */
        mov x0, 1
        mov x1, sp
        mov x2, 1
        mov x8, 64
        svc 0

    add     x19, x19, 1
    cmp     x19, max
    b.ne    loop

    add sp, sp, 16

    mov     x0, 0           /* status -> 0 */
    mov     x8, 93          /* exit is syscall #93 */
    svc     0               /* invoke syscall */

.data
msg:    .ascii      "Loop: "

len=    . - msg
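
The listing above makes three write calls per line: the prefix, the digit, and the newline. As an alternative design, and not what the lab code does, the whole line could be assembled in one buffer first and written with a single call. A rough C sketch, with the hypothetical helper name build_line:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build "Loop: <digit>\n" in `out`, which must have
   room for at least 8 bytes. Returns the number of bytes stored.
   No terminating NUL is added, since write() does not need one. */
size_t build_line(char *out, unsigned index)
{
    static const char prefix[] = "Loop: ";
    size_t n = sizeof prefix - 1;      /* prefix length without the NUL */

    assert(out != NULL && index <= 9);
    memcpy(out, prefix, n);
    out[n++] = (char)(index + 48);     /* digit as an ASCII character */
    out[n++] = '\n';                   /* ASCII code 10 */
    return n;
}
```

The returned length would then be passed straight to write(). The trade-off is fewer system calls in exchange for a slightly larger buffer.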

Basically, we use the x13 register to hold the loop index plus 48 and save this value on the stack. This leads to another problem: we now need to print a newline after the number. I did this by following the same steps used for printing the number, except that in this case we store the constant value 10 (the ASCII code for newline) on the stack. This results in:



The next part has us do the same thing, but we need to modify the program so that it counts up to 30 (the loop exits when the index reaches max, so the last number printed is 29). This proved to be quite the hurdle, as our current code is only equipped to handle single-digit numbers. At this point I got really stuck, so I decided to consult the internet, where I found a very useful thread:



The idea is to use integer division (udiv) to get the tens digit by itself, then use the msub instruction, which multiplies the tens digit by ten and subtracts the product from the index, leaving the ones digit. We then print each digit separately. The resulting code is:



.text
.global _start

min = 0                          /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 30                         /* loop exits when the index hits this number (loop condition is i<max) */

_start:
    sub sp, sp, 16

    mov     x19, min
    mov     x20, 10

loop:
        mov     x0, 1           /* file descriptor: 1 is stdout */
        adr     x1, msg         /* message location (memory address) */
        mov     x2, len         /* message length (bytes) */

        mov     x8, 64          /* write is syscall #64 */
        svc     0               /* invoke syscall */

        cmp x19, 9
        b.gt nine_hundred



zero_nine:


        add x13, x19, 48 /* loop variable + 48 goes into register 13 (or whichever one you choose) */
        strb w13, [sp] /* register 13 stored into memory */

        mov x0, 1 /* printing code */
        mov x1, sp
        mov x2, 1
        mov x8, 64
        svc 0
        b newline

nine_hundred:

        udiv x21, x19, x20 /* x21 = index / 10 (tens digit) */
        msub x22, x20, x21, x19 /* x22 = index - (10 * x21) (ones digit) */

        add x13, x21, 48
        strb w13, [sp]
        mov x0, 1
        mov x1, sp
        mov x2, 1
        mov x8, 64
        svc 0

        add x13, x22, 48
        strb w13, [sp]
        mov x0, 1 /* set the file descriptor again: the previous svc left its return value in x0 */
        mov x1, sp
        mov x2, 1
        mov x8, 64
        svc 0




newline:

        mov x13, 10
        strb w13, [sp]
        mov x0, 1
        mov x1, sp
        mov x2, 1
        mov x8, 64
        svc 0

        add     x19, x19, 1
        cmp     x19, max
        b.ne    loop

        add sp, sp, 16

        mov     x0, 0           /* status -> 0 */
        mov     x8, 93          /* exit is syscall #93 */
        svc     0               /* invoke syscall */

.data
msg:    .ascii      "Loop: "

len=    . - msg
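
The udiv / msub pair in the listing above corresponds to a divide followed by a multiply-and-subtract. Here is a small C sketch of that arithmetic; the helper name split_two_digits is hypothetical and used only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: split n (0..99) into its tens and ones digits. */
void split_two_digits(unsigned n, unsigned *tens, unsigned *ones)
{
    assert(n <= 99 && tens != 0 && ones != 0);
    *tens = n / 10;               /* like udiv x21, x19, x20 with x20 = 10 */
    *ones = n - 10 * (*tens);     /* like msub x22, x20, x21, x19 */
}
```

For example, 27 / 10 is 2 in integer division, and 27 - 10 * 2 leaves 7, so the two digits printed are '2' and '7'.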

Register x13 holds each byte that gets stored on the stack for printing, x21 holds the tens digit, and x22 holds the ones digit; as in the single-digit code, we print each digit separately. The drawback is that this code cannot handle numbers of 100 or more. In theory we could repeat the divide-and-remainder step in a loop to print every digit of a larger number, but unfortunately that is over my head for now, so I won't attempt it here, though it sounds like a good challenge for when I have a better understanding of assembly. The result is here:


And everything works as intended. This lab was fairly difficult to understand at first: aarch64 is so much more complex than 6502 that it took me a while to adjust. Even so, I must say I much prefer it over 6502, since it feels more intuitive once you get used to it, and there seems to be a lot more you can do with a more diverse instruction set and more registers to play with. Well, that's it for this lab. I hope you enjoyed reading this.
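
As a postscript on the limitation with numbers of 100 and up mentioned earlier: the divide-and-remainder step can be repeated until nothing is left, which yields the digits from lowest to highest. Below is a rough C sketch of that idea, not an assembly solution and not part of the lab; the helper name format_unsigned is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store the decimal digits of n in `out` (no NUL added).
   `out` must have room for at least 20 bytes, enough for any 64-bit value.
   Returns the number of digits stored. */
size_t format_unsigned(unsigned long long n, char *out)
{
    char tmp[20];
    size_t count = 0;

    assert(out != NULL);
    do {
        tmp[count++] = (char)('0' + n % 10);   /* peel off the lowest digit */
        n /= 10;                               /* drop it and continue */
    } while (n != 0);

    for (size_t i = 0; i < count; i++)         /* digits came out reversed */
        out[i] = tmp[count - 1 - i];
    return count;
}
```

In assembly, the same loop would presumably reuse the udiv and msub pair from the two-digit version, storing each digit into a buffer on the stack before a single write.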
