Den is a DZone Zone Leader and has posted 460 posts at DZone.

Let's Talk ASM - String Concatenation

05.02.2013

Not a lot of developers today know Assembly, which is a good skill to have regardless of your professional line of work. Assembly teaches you to think on a much lower level, going beyond the abstraction layer provided by many high-level languages. Today we're going to look at a way to implement a string concatenation function.

Specifically, I want to use the following procedure for building the final result:

  1. Ask the user for input
  2. Append a crLf (carriage return + line feed) to the entered string
  3. Append the entered string to the existing composite string 
  4. Repeat from step 1 until the user enters a terminator character
  5. Display the composite string

If you have zero knowledge of Assembly, I would recommend starting here. In this example I am using Visual Studio 2012 to test the code, but an older version of the IDE should work as well. For convenience, I recommend downloading the basic framework code that is available for free from the author of the book Introduction to 80x86 Assembly Language and Computer Architecture.

First, you have the standard declarations:

.586
.MODEL FLAT

INCLUDE io.h            ; header file for input/output
cr      equ     0dh     ; carriage return character
Lf      equ     0ah     ; line feed

.STACK 4096

.DATA
prompt      BYTE   cr, Lf, "Original string? ",0
resTitle    BYTE   "Final Result",0
stringIn    BYTE   1024 DUP (?)
stringOut   BYTE   1024 DUP (?)
linefeed    BYTE   cr, Lf, 0        ; null-terminated so strcopy knows where to stop

Notice the reference to io.h. At this point you want a way to receive user input and display output through standard WinAPI channels, and io.h does just that. Some ASM experts might argue that relying on WinAPI hooks is not a good idea in a "pure" assembly program written for educational purposes, but here the focus is on the inner workings of a different function.

NOTE: The program is written for the scenario where string concatenation is its sole purpose. Once you get the hang of the execution flow, you can easily adapt it to a scenario where some of the registers are re-used.

Let's start by clearing the ECX and EDX registers:

.CODE
_MainProc PROC
		; clear the ECX and EDX registers because these will 
		; be used for length counters and sequential increments.
		xor			ECX, ECX
		xor			EDX, EDX

Once the user enters a string, I need to find its length so that the next string is written at the correct offset in memory. ECX will hold the length of the current string and EDX the running offset into the composite string. First, I need to get user input:

INPUT_DATA:
	; prompt the user to enter the string he ultimately
	; wants appended to the main string buffer.
	input	prompt, stringIn, 40	; read ASCII characters

	; make sure that the string doesn't start with the $ character
	; which would automatically mean that we need to terminate the 
	; reading process
	cmp		stringIn, '$'
	je		DONE
	lea		EAX, [stringOut + EDX]	; destination address
	push	EAX				; push the destination on the stack
	lea		EAX, [stringIn]		; source address
	push	EAX				; push the source on the stack
	call	strcopy			; call the string copy procedure

Once the string is entered, I can check whether the terminator character, "$", was used.

Because stringIn is declared as BYTE, the cmp instruction compares only the byte at the starting address of the entered string, so I can simply compare the entered data with a single character.

If the terminator is encountered, execution jumps to DONE, where the output is displayed and the program returns:

DONE:
		; output the new data.
		output		resTitle, stringOut
		mov			EAX, 0
		ret


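strcopy is an internal procedure that simply copies a null-terminated string from one memory address to another. The caller pushes the destination first and the source second, so the source ends up at [EBP+8] and the destination at [EBP+12]; ret 8 removes both parameters from the stack on return: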

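strcopy		PROC	NEAR32

			push	EBP				; save the caller's base pointer
			mov		EBP, ESP		; establish the stack frame

			push	EDI				; save the registers used below
			push	ESI

			pushf					; save the flags (cld changes the direction flag)

			mov		ESI, [EBP+8]	; source address (pushed last)
			mov		EDI, [EBP+12]	; destination address (pushed first)

			cld						; copy forward

		whileNoNull:
			cmp		BYTE PTR [ESI], 0	; reached the null terminator?
			je		endWhileNoNull
			movsb					; copy one byte, advance ESI and EDI
			jmp		whileNoNull

		endWhileNoNull:
			mov		BYTE PTR [EDI], 0	; terminate the destination string

			popf					; restore flags and registers
			pop		ESI
			pop		EDI
			pop		EBP
			ret		8				; return and discard both parameters
strcopy		ENDP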
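
For readers who find the register-level loop hard to follow, here is a minimal C sketch of the same copy logic. The function name strcopy_model and its return value are my own additions for illustration; the assembly procedure returns nothing.

```c
#include <stddef.h>

/* C model of the strcopy procedure: copy bytes from src to dst until the
   null terminator is reached, then terminate dst.
   The returned byte count (terminator excluded) is an illustrative
   addition; the assembly version does not return a value. */
size_t strcopy_model(char *dst, const char *src)
{
    size_t copied = 0;

    while (*src != 0) {      /* cmp BYTE PTR [ESI], 0 / je endWhileNoNull */
        *dst++ = *src++;     /* movsb: copy one byte, advance ESI and EDI */
        copied++;
    }
    *dst = 0;                /* mov BYTE PTR [EDI], 0 */

    return copied;
}
```

Passing a destination pointer that is offset into an existing buffer, as the assembly does with stringOut + EDX, turns the copy into an append.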

To make sure that the next string is properly appended, I need to find out the length of the previous one, for a correct memory address offset:

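; let's get the length of the current string - EAX still holds the
; address of stringIn (strcopy does not modify it), so move it to
; EDI, the register that scasb scans.
mov		EDI, EAX

; find the length of the string that was just entered
sub		ECX, ECX		; ECX = 0
sub		AL, AL			; AL = 0, the byte to search for
not		ECX				; ECX = 0FFFFFFFFh, the maximum count
cld						; scan forward
repne	scasb			; scan until the null terminator is found
not		ECX				; number of bytes scanned, including the null
dec		ECX				; exclude the null: ECX = string length
add		EDX, ECX		; advance the running offset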

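REPNE SCASB performs an iterative search for the null terminator within the string (you can read more about it here). It decrements ECX once for every byte it scans, including the terminator. Since ECX starts at 0FFFFFFFFh, inverting it with NOT afterwards gives the number of bytes scanned, and DEC removes the terminator from the count, leaving the string length.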
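
The NOT/DEC arithmetic around REPNE SCASB can be checked with a small C model that treats ECX as a 32-bit unsigned counter. This is only a sketch of the instruction's effect on ECX for this particular use; scasb_length_model is a hypothetical name, and the model ignores the flags that the real instruction sets.

```c
#include <stdint.h>

/* Model of the length computation: ECX is a 32-bit unsigned counter and
   EDI walks the string one byte at a time. */
uint32_t scasb_length_model(const char *s)
{
    uint32_t ecx = 0;             /* sub ECX, ECX */
    const char al = 0;            /* sub AL, AL  */
    const char *edi = s;          /* mov EDI, EAX */

    ecx = ~ecx;                   /* not ECX -> 0xFFFFFFFF */

    /* repne scasb: compare AL with [EDI], advance EDI, decrement ECX,
       and stop once a match is found or ECX reaches zero. */
    while (ecx != 0) {
        ecx--;
        if (*edi++ == al) {
            break;
        }
    }

    ecx = ~ecx;                   /* not ECX -> bytes scanned, null included */
    ecx--;                        /* dec ECX -> length without the null */
    return ecx;
}
```

For the string "abc" the loop scans four bytes, so ECX ends the scan at 0FFFFFFFBh; NOT turns that into 4 and DEC yields 3.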

; we need to append the linefeed (crLf) to the string so we apply
; the same string concatenation procedure for that sequence.
lea		EAX, [stringOut + EDX]  ; destination address
push	EAX              	; first parameter
lea		EAX, [linefeed]    	; source
push	EAX              	; second parameter
call	strcopy          	; call string copy procedure
mov		EDI, EAX

; we know that the crLf characters are 2 entities, therefore
; increment the overall counter by 2.
add		EDX, 2

; ask for more input because no terminator character was used.
jmp		INPUT_DATA

Once the input string is processed, I append the crLf sequence and increment EDX by 2 to keep the offset correct, after which execution jumps back to the point where the user enters the next string.
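
To see the whole flow in one place, here is a hedged C sketch of the loop, with the user's entries passed in as an array instead of being read through io.h. The name concat_lines, its parameters, and the capacity check are assumptions made for this illustration; the assembly program above performs no bounds check on stringOut.

```c
#include <stddef.h>
#include <string.h>

/* Appends each input followed by CR LF to out, stopping at the first
   input that starts with '$'. Returns the final offset (the role EDX
   plays in the assembly). The capacity check is an addition that the
   original program does not have. */
size_t concat_lines(const char *const inputs[], size_t count,
                    char *out, size_t cap)
{
    size_t edx = 0;                       /* running offset into out */

    if (cap == 0) {
        return 0;
    }
    out[0] = 0;

    for (size_t i = 0; i < count; i++) {
        const char *in = inputs[i];

        if (in[0] == '$') {               /* cmp stringIn, '$' / je DONE */
            break;
        }

        size_t len = strlen(in);          /* the repne scasb step */
        if (edx + len + 2 + 1 > cap) {    /* assumed safety check */
            break;
        }

        memcpy(out + edx, in, len + 1);   /* strcopy to stringOut + EDX */
        edx += len;                       /* add EDX, ECX */

        memcpy(out + edx, "\r\n", 3);     /* strcopy of linefeed, null included */
        edx += 2;                         /* add EDX, 2 */
    }

    return edx;
}
```

Each copy writes a null terminator that the next copy overwrites, which is why the composite string is always properly terminated when the loop ends.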