Unit 8.1 -- Concatenate Two Character Strings
PROG
Concatenate two character strings
PED
Illustrate character strings and a common application
CONCEPTS
A character string consists of
-Maximum Length
-Current Length
-Bytes
Byte = one memory location, contains one 'character'
A character string is passed as a pointer to its bytes.
Thus, to get the Actual Length,
assuming RegA contains the "address of the character string"
we write
LR RegB,RegA
A RegB,=f'-4'
L RegC,0(0,RegB)
where RegB is a temporary register and RegC will contain the
Actual Length.
To obtain the Maximum Length, we have something very
similar.
LR RegB,RegA
A RegB,=f'-8'
L RegC,0(0,RegB)
To define a character string we write:
DC A(STRINGnL)
DC A(STRINGnA)
STRINGn DC C'chars'
STRINGnA EQU *-STRINGn
DC C' ' optional padding
STRINGnL EQU *-STRINGn
If we are defining a character string constant, whose
actual length = maximum length we write:
DC A(STRINGnL)
DC A(STRINGnL)
STRINGn DC C'chars'
STRINGnL EQU *-STRINGn
Comparing characters is done with the CLM instruction.
To compare two characters, we write
CLM regA,1,mem-loc
Character Strings
We are going to learn how to make a set of routines to
handle character strings. These routines handle
concatenation and comparison of strings. Additional routines
that you might write and include in your character string
package include one that searches a character string for
another. Some of these features are built into higher level
languages.
A character string has an "ActualLength." That is the count
of the number of characters in the string. The string 'abc'
would have an ActualLength of three.
A string also has a "MaximumLength." That is how much room
there is in the character string variable. For example, the
string 'abc' may be stored in the variable String1. String1
would have an ActualLength of 3. However, String1 may have a
MaximumLength of 6. That means we could add three more
characters to String1 before we would run out of room. We
need this so our assembler routines don't try to add too
many characters and thus clobber whatever comes next in
storage.
A character string package should contain many useful
things. We will discuss two of them in this unit. The rest
are left as an exercise to the proverbial interested reader.
The first is a routine to add one character string to
another. For example, assume we have a string, string1,
containing "abc" whose MaximumLength was 6. Now, assume
there is a second string containing "de". If we concatenate
the second string onto the first string, then string1 would
contain "abcde".
However, if the second string were "defg", then there
wouldn't be enough room for all four characters of the
second string. The total number of characters in "abcdefg"
is seven, but string1 only has room for six! Thus, the
result in string1 would be truncated to "abcdef".
Our second routine compares two character strings to
determine if the first is less than, equal to, or greater
than the other. We all know what "<" "=" and ">" are for
numbers. We must define what this means for character
strings. We define this to be the same as how we look up
words in the dictionary or names in the phone book. This is
sometimes called "lexicographic" ordering.
Easy examples are:
string1   string2     result (string1 vs. string2)
abc       def         <
cat       cot         <
dog       dagostino   >
In the event that the first string is the same as the
beginning of the second string, or vice versa, the longer
string is considered larger than the shorter one.
Examples:
string1   string2       result
cat       catastrophe   <
zulx      zul           >
Only if all the characters match and the ActualLengths are
the same are the two character strings considered equal.
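These rules can be written compactly in C. The sketch below is not part of the course code, and compare_strings is a hypothetical name. It compares the common part of the two strings with memcmp, which compares bytes as unsigned values, and looks at the lengths only when that common part matches. It assumes an ASCII host; on the 370 the bytes are EBCDIC, where the letters of one case are also in alphabetical order, so the examples above come out the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare a (alen characters) with b (blen characters) using the
   dictionary rules above. Returns -1 if a < b, 0 if equal, 1 if a > b. */
static int compare_strings(const char *a, size_t alen,
                           const char *b, size_t blen) {
    assert(a != NULL && b != NULL);
    size_t n = alen < blen ? alen : blen;  /* length of the common part */
    int c = memcmp(a, b, n);               /* unsigned byte-by-byte compare */
    if (c != 0)
        return c < 0 ? -1 : 1;             /* first differing character decides */
    if (alen == blen)
        return 0;                          /* same characters, same length */
    return alen < blen ? -1 : 1;           /* a prefix is smaller than the longer string */
}
```

For example, compare_strings("dog", 3, "dagostino", 9) returns 1 and compare_strings("cat", 3, "catastrophe", 11) returns -1.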
In Assembler, we define a character string as a sequence of
characters. We let the "address of the character string" be
the address of the first byte of the characters. The actual
length is stored in the fullword (four-byte integer) just
before this. The maximum length is stored in the fullword
before that.
Thus, to get the Actual Length, assuming Ra contains the
"address of the character string," we write
LR Rb,Ra
A Rb,=f'-4'
L Rc,0(0,Rb)
where Rb is a temporary register and Rc will contain the
Actual Length.
To obtain the Maximum Length, we have something very
similar.
LR Rb,Ra
A Rb,=f'-8'
L Rc,0(0,Rb)
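To make the negative offsets concrete, here is a small C sketch (not the course's code) that models the layout: the maximum length in the fullword at p-8, the actual length in the fullword at p-4, and the characters starting at p. The helper names are made up for illustration. The words are kept in the host's byte order rather than the 370's big-endian order, which is harmless as long as the same code writes and reads them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* p plays the role of Ra: the address of the first character. */

/* Actual length: the fullword 4 bytes before the characters,
   like LR Rb,Ra / A Rb,=F'-4' / L Rc,0(0,Rb). */
static int32_t actual_length(const unsigned char *p) {
    int32_t len;
    assert(p != NULL);
    memcpy(&len, p - 4, sizeof len);   /* memcpy avoids alignment issues */
    return len;
}

/* Maximum length: the fullword 8 bytes before the characters. */
static int32_t maximum_length(const unsigned char *p) {
    int32_t len;
    assert(p != NULL);
    memcpy(&len, p - 8, sizeof len);
    return len;
}
```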
To define a character string we write:
DC A(STRINGnL)
DC A(STRINGnA)
STRINGn DC C'chars'
STRINGnA EQU *-STRINGn
DC C' ' optional padding
STRINGnL EQU *-STRINGn
If we are defining a character string constant, whose
actual length = maximum length we write:
DC A(STRINGnL)
DC A(STRINGnL)
STRINGn DC C'chars'
STRINGnL EQU *-STRINGn
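Both templates can be modelled in C as a buffer of 8 + MaximumLength bytes. The define_string function below is a hypothetical sketch, not course code: it writes the maximum length, then the actual length, then the characters followed by blank padding, and it returns the address of the first character, which is what the string routines receive. Passing a maximum length equal to the length of the text gives the constant form, where STRINGnL appears in both words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Lay out a string the way the DC template does:
     DC A(STRINGnL)       -> maximum length at offset 0
     DC A(STRINGnA)       -> actual length at offset 4
     STRINGn DC C'chars'  -> characters, then blank padding
   buf must hold 8 + maxlen bytes. Returns buf + 8, the
   "address of the character string". */
static unsigned char *define_string(unsigned char *buf, int32_t maxlen,
                                    const char *chars) {
    int32_t actual = (int32_t)strlen(chars);
    assert(actual <= maxlen);                     /* text must fit */
    memcpy(buf, &maxlen, sizeof maxlen);
    memcpy(buf + 4, &actual, sizeof actual);
    memcpy(buf + 8, chars, (size_t)actual);
    memset(buf + 8 + actual, ' ', (size_t)(maxlen - actual));  /* padding */
    return buf + 8;
}
```

For example, define_string(buf, 6, "ABC") with a 14-byte buf gives a string with room for six characters, three of them in use.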
There are several things we have to know about how
characters are handled in Assembler. Characters are the
first new type of object we see in this class; so far this
semester we have spent all our time learning different
things we can do with integers.
We allocate them with
DC C'chars'
where chars is the text to put in them.
If we want the assembler to pad the characters on the right
to a certain length, we can write
DC CLn'chars'
This will allocate room for n characters. The first m bytes
will be filled with the text in 'chars', where m is the
number of characters in 'chars', and the remaining n-m bytes
will be filled with blanks.
Thus
A DC CL5'abc'
will allocate 5 bytes (or characters) at A. The first
position will be 'a' The second will be 'b' The third
will be 'c' The fourth position will be blank and the fifth
position will be blank.
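A C model of DC CLn'chars' may help. The dc_cl function below is hypothetical: it copies the text, fills the rest of the n bytes with blanks, and, like the assembler, keeps only the leftmost n characters if the text is too long. As in assembler storage, no terminating null byte is added.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill dest[0..n-1] the way DC CLn'chars' fills storage. */
static void dc_cl(char *dest, size_t n, const char *chars) {
    assert(dest != NULL && chars != NULL);
    size_t m = strlen(chars);
    if (m > n)
        m = n;                      /* too long: keep the leftmost n characters */
    memcpy(dest, chars, m);         /* the text itself */
    memset(dest + m, ' ', n - m);   /* pad the rest on the right with blanks */
}
```

With n = 5 and 'abc', dest holds 'abc' followed by two blanks, matching the A DC CL5'abc' example.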
We can also write
DC 10C'char'
where 'char' is a single character, often a blank, to get 10
copies of that character.
We can view a character string as an array of bytes. Each
character takes ONLY ONE byte. Thus, when we go through a
character string sequentially, and are using register
tricks, we increment the register by ONE.
To load a character, we use the IC instruction. (IC stands
for "insert character.") If RegA is the register in which we
would like to get the character, and RegB is a register
containing the address of the character to be loaded, the
form looks as below.
IC RegA,0(0,RegB)
or
IC RegA,mem
The IC instruction loads a single byte and puts it in the
rightmost eight bits of the register. The other 24 bits
remain unchanged. In this unit that doesn't matter, because
we only compare and store that rightmost byte, so we can
view IC as a load character instruction.
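The effect of IC on a register can be written with bit operations. In this C sketch, a uint32_t stands in for a general register (bit 0 is the leftmost bit, so the rightmost byte is bits 24-31), and insert_character is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of IC: replace the rightmost byte of the register with the
   byte at addr; the leftmost 24 bits stay unchanged. */
static uint32_t insert_character(uint32_t reg, const unsigned char *addr) {
    assert(addr != NULL);
    return (reg & 0xFFFFFF00u) | (uint32_t)*addr;
}
```

With 0x12345678 in the register and the EBCDIC byte X'C1' ('A') in memory, the register becomes 0x123456C1.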
Storing a character is done with the STC instruction. STC
stores the rightmost eight bits of the register into one
byte of memory; the register itself is unchanged. This
matches IC, so we can view IC and STC as a pair that load
and store a character, just as L and ST do for integers.
To store a character, we write
STC RegA,0(0,RegB)
or
STC RegA,mem
Again, RegA is the register containing the character, and
RegB is a register containing the address of the byte to
store into.
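STC is the mirror image of IC. In this C sketch (store_character is a hypothetical name), only the rightmost byte of the register reaches memory, and the neighbouring bytes are untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of STC: store the rightmost byte of the register at addr. */
static void store_character(uint32_t reg, unsigned char *addr) {
    assert(addr != NULL);
    *addr = (unsigned char)(reg & 0xFFu);   /* the other 24 bits are ignored */
}
```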
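Comparing characters is done with the CLM instruction.
To compare two characters, we write
CLM RegA,1,mem-loc
mem-loc is either 0(RegB), where RegB contains the address
of the second character to compare, or a label.
RegA is a register containing the character to compare in
its rightmost eight bits. It was probably loaded with IC.
The 1 in the middle is a mask telling CLM to use only that
rightmost byte. CLM sets the condition code, so it is
followed by a conditional branch such as BL, BE, or BH.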
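CLM does not produce a value; it sets the condition code, which a following BE, BL, or BH then tests. The C sketch below (a hypothetical function, not course code) models CLM with mask 1 by returning the condition code: 0 if the characters are equal, 1 if the register's character is low, 2 if it is high. The comparison is logical (unsigned), and the other 24 bits of the register are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of CLM reg,1,mem: compare the rightmost register byte with
   one byte of storage. Returns the resulting condition code. */
static int clm_mask1(uint32_t reg, const unsigned char *mem) {
    assert(mem != NULL);
    unsigned char r = (unsigned char)(reg & 0xFFu);
    if (r == *mem)
        return 0;                /* equal: BE branches */
    return r < *mem ? 1 : 2;     /* low: BL branches; high: BH branches */
}
```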
ASIDE:
You do have the option of comparing more than one character
at a time by changing the middle number (the mask). For
example, one could compare the first and third characters in
the register to two consecutive characters in memory. (There
are 32 bits in a register and only eight bits in a
character, so a register can hold four characters.) However,
we will not need this feature for our string routines, so
let's ignore it.
END OF ASIDE:
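For the curious, here is a C model of the general mask (again a hypothetical function, not course code). Each of the four mask bits, left to right, selects one byte of the register, left to right; the selected bytes are compared, in order, with consecutive bytes of storage, and the first mismatch sets the condition code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of CLM reg,mask,mem with a 4-bit mask.
   Returns the condition code: 0 equal, 1 low, 2 high. */
static int clm(uint32_t reg, unsigned mask, const unsigned char *mem) {
    assert(mask <= 0xFu);
    int k = 0;                                    /* next storage byte */
    for (int byte = 0; byte < 4; byte++) {
        if (mask & (0x8u >> byte)) {              /* is this register byte selected? */
            unsigned char r = (unsigned char)(reg >> (24 - 8 * byte));
            if (r != mem[k])
                return r < mem[k] ? 1 : 2;
            k++;
        }
    }
    return 0;                                     /* all selected bytes matched */
}
```

With mask 10 (binary 1010), the first and third bytes of the register are compared with two consecutive bytes in memory.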
Two examples are provided to show how character strings are
used. The first concatenates two character strings. It
appends the second string onto the first. If the first
string were 'abc' and the second were 'def', then after
concatenation the first string would be 'abcdef.' The
second string would remain unchanged.
The second example compares two character strings and
returns -1 if the first is less than the second, 0 if they
are equal, and 1 if the first is greater than the second.
In concatenation, we have to do the following:
a) determine where to put the characters from string 2.
This is done by adding the actual length of string 1 to
its beginning address.
b) keep copying the characters from string2 to string1.
We want to stop if either
i. we run out of characters to copy, i.e., we
exceed the length of string 2
ii. we run out of room in the first string, i.e.,
the position in string1 would exceed the maximum
length.
In the PASCAL code, we use i as the position in the first
string. By setting it equal to "string1.ActualLength+1," we
are making it point to the empty space after the last
character. j is the position in the second string. We
start copying from the first character of this string.
Notice that the "while" loop stops as soon as either the
first string is full (the condition
"i<=string1.maximumlength" fails) or we have copied all the
characters in the second string (the condition
"j<=string2.actuallength" fails).
"string1.chars[i]:=string2.chars[j]" moves a character from
the appropriate position in the second string to the
position pointed to by "i" in the first string. The next
two statements bump the two pointers.
Lastly, we update the count of characters in the first
string. The maximum length remains the same, since it simply
reflects the memory allocated for the string at compile
time.
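Here is the Pascal logic above rendered as a C sketch that works directly on the assembler layout (lengths at p-8 and p-4). It is a model for study, not a translation of the actual listing; the function names are hypothetical, and 1-based i and j are kept to match the Pascal. The new actual length is i-1, because i ends up pointing just past the last character in string1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read or write a fullword; memcpy avoids alignment issues. */
static int32_t get_word(const unsigned char *a) {
    int32_t v;
    memcpy(&v, a, sizeof v);
    return v;
}
static void put_word(unsigned char *a, int32_t v) { memcpy(a, &v, sizeof v); }

/* Append string2 to string1, stopping when string1 is full or
   string2 is used up. s1 and s2 point at the first characters. */
static void concatenate(unsigned char *s1, const unsigned char *s2) {
    assert(s1 != NULL && s2 != NULL);
    int32_t max1 = get_word(s1 - 8);       /* string1.MaximumLength */
    int32_t len2 = get_word(s2 - 4);       /* string2.ActualLength */
    int32_t i = get_word(s1 - 4) + 1;      /* i := string1.ActualLength+1 */
    int32_t j = 1;                         /* j := 1 */
    unsigned char *to = s1 + (i - 1);      /* address of string1.chars[i] */
    const unsigned char *from = s2;        /* address of string2.chars[j] */
    while (i <= max1 && j <= len2) {
        *to = *from;                       /* IC from the source, STC to the target */
        i++;
        j++;
        to++;                              /* one byte per character */
        from++;
    }
    put_word(s1 - 4, i - 1);               /* new string1.ActualLength */
}
```

With string1 = 'ABC' (maximum length 6) and string2 = 'DEFG', the call leaves 'ABCDEF' in string1 and sets its actual length to 6; the 'G' does not fit.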
The code for the concatenate string subroutine can be found
on lines 61 to 112. You will observe that comments and
extracts from the equivalent PASCAL program are interspersed
to help the understanding. Of course, there is also
documentation to tell you which register corresponds to
which label.
In lines 61 and 62, we initialize pointers to the two
character strings. This is a simple matter of copying the
data out of the first and second words of the control block.
Register 8 now points to the first character string and
register 9 points to the second.
To get the maximum length of any character string we have to
look two words behind the address of the string. We get
this by subtracting eight from the address and using this as
a pointer to the place from which we can extract the maximum
length. We do this in lines 64 to 66. See Page 209 for
templates to obtain these lengths.
To get the "actual length," the number of characters
currently in the string, we have to subtract four from the
address. For string2, we do this in lines 68 to 70. And we
fetch the actual length of the string in lines 72 to 74.
We compute "I:=string1.ActualLength+1" in lines 76 to 77 and
compute "J:=1" in line 79.
To compute the pointer to the location marked by
string1.chars[i], we have to compute the position of the
first empty space in the first string. That is, we have to
compute the address of
"string1.chars[string1.actuallength+1]." This is done in
lines 81 and 82. We also have to compute the initial
address of the character to be moved. Since we start moving
from the first position of character string 2, we simply
copy the address of that string to register 5, as done in
line 84.
We are now ready to begin converting the "while" loop. The
loop starts at line 87, the place we branch back to.
We have two conditions to check in the "while" loop. Since
they were connected with an "and," we have to branch out if
either one is not true. First we do the comparison
"I<=STRING1.MAXLENGTH." The table of registers in the
preceding comment tells us that register 2 contains I and
register 6 contains STRING1.MAXLENGTH. The comparison is
thus as on line 90, and we branch out on ">" in line 91.
Likewise, for "J<=STRING2.ACTUALLENGTH" we find from the
table of registers that J can be found in register 7 and
STRING2.ACTUALLENGTH can be found in register 7. Thus the
comparison is as on line 93, and the branch out on
greater-than is on line 94.
We now translate the actual move of the character in
lines 96 and 97. We find from the register
table that register 5 points to the place to get the
character from, or the address of "string2.chars[j]" We
retrieve that character in line 96. It puts the character
in the rightmost byte of register 10. (It won't matter what
happens to or is in the rest of register 10.) The register
table also tells us that register 4 contains the address of
the "string1.chars[i]" or the place to copy the character
to. Thus, we store the rightmost byte of register 10 into
that byte in line 97.
Lines 99 to 102 increment i, j, and the pointers into
string1 and string2. Note that the pointers are incremented
by one, not by four as with integer arrays, because each
character occupies exactly one byte.
And line 104 ends the loop by branching back up to the
beginning.
The last thing to do is to set string1.ActualLength to its
new value. This is done in lines 107 to 110. We store "i-1"
into this position, since i now points one past the last
character in string1.
The place to put the actual length is computed by taking the
address of string1 and subtracting four from it. This gives
us the location where the ActualLength should be put. We do
the store in line 110.
Now let's look at the main program. We simply call the
concatenate routine with two strings indicated by string1
and string2. We load these two addresses into the first two
words of the control block in lines 12 to 15. We let
register one point to the control block in line 16 and do
the call in line 17. Line 18 takes us back to the operating
system after the return from the subroutine.
Now let's look at the data structures. String1 is
initialized in lines 20 to 25. It has room for more
characters than it initially holds: a total of six
characters, of which only the first three are filled in when
the program is loaded. The first string is 'ABC' on line 22.
We record its length in line 23. Notice that "STRING1A"
appears in the word just before the beginning of the
character string. In line 24 we then add some blanks to hold
additional characters that might be added later. The total
room is kept in the equate STRING1L in line 25. Notice that
STRING1L appears in line 20. See the templates on Page 209.
The second character string is defined on lines 26 to line
29. It simply contains 'DEFG' with no extra room. Thus, we
can use the same equate for both the maximum length and
actual length, STRING2L on line 29. This equate is used in
both words preceding the character string, on lines 26 and
27.
When the call is completed, DEF will be copied to the empty
space at offset 0027. There won't be room for the fourth
character of STRING2 so that won't be copied.
Our second example is one to compare two character strings.
The PASCAL program is provided on Page 226. Let us see how
we convert the example to ASSEMBLER. Our PASCAL program has
the compare string routine in a subroutine called compare.
The corresponding ASSEMBLER is in lines 68 to 114.
The main routine simply calls the Compare routine with a
CSCB with three arguments. Note that we load the address of
the beginning of characters comprising STRING1 into the
first word of the CSCB. The two lengths appear at -4 and -8
before this. Then the address, STRING2, goes into the
second word of the CSCB. This is the address of the "F" in
"FGHIJK" Then we call the compare subroutine. Upon return,
we simply go back to the operating system.
The subroutine begins by fetching into registers the address
of the two argument strings. These get loaded into R2 and
R3.
Now we need the lengths of both STRING1 and STRING2. We want
the actual length here, since we don't need to compare empty
space.
Lines 59 to 61 fetch the length of STRING1 into register 4.
Lines 62 to 64 fetch the length of STRING2 into register 5.
Each of these adds -4 to the address of the string passed
(that is, subtracts four), leaves the result in register 9,
and uses register 9 as an address.
Line 65 then sets "result," stored in register 8, to zero.
Line 66 sets i, stored in register 7, to one.
At the label CS1, we run through the three comparisons in
the "WHILE" statement. Lines 68 and 69 check whether result
is still equal to zero. Lines 70 and 71 check whether
i<=string1.ActualLength, and lines 72 and 73 check whether
i<=string2.ActualLength.
We now translate the character comparison from line 48 of
the PASCAL program in lines 74 and 75. If we fall through to
the statement at line 77, then we have found two characters
where the first is less than the second. In this line we set
result to -1, corresponding to string1 being less than
string2.
Otherwise, we check whether the character in string1 is
greater than the character in the matching location in
string2. This is done in lines 80 and 82. If it is greater,
we set the result to 1, since that makes string1 greater
than string2.
Lines 84 to 87 increment register 7, which we are using for
"i," as well as the pointers to the current positions in
character strings 1 and 2. Then we go back to the beginning
of the while loop in line 87.
At CS2, we perform the if statements from "if result=0" to
the end of the program. These decide the result from the
ActualLengths when every compared character matched.
Lines 101 to 103 simply take the result, kept in register 8,
and store it in the third parameter. That is a var
parameter, so we use the template for storing in a var
parameter. Finally, line 103 takes us back to the caller.
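Putting the pieces together, here is a C sketch of the whole compare subroutine as described above. The struct cscb is a hypothetical stand-in for the three-word control block (the addresses of the two strings and of the var parameter), and the comments name the registers the text assigns to each variable. It models the logic only; it is not a translation of the actual listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical three-word control block. */
struct cscb {
    const unsigned char *string1;   /* address of the first character */
    const unsigned char *string2;
    int32_t *result;                /* var parameter */
};

static int32_t actual_len(const unsigned char *p) {
    int32_t v;
    memcpy(&v, p - 4, sizeof v);    /* actual length sits at p-4 */
    return v;
}

static void compare(const struct cscb *cb) {
    assert(cb != NULL && cb->result != NULL);
    const unsigned char *p1 = cb->string1, *p2 = cb->string2;  /* R2, R3 */
    int32_t len1 = actual_len(p1), len2 = actual_len(p2);     /* R4, R5 */
    int32_t result = 0;                                       /* R8 */
    int32_t i = 1;                                            /* R7 */
    while (result == 0 && i <= len1 && i <= len2) {           /* CS1 */
        if (*p1 < *p2)
            result = -1;
        else if (*p1 > *p2)
            result = 1;
        i++;                        /* bump i and both pointers */
        p1++;
        p2++;
    }
    if (result == 0) {              /* CS2: all compared characters matched */
        if (len1 < len2)
            result = -1;
        else if (len1 > len2)
            result = 1;
    }
    *cb->result = result;           /* store through the var parameter */
}
```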