Unit 8.1 -- Concatenate Two Character Strings

PROG

Concatenate two character strings

PED

Illustrate character strings and a common application

CONCEPTS

A character string consists of

-Maximum Length
-Actual Length (the current length)
-Bytes

Byte = one memory location, contains one 'character'

Character string is passed as pointer to "Bytes"

Thus, to get the Actual Length,
assuming RegA contains the "address of the character string"
we write

               LR   RegB,RegA
               A    RegB,=F'-4'
               L    RegC,0(0,RegB)

where RegB is a temporary register and RegC will contain the
Actual Length.

To  obtain  the  Maximum  Length,  we  have  something  very
similar.

               LR   RegB,RegA
               A    RegB,=F'-8'
               L    RegC,0(0,RegB)

To define a character string we write:

         DC    A(STRINGnL)        maximum length
         DC    A(STRINGnA)        actual length
STRINGn  DC    C'chars'
STRINGnA EQU   *-STRINGn
         DC    C'     '           optional padding
STRINGnL EQU   *-STRINGn

If we are defining a character string constant, whose
actual length = maximum length we write:

         DC    A(STRINGnL)
         DC    A(STRINGnL)
STRINGn  DC    C'chars'
STRINGnL EQU   *-STRINGn

Comparing characters is done with the CLM instruction.

To compare two characters, we write
         CLM   regA,1,mem-loc

                       Character Strings

We are going to learn how to make a set of routines to
handle character strings.  These routines handle
concatenating and comparing strings.  Additional routines
that you might write and include in your character string
package include searching a character string for another.
Some of these features are built into higher level
languages.

A character string has an "ActualLength."  That is the count
of the number of characters in the string.  The string 'abc'
would have an actual length of three.

A string also has a "MaximumLength."  That is how much room
there is in the character string variable.  For example, the
string 'abc' may be stored in the variable String1.  String1
would have an ActualLength of 3.  However, String1 may have
a MaximumLength of 6.  That means we could add three more
characters to String1 before we would run out of room.  We
need this so our assembler routines don't try to add too
many characters and thus clobber whatever might be in
storage after the string.

A  character  string  package  should  contain  many  useful
things.  We will discuss two of them in this unit.  The rest
are left as an exercise to the proverbial interested reader.

The first is a routine to add one character string to
another.  For example, assume we have a string, string1,
containing "abc" whose MaximumLength is 6.  Now, assume
there is a second string containing "de".  If we concatenate
the second string onto the first string, then string1 will
contain "abcde".

However, if the second string were "defg", then there
wouldn't be enough room for all four characters of the
second string.  The total number of characters in "abcdefg"
is seven, but string1 only has room for six.  Thus, the
result in string1 would be truncated to "abcdef".

Our  second  routine  compares  two  character  strings   to
determine  if the first is less than, equal to,  or  greater
than  the other.  We all know what "<" "=" and ">"  are  for
numbers.   We  must  define what this  means  for  character
strings.   We define this to be the same as how we  look  up
words in the dictionary or names in the phone book.  This is
sometimes called "lexicographic" ordering.

Easy examples are:

string1   string2     result (string1 compared to string2)
abc       def         <
cat       cot         <
dog       dagostino   >

In the event that the first string is the same as the
beginning of the second string, or vice versa, the longer
string is considered larger than the shorter one.
Examples:

string1   string2       result
cat       catastrophe   <
zulx      zul           >

Only if all the letters match and the ActualLengths are the
same are the two character strings considered equal.

In Assembler, we define a character string as a sequence of
characters.  We let the "address of the character string" be
the address of the first byte of the characters.  The Actual
Length is stored in the integer (four bytes) just before
this address.  The Maximum Length is stored in the integer
just before that, eight bytes before the address.

Thus  to  get  the Actual Length, assuming Ra  contains  the
"address of the character string" we write

         LR    Rb,Ra
         A     Rb,=F'-4'
         L     Rc,0(0,Rb)

where  Rb  is  a temporary register and Rc will contain  the
Actual Length.

To  obtain  the  Maximum  Length,  we  have  something  very
similar.

         LR    Rb,Ra
         A     Rb,=F'-8'
         L     Rc,0(0,Rb)
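
For readers who think more easily in a higher level language,
the same two fetches can be sketched in C.  This is only an
illustration under stated assumptions: the function names are
invented here, and the lengths are read in the host machine's
byte order rather than as System/370 fullwords.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout taken from the text: two 32-bit integers sit immediately
   before the characters, and the string's "address" is the address
   of the first character. */
enum { ACTUAL_OFFSET = -4, MAXIMUM_OFFSET = -8 };

/* Read the 32-bit integer stored `offset` bytes from the string address.
   This plays the role of the LR / A / L sequence. */
static int32_t length_field(const unsigned char *chars, int offset)
{
    int32_t value;
    assert(chars != NULL);
    memcpy(&value, chars + offset, sizeof value);
    return value;
}

/* Number of characters currently in the string. */
static int32_t actual_length(const unsigned char *chars)
{
    return length_field(chars, ACTUAL_OFFSET);
}

/* Number of characters the string has room for. */
static int32_t maximum_length(const unsigned char *chars)
{
    return length_field(chars, MAXIMUM_OFFSET);
}
```

Given a pointer to the 'a' of a string holding 'abc' with room
for six characters, actual_length would return 3 and
maximum_length would return 6.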

To define a character string we write:

         DC    A(STRINGnL)        maximum length
         DC    A(STRINGnA)        actual length
STRINGn  DC    C'chars'
STRINGnA EQU   *-STRINGn
         DC    C'     '           optional padding
STRINGnL EQU   *-STRINGn

If  we  are  defining  a  character string  constant,  whose
actual length = maximum length we write:

         DC    A(STRINGnL)
         DC    A(STRINGnL)
STRINGn  DC    C'chars'
STRINGnL EQU   *-STRINGn

There  are  several  things  we  have  to  know  about   how
characters are handled in Assembler.  This is the first  new
type  of object we see in this class.  We spent all the time
so  far this semester just learning different things we  can
do with integers.

We allocate characters with
          DC   C'chars'
where chars is the text to put in them.

If we want the assembler to pad the characters on the right
to a certain length, we can write
         DC    CLn'chars'
This will allocate room for n characters.  The first m
positions will be filled with the text in 'chars', where m
is the number of characters in 'chars'.  The remaining n-m
positions are filled with blanks.

Thus
A        DC    CL5'abc'
will allocate 5 bytes (or characters) at A.  The first
position will be 'a', the second will be 'b', and the third
will be 'c'.  The fourth and fifth positions will be blank.
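
The padding rule can be mimicked in C as a quick check of
one's understanding.  This is a sketch, not what the
assembler actually does internally: the name fill_padded is
invented, the blank used is the host's ' ' rather than the
EBCDIC blank, and cutting off text that is too long is an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill an n-byte field in the manner of DC CLn'chars':
   text on the left, blanks on the right. */
static void fill_padded(unsigned char *field, size_t n, const char *chars)
{
    assert(field != NULL && chars != NULL);
    size_t m = strlen(chars);      /* m = number of characters in 'chars' */
    if (m > n)
        m = n;                     /* assumption: extra text is cut off on the right */
    memset(field, ' ', n);         /* start with n blanks */
    memcpy(field, chars, m);       /* overlay the first m positions */
}
```

Calling fill_padded(a, 5, "abc") leaves 'a', 'b', 'c' and two
blanks in the five bytes at a.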

We also can write
         DC    10C'char'
where 'char' is a single character, often blank, to get 10
copies of that character.

We  can  view a character string as an array of bytes.  Each
character  takes ONLY ONE byte.  Thus, when we go through  a
character  string  sequentially,  and  are  using   register
tricks, we increment the register by ONE.

To  load a character, we use the IC instruction.  (IC stands
for "insert character.")
         IC    RegA,0(0,RegB)
or
         IC    RegA,mem

The IC instruction loads a single byte and puts it in the
rightmost eight bits of the register.  The other 24 bits
remain unchanged.  This doesn't matter for our purposes,
since STC and CLM (with a mask of 1) look only at the
rightmost eight bits, so we can view IC as a load character
instruction.

In both forms, RegA is the register in which we would like
to get the character.  In the first form, RegB is a register
containing the address of the character to be loaded.  In
the second form, mem is the label of the character.

Storing a character is done with the STC instruction ("store
character").  STC stores the character found in the
rightmost eight bits of the register.  This matches IC, so
we can view IC and STC as a pair that load and store a
character, just as L and ST do for integers.

To store a character, we write

         STC   RegA,0(0,RegB)
or
         STC   RegA,mem

Again, RegA is the register containing the character.  RegB
is a register containing the address at which to store it.

Comparing characters is done with the CLM instruction
("compare logical characters under mask").

To compare two characters, we write
         CLM   regA,1,mem-loc

regA is a register containing the first character to compare
in the rightmost eight bits.  It was probably loaded with IC.

mem-loc is either a label or 0(regB), where regB contains
the address of the second character to compare.

ASIDE:

You do have the option of comparing more than one character
at a time by changing the middle number, which is a four-bit
mask selecting bytes of the register.  (There are 32 bits in
a register and only eight bits in a character, so a register
can hold four characters.)  For example, a mask of 10
(binary 1010) compares the first and third characters in the
register to two consecutive characters in memory.  However,
we have no practical use for this feature here, so let's
ignore it.

END OF ASIDE:

Two  examples are provided to show how character strings are
used.   The  first concatenates two character  strings.   It
appends  the  second string onto the first.   If  the  first
string  were  'abc'  and the second were 'def',  then  after
concatenation  the  first string  would  be  'abcdef.'   The
second string would remain unchanged.

The second example compares two character strings and
returns a different number depending on whether the first is
less than the second, equal to it, or greater than it.

In concatenation, we have to do the following

a)   determine  where to put the characters from  string  2.
     This  is done by adding the appropriate length  to  the
     beginning address.

b)   keep  copying the characters from string2  to  string1.
     We want to stop if either
         i.    we  run out of characters to copy,  i.e.,  we
         exceed the length of string 2
         ii.   we run out of room in the first string, i.e.,
         the  position in string1 would exceed  the  maximum
         length.

In  the  PASCAL code, we use i as the position in the  first
string.  By setting it equal to "string1.ActualLength+1," we
are  making  it  point  to the empty space  after  the  last
character.   j  is  the position in the second  string.   We
start copying from the first character of this string.

Notice that the "while" statement keeps looping only as long
as there is still room in the first string,
"(i<=string1.maximumlength)," and there are still characters
left to copy from the second string,
"(j<=string2.actuallength)."  We stop as soon as either
condition fails.

"string1.chars[i]:=string2.chars[j]" moves a character from
the appropriate position in the second string to the
position pointed to by "i" in the first string.  The next
two statements bump the two pointers.

Lastly, at the end, we change the count of characters in the
first  string--the maximum length remains the same  as  that
simply  reflects  the memory allocated  for  the  string  at
compile time.
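
The logic just described can be sketched in C as follows.
This is a rough equivalent written for illustration, not the
course's PASCAL or assembler listing.  The helper names
get_len and put_len are invented, and the lengths are kept
in host byte order in the two integers before the
characters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read the 32-bit length stored `back` bytes before the characters. */
static int32_t get_len(const unsigned char *chars, int back)
{
    int32_t v;
    memcpy(&v, chars - back, sizeof v);
    return v;
}

/* Write the 32-bit length stored `back` bytes before the characters. */
static void put_len(unsigned char *chars, int back, int32_t v)
{
    memcpy(chars - back, &v, sizeof v);
}

/* Append string2 to string1, truncating when string1 is full. */
static void concatenate(unsigned char *string1, const unsigned char *string2)
{
    assert(string1 != NULL && string2 != NULL);
    int32_t max1 = get_len(string1, 8);   /* string1.MaximumLength */
    int32_t act2 = get_len(string2, 4);   /* string2.ActualLength  */
    int32_t i = get_len(string1, 4) + 1;  /* first empty position, 1-based */
    int32_t j = 1;                        /* first character of string2    */

    while (i <= max1 && j <= act2) {
        string1[i - 1] = string2[j - 1];  /* string1.chars[i] := string2.chars[j] */
        i++;
        j++;
    }
    put_len(string1, 4, i - 1);           /* new ActualLength; maximum unchanged */
}
```

Appending 'DEFG' to a string holding 'ABC' with a maximum
length of 6 leaves 'ABCDEF' with an actual length of 6.  The
'G' is not copied.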

The  code for the concatenate string subroutine can be found
on  lines  61  to 112.  You will observe that  comments  and
extracts from the equivalent PASCAL program are interspersed
to  help  the  understanding.   Of  course,  there  is  also
documentation  to  tell  you which register  corresponds  to
which label.

In  lines  61  and  62, we initialize pointers  to  the  two
character  strings.  This is a simple matter of copying  the
data out of the first and second words of the control block.

Register  8  now  points to the first character  string  and
register 9 points to the second.

To get the maximum length of any character string we have to
look  two  words behind the address of the string.   We  get
this by subtracting eight from the address and using this as
a pointer to the place from which we can extract the maximum
length.   We  do this in lines 64 to 66. See  Page  209  for
templates to obtain these lengths.

To get the "actual length," the number of characters
currently in the string, we have to subtract four from the
address.  For string2, we do this in lines 68 to 70.  We
fetch the actual length of string1 the same way in lines 72
to 74.

We compute "I:=string1.ActualLength+1" in lines 76 to 77 and
compute "J:=1" in line 79.

To compute the pointer to the location marked by
string1.chars[i], we have to compute the position of the
first empty space in the first string.  That is, we have to
compute the address of
"string1.chars[string1.actuallength+1]."  This is done in
lines 81 and 82.  We also have to compute the initial
address for the character to be moved.  This is done by
simply copying the address of the second character string to
register 5, as done in line 84, since we start moving from
the first position of character string 2.

We are now ready to begin converting the "while" loop.  The
place to branch back to, starting the loop, is in line 87.
We have two conditions to check in the "while" loop.  Since
they are connected with an "and," we have to branch out if
either one is not true.  First we do the comparison
"I<=STRING1.MAXLENGTH."  The table of registers in the
comment before tells us that register 2 contains I and
register 6 contains STRING1.MAXLENGTH.  The comparison is
thus as on line 90, and we branch out on ">" in line 91.
Likewise, in doing "J<=STRING2.ACTUALLENGTH" we find out
from the table of registers that J can be found in register
7 and STRING2.ACTUALLENGTH can be found in register 7.  The
comparison is thus as on line 93, and the branch out on
greater-than is on line 94.

We  now  do  the  translation of  the  actual  move  of  the
character in lines 96 through 97.  We find from the register
table that register 5 points to the place to get the
character from, that is, the address of "string2.chars[j]."
We retrieve that character in line 96.  This puts the
character
in the rightmost byte of register 10.  (It won't matter what
happens  to or is in the rest of register 10.)  The register
table also tells us that register 4 contains the address  of
the  "string1.chars[i]" or the place to copy  the  character
to.   Thus, we store the rightmost byte of register 10  into
that byte in line 97.

Lines 99 to 102 increment i, j, and the pointers into
string1 and string2.  Note that the pointers get incremented
by one, as we are going byte by byte rather than word by
word.

And  line  104  ends the loop by branching back  up  to  the
beginning.

The last thing to do is to set the actual length,
string1.ActualLength, to the appropriate value.  This is
done in lines 107 to 110.  We store "i-1" into this
position, since i points one past the last character now in
string1.  The place to put the actual length is computed by
taking the address of string1 and subtracting four from it.
We do the store in line 110.

Now let's look at the main program.  We simply call the
concatenate routine with two strings, string1 and string2.
We load these two addresses into the first two words of the
control block in lines 12 to 15.  We let register one point
to the control block in line 16 and do the call in line 17.
Line 18 takes us back to the operating system after the
return from the subroutine.

Now let's look at the data structures.  String1 is
initialized in lines 20 to 25.  It is somewhat larger than
the number of characters in it.  There is room for a total
of six characters, and only the first three characters are
put in at the time the program is loaded.  The first string
is 'ABC' on line 22.  We record the actual length in the
equate STRING1A in line 23.  Notice that "STRING1A" appears
in the word just before the beginning of the character
string.  In line 24, we then add some blanks to hold
additional characters that might be added.  The total room
is kept in the equate STRING1L in line 25.  Notice that
STRING1L appears in line 20.  See the templates on Page 209.

The second character string is defined on lines 26 to 29.
It simply contains 'DEFG' with no extra room.  Thus, we can
use the same equate, STRING2L on line 29, for both the
maximum length and the actual length.  This equate appears
in the two words prior to the character string, on lines 26
and 27.

When  the call is completed, DEF will be copied to the empty
space  at  offset 0027.  There won't be room for the  fourth
character of STRING2 so that won't be copied.

Our  second example is one to compare two character strings.
The  PASCAL program is provided on Page 226.  Let us see how
we convert the example to ASSEMBLER.  Our PASCAL program has
the  compare string routine in a subroutine called  compare.
The corresponding ASSEMBLER is in lines 68 to 114.

The main routine simply calls the Compare routine with a
CSCB with three arguments.  Note that we load the address of
the beginning of the characters comprising STRING1 into the
first word of the CSCB.  The two lengths appear at -4 and -8
from this address.  Then the address STRING2 goes into the
second word of the CSCB.  This is the address of the "F" in
"FGHIJK."  Then we call the compare subroutine.  Upon
return, we simply go back to the operating system.

The subroutine begins by fetching into registers the address
of  the two argument strings.  These get loaded into R2  and
R3.

Now we need the lengths of both STRING1 and STRING2--we want
the  actual  length here.  We don't need  to  compare  empty
space.

Lines 59-61 fetch the length of STRING1 into register 4.
Lines 62 to 64 fetch the length of STRING2 into register 5.
Note that each of these is done by subtracting four from the
address of the string passed in and using the result (stored
in register 9) as an address.

Then line 65 sets "result," being stored in register 8, to
zero.  Line 66 sets i, being stored in register 7, to one.

At the label CS1, we run through the three compares in the
"WHILE" statement.  Lines 68 to 69 check to see if result is
still equal to zero.  Lines 70 to 71 check to see if
i<=string1.ActualLength, and lines 72 and 73 check to see if
i<=string2.ActualLength.

We now do the character comparison in line 48 in lines 74 to
75.  If we fall through to the statement at line 77, then we
have  found two characters where the first is less than  the
second.  We set result to -1, corresponding to string1 being
less than string2, in this line.

Otherwise,  we  check to see if we have a character  greater
than the other in matching locations in string1 and string2.
This  is  done  in  lines 80 and 82.  If  the  character  is
greater than, we set the result to 1, as the whole string is
greater than the other.

Lines  84 to 87 increment the register 7 which we are  using
for "i" as well as the pointers to the position in character
string  1  and character string 2.  And we go  back  to  the
beginning of the while loop in line 87.

At  CS2, we perform the if statements from "if result=0"  to
the end of the program.

Lines 101 to 103 simply take the result, kept in register 8,
and  stick  it  in  the  third parameter.   That  is  a  var
parameter  so  we  use the template for  storing  in  a  var
parameter.  And line 103 takes us back to the caller.
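
The structure walked through above (a "while" loop with
three conditions, followed by an "if result=0" test on the
lengths) can be sketched in C as follows.  This is a rough
rendering for illustration, not the PASCAL program from the
text.  The name compare_chars is invented, and characters
are compared as unsigned bytes in the host's character set,
so the ordering may differ from EBCDIC for some characters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compare the characters first, then let the lengths break a tie.
   Returns -1, 0, or 1 for string1 <, =, or > string2. */
static int32_t compare_chars(const unsigned char *s1, int32_t len1,
                             const unsigned char *s2, int32_t len2)
{
    int32_t result = 0;
    int32_t i = 1;                       /* 1-based position */

    while (result == 0 && i <= len1 && i <= len2) {
        if (s1[i - 1] < s2[i - 1])
            result = -1;
        else if (s1[i - 1] > s2[i - 1])
            result = 1;
        i++;
    }
    if (result == 0) {                   /* one string begins the other */
        if (len1 < len2)
            result = -1;
        else if (len1 > len2)
            result = 1;
    }
    return result;
}

/* Wrapper using the layout from the text: the actual length is the
   32-bit integer four bytes before the characters.  The answer goes
   back through `result`, in the manner of a var parameter. */
static void compare(const unsigned char *string1,
                    const unsigned char *string2, int32_t *result)
{
    int32_t len1, len2;
    assert(string1 != NULL && string2 != NULL && result != NULL);
    memcpy(&len1, string1 - 4, sizeof len1);
    memcpy(&len2, string2 - 4, sizeof len2);
    *result = compare_chars(string1, len1, string2, len2);
}
```

Comparing 'cat' with 'catastrophe' falls out of the loop with
result still zero, and the length test then sets it to -1.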
