Unit 2.5 An Introduction to Characters and Character Strings


Allocate a "string" with
     DC   C'chars'
where chars is the text to put in it.

     DC   CLn'chars'
allocates room for n characters, right-padded with blanks
(or truncated on the right if chars is longer than n).

A    DC   CL5'abc'

     DC   10C'char'
gives ten copies of 'char'
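
As a rough illustration of the forms above, the definitions
below show, in the remarks, the characters each DC would put
in memory.  The labels PLAIN, PADDED, STARS and PAIRS are
made up for this sketch.

```asm
PLAIN    DC    C'abc'            3 bytes:  a b c
PADDED   DC    CL5'abc'          5 bytes:  a b c blank blank
STARS    DC    10C'*'            10 bytes: * * * * * * * * * *
PAIRS    DC    3C'AB'            6 bytes:  A B A B A B
```

Note that the duplication factor repeats the whole constant,
so 3C'AB' takes six bytes, not three.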

A character string is an array of bytes.

Each character takes exactly one byte.


To load a character from the nth position of a string, use
the IC instruction.

     IC   RegA,memA+n-1

RegA is the register in which we would like to get the
character

n is the number of the character we want
memA is the label at the first position

To store a character into nth position of string, we write

     STC  RegA,memA+n-1

Again, RegA is the register containing the character.  Any
general register can be used.

This works only when n is a constant.





Unit 2.5 -- Character Strings

Assembly supports the concept of a single character,
analogous to a variable declared of type "char" in Pascal.
A single character takes one byte in memory.  We can think
of a string of characters as an array of characters.  There
is no concept of 'string' in Assembly Language.  An
operation on a string, such as concatenating two strings,
must be done character by character.  Let us consider how a
sequence of characters like 'ABC' would be laid out in
memory.  The first character, 'A', would be in the first
position.  The second character, 'B', would be at an address
that is one more.  And the 'C' would be located in the next
position.  So, if the 'A' were at memory location 20040,
then the 'B' would be at 20041 and the 'C' would be at
20042.

This sequence of characters, 'ABC', could be defined with
the name CX by writing:

CX        DC        C'ABC'

We could refer to the character 'A' as CX or CX+0.  We could
refer to the character 'B' as CX+1.  The character 'C' would
be at CX+2.

It is also possible to pad a string with trailing blanks.
We can explicitly give a length by writing
CLmm'   '.  The mm tells how many characters will be in the
string.  Let n be the number of characters between the two
quotes.  If mm is greater than n, the string is
blank-filled: blanks are added after the characters
specified.  If mm is smaller than n, the string is truncated
on the right.

Thus,

CY    DC   CL6'ABC'

would have 'A' at CY, 'B' at CY+1, 'C' at CY+2, and blanks
at CY+3, CY+4 and CY+5.  In other words, the total length of
the string would be six characters or bytes.  The first
three would be 'ABC'.  The remaining three characters would
be blank.

If we wrote:
CZ        DC        CL2'ABCD'

Then there would be two characters in CZ.  At CZ would be
'A'.  At CZ+1 would be 'B'.  The 'C' and the 'D' would not
appear in memory.  The Assembler truncates them away.
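
The three definitions discussed above can be put side by
side.  This is only a summary sketch; the remarks show which
character would sit at each offset from the label.

```asm
CX       DC    C'ABC'            CX+0='A'  CX+1='B'  CX+2='C'
CY       DC    CL6'ABC'          CY+0..CY+2='ABC', CY+3..CY+5 blank
CZ       DC    CL2'ABCD'         CZ+0='A'  CZ+1='B'; 'CD' dropped
```

With no explicit length, as in CX, the string is exactly as
long as the text between the quotes.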

We can refer to the individual characters in the string by
the form XX+i-1.  XX is the label on the DC.  i is the
position of the character we want.  Thus, we could get to
the third character of CX by writing CX+2.  Note that the
first character can be referred to simply as XX instead of
the form XX+0, although the latter will, of course, work.

There  are  two  instructions used to manipulate  individual
characters.   The  instruction IC  will  retrieve  a  single
character into a register.

The instruction STC will take a single character from the
register and deposit it in memory.

Thus the sequence
          IC        7,CX+2
          STC       7,CY+3

will copy the 'C' in the third position of CX to the fourth
position of CY.  CY will now be the string 'ABCC  '.  The
first of the blanks will be replaced by a 'C'.

Note that a register contains 32 bits.  A character has only
8 bits.  We say that a character is one byte and a register
has four bytes.  When we do the IC instruction, we replace
only the rightmost byte of the register.  The first three
bytes are unaffected.  When we do an STC, the rightmost byte
is put into the designated memory location.  Normally, this
won't matter, since if a character went into the register
with the IC, it will go out correctly with the STC
instruction.
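
To make the point about the untouched bytes concrete, here
is a small sketch.  It assumes the fullword load instruction
L is available, and the label ONES is made up for the
example.  The remarks show register 5 in hex; 'A' is held as
the hex value C1 (character codes are explained in the
paragraphs that follow).

```asm
         L     5,ONES            R5 = FFFFFFFF
         IC    5,CX              R5 = FFFFFFC1 (low byte only)
         STC   5,CY+5            CY+5 = C1; FFFFFF is not stored
*
ONES     DC    F'-1'             fullword with all 32 bits set
CX       DC    C'ABC'
CY       DC    CL6'ABC'
```

The character travels in and out correctly, but register 5
as a whole does not hold the number C1 after the IC.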

However, it does matter once you know one more fact.  Each
character is associated with a binary number.  We usually
write this binary number in hex.  Since each character is
eight bits, it can be expressed as a two-digit hex number.
Every letter has a unique number associated with it.

For the capital letters, we have the codes:

Letter     Hex        Letter     Hex        Letter     Hex
A          C1         J          D1         S          E2
B          C2         K          D2         T          E3
C          C3         L          D3         U          E4
D          C4         M          D4         V          E5
E          C5         N          D5         W          E6
F          C6         O          D6         X          E7
G          C7         P          D7         Y          E8
H          C8         Q          D8         Z          E9
I          C9         R          D9

The digit characters also correspond to hex numbers,
specifically

Character        Hex
0                F0
1                F1
2                F2
3                F3
4                F4
5                F5
6                F6
7                F7
8                F8
9                F9

In Pascal, you have no doubt encountered the concept of
applying the 'ord' function to a character and getting back
a unique number.  The number you get from taking the ord of
a variable of type char is the number listed above.  Of
course, in Pascal, you would get the decimal equivalent of
the hexadecimal values above.

If you were to look at a character sequence in memory with
the debugger (covered in section 3), you would see the hex
numbers here.

For example, assume your program had the declaration,

CX        DC        C'AB19C'

and  CX appeared at memory location 20040. Then, assume that
we issued the debugger command:
DISPLAY 20040

Your result would be

C1C2F1F9

You could see the "C3" for the 'C' by displaying 20044.
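
The layout behind that display can be sketched as follows.
The second definition uses a hexadecimal (X-type) constant,
which this unit has not introduced, and the label CXHEX is
made up; it is shown only to suggest that the same five
bytes result either way.

```asm
*                                offset  +0 +1 +2 +3 +4
*                                char     A  B  1  9  C
CX       DC    C'AB19C'          bytes   C1 C2 F1 F9 C3
CXHEX    DC    X'C1C2F1F9C3'     the same five bytes, in hex
```

The display above shows the four bytes at offsets +0 to +3.
The 'C' is at offset +4, which is why it appears at 20044.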


If we want to perform the operation ord(CZ), where CZ
contains a character, we could write:
          SR        regA,regA
          IC        regA,CZ

The  first  instruction would clear out all  four  bytes  of
regA.   The  IC  would  load the hexadecimal  value  of  the
character CZ into regA.  We could now use this value  as  we
would any other integer.
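
As one possible use of this, the sketch below turns the
digit character '7' into the integer 7 by subtracting the
code for '0'.  It assumes the fullword subtract instruction
S is available; the labels DIGIT and ZERO are made up for
the example.

```asm
         SR    4,4               clear all four bytes of R4
         IC    4,DIGIT           R4 = 000000F7, the code for '7'
         S     4,ZERO            subtract 000000F0; R4 = 00000007
*
DIGIT    DC    C'7'              one byte, F7
ZERO     DC    F'240'            decimal 240 = hex F0, code of '0'
```

Without the SR, the three leftmost bytes of register 4 would
keep whatever they held before, and the subtraction would
give a meaningless result.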

Note that every character that could be printed or typed has
a corresponding hexadecimal number.  Thus, there are unique
numbers for the period ("."), the semicolon (";"), the right
bracket ("]"), etc.  You can find a complete table of these
values in many of the optional books and materials.  Since a
byte can contain 256 distinct numbers (0-255), it turns out
that many of the possible numbers don't correspond to
printing characters.  Some of these non-printing characters
are control characters that perform such tasks as skipping
to the top of page or carriage return.

There are two main systems for assigning numbers to the
various characters that can be printed or typed.  The one
used on IBM mainframes is EBCDIC.  The above information and
tables apply to the EBCDIC coding sequence.  Most other
machines use ASCII, which stands for American Standard Code
for Information Interchange.  A list of these codes can be
found in Appendix B of Silver and Appendix G of POP.

It is now time to look at our sample program.  Two strings
are defined, which are given the names A and B.  On line 14,
we see the definition for A, CL6'ABC'.  Note that A starts
at 1C in memory.  We find C1, C2, and C3 in the first three
positions.  Following these are three blanks.  The blank has
a code in the EBCDIC system, hexadecimal 40.  Thus, we see
that there are three 40's at locations 1F, 20, and 21.  B
contains the characters D, E, and F.  Note the codes for
them, C4, C5 and C6, on line 15.

Our  program will simply copy the characters one by one from
B  to  the  end  of  A.   That is, we will  copy  the  first
character  of B to be put in the fourth position of  A;  the
second  character of B will go in the fifth position  of  A;
and the third character of B will go into the sixth position
of A.

Lines 6 and 7 move the first character of B.  Note that the
fourth position of A corresponds to A+3, since the template
is "A+i-1".  Lines 8 and 9 move the second character of B to
the fifth position of A, and lastly lines 10 and 11 move the
last character of B to the sixth position of A.  Thus, DEF
will overwrite the three blanks that were put in A after
'ABC'.
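
For readers without the listing at hand, a program of this
kind might look roughly like the sketch below.  It is not
the course listing: its line numbers and addresses will not
match those quoted above, the name STRCOPY and the choice of
register 5 are made up, and it assumes that register 15
holds the entry address and register 14 the return address.

```asm
STRCOPY  CSECT
         USING STRCOPY,15        assumes R15 = entry address
         IC    5,B               first character of B ('D')
         STC   5,A+3             to the fourth position of A
         IC    5,B+1             second character of B ('E')
         STC   5,A+4             to the fifth position of A
         IC    5,B+2             third character of B ('F')
         STC   5,A+5             to the sixth position of A
         BR    14                return; A now holds 'ABCDEF'
A        DC    CL6'ABC'          C1 C2 C3 40 40 40 before the copy
B        DC    C'DEF'            C4 C5 C6
         END   STRCOPY
```

Each IC/STC pair moves exactly one byte, so a three-character
copy takes six instructions.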

One thing I should mention is that we cannot refer to the
Ith character where I is not a constant.  In other words, if
we had a memory location called POS, which contained an
integer, we could not write A+POS-1 to get to the
appropriate position of A.  If we want to do such things, we
will have to use the techniques of arrays.  In fact, most
meaningful applications of character strings will have to
await the learning of array techniques.
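
As a brief preview only, the sketch below shows one way the
position could come from memory.  It relies on an index
register, which belongs to the array material and has not
been covered in this unit; the register numbers and the
value in POS are made up for the example.

```asm
         L     3,POS             R3 = position wanted (1, 2, ...)
         SR    5,5               clear R5
         IC    5,A-1(3)          byte at address A-1+R3 = A+POS-1
*
POS      DC    F'4'              example: the fourth character
A        DC    CL6'ABC'
```

Here the Assembler still sees only the constant address A-1;
the contents of register 3 are added when the instruction
runs.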