Unit 2.5 An Introduction to Characters and Character Strings
Allocate a "string" with
DC C'chars'
where chars is the text to store in it.
DC CLn'chars'
allocates room for exactly n characters,
right-padded with blanks (or truncated if chars is longer than n).
A DC CL5'abc'
DC 10C'char'
gives ten copies of 'char' (40 bytes).
character string = an array of bytes.
Each character takes ONLY ONE byte.
To load the character in the nth position of a string, use
the IC instruction.
IC RegA,memA+n-1
RegA is the register in which we would like to get the
character
n is the number of the character we want
memA is the label at the first position
To store a character into the nth position of a string, we write
STC RegA,memA+n-1
Again, RegA is the register containing the character; any
general register can be used.
This form only works when n is a constant.
Unit 2.5 -- Character Strings
Assembly supports the concept of a single character,
analogous to a variable declared of type "char" in Pascal.
A single character takes one byte in memory. We can think
of a string of characters as an array of characters. There
is no built-in concept of a 'string' in Assembly Language;
an operation on strings, such as concatenating two of them,
must be done character by character. Let us consider how a
string of characters such as 'ABC' is laid out in memory.
The first character, 'A', is in the first position. The
second character, 'B', is at the address one greater, and
the 'C' is located in the position after that. So, if the
'A' were at memory location 20040, then the 'B' would be at
20041 and the 'C' would be at 20042.
This sequence of characters, 'ABC' could be defined with the
name CX by writing:
CX DC C'ABC'
We could refer to the character 'A' as CX or CX+0. We could
refer to the character 'B' as CX+1. The character 'C' would
be at CX+2.
It is also possible to pad a string with trailing blanks.
We can give an explicit length by writing
CLmm'chars'. The mm tells how many characters will be in the
string. Let n be the number of characters between the two
quotes. If mm is greater than n, the string is blank-filled:
blanks are added after the characters specified. If mm is
smaller than n, the string is truncated on the right.
Thus,
CY DC CL6'ABC'
would have 'A' at CY, 'B' at CY+1, 'C' at CY+2, and blanks at
CY+3, CY+4 and CY+5. In other words, the total length of the
string would be six characters, or six bytes. The first three
would be 'ABC'; the remaining three characters would be
blank.
If we wrote:
CZ DC CL2'ABCD'
Then there would be only two characters in CZ: 'A' at CZ and
'B' at CZ+1. The 'C' and the 'D' would not appear in memory;
the assembler truncates them away.
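The padding and truncation rules can be modeled in a few lines of Python. This is a sketch, not the assembler itself: it assumes Python's `cp037` codec (one EBCDIC code page) gives the same codes the assembler uses for letters and the blank, and `dc_cl` is a hypothetical helper name.

```python
EBCDIC_BLANK = 0x40  # EBCDIC code for a blank

def dc_cl(length: int, chars: str) -> bytes:
    """Model of DC CLn'chars': right-pad with blanks or truncate on the right."""
    encoded = chars.encode("cp037")  # assumption: cp037 matches the assembler's EBCDIC
    if len(encoded) >= length:
        return encoded[:length]  # too long: keep only the first `length` bytes
    padding = bytes([EBCDIC_BLANK]) * (length - len(encoded))
    return encoded + padding  # too short: blank-fill on the right

CY = dc_cl(6, "ABC")
CZ = dc_cl(2, "ABCD")
print(CY.hex().upper())  # C1C2C3404040
print(CZ.hex().upper())  # C1C2
```

The hex codes printed here (C1 for 'A', 40 for a blank) are the EBCDIC values tabulated later in this unit.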
We can refer to the individual characters in a string with
the form XX+i-1, where XX is the label on the DC and i is
the position of the character we want. Thus, we can get
to the third character of CX by writing CX+2. Note that the
first character can be referred to simply as XX instead of
the form XX+0, although the latter will, of course, work.
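The XX+i-1 template is plain address arithmetic. A minimal Python sketch, assuming (as in the earlier example) that CX starts at address 20040, with memory modeled as a dictionary from address to character; `nth_address` is a hypothetical name.

```python
CX = 20040  # assumed address of the label CX

# Memory for CX DC C'ABC': one character per consecutive address.
memory = {CX + offset: ch for offset, ch in enumerate("ABC")}

def nth_address(label: int, i: int) -> int:
    """Address of the i-th character (counting from 1) of the string at `label`."""
    return label + i - 1

for i in (1, 2, 3):
    addr = nth_address(CX, i)
    print(i, addr, memory[addr])
# 1 20040 A
# 2 20041 B
# 3 20042 C
```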
There are two instructions used to manipulate individual
characters. The instruction IC retrieves a single
character from memory into a register.
The instruction STC takes the single character held in a
register and stores it in memory.
Thus the sequence
IC 7,CX+2
STC 7,CY+4
will copy the 'C' in the third position of CX (CX+2) to the
fifth position of CY (CY+4). CY will now be the string
'ABC C ': the second of the three trailing blanks is
replaced by a 'C'.
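To make the two instructions concrete, here is a small Python model. It is a sketch under stated assumptions: memory is a `bytearray`, a register is a 32-bit integer, CX and CY sit at hypothetical offsets 0 and 3 of that memory, and `ic`/`stc` are illustrative helpers, not real APIs.

```python
def ic(reg: int, mem: bytearray, addr: int) -> int:
    """IC: replace only the rightmost byte of the register with mem[addr]."""
    return (reg & 0xFFFFFF00) | mem[addr]

def stc(reg: int, mem: bytearray, addr: int) -> None:
    """STC: store the rightmost byte of the register at mem[addr]."""
    mem[addr] = reg & 0xFF

# CX DC C'ABC' followed by CY DC CL6'ABC', in EBCDIC (cp037).
mem = bytearray("ABC".encode("cp037") + "ABC   ".encode("cp037"))
CX, CY = 0, 3  # hypothetical offsets of the two labels

r7 = 0
r7 = ic(r7, mem, CX + 2)  # IC  7,CX+2
stc(r7, mem, CY + 4)      # STC 7,CY+4
print(repr(mem[CY:CY + 6].decode("cp037")))  # 'ABC C '
```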
Note that a register contains 32 bits, while a character has
only 8 bits. We say that a character is one byte and a
register has four bytes. When we execute the IC instruction,
it replaces only the rightmost byte of the register; the
leftmost three bytes are unaffected. When we execute an STC,
the rightmost byte of the register is stored into the
designated memory location. Normally, this doesn't matter:
a character that went into the register with an IC will come
out correctly with an STC instruction.
However, it does matter in some cases, because of the
following fact. Each character is associated with a binary
number, which we usually write in hex. Since each character
is eight bits, its code can be expressed as a two-digit hex
number. Every character has a unique number associated with it.
For the capital letters, we have the codes:
Letter  Hex    Letter  Hex    Letter  Hex
A       C1     J       D1     S       E2
B       C2     K       D2     T       E3
C       C3     L       D3     U       E4
D       C4     M       D4     V       E5
E       C5     N       D5     W       E6
F       C6     O       D6     X       E7
G       C7     P       D7     Y       E8
H       C8     Q       D8     Z       E9
I       C9     R       D9
The digits 0 through 9 also have hex codes, specifically:
Character  Hex
0          F0
1          F1
2          F2
3          F3
4          F4
5          F5
6          F6
7          F7
8          F8
9          F9
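You can check these tables with Python's `cp037` codec, one of the EBCDIC code pages; treating it as matching the mainframe's codes for letters and digits is an assumption. `ebcdic_hex` is a hypothetical helper.

```python
import string

def ebcdic_hex(ch: str) -> str:
    """Two-digit hex EBCDIC code of a single character (via the cp037 codec)."""
    return ch.encode("cp037").hex().upper()

# Print every capital letter and digit with its EBCDIC code.
for ch in string.ascii_uppercase + string.digits:
    print(ch, ebcdic_hex(ch))
```

One thing the tables show, and the loop confirms, is that the EBCDIC letters are not contiguous: I is C9 but J is D1, and R is D9 but S is E2. The digits, by contrast, run from F0 to F9 without gaps.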
In PASCAL, you have no doubt encountered the concept of
applying the 'ord' function to a character and getting back
a unique number. On an IBM mainframe, the number you get
from taking the ord of a variable of type char is the number
listed above. Of course, in PASCAL, you would get the decimal
equivalent of the hexadecimal code; for example, ord('A')
would be 193, since C1 hex is 193 decimal.
If you were to look at a character sequence in memory with
the debugger (covered in section 3), you would see these hex
numbers.
For example, assume your program had the declaration,
CX DC C'AB19C'
and CX appeared at memory location 20040. Then, assume that
we issued the debugger command:
DISPLAY 20040
Your result would be
C1C2F1F9
You could see the "C3" for the 'C' by issuing DISPLAY 20044.
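The hex dump in this example can be reproduced with a small model. The sketch assumes, as the example's output suggests, that DISPLAY shows four bytes starting at the given address; `display` is a hypothetical stand-in for the debugger command, and cp037 stands in for the mainframe's EBCDIC.

```python
CX = 20040  # assumed address of CX, as in the example

# CX DC C'AB19C', one EBCDIC byte per address.
memory = {CX + offset: code for offset, code in enumerate("AB19C".encode("cp037"))}

def display(start: int, count: int = 4) -> str:
    """Hex dump of up to `count` defined bytes beginning at `start`."""
    return "".join(f"{memory[a]:02X}" for a in range(start, start + count) if a in memory)

print(display(20040))  # C1C2F1F9
print(display(20044))  # C3 (only one byte of CX lies at or beyond 20044)
```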
If we want to perform the operation ord(CZ), where CZ
contains a character, we could write:
SR regA,regA
IC regA,CZ
The first instruction clears out all four bytes of
regA. The IC then loads the code of the character at CZ into
the rightmost byte of regA, so regA holds that code as a
number. We can now use this value as we would any other
integer.
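The reason for the SR can be seen in a Python model of the register. This sketch assumes a 32-bit register that already holds leftover bits (the value 0x12345678 is arbitrary) and uses hypothetical helpers `sr` and `ic`; the code for 'C' comes from the cp037 EBCDIC codec.

```python
def sr(reg1: int, reg2: int) -> int:
    """SR: subtract reg2 from reg1, keeping 32 bits."""
    return (reg1 - reg2) & 0xFFFFFFFF

def ic(reg: int, byte: int) -> int:
    """IC: replace only the rightmost byte of the register."""
    return (reg & 0xFFFFFF00) | byte

CZ = "C".encode("cp037")[0]  # the byte stored at CZ: C3 hex

reg_a = 0x12345678           # arbitrary leftover contents
without_sr = ic(reg_a, CZ)   # upper three bytes are still 12 34 56
reg_a = sr(reg_a, reg_a)     # SR regA,regA clears all 32 bits
with_sr = ic(reg_a, CZ)      # IC regA,CZ now yields just the code

print(hex(without_sr), hex(with_sr), with_sr)  # 0x123456c3 0xc3 195
```

Without the SR, the register would hold a large, meaningless integer rather than the character's code.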
Note that every character that could be printed or typed has
a corresponding hexadecimal number. Thus, there would be
unique numbers for the period ("."), the semicolon (";"),
the right bracket ("]"), etc. You can find a complete table
of these values in many of the optional books and materials.
Since a byte can hold 256 distinct numbers (0-255), it
turns out that many of the possible numbers don't correspond
to printable characters. Some of these non-printing
characters are control characters that perform such tasks as
skipping to the top of a page or returning the carriage.
There are two common systems for assigning numbers to the
various characters that can be printed or typed. The one used
on IBM mainframes is EBCDIC (Extended Binary Coded Decimal
Interchange Code). The above information and tables apply to
the EBCDIC coding sequence. Most other machines use ASCII,
which stands for American Standard Code for Information
Interchange. A list of these codes can be found in Appendix B
of Silver and Appendix G of POP.
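To see how the two systems differ, this Python sketch compares a few codes using the standard `ascii` codec and the `cp037` codec (one EBCDIC code page, assumed to match the mainframe for these characters). The values are in decimal, as Pascal's ord would report them; `codes` is a hypothetical helper.

```python
def codes(ch: str) -> tuple[int, int]:
    """Return (EBCDIC code, ASCII code) of a single character, in decimal."""
    return ch.encode("cp037")[0], ch.encode("ascii")[0]

for ch in ["A", "0", " "]:
    ebcdic, ascii_code = codes(ch)
    print(repr(ch), "EBCDIC", ebcdic, "ASCII", ascii_code)
# 'A' EBCDIC 193 ASCII 65
# '0' EBCDIC 240 ASCII 48
# ' ' EBCDIC 64 ASCII 32
```

One practical consequence: in EBCDIC the letters (C1 to E9) have smaller codes than the digits (F0 to F9), while in ASCII the digits (48 to 57) come before the letters, so sorting by code orders them differently on the two kinds of machine.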
It is now time to look at our sample program. Two strings
are defined, which are given the names A and B. On line 14,
we see the definition for A, CL6'ABC'. Note that A starts at
1C in memory. We find in the first three positions C1, C2,
and C3. Following these are three blanks. The blank has a
code in the EBCDIC system, hexadecimal 40. Thus, we see
that there are three 40's, at locations 1F, 20, and 21. B
contains the characters D, E, and F. Note the codes for
them, C4, C5 and C6, on line 15.
Our program will simply copy the characters one by one from
B to the end of A. That is, we will copy the first
character of B to be put in the fourth position of A; the
second character of B will go in the fifth position of A;
and the third character of B will go into the sixth position
of A.
Lines 6 and 7 move the first character of B. Note that the
fourth position of A corresponds to A+3, since the template
is "A+i-1". Lines 8 and 9 move the second character of B to
the fifth position of A, and lastly lines 10 and 11 move the
last character of B to the sixth position of A. Thus, DEF
will overwrite the three blanks that were put in A after
'ABC'.
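The sample program's three IC/STC pairs can be traced with the same kind of Python model. The listing itself is not reproduced here, so the register name `r`, the helpers `ic` and `stc`, the definition of B as C'DEF', and its placement directly after A are all assumptions for illustration. Each address is a constant, mirroring lines 6 through 11.

```python
def ic(reg: int, mem: bytearray, addr: int) -> int:
    """IC: replace only the rightmost byte of the register with mem[addr]."""
    return (reg & 0xFFFFFF00) | mem[addr]

def stc(reg: int, mem: bytearray, addr: int) -> None:
    """STC: store the rightmost byte of the register at mem[addr]."""
    mem[addr] = reg & 0xFF

# A DC CL6'ABC' followed by B DC C'DEF' (EBCDIC via cp037).
mem = bytearray("ABC   DEF".encode("cp037"))
A, B = 0, 6  # hypothetical offsets of the labels

r = 0
r = ic(r, mem, B)      # first character of B ...
stc(r, mem, A + 3)     # ... to the fourth position of A
r = ic(r, mem, B + 1)  # second character of B ...
stc(r, mem, A + 4)     # ... to the fifth position of A
r = ic(r, mem, B + 2)  # third character of B ...
stc(r, mem, A + 5)     # ... to the sixth position of A

print(mem[A:A + 6].decode("cp037"))  # ABCDEF
print(mem[A:A + 6].hex().upper())    # C1C2C3C4C5C6
```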
One thing I should mention is that we cannot refer to the
Ith character where I is not a constant. In other words, if
we had a memory location called POS, which contained an
integer, we could not write A+POS-1 to get to the appropriate
position of A. If we want to do such things, we will have
to use the techniques of arrays. In fact, most meaningful
applications of character strings will have to await the
learning of array techniques.