Introduction to Computer Organization

with x86-64 Assembly Language & GNU/Linux

Robert G. Plantz, Ph.D.

Sonoma State University

bob.cs.sonoma.edu

January 2011

Copyright notice

Copyright ©2008, ©2009, ©2010, ©2011 by Robert G. Plantz. All rights reserved.

This book may be reproduced and distributed in its entirety (including this authorship, copyright, and permission notice), provided that no charge is made for the document itself (except for the cost of the printing or copying service), without the author's written consent. This includes "fair use" excerpts like reviews and advertising and derivative works like translations. You may print or copy individual pages for your own use.

Instructors are encouraged to use this book in their classes. The author would appreciate being notified of such usage.

The author has used his best efforts in preparing this book. The author makes no warranty of any kind, expressed or implied, with regard to the programs or the documentation contained in this book. The author shall not be liable in any event from incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

All products or services mentioned in this book are the trademarks or service marks of their respective companies or organizations. Eclipse is a trademark of Eclipse Foundation, Inc.

Contents

Preface xvi

1 Introduction 1

1.1 Computer Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 How the Subsystems Interact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Data Storage Formats 6

2.1 Bits and Groups of Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Mathematical Equivalence of Binary and Decimal . . . . . . . . . . . . . . . . . . . 8

2.3 Unsigned Decimal to Binary Conversion . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.4 Memory – A Place to Store Data (and Other Things) . . . . . . . . . . . . . . . . . 10

2.5 Using C Programs to Explore Data Formats . . . . . . . . . . . . . . . . . . . . . . . 13

2.6 Examining Memory With gdb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.7 ASCII Character Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.8 write and read Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3 Computer Arithmetic 28

3.1 Addition and Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.2 Arithmetic Errors – Unsigned Integers . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.3 Arithmetic Errors – Signed Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.4 Overflow and Signed Decimal Integers . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.4.1 The Meaning of CF and OF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.5 C/C++ Basic Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.5.1 C/C++ Shift Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.5.2 C/C++ Bit Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.5.3 C/C++ Data Type Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.6 Other Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.6.1 BCD Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.6.2 Gray Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4 Logic Gates 55

4.1 Boolean Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.2 Canonical (Standard) Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4.3 Boolean Function Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.3.1 Minimization Using Algebraic Manipulations . . . . . . . . . . . . . . . . . . 61

4.3.2 Minimization Using Graphic Tools . . . . . . . . . . . . . . . . . . . . . . . . 63

4.4 Crash Course in Electronics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.4.1 Power Supplies and Batteries . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

4.4.2 Resistors, Capacitors, and Inductors . . . . . . . . . . . . . . . . . . . . . . . 70

4.4.3 CMOS Transistors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.5 NAND and NOR Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5 Logic Circuits 82

5.1 Combinational Logic Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5.1.1 Adder Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5.1.2 Ripple-Carry Addition/Subtraction Circuits . . . . . . . . . . . . . . . . . . . 85

5.1.3 Decoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

5.1.4 Multiplexers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

5.2 Programmable Logic Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5.2.1 Programmable Logic Array (PLA) . . . . . . . . . . . . . . . . . . . . . . . . 91

5.2.2 Read Only Memory (ROM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.2.3 Programmable Array Logic (PAL) . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.3 Sequential Logic Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

5.3.1 Clock Pulses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.3.2 Latches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.3.3 Flip-Flops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.4 Designing Sequential Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.5 Memory Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.5.1 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.5.2 Shift Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

5.5.3 Static Random Access Memory (SRAM) . . . . . . . . . . . . . . . . . . . . . 112

5.5.4 Dynamic Random Access Memory (DRAM) . . . . . . . . . . . . . . . . . . . 114

5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

6 Central Processing Unit 116

6.1 CPU Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

6.2 CPU Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

6.3 CPU Interaction with Memory and I/O . . . . . . . . . . . . . . . . . . . . . . . . . . 122

6.4 Program Execution in the CPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

6.5 Using gdb to View the CPU Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

6.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

7 Programming in Assembly Language 132

7.1 Creating a New Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

7.2 Program Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

7.2.1 First instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

7.2.2 A Note About Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

7.2.3 The Additional Assembly Language Generated by the Compiler . . . . . . . 143

7.2.4 Viewing Both the Assembly Language and C Source Code . . . . . . . . . . 144

7.2.5 Minimum Program in 32-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . 146

7.3 Assemblers and Linkers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

7.3.1 Assemblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

7.3.2 Linkers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

7.4 Creating a Program in Assembly Language . . . . . . . . . . . . . . . . . . . . . . . 150

7.5 Instructions Introduced Thus Far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

7.5.1 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

7.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

8 Program Data – Input, Store, Output 154

8.1 Calling write in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

8.2 Introduction to the Call Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

8.3 Local Variables on the Call Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

8.3.1 Calling printf and scanf in 64-bit Mode . . . . . . . . . . . . . . . . . . . . . 171

8.4 Designing the Local Variable Portion of the Call Stack . . . . . . . . . . . . . . . . 173

8.5 Using syscall to Perform I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

8.6 Calling Functions, 32-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

8.7 Instructions Introduced Thus Far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

8.7.1 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

8.7.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

8.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

9 Computer Operations 183

9.1 The Assignment Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

9.2 Addition and Subtraction Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

9.3 Introduction to Machine Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

9.3.1 Assembler Listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

9.3.2 General Format of Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . 199

9.3.3 REX Prefix Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

9.3.4 ModRM Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

9.3.5 SIB Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

9.3.6 The mov Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

9.3.7 The add Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

9.4 Instructions Introduced Thus Far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

9.4.1 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

9.4.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

9.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

10 Program Flow Constructs 208

10.1 Repetition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

10.1.1 Comparison Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

10.1.2 Conditional Jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

10.1.3 Unconditional Jump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

10.1.4 while Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

10.2 Binary Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

10.2.1 Short-Circuit Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

10.2.2 Conditional Move . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

10.3 Instructions Introduced Thus Far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

10.3.1 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

10.3.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

10.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

11 Writing Your Own Functions 236

11.1 Overview of Passing Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

11.2 More Than Six Arguments, 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . 242

11.3 Interface Between Functions, 32-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . 251

11.4 Instructions Introduced Thus Far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

11.4.1 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

11.4.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

11.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

12 Bit Operations; Multiplication and Division 258

12.1 Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

12.2 Shifting Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

12.3 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

12.4 Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280

12.5 Negating Signed ints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286

12.6 Instructions Introduced Thus Far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

12.6.1 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

12.6.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

12.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

13 Data Structures 291

13.1 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

13.2 structs (Records) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296

13.3 structs as Function Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

13.4 Structs as C++ Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306

13.5 Instructions Introduced Thus Far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

13.5.1 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

13.5.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

13.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

14 Fractional Numbers 319

14.1 Fractions in Binary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

14.2 Fixed Point ints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320

14.3 Floating Point Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321

14.4 IEEE 754 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323

14.5 Floating Point Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

14.5.1 SSE2 Floating Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327

14.5.2 x87 Floating Point Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

14.5.3 3DNow! Floating Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335

14.6 Comments About Numerical Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . 335

14.7 Instructions Introduced Thus Far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

14.7.1 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

14.7.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

14.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

15 Interrupts and Exceptions 342

15.1 Hardware Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342

15.2 Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

15.3 Software Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344

15.4 CPU Response to an Interrupt or Exception . . . . . . . . . . . . . . . . . . . . . . . 345

15.5 Return from Interrupt/Exception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

15.6 The syscall and sysret Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

15.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

15.8 Instructions Introduced Thus Far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349

15.8.1 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349

15.8.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

15.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

16 Input/Output 352

16.1 Memory Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

16.2 I/O Device Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

16.3 Bus Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

16.4 I/O Interfacing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

16.5 I/O Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

16.6 Programming Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

16.7 Interrupt-Driven I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

16.8 I/O Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

16.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

A Reference Material 367

A.1 Basic Logic Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367

A.2 Register Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

A.3 Argument Order in Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

A.4 Register Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369

A.5 Assembly Language Instructions Used in This Book . . . . . . . . . . . . . . . . . . 369

A.6 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372

B Using GNU make to Build Programs 373

C Using the gdb Debugger for Assembly Language 378

D Embedding Assembly Code in a C Function 383

E Exercise Solutions 388

E.2 Data Storage Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388

E.3 Computer Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396

E.4 Logic Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406

E.5 Logic Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408

E.6 Central Processing Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410

E.7 Programming in Assembly Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 412

E.8 Program Data – Input, Store, Output . . . . . . . . . . . . . . . . . . . . . . . . . . 416

E.9 Computer Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419

E.10 Program Flow Constructs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425

E.11 Writing Your Own Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436

E.12 Bit Operations; Multiplication and Division . . . . . . . . . . . . . . . . . . . . . . . 444

E.13 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457

E.14 Fractional Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480

E.15 Interrupts and Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483

Bibliography 485

Index 486

List of Figures

1.1 Subsystems of a computer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1 Possible contents of the first sixteen bytes of memory . . . . . . . . . . . . . . . . . 11

2.2 Repeat of Figure 2.1 with contents shown in hex. . . . . . . . . . . . . . . . . . . . . 11

2.3 A text string stored in memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.1 "Decoder Ring" for three-bit signed and unsigned integers. . . . . . . . . . . . . . . 42

3.2 Relationship of I/O libraries to application and operating system. . . . . . . . . . . 45

3.3 Truth table for adding two bits with carry from a previous bit addition. . . . . . . . 47

3.4 Truth tables showing bitwise C/C++ operations. . . . . . . . . . . . . . . . . . . . . 47

3.5 Truth tables showing C/C++ logical operations. . . . . . . . . . . . . . . . . . . . . . 48

4.1 The AND gate acting on two variables, x and y . . . . . . . . . . . . . . . . . . . . . . 55

4.2 The OR gate acting on two variables, x and y . . . . . . . . . . . . . . . . . . . . . . . 56

4.3 The NOT gate acting on one variable, x. . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.4 Hardware implementation of the function in Equation 4.20. . . . . . . . . . . . . . 62

4.5 Hardware implementation of the function in Equation 4.28. . . . . . . . . . . . . . 62

4.6 Mapping of two-variable minterms on a Karnaugh map. . . . . . . . . . . . . . . . 63

4.7 Karnaugh map for $F_1(x, y) = x \cdot y' + x' \cdot y + x \cdot y$ . . . . . . . . . . . . . . . . . . . . . 64

4.8 Two-variable Karnaugh map showing the groupings x and y . . . . . . . . . . . . . . 64

4.9 Mapping of three-variable minterms on a Karnaugh map. . . . . . . . . . . . . . . 65

4.10 Mapping of four-variable minterms on a Karnaugh map. . . . . . . . . . . . . . . . 65

4.11 Comparison of one minterm (a) versus one maxterm (b) on a Karnaugh map. . . . 67

4.12 Mapping of three-variable maxterms on a Karnaugh map. . . . . . . . . . . . . . . 67

4.13 Mapping of four-variable minterms on a Karnaugh map. . . . . . . . . . . . . . . . 68

4.14 The XOR gate acting on two variables, x and y . . . . . . . . . . . . . . . . . . . . . . 68

4.15 A "don't care" cell on a Karnaugh map. . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.16 Karnaugh map for xor function if we know x = y = 1 can not occur. . . . . . . . . . 69

4.17 AC/DC power supply. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

4.18 Two resistors in series. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.19 Two resistors in parallel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.20 Capacitor in series with a resistor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4.21 Capacitor charging over time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.22 Inductor in series with a resistor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

4.23 Inductor building a magnetic field over time. . . . . . . . . . . . . . . . . . . . . . . 74

4.24 A single n-type MOSFET transistor switch. . . . . . . . . . . . . . . . . . . . . . . . 75

4.25 Single transistor switch equivalent circuit. . . . . . . . . . . . . . . . . . . . . . . . 75

4.26 CMOS inverter (NOT) circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4.27 CMOS inverter equivalent circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4.28 CMOS AND circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.29 The NAND gate acting on two variables, x and y . . . . . . . . . . . . . . . . . . . . . 77

4.30 The NOR gate acting on two variables, x and y . . . . . . . . . . . . . . . . . . . . . . 78

4.31 An alternate way to draw a NAND gate. . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.32 A NOT gate built from a NAND gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.33 An AND gate built from two NAND gates. . . . . . . . . . . . . . . . . . . . . . . . . 78

4.34 An OR gate built from three NAND gates. . . . . . . . . . . . . . . . . . . . . . . . . 79

4.35 The function in Equation 4.41 using two AND gates and one OR gate. . . . . . . . 79

4.36 The function in Equation 4.41 using two AND gates, one OR gate and four NOT gates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

4.37 The function in Equation 4.41 using only three NAND gates. . . . . . . . . . . . . . 79

5.1 A half adder circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5.2 A full adder circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

5.3 Four-bit adder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

5.4 Four-bit adder/subtracter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

5.5 Circuit for a 3 × 8 decoder with enable. . . . . . . . . . . . . . . . . . . . . . . . . . . 88

5.6 Full adder implemented with 3 × 8 decoder. . . . . . . . . . . . . . . . . . . . . . . . 89

5.7 A 2-way multiplexer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5.8 A 4-way multiplexer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5.9 Symbol for a 4-way multiplexer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5.10 Simplified circuit for a programmable logic array. . . . . . . . . . . . . . . . . . . . 90

5.11 Programmable logic array schematic. . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.12 Eight-byte Read Only Memory (ROM). . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.13 Two-function Programmable Array Logic (PAL). . . . . . . . . . . . . . . . . . . . . 94

5.14 Clock signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.15 NOR gate implementation of an SR latch. . . . . . . . . . . . . . . . . . . . . . . . . 96

5.16 State diagram for an SR latch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5.17 NAND gate implementation of an S'R' latch. . . . . . . . . . . . . . . . . . . . . . . 97

5.18 State table and state diagram for an S'R' latch. . . . . . . . . . . . . . . . . . . . . . 98

5.19 SR latch with Control input. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.20 D latch constructed from an SR latch. . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.21 D flip-flop, positive-edge triggering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.22 D flip-flop, positive-edge triggering with asynchronous preset. . . . . . . . . . . . . 101

5.23 Symbols for D flip-flops. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.24 T flip-flop state table and state diagram. . . . . . . . . . . . . . . . . . . . . . . . . . 102

5.25 T flip-flop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5.26 JK flip-flop state table and state diagram. . . . . . . . . . . . . . . . . . . . . . . . . 103

5.27 JK flip-flop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

5.28 A 4-bit register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.29 A 4-bit register with load. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

5.30 8-way mux to select output of register file. . . . . . . . . . . . . . . . . . . . . . . . . 110

5.31 Four-bit serial-to-parallel shift register. . . . . . . . . . . . . . . . . . . . . . . . . . 111

5.32 Tri-state buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

5.33 Four-way multiplexer built from tri-state buffers. . . . . . . . . . . . . . . . . . . . 112

5.34 4-bit memory cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

5.35 Addressing 1 MB of memory with one 20 × $2^{20}$ address decoder. . . . . . . . . . . . 113

5.36 Addressing 1 MB of memory with two 10 × $2^{10}$ address decoders. . . . . . . . . . . 114

5.37 Bit storage in DRAM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

6.1 CPU block diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

6.2 Graphical representation of general purpose registers. . . . . . . . . . . . . . . . . 120

6.3 Condition codes portion of the rflags register. . . . . . . . . . . . . . . . . . . . . . 121

6.4 Subsystems of a computer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

6.5 The instruction execution cycle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

7.1 Screen shot of the creation of a program in assembly language. . . . . . . . . . . . 151

8.1 The stack in Listing 8.3 when it is first initialized. . . . . . . . . . . . . . . . . . . . 161

8.2 The stack with one data item on it. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

8.3 The stack with three data items on it. . . . . . . . . . . . . . . . . . . . . . . . . . . 162

8.4 The stack after all three data items have been popped off. . . . . . . . . . . . . . . 162

8.5 Local variables in the program from Listing 8.5 are allocated on the stack. . . . . . 167

8.6 Local variable stack area in the program from Listing 8.5. . . . . . . . . . . . . . . 168

9.1 Assembler listing file for the function shown in Listing 9.7. . . . . . . . . . . . . . . 198

9.2 General format of instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

9.3 REX prefix byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

9.4 ModRM byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

9.5 SIB byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

9.6 Machine code for the mov from a register to a register instruction. . . . . . . . . . . 201

9.7 Machine code for the mov immediate data to a register instruction. . . . . . . . . . 202

9.8 Machine code for the add immediate data to the A register . . . . . . . . . . . . . . 203

9.9 Machine code for the add immediate data to a register . . . . . . . . . . . . . . . . 203

9.10 Machine code for the add immediate data to a register instruction. . . . . . . . . . 203

9.11 Machine code for the add register to register instruction. . . . . . . . . . . . . . . . 204

10.1 Flow chart of a while loop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

10.2 Flow chart of if-else construct. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

11.1 Arguments and local variables in the stack frame, sumInts function. . . . . . . . . 241

11.2 Arguments 7–9 are passed on the stack to the sumNine function. . . . . . . . . . . 246

11.3 Arguments and local variables in the stack frame, sumNine function. . . . . . . . . 247

11.4 Overall layout of the stack frame. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

11.5 Calling function's stack frame, 32-bit mode. . . . . . . . . . . . . . . . . . . . . . . . 254

13.1 Memory allocation for the variables x and y from the C program in Listing 13.6. . 298

14.1 IEEE 754 bit patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324

14.2 x87 floating point register stack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

16.1 Typical bus controllers in a modern PC. . . . . . . . . . . . . . . . . . . . . . . . . . 354

List of Tables

2.1 Hexadecimal representation of four bits. . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 C/C++ syntax for specifying literal numbers. . . . . . . . . . . . . . . . . . . . . . . 8

2.3 ASCII code for representing characters. . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.1 Correspondence between binary, hexadecimal, and unsigned decimal values for

the hexadecimal digits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.2 Four-bit signed integers, two's complement notation. . . . . . . . . . . . . . . . . . . 35

3.3 Sizes of some C/C++ data types in 32-bit and 64-bit modes. . . . . . . . . . . . . . . 43

3.4 Hexadecimal characters and corresponding int. . . . . . . . . . . . . . . . . . . . . 48

3.5 BCD code for the decimal digits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.6 Sign codes for packed BCD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.7 Gray code for 4 bits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.1 Minterms for three variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.2 Maxterms for three variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.1 BCD decoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5.2 Truth table for a 3 × 8 decoder with enable. . . . . . . . . . . . . . . . . . . . . . . . 87

5.3 NOR-based SR latch state table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

5.4 SR latch with Control state table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.5 D latch with Control state table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.6 T flip-flop state table with D flip-flop inputs. . . . . . . . . . . . . . . . . . . . . . . 102

5.7 JK flip-flop state table with D flip-flop inputs. . . . . . . . . . . . . . . . . . . . . . 103

6.1 X86-64 operating modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

6.2 The x86-64 registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

6.3 Assembly language names for portions of the general-purpose CPU registers. . . . 119

6.4 General purpose registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

7.1 Effect on other bits in a register when less than 64 bits are changed . . . . . . . . . 141

8.1 Common assembler directives for allocating memory. . . . . . . . . . . . . . . . . . 156

8.2 Order of passing arguments in general purpose registers. . . . . . . . . . . . . . . . 157

8.3 Register set up for using syscall instruction to read, write, or exit. . . . . . . . . 177

9.1 Walking through the code in Listing 9.4. . . . . . . . . . . . . . . . . . . . . . . . . . 194

9.2 The mm field in the ModRM byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

9.3 Machine code of general purpose registers. . . . . . . . . . . . . . . . . . . . . . . . 201

10.1 Conditional jump instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

10.2 Conditional jump instructions for unsigned values. . . . . . . . . . . . . . . . . . . 212

10.3 Conditional jump instructions for signed values. . . . . . . . . . . . . . . . . . . . . 212

10.4 Machine code for the je instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

11.1 Argument register save area in stack frame. . . . . . . . . . . . . . . . . . . . . . . 241

12.1 Bit patterns (in binary) of the ASCII numerals and the corresponding 32-bit ints. 274

12.2 Register usage for the mul instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . 275

12.3 Register usage for the div instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . 281

14.1 MXCSR status register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327

14.2 SSE scalar floating point conversion instructions. . . . . . . . . . . . . . . . . . . . 328

14.3 Some SSE floating point arithmetic and data movement instructions. . . . . . . . . 328

14.4 x87 Status Word. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

14.5 A sampling of x87 floating point instructions. . . . . . . . . . . . . . . . . . . . . . . 333

15.1 Some system call codes for the syscall instruction. . . . . . . . . . . . . . . . . . . 347

Listings

2.1 Using printf to display numbers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2 C program showing the mathematical equivalence of the decimal and hexadecimal

number systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.3 Displaying a single character using C. . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.4 Echoing characters entered from the keyboard. . . . . . . . . . . . . . . . . . . . . 23

3.1 Shifting to multiply and divide by powers of two. . . . . . . . . . . . . . . . . . . . 46

3.2 Reading hexadecimal values from keyboard. . . . . . . . . . . . . . . . . . . . . . . 49

6.1 Simple program to illustrate the use of gdb to view CPU registers. . . . . . . . . . 125

7.1 A "null" program (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

7.2 A "null" program (gcc assembly language). . . . . . . . . . . . . . . . . . . . . . . . 134

7.3 A "null" program (programmer assembly language). . . . . . . . . . . . . . . . . . 135

7.4 A "null" program (gcc assembly language without exception handler frame). . . . 144

7.5 The "null" program rewritten to show a label placed on its own line. . . . . . . . . 144

7.6 Assembly language embedded in C source code listing. . . . . . . . . . . . . . . . . 145

7.7 A "null" program (gcc assembly language in 32-bit mode). . . . . . . . . . . . . . . 146

7.8 A "null" program (programmer assembly language in 32-bit mode). . . . . . . . . 147

8.1 "Hello world" program using the write system call function (C). . . . . . . . . . . 154

8.2 "Hello world" program using the write system call function (gcc assembly

language). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

8.3 A C implementation of a stack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

8.4 Save and restore the contents of the rbx and r12–r15 registers. . . . . . . . . . . 163

8.5 Echoing characters entered from the keyboard (gcc assembly language). . . . . . 165

8.6 Echoing characters entered from the keyboard (programmer assembly language). 169

8.7 Calling printf and scanf to write and read formatted I/O (C). . . . . . . . . . . . 171

8.8 Calling printf and scanf to write and read formatted I/O (gcc assembly language). 171

8.9 Calling printf and scanf to write and read formatted I/O (programmer assembly

language). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

8.10 Some local variables (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

8.11 Some local variables (gcc assembly language). . . . . . . . . . . . . . . . . . . . . . 174

8.12 Some local variables (programmer assembly language). . . . . . . . . . . . . . . . 175

8.13 General format of a function written in assembly language. . . . . . . . . . . . . . 176

8.14 Echo character program using the syscall instruction. . . . . . . . . . . . . . . . 177

8.15 Displaying four characters on the screen using the write system call function in

assembly language. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

9.1 Assignment to a register variable (C). . . . . . . . . . . . . . . . . . . . . . . . . . . 184

9.2 Assignment to a register variable (gcc assembly language). . . . . . . . . . . . . . 184

9.3 Assignment to a register variable (programmer assembly language). . . . . . . . 186

9.4 Addition and subtraction (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

9.5 Addition and subtraction (gcc assembly language). . . . . . . . . . . . . . . . . . . 192

9.6 Addition and subtraction (programmer assembly language). . . . . . . . . . . . . 194

9.7 Some instructions for us to assemble. . . . . . . . . . . . . . . . . . . . . . . . . . . 196

10.1 Displaying a string one character at a time (C). . . . . . . . . . . . . . . . . . . . . 208

10.2 Unconditional jumps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

10.3 Displaying a string one character at a time (gcc assembly language). . . . . . . . 215

10.4 General structure of a count-controlled while loop. . . . . . . . . . . . . . . . . . . 217

10.5 Displaying a string one character at a time (programmer assembly language). . . 218

10.6 A do-while loop to print 10 characters. . . . . . . . . . . . . . . . . . . . . . . . . . 220

10.7 Get yes/no response from user (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

10.8 Get yes/no response from user (gcc assembly language). . . . . . . . . . . . . . . . 223

10.9 General structure of an if-else construct. . . . . . . . . . . . . . . . . . . . . . . . 224

10.10 Get yes/no response from user (programmer assembly language). . . . . . . . . . 225

10.11 Compound boolean expression in an if-else construct (C). . . . . . . . . . . . . . 227

10.12 Compound boolean expression in an if-else construct (gcc assembly language). 228

10.13 Simple for loop to perform multiplication. . . . . . . . . . . . . . . . . . . . . . . . 234

11.1 Passing arguments to a function (C). . . . . . . . . . . . . . . . . . . . . . . . . . . 238

11.2 Accessing arguments in the sumInts function from Listing 11.1 (gcc assembly

language). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

11.3 Accessing arguments in the sumInts function from Listing 11.1 (programmer as-

sembly language) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

11.4 Passing more than six arguments to a function (C). . . . . . . . . . . . . . . . . . . 243

11.5 Passing more than six arguments to a function (gcc assembly language). . . . . . 244

11.6 Passing more than six arguments to a function (programmer assembly language). 249

11.7 Passing more than six arguments to a function (gcc assembly language, 32-bit). . 252

12.1 Convert letters to upper/lower case (C). . . . . . . . . . . . . . . . . . . . . . . . . . 260

12.2 Convert letters to upper/lower case (gcc assembly language). . . . . . . . . . . . . 262

12.3 Convert letters to upper/lower case (programmer assembly language). . . . . . . 266

12.4 Shifting bits (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

12.5 Shifting bits (gcc assembly language). . . . . . . . . . . . . . . . . . . . . . . . . . 271

12.6 Shifting bits (programmer assembly language). . . . . . . . . . . . . . . . . . . . . 272

12.7 Convert decimal text string to int (C). . . . . . . . . . . . . . . . . . . . . . . . . . 277

12.8 Convert decimal text string to int (gcc assembly language). . . . . . . . . . . . . 278

12.9 Convert decimal text string to int (programmer assembly language). . . . . . . . 279

12.10 Convert unsigned int to decimal text string (C). . . . . . . . . . . . . . . . . . . . 282

12.11 Convert unsigned int to decimal text string (gcc assembly language). . . . . . . . 283

12.12 Convert unsigned int to decimal text string (programmer assembly language). . 285

13.1 Storing a value in one element of an array (C). . . . . . . . . . . . . . . . . . . . . 291

13.2 Storing a value in one element of an array (gcc assembly language). . . . . . . . . 292

13.3 Clear an array (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

13.4 Clear an array (gcc assembly language). . . . . . . . . . . . . . . . . . . . . . . . . 294

13.5 Clear an array (programmer assembly language). . . . . . . . . . . . . . . . . . . 295

13.6 Two struct variables (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

13.7 Two struct variables (gcc assembly language). . . . . . . . . . . . . . . . . . . . . 298

13.8 Two struct variables (programmer assembly language). . . . . . . . . . . . . . . . 299

13.9 Passing struct variables (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302

13.10 Passing struct variables (gcc assembly language). . . . . . . . . . . . . . . . . . . 303

13.11 Passing struct variables (assembly language version). . . . . . . . . . . . . . . . 305

13.12 Add 1 to user's fraction (C++). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

13.13 Add 1 to user's fraction (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

13.14 Add 1 to user's fraction (programmer assembly language). . . . . . . . . . . . . . 314

14.1 Fixed point addition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320

14.2 Converting a fraction to a float. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

14.3 Converting a fraction to a float (gcc assembly language, 64-bit). . . . . . . . . . . 329

14.4 Converting a fraction to a float (gcc assembly language, 32-bit). . . . . . . . . . . 333

14.5 Use float for Loop Control Variable? . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

14.6 Are floats accurate? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

14.7 Casting integer to float in C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

14.8 Casting integer to float in assembly language. . . . . . . . . . . . . . . . . . . . . . 340

15.1 Using syscall to cat a file. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

16.1 Sketch of basic I/O functions using memory-mapped I/O, C version. . . . . . . . . 356

16.2 Memory-mapped I/O in assembly language. . . . . . . . . . . . . . . . . . . . . . . 358

16.3 Sketch of basic I/O functions, isolated I/O, C version. . . . . . . . . . . . . . . . . 361

16.4 Isolated I/O in assembly language. . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

B.1 An example of a Makefile for an assembly language program with one source file. 374

B.2 An example of a Makefile for a program with both C and assembly language

source files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375

B.3 Makefile variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375

B.4 Incomplete Makefile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376

D.1 Embedding an assembly language instruction in a C function (C). . . . . . . . . . 383

D.2 Embedding an assembly language instruction in a C function (gcc assembly

language). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384

D.3 Embedding more than one assembly language instruction in a C function and

specifying a register (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385

D.4 Embedding more than one assembly language instruction in a C function and

specifying a register (gcc assembly language). . . . . . . . . . . . . . . . . . . . . . 386

Preface

This book introduces the concepts of how computer hardware works from a programmer's point of view. A programmer's job is to design a sequence of instructions that will cause the hardware to perform operations that solve a problem. This book looks at these instructions by exploring how C/C++ language constructs are implemented at the instruction set architecture level.

The specific architecture presented in this book is the x86-64 that has evolved over the years from the Intel 8086 processor. The GNU programming environment is used, and the operating system kernel is Linux.

The basic guidelines I followed in creating this book are:

• One should avoid writing in assembly language except when absolutely necessary.

• Learning is easier if it builds upon concepts you already know.

• "Real world" hardware and software make a more interesting platform for learning theoretical concepts.

• The tools used for teaching should be inexpensive and readily available.

It may seem strange that I would recommend against assembly language programming in a book largely devoted to the subject. Well, C was introduced in the 1970s specifically for low-level programming. C code is much easier to write and to maintain than assembly language. C compilers have evolved to a point where they produce better machine code than all but the best assembly language programmers can. In addition, hardware technology has advanced to the point that there is seldom any significant advantage in writing the most efficient machine code. In short, it is hardly ever worth the effort to write in assembly language.

You might well ask why you should study assembly language, given that I think you should avoid writing in it. I believe very strongly that the best programmers have a good understanding of how computer hardware works. I think this principle holds in most fields: the best drivers understand how automobiles work; the best musicians understand how their instruments work; etc.

So this is not a book on how to write programs in assembly language. Most of the programs

you will be asked to write will be in assembly language, but they are very simple programs

intended to illustrate the concepts. I believe that this book will help you to become a better

programmer in any programming language, even if you never write another line of assembly

language.

Two issues arise immediately when studying assembly language:

• I/O interaction with a user, even through just the keyboard and screen, is a very complex problem, well beyond the programming expertise of a beginner.

• There is an almost endless variety of instructions that can be used.

There are several ways to deal with these problems in a textbook. Some books use a simple

operating system for I/O, e.g., MS-DOS. Others provide libraries of I/O functions that are specific

for the examples in the book. Several textbooks deal with the instruction set issue by presenting

a simplified "idealized" architecture with a small number of instructions that is intended to

illustrate the concepts.

In keeping with the "real world" criterion of this book, it deals with these two issues by:

1. showing you how to call the I/O functions already available in the C Standard Library, and

2. presenting only a small subset of the available instructions.

This has the additional advantage of not requiring additional software to be installed. In general, all the programming discussed in the book can be done on any of the common Linux distributions that have been set up for software development, with few or no changes.

Stand-alone assembly language programs.

Readers who wish to write assembly language programs that do not use the C runtime environment should read Sections 8.5 (page 177) and 15.6 (page 345).

If you do decide to write more complex programs in assembly language, there are several other excellent books on that topic; see the Bibliography on page 485. And, of course, you would want the manufacturer's programming manuals; see for example [2]–[6] and [14]–[18]. The goal here is to provide you with an introductory "look under the hood" of a high-level language at the hardware that lies below.

This book also provides an introduction to computer hardware architecture. The view is from a programmer's eye. Other excellent books provide implementation details. You need to understand many of the implementation details, e.g., pipelining, caches, in order to write highly optimized programs. This book provides the introduction that prepares you for learning about more advanced architectural concepts.

This is not the place to argue about operating systems. I could rationalize my choice of

GNU/Linux, but I could also rationalize using others. Therefore, I will simply state that I

believe that GNU/Linux provides an excellent environment for studying programming in an

academic setting. One of the more important features of the GNU programming environment

with respect to the goals of this book is the close integration of C/C++ and assembly language.

In addition, I like GNU/Linux.

I wish to comment on my use of "GNU/Linux" instead of the simpler "Linux." Much has been written about these names. A good source of the various arguments can be found at www.wikipedia.org. The two main points are that (a) Linux is only the kernel, and (b) all general-purpose distributions rely on many GNU components for the remaining systems software. Although "Linux" has become essentially a synonym for "GNU/Linux," this book could not exist without the GNU components, e.g., the assembler (as), the link editor (ld), the make program, etc. Therefore, I wish to acknowledge the importance of the GNU project by using the full "GNU/Linux" name.

In some ways, the x86-64 instruction set architecture is not the best choice for studying computer architecture. It maintains backwards compatibility and is thus somewhat more complicated at the instruction set level. However, it is by far the most widely deployed architecture on the desktop and one of the least expensive ways to set up a system where these concepts can be studied.

Assembly language is my favorite subject in computer science, but I have taught the subject to enough students to know that, realistically, it probably will not be the same for you. However, please keep your eye on the long term. I am confident that the material presented in this book will help you to become a better programmer, and if you do enjoy assembly language, you will have a good introduction to a more advanced study of it.

Assumed Background

You should have taken an introductory class in programming, preferably in C, C++, or Java. The high-level language used in this book is C; however, all the C programming is simple. I am confident that the C programming examples in Chapters 2 and 3 will provide sufficient C programming concepts to make the rest of the book very usable, regardless of the language you learned in your introductory class.

I believe that more experienced programmers who wish to write for the x86-64 architecture can also benefit from reading this book. In principle, these programmers can learn everything they need to know from reading the appropriate manuals. However, I have found that it is usually helpful to have an overview of a new architecture before tackling the manuals. This book should provide that overview. In this sense, I believe that this book can provide a good "introduction" to using the manuals.

Learning from this Book

This book is intended for a one-semester, four-unit course. Our course format at Sonoma State University consists of three hours of lecture and a two- to three-hour supervised lab session per week. Many of the exercises in each chapter provide good in-lab exercises for supervised labs.

Solutions to almost all the chapter exercises are provided in Appendix E. Students should attempt to solve an exercise before looking at the answer for hints. But I think it helps the learning process if a student can see a solution while attempting his or her own solution.

Do not copy and paste code!

If you have an electronic copy of this book, do not copy and paste code. Think about it: typing in the code forces you to read every single character. Yes, it is very tedious, but you will learn much more this way. I'm assuming here that your goal is to learn the material, not simply to get the example programs to work. They are rather silly programs, so just getting them to work is not of much use.

Additional resources related to this book, including errata, can be found on my website,

bob.cs.sonoma.edu.

Development Environment

Most developers use an Integrated Development Environment (IDE), which hides the process of

building a program from source code. In this book we use the component programs individually

so that you can see what is taking place.

The examples in this book were compiled or assembled on a computer running Ubuntu 9.04.

The development programs used were:

gcc version 4.3.3

as version 2.19.1

In most cases compilation was done with no optimization (-O0) because the goal is to study

concepts, not create the most efficient code.

The examples should work in any x86_64 GNU development environment with gcc and as (binutils) installed. However, the machine code generated by the compiler may differ depending on its specific configuration and version. You will begin looking at compiler-generated assembly language in Chapter 7. What you see in your environment may differ from the examples in this book, but the differences should be consistent as you continue through the rest of the book.

You should also keep in mind that the programs used for development may have bugs. Yes, nobody is perfect. For example, when I upgraded my Ubuntu system from 9.04 to 9.10, the GNU assembler was upgraded from 2.19 to 2.20. The newer version had a bug that caused the line numbering in a particular listing file to start from 0 instead of 1. (It affected the C source code in Listing 7.6 on page 145; the numbers have been corrected in this listing.) Fortunately, this bug did not affect the quality of the final program, but it could cause some confusion to the programmer.

Organization of the Book

Data storage formats are covered in Chapters 2 and 3. Chapter 2 introduces the binary and hexadecimal number systems and presents the ASCII code for storing character data. Decimal integers, both signed and unsigned, are discussed in Chapter 3 along with the code used to store them. We use C programs to explore the concepts in Chapter 3. The C examples also provide an introduction to programming in C for those who have not used it yet. This introduction to C will be sufficient for the rest of the book.

Chapters 4 and 5 get down to the actual hardware level. Chapter 4 introduces the mathematics and electronic circuits used to build computers. There is a section on basic electronic circuit elements for those who are new to electronics. Then Chapter 5 moves on to some of the more common logic circuits used in computers. It ends with a discussion of memory implementations. If the book is being used for a software-only course, the instructor could consider skipping over these two chapters.

Chapter 6 introduces the central processing unit (CPU) and its relationship to memory and I/O. There is a description of how to use the gdb debugger to view the registers in the CPU. The basic set of registers used by programmers in the x86-64 architecture is given in this chapter.

Assembly language programming is introduced in Chapter 7. The topic is introduced by showing how to create a file containing the assembly language generated by the gcc compiler from C code. The basic assembly language template for a function is introduced, both for 64-bit and 32-bit mode. There is an overall sketch of how assemblers and linkers work.

In Chapter 8 we see how automatic variables are allocated on the stack, how values are assigned to them, and how functions are called. Argument passing, both in registers and on the stack, is discussed. The chapter shows how to call the write, read, printf, and scanf C Standard Library functions for user I/O. There is also a section on writing standalone programs that do not use the C environment and use the syscall instruction for direct operating system I/O.

Chapter 9 gives an introduction to machine code. There is a discussion of the REX codes

used in 64-bit mode. Two instructions, mov and add, are used as examples.

Program control flow, specifically repetition and binary decision, is covered in Chapter 10. Conditional jumps are discussed in this chapter.

Chapter 11 discusses how to write your own functions and use the arguments passed to them.

Both the 64-bit and 32-bit function interface techniques are described.

Bit-level logical and shift operations are covered in Chapter 12. The multiplication and

division instructions are also discussed.

Arrays and structs are discussed in Chapter 13. This chapter includes a discussion of how

simple C++ objects are implemented at both the C and the assembly language level.

Until this point in the book we have been using integers. In Chapter 14 we introduce formats for storing fractional values, including some IEEE 754 formats. In 64-bit mode the gcc compiler uses SSE2 instructions for floating point, but x87 instructions are used in 32-bit mode. The chapter gives an introduction to both instruction sets.

Exceptions and interrupts are discussed in Chapter 15. Chapter 16 is an introduction to

hardware level I/O. Since most students will never do I/O at this level, this is another chapter

that could be skipped.

A summary of the instructions used in this book is provided in Appendix A.5. At this point,

there is only a list of the instructions. Eventually, there will be a description of each of them.

Appendix B is a highly simplified discussion of the fundamental concepts of the make facility.

Appendix C provides a very brief tutorial on using gdb for assembly language programs.

Appendix D gives a very brief introduction to the gcc syntax for embedding assembly lan-

guage in a C function.

Almost all the solutions to the chapter exercises are provided in Appendix E. These can be useful for students who wish to use the exercises for self study; if you find yourself getting stuck on a problem, peek at the solution for some hints. Instructors are encouraged to discuss these solutions with their students. There is much to be learned from looking at another person's solution and thinking about how you might do it better.

The Bibliography lists a small fraction of the many books I have consulted when learning

this material. I urge you to look at this list of books. I believe that you will want at least some

of them in your reference library.

Suggested Usage

• Our course at Sonoma State University covers each chapter approximately in the book's order. The programming exercises in Chapters 2 and 3 get the students used to using the lab right from the beginning of the course. Hardware simulators are used in the lab for Chapters 4 and 5.

• A pure assembly language course could easily omit Chapters 4 and 5.

• In a curriculum where binary numbers are covered in another course, Chapters 2 and 3 could be skimmed. I recommend covering the C coding examples in Chapters 2 and 3 for students who have not programmed in the language. This would provide an introduction to C that should be adequate for the rest of the book.

• Experienced programmers who are using this book to learn x86-64 assembly language on their own should be able to skim the first five chapters. I believe that the remaining chapters would provide a good "primer" for reading the appropriate manuals.

Production of the Book

I used LaTeX 2ε to typeset and draw the figures for this book. The main text font is New Century Schoolbook and the font for code is Bera Mono scaled by 85%.

Acknowledgements

I would like to thank the many students who have taken assembly language from me. They

have asked many questions that caused me to think about the subject and how I can better

explain it. They are the main reason I have written this book.

My special thanks go to David Tran, a student who used this book in a class taught by

Michael Lyle at Santa Rosa Junior College in Fall 2010. David caught many of my typos and

errors, and gave me many helpful suggestions for clarifying my writing. I am very grateful for

his careful reading of the book and the time he spent providing me with his comments. It is

definitely a better book as a result of his diligence.

I wish to thank Richard Gordon, Lynn Stauffer, Allan B. Cruse, Michael Lyle, and Suzanne Rivoire for their thorough proofreading and critique of the previous versions of this book. By teaching from this book they have caught many of my errors and provided many excellent suggestions for clarifying the presentation.

In addition, I would like to thank my partner, João Barretto, for encouraging me to write this book and putting up with my many hours spent at my computer.

Chapter 1

Introduction

My goal is to make this book available as inexpensively as possible, but I would appreciate being paid for the work I did to write and produce it. As you know, a textbook like this would ordinarily cost $50 to $100 if it were published through a mainstream publisher. The author would probably get $5 to $15 of that cost. I am trying a different way to get paid a "royalty" here.

I have made the book freely available in pdf format at bob.cs.sonoma.edu. Corrections, updates, etc. for the book will also be posted there. As you can see from my copyright notice above, you can only be charged the cost of the printing or copying service for a print copy. I am leaving it up to you to decide how much of a "royalty" this book is worth to you and how much you can afford to pay.

If you wish to pay me a "royalty" for my work please send it to my personal email account,

plantz@cds1.net, using either

• your Amazon account at payments.amazon.com or

• your PayPal account at www.paypal.com

Both systems have a "Send Money" feature.

I want to emphasize that this is entirely voluntary on your part. The most important thing is for this book to serve your needs in learning this material. I would appreciate hearing any feedback you have about how I can improve the book to meet this goal.

Unlike most assembly language books, this one does not emphasize writing programs in

assembly language. Higher-level languages, e.g., C, C++, Java, are much better for that. You

should avoid writing in assembly language whenever possible.

You may wonder why you should study assembly language at all. The usual reasons given

are:

1. Assembly language is more efficient. This does not always hold. Modern compilers are excellent at optimizing the machine code that is generated. Only a very good assembly language programmer can do better, and only in some situations. Assembly language programming is very tedious, even for the best programmers. Hence, it is very expensive. The possible gains in efficiency are seldom worth the added expense.

2. There are situations where it must be used. This is more difficult to evaluate. How do you

know whether assembly language is required or not?

Both these reasons presuppose that you know the assembly language equivalent of the translation that your compiler does. Otherwise, you would have no way of deciding whether you can write a more efficient program in assembly language, and you would not know the machine level limitations of your higher-level language. So this book begins with the fundamental high-level language concepts and "looks under the hood" to see how they are implemented at the assembly language level.

There is a more important reason for reading this book. The interface to the hardware from a programmer's view is the instruction set architecture (ISA). This book is a description of the ISA of the x86 architecture as it is used by the C/C++ programming languages. Higher-level languages tend to hide the ISA from the programmer, but good programmers need to understand it. This understanding is bound to make you a better programmer, even if you never write a single assembly language statement after reading this book.

Some of you will enjoy assembly language programming and wish to carry on. If your interests take you into systems programming, e.g., writing parts of an operating system, writing a compiler, or even designing another higher-level language, an understanding of assembly language is required. There are many challenging opportunities in programming embedded systems, and much of the work in this area demands at least an understanding of the ISA. This book serves as an introduction to assembly language programming and prepares you to move on to the intermediate and advanced levels.

In his book The Design and Evolution of C++ [32], Bjarne Stroustrup nicely lists the purposes of a programming language:

• a tool for instructing machines

• a means of communicating between programmers

• a vehicle for expressing high-level designs

• a notation for algorithms

• a way of expressing relationships between concepts

• a tool for experimentation

• a means of controlling computerized devices.

It is assumed that you have had at least an introduction to programming that covered the first five items on the list. This book focuses on the first item, instructing machines, by studying assembly language programming of a 64-bit x86 architecture computer. We will use C as an example higher-level language and study how it instructs the computer at the assembly language level. Since there is a one-to-one correspondence between assembly language and machine language, this amounts to a study of how C is used to instruct a machine (computer).

You have already learned that a compiler (or interpreter) translates a program written in a

higher-level language into machine language, which the computer can execute. But what does

this mean? For example, you might wonder:

• How is an integer stored in memory?

• How is a computer instructed to implement an if-else construct?

• What happens when one function calls another function? How does the computer know how to return to the statement following the function call statement?

• How is a computer instructed to display a simple character string (for example, "Hello, world") on the screen?

It is the goal of this book to answer these and many other questions. The specific higher-level

programming language concepts that are addressed in this book include:


General concept                               C/C++ implementation
--------------------------------------------  ----------------------------------------
Program organization                          Functions, variables, literals
Allocation of variables for storage of        int, char
  primitive data types: integers, characters
Program flow control constructs:              while and for; if-else
  loops, two-way decision
Simple arithmetic and logical operations      +, -, *, /, %, &, |
Boolean operators                             !, &&, ||
Data organization constructs:                 Arrays, structs, classes (C++ only)
  arrays, records, objects
Passing data to/from named procedures         Function parameter lists; return values
Object operations                             Invoking a member function (C++ only)

This book assumes that you are familiar with these programming concepts in C, C++, and/or

Java.

1.1 Computer Subsystems

We begin with a very brief overview of computer hardware. The presentation here is intended

to provide you with a rough context of how things fit together. In subsequent chapters we will

delve into more details of the hardware and how it is controlled by software.

We can think of computer hardware as consisting of three separate subsystems as shown in

Fig. 1.1.

[Figure: block diagram of three boxes labeled CPU, Memory, and I/O, each connected to the Data Bus, the Address Bus, and the Control Bus.]

Figure 1.1: Subsystems of a computer. The CPU, Memory, and I/O subsystems communicate with one another via the three buses.

Central Processing Unit (CPU) controls most of the activities of the computer, performs the arithmetic and logical operations, and contains a small amount of very fast memory.

Memory provides storage for the instructions for the CPU and the data they manipulate.

Input/Output (I/O) communicates with the outside world and with mass storage devices (e.g., disks).

When you create a new program, you use an editor program to write your new program in a high-level language, for example, C, C++, or Java. The editor program sees the source code for your new program as data, which is typically stored in a file on the disk. Then you use a compiler program to translate the high-level language statements into machine instructions that are stored in a disk file. Just as with the editor program, the compiler program sees both your source code and the resulting machine code as data.

When it comes time to execute the program, the instructions are read from the machine

code disk file into memory. At this point, the program is a sequence of instructions stored in

memory. Most programs include some constant data that are also stored in memory. The CPU

executes the program by fetching each instruction from memory and executing it. The data are

also fetched as needed by the program.

This computer model, in which both the program instructions and data are stored in a memory unit that is separate from the processing unit, is referred to as the von Neumann architecture. It was described in 1945 by John von Neumann [35], although other computer science pioneers of the day were working with the same concepts. This is in contrast to a fixed-program computer, e.g., a calculator. A compiler illustrates one of the benefits of the von Neumann architecture. It is a program that treats the source file as data, which it translates into an executable binary file that is also treated as data. But the executable binary file can also be run as a program.

A downside of the von Neumann architecture is that a program can be written to view itself as data, thus enabling a self-modifying program. GNU/Linux, like most modern, general purpose operating systems, prohibits applications from modifying themselves.

Most programs also access I/O devices, and each access must also be programmed. I/O devices vary widely. Some are meant to interact with humans, for example, a keyboard, a mouse, a screen. Others are meant for machine readable I/O. For example, a program can store a file on a disk or read a file from a network. These devices all have very different behavior, and their timing characteristics differ drastically from one another. Since I/O device programming is difficult, and every program makes use of them, the software to handle I/O devices is included in the operating system. GNU/Linux provides a rich set of functions that an applications programmer can use to perform I/O actions, and we will call upon these services of GNU/Linux to perform our I/O operations. Before tackling I/O programming, you need to gain a thorough understanding of how the CPU executes programs and interacts with memory.

The goal of this book is to study how programs are executed by the computer. We will focus on how the program and data are stored in memory and how the CPU executes instructions. We leave I/O programming to more advanced books.

1.2 How the Subsystems Interact

The subsystems in Figure 1.1 communicate with one another via buses. You can think of a

bus as a communication pathway with a protocol specifying exactly how the pathway is used.

The buses shown here are logical groupings of the signals that must pass between the three subsystems. A given bus implementation may not have physically separate paths for each of the three types of signals. For example, the PCI bus standard uses the same physical pathway for the address and the data, but at different times. Control signals indicate whether there is an address or data on the lines at any given time.

A program consists of a sequence of instructions that is stored in memory. When the CPU is ready to execute the next instruction in the program, the location of that instruction in memory is placed on the address bus. The CPU also places a "read" signal on the control bus. The memory subsystem responds by placing the instruction on the data bus, where the CPU can then read it. If the CPU is instructed to read data from memory, the same sequence of events takes place.

If the CPU is instructed to store data in memory, it places the data on the data bus, places

the location in memory where the data is to be stored on the address bus, and places a "write"

signal on the control bus. The memory subsystem responds by copying the data on the data bus

into the specified memory location.
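The read and write sequences described above can be sketched as a toy model in C. This is only an illustration under stated assumptions: the names `bus_cycle`, `BUS_READ`, `BUS_WRITE`, and the 16-byte `memory` array are hypothetical and do not come from the book, and real buses involve timing and signalling that this sketch ignores.

```c
#include <stdint.h>

/* Hypothetical toy model; names and sizes are illustrative only. */
enum control { BUS_READ, BUS_WRITE };   /* the signal on the control bus */

#define MEM_SIZE 16u
static uint8_t memory[MEM_SIZE];        /* memory subsystem: an array of bytes */

/* One bus transaction. The CPU supplies the control signal and the address.
 * For a write it also supplies the data; for a read the memory supplies it.
 * The return value is what is on the data bus at the end of the cycle. */
uint8_t bus_cycle(enum control ctrl, uint32_t address, uint8_t data)
{
    address %= MEM_SIZE;                /* keep the toy address in range */
    if (ctrl == BUS_WRITE) {
        memory[address] = data;         /* memory copies the data bus into the location */
    }
    return memory[address];
}
```

Calling `bus_cycle(BUS_WRITE, 3, 0x5e)` followed by `bus_cycle(BUS_READ, 3, 0)` returns `0x5e` both times, mirroring a store followed by a load of the same location.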

If an instruction calls for reading or writing data from memory or to memory, the next instruction in the program sequence cannot be read from memory over the same bus until the current instruction has completed the data transfer. This conflict has given rise to another stored-program architecture. In the Harvard architecture the program and data are stored in different memories, each with its own bus connected to the CPU. This makes it possible for the CPU to access both program instructions and data simultaneously. The issues should become clearer to you in Chapter 6.

In modern computers the bus connecting the CPU to external memory modules cannot keep up with the execution speed of the CPU. The slowdown of the bus is called the von Neumann bottleneck. Almost all modern CPU chips include some cache memory, which is connected to the other CPU components with much faster internal buses. The cache memory closest to the CPU commonly has a Harvard architecture configuration to achieve higher throughput of data processing.

CPU interaction with I/O devices is essentially the same as with memory. If the CPU is instructed to read a piece of data from an input device, the particular device is specified on the address bus and a "read" signal is placed on the control bus. The device responds by placing the data item on the data bus. And the CPU can send data to an output device by placing the data item on the data bus, specifying the device on the address bus, and placing a "write" signal on the control bus. Since the timing of various I/O devices varies drastically from CPU and memory timing, special programming techniques must be used. Chapter 16 provides an introduction to I/O programming techniques.

These few paragraphs are intended to provide you a very general overall view of how computer hardware works. The rest of the book will explore many of these concepts in more depth. Most of the discussion is at the ISA level, but we will also take a peek at the hardware implementation. In Chapter 4 we will even look at some transistor circuits. The goal of the book is to provide you with an introduction to computer architecture as seen from a software point of view.

Chapter 2

Data Storage Formats

In this chapter, we begin exploring how data is encoded for storage in memory and write some programs in C to explore these concepts. One way to look at a modern computer is that it is made up of:

• Millions, perhaps billions, of two-state switches. Each of the switches is always in one state or the other, and it stays in that state until the control unit changes its state or the power is turned off.

• A control unit that can
    - detect the state of each switch and
    - possibly change the state of that switch and/or other switches.

There is also provision for communicating with the world outside the computer: input and output.

2.1 Bits and Groups of Bits

Since nearly everything that takes place in a computer, from the instructions that make up a

program to the data these instructions act upon, depends upon two-state switches, we need a

good notation to use when talking about the states of the switches. It is clearly very cumbersome

to say something like,

"The first switch is on, the second one is also on,
but the third is off, while the fourth is on."

We need a more concise notation, which leads us to use numbers. When dealing with numbers, you are most familiar with the decimal system, which is based on ten, and thus uses ten digits.

Decimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Two number systems are useful when talking about the states of switches: the binary system,

which is based on two,

Binary digits: 0, 1

and the hexadecimal system, which is based on sixteen.

Hexadecimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f

A less commonly used number system is octal, which is based on eight.

Octal digits: 0, 1, 2, 3, 4, 5, 6, 7

[Margin note: A bit represents the state of an on-off switch.]

"Binary digit" is commonly shortened to "bit." It is common to bypass the fact that a bit represents the state of a switch, and simply call the switches "bits." Using bits (binary digits), we can greatly simplify the previous statement about switches as 1101, which you can think of as representing "on, on, off, on." It does not matter whether we use 1 to represent "on" and 0 as "off," or 0 as "on" and 1 as "off." We simply need to be consistent. You will see that this will occur naturally; it will not be an issue.

[Margin note: Hexadecimal is shorthand for binary.]

Hexadecimal is commonly used as a shorthand notation to specify bit patterns. Since there are sixteen hexadecimal digits, each one can be used to specify uniquely a group of four bits. Table 2.1 shows the correspondence between each possible group of four bits and one hexadecimal digit. Thus, the above English statement specifying the state of four switches can be written with a single hexadecimal digit, d.

[Margin note: Memorize this table.]

Four binary digits (bits) One hexadecimal digit

0000 0

0001 1

0010 2

0011 3

0100 4

0101 5

0110 6

0111 7

1000 8

1001 9

1010 a

1011 b

1100 c

1101 d

1110 e

1111 f

Table 2.1: Hexadecimal representation of four bits.

When it is not clear from the context, we will indicate the base of a number in this text with a subscript. For example, $100_{10}$ is written in decimal, $100_{16}$ is written in hexadecimal, and $100_{2}$ is written in binary.

Hexadecimal digits are especially convenient when we need to specify the state of a group of,

say, 16 or 32 switches. In place of each group of four bits, we can write one hexadecimal digit.

For example,

$$0110\ 1100\ 0010\ 1010_{2} = \mathrm{6c2a}_{16}$$

and

$$0000\ 0001\ 0010\ 0011\ 1010\ 1011\ 1100\ 1101_{2} = \mathrm{0123\ abcd}_{16}$$
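The four-bits-per-hexadecimal-digit correspondence can be checked with a few lines of C. The sketch below is an illustration only; the helper names `nibble_to_bits` and `format_bits16` are hypothetical and not part of the book.

```c
/* Write the four bits of one hexadecimal digit (0..15) into out as text. */
void nibble_to_bits(unsigned int nibble, char out[5])
{
    for (int i = 0; i < 4; i++) {
        /* bit 3 of the nibble becomes the leftmost character */
        out[i] = ((nibble >> (3 - i)) & 1u) ? '1' : '0';
    }
    out[4] = '\0';
}

/* Format the low 16 bits of value as four space-separated groups of four
 * bits. out must have room for 20 characters (19 plus the terminator). */
void format_bits16(unsigned int value, char out[20])
{
    for (int group = 0; group < 4; group++) {
        unsigned int nibble = (value >> (12 - 4 * group)) & 0xfu;
        nibble_to_bits(nibble, out + 5 * group);
        /* replace the terminator with a separator, except after the last group */
        out[5 * group + 4] = (group < 3) ? ' ' : '\0';
    }
}
```

Passing `0x6c2a` to `format_bits16` produces the text `0110 1100 0010 1010`, matching the first example above.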

A single bit has limited usefulness when we want to store data. We usually need to use a group of bits to store a data item. This grouping of bits is so common that most modern computers only allow a program to access bits in groups of eight. Each of these groups is called a byte.

byte: A contiguous group of bits, usually eight.

Historically, the number of bits in a byte has varied depending on the hardware and the operating system. For example, the CDC 6000 series of scientific mainframe computers used a six-bit byte. Nearly everyone uses "byte" to mean eight bits today.

Another important reason to learn hexadecimal is that the programming language may not allow you to specify a value in binary. Prefixing a number with 0x (zero, lower-case ex) in C/C++ means that the number is expressed in hexadecimal. As of this writing there is no standard C/C++ syntax for writing a number in binary (the later C++14 and C23 standards add a 0b prefix). The syntax for specifying bit patterns in C/C++ is shown in Table 2.2. (The 32-bit pattern for the decimal value 123 will become clear after you read Sections 2.2 and 2.3.) Although the GNU assembler, as, includes a notation for specifying bit patterns in binary, it is usually more convenient to use the C/C++ notation.

[Margin note: Specifying bit patterns in your source code.]

              Prefix   Example   32-bit pattern (binary)
Decimal:      none     123       0000 0000 0000 0000 0000 0000 0111 1011
Hexadecimal:  0x       0x123     0000 0000 0000 0000 0000 0001 0010 0011
Octal:        0        0123      00 000 000 000 000 000 000 000 001 010 011

Table 2.2: C/C++ syntax for specifying literal numbers. Octal bits grouped by three for readability.
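A short C sketch makes the effect of the prefixes in Table 2.2 concrete: the same three digits name three different values. The constant and function names are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

/* The digits 1-2-3 written with each prefix from Table 2.2. */
static const unsigned int from_decimal = 123;    /* no prefix: base 10 */
static const unsigned int from_hex     = 0x123;  /* 0x prefix: base 16 */
static const unsigned int from_octal   = 0123;   /* leading 0: base 8  */

/* Print each value in decimal and as an eight-digit hexadecimal pattern. */
void show_literals(void)
{
    printf("123   = %u (0x%08x)\n", from_decimal, from_decimal);
    printf("0x123 = %u (0x%08x)\n", from_hex, from_hex);
    printf("0123  = %u (0x%08x)\n", from_octal, from_octal);
}
```

The three constants hold the decimal values 123, 291, and 83 respectively. The leading zero in `0123` is easy to overlook, which is a common source of bugs when numbers are padded with zeros for alignment.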

2.2 Mathematical Equivalence of Binary and Decimal

We have seen in the previous section that binary digits are the natural way to show the states of

switches within the computer and that hexadecimal is a convenient way to show the states of up

to four switches with only one character. Now we explore some of the mathematical properties

of the binary number system and show that it is numerically equivalent to the more familiar

decimal (base 10) number system. Showing the mathematical equivalence of the hexadecimal and decimal number systems is left as exercises at the end of this chapter.

We will consider only integers at this point. The mathematical presentation here does, of course, generalize to fractional values. Simply continue the exponents of the radix, r, on to negative values, i.e., n-1, n-2, ..., 1, 0, -1, -2, .... This will be covered in detail in Chapter 14.

By convention, we use a positional notation when writing numbers. For example, in the decimal number system, the integer 123 is taken to mean

$$1 \times 100 + 2 \times 10 + 3 \times 1$$

or

$$1 \times 10^{2} + 2 \times 10^{1} + 3 \times 10^{0}$$

The right-most digit (3 in this example) is the least significant digit because it "counts" the least

in the total value of this number. The left-most digit (1 in this example) is the most significant

digit because it "counts" the most in the total value of this number.

The base or radix of the decimal number system is ten. There are ten symbols for representing the digits: 0, 1, ..., 9. Moving a digit one place to the left increases its value by a factor of ten, and moving it one place to the right decreases its value by a factor of ten. The positional notation generalizes to any radix, r:

$$d_{n-1} \times r^{n-1} + d_{n-2} \times r^{n-2} + \cdots + d_{1} \times r^{1} + d_{0} \times r^{0} \tag{2.1}$$

where there are n digits in the number and each $d_i = 0, 1, \ldots, r-1$. The radix in the binary number system is 2, so there are only two symbols for representing the digits: $d_i = 0, 1$. We can specialize Equation 2.1 for the binary number system as

$$d_{n-1} \times 2^{n-1} + d_{n-2} \times 2^{n-2} + \cdots + d_{1} \times 2^{1} + d_{0} \times 2^{0} \tag{2.2}$$

where there are n digits in the number and each $d_i = 0, 1$.

For example, the eight-digit binary number 1010 0101 is interpreted as

$$1 \times 2^{7} + 0 \times 2^{6} + 1 \times 2^{5} + 0 \times 2^{4} + 0 \times 2^{3} + 1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0}$$

If we evaluate this expression in decimal, we get

$$128 + 0 + 32 + 0 + 0 + 4 + 0 + 1 = 165_{10}$$


This example illustrates the method for converting a number from the binary number system to the decimal number system. It is stated in Algorithm 2.1.

Algorithm 2.1: Convert binary to unsigned decimal.
input : An integer expressed in binary.
output: Decimal expression of the integer.
1 Compute the value of each power of 2 in Equation 2.2 in decimal.
2 Multiply each power of two by its corresponding $d_i$.
3 Sum the terms in Equation 2.2.
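One possible C rendering of Algorithm 2.1 is sketched below. It is an illustration, not the book's code: the function name `binary_to_unsigned` is hypothetical, and it assumes the input string contains only the characters '0', '1', and spaces used as group separators.

```c
#include <stddef.h>

/* Convert a string of binary digits to its unsigned value by summing
 * d_i * 2^i, working from the rightmost (least significant) digit.
 * Assumption: the string holds only '0', '1', and separator spaces. */
unsigned long binary_to_unsigned(const char *bits)
{
    size_t len = 0;
    while (bits[len] != '\0') {
        len++;
    }

    unsigned long sum = 0;
    unsigned long power = 1;            /* 2^0 for the rightmost digit */
    for (size_t k = len; k > 0; k--) {
        char c = bits[k - 1];
        if (c == ' ') {
            continue;                   /* separators carry no value */
        }
        if (c == '1') {
            sum += power;               /* the term d_i * 2^i with d_i = 1 */
        }
        power *= 2;                     /* next power of two */
    }
    return sum;
}
```

For the example in the text, `binary_to_unsigned("1010 0101")` returns 165.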

Be careful to distinguish the binary number system from writing the state of a bit in binary. Each switch in the computer can be represented by a bit (binary digit), but the entity that it represents may not even be a number, much less a number in the binary number system. For example, the bit pattern 0011 0010 represents the character "2" in the ASCII code for characters. But in the binary number system $0011\ 0010_{2} = 50_{10}$.
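The distinction between a bit pattern and the thing it represents can be seen directly in C, where the same byte can be read as a number or as a character. This is a small sketch with hypothetical names; it assumes the compiler uses ASCII for its character set, as GNU/Linux systems do.

```c
/* The bit pattern 0011 0010, written in hexadecimal. */
static const unsigned char pattern = 0x32;

/* Interpret the pattern as a number in the binary number system. */
int pattern_as_number(void)
{
    return pattern;
}

/* Interpret the same pattern as a character code (ASCII assumed). */
char pattern_as_character(void)
{
    return (char)pattern;
}
```

The first function returns 50 and the second returns the character '2'. Nothing in the stored bits says which interpretation is intended; that is decided by the code that uses them.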

See Exercises 2-8 and 2-9 for converting hexadecimal to decimal.

2.3 Unsigned Decimal to Binary Conversion

[Margin note: A positive signed number is not unsigned.]

In Section 2.2, we covered conversion of a binary number to decimal. In this section we will learn how to convert an unsigned decimal integer to binary. Unsigned numbers have no sign. Signed numbers can be either positive or negative. Say we wish to convert an unsigned decimal integer, N, to binary. We set it equal to the expression in Equation 2.2, giving us:

$$N = d_{n-1} \times 2^{n-1} + d_{n-2} \times 2^{n-2} + \cdots + d_{1} \times 2^{1} + d_{0} \times 2^{0} \tag{2.3}$$

where $d_i = 0$ or $1$. Dividing both sides by 2,

$$(N / 2) + \frac{r_0}{2} = d_{n-1} \times 2^{n-2} + d_{n-2} \times 2^{n-3} + \cdots + d_{1} \times 2^{0} + d_{0} \times 2^{-1} \tag{2.4}$$

where / is the div operator and the remainder, $r_0$, is 0 or 1. Since (N / 2) is an integer and all the terms except the $2^{-1}$ term on the right-hand side of Equation 2.4 are integers, we can see that $d_0 = r_0$. Subtracting $r_0/2$ from both sides gives,

$$(N / 2) = d_{n-1} \times 2^{n-2} + d_{n-2} \times 2^{n-3} + \cdots + d_{1} \times 2^{0} \tag{2.5}$$

Dividing both sides of Equation 2.5 by two:

$$(N / 4) + \frac{r_1}{2} = d_{n-1} \times 2^{n-3} + d_{n-2} \times 2^{n-4} + \cdots + d_{1} \times 2^{-1} \tag{2.6}$$

From Equation 2.6 we see that $d_1 = r_1$. It follows that the binary representation of a number can be produced from right (low-order bit) to left (high-order bit) by applying the algorithm shown in Algorithm 2.2.

Algorithm 2.2: Convert unsigned decimal to binary.
input : An integer expressed in decimal.
output: Binary expression of the integer, one bit at a time, right-to-left.
1 quotient ← theInteger;
2 while quotient ≠ 0 do
3     nextBit ← quotient % 2;
4     quotient ← quotient / 2;


Example 2-a

Convert $123_{10}$ to binary.

123 ÷ 2 = 61 + 1/2    d_0 = 1
 61 ÷ 2 = 30 + 1/2    d_1 = 1
 30 ÷ 2 = 15 + 0/2    d_2 = 0
 15 ÷ 2 =  7 + 1/2    d_3 = 1
  7 ÷ 2 =  3 + 1/2    d_4 = 1
  3 ÷ 2 =  1 + 1/2    d_5 = 1
  1 ÷ 2 =  0 + 1/2    d_6 = 1
  0 ÷ 2 =  0 + 0/2    d_7 = 0

So

$$123_{10} = d_7 d_6 d_5 d_4 d_3 d_2 d_1 d_0 = 01111011_{2} = \mathrm{7b}_{16}$$
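The repeated division of Algorithm 2.2 might be written in C roughly as follows. This is a sketch with a hypothetical function name, and it departs from the algorithm in one labeled way: instead of stopping when the quotient reaches zero, it always produces `width` bits, so that leading zeros appear as they do in the example above.

```c
/* Fill out[] with the low `width` bits of value, most significant first,
 * followed by a terminating NUL. The bits are produced right to left by
 * the remainder/quotient steps of Algorithm 2.2.
 * Assumption: out has room for width + 1 characters. */
void unsigned_to_binary(unsigned int value, int width, char *out)
{
    unsigned int quotient = value;
    for (int pos = width - 1; pos >= 0; pos--) {
        unsigned int next_bit = quotient % 2;   /* the remainder is the next bit */
        out[pos] = next_bit ? '1' : '0';
        quotient = quotient / 2;                /* integer (div) division */
    }
    out[width] = '\0';
}
```

With `value` 123 and `width` 8 the buffer receives `01111011`, the same digits obtained by hand in the example.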

There are times in some programs when it is more natural to specify a bit pattern rather than a decimal number. We have seen that it is possible to easily convert between the number bases, so you could convert the bit pattern to a decimal value, then use that. It is usually much easier to think of the bits in groups of four, then convert the pattern to hexadecimal.

[Margin note: Use hex to specify bit patterns.]

For example, if your algorithm required the use of zeros alternating with ones:

0101 0101 0101 0101 0101 0101 0101 0101

this can be converted to the decimal value

1431655765

or the hexadecimal value (shown here in C/C++ syntax)

0x55555555

Once you have memorized Table 2.1, it is clearly much easier to work with hexadecimal for bit patterns.
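As a quick check that the hexadecimal and decimal forms name the same 32-bit pattern, the following C sketch builds the alternating pattern two bits at a time. The function name is hypothetical and the code is only an illustration.

```c
#include <stdint.h>

/* Build the 32-bit pattern 0101...01 by appending the bit pair 01
 * sixteen times. */
uint32_t build_alternating(void)
{
    uint32_t pattern = 0;
    for (int i = 0; i < 16; i++) {
        pattern = (pattern << 2) | 1u;   /* shift in "01" on the right */
    }
    return pattern;
}
```

The result compares equal to both `0x55555555` and `1431655765`, but only the hexadecimal literal makes the repeating 0101 groups visible in the source code.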

The discussion in these two sections has dealt only with unsigned integers. The representation of signed integers depends upon some architectural features of the CPU and will be discussed in Chapter 3 when we discuss computer arithmetic.

2.4 Memory: A Place to Store Data (and Other Things)

We now have the language necessary to begin discussing the major components of a computer.

We start with the memory.

[Margin note: Each byte in memory is numbered.]

You can think of memory as a (very long) array of bytes. Each byte has a particular location (or address) within this array. That is, you could think of

memory[123]

as specifying the 124th byte in memory. (Don't forget that array indexing starts with 0.) We generally do not use array notation and simply use the index number, calling it the address or location of the byte.

address (or location): Identifies a specific byte in memory.

The address of a particular byte never changes. That is, the 957th byte from the beginning of memory will always remain the 957th byte. However, the state of each of the bits (either 0 or 1) in any given byte can be changed.


Computer scientists typically express the address of each byte in memory in hexadecimal. So we would say that the 957th byte is at address 0x3bc.

From the discussion of hexadecimal in Section 2.1 we can see that the first sixteen bytes in memory have the addresses 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, and f. Using the notation

address: contents (bit-pattern-at-the-address)

we show the (possible) contents (the state of the bits) of each of the first sixteen bytes of memory in Figure 2.1.

Address Contents Address Contents

00000000: 0110 1010 00000008: 1111 0000

00000001: 1111 0000 00000009: 0000 0010

00000002: 0101 1110 0000000a: 0011 0011

00000003: 0000 0000 0000000b: 0011 1100

00000004: 1111 1111 0000000c: 1100 0011

00000005: 0101 0001 0000000d: 0011 1100

00000006: 1100 1111 0000000e: 0101 0101

00000007: 0001 1000 0000000f: 1010 1010

Figure 2.1: Possible contents of the first sixteen bytes of memory; addresses shown in hexadecimal, contents shown in binary. Note that the addresses are shown as 32-bit values. (The contents shown here are arbitrary.)

[Margin note: Each hexadecimal digit represents four bits.]

The state of each bit is indicated by a binary digit (bit) and is arbitrary in Figure 2.1. The bits have been grouped by four for readability. The grouping of the memory bits also shows that we can use two hexadecimal digits to indicate the state of the bits in each byte, as shown in Figure 2.2. For example, the contents of memory location 0000000b are 3c. That means the eight bits that make up the twelfth byte in memory are set to the bit pattern 0011 1100.

Address Contents Address Contents

00000000: 6a 00000008: f0

00000001: f0 00000009: 02

00000002: 5e 0000000a: 33

00000003: 00 0000000b: 3c

00000004: ff 0000000c: c3

00000005: 51 0000000d: 3c

00000006: cf 0000000e: 55

00000007: 18 0000000f: aa

Figure 2.2: Repeat of Figure 2.1 with contents shown in hex. Two hexadecimal characters are required to specify one byte.
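The "address: contents" notation of Figure 2.2 is easy to reproduce in C by treating memory as an array of bytes. The sketch below is illustrative: the array is a stand-in for real memory, its values are copied from Figure 2.2, and the name `format_location` is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in for the first sixteen bytes of memory; values from Figure 2.2.
 * The array index plays the role of the address. */
static const unsigned char memory[16] = {
    0x6a, 0xf0, 0x5e, 0x00, 0xff, 0x51, 0xcf, 0x18,
    0xf0, 0x02, 0x33, 0x3c, 0xc3, 0x3c, 0x55, 0xaa
};

/* Format one "address: contents" line into out, with the address as eight
 * hexadecimal digits and the byte as two. Addresses wrap at sixteen so
 * that the toy array is never indexed out of range. */
int format_location(unsigned int address, char *out, size_t size)
{
    return snprintf(out, size, "%08x: %02x",
                    address, (unsigned int)memory[address % 16]);
}
```

Formatting address `0xb` yields `0000000b: 3c`, the twelfth byte discussed in the text.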

Once a bit (switch) in memory is set to either zero or one, it stays in that state until the control unit actively changes it or the power is turned off. There is an exception. Computers also contain memory in which the bits are permanently set. Such memory is called Read Only Memory or ROM.

Read Only Memory (ROM): Each bit is permanently set to either zero or one. The control unit can read the state of each bit but cannot change it.

You have probably heard the term "RAM" used for memory that can be changed by the control

unit. RAM stands for Random Access Memory. The terminology used here is inconsistent.

"Random access" means that it takes the same amount of time to access any byte in the memory.

This is in contrast to memory that is sequentially accessible, e.g., tape. The length of time it takes to access a byte on tape depends upon the physical location of the byte with respect to the current tape position.

Random Access Memory (RAM): The control unit can read the state of each bit and can change it.

A bit can be used to store data. For example, we could use a single bit to indicate whether a student passes a course or not. We might use 0 for "not passed" and 1 for "passed." A single bit allows only two possible values of a data item. We cannot, for example, use a single bit to store a course letter grade: A, B, C, D, or F.

How many bits would we need to store a letter grade? Consider all possible combinations of

two bits:

00

01

10

11

Since there are only four possible bit combinations, we cannot represent all five letter grades with only two bits. Let's add another bit and look at all possible bit combinations:

000

001

010

011

100

101

110

111

There are eight possible bit patterns, which is more than sufficient to store any one of the five letter grades. For example, we may choose to use the code

Letter Grade Bit Pattern

A 000

B 001

C 010

D 011

F 100

This example illustrates two issues that a programmer must consider when storing data in memory, in addition to its location(s):

• How many bits are required to store the data? In order to answer this we need to know how many different values are allowed for the particular data item. Study the two examples above (two bits and three bits) and you can see that adding a bit doubles the number of possible values. Also, notice that we might not use all the possible bit patterns.

• What is the code for storing the data? Most of the data we deal with in everyday life is not expressed in terms of zeros and ones. In order to store it in computer memory, the programmer must decide upon a code of zeros and ones to use. In the above (three-bit) example we used 000 to represent a letter grade of A, 001 to represent B, etc.

Thus, in the grade example, a programmer may choose to store the letter grade at byte number bffffed0 in memory. If the grade is "A", the programmer would set the bit pattern at location bffffed0 to 00₁₆. If the grade is "C", the programmer would set the bit pattern at location bffffed0 to 02₁₆. In this example, one of the jobs of an assembly language programmer would be to determine how to set the bit pattern at byte number bffffed0 to the appropriate bit pattern.

High-level languages use data types to determine the number of bits and the storage code. For example, in C you may choose to store the letter grades in the above example in a char variable and use the characters 'A', 'B', . . . , 'F' to indicate the grade. In Section 2.7 you will learn that the compiler would use the following storage formats:

Letter Grade Bit Pattern

A 0100 0001

B 0100 0010

C 0100 0011

D 0100 0100

F 0100 0110

And programming languages, even assembly language, allow programmers to create symbolic names for memory addresses. The compiler (or assembler) determines the correspondence between the programmer's symbolic name and the numerical address. The programmer can refer to the address by simply using the symbolic name.

2.5 Using C Programs to Explore Data Formats

Before writing any programs, I urge you to read Appendix B on writing Makefiles, even if you are familiar with them. Many of the problems I have helped students solve are due to errors in their Makefile. And many of the Makefile errors go undetected due to the default behavior of the make program.

We will use the C programming language to illustrate these concepts because it takes care of the memory allocation problem, yet still allows us to get reasonably close to the hardware. You probably learned to program in the higher-level, object-oriented paradigm using either C++ or Java. C does not support the object-oriented paradigm.

C is a procedural programming language. The program is divided into functions. Since there are no classes in C, there is no such thing as a member function. The programmer focuses on the algorithms used in each function, and all data items are explicitly passed to the functions.

We can see how this works by exploring the C Standard Library functions, printf and scanf, which are used to write to the screen and read from the keyboard. We will develop a program in C using printf and scanf to illustrate the concepts discussed in the previous sections. The header file required by either of these functions is:

#include <stdio.h>

which includes the prototype statements for the printf and scanf functions:

int printf(const char *format, ...);

int scanf(const char *format, ...);

Use printf for formatted output to the screen and scanf for formatted input from the keyboard.

printf is used to display text on the screen. The first argument, format, controls the text display. At its simplest, format is simply an explicit text string in double quotes.¹ For example,

printf("Hello, world.\n");

would display

Hello, world.

If there are additional arguments, the format string must specify how each of these arguments is to be converted for display. This is accomplished by inserting a conversion code within the format string at the point where the argument value is to be displayed. Each conversion code is introduced by the '%' character. For example, Listing 2.1 shows how to display both an int variable and a float variable.

¹ The text string is a null-terminated array of characters as described in Section 2.7 (page 19). This is not the C++ string class.

/*
 * intAndFloat.c
 * Using printf to display an integer and a float.
 * Bob Plantz - 4 June 2009
 */
#include <stdio.h>

int main(void)
{
    int anInt = 19088743;
    float aFloat = 19088.743;

    printf("The integer is %i and the float is %f\n", anInt, aFloat);

    return 0;
}

Listing 2.1: Using printf to display numbers.

A run of the program in Listing 2.1 on my computer gave (user input is boldface):

bob$ ./intAndFloat

The integer is 19088743 and the float is 19088.742188

bob$

Yes, the float really is that far off. This will be explained in Chapter 14.

Some common conversion codes are d or i for integer, f for float, and x for hexadecimal. The conversion codes may include other characters to specify properties like the field width of the display, whether the value is left or right justified within the field, etc. We will not cover the details here. You should read section 3 of the man page for printf to learn more.

scanf is used to read from the keyboard. The format string typically includes only conversion codes that specify how to convert each value as it is entered from the keyboard and stored in the following arguments. Since the values will be stored in variables, it is necessary to pass the address of each variable to scanf. For example, we can store keyboard-entered values in x (an int variable) and y (a float variable) as follows:

scanf("%i %f", &x, &y);

scanf needs the address of each variable.

The use of printf and scanf is illustrated in the C program in Listing 2.2, which will allow us to explore the mathematical equivalence of the decimal and hexadecimal number systems.

/*
 * echoDecHex.c
 * Asks user to enter a number in decimal and one
 * in hexadecimal then echoes both in both bases
 * Bob Plantz - 4 June 2009
 */

#include <stdio.h>

int main(void)
{
    int x;
    unsigned int y;

    while(1)
    {
        printf("Enter a decimal integer (0 to quit): ");
        scanf("%i", &x);
        if (x == 0) break;

        printf("Enter a bit pattern in hexadecimal (0 to quit): ");
        scanf("%x", &y);
        if (y == 0) break;

        printf("%i is stored as %#010x, and\n", x, x);
        printf("%#010x represents the decimal integer %i\n\n", y, y);
    }

    printf("End of program.\n");

    return 0;
}

Listing 2.2: C program showing the mathematical equivalence of the decimal and hexadecimal number systems.

Here is an example run of this program (user input is boldface):

bob$ ./echoDecHex

Enter a decimal integer: 123

Enter a bit pattern in hexadecimal: 7b

123 is stored as 0x0000007b, and

0x0000007b represents the decimal integer 123

Enter a decimal integer: 0

End of program.

bob$

Let us walk through the program in Listing 2.2.

• The program declares an int, x, and an unsigned int, y.

• The user is prompted to enter an integer in decimal, and the user's response is read from the keyboard and stored in the memory allocated for x. The conversion code text string passed to scanf, "%i", causes scanf to interpret the user's keystrokes as representing a decimal integer. (With "%i", scanf also accepts a leading 0x for hexadecimal or a leading 0 for octal; "%d" accepts only decimal.) Note that the address of x, &x, must be passed to scanf so that it can store the integer at the memory location named x.

• The program next prompts the user to enter a bit pattern in hexadecimal. In this case the conversion code text string passed to scanf is "%x", which causes scanf to interpret the user's keystrokes as representing hexadecimal digits. Note that the address of y, &y, must be passed to scanf so that it can store the integer at the memory location named y.

• Now let us examine the two printf function calls that display the results. The "%i" conversion code is straightforward. The value of the corresponding variable is displayed in decimal at that point in the text string.

• The "%#010x" conversion code is more interesting. (If you are at a computer, read section 3 of the man page for printf as you follow through this description.) The basic conversion is specified by the "x" character; it causes the value to be displayed in hexadecimal. The "#" character causes an "alternate form" to be used for the display, which is the C syntax for hexadecimal numbers; that is, the value is prefaced by 0x when it is displayed. The '0' character immediately after the '#' character causes '0' to be used as the fill character. The number "10" causes the display to occupy at least ten characters (the field width).

• Look carefully at the output from this program above. The bit patterns used to store the data input by the user, shown in hexadecimal, show that the integers are stored in the binary number system (see Section 2.2, page 8 and Section 2.3, page 9). That is, 123₁₀ is stored as 0000007b₁₆.

The program in Listing 2.2 demonstrates a very important concept: hexadecimal is used as a human convenience for stating bit patterns. A number is not inherently binary, decimal, or hexadecimal. A particular value can be expressed in a precisely equivalent way in each of these three number bases. For that matter, it can be expressed equivalently in any number base.

Hex is for humans.


2.6 Examining Memory With gdb

Now that we have started writing programs, you need to learn how to use the GNU debugger, gdb. It may seem premature at this point. The programs are so simple, they hardly require debugging. Well, it is better to learn how to use the debugger on a simple example than on a complicated program that does not work. In other words, tackle one problem at a time.

There is a better reason for learning how to use gdb now. You will find that it is a very valuable tool for learning the material in this book, even when you write bug-free programs.

gdb has a large number of commands, but the following are the ones that will be used in this section:

Useful gdb commands.

• li lineNumber: lists ten lines of the source code, centered at the specified line number.

• break sourceFilename:lineNumber: sets a breakpoint at the specified line in the source file. Control will return to gdb when the line number is encountered.

• run: begins execution of a program that has been loaded under control of gdb.

• cont: continues execution of a program that has been running.

• print expression: evaluates expression and displays its value.

• printf "format", var1, var2,...: displays the values of the vars, using the format specified in the format string.²

• x/nfs memoryAddress: displays (examines) n values in memory in format f of size s, starting at memoryAddress.

We will use the program in Listing 2.1 to see how gdb can be used to explore the concepts in more depth. Here is a screen shot of how I compiled the program then used gdb to control the execution of the program and observe the memory contents. My typing is boldface and the session is annotated in italics. Note that you will probably see different addresses if you replicate this example on your own (Exercise 2-27).

² Follows the same pattern as the C Standard Library printf.

bob$ gcc -g -o intAndFloat intAndFloat.c

The "-g" option is required. It tells the compiler to include debugger information in the executable program.

bob$ gdb ./intAndFloat
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) li
1       /*
2        * intAndFloat.c
3        * Using printf to display an integer and a float.
4        * Bob Plantz - 4 Jun 2009
5        */
6       #include <stdio.h>
7
8       int main(void)
9       {
10          int anInt = 19088743;
(gdb)
11          float aFloat = 19088.743;
12
13          printf("The integer is %i and the float is %f\n", anInt, aFloat);
14
15          return 0;
16      }
(gdb)

The li command lists ten lines of source code. The display ends with the (gdb) prompt. Pushing the return key will repeat the previous command, and li is smart enough to display the next (up to) ten lines.

(gdb) br 13

Breakpoint 1 at 0x400523: file intAndFloat.c, line 13.

I set a breakpoint at line 13. When the program is executing, if it ever gets to this statement, execution will pause before the statement is executed, and control will return to gdb.

(gdb) run

Starting program: /home/bob/intAndFloat

Breakpoint 1, main () at intAndFloat.c:13

13 printf("The integer is %i and the float is %f\n", anInt, aFloat);

The run command causes the program to start execution from the beginning. When it reaches our breakpoint, control returns to gdb.

floats are not more accurate than ints.

(gdb) print anInt

$1 = 19088743

(gdb) print aFloat

$2 = 19088.7422

The print command displays the value currently stored in the named variable. There is a round off error in the float value. As mentioned above, this will be explained in Chapter 14.

(gdb) printf "anInt = %i and aFloat = %f\n", anInt, aFloat

anInt = 19088743 and aFloat = 19088.742188

(gdb) printf "anInt = %#010x and aFloat = %#010x\n", anInt, aFloat

anInt = 0x01234567 and aFloat = 0x00004a90

The printf command can be used to format the displayed values. The formatting string is essentially the same as for the printf function in the C Standard Library.

Take a moment and convert the hexadecimal values to decimal. The value of anInt is correct, but the value of aFloat is 19088₁₀. The reason for this odd behavior is that the x formatting character in the printf function first converts the value to an int, then displays that int in hexadecimal. In C/C++, conversion from float to int truncates the fractional part.

Fortunately, gdb provides another command for examining the contents of memory directly, that is, the actual bit patterns. In order to use this command, we need to determine the actual memory addresses where the anInt and aFloat variables are stored.


(gdb) print &anInt
$3 = (int *) 0x7fff86b6ddfc
(gdb) print &aFloat
$4 = (float *) 0x7fff86b6ddf8

The address-of operator (&) can be used to print the address of a variable. Notice that the addresses are very large. The system is in 64-bit mode, which uses 64-bit addresses. (gdb does not display leading zeros.)

(gdb) help x

Examine memory: x/FMT ADDRESS.

ADDRESS is an expression for the memory address to examine.

FMT is a repeat count followed by a format letter and a size letter.

Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),

t(binary), f(float), a(address), i(instruction), c(char) and s(string).

Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).

The specified number of objects of the specified size are printed

according to the format.

Defaults for format and size letters are those previously used.

Default count is 1. Default address is following last thing printed

with this command or "print".

The x command is used to examine memory. Its help message is very brief, but it tells you everything you need to know.

(gdb) x/1dw 0x7fff86b6ddfc

0x7fff86b6ddfc: 19088743

(gdb) x/1fw 0x7fff86b6ddf8

0x7fff86b6ddf8: 19088.7422

The x command can be used to display the values in their stored data type.

(gdb) x/1xw 0x7fff86b6ddfc

0x7fff86b6ddfc: 0x01234567

(gdb) x/4xb 0x7fff86b6ddfc
0x7fff86b6ddfc: 0x67 0x45 0x23 0x01

This shows little endian, as explained below.

The display of the anInt variable in hexadecimal, which is located at memory address 0x7fff86b6ddfc, also looks good. However, when displaying these same four bytes as separate values, the least significant byte appears first in memory.

Notice that in the multiple-byte display, the first byte (the one that contains 0x67) is located at the address shown on the left of the row. The next byte in the row is at the subsequent address (0x7fff86b6ddfd). So this row displays each of the bytes stored at the four memory addresses 0x7fff86b6ddfc, 0x7fff86b6ddfd, 0x7fff86b6ddfe, and 0x7fff86b6ddff.

(gdb) x/1fw 0x7fff86b6ddf8

0x7fff86b6ddf8: 19088.7422

(gdb) x/1xw 0x7fff86b6ddf8

0x7fff86b6ddf8: 0x4695217c

(gdb) x/4xb 0x7fff86b6ddf8

0x7fff86b6ddf8: 0x7c 0x21 0x95 0x46

The display of the aFloat variable in hexadecimal simply looks wrong. This is due to the storage format of floats, which is very different from ints. It will be explained in Chapter 14.

The byte by byte display of the aFloat variable in hexadecimal also shows that it is stored in little endian order.

(gdb) cont

Continuing.

The integer is 19088743 and the float is 19088.742188

Program exited normally.

(gdb) q

bob$

Finally, I continue to the end of the program. Notice that gdb is still running and I have to quit the gdb program.

This example illustrates a property of the x86 processors. Data is stored in memory with the least significant byte in the lowest-numbered address. This is called little endian storage. Look again at the display of the four bytes beginning at 0x7fff86b6ddfc above. We can rearrange this display to show the bit patterns at each of the four locations:

7fff86b6ddfc: 67

7fff86b6ddfd: 45

7fff86b6ddfe: 23

7fff86b6ddff: 01

Yet when we look at the entire 32-bit value in hexadecimal the bytes seem to be arranged in the proper order:

7fff86b6ddfc: 01234567

When we examine memory one byte at a time, each byte is displayed in numerically ascending addresses. At first glance, the value appears to be stored backwards.

We should note here that many processors, e.g., the PowerPC architecture, use big endian

storage. As the name suggests, the most significant ("biggest") byte is stored in the first (lowest-

numbered) memory address. If we ran the program above on a big endian computer, we would

see (assuming the variable is located at the same address):

(gdb) x/1xw 0x7fff86b6ddfc

0x7fff86b6ddfc: 0x01234567

(gdb) x/4xb 0x7fff86b6ddfc
0x7fff86b6ddfc: 0x01 0x23 0x45 0x67

Big endian computer, not ours!

Generally, you do not need to worry about endianness in a program. It becomes a concern when data is stored as one data type, then accessed as another.

2.7 ASCII Character Code

Almost all programs perform a great deal of text string manipulation. Text strings are made up

of groups of characters. The first program you wrote was probably a "Hello world" program. If

you wrote it in C, you used a statement like:

printf("Hello world\n");

and in C++:

cout << "Hello world\n";

When translating either of these statements into machine code, the compiler must do two things:

• store each of the characters in a location in memory where the control unit can access them, and

• generate the machine instructions to write the characters on the screen.

bit                              bit            bit         bit
pat.  char                       pat.  char     pat.  char  pat.  char
00    NUL (Null)                 20    (Space)  40    @     60    `
01    SOH (Start of Hdng)        21    !        41    A     61    a
02    STX (Start of Text)        22    "        42    B     62    b
03    ETX (End of Text)          23    #        43    C     63    c
04    EOT (End of Transmit)      24    $        44    D     64    d
05    ENQ (Enquiry)              25    %        45    E     65    e
06    ACK (Acknowledge)          26    &        46    F     66    f
07    BEL (Bell)                 27    '        47    G     67    g
08    BS (Backspace)             28    (        48    H     68    h
09    HT (Horizontal Tab)        29    )        49    I     69    i
0a    LF (Line Feed)             2a    *        4a    J     6a    j
0b    VT (Vertical Tab)          2b    +        4b    K     6b    k
0c    FF (Form Feed)             2c    ,        4c    L     6c    l
0d    CR (Carriage Return)       2d    -        4d    M     6d    m
0e    SO (Shift Out)             2e    .        4e    N     6e    n
0f    SI (Shift In)              2f    /        4f    O     6f    o
10    DLE (Data-Link Escape)     30    0        50    P     70    p
11    DC1 (Device Control 1)     31    1        51    Q     71    q
12    DC2 (Device Control 2)     32    2        52    R     72    r
13    DC3 (Device Control 3)     33    3        53    S     73    s
14    DC4 (Device Control 4)     34    4        54    T     74    t
15    NAK (Negative ACK)         35    5        55    U     75    u
16    SYN (Synchronous idle)     36    6        56    V     76    v
17    ETB (End of Trans. Block)  37    7        57    W     77    w
18    CAN (Cancel)               38    8        58    X     78    x
19    EM (End of Medium)         39    9        59    Y     79    y
1a    SUB (Substitute)           3a    :        5a    Z     7a    z
1b    ESC (Escape)               3b    ;        5b    [     7b    {
1c    FS (File Separator)        3c    <        5c    \     7c    |
1d    GS (Group Separator)       3d    =        5d    ]     7d    }
1e    RS (Record Separator)      3e    >        5e    ^     7e    ~
1f    US (Unit Separator)        3f    ?        5f    _     7f    DEL (Delete)

Table 2.3: ASCII code for representing characters. The bit patterns (bit pat.) are shown in

hexadecimal.

We start by considering how a single character is stored in memory. There are many codes for representing characters, but the most common one is the American Standard Code for Information Interchange, ASCII (pronounced "ask' e"). It uses seven bits to represent each character. Table 2.3 shows the bit patterns for each character in hexadecimal.

It is not the sort of table that you would memorize. However, you should become familiar with some of its general characteristics. In particular, notice that the numerical characters, '0' . . . '9', are in a contiguous sequence in the code, 0x30 . . . 0x39. The same is true of the lower case alphabetic characters, 'a' . . . 'z', and of the upper case characters, 'A' . . . 'Z'. Notice that the lower case alphabetic characters are numerically higher than the upper case.

Use the "man ascii" GNU/Linux command.

The codes in the left-hand column of Table 2.3 (00 through 1f) define control characters. The ASCII code was developed in the 1960s for transmitting data from a sender to a receiver. If you read some of the names of the control characters, you can imagine how they could be used to control the "dialog" between the sender and receiver. They are generated on a keyboard by holding the control key down while pressing an alphabetic key. For example, ctrl-d generates an EOT (End of Transmission) character.

ASCII codes are usually stored in the rightmost seven bits of an eight-bit byte. The eighth bit (the highest-order bit) is called the parity bit. It can be used for error detection in the following way. The sender and receiver would agree ahead of time whether to use even parity or odd parity. Even parity means that an even number of ones is always transmitted in each character; odd means that an odd number of ones is transmitted. Before transmitting a character in the ASCII code, the sender would adjust the eighth bit such that the total number of ones matched the even or odd agreement. When the code was received, the receiver would count the ones in each eight-bit byte. If the sum did not match the agreement, the receiver knew that one of the bits in the byte had been received incorrectly. Of course, if two bits had been incorrectly received, the error would pass undetected, but the chances of this double error are remarkably small. Modern communication systems are much more reliable, and parity is seldom used when sending individual bytes.

In some environments the high-order bit is used to provide a code for special characters. A little thought will show you that even all eight bits will not support all languages, e.g., Greek, Russian, Chinese. The Unicode character coding has recently been adopted to support documents that use other characters. Java uses Unicode, and C libraries that support Unicode are also available.

A computer system that uses an ASCII video system (most modern computers) can be programmed to send a byte to the screen. The video system interprets the bit pattern as an ASCII code (from Table 2.3) and displays the corresponding character on the screen.

Getting back to the text string, "Hello world\n", the compiler would store this as a constant char array. There must be a way to specify the length of the array. In a C-style string this is accomplished by using the sentinel character NUL at the end of the string. So the compiler must allocate thirteen bytes for this string. An example of how this string is stored in memory is shown in Figure 2.3. Notice that C uses the LF character as a single newline character even though the C syntax requires that the programmer write two characters, '\n'. The area of memory shown includes the three bytes immediately following the text string.

C-style strings are terminated with a NUL character, not a newline.

Address Contents

4004a1: 48

4004a2: 65

4004a3: 6c

4004a4: 6c

4004a5: 6f

4004a6: 20

4004a7: 77

4004a8: 6f

4004a9: 72

4004aa: 6c

4004ab: 64

4004ac: 0a

4004ad: 00

4004ae: 25

4004af: 73

4004b0: 00

Figure 2.3: A text string stored in memory by a C compiler, including three "garbage" bytes after the string. Values are shown in hexadecimal. A different compilation will likely place the string in a different memory location.

In Pascal the length of the string is specified by the first byte in the string. It is taken to be an 8-bit unsigned integer. So C-style strings are typically processed by sentinel-controlled loops, and count-controlled string processing loops are more common in Pascal.

The C++ string class has additional features, but the actual text string is stored as a C-style text string within the C++ string instance.


2.8 write and read Functions

In Section 2.5 (page 13) we used the printf and scanf functions to convert between C data types and single characters written on the screen or read from the keyboard. In this section, we introduce the two system call functions write and read. We will use the write function to send bytes to the screen and the read function to get bytes from the keyboard.

When these low-level functions are used, it is the programmer's responsibility to convert between the individual characters and the C/C++ data type storage formats. Although this clearly requires more programming effort, we will use them instead of printf and scanf in order to better illustrate data storage formats.

Use write to output raw bytes to the screen and read to input raw bytes from the keyboard.

The C program in Listing 2.3 shows how to display the character 'A' on the screen. This program allocates one byte of memory as a char variable and names it "aLetter." This byte is initialized to the bit pattern 41₁₆ ('A' from Table 2.3). The write function is invoked to display the character on the screen. The arguments to write are:

1. STDOUT_FILENO is defined in the system header file, unistd.h.³ It is the GNU/Linux file descriptor for standard out (usually the screen). GNU/Linux sees all devices as files. When a program is started the operating system opens a path to standard out and assigns it as file descriptor number 1.

2. &aLetter is a memory address. The sequence of one-byte bit patterns starting at this address will be sent to standard out.

3. 1 (one) is the number of bytes that will be sent (to standard out) as a result of this call to write.

The program returns a 0 to the operating system.

/*
 * oneChar.c
 * Writes a single character on the screen.
 * Bob Plantz - 4 June 2009
 */

#include <unistd.h>

int main(void)
{
    char aLetter = 'A';
    write(STDOUT_FILENO, &aLetter, 1);  // STDOUT_FILENO is
                                        // defined in unistd.h
    return 0;
}

Listing 2.3: Displaying a single character using C.

Now let's consider a program that echoes each character entered from the keyboard. We will allocate a single char variable, read one character into the variable, and then echo the character for the user with a message. The program will repeat this sequence one character at a time until the user hits the return key. The program is shown in Listing 2.4.

A run of this program gave:

When testing your programs, read the screen very carefully.

bob$ ./echoChar1
Enter one character: a
You entered: abob$
bob$

which probably looks to you like the program is not working correctly.

³ It is generally better to use symbolic names instead of plain numbers. The names provide implicit documentation, and the value may be redefined in some future version.

Look more carefully at the program behavior. It illustrates some important issues when using the read function. First, how many keys did the user hit? There were actually two keystrokes, the "a" key and the return key. In fact, the program waits until the user hits the return key. The user could have used the delete key to change the character before hitting the return key.

This shows that keyboard input is line buffered. Even though the application program is requesting only one character, the operating system does not honor this request until the user hits the return key, thus entering the entire line. Since the line is buffered, the user can edit the line before entering it.

Next, the program correctly echoes the first key hit then terminates. Upon program termination the shell prompt, bob$, is displayed. But the return character is still in the input buffer, and the shell program reads it. The result is the same as if the user had simply pressed the return key in response to the shell prompt.

/*
 * echoChar1.c
 * Echoes a character entered by the user.
 * Bob Plantz - 4 June 2009
 */

#include <unistd.h>

int main(void)
{
    char aLetter;

    write(STDOUT_FILENO, "Enter one character: ", 21);  // prompt user
    read(STDIN_FILENO, &aLetter, 1);                    // one character
    write(STDOUT_FILENO, "You entered: ", 13);          // message
    write(STDOUT_FILENO, &aLetter, 1);

    return 0;
}

Listing 2.4: Echoing characters entered from the keyboard.

Here is another run where I entered three characters before hitting the return key:

bob$ ./echoChar1

Enter one character: abc
You entered: abob$ bc

bc 1.06.94

Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.

This is free software with ABSOLUTELY NO WARRANTY. For details type 'warranty'.

Again, the program correctly echoes the first character, but the two characters bc remain in the input line buffer. When echoChar1 terminates the shell program reads the remaining characters from the line buffer and interprets them as a command. In this case, bc is a program, so the shell executes that program.

An important point of the program in Listing 2.4 is to illustrate the simplistic behavior of the write and read functions. They work at a very low level. It is your responsibility to design your program to interpret each byte that is written to the screen or read from the keyboard.


2.9 Exercises

2-1 (§2.1) Express the following bit patterns in hexadecimal.

a) 0100 0101 0110 0111

b) 1000 1001 1010 1011

c) 1111 1110 1101 1100

d) 0000 0010 0101 0000

2-2 (§2.1) Express the following bit patterns in binary.

a) 83af

b) 9001

c) aaaa

d) 5555

2-3 (§2.1) How many bits are represented by each of the following?

a) ffffffff

b) 7fff58b7def0

c) $1111_2$

d) $1111_{16}$

e) $00000000_2$

f) $00000000_{16}$

2-4 (§2.1) How many hexadecimal digits are required to represent each of the following?

a) eight bits

b) thirty-two bits

c) sixty-four bits

d) ten bits

e) twenty bits

f) seven bits

2-5 (§2.2) Referring to Equation 2.1, what are the values of r, n and each $d_i$ for the decimal number 29458254? The hexadecimal number 29458254?

2-6 (§2.2) Convert the following 8-bit numbers to decimal by hand:

a) 10101010

b) 01010101

c) 11110000

d) 00001111

e) 10000000

f) 01100011

g) 01111011

h) 11111111

2-7 (§2.2) Convert the following 16-bit numbers to decimal by hand:

a) 1010101111001101

b) 0001001000110100

c) 1111111011011100

d) 0000011111010000

e) 1000000000000000

f) 0000010000000000

g) 1111111111111111

h) 0011000000111001

2-8 (§2.2) In Section 2.2 we developed an algorithm for converting from binary to decimal. Develop a similar algorithm for converting from hexadecimal to decimal. Use your new algorithm to convert the following 8-bit numbers to decimal by hand:

a) a0

b) 50

c) ff

d) 89

e) 64

f) 0c

g) 11

h) c8

2-9 (§2.2) In Section 2.2 we developed an algorithm for converting from binary to decimal. Develop a similar algorithm for converting from hexadecimal to decimal. Use your new algorithm to convert the following 16-bit numbers to decimal by hand:

a) a000

b) ffff

c) 0400

d) 1111

e) 8888

f) 0190

g) abcd

h) 5555

2-10 (§2.3) Convert the following unsigned, decimal integers to 8-bit hexadecimal representation.

a) 100

b) 123

c) 10

d) 88

e) 255

f) 16

g) 32

h) 128

2-11 (§2.3) Convert the following unsigned, decimal integers to 16-bit hexadecimal representation.

a) 1024

b) 1000

c) 32768

d) 32767

e) 256

f) 65635

g) 2005

h) 43981

2-12 (§2.3) Invent a code that would allow us to store letter grades with plus or minus. That is, the grades A, A-, B+, B, B-, ..., D, D-, F. How many bits are required for your code?

2-13 (§2.3) We have shown how to write only the first sixteen addresses in hexadecimal in Figure 2.1. How would you write the address of the seventeenth byte (byte number sixteen) in hexadecimal? Hint: If we started with zero in the decimal number system we would use a '9' to represent the tenth item. How would you represent the eleventh item in the decimal system?

2-14 (§2.3) Redo the table in Figure 2.2 such that it shows the memory contents in decimal.

2-15 (§2.3) Redo the table in Figure 2.2 such that it shows each of the sixteen bytes containing its byte number. That is, byte number 0 contains zero, number 1 contains one, etc. Show the contents in binary.

2-16 (§2.3) Redo the table in Figure 2.2 such that it shows each of the sixteen bytes containing its byte number. That is, byte number 0 contains zero, number 1 contains one, etc. Show the contents in hexadecimal.

2-17 (§2.4) You want to allocate an area in memory for storing any number between 0 and 4,000,000,000. This memory area will start at location 0x2fffeb96. Give the addresses of each byte of memory that will be required.

2-18 (§2.4) You want to allocate an area in memory for storing an array of 30 bytes. The first byte will have the value 0x00 stored in it, the second 0x01, the third 0x02, etc. This memory area will start at location 0x001000. Show what this area of memory looks like.

2-19 (§2.4) In Section 2.4 we invented a binary code for representing letter grades. Referring to that code, express each of the grades as an 8-bit unsigned decimal integer.

2-20 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-6. Note that printf and scanf do not have a conversion for binary. Check the answers in hexadecimal.

2-21 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-7. Note that printf and scanf do not have a conversion for binary. Check the answers in hexadecimal.

2-22 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-8.

2-23 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-9.

2-24 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-10.

2-25 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-11.

2-26 (§2.5) Modify the program in Listing 2.2 so that it also displays the addresses of the x and y variables. Note that addresses are typically displayed in hexadecimal. How many bytes does the compiler allocate for each of the ints?

2-27 (§2.6) Enter the program in Listing 2.1. Follow through the program with gdb as in the example in Section 2.6. Using the numbers you get, explain where the variables anInt and aFloat are stored in memory and what is stored in each location.

2-28 (§2.7) Write a program in C that creates a display similar to Figure 2.3. Hints: use a char * variable to process the string one character at a time; use %08x to format the display of the address.

2-29 (§2.6) Enter the program in Listing 2.4. Explain why there seems to be an extra prompt in the program. Set breakpoints at both the read statement and at the following write statement. Examine the contents of the aLetter variable before the read and after it. Notice that the behavior of gdb seems very strange when dealing with the read statement. Explain the behavior. Hint: Both gdb and the program you are debugging use the same keyboard for input.

2-30 (§2.8) Modify the program in Listing 2.4 so that it prompts the user to enter an entire line, reads the line, then echoes the entire line. Read only one byte at a time from the keyboard.

2-31 (§2.8) This is similar to Exercise 2-30 except that when the newline character is read from the keyboard (and stored in memory), the program replaces the newline character with a NUL character. The program has now read a line from the keyboard and stored it as a C-style text string. If your algorithm is correct, you will be able to read the text string using the read low-level function and display it with the printf library function thusly (assuming the variable where the string is stored is named theString),

    printf("%s\n", theString);

and have only one newline. Notice that this program discards the newline generated when the user hits the return key. This is the same behavior you would see if you used

    scanf("%s", theString);

in C, or

    cin >> theString;

in C++ to read the input text from the keyboard.

2-32 (§2.8) Write a C program that prompts the user to enter a line of text on the keyboard then echoes the entire line. The program should continue echoing each line until the user responds to the prompt by not entering any text and hitting the return key. Your program should have two functions, writeStr and readLn, in addition to the main function. The text string itself should be stored in a char array in main. Both functions should operate on NUL-terminated text strings.

• writeStr takes one argument, a pointer to the string to be displayed, and it returns the number of characters actually displayed. It uses the write system call function to write characters to the screen.

• readLn takes two arguments, one that points to the char array where the characters are to be stored and one that specifies the maximum number of characters to store in the char array. Additional keystrokes entered by the user should be read from the OS input buffer and discarded. readLn should return the number of characters actually stored in the char array. readLn should not store the newline character ('\n'). It uses the read system call function to read characters from the keyboard.

Chapter 3

Computer Arithmetic

We next turn our attention to a code for storing decimal integers. Since all storage in a computer is by means of on/off switches, we cannot simply store integers as decimal digits. Exercises 3-1 and 3-2 should convince you that it will take some thought to come up with a good code that uses simple on/off switches to represent decimal numbers.

Another very important issue when talking about computer arithmetic was pointed out in Section 2.3 (page 9). Namely, the programmer must decide how many bits will be used for storing the numbers before performing any arithmetic operations. This raises the possibility that some results will not fit into the allocated number of bits. As you will see in Section 9.2 (page 189), the computer hardware provides for this possibility with the Carry Flag (CF) and Overflow Flag (OF) in the rflags register located in the CPU. Depending on what you intend the bit patterns to represent, either the Carry Flag or the Overflow Flag (not both) will indicate the correctness of the result. However, most high-level languages, including C and C++, do not check the CF and OF after performing arithmetic operations.

3.1 Addition and Subtraction

Computers perform addition in the binary number system.[1] The operation is really quite easy to understand if you recall all the details of performing addition in the decimal number system by hand. Since most people perform addition on a calculator these days, let us review all the steps required when doing it by hand. Consider two two-digit numbers, x = 67 and y = 79. Adding these by hand on paper would look something like:

     1 1      carries
       67     x
     + 79     y
       46     sum

We start by working from the right, adding the two decimal digits in the ones place. 7 + 9 exceeds 10 by 6. We show this by placing a 6 in the ones place in the sum and carrying a 1 to the tens place. Next we add the three decimal digits in the tens place, 1 (the carry into the tens place from the ones place) + 6 + 7. The sum of these three digits exceeds 10 by 4, which we show by placing a 4 in the tens place in the sum and recording the fact that there is an ultimate carry of one. Recall that we had decided to use only two digits, so there is no hundreds place. Using the notation of Equation 2.1 (page 8), we describe addition of two decimal integers in Algorithm 3.1.

[1] Most computer architectures provide arithmetic operations in other number systems, but these are somewhat specialized. We will not consider them in this book.

Algorithm 3.1: Add fixed-width decimal integers.

given: N, number of digits.
Starting in the ones place:
carry ← 0;
for i = 0 to (N-1) do
    sum_i ← (x_i + y_i + carry) % 10;   // mod operation
    carry ← (x_i + y_i + carry) / 10;   // div operation

Notice that:

• Algorithm 3.1 works because we use a positional notation when writing numbers; a digit one place to the left counts ten times more.

• Carry from the current position one place to the left is always 0 or 1.

• The reason we use 10 in the / and % operations is that there are exactly ten digits in the decimal number system: 0, 1, 2, ..., 9.

• Since we are working in an N-digit system, we must restrict our result to N digits. The final carry (0 or 1) must be stated in addition to the N-digit result.

By changing "10" to "2" we get Algorithm 3.2 for addition in the binary number system. The only difference is that a digit one place to the left counts two times more.

Algorithm 3.2: Add fixed-width binary integers.

given: N, number of bits.
Starting in the ones place:
carry ← 0;
for i = 0 to (N-1) do
    sum_i ← (x_i + y_i + carry) % 2;   // mod operation
    carry ← (x_i + y_i + carry) / 2;   // div operation

Example 3-a

Compute the sum of x = 10101011 and y = 01001101.

    0 0001 111     carries
      1010 1011    x
    + 0100 1101    y
      1111 1000    sum

This is how the algorithm was applied.

ones place:
    sum_0 = (1 + 1) % 2 = 0
    carry = (1 + 1) / 2 = 1
twos place:
    sum_1 = (1 + 1 + 0) % 2 = 0
    carry = (1 + 1 + 0) / 2 = 1
fours place:
    sum_2 = (1 + 0 + 1) % 2 = 0
    carry = (1 + 0 + 1) / 2 = 1
eights place:
    sum_3 = (1 + 1 + 1) % 2 = 1
    carry = (1 + 1 + 1) / 2 = 1
sixteens place:
    sum_4 = (1 + 0 + 0) % 2 = 1
    carry = (1 + 0 + 0) / 2 = 0
thirty-twos place:
    sum_5 = (0 + 1 + 0) % 2 = 1
    carry = (0 + 1 + 0) / 2 = 0
sixty-fours place:
    sum_6 = (0 + 0 + 1) % 2 = 1
    carry = (0 + 0 + 1) / 2 = 0
one hundred twenty-eights place:
    sum_7 = (0 + 1 + 0) % 2 = 1
    carry = (0 + 1 + 0) / 2 = 0

In this eight-bit example the result is 1111 1000, and there is no carry beyond the eight bits. The lack of carry is recorded in the rflags register by setting the CF bit to zero.

It should not surprise you that this algorithm also works for hexadecimal. In fact, it works for any radix, as shown in Algorithm 3.3.

Algorithm 3.3: Add fixed-width integers in any radix.

given: N, number of digits.
Starting in the ones place:
carry ← 0;
for i = 0 to (N-1) do
    sum_i ← (x_i + y_i + carry) % radix;   // mod operation
    carry ← (x_i + y_i + carry) / radix;   // div operation

For hexadecimal:

• A digit one place to the left counts sixteen times more.

• We use 16 in the / and % operations because there are sixteen digits in the hexadecimal number system: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f.
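The digit-by-digit addition described above can be sketched in C. This is only an illustration under stated assumptions: the function name add_fixed_width and the choice to store digits least-significant first in int arrays are ours, not part of the text, and the carry out of each place is fed into the next place as in the hand computation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the text): add two fixed-width numbers
 * digit by digit in the given radix.
 * Digits are stored least-significant first, so x[0] is the ones place.
 * Writes n digits to sum and returns the final carry (0 or 1). */
int add_fixed_width(const int *x, const int *y, int *sum, size_t n, int radix)
{
    assert(radix >= 2);
    int carry = 0;                      /* no carry into the ones place */
    for (size_t i = 0; i < n; i++) {
        int total = x[i] + y[i] + carry;
        sum[i] = total % radix;         /* mod: the digit that stays in place i */
        carry  = total / radix;         /* div: the carry into place i + 1 */
    }
    return carry;
}
```

For the decimal example x = 67 and y = 79, passing the digit arrays {7, 6} and {9, 7} with radix 10 leaves {6, 4} in sum (that is, 46) and returns a final carry of 1.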

Addition in hexadecimal brings up a notational issue. For example,

    d + 9 = ??     Oops, how do we write this?

Although it is certainly possible to perform all the computations using hexadecimal notation, most people find it a little awkward. After you have memorized Table 3.1 it is much easier to:

• convert the (hexadecimal) digit to its equivalent decimal value

• apply our algorithm

• convert the results back to hexadecimal

Actually, we did this when applying the algorithm to binary addition. Since the conversion of binary digits to decimal digits is trivial, you probably did not think about it. But the conversion of hexadecimal digits to decimal is not as trivial. To see how it works, first recall that the conversion from hexadecimal to binary is straightforward. (You should have memorized Table 2.1 by now.) So we will consider conversion from binary to decimal.

As mentioned above, the relative position of each bit has significance. The rightmost bit represents the ones place, the next one to the left the twos place, then the fours place, etc. In other words, each bit represents $2^n$, where n = 0, 1, 2, 3, ... and we start from the right. So the binary number 1011 represents:

$$1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$$

This is easily converted to decimal by simply working out the arithmetic in decimal:

$$1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 8 + 0 + 2 + 1 = 11$$
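The same conversion can be sketched in C. As an assumption for illustration, the bit pattern is passed as a string of '0' and '1' characters, and the function name binary_to_unsigned is ours. Rather than summing explicit powers of two, the sketch uses the equivalent left-to-right form (multiply the running value by two, then add the next bit), which yields the same result.

```c
#include <assert.h>

/* Hypothetical helper (not from the text): convert a string of '0'/'1'
 * characters to its unsigned value. Working from the left, each step
 * doubles the running value and adds the next bit, which is equivalent
 * to summing bit * 2^n over all bit positions. */
unsigned int binary_to_unsigned(const char *bits)
{
    unsigned int value = 0;
    for (const char *p = bits; *p != '\0'; p++) {
        assert(*p == '0' || *p == '1');
        value = value * 2 + (unsigned int)(*p - '0');
    }
    return value;
}
```

Calling binary_to_unsigned("1011") returns 11, matching the arithmetic worked out above.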

From Table 2.1 on page 7 we see that $1011_2 = b_{16}$, and we conclude that $b_{16} = 11_{10}$. We can add a "decimal" column to the table, giving Table 3.1.

[Margin note: Memorize this table.]

Four binary digits (bits)   One hexadecimal digit   Decimal equivalent
0000                        0                        0
0001                        1                        1
0010                        2                        2
0011                        3                        3
0100                        4                        4
0101                        5                        5
0110                        6                        6
0111                        7                        7
1000                        8                        8
1001                        9                        9
1010                        a                       10
1011                        b                       11
1100                        c                       12
1101                        d                       13
1110                        e                       14
1111                        f                       15

Table 3.1: Correspondence between binary, hexadecimal, and unsigned decimal values for the hexadecimal digits.

Example 3-b

Compute the sum of x = 0xabcd and y = 0x6089.

    1 011      carries
      abcd     x
    + 6089     y
      0c56     sum

Now we can see how Algorithm 3.3 with radix = 16 was applied in order to add the hexadecimal numbers, abcd and 6089. Having memorized Table 3.1, we will convert between hexadecimal and decimal "in our heads."

ones place:
    sum_0 = (d + 9) % 16 = 6
    carry = (d + 9) / 16 = 1
sixteens place:
    sum_1 = (1 + c + 8) % 16 = 5
    carry = (1 + c + 8) / 16 = 1
two hundred fifty-sixes place:
    sum_2 = (1 + b + 0) % 16 = c
    carry = (1 + b + 0) / 16 = 0
four thousand ninety-sixes place:
    sum_3 = (0 + a + 6) % 16 = 0
    carry = (0 + a + 6) / 16 = 1

This four-digit example has an ultimate carry of 1, which is recorded in the rflags register by setting the CF to one. The arithmetic was performed by first converting each digit to decimal. It is then a simple matter to convert each decimal value back to hexadecimal (see Table 3.1) to express the final answer in hexadecimal.

Let us now turn to the subtraction operation. As you recall from subtraction in the decimal number system, you must sometimes borrow from the next higher-order digit in the minuend. This is shown in Algorithm 3.4.

Algorithm 3.4: Subtract fixed-width integers in any radix.

given: N, number of digits.
Starting in the ones place, subtract Y from X:
for i = 0 to (N-1) do
    if y_i ≤ x_i then
        difference_i ← x_i - y_i;
        borrow ← 0;
    else
        j ← i + 1;
        while x_j = 0 do
            j ← j + 1;
        while j > i do
            x_j ← x_j - 1;
            j ← j - 1;
            x_j ← x_j + radix;
        difference_i ← x_i - y_i;

This algorithm is not as complicated as it first looks.

Example 3-c

Subtract y = 10101011 from x = 11001101.

    0 0100 010     borrows
      1100 1101    x
    - 1010 1011    y
      0010 0010    difference

The bits have been grouped to improve readability. A 1 in the borrow row indicates that 1 was borrowed from the minuend in that place, which becomes 2 in the next place to the right. A 0 indicates that no borrow was required. This is how the algorithm was applied.

ones place:
    difference_0 = 1 - 1 = 0
twos place:
    Borrow from the fours place in the minuend.
    The borrow becomes 2 in the twos place.
    difference_1 = 2 - 1 = 1
fours place:
    Since we borrowed 1 from here, the minuend has a 0 left.
    difference_2 = 0 - 0 = 0
eights place:
    difference_3 = 1 - 1 = 0
sixteens place:
    difference_4 = 0 - 0 = 0
thirty-twos place:
    Borrow from the sixty-fours place in the minuend.
    The borrow becomes 2 in the thirty-twos place.
    difference_5 = 2 - 1 = 1
sixty-fours place:
    Since we borrowed 1 from here, the minuend has a 0 left.
    difference_6 = 0 - 0 = 0
one hundred twenty-eights place:
    difference_7 = 1 - 1 = 0

This, of course, also works for hexadecimal, but remember that a digit one place to the left counts sixteen times more. For example, consider x = 0x6089 and y = 0xab5d:

    1 101      borrows
      6089     x
    - ab5d     y
      b52c     difference

Notice in this second example that we had to borrow from "beyond the width" of the two values. That is, the two values are each sixteen bits wide, and the result must also be sixteen bits. Whether there is borrow "from outside" to the high-order digit is recorded in the CF of the rflags register whenever a subtract operation is performed:

• no borrow from outside: CF = 0

• borrow from outside: CF = 1

Another way to state this is for unsigned numbers:

• if the subtrahend is equal to or less than the minuend, the CF is set to zero

• if the subtrahend is larger than the minuend, the CF is set to one
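The borrow rule can be modeled in C with a 16-bit unsigned type. This is a sketch under assumptions: the function name sub16 is hypothetical, and the returned value only imitates what the CF would hold after a subtract; C itself gives the program no direct access to the flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the text): subtract y from x in 16 bits.
 * Stores the 16-bit difference in *diff and returns 1 if a borrow
 * "from outside" was needed, 0 otherwise (a model of CF). */
int sub16(uint16_t x, uint16_t y, uint16_t *diff)
{
    assert(diff != NULL);
    *diff = (uint16_t)(x - y);   /* the cast keeps only the low sixteen bits */
    return y > x;                /* subtrahend larger than minuend: borrow */
}
```

With x = 0x6089 and y = 0xab5d the stored difference is 0xb52c and the function returns 1, in agreement with the hexadecimal example.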

3.2 Arithmetic Errors - Unsigned Integers

The binary number system was introduced in Section 2.2 (page 8). You undoubtedly realize by now that it probably is a good system for storing unsigned integers. Don't forget that it does not matter whether we think of the integers as being in decimal, hexadecimal, or binary since they are mathematically equivalent. If we are going to store integers this way, we need to consider the arithmetic properties of addition and subtraction in the binary number system. Since a computer performs arithmetic in binary (see footnote 1 on page 28), we might ask whether addition yields arithmetically correct results when representing decimal numbers in the binary number system. We will use four-bit values to simplify the discussion. Consider addition of the two numbers:

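$$
\begin{array}{rrcl}
  & 0100_2 & = & 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 = 4_{10} \\
+ & 0010_2 & = & 0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = +2_{10} \\
\hline
  & 0110_2 & = & 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 6_{10}
\end{array}
$$

and CF = 0.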

So far, the binary number system looks reasonable. Let's try two larger four-bit numbers:

$$
\begin{array}{rrcl}
  & 0100_2 & = & 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 = 4_{10} \\
+ & 1110_2 & = & 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = +14_{10} \\
\hline
  & 0010_2 & = & 0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 2_{10}
\end{array}
$$

and CF = 1. The result, 2, is arithmetically incorrect. The problem here is that the addition has produced carry beyond the fourth bit. Since this is not taken into account in the result, the answer is wrong.

Now consider subtraction of the two numbers:

$$
\begin{array}{rrcl}
  & 0100_2 & = & 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 = 4_{10} \\
- & 1110_2 & = & 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = -14_{10} \\
\hline
  & 0110_2 & = & 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 6_{10}
\end{array}
$$

and CF = 1.

The result, 6, is arithmetically incorrect. The problem in this case is that the subtraction has had to borrow from beyond the fourth bit. Since this is not taken into account in the result, the answer is wrong.

From the discussion in Section 3.1 (page 28) you should be able to convince yourself that these four-bit arithmetic examples generalize to any size arithmetic performed by the computer. After adding two numbers, the Carry Flag will always be set to zero if there is no ultimate carry, or it will be set to one if there is ultimate carry. Subtraction will set the Carry Flag to zero if no borrow from the "outside" is required, or one if borrow is required. These examples illustrate the principle:

• When adding or subtracting two unsigned integers, the result is arithmetically correct if and only if the Carry Flag (CF) is set to zero.

It is important to realize that the CF and OF bits in the rflags register are always set to the appropriate value, 0 or 1, each time an addition or subtraction is performed by the CPU. In particular, the CPU will not ignore the CF when there is no carry; it will actively set the CF to zero.
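The four-bit addition examples can be reproduced with a small C sketch. The function name add4 is an assumption made for illustration, and the returned carry only models the CF; the sketch masks the result to four bits because C has no four-bit integer type.

```c
#include <assert.h>

/* Hypothetical helper (not from the text): add two 4-bit unsigned values.
 * Stores the 4-bit result in *sum and returns the carry out of the
 * fourth bit, which is what the CF would hold. */
int add4(unsigned int x, unsigned int y, unsigned int *sum)
{
    assert(x <= 0xf && y <= 0xf);
    unsigned int full = x + y;   /* at most 30, so five bits are enough */
    *sum = full & 0xf;           /* keep only the low four bits */
    return (int)(full >> 4);     /* bit 4 is the ultimate carry */
}
```

Adding 4 and 2 gives a sum of 6 with carry 0 (correct), while adding 4 and 14 gives a sum of 2 with carry 1, signaling that the four-bit result is not arithmetically correct.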

3.3 Arithmetic Errors - Signed Integers

When representing signed decimal integers we have to use one bit for the sign. We might be tempted to simply use the highest-order bit for this purpose. Let us say that 0 means + and 1 means -. We will try adding (+2) and (-2):

$$
\begin{array}{rrcl}
  & 0010_2 & = & (+2)_{10} \\
+ & 1010_2 & = & +(-2)_{10} \\
\hline
  & 1100_2 & = & (-4)_{10}
\end{array}
$$

The result, -4, is arithmetically incorrect. We should note here that the problem is the way in which the computer does addition: it performs binary addition on the bit patterns that in themselves have no inherent meaning. There are computers that use this particular code for storing signed decimal integers. They have a special "signed add" instruction. By the way, notice that such computers have both a +0 and a -0!

Most computers, including the x86, use another code for representing signed decimal integers: the two's complement code. To see how this code works, we start with an example using the decimal number system.

Say that you have a cassette player and wish to represent both positive and negative positions on the tape. It would make sense to somehow fast-forward the tape to its center and call that point "zero." Most cassette players have a four decimal digit counter that represents tape position. The counter, of course, does not give actual tape position, but a "coded" representation of the tape position. Since we wish to call the center of the tape "zero," we push the counter reset button to set it to 0000.

Now, moving the tape forward (the positive direction) will cause the counter to increment. And moving the tape backward (the negative direction) will cause the counter to decrement. In particular, if we start at zero and move to "+1" the "code" on the tape counter will show 0001. On the other hand, if we start at zero and move to "-1" the "code" on the tape counter will show 9999.

Using our tape code system to perform the arithmetic in the previous example, (+2) + (-2):

1. Move the tape to (+2); the counter shows 0002.

2. Add (-2) by decrementing the tape counter by two.

The counter shows 0000, which is 0 according to our code.

Next we will perform the same arithmetic starting with (-2), then adding (+2):

1. Move the tape to (-2); the counter shows 9998.

2. Add (+2) by incrementing the tape counter by two.

The counter shows 0000, but there is a carry. (9998 + 2 = 0000 with carry = 1.) If we ignore the carry, the answer is correct. This example illustrates the principle:

• When adding two signed integers in the two's complement notation, carry is irrelevant.
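The tape counter behaves like arithmetic modulo 10000, which can be sketched in C. The function name counter_move is hypothetical and the sketch models only the four-digit counter, not the tape itself.

```c
#include <assert.h>

/* Hypothetical helper (not from the text): advance a four-digit decimal
 * counter by delta positions (delta may be negative). The counter wraps
 * modulo 10000, like the tape counter in the example. */
int counter_move(int counter, int delta)
{
    assert(counter >= 0 && counter <= 9999);
    int moved = (counter + delta) % 10000;
    if (moved < 0)          /* the % operator in C can give a negative remainder */
        moved += 10000;
    return moved;
}
```

Starting from 0 and moving by -1 gives 9999, and starting from 9998 and moving by +2 gives 0; the carry out of the fourth digit is simply discarded.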

The two's complement code uses this pattern for representing signed decimal integers in bit patterns. The correspondence between signed decimal (two's complement), hexadecimal, and binary for four-bit values is shown in Table 3.2.

[Margin note: Bit patterns have no sign; your program does that.]

Four binary digits (bits)   One hexadecimal digit   Decimal equivalent
1000                        8                       -8
1001                        9                       -7
1010                        a                       -6
1011                        b                       -5
1100                        c                       -4
1101                        d                       -3
1110                        e                       -2
1111                        f                       -1
0000                        0                        0
0001                        1                        1
0010                        2                        2
0011                        3                        3
0100                        4                        4
0101                        5                        5
0110                        6                        6
0111                        7                        7

Table 3.2: Four-bit signed integers, two's complement notation.

We make the following observations about Table 3.2:

• The high-order bit of each positive number is 0.

• The high-order bit of each negative number is 1.

• However, changing the sign of (negating) a number is more complicated than simply changing the high-order bit.

• The code allows for one more negative number than positive numbers.

• The range of integers, x, that can be represented in this code (with four bits) is

$$-8_{10} \le x \le +7_{10}$$

or

$$-2^{(4-1)} \le x \le +(2^{(4-1)} - 1)$$

The last observation can be generalized for n bits to:

$$-2^{(n-1)} \le x \le +(2^{(n-1)} - 1)$$

In the two's complement code, the negative of any integer, x, is defined as

$$x + (-x) = 2^n \tag{3.1}$$

Notice that $2^n$ written in binary is "1" followed by n zeros. That is, it requires n+1 bits to represent. Another way of saying this is, "in the n-bit two's complement code adding a number to its negative produces n zeros and carry."

We now derive a method for computing the negative of a number in the two's complement code. Solving Equation 3.1 for -x, we get:

$$-x = 2^n - x \tag{3.2}$$

For example, if we wish to compute -1 in binary (in the two's complement code) in 8 bits, we perform the arithmetic:

$$-1_{10} = 100000000_2 - 00000001_2 = 11111111_2$$

or in hexadecimal:

$$-1_{16} = 100_{16} - 01_{16} = ff_{16}$$

This subtraction is error prone, so let's perform a few algebraic manipulations on Equation 3.2, which defines the negation operation. First, we subtract one from both sides:

$$-x - 1 = 2^n - x - 1 \tag{3.3}$$

Rearranging a little:

$$-x - 1 = 2^n - 1 - x = (2^n - 1) - x \tag{3.4}$$

Now, consider the quantity $(2^n - 1)$. Since $2^n$ is written in binary as one (1) followed by n zeros, $(2^n - 1)$ is written as n ones. For example, for n = 8:

$$2^8 - 1 = 11111111_2 \tag{3.5}$$

Thus, we can express the right-hand side of Equation 3.4 as

$$2^n - 1 - x = 111 \ldots 111_2 - x \tag{3.6}$$

where $111 \ldots 111_2$ designates n ones.

You can see how easy the subtraction on the right-hand side of Equation 3.6 is if we consider the previous example of computing -1 in binary in eight bits. Let x = 1, giving:

$$11111111_2 - 00000001_2 = 11111110_2$$

or in hexadecimal:

$$ff_{16} - 01_{16} = fe_{16}$$

Another (simpler) way to look at this is

$$2^n - 1 - x = \text{``flip all the bits in } x\text{''} \tag{3.7}$$

The value of the right-hand side of Equation 3.7 is called the reduced radix complement of x. Since the radix is two, it is common to call this the one's complement of x. From Equation 3.4 we see that this computation (the reduced radix complement of x) gives

$$-x - 1 = \text{the reduced radix complement of } x \tag{3.8}$$

Now we can easily compute -x by adding one to both sides of Equation 3.8:

$$-x - 1 + 1 = (\text{the reduced radix complement of } x) + 1 \tag{3.9}$$

$$= -x \tag{3.10}$$

This leads us to Algorithm 3.5 for negating any integer stored in the two's complement, n-bit code.

[Margin note: Negate in binary.]

Algorithm 3.5: Negate a number in binary (compute 2's complement).

We use x' to denote the complement of x.
x ← x';
x ← x + 1;

This process (computing the one's complement, then adding one) is called computing the two's complement.

Be Careful!

• "In two's complement" describes the storage code.

• "Taking the two's complement" is an active computation. If the value the computation is applied to is an integer stored in the two's complement notation, this computation is mathematically equivalent to negating the number.
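Taking the two's complement is easy to sketch in C for eight-bit values. The function name negate8 is an assumption for illustration. The assertion inside the function restates the definition of the negative: adding a number to its negative gives eight zeros once the carry is discarded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the text): negate an 8-bit pattern by
 * flipping every bit (one's complement), then adding one. */
uint8_t negate8(uint8_t x)
{
    uint8_t flipped = (uint8_t)~x;          /* one's complement */
    uint8_t neg = (uint8_t)(flipped + 1);   /* two's complement */
    assert((uint8_t)(x + neg) == 0);        /* x + (-x) leaves eight zeros */
    return neg;
}
```

For example, negate8(0x01) yields 0xff, and negate8(0xff) yields 0x01. Note that negate8(0x80) yields 0x80 again, a consequence of the code having one more negative number than positive numbers.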

Combining Algorithm 3.5 with observations about Table 3.2 above, we can easily compute the decimal equivalent of any integer stored in the two's complement notation by applying Algorithm 3.6.

[Margin note: Convert binary to signed decimal.]

Algorithm 3.6: Signed binary-to-decimal conversion.

if the high-order bit is zero then
    compute the decimal equivalent of the number;
else
    take the two's complement (negate the number);
    compute the decimal equivalent of this result;
    place a minus sign in front of the decimal equivalent;

Example 3-d

The 16-bit integer $5678_{16}$ is stored in two's complement notation. Convert it to a signed, decimal integer.

Since the high-order bit is zero, we simply compute the decimal equivalent:

$$
\begin{array}{rcl}
5678_{16} & = & 5 \times 4096 + 6 \times 256 + 7 \times 16 + 8 \times 1 \\
          & = & 20480 + 1536 + 112 + 8 \\
          & = & +22136_{10}
\end{array}
$$

Example 3-e

The 16-bit integer $8765_{16}$ is stored in two's complement notation. Convert it to a signed, decimal integer.

Since the high-order bit is one, we first negate the number in the two's complement format.

    Take the one's complement   ⇒   $789a_{16}$
    Add one                     ⇒   $789b_{16}$

Compute the decimal equivalent.

$$
\begin{array}{rcl}
789b_{16} & = & 7 \times 4096 + 8 \times 256 + 9 \times 16 + 11 \times 1 \\
          & = & 28672 + 2048 + 144 + 11 \\
          & = & +30875_{10}
\end{array}
$$

Place a minus sign in front of the number (since we negated it in the two's complement domain).

$$8765_{16} = -30875_{10}$$
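The signed binary-to-decimal conversion can be sketched in C for 16-bit patterns. The function name signed_value16 is hypothetical, and returning an int32_t is our choice so that every 16-bit result, including the most negative one, fits without trouble.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the text): interpret a 16-bit pattern
 * as a two's complement integer. */
int32_t signed_value16(uint16_t bits)
{
    if ((bits & 0x8000u) == 0)                    /* high-order bit is zero */
        return (int32_t)bits;
    uint16_t magnitude = (uint16_t)(~bits + 1u);  /* take the two's complement */
    assert(magnitude != 0);                       /* cannot be zero when the high bit was set */
    return -(int32_t)magnitude;                   /* place a minus sign in front */
}
```

Here signed_value16(0x5678) returns 22136 and signed_value16(0x8765) returns -30875, matching Examples 3-d and 3-e.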

Algorithm 3.7 shows how to convert a signed decimal number to two's complement binary.

[Margin note: Convert signed decimal to binary.]

Algorithm 3.7: Signed decimal-to-binary conversion.

if the number is positive then
    simply convert it to binary;
else
    negate the number;
    convert the result to binary;
    compute the two's complement of result in the binary domain;

Example 3-f

Convert the signed, decimal integer +31693 to a 16-bit integer in two's complement notation. Give the answer in hexadecimal.

Since this is a positive number, we simply convert it. The answer is to be given in hexadecimal, so we will repetitively divide by 16 to get the answer.

    31693 ÷ 16 = 1980 with remainder 13
     1980 ÷ 16 =  123 with remainder 12
      123 ÷ 16 =    7 with remainder 11
        7 ÷ 16 =    0 with remainder  7

So the answer is

$$31693_{10} = 7bcd_{16}$$

Example 3-g

Convert the signed, decimal integer -250 to a 16-bit integer in two's complement notation. Give the answer in hexadecimal.

Since this is a negative number, we first negate it, giving +250. Then we convert this value. The answer is to be given in hexadecimal, so we will repetitively divide by 16 to get the answer.

    250 ÷ 16 = 15 with remainder 10
     15 ÷ 16 =  0 with remainder 15

This gives us

$$250_{10} = 00fa_{16}$$

Now we take the one's complement: 00fa ⇒ ff05

and add one: ff06

So the answer is

$$-250_{10} = ff06_{16}$$
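The signed decimal-to-binary conversion can likewise be sketched in C. The function name encode16 is hypothetical, and as an assumption the sketch treats zero together with the positive numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the text): encode a signed value as a
 * 16-bit two's complement pattern. */
uint16_t encode16(int32_t value)
{
    assert(value >= -32768 && value <= 32767);   /* must fit in 16 bits */
    if (value >= 0)
        return (uint16_t)value;                  /* simply convert it */
    uint16_t magnitude = (uint16_t)(-value);     /* negate, then convert */
    return (uint16_t)(~magnitude + 1u);          /* take the two's complement */
}
```

Here encode16(31693) returns 0x7bcd and encode16(-250) returns 0xff06, matching Examples 3-f and 3-g.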

3.4. OVERFLOW AND SIGNED DE C IMAL INTEGERS 39

3.4 Overflow and Signed Decimal Intege rs

The number of bits used to represent a value is dete r mined at the time a program is written.

So when performing arithme tic operations we cannot simply add more d igits (bits) if the result

is too large, as we can do on pap er. You saw in Section 3.1 (page 28) that the CF indicates when

the sum of two unsigned integers exceeds the numbe r of bits allocated to it.

In Section 3.3 (page 34) you saw that carr y is irrelevan t when working with signed integers.

You also saw that adding two signed numbers can produce an incorrect result. That is, the sum

may exceed the range of values that can be represented in the allocated number of bits.

The flags register, rflags, provides a bit, the Overflow Flag (OF), for detecting whether the sum of two n-bit, signed numbers stored in the two's complement code has exceeded the range allocated for it. Each operation that affects the overflow flag sets the bit equal to the exclusive or of the carry into the highest-order bit of the operands (the penultimate carry) and the ultimate carry (the carry out of the highest-order bit). For example, when adding the two 8-bit numbers $15_{16}$ and $\text{6f}_{16}$, we get:

carry 0 1 penultimate carry

0001 0101 x

+ 0110 1111 y

1000 0100 sum

In this example, there is a carry of zero and a penultimate (next to last) carry of one. The OF

flag is equal to the exclusive or of carry and penultimate carry:

OF = CF ˆ penultimate carry

where "ˆ" is the exclusive or operator. In the above example

OF = 0 ˆ 1 = 1
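The two carries can be computed explicitly in C. The sketch below is an illustration only, not how the CPU is implemented: add8_flags is a hypothetical helper that adds two 8-bit patterns, takes the penultimate carry from the sum of the low seven bits, takes CF from the ninth bit of the full sum, and forms OF as their exclusive or.

```c
#include <stdint.h>

/* Hypothetical helper (not a CPU instruction): adds two 8-bit patterns
 * and reports the carry flag and overflow flag through cf and of. */
uint8_t add8_flags(uint8_t x, uint8_t y, int *cf, int *of)
{
    unsigned full = (unsigned)x + (unsigned)y;   /* up to nine bits */
    unsigned low7 = (x & 0x7fu) + (y & 0x7fu);   /* sum of the low seven bits */
    int penultimate = (int)((low7 >> 7) & 1u);   /* carry into the high-order bit */

    *cf = (int)((full >> 8) & 1u);               /* carry out of the high-order bit */
    *of = *cf ^ penultimate;                     /* OF = CF ^ penultimate carry */
    return (uint8_t)full;                        /* keep only eight bits */
}
```

For the operands 0x15 and 0x6f used above, the function returns 0x84 with CF = 0 and OF = 1.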

There are three cases when adding two numbers:

Case 1: The two numbers are of opposite sign. We will let x be the negative number and y the positive number. Then we can express x and y in binary as:

x = 1 . . .

y = 0 . . .

That is, the high-order bit of one number is 1 and the high-order bit of the other is 0, regardless of what the other bits are. Now, if we add x and y, there are two possible results with respect to carry:

1. If the penultimate carry is zero:

carry 0 0 penultimate carry

1 . . . x

+ 0 . . . y

1 . . . sum

this addition would produce OF = 0 ˆ 0 = 0.

2. If the penultimate carry is one:

carry 1 1 penultimate carry

1 . . . x

+ 0 . . . y

0 . . . sum

this addition would produce OF = 1 ˆ 1 = 0.


We conclude that adding two integers of opposite sign always yields 0 for the overflow flag.

Next, notice that since y is positive and x negative:

$0 \le y \le +(2^{n-1} - 1)$ (3.11)

$-2^{n-1} \le x < 0$ (3.12)

Adding inequalities (3.11) and (3.12), we get:

$-2^{n-1} \le x + y \le +(2^{n-1} - 1)$ (3.13)

Thus, the sum of two integers of opposite sign remains within the range of signed integers, and there is no overflow (OF = 0).

Adding two signed integers of opposite sign is always correct.

Case 2: Both numbers are positive. Since both are positive, we can express x and y in binary

as:

x = 0 . . .

y = 0 . . .

That is, the high-order bit is 0, regardless of what the other bits are. Now, if we add x and

y, there are two possible results with respect to carry:

1. If the penultimate carry is zero:

carry 0 0 penultimate carry

0 . . . x

+ 0 . . . y

0 . . . sum

this addition would produce OF = 0 ˆ 0 = 0. The high-order bit of the sum is zero, so it is a positive number, and the sum is within range.

2. If the penultimate carry is one:

carry 0 1 penultimate carry

0 . . . x

+ 0 . . . y

1 . . . sum

this addition would produce OF = 0 ˆ 1 = 1. The high-order bit of the sum is one, so it is a negative number. Adding two positive numbers cannot yield a negative sum, so this sum has exceeded the allocated range.

Case 3: Both numbers are negative. Since both are negative, we can express x and y in binary as:

x = 1 . . .

y = 1 . . .

That is, the high-order bit is 1, regardless of what the other bits are. Now, if we add x and

y, there are two possible results with respect to carry:

1. If the penultimate carry is zero:

carry 1 0 penultimate carry

1 . . . x

+ 1 . . . y

0 . . . sum

this addition would produce OF = 1 ˆ 0 = 1. The high-order bit of the sum is zero, so it is a positive number. Adding two negative numbers cannot yield a positive sum, so this sum has exceeded the allocated range.

2. If the penultimate carry is one:

carry 1 1 penultimate carry

1 . . . x

+ 1 . . . y

1 . . . sum

this addition would produce OF = 1 ˆ 1 = 0. The high-order bit of the sum is one, so it

is a negative number, and the sum is within range.

3.4.1 The Meaning of CF and OF

These results, together with the results from Section 3.2 (page 33), yield the following rules when adding or subtracting two n-bit integers:

• If your algorithm treats the result as unsigned, the Carry Flag (CF) is zero if and only if the result is within the n-bit range; OF is irrelevant.

• If your algorithm treats the result as signed (using the two's complement code), the Overflow Flag (OF) is zero if and only if the result is within the n-bit range; CF is irrelevant.

Look at CF for unsigned integers. Look at OF for signed integers.

The CPU does not consider integers as either signed or unsigned. Both the CF and OF are set according to the rules of binary arithmetic by each arithmetic operation. The distinction between signed and unsigned is completely determined by the program. After each addition or subtraction operation the program should check the state of the CF for unsigned integers or the OF for signed integers and at least indicate when the result is in error. Most high-level languages do not perform this check, which can lead to some obscure program bugs.
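Since C does not expose the CF and OF directly, a program that wants this check has to test the operands or the result itself. The following is a minimal sketch under that assumption; the function names are hypothetical. The signed test is made before adding, because signed overflow is undefined behavior in C, whereas unsigned arithmetic is defined to wrap around.

```c
#include <limits.h>

/* Returns 1 if x + y wraps around as unsigned (the situation CF = 1 reports). */
int unsigned_add_wraps(unsigned x, unsigned y)
{
    return x + y < x;   /* a wrapped sum is smaller than either operand */
}

/* Returns 1 if x + y would leave the range of int (the situation OF = 1
 * reports). The operands are compared first, so no overflowing addition
 * is ever performed. */
int signed_add_overflows(int x, int y)
{
    if (y > 0) {
        return x > INT_MAX - y;
    }
    if (y < 0) {
        return x < INT_MIN - y;
    }
    return 0;
}
```

A program could call the appropriate function before each addition and report an error instead of silently storing a wrong sum.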

Be Careful! Do not confuse positive signed numbers with unsigned numbers. The range for unsigned 32-bit integers is 0 to 4294967295, and for signed 32-bit integers the range is -2147483648 to +2147483647.

Positive numbers are signed.

The codes used for both unsigned integers and signed integers are circular in nature. That is, for a given number of bits, each code "wraps around." This can be seen pictorially in the "Decoder Ring" shown in Figure 3.1 for three-bit numbers.

Example 3-h

Using the "Decoder Ring" (Figure 3.1), add the unsigned integers 3 + 4.

Working only in the inner ring, start at the tic mark for 3, which corresponds to the bit pattern 011. The bit pattern corresponding to 4 is 100, which is four tic marks CW from zero. So move four tic marks CW from the 3 tic mark. This places us at the tic mark labeled 111, which corresponds to 7. Since we did not pass the tic mark at the top of the Decoder Ring, CF = 0. Thus, the result is correct.

Example 3-i

Using the "Decoder Ring" (Figure 3.1), add the unsigned integers 5 + 6.

Working only in the inner ring, start at the tic mark for 5, which corresponds to the bit pattern 101. The bit pattern corresponding to 6 is 110, which is six tic marks CW from zero. So move six tic marks CW from the 5 tic mark. This places us at the tic mark labeled 011, which corresponds to 3. Since we have crossed the tic mark at the top of the Decoder Ring, the CF becomes 1. Thus, the result is incorrect.


Figure 3.1: "Decoder Ring" for three-bit signed and unsigned integers. Move clockwise when adding numbers, counter-clockwise when subtracting. Crossing over 000 sets the CF to one, indicating an error for unsigned integers. Crossing over 100 sets the OF to one, indicating an error for signed integers.


Example 3-j

Using the "Decoder Ring" (Figure 3.1), add the signed integers (+1) + (+2).

Working only in the outer ring, start at the tic mark for +1, which corresponds to the bit pattern 001. The bit pattern corresponding to +2 is 010, which is two tic marks CW from zero. So move two tic marks CW from the +1 tic mark. This places us at the tic mark labeled 011, which corresponds to +3. Since we did not pass the tic mark at the bottom of the Decoder Ring, OF = 0. Thus, the result is correct.

Example 3-k

Using the "Decoder Ring" (Figure 3.1), add the signed integers (+3) + (-4).

Working only in the outer ring, start at the tic mark for +3, which corresponds to the bit pattern 011. The bit pattern corresponding to -4 is 100, which is four tic marks CCW from zero. So move four tic marks CCW from the +3 tic mark. This places us at the tic mark labeled 111, which corresponds to -1. Since we did not pass the tic mark at the bottom of the Decoder Ring, OF = 0. Thus, the result is correct.

Example 3-l

Using the "Decoder Ring" (Figure 3.1), add the signed integers (+3) + (+1).

Working only in the outer ring, start at the tic mark for +3, which corresponds to the bit pattern 011. The bit pattern corresponding to +1 is 001, which is one tic mark CW from zero. So move one tic mark CW from the +3 tic mark. This places us at the tic mark labeled 100, which corresponds to -4. Since we did pass the tic mark at the bottom of the Decoder Ring, OF = 1. Thus, the result is incorrect.

3.5 C/C++ Basic Data Types

High-level languages provide some basic data types. For example, C/C++ provides int, char, float, etc. The sizes of some data types are shown in Table 3.3. The sizes given in this table are taken from the System V Application Binary Interface specifications, reference [33] for 32-bit and reference [25] for 64-bit, and are used by the gcc compiler for the x86-64 architecture. Language specifications tend to be more permissive in order to accommodate other hardware architectures. For example, see reference [10] for the specifications for C.

Data type    32-bit mode    64-bit mode
char              8              8
int              32             32
long             32             64
long long        64             64
float            32             32
double           64             64
*any             32             64

Table 3.3: Sizes (in bits) of some C/C++ data types in 32-bit and 64-bit modes. The size of a long depends on the mode. Pointers (addresses) are 32 bits in 32-bit mode and can be 32 or 64 bits in 64-bit mode.


A given "real world" value can usually be represented in more than one data type. For example, most people would think of "123" as representing "one hundred twenty-three." This value could be stored in a computer in int format or as a text string. An int in our C/C++ environment is stored in 32 bits, and the bit pattern would be

0x0000007b

As a C-style text string, it would also require four bytes of memory, but their bit patterns would be

0x31 0x32 0x33 0x00
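To look at the text-string bytes on a real system, one can format them in hexadecimal. The sketch below is only an illustration; format_bytes is a hypothetical helper, and it assumes the caller supplies a buffer of at least five characters per byte and asks for at least one byte.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the n bytes starting at p into out in the
 * form "0x.. 0x.. ...". Assumes n >= 1 and that out has room for at
 * least 5 * n characters. */
void format_bytes(const unsigned char *p, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            *out++ = ' ';                          /* separator between bytes */
        }
        out += sprintf(out, "0x%02x", (unsigned)p[i]);
    }
}
```

Applied to the four bytes of the string "123" (three characters plus the terminating NUL), the helper produces 0x31 0x32 0x33 0x00. The int 123, by contrast, prints as 0x0000007b when passed to printf with the %#010x format.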

The int format is easier to use in arithmetic and logical expressions, but the interface with the outside world through the screen and the keyboard uses the char format. If a user entered 123 from the keyboard, the operating system would read the individual characters, each in char format. The text string must be converted to int format. After the numbers are manipulated, the result must be converted from the int format to char format for display on the screen.

C programmers use functions in the stdio library and C++ programmers use functions in the iostream library to do these conversions between the int and char formats. For example, the C code sequence

scanf("%i", &x);

x += 100;

printf("%i", x);

or the C++ code sequence

cin >> x;

x += 100;

cout << x;

• reads characters from the keyboard and converts the character sequence into the corresponding int format.

• adds 100 to the int.

• converts the resulting int into a character sequence and displays it on the screen.

The C or C++ I/O library functions in the code segments above do the necessary conversions between character sequences and the int storage format. However, once the conversion is performed, they ultimately call the read system call function to read bytes from the keyboard and the write system call function to write bytes to the screen. As shown in Figure 3.2, an application program can call the read and write functions directly to transfer bytes.

When using the read and write system call functions for I/O, it is the programmer's responsibility to do the conversions between the char type used for I/O and the storage formats used within the program. We will soon be writing our own functions in assembly language to convert between the character format used for screen display and keyboard input, and the internal storage format of integers in the binary number system. The purpose of writing our own functions is to gain a thorough understanding of how data is represented internally in the computer.

Aside: If the numerical data are used primarily for display, with few arithmetic operations, it makes more sense to store numerical data in character format. Indeed, this is done in many business data processing environments. But this makes arithmetic operations more complicated.

3.5.1 C/C++ Shift Operations

[Figure 3.2 diagram; labels: application, printf/scanf, C I/O libraries, write/read, OS, screen/keyboard]

Figure 3.2: Relationship of I/O libraries to application and operating system. An application can use functions in the I/O libraries to convert between keyboard/screen chars and basic data types, or it can directly use the read/write system calls to transfer raw bytes.

Since our primary goal here is to study storage formats, we will concentrate on bit patterns. We will develop a program in C that allows a user to enter bit patterns in hexadecimal. The program will read the characters from the keyboard in ASCII code and convert them into the corresponding int storage format as shown in Algorithm 3.8. This conversion algorithm involves manipulating data at the bit level.

Algorithm 3.8: Read hexadecimal value from keyboard.

1 x ← 0;
2 Read character from keyboard;
3 while more characters do
4     x ← x shifted left four bit positions;
5     y ← new character converted to an int;
6     x ← x + y;
7     Read character from keyboard;
8 Display the integer;

Let us examine this algorithm. Each character read from the keyboard represents a hexadecimal digit. That is, each character is one of '0', . . . ,'9','a', . . . ,'f'. (We assume that the user does not make mistakes.) Since a hexadecimal digit represents four bits, we need to shift the accumulated integer four bits to the left in order to make room for the new four-bit value.

You should recognize that shifting an integer four bits to the left multiplies it by 16. As you will see in Sections 12.3 and 12.4 (pages 273 and 280), multiplication and division are complicated operations, and they can take a great deal of processor time. Using left/right shifts to effect multiplication/division by powers of two is very efficient. More importantly, the four-bit shift is more natural in this application.

The C/C++ operator for shifting bits to the left is <<. (In C++ the >> and << operators have also been overloaded for use with the input and output streams.) For example, if x is an int, the statement

x = x << 4;

shifts the value in x four bits to the left, thus multiplying it by sixteen. Similarly, the C/C++ operator for shifting bits to the right is >>. For example, if x is an int, the statement

x = x >> 3;

shifts the value in x three bits to the right, thus dividing it by eight. Note that the three rightmost bits are lost, so this is an integer div operation. The program in Listing 3.1 illustrates the use of the C shift operators to multiply and divide by powers of two.


/*
 * mulDiv.c
 * Asks user to enter an integer. Then prompts user to enter
 * a power of two to multiply the integer, then another power
 * of two to divide. Assumes that user does not request more
 * than 32 as the power of 2.
 * Bob Plantz - 4 June 2009
 */

#include <stdio.h>

int main(void)
{
    int x;
    int leftShift, rightShift;

    printf("Enter an integer: ");
    scanf("%i", &x);

    printf("Multiply by two raised to the power: ");
    scanf("%i", &leftShift);
    printf("%i x %i = %i\n", x, 1 << leftShift, x << leftShift);

    printf("Divide by two raised to the power: ");
    scanf("%i", &rightShift);
    printf("%i / %i = %i\n", x, 1 << rightShift, x >> rightShift);

    return 0;
}

Listing 3.1: Shifting to multiply and divide by powers of two.

3.5.2 C/C++ Bit Operations

We begin by reviewing the C/C++ bitwise logical operators,

and             &
or              |
exclusive or    ^
complement      ~

It is easy to see what each of these operators does by using truth tables. To illustrate how truth tables work, consider the algorithm for binary addition. In Section 3.1 (page 28) we saw that the ith bit in the result is the sum of the ith bit of one number plus the ith bit of the other number plus the carry produced from adding the (i-1)th bits. This sum will produce a carry of zero or one. In other words, a bit adder has three inputs (the two corresponding bits from the two numbers being added and the carry from the previous bit addition) and two outputs (the result and the carry). In a truth table we have a column for each input and each output. Then we write down all possible input bit combinations and show the output(s) in the corresponding row. A truth table for the bit addition operation is shown in Figure 3.3. We use the notation x[i] to represent the ith bit in the variable x; x[i-j] would specify bits i through j.


x[i]   y[i]   carry[(i-1)]   z[i]   carry[i]
 0      0          0          0        0
 0      1          0          1        0
 1      0          0          1        0
 1      1          0          0        1
 0      0          1          1        0
 0      1          1          0        1
 1      0          1          0        1
 1      1          1          1        1

Figure 3.3: Truth table for adding two bits with carry from a previous bit addition. x[i] is the ith bit of x; carry[(i-1)] is the carry from adding the (i-1)th bits.

The bitwise logical operators act on the corresponding bits of two operands as shown in Figure 3.4.

and
x[i]   y[i]   x[i] & y[i]
 0      0          0
 0      1          0
 1      0          0
 1      1          1

inclusive or
x[i]   y[i]   x[i] | y[i]
 0      0          0
 0      1          1
 1      0          1
 1      1          1

exclusive or
x[i]   y[i]   x[i] ^ y[i]
 0      0          0
 0      1          1
 1      0          1
 1      1          0

complement
x[i]   ~x[i]
 0       1
 1       0

Figure 3.4: Truth tables showing bitwise C/C++ operations. x[i] is the ith bit in the variable x.
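The bit adder described by the truth table in Figure 3.3 can be expressed with the bitwise operators of Figure 3.4. The following sketch is one common way to write it, not something given in the text; the function name full_adder is hypothetical, and each argument is assumed to be 0 or 1.

```c
/* Hypothetical one-bit adder matching the truth table in Figure 3.3.
 * x, y, and carry_in are each assumed to be 0 or 1. */
void full_adder(unsigned x, unsigned y, unsigned carry_in,
                unsigned *z, unsigned *carry_out)
{
    /* The sum bit is 1 when an odd number of the three inputs are 1. */
    *z = x ^ y ^ carry_in;

    /* A carry is produced when both operand bits are 1, or when exactly
     * one of them is 1 and there is a carry in. */
    *carry_out = (x & y) | (carry_in & (x ^ y));
}
```

Calling the function with each of the eight input combinations reproduces the z[i] and carry[i] columns of Figure 3.3 row by row.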

Example 3-m

Let int x = 0x1234abcd. Compute the and, or, and xor with 0xdcba4321.

x & 0xdcba4321 = 0x10300301

x | 0xdcba4321 = 0xdebeebed

x ^ 0xdcba4321 = 0xce8ee8ec

Make sure that you distinguish these bitwise logical operators from the C/C++ logical operators, &&, ||, and !. The logical operators work on groups of bits organized into integral data types rather than individual bits. For comparison, the truth tables for the C/C++ logical operators are shown in Figure 3.5.


and
x          y          x && y
0          0            0
0          non-zero     0
non-zero   0            0
non-zero   non-zero     1

or
x          y          x || y
0          0            0
0          non-zero     1
non-zero   0            1
non-zero   non-zero     1

complement
x          !x
0           1
non-zero    0

Figure 3.5: Truth tables showing C/C++ logical operations. x and y are variables of integral data type.
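The difference between the two families of operators is easiest to see when both are applied to the same operands. The functions below are a small sketch for that purpose only; their names are hypothetical wrappers around the operators themselves.

```c
/* Bitwise and: combines the corresponding bits of x and y. */
unsigned bitwise_and(unsigned x, unsigned y)
{
    return x & y;
}

/* Logical and: 1 if both operands are non-zero, otherwise 0. */
int logical_and(unsigned x, unsigned y)
{
    return x && y;
}

/* Bitwise complement: inverts every bit of x. */
unsigned bitwise_not(unsigned x)
{
    return ~x;
}

/* Logical complement: 1 if x is zero, otherwise 0. */
int logical_not(unsigned x)
{
    return !x;
}
```

For example, with the operands 0x0f and 0xf0 the bitwise and is 0 because no bit position is 1 in both, but the logical and is 1 because both operands are non-zero.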

3.5.3 C/C++ Data Type Conversions

Now we are prepared to see how we can convert from the ASCII character code to the int format.

The & operator works very nicely for the conversion. If a numeric character is stored in the char

variable aChar, from Table 3.4 we see that the required operation is

aChar = aChar & 0x0f;

Hex character ASCII code Corresponding int

0 0011 0000 0000 0000 0000 0000 0000 0000 0000 0000

1 0011 0001 0000 0000 0000 0000 0000 0000 0000 0001

2 0011 0010 0000 0000 0000 0000 0000 0000 0000 0010

3 0011 0011 0000 0000 0000 0000 0000 0000 0000 0011

4 0011 0100 0000 0000 0000 0000 0000 0000 0000 0100

5 0011 0101 0000 0000 0000 0000 0000 0000 0000 0101

6 0011 0110 0000 0000 0000 0000 0000 0000 0000 0110

7 0011 0111 0000 0000 0000 0000 0000 0000 0000 0111

8 0011 1000 0000 0000 0000 0000 0000 0000 0000 1000

9 0011 1001 0000 0000 0000 0000 0000 0000 0000 1001

a 0110 0001 0000 0000 0000 0000 0000 0000 0000 1010

b 0110 0010 0000 0000 0000 0000 0000 0000 0000 1011

c 0110 0011 0000 0000 0000 0000 0000 0000 0000 1100

d 0110 0100 0000 0000 0000 0000 0000 0000 0000 1101

e 0110 0101 0000 0000 0000 0000 0000 0000 0000 1110

f 0110 0110 0000 0000 0000 0000 0000 0000 0000 1111

Table 3.4: Hexadecimal characters and corresponding int. Note the change in pattern from '9' to 'a'.

Well, we still have an 8-bit value (with the four high-order bits zero), but we will work on this in a moment.

Next consider the alphabetic hexadecimal digits in Table 3.4. Notice that the low-order four

bits are the same whether the character is upper case or lower case. We can use the same &

operation to obtain these four bits, then add 9 to the result:

aChar = 0x09 + (aChar & 0x0f);

Conversion from the 8-bit char type to the 32-bit int type is accomplished by a type cast in C.


The resulting program is shown in Listing 3.2. Notice that we use the printf function to display the resulting stored value, both in hexadecimal and decimal. The conversion from stored int format to hexadecimal display is left as an exercise (Exercise 3-14).

/*
 * convertHex.c
 * Asks user to enter a number in hexadecimal
 * then echoes it in hexadecimal and in decimal.
 * Assumes that user does not make mistakes.
 * Bob Plantz - 4 June 2009
 */

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int x;
    unsigned char aChar;

    printf("Enter an integer in hexadecimal: ");
    fflush(stdout);

    x = 0;                              // initialize result
    read(STDIN_FILENO, &aChar, 1);      // get first character
    while (aChar != '\n')               // look for return key
    {
        x = x << 4;                     // make room for next four bits
        if (aChar <= '9')
        {
            x = x + (int)(aChar & 0x0f);
        }
        else
        {
            aChar = aChar & 0x0f;
            aChar = aChar + 9;
            x = x + (int)aChar;
        }
        read(STDIN_FILENO, &aChar, 1);
    }

    printf("You entered %#010x = %i (decimal)\n\n", x, x);

    return 0;
}

Listing 3.2: Reading hexadecimal values from keyboard.

3.6 Other Codes

Thus far in this chapter we have used the binary number system to represent numerical values. It is an efficient code in the sense that each of the $2^n$ bit patterns represents a value. On the other hand, there are some limitations in the code. We will explore some other codes in this section.


3.6.1 BCD Code

One limitation of using the binary number system is that a decimal number must be converted to binary before storing or performing arithmetic operations on it. And binary numbers must be converted to decimal for most real-world display purposes.

The Binary Coded Decimal (BCD) code is a code for individual decimal digits. Since there are ten decimal digits, the code must use four bits for each digit. The BCD code is shown in Table 3.5.

Decimal digit    BCD code (four bits)
0                0000
1                0001
2                0010
3                0011
4                0100
5                0101
6                0110
7                0111
8                1000
9                1001

Table 3.5: BCD code for the decimal digits.

For example, in a 16-bit storage location the decimal number 1234 would be stored in the BCD code as

0001 0010 0011 0100 # BCD

and in binary as

0000 0100 1101 0010 # binary

From Table 3.5 we can see that six bit patterns are "wasted." The effect of this inefficiency is that a 16-bit storage location has a range of 0 to 9999 if we use BCD, but the range is 0 to 65535 if we use binary.
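The conversion from a binary value to the 16-bit BCD layout shown above can be sketched in C by peeling off one decimal digit at a time. This is an illustration only: the function name to_bcd16 is hypothetical, and the input is assumed to lie in the range 0 to 9999.

```c
#include <stdint.h>

/* Hypothetical helper: packs a value assumed to be in 0..9999 into four
 * BCD digits, with the least significant decimal digit in the low four bits. */
uint16_t to_bcd16(unsigned value)
{
    uint16_t bcd = 0;
    for (int shift = 0; shift < 16; shift += 4) {
        bcd |= (uint16_t)((value % 10u) << shift);  /* next decimal digit */
        value /= 10u;                               /* drop that digit */
    }
    return bcd;
}
```

For the value 1234 the function returns 0x1234, that is, 0001 0010 0011 0100, whereas the same value stored in binary is 0x04d2.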

BCD is important in specialized systems that deal primarily with numerical data. There are I/O devices that deal directly with numbers in BCD without converting to/from a character code, for example, ASCII. The COBOL programming language supports a packed BCD format where two BCD characters are stored in each 8-bit byte. The last (4-bit) digit is used to store the sign of the number as shown in Table 3.6. The specific codes used depend upon the particular implementation.

Sign BCD code (four bits)

+ 1010

- 1011

+ 1100

- 1101

+ 1110

unsigned 1111

Table 3.6: Sign codes for packed BCD.

3.6.2 Gray Code

One of the problems with both the binary and BCD codes is that the difference between two adjacent values often requires that more than one bit be changed. For example, three bits must be changed when incrementing from 3 to 4 (0011 to 0100). If the value is read during the time when the bits are being switched there may be an error. This is more apt to occur if the bits are implemented with, say, mechanical switches instead of electronic. The Gray code is one where there is only one bit that differs between any two adjacent values. As you will see in Section 4.3, this property also allows for a very useful visual tool for simplifying Boolean algebra expressions.

The Gray code is easily constructed. Start with one bit:

decimal Gray code

0 0

1 1

To add a bit, first duplicate the existing pattern, but reflected:

Gray code

0

1

1

0

then add a zero to the beginning of each of the original bit patterns and a 1 to each of the

reflected ones:

decimal Gray code

0 00

1 01

2 11

3 10

Let us repeat these two steps to add another bit. Reflect the pattern:

Gray code

00

01

11

10

10

11

01

00

then add a zero to the beginning of each of the original bit patterns and a 1 to each of the

reflected ones:

decimal Gray code

0 000

1 001

2 011

3 010

4 110

5 111

6 101

7 100

The Gray code for four bits is shown in Table 3.7. Notice that the pattern of only changing

one bit between adjacent values also holds when the bit pattern "wraps around." That is, only

one bit is changed when going from the highest value (15 for four bits) to the lowest (0).


Decimal Gray code

0 0000

1 0001

2 0011

3 0010

4 0110

5 0111

6 0101

7 0100

8 1100

9 1101

10 1111

11 1110

12 1010

13 1011

14 1001

15 1000

Table 3.7: Gray code for 4 bits.
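The entries in Table 3.7 can also be generated with a well-known closed form, which is a different technique from the reflect-and-prefix construction described above: the Gray code for n is n exclusive-ored with n shifted right one bit. The sketch below uses hypothetical function names; bits_that_differ is included only to check the one-bit-change property.

```c
/* Closed-form conversion from a binary count to the reflected Gray code. */
unsigned binary_to_gray(unsigned n)
{
    return n ^ (n >> 1);
}

/* Returns the number of bit positions in which a and b differ. */
int bits_that_differ(unsigned a, unsigned b)
{
    unsigned diff = a ^ b;
    int count = 0;
    while (diff != 0) {
        count += (int)(diff & 1u);
        diff >>= 1;
    }
    return count;
}
```

For the values 0 through 15 the first function reproduces the codes in Table 3.7, and the second reports exactly one differing bit between each pair of adjacent codes, including the wrap-around from 15 back to 0.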

3.7 Exercises

3-1 (§3.1) How many bits are required to store a single decimal digit?

3-2 (§3.1) Using the answer from Exercise 3-1, invent a code for storing eight decimal digits in a thirty-two bit register. Using your new code, does binary addition produce the correct results?

3-3 (§3.3) Select several pairs of signed integers from Table 3.2, convert each to binary using the table, perform the binary addition, and check the results. Does this code always work?

3-4 (§3.3) If you did not select them in Exercise 3-3, add +4 and +5 using the four-bit, two's complement code (from Table 3.2). What answer do you get?

3-5 (§3.3) If you did not select them in Exercise 3-3, add -4 and -5 using the four-bit, two's complement code (from Table 3.2). What answer do you get?

3-6 (§3.3) Select any positive integer from Table 3.2. Add the binary representation for the positive value to the binary representation for the negative value. What is the four-bit result? What is the value of the CF? The OF? If you do the addition "on paper" (that is, you can use as many digits as you wish), how could you express, in English, the result of adding the positive representation of an integer to its negative representation in the two's complement notation? The negative representation to the positive representation? Which two integers do not have a representation of the opposite sign?

3-7 (§3.3) The following 8-bit hexadecimal values are stored in two's complement format. What

are the equivalent signed decimal numbers?

a) 55

b) aa

c) f0

d) 0f

e) 80

f) 63

g) 7b


3-8 (§3.3) The following 16-bit hexadecimal values are stored in two's complement format.

What are the equivalent signed decimal numbers?

a) 1234

b) edcc

c) fedc

d) 07d0

e) 8000

f) 0400

g) ffff

h) 782f

3-9 (§3.3) Show how each of the following signed, decimal integers would be stored in 8-bit two's complement format. Give your answer in hexadecimal.

a) 100

b) -1

c) -10

d) 88

e) 127

f) -16

g) -32

h) -128

3-10 (§3.3) Show how each of the following signed, decimal integers would be stored in 16-bit two's complement format. Give your answer in hexadecimal.

a) 1024

b) -1024

c) -1

d) 32767

e) -256

f) -32768

g) -32767

h) -128

3-11 (§3.4) Perform binary addition of the following pairs of 8-bit numbers (shown in hexadecimal) and indicate whether your result is "right" or "wrong." First treat them as unsigned values, then as signed values (stored in two's complement format). Thus, you will have two "right/wrong" answers for each sum. Note that the computer performs only one addition, setting both the CF and OF according to the results of the addition. It is up to the program to test the appropriate flag depending on whether the numbers are being considered as unsigned or signed in the program.

a) 55 + aa

b) 55 + f0

c) 80 + 7b

d) 63 + 7b

e) 0f + ff

f) 80 + 80

3-12 (§3.4, 3.5) Perform binary addition of the following pairs of 16-bit numbers (shown in hexadecimal) and indicate whether your result is "right" or "wrong." First treat them as unsigned values, then as signed values (stored in two's complement format). Thus, you will have two "right/wrong" answers for each sum. Note that the computer performs only one addition, setting both the CF and OF according to the results of the addition. It is up to the program to test the appropriate flag depending on whether the numbers are being considered as unsigned or signed in the program.

a) 1234 + edcc

b) 1234 + fedc

c) 8000 + 8000

d) 0400 + ffff

e) 07d0 + 782f

f) 8000 + ffff

3-13 (§3.5) Enter the program in Listing 3.1 and get it to work. Use the program to compute 1 (one) multiplied by 2 raised to the 31st power. What result do you get for 1 (one) multiplied by 2 raised to the 32nd power? Explain the results.


3-14 (§3.5) Write a C program that prompts the user to enter a hexadecimal value, multiplies it by ten, then displays the result in hexadecimal. Your main function should

a) declare a char array,

b) call the readLn function to read from the keyboard,

c) call a function to convert the input text string to an int,

d) multiply the int by ten,

e) call a function to convert the int to its corresponding hexadecimal text string,

f) call writeStr to display the resulting hexadecimal text string.

Use the readLn and writeStr functions from Exercise 2-32 to read from the keyboard and display on the screen. Place the functions to perform the conversions in separate files.

Hint: review Figure 3.2.

3-15 (§3.5) Write a C program that prompts the user to enter a binary value, multiplies it by ten, then displays the result in binary. ("Binary" here means that the user communicates with the program in ones and zeros.) Your main function should

a) declare a char array,

b) call the readLn function to read from the keyboard,

c) call a function to convert the input text string to an int,

d) multiply the int by ten,

e) call a function to convert the int to its corresponding binary text string,

f) call writeStr to display the resulting binary text string.

Use the readLn and writeStr functions from Exercise 2-32 to read from the keyboard and display on the screen. Your functions to convert from a binary text string to an int and back should be placed in separate functions.

3-16 (§3.5) Write a C program that prompts the user to enter an unsigned decimal integer, multiplies it by ten, then displays the result in binary. ("Binary" here means that the user communicates with the program in ones and zeros.) Your main function should

a) declare a char array,

b) call the readLn function to read from the keyboard,

c) call a function to convert the input text string to an int,

d) multiply the int by ten,

e) call a function to convert the int to its corresponding decimal text string,

f) call writeStr to display the resulting decimal text string.

Use the readLn and writeStr functions from Exercise 2-32 to read from the keyboard and display on the screen. Your function to convert from a decimal text string to an int should be placed in a separate function. Hint: this problem cannot be solved by simply shifting bit patterns. Think carefully about the mathematical equivalence of shifting bit patterns left or right.

3-17 (§3.5) Modify the program in Exercise 3-16 so that it works with signed decimal integers.

Chapter 4

Logic Gates

This chapter provides an overview of the hardware components that are used to build a computer. We will limit the discussion to electronic computers, which use transistors to switch between two different voltages. One voltage represents 0, the other 1. The hardware devices that implement the logical operations are called logic gates.

4.1 Boolean Algebra

In order to understand how the components are combined to build a computer, you need to learn another algebra system: Boolean algebra. There are many approaches to learning about Boolean algebra. Some authors start with the postulates of Boolean algebra and develop the mathematical tools needed for working with switching circuits from them. We will take the more pragmatic approach of starting with the basic operations of Boolean algebra, then exploring the properties of the algebra. For a more theoretical approach, including discussions of more general Boolean algebra concepts, search the internet, or take a look at books like [9], [20], [23], or [24].

Boolean algebra has only two values, 0 and 1, unlike elementary algebra, which deals with an infinity of values, the real numbers. Since there are only two values, a truth table is a very useful tool for working with Boolean algebra. A truth table lists all possible combinations of the variables in the problem. The resulting value of the Boolean operation(s) for each variable combination is shown on the respective row.

Elementary algebra has four operations, addition, subtraction, multiplication, and division, but Boolean algebra has only three operations:

AND: a binary operator; the result is 1 if and only if both operands are 1; otherwise the result is 0. We will use '·' to designate the AND operation. It is also common to use the '∧' symbol or simply "AND". The hardware symbol for the AND gate is shown in Figure 4.1. The inputs are x and y. The resulting output, x · y, is shown in the truth table in this figure.

    x  y | x · y
    -----+------
    0  0 |   0
    0  1 |   0
    1  0 |   0
    1  1 |   1

Figure 4.1: The AND gate acting on two variables, x and y.

We can see from the truth table that the AND operator follows rules similar to those of multiplication in elementary algebra.


OR: a binary operator; the result is 1 if at least one of the two operands is 1; otherwise the result is 0. We will use '+' to designate the OR operation. It is also common to use the '∨' symbol or simply "OR". The hardware symbol for the OR gate is shown in Figure 4.2. The inputs are x and y. The resulting output, x + y, is shown in the truth table in this figure.

    x  y | x + y
    -----+------
    0  0 |   0
    0  1 |   1
    1  0 |   1
    1  1 |   1

Figure 4.2: The OR gate acting on two variables, x and y.

From the truth table we can see that the OR operator follows the same rules as addition in elementary algebra except that

1 + 1 = 1

in Boolean algebra. Unlike elementary algebra, there is no carry from the OR operation. Since addition of integers can produce a carry, you will see in Section 5.1 that implementing addition requires more than a simple OR gate.

NOT: a unary operator; the result is 1 if the operand is 0, or 0 if the operand is 1. Other names for the NOT operation are complement and invert. We will use x′ to designate the NOT operation. It is also common to use ¬x or x̄. The hardware symbol for the NOT gate is shown in Figure 4.3. The input is x. The resulting output, x′, is shown in the truth table in this figure.

    x | x′
    --+---
    0 | 1
    1 | 0

Figure 4.3: The NOT gate acting on one variable, x.

The NOT operation has no analog in elementary algebra. Be careful to notice that inversion of a value in elementary algebra is a division operation, which does not exist in Boolean algebra.

Two-state variables can be combined into expressions with these three operators in the same way that you would use the C/C++ operators &&, ||, and ! to create logical expressions commonly used to control if and while statements. We now examine some Boolean algebra properties for manipulating such expressions. As you read through this material, keep in mind that the same techniques can be applied to logical expressions in programming languages.

These properties are commonly presented as theorems. They are easily proved from application of truth tables.

There is a duality between the AND and OR operators. In any equality you can interchange AND and OR along with the constants 0 and 1, and the equality still holds. Thus the properties will be presented in pairs that illustrate their duality. We first consider properties that are the same as in elementary algebra.

AND and OR are associative:

x · ( y · z ) = (x · y ) · z (4.1)

x + (y + z ) = (x + y ) + z (4.2)

It is straightforward to prove these equations with truth tables. For example, for Equation 4.1:


x y z (y · z ) (x · y ) x · ( y · z ) = (x · y ) · z

0 0 0 0 0 0 0

0 0 1 0 0 0 0

0 1 0 0 0 0 0

0 1 1 1 0 0 0

1 0 0 0 0 0 0

1 0 1 0 0 0 0

1 1 0 0 1 0 0

1 1 1 1 1 1 1

And for Equation 4.2:

x y z (y + z ) (x + y ) x + (y + z ) = (x + y ) + z

0 0 0 0 0 0 0

0 0 1 1 0 1 1

0 1 0 1 1 1 1

0 1 1 1 1 1 1

1 0 0 0 1 1 1

1 0 1 1 1 1 1

1 1 0 1 1 1 1

1 1 1 1 1 1 1

AND and OR have an identity value:

x · 1 = x (4.3)

x + 0 = x (4.4)

Now we consider properties where Boolean algebra differs from elementary algebra.

AND and OR are commutative:

x · y = y · x (4.5)

x + y = y + x (4.6)

This is easily proved by looking at the second and third lines of the respective truth tables. In elementary algebra, only the addition and multiplication operators are commutative.

AND and OR have a null value:

x · 0 = 0 (4.7)

x + 1 = 1 (4.8)

The null value for the AND is the same as multiplication in elementary algebra. But

addition in elementary algebra does not have a null constant, while OR in Boolean algebra

does.

AND and OR have a complement value:

x · x′ = 0 (4.9)

x + x′ = 1 (4.10)

Complement does not exist in elementary algebra.

AND and OR are idempotent:

x · x = x (4.11)

x + x = x (4.12)

That is, repeated application of either operator to the same value does not change it. This differs considerably from elementary algebra, where repeated application of addition is equivalent to multiplication and repeated application of multiplication is the power operation.


AND and OR are distributive:

x · ( y + z ) = x · y + x · z (4.13)

x + y · z = (x + y ) · (x + z ) (4.14)

Going from right to left in Equation 4.13 is the very familiar factoring from addition and multiplication in elementary algebra. On the other hand, the operation in Equation 4.14 has no analog in elementary algebra. It follows from the idempotency property. The NOT operator has an obvious property:

NOT shows involution:

(x′)′ = x (4.15)

Again, since there is no complement in elementary algebra, there is no equivalent property.

DeMorgan's Law is an important expression of the duality between the AND and OR operations.

(x · y)′ = x′ + y′ (4.16)

(x + y)′ = x′ · y′ (4.17)

The validity of DeMorgan's Law can be seen in the following truth tables. For Equation 4.16:

    x  y  (x · y)  (x · y)′  x′  y′  x′ + y′
    0  0     0        1      1   1      1
    0  1     0        1      1   0      1
    1  0     0        1      0   1      1
    1  1     1        0      0   0      0

And for Equation 4.17:

    x  y  (x + y)  (x + y)′  x′  y′  x′ · y′
    0  0     0        1      1   1      1
    0  1     1        0      1   0      0
    1  0     1        0      0   1      0
    1  1     1        0      0   0      0

4.2 Canonical (Standard) Forms

Some terminology and definitions at this point will help our discussion. Consider two dictionary definitions of literal [26]:

literal 1b: adhering to fact or to the ordinary construction or primary meaning of a term or expression : ACTUAL.
2: of, relating to, or expressed in letters.

In programming we use the first definition of literal. For example, in the following code sequence

    int xyz = 123;
    char a = 'b';
    char *greeting = "Hello";

the number "123", the character 'b', and the string "Hello" are all literals. They are interpreted by the compiler exactly as written. On the other hand, "xyz", "a", and "greeting" are all names of variables.

In mathematics we use the second definition of literal. That is, in the algebraic expression

3x + 12yz

the letters x, y, and z are called literals. Furthermore, it is common to omit the "·" operator to

designate multiplication. Similarly, it is often dropped in Boolean algebra expressions when the

AND operation is implied.

The meaning of literal in Boolean algebra is slightly more specific.


literal: An occurrence of a variable or its complement in an expression. For example, the expression

x · y + x′ · z + x′ · y′ · z′

contains seven literals.

From the context of the discussion you should be able to tell which meaning of "literal" is intended and when the "·" operator is omitted.

A Boolean expression is created from the numbers 0 and 1, and literals. Literals can be combined using either the "·" or the "+" operators, which are multiplicative and additive operations, respectively. We will use the following terminology.

product term: A term in which the literals are connected with the AND operator. AND is multiplicative, hence the use of "product."

minterm or standard product: A product term that contains each of the variables in the problem, either in its complemented or uncomplemented form. For example, if a problem involves three variables (say, x, y, and z), x · y · z, x′ · y · z′, and x′ · y′ · z′ are all minterms, but x · y is not.

sum of products (SoP): One or more product terms connected with OR operators. OR is additive, hence the use of "sum."

sum of minterms (SoM) or canonical sum: An SoP in which each product term is a minterm. Since all the variables are present in each minterm, the canonical sum is unique for a given problem.

When first defining a problem, starting with the SoM ensures that the full effect of each variable has been taken into account. This often does not lead to the best implementation. In Section 4.3 we will see some tools to simplify the expression, and hence, the implementation.

It is common to index the minterms according to the values of the variables that would cause that minterm to evaluate to 1. For example, x′ · y′ · z′ = 1 when x = 0, y = 0, and z = 0, so this would be m0. The minterm x′ · y · z′ evaluates to 1 when x = 0, y = 1, and z = 0, so is m2. Table 4.1 lists all the minterms for a three-variable expression.

    minterm              x  y  z
    m0 = x′ · y′ · z′    0  0  0
    m1 = x′ · y′ · z     0  0  1
    m2 = x′ · y · z′     0  1  0
    m3 = x′ · y · z      0  1  1
    m4 = x · y′ · z′     1  0  0
    m5 = x · y′ · z      1  0  1
    m6 = x · y · z′      1  1  0
    m7 = x · y · z       1  1  1

Table 4.1: Minterms for three variables. m_i is the ith minterm. The x, y, and z values cause the corresponding minterm to evaluate to 1.

A convenient notation for expressing a sum of minterms is to use the Σ symbol with a numerical list of the minterm indexes. For example,

F(x, y, z) = x′ · y′ · z′ + x′ · y′ · z + x · y′ · z + x · y · z′
           = m0 + m1 + m5 + m6
           = Σ(0, 1, 5, 6) (4.18)

As you might expect, each of the terms defined above has a dual definition.

sum term: A term in which the literals are connected with the OR operator. OR is additive, hence the use of "sum."

maxterm or standard sum: A sum term that contains each of the variables in the problem, either in its complemented or uncomplemented form. For example, if an expression involves three variables, x, y, and z, (x + y + z), (x′ + y + z′), and (x′ + y′ + z′) are all maxterms, but (x + y) is not.

product of sums (PoS): One or more sum terms connected with AND operators. AND is multiplicative, hence the use of "product."

product of maxterms (PoM) or canonical product: A PoS in which each sum term is a maxterm. Since all the variables are present in each maxterm, the canonical product is unique for a given problem.

It also follows that any Boolean function can be uniquely expressed as a product of maxterms (PoM) that evaluate to 1. Starting with the product of maxterms ensures that the full effect of each variable has been taken into account. Again, this often does not lead to the best implementation, and in Section 4.3 we will see some tools to simplify PoMs.

It is common to index the maxterms according to the values of the variables that would cause that maxterm to evaluate to 0. For example, x + y + z = 0 when x = 0, y = 0, and z = 0, so this would be M0. The maxterm x′ + y + z′ evaluates to 0 when x = 1, y = 0, and z = 1, so is M5. Table 4.2 lists all the maxterms for a three-variable expression.

    Maxterm              x  y  z
    M0 = x + y + z       0  0  0
    M1 = x + y + z′      0  0  1
    M2 = x + y′ + z      0  1  0
    M3 = x + y′ + z′     0  1  1
    M4 = x′ + y + z      1  0  0
    M5 = x′ + y + z′     1  0  1
    M6 = x′ + y′ + z     1  1  0
    M7 = x′ + y′ + z′    1  1  1

Table 4.2: Maxterms for three variables. M_i is the ith maxterm. The x, y, and z values cause the corresponding maxterm to evaluate to 0.

A similar notation for expressing a product of maxterms is to use the Π symbol with a numerical list of the maxterm indexes. For example (and see Exercise 4-8),

F(x, y, z) = (x + y′ + z) · (x + y′ + z′) · (x′ + y + z) · (x′ + y′ + z′)
           = M2 · M3 · M4 · M7
           = Π(2, 3, 4, 7) (4.19)

The names "minterm" and "maxterm" may seem somewhat arbitrary. But consider the two functions,

F1(x, y, z) = x · y · z

F2(x, y, z) = x + y + z

There are eight (2^3) permutations of the three variables, x, y, and z. F1 has one minterm and evaluates to 1 for only one of the permutations, x = y = z = 1. F2 has one maxterm and evaluates to 1 for all permutations except when x = y = z = 0. This is shown in the following truth table:

               minterm             maxterm
    x  y  z    F1 = (x · y · z)    F2 = (x + y + z)
    0  0  0           0                   0
    0  0  1           0                   1
    0  1  0           0                   1
    0  1  1           0                   1
    1  0  0           0                   1
    1  0  1           0                   1
    1  1  0           0                   1
    1  1  1           1                   1

ORing more minterms to an SoP expression expands the number of cases where it evaluates to 1, and ANDing more maxterms to a PoS expression reduces the number of cases where it evaluates to 1.

4.3 Boolean Function Minimization

In this section we explore some important tools for manipulating Boolean expressions in order to simplify their hardware implementation. When implementing a Boolean function in hardware, each "·" operator represents an AND gate and each "+" operator an OR gate. In general, the complexity of the hardware is related to the number of AND and OR gates. NOT gates are simple and tend not to contribute significantly to the complexity.

We begin with some definitions.

minimal sum of products (mSoP): A sum of products expression is minimal if all other mathematically equivalent SoPs

1. have at least as many product terms, and
2. those with the same number of product terms have at least as many literals.

minimal product of sums (mPoS): A product of sums expression is minimal if all other mathematically equivalent PoSs

1. have at least as many sum factors, and
2. those with the same number of sum factors have at least as many literals.

These definitions imply that there can be more than one minimal solution to a problem; minimal expressions may not be unique. Good hardware design practice involves finding all the minimal solutions, then assessing each one within the context of the available hardware. For example, judiciously placed NOT gates can actually reduce hardware complexity (Section 4.4.3, page 75).

4.3.1 Minimization Using Algebraic Manipulations

To illustrate the importance of reducing the complexity of a Boolean function, consider the following function:

F1(x, y) = x · y′ + x′ · y + x · y (4.20)

The expression on the right-hand side is an SoM. The circuit to implement this function is shown in Figure 4.4. It requires three AND gates, one OR gate, and two NOT gates.

Now let us simplify the expression in Equation 4.20 to see if we can reduce the hardware requirements. This process will probably seem odd to a person who is not used to manipulating Boolean expressions, because there is not a single correct path to a solution. We present one way here. First we use the idempotency property (Equation 4.12) to duplicate the last term:

F1(x, y) = x · y′ + x · y + x′ · y + x · y (4.21)

Next we use the distributive property (Equation 4.13) to factor the expression:

F1(x, y) = x · (y′ + y) + y · (x′ + x) (4.22)

Figure 4.4: Hardware implementation of the function in Equation 4.20. (Circuit diagram: inputs x and y; output (x · y′) + (x′ · y) + (x · y).)

And from the complement property (Equation 4.10) we get:

F1(x, y) = x · 1 + y · 1 (4.23)
         = x + y (4.24)

which you recognize as the simple OR operation. It is easy to see that this is a minimal sum of products for this function. We can implement Equation 4.20 with a single OR gate; see Figure 4.2 on page 56. This is clearly a less expensive, faster circuit than the one shown in Figure 4.4.

To illustrate how a product of sums expression can be minimized, consider the function:

F2(x, y) = (x + y′) · (x′ + y) · (x′ + y′) (4.25)

The expression on the right-hand side is a PoM. The circuit for this function is shown in Figure 4.5. It requires three OR gates, one AND gate, and two NOT gates.

Figure 4.5: Hardware implementation of the function in Equation 4.25. (Circuit diagram: inputs x and y; output (x + y′) · (x′ + y) · (x′ + y′).)

We will use the distributive property (Equation 4.14) on the right two factors and recognize the complement (Equation 4.9):

F2(x, y) = (x + y′) · (x′ + y · y′) (4.26)
         = (x + y′) · x′ (4.27)

Now, use the distributive (Equation 4.13) and complement (Equation 4.9) properties to obtain:

F2(x, y) = x · x′ + x′ · y′ (4.28)
         = x′ · y′ (4.29)

Thus, the function can be implemented with two NOT gates and a single AND gate, which is clearly a minimal product of sums. Again, with a little algebraic manipulation we have arrived at a much simpler solution.


Example 4-a

Design a function that will detect the even 4-bit integers.

The even 4-bit integers are given by the function:

F(w, x, y, z) = w′ · x′ · y′ · z′ + w′ · x′ · y · z′ + w′ · x · y′ · z′ + w′ · x · y · z′
              + w · x′ · y′ · z′ + w · x′ · y · z′ + w · x · y′ · z′ + w · x · y · z′

Using the distributive property repeatedly we get:

F(w, x, y, z) = z′ · (w′ · x′ · y′ + w′ · x′ · y + w′ · x · y′ + w′ · x · y
                     + w · x′ · y′ + w · x′ · y + w · x · y′ + w · x · y)
              = z′ · (w′ · (x′ · y′ + x′ · y + x · y′ + x · y) + w · (x′ · y′ + x′ · y + x · y′ + x · y))
              = z′ · (w′ + w) · (x′ · y′ + x′ · y + x · y′ + x · y)
              = z′ · (w′ + w) · (x′ · (y′ + y) + x · (y′ + y))
              = z′ · (w′ + w) · (x′ + x) · (y′ + y)

And from the complement property we arrive at a minimal sum of products:

F(w, x, y, z) = z′

4.3.2 Minimization Using Graphic Tools

The Karnaugh map was invented in 1953 by Maurice Karnaugh while working as a telecommunications engineer at Bell Labs. Also known as a K-map, it provides a graphic view of all the possible minterms for a given number of variables. The format is a rectangular grid with a cell for each minterm. There are 2^n cells for n variables.

Figure 4.6 shows how all four minterms for two variables are mapped onto a four-cell Karnaugh map. The vertical axis is used for plotting x and the horizontal for y. The value of x for each row is shown by the number (0 or 1) immediately to the left of the row, and the value of y for each column appears at the top of the column.

    F(x, y)
               y
            0     1
    x   0   m0    m1
        1   m2    m3

Figure 4.6: Mapping of two-variable minterms on a Karnaugh map.

The procedure for simplifying an SoP expression using a Karnaugh map is:

1. Place a 1 in each cell that corresponds to a minterm that evaluates to 1 in the expression.
2. Combine cells with 1s in them and that share edges into the largest possible groups. Larger groups result in simpler expressions. The number of cells in a group must be a power of 2. The edges of the Karnaugh map are considered to wrap around to the other side, both vertically and horizontally.
3. Groups may overlap. In fact, this is common. However, no group should be fully enclosed by another group.
4. The result is the sum of the product terms that represent each group.


The simplification comes from the fact that the number of variables needed to specify a group of cells is reduced by log2 n_g, where n_g is the number of cells in the group. Thus the number of variables required to specify an entire group of cells in an n-variable Karnaugh map is:

number of group variables = n - log2 n_g

where:

n = number of variables in the Karnaugh map
n_g = number of cells in the group

Let us use a Karnaugh map to find a minimal sum of products for Equation 4.20 (repeated here):

F1(x, y) = x · y′ + x′ · y + x · y

We start by placing a 1 in each cell corresponding to a minterm that appears in the equation as shown in Figure 4.7.

    F1(x, y)
               y
            0     1
    x   0         1
        1   1     1

Figure 4.7: Karnaugh map for F1(x, y) = x · y′ + x′ · y + x · y.

It is easy to see two groups of two cells each. They are circled in Figure 4.8. The group in the bottom row represents the product term x, and the one in the right-hand column represents y.

Figure 4.8: Two-variable Karnaugh map showing the groupings x and y. (Same map as Figure 4.7, with the bottom row and the right-hand column each circled.)

So the simplification is:

F1(x, y) = x + y (4.30)

Notice that the two encircled groups overlap with the x · y minterm. This is the term that we added to the function in Equation 4.21 when performing the algebraic simplification. The Karnaugh map provides a graphical means to find the same simplification as the algebraic manipulations (see Equation 4.24). Many people find it easier to spot simplification patterns on a Karnaugh map.

Although it is not obvious in a two-variable Karnaugh map, the cells must be arranged such that only one variable changes between two cells that share an edge. This is called the adjacency property. We can see this in a three-variable Karnaugh map. Table 4.1 (page 59) lists all the minterms for three variables, x, y, and z, numbered from 0 to 7. A total of eight cells are needed, so we will draw it four cells wide and two high. Our Karnaugh map will be drawn with y and z on the horizontal axis, and x on the vertical. Figure 4.9 shows how the three-variable minterms map onto a Karnaugh map. Notice the order of the bit patterns along the top of the Karnaugh map. It is the same as a two-variable Gray code (Table 3.7, page 52). That is, the order of the columns is such that the yz values follow the Gray code.


    F(x, y, z)
                  yz
            00    01    11    10
    x   0   m0    m1    m3    m2
        1   m4    m5    m7    m6

Figure 4.9: Mapping of three-variable minterms on a Karnaugh map.

A four-variable Karnaugh map is shown in Figure 4.10. The y and z variables are on the horizontal axis, w and x on the vertical. From this four-variable Karnaugh map we see that the order of the rows is such that the wx values also follow the Gray code.

    F(w, x, y, z)
                    yz
              00    01    11    10
    wx  00    m0    m1    m3    m2
        01    m4    m5    m7    m6
        11    m12   m13   m15   m14
        10    m8    m9    m11   m10

Figure 4.10: Mapping of four-variable minterms on a Karnaugh map.

Other axis labeling schemes also work. The only requirement is that entries in adjacent cells differ by only one bit (which is a property of the Gray code). See Exercises 4-9 and 4-10.

Example 4-b

Find a minimal sum of products expression for the function

F(x, y, z) = x′ · y′ · z′ + x′ · y′ · z + x′ · y · z′ + x · y′ · z′ + x · y · z′ + x · y · z (4.32)

First we draw the Karnaugh map:

    F(x, y, z)
                  yz
            00    01    11    10
    x   0   1     1           1
        1   1           1     1

Several groupings are possible. Keep in mind that groupings can wrap around. We will work with the groupings that cover the two end columns (z = 0), the left two cells of the top row, and the right two cells of the bottom row, which yields a minimal sum of products:

F(x, y, z) = z′ + x′ · y′ + x · y


We may wish to implement a function as a product of sums instead of a sum of products. From DeMorgan's Law, we know that the complement of an expression exchanges all ANDs and ORs, and complements each of the literals. The zeros in a Karnaugh map represent the complement of the expression. So if we

1. place a 0 in each cell of the Karnaugh map corresponding to a missing minterm in the expression,
2. find groupings of the cells with 0s in them,
3. write a sum of products expression represented by the grouping of 0s, and
4. complement this expression,

we will have the desired expression expressed as a product of sums. Let us use the previous example to illustrate.

Example 4-c

Find a minimal product of sums for the function in Equation 4.32.

Using the Karnaugh map zeros,

    F(x, y, z)
                  yz
            00    01    11    10
    x   0               0
        1         0

we obtain the complement of our desired function,

F′(x, y, z) = x′ · y · z + x · y′ · z

and from DeMorgan's Law:

F(x, y, z) = (x + y′ + z′) · (x′ + y + z′)

We now work an example with four variables.

Example 4-d

Find a minimal sum of products expression for the function

F(w, x, y, z) = w′ · x′ · y′ · z′ + w′ · x′ · y · z′ + w′ · x · y′ · z + w′ · x · y · z
              + w · x · y′ · z + w · x · y · z + w · x′ · y′ · z′ + w · x′ · y · z′ (4.33)

Using the groupings on the Karnaugh map (the four corner cells form one group and the four center cells the other),

    F(w, x, y, z)
                    yz
              00    01    11    10
    wx  00    1                 1
        01          1     1
        11          1     1
        10    1                 1

we obtain a minimal sum of products,

F(w, x, y, z) = x′ · z′ + x · z

Not only have we greatly reduced the number of AND and OR gates, we see that the two variables w and y are not needed. By the way, you have probably encountered a circuit that implements this function. A light controlled by two switches typically does this.

As you probably expect by now, a Karnaugh map also works when a function is specified as a product of sums. The differences are:

1. maxterms are numbered 0 for uncomplemented variables and 1 for complemented, and
2. a 0 is placed in each cell of the Karnaugh map that corresponds to a maxterm.

To see how this works let us first compare the Karnaugh maps for two functions,

F1(x, y, z) = (x′ · y′ · z′)

F2(x, y, z) = (x + y + z)

F1 is a sum of products with only one minterm, and F2 is a product of sums with only one maxterm. Figure 4.11(a) shows how the minterm appears on a Karnaugh map, and Figure 4.11(b) shows the maxterm.

    (a) F1(x, y, z)
                  yz
            00    01    11    10
    x   0   1
        1

    (b) F2(x, y, z)
                  yz
            00    01    11    10
    x   0   0
        1

Figure 4.11: Comparison of one minterm (a) versus one maxterm (b) on a Karnaugh map.

Figure 4.12 shows how three-variable maxterms map onto a Karnaugh map. As with minterms, x is on the vertical axis, y and z on the horizontal. To use the Karnaugh map for maxterms, place a 0 in each cell corresponding to a maxterm.

    F(x, y, z)
                  yz
            00    01    11    10
    x   0   M0    M1    M3    M2
        1   M4    M5    M7    M6

Figure 4.12: Mapping of three-variable maxterms on a Karnaugh map.

A four-variable Karnaugh map of maxterms is shown in Figure 4.13. The w and x variables are on the vertical axis, y and z on the horizontal.

    F(w, x, y, z)
                    yz
              00    01    11    10
    wx  00    M0    M1    M3    M2
        01    M4    M5    M7    M6
        11    M12   M13   M15   M14
        10    M8    M9    M11   M10

Figure 4.13: Mapping of four-variable maxterms on a Karnaugh map.

Example 4-e

Find a minimal product of sums for the function of Equation 4.25. That function is

F(x, y, z) = (x + y + z) · (x + y + z′) · (x + y′ + z′) · (x′ + y + z) · (x′ + y′ + z′)

So this expression includes maxterms 0, 1, 3, 4, and 7. These appear in a Karnaugh map:

    F(x, y, z)
                  yz
            00    01    11    10
    x   0   0     0     0
        1   0           0

Next we encircle the largest adjacent blocks, where the number of cells in each block is a power of two. Notice that maxterm M0 appears in two groups.

From this Karnaugh map it is very easy to write the function as a minimal product of sums:

F(x, y, z) = (x + y) · (y + z) · (y′ + z′)

which is the same as we found in Equation 4.28.

There are situations where some minterms (or maxterms) are irrelevant in a function. This might occur, say, if certain input conditions are impossible in the design. As an example, assume that you have an application where the exclusive or (XOR) operation is required. The symbol for the operation and its truth table are shown in Figure 4.14.

    x  y | x ⊕ y
    -----+------
    0  0 |   0
    0  1 |   1
    1  0 |   1
    1  1 |   0

Figure 4.14: The XOR gate acting on two variables, x and y.

The minterms required to implement this operation are:

x ⊕ y = x · y′ + x′ · y

This is the simplest form of the XOR operation. It requires two AND gates, two NOT gates, and an OR gate for realization.

But let us say that we have the additional information that the two inputs, x and y, can never be 1 at the same time. Then we can draw a Karnaugh map with an "×" for the minterm that cannot exist as shown in Figure 4.15. The "×" represents a "don't care" cell: we don't care whether this cell is grouped with other cells or not.

    F(x, y)
               y
            0     1
    x   0         1
        1   1     ×

Figure 4.15: A "don't care" cell on a Karnaugh map. Since x and y cannot both be 1 at the same time, we don't care if the cell xy = 11 is included in our groupings or not.

Since the cell that represents the minterm x · y is a "don't care", we can include it in our minimization groupings, leading to the two groupings shown in Figure 4.16.

Figure 4.16: Karnaugh map for xor function if we know x = y = 1 cannot occur. (Same map as Figure 4.15, with the two groupings circled.)

We easily recognize this Karnaugh map as being realizable with a single OR gate, which saves the two AND gates and two NOT gates that the XOR realization requires.

4.4 Crash Course in Electronics

Although it is not necessary to be an electrical engineer in order to understand how logic gates work, some basic concepts will help. This section provides a very brief overview of the fundamental concepts of electronic circuits. We begin with two definitions.

Current is the movement of electrical charge. Electrical charge is measured in coulombs. A flow of one coulomb per second is defined as one ampere, often abbreviated as one amp. Current only flows in a closed path through an electrical circuit.

Voltage is a difference in electrical potential between two points in an electrical circuit. One volt is defined as the potential difference between two points on a conductor when one ampere of current flowing through the conductor dissipates one watt of power.

The electronic circuits that make up a computer are constructed from:

  • A power source that provides the electrical power.
  • Passive elements that control current flow and voltage levels.
  • Active elements that switch between various combinations of the power source, passive elements, and other active elements.

We will look at how each of these three categories of electronic components behaves.


4.4.1 Power Supplies and Batteries

The ele c trical power is supplied to our homes, schools, and businesses in the form o f alternating

current (AC). A plot of the mag nitude of the voltage versus time shows a sinusoidal wave shape.

Computer circuits use direct current (DC) po wer, which does not vary over time. A power su pply

is used to convert AC power to DC as shown in Figure 4.17. As you probably know, batteries

-

time

-

+

voltage

-

time

-

+

voltage

Power

Supply

c

c

c

c

AC DC

Figure 4.17: AC/DC power supply.

also pr ovide DC power.

Computer circuits use DC power. They distinguish between two different voltage levels to provide logical 0 and 1. For example, logical 0 may be represented by 0.0 volts and logical 1 by +2.5 volts. Or the reverse may be used: +2.5 volts as logical 0 and 0.0 volts as logical 1. The only requirement is that the hardware design be consistent. Fortunately, programmers do not need to be concerned about the actual voltages used.

Electrical engineers typically think of the AC characteristics of a circuit in terms of an ongoing sinusoidal voltage. Although DC power is used, computer circuits are constantly switching between the two voltage levels. Computer hardware engineers need to consider the time characteristics of circuit elements when the voltage is suddenly switched from one level to another. It is this transient behavior that will be described in the following sections.

4.4.2 Resistors, Capacitors, and Inductors

All electrical circuits have resistance, capacitance, and inductance.

  • Resistance dissipates power. The electric energy is transformed into heat.
  • Capacitance stores energy in an electric field. Voltage across a capacitance cannot change instantaneously.
  • Inductance stores energy in a magnetic field. Current through an inductance cannot change instantaneously.

All three of these electromagnetic properties are distributed throughout any electronic circuit. Collectively they are known as impedance. In computer circuits they tend to limit the speed at which the circuit can operate and to consume power. Analyzing their effects can be quite complicated and is beyond the scope of this book. Instead, to get a feel for the effects of each of these properties, we will consider the electronic devices that are used to add one of these properties to a specific location in a circuit; namely, resistors, capacitors, and inductors. Each of these circuit devices has a different relationship between the voltage difference across the device and the current flowing through it.

A resistor irreversibly transforms electrical energy into heat. It does not store energy. The relationship between voltage and current for a resistor is given by the equation

$$ v = i R \tag{4.34} $$

where $v$ is the voltage difference across the resistor at time $t$, $i$ is the current flowing through it at time $t$, and $R$ is the value of the resistor. Resistor values are specified in ohms. The circuit shown in Figure 4.18 shows two resistors connected in series through a switch to a battery. The battery supplies 2.5 volts. The Greek letter Ω is used to indicate ohms, and kΩ indicates $10^{3}$ ohms. Since current can only flow in a closed path, none flows until the switch is closed.

[Figure 4.18: Two resistors in series. A 2.5 v battery drives current $i$ through a 1.0 kΩ resistor (between points A and B) and a 1.5 kΩ resistor (between points B and C).]

Both resistors are in the same path, so when the switch is closed the same current flows through each of them. The resistors are said to be connected in series. The total resistance in the path is their sum:

$$ R = 1.0\ \text{k}\Omega + 1.5\ \text{k}\Omega = 2.5 \times 10^{3}\ \text{ohms} $$

The amount of current can be determined from the application of Equation 4.34. Solving for $i$,

$$ i = \frac{v}{R} = \frac{2.5\ \text{volts}}{2.5 \times 10^{3}\ \text{ohms}} = 1.0 \times 10^{-3}\ \text{amps} = 1.0\ \text{ma} $$

where "ma" means "milliamps."

We can now use Equation 4.34 to determine the voltage difference between points A and B.

$$ v_{AB} = i R = 1.0 \times 10^{-3}\ \text{amps} \times 1.0 \times 10^{3}\ \text{ohms} = 1.0\ \text{volts} $$

Similarly, the voltage difference between points B and C is

$$ v_{BC} = i R = 1.0 \times 10^{-3}\ \text{amps} \times 1.5 \times 10^{3}\ \text{ohms} = 1.5\ \text{volts} $$

Figure 4.19 shows the same two resistors connected in parallel.

[Figure 4.19: Two resistors in parallel. The 2.5 v battery supplies a total current $i_t$; current $i_1$ flows through the 1.0 kΩ resistor and current $i_2$ flows through the 1.5 kΩ resistor. Points A and C are labeled.]

In this case, the voltage across the two resistors is the same: 2.5 volts when the switch is closed. The current in each one depends upon its resistance. Thus,

$$ i_1 = \frac{v}{R_1} = \frac{2.5\ \text{volts}}{1.0 \times 10^{3}\ \text{ohms}} = 2.5 \times 10^{-3}\ \text{amps} = 2.5\ \text{ma} $$

and

$$ i_2 = \frac{v}{R_2} = \frac{2.5\ \text{volts}}{1.5 \times 10^{3}\ \text{ohms}} = 1.67 \times 10^{-3}\ \text{amps} = 1.67\ \text{ma} $$

The total current, $i_t$, supplied by the battery when the switch is closed is divided at point A to supply both the resistors. It must equal the sum of the two currents through the resistors,

$$ i_t = i_1 + i_2 = 2.5\ \text{ma} + 1.67\ \text{ma} = 4.17\ \text{ma} $$
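The series and parallel calculations can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `series` and `parallel` are our own, and the component values are the ones used in Figures 4.18 and 4.19.

```python
# Ohm's law (Equation 4.34) applied to the two-resistor circuits.
# All names here are illustrative; they do not come from the text.
V_BATTERY = 2.5   # volts
R1 = 1.0e3        # ohms (1.0 kilohm)
R2 = 1.5e3        # ohms (1.5 kilohm)


def series(v, resistors):
    """Return (current, list of voltage drops) for resistors in series."""
    total_resistance = sum(resistors)
    current = v / total_resistance
    return current, [current * r for r in resistors]


def parallel(v, resistors):
    """Return (total current, list of branch currents) for resistors in parallel."""
    branch_currents = [v / r for r in resistors]
    return sum(branch_currents), branch_currents


i_series, drops = series(V_BATTERY, [R1, R2])
i_total, branches = parallel(V_BATTERY, [R1, R2])

print(f"series:   i = {i_series * 1e3:.2f} ma, drops = {[round(d, 2) for d in drops]} volts")
print(f"parallel: i_t = {i_total * 1e3:.2f} ma, "
      f"branches = {[round(b * 1e3, 2) for b in branches]} ma")
```

In the series case the two voltage drops add up to the battery voltage; in the parallel case the two branch currents add up to the total current.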

A capacitor stores energy in the form of an electric field. It reacts slowly to voltage changes, requiring time for the electric field to build. The voltage across a capacitor changes with time according to the equation

$$ v = \frac{1}{C} \int_{0}^{t} i \, dt \tag{4.35} $$

where $C$ is the value of the capacitor in farads.

Figure 4.20 shows a 1.0 microfarad capacitor being charged through a 1.0 kilohm resistor. This circuit is a rough approximation of the output of one transistor connected to the input of another. (See Section 4.4.3.) The output of the first transistor has resistance, and the input to the second transistor has capacitance. The switching behavior of the second transistor depends upon the voltage across the (equivalent) capacitor, $v_{BC}$.

[Figure 4.20: Capacitor in series with a resistor; $v_{AB}$ is the voltage across the resistor and $v_{BC}$ is the voltage across the capacitor. A 2.5 v battery drives current $i$ through a 1.0 kΩ resistor (A to B) and a 1.0 µf capacitor (B to C).]

Assuming the voltage across the capacitor, $v_{BC}$, is 0.0 volts when the switch is first closed, current flows through the resistor and capacitor. The voltage across the resistor plus the voltage across the capacitor must be equal to the voltage available from the battery. That is,

$$ 2.5 = i R + v_{BC} \tag{4.36} $$

Because $v_{BC}$ is 0.0 volts when the switch is first closed, the full voltage of the battery, 2.5 volts, will appear across the resistor. Thus, the initial current flow in the circuit will be

$$ i_{\text{initial}} = \frac{2.5\ \text{volts}}{1.0\ \text{k}\Omega} = 2.5\ \text{ma} $$

As the voltage across the capacitor increases, according to Equation 4.35, the voltage across the resistor, $v_{AB}$, decreases. This results in a buildup of voltage across the capacitor at an exponentially decreasing rate. When it finally equals the voltage of the battery, the voltage across the resistor is 0.0 volts and current flow in the circuit becomes zero. The rate of the exponential decrease is given by the product $RC$, called the time constant.

Using the values of $R$ and $C$ in Figure 4.20 we get

$$ R C = 1.0 \times 10^{3}\ \text{ohms} \times 1.0 \times 10^{-6}\ \text{farads} = 1.0 \times 10^{-3}\ \text{seconds} = 1.0\ \text{msec.} $$

Thus, assuming the capacitor in Figure 4.20 has 0.0 volts across it when the switch is closed, the voltage that develops over time is given by

$$ v_{BC} = 2.5 \left(1 - e^{-t/10^{-3}}\right) \tag{4.37} $$

This is shown in Figure 4.21.

[Figure 4.21: Capacitor charging over time in the circuit in Figure 4.20. The left-hand y-axis shows voltage across the capacitor, $v_{BC}$, the right-hand voltage across the resistor, $v_{AB}$; both run from 0 to 2.5 volts. The x-axis shows time from 0 to 10 msec.]

At the time $t = 1.0$ millisecond (one time constant), the voltage across the capacitor is

$$ v_{BC} = 2.5 \left(1 - e^{-10^{-3}/10^{-3}}\right) = 2.5 \left(1 - e^{-1}\right) = 2.5 \times 0.63 = 1.58\ \text{volts} $$

After 6 time constants have passed, the voltage across the capacitor has reached

$$ v_{BC} = 2.5 \left(1 - e^{-6 \times 10^{-3}/10^{-3}}\right) = 2.5 \left(1 - e^{-6}\right) = 2.5 \times 0.9975 = 2.49\ \text{volts} $$

At this time the voltage across the resistor is essentially 0.0 volts and current flow is very low.
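The charging curve of Equation 4.37 is easy to tabulate. The following Python sketch, whose function names are our own choices for illustration, evaluates the capacitor and resistor voltages at a few multiples of the time constant, using the component values of Figure 4.20.

```python
import math

# Values from Figure 4.20; the names are illustrative only.
V_BATTERY = 2.5     # volts
R = 1.0e3           # ohms
C = 1.0e-6          # farads
TAU = R * C         # time constant RC, in seconds


def v_capacitor(t):
    """Voltage across the capacitor (v_BC) t seconds after the switch closes."""
    return V_BATTERY * (1.0 - math.exp(-t / TAU))


def v_resistor(t):
    """Voltage across the resistor (v_AB); from Equation 4.36 the two sum to 2.5."""
    return V_BATTERY - v_capacitor(t)


for n in (0, 1, 2, 4, 6):
    t = n * TAU
    print(f"t = {n} time constants: "
          f"v_BC = {v_capacitor(t):.2f} volts, v_AB = {v_resistor(t):.2f} volts")
```

The rows for one and six time constants correspond to the two worked values in the text.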

Inductors are not used in logic circuits. In the typical PC, they are found as part of the CPU power supply circuitry. If you have access to the inside of a PC, you can probably see a small (1 cm in diameter) donut-shaped device with wire wrapped around it on the motherboard near the CPU. This is an inductor used to smooth the power supplied to the CPU.

An inductor stores energy in the form of a magnetic field. It reacts slowly to current changes, requiring time for the magnetic field to build. The relationship between voltage at time $t$ across an inductor and current flow through it is given by the equation

$$ v = L \frac{di}{dt} \tag{4.38} $$

where $L$ is the value of the inductor in henrys.

Figure 4.22 shows an inductor connected in series with a resistor.

[Figure 4.22: Inductor in series with a resistor. A 2.5 v battery drives current $i$ through a 1.0 µh inductor (A to B) and a 1.0 kΩ resistor (B to C).]

When the switch is open no current flows through this circuit. Upon closing the switch, the inductor initially impedes the flow of current, taking time for a magnetic field to be built up in the inductor.

At this initial point no current is flowing through the resistor, so the voltage across it, $v_{BC}$, is 0.0 volts. The full voltage of the battery, 2.5 volts, appears across the inductor, $v_{AB}$. As current begins to flow through the inductor the voltage across the resistor, $v_{BC}$, grows. This results in an exponentially decreasing voltage across the inductor. When it finally reaches 0.0 volts, the voltage across the resistor is 2.5 volts and current flow in the circuit is 2.5 ma.

The rate of the exponential voltage decrease is given by the time constant $L/R$. Using the values of $R$ and $L$ in Figure 4.22 we get

$$ \frac{L}{R} = \frac{1.0 \times 10^{-6}\ \text{henrys}}{1.0 \times 10^{3}\ \text{ohms}} = 1.0 \times 10^{-9}\ \text{seconds} = 1.0\ \text{nanoseconds} $$

When the switch is closed, the voltage that develops across the inductor over time is given by

$$ v_{AB} = 2.5 \times e^{-t/10^{-9}} \tag{4.39} $$

This is shown in Figure 4.23.

[Figure 4.23: Inductor building a magnetic field over time in the circuit in Figure 4.22. The left-hand y-axis shows voltage across the inductor, $v_{AB}$, the right-hand voltage across the resistor, $v_{BC}$; both run from 0 to 2.5 volts. The x-axis shows time from 0 to 10 nanosec.]

Note that after about 6 nanoseconds (6 time constants) the voltage across the inductor is essentially equal to 0.0 volts. At this time the full voltage of the battery is across the resistor and a steady current of 2.5 ma flows.
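Equation 4.39 can be explored numerically in the same way as the capacitor case. The sketch below is an illustration under the values of Figure 4.22; the function names are hypothetical and chosen only for readability.

```python
import math

# Values from Figure 4.22; names are illustrative only.
V_BATTERY = 2.5      # volts
L_HENRYS = 1.0e-6    # henrys
R_OHMS = 1.0e3       # ohms
TAU = L_HENRYS / R_OHMS   # time constant L/R, in seconds


def v_inductor(t):
    """Voltage across the inductor (v_AB) t seconds after the switch closes."""
    return V_BATTERY * math.exp(-t / TAU)


def v_resistor(t):
    """Voltage across the resistor (v_BC); the two voltages sum to the battery voltage."""
    return V_BATTERY - v_inductor(t)


def current(t):
    """Current through the series circuit, from Equation 4.34 applied to the resistor."""
    return v_resistor(t) / R_OHMS


for n in (0, 1, 2, 4, 6):
    t = n * TAU
    print(f"t = {n} nanosec.: v_AB = {v_inductor(t):.3f} volts, "
          f"v_BC = {v_resistor(t):.3f} volts, i = {current(t) * 1e3:.3f} ma")
```

After six time constants the inductor voltage has decayed to well under one percent of the battery voltage, and the current is close to its steady value.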

The circuit in Figure 4.22 illustrates how inductors are used in a CPU power supply. The battery in this circuit represents the computer power supply, and the resistor represents the load provided by the CPU. The voltage produced by a power supply includes noise, which consists of small, high-frequency fluctuations added to the DC level. As can be seen in Figure 4.23, the voltage supplied to the CPU, $v_{BC}$, changes little over short periods of time.


4.4.3 CMOS Transistors

The general idea is to use two different voltages to represent 1 and 0. For example, we might use a high voltage, say +2.5 volts, to represent 1 and a low voltage, say 0.0 volts, to represent 0. Logic circuits are constructed from components that can switch between these high and low voltages.

The basic switching device in today's computer logic circuits is the metal-oxide-semiconductor field-effect transistor (MOSFET). Figure 4.24 shows a NOT gate implemented with a single MOSFET. The MOSFET in this circuit is an n-type. You can think of it as a three-terminal device. The input terminal is called the gate. The terminal connected to the output is the drain, and the terminal connected to $V_{SS}$ is the source. In this circuit the drain is connected to the positive (high) voltage of a DC power supply, $V_{DD}$, through a resistor, $R$. The source is connected to the zero voltage, $V_{SS}$.

[Figure 4.24: A single n-type MOSFET transistor switch, with the input at the gate, resistor $R$ between $V_{DD}$ and the output, and the source at $V_{SS}$.]

When the input voltage to the transistor is high, the gate acquires an electrical charge, thus turning the transistor on. The path between the drain and the source of the transistor essentially becomes a closed switch. This causes the output to be at the low voltage. The transistor acts as a pull down device.

The resulting circuit is equivalent to Figure 4.25(a). In this circuit current flows from $V_{DD}$ to $V_{SS}$ through the resistor $R$. The output is connected to $V_{SS}$, that is, 0.0 volts. The current flow through the resistor and transistor is

$$ i = \frac{V_{DD} - V_{SS}}{R} \tag{4.40} $$

[Figure 4.25: Single transistor switch equivalent circuit; (a) switch closed (input = high); (b) switch open (input = low).]

The problem with this current flow is that it uses power just to keep the output low.

If the input is switched to the low voltage, the transistor turns off, resulting in the equivalent circuit shown in Figure 4.25(b). The output is typically connected to another transistor's input (its gate), which draws essentially no current, except during the time it is switching from one state to the other. In the steady state condition the output connection does not draw current. Since no current flows through the resistor, $R$, there is no voltage change across it. So the output connection will be at $V_{DD}$ volts, the high voltage. The resistor is acting as the pull up device.

These two states can be expressed in the truth table

    input   output
    low     high
    high    low

which is the logic required of a NOT gate.

There is another problem with this hardware design. Although the gate of a MOSFET transistor draws essentially no current in order to remain in either an on or off state, current is required to cause it to change state. The gate of the transistor that is connected to the output must be charged. The gate behaves like a capacitor during the switching time. This charging requires a flow of current over a period of time. The problem here is that the resistor, $R$, reduces the amount of current that can flow, thus taking longer to charge the transistor gate. (See Section 4.4.2.)

From Equation 4.40, the larger the resistor, the lower the current flow. So we have a dilemma: the resistor should be large to reduce power consumption, but it should be small to increase switching speed.
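The trade-off can be made concrete with a small calculation. The Python sketch below is a rough illustration only: the gate capacitance `C_GATE` is a hypothetical value picked for the example, not a figure from the text, and the names are our own. It applies Equation 4.40 for the current that flows while the output is held low, and uses the $RC$ time constant from Section 4.4.2 as a measure of how quickly the next gate can be charged.

```python
# Pull-up resistor trade-off for the single-transistor NOT gate.
V_DD = 2.5          # volts, high supply voltage used in the text's examples
V_SS = 0.0          # volts
C_GATE = 1.0e-12    # farads; HYPOTHETICAL gate capacitance, for illustration only


def static_current(r):
    """Current through R while the output is held low (Equation 4.40)."""
    return (V_DD - V_SS) / r


def static_power(r):
    """Power dissipated while the output is held low (volts times amps)."""
    return (V_DD - V_SS) * static_current(r)


def time_constant(r):
    """RC time constant for charging the next gate through R."""
    return r * C_GATE


for r in (1.0e3, 1.0e4, 1.0e5):
    print(f"R = {r:8.0f} ohms: i = {static_current(r) * 1e3:.3f} ma, "
          f"power = {static_power(r) * 1e3:.4f} milliwatts, "
          f"RC = {time_constant(r) * 1e9:.1f} nanoseconds")
```

Each tenfold increase in $R$ cuts the static power by a factor of ten but lengthens the charging time constant by the same factor, which is the dilemma described above.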

This problem is solved with Complementary Metal Oxide Semiconductor (CMOS) technology. This technology packages a p-type MOSFET together with each n-type. The p-type works in the opposite way: a high value on the gate turns it off, and a low value turns it on. The circuit in Figure 4.26 shows a NOT gate using a p-type MOSFET as the pull up device.

    input   output
    0       1
    1       0

[Figure 4.26: CMOS inverter (NOT) circuit. The input drives the gates of both transistors, which are connected between $V_{DD}$ and $V_{SS}$ with the output taken between them.]

Figure 4.27(a) shows the equivalent circuit with a high voltage input. The pull up transistor (a p-type) is off, and the pull down transistor (an n-type) is on. This results in the output being pulled down to the low voltage. In Figure 4.27(b) a low voltage input turns the pull up transistor on and the pull down transistor off. The result is that the output is pulled up to the high voltage.

[Figure 4.27: CMOS inverter equivalent circuit; (a) input = high: pull up open and pull down closed; (b) input = low: pull up closed and pull down open.]

Figure 4.28 shows an AND gate implemented with CMOS transistors. (See Exercise 4-12.) Notice that the signal at point A is NOT(x AND y). The circuit from point A to the output is a NOT gate. Producing the signal at point A therefore requires two fewer transistors than the full AND operation. We will examine the implications of this result in Section 4.5.

    x   y   A   output
    0   0   1   0
    0   1   1   0
    1   0   1   0
    1   1   0   1

[Figure 4.28: CMOS AND circuit, with inputs x and y, intermediate point A, and the output.]
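The behavior of these CMOS circuits can be mimicked with a switch-level model in Python. This is a sketch under an assumption: Figure 4.28 is not reproduced here, so the model uses the usual CMOS arrangement for the first stage (two p-type transistors in parallel as the pull up, two n-type transistors in series as the pull down). The function names are ours.

```python
def nmos_on(gate):
    """An n-type MOSFET conducts when its gate is high (1)."""
    return gate == 1


def pmos_on(gate):
    """A p-type MOSFET conducts when its gate is low (0)."""
    return gate == 0


def cmos_not(a):
    """CMOS inverter: one p-type pull up and one n-type pull down."""
    pull_up = pmos_on(a)
    pull_down = nmos_on(a)
    assert pull_up != pull_down   # exactly one path conducts in steady state
    return 1 if pull_up else 0


def cmos_nand(x, y):
    """ASSUMED topology: p-types in parallel to V_DD, n-types in series to V_SS."""
    pull_up = pmos_on(x) or pmos_on(y)
    pull_down = nmos_on(x) and nmos_on(y)
    assert pull_up != pull_down
    return 1 if pull_up else 0


def cmos_and(x, y):
    """Return (signal at point A, output): a NAND stage followed by an inverter."""
    a = cmos_nand(x, y)
    return a, cmos_not(a)


for x in (0, 1):
    for y in (0, 1):
        a, out = cmos_and(x, y)
        print(x, y, a, out)
```

In this model the first stage uses four transistors and the inverter two, which is consistent with the statement that the signal at point A costs two fewer transistors than the complete AND.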

4.5 NAND and NOR Gates

The discussion of transistor circuits in Section 4.4.3 illustrates a common characteristic. Because of the inherent way that transistors work, most circuits invert the signal. That is, a high voltage at the input produces a low voltage at the output and vice versa. As a result, an AND gate typically requires a NOT gate at the output in order to achieve a true AND operation.

We saw in that discussion that it takes fewer transistors to produce AND NOT than a pure AND. The combination is so common, it has been given the name NAND gate. And, of course, the same is true for OR gates, giving us a NOR gate.

NAND: a binary operator; the result is 0 if and only if both operands are 1; otherwise the result is 1. We will use $(x \cdot y)'$ to designate the NAND operation. It is also common to use the '↑' symbol or simply "NAND". The hardware symbol for the NAND gate is shown in Figure 4.29. The inputs are $x$ and $y$. The resulting output, $(x \cdot y)'$, is shown in the truth table in this figure.

    x   y   (x · y)'
    0   0   1
    0   1   1
    1   0   1
    1   1   0

Figure 4.29: The NAND gate acting on two variables, x and y.

NOR: a binary operator; the result is 0 if at least one of the two operands is 1; otherwise the result is 1. We will use $(x + y)'$ to designate the NOR operation. It is also common to use the '↓' symbol or simply "NOR". The hardware symbol for the NOR gate is shown in Figure 4.30. The inputs are $x$ and $y$. The resulting output, $(x + y)'$, is shown in the truth table in this figure.

    x   y   (x + y)'
    0   0   1
    0   1   0
    1   0   0
    1   1   0

Figure 4.30: The NOR gate acting on two variables, x and y.

The small circle at the output of the NAND and NOR gates signifies "NOT", just as with the NOT gate (see Figure 4.3). Although we have explicitly shown NOT gates when inputs to gates are complemented, it is common to simply use these small circles at the input. For example, Figure 4.31 shows an OR gate with both inputs complemented. As the truth table in this figure shows, this is an alternate way to draw a NAND gate. See Exercise 4-14 for an alternate way to draw a NOR gate.

    x   y   (x' + y')
    0   0   1
    0   1   1
    1   0   1
    1   1   0

Figure 4.31: An alternate way to draw a NAND gate.

One of the interesting properties of NAND gates is that it is possible to build AND, OR, and NOT gates from them. That is, the NAND gate is sufficient to implement any Boolean function. In this sense, it can be thought of as a universal gate.

First, we construct a NOT gate. To do this, simply connect the signal to both inputs of a NAND gate, as shown in Figure 4.32.

[Figure 4.32: A NOT gate built from a NAND gate. The input $x$ is connected to both inputs, giving the output $(x \cdot x)' = x'$.]

Next, we can use deMorgan's Law to derive an AND gate.

$$ (x \cdot y)' = x' + y' $$

$$ (x' + y')' = (x')' \cdot (y')' = x \cdot y $$

So we need two NAND gates connected as shown in Figure 4.33.

[Figure 4.33: An AND gate built from two NAND gates. The first NAND gate produces $(x \cdot y)'$ from inputs $x$ and $y$; the second produces $x \cdot y$.]

Again, using deMorgan's Law

$$ (x' \cdot y')' = (x')' + (y')' = x + y $$

we use three NAND gates connected as shown in Figure 4.34 to create an OR gate.

[Figure 4.34: An OR gate built from three NAND gates, with inputs $x$ and $y$ and output $x + y$.]
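The three constructions can be checked exhaustively in Python. In this sketch the helper names (`nand`, `not_gate`, `and_gate`, `or_gate`) are our own; each function uses only `nand`, wired the way Figures 4.32 through 4.34 describe.

```python
from itertools import product


def nand(x, y):
    """Two-input NAND on the bit values 0 and 1."""
    return 1 - (x & y)


def not_gate(x):
    """One NAND gate with the signal connected to both inputs."""
    return nand(x, x)


def and_gate(x, y):
    """Two NAND gates: the second inverts the output of the first."""
    t = nand(x, y)
    return nand(t, t)


def or_gate(x, y):
    """Three NAND gates: complement each input, then NAND the results."""
    return nand(nand(x, x), nand(y, y))


for x, y in product((0, 1), repeat=2):
    assert not_gate(x) == 1 - x
    assert and_gate(x, y) == (x & y)
    assert or_gate(x, y) == (x | y)
print("NOT, AND, and OR built from NAND match their truth tables")
```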

It may seem like we are creating more complexity in order to build circuits from NAND gates. But consider the function

$$ F(w, x, y, z) = (w \cdot x) + (y \cdot z) \tag{4.41} $$

Without knowing how logic gates are constructed, it would be reasonable to implement this function with the circuit shown in Figure 4.35.

[Figure 4.35: The function in Equation 4.41 using two AND gates and one OR gate.]

Using the involution property (Equation 4.15) it is clear that the circuit in Figure 4.36 is equivalent to the one in Figure 4.35.

[Figure 4.36: The function in Equation 4.41 using two AND gates, one OR gate and four NOT gates.]

Next, comparing the AND-gate/NOT-gate combination with Figure 4.29, we see that each is simply a NAND gate. Similarly, comparing the NOT-gates/OR-gate combination with Figure 4.31, it is also a NAND gate. Thus we can also implement the function in Equation 4.41 with three NAND gates as shown in Figure 4.37.

[Figure 4.37: The function in Equation 4.41 using only three NAND gates.]

From simply viewing the circuit diagrams, it may seem that we have not gained anything in this circuit transformation. But we saw in Section 4.4.3 that a NAND gate requires fewer transistors than an AND gate or OR gate due to the signal inversion properties of transistors. Thus, the NAND gate implementation is a less expensive and faster implementation.

The conversion from an AND/OR/NOT gate design to one that uses only NAND gates is straightforward:

1. Express the function as a minimal SoP.
2. Convert the products (AND terms) and the final sum (OR) to NANDs.
3. Add a NAND gate for any product with only a single literal.
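As a quick check of the conversion, the following Python sketch compares the AND/OR form of Equation 4.41 with the three-NAND form over all sixteen input combinations. The function names are illustrative, not taken from the text.

```python
from itertools import product


def nand(x, y):
    """Two-input NAND on the bit values 0 and 1."""
    return 1 - (x & y)


def f_and_or(w, x, y, z):
    """Equation 4.41 implemented with two ANDs and one OR."""
    return (w & x) | (y & z)


def f_nand_only(w, x, y, z):
    """The same function with three NAND gates: two for the products, one for the sum."""
    return nand(nand(w, x), nand(y, z))


mismatches = [bits for bits in product((0, 1), repeat=4)
              if f_and_or(*bits) != f_nand_only(*bits)]
print("mismatching inputs:", mismatches)
```

An empty list of mismatches confirms that the two circuits compute the same function.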


As with software, hardware design is an iterative process. Since there usually is not a unique solution, you often need to develop several designs and analyze each one within the context of the available hardware. The example above shows that two solutions that look the same on paper may be dissimilar in hardware.

In Chapter 6 we will see how these concepts can be used to construct the heart of a computer: the CPU.

4.6 Exercises

4-1 (§4.1) Prove the identity property expressed by Equations 4.3 and 4.4.

4-2 (§4.1) Prove the commutative property expressed by Equations 4.5 and 4.6.

4-3 (§4.1) Prove the null property expressed by Equations 4.7 and 4.8.

4-4 (§4.1) Prove the complement property expressed by Equations 4.9 and 4.10.

4-5 (§4.1) Prove the idempotent property expressed by Equations 4.11 and 4.12.

4-6 (§4.1) Prove the distributive property expressed by Equations 4.13 and 4.14.

4-7 (§4.1) Prove the involution property expressed by Equation 4.15.

4-8 (§4.2) Show that Equations 4.18 and 4.19 represent the same function. This shows that the sum of minterms and product of maxterms are complementary.

4-9 (§4.3.2) Show where each minterm is located with this Karnaugh map axis labeling using the notation of Figure 4.9.

    F(x, y, z)
                  xy
           00   01   11   10
    z  0
       1

4-10 (§4.3.2) Show where each minterm is located with this Karnaugh map axis labeling using the notation of Figure 4.9.

    F(x, y, z)
                  xz
           00   01   11   10
    y  0
       1

4-11 (§4.3.2) Design a logic function that detects the prime single-digit numbers. Assume that the numbers are coded in 4-bit BCD (see Section 3.6.1, page 50). The function is 1 for each prime number.

4-12 (§4.4.3) Using drawings similar to those in Figure 4.27, verify that the logic circuit in Figure 4.28 is an AND gate.

4-13 (§4.5) Show that the gate in Figure 4.31 is a NAND gate.

4-14 (§4.5) Give an alternate way to draw a NOR gate, similar to the alternate NAND gate in Figure 4.31.

4-15 (§4.5) Design a circuit using NAND gates that detects the "below" condition for two 2-bit values. That is, given two 2-bit variables x and y, F(x, y) = 1 when the unsigned integer value of x is less than the unsigned integer value of y.

a) Give a truth table for the output of the circuit, F.

b) Find a minimal sum of products for F.

c) Implement F using NAND gates.

Chapter 5

Logic Circuits

In this chapter we examine how the concepts in Chapter 4 can be used to build some of the logic circuits that make up a CPU, Memory, and other devices. We will not describe an entire unit, only a few small parts. The goal is to provide an introductory overview of the concepts. There are many excellent books that cover the details. For example, see [20], [23], or [24] for circuit design details and [28], [31], [34] for CPU architecture design concepts.

Logic circuits can be classified as either

  • Combinational Logic Circuits: the output(s) depend only on the input(s) at any specific time and not on any previous input(s).
  • Sequential Logic Circuits: the output(s) depend both on previous and current input(s).

An example of the two concepts is a television remote control. You can enter a number and the output (a particular television channel) depends only on the number entered. It does not matter what channels have been viewed previously. So the relationship between the input (a number) and the output is combinational.

The remote control also has inputs for stepping either up or down one channel. When using this input method, the channel selected depends on what channel has been previously selected and the sequence of up/down button pushes. The channel up/down buttons illustrate a sequential input/output relationship.

Although a more formal definition will be given in Section 5.3, this television example also illustrates the concept of state. My television remote control has a button I can push that will show the current channel setting. If I make a note of the beginning channel setting, and keep track of the sequence of channel up and down button pushes, I will know the ending channel setting. It does not matter how I originally got to the beginning channel setting. The channel setting is the state of the channel selection mechanism because it tells me everything I need to know in order to select a new channel by using a sequence of channel up and down button pushes.

5.1 Combinational Logic Circuits

Combinational logic circuits have no memory. The output at any given time depends completely upon the circuit configuration and the input(s).

5.1.1 Adder Circuits

One of the most fundamental operations the ALU must do is to add two bits. The possible results are:

    0 + 0 = 00
    0 + 1 = 01
    1 + 0 = 01
    1 + 1 = 10

As you have seen in Section 3.1 (page 28), the 1 in the last sum (1 + 1 = 10) is carried to the next higher-order bit when performing multi-bit addition. This implies that addition of the next higher-order bits must be able to add three bits:

    0 + 0 + 0 = 00
    0 + 0 + 1 = 01
    0 + 1 + 0 = 01
    0 + 1 + 1 = 10
    1 + 0 + 0 = 01
    1 + 0 + 1 = 10
    1 + 1 + 0 = 10
    1 + 1 + 1 = 11

We will construct two adders:

half adder: A combinational logic device that has two 1-bit inputs, $x_i$ and $y_i$, and two outputs that are related as shown in the truth table (where $x_i$ is the $i^{\text{th}}$ bit of the multiple bit value, $x$):

    x_i   y_i   Carry_{i+1}   Sum_i
    0     0     0             0
    0     1     0             1
    1     0     0             1
    1     1     1             0

full adder: A combinational logic device that has three 1-bit inputs, $\text{Carry}_i$, $x_i$, and $y_i$, and two outputs that are related:

    Carry_i   x_i   y_i   Carry_{i+1}   Sum_i
    0         0     0     0             0
    0         0     1     0             1
    0         1     0     0             1
    0         1     1     1             0
    1         0     0     0             1
    1         0     1     1             0
    1         1     0     1             0
    1         1     1     1             1

$\text{Carry}_{i+1}$ is the carry from adding the next-lower significant bits, $x_i$, $y_i$, and $\text{Carry}_i$.

The terms half adder and full adder come from the fact that a full adder can be constructed from two half adders, with the addition of a carry input.

We begin with a half adder. Looking at the sum in the definition of half adder, it is easy to see that this is simply the XOR of the two inputs. The carry is the AND of the two inputs. This leads to the circuit in Figure 5.1.

[Figure 5.1: A half adder circuit, with inputs $x_i$ and $y_i$ and outputs $\text{Sum}_i$ and $\text{Carry}_{i+1}$.]

The full adder is not as obvious. First, let us look at the Karnaugh map for the sum:

    Sum_i
                         x_i y_i
                    00   01   11   10
    Carry_i   0          1         1
              1     1         1

The diagonal pattern suggests that the XOR operation can be used in our solution, and the half adder used an XOR operation. But the pattern is difficult to see, because we have a map that shows minterms and there are no obvious groupings.

So let us try an algebraic approach. We can write the function as a sum of product terms from the Karnaugh map.

$$ \text{Sum}_i(\text{Carry}_i, x_i, y_i) = \text{Carry}_i' \cdot x_i' \cdot y_i + \text{Carry}_i' \cdot x_i \cdot y_i' + \text{Carry}_i \cdot x_i' \cdot y_i' + \text{Carry}_i \cdot x_i \cdot y_i \tag{5.1} $$

Using the distribution rule, we can rearrange:

$$ \begin{aligned} \text{Sum}_i(\text{Carry}_i, x_i, y_i) &= \text{Carry}_i' \cdot (x_i' \cdot y_i + x_i \cdot y_i') + \text{Carry}_i \cdot (x_i' \cdot y_i' + x_i \cdot y_i) \\ &= \text{Carry}_i' \cdot (x_i \oplus y_i) + \text{Carry}_i \cdot (x_i' \cdot y_i' + x_i \cdot y_i) \end{aligned} \tag{5.2} $$

Let us manipulate the last product term in Equation 5.2.

$$ \begin{aligned} x_i' \cdot y_i' + x_i \cdot y_i &= x_i \cdot x_i' + x_i \cdot y_i + x_i' \cdot y_i' + y_i \cdot y_i' \\ &= x_i \cdot (x_i' + y_i) + y_i' \cdot (x_i' + y_i) \\ &= (x_i + y_i') \cdot (x_i' + y_i) \\ &= (x_i \oplus y_i)' \end{aligned} $$

So the right side of Equation 5.2 is in the form

$$ a' \cdot b + a \cdot b' $$

where

$$ a = \text{Carry}_i, \qquad b = x_i \oplus y_i $$

Thus, we conclude:

$$ \text{Sum}_i(\text{Carry}_i, x_i, y_i) = \text{Carry}_i \oplus (x_i \oplus y_i) \tag{5.3} $$

Now for the carry:

    Carry_{i+1}
                         x_i y_i
                    00   01   11   10
    Carry_i   0               1
              1          1    1    1

Let us first group the two minterms in the $x_i y_i = 11$ column (shown in brackets):

    Carry_{i+1}
                         x_i y_i
                    00   01   11   10
    Carry_i   0              [1]
              1          1   [1]   1

You should be able to see two other possible groupings on this Karnaugh map and may wonder why they are not circled here. The two ungrouped minterms, $\text{Carry}_i \cdot x_i' \cdot y_i$ and $\text{Carry}_i \cdot x_i \cdot y_i'$, form a pattern that suggests an exclusive or operation.

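This grouping yields a three-term function that defines when $\text{Carry}_{i+1} = 1$:

$$ \begin{aligned} \text{Carry}_{i+1} &= x_i \cdot y_i + \text{Carry}_i \cdot x_i' \cdot y_i + \text{Carry}_i \cdot x_i \cdot y_i' \\ &= x_i \cdot y_i + \text{Carry}_i \cdot (x_i' \cdot y_i + x_i \cdot y_i') \\ &= x_i \cdot y_i + \text{Carry}_i \cdot (x_i \oplus y_i) \end{aligned} \tag{5.4} $$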

Notice that the first product term in Equation 5.4, $x_i \cdot y_i$, is generated by the Carry portion of a half-adder, and that the exclusive or portion, $x_i \oplus y_i$, of the second product term is generated by the Sum portion. A logic gate implementation of a full adder is shown in Figure 5.2. You can see that it is implemented using two half-adders and an OR gate.

[Figure 5.2: A full adder circuit, with inputs $x_i$, $y_i$, and $\text{Carry}_i$ and outputs $\text{Sum}_i$ and $\text{Carry}_{i+1}$.]
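Equations 5.3 and 5.4 can be verified against ordinary integer addition. The Python sketch below models a full adder as two half adders and an OR gate; the function names and the order of the returned pair are our own conventions.

```python
from itertools import product


def half_adder(x, y):
    """Return (carry, sum): the carry is x AND y, the sum is x XOR y."""
    return x & y, x ^ y


def full_adder(carry_in, x, y):
    """Two half adders plus an OR gate, following Equations 5.3 and 5.4."""
    carry_xy, partial = half_adder(x, y)              # x·y and x XOR y
    carry_cp, total = half_adder(carry_in, partial)   # Carry·(x XOR y) and the final sum
    return carry_xy | carry_cp, total


for carry_in, x, y in product((0, 1), repeat=3):
    carry_out, total = full_adder(carry_in, x, y)
    # The two output bits, read as a 2-bit number, must equal the arithmetic sum.
    assert 2 * carry_out + total == carry_in + x + y
    print(carry_in, x, y, carry_out, total)
```

The printed rows reproduce the full adder truth table.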

5.1.2 Ripple-Carry Addition/Subtraction Circuits

An n-bit adder can be implemented with n full adders. Figure 5.3 shows a 4-bit adder.

[Figure 5.3: Four-bit adder. Four full adders are chained from bit 0 on the right to bit 3 on the left. The full adder for bit $i$ receives $x_i$, $y_i$, and the carry $c_i$ (with 0 as the carry into bit 0), and produces the sum bit $s_i$ and the carry $c_{i+1}$. The result is $s = x + y$, with $CF = c_4$ and $OF = c_3 \oplus c_4$.]

Addition begins with the full adder on the right receiving the two lowest-order bits, $x_0$ and $y_0$. Since this is the lowest-order bit there is no carry and $c_0 = 0$. The bit sum is $s_0$, and the carry from this addition, $c_1$, is connected to the carry input of the next full adder to the left, where it is added to $x_1$ and $y_1$.

So the $i^{\text{th}}$ full adder adds the two $i^{\text{th}}$ bits of the operands, plus the carry (which is either 0 or 1) from the $(i-1)^{\text{th}}$ full adder. Thus, each full adder handles one bit (often referred to as a "slice") of the total width of the values being added, and the carry "ripples" from the lowest-order place to the highest-order.

The final carry from the highest-order full adder, $c_4$ in the 4-bit adder of Figure 5.3, is stored in the CF bit of the Flags register (see Section 6.2). And the exclusive or of the final carry and penultimate carry, $c_4 \oplus c_3$ in the 4-bit adder of Figure 5.3, is stored in the OF bit.

Recall that in the 2's complement code for storing integers a number is negated by taking its 2's complement. So we can subtract $y$ from $x$ by doing:

$$ x - y = x + (\text{2's complement of } y) = x + [(y\text{'s bits flipped}) + 1] $$

Thus, subtraction can be performed with our adder in Figure 5.3 if we complement each $y_i$ and set the initial carry in to 1 instead of 0. Each $y_i$ can be complemented by XOR-ing it with 1. This leads to the 4-bit circuit in Figure 5.4 that will add two 4-bit numbers when func = 0 and subtract them when func = 1.

[Figure 5.4: Four-bit adder/subtracter. The four full adders of Figure 5.3 are used with func as an additional input. If func == 0, then $s = x + y$; otherwise (func == 1) $s = x - y$. In both cases $CF = c_4$ and $OF = c_3 \oplus c_4$.]

There is, of course, a time delay as the sum is computed from right to left. The computation

time can be significantly reduced through more complex circuit designs that pre-compute the

carry.
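The ripple-carry adder/subtracter can be simulated bit by bit. The Python sketch below is an illustration that assumes the wiring described in the text: each $y_i$ is XOR-ed with func, and func also serves as the initial carry in. The names `full_adder` and `add_sub` are ours, and the CF and OF values follow the definitions given for Figures 5.3 and 5.4.

```python
def full_adder(carry_in, x, y):
    """Return (carry_out, sum) for one bit slice (Equations 5.3 and 5.4)."""
    partial = x ^ y
    return (x & y) | (carry_in & partial), carry_in ^ partial


def add_sub(x, y, func, width=4):
    """Add (func = 0) or subtract (func = 1) two width-bit values.

    Returns (s, CF, OF), where CF is the final carry and OF is the
    exclusive or of the final and penultimate carries.
    """
    carries = [func]                       # func is the carry into bit 0
    s = 0
    for i in range(width):
        x_i = (x >> i) & 1
        y_i = ((y >> i) & 1) ^ func        # complement y_i when subtracting
        carry_out, s_i = full_adder(carries[i], x_i, y_i)
        carries.append(carry_out)          # the carry ripples to the next slice
        s |= s_i << i
    cf = carries[width]
    of = carries[width] ^ carries[width - 1]
    return s, cf, of


print(add_sub(0b0101, 0b0011, 0))   # 5 + 3
print(add_sub(0b0101, 0b0011, 1))   # 5 - 3
```

With 4-bit values, 5 + 3 gives the bit pattern 1000 with OF set, because 8 is outside the signed 4-bit range, while 5 - 3 gives 0010.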

5.1.3 Decoders

Each instruction must be decoded by the CPU before the instruction can be carried out. In the x86-64 architecture the instruction for copying the 64 bits of one register to another register is

0100 1s0d 1000 1001 11ss sddd

where "ssss" specifies the source register and "dddd" specifies the destination register. (Yes, the bits that specify the registers are distributed through the instruction in this manner. You will learn more about this seemingly odd coding pattern in Chapter 9.) For example,

0100 1001 1000 1001 1100 0101

causes the ALU to copy the 64-bit value in register 0000 to register 1101. You will see in Chapter 9 that this instruction is written in assembly language as:

movq %rax, %r13

The Control Unit must select the correct two registers based on these two 4-bit patterns in the instruction. It uses a decoder circuit to perform this selection.
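To make the distributed register bits concrete, here is a small sketch that gathers the "ssss" and "dddd" fields from the three bytes of such an instruction. The function name `decode_mov_registers` is hypothetical, and the sketch reads only the bit positions marked s and d in the pattern above; it does not check the other bits.

```python
def decode_mov_registers(byte0, byte1, byte2):
    """Return (source, destination) register numbers.

    byte0 carries the high-order s and d bits (bit 2 and bit 0),
    byte2 carries the low-order sss (bits 5-3) and ddd (bits 2-0).
    byte1 is the operation code and holds no register bits.
    """
    s_high = (byte0 >> 2) & 1
    d_high = byte0 & 1
    source = (s_high << 3) | ((byte2 >> 3) & 0b111)
    destination = (d_high << 3) | (byte2 & 0b111)
    return source, destination


src, dst = decode_mov_registers(0b01001001, 0b10001001, 0b11000101)
print(f"{src:04b} {dst:04b}")   # the two 4-bit register patterns
```

These two 4-bit values are what a decoder circuit would take as its input in order to select the registers.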

decoder: A device with n binary inputs and $2^n$ binary outputs. Each bit pattern at the input causes exactly one of the $2^n$ outputs to equal 1.

A decoder can be thought of as converting an n-bit input to a $2^n$-bit output. But while the input can be an arbitrary bit pattern, each corresponding output value has only one bit set to 1.

In some applications not all the $2^n$ outputs are used. For example, Table 5.1 is a truth table that shows how a decoder can be used to convert a BCD value to its corresponding decimal numeral display. A 1 in a "display" column means that is the numeral that is selected by the corresponding 4-bit input value. There are six other possible outputs corresponding to the input values 1010 through 1111. But these input values are illegal in BCD, so these outputs are simply ignored.

   input               display
x3 x2 x1 x0    9 8 7 6 5 4 3 2 1 0
 0  0  0  0    0 0 0 0 0 0 0 0 0 1
 0  0  0  1    0 0 0 0 0 0 0 0 1 0
 0  0  1  0    0 0 0 0 0 0 0 1 0 0
 0  0  1  1    0 0 0 0 0 0 1 0 0 0
 0  1  0  0    0 0 0 0 0 1 0 0 0 0
 0  1  0  1    0 0 0 0 1 0 0 0 0 0
 0  1  1  0    0 0 0 1 0 0 0 0 0 0
 0  1  1  1    0 0 1 0 0 0 0 0 0 0
 1  0  0  0    0 1 0 0 0 0 0 0 0 0
 1  0  0  1    1 0 0 0 0 0 0 0 0 0

Table 5.1: BCD decoder. The 4-bit input causes the numeral with a 1 in its column to be displayed.

It is common for decoders to have an additional input that is used to enable the output. The truth table in Table 5.2 shows a decoder with a 3-bit input, an enable line, and an 8-bit ($2^3$) output. The output is 0 whenever enable = 0. When enable = 1, the $i^{th}$ output bit is 1 if and only if the binary value of the input is equal to i. For example, when enable = 1 and x = $011_2$, y = $00001000_2$. That is,

$$y_3 = x_2' \cdot x_1 \cdot x_0 = m_3$$

enable  x2 x1 x0    y7 y6 y5 y4 y3 y2 y1 y0
   0     0  0  0     0  0  0  0  0  0  0  0
   0     0  0  1     0  0  0  0  0  0  0  0
   0     0  1  0     0  0  0  0  0  0  0  0
   0     0  1  1     0  0  0  0  0  0  0  0
   0     1  0  0     0  0  0  0  0  0  0  0
   0     1  0  1     0  0  0  0  0  0  0  0
   0     1  1  0     0  0  0  0  0  0  0  0
   0     1  1  1     0  0  0  0  0  0  0  0
   1     0  0  0     0  0  0  0  0  0  0  1
   1     0  0  1     0  0  0  0  0  0  1  0
   1     0  1  0     0  0  0  0  0  1  0  0
   1     0  1  1     0  0  0  0  1  0  0  0
   1     1  0  0     0  0  0  1  0  0  0  0
   1     1  0  1     0  0  1  0  0  0  0  0
   1     1  1  0     0  1  0  0  0  0  0  0
   1     1  1  1     1  0  0  0  0  0  0  0

Table 5.2: Truth table for a 3 × 8 decoder with enable. If enable = 0, y = 0. If enable = 1, x = i ⇒ $y_i = 1$ and $y_j = 0$ for all j ≠ i.

This clearly generalizes such that we can give the following description of a decoder:

1. For n input bits (excluding an enable bit) there are $2^n$ output bits.

2. The $i^{th}$ output bit is equal to the $i^{th}$ minterm for the n input bits.

The 3 × 8 decoder specified in Table 5.2 can be implemented with 4-input AND gates as shown in Figure 5.5.

[Figure 5.5: Circuit for a 3 × 8 decoder with enable. Inputs enable, $x_2$, $x_1$, and $x_0$; outputs $y_0$ through $y_7$.]
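As a rough software model of the decoder description, each output can be computed as the AND of the enable line with every input bit or its complement. The function name `decoder` and the choice to return the outputs as a list indexed by i (so element 3 is $y_3$) are our own assumptions for illustration.

```python
def decoder(x, enable, n=3):
    """Return [y_0, y_1, ..., y_(2^n - 1)] for an n-bit input x."""
    bits = [(x >> k) & 1 for k in range(n)]   # bits[k] is x_k
    outputs = []
    for i in range(2 ** n):
        y = enable                            # enable feeds every AND gate
        for k in range(n):
            if (i >> k) & 1:
                y &= bits[k]                  # minterm i uses x_k
            else:
                y &= 1 - bits[k]              # minterm i uses the complement of x_k
        outputs.append(y)
    return outputs


print(decoder(0b011, 1))   # only y_3 is 1
print(decoder(0b011, 0))   # all outputs 0 when not enabled
```

Each pass through the outer loop plays the role of one 4-input AND gate in the 3 × 8 case.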

Decoders are more versatile than it might seem at first glance. Each possible input can be seen as a minterm. Since each output is one only when a particular minterm evaluates to one, a decoder can be viewed as a "minterm generator." We know that any logical expression can be represented as the OR of minterms, so it follows that we can implement any logical expression by ORing the output(s) of a decoder.

For example, let us rewrite Equation 5.1 for the Sum expression of a full adder using minterm notation (see Section 4.3.2):

$$Sum_i(Carry_i, x_i, y_i) = m_1 + m_2 + m_4 + m_7 \tag{5.5}$$

And for the Carry expression:

$$Carry_{i+1}(Carry_i, x_i, y_i) = m_3 + m_5 + m_6 + m_7 \tag{5.6}$$

where the subscripts on x, y, and Carry refer to the bit slice and the subscripts on m are part of the minterm notation. We can implement a full adder with a 3 × 8 decoder and two 4-input OR gates, as shown in Figure 5.6.
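Equations 5.5 and 5.6 can be checked with a short sketch that treats the decoder as a minterm generator and ORs the selected outputs. The helper names `minterms` and `full_adder_from_decoder` are illustrative only.

```python
def minterms(carry_in, x, y):
    """3 x 8 decoder: exactly one of m_0 .. m_7 is 1."""
    index = (carry_in << 2) | (x << 1) | y
    return [1 if i == index else 0 for i in range(8)]


def full_adder_from_decoder(carry_in, x, y):
    """Sum and carry out as ORs of minterms (Equations 5.5 and 5.6)."""
    m = minterms(carry_in, x, y)
    total = m[1] | m[2] | m[4] | m[7]
    carry_out = m[3] | m[5] | m[6] | m[7]
    return total, carry_out


# Compare against ordinary arithmetic for all eight input combinations.
for c in (0, 1):
    for x in (0, 1):
        for y in (0, 1):
            s, cout = full_adder_from_decoder(c, x, y)
            assert 2 * cout + s == c + x + y
```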

5.1.4 Multiplexers

There are many places in the CPU where one of several signals must be selected to pass onward.

For example, as you will see in Chapter 9, a value to be added by the ALU may come from a CPU

register, come from memory, or actually be stored as part of the instruction itself. The device

that allows this selection is essentially a switch.

multiplexer: A device that selects one of multiple inputs to be passed on as the output based on one or more selection lines. Up to $2^n$ inputs can be selected by n selection lines. Also called a mux.

[Figure 5.6: Full adder implemented with 3 × 8 decoder. This is for one bit slice. An n-bit adder would require n of these circuits. Decoder inputs $x_i$, $y_i$, $Carry_i$, and Enable; decoder outputs $m_0$ through $m_7$; circuit outputs $Sum_i$ and $Carry_{i+1}$.]

Figure 5.7 shows a multiplexer that can switch between two different inputs, x and y. The select input, s, determines which of the sources, either x or y, is passed on to the output. The action of this 2-way multiplexer is most easily seen in a truth table:

s   Output
1   x
0   y

[Figure 5.7: A 2-way multiplexer. Inputs x, y, and s; one Output.]

Here is a truth table for a multiplexer that can switch between four inputs, w, x, y, and z:

s1 s0   Output
 0  0   w
 0  1   x
 1  0   y
 1  1   z

That is,

$$Output = s_0' \cdot s_1' \cdot w + s_0 \cdot s_1' \cdot x + s_0' \cdot s_1 \cdot y + s_0 \cdot s_1 \cdot z \tag{5.7}$$

which is implemented as shown in Figure 5.8. The symbol for this multiplexer is shown in Figure 5.9. Notice that the selection input, s, must be 2 bits in order to select between four inputs. In general, a $2^n$-way multiplexer requires an n-bit selection input.
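The sum-of-products form of the 4-way multiplexer translates directly into bitwise operations. The function name `mux4` is our own; the body is a sketch of Equation 5.7 with one AND term per input.

```python
def mux4(s1, s0, w, x, y, z):
    """4-way multiplexer written as a sum of four products."""
    not_s1 = 1 - s1
    not_s0 = 1 - s0
    return ((not_s0 & not_s1 & w) |
            (s0 & not_s1 & x) |
            (not_s0 & s1 & y) |
            (s0 & s1 & z))


# With inputs w, x, y, z = 1, 0, 1, 0 the output follows the selected input.
for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, mux4(s1, s0, 1, 0, 1, 0))
```

Exactly one product term can be nonzero for any value of the selection input, so the OR simply passes the selected input through.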

[Figure 5.8: A 4-way multiplexer. Inputs w, x, y, z, $s_0$, and $s_1$; one Output.]

[Figure 5.9: Symbol for a 4-way multiplexer. Inputs w, x, y, and z on ports 0 through 3; Sel input $S_0, S_1$; one Output.]

5.2 Programmable Logic Devices

Combinational logic circuits can be constructed from programmable logic devices (PLDs). The general idea is illustrated in Figure 5.10 for two input variables and two output functions of these variables. Each of the input variables, both in its uncomplemented and complemented form, is an input to AND gates through fuses. (The "S" shaped lines in the circuit diagram represent fuses.) The fuses can be "blown" or left in place in order to program each AND gate to output a product. Since every input, plus its complement, is input to each AND gate, any of the AND gates can be programmed to output a minterm.

[Figure 5.10: Simplified circuit for a programmable logic array with inputs x and y and outputs $F_1(x, y)$ and $F_2(x, y)$. The "S" shaped lines at the inputs to each gate represent fuses. The fuses are "blown" to remove that input.]

The products produced by the array of AND gates are all connected to OR gates, also through fuses. Thus, depending on which OR-gate fuses are left in place, the output of each OR gate is a sum of products. There may be additional logic circuitry to select between the different outputs. We have already seen that any Boolean function can be expressed as a sum of products, so this logic device can be programmed by "blowing" the fuses to implement any Boolean function.
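One way to picture the programming step is to list, for each gate, the fuses that are left in place. The sketch below is only a software analogy under that assumption: `pla`, `and_plane`, and `or_plane` are hypothetical names, and the example program (an exclusive or and an AND of two inputs) is our own rather than the one drawn in Figure 5.10.

```python
def pla(inputs, and_plane, or_plane):
    """Evaluate a programmed AND plane followed by a programmed OR plane.

    and_plane: one list per AND gate of (variable, polarity) pairs whose
               fuses are intact; polarity 1 means the variable itself,
               0 means its complement.
    or_plane:  one list per OR gate of the AND-gate indexes whose fuses
               are intact.
    """
    products = []
    for connections in and_plane:
        p = 1
        for name, polarity in connections:
            value = inputs[name]
            p &= value if polarity else 1 - value
        products.append(p)
    return [int(any(products[i] for i in gates)) for gates in or_plane]


# Example program: F1 = x.y' + x'.y and F2 = x.y
and_plane = [[("x", 1), ("y", 0)],
             [("x", 0), ("y", 1)],
             [("x", 1), ("y", 1)]]
or_plane = [[0, 1], [2]]

for x in (0, 1):
    for y in (0, 1):
        print(x, y, pla({"x": x, "y": y}, and_plane, or_plane))
```

Changing the two lists corresponds to blowing a different set of fuses; the evaluation code itself never changes.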

PLDs come in many configurations. Some are pre-programmed at the time of manufacture. Others are programmed by the manufacturer. And there are types that can be programmed by a user. Some can even be erased and reprogrammed. Programming technologies range from specifying the manufacturing mask (for the pre-programmed devices) to inexpensive electronic programming systems. Some devices use "antifuses" instead of fuses. They are normally open. Programming such devices consists of completing the connection instead of removing it.

There are three general categories of PLDs:

Programmable Logic Array (PLA): Both the AND gate plane and the OR gate plane are

programmable.

Read Only Memory (ROM): Only the OR gate plane is programmable.

Programmable Array Logic (PAL): Only the AND gate plane is programmable.

We will now look at each category in more detail.

5.2.1 Programmable Logic Array (PLA)

Programmable logic arrays are typically larger than the one shown in Figure 5.10, which is already complicated to draw. Figure 5.11 shows how PLAs are typically diagrammed. This diagram deserves some explanation.

[Figure 5.11: Programmable logic array schematic with inputs w, x, y, and z and outputs $F_1$, $F_2$, and $F_3$. The horizontal lines to the AND gate inputs represent multiple wires, one for each input variable and its complement. The vertical lines to the OR gate inputs also represent multiple wires, one for each AND gate output. The dots represent connections.]

Note in Figure 5.10 that each input variable and its complement is connected to the inputs of all the AND gates through a fuse. The AND gates have multiple inputs, one for each variable and its complement. Thus, the horizontal lines leading to the inputs of the AND gates represent multiple wires. The diagram of Figure 5.11 has four input variables. So each AND gate has eight inputs, and the horizontal lines each represent the eight wires coming from the inputs and their complements.

The dots at the intersections of the vertical and horizontal lines represent places where the fuses have been left intact. For example, the three dots on the topmost horizontal line indicate that there are three inputs to that AND gate. The output of the topmost AND gate is

w

· y · z

Referring again to Figure 5.10, we see that the output from each AND gate is connected to each of the OR gates. Therefore, the OR gates also have multiple inputs, one for each AND gate, and the vertical lines leading to the OR gate inputs represent multiple wires. The PLA in Figure 5.11 has been programmed to provide the three functions:

F

1

(w, x, y, z ) = w

· y · z + w · x · z

F

2

(w, x, y, z ) = w

· x

· y

· z

F

3

(w, x, y, z ) = w

· y · z + w · x · z

5.2.2 Read Only Memory (ROM)

Read only memory can be implemented as a programmable logic device where only the OR plane can be programmed. The AND gate plane is wired to provide all the minterms. Thus, the inputs to the ROM can be thought of as addresses. Then the OR gate plane is programmed to provide the bit pattern at each address.

For example, the ROM diagrammed in Figure 5.12 has two inputs, $a_1$ and $a_0$. The AND gates are wired to give the minterms:

minterm    address
a1' a0'    00
a1' a0     01
a1  a0'    10
a1  a0     11

[Figure 5.12: Four-byte Read Only Memory (ROM) with address inputs $a_1$ and $a_0$ and data outputs $d_7$ through $d_0$. The "×" connections represent permanent connections. Each AND gate can be thought of as producing an address. The eight OR gates produce one byte. The connections (dots) in the OR plane represent the bit pattern stored at the address.]

And the OR gate plane has been programmed to store the four characters (in ASCII code):

minterm    address    contents
a1' a0'    00         '0'
a1' a0     01         '1'
a1  a0'    10         '2'
a1  a0     11         '3'

You can see from this that the terminology "Read Only Memory" is perhaps a bit misleading.

It is actually a combinational logic circuit. Strictly speaking, memory has a state that can be

changed by inputs. (See Section 5.3.)
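The point that a ROM is purely combinational can be illustrated by computing each data bit as an OR of address minterms. In this sketch the names `ROM_CONTENTS` and `rom_read` are our own, and the stored bytes are the ASCII codes for the four characters listed above.

```python
# ASCII codes for the characters '0', '1', '2', '3' at addresses 00..11
ROM_CONTENTS = [0x30, 0x31, 0x32, 0x33]


def rom_read(a1, a0):
    """Return the byte at address a1 a0 using only AND, OR, and NOT."""
    not_a1, not_a0 = 1 - a1, 1 - a0
    # Fixed AND plane: one minterm per address
    minterms = [not_a1 & not_a0, not_a1 & a0, a1 & not_a0, a1 & a0]
    data = 0
    for bit in range(8):                      # one OR gate per data bit
        d = 0
        for address, m in enumerate(minterms):
            if (ROM_CONTENTS[address] >> bit) & 1:   # a dot in the OR plane
                d |= m
        data |= d << bit
    return data


print(chr(rom_read(1, 0)))   # character stored at address 10
```

Nothing in `rom_read` remembers anything between calls; the same address always produces the same byte.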

5.2.3 Programmable Array Logic (PAL)

In a Programmable Array Logic (PAL) device, each OR gate is permanently wired to a group of AND gates. Only the AND gate plane is programmable. The PAL diagrammed in Figure 5.13 has four inputs. It provides two outputs, each of which can be the sum of up to four products. The "×" connections in the OR gate plane show that the top four AND gates are summed to produce $F_1$ and the lower four to produce $F_2$. The AND gate plane in this figure has been programmed to produce the two functions:

F

1

(w, x, y, z ) = w · x

· z + w

· x + w · x · y

+ w

· x

· y

· z

F

2

(w, x, y, z ) = w

· y · z + w · x · z

+ w · x · y · z + w · x · y

· z

[Figure 5.13: Two-function Programmable Array Logic (PAL) with inputs w, x, y, and z and outputs $F_1$ and $F_2$. The "×" connections represent permanent connections. Each AND gate can be thought of as producing an address. The eight OR gates produce one byte. The connections (dots) in the OR plane represent the bit pattern stored at the address.]

5.3 Sequential Logic Circuits

Combinational circuits (Section 5.1) are instantaneous (except for the time required for the electronics to settle). Their output depends only on the input at the time the output is observed. Sequential logic circuits, on the other hand, have a time history. That history is summarized by the current state of the circuit.

state: The state of a system is the description of the system such that knowing

(a) the state at time $t_0$, and

(b) the input(s) from time $t_0$ through time $t_1$,

uniquely determines

(c) the state at time $t_1$, and

(d) the output(s) from time $t_0$ through time $t_1$.

This definition means that knowing the state of a system at a given time tells you everything you need to know in order to specify its behavior from that time on. How it got into this state is irrelevant.

This definition implies that the system has memory in which the state is stored. Since there are a finite number of states, the term finite state machine (FSM) is commonly used. Inputs to the system can cause the state to change.

If the output(s) depend only on the state of the FSM, it is called a Moore machine. And if the output(s) depend on both the state and the current input(s), it is called a Mealy machine.

The most commonly used sequential circuits are synchronous: their action is controlled by a sequence of clock pulses. The clock pulses are created by a clock generator circuit. The clock pulses are applied to all the sequential elements, thus causing them to operate in synchrony.

Asynchronous sequential circuits are not based on a clock. They depend upon a timing delay built into the individual elements. Their behavior depends upon the order in which inputs are applied. Hence, they are difficult to analyze and will not be discussed in this book.

5.3.1 Clock Pulses

A clock signal is typically a square wave that alternates between the 0 and 1 levels as shown in Figure 5.14. The amount of time spent at each level may be unequal. Although not a requirement, the timing pattern is usually uniform.

[Figure 5.14: Clock signals plotted against time. (a) For level-triggered circuits. (b) For positive-edge triggering. (c) For negative-edge triggering.]

In Figure 5.14(a), the circuit operations take place during the entire time the clock is at the 1 level. As will be explained below, this can lead to unreliable circuit behavior. In order to achieve more reliable behavior, most circuits are designed such that a transition of the clock signal triggers the circuit elements to start their respective operations. Either a positive-going (Figure 5.14(b)) or negative-going (Figure 5.14(c)) transition may be used. The clock frequency must be slow enough such that all the circuit elements have time to complete their operations before the next clock transition (in the same direction) occurs.

5.3.2 Latches

A latch is a storage device that can be in one of two states. That is, it stores one bit. It can be constructed from two or more gates connected such that feedback maintains the state as long as power is applied. The most fundamental latch is the SR (Set-Reset).

A simple implementation using NOR gates is shown in Figure 5.15. When Q = 1 (Q' = 0) it is in the Set state. When Q = 0 (Q' = 1) it is in the Reset state.

[Figure 5.15: NOR gate implementation of an SR latch. Inputs S and R; outputs Q and Q'.]

There are four possible input combinations.

S = 0, R = 0: Keep current state. If Q = 0 and Q' = 1, the output of the upper NOR gate is (0 + 0)' = 1, and the output of the lower NOR gate is (1 + 0)' = 0.

If Q = 1 and Q' = 0, the output of the upper NOR gate is (0 + 1)' = 0, and the output of the lower NOR gate is (0 + 0)' = 1.

Thus, the cross feedback between the two NOR gates maintains the state, Set or Reset, of the latch.

S = 1, R = 0: Set. If Q = 1 and Q' = 0, the output of the upper NOR gate is (1 + 1)' = 0, and the output of the lower NOR gate is (0 + 0)' = 1. The latch remains in the Set state.

If Q = 0 and Q' = 1, the output of the upper NOR gate is (1 + 0)' = 0. This is fed back to the input of the lower NOR gate to give (0 + 0)' = 1. The feedback from the output of the lower NOR gate to the input of the upper keeps the output of the upper NOR gate at (1 + 1)' = 0. The latch has moved into the Set state.

S = 0, R = 1: Reset. If Q = 1 and Q' = 0, the output of the lower NOR gate is (0 + 1)' = 0. This causes the output of the upper NOR gate to become (0 + 0)' = 1. The feedback from the output of the upper NOR gate to the input of the lower keeps the output of the lower NOR gate at (1 + 1)' = 0. The latch has moved into the Reset state.

If Q = 0 and Q' = 1, the output of the lower NOR gate is (1 + 1)' = 0, and the output of the upper NOR gate is (0 + 0)' = 1. The latch remains in the Reset state.

S = 1, R = 1: Not allowed. If Q = 0 and Q' = 1, the output of the upper NOR gate is (1 + 0)' = 0. This is fed back to the input of the lower NOR gate to give (0 + 1)' = 0 as its output. The feedback from the output of the lower NOR gate to the input of the upper maintains its output as (1 + 0)' = 0. Thus, Q = Q' = 0, which is not allowed.

If Q = 1 and Q' = 0, the output of the lower NOR gate is (0 + 1)' = 0. This is fed back to the input of the upper NOR gate to give (1 + 0)' = 0 as its output. The feedback from the output of the upper NOR gate to the input of the lower maintains its output as (0 + 1)' = 0. Thus, Q = Q' = 0, which is not allowed.

The state table in Table 5.3 summarizes the behavior of a NOR-based SR latch. The inputs to a NOR-based SR latch are normally held at 0, which maintains the current state, Q. Its current state is available at the output. Momentarily changing S or R to 1 causes the state to change to Set or Reset, respectively, as shown in the "Next State" column.

S  R   Current State   Next State
0  0        0              0
0  0        1              1
0  1        0              0
0  1        1              0
1  0        0              1
1  0        1              1
1  1        0              X
1  1        1              X

Table 5.3: SR latch state table. "X" indicates an indeterminate state. A circuit using this latch must be designed to prevent this input combination.

Notice that placing 1 on both the Set and Reset inputs at the same time causes a problem. Then the outputs of both NOR gates would become 0. In other words, Q = Q' = 0, which is logically impossible. The circuit design must be such as to prevent this input combination.
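The feedback argument for the NOR-based latch can be replayed in software by re-evaluating the two gates until their outputs stop changing. This is a simplified model with assumptions of our own: the names `nor` and `sr_latch_nor` are illustrative, the upper gate is taken to combine S with Q and the lower gate R with Q' (as in the walk-through of the four cases), and the gates are evaluated one after the other rather than with real propagation delays.

```python
def nor(a, b):
    return 1 - (a | b)


def sr_latch_nor(s, r, q, q_bar):
    """Apply inputs S and R to a latch in state (Q, Q'); return the settled (Q, Q')."""
    for _ in range(4):                 # a few passes are enough to settle
        new_q_bar = nor(s, q)          # upper NOR gate
        new_q = nor(r, new_q_bar)      # lower NOR gate
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


print(sr_latch_nor(1, 0, 0, 1))   # Set from the Reset state
print(sr_latch_nor(0, 0, 1, 0))   # inputs back at 0: state is kept
print(sr_latch_nor(0, 1, 1, 0))   # Reset from the Set state
print(sr_latch_nor(1, 1, 0, 1))   # both inputs 1: Q and Q' both become 0
```

The last call shows the disallowed combination: both outputs settle at 0, so they are no longer complements of each other.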

The behavior of an SR latch can also be shown by the state diagram in Figure 5.16. A state diagram is a directed graph. The circles show the possible states. Lines with arrows show the possible transitions between the states and are labeled with the input that causes the transition.

[Figure 5.16: State diagram for an SR latch, with transitions labeled by the input SR. There are two possible inputs, 00 or 01, that cause the latch to remain in state 0. Similarly, 00 or 10 cause it to remain in state 1. Since the output is simply the state, it is not shown in this state diagram. Notice that the input 11 is not allowed, so it is not shown on the diagram.]

The two circles in Figure 5.16 show the two possible states of the SR latch, 0 or 1. The labels on the lines show the two-bit inputs, SR, that cause each state transition. Notice that when the latch is in state 0 there are two possible inputs, SR = 00 and SR = 01, that cause it to remain in that state. Similarly, when it is in state 1 either of the two inputs, SR = 00 or SR = 10, causes it to remain in that state.

The output of the SR latch is simply the state, so it is not shown separately on this state diagram. In general, if the output of a circuit is dependent on the input, it is often shown on the directed lines of the state diagram in the format "input/output." If the output is dependent on the state, it is more common to show it in the corresponding state circle in "state/output" format.

NAND gates are more commonly used than NOR gates, and it is possible to build an SR latch from NAND gates. Recalling that NAND and NOR have complementary properties, we will think ahead and use S' and R' as the inputs, as shown in Figure 5.17. Consider the four possible input combinations.

[Figure 5.17: NAND gate implementation of an S'R' latch. Inputs S' and R'; outputs Q and Q'.]

S' = 1, R' = 1: Keep current state. If Q = 0 and Q' = 1, the output of the upper NAND gate is (1 · 1)' = 0, and the output of the lower NAND gate is (0 · 1)' = 1.

If Q = 1 and Q' = 0, the output of the upper NAND gate is (1 · 0)' = 1, and the output of the lower NAND gate is (1 · 1)' = 0.

Thus, the cross feedback between the two NAND gates maintains the state, Set or Reset, of the latch.

S' = 0, R' = 1: Set. If Q = 1 and Q' = 0, the output of the upper NAND gate is (0 · 0)' = 1, and the output of the lower NAND gate is (1 · 1)' = 0. The latch remains in the Set state.

If Q = 0 and Q' = 1, the output of the upper NAND gate is (0 · 1)' = 1. This causes the output of the lower NAND gate to become (1 · 1)' = 0. The feedback from the output of the lower NAND gate to the input of the upper keeps the output of the upper NAND gate at (0 · 0)' = 1. The latch has moved into the Set state.

S' = 1, R' = 0: Reset. If Q = 0 and Q' = 1, the output of the lower NAND gate is (0 · 0)' = 1, and the output of the upper NAND gate is (1 · 1)' = 0. The latch remains in the Reset state.

If Q = 1 and Q' = 0, the output of the lower NAND gate is (1 · 0)' = 1. This is fed back to the input of the upper NAND gate to give (1 · 1)' = 0. The feedback from the output of the upper NAND gate to the input of the lower keeps the output of the lower NAND gate at (0 · 0)' = 1. The latch has moved into the Reset state.

S' = 0, R' = 0: Not allowed. If Q = 0 and Q' = 1, the output of the upper NAND gate is (0 · 1)' = 1. This is fed back to the input of the lower NAND gate to give (1 · 0)' = 1 as its output. The feedback from the output of the lower NAND gate to the input of the upper maintains its output as (0 · 1)' = 1. Thus, Q = Q' = 1, which is not allowed.

If Q = 1 and Q' = 0, the output of the lower NAND gate is (1 · 0)' = 1. This is fed back to the input of the upper NAND gate to give (0 · 1)' = 1 as its output. The feedback from the output of the upper NAND gate to the input of the lower maintains its output as (1 · 0)' = 1. Thus, Q = Q' = 1, which is not allowed.

Figure 5.18 shows the behavior of a NAND-based S'R' latch. The inputs to a NAND-based S'R' latch are normally held at 1, which maintains the current state, Q. Its current state is available at the output. Momentarily changing S' or R' to 0 causes the state to change to Set or Reset, respectively, as shown in the "Next State" column.

S'  R'   Current State   Next State
1   1         0              0
1   1         1              1
1   0         0              0
1   0         1              0
0   1         0              1
0   1         1              1
0   0         0              X
0   0         1              X

[Figure 5.18: State table and state diagram for an S'R' latch, with transitions labeled by the input S'R'. There are two possible inputs, 11 or 10, that cause the latch to remain in state 0. Similarly, 11 or 01 cause it to remain in state 1. Since the output is simply the state, it is not shown in this state diagram. Notice that the input 00 is not allowed, so it is not shown on the diagram.]

Notice that placing 0 on both the Set and Reset inputs at the same time causes a problem. Then the outputs of both NAND gates would become 1. In other words, Q = Q' = 1, which is logically impossible. The circuit design must be such as to prevent this input combination.

So the S'R' latch implemented with two NAND gates can be thought of as the complement of the NOR gate SR latch. The state is maintained by holding both S' and R' at 1. S' = 0 causes the state to be 1 (Set), and R' = 0 causes the state to be 0 (Reset). Signals like S' and R', which activate on the 0 level, are usually called active-low signals.

You have already seen that ones and zeros are represented by either a high or low voltage in electronic logic circuits. A given logic device may be activated by combinations of the two voltages. To show which is used to cause activation at any given input, the following definitions are used:

active-high signal: The higher voltage represents 1.

active-low signal: The lower voltage represents 1.

Warning! The definitions of active-high versus active-low signals vary in the literature. Make sure that you and the people you are working with have a clear agreement on the definitions you are using.

An active-high signal can be connected to an active-low input, but the hardware designer must take the difference into account. For example, say that the required logical input is 1 to an active-low input. Since it is active-low, that means the required voltage is the lower of the two. If the signal to be connected to this input is active-high, then a logical 1 is the higher of the two voltages. So this signal must first be complemented in order to be interpreted as a 1 at the active-low input.

We can get better control over the SR latch by adding two NAND gates to provide a Control input, as shown in Figure 5.19. In this circuit the outputs of both the control NAND gates remain at 1 as long as Control = 0. Table 5.4 shows the state behavior of the SR latch with control.

[Figure 5.19: SR latch with Control input. Inputs S, R, and Control; outputs Q and Q'.]

Control  S  R   Current State   Next State
   0     –  –        0              0
   0     –  –        1              1
   1     0  0        0              0
   1     0  0        1              1
   1     0  1        0              0
   1     0  1        1              0
   1     1  0        0              1
   1     1  0        1              1
   1     1  1        0              X
   1     1  1        1              X

Table 5.4: SR latch with Control state table. "–" indicates that the value does not matter. "X" indicates an indeterminate state. A circuit using this latch must be designed to prevent this input combination.

It would clearly be better if we could find a design that eliminates the possibility of the "not allowed" inputs. Table 5.5 is a state table for a D latch. It has two inputs, one for control, the other for data, D. D = 1 sets the latch to 1, and D = 0 resets it to 0.

Control  D   Current State   Next State
   0     –        0              0
   0     –        1              1
   1     0        0              0
   1     0        1              0
   1     1        0              1
   1     1        1              1

Table 5.5: D latch with Control state table. "–" indicates that the value does not matter.

The D latch can be implemented as shown in Figure 5.20. The one data input, D, is fed to the "S" side of the SR latch; the complement of the data value is fed to the "R" side.

[Figure 5.20: D latch constructed from an SR latch. Gates G1 through G4; inputs D and Control; internal signals S and R; outputs Q and Q'.]

Now we have a circuit that can store one bit of data, using the D input, and can be synchronized with a clock signal, using the Control input. Although this circuit is reliable by itself, the issue is whether it is reliable when connected with other circuit elements. The D signal almost certainly comes from an interconnection of combinational and sequential logic circuits. If it changes while the Control is still 1, the state of the latch will be changed.
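The level-sensitive behavior is easy to see in a small gate-level sketch. The wiring here is an assumption based on the description above (D gated onto the S side, its complement onto the R side, feeding a NAND-based latch); the names `nand` and `d_latch` are ours, and the gates are settled by simple repeated evaluation rather than by modeling real delays.

```python
def nand(a, b):
    return 1 - (a & b)


def d_latch(control, d, q):
    """Return the settled state Q of a D latch currently in state q."""
    s_bar = nand(d, control)          # active-low Set, enabled by Control
    r_bar = nand(1 - d, control)      # active-low Reset, enabled by Control
    q_bar = 1 - q
    for _ in range(4):                # re-evaluate until the outputs settle
        new_q = nand(s_bar, q_bar)
        new_q_bar = nand(r_bar, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q


q = 0
trace = []
for control, d in [(0, 1), (1, 1), (1, 0), (0, 1)]:
    q = d_latch(control, d, q)
    trace.append(q)
print(trace)
```

In the third step D changes while Control is still 1 and the stored bit changes with it, which is exactly the sensitivity discussed in the text.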

Each electronic element in a circuit takes time to activate. It is a very short period of time, but it can vary slightly depending upon precisely how the other logic elements are interconnected and the state of each of them when they are activated. The problem here is that the Control input is being used to control the circuit based on the clock signal level. The clock level must be maintained for a time long enough to allow all the circuit elements to complete their activity, which can vary depending on what actions are being performed. In essence, the circuit timing is determined by the circuit elements and their actions instead of the clock. This makes it very difficult to achieve a reliable design.

It is much easier to design reliable circuits if the time when an activity can be triggered is made very short. The solution is to use edge-triggered logic elements. The inputs are applied and enough time is allowed for the electronics to settle. Then the next clock transition activates the circuit element. This scheme provides concise timing under control of the clock instead of timing determined more or less by the particular circuit design.

5.3.3 Flip-Flops

Although the terminology varies somewhat in the literature, it is generally agreed that (see Figure 5.14):

A latch uses a level-based clock signal.

A flip-flop is triggered by a clock signal edge.

At each "tick" of the clock, there are four possible actions that might be taken on a single bit: store 0, store 1, complement the bit (also called toggle), or leave it as is.

A D flip-flop is a common device for storing a single bit. We can turn the D latch into a D flip-flop by using two D latches connected in a master/slave configuration as shown in Figure 5.21. Let us walk through the operation of this circuit.

[Figure 5.21: D flip-flop, positive-edge triggering. Inputs D and CK; outputs Q and Q'; a Master D latch feeds a Slave D latch.]

The bit to be stored, 0 or 1, is applied to the D input of the Master D latch. The clock signal is applied to the CK input. It is normally 0. When the clock signal makes a transition from 0 to 1, the Master D latch will either Reset or Set, following the D input of 0 or 1, respectively.

While the CK input is at the 1 level, the control signal to the Slave D latch is 1, which deactivates this latch. Meanwhile, the output of this flip-flop, the output of the Slave D latch, is probably connected to the input of another circuit, which is activated by the same CK. Since the state of the Slave does not change during this clock half-cycle, the second circuit has enough time to read the current state of the flip-flop connected to its input. Also during this clock half-cycle, the state of the Master D latch has ample time to settle.

When the CK input transitions back to the 0 level, the control signal to the Master D latch becomes 1, deactivating it. At the same time, the control input to the Slave D latch goes to 0, thus activating the Slave D latch to store the appropriate value, 0 or 1. The new input will be applied to the Slave D latch during the second clock half-cycle, after the circuit connected to its output has had sufficient time to read its previous state. Thus, signals travel along a path of logic circuits in lock step with a clock signal.
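The two-phase behavior in this walk-through can be mimicked with a purely behavioral sketch in which each D latch is reduced to one stored bit. The class name `MasterSlaveD` and its method are hypothetical, and which clock level enables each latch is taken from the description above (Master follows D while CK is 1, Slave copies the Master while CK is 0) rather than from the gate-level details of Figure 5.21.

```python
class MasterSlaveD:
    """Behavioral model of two D latches in a master/slave arrangement."""

    def __init__(self):
        self.master = 0
        self.slave = 0    # the flip-flop output Q

    def apply(self, ck, d):
        if ck == 1:
            self.master = d            # Master follows D; Slave holds
        else:
            self.slave = self.master   # Slave copies Master; Master holds
        return self.slave


ff = MasterSlaveD()
outputs = [ff.apply(ck, d)
           for ck, d in [(0, 0), (1, 1), (0, 0), (1, 0), (0, 1)]]
print(outputs)
```

Changes on D while CK is 0 never reach the output directly in this model; the output can only take on a value that the Master captured during the preceding half-cycle.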

There are applications where a flip-flop must be set to a known value before the clocking begins. Figure 5.22 shows a D flip-flop with an asynchronous preset input added to it. When a 1 is applied to the PR input, Q becomes 1 and Q' becomes 0, regardless of what the other inputs are, even CK. It is also common to have an asynchronous clear input that sets the state (and output) to 0.

[Figure 5.22: D flip-flop, positive-edge triggering with asynchronous preset. Inputs D, CK, and PR; outputs Q and Q'.]

There are more efficient circuits for implementing edge-triggered D flip-flops, but this discussion serves to show that they can be constructed from ordinary logic gates. They are economical and efficient, so they are widely used in very large scale integration circuits. Rather than draw the details for each D flip-flop, circuit designers use the symbols shown in Figure 5.23. The various inputs and outputs are labeled in this figure. Hardware designers typically use $\overline{Q}$ instead of Q'. It is common to label the circuit as "Qn," with n = 1, 2, ... for identification. The small circle at the clock input in Figure 5.23(b) means that this D flip-flop is triggered by a negative-going clock transition. The D flip-flop circuit in Figure 5.21 can be changed to a negative-going trigger by simply removing the first NOT gate at the CK input.

[Figure 5.23: Symbols for D flip-flops, labeled Q1 and Q2, each with inputs D, CK, PR, and CLR and outputs Q and $\overline{Q}$. Includes asynchronous clear (CLR) and preset (PR). (a) Positive-edge triggering; (b) Negative-edge triggering.]


The flip-flop that simply complements its state, a T flip-flop, is easily constructed from a D flip-flop. The state table and state diagram for a T flip-flop are shown in Figure 5.24.

    T   Current State   Next State
    0         0             0
    0         1             1
    1         0             1
    1         1             0

[State diagram: states 0 and 1. T = 0 loops back to the same state; T = 1 moves to the other state.]

Figure 5.24: T flip-flop state table and state diagram. Each clock tick causes a state transition, with the next state depending on the current state and the value of the input, T.

To determine the value that must be presented to the D flip-flop in order to implement a T flip-flop, we add a column for D to the state table as shown in Table 5.6. By simply looking in the "Next State" column we can see what the input to the D flip-flop must be in order to obtain the correct state. These values are entered in the D column. (We will generalize this design procedure in Section 5.4.)

    T   Current State   Next State   D
    0         0             0        0
    0         1             1        1
    1         0             1        1
    1         1             0        0

Table 5.6: T flip-flop state table showing the D flip-flop input required to place the T flip-flop in the next state.

From Table 5.6 it is easy to write the equation for D:

$$
\begin{aligned}
D &= T' \cdot Q + T \cdot Q' \\
  &= T \oplus Q
\end{aligned}
\tag{5.8}
$$

The resulting design for the T flip-flop is shown in Figure 5.25.

[Figure 5.25: T flip-flop. (a) Circuit using a D flip-flop. (b) Symbol for a T flip-flop.]
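As a quick check of Equation 5.8, the following Python sketch enumerates all four rows of the T flip-flop state table and compares the sum-of-products form with the exclusive-or form. The function names are hypothetical and chosen only for this illustration.

```python
from itertools import product


def t_next_state(t, q):
    """Next state of a T flip-flop: toggle when t == 1, otherwise keep q."""
    return q ^ 1 if t else q


def d_input(t, q):
    """Sum-of-products form of Equation 5.8: D = T'.Q + T.Q'."""
    return ((1 - t) & q) | (t & (1 - q))


# One row per (T, current state): (T, Q, next state, D).
rows = [(t, q, t_next_state(t, q), d_input(t, q))
        for t, q in product((0, 1), repeat=2)]
```

Every row has D equal to the next state, and both equal T XOR Q, which is what Table 5.6 requires.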

Implementing all four possible actions (set, reset, keep, toggle) requires two inputs, J and K, which leads us to the JK flip-flop. The state table and state diagram for a JK flip-flop are shown in Figure 5.26.


    J   K   Current State   Next State
    0   0         0             0
    0   0         1             1
    0   1         0             0
    0   1         1             0
    1   0         0             1
    1   0         1             1
    1   1         0             1
    1   1         1             0

[State diagram: states 0 and 1, with arcs labeled by the JK input values 00, 01, 10, and 11.]

Figure 5.26: JK flip-flop state table and state diagram.

In order to determine the value that must be presented to the D flip-flop, we add a column for D to the state table. Table 5.7 shows what values must be input to the D flip-flop.

    J   K   Current State   Next State   D
    0   0         0             0        0
    0   0         1             1        1
    0   1         0             0        0
    0   1         1             0        0
    1   0         0             1        1
    1   0         1             1        1
    1   1         0             1        1
    1   1         1             0        0

Table 5.7: JK flip-flop state table showing the D flip-flop input required to place the JK flip-flop in the next state.

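From this it is easy to write the equation for D:

$$
\begin{aligned}
D &= J' \cdot K' \cdot Q + J \cdot K' \cdot Q' + J \cdot K' \cdot Q + J \cdot K \cdot Q' \\
  &= J \cdot Q' \cdot (K' + K) + K' \cdot Q \cdot (J + J') \\
  &= J \cdot Q' + K' \cdot Q
\end{aligned}
\tag{5.9}
$$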

Thus, a JK flip-flop can be constructed from a D flip-flop as shown in Figure 5.27.

[Figure 5.27: JK flip-flop. (a) Circuit using a D flip-flop. (b) Symbol for a JK flip-flop with asynchronous CLR and PR inputs.]
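Equation 5.9 can be checked exhaustively against the JK state table. The Python sketch below does this for all eight combinations of J, K, and the current state; the function names are hypothetical.

```python
from itertools import product


def jk_next_state(j, k, q):
    """Next state from the JK state table: keep, reset, set, or toggle."""
    if (j, k) == (0, 0):
        return q          # keep
    if (j, k) == (0, 1):
        return 0          # reset
    if (j, k) == (1, 0):
        return 1          # set
    return 1 - q          # toggle


def d_input(j, k, q):
    """Equation 5.9: D = J.Q' + K'.Q."""
    return (j & (1 - q)) | ((1 - k) & q)


# Any (J, K, Q) combination where the simplified equation disagrees with the table.
mismatches = [(j, k, q) for j, k, q in product((0, 1), repeat=3)
              if jk_next_state(j, k, q) != d_input(j, k, q)]
```

An empty `mismatches` list means the simplified expression produces the required D input for every row of Table 5.7.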


5.4 Designing Sequential Circuits

We will now consider a more general set of steps for designing sequential circuits.¹ Design in any field is usually an iterative process, as you have no doubt learned from your programming experience. You start with a design, analyze it, then refine the design to make it faster, less expensive, etc. After gaining some experience, the design process usually requires fewer iterations.

The following steps form a good method for a first working design:

1. From the word description of the problem, create a state table and/or state diagram showing what the circuit must do. These form the basic technical specifications for the circuit you will be designing.

2. Choose a binary code for the states, and create a binary-coded version of the state table and/or state diagram. For N states, the code will need $\lceil \log_2 N \rceil$ bits. Any code will work, but some codes may lead to simpler combinational logic in the circuit.

3. Choose a particular type of flip-flop. This choice is often dictated by the components you have on hand.

4. Add columns to the state table that show the input required to each flip-flop in order to effect each transition that is required.

5. Simplify the input(s) to each flip-flop. Karnaugh maps or algebraic methods are good tools for the simplification process.

6. Draw the circuit.

Example 5-a

Design a counter that has an Enable input. When Enable = 1 it increments through the sequence 0, 1, 2, 3, 0, 1, ... with each clock tick. Enable = 0 causes the counter to remain in its current state.

1. First we create a state table and state diagram:

                 Enable = 0   Enable = 1
    Current         Next         Next
       n             n            n
       0             0            1
       1             1            2
       2             2            3
       3             3            0

[State diagram: states 0, 1, 2, and 3. An input of 1 moves to the next state in the sequence 0, 1, 2, 3, 0; an input of 0 loops back to the same state.]

At each clock tick the counter increments by one if Enable = 1. If Enable = 0 it remains in the current state. We have only shown the inputs because the output is equal to the state.

2. A reasonable choice is to use the binary numbering system for each state. With four states we need two bits. We will let $n = n_1 n_0$, giving the state table:

                 Enable = 0   Enable = 1
    Current         Next         Next
    n1 n0          n1 n0        n1 n0
    0  0           0  0         0  1
    0  1           0  1         1  0
    1  0           1  0         1  1
    1  1           1  1         0  0

¹ I wish to thank Dr. Lynn Stauffer for her valuable suggestions for this section.


3. Since JK flip-flops are very general we will use those.

4. We need two flip-flops, one for each bit. So we add columns to the state table showing the input required to each JK flip-flop to cause the correct state transition. Referring to Figure 5.26 (page 103), we see that JK = 00 keeps the current state, JK = 01 resets it (to 0), JK = 10 sets it (to 1), and JK = 11 complements the state. We use X when the input can be either 0 or 1.

               |        Enable = 0        |        Enable = 1
    Current    | Next                     | Next
    n1 n0      | n1 n0   J1 K1   J0 K0    | n1 n0   J1 K1   J0 K0
    0  0       | 0  0    0  X    0  X     | 0  1    0  X    1  X
    0  1       | 0  1    0  X    X  0     | 1  0    1  X    X  1
    1  0       | 1  0    X  0    0  X     | 1  1    X  0    1  X
    1  1       | 1  1    X  0    X  0     | 0  0    X  1    X  1

Notice the "don't care" entries in the state table. Since the JK flip-flop is so versatile, including the "don't cares" helps find simpler circuit realizations. (See Exercise 5-3.)
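The way the J and K columns are filled in can be sketched in Python. The lookup below follows from the four JK actions listed in step 4 (for example, moving from 0 to 1 works with either "set" or "complement", so K is a don't care). The function names and the use of the string "X" for don't cares are choices made for this sketch only.

```python
def jk_excitation(current, nxt):
    """Return the (J, K) inputs that move one JK flip-flop from current to nxt.

    "X" marks a don't-care input.
    """
    table = {
        (0, 0): (0, "X"),  # stay at 0: JK = 00 (keep) or 01 (reset)
        (0, 1): (1, "X"),  # 0 -> 1:    JK = 10 (set) or 11 (complement)
        (1, 0): ("X", 1),  # 1 -> 0:    JK = 01 (reset) or 11 (complement)
        (1, 1): ("X", 0),  # stay at 1: JK = 00 (keep) or 10 (set)
    }
    return table[(current, nxt)]


def counter_row(n, enable):
    """Inputs (J1, K1, J0, K0) for the 2-bit counter of Example 5-a in state n."""
    nxt = (n + 1) % 4 if enable else n
    j1, k1 = jk_excitation(n >> 1, nxt >> 1)   # high-order bit, n1
    j0, k0 = jk_excitation(n & 1, nxt & 1)     # low-order bit, n0
    return (j1, k1, j0, k0)
```

Calling `counter_row(n, enable)` for each state and each Enable value reproduces the J and K columns of the state table in step 4.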

5. We use Karnaugh maps, using E for Enable.

    J0(E, n1, n0)          n1 n0
                     00   01   11   10
          E = 0       0    X    X    0
          E = 1       1    X    X    1

    K0(E, n1, n0)          n1 n0
                     00   01   11   10
          E = 0       X    0    0    X
          E = 1       X    1    1    X

    J1(E, n1, n0)          n1 n0
                     00   01   11   10
          E = 0       0    0    X    X
          E = 1       0    1    X    X

    K1(E, n1, n0)          n1 n0
                     00   01   11   10
          E = 0       X    X    0    0
          E = 1       X    X    1    0

$$
\begin{aligned}
J_0(E, n_1, n_0) &= E \\
K_0(E, n_1, n_0) &= E \\
J_1(E, n_1, n_0) &= E \cdot n_0 \\
K_1(E, n_1, n_0) &= E \cdot n_0
\end{aligned}
$$

6. The circuit to implement this counter is:

[Circuit diagram: two JK flip-flops, Q1 and Q0, clocked by CLK. The input is Enable; the outputs are n1 and n0.]


The timing of the binary counter is shown here when counting through the sequence 3, 0, 1, 2, 3 (11, 00, 01, 10, 11).

[Timing diagram: traces for CLK, Q0.JK, n0, Q1.JK, and n1, each switching between 0 and 1, with the state n1n0 stepping through 11, 00, 01, 10, 11.]

Q_i.JK is the input to the i-th JK flip-flop, and $n_i$ is its output. (Recall that J = K in this design.) When the i-th input, Q_i.JK, is applied to its JK flip-flop, remember that the state of the flip-flop does not change until the second half of the clock cycle. This can be seen when comparing the trace for the corresponding output, $n_i$, in the figure.

Note the short delay after a clock transition before the value of each $n_i$ actually changes. This represents the time required for the electronics to completely settle to the new values.
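The finished counter design can be exercised with a short Python simulation. It is a behavioral sketch only: it applies the flip-flop input equations from step 5 once per clock tick and ignores the half-cycle timing and settling delay shown in the timing diagram. The function names are hypothetical.

```python
def jk_next_state(j, k, q):
    """JK flip-flop behavior: 00 keep, 01 reset, 10 set, 11 complement."""
    return (j & (1 - q)) | ((1 - k) & q)


def counter_tick(n1, n0, enable):
    """One clock tick of the 2-bit counter, using the equations from step 5."""
    j0 = k0 = enable          # J0 = K0 = E
    j1 = k1 = enable & n0     # J1 = K1 = E . n0
    return jk_next_state(j1, k1, n1), jk_next_state(j0, k0, n0)


def run(start, enables):
    """Return the counter values seen, starting at `start`, one per Enable value."""
    n1, n0 = start >> 1, start & 1
    values = [start]
    for e in enables:
        n1, n0 = counter_tick(n1, n0, e)
        values.append(2 * n1 + n0)
    return values
```

For example, `run(3, [1, 1, 1, 1])` steps through the same sequence as the timing diagram, 3, 0, 1, 2, 3, while any tick with Enable = 0 leaves the value unchanged.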

Except for very inexpensive microcontrollers, most modern CPUs execute instructions in stages. An instruction passes through each stage in an assembly-line fashion, called a pipeline. The action of the first stage is to fetch the instruction from memory, as will be explained in Chapter 6.

After an instruction is fetched from memory, it passes on to the next stage. Simultaneously, the first stage of the CPU fetches the next instruction from memory. The result is that the CPU is working on several instructions at the same time. This provides some parallelism, thus improving execution speed.

Almost all programs contain conditional branch points: places where the next instruction to be fetched can be in one of two different memory locations. Unfortunately, which of the two instructions to fetch is not known until the decision-making instruction has moved several stages into the pipeline. In order to maintain execution speed, as soon as a conditional branch instruction has passed on from the fetch stage, the CPU needs to predict where to fetch the next instruction from.

In this next example we will design a branch prediction circuit.

Example 5-b

Design a circuit that predicts whether a conditional branch is taken or not. The predictor

continues to predict the same outcome, take the branch or do not take the branch, until it

makes two mistakes in a row.

1. We use "Yes" to indicate when the branch is taken and "No" to indicate when it is not. The

state diagram shows four states:


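[State diagram: four states, each labeled state/output: No/No, fromNo/No, fromYes/Yes, and Yes/Yes. The arcs between them are labeled Y (branch taken) and N (branch not taken).]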

Let us begin in the "No" state. The prediction is that the next branch will also not be taken. The notation in the state bubbles is state/output, showing that the output in this state is also "No."

The input to the circuit is whether or not the branch was actually taken. The arc labeled "N" shows the transition when the branch was not taken. It loops back to the "No" state, with the prediction (the output) that the branch will not be taken the next time. If the branch is taken, the "Y" arc shows that the circuit moves into the "fromNo" state, but still predicting no branch the next time.

From the "fromNo" state, if the branch is not taken (the prediction is correct), the circuit returns to the "No" state. However, if the branch is taken, the "Y" shows that the circuit moves into the "Yes" state. This means that the circuit predicted incorrectly twice in a row, so the prediction is changed to "Yes."

You should be able to follow this state diagram for the other cases and convince yourself that both the "fromNo" and "fromYes" states are required.

Next we look at the state table:

                            Actual = No             Actual = Yes
    Current                 Next                    Next
    State     Prediction    State     Prediction    State     Prediction
    No        No            No        No            fromNo    No
    fromNo    No            No        No            Yes       Yes
    fromYes   Yes           No        No            Yes       Yes
    Yes       Yes           fromYes   Yes           Yes       Yes

2. Since there are four states, we need two bits. We will let 0 represent "No" and 1 represent "Yes." The input is whether the branch is actually taken (1) or not (0). And the output is the prediction of whether it will be taken (1) or not (0).

We choose a binary code for the state, $s_1 s_0$, such that the high-order bit represents the prediction, and the low-order bit what the last input was. That is:

    State     Prediction   s1 s0
    No        No           0  0
    fromNo    No           0  1
    fromYes   Yes          1  0
    Yes       Yes          1  1

This leads to the state table in binary:

                 Input = 0   Input = 1
    Current        Next        Next
    s1 s0         s1 s0       s1 s0
    0  0          0  0        0  1
    0  1          0  0        1  1
    1  0          0  0        1  1
    1  1          1  0        1  1


3. We will use JK flip-flops for the circuit.

4. Next we add columns to the binary state table showing the JK inputs required in order to cause the correct state transitions.

               |        Input = 0         |        Input = 1
    Current    | Next                     | Next
    s1 s0      | s1 s0   J1 K1   J0 K0    | s1 s0   J1 K1   J0 K0
    0  0       | 0  0    0  X    0  X     | 0  1    0  X    1  X
    0  1       | 0  0    0  X    X  1     | 1  1    1  X    X  0
    1  0       | 0  0    X  1    0  X     | 1  1    X  0    1  X
    1  1       | 1  0    X  0    X  1     | 1  1    X  0    X  0

5. We use Karnaugh maps to derive equations for the JK flip-flop inputs.

    J0(In, s1, s0)         s1 s0
                     00   01   11   10
         In = 0       0    X    X    0
         In = 1       1    X    X    1

    K0(In, s1, s0)         s1 s0
                     00   01   11   10
         In = 0       X    1    1    X
         In = 1       X    0    0    X

    J1(In, s1, s0)         s1 s0
                     00   01   11   10
         In = 0       0    0    X    X
         In = 1       0    1    X    X

    K1(In, s1, s0)         s1 s0
                     00   01   11   10
         In = 0       X    X    0    1
         In = 1       X    X    0    0

$$
\begin{aligned}
J_0(\mathit{In}, s_1, s_0) &= \mathit{In} \\
K_0(\mathit{In}, s_1, s_0) &= \mathit{In}' \\
J_1(\mathit{In}, s_1, s_0) &= \mathit{In} \cdot s_0 \\
K_1(\mathit{In}, s_1, s_0) &= \mathit{In}' \cdot s_0'
\end{aligned}
$$

6. The circuit to implement this predictor is:

[Circuit diagram: two JK flip-flops, Q1 and Q0, clocked by CLK. The input is Actual; the outputs are s1 = Prediction and s0.]
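The predictor can be simulated in Python to see how it reacts to a stream of branch outcomes. This is a behavioral sketch under the assumption that each flip-flop takes on its next state once per branch; the input equations are J0 = In, K0 = In', J1 = In . s0, and K1 = In' . s0', as read from the binary state table in step 4. The function names are hypothetical.

```python
def jk_next_state(j, k, q):
    """JK flip-flop behavior: 00 keep, 01 reset, 10 set, 11 complement."""
    return (j & (1 - q)) | ((1 - k) & q)


def predictor_tick(s1, s0, actual):
    """Advance the state (s1, s0) given the actual outcome (1 = branch taken)."""
    j0, k0 = actual, 1 - actual                     # J0 = In,      K0 = In'
    j1, k1 = actual & s0, (1 - actual) & (1 - s0)   # J1 = In . s0, K1 = In' . s0'
    return jk_next_state(j1, k1, s1), jk_next_state(j0, k0, s0)


def predict(outcomes, state=(0, 0)):
    """Return the prediction made before each actual outcome in `outcomes`."""
    s1, s0 = state
    predictions = []
    for actual in outcomes:
        predictions.append(s1)   # the prediction is the high-order state bit
        s1, s0 = predictor_tick(s1, s0, actual)
    return predictions
```

Starting in the "No" state (00), the outcomes 1, 1, 1, 0, 1, 0, 0, 1 give the predictions 0, 0, 1, 1, 1, 1, 1, 0: the prediction flips only after two mistakes in a row.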


5.5 Memory Organization

In this section we will discuss how registers, SRAM, and DRAM are organized and constructed. Keeping with the intent of this book, the discussion will be introductory only.

5.5.1 Registers

Registers are used in places where small amounts of very fast memory are required. Many are found in the CPU where they are used for numerical computations, temporary data storage, etc. They are also used in the hardware that serves to interface between the CPU and other devices in the computer system.

We begin with a simple 4-bit register, which allows us to store four bits. Figure 5.28 shows a design for implementing a 4-bit register using D flip-flops. As described above, each time the clock cycles the state of each of the D flip-flops is set according to the value of $d = d_3 d_2 d_1 d_0$. The problem with this circuit is that any changes in any of the $d_i$s will change the state of the corresponding bit in the next clock cycle, so the contents of the register are essentially valid for only one clock cycle.

[Figure 5.28: A 4-bit register. A D flip-flop is used to hold each bit. The state of the i-th bit is set by the value of $d_i$ at each clock tick. The 4-bit value stored in the register is $r = r_3 r_2 r_1 r_0$.]

One-cycle buffering of a bit pattern is sufficient for some applications, but there is also a need for registers that will store a value until it is explicitly changed, perhaps billions of clock cycles later. The circuit in Figure 5.29 adds a load signal and feedback from the output of each bit. When load = 1 each bit is set according to its corresponding input, $d_i$. When load = 0 the output of each bit, $r_i$, is used as the input, giving no change. So this register can be used to store a value for as many clock cycles as desired. The value will not be changed until load is set to 1.
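The effect of the load signal can be sketched in Python. The sketch assumes the D input of each flip-flop is selected as load . d_i + load' . r_i, which is one way to realize the behavior described above; the class name and bit ordering are choices made for this illustration.

```python
class LoadRegister:
    """Behavioral sketch of a register with a load signal (hypothetical names)."""

    def __init__(self, width=4):
        self.r = [0] * width   # r[i] is the stored bit r_i

    def tick(self, d, load):
        """One clock tick: d is a list of input bits d_i, load is 0 or 1."""
        # Each flip-flop sees d_i when load = 1 and its own output r_i when load = 0.
        self.r = [(load & di) | ((1 - load) & ri) for di, ri in zip(d, self.r)]
        return self.r


reg = LoadRegister()
first = list(reg.tick([1, 0, 1, 1], load=1))    # value is stored
held = list(reg.tick([0, 0, 0, 0], load=0))     # inputs ignored, value kept
second = list(reg.tick([0, 1, 0, 0], load=1))   # new value replaces the old one
```

With load = 0 the stored bits survive any number of clock ticks, regardless of what appears on the data inputs.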

Most computers need many general purpose registers. When two or more registers are grouped together, the unit is called a register file. A mechanism must be provided for addressing one of the registers in the register file.

Consider a register file composed of eight 4-bit registers, r0 through r7. We could build eight copies of the circuit shown in Figure 5.29. Let the 4-bit data input, d, be connected in parallel to all of the corresponding data pins, $d_3 d_2 d_1 d_0$, of each of the eight registers. Three bits are required to address one of the registers ($2^3 = 8$). If each of the eight outputs from a 3 × 8 decoder is connected to the load input of a different register, d will be loaded into one, and only one, of the registers during the next clock cycle. All the other registers will have load = 0, and they will simply maintain their current state. Selecting the output from one of the eight registers can be done with four 8-input multiplexers. One such multiplexer is shown in Figure 5.30. The inputs r0_i through r7_i are the i-th bits from each of the eight registers, r0 through r7. One of the eight registers is selected for the 1-bit output, Reg_Out_i, by the 3-bit input Reg_Sel. Keep in mind that four of these output circuits would be required for 4-bit registers. The same Reg_Sel would be applied to all four multiplexers simultaneously in order to output all four bits of the same register. Larger registers would, of course, require correspondingly more multiplexers.

[Figure 5.29: A 4-bit register with load. The storage portion is the same as in Figure 5.28. When load = 1 each bit is set according to its corresponding input, $d_i$. When load = 0 the output of each bit, $r_i$, is used as the input, giving no change.]

[Figure 5.30: 8-way mux to select output of register file. This only shows the output of the i-th bit. n are required for n-bit registers. Reg_Sel is a 3-bit signal that selects one of the eight inputs.]

There is another important feature of this design that follows from the master/slave property of the D flip-flops. The state of the slave portion does not change until the second half of the clock cycle. So the circuit connected to the output of this register can read the current state during the first half of the clock cycle, while the master portion is preparing to change the state to the new contents.
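The register file described in this section can be modeled in Python as a decoder driving the load inputs and a multiplexer per output bit. This is a behavioral sketch only; the class, method, and parameter names (such as `write_sel`) are hypothetical, and the half-cycle timing is not modeled.

```python
def decoder_3x8(sel):
    """3 x 8 decoder: exactly one of the eight outputs is 1."""
    return [1 if i == sel else 0 for i in range(8)]


class RegisterFile:
    """Eight 4-bit registers, r0 through r7 (behavioral sketch)."""

    def __init__(self):
        self.regs = [0] * 8

    def tick(self, d, write_sel):
        """One clock tick: load the 4-bit value d into register write_sel."""
        loads = decoder_3x8(write_sel)
        # Registers whose load input is 0 simply keep their current state.
        self.regs = [(d & 0xF) if load else r
                     for load, r in zip(loads, self.regs)]

    def read(self, reg_sel):
        """Four 8-way multiplexers, one per bit, all driven by the same reg_sel."""
        bits = [(self.regs[reg_sel] >> i) & 1 for i in range(4)]
        return sum(bit << i for i, bit in enumerate(bits))


rf = RegisterFile()
rf.tick(0b1010, write_sel=3)
rf.tick(0b0111, write_sel=5)
```

After these two ticks, `rf.read(3)` returns 0b1010 and `rf.read(5)` returns 0b0111, while the other six registers still hold 0.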

5.5.2 Shift Registers

There are many situations where it is desirable to shift a group of bits. A shift register is a common device for doing this. Common applications include:

• Inserting a time delay in a bit stream.

• Converting a serial bit stream to a parallel group of bits.

• Converting a parallel group of bits into a serial bit stream.

• Shifting a parallel group of bits left or right to perform multiplication or division by powers of 2.

Serial-to-parallel and parallel-to-serial conversion is required in I/O controllers because most I/O communication is serial bit streams, while data processing in the CPU is performed on groups of bits in parallel.

A simple 4-bit serial-to-parallel shift register is shown in Figure 5.31. A serial stream of bits is input at $s_i$. At each clock tick, the output of Q0 is applied to the input of Q1, thus copying the previous value of $r_0$ to the new $r_1$. The state of Q0 changes to the value of the new $s_i$, thus copying this to be the new value of $r_0$. The serial stream of bits continues to ripple through the four bits of the shift register. At any time, the last four bits in the serial stream are available in parallel at the four outputs, $r_3, \ldots, r_0$.

[Figure 5.31: Four-bit serial-to-parallel shift register. A D flip-flop is used to hold each bit. Bits arrive at the input, $s_i$, one at a time. The last four input bits are available in parallel at $r_3$ through $r_0$.]

The same circuit could be used to provide a time delay of four clock ticks in a serial bit stream. Simply use $r_3$ as the serial output.
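The rippling of bits through the shift register can be sketched in Python. The class name is hypothetical, and the model simply moves every bit one position per clock tick.

```python
class ShiftRegister:
    """Behavioral sketch of a serial-to-parallel shift register."""

    def __init__(self, width=4):
        self.r = [0] * width   # r[0] holds the most recent bit, r[-1] the oldest

    def tick(self, s_in):
        """One clock tick: new r0 = s_in, new r1 = old r0, and so on."""
        self.r = [s_in] + self.r[:-1]


sr = ShiftRegister()
for bit in [1, 0, 1, 1]:
    sr.tick(bit)
parallel = list(sr.r)   # [r0, r1, r2, r3] after four clock ticks
```

After four ticks the first bit of the stream has reached r3, which is why r3 can serve as a serial output delayed by four clock ticks.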


5.5.3 Static Random Access Memory (SRAM)

There are several problems with trying to extend this design to large memory systems. First, although a multiplexer works for selecting the output from several registers, one that selects from many millions of memory cells is simply too large. From Figure 5.8 (page 90), we see that such a multiplexer would need an AND gate for each memory cell, plus an OR gate with an input for each of these millions of AND gate outputs.

We need another logic element called a tri-state buffer. The tri-state buffer has three possible outputs: 0, 1, and "high Z." "High Z" describes a very high impedance connection (see Section 4.4.2, page 70). It can be thought of as essentially "no connection" or "open."

It takes two inputs: data input and enable. The truth table describing a tri-state buffer is:

    Enable   In   Out
      0      0    highZ
      0      1    highZ
      1      0    0
      1      1    1

and its circuit symbol is shown in Figure 5.32. When Enable = 1 the output, which is equal to the input, is connected to whatever circuit element follows the tri-state buffer. But when Enable = 0, the output is essentially disconnected. Be careful to realize that this is different from 0; being disconnected means it has no effect on the circuit element to which it is connected.

[Figure 5.32: Tri-state buffer. Inputs: In and Enable; output: Out.]

A 4-way multiplexer using a 2 × 4 decoder and four tri-state buffers is illustrated in Figure 5.33. Compare this design with the 4-way multiplexer shown in Figure 5.8, page 90. The tri-state buffer design may not be an advantage for small multiplexers. But an n-way multiplexer without tri-state buffers requires an n-input OR gate, which presents some technical electronic problems.

[Figure 5.33: Four way multiplexer built from tri-state buffers. Output = w, x, y, or z, depending on which one is selected by $s_1 s_0$ fed into the decoder. Compare with Figure 5.8, page 90.]
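A tri-state multiplexer can be sketched in Python by letting `None` stand in for the "high Z" state. The names below are hypothetical, and the assignment of select values to inputs (00 selects w, 01 selects x, 10 selects y, 11 selects z) is an assumption made for this sketch.

```python
HIGH_Z = None  # stand-in for the "high Z" (disconnected) output


def tri_state(enable, data):
    """Tri-state buffer: pass data through when enabled, otherwise disconnect."""
    return data if enable else HIGH_Z


def decoder_2x4(s1, s0):
    """2 x 4 decoder: exactly one of the four outputs is 1."""
    sel = 2 * s1 + s0
    return [1 if i == sel else 0 for i in range(4)]


def mux4(s1, s0, w, x, y, z):
    """4-way multiplexer: the decoder enables exactly one tri-state buffer."""
    enables = decoder_2x4(s1, s0)
    outputs = [tri_state(e, v) for e, v in zip(enables, (w, x, y, z))]
    driven = [v for v in outputs if v is not HIGH_Z]
    assert len(driven) == 1   # only one buffer drives the shared output line
    return driven[0]
```

Because the disabled buffers contribute nothing to the shared output line, no OR gate is needed to combine them, which is the point of the tri-state design.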

Figure 5.34 shows how tri-state buffers can be used to implement a single memory cell. This circuit shows only one 4-bit memory cell so you can compare it with the register design in Figure 5.28, but it scales to much larger memories. Write is asserted to store data in the D flip-flops. Read enables the output tri-state buffer in order to connect the single output line to Mem_data_out. The address decoder is also used to enable the tri-state buffers to connect a memory cell to the output, $r_3 r_2 r_1 r_0$.

This type of memory is called Static Random Access Memory (SRAM). "Static" because the memory retains its stored values as long as power to the circuit is maintained. "Random access" because it takes the same length of time to access the memory at any address.


[Figure 5.34: 4-bit memory cell. Each bit is output through a tri-state buffer. addr_j is one output from a decoder corresponding to an address. Signals shown: $d_3$ through $d_0$, CLK, Read_enable, Write_enable, addr_j, and $r_3$ through $r_0$.]

A 1 MB memory requires a 20-bit address. This requires a $20 \times 2^{20}$ address decoder as shown in Figure 5.35. Recall from Section 5.1.3 (page 86) that an $n \times 2^n$ decoder requires $2^n$ AND gates. We can simplify the circuitry by organizing memory into a grid of rows and columns as shown in Figure 5.36. Although two decoders are required, each requires $2^{n/2}$ AND gates, for a total of $2 \times 2^{n/2} = 2^{(n/2)+1}$ AND gates for the decoders. Of course, memory cell access is slightly more complex, and some complexity is added in order to split the 20-bit address into two 10-bit portions.

[Figure 5.35: Addressing 1 MB of memory with one $20 \times 2^{20}$ address decoder. The short line through the connector lines indicates the number of bits traveling in parallel in that connection.]

[Figure 5.36: Addressing 1 MB of memory with two $10 \times 2^{10}$ address decoders.]
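The savings from the row and column organization can be worked out with a few lines of Python. The function names are hypothetical, and treating the high-order 10 bits as the row and the low-order 10 bits as the column is an assumption; the text does not say which half is which.

```python
def and_gates_single(n):
    """AND gates in one n x 2^n decoder."""
    return 2 ** n


def and_gates_split(n):
    """AND gates in two (n/2) x 2^(n/2) decoders; n is assumed to be even."""
    return 2 * 2 ** (n // 2)


def split_address(addr, n=20):
    """Split an n-bit address into (row, column) halves of n/2 bits each."""
    half = n // 2
    row = addr >> half                 # high-order n/2 bits (assumed to be the row)
    col = addr & ((1 << half) - 1)     # low-order n/2 bits (assumed to be the column)
    return row, col
```

For n = 20, a single decoder needs 1,048,576 AND gates, while the two 10-bit decoders together need only 2,048. As an example of the split, the address 0xABCDE separates into row 0x2AF and column 0x0DE.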

5.5.4 Dynamic Random Access Memory (DRAM)

Each bit in SRAM requires about six transistors for its implementation. A less expensive solution is found in Dynamic Random Access Memory (DRAM). In DRAM each bit value is stored by charging a capacitor to one of two voltages. The circuit requires only one transistor to charge the capacitor, as shown in Figure 5.37. This figure shows only four bits in a single row.

[Figure 5.37: Bit storage in DRAM. Signals shown: Row Address Select and Data Latch.]

When the "Row Address Select" line is asserted all the transistors in that row are turned on, thus connecting the respective capacitors to the Data Latch. The value stored in each capacitor, high voltage or low voltage, is stored in the Data Latch. There, it is available to be read from the memory. Since this action tends to discharge the capacitors, they must be refreshed from the values stored in the Data Latch.


When new data is to be stored in DRAM, the current values are first stored in the Data Latch, just as in a read operation. Then the appropriate changes are made in the Data Latch before the capacitors are refreshed.

These operations take more time than simply switching flip-flops, so DRAM is appreciably slower than SRAM. In addition, capacitors lose their charge over time. So each row of capacitors must be read and refreshed on the order of every 60 msec. This requires additional circuitry and further slows memory access. But the much lower cost of DRAM compared to SRAM warrants the slower access time.

This has been only an introduction to how switching transistors can be connected into circuits to create a CPU. We leave the details to more advanced books, e.g., [20], [23], [24], [28], [31], [34].

5.6 Exercises

The greatest benefit will be derived from these exercises if you build the circuits, either with hardware or with a simulation program. Several free circuit simulation applications are available that run under GNU/Linux.

5-1 (§5.1) Build a four-bit adder.

5-2 (§5.1) Build a four-bit adder/subtractor.

5-3 (§5.4) Redesign the 2-bit counter of Example 5-a using only the "set" and "reset" inputs of the JK flip-flops. So your state table will not have any "don't cares."

5-4 (§5.4) Design a 4-bit up counter: 0, 1, 2, ..., 15, 0, ...

5-5 (§5.4) Design a 4-bit down counter: 15, 14, 13, ..., 0, 15, ...

5-6 (§5.4) Design a decimal counter: 0, 1, 2, ..., 9, 0, ...

5-7 (§5.5) Build the register file described in Section 5.5.1. It has eight 4-bit registers. A 3 × 8 decoder is used to select a register to be loaded. Four 8-way multiplexers are used to select the four bits from one register to be output.

Chapter 6

Central Processing Unit

In this chapter we move on to consider a programmer's view of the Central Processing Unit (CPU) and how it interacts with memory. X86-64 CPUs can be used with either a 32-bit or a 64-bit operating system. The CPU features available to the programmer depend on the operating mode of the CPU. The modes of interest to the applications programmer are summarized in Table 6.1. With a 32-bit operating system, the CPU behaves essentially the same as an x86-32 CPU.

    Mode             Submode         Operating   Default          Default
                                     System      Address (bits)   int (bits)
    IA-32e or Long   64-bit          64-bit      64               32
                     Compatibility   64-bit      32               32
                                                 16               16
    Legacy           Protected       32-bit      32               32
                     Virtual-8086    32-bit      16               16
                     Real            16-bit      16               16

Table 6.1: X86-64 operating modes. Intel manuals use the term "IA-32e" and AMD manuals use "Long" when running a 64-bit operating system. Both manuals use the same terminology for the two sub-modes. Adapted from Table 1-1 in [2].

In this book we describe the view of the CPU when running a 64-bit operating system. Intel manuals call this the IA-32e mode and the AMD manuals call it the long mode. The CPU can run in one of two sub-modes under a 64-bit operating system. Both manuals use the same terminology for the two sub-modes.

• Compatibility mode: Most programs compiled for a 32-bit or 16-bit environment can be run without re-compiling.

• 64-bit mode: The program must be compiled for 64-bit execution.

The two modes cannot be mixed in the same program.

The discussion in this chapter focuses on the 64-bit mode. We will also point out the differences of the compatibility mode, which we will refer to as the 32-bit mode.

6.1 CPU Overview

An overall block diagram of a typical CPU is shown in Figure 6.1. The subsystems are connected together through internal buses. Keep in mind that this is a highly simplified diagram. Actual CPUs are much more complicated, but the general concepts discussed in this chapter apply to all of them.


[Figure 6.1: CPU block diagram. The CPU communicates with the Memory and I/O subsystems via the Address, Data, and Control buses. See Figure 1.1 (page 3). Blocks shown: Instruction Pointer, Instruction Register, Control Unit, Arithmetic/Logic Unit, Flags Register, L1 Cache Memory, Registers, and Bus Interface (to Address, Data, and Control Buses).]

We will now describe briefly each of the subsystems in Figure 6.1. The descriptions provided here are generic and apply to most CPUs. Components that are of particular interest to a programmer are described within the context of the x86 ISA later in this chapter.

Bus Interface: This is the means for the CPU to communicate with the rest of the computer system: Memory and I/O Devices. It contains circuitry to place addresses on the address bus, read and write data on the data bus, and read and write signals on the control bus. The bus interface on many CPUs interfaces with external bus control units that in turn interface with memory and with different types of I/O buses, e.g., SATA, PCI-E, etc. The external control units are transparent to the programmer.

L1 Cache Memory: Although it could be argued that this is not a part of the CPU, most modern CPUs include very fast cache memory on the CPU chip. As you will see in Section 6.4, each instruction must be fetched from memory. The CPU can execute instructions much faster than they can be fetched. The interface with memory makes it more efficient to fetch several instructions at one time, storing them in L1 cache where the CPU has very fast access to them. Many modern CPUs use two L1 cache memories organized in a Harvard architecture: one for instructions, the other for data. (See Section 1.2, page 4.) Its use is generally transparent to an applications programmer.

Registers: A register is a group of bits that is intended to be used as a variable in a program. Compilers and assemblers have names for each register. Almost all arithmetic and logic operations and data movement operations involve at least one register. See Section 6.2 for more details.

Instruction Pointer: This is a 64-bit register that always contains the address of the next instruction to be executed. See Section 6.2 for more details.

Instruction Register: This register contains the instruction that is currently being executed. Its bit pattern determines what the Control Unit is causing the CPU to do. Once that action has been completed, the bit pattern in the instruction register can be changed, and the CPU will perform the operation specified by this next bit pattern.


Most modern CPUs use an instruction queue that is built into the chip. Several instructions are waiting in the queue, ready to be executed. Separate electronic circuitry keeps the instruction queue full while the regular control unit is executing the instructions. But this is simply an implementation detail that allows the control unit to run faster. The essence of how the control unit executes a program is represented by the single instruction register model.

Control Unit: The bits in the Instruction Register are decoded in the Control Unit. It generates the signals that control the other subsystems in the CPU to carry out the action(s) specified by the instruction. It is typically implemented as a finite-state machine and contains Decoders (Section 5.1.3), Multiplexers (Section 5.1.4), and other logic components.

Arithmetic Logic Unit (ALU): A device that performs arithmetic and logic operations on groups of bits. The logic circuitry to perform addition is discussed in Section 5.1.1.

Flags Register: Each operation performed by the ALU results in various conditions that must be recorded. For example, addition can produce a carry. One bit in the Flags Register will be set to either zero (no carry) or one (carry) after the ALU has completed any operation that may produce a carry.

We will now look at how the logic circuits discussed in Chapter 4 can be used to implement some

of these subsystems.

6.2 CPU Registers

A portion of the memory in the CPU is organized into registers. Machine instructions access CPU registers by their addresses, just as memory contents are accessed. Of course, the register addresses are not placed on the address bus since the registers are in the CPU. The difference from a programmer's point of view is that the assembler has predefined names for the registers, whereas the programmer creates symbolic names for memory addresses. Thus in each program that you write in assembly language:

• CPU registers are accessed by using the names that are predefined in the assembler.

• Memory is accessed by the programmer providing a name for the memory location and using that name in the user program.

The x86-64 architecture registers are shown in Table 6.2. Each bit in each register is numbered from right to left, beginning with zero. So the right-most bit is number 0, the next one to the left number 1, etc. Since there are 64 bits in each register, the left-most bit is number 63.

The general purpose registers can be accessed in the following ways:

• Quadword: all 64 bits [63-0].

• Doubleword: the low-order 32 bits [31-0].

• Word: the low-order 16 bits [15-0].

• Byte: the low-order 8 bits [7-0] (and in four registers bits [15-8]).

The assembler uses a different name for each group of bits in a register. The assembler names for the groups of the bits are given in Table 6.3. In 64-bit mode, writing to an 8-bit or 16-bit portion of a register does not affect the other 56 or 48 bits in the register. However, when writing to the low-order 32 bits, the high-order 32 bits are set to zero.

[Margin note: Storing 32 bits sets top half of register to zero.]


Basic Programming Registers
    16   64-bit    General purpose (GPRs)
     1   64-bit    Flags
     1   64-bit    Instruction pointer
     6   16-bit    Segment

Floating Point Registers
     8   80-bit    Floating point data
     1   16-bit    Control
     1   16-bit    Status
     1   16-bit    Tag
     1   11-bit    Opcode
     1   64-bit    FPU Instruction Pointer
     1   64-bit    FPU Data Pointer

MMX Registers
     8   64-bit    MMX

XMM Registers
    16   128-bit   XMM
     1   32-bit    MXCSR

Model-Specific Registers (MSRs)
    These vary depending on the specific hardware implementation. They are only accessible to the operating system.

Table 6.2: The x86-64 registers. Not all the registers shown here are discussed in this chapter. Some are discussed in subsequent chapters that deal with the related topic.

    bits 63-0   bits 31-0   bits 15-0   bits 15-8   bits 7-0
    ---------   ---------   ---------   ---------   --------
    rax         eax         ax          ah          al
    rbx         ebx         bx          bh          bl
    rcx         ecx         cx          ch          cl
    rdx         edx         dx          dh          dl
    rsi         esi         si                      sil
    rdi         edi         di                      dil
    rbp         ebp         bp                      bpl
    rsp         esp         sp                      spl
    --------------------------------------------------------
    r8          r8d         r8w                     r8b
    r9          r9d         r9w                     r9b
    r10         r10d        r10w                    r10b
    r11         r11d        r11w                    r11b
    r12         r12d        r12w                    r12b
    r13         r13d        r13w                    r13b
    r14         r14d        r14w                    r14b
    r15         r15d        r15w                    r15b

Table 6.3: Assembly language names for portions of the general-purpose CPU registers. Programs running in 32-bit mode can only use the registers above the line in this table. 64-bit mode allows the use of all the registers. The ah, bh, ch, and dh registers cannot be used with any of the (8-bit) registers below the line.


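A pictorial representation of the naming of each portion of the general-purpose registers is shown in Figure 6.2.

[Diagram: three registers drawn as nested bars. rax (bits 63-0) contains eax (bits 31-0), which contains ax (bits 15-0), which is split into ah (bits 15-8) and al (bits 7-0). rsi contains esi, si, and sil in the same way, and r8 contains r8d, r8w, and r8b; neither has a named high-order byte.]

Figure 6.2: Graphical representation of general purpose registers. The three shown here are representative of the pattern of all the general purpose registers.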

The 8-bit register portions ah, bh, ch, and dh are a holdover from the Intel® 8086/8088 architecture. It had four 16-bit registers, ax, bx, cx, and dx. The low-order bytes were named al, bl, cl, and dl and the high-order bytes named ah, bh, ch, and dh. Access to these registers has been maintained in 32-bit mode for backward compatibility but is limited in 64-bit mode. Access to the 8-bit low-order portions of the rsi, rdi, rsp, and rbp registers was added along with the move to 64 bits in the x86-64 architecture but cannot be used in the same instruction with the 8-bit register portions of the xh registers.

When using less than the entire 64 bits in a register, it is generally bad to write code that assumes the remaining portion is in any particular state. Such code is difficult to read and leads to errors during its maintenance phase.

Although these are called "general purpose," the descriptions in Table 6.4 show that some of them have some special significance, depending upon how they are used. (Some of the descriptions may not make sense to you at this point.) In this book, we will use the rax, rdx, rdi, esi, and r8-r15 registers for general-purpose storage. They will be used just like variables in a high-level language. Usage of the rsp and rbp registers follows a very strict discipline. You should not use either of them for your assembly language programs until you understand how to use them.

The instruction pointer register, rip,¹ always points to the next instruction to be executed. As explained in Section 6.4 (page 123), every time an instruction is fetched, the rip register is automatically incremented by the control unit to contain the address of the next instruction. Thus, the rip register is never directly accessed by the programmer. On the other hand, every instruction that is executed affects the contents of the rip register. Thus, the rip register is not a general-purpose register, but it guides the flow of the entire program.

¹ In many other environments, the equivalent register is called the program counter.


    Register   Special usage                              Called function preserves contents
    --------   ----------------------------------------   ----------------------------------
    rax        1st function return value.                 No
    rbx        Optional base pointer.                     Yes
    rcx        Pass 4th argument to function.             No
    rdx        Pass 3rd argument to function;             No
               2nd function return value.
    rsp        Stack pointer.                             Yes
    rbp        Optional frame pointer.                    Yes
    rdi        Pass 1st argument to function.             No
    rsi        Pass 2nd argument to function.             No
    r8         Pass 5th argument to function.             No
    r9         Pass 6th argument to function.             No
    r10        Pass function's static chain pointer.      No
    r11                                                   No
    r12                                                   Yes
    r13                                                   Yes
    r14                                                   Yes
    r15                                                   Yes

Table 6.4: General purpose registers.

Most arithmetic and logical operations affect the condition codes in the rflags register. The bits that are affected are shown in Figure 6.3.

    bit:   11  10   9   8   7   6   5   4   3   2   1   0
    flag:  OF              SF  ZF      AF      PF      CF

Figure 6.3: Condition codes portion of the rflags register. The high-order 32 bits (32-63) are reserved for other use and are not shown here. Neither are bits 12-31, which are for system flags (see [3]).

The names of the condition codes are:

OF Overflow Flag

SF Sign Flag

ZF Zero Flag

AF Auxiliary carry or Adjust Flag

PF Parity Flag

CF Carry Flag

The OF, SF, ZF, and CF are described at appropriate places in this book. See [3] and [14] for descriptions of the other flags.

Two other registers are very important in a program. The rsp register is used as a stack

pointer, as will be discussed in Section 8.2 (page 158). The rbp register is typically used as a

base pointer; it will be discussed in Section 8.3 (page 164).


The "e" prefix on the 32-bit portion of each register name comes from the history of the x86 architecture. The introduction of the 80386 in 1985 brought an increase of register size from 16 bits to 32 bits. There were no new registers. The old ones were simply "extended."

6.3 CPU Interaction with Memory and I/O

The connections between the CPU and Memory are shown in Figure 6.4. This figure also includes the I/O (input and output) subsystem. The I/O system will be discussed in subsequent chapters. The control unit is connected to memory by three buses:

• address bus

• data bus

• control bus

Bus: a communication path between two or more devices. Several devices can be connected to one bus, but only two devices can be communicating over the bus at one time.

[Diagram: CPU, Memory, and I/O blocks, each connected to the Data Bus, the Address Bus, and the Control Bus.]

Figure 6.4: Subsystems of a computer. The CPU, Memory, and I/O subsystems communicate with one another via the three buses. (Repeat of Figure 1.1.)

As an example of how data can be stored in memory, let us imagine that we have some data in one of the CPU registers. Storing this data in memory is effected by setting the states of a group of bits in memory to match those in the CPU register. The control unit can be programmed to do this by

1. sending the memory address on the address bus,

2. sending a copy of the register bit states on the data bus, then

3. sending a "write" signal on the control bus.

For example, if the eight bits in memory at address 0x7fffd9a43cef are in the state:

    0x7fffd9a43cef: b7

the al register in the CPU is in the state:

    %al: e2

and the control unit is programmed to store this value at location 0x7fffd9a43cef, the control unit then

1. places 0x7fffd9a43cef on the address bus,

2. places the bit pattern e2 on the data bus, and

3. places a "write" signal on the control bus.

[Margin note: Store data in memory by writing it there.]

Then the bits at memory location 0x7fffd9a43cef will be changed to the state:

    0x7fffd9a43cef: e2

Important. When the state of any bit in memory or in a register is changed, any previous states are lost forever. There is no way to "undo" this state change nor to determine how the bit got in its current state.

6.4 Program Execution in the CPU

You may be wondering how the CPU is programmed. It contains a special register, the instruction register, whose bit pattern determines what the CPU will do. Once that action has been completed, the bit pattern in the instruction register can be changed, and the CPU will perform the operation specified by this next bit pattern.

Most modern CPUs use an instruction queue. Several instructions are waiting in the queue, ready to be executed. Separate electronic circuitry keeps the instruction queue full while the regular control unit is executing the instructions. But this is simply an implementation detail that allows the control unit to run faster. The essence of how the control unit executes a program is represented by the single instruction register model.

Since instructions are simply bit patterns, they can be stored in memory. The instruction pointer register always has the memory address of (points to) the next instruction to be executed. In order for the control unit to execute this instruction, it is copied into the instruction register. The situation is as follows:

1. A sequence of instructions is stored in memory.

2. The memory address where the first instruction is located is copied to the instruction pointer.

3. The CPU sends the address in the instruction pointer to memory on the address bus.

4. The CPU sends a "read" signal on the control bus.

5. Memory responds by sending a copy of the state of the bits at that memory location on the data bus, which the CPU then copies into its instruction register.

6. The instruction pointer is automatically incremented to contain the address of the next instruction in memory.

7. The CPU executes the instruction in the instruction register.

8. Go to step 3.

Steps 3, 4, and 5 are called an instruction fetch. Notice that steps 3-8 constitute a cycle, the instruction execution cycle. It is shown graphically in Figure 6.5.


[Flowchart boxes: "Fetch the instruction pointed to by the Instruction Pointer"; "Add number of bytes in the instruction to Instruction Pointer"; "Execute the instruction"; decision "Is it the halt instruction?" with branches No (continue the cycle) and Yes ("Stop CPU").]

Figure 6.5: The instruction execution cycle.


This raises a couple of questions:

• How do we get the instructions into memory? The instructions for a program are stored in a file on a storage device, usually a disk. The computer system is controlled by an operating system. When you indicate to the operating system that you wish to execute a program, e.g., by double-clicking on its icon, the operating system locates a region of memory large enough to hold the instructions in the program, then copies them from the file to memory. The contents in the file remain unchanged.²

• How do we create a file on the disk that contains the instructions? This is a multi-step process using several programs that are provided for you. The programs and the files that each creates are:

  - An editor is used to create source files. The source file is written in a programming language, e.g., C++. This is very similar to creating a file with a word processor. The main differences are that an editor is much simpler than a word processor, and the contents of the source file are written in the programming language instead of, say, English.

  - A compiler/assembler is used to create object files. The compiler translates the programming language in a source file into the bit patterns that can be used by a CPU (machine language). The source file contents remain unchanged.

  - A linker is used to create executable files. Most programs are made up of several object files. For example, a GNU/Linux installation includes many object files that contain the machine instructions to perform common tasks. These are programs that have already been written and compiled. Related tasks are commonly grouped together into a single file called a library. Whenever possible, you should use the short programs in these libraries to perform the computations your program needs rather than write them yourself. The linker program will merge the machine code from these several object files into one file.

You may have used an integrated development environment (IDE), e.g., Microsoft® Visual Studio®, Eclipse™, which combines all three of these programs into one package where each of the intermediate steps is performed automatically. You use the editor program to create the source file and then give the run command to the IDE. The IDE will compile the program in your source files, link the resulting object files with the necessary libraries, load the resulting executable file into memory, then start your program. In general, the intermediate object files resulting from the compilation of each source file are automatically deleted from the disk.

In this book we will explicitly perform each of these steps separately so we can learn the role of each program (editor, assembler, linker) used in preparing the application program.

6.5 Using gdb to View the CPU Registers

We will use the program in Listing 6.1 to illustrate the use of gdb to view the contents of the CPU registers. I have used the register storage class modifier to request that the compiler use a CPU register for the int wye variable. The register modifier is "advisory" only. See Exercise 6-3 for an example when the compiler may not be able to honor our request.

     1  /*
     2   * gdbExample1.c
     3   * Subtracts one from user integer.
     4   * Demonstrate use of gdb to examine registers, etc.
     5   * Bob Plantz - 5 June 2009
     6   */
     7
     8  #include <stdio.h>
     9
    10  int main(void)
    11  {
    12      register int wye;
    13      int *ptr;
    14      int ex;
    15
    16      ptr = &ex;
    17      ex = 305441741;
    18      wye = -1;
    19      printf("Enter an integer: ");
    20      scanf("%i", ptr);
    21      wye += *ptr;
    22      printf("The result is %i\n", wye);
    23
    24      return 0;
    25  }

Listing 6.1: Simple program to illustrate the use of gdb to view CPU registers.

² This is a highly simplified description. The details depend upon the overall system.

We introduced some gdb commands in Chapter 2. Here are some additional ones that will be used in this section:

[Margin note: Useful gdb commands.]

• n: execute current source code statement of a program that has been running; if it's a call to a function, the entire function is executed.

• s: execute current source code statement of a program that has been running; if it's a call to a function, step into the function.

• si: execute current (machine) instruction of a program that has been running; if it's a call to a function, step into the function.

• i r: info registers; displays the contents of the registers, except floating point and vector.

Here is a screen shot of how I compiled the program then used gdb to control the execution of the program and observe the register contents. My typing is boldface and the session is annotated in italics. Note that you will probably see different addresses if you replicate this example on your own (Exercise 6-1).

bob$ gcc -g -O0 -Wall -fno-asynchronous-unwind-tables \

> -fno-stack-protector -o gdbExample1 gdbExample1.c

The "-g" option is required. It tells the compiler to include debugger information in the executable program.

bob$ gdb gdbExample1

GNU gdb 6.8-debian

Copyright (C) 2008 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law. Type "show copying"

and "show warranty" for details.

This GDB was configured as "x86_64-linux-gnu"...

(gdb) li

7
8       #include <stdio.h>
9
10      int main(void)
11      {
12          register int wye;
13          int *ptr;
14          int ex;
15
16          ptr = &ex;
(gdb)
17          ex = 305441741;
18          wye = -1;
19          printf("Enter an integer: ");
20          scanf("%i", ptr);
21          wye += *ptr;
22          printf("The result is %i\n", wye);
23
24          return 0;
25      }
(gdb)

The li command lists ten lines of source code. The display is centered around the current line. Since I have not started execution of this program, the display is centered around the beginning of main. The display ends with the (gdb) prompt. Pushing the return key repeats the previous command, and li is smart enough to display the next ten lines.

(gdb) br 19

Breakpoint 1 at 0x400569: file gdbExample1.c, line 19.

(gdb) run

Starting program: /home/bob/my_book_64/progs/chap06/gdbExample1

Breakpoint 1, main () at gdbExample1.c:19
19          printf("Enter an integer: ");

I set a breakpoint at line 19 then run the program. When line 19 is reached, the program is paused before the statement is executed, and control returns to gdb.

(gdb) print ex

$1 = 305441741

(gdb) print &ex

$2 = (int *) 0x7fff504c473c

I use the print command to view the value assigned to the ex variable and learn its memory address.

(gdb) help x

Examine memory: x/FMT ADDRESS.

ADDRESS is an expression for the memory address to examine.

FMT is a repeat count followed by a format letter and a size letter.

Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),

t(binary), f(float), a(address), i(instruction), c(char) and s(string).

Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).

The specified number of objects of the specified size are printed

according to the format.

Defaults for format and size letters are those previously used.


Default count is 1. Default address is following last thing printed

with this command or "print".

The help command will provide very brief instructions on using a command. We want to display values stored in specific memory locations in various formats, and the help command provides a reminder of how to use the command.

(gdb) x/1dw 0x7fff504c473c

0x7fff504c473c: 305441741

I verify that the value assigned to the ex variable is stored at location 0x7fff504c473c.

(gdb) x/1xw 0x7fff504c473c

0x7fff504c473c: 0x1234abcd

I examine the same integer in hexadecimal format.

(gdb) x/4xb 0x7fff504c473c

0x7fff504c473c: 0xcd 0xab 0x34 0x12

Next, I examine all four bytes of the word, one byte at a time. In this display,

• 0xcd is stored in the byte at address 0x7fff504c473c,
• 0xab is stored in the byte at address 0x7fff504c473d,
• 0x34 is stored in the byte at address 0x7fff504c473e, and
• 0x12 is stored in the byte at address 0x7fff504c473f.

In other words, the byte-wise display appears to be backwards. This is due to the values being stored in the little endian storage scheme as explained on page 19 in Chapter 2.

(gdb) x/2xh 0x7fff504c473c

0x7fff504c473c: 0xabcd 0x1234

I also examine all four bytes of the word, two bytes at a time. In this display,

• 0xabcd is stored in the two bytes starting at address 0x7fff504c473c, and
• 0x1234 is stored in the two bytes starting at address 0x7fff504c473e.

This shows how gdb displays these four bytes as though they represent two 16-bit ints stored in little endian format. (You can now see why I entered such a strange integer in this demonstration run.)

(gdb) print ptr

$3 = (int *) 0x7fff504c473c
(gdb) print &ptr
$4 = (int **) 0x7fff504c4740

Look carefully at the ptr variable. It is located at address 0x7fff504c4740 and it contains another address, 0x7fff504c473c, that is, the address of the variable ex. It is important that you learn to distinguish between a memory address and the value that is stored there, which can be another memory address. Perhaps a good way to think about this is a group of numbered mailboxes, each containing a single piece of paper that you can write a single number on. You could write a number that represents a "data" value on the paper. Or you can write the address of a mailbox on the paper. One of the jobs of a programmer is to write the program such that it interprets the number appropriately: either a data value or an address.

[Margin note: Memory addresses can be stored in memory.]


(gdb) print wye

$5 = -1

(gdb) print &wye

Address requested for identifier "wye" which is in register $rbx

The compiler has honored our request and allocated a register for the wye variable. Registers are located in the CPU and do not have memory addresses, so gdb cannot print the address. We will need to use the i r command to view the register contents.

(gdb) i r

rax 0x7fff504c473c 140734540564284

rbx 0xffffffff 4294967295

rcx 0x0 0

rdx 0x7fff504c4838 140734540564536

rsi 0x7fff504c4828 140734540564520

rdi 0x1 1

rbp 0x7fff504c4750 0x7fff504c4750

rsp 0x7fff504c4730 0x7fff504c4730

r8 0x7ff0482a22e0 140669979599584

r9 0x7ff0482b6160 140669979681120

r10 0x7fff504c4590 140734540563856

r11 0x7ff047f534c0 140669976130752

r12 0x400460 4195424

r13 0x7fff504c4820 140734540564512

r14 0x0 0

r15 0x0 0

rip 0x400569 0x400569 <main+29>

eflags 0x206 [ PF IF ]

cs 0x33 51

ss 0x2b 43

ds 0x0 0

es 0x0 0

fs 0x0 0

gs 0x0 0

fctrl 0x37f 895

fstat 0x0 0

ftag 0xffff 65535

fiseg 0x0 0

fioff 0x0 0

foseg 0x0 0

---Type <return> to continue, or q <return> to quit---

fooff 0x0 0

fop 0x0 0

mxcsr 0x1f80 [ IM DM ZM OM UM PM ]

The i r command displays the current contents of the CPU registers. The first column is the name of the register. The second shows the current bit pattern in the register, in hexadecimal. Notice that leading zeros are not displayed. The third column shows some of the register contents in 64-bit signed decimal. The registers that always hold addresses are also shown in hexadecimal in the third column. The columns are often not aligned due to the tabbing of the display.

We see that the value in the ebx general purpose register is the same as that stored in the wye variable, 0xffffffff.³ (Recall that ints are 32 bits, even in 64-bit mode.) We conclude that the compiler chose to allocate ebx as the wye variable.

³ If this is not clear, you need to review Section 3.3.


Notice the value in the rip register, 0x400569. Refer back to where I set the breakpoint on source line 19. This shows that the program stopped at the correct memory location.

It is only coincidental that the address of the ex variable is currently stored in the rax register. If a general purpose register is not allocated as a variable within a function, it is often used to store results of intermediate computations. You will learn how to use registers this way in subsequent chapters of this book.

(gdb) br 21

Breakpoint 2 at 0x40058b: file gdbExample1.c, line 21.

(gdb) br 22

Breakpoint 3 at 0x400593: file gdbExample1.c, line 22.

These two breakpoints will allow us to examine the value stored in the wye variable just before and after it is modified.

(gdb) cont

Continuing.

Enter an integer: 123

Breakpoint 2, main () at gdbExample1.c:21

21          wye += *ptr;

(gdb) print ex

$6 = 123

(gdb) print wye

$7 = -1

This verifies that the user's input value is stored correctly and that the wye variable

has not yet been changed.

(gdb) cont

Continuing.

Breakpoint 3, main () at gdbExample1.c:22

22 printf("The result is %i\n", wye);

(gdb) print ex

$8 = 123

(gdb) print wye

$9 = 122

And this verifies that our (rather simple) algorithm works correctly.

(gdb) i r rbx rip

rbx 0x7a 122

rip 0x400593 0x400593 <main+71>

We can specify which registers to display with the i r command. This verifies that the rbx register is being used as the wye variable.

And we see that the rip has incremented from 0x400569 to 0x400593. Don't forget that the rip register always points to the next instruction to be executed.

(gdb) cont

Continuing.

The result is 122

Program exited normally.

(gdb) q

bob$

Finally, I continue to the end of the program. Notice that gdb is still running and I have to quit the gdb program.


6.6 Exercises

6-1 (§6.2, §6.5) Enter the program in Listing 6.1 and trace through the program one line at a time using gdb. Use the n command, not s or si. Keep a written record of the rip register at the beginning of each line. Hint: use the i r command. How many bytes of machine code are in each of the C statements in this program? Note that the addresses you see in the rip register may differ from the example given in this chapter.

6-2 (§6.2, §6.4) As you trace through the program in Exercise 6-1, stop on line 21:

    wye += *ptr;

We determined in the example above that the %rbx register is used for the variable wye. Inspect the registers.

a) What is the address of the first instruction that will be executed when you enter the n command?

b) How will %rbx change when this statement is executed?

6-3 (§6.5) Modify the program in Listing 6.1 so that a register is also requested for the ex variable. Were you able to convince the compiler to do this for you? Did the compiler produce any error or warning messages? Why do you think the compiler would not use a register for this variable?

6-4 (§6.2, §6.5) Use the gdb debugger to observe the contents of memory in the program from Exercise 2-31. Verify that your algorithm creates a null-terminated string without the newline character.

6-5 (§6.2, §6.5) Write a program in C that allows you to determine the endianness of your computer. Hint: use unsigned char *ptr.

6-6 (§6.2, §6.5) Modify the program in Exercise 6-5 so that you can demonstrate, using gdb, that endianness is a property of the CPU. That is, even though a 32-bit int is stored little endian in memory, it will be read into a register in the "proper" order. Hint: declare a second int that is a register variable; examine memory one byte at a time.

Chapter 7

Programming in Assembly Language

While reading this chapter, you should also consult the info resources available in most GNU/Linux installations for both the make and the as programs. Appendix B provides a general tutorial for writing Makefiles, but you need to get the details from info. info is especially important for learning about as's assembler directives.

You should also reread the Development Environment section on page xviii.

Creating a program in assembly language is essentially the same as creating one in a high-level compiled language like C, C++, Java, FORTRAN, etc. We will begin the chapter by looking in detail at the steps involved in creating a C program. Then we will look at which of these steps apply to assembly language programming.

7.1 Creating a New Program

You have probably learned how to program using an Integrated Development Environment (IDE), which incorporates several programs within a single user interface:

1. A text editor is used to write the source code and save it in a file.

2. A compiler translates the source code into machine language that can be executed by the CPU.

3. A linker is used to integrate all the functions in your program, including externally accessed libraries of functions, and to determine where each component will be loaded into memory when the program is executed.

4. A loader is used to load the machine code version of the program into memory where the CPU can execute it.

5. A debugger is used to help the programmer locate errors that may have crept into the program. (Yes, none of us is perfect!)

You enter your source code in the text editor part, click on a "build" button to compile and link your program, then click on a "run" button to load and execute the program. There is typically a "debug" button that loads and executes the program under control of the debugger program if you need to debug it. The individual steps of program preparation are obscured by the IDE user interface. In this book we use the GNU programming environment in which each step is performed explicitly.

Several excellent text editors exist for GNU/Linux, each with its own "personality." My "favorite" changes from time to time. I recommend trying several that are available to you and deciding which one you prefer. You should avoid using a word processor to create source files because it will add formatting to the text (unless you explicitly specify text-only). Text editors I have used include:

• gedit is probably installed if you are using the gnome desktop.

• kate is probably installed if you are using the kde desktop.

• vi is supposed to be installed on all Linux (and Unix) systems. It provides a command line user interface that is mode oriented. Text is manipulated through keyboard commands. Several commands place vi in "text insert" mode. The 'esc' key is used to return to command mode. Most installations include vim (Vi IMproved) which has additional features helpful in editing program source code.

• emacs also has a command line user interface. Text is inserted directly. The 'ctrl' and 'meta' keys are used to specify keyboard sequences for manipulating text.

GUI interfaces are available for both vi and emacs. Any of these, and many other, text editors would be an excellent choice for the programming covered in this book. Don't spend too much time trying to pick the "best" one.

The GNU programming tools are executed from the command line instead of a graphical user interface (GUI). (IDEs for Linux and Unix are typically GUI frontends that execute GNU programming tools behind the scenes.) The GNU compiler, gcc, creates an executable program by performing several distinct steps [22]. The description here assumes a single C source file, filename.c.

1. Preprocessing. This resolves compiler directives such as #include (file inclusion), #define
(macro definition), and #if (conditional compilation) by invoking the program cpp. Compilation
can be stopped at the end of the preprocessing phase with the -E option, which
writes the resulting C source code to standard output.

2. Compilation itself. The source code that results from preprocessing is translated into
assembly language. Compilation can be stopped at the end of the compilation phase with the
-S option, which writes the assembly language source code to filename.s.

3. Assembly. The assembly language source code that results from compilation is translated
into machine code by invoking the as program. Compilation can be stopped at the end of
the assembly phase with the -c option, which writes the machine code to filename.o.

4. Linking. The machine code that results from assembly is linked with other machine code

from standard C libraries and other machine code modules, and addresses are resolved.

This is accomplished by invoking the ld program. The default is to write the executable

file, a.out. A different executable file name can be specified with the -o option.
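A small file makes the preprocessing step easier to watch. The sketch below is only an illustration: the file name stages.c, the macros SCALE and VERBOSE, and the two functions are made up for this example. Running gcc -E stages.c should show the source after cpp has replaced SCALE with 4 and discarded the #if branch that does not apply; gcc -S stages.c and gcc -c stages.c stop after the compilation and assembly steps in the same way.

```c
/* stages.c (hypothetical file name)
 * A tiny translation unit for watching the gcc steps one at a time.
 */
#include <stddef.h>   /* file inclusion: brings in the definition of size_t */

#define SCALE 4       /* macro definition: cpp substitutes 4 for SCALE */
#define VERBOSE 0     /* used only to select a branch below */

/* Returns n multiplied by SCALE; after preprocessing the body reads n * 4. */
int scaled(int n)
{
#if VERBOSE
    /* conditional compilation: cpp removes this branch because VERBOSE is 0 */
    return -1;
#else
    return n * SCALE;
#endif
}

/* Returns the number of elements in a fixed table, using size_t from stddef.h. */
size_t table_length(void)
{
    static const int table[] = { 1, 2, 3 };
    return sizeof table / sizeof table[0];
}
```

The file has no main function, so it can be taken through the -E, -S, and -c steps but cannot be linked into an executable by itself.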

7.2 Program Organization

Programs written in C are organized into functions. Each function has a name that is unique
within the program. Program execution begins with the function named "main."

I recommend that you create a separate directory for each program you write. Place all the source
files, plus the Makefile (see Appendix B) for the program, in this directory. This will help you keep
your program files organized.

Let us consider the minimum C program, Listing 7.1.

/*
 * doNothingProg1.c
 * The minimum components of a C program.
 * Bob Plantz - 6 June 2009
 */

int main(void)
{
    return 0;
}

Listing 7.1: A "null" program (C).

The only thing this program does is return a zero.

Despite the fact that this program accomplishes very little, some instructions need to be
executed just to return zero. In order to see what takes place, we first translate this program
from C to assembly language with the GNU/Linux command:

gcc -S -O0 doNothingProg1.c

This creates the file doNothingProg1.s (see Listing 7.2), which contains the assembly language
generated by the gcc compiler. The two compiler options used here have the following meanings:

-S Causes the compiler to create the .s file, which contains the assembly language equivalent
of the source code. The machine code (.o file) is not created.

-O0 Do not do any optimization. For instructional purposes, we want to see every step of the
assembly language. (This is upper-case "oh" followed by the numeral zero.)

1 .file "doNothingProg1.c"

2 .text

3 .globl main

4 .type main, @function

5 main:

6 .LFB2:

7 pushq %rbp

8 .LCFI0:

9 movq %rsp, %rbp

10 .LCFI1:

11 movl $0, %eax

12 leave

13 ret

14 .LFE2:

15 .size main, .-main

16 .section .eh_frame,"a",@progbits

17 .Lframe1:

18 .long .LECIE1-.LSCIE1

19 .LSCIE1:

20 .long 0x0

21 .byte 0x1

22 .string "zR"

23 .uleb128 0x1

24 .sleb128 -8

25 .byte 0x10

26 .uleb128 0x1

27 .byte 0x3

28 .byte 0xc

29 .uleb128 0x7

30 .uleb128 0x8

31 .byte 0x90

32 .uleb128 0x1

33 .align 8

34 .LECIE1:

35 .LSFDE1:

36 .long .LEFDE1-.LASFDE1

37 .LASFDE1:

38 .long .LASFDE1-.Lframe1

39 .long .LFB2

40 .long .LFE2-.LFB2

41 .uleb128 0x0

42 .byte 0x4

43 .long .LCFI0-.LFB2

44 .byte 0xe

45 .uleb128 0x10

46 .byte 0x86

47 .uleb128 0x2

48 .byte 0x4

49 .long .LCFI1-.LCFI0

50 .byte 0xd

51 .uleb128 0x6

52 .align 8

53 .LEFDE1:

54 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

55 .section .note.GNU-stack,"",@progbits

Listing 7.2: A "null" program (gcc assembly language). Much of the code the compiler generates
(lines 16–53) is meant to improve the efficiency of the program or for debugging
and is not relevant to the concepts discussed in this book.

Unlike the relationship between assembly language and machine language, there is not a one-to-one
relationship between higher-level languages and assembly language. The assembly language generated
by a compiler may differ across different releases of the compiler, and different optimization levels
will generally affect the code that is generated by the compiler. The code in Listing 7.2 was generated
by release 4.3.3 of gcc (see the .ident directive on line 54) and the optimization level was -O0
(no optimization). If you attempt to replicate this example, your results may vary.

This is not easy to read, even for an experienced assembly language programmer. So we will
start with the program in Listing 7.3, which was written in assembly language by a programmer
(rather than by a compiler). Naturally, the programmer has added comments to improve
readability.

1 # doNothingProg2.s

2 # Minimum components of a C program, in assembly language.

3 # Bob Plantz - 6 June 2009

4

5 .text

6 .globl main

7 .type main, @function

8 main: pushq %rbp # save caller's frame pointer

9 movq %rsp, %rbp # establish our frame pointer

10

11 movl $0, %eax # return 0 to caller

12 movq %rbp, %rsp # restore stack pointer

13 popq %rbp # restore caller's frame pointer

14 ret # back to caller

Listing 7.3: A "null" program (programmer assembly language).

After examining what the assembly language programmer did, we will return to Listing 7.2 and
look at the assembly language generated by the compiler.

Assembly language provides a set of mnemonics that have a one-to-one correspondence to
the machine language. A mnemonic is a short, English-like group of characters that suggests
the action of the instruction. For example, "mov" is used to represent the instruction that copies
("moves") a value from one place to another. Thus, the machine instruction

4889E5

copies the entire 64-bit value in the rsp register to the rbp register. Even if you have never seen
assembly language before, the mnemonic representation of this instruction in Listing 7.3,

9 movq %rsp, %rbp # establish our frame pointer

probably makes much more sense to you than the machine code. (The 'q' suffix on "mov" means

a quadword (64 bits) is being moved.)
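To see how much work the mnemonic saves, the three bytes 48 89 E5 can be taken apart by hand. The sketch below assumes the usual x86-64 instruction layout (a REX prefix byte, an opcode byte, and a ModRM byte), which this chapter does not cover; the function names are made up for the illustration.

```c
#include <stdint.h>

/* The machine code for movq %rsp, %rbp, one byte per element. */
static const uint8_t movq_rsp_rbp[3] = { 0x48, 0x89, 0xE5 };

/* REX prefixes have the bit pattern 0100WRXB; W = 1 selects 64-bit operands. */
int rex_w(uint8_t rex)       { return (rex >> 3) & 1; }

/* ModRM byte layout: mod (2 bits) | reg (3 bits) | r/m (3 bits). */
int modrm_mod(uint8_t modrm) { return (modrm >> 6) & 3; }
int modrm_reg(uint8_t modrm) { return (modrm >> 3) & 7; }
int modrm_rm(uint8_t modrm)  { return modrm & 7; }

/* Register numbers used in the ModRM fields (ignoring REX extension bits). */
const char *reg64_name(int number)
{
    static const char *names[8] = {
        "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
    };
    return names[number & 7];
}
```

For opcode 0x89 the reg field names the source and the r/m field names the destination. Applied to the ModRM byte 0xE5, the mod field is 3 (both operands are registers), the reg field is 4 (rsp), and the r/m field is 5 (rbp), which matches the mnemonic form.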

Strictly speaking, the mnemonics are completely arbitrary, as long as you have an assembler
program that will translate them into the desired machine instructions. However, most assembler
programs more or less use the mnemonics used in the manuals provided by CPU vendors.

The first thing to notice is that assembly language is line-oriented. That is, there is only one
assembly language statement on each line, and none of the statements here spans more than one line.
A statement can continue onto subsequent lines, but this requires a special line-continuation
character. This differs from the "free form" nature of C/C++, where the line structure is
irrelevant. In fact, good C/C++ programmers take advantage of this to improve the readability of
their code.

Next, notice that the pattern of each line falls into one of three categories:

Lines 1–3 begin with the "#" character. The rest of the line is written in English and is
easily read. The "#" character in the first column designates a comment line. Just as with
a high-level language, comments are intended solely for the human reader and have no
effect on the program.

Lines 4 and 10 have been left blank in order to improve readability. (Well, they will improve
readability once you learn how to read assembly language.)

The remaining nine lines are organized into columns. They probably do not make much
sense to you at this point because they are written in assembly language, but if you look
carefully, each of the assembly language lines is organized into four possible fields:

label: operation operand(s) #comment

The assembler requires at least one space or tab character to separate the fields. When
writing assembly language, your program will be much easier to read if you use the tab
key to move from one field to the next.

Use the tab key for readability.

Let us consider each field:

1. The label field allows us to give a symbolic name to any line in the program. Since each
line corresponds to a memory location in the program, other parts of the program can then
refer to the memory location by name.

(a) A label consists of an identifier immediately followed by the ":" character. You, as the
programmer, must make up these identifiers. The rules for creating an identifier are
given below.

(b) Notice that most lines are not labeled.

2. The operation field provides the basic purpose of the line. There are two types of operations:

(a) assembly language mnemonic: The assembler translates these into actual machine
instructions, which are copied into memory when the program is to be executed. Each
machine instruction will occupy from one to fifteen bytes of memory.

(b) assembler directive (pseudo op): Each of these operations begins with the period
(".") character. They are used to direct the way in which the assembler translates the
file. They do not translate directly into machine instructions, although some do cause
memory to be allocated.

Assembler directives are called Pseudo Ops in info as. Read about Pseudo Ops.

3. The operand field specifies the arguments to be used by the operation. The arguments are
specified in several different ways:

(a) an explicit or literal value, e.g., the integer 75.

(b) a name that has meaning to the assembler, e.g., the name of a register.

(c) a name that is made up by the programmer, e.g., the name of a variable or a constant.

Different operations require differing numbers of operands: zero, one, two, or three.

4. The comment field is just like a comment line, except it takes up only the remainder of
the line. Since assembly language is not as easy to read as higher-level languages, good
programmers will place a comment on almost every line.

The rules for creating an identifier are very similar to those for C/C++. Each identifier
consists of a sequence of alphanumeric characters and may include other printable characters
such as ".", "_", and "$". The first character must not be a numeral. An identifier may be
any length, and all characters are significant. Case is also significant. For example, "myLabel"
and "MyLabel" are different. Compiler-generated labels begin with the "." character, and many
system related names begin with the "_" character. It is a good idea to avoid beginning your
own labels with the "." or the "_" character so that you do not inadvertently create one that is
already in use by the system.

Identifiers are called symbols in info as. Read about symbol names.
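The identifier rules can be written down as a short check. This is only a sketch of the rules as stated in the text, not of what the assembler itself does: it assumes that ".", "_", and "$" are the only extra characters allowed, and both function names are made up for the illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if name follows the identifier rules stated in the text, else 0.
 * Assumption: '.', '_', and '$' are the only non-alphanumeric characters allowed. */
int is_valid_identifier(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return 0;                       /* an identifier needs at least one character */
    if (isdigit((unsigned char)name[0]))
        return 0;                       /* the first character must not be a numeral */
    for (size_t i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '.' && c != '_' && c != '$')
            return 0;                   /* only letters, digits, '.', '_', and '$' */
    }
    return 1;
}

/* Returns 1 if the name starts with '.' or '_', which the text advises avoiding. */
int looks_system_reserved(const char *name)
{
    return name != NULL && (name[0] == '.' || name[0] == '_');
}
```

With these definitions, "myLabel" and ".LFB2" are both accepted as identifiers, "9lives" is rejected because it starts with a numeral, and ".LFB2" is flagged as a name to avoid in your own code.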

Integers can be used as labels, but they have a special meaning. They are used as local labels, which
are sometimes useful in advanced assembly language programming techniques. They will not be
used in this book.

The assembler program, as, will translate the file doNothingProg2.s (see Listing 7.3) into
machine code and provide the memory allocation information for the operating system to use
when the program is executed. We will first describe the contents of this file, then look at the
GNU commands to convert it into an executable program.

Now we turn attention to the specific file in Listing 7.3, doNothingProg2.s. On line 5 you

recognize

5 .text

as an assembler directive because it starts with a period character. It directs the assembler to
place whatever follows in the text section.

What does "text section" mean? When a source code file is translated into machine code, an
object file is produced. The object file organization follows the Executable and Linking Format
(ELF). ELF files can be seen from two different points of view. Programs that store information
in ELF files store it in sections. The ELF standard specifies many different types of sections,
each depending on the type of information stored in it.

The .text directive specifies that when the following assembly language statements are
translated into machine instructions, they should be stored in a text section in the object file. Text
sections are used to store program instructions in machine code format.

GNU/Linux divides memory into different segments for specific purposes when a program is
loaded from the disk. The four general categories of memory segments are:

text (also called code) is where program instructions and constant data are stored. It is
read-only memory. The operating system prevents a program from changing anything
stored in the text segment.

data is where global variables and static local variables are stored. It is read-write memory
and remains in place for the duration of the program.

stack is where automatic local variables and the data that links functions are stored. It is
read-write memory that is allocated and deallocated dynamically as the program executes.

heap is the pool of memory available when a C program calls the malloc function (or C++
calls new). It is read-write memory that is allocated and deallocated by the program.
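The four categories correspond to familiar C storage classes. The sketch below is only an illustration; the variable and function names are made up, and the comments state where each object would typically be placed according to the description above.

```c
#include <stdlib.h>

int call_count = 0;                 /* global variable: data segment */

/* Returns how many times it has been called so far. */
int next_id(void)
{
    static int id = 0;              /* static local: data segment, keeps its value */
    id++;
    call_count++;
    return id;
}

/* Returns the sum 1 + 2 + ... + n using automatic variables. */
int sum_to(int n)
{
    int total = 0;                  /* automatic local: stack, gone after return */
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Returns a heap block of n ints set to zero, or NULL; the caller frees it. */
int *make_zeroed(size_t n)
{
    int *block = malloc(n * sizeof *block);   /* heap: lives until free() */
    if (block != NULL) {
        for (size_t i = 0; i < n; i++)
            block[i] = 0;
    }
    return block;
}
```

The machine instructions for all three functions go in the text segment. Calling next_id twice returns 1 and then 2 because id stays in the data segment between calls, whereas total in sum_to is created afresh on the stack each time.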

The operating system needs to view an ELF file as a set of segments. One of the functions
of the ld program is to group sections together into segments so that they can be loaded into
memory. Each segment contains one or more sections. This grouping is generally accomplished
by arrays of pointers to the file, not necessarily by physically moving the sections. That is, there
is still a section view of the ELF file remaining. So the information stored in an ELF file is
grouped into sections, but it may or may not also be grouped into segments.

All ELF files have sections, but only some have segments.

When the operating system loads the program into memory, it uses the segment view of the
ELF file. Thus the contents of all the text sections will be loaded into the text segment of the
program process.

This has been a very simplistic overview of ELF sections and segments. We will touch on the
subject again briefly in Section 8.1. Further details can be found by reading the man page for elf
and sources like [13] and [21]. The readelf program is also useful for learning about ELF files.
It is included in the binutils collection of the GNU binary tools, so it is installed along with as and
ld.

The assembler directive on line 6

6 .globl main

has one operand, the identifier "main." As you know, all C/C++ programs start with the function
named "main." In this book, we also start our assembly language programs with a main function
and execute them within the C/C++ runtime environment. The .globl directive makes the name
globally known, analogous to defining an identifier outside a function body in C/C++ (see
footnote 1). That is, code outside this file can refer to this name. When a program is executed,
the operating system does some preliminary set up of system resources. It then starts program
execution by calling a function named "main," so the name must be global in scope.

One can write stand-alone assembly language programs. In GNU/Linux this is accomplished by
using the __start label on the first instruction in the program. The object (.o) files are then linked
using the ld command directly rather than using gcc. See Section 8.5.

The assembler directive on line 7

7 .type main, @function

has two operands, a name and a type. The name is entered into the symbol table (see Section
7.3). In addition to the machine code, the object file contains the symbol table along with
information about each symbol. The ELF format recognizes two types of symbols, data and function.
The .type directive is used here to specify that the symbol main is the name of a function.

None of these three directives get translated into actual machine instructions, and none

of them occupy any memory in the finished program. Rather, they are used to describe the

characteristics of the statements that follow.

IMPORTANT! You need to distinguish

assembler directives: instructions to the assembler (the program that translates
assembly language into machine code)

from

assembly language instructions: the code that gets translated into machine code.

What follows next in Listing 7.3 are the actual assembly language instructions. They will
occupy memory when they are translated. The first instruction is on line 8:

8 main: pushq %rbp # save caller's frame pointer

Footnote 1. Function names are defined outside the function body (outside the {. . .} block) in
C/C++. Hence, the names are global, and a function can call functions defined in other files.
Variables can also be declared outside functions. Functions in other files can reference such
variables using the extern storage class specifier.

It illustrates the use of all four fields on a line of assembly language.

1. First, there is a label on this line, main. Since this name has been declared as a global
name (with the assembler directive .globl main), functions defined in other files can call
this function by name. In particular, after the operating system has loaded this function
into memory, it can call main, and execution will start with this line.

2. The operation is a pushq instruction, which stands for "push quadword." It "pushes" a
value onto the call stack. This will be explained in Section 8.2 (page 158). For now, this is
a technique for temporarily saving the value stored in the operand.

The "quadword" part of this instruction means that 64 bits are moved. As you will see in
more detail later, as requires that a single letter be appended to most instructions
to specify the size of the operand(s):

"b"   "byte"       operand is 8 bits
"w"   "word"       operand is 16 bits
"l"   "long"       operand is 32 bits
"q"   "quadword"   operand is 64 bits

3. There is one operand, %rbp. The GNU assembler requires the "%" prefix on the operand to
indicate that this is the name of a register in the cpu. This instruction saves the 64-bit
value in the rbp register on the call stack.

The value in the rbp register is an address. In 64-bit mode addresses can be 64 bits long,
and we have to save the entire address.

Addresses can be 64 bits.

4. Finally, we have added a comment to this line. The comment shows that the purpose of

this instruction is to save the value that the calling function was using as a frame pointer.

(The reasons for doing this will be explained in Chapter 8.)

The next line

9 movq %rsp, %rbp # establish our frame pointer

uses only three of the fields.

1. First, there is no label on this line. Notice that the label field is left blank by using the tab
key to indent into the second field, the operation field. It is important for readability that
you use the tab key to keep the beginning of each field lined up in a column.

2. The operation is a movq instruction, which stands for "move quadword." It "moves" a
bit pattern from one location to another. Actually, "copy" is probably a better term than
"move," because it does not change the bit pattern in the place copied from. But "move"
has become the accepted terminology for this operation.

3. There are two operands, %rsp and %rbp. Again, the "%" prefix to each operand means that
it is the name of a register in the cpu.

(a) The order of the operands in as is: source, destination.

(b) Thus this instruction copies the 64-bit value in the rsp register to the rbp register in
the cpu.

4. Finally, I have added a comment to this line. The comment shows that the purpose of this
instruction is to establish a new frame pointer in this function. (Again, the reasons for
doing this will be explained in Chapter 8.)

As the name of this "program" implies, it does not do anything, but it still must return to
the operating system. GNU/Linux expects the main function to return an integer to it, and the
return value is placed in the eax register. Zero means that the program executed with no errors.

Function return value goes in eax register.

This may not make a lot of sense to you at this point, but it should become clearer later in the
book. Returning the integer zero to the operating system is accomplished on line 11:

11 movl $0, %eax # return 0 to caller

The suffix on movl is an ell (for "long"), not a one.

1. This line also has no label. After indenting, it begins with a movl instruction.

2. The first operand is prefixed with a "$" character, which indicates that the operand is to
be taken as a literal value. That is, the source operand is the integer zero. You recognize
that the second operand is the eax register in the cpu. This instruction places a copy of the
32-bit integer zero in the eax register.

Even though the CPU is in 64-bit mode, 64-bit integers are seldom needed. So the default
behavior of the environment is to use 32 bits for ints. 64-bit integers can be specified in C/C++
with either the long or the long long modifier. In assembly language the programmer
would use quadwords for such integers. (As pointed out on page 118, this instruction also zeros
the high-order 32 bits of the rax register. But you should not write code that depends upon
this behavior.)

ints are 32 bits.

3. The comment on this line shows that the purpose of this instruction is to return a zero to

the calling function (the operating system).

The first two instructions in this function,

8 main: pushq %rbp # save caller's frame pointer

9 movq %rsp, %rbp # establish our frame pointer

form a prologue to the actual processing that is performed by the function. They changed some
values in registers and used the call stack. Before returning to the operating system, it is
essential that an epilogue be executed to restore the values. The compiler uses the leave instruction
(see Listing 7.2) to accomplish this. The leave instruction is equivalent to the following two
instructions:

12 movq %rbp, %rsp # restore stack pointer

13 popq %rbp # restore caller's frame pointer

1. No labels are used on these lines. The movq instruction ensures that the stack pointer
is moved back to the location where the rbp register was saved. Since the stack pointer
was not changed after the prologue in this function, this instruction is not necessary here. But
your program will crash if the stack pointer is not in the correct location when the next
instruction is executed, so it is a good idea to get into the habit of always using both these
instructions at the end of a function.

2. The popq instruction copies the 64-bit value on the top of the call stack into the operand
and moves the stack pointer accordingly. (You will learn about using the stack pointer in
Section 8.2.) The operand in this case is the rbp register.

3. The comment states that the reason for the popq instruction is to restore the frame pointer
value for the calling function (the operating system, since this is main).

4. Although the leave instruction is slightly more efficient, we will use the movq and popq
instructions in this book to emphasize the two operations that must be performed.
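The way the prologue and epilogue cancel each other can be checked with a toy model. The sketch below is not how the hardware is built: it anticipates the call stack behavior described in Section 8.2, counts the stack in 8-byte slots rather than byte addresses, and uses a made-up struct cpu with only the two registers involved.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* A toy model of two registers and a small call stack (hypothetical names). */
struct cpu {
    uint64_t rsp;                    /* index of the slot at the top of the stack */
    uint64_t rbp;                    /* frame pointer */
    uint64_t stack[STACK_WORDS];     /* each slot stands for 8 bytes of memory */
};

/* pushq: move the stack pointer down one slot, then store the value there. */
void pushq(struct cpu *c, uint64_t value)
{
    c->rsp--;
    c->stack[c->rsp] = value;
}

/* popq: read the top slot, then move the stack pointer back up one slot. */
uint64_t popq(struct cpu *c)
{
    uint64_t value = c->stack[c->rsp];
    c->rsp++;
    return value;
}

/* Prologue: pushq %rbp ; movq %rsp, %rbp */
void prologue(struct cpu *c)
{
    pushq(c, c->rbp);
    c->rbp = c->rsp;
}

/* Epilogue: movq %rbp, %rsp ; popq %rbp (the same effect as leave) */
void epilogue(struct cpu *c)
{
    c->rsp = c->rbp;
    c->rbp = popq(c);
}
```

In this model, whatever the body of a function does to rsp between the two calls, epilogue puts rsp and rbp back to the values they had before prologue ran, which is why the movq in the epilogue is worth keeping even when it looks redundant.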

Finally, this function must return to the function that called it, which is back in the operating

system.

14 ret # back to caller

1. This line has no label, and the instruction does not specify any operands. This is the
instruction for returning program control back to the function that called this one. In
this particular case, since this is the main function, control is passed back to the operating
system.

2. Here is an example of an instruction that changes the value in the instruction pointer
register (rip) in order to alter the linear flow of the program. We will see later the mechanism
that is used to implement this.

3. The comment on this line briefly describes the reason for the instruction.

7.2.1 First instructions

As you can see from this example, even a function that does nothing requires several instruc-

tions. The most commonly used assembly language instruction is

movs source, destination

where s denotes the size of the operand:

s meaning number of bits

b byte 8

w word 16

l longword 32

q quadword 64

In the Intel syntax, the size of the data is determined by the operand, so the size character
(b, w, l, or q) is not appended to the instruction, and the order of the operands is reversed:

Intel® syntax: mov destination, source

The mov instruction copies the bit pattern from the source operand to the destination operand.
The bit pattern of the source operand is not changed. If the destination operand is a register
and its size is less than 64 bits, the effect on the other bits in the register is shown in Table 7.1.

size   destination bits   remaining bits
 8      7–0               63–8 are unchanged
 8     15–8               63–16, 7–0 are unchanged
16     15–0               63–16 are unchanged
32     31–0               63–32 are set to 0

Table 7.1: Effect on other bits in a register when less than 64 bits are changed.
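Each row of Table 7.1 can be expressed as a masking operation on a 64-bit value. The sketch below is an illustration only; the function names are made up, and the register names in the comments (%al, %ah, %ax, %eax) are the usual names for those portions of rax.

```c
#include <stdint.h>

/* Each function returns the new 64-bit register value after a partial write. */

/* 8-bit write to bits 7-0 (e.g., %al): bits 63-8 are unchanged. */
uint64_t write_low8(uint64_t reg, uint8_t v)
{
    return (reg & ~UINT64_C(0xFF)) | v;
}

/* 8-bit write to bits 15-8 (e.g., %ah): bits 63-16 and 7-0 are unchanged. */
uint64_t write_high8(uint64_t reg, uint8_t v)
{
    return (reg & ~UINT64_C(0xFF00)) | ((uint64_t)v << 8);
}

/* 16-bit write to bits 15-0 (e.g., %ax): bits 63-16 are unchanged. */
uint64_t write16(uint64_t reg, uint16_t v)
{
    return (reg & ~UINT64_C(0xFFFF)) | v;
}

/* 32-bit write to bits 31-0 (e.g., %eax): bits 63-32 are set to 0. */
uint64_t write32(uint64_t reg, uint32_t v)
{
    (void)reg;               /* none of the old contents survive */
    return (uint64_t)v;
}
```

Starting from a register holding all ones, a 16-bit write of zero leaves the upper 48 bits set, while a 32-bit write of zero clears the entire register. The 32-bit case is the odd one out, which is why movl $0, %eax also zeros the high-order half of rax.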

The mov instruction does not affect the rflags register. In particular, neither the CF nor
the OF flag is affected. No more than one of the two operands may be a memory location.
Thus, in order to move a value from one memory location to another, it must be moved from the
first memory location into a register, then from that register into the second memory location.
(Accessing data in memory will be covered in Sections 8.1 and 8.3.)

You have to use a register to move data between memory locations.

The other instructions used in this "do nothing" program (pushq, popq, and ret) use the
call stack. The call stack will be discussed in Section 8.2, which will then allow us to discuss
these instructions. For now, you should memorize how to use them as "boiler plate" for the
prologue and epilogue of each function.

7.2.2 A Note About Syntax

If you have any experience with x86 assembly language, the syntax used by the GNU assembler,
as, will look a little strange to you. In principle, the syntax is arbitrary. A programmer could
invent any sort of assembly language and write a program that would translate it into the
appropriate machine code. But most cpu manufacturers publish a manual with a suggested
assembly language syntax for their cpu.

Most assemblers for the x86 cpus follow the syntax suggested by Intel®, but as uses the
AT&T syntax. It is not radically different from Intel's. Some of the more striking differences
are:

                 AT&T                                 Intel®
operand order:   source, destination                  destination, source
register names:  prefixed with the "%" character,     just the name, e.g., eax
                 e.g., %eax
literal values:  prefixed with the "$" character,     just the value, e.g., 123
                 e.g., $123
operand size:    use the b, w, l, or q suffix on      determined by the register
                 opcode to denote byte, word,         specification (more complicated
                 long, or quadruple word              if operand is stored in memory)

Pay attention to operand order.

The GNU assembler, as, does not require the size suffix on instructions in all cases. From the info
documentation for as:

9.13.4 Instruction Naming

-----------------

Instruction mnemonics are suffixed with one character modifiers which specify the size

of operands. The letters 'b', 'w', 'l' and 'q' specify byte, word, long and quadruple

word operands. If no suffix is specified by an instruction then 'as' tries to fill in the

missing suffix based on the destination register operand (the last one by convention).

Thus, 'mov %ax, %bx' is equivalent to 'movw %ax, %bx'; also, 'mov $1, %bx' is equivalent

to 'movw $1, bx'. Note that this is incompatible with the AT&T Unix assembler which

assumes that a missing mnemonic suffix implies long operand size. (This incompatibility

does not affect compiler output since compilers always explicitly specify the mnemonic

suffix.)

It is recommended that you get in the habit of using the size suffix letters when you begin writing
your own assembly language. This will help you to avoid introducing obscure bugs in your code.

The assembler directives are typically not specified by the cpu manufacturer, so you will see
a much wider variety of syntax, depending on the particular assembler program. We will not
try to list any differences here.

The GNU assembler, as, also supports the Intel® syntax. The assembler directive .intel_syntax
says that the following assembly language is written in the Intel® syntax; .att_syntax says it is
written in AT&T syntax. Using Intel® syntax, the assembly language code in Listing 7.3 would
be written

main:   push    rbp
        mov     rbp, rsp
        mov     eax, 0
        mov     rsp, rbp
        pop     rbp
        ret

Keep in mind that gcc produces assembly language in AT&T syntax, so you will undoubtedly
find it easier to use that when you write your own code. The .intel_syntax directive might be
useful if somebody gives you an entire function written in Intel® syntax assembly language.

The syntax rules for our particular assembler, as, are described in an on-line manual that is
in the GNU info format. as supports some two dozen computer architectures, so it is a challenge
to wade through the info manual to find what you need. On the other hand, it provides the
most up to date information. And it is especially important for learning how to use assembler
directives because they are specific to the assembler.

Now would be a good time to start learning how to use info for as. As you encounter new
assembly language concepts in this book, also look them up in info for as. If you are unfamiliar
with info, at the GNU/Linux prompt, simply type

$ info info

for a nice tutorial.

7.2.3 The Additional Assembly Language Generated by the Compiler

First, notice that the compiler-generated labels (e.g., .LFB2, .LCFI0, . . . ) each begin with a period
character, just like assembler directives. You can tell that they are labels because of the ":"
immediately following.

If you compare the assembly language program in Listing 7.3 with that generated by the
compiler in Listing 7.2, you can see that the compiler includes much more information in the
file. Most of this information will not be used elsewhere in this book. We explain it here for
completeness.

The first line,

1 .file "doNothingProg1.c"

identifies the name of the C source file. When you write in assembly language this information
clearly does not apply.

The five lines

5 main:

6 .LFB2:

7 pushq %rbp

8 .LCFI0:

9 movq %rsp, %rbp

set up the call stack for this function. The use of the call stack will be explained in more detail
in Section 8.2 on page 158 and in subsequent sections.

The additional labels generated by the compiler, .LFB2 and .LCFI0, are used for entries in the
unwind table, which is briefly described below. Our programs will not include unwind tables, so
we will not need such labels.

Notice that nothing follows the two labels main and .LFB2 on their lines. The assembler does not
generate any machine code for either of these two lines, so they do not take up any memory. The
next thing that comes in memory is the

7 pushq %rbp

instruction. Thus, both labels apply to the address where this instruction is located.

The instruction

12 leave

accomplishes the same thing as the two instructions

12 movq %rbp, %rsp # restore stack pointer

13 popq %rbp # restore caller's frame pointer

in the assembly language written by a programmer (Listing 7.3). We use the two individual
instructions because they explicitly show the operations that must be performed at the end of
each function. They undo the set up of the call stack that took place at the very beginning of the
function. (The goal of this book is to show what the computer is doing.)

Lines 16–53 make up what is called an unwind table. The -fasynchronous-unwind-tables
option causes the compiler to generate an unwind table in dwarf2 format for the function. In my
version of the compiler, the default is to generate the table in 64-bit mode and not generate it in
32-bit mode. This may vary depending on different versions of the compiler. We will not use the
table, so we will use the -fno-asynchronous-unwind-tables option to turn off the feature, as shown
in Listing 7.4. The GNU/Linux command is:

gcc -S -O0 -fno-asynchronous-unwind-tables doNothingProg1.c

which gives the compiler-generated assembly language in Listing 7.4.

144 CHAPTER 7. PROGRAMMING IN ASSEMBLY LANGUAGE

1 .file "doNothingProg1.c"

2 .text

3 .globl main

4 .type main, @function

5 main:

6 pushq %rbp

7 movq %rsp, %rbp

8 movl $0, %eax

9 leave

10 ret

11 .size main, .-main

12 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

13 .section .note.GNU-stack,"",@progbits

Listing 7.4: A "null" program (gcc assembly language). We have used the

-fno-asynchronous-unwind-tables compiler option to remove the exception

handler frame.

Lines 15, 54, and 55 in Listing 7.2 are the same as lines 11–13 in Listing 7.4. They also use
directives that do not apply to the programs we will be writing in this book.

Finally, you may have noticed that the main label is on a line by itself in Listing 7.2 but not
in Listing 7.3. When there is only a label on a line, no machine instructions are generated, and
no memory is allocated. Thus, the label really applies to the next line. It is common to place
labels on their own line so that longer, easier to read labels can be used while still keeping the
operations visually lined up in a column. This technique is illustrated in Listing 7.5.

1 # doNothingProg3.s

2 # The minimum components of a C program, written in assembly

3 # language. Same as doNothingProg2.s, except with the main

4 # label on its own line.

5 # Bob Plantz - 7 June 2009

6

7 .text

8 .globl main

9 .type main, @function

10 main:

11 pushq %rbp # save caller's frame pointer

12 movq %rsp, %rbp # establish our frame pointer

13

14 movl $0, %eax # return 0 to caller

15 movq %rbp, %rsp # restore stack pointer

16 popq %rbp # restore caller's frame pointer

17 ret # back to caller

Listing 7.5: The "null" program rewritten to show a label placed on its own line.

7.2.4 Viewing Both the Assembly Language and C Source Code

The gcc compiler provides a set of options that will allow you to generate a listing that shows
both the assembly language and the corresponding C statement(s). This will allow you to more
easily see the assembly language that the compiler generates to implement each C statement.
Compiling the program in Listing 7.1 with the command:

$ gcc -O0 -g -Wa,-adhls doNothingProg1.c > doNothingProg1.lst

generates the assembly language code in Listing 7.6.

7.2. PROGRAM ORGANIZATION 145

1 GAS LISTING /tmp/cczPwhLl.s page 1
2
3
4 1 .file "doNothingProg1.c"
5 9 .Ltext0:
6 10 .globl main
7 12 main:
8 13 .LFB0:
9 14 .file 1 "doNothingProg1.c"
10 1:doNothingProg1.c **** /*
11 2:doNothingProg1.c **** * doNothingProg1.c
12 3:doNothingProg1.c **** * The minimum components of a C program.
13 4:doNothingProg1.c **** * Bob Plantz - 6 June 2009
14 5:doNothingProg1.c **** */
15 6:doNothingProg1.c ****
16 7:doNothingProg1.c **** int main(void)
17 8:doNothingProg1.c **** {
18 15 .loc 1 8 0
19 16 .cfi_startproc
20 17 0000 55 pushq %rbp
21 18 .LCFI0:
22 19 .cfi_def_cfa_offset 16
23 20 0001 4889E5 movq %rsp, %rbp
24 21 .cfi_offset 6, -16
25 22 .LCFI1:
26 23 .cfi_def_cfa_register 6
27 9:doNothingProg1.c **** return 0;
28 24 .loc 1 9 0
29 25 0004 B8000000 movl $0, %eax
30 25 00
31 10:doNothingProg1.c **** }
32 26 .loc 1 10 0
33 27 0009 C9 leave
34 28 000a C3 ret
35 29 .cfi_endproc
36 30 .LFE0:
37 32 .Letext0:
38 GAS LISTING /tmp/cczPwhLl.s page 2
39
40
41 DEFINED SYMBOLS
42 *ABS*:0000000000000000 doNothingProg1.c
43 /tmp/cczPwhLl.s:12 .text:0000000000000000 main
44
45 NO UNDEFINED SYMBOLS

Listing 7.6: Assembly language embedded in C source code listing. The line number in the C

source file is also indicated with the .loc assembler directive. Note that the C source

code line numbering begins with 0; this can vary with different versions of as.

The "-g" option tells the compiler to include symbols for debugging. "-Wa," passes the
immediately following options to the assembly phase of the compilation process. Thus, the options
passed to the assembler are "-adhls", which cause the assembler to generate a listing with the
following characteristics:

-ad: omit debugging directives


-ah: include high-level source

-al: include assembly

-as: include symbols

As you can see above, the secondary letters can be combined with one "-a". The "d" has the same
effect as the "-fno-asynchronous-unwind-tables" option. The listing is written to standard out,
which can be redirected to a file. We gave this file the ".lst" file extension because it cannot be
assembled.

7.2.5 Minimum Program in 32-bit Mode

The x86-64 processors can also run in 32-bit mode. Most GNU/Linux distributions also provide
a 32-bit version. Some distributions are only available in 32-bit.

The gcc option to compile a program for 32-bit mode is -m32. Listing 7.7 shows the assembly

language generated by the GNU/Linux command:

gcc -S -O0 -m32 doNothingProg1.c

1 .file "doNothingProg1.c"

2 .text

3 .globl main

4 .type main, @function

5 main:

6 leal 4(%esp), %ecx

7 andl $-16, %esp

8 pushl -4(%ecx)

9 pushl %ebp

10 movl %esp, %ebp

11 pushl %ecx

12 movl $0, %eax

13 popl %ecx

14 popl %ebp

15 leal -4(%ecx), %esp

16 ret

17 .size main, .-main

18 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

19 .section .note.GNU-stack,"",@progbits

Listing 7.7: A "null" program (gcc assembly language in 32-bit mode).

The first thing to notice is that all the instructions use the "l" suffix to indicate "longword"
because addresses are 32 bits. Also, only the 32-bit portion of each register is used: esp
instead of rsp, etc.

The prologue in the 32-bit main function,

6 leal 4(%esp), %ecx

7 andl $-16, %esp

8 pushl -4(%ecx)

9 pushl %ebp

10 movl %esp, %ebp

11 pushl %ecx

is much more complex than the 64-bit version. This has to do with the use of 32-bit addresses
and other performance issues that are beyond the scope of this book. Similarly, the epilogue,

13 popl %ecx

14 popl %ebp

15 leal -4(%ecx), %esp


needs to be more complex in order to match the prologue.

Although the prologue/epilogue generated by the compiler makes a more robust program for a
production environment, the essence of the "do nothing" program in 32-bit mode can be written
as shown in Listing 7.8.

1 # doNothingProg4.s

2 # The minimum components of a C program, written in assembly

3 # language. A 32-bit version of doNothingProg1.s.

4 # Bob Plantz - 7 June 2009

5

6 .text

7 .globl main

8 .type main, @function

9 main:

10 pushl %ebp # save caller's frame pointer

11 movl %esp, %ebp # establish our frame pointer

12 movl $0, %eax # return 0 to caller

13 movl %ebp, %esp # restore stack pointer

14 popl %ebp # restore caller's frame pointer

15 ret # back to caller

Listing 7.8: A "null" program (programmer assembly language in 32-bit mode).

7.3 Assemblers and Linkers

We present a highly simplified view of how assemblers and linkers work here. The goal of this

presentation is to introduce the concepts. Most assemblers and linkers have capabilities that go

far beyond the concepts described here (e.g., macro expansion, dynamic load/link). We leave a

more thorough discussion of assemblers and linkers to a book on systems programming.

7.3.1 Assemblers

An assembler must perform the following tasks:

Translate assembly language mnemonics into machine language.

Translate symbolic names for addresses into numeric addresses.

Since the numeric value of an address may be required before an instruction can be translated
into machine language, there is a problem with forward references to memory locations.
For example, a code sequence like:

1 # if (response == 'y')

2 cmpb $'y', response # was it 'y'?

3 jne noChange # no, there is no change

4

5 # then print the "save" message

6 movq $saveMsg, %rbx # point to first char

7 saveLoop:

8 cmpb $0, (%rbx) # at null character?

9 je saveEnd # yes, exit loop

10

11 movl $1, %edx # no, send one byte

12 movq %rbx, %rsi # at this location

13 movl $STDOUT, %edi # to screen.

14 call write

15 incq %rbx # increment char pointer

16 jmp saveLoop # check at top of loop


17 saveEnd:

18 jmp allDone # skip over false block

19

20 # else print the "discard" message

21 noChange:

22 movq $discardMsg, %rbx # point to first char

creates a problem for the assembler when it needs to translate the

3 jne noChange # no, there is no change

instruction on line 3. (Don't forget that assembly language is line oriented; translation is done
one line at a time.) When this code sequence is executed, the immediately previous instruction
(cmpb $'y', response) compares the byte stored at location response with the character 'y'.
If they are not equal, i.e., a 'y' is not stored at location response, the jne instruction causes
program flow to jump to location noChange. In order to accomplish this action, the translation
of this instruction (the machine code) must include a numerical value that specifies how far to
jump. That is, it must include the distance, in number of bytes, between the jne instruction
and the memory location labeled noChange on line 21. In order to compute this distance, the
assembler must determine the address that corresponds to the label noChange when it translates
this instruction, but the assembler has not even encountered the noChange label, much less
determined its corresponding address.

The simplest solution is to use a two-pass assembler:

1. The first pass builds a symbol table, which provides an address for each memory label.

2. The second pass performs the actual translation into machine language, consulting the
symbol table for numeric values of the symbols.

Algorithm 7.1 is a highly simplified description of how the first pass of an assembler works.

Algorithm 7.1: First pass of a two-pass assembler.
Data: SymbolTable with each entry a Symbol/Number pair
Data: LocationCounter
1 LocationCounter ← 0;
2 get first line of source code;
3 while more lines do
4     if line has label then
5         SymbolTable.Symbol ← label;
6         SymbolTable.Number ← LocationCounter;
7     determine number of bytes required by line when assembled;
8     LocationCounter ← LocationCounter + number of bytes;
9     get next line of source code;
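To make the bookkeeping in Algorithm 7.1 concrete, here is a minimal sketch of the first pass in C. It is not how as is implemented. The SourceLine structure, the fixed table size, and the function names first_pass and lookup are all hypothetical, and we assume the number of bytes each line needs is already known, which a real assembler must work out from the instruction itself.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64   /* arbitrary limit chosen for this sketch */

/* One source line as the first pass sees it (hypothetical, simplified). */
struct SourceLine {
    const char *label;   /* label defined on this line, or NULL if none */
    unsigned    nbytes;  /* bytes this line occupies once assembled     */
};

/* One symbol table entry: a Symbol/Number pair. */
struct Symbol {
    const char *name;
    unsigned    address;
};

struct SymbolTable {
    struct Symbol entries[MAX_SYMBOLS];
    size_t        count;
};

/* First pass: record the current location counter for every label.
 * Returns 0 on success, -1 if the table is full. */
int first_pass(const struct SourceLine *lines, size_t nlines,
               struct SymbolTable *table)
{
    unsigned location_counter = 0;
    size_t i;

    table->count = 0;
    for (i = 0; i < nlines; i++) {
        if (lines[i].label != NULL) {
            if (table->count == MAX_SYMBOLS)
                return -1;
            table->entries[table->count].name = lines[i].label;
            table->entries[table->count].address = location_counter;
            table->count++;
        }
        /* A label-only line has nbytes == 0, so the label ends up
         * with the address of whatever comes next. */
        location_counter += lines[i].nbytes;
    }
    return 0;
}

/* Look up a symbol. Returns 0 and stores its address, or -1 if unknown. */
int lookup(const struct SymbolTable *table, const char *name,
           unsigned *address)
{
    size_t i;

    for (i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i].name, name) == 0) {
            *address = table->entries[i].address;
            return 0;
        }
    }
    return -1;
}
```

Notice that a line holding only a label contributes zero bytes, so the label receives the address of the next instruction, which matches the earlier observation that a label on its own line really applies to the line that follows.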

The symbol table is carried from the first pass to the second. The second pass also consults
a table of operation codes, which provides the machine code corresponding to each instruction


mnemonic. A highly simplified description of the second pass is given in Algorithm 7.2.

Algorithm 7.2: Second pass of a two-pass assembler.
given: SymbolTable from Pass One
given: OpCodeTable
Data: LocationCounter
1 LocationCounter ← 0;
2 get first line of source code;
3 while more lines do
4     if line is instruction then
5         find machine code from OpCodeTable;
6         find symbol value from SymbolTable;
7         assemble instruction into machine code;
8     else
9         carry out directive;
10    write machine code to object file;
11    determine number of bytes used;
12    LocationCounter ← LocationCounter + number of bytes;
13    get next line of source code;
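The step "assemble instruction into machine code" is where the symbol table pays off. The C sketch below handles just one case, the jne instruction from the code sequence above, and only in its short two-byte form (opcode 0x75 followed by a one-byte signed distance). The Symbol structure and the function names are hypothetical, and any addresses used with it are made up for illustration. Note that the x86 measures the distance from the address of the instruction that follows the jump, which is why the sketch adds the length of the jne itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One symbol table entry produced by the first pass (hypothetical layout). */
struct Symbol {
    const char *name;
    unsigned    address;
};

/* Find a symbol's address in the table from pass one; -1 if absent. */
long symbol_address(const struct Symbol *table, size_t count,
                    const char *name)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return (long)table[i].address;
    }
    return -1;
}

/* Assemble "jne target" located at jump_address into out[0..1].
 * Returns the number of bytes written (2), or 0 if the target is
 * unknown or too far away for the short form. */
size_t assemble_jne_short(const struct Symbol *table, size_t count,
                          const char *target, unsigned jump_address,
                          uint8_t out[2])
{
    long target_address = symbol_address(table, count, target);
    long displacement;

    if (target_address < 0)
        return 0;

    /* Distance is measured from the instruction after the jump,
     * and the short jne is two bytes long. */
    displacement = target_address - ((long)jump_address + 2);
    if (displacement < -128 || displacement > 127)
        return 0;

    out[0] = 0x75;                           /* short form of jne */
    out[1] = (uint8_t)(int8_t)displacement;  /* signed 8-bit distance */
    return 2;
}
```

A backward jump simply produces a negative distance, stored in two's complement. A real assembler also has to choose a longer form of the instruction when the target is more than 127 bytes away, which this sketch reports as a failure instead.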

7.3.2 Linkers

Look again at the code sequence above. On line 14 there is the instruction:

call write

This call to the write function is a reference to a memory label outside the file being assembled.
Thus, the assembler has no way to determine the address of write for the symbol table during
the first pass. The only thing the assembler can do during the second pass is to leave enough
memory space for the address of write when it assembles this instruction. The actual address
will have to be filled in later in order to create the entire program. Filling in these references to
external memory locations is the job of the linker program.

The algorithm for linking functions together is very similar to that of the assembler. The
same forward reference problem exists. Again, the simplest solution is to use a two-pass linker
program.

The highly simplified algorithm in Algorithms 7.3 and 7.4 also provides for loading
the entire program into memory. The functions are linked together as they are loaded. In
practice, this is seldom done. For example, the GNU linker, ld, does not load the program into
memory. Instead, it creates another machine language file: the executable program. The
executable program file contains all the functions of the program with all the cross-function
memory references resolved. Thus ld is a link editor program.

Getting even more realistic, many of the functions used by a program are not even included
in the executable program file. They are loaded as required when the program is executing. The
link editor program must provide dynamic links for the executable program file.

However, you can get the general idea of linking separately assembled (or compiled) functions
together by studying Algorithms 7.3 and 7.4. In particular, notice that the
assembler (or compiler) must include other information in addition to machine code in the object
file. The additional information includes:

1. The name of the function.

2. The name of each external memory reference.

3. The location relative to the beginning of the function where the external memory reference
is made.


Algorithm 7.3: First pass of a two-pass linker.
Data: GlobalSymbolTable with each entry a Symbol/Number pair
Data: LocationCounter
1 LocationCounter ← 0;
2 open first object file;
3 while more object files do
4     GlobalSymbolTable.Symbol ← function name;
5     GlobalSymbolTable.Number ← LocationCounter;
6     determine number of bytes required by the function;
7     LocationCounter ← LocationCounter + number of bytes;
8     open next object file;

Algorithm 7.4: Second pass of a two-pass linker.
Data: MemoryPointer
given: GlobalSymbolTable from Pass One
1 MemoryPointer ← address from OS;
2 open first object file;
3 while more object files do
4     while more machine code do
5         CodeByte ← read byte of code from object file;
6         *MemoryPointer ← CodeByte;
7         MemoryPointer ← MemoryPointer + 1;
8     while more external memory references do
9         get value corresponding to reference from GlobalSymbolTable;
10        determine where value should be stored;
11        store value in code that was just loaded;
12    open next object file;
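The two linker passes can be sketched in C along the same lines. Everything here is an assumption made for illustration: each "object file" is a structure already in memory holding exactly one function, the ObjectFile and ExternalRef layouts are hypothetical, and each external reference is patched with a 4-byte address measured from the start of the loaded program. A real x86-64 call instruction stores a relative distance instead, and real object files are far richer than this.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_OBJECTS 16   /* arbitrary limit chosen for this sketch */

/* An external reference recorded by the assembler (hypothetical layout). */
struct ExternalRef {
    const char *name;    /* name of the function being referenced       */
    size_t      offset;  /* where, relative to the start of this        */
                         /* function, the 4-byte placeholder is located */
};

/* A simplified object file holding exactly one function. */
struct ObjectFile {
    const char               *name;   /* name of the function */
    const uint8_t            *code;   /* its machine code     */
    size_t                    size;   /* number of code bytes */
    const struct ExternalRef *refs;   /* external references  */
    size_t                    nrefs;
};

/* Link and load all objects into memory.
 * Returns the number of bytes loaded, or 0 on any error. */
size_t link_objects(const struct ObjectFile *objs, size_t nobjs,
                    uint8_t *memory, size_t memsize)
{
    size_t base[MAX_OBJECTS];      /* plays the role of GlobalSymbolTable */
    size_t location_counter = 0;
    size_t i, r, k;

    if (nobjs > MAX_OBJECTS)
        return 0;

    /* Pass one: assign each function its starting location. */
    for (i = 0; i < nobjs; i++) {
        base[i] = location_counter;
        location_counter += objs[i].size;
    }
    if (location_counter > memsize)
        return 0;

    /* Pass two: copy the code, then fill in the external references. */
    for (i = 0; i < nobjs; i++) {
        memcpy(memory + base[i], objs[i].code, objs[i].size);

        for (r = 0; r < objs[i].nrefs; r++) {
            const struct ExternalRef *ref = &objs[i].refs[r];
            uint32_t value;

            for (k = 0; k < nobjs; k++) {
                if (strcmp(objs[k].name, ref->name) == 0)
                    break;
            }
            if (k == nobjs || ref->offset + sizeof value > objs[i].size)
                return 0;   /* undefined symbol or bad placeholder */

            value = (uint32_t)base[k];
            memcpy(memory + base[i] + ref->offset, &value, sizeof value);
        }
    }
    return location_counter;
}
```

The first loop has to finish before any patching starts, because a function may refer to another function that appears in a later object file. This is the same forward reference problem that the two-pass assembler solves.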

7.4 Creating a Program in Assembly Language

Since we are concerned with assembly language in this book, let us go through the steps of
creating a program for the assembly language source code in Listing 7.5.

First, Figure 7.1 is a screen shot of what I did with my typing in boldface. The notation I
use here assumes that I am doing this for a class named CS 252, and my instructor has specified
that each project should be submitted in a directory named CS252lastNameNN, where lastName
is the student's surname and NN is the project number. I have appended .0 to the project folder
name for my own use. As I develop my project, subsequent versions will be numbered .1, .2, ....
Let us go through the steps in Figure 7.1 one line at a time, explaining each line.

bob$ mkdir CS252plantz01.0

I create a directory named "CS252plantz01.0." All the files that you create for each
program should be kept in a separate directory only for that program.

bob$ cd CS252plantz01.0/

I make the newly created subdirectory the current working directory.

bob$ ls

bob$ pwd

/home/bob/CS252/CS252plantz01.0

These two commands show that the new subdirectory is empty and where my current
working directory is located within the file hierarchy.

bob$ emacs doNothingProg.s


bob$ mkdir CS252plantz01.0
bob$ cd CS252plantz01.0/
bob$ ls
bob$ pwd
/home/bob/CS252/CS252plantz01.0
bob$ emacs doNothingProg.s
This is where I used emacs to enter the program from Listing 7.5.
bob$ ls
doNothingProg.s
bob$ as -gstabs doNothingProg.s -o doNothingProg.o
bob$ ls
doNothingProg.o doNothingProg.s
bob$ gcc doNothingProg.o -o doNothingProg
bob$ ls
doNothingProg doNothingProg.o doNothingProg.s
bob$ ./doNothingProg
bob$

Figure 7.1: Screen shot of the creation of a program in assembly language.

This starts up the emacs program and creates a new file named "doNothingProg.s." You
may use any text editor. I am now ready to use the emacs editor to enter my program.

emacs is an extremely powerful and versatile editor. We could easily spend the rest
of the book simply learning about emacs, but the following very small subset of emacs
commands will be enough to get you started. These are all keyboard commands, which
will allow you to use emacs from a remote system that does not support X-window.

To enter text, simply type.

Use the arrow keys to move around in existing text.

The "Backspace" or the "Delete" key will delete the character immediately to the
left of the cursor.

Typing ctrl-x then ctrl-s will save your current work, writing over the previous
contents in the file.

Typing ctrl-x then ctrl-c will exit emacs, giving you the option of first saving
unsaved changes.

If you wish to learn more about emacs, ctrl-h then t will start the emacs tutorial.

bob$ ls

doNothingProg.s

This shows that I have created the file, doNothingProg.s.

bob$ as -gstabs doNothingProg.s -o doNothingProg.o

bob$ ls

doNothingProg.o doNothingProg.s

On the first line, I invoke the assembler, as. The -gstabs option directs the assembler
to include debugging information with the output file. We will very definitely make
use of the debugger! The -o option is followed by the name of the output (object) file.
You should always use the same name as the source file, but with the .o extension. The
second command simply shows the new file that has been created in my directory.

bob$ gcc doNothingProg.o -o doNothingProg

bob$ ls

doNothingProg doNothingProg.o doNothingProg.s


Next I link the object file. Even though there is only one object file, this step is required
in order to bring in the GNU/Linux libraries needed to create an executable program.
As with as, the -o option is used to specify the name of the output file. In the linking case, this
will be the name of the final product of our efforts.

Note: The linker program is actually ld. The problem with using it directly, for example,

ld doNothingProg.o -o doNothingProg *** DOES NOT WORK ***

is that you must also explicitly specify all the libraries that are used. By using gcc for
the linking, the appropriate libraries are automatically included in the linking.

bob$ ./doNothingProg

bob$

Finally, I execute the program (which does nothing).

7.5 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. It
should be sufficient for doing the exercises in the current chapter. The page number where the
instruction is explained in more detail, which may be in a subsequent chapter, is also given. The
summary will be repeated and updated, as appropriate, at the end of each succeeding chapter
in the book. This book provides only an introduction to the usage of each instruction. You
need to consult the manuals ([2]–[6], [14]–[18]) in order to learn all the possible uses of the
instructions.

7.5.1 Instructions

data movement:
opcode   source           destination   action            see page:
movs     $imm/%reg        %reg/mem      move              141
popw                      %reg/mem      pop from stack    163
pushw    $imm/%reg/mem                  push onto stack   163
s = b, w, l, q; w = l, q

arithmetic/logic:
opcode   source           destination   action            see page:
cmps     $imm/%reg        %reg/mem      compare           209
incs                      %reg/mem      increment         220
s = b, w, l, q

program flow control:
opcode   location   action                 see page:
call     label      call function          156
je       label      jump equal             211
jmp      label      jump                   213
jne      label      jump not equal         211
ret                 return from function   168

7.6 Exercises

The functions you are asked to write in these exercises are not complete programs. You can
check that you have written a valid function by writing a main function in C that calls the
function you have written in assembly language. Compile the main function with the -c option
so that you get the corresponding object (.o) file. Assemble your assembly language file. Make


sure that you specify the debugging options when compiling/assembling. Use the linking phase
of gcc to link the .o files together. Run your program under gdb and set a breakpoint in your
assembly language function. (Hint: you can specify the source file name in gdb commands.) Now
you can verify that your assembly language function is being called. If the function returns a
value, you can print that value in the main function using printf.

7-1 (§7.2) Write the C function:

/* f.c */
int f(void) {
    return 0;
}

in assembly language. Make sure that it assembles with no errors. Use the -S option to

compile f.c and compare gcc's assembly language with yours.

7-2 (§7.2) Write the C function:

/* g.c */
void g(void) {
}

in assembly language. Make sure that it assembles with no errors. Use the -S option to

compile g.c and compare gcc's assembly language with yours.

7-3 (§7.2) Write the C function:

/* h.c */
int h(void) {
    return 123;
}

in assembly language. Make sure that it assembles with no errors. Use the -S option to

compile h.c and compare gcc's assembly language with yours.

7-4 (§7.2) Write three assembly language functions that do nothing but return an integer.
They should each return different, non-zero, integers. Write a C main function to test your
assembly language functions. The main function should capture each of the return values
and display them using printf.

7-5 (§7.2) Write three assembly language functions that do nothing but return a character.
They should each return different characters. Write a C main function to test your assembly
language functions. The main function should capture each of the return values and display
them using printf.

7-6 (§7.2, §6.5) Write an assembly language function that returns four characters. The return
value is always in the eax register in our environment, so you can store four characters
in it. The easiest way to do this is to determine the hexadecimal value for each character,
then combine them so you can store one 32-bit hexadecimal value in eax.

Write a C main function to test your assembly language function. The main function should
capture the return values and display them using the write system call.

Explain the order in which they are displayed.

Chapter 8

Program Data Input, Store, Output

Most programs follow a similar pattern:

1. Read data from an input device, such as the keyboard, a disk file, the internet, etc., into

main memory.

2. Load data from main memory into CPU registers.

3. Perform arithmetic/logic operations on the data.

4. Store the results in main memory.

5. Write the results to an output device, such as the screen, a disk file, audio speakers, etc.

In this chapter you will learn how to call functions that can read input from the keyboard,

allocate memory for storing data, and write output to the screen.

8.1 Calling write in 64-bit Mode

We start with a program that has no input. It simply writes constant data to the screen: the
"Hello World" program.

We will use the C system call function write to display the text on the screen and show how

to call it in assembly language. As we saw in Section 2.8 (page 22) the write function requires

three arguments. Reading the argument list from left to right in Listing 8.1:

1. STDOUT_FILENO is the file descriptor of standard out, normally the screen. This symbolic
name is defined in the unistd.h header file.

2. Although the C syntax allows a programmer to place the text string here, only its address
is passed to write, not the entire string.

3. The programmer has counted the number of characters in the text string to write to
STDOUT_FILENO.

1 /*
2  * helloWorld2.c
3  *
4  * "hello world" program using the write() system call.
5  * Bob Plantz - 8 June 2009
6  */
7 #include <unistd.h>
8
9 int main(void)
10 {
11
12     write(STDOUT_FILENO, "Hello world.\n", 13);
13
14     return 0;
15 }

Listing 8.1: "Hello world" program using the write system call function (C).

This program uses only constant data: the text string "Hello world." Constant data used by a
program is part of the program itself and is not changed by the program.

Looking at the compiler-generated assembly language in Listing 8.2, the constant data appears
on line 4, as indicated by the comment added on that line. Comments have also been
added on lines 11–14 to explain the argument set up for the call to write.

1 .file "helloWorld2.c"

2 .section .rodata

3 .LC0:

4 .string "Hello world.\n" # constant data

5 .text

6 .globl main

7 .type main, @function

8 main:

9 pushq %rbp

10 movq %rsp, %rbp

11 movl $13, %edx # third argument

12 movl $.LC0, %esi # second argument

13 movl $1, %edi # first argument

14 call write

15 movl $0, %eax

16 leave

17 ret

18 .size main, .-main

19 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

20 .section .note.GNU-stack,"",@progbits

Listing 8.2: "Hello world" program using the write system call function (gcc assembly lan-

guage).

Data can only be located in one of two places in a computer:

in memory, or

in a CPU register.

(We are ignoring the case of reading from an input device or writing to an output device here.)
Recall from the discussion of memory segments on page 137 that the Linux kernel uses different
memory segments for the various parts of a program. The directive on line 2,

2 .section .rodata

uses the .section assembler directive to direct the assembler to store the data that follows in a
"read-only data" section in the object file. Even though it begins with a '.' character, .rodata is
not an assembler directive but the name of a section in an ELF file.

Your first thought is probably that the .rodata section should be loaded into a data segment
in memory, but recall that data memory segments are read/write. Thus .rodata sections are
mapped into a text segment, which is a read-only memory segment.

The .string directive on line 4,

3 .LC0:

4 .string "Hello world.\n" # constant data


allocates enough bytes in memory to hold each of the characters in the text string, plus one for
the NUL character at the end. The first byte contains the ASCII code for the character 'H',
the second the ASCII code for 'e', etc. Notice that the last character in this string is '\n',
the newline character; it occupies only one byte of memory. So fourteen bytes of memory are
allocated in the .rodata section in this program, and each byte is set to the corresponding ASCII
code for each character in the text string. The label on line 3 provides a symbolic name for the
beginning address of the text string so that the program can refer to this memory location.
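The two byte counts in play here, fourteen bytes of storage but a length of 13 passed to write, can be checked from C. This is only a small sketch: the array name message and the two helper functions are hypothetical, and the array stands in for the bytes that the .string directive sets up.

```c
#include <stddef.h>
#include <string.h>

/* The same text that the compiler placed in the .rodata section. */
static const char message[] = "Hello world.\n";

/* Bytes the array occupies in memory, including the terminating NUL. */
size_t message_storage_bytes(void)
{
    return sizeof message;
}

/* Characters before the NUL: the count given to write. */
size_t message_length(void)
{
    return strlen(message);
}
```

The first function reports the storage that is allocated and the second reports the number of characters that write is asked to send, so the two values differ by exactly one, the NUL byte.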

The most common directives for allocating memory for data are shown in Table 8.1. If these
are used in the .rodata section, the values can only be used as constants in the program.

[label] .space expression   evaluates expression and allocates that many bytes; memory is not initialized
[label] .string "text"      initializes memory to null-terminated string
[label] .asciz "text"       same as .string
[label] .ascii "text"       initializes memory to the string without null
[label] .byte expression    allocates one byte and initializes it to the value of expression
[label] .word expression    allocates two bytes and initializes them to the value of expression
[label] .long expression    allocates four bytes and initializes them to the value of expression
[label] .quad expression    allocates eight bytes and initializes them to the value of expression

Table 8.1: Common assembler directives for allocating memory. The label is optional.

The assembly language instruction used to call a function is

call functionName

where functionName is the name of the function being called. The call instruction does two
things:


1. The address in the rip register is pushed onto the call stack. (The call stack is described in
Section 8.2.) Recall that the rip register is incremented immediately after the instruction
is fetched. Thus, when the call instruction is executed, the value that gets pushed onto
the stack is the address of the instruction immediately following the call instruction. That
is, the return address gets pushed onto the stack in this first step.

2. The address that functionName resolves to is placed in the rip register. This is the address
of the function that is being called, so the next instruction to be fetched is the first
instruction in the called function.
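The two steps can be imitated with a toy model in C. This is a sketch for illustration only, not a description of the hardware: the ToyCpu structure and the toy_ functions are hypothetical, the stack is a small array indexed by word rather than real memory addressed by byte, there is no overflow checking, and toy_ret is included only to show why the return address was saved.

```c
#include <stdint.h>

#define STACK_WORDS 8   /* tiny stack, enough for this sketch */

/* A toy CPU state (hypothetical). */
struct ToyCpu {
    uint64_t rip;                 /* address of next instruction     */
    uint64_t rsp;                 /* index of the top stack entry;   */
                                  /* the stack grows toward index 0  */
    uint64_t stack[STACK_WORDS];
};

/* Start with an empty stack and rip at the given address. */
void toy_init(struct ToyCpu *cpu, uint64_t start)
{
    cpu->rip = start;
    cpu->rsp = STACK_WORDS;
}

/* Fetching an instruction advances rip past that instruction. */
void toy_fetch(struct ToyCpu *cpu, uint64_t instruction_length)
{
    cpu->rip += instruction_length;
}

/* Execute a call that has already been fetched. */
void toy_call(struct ToyCpu *cpu, uint64_t function_address)
{
    /* Step 1: push rip, which now holds the return address. */
    cpu->rsp -= 1;
    cpu->stack[cpu->rsp] = cpu->rip;

    /* Step 2: continue execution at the called function. */
    cpu->rip = function_address;
}

/* Return: pop the saved address back into rip. */
void toy_ret(struct ToyCpu *cpu)
{
    cpu->rip = cpu->stack[cpu->rsp];
    cpu->rsp += 1;
}
```

For example, if a five-byte call instruction sits at address 0x400100, calling toy_fetch with a length of 5 and then toy_call leaves 0x400105 on top of the stack, and a later toy_ret puts that value back into rip.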

The call of the write function is made on line 14.

14 call write

Before the call is made, any arguments to a function must be stored in their proper locations,
as specified in the ABI [25]. Up to six arguments are passed in the general purpose registers.
Reading the argument list from left to right in the C code, the order of using the registers is
given in Table 8.2. If there are more than six arguments, the additional ones are pushed onto
the call stack, but in right-to-left order. This will be described in Section 11.2.

Each of the three arguments to write in this program (the file descriptor, the address of
the text string, and the number of bytes in the text string) is also a constant whose value is
known when the program is first loaded into memory and is not changed by the program. The
locations of these constants on lines 11–13,


Argument Register

first rdi

second rsi

third rdx

fourth rcx

fifth r8

sixth r9

Table 8.2: Order of passing arguments in general purpose registers.

11 movl $13, %edx # third argument

12 movl $.LC0, %esi # second argument

13 movl $1, %edi # first argument

are not as obvious. The location of the data that an instruction operates on must be specified
in the instruction and its operands. The manner in which the instruction uses an operand to
locate the data is called the addressing mode. Assembly language includes a syntax that the
programmer uses to specify the addressing mode for each operand. When the assembler translates
the assembly language into machine code it sets the bit pattern in the instruction to the
corresponding addressing mode for each operand. Then when the CPU decodes the instruction
during program execution it knows where to locate the data represented by that operand.

The simplest addressing mode is register direct. The syntax is to simply use the name of a
register, and the data is located in the register itself.

Register direct: The data value is located in a CPU register.

syntax: the name of the register with a "%" prefix

example: movl %eax, %ebx

The instructions on lines 9–10,

9 pushq %rbp

10 movq %rsp, %rbp

use the register direct addressing mode for their operands. The pushq instruction has only one
operand, and the movq has two.

Each of the instructions on lines 11 13 u se the register direct addressing mode for the

destination, but the source operand is the data itself. So all three instructions employ the

immediate data ad dressing mode for the source.

Immediate data: The data value is located in memory immediately after the instruction. This
addressing mode can only be used for a source operand.
    syntax: the data value with a "$" prefix
    example: movq $0x123456789abcd, %rbx

Although the register direct addressing mode can be used to specify either a source or
destination operand, or both, the immediate data addressing mode is valid only for a source operand.

Let us consider the mechanism by which the control unit accesses the data in the immediate
data addressing mode. First, we should say a few words about how a control unit executes an
instruction. Although a programmer thinks of each instruction as being executed atomically, it
is actually done in discrete steps by the control unit. In addition to the registers used by a
programmer, the CPU contains many registers that cannot be used directly. The control unit uses
these registers as "scratch paper" for temporary storage of intermediate values as it progresses
through the steps of executing an instruction.


Now, recall that when the control unit fetches an instruction from memory, it automatically
increments the instruction pointer (rip) to the next memory location immediately following the
instruction it just fetched. Usually, the instruction pointer would now be pointing to the next
instruction in the program. But in the case of the immediate data addressing mode, the "$"
symbol tells the assembler to store the operand at this location.

As the control unit decodes the just fetched instruction, it detects that the immediate data
addressing mode has been used for the source operand. Since the instruction pointer is currently
pointing to the data, it is a simple matter for the control unit to fetch it. Of course, when it does
this fetch, the control unit increments the instruction pointer by the size of the data it just
fetched.

Now the control unit has the source data, so it can continue executing the instruction. And
when it has completed the current instruction, the instruction pointer is already pointing to the
next instruction in the program.

The constants in the instructions on lines 11 and 13 are obvious. (The symbolic name
"STDOUT_FILENO" is defined in unistd.h as 1.) The constant on line 12 is the label .LC0, which
resolves to the address of the memory location where the text string is stored. As explained
above, this address will be in the .rodata section when the program is loaded into memory. The
address is not known within the .text segment when the file is first compiled. The compiler
leaves space for it immediately after the instruction (immediate addressing mode). Then when
the address is determined during the linking phase, it is plugged into the space left for it. The
net result is that the address becomes immediate data when the program is executed.

So the following code sequence:

11 movl $13, %edx # third argument

12 movl $.LC0, %esi # second argument

13 movl $1, %edi # first argument

14 call write

implements the C statement

13 write(STDOUT_FILENO, "Hello world.\n", 13);

in the original C program (Listing 8.1, page 154).

Some notes about the write function call:

• The characters written to the screen must be stored in memory.
• The number of bytes actually written to the screen is returned in the eax register. So if the
  current function is using eax, the value will be changed by the call to write.
• The write function is a C wrapper that sets up the registers for the syscall instruction.
  Unfortunately, there is no guarantee that it restores the values that were in the registers
  when it was called.

8.2 Introduction to the Call Stack

Most variables are stored on the call stack. Before describing how this is done, we need to
understand what stacks are and how they are used.

A stack is an area of memory for storing data items together with a pointer to the "top" of the
stack. Informally, you can think of a stack as being organized very much like a stack of dinner
plates on a shelf. We can only access the one item at the top of the stack. There are only two
fundamental operations on a stack:

• push data-item causes the data-item to be placed on the top of the stack and moves the
  stack pointer to point to this latest item.
• pop location causes the data item on the top of the stack to be removed and placed at
  location and moves the stack pointer to point to the next item left on the stack.


Notice that a stack is a "last in, first out" (LIFO) data structure. That is, the last thing to be
pushed onto the stack is the first thing to be popped off.

To illustrate the stack concept let us use our dinner plate example. Say we have three
differently colored dinner plates, a red one on the dining table, a green one on the kitchen counter,
and a blue one on the bedside table. Now we will stack them on the shelf in the following way:

1. push dining-table-plate
2. push kitchen-counter-plate
3. push bedside-table-plate

At this point, our stack looks like:

    blue plate
    green plate
    red plate

(There is no way to tell where the dinner plates came from.)

Now if we perform the operation:

1. pop kitchen-counter

We will have a blue dinner plate on our kitchen counter, and our stack will look like:

    green plate
    red plate

A stack must be used according to a very strict discipline:

1. Always push an item onto the stack before popping anything off.
2. Never pop more things off than you have pushed on.
3. Always pop everything off the stack.
   If you have no use for the item(s) to be popped off, you may simply adjust the stack pointer.
   This is equivalent to discarding the items that are popped off. (Our dinner plate analogy
   breaks down here.)

A good way to maintain this discipline is to think of the use of parentheses in an algebraic
expression. A push is analogous to a left parenthesis, and a pop is analogous to a right
parenthesis. An attempt to push too many items onto a stack causes stack overflow. And an attempt
to pop items off the stack beyond the "bottom" causes stack underflow.

Next we will explore how we might implement a stack in C. Our program will allocate space
in memory for storing data elements and provide both a push operation and a pop operation. A
simple program is shown in Listing 8.3.

    /*
     * stack.c
     * implementation of push and pop stack operations in C
     * Bob Plantz - 9 June 2009
     */

    #include <stdio.h>

    int theStack[500];
    int *stackPointer = &theStack[500];

    /*
     * precondition:
     *    stackPointer points to data element at top of stack
     * postcondition:
     *    address in stackPointer is decremented by four
     *    dataValue is stored at top of stack
     */
    void push(int dataValue)
    {
        stackPointer--;
        *stackPointer = dataValue;
    }

    /*
     * precondition:
     *    stackPointer points to data element at top of stack
     * postcondition:
     *    data element at top of stack is copied to *dataLocation
     *    address in stackPointer is incremented by four
     */
    void pop(int *dataLocation)
    {
        *dataLocation = *stackPointer;
        stackPointer++;
    }

    int main(void)
    {
        int x = 12;
        int y = 34;
        int z = 56;
        printf("Start with the stack pointer at %p",
               (void *)stackPointer);
        printf(", and x = %i, y = %i, and z = %i\n", x, y, z);

        push(x);
        push(y);
        push(z);
        x = 100;
        y = 200;
        z = 300;
        printf("Now the stack pointer is at %p",
               (void *)stackPointer);
        printf(", and x = %i, y = %i, and z = %i\n", x, y, z);
        pop(&z);
        pop(&y);
        pop(&x);

        printf("And we end with the stack pointer at %p",
               (void *)stackPointer);
        printf(", and x = %i, y = %i, and z = %i\n", x, y, z);

        return 0;
    }

Listing 8.3: A C implementation of a stack.


Read the code in Listing 8.3 and note the following:

• The program uses a pointer, stackPointer, to keep track of the data value that is currently
  at the top of the stack.
• The stack pointer is initialized to point to one beyond the highest array element in the
  array that is allocated for the stack. Thus the stack must "grow" from high-numbered
  elements to low-numbered elements as items are pushed onto the stack.
• A push operation pre-decrements the stack pointer before storing an item on the stack.
• A pop operation post-increments the stack pointer after retrieving an item from the stack.

The states of the variables from the program in Listing 8.3 are shown just after the stack is
initialized in Figure 8.1. Notice that the stack pointer is pointing beyond the end of the array as
a result of the C statement,

    int *stackPointer = &theStack[500];

The stack is "empty" at this point.

[Figure 8.1 diagram: array elements theStack[0] through theStack[499], each containing ????; stackPointer points just beyond theStack[499].]

Figure 8.1: The stack in Listing 8.3 when it is first initialized. "????" means that the value in
the array element is undefined.

After pushing one value onto the stack

    push(x);

the stack appears as shown in Figure 8.2. Here you can see that since the push operation
pre-decrements the stack pointer, the first data item to be placed on the stack is stored in a valid
portion of the array.

[Figure 8.2 diagram: theStack[499] contains 12 and all other elements contain ????; stackPointer points to theStack[499].]

Figure 8.2: The stack with one data item on it.


After all three data items (x, y, and z) are pushed onto the stack, it appears as shown
in Figure 8.3. The stack pointer always points to the data item that is at the top of the stack.
Notice that this stack is "growing" toward lower numbered elements in the array.

(Most stacks grow toward lower addresses. We tend to draw them "upside down.")

[Figure 8.3 diagram: theStack[499] contains 12, theStack[498] contains 34, theStack[497] contains 56, and all other elements contain ????; stackPointer points to theStack[497].]

Figure 8.3: The stack with three data items on it.

After changing the values in the variables, the program in Listing 8.3 restores the original
values by popping from the stack in reverse order. The state of the stack after all three pops
is shown in Figure 8.4. Even though we know that the values are still stored in the array, the
permissible stack operations (push and pop) will not allow us to access these values. Thus,
from a programming point of view, the values are gone.

[Figure 8.4 diagram: theStack[499], theStack[498], and theStack[497] still contain 12, 34, and 56; stackPointer again points just beyond theStack[499].]

Figure 8.4: The stack after all three data items have been popped off. Even though the values
are still stored in the array, it is considered a programming error to access them.
The stack must be considered as "empty" when it is in this state.

Our very simple stack in this program does not protect against stack overflow or stack
underflow. Most software stack implementations also include operations to check for an empty
stack and for a full stack. And many implementations include an operation for looking at, but
not removing, the top element. But these are not the main features of a stack data structure, so
we will not be concerned with them here.

In GNU/Linux, as with most operating systems, the call stack has already been set up for us.
We do not need to worry about allocating the memory or initializing a stack pointer. When the
operating system transfers control to our program, the stack is ready for us to use.

The x86-64 architecture uses the rsp register for the call stack pointer. Although you could
create your own stack and stack pointer, several instructions use the rsp register implicitly. And
all these instructions cause the stack to grow from high memory addresses to low (see Exercise
8-2). Although this may seem a bit odd at first, there are some good reasons for doing it this
way.

In particular, think about how you might organize things in memory. Recall that the
instruction pointer (the rip register) is automatically incremented by the control unit as your program
is executed. Programs come in vastly different sizes, so it makes sense to store the program
instructions at low memory addresses. This allows maximum flexibility with respect to program
size.

The stack is a dynamic structure. You do not know ahead of time how much stack space will
be required by any given program as it executes. It is impossible to know how much space to
allocate for the stack. So you would like to allocate as much space as possible, and to keep it as
far away from the programs as possible. The solution is to start the stack at the highest address
and have it grow toward lower addresses.

This is a highly simplified rationalization for implementing stacks such that they grow
"downward" in memory. The organization of various program elements in memory is much
more complex than the simple description given here. But this may help you to understand that
there are some good reasons for what may seem to be a rather odd implementation.

The assembly language push instruction is:

    pushq source

The pushq instruction causes two actions:

1. The value in the rsp register is decremented by eight. That is, eight is subtracted from the
   stack pointer.
2. The eight bytes of the source operand are copied into memory at the new location pointed
   to by the (now decremented) stack pointer. The state of the operand is not changed.

(A push changes rsp before putting value on the stack.)

The assembly language pop instruction is:

    popq destination

The popq instruction causes two actions:

1. The eight bytes in the memory location pointed to by the stack pointer are copied to the
   destination operand. The previous state of the operand is replaced by the value from
   memory.
2. The value in the rsp register is incremented by eight. That is, eight is added to the stack
   pointer.

(A pop changes rsp after getting value from the stack.)

In the Intel syntax the "q" is not appended to the instruction.

Intel® Syntax:
    push source
    pop destination

The size of the operand, eight bytes, is determined by the operating system. When executing
in 64-bit mode, all pushes and pops operate on 64-bit values. Unlike the mov instruction, you
cannot push or pop 8-, 16-, or 32-bit values. This means that the address in the stack pointer
(rsp register) will always be an integral multiple of eight.

A good example of using a stack is saving registers within a function. Recall that there is
only one set of registers in the CPU. When one function calls another, the called function has no
way of knowing which registers are being used by the calling function. The ABI [25] specifies
that the values in registers rbx, rbp, rsp, and r12-r15 be preserved by the called function (see
Table 6.4 on page 121).

The program in Listing 8.4 shows how to save and restore the values in these registers.
Notice that since a stack is a LIFO structure, it is necessary to pop the values off the top of the
stack in the reverse order from how they were pushed on.

    # saveRegisters1.s
    # The rbx and r12 - r15 registers must be preserved by called function.
    # Sets a bit pattern in these registers, but restores original values
    # in the registers before returning to the OS.
    # Bob Plantz - 8 June 2009

            .text
            .globl  main
            .type   main, @function
    main:
            pushq   %rbp                # save caller's frame pointer
            movq    %rsp, %rbp          # establish our frame pointer

            pushq   %rbx                # "must-save" registers
            pushq   %r12
            pushq   %r13
            pushq   %r14
            pushq   %r15

            movb    $0x12, %bl          # "use" the registers
            movw    $0xabcd, %r12w
            movl    $0x1234abcd, %r13d
            movq    $0xdcba, %r14
            movq    $0x9876, %r15

            popq    %r15                # restore registers
            popq    %r14
            popq    %r13
            popq    %r12
            popq    %rbx

            movl    $0, %eax            # return 0
            popq    %rbp                # restore caller's frame pointer
            ret                         # back to caller

Listing 8.4: Save and restore the contents of the rbx and r12-r15 registers. See Table 6.4, page
121, for the registers that should be saved/restored in a function if they are used in
the function.

The problem with this technique is maintaining the address in the stack pointer at a 16-byte
boundary. Another way to save/restore the registers will be given in Section 11.2.

8.3 Local Variables on the Call Stack

Now we see that we can store values on the stack by pushing them, and that the push operation
decreases the value in the stack pointer register, rsp. In other words, allocating variables on the
call stack involves subtracting a value from the stack pointer. Similarly, deallocating variables
from the call stack involves adding a value to the stack pointer.

From this it follows that we can create local variables on the call stack by simply subtracting
the number of bytes required by each variable from the stack pointer. This does not store any data
in the variables, it simply sets aside memory that we can use. (Perhaps you have experienced
the error of forgetting to initialize a local variable in C!)

Next, we have to figure out a way to access this reserved data area on the call stack. Notice

that there are no labels in this area of memory. So we cannot directly use a name like we did

when accessing memory in the .data segment.

We could use the popq and pushq instructions to store data in this area. For example,

    popq %rax
    movq $0, %rax
    pushq %rax

could be used to store zero in a variable. But this technique would obviously be very tedious,
and any changes made to your code would almost certainly lead to a great deal of debugging.
For example, can you figure out the reason I had to do a pop before pushing the value onto the
stack? (Recall that the eight bytes have already been reserved on the stack.)

At first, it may seem tempting to use the stack pointer, rsp, as the reference pointer. But this
creates complications if we wish to use the stack within the function.

A better technique would be to maintain another pointer to the local variable area on the
stack. If we do not change this pointer throughout the function, we can always use the base
register plus offset addressing mode to directly access any of the local variables. The syntax is:

    offset(register_name)

Intel® Syntax:
    [register_name + offset]

When it is zero, the offset is not required.

Base register plus offset: The data value is located in memory. The address of the memory
location is the sum of a value in a register plus an offset value, which can be an 8-, 16- or
32-bit signed integer.
    syntax: place parentheses around the register name with the offset value
            immediately before the left parenthesis.
    examples: -8(%rbp); (%rsi); 12(%rax)
    Intel® Syntax: [rbp - 8]; [rsi]; [rax + 12]

The appropriate register for implementing this is the frame pointer, rbp.

When a function is called, the calling function begins the process of creating an area on

the stack, called the stack frame. Any arguments that need to be passed on the call stack are

first pushed onto it, as described in Section 11.2. Then the call instruction pushes the return

address onto the call stack (page 156).

The first thing that the called function must do is to complete the creation of the stack frame.
The function prologue, first introduced in Section 7.2 (page 133), performs the following actions
at the very beginning of each function:

1. Save the caller's value in the frame pointer on the stack.
2. Copy the current value in the stack pointer to the frame pointer.
3. Subtract a value from the stack pointer to allow for the local variables.

Once the function prologue has completed the stack frame, we observe that:

• The local variables are located in an area of the call stack between the addresses in the
  rsp and rbp registers.
• The rbp register is a pointer to the bottom (the numerically highest address) of the local
  variable area.
• The remaining area of the stack can be accessed using the stack pointer (rsp) as always.

Notice that each local variable is located at some fixed offset from the base register, rbp. In fact,
it's a negative offset.

Listing 8.5 is the compiler-generated assembly language for the program in Listing 2.4 (page
23). Comments have been added to explain the parts of the code being discussed here.

     1          .file   "echoChar1.c"
     2          .section        .rodata
     3  .LC0:
     4          .string "Enter one character: "
     5  .LC1:
     6          .string "You entered: "
     7          .text
     8          .globl  main
     9          .type   main, @function
    10  main:
    11          pushq   %rbp               # save caller's frame pointer
    12          movq    %rsp, %rbp         # establish our frame pointer
    13          subq    $16, %rsp          # space for local variable
    14          movl    $21, %edx          # 21 characters
    15          movl    $.LC0, %esi        # address of "Enter ... "
    16          movl    $1, %edi           # STDOUT_FILENO
    17          call    write
    18          leaq    -16(%rbp), %rsi    # address of aLetter var.
    19          movl    $1, %edx           # 1 character
    20          movl    $0, %edi           # STDIN_FILENO
    21          call    read
    22          movl    $13, %edx          # 13 characters
    23          movl    $.LC1, %esi        # address of "You ... "
    24          movl    $1, %edi           # STDOUT_FILENO
    25          call    write
    26          leaq    -16(%rbp), %rsi    # address of aLetter var.
    27          movl    $1, %edx           # 1 character
    28          movl    $1, %edi           # STDOUT_FILENO
    29          call    write
    30          movl    $0, %eax           # return 0;
    31          leave                      # undo stack frame
    32          ret                        # back to caller
    33          .size   main, .-main
    34          .ident  "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
    35          .section        .note.GNU-stack,"",@progbits

Listing 8.5: Echoing characters entered from the keyboard (gcc assembly language). Comments
added. Refer to Listing 2.4 for the original C version.

The function begins by pushing a copy of the caller's frame pointer (in the rbp register) onto
the call stack, thus saving it. Next it sets the frame pointer for this function at the current top
of the stack. These two actions establish a reference point to the stack frame for this function.

Next the program allocates sixteen bytes on the stack for the local variable, thus growing the
stack frame by sixteen bytes. It may seem wasteful to set aside so much memory since the only
variable in this program requires only one byte of memory, but the ABI [25] specifies that the
stack pointer (rsp) should be on a sixteen-byte address boundary before calling another function.
The easiest way to comply with this specification is to allocate memory for local variables in
multiples of sixteen.

Figure 8.5 shows the state of the stack just after the prologue has been executed. The return

address to the calling function is safely stored on the stack, followed by the caller's frame pointer

value. The stack pointer (rsp) has been moved up the stack to allow memory for the local

variable. If this function needs to push data onto the stack, such activity will not interfere with

the local variable, the caller's frame pointer value, nor the return address. The frame pointer

(rbp) provides a reference point for accessing the local variable.

IMPORTANT: The space for the local variables must be allocated immediately after establishing the

frame pointer. Any other use of the stack within the function, e.g., saving registers, must be done

after allocating space for local variables.

[Figure 8.5 diagram: offsets from rbp, from high addresses to low: +8 Return address; +0 Caller's rbp (rbp points here); between -16 and 0, 15 bytes of unused memory and 1 byte for aLetter at -16 (rsp points here); memory at lower addresses is available for use as a stack by this function.]

Figure 8.5: Local variables in the program from Listing 8.5 are allocated on the stack. Numbers
on the left are offsets from the address in the frame pointer (rbp register).

Most of the code in the body of the function is already familiar to you, but the instruction
that loads the address of the local variable, aLetter, into the rsi register:

18 leaq -16(%rbp), %rsi # address of aLetter var.

is new. It uses the base register plus offset addressing mode for the source.

We can see from the instruction on line 18 that the aLetter variable is located negative
sixteen bytes away from the address in the rbp register.

As with the write function, the second argument to the read function must be the address
of a variable. However, the address of aLetter cannot be known when the program is compiled
and linked because it is the address of a variable that exists in the stack frame. There is no way
for the compiler or linker to know where this function's stack frame will be in memory when it
is called. The address of the variable must be computed at run time.

Each instruction that accesses a stack frame variable must compute the variable's address,
which is called the effective address. The instruction for computing addresses is load effective
address: leal for 32-bit and leaq for 64-bit addresses. The syntax of the lea instruction is

    leaw source, %register

where w = l for 32-bit, q for 64-bit.

Intel® Syntax:
    lea register, source

(Use lea to get a memory address; use mov to access what is stored at the address.)

The source operand must be a memory location. The lea instruction computes the
effective address of the source operand and stores that address in the destination register. So the
instruction

    leaq -16(%rbp), %rsi

takes the value in rbp (the base address of this function's stack frame), adds -16 to it, and stores
this sum in rsi. Now rsi contains the address of the variable aLetter.

So the following code sequence:

18 leaq -16(%rbp), %rsi # address of aLetter var.
19 movl $1, %edx # 1 character
20 movl $0, %edi # STDIN_FILENO
21 call read

implements the C statement

14 read(STDIN_FILENO, &aLetter, 1); // one character

in the original C program (Listing 2.4, page 23).

Some notes about the read function call:

• The characters read from the keyboard must be stored in memory. You cannot pass the
  name of a CPU register to the read function.
• The number of bytes actually read from the keyboard is returned in the eax register. So if
  the current function is using eax, the value will be changed by the call to read.
• The read function is a C wrapper that sets up the registers for the syscall instruction.
  Unfortunately, there is no guarantee that it restores the values that were in the registers
  when it was called.

IMPORTANT: Since neither the write nor the read system call functions are guaranteed to restore
the values in the registers, your program must save any required register values before calling
either of these functions.

There is also a new instruction on line 31:

31 leave # undo stack frame

Just before this function exits the portion of the stack frame allocated by this function must be
released and the value in the rbp register restored. The leave instruction performs the actions:

    movq %rbp, %rsp
    popq %rbp

which effectively

1. deletes the local variables
2. restores the caller's frame pointer value

After the epilogue has been executed, the stack is in the state shown in Figure 8.6. The
stack pointer (rsp) points to the address that will return program flow back to the instruction
immediately after the call instruction that called this function. Although the data that was
stored in the memory which is now above the stack pointer is still there, it is a violation of stack
protocol to access it.

[Figure 8.6 diagram: rsp points to the Return address at offset +8; the gray area holds the Caller's rbp at +0, 15 bytes of unused memory, and 1 byte for aLetter at -16.]

Figure 8.6: Local variable stack area in the program from Listing 8.5. Although the values in the
gray area may remain they are invalid; using them at this point is a programming
error.

One more step remains in completing execution of this function: returning to the calling
function. Since the return address is at the top of the call stack, this is a simple matter of
popping the address from the top of the stack into the rip register. This requires a special
instruction,

    ret

which does not require any arguments.

Recall that there are two classes of local variables in C:

Automatic variables are created when the function is first entered. They are deleted upon exit
    from the function, so any value stored in them during execution of the function is lost.

Static variables are created when the program is first started. Any values stored in them
    persist throughout the lifetime of the program.


Most local variables in a function are automatic variables. General purpose registers are
used for local variables whenever possible. Since there is only one set of general purpose
registers, a function that is using one for a variable must be careful to save the value in the register
before calling another function. Register usage is specified by the ABI [25] as shown in Table
6.4 on page 121. But you should not write code that depends upon everyone else following these
recommendations, and there are only a small number of registers available for use as variables.
In C/C++, most of the automatic variables are typically allocated on the call stack. As you have
seen in the discussion above, they are created (automatically) in the prologue when the function
first starts and are deleted in the epilogue just as it ends. Static variables must be stored in the
data segment.

We are now in a position to write the echoChar program in assembly language. The program
is shown in Listing 8.6.

     1  # echoChar2.s
     2  # Prompts user to enter a character, then echoes the response
     3  # Bob Plantz - 8 June 2009
     4
     5  # Useful constants
     6          .equ    STDIN,0
     7          .equ    STDOUT,1
     8  # Stack frame
     9          .equ    aLetter,-16
    10          .equ    localSize,-16
    11  # Read only data
    12          .section  .rodata
    13  prompt:
    14          .string "Enter one character: "
    15          .equ    promptSz,.-prompt-1
    16  msg:
    17          .string "You entered: "
    18          .equ    msgSz,.-msg-1
    19  # Code
    20          .text                          # switch to text section
    21          .globl  main
    22          .type   main, @function
    23  main:
    24          pushq   %rbp                   # save caller's frame pointer
    25          movq    %rsp, %rbp             # establish our frame pointer
    26          addq    $localSize, %rsp       # for local variable
    27
    28          movl    $promptSz, %edx        # prompt size
    29          movl    $prompt, %esi          # address of prompt text string
    30          movl    $STDOUT, %edi          # standard out
    31          call    write                  # invoke write function
    32
    33          movl    $2, %edx               # 1 character, plus newline
    34          leaq    aLetter(%rbp), %rsi    # place to store character
    35          movl    $STDIN, %edi           # standard in
    36          call    read                   # invoke read function
    37
    38          movl    $msgSz, %edx           # message size
    39          movl    $msg, %esi             # address of message text string
    40          movl    $STDOUT, %edi          # standard out
    41          call    write                  # invoke write function
    42
    43          movl    $2, %edx               # 1 character, plus newline
    44          leaq    aLetter(%rbp), %rsi    # place where character stored
    45          movl    $STDOUT, %edi          # standard out
    46          call    write                  # invoke write function
    47
    48          movl    $0, %eax               # return 0
    49          movq    %rbp, %rsp             # delete local variables
    50          popq    %rbp                   # restore caller's frame pointer
    51          ret                            # back to calling function

Listing 8.6: Echoing characters entered from the keyboard (programmer assembly language).

This program introduces another assembler directive (lines 6, 7, 9, 10, 15, 18):

    .equ name, expression

The .equ directive evaluates the expression and sets the name equivalent to it. Note that the
expression is evaluated during assembly, not during program execution. In essence, the name
and its value are placed on the symbol table during the first pass of the assembler. During the
second pass, wherever the programmer has used "name" the assembler substitutes the number
that the expression evaluated to during the first pass.

You see an example on line 9 of Listing 8.6:

9 .equ aLetter,-16

In this case the expression is simply -16. Then when the symbol is used on line 34:

34 leaq aLetter(%rbp), %rsi # place to store character

the assembler substitutes -16 during the second pass, and it is exactly the same as if the

programmer had written:

leaq -16(%rbp), %rsi # place to store character

Of course, using .equ to provide a symbolic name makes the code much easier to read.

An example of a more complex expression is shown on lines 13–15:

13 prompt:

14 .string "Enter one character: "

15 .equ promptSz,.-prompt-1

The "." means "this address". Recall that the .string directive allocates one byte for each

character in the text string, plus one for the NUL character. So it has allocated 22 bytes here. The

expression computes the difference between the beginning and the end of the memory allocated

by .string, minus 1. Thus, promptSz is entered on the symbol table as being equivalent to 21.

And on line 28 the programmer can use this symbolic name,

28 movl $promptSz, %edx # prompt size

which is much easier than counting each of the characters by hand and writing:

movl $21, %edx # prompt size

More importantly, the programmer can change the text string and the assembler will compute

the new length and change the number in the instruction automatically. This is obviously much

less prone to error.
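For readers more comfortable in C, the same idea can be sketched with sizeof, which the compiler also evaluates before the program runs. This is only an analogy to the .equ expression, not how the assembler works internally; the names PROMPT_SZ and prompt_size are hypothetical and do not appear in the listings.

```c
#include <stddef.h>

/* The compiler sizes this array from the literal, much as .string
 * allocates one byte per character plus one for the NUL. */
static const char prompt[] = "Enter one character: ";

/* sizeof counts the terminating NUL, so subtract 1, mirroring the
 * assembly expression .-prompt-1. Evaluated at compile time. */
enum { PROMPT_SZ = sizeof(prompt) - 1 };

/* Hypothetical helper so the constant can be inspected at run time. */
size_t prompt_size(void)
{
    return PROMPT_SZ;
}
```

As with .equ, PROMPT_SZ occupies no memory of its own; changing the text of prompt changes the constant automatically.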

Be careful not to mistake the .equ directive as creating a variable. It does not allocate any memory.

It simply gives a symbolic name to a number you wish to use in your program, thus making your

code easier to read.

A comment about programming style when using the .equ directive is appropriate here.

Notice that the programmer has used it to give the same numerical value to two different symbols:

9 .equ aLetter,-16

10 .equ localSize,-16

Each symbol is used differently in the code. It would be confusing to a reader if only one symbol

were used in both places.

8.3.1 Calling printf and scanf in 64-bit Mode

The printf function can be used to format data and write it to the screen, and the scanf function

can be used to read formatted input from the keyboard. In order to see how to call these two

functions in assembly language we begin with the C program in Listing 8.7.

1 /*

2  * echoInt1.c

3  * Reads an integer from the keyboard and echos it.

4  * Bob Plantz - 11 June 2009

5  */

6

7 #include <stdio.h>

8

9 int main(void)

10 {

11 int anInt;

12

13 printf("Enter an integer number: ");

14 scanf("%i", &anInt);

15 printf("You entered: %i\n", anInt);

16

17 return 0;

18 }

Listing 8.7: Calling printf and scanf to write and read formatted I/O (C).

The assembly language generated by the gcc compiler is shown in Listing 8.8. Comments have

been added to explain the printf and scanf calls.

1 .file "echoInt1.c"

2 .section .rodata

3 .LC0:

4 .string "Enter an integer number: "

5 .LC1:

6 .string "%i"

7 .LC2:

8 .string "You entered: %i\n"

9 .text

10 .globl main

11 .type main, @function

12 main:

13 pushq %rbp

14 movq %rsp, %rbp

15 subq $16, %rsp

16 movl $.LC0, %edi # address of message

17 movl $0, %eax # no floats

18 call printf

19 leaq -4(%rbp), %rsi # address of anInt

20 movl $.LC1, %edi # address of format string

21 movl $0, %eax # no floats

22 call scanf

23 movl -4(%rbp), %esi # copy of anInt value

24 movl $.LC2, %edi # address of format string

25 movl $0, %eax # no floats

26 call printf

27 movl $0, %eax

28 leave

29 ret

30 .size main, .-main

31 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

32 .section .note.GNU-stack,"",@progbits

Listing 8.8: Calling printf and scanf to write and read formatted I/O (gcc assembly language).

The first call to printf passes only one argument. However, on line 17 in Listing 8.8, 0 is

passed in eax:

16 movl $.LC0, %edi # address of message

17 movl $0, %eax # no floats

18 call printf

The eax register is not listed as being used for passing arguments (see Section 8.1).

Both printf and scanf can take a variable number of arguments. The ABI [25] specifies

that the total number of arguments passed in SSE registers must be passed in rax. As you will

learn in Section 14.5, the SSE registers are used for passing floats in 64-bit mode. Since no float

arguments are being passed in this call, rax must be set to 0. Recall that setting eax to 0 also

sets the high-order bits of rax to 0 (Table 7.1, page 141).
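To see what "a variable number of arguments" looks like from the C side, here is a minimal sketch of a variadic function. The name sum_ints is hypothetical and is not part of the book's programs. On an x86-64 GNU/Linux system, a call such as sum_ints(3, 1, 2, 3) passes no floating-point values, so the compiler would be expected to set eax to 0 before the call, just as it does for printf and scanf above.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
 * Like printf, it accepts a variable number of arguments. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);          /* begin after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int); /* fetch the next int argument */
    va_end(args);

    return total;
}
```

Compiling a small caller with gcc -S is one way to check whether the movl $0, %eax instruction appears before the call on your system.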

The call to scanf on line 14 in the C version passes two arguments:

scanf("%i", &anInt);

That call is implemented in assembly language on lines 19–22 in Listing 8.8:

19 leaq -4(%rbp), %rsi # address of anInt

20 movl $.LC1, %edi # address of format string

21 movl $0, %eax # no floats

22 call scanf

Again, we see that the eax register must be set to 0 because there are no float arguments.

The program written in assembly language (Listing 8.9) is easier to read because the

programmer has used symbolic names for the constants and the stack variable.

1 # echoInt2.s

2 # Prompts user to enter an integer, then echoes the response

3 # Bob Plantz -- 11 June 2009

4

5 # Stack frame

6 .equ anInt,-4

7 .equ localSize,-16

8 # Read only data

9 .section .rodata

10 prompt:

11 .string "Enter an integer number: "

12 scanFormat:

13 .string "%i"

14 printFormat:

15 .string "You entered: %i\n"

16 # Code

17 .text # switch to text section

18 .globl main

19 .type main, @function

20 main:

21 pushq %rbp # save caller's frame pointer

22 movq %rsp, %rbp # establish our frame pointer

23 addq $localSize, %rsp # for local variable

24

25 movl $prompt, %edi # address of prompt text string

26 movq $0, %rax # no floating point args.

27 call printf # invoke printf function

28

29 leaq anInt(%rbp), %rsi # place to store integer

30 movl $scanFormat, %edi # address of scanf format string

31 movq $0, %rax # no floating point args.

32 call scanf # invoke scanf function

33

34 movl anInt(%rbp), %esi # the integer

35 movl $printFormat, %edi # address of printf text string

36 movq $0, %rax # no floating point args.

37 call printf # invoke printf function

38

39 movl $0, %eax # return 0

40 movq %rbp, %rsp # delete local variables

41 popq %rbp # restore caller's frame pointer

42 ret # back to calling function

Listing 8.9: Calling printf and scanf to write and read formatted I/O (programmer assembly

language).

8.4 Designing the Local Variable Portion of the Call Stack

When designing a function in assembly language, you need to determine where each local vari-

able will be located in the memory that is allocated on the call stack. The ABI [25] specifies

that:

1. Each variable should be aligned on an address that is a multiple of its size.

2. The address in the stack pointer (rsp) should be a multiple of 16 immediately before an-

other function is called.

These rules are best illustrated by considering the program in Listing 8.10.

1 /*

2  * varAlign1.c

3  * Allocates some local variables to illustrate their

4  * alignment on the call stack.

5  * Bob Plantz - 11 June 2009

6  */

7

8 #include <stdio.h>

9

10 int main(void)

11 {

12 char alpha, beta, gamma;

13 char *letterPtr;

14 int number;

15 int *numPtr;

16

17 alpha = 'A';

18 beta = 'B';

19 gamma = 'C';

20 number = 123;

21 letterPtr = &alpha;

22 numPtr = &number;

23

24 printf("%c %c %c %i\n", *letterPtr,

25 beta, gamma, *numPtr);

26

27 return 0;

28 }

Listing 8.10: Some local variables (C).

The assembly language generated by the compiler is shown in Listing 8.11 with comments added

for explanation.

1 .file "varAlign1.c"

2 .section .rodata

3 .LC0:

4 .string "%c %c %c %i\n"

5 .text

6 .globl main

7 .type main, @function

8 main:

9 pushq %rbp

10 movq %rsp, %rbp

11 subq $32, %rsp # 2 * 16

12 movb $65, -1(%rbp) # alpha = 'A';

13 movb $66, -2(%rbp) # beta = 'B';

14 movb $67, -3(%rbp) # gamma = 'C';

15 movl $123, -8(%rbp) # number = 123;

16 leaq -1(%rbp), %rax

17 movq %rax, -16(%rbp) # letterPtr = &alpha;

18 leaq -8(%rbp), %rax

19 movq %rax, -24(%rbp) # numPtr = &number;

20 movq -24(%rbp), %rax

21 movl (%rax), %edx

22 movsbl -3(%rbp),%ecx

23 movsbl -2(%rbp),%edi

24 movq -16(%rbp), %rax

25 movzbl (%rax), %eax

26 movsbl %al,%esi

27 movl %edx, %r8d

28 movl %edi, %edx

29 movl $.LC0, %edi

30 movl $0, %eax

31 call printf

32 movl $0, %eax

33 leave

34 ret

35 .size main, .-main

36 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

37 .section .note.GNU-stack,"",@progbits

Listing 8.11: Some local variables (gcc assembly language).

The char variables take one byte, so they can be aligned on each byte:

12 movb $65, -1(%rbp) # alpha = 'A';

13 movb $66, -2(%rbp) # beta = 'B';

14 movb $67, -3(%rbp) # gamma = 'C';

The next available byte is at -4, but the int requires four bytes and cannot be allocated

at -7 because it must be aligned on a byte address that is a multiple of four. So it is placed at -8:

15 movl $123, -8(%rbp) # number = 123;

The two pointer variables each require eight bytes. So placing letterPtr at -16 and numPtr

at -24 allows enough memory for each and places each on an address that is a multiple of eight.

16 leaq -1(%rbp), %rax

17 movq %rax, -16(%rbp) # letterPtr = &alpha;

18 leaq -8(%rbp), %rax

19 movq %rax, -24(%rbp) # numPtr = &number;

Placing each variable such that the alignment rules are met requires 24 bytes on the stack

for local variables. However, the ABI also states that the stack pointer must be on a 16-byte

address boundary. So we need to allocate 32 bytes for the local variables:

11 subq $32, %rsp # 2 * 16
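The arithmetic behind these offsets can be sketched in C. This is only an illustration of the two alignment rules, not the algorithm gcc actually uses; the names round_up, place_local, and frame_size are hypothetical. It assumes that the address in rbp is itself a multiple of 16, so that an offset that is a multiple of a variable's size gives a properly aligned address.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be nonzero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Reserve `size` bytes below the frame pointer.
 * `used` tracks how many bytes below rbp are already taken.
 * Returns the (negative) offset from rbp for the new variable,
 * chosen so that the offset is a multiple of the variable's size. */
long place_local(size_t *used, size_t size)
{
    *used = round_up(*used + size, size);
    return -(long)*used;
}

/* Total bytes to subtract from rsp: the bytes used by the
 * variables, rounded up to a multiple of 16. */
size_t frame_size(size_t used)
{
    return round_up(used, 16);
}
```

Placing three one-byte variables, one four-byte variable, and two eight-byte variables in that order yields the offsets -1, -2, -3, -8, -16, and -24, and a frame size of 32, which agrees with Listing 8.11.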

Listing 8.12 shows how an assembly language programmer uses symbolic names to write

code that is easier to read.

1 # varAlign2.s

2 # Allocates some local variables to illustrate their

3 # alignment on the call stack.

4 # Bob Plantz - 11 June 2009

5 # Stack frame

6 .equ numPtr,-24

7 .equ letterPtr,-16

8 .equ number,-8

9 .equ gamma,-3

10 .equ beta,-2

11 .equ alpha,-1

12 .equ localSize,-32

13 # Read only data

14 .section .rodata

15 format:

16 .string "%c %c %c %i\n"

17 # Code

18 .text

19 .globl main

20 .type main, @function

21 main:

22 pushq %rbp # save caller's frame pointer

23 movq %rsp, %rbp # establish our frame pointer

24 addq $localSize, %rsp # for local vars

25

26 movb $'A', alpha(%rbp) # initialize variables

27 movb $'B', beta(%rbp)

28 movb $'C', gamma(%rbp)

29 movl $123, number(%rbp)

30

31 leaq alpha(%rbp), %rax # initialize pointers

32 movq %rax, letterPtr(%rbp)

33 leaq number(%rbp), %rax

34 movq %rax, numPtr(%rbp)

35

36 movq numPtr(%rbp), %rax # load pointer

37 movl (%rax), %r8d # for dereference

38 movb gamma(%rbp), %cl

39 movb beta(%rbp), %dl

40 movq letterPtr(%rbp), %rax

41 movb (%rax), %sil

42 movl $format, %edi

43 movq $0, %rax

44 call printf

45

46 movl $0, %eax # return 0 to OS

47 movq %rbp, %rsp # restore stack pointer

48 popq %rbp # restore caller's frame pointer

49 ret

Listing 8.12: Some local variables (programmer assembly language).

Notice the assembly language syntax for single character constants on lines 26–28:

26 movb $'A', alpha(%rbp) # initialize variables

27 movb $'B', beta(%rbp)

28 movb $'C', gamma(%rbp)

The GNU assembly language info documentation specifies that only the first single quote, 'A, is

required. But the C syntax, 'A', also works, so we have used that because it is generally easier

to read.

1

We can summarize the proper sequence of instructions for establishing a local variable

environment in a function:

These three operations MUST be performed EXACTLY in this order at the BEGINNING of each function.

1. Push the calling function's frame pointer on to the stack.

2. Copy the value in the stack pointer register (rsp) into the frame pointer register (rbp) to

establish the frame pointer for the current function.

3. Allocate space for the local variables by moving the stack pointer to a lower address.

Just before ending this function, these three steps need to be undone. Since the frame pointer

is pointing to where the top of the stack was before we allocated memory for local variables, the

local variable memory can be deleted by simply copying the value in the frame pointer to the

stack pointer. Now the calling function's frame pointer value is at the top of the stack. The

ending sequence is:

1. Copy the value in the frame pointer register (rbp) to the stack pointer register (rsp).

2. Pop the value at the top of the stack into the frame pointer register (rbp).

These two operations MUST be performed EXACTLY in this order at the END of each function.

Listing 8.13 shows the general format that must be followed when writing a function. If you

follow this format and do everything in the order that is given for all your functions, you will

have many fewer problems getting them to work properly. If you do not, I guarantee that you

will have many problems.

1 # general.s

2 .text

3 .globl general

4 .type general, @function

5 general:

6 pushq %rbp # save calling function's frame pointer

7 movq %rsp, %rbp # establish our frame pointer

8

9 # Allocate memory for local variables and saving registers here.

10 # Ensure that the address in rsp is a multiple of 16.

11 # Save the contents of general purpose registers that must be

12 # preserved and are used in this function here.

13

14 # The code that implements the function goes here.

15

16 # Restore the contents of the general purpose registers that

Footnote 1: Also, the LaTeX macro used to pretty print listings in this book does not process the single-quote syntax correctly.

17 # were saved above.

18 # Place the return value, if any, in the eax register.

19

20 movq %rbp, %rsp # delete local variables

21 popq %rbp # restore calling function's frame

22 # pointer

23 ret

Listing 8.13: General format of a function written in assembly language.

8.5 Using syscall to Perform I/O

The printf and scanf functions discussed in Section 2.5 (page 13) are C library functions that

convert program data to and from text formats for interacting with users via the screen and

keyboard. The write and read functions discussed in Section 2.8 (page 22) are C wrapper

functions that only pass bytes to output and from input devices, relying on the program to perform

the conversions so that the bytes are meaningful to the I/O device. Ultimately, each of these

functions calls upon the services of the operating system to perform the actual byte transfers to

and from I/O devices.

In assembly language, you do not need to use the C environment. The convention is to begin

program execution at the __start label. (Note that there are two underscore characters.) The

assembler is used as before, but instead of using gcc to link in the C libraries, use ld directly.

You need to specify the entry point of your program. For example, the command for the program

in Listing 8.14 is:

bob$ ld -e __start -o echoChar3 echoChar3.o

When performing I/O you invoke the Linux operations yourself. The technique involves mov-

ing the arguments to specific registers, placing a special code in the eax register, and then using

the syscall instruction to call a function in the operating system. (The way this works is

described in Section 15.6 on page 345.) The operating system will perform the action specified

by the code in the eax register, using the arguments passed in the other registers. The values

required for reading from and writing to files are given in Table 8.3.

system call   eax   edi               rsi                               edx
read          0     file descriptor   pointer to place to store bytes   number of bytes to read
write         1     file descriptor   pointer to first byte to write    number of bytes to write
exit          60

Table 8.3: Register set up for using syscall instruction to read, write, or exit.

In Listing 8.14 we have rewritten the program of Listing 8.6 without using the C

environment.

1 # echoChar3.s

2 # Prompts user to enter a character, then echoes the response

3 # Does not use C libraries

4 # Bob Plantz -- 11 June 2009

5

6 # Useful constants

7 .equ STDIN,0

8 .equ STDOUT,1

9 .equ READ,0

10 .equ WRITE,1

11 .equ EXIT,60

12 # Stack frame

13 .equ aLetter,-16

14 .equ localSize,-16

15 # Read only data

16 .section .rodata # the read-only data section

17 prompt:

18 .string "Enter one character: "

19 .equ promptSz,.-prompt-1

20 msg:

21 .string "You entered: "

22 .equ msgSz,.-msg-1

23 # Code

24 .text # switch to text section

25 .globl __start

26

27 __start:

28 pushq %rbp # save caller's frame pointer

29 movq %rsp, %rbp # establish our frame pointer

30 addq $localSize, %rsp # for local variable

31

32 movl $promptSz, %edx # prompt size

33 movl $prompt, %esi # address of prompt text string

34 movl $STDOUT, %edi # standard out

35 movl $WRITE, %eax

36 syscall # request kernel service

37

38 movl $2, %edx # 1 character, plus newline

39 leaq aLetter(%rbp), %rsi # place to store character

40 movl $STDIN, %edi # standard in

41 movl $READ, %eax

42 syscall # request kernel service

43

44 movl $msgSz, %edx # message size

45 movl $msg, %esi # address of message text string

46 movl $STDOUT, %edi # standard out

47 movl $WRITE, %eax

48 syscall # request kernel service

49

50 movl $2, %edx # 1 character, plus newline

51 leaq aLetter(%rbp), %rsi # place where character stored

52 movl $STDOUT, %edi # standard out

53 movl $WRITE, %eax

54 syscall # request kernel service

55

56 movq %rbp, %rsp # delete local variables

57 popq %rbp # restore caller's frame pointer

58 movl $EXIT, %eax # exit from this process

59 syscall

Listing 8.14: Echo character program using the syscall instruction.

Comparing this program with the one in Listing 8.6, the arguments are the same

and are passed in the same registers. The only difference with using the syscall instruction is that

you have to provide a code for the operation to be performed in the eax register. The complete list

of system operations that can be performed is in the system file /usr/include/asm-x86_64/unistd.h.

(The path on your system may be different.)

To determine the arguments that must be passed to each system operation read section 2 of

the man page for that operation. For example, the arguments for the write system call can be

seen by using

bob$ man 2 write

Then follow the rules in Section 8.1 for placing the arguments in the proper registers.
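If you want to experiment with system call numbers from C before writing the assembly language, a minimal sketch follows. It assumes an x86-64 GNU/Linux system with glibc, whose syscall() library function places its first argument in eax and the remaining arguments in the argument registers before executing the syscall instruction. The function name raw_write is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write and the other call numbers */
#include <unistd.h>        /* syscall() */

/* Ask the kernel to write `count` bytes starting at `buf` to the
 * file descriptor `fd`, bypassing the C write() wrapper function.
 * Returns the number of bytes written, or -1 on error. */
long raw_write(int fd, const void *buf, unsigned long count)
{
    return syscall(SYS_write, fd, buf, count);
}
```

On such a system SYS_read, SYS_write, and SYS_exit are expected to have the values 0, 1, and 60 shown in Table 8.3. A call such as raw_write(1, "ok\n", 3) should display "ok" on the screen and return 3.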

8.6 Calling Functions, 32-Bit Mode

In 32-bit mode all the arguments are pushed onto the call stack in right-to-left order. Listing

8.15 shows how to call the write() system call function.

1 # fourChars_32.s

2 # displays four characters on the screen using the write() system call.

3 # (32-bit version.)

4 # Bob Plantz - 19 March 2008

5

6 # Read only data

7 .section .rodata

8 Chars:

9 .byte 'A'

10 .byte '-'

11 .byte 'Z'

12 .byte '\n'

13 # Code

14 .text

15 .globl main

16 .type main, @function

17 main:

18 pushl %ebp # save frame pointer

19 movl %esp, %ebp # set new frame pointer

20

21 pushl $4 # send four bytes

22 pushl $Chars # at this location

23 pushl $1 # to screen.

24 call write

25 addl $12, %esp

26

27 movl $0, %eax # return 0;

28 movl %ebp, %esp # restore stack pointer

29 popl %ebp # restore frame pointer

30 ret

Listing 8.15: Displaying four characters on the screen using the write system call function in

assembly language.

After all three arguments have been pushed onto the call stack, it looks like:

location      contents
              ????
esp           1
(esp) + 4     $Chars
(esp) + 8     4

where the notation (esp) + n means "the address in the esp register plus n." The stack pointer,

the esp register, points to the last item pushed onto the call stack. The other two arguments

are stored on the stack below the top item. Don't forget that "below" on the call stack is at

numerically higher addresses because the stack grows toward lower addresses.

When the call instruction is executed, the return address is pushed onto the call stack as

shown here:

location      contents
              ????
esp           return
(esp) + 4     1
(esp) + 8     $Chars
(esp) + 12    4

where "return" is the address where the called function is supposed to return to at the end of

its execution. So the arguments are readily available inside the called function; you will learn

how to access them in Chapter 8. And as long as the called function does not change the return

address, and restores the stack pointer to the position it was in when the function was called, it

can easily return to the calling function.

Now, let's look at what happens to the stack memory area in the assembly language

program in Listing 8.15. Assume that the value in the esp register when the main function is

called is 0xbffffc5c and that the value in the ebp register is 0xbffffc6a. Immediately after the

subl $8, %esp instruction is executed, the stack looks like:

address contents

bffffc50: ????????

bffffc54: ????????

bffffc58: bffffc6a

bffffc5c: important information

the value in the esp register is 0xbffffc50, and the value in the ebp register is 0xbffffc58. The

"?" indicates that the states of the bits in the indicated memory locations are irrelevant to us.

That is, the memory between locations 0xbffffc50 and 0xbffffc57 is "garbage."

We have to assume that the values in bytes number 0xbffffc5c, 5d, 5e, and 5f were placed

there by the function that called this function and have some meaning to that function. So we

have to be careful to preserve the value there.

Since the esp register contains 0xbffffc50, we can continue using the stack pushing and

popping without disturbing the eight bytes between locations 0xbffffc50 and 0xbffffc57.

These eight bytes are the ones we will use for storing the local variables. And if we take care not

to change the value in the ebp register throughout the function, we can easily access the local

variables.
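The address arithmetic in this walk-through can be checked with a small C model. This is only a sketch that tracks register values; it does not touch the real stack. The structure frame_state and the function simulate_prologue are hypothetical names. It models pushing ebp, copying esp to ebp, and then subtracting the local variable size from esp.

```c
#include <stdint.h>

/* Register values after the prologue, plus where the caller's
 * frame pointer was saved. */
struct frame_state {
    uint32_t esp;        /* stack pointer */
    uint32_t ebp;        /* frame pointer */
    uint32_t saved_ebp;  /* value stored by pushl %ebp */
    uint32_t saved_at;   /* address where it was stored */
};

/* Model: pushl %ebp; movl %esp, %ebp; subl $local_bytes, %esp */
struct frame_state simulate_prologue(uint32_t esp, uint32_t ebp,
                                     uint32_t local_bytes)
{
    struct frame_state s;

    esp -= 4;              /* pushl: stack grows toward lower addresses */
    s.saved_at = esp;      /* caller's ebp is stored at the new top */
    s.saved_ebp = ebp;
    ebp = esp;             /* movl %esp, %ebp */
    esp -= local_bytes;    /* subl $local_bytes, %esp */

    s.esp = esp;
    s.ebp = ebp;
    return s;
}
```

Starting from esp = 0xbffffc5c and ebp = 0xbffffc6a with eight bytes of local variables, the model gives esp = 0xbffffc50 and ebp = 0xbffffc58, with 0xbffffc6a saved at address 0xbffffc58, matching the table above.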

8.7 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. The

page number where the instruction is explained in more detail, which may be in a subsequent

chapter, is also given. This book provides only an introduction to the usage of each instruction.

You need to consult the manuals ([2]–[6], [14]–[18]) in order to learn all the possible uses of

the instructions.

8.7.1 Instructions

data movement:

opcode   source          destination   action              see page:
movs     $imm/%reg       %reg/mem      move                141
movsss   $imm/%reg       %reg/mem      move, sign extend   216
movzss   $imm/%reg       %reg/mem      move, zero extend   217
popw                     %reg/mem      pop from stack      163
pushw    $imm/%reg/mem                 push onto stack     163

s = b, w, l, q; w = l, q

arithmetic/logic:

opcode   source          destination   action                   see page:
cmps     $imm/%reg       %reg/mem      compare                  209
incs                     %reg/mem      increment                220
leaw     mem             %reg          load effective address   167
subs     $imm/%reg       %reg/mem      subtract                 190

s = b, w, l, q; w = l, q

program flow control:

opcode    location   action                 see page:
call      label      call function          156
je        label      jump equal             211
jmp       label      jump                   213
jne       label      jump not equal         211
leave                undo stack frame       168
ret                  return from function   168
syscall              call kernel function   177

8.7.2 Addressing Modes

register direct: The data value is located in a CPU register.

syntax: name of the register with a "%" prefix.

example: movl %eax, %ebx

immediate data: The data value is located immediately after the instruction. Source operand only.

syntax: data value with a "$" prefix.

example: movl $0xabcd1234, %ebx

base register plus offset: The data value is located in memory. The address of the memory location is the sum of a value in a base register plus an offset value.

syntax: use the name of the register with parentheses around the name and the offset value immediately before the left parenthesis.

example: movl $0xaabbccdd, 12(%eax)

8.8 Exercises

8-1 (§8.1) Enter the C program in Listing 8.1 and get it to work correctly. Run the program

under gdb, setting a break point at the call to write. When the program breaks, use the

si (Step one instruction exactly) command to execute the instructions that load registers

with the arguments. As you do this, keep track of the contents in the appropriate argument

registers and the rip register. What is the address where the text string is stored? If you

single step into the write function, use the cont command to continue through it.

8-2 (§8.2) Modify the program in Listing 8.3 so that the stack grows from lower numbered

array elements to higher numbered ones.

8-3 (§8.2) Enter the assembly language program in Listing 8.4 and show that the rbp and

rsp registers are also saved and restored by this function.

8-4 (§8.3) Enter the C program in Listing 2.4 (page 23) and compile it with the

debugging option, -g. Run the program under gdb, setting a break point at each of the calls to

write and read. Each time the program breaks, use the si (Step one instruction exactly)

command to execute the instructions that load registers with the arguments. As you do

this, keep track of the contents in the appropriate argument registers and the rip

register. What are the addresses where the text strings are stored? What is the address of the

aLetter variable? If you single step into either the write or read functions, use the cont

command to continue through it.

8-5 (§8.3) Modify the assembly language program in Listing 8.6 such that it also reads the

newline character when the user enters a single character. Run the program with gdb. Set

a breakpoint at the first instruction, then run the program. When it breaks, write down

the values in the rsp and rbp registers. Write down the changes in these two registers as

you single step (si command) through the first three instructions.

Set breakpoints at the instruction that calls the read function and at the next instruction

immediately after that one. Examine the values in the argument-passing registers.

From the addresses you wrote down above, determine where the two characters (user's

character plus newline) that are read from the keyboard will be stored, and examine that

area of memory.

Use the cont command to continue execution through the read function. Enter a character.

When the program breaks back into gdb, examine the area of memory again to make sure

the two characters got stored there.

8-6 (§8.3) Write a program in assembly language that prompts the user to enter an integer,

then displays its hexadecimal equivalent.

8-7 (§8.3) Write a program in assembly language that "declares" four char variables and four

int variables, and initializes all eight variables with appropriate values. Then call printf

to display the values of all eight variables with only one call.

Chapter 9

Computer Operations

We are now ready to look more closely at the instructions that control the CPU. This will only

be an introduction to the topic. We will examine the most common operations: assignment,

addition, and subtraction. Additional operations will be described in subsequent chapters.

Each assembly language instruction must be translated into its corresponding machine code,

including the locations of any data it manipulates. It is the bit pattern of the machine code that

directs the activities of the control unit.

The goal here is to show you that a computer performs its operations based on bit patterns.

As you read through this material, keep in mind that even though this material is quite

tedious, the operations are very simple. Fortunately, instruction execution is very fast, so lots of

meaningful work can be done by the computer.

9.1 The Assignment Operator

The C/C++ assignment operator, "=", causes the expression on the right-hand side of the operator

to be evaluated and the result to be associated with the variable that is named on the left-hand

side. Subsequent uses of the variable name in the program will evaluate to this same value. For

example,

int x;

.....

x = 123;

will assign the integer 123 to the variable x. If x is later used in an expression, the value

assigned to x will be used in evaluating the expression. For example, the expression

2 * x;

would evaluate to 246.

This assumes that the expression on the right-hand side evaluates to the same data type

as the variable on the left-hand side. If not, some automatic type casting may occur, or the

compiler may indicate an error. We ignore the issue of data type for now and will discuss it at

several points when appropriate. For now, we are working with arbitrary bit patterns that have

no meaning as "data."

We now explore what assignment means at the assembly language level. The variable dec-

laration,

int x;

causes memory to be allocated and the location of that memory to be given the name "x." That

is, other parts of the program can refer to the memory location where the value of x is stored by

using the name "x." The type name in the declaration, int, tells the compiler how many bytes

to allocate and the code used to represent the data stored at this location. The int type uses the

two's complement code. The assignment statement,

x = 123;

sets the bit pattern in the location named x to 0x0000007b, the two's complement code for the

integer 123. The assignment statement

x = -123;

sets the bit pattern in the location named x to 0xffffff85, the two's complement code for the

integer -123.
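These two bit patterns are easy to confirm in C. The sketch below assumes a 32-bit int, as on x86-64 GNU/Linux; the function name bit_pattern is hypothetical. Converting a signed value to an unsigned type of the same width is defined by the C standard to work modulo 2^32, which yields exactly the two's complement bit pattern.

```c
#include <stdint.h>

/* Return the 32-bit pattern used to store the given integer.
 * The signed-to-unsigned conversion is performed modulo 2^32,
 * so negative values map to their two's complement codes. */
uint32_t bit_pattern(int32_t value)
{
    return (uint32_t)value;
}
```

Printing bit_pattern(123) and bit_pattern(-123) with the "%08x" format of printf should display 0000007b and ffffff85, respectively.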

Let us consider the simplest case where

• the allocated memory is within the CPU (i.e., a register).

• the bit pattern has no "real world" meaning.

That is, we will consider a program that simply sets a bit pattern in a CPU register. A C program

to do this is shown in Listing 9.1.

1 /*

2  * assignment1.c

3  * Assign a 32-bit pattern to a register

4  *

5  * Bob Plantz - 11 June 2009

6  */

7

8 #include <stdio.h>

9

10 int main(void)

11 {

12 register int x;

13

14 x = 0xabcd1234;

15

16 printf("x = %i\n", x);

17

18 return 0;

19 }

Listing 9.1: Assignment to a register variable (C).

The register modifier "advises" the compiler to use a CPU register for the integer variable

named "x." And the notation 0xabcd1234 means that abcd1234 is written in hexadecimal. (Recall

that hexadecimal is used as a compact notation for representing bit patterns.) When the C

program in Listing 9.1 is compiled into its assembly language equivalent with no optimization:

bob$ gcc -S -O0 -fno-asynchronous-unwind-tables assignment1.c

the gcc compiler generates the assembly language program shown in Listing 9.2, with a

comment added to show where the assignment operation takes place.

1 .file "assignment1.c"

2 .section .rodata

3 .LC0:

4 .string "x = %i\n"

5 .text

6 .globl main

7 .type main, @function

8 main:

9 pushq %rbp

10 movq %rsp, %rbp

11 movl $-1412623820, %esi # x = 0xabcd1234;

12 movl $.LC0, %edi

13 movl $0, %eax

14 call printf

15 movl $0, %eax

16 leave

17 ret

18 .size main, .-main

19 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

20 .section .note.GNU-stack,"",@progbits

Listing 9.2: Assignment to a register variable (gcc assembly language). Comment added to show

the assignment operation.

The C assignment operation is implemented with the mov instruction. For example, in Listing 9.1,

14 x = 0xabcd1234;

is implemented with

11 movl $-1412623820, %esi # x = 0xabcd1234;

on line 11 in Listing 9.2. We can see that the compiler chose to use the esi register as the x

variable.

The instructions on lines 12-14 implement the call to the printf function. One reason for the call to the printf function is to prevent the compiler from eliminating the assignment statement during its optimization of this function. Yes, even with the -O0 option the compiler does some optimization.

Compare this to Listing 7.4 on page 144. Notice that the prologue

main:

pushq %rbp

movq %rsp, %rbp

and epilogue

leave

ret

of this function are the same.

The mov instruction has an "l" ("ell", not "one") appended to it to indicate that the operand size is 32 bits. This is redundant because the register named as an operand, esi, is 32 bits, but it is the required syntax. The Intel syntax does not include this redundancy. If we consider the Intel syntax:

Intel® Syntax
        mov esi, -1412623820

we see the three other differences noted in Section 7.2.2 (page 141):

  • the operand order is opposite,
  • the AT&T syntax requires a "%" prefix to the name of a register, and
  • the AT&T syntax requires a "$" prefix to the immediate data.

These differences are specific to the assembler program being used and are not relevant to the behavior of the CPU. The assembler program will translate the assembly language instruction into the correct machine language code.

You may wonder why the gcc compiler assigns the constant -1412623820 to the variable, while the C version of the program assigns 0xabcd1234. The answer is that they are the same values. The first is expressed in decimal and the second in hexadecimal. We discussed the equivalence of decimal and hexadecimal in Section 2.2 (page 8), and we discussed signed decimal integers in Section 3.3 (page 34).

In Listing 9.3 we show the essential assembly language required to implement the C program

from Listing 9.1.


1 # assignment2.s

2 # Assigns a 32-bit pattern to the esi register.

3 # Bob Plantz - 11 June 2009

4

5 .text

6 .globl main

7 .type main, @function

8 main:

9 pushq %rbp # save caller's base pointer

10 movq %rsp, %rbp # establish our base pointer

11

12 movl $0xabcd1234, %esi # store a bit pattern in esi

13

14 movl $0, %eax # return 0 to caller

15 movq %rbp, %rsp # restore stack pointer

16 popq %rbp # restore caller's base pointer

17 ret # back to caller

Listing 9.3: Assignment to a register variable (programmer assembly language).

Compare Listing 9.3 to Listing 7.5 on page 144. Note that

12 movl $0xabcd1234, %esi # set a bit pattern in esi

is the only assembly language statement that was added to the program. From this comparison,

you can see that this assembly language statement implements the two C statements:

register int x;

x = 0xabcd1234;

Like the compiler (Listing 9.2), we are using the esi register as our variable. We can use the registers in Table 6.4 (page 121) as variables, except the stack pointer, %rsp, which has special uses. The "%" prefix tells the assembler that these are names of registers, hence in the CPU, and not labels on memory locations.

Let us look more closely at the program in Listing 9.3. I used an editor to enter the code, then assembled and linked it. Since it does not produce a display on the screen, I used gdb to observe the changes in the registers. My typing is boldface.

$ gdb assignment2

GNU gdb 6.8-debian

Copyright (C) 2008 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law. Type "show copying"

and "show warranty" for details.

This GDB was configured as "x86_64-linux-gnu"...

(gdb) li

1 # assignment2.s

2 # Assigns a 32-bit pattern to the esi register.

3 # Bob Plantz - 11 Jun 2009

4

5 .text

6 .globl main

7 .type main, @function

8 main:

9 pushq %rbp # save caller's frame pointer

10 movq %rsp, %rbp # establish our frame pointer


I use the li command to list part of the program. This allows us to see where I should set the first breakpoint.

(gdb) br 9

Breakpoint 1 at 0x4004ac: file assignment2.s, line 9.

I set it on the first instruction of the program.

(gdb) run

Starting program: /home/bob/my_book_64_size/progs/chap09/assignment2

Breakpoint 1, main () at assignment2.s:9

9 pushq %rbp # save caller's frame pointer

Current language: auto; currently asm

I run the program, it breaks at the first breakpoint, and I can display the registers.

(gdb) i r rax rsi rsp rbp rip

rax 0x7fa31ab73ac0 140338504612544

rsi 0x7fff22d950f8 140733778055416

rsp 0x7fff22d95028 0x7fff22d95028

rbp 0x0 0x0

rip 0x4004ac 0x4004ac <main>

The i r rax rsi rsp rbp rip (info registers) command displays the contents of the registers that are used in this program. Note that the value in the rip register (the instruction pointer) is 0x4004ac. If you replicate this example (a good thing to do) you will probably get different values in your registers.

(gdb) si

10 movq %rsp, %rbp # establish our frame pointer

Next I use the single instruction (si) command to execute one instruction.

(gdb) i r rax rsi rsp rbp rip

rax 0x7fa31ab73ac0 140338504612544

rsi 0x7fff22d950f8 140733778055416

rsp 0x7fff22d95020 0x7fff22d95020

rbp 0x0 0x0

rip 0x4004ad 0x4004ad <main+1>

I display the new state of the registers. Notice that the rip register has changed from 0x4004ac to 0x4004ad. This tells us that the instruction that was just executed, pushq %rbp, is 0x4004ad - 0x4004ac = 1 byte long. The numbers in the right-hand column show the decimal equivalent of the bit patterns for some of the registers. The instruction that is about to be executed will copy the value in the rsp register to the rbp register and the next one will set the thirty-two bits of the esi register to 0xabcd1234.

(gdb) si

main () at assignment2.s:12

12 movl $0xabcd1234, %esi # set a bit pattern in esi

(gdb) si

14 movl $0, %eax # return 0 to caller


I execute two instructions by using the si command twice.

(gdb) i r rax rsi rsp rbp rip

rax 0x7fa31ab73ac0 140338504612544

rsi 0xabcd1234 2882343476

rsp 0x7fff22d95020 0x7fff22d95020

rbp 0x7fff22d95020 0x7fff22d95020

rip 0x4004b5 0x4004b5 <main+9>

The i r command shows us that the rbp register has been changed to equal the rsp register and the esi register has been set to the bit pattern 0xabcd1234. The rsi register actually contains the bit pattern 0x00000000abcd1234; gdb does not display leading zeros. The rip register has changed from 0x4004ad to 0x4004b5. This tells us that the total number of bytes in the two instructions that were just executed, movq %rsp, %rbp and movl $0xabcd1234, %esi, is 0x4004b5 - 0x4004ad = 8 bytes. (Don't forget that these are in hex.)

(gdb) si

15 movq %rbp, %rsp # restore stack pointer

(gdb) i r rax rsi rsp rbp rip

rax 0x0 0

rsi 0xabcd1234 2882343476

rsp 0x7fff22d95020 0x7fff22d95020

rbp 0x7fff22d95020 0x7fff22d95020

rip 0x4004ba 0x4004ba <main+14>

Executing another single instruction shows that the movl $0, %eax instruction does, indeed, store all zeros in the eax register. The program is now poised at the instruction that will begin undoing the stack frame in preparation for the return to the calling function.

(gdb) si

main () at assignment2.s:16

16 popq %rbp # restore caller's frame pointer

(gdb) si

main () at assignment2.s:16

17 ret # back to caller

(gdb) i r rax rsi rsp rbp rip

rax 0x0 0

rsi 0xabcd1234 2882343476

rsp 0x7fff22d95028 0x7fff22d95028

rbp 0x0 0x0

rip 0x4004be 0x4004be <main+18>

Executing two more instructions and displaying the registers shows that the frame pointer register, rbp, has been restored to its original value and the return value (in eax) is correct.

(gdb) cont

Continuing.

Program exited normally.


Finally, I use the continue command (cont) to run the program out to its end. Note: If you use the si command to single step beyond the ret instruction at the end of the main function, gdb will dutifully take you through the system libraries. At best, this is a waste of time.

(gdb) q

$

And, of course, I have to tell gdb to quit.

9.2 Addition and Subtraction Operators

The assembly language instruction to perform binary addition is quite simple:

adds source, destination

where s denotes the size of the operand:

s    meaning     number of bits
b    byte         8
w    word        16
l    longword    32
q    quadword    64

The add instruction adds the source operand to the destination operand using the rules of binary addition, leaving the result in the destination operand. As with the mov instruction, no more than one operand can be a memory location. (You need to use at least one register to add or subtract.) The source operand is not changed. In C/C++ the operation could be expressed as:

        destination += source

For example, the instruction

addq %rax, %rdx

adds the 64-bit value in the rax register to the 64-bit value in the rdx register, leaving the rax

register intact. The instruction

        addw %dx, %r10w

adds the 16-bit value in the dx register to the 16-bit value in the r10w register.

In the Intel syntax, the size of the data is determined by the operand, so the size character (b, w, l, or q) is not appended to the instruction. (And the order of the operands is reversed.)

Intel®

Syntax

add destination, source

We saw in Chapter 3 that addition may cause carry or overflow. Carry and overflow are

recorded in the 64-bit rflags register. The CF is bit number zero, and the OF is bit number eleven (numbering from right to left). Whenever an add instruction is executed both bits are set as shown in Algorithm 9.1.

Algorithm 9.1: Carry Flag and Overflow Flag after add.
1 if there is no carry then
2     CF ← 0;
3 else
4     CF ← 1;
5 if there is no overflow then
6     OF ← 0;
7 else
8     OF ← 1;


If the values being added represent unsigned ints, CF indicates whether the result fits within the operand size or not. If the values represent signed ints, OF indicates whether the result fits within the operand size or not. If the size of the operands is less than 64 bits and the operation produces a carry and/or an overflow, this is not propagated up through the next bits in the destination operand. The carry and overflow conditions are simply recorded in the corresponding bits in the rflags register. (With 32-bit operands and a register destination, the CPU sets the upper 32 bits of the 64-bit register to zero; 8-bit and 16-bit operations leave the upper bits unchanged.)

For example, if we consider the initial conditions

register  contents
rax:      ffff eeee dddd cccc
r8:       2222 4444 6666 8888
CF:       ?
OF:       ?

the instruction

        addl %eax, %r8d

would produce

register  contents
rax:      ffff eeee dddd cccc
r8:       0000 0000 4444 5554
CF:       1
OF:       0

Whereas (starting from the same initial conditions) the instruction

addb %al , %r8b

would produce

register contents

rax: ffff eeee dddd cccc

r8: 2222 4444 6666 8854

CF: 1

OF: 1

The assembly language instruction to perform binary subtraction is

subs source, destination

where s denotes the size of the operand:

s    meaning     number of bits
b    byte         8
w    word        16
l    longword    32
q    quadword    64

The sub instruction subtracts the source operand from the destination operand using the rules

of binary subtraction, leaving the result in the destination operand. As with the mov instruction,

no more than one operand can be a memory location. The source operand is not changed. In

C/C++ the operation could be expressed as:

destination -= source

For example, the instruction

subl %eax, %edx

subtracts the 32-bit value in the eax register from the 32-bit value in the edx register. The

instruction

subb %dh , %ah


subtracts the 8-bit value in the dh register from the 8-bit value in the ah register.

In the Intel syntax, the size of the data is determined by the operand, so the size character (b, w, l, or q) is not appended to the instruction. (And the order of the operands is reversed.)

Intel®

Syntax

sub destination, source

Subtraction also affects the CF and the OF. Whenever a sub instruction is executed both bits

are set as shown in Algorithm 9.2.

Algorithm 9.2: Carry Flag and Overflow Flag after subtraction.
1 if there is no borrow then
2     CF ← 0;
3 else
4     CF ← 1;
5 if there is no overflow then
6     OF ← 0;
7 else
8     OF ← 1;

Just as with addition, if the values being subtracted represent unsigned ints, CF indicates whether there was a borrow from beyond the operand size or not. If the values represent signed ints, OF indicates whether the result fits within the operand size or not. If the size of the operands is less than 64 bits and the operation produces a borrow and/or an overflow, this is not propagated up through the next bits in the destination operand. The carry and overflow conditions are simply recorded in the corresponding bits in the rflags register. (With 32-bit operands and a register destination, the CPU sets the upper 32 bits of the 64-bit register to zero; 8-bit and 16-bit operations leave the upper bits unchanged.)

For example, if we consider the initial conditions

register  contents
rax:      ffff eeee dddd cccc
r8:       2222 4444 6666 8888
CF:       ?
OF:       ?

the instruction

        subl %eax, %r8d

would produce

register  contents
rax:      ffff eeee dddd cccc
r8:       0000 0000 8888 bbbc

CF: 1

OF: 1

Whereas (starting from the same initial conditions) the instruction

subb %al , %r8b

would produce

register contents

rax: ffff eeee dddd cccc

r8: 2222 4444 6666 88bc

CF: 1

OF: 0

A simple program given in Listing 9.4 illustrates both addition and subtraction in C.


1 /*
2  * addAndSubtract1.c
3  * Reads two integers from user, then
4  * performs addition and subtraction
5  * Bob Plantz - 11 June 2009
6  */

7

8 #include <stdio.h>

9

10 int main(void)

11 {

12 int w, x, y, z;

13

14 printf("Enter two integers: ");

15 scanf("%i %i", &w, &x);

16 y = w + x;

17 z = w - x;

18 printf("sum = %i, difference = %i\n", y, z);

19

20 return 0;

21 }

Listing 9.4: Addition and subtraction (C).

Unfortunately, this program can give incorrect results:

$ ./addAndSubtract1

Enter two integers: 1000000000 2000000000

sum = -1294967296, difference = -1000000000

$ ./addAndSubtract1

Enter two integers: -1000000000 2000000000

sum = 1000000000, difference = 1294967296

Worse, there is no message even warning that these are incorrect results. You know (see Section 3.4, page 39) that the results have overflowed. C does not check for overflow, so you would have to write code that explicitly checks for it.

The assembly language generated by gcc is shown in Listing 9.5 with comments added.

1 .file "addAndSubtract1.c"

2 .section .rodata

3 .LC0:

4 .string "Enter two integers: "

5 .LC1:

6 .string "%i %i"

7 .LC2:

8 .string "sum = %i, difference = %i\n"

9 .text

10 .globl main

11 .type main, @function

12 main:

13 pushq %rbp

14 movq %rsp, %rbp

15 subq $16, %rsp

16 movl $.LC0, %edi

17 movl $0, %eax

18 call printf

19 leaq -8(%rbp), %rdx # load address of x

20 leaq -4(%rbp), %rsi # load address of w


21 movl $.LC1, %edi # load address of format string

22 movl $0, %eax # no float arguments

23 call scanf

24 movl -4(%rbp), %edx # load w

25 movl -8(%rbp), %eax # load x

26 leal (%rdx,%rax), %eax # eax <- w + x

27 movl %eax, -12(%rbp) # y = w + x;

28 movl -4(%rbp), %edx # load w

29 movl -8(%rbp), %eax # load x

30 movl %edx, %ecx # ecx <- w

31 subl %eax, %ecx # ecx <- w - x

32 movl %ecx, %eax

33 movl %eax, -16(%rbp) # z = w - x;

34 movl -16(%rbp), %edx

35 movl -12(%rbp), %esi

36 movl $.LC2, %edi

37 movl $0, %eax

38 call printf

39 movl $0, %eax

40 leave

41 ret

42 .size main, .-main

43 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

44 .section .note.GNU-stack,"",@progbits

Listing 9.5: Addition and subtraction (gcc assembly language).

We see that a rather simple C statement:

16 y = w + x;

must be broken down into distinct steps at the assembly language level:

24 movl -4(%rbp), %edx # load w

25 movl -8(%rbp), %eax # load x

26 leal (%rdx,%rax), %eax # eax <- w + x

27 movl %eax, -12(%rbp) # y = w + x;

It probably seems very odd that there is no add instruction in this code sequence. The compiler has used the leal instruction with the indexed addressing mode, which will be discussed in more detail in Section 13.1 when we discuss arrays. Basically, it is intended to compute an address by adding the values in the two registers that are in the parentheses. In this example, it adds the two values in rdx and rax. This sum is intended to be used as an address, so the leal instruction is used to load the sum into eax.

An important difference between leal and addl is that leal does not affect the condition codes in the eflags register. It might seem that this would "disqualify" this construct from being used to add two integers, but C does not check for carry or overflow. So this meets the specifications of the C language.

Similarly, the C statement:

17 z = w - x;

is broken down into the distinct steps:

28 movl -4(%rbp), %edx # load w

29 movl -8(%rbp), %eax # load x

30 movl %edx, %ecx # ecx <- w

31 subl %eax, %ecx # ecx <- w - x

32 movl %ecx, %eax

33 movl %eax, -16(%rbp) # z = w - x;


It is easy to see that the compiler did not generate the most efficient code. (This was compiled

with no optimization.)

We have seen that the computations performed by both these C statements can produce

overflow. Table 9.1 shows how the variables (and CF and OF) change as we walk through the code in the program of Listing 9.4. There are two runs of the program using the input values above.

statement     w           x           y           z           CF  OF
scanf();      0x3b9aca00  0x77359400  ????????    ????????    ?   ?
y = w + x;    0x3b9aca00  0x77359400  0xb2d05e00  ????????    0   0
z = w - x;    0x3b9aca00  0x77359400  0xb2d05e00  0xc4653600  1   0

scanf();      0xc4653600  0x77359400  ????????    ????????    ?   ?
y = w + x;    0xc4653600  0x77359400  0x3b9aca00  ????????    0   0
z = w - x;    0xc4653600  0x77359400  0x3b9aca00  0x4d2fa200  0   1

Table 9.1: Walking through the code in Listing 9.4. There are two runs of the program here.

Listing 9.6 shows an assembly language program that performs the same operations as the C program in Listing 9.4 but uses the jno (jump if no overflow) instruction to check for overflow. These checks are easy in assembly language. They add very little to the execution time of the program, because most of the time only the conditional jumps are executed, and the jumps do not take place.

 1 # addAndSubtract2.s
 2 # Gets two integers from user, then
 3 # performs addition and subtraction
 4 # Bob Plantz - 11 June 2009
 5 # Stack frame
 6         .equ    w,-8
 7         .equ    x,-4
 8         .equ    y,-12
 9         .equ    z,-16
10         .equ    localSize,-16
11 # Read only data
12         .section  .rodata
13 prompt:
14         .string "Enter two integers: "
15 getData:
16         .string "%i %i"
17 display:
18         .string "sum = %i, difference = %i\n"
19 warning:
20         .string "Overflow has occurred.\n"
21 # Code
22         .text
23         .globl  main
24         .type   main, @function
25 main:
26         pushq   %rbp               # save caller's base pointer
27         movq    %rsp, %rbp         # establish our base pointer
28         addq    $localSize, %rsp   # for local vars
29
30         movl    $prompt, %edi      # prompt user
31         movl    $0, %eax           # no floats
32         call    printf
33
34         leaq    x(%rbp), %rdx      # &x
35         leaq    w(%rbp), %rsi      # &w
36         movl    $getData, %edi     # get user data
37         movl    $0, %eax           # no floats
38         call    scanf
39
40         movl    w(%rbp), %esi      # y = w
41         addl    x(%rbp), %esi      # y += x
42         movl    %esi, y(%rbp)      # save y; printf may change esi
43         jno     nOver1             # skip warning if no OF
44         movl    $warning, %edi
45         movl    $0, %eax
46         call    printf
47 nOver1:
48         movl    w(%rbp), %edx      # z = w
49         subl    x(%rbp), %edx      # z -= x
50         movl    %edx, z(%rbp)      # save z; printf may change edx
51         jno     nOver2             # skip warning if no OF
52         movl    $warning, %edi
53         movl    $0, %eax
54         call    printf
55 nOver2:
56         movl    z(%rbp), %edx      # reload z
57         movl    y(%rbp), %esi      # reload y
58         movl    $display, %edi     # display results
59         movl    $0, %eax           # no floats
60         call    printf
61
62         movl    $0, %eax           # return 0 to OS
63         movq    %rbp, %rsp         # restore stack pointer
64         popq    %rbp               # restore caller's base pointer
65         ret

Listing 9.6: Addition and subtraction (programmer assembly language).

9.3 Introduction to Machine Code

This section provides only a very brief glimpse of the machine code for the x86 architecture. The goal here is to provide you with a taste of what machine code looks like and thus emphasize that the computer is really controlled by groups of bit settings. The vast majority of computer professionals never need to know the machine code for the computer they are working with. For a complete description you will need to consult the manufacturer's documentation.

Let us consider for a moment how we might design a set of machine instructions for a simple four-function computer. Our proposed computer can add, subtract, multiply, and divide. And we will suppose that it has 1 MB of memory. Each instruction must encode the following information for the control unit:

1. the operation to be performed, and
2. the location of the operand(s), if any, to operate on.

We will ignore the problem of getting data into the computer for this example, but we will

certainly want to be able to move data from location to location in our computer. So we will have

five operations:

move

add

subtract

multiply

divide

Our design will need to allow three bits for encoding each of these operations. For example, we could use the following code:


move 000

add 001

subtract 010

multiply 100

divide 111

Recall that N bits can be used to encode $2^N$ different values. We want 1 MB of memory. From $2^{10} = 1024 = 1\text{K}$, and $1\text{M} = 1\text{K} \times 1\text{K} = 2^{10} \times 2^{10} = 2^{20}$, we see that we need to allow 20 bits for memory addressing.

Thus, if we want our computer to be able to add a value stored in one memory location to the value at another we need 3 + 20 + 20 = 43 bits to encode the instruction. Question: how many bits would be required if we wanted a design that would allow us to add two values stored in memory and store the sum at a third location?

Our silly design falls far short of practicality. The instructions themselves take too much

memory, and we have allowed for only a very limited number of operations on the data. This

was a more serious problem in the early days of computer design because memory was very

expensive. The result was that computer designers came up with some clever ways to encode the necessary information into very few bits.

The design of the x86 processors is a very good example of this cleverness. Intel has paid particular attention to backwards compatibility as their designs have evolved. Thus, we see the remnants of the earlier designs (when memory was very expensive) in the latest Intel processors. The more common instructions generally take fewer bytes of memory. As newer, more complex features have been added, they generally take more bytes.

Computer design took a different turn in the 1980s. Memory had become much cheaper and CPUs had become much faster. This led to designs where all the instructions are the same size, 32 bits being very common these days.

We now turn our attention to the machine code that is produced by the assembler. Recall

that it is the machine code that is actually executed by the control unit in the CPU. That is, the

computer is controlled by bit patterns that are loaded into the instruction register in the CPU.

Programmers seldom need to know what the machine code is for any given assembly language instruction. The actual instruction depends upon the operation to be performed, the location(s) of the data to operate on, and the size of the data. Even when writing in assembly language, the programmer uses mnemonic names to specify each of these, and the assembler program translates them into the proper machine code instruction. So you do not need to memorize machine code. However, learning how assembly language instructions translate to machine code is important for learning how a computer actually works. And knowing how to "hand assemble" an instruction using a manual can help you find obscure bugs.

9.3.1 Assembler Listings

Most assemblers can provide the programmer with a listing file, which shows the machine code for each instruction. The assembly listing option for the gnu assembler is -al. For example, the "program" in Listing 9.7 contains some instructions that we will assemble and study to illustrate how to read machine language from a listing file.

1 # someMachineCode.s

2 # Some instructions to illustrate machine code.

3 # Bob Plantz - 11 June 2009

4

5 .text

6 .globl main

7 .type main, @function

8 main:

9 pushq %rbp # save caller's base pointer

10 movq %rsp, %rbp # establish our base pointer

11

12 movq $0x1234567890abcdef, %r10 # 64-bit immediate


13 movl $0x12345678, %r11d # 32-bit immediate

14 movw $0x1234, %r12w # 16-bit immediate

15 movb $0x12, %r13b # 8-bit immediate

16

17 movq %rax, %r10 # 64-bit operands

18 movl %ecx, %r11d # 32-bit operands

19 movw %dx , %r12w # 16-bit operands

20 movb %bl , %r13b # 8-bit operands

21

22 addq %r10, %rax # add 64-bit operands

23

24 movb %al , (%rdi) # register indirect

25 movq %r12, 24(%rsi) # register indirect with offset

26

27 movl $0, %eax # return 0 to caller

28 movq %rbp, %rsp # restore stack pointer

29 popq %rbp # restore caller's base pointer

30 ret # back to caller

Listing 9.7: Some instructions for us to assemble. (This is not a program, just some instructions.)

The command to assemble the source file in Listing 9.7 and create a listing file is

        as --gstabs -al -o someMachineCode.o someMachineCode.s

The -al option sends the listing file to the standard output file, which defaults to the screen. You can capture this output by redirecting the standard output to a disk file. A good extension for the file name is ".lst." The complete command is

        as --gstabs -al -o someMachineCode.o someMachineCode.s \
            > someMachineCode.lst

(notice the line continuation character, '\'), which produces the file shown in Figure 9.1.

The first column is the line number of the original source. You should recognize the right-hand two-thirds of the listing as the assembly language source. We will focus our attention on the second and third columns on the left-hand side.

The values in the first column are displayed in decimal, while the values in the second and third columns are in hexadecimal.

The function itself starts on line 8 with the label "main." Since there is nothing else on this line in the source file, it does not occupy any memory in the program.

The first entry in the second column, 0000, occurs at line 9. It shows the memory location relative to the beginning of the function. Since the source code on line 8 has only a label, the instruction on line 9 is the first one in this function. Furthermore, the label on line 8 applies to (relative) memory location 0000. The label allows other parts of the program to refer to this memory location by name. In particular, since the label, main, is declared as a .globl, functions in other files linked to this one can refer to this memory location. It effectively names this function as the main function.

The entry in the third column on line 9 is 55. It is the machine code at relative location 0000. That is, byte number 0000 in this function is set to the bit pattern $55_{16}$. Following the line across, we can see that this is the machine code corresponding to the instruction

        pushq %rbp

Since the first instruction occupies one byte of memory, the second instruction will start in byte number 0001 (the second byte from the beginning). From the assembly listing file (Figure 9.1) we see that the machine code for

movq %rsp, %rbp


GAS LISTING someMachineCode.s page 1

1 # someMachineCode.s

2 # Some instructions to illustrate machine code.

3 # Bob Plantz - 11 June 2009

4

5 .text

6 .globl main

7 .type main, @function

8 main:

9 0000 55 pushq %rbp # save caller's base pointer

10 0001 4889E5 movq %rsp, %rbp # establish our base pointer

11

12 0004 49BAEFCD movq $0x1234567890abcdef, %r10 # 64-bit immediate

12 AB907856

12 3412

13 000e 41BB7856 movl $0x12345678, %r11d # 32-bit immediate

13 3412

14 0014 6641BC34 movw $0x1234, %r12w # 16-bit immediate

14 12

15 0019 41B512 movb $0x12, %r13b # 8-bit immediate

16

17 001c 4989C2 movq %rax, %r10 # 64-bit operands

18 001f 4189CB movl %ecx, %r11d # 32-bit operands

19 0022 664189D4 movw %dx, %r12w # 16-bit operands

20 0026 4188DD movb %bl, %r13b # 8-bit operands

21

22 0029 4C01D0 addq %r10, %rax # add 64-bit operands

23

24 002c 8807 movb %al, (%rdi) # register indirect

25 002e 4C896618 movq %r12, 24(%rsi) # register indirect with offset

26

27 0032 B8000000 movl $0, %eax # return 0 to caller

27 00

28 0037 4889EC movq %rbp, %rsp # restore stack pointer

29 003a 5D popq %rbp # restore caller's base pointer

30 003b C3 ret # back to caller

Figure 9.1: Assembler listing file for the function shown in Listing 9.7.

is the bit pattern

$4889e5_{16} = 0100\ 1000\ 1000\ 1001\ 1110\ 0101_2$

This instruction occupies three bytes. Thus, the third instruction in this function begins at the fifth byte, relative location 0004. Continuing to line 30, the last instruction in the program

        ret

is a one-byte instruction. It is the sixtieth byte in the function and is located at relative location 003b with the bit pattern,

$c3_{16} = 1100\ 0011_2$

So you can use the -al option for the as assembler to produce an assembler listing, which will show you exactly what the bit patterns are for each instruction and which bytes, relative to the beginning of the function, are set to these patterns.


9.3.2 General Format of Instructions

Instructions in the x86-64 architecture can be from one to fifteen bytes in length. Each byte falls into one of several categories:

Opcode  This is the first byte in the instruction and specifies the basic operation performed by executing the instruction. It can also include operand location.

ModRM  The mode/register/memory byte specifies operand locations and how they are accessed.

SIB  The scale/index/base byte specifies operand locations and how they are accessed.

Data  These bytes are used to encode constants, either those that are part of the program, or those that are relative address offsets to operand locations in memory.

Prefix  If placed before the opcode, these modify the behavior of the instruction, typically the size of the operands.

The general placement of these bytes is shown in Figure 9.2.

| prefix | opcode | ModRM | SIB | data |

Figure 9.2: General format of instructions. There can be more than one prefix byte. The number of data bytes depends on the size of the data.

9.3.3 REX Prefix Byte

In order for an instruction to use the 64-bit features the x86-64 architecture uses a prefix byte, a REX prefix, placed immediately before the primary instruction. The assembler recognizes when a REX prefix is required and inserts it automatically; the programmer does not need to explicitly specify it. However, the assembler may give an error message that implies it is the responsibility of the programmer to insert a REX prefix. For example, when attempting to use

        subb %ah, %dil # subtract bytes

the assembler gave the error message:

addAndSubtract2.s:23: Error: can't encode register '%ah' in an

instruction requiring REX prefix.

The reason for this error is explained in Section 6.2 (page 118). Accessing the %dil register

requires that the assembler insert a REX prefix, but the %ah register cannot be accessed by an

instruction that has a REX p refix.

REX prefixes are a byproduct of maintaining backward compatibility. The x86-32 architecture
has only 8 general purpose registers, so it is sufficient to have only three bits in an instruction
to specify any register. There are 16 general purpose registers in the x86-64 architecture,
so four bits are required to specify a register. Some instructions involve up to three registers,
thus there must be a place for three more bits to specify all the registers. Rather than change
the register-specifying patterns in the Opcode, ModRM, and SIB bytes, the CPU designers decided
to use the REX.R, REX.X, and REX.B bits in the REX prefix byte as the high-order bits for
specifying registers. This provides the necessary three bits for register specification. A fourth
bit in the REX prefix, the REX.W bit, is set to 1 when the operand is 64 bits. For all other operand
sizes (8, 16, or 32 bits) REX.W is set to 0. The format of the REX prefix byte is shown in
Figure 9.3.

200 CHAPTER 9. COMPUTER OPERATIONS

0 1 0 0 W R X B

Figure 9.3: REX prefix byte. The four lettered bits are named REX.W, REX.R, REX.X, and REX.B.
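The bit layout of the REX prefix can be sketched in C. This is only an illustration; the function names rex_prefix and register_number are ours and do not come from any assembler. The second function shows how one REX bit supplies the high-order bit of a 4-bit register number.

```c
#include <assert.h>
#include <stdint.h>

/* Build a REX prefix byte with the layout 0100WRXB.
 * Each argument is a single bit (0 or 1). */
uint8_t rex_prefix(unsigned w, unsigned r, unsigned x, unsigned b)
{
    assert(w <= 1 && r <= 1 && x <= 1 && b <= 1);
    return (uint8_t)(0x40 | (w << 3) | (r << 2) | (x << 1) | b);
}

/* Combine one REX bit (high order) with a 3-bit register field
 * (low order) to form the 4-bit register number. */
unsigned register_number(unsigned rex_bit, unsigned field)
{
    assert(rex_bit <= 1 && field <= 7);
    return (rex_bit << 3) | field;
}
```

For example, rex_prefix(1, 0, 0, 0) gives 0x48, the prefix that appears in front of 64-bit instructions that use only the original eight registers.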

m m r r r b b b

Figure 9.4: ModRM byte. The mode is specified by the mm bits, register by the rrr bits, and
address base register by the bbb bits.

9.3.4 ModRM Byte

The format of a ModRM byte is shown in Figure 9.4. When one operand uses the base register
plus offset addressing mode, that register is specified by the 3-bit bbb register field, and the
other register is specified by the rrr register field. Table 9.2 shows the meaning of the 2-bit mm
field. If mm = 11 both operands are register direct and are specified by the two register fields,
bbb and rrr. If mm = 00 the bbb register contains the memory address of one of the operands.
The bbb register contains a base address for the other two values of mm. 01 means that an 8-bit
offset, and 10 a 32-bit offset, is added to the base address to obtain the memory address. The
offset is stored as part of the instruction.

mm   meaning
00   memory operand; address in register specified by bbb
01   memory operand; address in register specified by bbb plus 8-bit offset
10   memory operand; address in register specified by bbb plus 32-bit offset
11   register operand; register specified by bbb

Table 9.2: The mm field in the ModRM byte. Shows how to interpret the bbb register field.

The meaning of the register fields is shown in Table 9.3. For 64-bit mode, the REX bit column
is explained in Section 9.3.3.
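As a small sketch of how the three ModRM fields share one byte, the following C functions pack and unpack them. The names modrm_pack and modrm_unpack are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the mm (2 bits), rrr (3 bits), and bbb (3 bits) fields
 * into one ModRM byte laid out as mm rrr bbb. */
uint8_t modrm_pack(unsigned mm, unsigned rrr, unsigned bbb)
{
    assert(mm <= 3 && rrr <= 7 && bbb <= 7);
    return (uint8_t)((mm << 6) | (rrr << 3) | bbb);
}

/* Split a ModRM byte back into its three fields. */
void modrm_unpack(uint8_t byte, unsigned *mm, unsigned *rrr, unsigned *bbb)
{
    assert(mm != 0 && rrr != 0 && bbb != 0);   /* pointers must be valid */
    *mm  = (byte >> 6) & 0x3;
    *rrr = (byte >> 3) & 0x7;
    *bbb = byte & 0x7;
}
```

With mm = 11, rrr = 100, and bbb = 101 the packed byte is e5, which is the ModRM byte of the movq %rsp, %rbp instruction (4889E5).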

9.3.5 SIB Byte

The format of an SIB byte is shown in Figure 9.5. An SIB byte is required to implement the
indexed addressing mode (see Section 13.1, page 291). The memory address is given by multiplying
the value in the index register by the scale factor and adding this to the address in the
base register. There can also be an offset, which is added to this sum.

s s i i i b b b

Figure 9.5: SIB byte. The ss bits specify a scale factor, the iii bits the index register, and the
bbb bits the address base register.
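The address arithmetic can be sketched in C. In the x86-64 architecture the two ss bits select a scale factor of 1, 2, 4, or 8, that is, 2 raised to the power ss. The function name sib_address and its parameters are ours; the values passed in stand for the contents of the base and index registers.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for the indexed addressing mode:
 *     base + index * scale + offset, where scale = 1 << ss. */
uint64_t sib_address(uint64_t base, uint64_t index, unsigned ss, int32_t offset)
{
    assert(ss <= 3);
    /* The offset is sign-extended to 64 bits before it is added. */
    return base + (index << ss) + (uint64_t)(int64_t)offset;
}
```

For instance, a base of 0x1000, an index of 3, ss = 10 (scale factor 4), and an offset of 8 give the address 0x1014.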


REX bit   register field   register names
0         000              rax, eax, ax, al
0         001              rcx, ecx, cx, cl
0         010              rdx, edx, dx, dl
0         011              rbx, ebx, bx, bl
0         100              rsp, esp, sp, spl, ah
0         101              rbp, ebp, bp, bpl, ch
0         110              rsi, esi, si, sil, dh
0         111              rdi, edi, di, dil, bh
1         000              r8, r8d, r8w, r8b
1         001              r9, r9d, r9w, r9b
1         010              r10, r10d, r10w, r10b
1         011              r11, r11d, r11w, r11b
1         100              r12, r12d, r12w, r12b
1         101              r13, r13d, r13w, r13b
1         110              r14, r14d, r14w, r14b
1         111              r15, r15d, r15w, r15b

Notes:

1. A 3-bit register field can be in an opcode, ModRM, or SIB byte, depending upon the instruction.

2. The REX bit is the REX.R, REX.X, or REX.B bit in the REX prefix (Section 9.3.3), depending on the location of

the register field.

3. If a REX prefix is required, the REX.W bit is set to 1 for 64-bit operands.

4. The ah, bh, ch, and dh registers cannot be used in an instruction that requires a REX prefix; the spl, bpl, sil, and

dil registers require a REX prefix.

Table 9.3: Machine code of general purpose registers. The register name specified by the programmer
determines other bit patterns in the instruction in addition to those shown
here.

9.3.6 The mov Instruction

We next consider the instruction on line 10 of Figure 9.1:

10 0001 4889E5 movq %rsp, %rbp # establish our base pointer

This instruction copies all eight bytes from the rsp register to the rbp register. It starts with a
REX Prefix, followed by two bytes for the instruction itself. The general format of the instruction
for moving data from one register to another is shown in Figure 9.6. The REX Prefix is followed
by the opcode, then a ModRM byte.

1 0 0 0 1 0 0 w   1 1 src dst

Figure 9.6: Machine code for the mov from a register to a register instruction. The source register
is coded in the src bits and the destination in the dst bits. See Table 9.3 for the bit
patterns in each of these fields.

The opcode includes a "w" bit. This bit is 0 for 8-bit moves and 1 for all other sizes. The
instruction operates on a 64-bit value, so w = 1 in the opcode ($89_{16}$).

The $11_2$ in the mod field of the ModRM byte shows that both the source and destination
register numbers are encoded in this byte. The src field shows the source and the dst field
shows the destination.

From Table 9.3 we see that the source register is either rsp, esp, or sp, and the destination
register is either rbp, ebp, or bp. (w = 1 rules out the 8-bit registers.) Since the REX.W bit in the
REX Prefix is 1, the operand size is 64 bits. Thus, the instruction makes a copy of all 64 bits in
the rsp register into the rbp register.
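The hand assembly just described can be sketched as a C function. This is an illustration under stated assumptions, not how the as assembler is implemented: the name encode_movq_reg_reg is ours, registers are passed as the 4-bit numbers of Table 9.3, and we rely on the architectural fact that the src field (rrr) is extended by REX.R and the dst field (bbb) by REX.B.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "movq src, dst" for two 64-bit registers.
 * src and dst are 4-bit register numbers (REX bit, then 3-bit field).
 * Writes three bytes to out and returns the instruction length. */
size_t encode_movq_reg_reg(unsigned src, unsigned dst, uint8_t out[3])
{
    assert(src <= 15 && dst <= 15);
    /* REX prefix 0100WRXB: W = 1, R = high bit of src, B = high bit of dst */
    out[0] = (uint8_t)(0x48 | ((src >> 3) << 2) | (dst >> 3));
    /* opcode 1000100w with w = 1 */
    out[1] = 0x89;
    /* ModRM: mm = 11, then the low three bits of src and dst */
    out[2] = (uint8_t)(0xc0 | ((src & 7) << 3) | (dst & 7));
    return 3;
}
```

Calling it with src = 4 (rsp) and dst = 5 (rbp) produces the bytes 48 89 e5, matching line 10 of Figure 9.1.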

The second mov format covered here is moving immediate data to a register. Examples are
given on lines 11-14 of Figure 9.1. The first operand (the source) is a literal: the value itself
is stated. This value will be stored immediately after the instruction. Of course, the instruction
must encode the fact that this operand is located at the address immediately following the
instruction (the immediate data addressing mode). The destination operand is a register
(the register direct addressing mode). The general format for the move immediate data to a
register instruction is shown in Figure 9.7 in binary.

1 0 1 1 w dst   --data--   --data--   --data--   --data--

Figure 9.7: Machine code for the mov immediate data to a register instruction. The number of
data bytes depends on the size of the data.

Consider the instruction

11 0004 49BAEFCD movq $0x1234567890abcdef, %r10
11 AB907856
11 3412

The assembler determines that this is a mov instruction and the source operand is
immediate data (due to the "$" character), so the first four bits of the opcode are 1011 (see Figure
9.7). Since the operand is not 8 bits, the "w" bit is 1. Next, the assembler figures out that the
destination register is the r10 register. Looking this up in Table 9.3 (which is built into the
assembler) shows that the remaining three bits are 010. Thus, the assembler generates the
opcode byte of the instruction:

$1011\ 1010_2 = \mathrm{ba}_{16}$

Since the operand size is 64 bits, the data value, 0x1234567890abcdef, is stored immediately
(immediate addressing mode) after the opcode. Notice that the bytes seem to be stored
backwards. That is, it looks like the assembler stored the 64-bit value 0xefcdab9078563412!
Recall that the x86-64 architecture uses the little endian order for storing data in memory, so
when, for example, a movl instruction copies four bytes from memory into a register, the byte at the lowest
memory address is loaded into the least significant byte of the register, the byte at the next
memory address is loaded into the next higher order byte of the register, etc. The assembler
takes this into account for us and stores the immediate data in memory in little endian format.

The endian issue is irrelevant if you are always consistent with the size of the data item.
However, if your algorithm changes data size, you need to be very aware of the endianness of the
processor. For example, if you use a movl to store four bytes in memory, then four movbs to read
them back into registers, you need to be aware of how they are physically stored in memory.
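The following C sketch imitates that situation without depending on the byte order of the machine it runs on. The helper names store_le32 and load_le32 are ours; they model a four-byte little endian store followed by byte-at-a-time reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little endian order:
 * the least significant byte goes to the lowest address. */
void store_le32(uint8_t *mem, uint32_t value)
{
    assert(mem != NULL);
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(value >> (8 * i));
}

/* Rebuild the 32-bit value from four bytes stored in little endian order. */
uint32_t load_le32(const uint8_t *mem)
{
    uint32_t value = 0;
    assert(mem != NULL);
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)mem[i] << (8 * i);
    return value;
}
```

After store_le32(mem, 0x12345678), reading the bytes one at a time from the lowest address gives 78, 56, 34, 12, which is the order four successive byte reads would see.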

Finally, since this instruction operates on a 64-bit value, the instruction requires a REX
Prefix. Referring to Figure 9.3 we see that the REX.W bit is 1, indicating the 64-bit size of the
operands. And the REX.B bit is 1, which is used with the dst field to give the 4-bit number of
the r10 register, $1010_2$.

BE CAREFUL! Notice that the instruction is ten bytes long (Figure 9.1), but the operand size is
eight bytes. Do not confuse the size of the instruction with the size of the operand(s).
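Putting the pieces together, the ten bytes of this instruction can be reproduced with a short C sketch. The function name encode_movq_imm64 is hypothetical and the code covers only the 64-bit form discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "movq $imm, dst" with a full 64-bit immediate value.
 * dst is a 4-bit register number (REX bit, then 3-bit field).
 * Writes ten bytes to out and returns the instruction length. */
size_t encode_movq_imm64(uint64_t imm, unsigned dst, uint8_t out[10])
{
    assert(dst <= 15);
    out[0] = (uint8_t)(0x48 | (dst >> 3));   /* REX: W = 1, B = high bit of dst */
    out[1] = (uint8_t)(0xb8 | (dst & 7));    /* 1011 w dst with w = 1 */
    for (int i = 0; i < 8; i++)              /* immediate data, little endian */
        out[2 + i] = (uint8_t)(imm >> (8 * i));
    return 10;
}
```

With imm = 0x1234567890abcdef and dst = 10 (r10) the bytes are 49 ba ef cd ab 90 78 56 34 12, the same ten bytes shown in the assembler listing.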

9.3.7 The add Instruction

The add instruction has three different general formats. We present only a partial description
here.

The format for adding an immediate value to a value in the rax, eax, ax, or al register is
shown in Figure 9.8. The w bit is 0 for al and 1 for all others. The immediate data value must be
the same size as the register to which it is added, except when adding to the rax register. Then
the immediate data is 32 bits and is sign-extended to 64 bits before adding it to the value in the
rax register. Note that this instruction is not used for the ah portion of the a register. For adding
an immediate value to a value in the ah register or any of the other registers, the assembler
program must use the instruction shown in Figure 9.9.

0 0 0 0 0 1 0 w   --data--   --data--   --data--   --data--

Figure 9.8: Machine code for the add immediate data to the A register (except ah) instruction.
The number of data bytes depends on the size of the data.

1 0 0 0 0 0 0 w   1 1 0 0 0 dst   --data--   --data--   --data--   --data--

Figure 9.9: Machine code for the add immediate data to register (not al, ax, nor eax registers)
instruction. The number of data bytes depends on the size of the data.

Notice that the instruction for adding to the a register (except the ah portion) is one byte

shorter than when adding to the other registers (compare Figures 9.8 and 9.9). There is an

historical reason for this. Early CPU designs had only one general purpose register. It was

used as the "accumulator" for performing arithmetic. (Perhaps naming it the "a" register makes

a little more sense.) As more general purpose registers were added to the designs, assembly

language programmers tended to continue using the "accumulator" register more frequently

than the others. And compiler writers continued this same pattern of register usage. Hence, the

"a" register is used much more for addition in a program than the other registers, and making it

a shorter instruction reduces memory usage and increases execution speed. The differences are

generally irrelevant these days, but the x86 architecture has evolved in such a way to maintain

backward compatibility.

The add instruction shown in Figure 9.10 is used when the data value is small enough to fit
into one byte, but it is being added to a two-, four-, or eight-byte register. The value is sign-extended
to a full 16-bit, 32-bit, or 64-bit value, respectively, inside the CPU before it is added to
the register. Sign-extension consists of copying the high-order bit into each bit to the left until
the full width is reached. For example, sign-extending 0x7f to 32 bits would give 0x0000007f;
sign-extending 0x80 to 32 bits would give 0xffffff80. Notice that sign-extension preserves the
signed decimal value of the bit pattern. (Review Section 3.3.)
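Sign-extension as just described can be written out in C. The function below is a sketch with a name of our choosing (sign_extend_to_32); it works on bit patterns held in unsigned integers so that the copying of the high-order bit is explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low from_bits bits of value to a 32-bit pattern by
 * copying the high-order bit of the narrow value into every bit to its left. */
uint32_t sign_extend_to_32(uint32_t value, unsigned from_bits)
{
    assert(from_bits >= 1 && from_bits <= 32);
    if (from_bits == 32)
        return value;                          /* already full width */

    uint32_t mask = (1u << from_bits) - 1;     /* bits of the narrow value */
    uint32_t sign = 1u << (from_bits - 1);     /* its high-order bit */

    value &= mask;
    if (value & sign)
        value |= ~mask;                        /* fill the upper bits with 1s */
    return value;
}
```

Sign-extending the 8-bit patterns 0x7f and 0x80 this way gives 0x0000007f and 0xffffff80, the two results quoted above.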

Adding small 32-bit values.

1 0 0 0 0 0 1 1   1 1 0 0 0 dst   --data--

Figure 9.10: Machine code for the add immediate data to a register instruction. Used when the
data will fit into one byte, but the register is two, four, or eight bytes. Value is
sign-extended.

An example of this is the instruction

addl $5, %ecx

Even though the value can be coded in only eight bits, the full 32 bits of the register may be
affected by the addition. That is, the machine code is 83c105 (the data is coded in only one byte),
but the CPU adds 0x00000005 to the ecx register. (Recall that this may produce different results
than simply adding 0x05 to the cl portion of the ecx register.)

The format for adding a value in a register to a value in a register is shown in Figure 9.11.
Again, the registers and size of data are specified by the w, src, and dst bits. The register bit
patterns are given in Table 9.3; "src" means "source" and "dst" means "destination."


0 0 0 0 0 0 0 w   1 1 src dst

Figure 9.11: Machine code for the add register to register instruction.

Let us look at the add instruction on line 17 in Figure 9.1:

addl %ecx, %edx

This instruction adds the 32 bits from the ecx register to the 32 bits in the edx register, leaving
the result in the edx register. The operand is not 8 bits, so w = 1, and from Table 9.3, src = 001
and dst = 010. Thus the instruction is

$0000\ 0001\ 1100\ 1010_2 = \mathrm{01ca}_{16}$
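The same two bytes can be produced by a small C sketch of Figure 9.11. The name encode_add_reg_reg is hypothetical, and the function deliberately covers only the 8-bit and 32-bit cases for the first eight registers of Table 9.3, which need no prefix byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "add src, dst" for two registers whose REX bit is 0.
 * w is 0 for 8-bit operands and 1 for 32-bit operands.
 * Writes two bytes to out and returns the instruction length. */
size_t encode_add_reg_reg(unsigned w, unsigned src, unsigned dst, uint8_t out[2])
{
    assert(w <= 1 && src <= 7 && dst <= 7);
    out[0] = (uint8_t)w;                          /* opcode 0000000w */
    out[1] = (uint8_t)(0xc0 | (src << 3) | dst);  /* ModRM: 11 src dst */
    return 2;
}
```

For addl %ecx, %edx the arguments are w = 1, src = 1, and dst = 2, and the result is 01 ca.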

9.4 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. The
page number where the instruction is explained in more detail, which may be in a subsequent
chapter, is also given. This book provides only an introduction to the usage of each instruction.
You need to consult the manuals ([2]-[6], [14]-[18]) in order to learn all the possible uses of
the instructions.

9.4.1 Instructions

data movement:

opcode source destination action see page:

movs $imm/ %reg %reg/mem move 141

movsss $imm/ %reg %reg/mem move, sign extend 216

movzss $imm/ %reg %reg/mem move, zero extend 217

popw %reg/mem pop from stack 163

pushw $imm/ %reg/mem push onto stack 163

s = b, w, l, q; w = l, q

arithmetic/logic:

opcode source destination action see page:

adds $imm/ %reg %reg/mem add 189

adds mem %reg add 189

cmps $imm/ %reg %reg/mem compare 209

incs %reg/mem increment 220

leaw mem %reg load effective address 167

subs $imm/ %reg %reg/mem subtract 190

subs mem %reg subtract 190

s = b, w, l, q; w = l, q

program flow control:

opcode location action see page:

call label call function 156

je label jump equal 211

jmp label jump 213

jne label jump not equal 211

jno label jump no overflow 211

leave undo stack frame 168

ret return from function 168

syscall call kernel function 177

9.5. EXERCISES 205

9.4.2 Addressing Modes

register direct: The data value is located in a CPU register.
    syntax: name of the register with a "%" prefix.
    example: movl %eax, %ebx

immediate data: The data value is located immediately after the instruction. Source operand only.
    syntax: data value with a "$" prefix.
    example: movl $0xabcd1234, %ebx

base register plus offset: The data value is located in memory. The address of the
memory location is the sum of a value in a base register plus an offset value.
    syntax: use the name of the register with parentheses around the name and the
    offset value immediately before the left parenthesis.
    example: movl $0xaabbccdd, 12(%eax)

9.5 Exercises

9-1 (§9.1) Enter the assembly language program in Listing 9.3. Use gdb to single step through
the program as shown in the book. Before executing each instruction, predict how the rax,
rbp, and rsp registers will change. Also record the values in the rip and eflags registers
as you single step through the program. How many bytes are there in each instruction?

9-2 (§9.2) Enter the C program in Listing 9.4. Using gdb, verify that the program works correctly,
as shown in Table 9.1.

9-3 (§9.2) Enter the assembly language program in Listing 9.6 and run it. Notice that it gives
different results than the C version if there is overflow. Why is this? Modify the program
so that it gives the same results as the C version but still gives an overflow warning.

9-4 (§9.3) Assemble each of the mov instructions in Listing 9.7 by hand. Check your answers
with the assembly listing.

9-5 (§9.3) Assemble each of the add instructions in Listing 9.7 by hand. Check your answers
with the assembly listing.

9-6 (§9.3) Assemble each of the following instructions by hand (on paper).

a) movl $0x89abcdef, %ecx

b) movw $0xabcd, %ax

c) movb $0x30, %al

d) movb $0x31, %ah

e) movq %r8, %r15

f) movb %r9b, %r10b

g) movl %r11d, %r12d

h) movq $0x7fffec9b2cf4, %rsi

Check your work by entering the code into a source file of the form

.text

.globl main

.type main, @function

main:

pushq %rbp

movq %rsp, %rbp

# Your code sequence goes here.

movl $0, %eax

popq %rbp

ret

and creating a listing file.

9-7 (§9.3) Assemble each of the following instructions by hand (on paper).

a) addl $0x89abcdef, %ecx

b) addw $0xabcd, %ax

c) addb $0x30, %al

d) addb $0x31, %ah

e) addq %r12, %r15

f) addw %r8w, %r10w

g) addb %r9b, %sil

h) addl %esi, %edi

Check your work by entering the code into a source file of the form

.text

.globl main

.type main, @function

main:

pushq %rbp

movq %rsp, %rbp

# Your code sequence goes here.

movl $0, %eax

popq %rbp

ret

and creating a listing file.

9-8 (§9.3) Design an experiment that will allow you to determine what the machine code is for
the

pushq 64-bit_register

instruction, where "64-bit_register" is any of the general purpose registers. What is the
general format of the instruction? Show your answer as a drawing similar to Figure 9.7.
Which ones use a REX prefix? Hint: assemble with the -al option.

9-9 (§9.3) Design an experiment that will allow you to determine what the machine code is for
the

popq 64-bit_register

instruction, where "64-bit_register" is any of the general purpose registers. What is the
general format of the instruction? Show your answer as a drawing similar to Figure 9.7.
Which ones use a REX prefix? Hint: assemble with the -al option.

9-10 (§9.3) Disassemble each of the machine instruction sequences by hand (on paper). (Find
the corresponding assembly language instruction for each machine code instruction.) Notice
that this is a much more difficult problem, because it is difficult to tell where one
instruction ends and the next one begins. We have placed one machine instruction on each
line to help you. Enter each of your assembly language programs into a source file and use
the assembler to check your work.

a) b0ab

b4cd

41b0ef

41b701

b) 40b723

40b634

b256

b678

c) b83412cdab

bbabcd1234

41b900000000

41be7b000000

d) 66b8cdab

66bbbacd

66b93412

66ba2143


e) 88c4

8808

88480a

8a08

8a480a

f) 89c3

6689d8

4889ca

4589c6

g) 04ab

80c4cd

80c3ef

80c701

h) 80c123

80c534

80c256

80c678

i) 053412cdab

81c3abcd1234

81c1d4c3b2a1

81c2a1b2c3d4

j) 5ab00000000

83c301

83c100

81c2ff000000

k) 6605cdab

6681c3bace

6681c13412

6681c22143

l) 6605ab00

6683c301

6683c100

6681c2ff00

m) 00c4

4100c2

00ca

4500c1

n) 01c3

6600d8

4801ca

4501c6

Chapter 10

Program Flow Constructs

The assembly language we have studied thus far is executed in sequence. In this chapter we
will learn how to organize assembly language instructions to implement the other two required
program flow constructs: repetition and binary decision.

Text string manipulations provide many examples of using program flow constructs, so we
will use them to illustrate many of the concepts. Almost any program displays many text string
messages on the screen, which are simply arrays of characters.

10.1 Repetition

The algorithms we choose when programming interact closely with the data storage structure.
As you probably know, a string of characters is stored in an array. Each element of the array is
of type char, and in C the end of the data is signified with a sentinel value, the NUL character
(see Table 2.3 on page 20).

The other technique for specifying the length of the string is to store the number of characters in the
string together with the string. This is implemented in Pascal by storing the number of characters
in the first byte of the array, and the actual characters are stored immediately following.

Array processing is usually a repetitive task. The processing of a character string is a good
example of repetition. Consider the C program in Listing 10.1.

 1 /*
 2  * helloWorld1.c
 3  * "hello world" program using the write() system call
 4  * one character at a time.
 5  * Bob Plantz - 12 June 2009
 6  */
 7 #include <unistd.h>
 8
 9 int main(void)
10 {
11     char *aString = "Hello World.\n";
12
13     while (*aString != '\0')
14     {
15         write(STDOUT_FILENO, aString, 1);
16         aString++;
17     }
18
19     return 0;
20 }

208

10.1. REPETITION 209

Listing 10.1: Displaying a string one character at a time (C).

The while statement on lines 13-17,

    while (*aString != '\0')
    {
        ...
    }

controls the execution of the statements within the {. . . } block.

1. It evaluates the boolean expression *aString != '\0'.

2. If the boolean expression evaluates to false, program flow jumps to the statement immediately
following the {. . . } block.

3. If the boolean expression evaluates to true, program flow enters the {. . . } block and executes
the statements there in sequence.

4. At the end of the {. . . } block program flow jumps back up to the evaluation of the boolean
expression.

The pointer variable is incremented with the

    aString++;

statement. Notice that this variable must be changed inside the {. . . } block. Otherwise, the
boolean expression will always evaluate to true, giving an "infinite" loop.

It is important that you identify the variable that the while construct uses to control program
flow: the Loop Control Variable (LCV). Make sure that the value of the LCV is changed within
the {. . . } block. Note that there may be more than one LCV.

The way that the while construct controls program flow can be seen in the flow chart in Figure
10.1. This flow chart shows that we need the following assembly language tools to construct
a while loop:

• Instruction(s) to evaluate boolean expressions.

• An instruction that conditionally transfers control (jumps) to another location in the program.
This is represented by the large diamond, which shows two possible paths.

• An instruction that unconditionally transfers control to another location in the program.
This is represented by the line that leads from "Execute body of while loop" back to the
top.

We will explore instructions that provide these tools in the next three subsections.
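Before turning to the assembly language instructions, it may help to see the same three tools in C. The sketch below rewrites a while loop over a string using only if and goto, which mirrors how the loop is built from a comparison, a conditional jump, and an unconditional jump. The function count_chars and its labels are ours; it counts characters rather than writing them so that the result is easy to check.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters in a C string using only the three tools
 * a while loop needs: a test, a conditional jump, an unconditional jump. */
size_t count_chars(const char *aString)
{
    size_t count = 0;
    assert(aString != NULL);

loop_top:
    if (*aString == '\0')   /* evaluate the boolean expression ...        */
        goto loop_done;     /* ... and conditionally jump past the body   */
    count++;                /* body of the loop                           */
    aString++;              /* change the loop control variable           */
    goto loop_top;          /* unconditional jump back to the test        */

loop_done:
    return count;
}
```

Note that the conditional jump is taken when the while condition is false, so the test in the if statement is the opposite of the one written in the while statement.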

10.1.1 Comparison Instructions

Most arithmetic and logic instructions affect the condition code bits in the rflags register. (See
page 121.) In this section we will look at two instructions that are used to set the condition
codes to show the relationship between two values without changing either of them.

One is cmp (compare). The syntax is

cmps source, destination

where s denotes the size of the operand:

s meaning number of bits

b byte 8

w word 16

l longword 32

q quadword 64

210 CHAPTER 10. PROGRAM FLOW CON STRUCTS

[Flow chart: Initialize Loop Control Variable; Evaluate Boolean expression; if true, Execute Body
of while loop and return to the evaluation; if false, continue with the Next instruction after
while loop construct.]

Figure 10.1: Flow chart of a while loop. The large diamond represents a binary decision that
leads to two possible paths, "true" or "false." Notice the path that leads back to the
top of the while loop after the body has been executed.

Intel® Syntax: cmp destination, source

The cmp operation consists of subtracting the source operand from the destination operand
and setting the condition code bits in the rflags register accordingly. Neither of the operand
values is changed. The subtraction is done internally simply to get the result and set the OF, SF,
ZF, AF, PF, CF condition codes according to the result.
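To make the flag settings concrete, here is a C sketch of what an 8-bit compare computes. The struct flags type and the function cmp8 are names invented for this illustration, and only CF, ZF, SF, and OF are modeled; AF and PF are left out to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

struct flags { unsigned cf, zf, sf, of; };

/* Model of "cmpb source, destination": the flags describe
 * destination - source, and neither operand is changed. */
struct flags cmp8(uint8_t source, uint8_t destination)
{
    struct flags f;
    uint8_t result = (uint8_t)(destination - source);

    f.cf = destination < source;   /* unsigned borrow was needed */
    f.zf = result == 0;            /* the two values are equal */
    f.sf = (result >> 7) & 1u;     /* high-order bit of the result */
    /* Signed overflow: the operands have different signs and the sign
     * of the result differs from the sign of the destination. */
    f.of = (((destination ^ source) & (destination ^ result)) >> 7) & 1u;

    assert(f.zf == (unsigned)(source == destination));
    return f;
}
```

Comparing a destination of 0x80 with a source of 0x01 sets OF, because as signed values the subtraction -128 - 1 does not fit in eight bits, while CF stays 0 because as unsigned values 128 - 1 needs no borrow.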

The other instruction is test. The syntax is

tests source, destination

where s denotes the size of the operand:

s meaning number of bits

b byte 8

w word 16

l longword 32

q quadword 64

Intel® Syntax: test destination, source

The test operation consists of performing a bit-wise and between the two operands and
setting the condition codes in the rflags register accordingly. Neither of the operand values is
changed. The and operation is done internally simply to get the result and set the SF, ZF, and PF
condition codes according to the result. The OF and CF are set to 0, and the AF value is undefined.


10.1.2 Conditional Jumps

These instructions are used to alter the flow of the program depending on the settings of the
condition code bits in the rflags register. The general format is

jcc label

where cc is a 1-4 letter sequence specifying the condition codes, and label is a memory location.
Program flow is transferred to label if cc is true. Otherwise, the instruction immediately following
the conditional jump is executed. The conditional jump instructions are listed in Table 10.1.

instruction   action                        condition codes
ja            jump if above                 (CF = 0) · (ZF = 0)
jae           jump if above or equal        CF = 0
jb            jump if below                 CF = 1
jbe           jump if below or equal        (CF = 1) + (ZF = 1)
jc            jump if carry                 CF = 1
jcxz          jump if cx register zero
jecxz         jump if ecx register zero
jrcxz         jump if rcx register zero
je            jump if equal                 ZF = 1
jg            jump if greater               (ZF = 0) · (SF = OF)
jge           jump if greater or equal      SF = OF
jl            jump if less                  SF ≠ OF
jle           jump if less or equal         (ZF = 1) + (SF ≠ OF)
jna           jump if not above             (CF = 1) + (ZF = 1)
jnae          jump if not above or equal    CF = 1
jnb           jump if not below             CF = 0
jnbe          jump if not below or equal    (CF = 0) · (ZF = 0)
jnc           jump if not carry             CF = 0
jne           jump if not equal             ZF = 0
jng           jump if not greater           (ZF = 1) + (SF ≠ OF)
jnge          jump if not greater or equal  SF ≠ OF
jnl           jump if not less              SF = OF
jnle          jump if not less or equal     (ZF = 0) · (SF = OF)
jno           jump if not overflow          OF = 0
jnp           jump if not parity            PF = 0
jns           jump if not sign              SF = 0
jnz           jump if not zero              ZF = 0
jo            jump if overflow              OF = 1
jp            jump if parity                PF = 1
jpe           jump if parity even           PF = 1
jpo           jump if parity odd            PF = 0
js            jump if sign                  SF = 1
jz            jump if zero                  ZF = 1

Table 10.1: Conditional jump instructions.
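In the condition codes column, "·" is a logical and, and "+" is a logical or. A few rows of the table can be written as C predicates to show how they read. The struct flags type and the cond_ functions are names made up for this sketch; they are not part of any library.

```c
#include <assert.h>
#include <stdbool.h>

struct flags { bool cf, zf, sf, of; };

/* Unsigned conditions */
bool cond_ja(struct flags f)  { return !f.cf && !f.zf; }            /* (CF = 0) · (ZF = 0) */
bool cond_jbe(struct flags f) { return f.cf || f.zf; }              /* (CF = 1) + (ZF = 1) */

/* Signed conditions */
bool cond_jg(struct flags f)  { return !f.zf && (f.sf == f.of); }   /* (ZF = 0) · (SF = OF) */
bool cond_jle(struct flags f) { return f.zf || (f.sf != f.of); }    /* (ZF = 1) + (SF ≠ OF) */

/* Each pair is complementary: exactly one of the two is true
 * for any setting of the flags. */
void check_complements(struct flags f)
{
    assert(cond_ja(f) != cond_jbe(f));
    assert(cond_jg(f) != cond_jle(f));
}
```

With CF = 0, ZF = 0, SF = 1, and OF = 0 (the flags left by comparing a destination of 0xff with a source of 0x01), ja is true but jg is false: 0xff is above 1 as an unsigned value and less than 1 as a signed value.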

A good way to appreciate the meaning of the cc sequences in this table is to consider a very
common application of a conditional jump:

    cmpb %al, %bl
    jae somePlace
    movb $0x12, %ah

If the value in the bl register is numerically above the value in the al register, or if they are
equal, then program control transfers to the address labeled "somePlace." Otherwise, program
control continues with the movb instruction.


The differences between "greater" versus "above", and "less" versus "below", are a little subtle.
"Above" and "below" refer to a sequence of unsigned numbers. For example, characters
would probably be considered to be unsigned in most applications. "Greater" and "less" refer to
signed values. Integers are commonly considered to be signed.

Table 10.2 lists four conditional jumps that are commonly used when processing unsigned
values. And Table 10.3 lists four commonly used with signed values.

instruction   meaning               immediately after a cmp . . .
ja            jump above            jump if destination is above source in sequence
jae           jump above or equal   jump if destination is above or in same place as source in sequence
jb            jump below            jump if destination is below source in sequence
jbe           jump below or equal   jump if destination is below or in same place as source in sequence

Table 10.2: Conditional jump instructions for unsigned values.

instruction   meaning                 immediately after a cmp . . .
jg            jump greater            jump if destination is greater than source
jge           jump greater or equal   jump if destination is greater than or equal to source
jl            jump less               jump if destination is less than source
jle           jump less or equal      jump if destination is less than or equal to source

Table 10.3: Conditional jump instructions for signed values.

Since most instructions affect the settings of the condition codes in the rflags register, each
conditional jump must be used immediately after the instruction that determines the conditions
that the programmer intends to cause the jump.

HINT: It is easy to forget how the order of the source and destination controls the conditional jump
in this construct. Here is a place where the debugger can save you time. Simply put a breakpoint at
the conditional jump instruction. When the program stops there, look at the values in the source and
destination. Then use the si debugger command to execute one instruction and see where it goes.

The jump instructions bring up another addressing mode: rip-relative.¹

rip-relative: The target is a memory address determined by adding an offset to the current
address in the rip register.
    syntax: a programmer-defined label
    example: je somePlace

The offset, which can be positive or negative, is stored immediately following the opcode for
the instruction in two's complement format. Thus, the offset becomes a part of the instruction,
similar to the immediate data addressing mode. Just like the immediate addressing mode, the
offset is stored in little endian order in memory.

The following steps occur during program execution of a jcc instruction (recall Figure 6.5):

1. The jump instruction, including the offset value, is fetched.

2. As always, the rip register is incremented by the number of bytes in the jump instruction,
including the offset value that is stored as part of the jump instruction.

3. If the conditions to cause a jump are true, the offset is added to the rip register.

4. If they are not true, the instruction has no effect.

¹ In an environment where the instruction pointer is called the "program counter" this would be called "pc-relative."
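The four steps can be sketched as a C function that returns the new value of the rip register. The name rip_after_jcc and its parameters are ours; the condition argument stands for the outcome of testing the condition codes.

```c
#include <assert.h>
#include <stdint.h>

/* New rip value after executing a conditional jump located at rip.
 * length is the size of the jump instruction in bytes, offset is the
 * signed value stored in the instruction, and condition is nonzero
 * when the condition codes call for the jump. */
uint64_t rip_after_jcc(uint64_t rip, unsigned length, int32_t offset, int condition)
{
    assert(length >= 1 && length <= 15);   /* instructions are 1 to 15 bytes */
    rip += length;                         /* step 2: point past the jump */
    if (condition)
        rip += (uint64_t)(int64_t)offset;  /* step 3: add sign-extended offset */
    return rip;                            /* step 4: otherwise unchanged */
}
```

Because the offset is added after rip has already moved past the jump instruction, a two-byte jump with an offset of 98 lands 100 bytes beyond the start of the jump.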

When a conditional jump instruction is assembled, the assembler computes the number of
bytes from the jump instruction to the specified label. The assembler then subtracts the number
of bytes in the jump instruction from the distance to the label to yield the offset. This computed
offset is stored as part of the jump instruction. Each jump instruction has several forms, depending
on the number of bytes that must be used to store the offset. Note that the offset is
stored in two's complement format to allow for negative jumps.

For example, if the offset will fit into eight bits the opcode for the je instruction is $74_{16}$, and it
is $\mathrm{0f84}_{16}$ if more than eight bits are required to store the offset (in which case the offset is stored
as a thirty-two bit value). The machine code is shown in Table 10.4 for four different target
address offsets. Notice that the 32-bit offsets are stored in little endian order in memory.

distance to target address (bytes, decimal)   machine code (hexadecimal)
+100                                          7462
-100                                          749a
+300                                          0f8426010000
-300                                          0f84cefeffff

Table 10.4: Machine code for the je instruction. Four different distances to the jump target
address. Notice that the 32-bit offsets are stored in little endian order.
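The entries in Table 10.4 can be reproduced with a C sketch of the assembler's calculation. The function name encode_je is hypothetical, and the rule it uses for choosing between the two forms (use the short form whenever the offset fits in a signed byte) is an assumption made for this illustration. The distance is measured from the first byte of the jump instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "je label", where distance is the number of bytes from the
 * jump instruction to the label. Writes 2 or 6 bytes to out and
 * returns the instruction length. */
size_t encode_je(int32_t distance, uint8_t out[6])
{
    assert(out != NULL && distance >= INT32_MIN + 6);

    int32_t short_offset = distance - 2;       /* short form is 2 bytes long */
    if (short_offset >= -128 && short_offset <= 127) {
        out[0] = 0x74;
        out[1] = (uint8_t)short_offset;        /* 8-bit two's complement */
        return 2;
    }

    uint32_t near_offset = (uint32_t)(distance - 6);   /* long form is 6 bytes */
    out[0] = 0x0f;
    out[1] = 0x84;
    for (int i = 0; i < 4; i++)                /* 32-bit offset, little endian */
        out[2 + i] = (uint8_t)(near_offset >> (8 * i));
    return 6;
}
```

For a distance of -300 the offset is -306, or 0xfffffece in 32-bit two's complement, which is stored as ce fe ff ff after the 0f 84 opcode.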

10.1.3 Unconditional Jump

We also need an instruction that unconditionally transfers control to another location in the program. The instruction has three forms:

jmp label
jmp *register
jmp *memory

Program flow is transferred to the location specified by the operand.

The first form is limited to those situations where the distance, in number of bytes, to the target location will fit within a 32-bit signed integer. The addressing mode is rip-relative. That is, the 32-bit signed integer is added to the current value in the rip register. This is sufficient for most cases.

In the other two forms, the target address is stored in the specified register or memory location, and the operand is accessed indirectly. The address is an unsigned 64-bit value. The jmp instruction moves this stored address directly into the rip register, replacing the address that was in there. The "*" character is used to indicate "indirection."

BE CAREFUL: The unconditional jump uses "*" for indirection, while all other instructions use "(register)." It might be tempting to use something like "*(%rax)." The assembler accepts this, but it does not mean the same thing as "*%rax": with the parentheses, the jump target is read from the memory location whose address is in rax, rather than taken from rax itself.

The three ways to use an unconditional jump are shown in Listing 10.2.

1 # jumps.s

2 # demonstrates unconditional jumps

3 # Bob Plantz - 12 June 2009

4 # global variable

5 .data

6 pointer:

7 .quad 0

8 format:

9 .string "The jump pattern is %x.\n"

10 # code

11 .text

12 .globl main

13 .type main, @function

14 main:

15 pushq %rbp # save frame pointer

16 movq %rsp, %rbp # set new frame pointer

17

18 movl $7, %esi # assume all three jumps

19 jmp here1

20 andl $0xfffffffe, %esi # no jump, turn off bit 0

21 here1:

22 leaq here2, %rax

23 jmp *%rax

24 andl $0xfffffffd, %esi # no jump, turn off bit 1

25 here2:

26 leaq here3, %rax

27 movq %rax, pointer

28 jmp *pointer

29 andl $0xfffffffb, %esi # no jump, turn off bit 2

30 here3:

31 movl $format, %edi

32 movl $0, %eax # no floats

33 call printf # show pattern

34

35 movl $0, %eax # return 0;

36 movq %rbp, %rsp # restore stack pointer

37 popq %rbp # restore frame pointer

38 ret

Listing 10.2: Unconditional jumps.

The most commonly used form is rip-relative as shown on line 19:

19 jmp here1

On lines 22-23 an address is loaded into a register, then the jump is made indirectly via the register to that address.

22 leaq here2, %rax
23 jmp *%rax

Lines 26-28 show how an address can be stored in memory, then the memory used indirectly for the jump.

26 leaq here3, %rax
27 movq %rax, pointer
28 jmp *pointer

Of course, the indirect techniques are not required in this simple example, but they might be needed for some programs.

10.1.4 while Loop

We are now prepared to look at how a while loop is constructed at the assembly language level. As usual, we begin with the assembly language generated by the gcc compiler for the program in Listing 10.1, which is shown in Listing 10.3 with comments added.

1 .file "helloWorld1.c"

2 .section .rodata

3 .LC0:

4 .string "Hello World.\n"

5 .text

6 .globl main

7 .type main, @function

8 main:

9 pushq %rbp

10 movq %rsp, %rbp

11 subq $16, %rsp

12 movq $.LC0, -8(%rbp) # pointer to string

13 jmp .L2 # go to bottom of loop

14 .L3:

15 movq -8(%rbp), %rsi # 2nd arg. - pointer

16 movl $1, %edx # 3rd arg. - 1 character

17 movl $1, %edi # 1st arg. - standard out

18 call write

19 addq $1, -8(%rbp) # aString++;

20 .L2:

21 movq -8(%rbp), %rax # load pointer

22 movzbl (%rax), %eax # get current character

23 testb %al, %al # is it NUL?

24 jne .L3 # no, go to top of loop

25 movl $0, %eax

26 leave

27 ret

28 .size main, .-main

29 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

30 .section .note.GNU-stack,"",@progbits

Listing 10.3: Displaying a string one character at a time (gcc assembly language). Comments added.

Let us consider the loop:

12 movq $.LC0, -8(%rbp) # pointer to string

13 jmp .L2 # go to bottom of loop

14 .L3:

15 movq -8(%rbp), %rsi # 2nd arg. - pointer

16 movl $1, %edx # 3rd arg. - 1 character

17 movl $1, %edi # 1st arg. - standard out

18 call write

19 addq $1, -8(%rbp) # aString++;

20 .L2:

21 movq -8(%rbp), %rax # load pointer

22 movzbl (%rax), %eax # get current character

23 testb %al, %al # is it NUL?

24 jne .L3 # no, go to top of loop

Notice that after initializing the loop control variable it jumps to the condition test,

12 movq $.LC0, -8(%rbp) # pointer to string

13 jmp .L2 # go to bottom of loop

which is at the bottom of the loop:

20 .L2:

21 movq -8(%rbp), %rax # load pointer

22 movzbl (%rax), %eax # get current character

23 testb %al, %al # is it NUL?

24 jne .L3 # no, go to top of loop

Let us rearrange the instructions so that this is a true while loop, with the condition test at the top of the loop. The exit condition has been changed from jne to je for correctness. The original is on the left, the rearranged version on the right:

Original                          Rearranged
12 movq $.LC0, -8(%rbp)           12 movq $.LC0, -8(%rbp)
13 jmp .L2                        13 .L2:
14 .L3:                           14 movq -8(%rbp), %rax
15 movq -8(%rbp), %rsi            15 movzbl (%rax), %eax
16 movl $1, %edx                  16 testb %al, %al
17 movl $1, %edi                  17 je .L3
18 call write                     18 movq -8(%rbp), %rsi
19 addq $1, -8(%rbp)              19 movl $1, %edx
20 .L2:                           20 movl $1, %edi
21 movq -8(%rbp), %rax            21 call write
22 movzbl (%rax), %eax            22 addq $1, -8(%rbp)
23 testb %al, %al                 23 jmp .L2
24 jne .L3                        24 .L3:

Both versions have exactly the same number of instructions. However, the unconditional jump instruction, jmp, is executed every time through the "true" while loop, but is executed only once in the compiler's version. Thus, the compiler's version is more efficient. The savings are probably insignificant in the vast majority of applications. However, if a loop is nested within another loop or two, the difference could be important.
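The difference in the number of jmp instructions executed can be counted with a small C model of the two layouts. This is a sketch for illustration only: the function names are hypothetical, goto stands in for the jump instructions, and the loop body simply advances the pointer instead of calling write.

```c
#include <stddef.h>

/* "True" while loop: test at the top, jmp at the bottom of every pass.
 * Returns the number of unconditional jumps executed. */
size_t jmps_test_at_top(const char *s)
{
    size_t jmps = 0;
top:
    if (*s == '\0') goto done;   /* je .L3 */
    s++;                         /* loop body */
    jmps++;                      /* jmp .L2 */
    goto top;
done:
    return jmps;
}

/* Compiler's layout: a single jmp to the test, which sits at the bottom.
 * Returns the number of unconditional jumps executed. */
size_t jmps_test_at_bottom(const char *s)
{
    size_t jmps = 0;
    jmps++;                      /* jmp .L2 */
    goto test;
body:
    s++;                         /* loop body */
test:
    if (*s != '\0') goto body;   /* jne .L3 */
    return jmps;
}
```

For the 13-character string "Hello World.\n", jmps_test_at_top returns 13 while jmps_test_at_bottom returns 1. The only case where the compiler's layout executes more unconditional jumps is an empty string (1 versus 0).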

We also see another version of the mov instruction on line 22:

22 movzbl (%rax), %eax

This instruction converts the data size from 8-bit to 32-bit, placing zeros in the high-order 24 bits, as it copies the byte from memory to the eax register. The memory address of the copied byte is in the rax register. (Yes, this instruction writes over the address in the register as it executes.)

The x86-64 architecture includes instructions for extending the size of a value by adding more bits to the left. There are two ways to do this:

Sign extend: copy the sign bit to each of the new high-order bits. For example, when sign extending an 8-bit value to 16 bits, 85₁₆ would become ff85₁₆, but 75₁₆ would become 0075₁₆.

Zero extend: make each of the new high-order bits zero. When zero extending 85₁₆ to sixteen bits, it becomes 0085₁₆.
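The same two extensions can be written with C casts. The following is a minimal sketch with hypothetical function names; it assumes, as gcc does on x86-64, that converting an out-of-range value to a signed type keeps the same bit pattern.

```c
#include <stdint.h>

/* Sign extend 8 -> 16 bits: the new high-order bits copy bit 7. */
uint16_t sign_extend_8_to_16(uint8_t byte)
{
    /* Assumption: the cast to int8_t reinterprets the bit pattern (gcc behavior). */
    return (uint16_t)(int16_t)(int8_t)byte;
}

/* Zero extend 8 -> 16 bits: the new high-order bits are all zero. */
uint16_t zero_extend_8_to_16(uint8_t byte)
{
    return (uint16_t)byte;
}
```

With these definitions, sign_extend_8_to_16(0x85) yields 0xff85, sign_extend_8_to_16(0x75) yields 0x0075, and zero_extend_8_to_16(0x85) yields 0x0085, matching the examples above.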

Sign extension can be accomplished with the movs instruction:

movssd source, destination

where s denotes the size of the source operand and d the size of the destination operand. (Use the s column for d.)

s meaning number of bits

b byte 8

w word 16

l longword 32

q quadword 64

It can be used to move an 8-bit value from memory or a register into a 16-, 32-, or 64-bit register; move a 16-bit value from memory or a register into a 32-bit register; or move a 32-bit value from memory or a register into a 64-bit register. The "s" causes the rest of the high-order bits in the destination register to be a copy of the sign bit in the source value. It does not affect the condition codes in the rflags register.

In the Intel syntax the instruction is movsx. The size of the data is determined by the operands, so the size characters (b, w, l, or q) are not appended to the instruction, and the order of the operands is reversed.

Intel® Syntax

movsx destination, source

In some cases the Intel syntax is ambiguous. Intel-syntax assemblers use keywords to specify the data size in such cases. For example, the nasm assembler uses

movsx destination, BYTE [source]

to move one byte and sign extend, and uses

movsx destination, WORD [source]

to move two bytes and sign extend.

Zero extension can be accomplished with the movz instruction:

movzsd source, destination

where s denotes the size of the source operand and d the size of the destination operand. (Use the s column for d.)

s meaning number of bits

b byte 8

w word 16

l longword 32

q quadword 64

It can be used to move an 8-bit value from memory or a register into a 16-, 32-, or 64-bit register; or move a 16-bit value from memory or a register into a 32-bit register. The "z" causes the rest of the high-order bits in the destination register to be set to zero. It does not affect the condition codes in the rflags register. Recall that moving a 32-bit value from memory or a register into a 64-bit register sets the high-order 32 bits to zero, so there is no movzlq instruction.

In the Intel syntax the instruction is movzx. The size of the data is determined by the operands, so the size characters (b, w, l, or q) are not appended to the instruction, and the order of the operands is reversed.

Intel®

Syntax

movzx destination, source

There is also a set of instructions that double the size of data in portions of the rax register,

sign extending as they do so. The instructions are:

AT&T syntax   Intel® syntax   start          result
cbtw          cbw             byte in al     word in ax
cwtl          cwde            word in ax     long in eax
cwtd          cwd             word in ax     long in dx:ax
cltd          cdq             long in eax    quad in edx:eax
cltq          cdqe            long in eax    quad in rax
cqto          cqo             quad in rax    octuple in rdx:rax

where the notation "long in dx:ax" means a 32-bit value with the high-order 16 bits in dx and the low-order 16 bits in ax. Notice that these instructions do not explicitly specify any operands, but they change the rax and possibly the rdx registers. They do not affect the condition codes in the rflags register.
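The "dx:ax" notation may be easier to see in a small C model of the cwtd instruction. This is a hedged sketch: the struct dx_ax and the function cwtd_model are hypothetical names that merely stand in for the two registers, and the cast to int16_t assumes gcc's bit-pattern-preserving conversion.

```c
#include <stdint.h>

/* Hypothetical stand-in for the dx:ax register pair. */
struct dx_ax {
    uint16_t dx;   /* high-order 16 bits */
    uint16_t ax;   /* low-order 16 bits */
};

/* Model of cwtd: sign extend the word in ax to 32 bits, leaving the
 * high-order half in dx and the low-order half in ax. */
struct dx_ax cwtd_model(uint16_t ax)
{
    uint32_t wide = (uint32_t)(int32_t)(int16_t)ax;
    struct dx_ax result = { (uint16_t)(wide >> 16), (uint16_t)wide };
    return result;
}
```

For a negative word such as 0x8000 the model leaves 0xffff in dx, and for a positive word such as 0x1234 it leaves 0x0000 in dx; ax is unchanged in both cases.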

Returning to while loops, the general structure of a count-controlled while loop is shown in

Listing 10.4.

1 # generalWhile.s

2 # general structure of a while loop (not a program)

3 #

4 # count = 10;

5 # while (count > 0)

6 # {

7 # // loop body

8 # count--;

9 # }

10 #

11 # Bob Plantz - 10 June 2009

12

13 movl $10, count(%rbp) # initialize loop control variable

14 whileLoop:

15 cmpl $0, count(%rbp) # check continuation conditions

16 jle whileDone # if false, leave loop

17 # ------

18 # loop body processing

19 # ------

20 subl $1, count(%rbp) # change loop control variable

21 jmp whileLoop # back to top

22 whileDone:

23 # next programming construct

Listing 10.4: General structure of a count-controlled while loop.

This is not a complete program or even a function. It simply shows the key elements of a while loop.

Loops, of course, take the most execution time in a program. However, in almost all cases code readability is more important than efficiency. You should determine that a loop is an efficiency bottleneck before sacrificing its structure for efficiency. And then you should generously comment what you have done.

Our assembly language version of a "Hello world" program in Listing 10.5 uses a sentinel-

controlled while loop.

1 # helloWorld3.s

2 # "hello world" program using the write() system call

3 # one character at a time.

4 # Bob Plantz - 12 June 2009

5

6 # Useful constants

7 .equ STDOUT,1

8 # Stack frame

9 .equ aString,-8

10 .equ localSize,-16

11 # Read only data

12 .section .rodata

13 theString:

14 .string "Hello world.\n"

15 # Code

16 .text

17 .globl main

18 .type main, @function

19 main:

20 pushq %rbp # save base pointer

21 movq %rsp, %rbp # set new base pointer

22 addq $localSize, %rsp # for local var.

23

24 movl $theString, %esi

25 movl %esi, aString(%rbp) # *aString = "Hello World.\n";

26 whileLoop:

27 movl aString(%rbp), %esi # current char in string

28 cmpb $0, (%esi) # null character?

29 je allDone # yes, all done

30

31 movl $1, %edx # one character

32 movl $STDOUT, %edi # standard out

33 call write # invoke write function

34

35 incl aString(%rbp) # aString++;

36 jmp whileLoop # back to top

37 allDone:

38 movl $0, %eax # return 0;

39 movq %rbp, %rsp # restore stack pointer

40 popq %rbp # restore base pointer

41 ret

Listing 10.5: Displaying a string one character at a time (programmer assembly language).

Consider the sequence on lines 26-28:

26 whileLoop:

27 movl aString(%rbp), %esi # current char in string

28 cmpb $0, (%esi) # null character?

We had to move the pointer value into a register in order to dereference the pointer. These two instructions implement the C expression:

(*aString != '\0')

You have to get an address (a pointer) into a register before you can dereference it. In particular, you have to move the address into a register, then dereference it with the "(register)" syntax.

Be careful not to confuse this with the indirection operator, "*", used with the jmp instruction that you saw in Section 10.1.3, especially since the assembly language indirection operator is the same as the dereference operator in C/C++.

There are two common errors when using the assembly language syntax.

• The assembly language dereference operator does not work on variable names. For example, you cannot use

cmpb $0, (ptr(%rbp)) # *** DOES NOT WORK ***

to dereference the variable, ptr. Neither does

cmpb $0, (theString) # *** DOES NOT WORK ***

nor

cmpb $0, ($theString) # *** DOES NOT WORK ***

work to dereference the theString location. Unfortunately, the assembler may not consider any of these to be syntax errors, just an unnecessary set of parentheses. Therefore, you probably will not get an assembler error message, just incorrect program behavior.

• Another common error is to forget to dereference the register once you get the address stored in it:

cmpb $0, %esi # *** DOES NOT WORK ***

This would compare a byte in the esi register itself with the value zero. Since there are four bytes in the esi register, this code will generate an assembler warning message because it does not specify which byte.

Read the warning messages when you assemble and link your programs.

BE CAREFUL: The C/C++ syntax for the NUL character, '\0', is not recognized by the gnu assembler, as. From Table 2.3 we see that the bit pattern for the NUL character is 0x00, and this value must be used in the gnu assembly language.

We also need to add one to the pointer variable so as to move it to the next character in the string. Adding one is a common operation, so there is an instruction that simply adds one,

incs source

where s denotes the size of the operand:

s meaning number of bits

b byte 8

w word 16

l longword 32

q quadword 64

The inc instruction adds one to the source operand. The operand can be a register or a memory location.

On line 35 of the program in Listing 10.5, incl is used to add one to the address stored in memory eight bytes below the address in the frame pointer:

incl aString(%rbp) # aString++;

Increment the entire 32- or 64-bit address, not just one byte.

BE CAREFUL: It is easy to think that the instruction ought to be incb since each character is only one byte. The address in this program is 32 bits, so we have to use incl. And, of course, when we use a 64-bit address, we need to use incq. Don't forget that the value we are adding one to is an address, not the value stored at that address.
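To see why the operand size matters, the effect of incl and incb on a stored address can be modeled in C. This is only a sketch with hypothetical function names and a made-up address value; it shows what happens when the low-order byte of the address is ff₁₆.

```c
#include <stdint.h>

/* Model of incl: add one to the whole 32-bit address. */
uint32_t inc_long(uint32_t addr)
{
    return addr + 1;
}

/* Model of incb applied to the same memory location: only the low-order
 * byte is incremented, so a carry out of that byte is lost. */
uint32_t inc_byte(uint32_t addr)
{
    uint8_t low = (uint8_t)(addr + 1);      /* wraps from 0xff to 0x00 */
    return (addr & 0xffffff00u) | low;
}
```

Starting from the made-up address 0x004005ff, inc_long gives 0x00400600, the next character, while inc_byte gives 0x00400500, which points 255 bytes before the intended location. The two agree only as long as the low-order byte does not overflow, which is what makes this error hard to spot.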

Subtracting one from a counter is also a common operation. The dec instruction subtracts one from an operand and sets the rflags register accordingly. The operand can be a register or a memory location.

decs source

where s denotes the size of the operand:

s meaning number of bits

b byte 8

w word 16

l longword 32

q quadword 64

A decl instruction is used on line 27 in Listing 10.6 to both subtract one from the counter variable and to set the condition codes in the rflags register for the jg instruction.

1 # printStars.s

2 # prints 10 * characters on a line

3 # Bob Plantz - 12 June 2009

4

5 # Useful constants

6 .equ STDOUT,1

7 # Stack frame

8 .equ theChar,-1

9 .equ counter,-16

10 .equ localSize,-16

11 # Code

12 .text

13 .globl main

14 .type main, @function

15 main:

16 pushq %rbp # save base pointer

17 movq %rsp, %rbp # set new base pointer

18 addq $localSize, %rsp # for local var.

19

20 movb $'*', theChar(%rbp) # character to print

21 movl $10, counter(%rbp) # ten times

22 doWhileLoop:

23 leaq theChar(%rbp), %rsi # address of char

24 movl $1, %edx # one character

25 movl $STDOUT, %edi # standard out

26 call write # invoke write function

27 decl counter(%rbp) # counter--;

28 jg doWhileLoop # repeat if > 0

29

30 movl $0, %eax # return 0;

31 movq %rbp, %rsp # restore stack pointer

32 popq %rbp # restore base pointer

33 ret

Listing 10.6: A do-while loop to print 10 characters.

This is clearly better than using

....

subl $1, counter(%rbp) # counter--;

cmpl $0, counter(%rbp)

jg doWhileLoop # repeat if > 0

....

This program also demonstrates how to implement a do-while loop.
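For comparison, the structure of Listing 10.6 corresponds to a C do-while loop. The following sketch uses a hypothetical function, fill_stars, that stores the characters in a buffer instead of calling write so that the result can be inspected; the buffer must have room for at least one character even when count is zero.

```c
#include <stddef.h>

/* Store '*' characters in buf using the body-first, test-last structure
 * of Listing 10.6. Returns the number of characters stored. */
size_t fill_stars(char *buf, int count)
{
    size_t n = 0;
    do {
        buf[n++] = '*';      /* loop body (the listing calls write here) */
        count--;             /* decl counter(%rbp) */
    } while (count > 0);     /* jg doWhileLoop */
    return n;
}
```

Because the test comes after the body, the body always executes at least once: fill_stars(buf, 10) stores ten characters, but fill_stars(buf, 0) still stores one. That is the essential difference from the while loop in Listing 10.4.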

10.2 Binary Decisions

We now know how to implement two of the primary program flow constructs: sequence and repetition. We continue with the third: binary decision. You know this construct from C/C++ as the if-else.

We start the discussion with a common example: a simple program that asks the user whether changes should be saved or not (Listing 10.7). This example program does not do anything, so there really is nothing to change, but you have certainly seen this construct. (As usual, this program is meant to illustrate concepts, not good C/C++ programming practices.)

1 /*
2  * yesNo1.c
3  * Prompts user to enter a y/n response.
4  *
5  * Bob Plantz - 12 June 2009
6  */
7
8 #include <unistd.h>
9
10 int main(void)
11 {
12     char *ptr;
13     char response;
14
15     ptr = "Save changes? ";
16
17     while (*ptr != '\0')
18     {
19         write(STDOUT_FILENO, ptr, 1);
20         ptr++;
21     }
22
23     read(STDIN_FILENO, &response, 1);
24
25     if (response == 'y')
26     {
27         ptr = "Changes saved.\n";
28         while (*ptr != '\0')
29         {
30             write(STDOUT_FILENO, ptr, 1);
31             ptr++;
32         }
33     }
34     else
35     {
36         ptr = "Changes discarded.\n";
37         while (*ptr != '\0')
38         {
39             write(STDOUT_FILENO, ptr, 1);
40             ptr++;
41         }
42     }
43     return 0;
44 }

Listing 10.7: Get yes/no response from user (C).

Let's look at the flow of the program that the if-else controls.

1. The boolean expression (response == 'y') is evaluated.

2. If the evaluation is true, the first block, the one that displays "Changes saved.", is executed.

3. If the evaluation is false, the second block, the one that displays "Changes discarded.", is executed.

4. In both cases the next statement to be executed is the return 0;

The program control flow of the if-else construct is illustrated in Figure 10.2.

[Figure 10.2 flow chart: "Evaluate Boolean expression" leads, if true, to "Execute 'Then' part" and, if false, to "Execute 'Else' part"; both paths lead to "Next instruction after if-then construct".]

Figure 10.2: Flow chart of if-else construct. The large diamond represents a binary decision that leads to two possible paths, "true" or "false." Notice that either the "then" block or the "else" block is executed, but not both. Each leads to the end of the if-else construct.

We already know all the assembly language instructions needed to implement the if-else

in Listing 10.7. The important thing to note is that there must be an unconditional jump at the

end of the "then" block to transfer program flow around the "else" block. The assembly language

generated for this program is shown in Listing 10.8.

1 .file "yesNo1.c"

2 .section .rodata

3 .LC0:

4 .string "Save changes? "

5 .LC1:

6 .string "Changes saved.\n"

7 .LC2:

8 .string "Changes discarded.\n"

9 .text

10 .globl main

11 .type main, @function

12 main:

13 pushq %rbp

14 movq %rsp, %rbp

15 subq $16, %rsp

16 movq $.LC0, -16(%rbp)

17 jmp .L2

18 .L3:

19 movq -16(%rbp), %rsi

20 movl $1, %edx

21 movl $1, %edi

22 call write

23 addq $1, -16(%rbp)

24 .L2:

25 movq -16(%rbp), %rax

26 movzbl (%rax), %eax

27 testb %al, %al

28 jne .L3

29 leaq -1(%rbp), %rsi # place to store user response

30 movl $1, %edx

31 movl $0, %edi

32 call read

33 movzbl -1(%rbp), %eax # get user response

34 cmpb $121, %al # response == 'y' ?

35 jne .L4 # no, go to else part

36 movq $.LC1, -16(%rbp) # yes, write "Changes saved.\n"

37 jmp .L5

38 .L6:

39 movq -16(%rbp), %rsi

40 movl $1, %edx

41 movl $1, %edi

42 call write

43 addq $1, -16(%rbp)

44 .L5:

45 movq -16(%rbp), %rax

46 movzbl (%rax), %eax

47 testb %al, %al

48 jne .L6

49 jmp .L7 # jump around else part

50 .L4: # else part,

51 movq $.LC2, -16(%rbp) # write "Changes discarded.\n"

52 jmp .L8

53 .L9:

54 movq -16(%rbp), %rsi

55 movl $1, %edx

56 movl $1, %edi

57 call write

58 addq $1, -16(%rbp)

59 .L8:

60 movq -16(%rbp), %rax

61 movzbl (%rax), %eax

62 testb %al, %al

63 jne .L9

64 .L7: # after if-else statement

65 movl $0, %eax

66 leave

67 ret

68 .size main, .-main

69 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

70 .section .note.GNU-stack,"",@progbits

Listing 10.8: Get yes/no response from user (gcc assembly language).

The general structure of an if-else construct is shown in Listing 10.9.

1 # generalIf-else.s

2 # general structure of an if-else (not a program)

3 #

4 # if (response == 'y')

5 # {

6 # then part

7 # }

8 # else

9 # {

10 # else part

11 # }

12 #

13 # Bob Plantz - 10 June 2009

14

15 cmpb $'y', response(%rbp) # check conditions

16 jne noChange # false, go to else part

17 # ------

18 # "then" part processing

19 # ------

20 jmp allDone # go to end of if-else

21 noChange:

22 # ------

23 # "else" part processing

24 # ------

25 allDone:

26 # next programming construct

Listing 10.9: General structure of an if-else construct. Don't forget the "jmp" at the end of the

"then" block (line 20).

This is not a complete program or even a function. It simply shows the key elements of an

if-else construct.

Our assembly language version of the yes/no program in Listing 10.10 follows this general pattern. It, of course, uses more meaningful labels than what the compiler generated.

1 # yesNo2.s

2 # Prompts user to enter a y/n response.

3 # Bob Plantz - 12 June 2009

4

5 # Useful constants

6 .equ STDIN,0

7 .equ STDOUT,1

8 # Stack frame

9 .equ response,-1

10 .equ ptr,-16

11 .equ localSize,-16

12 # Read only data

13 .section .rodata

14 queryMsg:

15 .string "Save changes? "

16 saveMsg:

17 .string "Changes saved.\n"

18 discardMsg:

19 .string "Changes discarded.\n"

20 # Code

21 .text

22 .globl main

23 .type main, @function

24 main:

25 pushq %rbp # save base pointer

26 movq %rsp, %rbp # establish our base pointer

27 addq $localSize, %rsp # for local vars.

28 pushq %rbx # save for caller

29

30 movl $queryMsg, %esi

31 movl %esi, ptr(%rbp) # point to query message

32 queryLoop:

33 movl ptr(%rbp), %esi # current char in string

34 cmpb $0, (%esi) # null character?

35 je getResp # yes, get user response

36

37 movl $1, %edx # one character

38 movl $STDOUT, %edi # standard out

39 call write # invoke write function

40

41 incl ptr(%rbp) # ptr++;

42 jmp queryLoop # back to top

43

44 getResp:

45 movl $1, %edx # read one byte

46 leaq response(%rbp), %rsi # into this location

47 movl $STDIN, %edi # from keyboard

48 call read

49 # if (response == 'y')

50 cmpb $'y', response(%rbp) # was it 'y'?

51 jne noChange # no, there is no change

52

53 # then print the "save" message

54 movl $saveMsg, %esi

55 movl %esi, ptr(%rbp) # point to message

56 saveLoop:

57 movl ptr(%rbp), %esi # current char in string

58 cmpb $0, (%esi) # null character?

59 je saveEnd # yes, leave while loop

60

61 movl $1, %edx # one character

62 movl $STDOUT, %edi # standard out

63 call write # invoke write function

64

65 incl ptr(%rbp) # ptr++;

66 jmp saveLoop # back to top

67

68 saveEnd:

69 jmp allDone # go to end of if-else

70

71 # else print the "discard" message

72 noChange:

73 movl $discardMsg, %esi

74 movl %esi, ptr(%rbp) # point to message

75 discardLoop:

76 movl ptr(%rbp), %esi # current char in string

77 cmpb $0, (%esi) # null character?

78 je allDone # yes, leave while loop

79

80 movl $1, %edx # one character

81 movl $STDOUT, %edi # standard out

82 call write # invoke write function

83

84 incl ptr(%rbp) # ptr++;

85 jmp discardLoop # back to top

86

87 allDone:

88 movl $0, %eax # return 0;

89 popq %rbx # restore reg.

90 movq %rbp, %rsp # restore stack pointer

91 popq %rbp # restore for caller

92 ret

Listing 10.10: Get yes/no response from user (programmer assembly language).

The exit from the while loop on line 59

59 je saveEnd # yes, leave while loop

jumps to the end of the "then" block of the if-else statement, which then jumps to the end of

the entire if-else statement:

68 saveEnd:

69 jmp allDone # go to end of if-else

In this particular program we could gain some efficiency by using

je allDone # yes, program done

on line 59. But this very slight efficiency gain comes at the expense of good software engineering. In general, there could be more processing to do after the while loop in the "then" block of the if-else statement. The real danger here is that additional processing will be added during the program's maintenance phase and the programmer will forget to change the structure. Good, easy to read structure is almost always better than execution efficiency.

Another common programming problem is to check to see if a variable is within a certain range. This requires a compound boolean expression, as shown in the C program in Listing 10.11.

1 /*
2  * range1.c
3  * Checks to see if a character entered by user is a numeral.
4  * Bob Plantz - 12 June 2009
5  */
6
7 #include <unistd.h>
8
9 int main()
10 {
11     char response; // For user's response
12     char *ptr;     // For text messages
13
14     ptr = "Enter single character: ";
15     while (*ptr != '\0')
16     {
17         write(STDOUT_FILENO, ptr, 1);
18         ptr++;
19     }
20
21     read(STDIN_FILENO, &response, 1);
22
23     if ((response <= '9') && (response >= '0'))
24     {
25         ptr = "You entered a numeral.\n";
26         while (*ptr != '\0')
27         {
28             write(STDOUT_FILENO, ptr, 1);
29             ptr++;
30         }
31     }
32     else
33     {
34         ptr = "You entered some other character.\n";
35         while (*ptr != '\0')
36         {
37             write(STDOUT_FILENO, ptr, 1);
38             ptr++;
39         }
40     }
41     return 0;
42 }

Listing 10.11: Compound boolean expression in an if-else construct (C).

Each condition of the boolean expression generally requires a separate comparison/conditional jump pair. The best way to see this is to study the compiler-generated assembly language code of the numeral checking program in Listing 10.12.

1 .file "range1.c"

2 .section .rodata

3 .LC0:

4 .string "Enter single character: "

5 .LC1:

6 .string "You entered a numeral.\n"

7 .align 8

8 .LC2:

9 .string "You entered some other character.\n"

10 .text

11 .globl main

12 .type main, @function

13 main:

14 pushq %rbp

15 movq %rsp, %rbp

16 subq $16, %rsp

17 movq $.LC0, -16(%rbp)

18 jmp .L2

19 .L3:

20 movq -16(%rbp), %rsi

21 movl $1, %edx

22 movl $1, %edi

23 call write

24 addq $1, -16(%rbp)

25 .L2:

26 movq -16(%rbp), %rax

27 movzbl (%rax), %eax

28 testb %al, %al

29 jne .L3

30 leaq -1(%rbp), %rsi

31 movl $1, %edx

32 movl $0, %edi

33 call read

34 movzbl -1(%rbp), %eax # load numeral character

35 cmpb $57, %al # is numeral > '9'?

36 jg .L4 # yes, go to else part

37 movzbl -1(%rbp), %eax # load numeral character

38 cmpb $47, %al # is numeral <= '/'?

39 jle .L4 # yes, go to else part

40 movq $.LC1, -16(%rbp) # "then" part

41 jmp .L5

42 .L6:

43 movq -16(%rbp), %rsi

44 movl $1, %edx

45 movl $1, %edi

46 call write

47 addq $1, -16(%rbp)

48 .L5:

49 movq -16(%rbp), %rax

50 movzbl (%rax), %eax

51 testb %al, %al

52 jne .L6

53 jmp .L7 # skip over "else" part

54 .L4: # "else" part

55 movq $.LC2, -16(%rbp)

56 jmp .L8

57 .L9:

58 movq -16(%rbp), %rsi

59 movl $1, %edx

60 movl $1, %edi

61 call write

62 addq $1, -16(%rbp)

63 .L8:

64 movq -16(%rbp), %rax

65 movzbl (%rax), %eax

66 testb %al, %al

67 jne .L9

68 .L7: # end of if-else construct

69 movl $0, %eax

70 leave

71 ret

72 .size main, .-main

73 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"

74 .section .note.GNU-stack,"",@progbits

Listing 10.12: Compound boolean expression in an if-else construct (gcc assembly language).

In particular, notice that the decision regarding whether the character entered by the user is a numeral or not is made on the lines:

34 movzbl -1(%rbp), %eax # load numeral character
35 cmpb $57, %al # is numeral > '9'?
36 jg .L4 # yes, go to else part
37 movzbl -1(%rbp), %eax # load numeral character
38 cmpb $47, %al # is numeral <= '/'?
39 jle .L4 # yes, go to else part
40 movq $.LC1, -16(%rbp) # "then" part

Consulting Table 2.3 on page 20 we see that the program first compares the character entered by the user with the ASCII code for the numeral "9" (57₁₀ = 39₁₆). If the character is numerically greater, the program jumps to .L4, which is the beginning of the "else" part. Then the character is compared to the ASCII code for the character "/", which is numerically one less than the ASCII code for the numeral "0" (48₁₀ = 30₁₆). If the character is numerically equal to or less than this, the program also jumps to .L4.

If neither of these conditions causes a jump to the "else" part, the program simply continues on to execute the "then" part. At the end of the "then" part, the program skips over the "else" part to the end of the program:

53 jmp .L7 # skip over "else" part
54 .L4: # "else" part

10.2.1 Short-Circuit Evaluation

Consider the boolean expression used for the if-else conditional:

23 if ((response <= '9') && (response >= '0'))

On lines 35 and 36 in the assembly language,

35 cmpb $57, %al # is numeral > '9'?
36 jg .L4 # yes, go to else part

we see that the test for '0' is never made if (response <= '9') is false.

This is called short-circuit evaluation in C/C++. When boolean tests are connected with the && and || operators, the tests are performed one at a time, from left to right. If the overall result of the expression (true or false) is known before all the tests are made, the remaining tests are not executed. This is one of the most important reasons for not writing boolean expressions that include side effects; the operation that produces a needed side effect may never get executed.
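Short-circuit evaluation can be made visible by writing the decision as the two compare/jump pairs the compiler generated and counting how many comparisons actually run. This is an illustrative sketch only; is_numeral and the global comparisons_made are hypothetical names, and the counter exists purely to expose the behavior.

```c
/* Number of comparisons made by the most recent call (illustration only). */
int comparisons_made;

/* Same decision as range1.c, written as two compare/jump pairs.
 * Returns 1 for the "then" part and 0 for the "else" part. */
int is_numeral(char response)
{
    comparisons_made = 1;
    if (response > '9') goto else_part;    /* cmpb $57, %al ; jg  */
    comparisons_made = 2;
    if (response <= '/') goto else_part;   /* cmpb $47, %al ; jle */
    return 1;                              /* "then" part */
else_part:
    return 0;
}
```

For the character 'a' the first test already decides the outcome, so only one comparison is made. For '5' and for '!' both comparisons are needed. Any side effect placed in the second test would therefore be skipped for every character above '9'.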

10.2.2 Conditional Move

Many binary decisions are very simple. For example, the decision in Listing 10.7 could be written:

ptr = "Changes discarded.\n";
if (response == 'y')
{
    ptr = "Changes saved.\n";
}
while (*ptr != '\0')
{
    write(STDOUT_FILENO, ptr, 1);
    ptr++;
}

This code segment assigns an address to the ptr variable. If the condition, response == 'y', is true, then the address in the ptr variable is written over with another address. This could be written in assembly language (see Listing 10.10) as:

movl $discardMsg, %esi

# if (response == 'y')

cmpb $'y', response(%rbp) # was it 'y'?

jne noChange # no, there is no change

movl $saveMsg, %esi # yes, get other message

noChange:

movl %esi, ptr(%rbp) # point to message

msgLoop:

movl ptr(%rbp), %esi # current char in string

cmpb $0, (%esi) # null character?

je allDone # yes, leave while loop

movl $1, %edx # one character

movl $STDOUT, %edi # standard out

call write # invoke write function

incl ptr(%rbp) # ptr++;

jmp msgLoop # back to top

The x86-64 architecture provides a conditional move instruction, cmovcc, for simple if constructs like this. The general format is

cmovcc source, destination

where cc is a 1–4 letter sequence specifying the settings of the condition codes. Similar to the conditional jump instructions, the conditional data move takes place if the status flag settings are true, and does not if they are false.

Possible letter sequences are the same as for the conditional jump instructions listed in Table 10.1 on page 211. The source operand can be either a register or a memory location, and the destination must be a register. Unlike other data movement instructions, the cmovcc instruction does not use the operand size suffix; the size is implicitly specified by the size of the destination register.

The conditional move instruction would allow the above assembly language to be written with a cmove instruction, where the "e" means "equal" (see Table 10.1).

movl $discardMsg, %esi # load addresses of

movl $saveMsg, %edi # both messages

# if (response == 'y')

cmpb $'y', response(%rbp) # was it 'y'?

cmove %edi, %esi # yes, "save" message

movl %esi, ptr(%rbp) # point to message

msgLoop:

movl ptr(%rbp), %esi # current char in string

cmpb $0, (%esi) # null character?

je allDone # yes, leave while loop

movl $1, %edx # one character

movl $STDOUT, %edi # standard out

call write # invoke write function

incl ptr(%rbp) # ptr++;

jmp msgLoop # back to top

Although this actually increases the average number of instructions executed, it allows the CPU to make more efficient use of the pipeline. So a conditional move may provide faster program execution by eliminating possible pipeline inefficiencies caused by a conditional jump. See for example [28], [31], and [34].
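
At the C level, the same "load both candidates, then select one" pattern can be sketched as follows. The function names pickMessage and printMessage are hypothetical, chosen only for this illustration. A compiler may translate the conditional expression into a cmove instruction, but that depends on the compiler and its optimization settings and is not guaranteed.

```c
#include <unistd.h>

/* Select one of two message addresses without an if-else construct. */
const char *pickMessage(char response)
{
    const char *ptr = "Changes discarded.\n";  /* like movl $discardMsg, %esi */
    const char *other = "Changes saved.\n";    /* like movl $saveMsg, %edi    */

    /* Both addresses are already available; keep one of them. */
    ptr = (response == 'y') ? other : ptr;     /* candidate for cmove         */
    return ptr;
}

/* Write the message one character at a time, as in the while loop above. */
void printMessage(const char *ptr)
{
    while (*ptr != '\0') {
        if (write(STDOUT_FILENO, ptr, 1) != 1) {
            break;                             /* stop if the write fails     */
        }
        ptr++;
    }
}
```

Calling printMessage(pickMessage(response)) produces the same output as the C code segment at the beginning of this section.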

10.3 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. The page number where the instruction is explained in more detail, which may be in a subsequent chapter, is also given. This book provides only an introduction to the usage of each instruction. You need to consult the manuals ([2]–[6], [14]–[18]) in order to learn all the possible uses of the instructions.

10.3.1 Instructions

data movement:

opcode   source          destination   action              see page:
cmovcc   %reg/mem        %reg          conditional move    230
movs     $imm/%reg       %reg/mem      move                141
movsss   $imm/%reg       %reg/mem      move, sign extend   216
movzss   $imm/%reg       %reg/mem      move, zero extend   217
popw                     %reg/mem      pop from stack      163
pushw    $imm/%reg/mem                 push onto stack     163

s = b, w, l, q; w = l, q; cc = condition codes

arithmetic/logic:

opcode   source       destination   action                   see page:
adds     $imm/%reg    %reg/mem      add                      189
adds     mem          %reg          add                      189
cmps     $imm/%reg    %reg/mem      compare                  209
cmps     mem          %reg          compare                  209
decs                  %reg/mem      decrement                220
incs                  %reg/mem      increment                220
leaw     mem          %reg          load effective address   167
subs     $imm/%reg    %reg/mem      subtract                 190
subs     mem          %reg          subtract                 190
tests    $imm/%reg    %reg/mem      test bits                210
tests    mem          %reg          test bits                210

s = b, w, l, q; w = l, q

program flow control:

opcode    location   action                             see page:
call      label      call function                      156
ja        label      jump above (unsigned)              212
jae       label      jump above/equal (unsigned)        212
jb        label      jump below (unsigned)              212
jbe       label      jump below/equal (unsigned)        212
je        label      jump equal                         211
jg        label      jump greater than (signed)         212
jge       label      jump greater than/equal (signed)   212
jl        label      jump less than (signed)            212
jle       label      jump less than/equal (signed)      212
jmp       label      jump                               213
jne       label      jump not equal                     211
jno       label      jump no overflow                   211
jcc       label      jump on condition codes            211
leave                undo stack frame                   168
ret                  return from function               168
syscall              call kernel function               177

cc = condition codes

10.3.2 Addressing Modes

register direct: The data value is located in a CPU register.
    syntax: name of the register with a "%" prefix.
    example: movl %eax, %ebx

immediate data: The data value is located immediately after the instruction. Source operand only.
    syntax: data value with a "$" prefix.
    example: movl $0xabcd1234, %ebx

base register plus offset: The data value is located in memory. The address of the memory location is the sum of a value in a base register plus an offset value.
    syntax: use the name of the register with parentheses around the name and the offset value immediately before the left parenthesis.
    example: movl $0xaabbccdd, 12(%eax)

rip-relative: The target is a memory address determined by adding an offset to the current address in the rip register.
    syntax: a programmer-defined label
    example: je somePlace

10.4 Exercises

10-1 (§10.1) Verify on paper that the machine instructions in Table 10.4 actually cause a jump of the number of bytes shown (in decimal) when the jump is taken.

10-2 (§10.1) Enter the program in Listing 10.2 and verify that the jump to here1 uses the rip-relative addressing mode, and the other two jumps use the direct address. Hint: Produce a listing file for the program and use gdb to examine register and memory contents.

10-3 (§10.1) Enter the program in Listing 10.5, changing the while loop to use eax as a pointer:

movl $theString, %eax

whileLoop:

cmpb $0, (%eax) # null character?

je allDone # yes, all done

movl $1, %edx # one character

movl %eax, %esi # current pointer

movl $STDOUT, %edi # standard out

call write # invoke write function

incl %eax # aString++;

jmp whileLoop # back to top

This would seem to be more efficient than reading the pointer from memory each time through the loop. Use gdb to debug the program. Set a break point at the call instruction and another break point at the incl instruction. Inspect the registers each time the program breaks into gdb. What is happening to the value in eax? Hint: Read what the "man 2 write" shell command has to say about the write system call function. This exercise points out the necessity of understanding what happens to registers when calling another function. In general, it is safer to use local variables in the stack frame.

10-4 (§10.1) Assume that you do not know how many numerals there are, only that the first one is '0' and the last one is '9' (the character "0" and character "9"). Write a program in assembly language that displays all the numerals, 0–9, on the screen, one character at a time. Use only one byte in the .data segment for storing a character; do not allocate a separate byte for each numeral.

10-5 (§10.1) Assume that you do not know how many upper case letters there are, only that the first one is 'A' and the last one is 'Z'. Write a program in assembly language that displays all the upper case letters, A–Z, on the screen, one character at a time. Use only one byte in the .data segment for storing a character; do not allocate a separate byte for each letter.

10-6 (§10.1) Assume that you do not know how many lower case letters there are, only that the first one is 'a' and the last one is 'z'. Write a program in assembly language that displays all the lower case letters, a–z, on the screen, one character at a time. Use only one byte in the .data segment for storing a character; do not allocate a separate byte for each letter.

10-7 (§10.1) Enter the following C program and use the "-S" option to generate the assembly language:

/*
 * forLoop.c
 * For loop multiplication.
 *
 * Bob Plantz - 21 June 2009
 */

#include <stdio.h>

int main()
{
    int x, y, z;
    int i;

    printf("Enter two integers: ");
    scanf("%i %i", &x, &y);
    z = x;
    for (i = 1; i < y; i++)
        z += x;

    printf("%i * %i = %i\n", x, y, z);
    return 0;
}

Listing 10.13: Simple for loop to perform multiplication.

Identify the loop that performs the actual multiplication. Write an equivalent C program that uses a while loop instead of the for loop, and also generate the assembly language for it. Do the loops differ? If so, how?

10-8 (§10.2) Enter the C program in Listing 10.7 and get it to work. Do you see any odd behavior when the program terminates? Can you fix it? Hint: When the program prompts the user, how many keys did you press? What was the second key press?

10-9 (§10.2) Enter the program in Listing 10.10 and get it to work.

10-10 (§10.2) Write a program in assembly language that displays all the printable characters that are neither numerals nor letters on the screen, one character at a time. Don't forget that the space character, ' ', is printable. Do not display the DEL character. Use only one byte for storing a character; do not allocate a separate byte for each character.

Use only one while loop in this program. You will need an if-else construct with a compound boolean conditional statement.

10-11 (§10.2) Write a program in assembly language that
a) prompts the user to enter a text string,
b) reads the user's input into a char array,
c) echoes the user's input string,
d) increments each character in the string to the next character in the ASCII sequence, with the last printable character "wrapping around" to the first printable character, and
e) displays the modified string.

10-12 (§10.2) Write a program in assembly language that
a) prompts the user to enter a text string,
b) reads the user's input into a char array,
c) echoes the user's input string,
d) decrements each character in the string to the previous character in the ASCII sequence, with the first printable character "wrapping around" to the last printable character, and
e) displays the modified string.

10-13 (§10.2) Write a program in assembly language that
a) instructs the user,
b) prompts the user to enter a character,
c) reads the user's input into a char variable,
d) if the user enters a 'q', the program terminates,
e) if the user enters a numeral, the program echoes the numeral the number of times represented by the numeral plus one, and
f) any other printable character is echoed just once.

The program continues to run until the user enters a 'q'.

For example, a run of the program might look like this (the character following each prompt is the user's input):

A single numeral, N, is echoed N+1 times, other characters are echoed once.

'q' ends program.

Enter a single character: a

You entered: a

Enter a single character: Z

You entered: Z

Enter a single character: 5

You entered: 5

You entered: 5

You entered: 5

You entered: 5

You entered: 5

You entered: 5

Enter a single character: %

You entered: %

Enter a single character: q

End of program.

Chapter 11

Writing Your Own Functions

Good software engineering practice generally includes breaking problems down into functionally

distinct subproblems. This leads to software solutions with many functions, each of which solves

a subproblem. This "divide and conquer" approach has some distinct advantages:

• It is easier to solve a small subproblem.
• Previous solutions to subproblems are often reusable.
• Several people can be working on different parts of the overall problem simultaneously.

The main disadvantage of breaking a problem down like this is coordinating the many subsolutions so that they work together correctly to provide a correct overall solution. In software, this translates to making sure that the interface between a calling function and a called function works correctly. In order to ensure correct operation of the interface, it must be specified in a very explicit way.

In Chapter 8 you learned how to pass arguments into a function and call it. In this chapter

you will learn how to use these arguments inside the called function.

11.1 Overview of Passing Arguments

Be careful to distinguish data input/output to/from a called function from user input/output. User input typically comes from an input device (keyboard, mouse, etc.) and user output is typically sent to an output device (screen, printer, speaker, etc.).

Functions can interact with the data in other parts of the program in three ways:

1. Input. The data comes from another part of the program and is used by the function, but is not modified by it.
2. Output. The function provides new data to another part of the program.
3. Update. The function modifies a data item that is held by another part of the program. The new value is based on the value before the function was called.

All three interactions can be performed if the called function also knows the location of the

data item. This can be done by the calling function passing the address to the called function or

by making the address globally known to both functions. Updates require that the address be

known by the called function.

Outputs can also be implemented by placing the new data item in a location that is accessible to both the called and the calling function. In C/C++ this is done by placing the return value from a function in the eax register. And inputs can be implemented by passing a copy of the data item to the called function. In both of these cases the called function does not know the location of the original data item, and thus does not have access to it.

In addition to global data, C syntax allows three ways for functions to exchange data:

• Pass by value: an input value is passed by making a copy of it available to the function.
• Return value: an output value can be returned to the calling function.
• Pass by pointer: an output value can be stored for the calling function by passing the address where the output value should be stored to the called function. This can also be used to update a data item.

The last method, pass by pointer, can also be used to pass large inputs, or to pass inputs that should be changed (also called updates). It is also the method by which C++ implements pass by reference.
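
The three mechanisms can be sketched in C as follows. The functions twice, divide, and addTo are hypothetical examples made up for this illustration; they do not appear elsewhere in the book.

```c
/* Pass by value and return value: n is a copy, so changing it here
 * does not affect the caller's variable. The result is the output. */
int twice(int n)
{
    n = n + n;          /* modifies only the local copy */
    return n;           /* output via the return value  */
}

/* Pass by pointer used for outputs: the caller supplies the addresses
 * where the two results should be stored. Assumes divisor is not zero. */
void divide(int dividend, int divisor, int *quotient, int *remainder)
{
    *quotient = dividend / divisor;
    *remainder = dividend % divisor;
}

/* Pass by pointer used for an update: the new value of *total
 * is based on the value it had before the call. */
void addTo(int *total, int amount)
{
    *total += amount;
}
```

For instance, after "int a = 21; int b = twice(a);" the variable a still holds 21, whereas after "int t = 10; addTo(&t, 5);" the variable t holds 15, because addTo was given the address of t rather than a copy of its value.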

When one function calls another, the information that is required to provide the interface between the two is called an activation record. Since both the registers and the call stack are common to all the functions within a program, both the calling function and the called function have access to them. So arguments can be passed either in registers or on the call stack. Of course, the called function must know exactly where each of the arguments is located when program flow transfers to it.

In principle, the locations of arguments need only be consistent within a program. As long as all the programmers working on the program observe the same rules, everything should work. However, designing a good set of rules for any real-world project is a very time-consuming process. Fortunately, the ABI [25] for the x86-64 architecture specifies a good set of rules. The rules are very tedious because they are meant to cover all possible situations. In this book we will consider only the simpler rules in order to get an overall picture of how this works.

In 64-bit mode six of the general purpose registers and a portion of the call stack are used for the activation record. The area of the stack used for the activation record is called a stack frame. Within any function, the stack frame contains the following information:

• Arguments (in excess of six) passed from the calling function.
• The return address back to the calling function.
• The calling function's frame pointer.
• Local variables for the current function.

and often includes:

• Copies of arguments passed in registers.
• Copies of values in the registers that must be preserved by a function: rbx, r12–r15.

Some general memory usage rules (64-bit mode) are:

• Each argument is passed within an 8-byte unit. For example, passing three char values requires three registers. This 8-byte rule also applies to arguments passed on the stack.
• Local variables can be allocated to take up only the amount of memory they require. For example, three char values can be accommodated in a three-byte memory area.
• The address in the frame pointer (rbp register) must always be a multiple of sixteen. It should never be changed within a function, except during the prologue and epilogue.
• The address in the stack pointer (rsp register) must always be a multiple of sixteen before transferring program flow to another function.
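
The arithmetic behind these rules can be sketched in C. The helper names roundUp16, stackArgBytes, and isAligned16 are hypothetical, introduced only for this illustration. The sketch assumes that the stack pointer holds a multiple of sixteen when local variable space is allocated, so that subtracting a multiple of sixteen keeps it aligned.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 16. For example,
 * three char local variables need only 3 bytes, but allocating 16
 * keeps the stack pointer on a multiple of sixteen. */
size_t roundUp16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/* Bytes occupied by arguments passed on the stack:
 * each argument takes one 8-byte unit, whatever its C type. */
size_t stackArgBytes(size_t argCount)
{
    return 8 * argCount;
}

/* Nonzero if the address is a multiple of sixteen. */
int isAligned16(unsigned long address)
{
    return (address % 16) == 0;
}
```

For example, roundUp16(3) yields 16 and roundUp16(17) yields 32, while three arguments passed on the stack occupy stackArgBytes(3), or 24, bytes.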

We can see how this works by studying the program in Listing 11.1.

 1 /*
 2  * addProg.c
 3  * Adds two integers
 4  * Bob Plantz - 13 June 2009
 5  */
 6
 7 #include <stdio.h>
 8 #include "sumInts1.h"
 9
10 int main(void)
11 {
12     int x, y, z;
13     int overflow;
14
15     printf("Enter two integers: ");
16     scanf("%i %i", &x, &y);
17     overflow = sumInts(x, y, &z);
18     printf("%i + %i = %i\n", x, y, z);
19     if (overflow)
20         printf("*** Overflow occurred ***\n");
21
22     return 0;
23 }

 1 /*
 2  * sumInts1.h
 3  * Returns N + (N-1) + ... + 1
 4  * Bob Plantz - 4 June 2008
 5  */
 6
 7 #ifndef SUMINTS1_H
 8 #define SUMINTS1_H
 9 int sumInts(int, int, int *);

10