Chapter 10 - Advanced Computer Architecture

  • Published on
    22-Apr-2018

Transcript

  • 10-1 Chapter 10 - Advanced Computer Architecture

    Computer Architecture and Organization by M. Murdocca and V. Heuring © 2007 M. Murdocca and V. Heuring

    Computer Architecture and Organization

    Miles Murdocca and Vincent Heuring

    Chapter 10 Advanced Computer Architecture

  • 10-2 Chapter 10 - Advanced Computer Architecture

    Chapter Contents

    10.1 Parallel Architecture
    10.2 Superscalar Machines and the PowerPC
    10.3 VLIW Machines, and the Itanium
    10.4 Case Study: Extensions to the Instruction Set - The Intel MMX/SSEX and Motorola AltiVec SIMD Instructions
    10.5 Programmable Logic Devices and Custom ICs
    10.6 Unconventional Architectures

  • 10-3 Chapter 10 - Advanced Computer Architecture

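    Parallel Speedup and Amdahl's Law

    In the context of parallel processing, speedup can be computed as the ratio of sequential to parallel execution time:

    $$S = \frac{T_{\text{sequential}}}{T_{\text{parallel}}}$$

    Amdahl's law, for p processors and a fraction f of unparallelizable code:

    $$S = \frac{1}{f + \frac{1 - f}{p}}$$

    For example, if f = 10% of the operations must be performed sequentially, then speedup can be no greater than 10 regardless of how many processors are used:

    $$\lim_{p \to \infty} S = \frac{1}{f} = \frac{1}{0.1} = 10$$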
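
As a rough numeric check of Amdahl's law, the sketch below evaluates the speedup for a fixed sequential fraction as the processor count grows. The function name `amdahl_speedup` is an illustrative choice, not something taken from the slides.

```python
def amdahl_speedup(f, p):
    """Speedup on p processors when a fraction f of the work is sequential."""
    return 1.0 / (f + (1.0 - f) / p)


if __name__ == "__main__":
    # With f = 0.10 the speedup approaches, but never reaches, 1 / f = 10.
    for p in (1, 10, 100, 1000, 1_000_000):
        print(f"p = {p:>9}: speedup = {amdahl_speedup(0.10, p):.3f}")
```

With f = 0.10 and p = 10 the function returns roughly 5.26, which rounds to the speedup of 5.3 used in the efficiency example on the next slide.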

  • 10-4 Chapter 10 - Advanced Computer Architecture

    Efficiency and Throughput

    Efficiency is the ratio of speedup to the number of processors used. For a speedup of 5.3 with 10 processors, the efficiency is:

    $$E = \frac{S}{p} = \frac{5.3}{10} = 0.53 = 53\%$$

    Throughput is a measure of how much computation is achieved over time, and is of special concern for I/O-bound and pipelined applications. For the case of a four-stage pipeline that remains filled, in which each pipeline stage completes its task in 10 ns, the average time to complete an operation is 10 ns even though it takes 40 ns to execute any one operation. The overall throughput for this situation is then:

    $$\frac{1 \text{ operation}}{10 \text{ ns}} = 10^{8} \text{ operations per second}$$
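
The efficiency and pipeline figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch. The helper names are hypothetical, and it assumes an ideal pipeline that never stalls.

```python
def efficiency(speedup, processors):
    """Ratio of achieved speedup to the number of processors used."""
    return speedup / processors


def pipeline_latency(stages, stage_time_s):
    """Time for one operation to pass through every stage."""
    return stages * stage_time_s


def pipeline_throughput(stage_time_s):
    """Operations completed per second when the pipeline stays full."""
    return 1.0 / stage_time_s


if __name__ == "__main__":
    print(f"efficiency : {efficiency(5.3, 10):.2f}")
    print(f"latency    : {pipeline_latency(4, 10e-9) * 1e9:.0f} ns")
    print(f"throughput : {pipeline_throughput(10e-9):.3g} operations/s")
```

Note that the stage count affects only the latency. Once the pipeline is full, throughput depends on the stage time alone.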

  • 10-5 Chapter 10 - Advanced Computer Architecture

    Flynn Taxonomy

    Classification of architectures according to the Flynn taxonomy: (a) SISD; (b) SIMD; (c) MIMD; (d) MISD.

  • 10-6 Chapter 10 - Advanced Computer Architecture

    Network Topologies

    Network topologies: (a) crossbar; (b) bus; (c) ring; (d) mesh; (e) star; (f) tree; (g) perfect shuffle; (h) hypercube.
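
Two of the topologies listed above have compact address-based definitions, sketched here with hypothetical helper names. In a hypercube, two nodes are neighbors when their binary addresses differ in exactly one bit. In a perfect shuffle, each node connects to the node whose address is its own rotated left by one bit. The figure itself is not part of this transcript, so the node numbering is an assumption.

```python
def hypercube_neighbors(node, dim):
    """Neighbors of a node in a dim-dimensional hypercube (flip one bit each)."""
    return [node ^ (1 << bit) for bit in range(dim)]


def perfect_shuffle(node, bits):
    """Destination of a node under the perfect shuffle (rotate address left)."""
    msb = (node >> (bits - 1)) & 1
    return ((node << 1) & ((1 << bits) - 1)) | msb


if __name__ == "__main__":
    print(hypercube_neighbors(0, 3))                   # [1, 2, 4]
    print([perfect_shuffle(i, 3) for i in range(8)])   # [0, 2, 4, 6, 1, 3, 5, 7]
```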

  • 10-7 Chapter 10 - Advanced Computer Architecture

    Crossbar

    Internal organization of a crossbar.

  • 10-8 Chapter 10 - Advanced Computer Architecture

    Crosspoint Settings

    (a) Crosspoint settings for connections 0 → 3 and 3 → 0; (b) adjusted settings to accommodate connection 1 → 1.

  • 10-9 Chapter 10 - Advanced Computer Architecture

    Three-Stage Clos Network

  • 10-10 Chapter 10 - Advanced Computer Architecture

    12-Channel Three-Stage Clos Network with n = p = 6

  • 10-11 Chapter 10 - Advanced Computer Architecture

    12-Channel Three-Stage Clos Network with n = p = 2

  • 10-12 Chapter 10 - Advanced Computer Architecture

    12-Channel Three-Stage Clos Network with n = p = 4

  • 10-13 Chapter 10 - Advanced Computer Architecture

    12-Channel Three-Stage Clos Network with n = p = 3
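
One way to compare the four 12-channel configurations is to count crosspoints. The sketch below assumes the usual Clos parameterization, which the transcript does not spell out: n is the number of inputs on each first-stage crossbar, p is the number of middle-stage crossbars, and the output stage mirrors the input stage. The function names are illustrative.

```python
def crossbar_crosspoints(channels):
    """Crosspoints in a single full crossbar with the given channel count."""
    return channels * channels


def clos_crosspoints(channels, n, p):
    """Crosspoints in a three-stage Clos network (assumed parameterization).

    channels // n input crossbars of size n x p,
    p middle crossbars of size (channels // n) x (channels // n),
    channels // n output crossbars of size p x n.
    """
    if channels % n != 0:
        raise ValueError("n must divide the number of channels")
    r = channels // n
    return r * n * p + p * r * r + r * p * n


if __name__ == "__main__":
    print("full crossbar:", crossbar_crosspoints(12))
    for size in (6, 2, 4, 3):
        print(f"n = p = {size}:", clos_crosspoints(12, size, size))
```

Under these assumptions the counts are 168, 120, 132, and 120 crosspoints for n = p = 6, 2, 4, and 3, against 144 for a single 12 × 12 crossbar.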

  • 10-14 Chapter 10 - Advanced Computer Architecture

    C function computes (x² + y²) × y²

  • 10-15 Chapter 10 - Advanced Computer Architecture

    Dependency Graph

    (a) Control sequence for C program; (b) dependency graph for C program.
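
The idea behind a dependency graph can be sketched in a few lines. The example reads the function on the preceding slide as the product (x² + y²) · y² and breaks it into a hypothetical sequence of three-address statements. The temporaries `t0` to `t3` are invented for illustration and may not match the original C listing. Statements at the same level have no dependencies on each other and could execute in parallel.

```python
# Hypothetical three-address form: result -> (operator, operand, operand).
STATEMENTS = {
    "t0": ("*", "x", "x"),
    "t1": ("*", "y", "y"),
    "t2": ("+", "t0", "t1"),
    "t3": ("*", "t2", "t1"),
}


def dependency_levels(statements):
    """Earliest step at which each statement can run (inputs are level 0)."""
    levels = {}

    def level(name):
        if name not in statements:
            return 0  # x and y are inputs, available immediately
        if name not in levels:
            _, a, b = statements[name]
            levels[name] = 1 + max(level(a), level(b))
        return levels[name]

    for name in statements:
        level(name)
    return levels


def evaluate(statements, inputs):
    """Run the statements in listed order (already a valid control sequence)."""
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    values = dict(inputs)
    for name, (op, a, b) in statements.items():
        values[name] = ops[op](values[a], values[b])
    return values


if __name__ == "__main__":
    print(dependency_levels(STATEMENTS))
    print(evaluate(STATEMENTS, {"x": 3, "y": 2})["t3"])
```

Here `t0` and `t1` land on the same level, so the four statements need only three sequential steps.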

  • 10-16 Chapter 10 - Advanced Computer Architecture

    Matrix Multiplication

    (a) Problem setup for Ax = b; (b) equations for computing the b_i.
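
The equations for the b_i are not reproduced in this transcript. Assuming the standard definition, in which each b_i is the sum over j of a_ij · x_j, a minimal sketch looks like this. The function name `matvec` is illustrative.

```python
def matvec(A, x):
    """Compute b = Ax, where b[i] is the sum over j of A[i][j] * x[j]."""
    b = []
    for row in A:
        # All products in a row are independent of one another;
        # only the final summation combines them.
        products = [a_ij * x_j for a_ij, x_j in zip(row, x)]
        b.append(sum(products))
    return b


if __name__ == "__main__":
    print(matvec([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```

Each b_i depends only on row i of A and on x, so the rows can be computed in parallel. This is the property a dependency graph for the problem makes visible.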

  • 10-17 Chapter 10 - Advanced Computer Architecture

    Matrix Multiplication Dependency Graph

  • 10-18 Chapter 10 - Advanced Computer Architecture

    The PowerPC 601 Architecture

  • 10-19 Chapter 10 - Advanced Computer Architecture

    128-Bit IA-64 Instruction Word

    Each 41-bit instruction consists of three register addresses (each 7 bits = 128 possible registers), a predicate register (6 bits), and the opcode and flags or general-purpose register (14 bits, varies by instruction).
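
The field widths above add up to 3 × 7 + 6 + 14 = 41 bits. Three such instructions plus a 5-bit template field (not mentioned on the slide) fill the 128-bit word: 3 × 41 + 5 = 128. The sketch below unpacks a bundle under an assumed bit layout, with the template in the lowest bits and the predicate field lowest within each slot. The slide gives only the widths, so the positions and field names here are assumptions for illustration.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41


def unpack_bundle(bundle):
    """Split a 128-bit integer into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + SLOT_BITS * i)) & slot_mask
        for i in range(3)
    ]
    return template, slots


def unpack_slot(slot):
    """Split a 41-bit instruction into fields (assumed positions)."""
    return {
        "predicate": slot & 0x3F,                   # 6 bits
        "r1": (slot >> 6) & 0x7F,                   # 7 bits
        "r2": (slot >> 13) & 0x7F,                  # 7 bits
        "r3": (slot >> 20) & 0x7F,                  # 7 bits
        "opcode_and_flags": (slot >> 27) & 0x3FFF,  # 14 bits
    }


if __name__ == "__main__":
    slot = 5 | (1 << 6) | (2 << 13) | (3 << 20) | (0x1ABC << 27)
    bundle = 0x10 | (slot << 5) | (slot << 46) | (slot << 87)
    template, slots = unpack_bundle(bundle)
    print(hex(template), [unpack_slot(s) for s in slots])
```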

  • 10-20 Chapter 10 - Advanced Computer Architecture

    Itanium Instruction Types

  • 10-21 Chapter 10 - Advanced Computer Architecture

    Allowable Combinations of IA-64 Instruction Types Assigned to Instruction Slots

  • 10-22 Chapter 10 - Advanced Computer Architecture

    IA-64 Instruction Issues

    Maximum number of IA-64 instructions that can be executed for each pairing of bundles.

  • 10-23 Chapter 10 - Advanced Computer Architecture

    Intel MMX (MultiMedia eXtensions)

    Vector addition of eight bytes by the Intel PADDB mm0,mm1 instruction:
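
The effect of PADDB can be emulated in software to show what "vector addition of eight bytes" means. The sketch treats each 64-bit register as a Python integer and adds the eight byte lanes independently. PADDB wraps around on overflow, so no carry crosses from one lane into the next. The function name is illustrative.

```python
def paddb(mm0, mm1):
    """Emulate PADDB: eight independent byte additions with wraparound."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (mm0 >> shift) & 0xFF
        b = (mm1 >> shift) & 0xFF
        result |= ((a + b) & 0xFF) << shift  # drop the carry out of the lane
    return result


if __name__ == "__main__":
    total = paddb(0x01020304050607FF, 0x0101010101010101)
    print(f"{total:016x}")  # 0203040506070800
```

The lowest lane computes 0xFF + 0x01 and wraps to 0x00 without disturbing the lane above it. An ordinary 64-bit addition would carry into that lane.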

  • 10-24 Chapter 10 - Advanced Computer Architecture

    Intel and Motorola Vector Registers

    Intel aliases the floating-point registers as MMX registers. This means that the Pentium's eight floating-point registers do double duty as 64-bit MMX registers.

    Motorola implements 32 128-bit vector registers as a new set, separate and distinct from the floating-point registers.

  • 10-25 Chapter 10 - Advanced Computer Architecture

    MMX and AltiVec Arithmetic Instructions

  • 10-26 Chapter 10 - Advanced Computer Architecture

    Comparing Two MMX Byte Vectors for Equality

  • 10-27 Chapter 10 - Advanced Computer Architecture

    Conditional Assignment of an MMX Byte Vector
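
The figures for the comparison and conditional-assignment slides are not part of this transcript. The sketch below shows one common way the two steps fit together, and it may differ from the slides. A byte-wise equality compare (PCMPEQB in MMX) produces 0xFF in each matching lane and 0x00 elsewhere. That mask then selects between two vectors using AND, AND-NOT, and OR (PAND, PANDN, POR). The Python function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def pcmpeqb(a, b):
    """Byte-wise equality: 0xFF in each lane where a and b match, else 0x00."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        if ((a >> shift) & 0xFF) == ((b >> shift) & 0xFF):
            result |= 0xFF << shift
    return result


def select_bytes(mask, if_true, if_false):
    """Branch-free select: (mask AND if_true) OR (NOT mask AND if_false)."""
    return (mask & if_true) | (~mask & MASK64 & if_false)


if __name__ == "__main__":
    mask = pcmpeqb(0x1122334455667788, 0x11FF33FF55FF77FF)
    print(f"{mask:016x}")  # ff00ff00ff00ff00
    print(f"{select_bytes(mask, 0xAAAAAAAAAAAAAAAA, 0xBBBBBBBBBBBBBBBB):016x}")
```

Because the mask is all ones or all zeros in each lane, the selection needs no per-element branch. This is why the pattern suits SIMD hardware.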

  • 10-28 Chapter 10 - Advanced Computer Architecture

    A PAL Device

    PLAs and PALs are similar except that the OR gates in a PAL have a fixed number of inputs and the inputs are not programmable. PALs are more prevalent than PLAs because they are easier to manufacture and are less complex.

  • 10-29 Chapter 10 - Advanced Computer Architecture

    Complex Programmable Logic Device

    CPLDs are PAL-like or PLA-like blocks that can be combined with programmable interconnections. Commercial CPLDs may contain as many as 200,000 equivalent gates and have over 3,000 macrocells.

  • 10-30 Chapter 10 - Advanced Computer Architecture

    Field Programmable Gate Array

    Unlike CPLDs, which employ large logic blocks and fewer interconnection options, FPGAs employ small logic blocks that can be programmably interconnected.

  • 10-31 Chapter 10 - Advanced Computer Architecture

    Quantum Computing

    Single-particle interference experiment.

  • 10-32 Chapter 10 - Advanced Computer Architecture

    Multi-Valued Logic

    Truth tables for binary and ternary comparison functions:

  • 10-33 Chapter 10 - Advanced Computer Architecture

    Neural Networks

    Model of a living neuron, and model of an artificial neuron (below).

  • 10-34 Chapter 10 - Advanced Computer Architecture

    Artificial Neural Network Example

    Two simple, feed-forward neural networks with inputs, weights, and thresholds as shown.
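
The networks in the figure are not reproduced in this transcript, so the sketch below uses hypothetical weights and thresholds to show the general model: an artificial neuron forms a weighted sum of its inputs and fires when the sum reaches its threshold. Firing on "greater than or equal" is also an assumption. A single neuron is enough for AND or OR, while XOR needs a second layer.

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def and_net(a, b):
    return neuron((a, b), (1, 1), 2)


def or_net(a, b):
    return neuron((a, b), (1, 1), 1)


def xor_net(a, b):
    """Two-layer feed-forward network: OR and AND feed an output neuron."""
    h_or = neuron((a, b), (1, 1), 1)
    h_and = neuron((a, b), (1, 1), 2)
    return neuron((h_or, h_and), (1, -1), 1)  # fires for OR but not AND


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_net(a, b), or_net(a, b), xor_net(a, b))
```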