MIPS assembly code assignment

Q1 (3 points): For this problem, we will use the following loop:

for (k = n; k >= 0; k--) x[k] = y[k+1] - 2.0 * y[k];

If we assume:

  • R1 contains the address of the nth element of y
  • R2 contains the address of the nth element of x
  • F0 contains 2.0

The above code could be written as the following MIPS assembly code:

    Loop: LD     F2, 8(R1)
          LD     F4, 0(R1)
          MULTD  F6, F4, F0
          SUBD   F8, F2, F6
          SD     F8, 0(R2)
          SUBI   R1, R1, #8      # same as DADDUI R1, R1, #-8
          SUBI   R2, R2, #8
          BNEZ   R2, Loop
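For reference, the same computation can be sketched in C with two pointers that play the roles of R1 and R2. This is only an illustration and not part of the assignment: the function name `scale_diff` and its signature are hypothetical, and the sketch terminates on the index k rather than on the value held in R2, as the BNEZ in the listing does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the assignment): computes
 * x[k] = y[k+1] - 2.0 * y[k] for k = n down to 0.
 * y must hold at least n + 2 elements and x at least n + 1. */
void scale_diff(const double *y, double *x, int n)
{
    assert(y != NULL && x != NULL && n >= 0);

    const double two = 2.0;              /* plays the role of F0 */

    for (int k = n; k >= 0; k--) {       /* k-- stands in for both SUBI steps */
        const double *py = y + k;        /* R1: address of y[k] */
        double *px = x + k;              /* R2: address of x[k] */

        double next = py[1];             /* LD    F2, 8(R1)  */
        double cur  = py[0];             /* LD    F4, 0(R1)  */
        double prod = cur * two;         /* MULTD F6, F4, F0 */
        double diff = next - prod;       /* SUBD  F8, F2, F6 */
        *px = diff;                      /* SD    F8, 0(R2)  */
    }
}
```

Each statement in the loop body corresponds to one instruction of the listing, which makes the dependence chain (load, multiply, subtract, store) easy to see.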

Using the following table of instruction latencies, answer the questions below:

    Instruction/Operation Type    Latency in Clock Cycles
    Double Load                   1
    Double Store                  0
    FP Multiply                   5
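One way to check a hand-written schedule is a small in-order issue model. The sketch below is an assumption-laden illustration, not the assignment's method: it assumes the latency column counts the stall cycles needed between a producing instruction and a dependent instruction issued right after it, it ignores structural hazards and branch effects, and it makes latency depend only on the producer. The type `Instr`, the function `schedule_in_order`, and the register numbering are all hypothetical. Latencies for operations that the table does not list must be supplied by the caller.

```c
#include <assert.h>

#define NUM_REGS 64    /* illustrative numbering: 0..31 integer, 32..63 FP */
#define NO_REG   (-1)  /* marks an unused operand slot */

typedef struct {
    int dest;      /* register written, or NO_REG */
    int src1;      /* first register read, or NO_REG */
    int src2;      /* second register read, or NO_REG */
    int latency;   /* stall cycles a directly dependent instruction would need */
} Instr;

/* Fills issue_cycle[i] with the cycle (starting at 1) in which
 * instruction i issues, and returns the issue cycle of the last one. */
int schedule_in_order(const Instr *prog, int count, int *issue_cycle)
{
    int ready[NUM_REGS];   /* first cycle in which each register may be read */
    int cycle = 0;         /* issue cycle of the previous instruction */

    assert(prog != 0 && issue_cycle != 0 && count >= 0);
    for (int r = 0; r < NUM_REGS; r++)
        ready[r] = 1;

    for (int i = 0; i < count; i++) {
        int srcs[2] = { prog[i].src1, prog[i].src2 };
        int earliest = cycle + 1;           /* in-order: one issue per cycle */

        for (int s = 0; s < 2; s++) {
            if (srcs[s] != NO_REG && ready[srcs[s]] > earliest)
                earliest = ready[srcs[s]];  /* wait for the operand: stalls */
        }

        issue_cycle[i] = earliest;
        cycle = earliest;
        if (prog[i].dest != NO_REG)
            ready[prog[i].dest] = earliest + 1 + prog[i].latency;
    }
    return cycle;
}
```

Under this model, a double load (latency 1) followed directly by a multiply that reads the loaded register issues at cycles 1 and 3, that is, with one stall cycle between them.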

  1. Show the cycle-by-cycle schedule, including stalls, of the unmodified loop on a fully pipelined machine.

Cycle Instruction/stall

  2. Unroll the loop 3 times but do not reschedule the instructions. Ignore the delay slot. Do not delete any instructions other than loop overhead instructions.

Cycle Instruction/stall

  3. Unroll the loop 3 times and reschedule the instructions to reduce the number of stalls. Ignore the delay slot. Do not delete any instructions other than loop overhead instructions.

Cycle Instruction/stall

  4. What is the speedup of the unrolled loop in (2) over the unmodified case in (1)? What is the speedup of the unrolled and scheduled loop in (3) over the unmodified case in (1)? Show your calculations.
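At the source level, unrolling by 3 and the per-iteration speedup calculation can be sketched as follows. This is an illustration under stated assumptions rather than the expected answer: the names `scale_diff_unrolled3` and `speedup_per_iteration` are hypothetical, the clean-up loop is needed only because n + 1 may not be a multiple of 3, and the speedup helper assumes that the unrolled cycle count covers `unroll_factor` original iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: the loop x[k] = y[k+1] - 2.0 * y[k], k = n..0,
 * unrolled three times. Only the loop overhead (the decrement and the
 * test) is shared; all three bodies are kept. */
void scale_diff_unrolled3(const double *y, double *x, int n)
{
    assert(y != NULL && x != NULL && n >= 0);

    int k = n;

    /* Main unrolled body: three original iterations per trip. */
    for (; k >= 2; k -= 3) {
        x[k]     = y[k + 1] - 2.0 * y[k];
        x[k - 1] = y[k]     - 2.0 * y[k - 1];
        x[k - 2] = y[k - 1] - 2.0 * y[k - 2];
    }

    /* Clean-up for the 0 to 2 iterations left over. */
    for (; k >= 0; k--)
        x[k] = y[k + 1] - 2.0 * y[k];
}

/* Speedup per original iteration: cycles_original is the cycle count of
 * one trip through the unmodified loop, cycles_unrolled the count of one
 * trip through the unrolled loop, which does unroll_factor iterations. */
double speedup_per_iteration(int cycles_original, int cycles_unrolled,
                             int unroll_factor)
{
    assert(cycles_original > 0 && cycles_unrolled > 0 && unroll_factor > 0);
    return (double)cycles_original * unroll_factor / cycles_unrolled;
}
```

With made-up numbers that are not the answer to this question, a loop that takes 10 cycles per iteration and 15 cycles per unrolled trip of 3 iterations gives `speedup_per_iteration(10, 15, 3)`, which is 2.0.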

Q2 (7 points): Show the scheduling of the following code:

    L.D   F2, 0(R2)
    L.D   F4, 100(R3)
    ADD.D F8, F2, F2
    MUL.D F6, F4, F8
    SUB.D F6, F2, F4

  1. (3 points) Using a scoreboard. Assume one integer ALU, two FP multipliers, one FP adder and one FP divider. The integer ALU takes one execution cycle, the FP multipliers take 7 cycles, the FP adder takes 4 cycles and the FP divider takes 25 cycles.

  2. (3 points) Using Tomasulo’s algorithm. Assume two LOAD units, two FP multipliers and three FP adders. A load unit takes one execution cycle for address calculation and a second one for memory access, the FP multipliers take 7 cycles and the FP adders take 4 cycles.

  3. (1 point) Comment on which structure (scoreboard or Tomasulo’s) provides the shorter execution time and why (how the sources of slowdown in one structure are avoided by the “better” structure).
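Both schemes are driven by the register dependences in the Q2 code, so it can help to list them before filling in either table. The sketch below is a minimal pairwise check, not a scoreboard or Tomasulo implementation: the type `Op`, the function `hazards`, and the `INT_REG`/`FP_REG` numbering are hypothetical, and the check compares register names only, ignoring any redefinition by instructions in between.

```c
#include <assert.h>

#define NO_REG     (-1)          /* marks an unused operand slot */
#define INT_REG(n) (n)           /* R0..R31 mapped to 0..31 (illustrative) */
#define FP_REG(n)  (32 + (n))    /* F0..F31 mapped to 32..63 (illustrative) */

enum { HAZ_RAW = 1, HAZ_WAR = 2, HAZ_WAW = 4 };

typedef struct {
    int dest;   /* register written, or NO_REG */
    int src1;   /* first register read, or NO_REG */
    int src2;   /* second register read, or NO_REG */
} Op;

/* Returns a bit mask of the hazards that `later` has on `earlier`,
 * where `earlier` precedes `later` in program order. */
int hazards(const Op *earlier, const Op *later)
{
    int mask = 0;

    assert(earlier != 0 && later != 0);

    /* RAW: later reads what earlier writes. */
    if (earlier->dest != NO_REG &&
        (later->src1 == earlier->dest || later->src2 == earlier->dest))
        mask |= HAZ_RAW;

    /* WAR: later writes what earlier reads. */
    if (later->dest != NO_REG &&
        (earlier->src1 == later->dest || earlier->src2 == later->dest))
        mask |= HAZ_WAR;

    /* WAW: both write the same register. */
    if (earlier->dest != NO_REG && earlier->dest == later->dest)
        mask |= HAZ_WAW;

    return mask;
}
```

Applied to the five instructions above, this check reports RAW dependences from the loads to ADD.D, MUL.D and SUB.D and from ADD.D to MUL.D (on F8), one WAW dependence between MUL.D and SUB.D (both write F6), and no WAR dependences.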
