CS370 Questions
(I)
Short
Answers (2-3 lines)
(1) What is a Von Neumann
bottleneck in sequential architectures?
(2) Describe the Flynn’s classification
of parallel architectures.
(3) Consider an MISD
architecture with 6 processing segments and each processing segment requires 5
microseconds of processing time. Show the pipeline effect for processing 5
operations in the pipeline on a time axis.
(4) Consider the following
loop: for (i=0; i<5000; i++) {job(i)} that is executed in an MIMD architecture.
(i) Processing time for job(i)
is 30 microseconds. If Job(i) is executed on a 6
processing segment pipeline with each segment requiring 5 microseconds, what is
the speed up achieved?
(ii) If the computation involved is to find the
largest element of the above array, write all the essential code to determine
the maximum using “p” (determined at run time) processes using both Loop Splitting
and Self Scheduling algorithms.
(5) Consider the following loop:
for (i=0; i<500; i++) {job(i)}. This is executed on
an SIMD architecture with 15 processors and each job(i) requires 2 microseconds
of processing time. What is the net speed up attained?
(6) Consider the following loop:
for (i=0; i<500; i++) {job(i)}. This
is executed on an MIMD architecture using asynchronous parallel programming.
Assume each job(i) requires 20 microseconds of processing time. Assume that you
incur a process overhead of 1 microsecond/process. What is the net speed up attained if 25
processes are employed and the architecture has 25 processors.
(7) Consider the following loop:
for (i=0; i<500; i++) {job(i)}.
This is executed on an MIMD architecture using asynchronous parallel
programming. Assume each job(i) requires 20 microseconds of processing time.
Assume that you incur a process overhead of 1 microsecond/process. What is the net speed up attained if 25
processes are employed and the architecture has 15 processors.
(8) Describe (i) asynchronous parallel
programming in MIMD architectures (ii) synchronous
parallel programming in SIMD architectures.
(9) Consider the following
loop: for (i=0; i<N; i++) {job(i)}.
Assume the execution time of job(i) =
T seconds. If you create “P” processes (assume process overhead = H seconds per
process)
(i) What is the serial execution time with one process
(ii) What is the serial execution time with P processes
(iii) Argue why there would be an optimal value of P
(iv) Compute optimal P and optimal
speedup.
(10)
Consider a
nested loop executed in asynchronous parallel programming.
for (i=0; i<n; i++)
for (j=0;
j<n; j++)
job(i,j) //job(i,j) are
independent tasks.
(i)
Describe the
loopsplitting technique using group scheduling with p (=g*k) processes.
(assuming that p >n)
(ii)
How would you
select p,g,k if n=50 and the number of processors are 22.
(iii)
Describe how
self scheduling can be implemented using one critical section.
(11)
What is a
cluster?
(12)
Give three
reasons why cluster computing is gaining prominence.
(13)
What is MPI?
(14)
How do processes
exchange information in shared-memory architectures and distributed-memory
architectures.
(15)
Describe the
hypercube interconnection network with 8 processors (P0 to P7 ).
Which processors can b reached from P1 in one hop.
(16)
In an Illiac IV
interconnection network of 64 processors
(P0 to P63 ), which processors can be reached from
P7 in one hop.
(17)
Differentiate
between a static and a dynamic interconnection network. Give a few examples of
each.
(18)
Classify Illiac
IV, Hypercube, Oscar-Cluster, Sun architectures as :
(i) SIMD/MISD/MIMD
(ii)shared/distributed memory (iii) static/dynamic interconnection
network
(19)
What is a
semaphore? Why is it needed in asynchronous parallel programming?
(20)
What is a
barrier? Why is it needed in asynchronous parallel programming?
(21)
If a barrier is
to be used only once, specify the (four) variables that need to be defined.
Describe the initialization and the wait_at_barrier functions.
(22)
Why does your the wait_at_barrier function
above not work if it is used inside of a loop.
(23)
Describe the term “Systolic Architectures”.
(24)
How is Systolic Architecture similar to the parallelism in SIMD & MISD architectures.
(25)
How does VLSI technology aid in the design of Systolic Architectures.
(26)
What is the conclusion regarding parallelism stated by Amdahl’s Law.
(27)
Describe the output when
executed on mars.calstatela.edu; (The number of lines of output and the
lines that could be permuted in any order)
procs = 2;
id = process_fork(procs);
cout << " My id is "<< id<<endl;
for (i=0;i<=id; i++)
cout << " My id is "<< id<<endl;
process_join (procs, id);
cout << " My id is "<< id<<endl;
(II)
Long
answers (Asynchronous parallel programs using shared memory)
(1) Describe the serial and
parallel sorting algorithm (Rank/Odd-Even) using a specified number of
processes.
(2) Compute å (1/(1+x2)
starting with x=0 and intervals of 0.02 until x=1
(use loopsplitting with n processes)
(3) Show odd-even sorting with
the help of an example (say a 6 element array)
(4) Serial and parallel Back
Substitution algorithm. Give complexity using O notation. (In the parallel
algorithm, you have n processes with each process i determining xi .
State the data structure used to keep track of the variables that a particular
process i needs in determining xi )
(5) Given a two dimensional
square matrix X (n x n) , compute the row sums, R using n processes.
(6) Given a two dimensional
square matrix X (n x n) , compute the column sums, C using n processes.
(7) Given a two dimensional
square matrix X (n x n) , compute the rows sums, R using k*n processes.
(8) How do you determine if all
the elements of a one dimensional matrix R
of n elements are all the same. Output a message EQUAL or NOT EQUAL. Do
the computation using p (p<=n) processes. (This may be an example in which a
shared memory variable can be updated by many processes without employing a
critical section)
(9)
If X and Y are
square matrices of n x n, give the sequential and parallel code with p
(p=n) processes to compute Z = X + Y.
What is the speed up? Is there a need for a semaphore?
(10)
If X and Y are
square matrices of n x n, give the sequential and parallel code with p (p>n,
p=g*k) processes to compute Z = X + Y.
What is the speed up?
(11)
Write a parallel program that reads in
two numbers and outputs the number of prime numbers in this interval.
(Use loop splitting where each process determines the number of prime numbers.)
(III)
Long
answers (Synchronous parallel programs on Illiac IV)
(1) Given a two dimensional
square matrix X (n x n) , compute the row sums, R. Assume n=64 and give a
storage lay out. Give the computational speed up?
(2) Given a two dimensional
square matrix X (n x n) , compute the column sums, C. Assume n=64 and give a
storage lay out. Give the computational speed up?
(3) How would you compute the
sum of all elements of R.
(4) How would you compute the
sum of all elements of a two dimensional matrix X (64 x 64). What is the
computational speed up?
(IV)
Long answers
(Asynchronous programming using distributed memory on clusters)
(1) Given the MPI library:
(i)
Write a program
to compute the sum of an array of 100 integers. (Similar to loop splitting
strategy)
(ii) Compute å (1/(1+x2)
starting with x=0 and
intervals of 0.02 until x=1
(iii) Write a parallel program that reads
in two numbers and outputs the number of
prime numbers in this interval. (Use loop splitting where each process sends
the number of prime numbers it has encountered at the end to the head node. The
head node outputs the total number of prime numbers)