CS370 Questions

 

(I)                Short Answers (2-3 lines)

(1)   What is a Von Neumann bottleneck in sequential architectures?

(2)   Describe the Flynn’s classification of parallel architectures.

(3)   Consider an MISD architecture with 6 processing segments and each processing segment requires 5 microseconds of processing time. Show the pipeline effect for processing 5 operations in the pipeline on a time axis.

(4)   Consider the following loop:  for (i=0; i<5000; i++) {job(i)} that is executed in an MIMD architecture.

(i) Processing time for job(i) is 30 microseconds. If Job(i) is executed on a 6 processing segment pipeline with each segment requiring 5 microseconds, what is the speed up achieved?

(ii) If the computation involved is to find the largest element of the above array, write all the essential code to determine the maximum using “p” (determined at run time) processes using both Loop Splitting and Self Scheduling algorithms.

(5)   Consider the following loop: for (i=0; i<500; i++) {job(i)}. This is executed on an SIMD architecture with 15 processors and each job(i) requires 2 microseconds of processing time. What is the net speed up attained?

(6)   Consider the following loop: for (i=0; i<500; i++) {job(i)}.  This is executed on an MIMD architecture using asynchronous parallel programming. Assume each job(i) requires 20 microseconds of processing time. Assume that you incur a process overhead of 1 microsecond/process.  What is the net speed up attained if 25 processes are employed and the architecture has 25 processors.

(7)   Consider the following loop: for (i=0; i<500; i++) {job(i)}.    This is executed on an MIMD architecture using asynchronous parallel programming. Assume each job(i) requires 20 microseconds of processing time. Assume that you incur a process overhead of 1 microsecond/process.  What is the net speed up attained if 25 processes are employed and the architecture has 15 processors.

(8)   Describe (i)  asynchronous parallel programming in MIMD architectures (ii)  synchronous parallel programming in SIMD architectures.

(9) Consider the following loop: for (i=0; i<N; i++) {job(i)}.   

      Assume the execution time of job(i) = T seconds. If you create “P” processes (assume process overhead = H seconds per process)

      (i) What is the serial execution time with one process

      (ii) What is the serial execution time with P processes

      (iii) Argue why there would be an optimal value of P

      (iv) Compute optimal P and optimal speedup.

(10)           Consider a nested loop executed in asynchronous parallel programming.

for (i=0; i<n; i++)

 for (j=0; j<n; j++)

    job(i,j)  //job(i,j) are independent tasks.

(i)                  Describe the loopsplitting technique using group scheduling with p (=g*k) processes. (assuming that p >n)

(ii)                How would you select p,g,k if n=50 and the number of processors are 22.

(iii)               Describe how self scheduling can be implemented using one critical section.

(11)           What is a cluster?

(12)           Give three reasons why cluster computing is gaining prominence.

(13)           What is MPI?

(14)           How do processes exchange information in shared-memory architectures and distributed-memory architectures.

(15)           Describe the hypercube interconnection network with 8 processors (P0 to P7 ). Which processors can b reached from P1 in one hop.

(16)           In an Illiac IV interconnection network of 64 processors  (P0 to P63 ), which processors can be reached from P7  in one hop.

(17)           Differentiate between a static and a dynamic interconnection network. Give a few examples of each.

(18)           Classify Illiac IV, Hypercube, Oscar-Cluster, Sun architectures as : (i) SIMD/MISD/MIMD  (ii)shared/distributed memory (iii) static/dynamic interconnection network

(19)           What is a semaphore? Why is it needed in asynchronous parallel programming?

(20)           What is a barrier? Why is it needed in asynchronous parallel programming?

(21)           If a barrier is to be used only once, specify the (four) variables that need to be defined. Describe the initialization and the wait_at_barrier functions.

(22)                Why does your the wait_at_barrier function above not work if it is used inside of a loop.

(23)                       Describe the term “Systolic Architectures”.

(24)                       How is Systolic Architecture similar to the parallelism in  SIMD & MISD architectures.

(25)                       How does VLSI technology aid in the design of Systolic Architectures.

(26)                       What is the conclusion regarding parallelism stated by Amdahl’s Law.

(27)                       Describe the output when  executed on mars.calstatela.edu; (The number of lines of output and the lines that could be permuted in any order)

  procs = 2;

  id = process_fork(procs);

  cout << " My id is "<< id<<endl;

  for (i=0;i<=id; i++)

        cout << " My id is "<< id<<endl;   

  process_join (procs, id);

  cout << " My id is "<< id<<endl;

 

 

 

(II)             Long answers (Asynchronous parallel programs using shared memory)

(1)  Describe the serial and parallel sorting algorithm (Rank/Odd-Even) using a specified number of processes.

(2)   Compute  å  (1/(1+x2)  

starting with x=0 and intervals of 0.02 until x=1 (use loopsplitting with n processes)

(3)   Show odd-even sorting with the help of an example (say a 6 element array)

(4)   Serial and parallel Back Substitution algorithm. Give complexity using O notation. (In the parallel algorithm, you have n processes with each process i determining xi . State the data structure used to keep track of the variables that a particular process i needs in determining xi )

(5)   Given a two dimensional square matrix X (n x n) , compute the row sums, R using n processes.

(6)   Given a two dimensional square matrix X (n x n) , compute the column sums, C using n processes.

(7)   Given a two dimensional square matrix X (n x n) , compute the rows sums, R using k*n processes.

(8)   How do you determine if all the elements of a one dimensional matrix R  of n elements are all the same. Output a message EQUAL or NOT EQUAL. Do the computation using p (p<=n) processes. (This may be an example in which a shared memory variable can be updated by many processes without employing a critical section)

(9)           If X and Y are square matrices of n x n, give the sequential and parallel code with p (p=n)  processes to compute Z = X + Y. What is the speed up? Is there a need for a semaphore?

(10)           If X and Y are square matrices of n x n, give the sequential and parallel code with p (p>n, p=g*k)  processes to compute Z = X + Y. What is the speed up?

 (11) Write a parallel program that reads in  two numbers and outputs the number of prime numbers in this interval. (Use loop splitting where each process determines the number of prime numbers.)

 

(III)          Long answers (Synchronous parallel programs on Illiac IV)

(1)   Given a two dimensional square matrix X (n x n) , compute the row sums, R. Assume n=64 and give a storage lay out. Give the computational speed up?

(2)   Given a two dimensional square matrix X (n x n) , compute the column sums, C. Assume n=64 and give a storage lay out. Give the computational speed up?

(3)   How would you compute the sum of all elements of R.

(4)   How would you compute the sum of all elements of a two dimensional matrix X (64 x 64). What is the computational speed up?

 

(IV)           Long answers (Asynchronous programming using distributed memory on clusters)

 (1)   Given the MPI library:

 

 

(i)                  Write a program to compute the sum of an array of 100 integers. (Similar to loop splitting strategy)

 

(ii)                    Compute  å  (1/(1+x2)  

starting with x=0 and intervals of 0.02 until x=1

 

     (iii) Write a parallel program that reads in  two numbers and outputs the number of prime numbers in this interval. (Use loop splitting where each process sends the number of prime numbers it has encountered at the end to the head node. The head node outputs the total number of prime numbers)