xuebaunion@vip.163.com

3551 Trousdale Rkwy, University Park, Los Angeles, CA

留学生论文指导和课程辅导

无忧GPA：https://www.essaygpa.com

工作时间：全年无休-早上8点到凌晨3点

微信客服：xiaoxionga100

微信客服：ITCS521

C语言代写-CMDA 3634 SP2021

时间：2021-04-17

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Project 04: Parallelizing the Wave Equation with MPI

Version: Current as of: 2021-04-12 13:54:50

Due:

– Preparation: 2021-04-20 23:59:00

– Coding & Analysis: 2021-05-05 23:59:00 (24 hour grace period applies to this due date.)

Points: 100

Deliverables:

– Preparation work as a PDF, typeset with LaTeX, through Canvas.

– All code through code.vt.edu, including all LaTeX source.

– Project report and analysis as a PDF, typeset with LaTeX, through Canvas.

Collaboration:

– This assignment is to be completed by yourself or by your group.

– For conceptual aspects of this assignment, you may seek assistance from your classmates. In your sub-

mission you must indicate from whom you received assistance.

– You may not assist or seek assistance from others that are not in your group on matters of programming,

code design, or derivations.

– If you are unsure, ask course staff.

Honor Code: By submitting this assignment, you acknowledge that you have adhered to the Virginia Tech

Honor Code and attest to the following:

I have neither given nor received unauthorized assistance on this assignment. The work I am pre-

senting is ultimately my own.

References

MPI:

– Eijkhout book on MPI and OpenMP: http://pages.tacc.utexas.edu/~eijkhout/pcse/html/

index.html

– MPI Quick Reference: http://www.netlib.org/utk/people/JackDongarra/WEB-PAGES/SPRING-2006/

mpi-quick-ref.pdf

1

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Code Environment

All necessary code must be put in your class Git repository.

– If you are working with a partner or group, you will need to create a new git repository to work from.

– You must add, at minimum, the instructor and the GTAs as developers to this repository. If you need

assistance from a ULA you may need to add them as well.

Use the standard directory structure from previous projects.

All experiments will be run on TinkerCliffs.

Be an ARC good citizen. Use the resources you request. Use your scratch space for large data.

Any scripts which are necessary to recreate your data must be in the data directory.

DO NOT INCLUDE VIDEO FILES OR LARGE NUMBERS OF IMAGE FILES OR IN YOUR

GIT REPOSITORY!

Your tex source for project preparation and final report go in the report directory.

Remember: commit early, commit often, commit after minor changes, commit after major changes. Push

your code to code.vt.edu frequently.

You must separate headers, source, and applications.

You must include a makefile.

You must include a README explaining how to build and run your programs.

2

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Requirements

For this project, you have the opportunity to work in teams of two or three. Your teammates must be in your

section. If you choose to work with a team:

You may collaborate on the Preparation assignment, but you must submit your own writeup, in your own

words!

You will complete and submit Coding & Analysis assignment jointly, with one minor exception detailed

below.

Preparation (20%)

1. Write a work plan for completing this project. You should review the coding and analysis sections, as well

as the remaining course schedule. Your work plan should include a description of the steps you think will

be necessary to complete the required tasks, as well an estimate of the time that each task will take. Be

sure to account for resource requirements. To properly answer this question, you will need to take some

time and think through the remainder of the assignment.

If you have choose to work with a team, identify them here and sign up for a group on Canvas. In this

case, your work plan must include additional details describing how you are going to work together, what

collaboration tools you will use, a division of labor, how frequently you plan to meet (remotely), whose code

you will start from, etc.

For continued safety, you are not permitted to meet in person!

All meetings must use Zoom, FaceTime, Hangouts, Slack, etc.

2. In Labs 06 and 07, we used 1D domain decomposition to distribute the N data in the vectors. Here is the

formula we used to find the number of entries (Nr) to be assigned to each rank r of P processors:

Nr =

{

bN/P c r ∈ {0, 1, . . . , P − 2}

bN/P c+N%P r = P − 1 ,

where % is the mod operator and bN/P c means N/P rounded down to the nearest integer. However, this

decomposition is prone to load imbalance.

(a) How bad is the load imbalance if N = 1001 and P = 1000? What about if N = 1999 and P = 1000?

(b) In the case for, N = 1999 and P = 1000, what is the optimal distribution of work?

(c) Develop a better formula for distributing such data. You will use this formula to help distribute the

rows of your computational grids. In your new formula, the difference between the largest amount of

work assigned to one rank and the smallest assigned to one rank may not be greater than 1. Also, the

relationship Nr ≥ Nr+1 must hold for all r (except r = P − 1, obviously).

3. Disregarding the header (which has negligible size), for a solution that is ny × nx = 100, 001 × 100, 001,

how large will a single solution file be on disk? Give your answer exactly, in bytes, and approximately, in

the largest reasonable unit (KB, MB, GB, etc.).

4. For each of the following configurations, estimate how many CPU hours will be charged. These computations

will be relevant in tasks 2-4 of the analysis.

(a) A batch script requesting 4 entire nodes (512 total CPUs) for the following:

> mpirun -np 1 ./wave_timing 10001 3 4 1.0 250

Run time: 460s

(b) A batch script requesting 4 entire nodes (512 total CPUs) for the following:

3

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

> mpirun -np 512 ./wave_timing 10001 3 4 1.0 250

Run time: 1.25s

(c) A batch script requesting 8 entire nodes (1024 total CPUs) for the following:

> mpirun -np 1024 ./wave_timing 10001 3 4 1.0 100

Run time: 0.3s

(d) A batch script requesting 8 entire nodes (1024 total CPUs) for the following:

> mpirun -np 1024 ./wave_timing 100001 3 4 1.0 100

Run time: 26s

5. What information needs to be stored in the 2D array data structure to operate in a domain decomposed

environment? You may assume that the domain will be decomposed using a 1D domain decomposition in

the y dimension, only. Also, assume that we will be writing output files in parallel. Address, at minimum,

the following:

For the wave simulation, how large should the halo/padding region be?

Which dimension(s) should be padded?

What information is necessary to map the internal data to the global grid coordinate system.

What other information do you think will be useful?

4

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Coding (60%)

For this project you may start from your solution to project 2 or project 3, or my solution to project 2. If you are

working with a team you may start from either of your solutions, or merge them.

The source code for your solution is to be submitted via GitLab.

This project is to be written in the C language, using MPI.

You must include a Makefile in your code directory. This makefile should include targets for each executable

you are required to produce, a target to compile all executables, and a clean target to remove intermediate

build files and executables.

You must include a readme file, explaining how compile and run your code.

You are provided some source code in array 2d io.h/c, found in the materials repository. This includes

an example parallel file writer as well as one that you will need to complete.

Tasks:

1. Modify your makefile to enable compilation with MPI.

2. Modify your timing, image generation, error calculation, and animation programs to use MPI. Be sure to

update the timing to use MPI tools – there should be no remnants of OpenMP.

3. Modify your 2D array data structures for operation in a domain decomposed environment. Be sure to

include all necessary information to map the internal data to the global grid coordinate system. We will

decompose the problem using a 1D domain decomposition in the y dimension.

Be sure to update your memory management routines (allocation, deallocation, etc.). Your alloca-

tion function should perform the actual domain decomposition, so it will need access to the MPI

communicator.

Data must be optimally distributed, using the approach developed in Task 2 of the Preparation phase.

Complete the routine compute subarray size in array 2d io.h/c, implementing the formula you

developed in Task 2 of the Preparation phase. This is required for one of the provided routines and

will be helpful elsewhere.

4. Implement a routine to perform a halo exchange for a domain-decomposed computational grid. For full

credit, you must use the non-blocking send/receive operations.

5. Complete the provided stub (write float array dist mpiio in array 2d io.h/c) to use MPI I/O to

write your wavefields to a single file, in parallel. An implementation that does not use MPI I/O is provided

for testing purposes and so that you may complete other aspects of the project.

6. Use MPI to parallelize your function for computing the error norm.

7. Update your functions for evaluating the standing wave and computing one iteration of the simulation to

work with distributed grids. Be sure to ensure that the halo regions are properly updated. There will be a

minimum of two rows per subdomain.

8. Write SLURM submission scripts for generating the results in the analysis section. Use the solutions to

Task 4 of the preparation to inform the construction of your scripts.

9. If you are working in a group of 3, complete one of the three additional tasks listed at the end of the project.

5

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Analysis (20%)

Your report for this project will be submitted via Canvas. Tex source, images, and final PDFs should be

included in your Git repositories.

All timings and analysis should be from experiments on TinkerCliffs.

I suggest that you run each of the experiments as separate jobs on TinkerCliffs. Use the early experiments

to estimate how long the later ones might take.

Unless otherwise specified, use α = 1,mx = 17, and my = 27.

You should assume that n = nx = ny for the global problem.

Tasks:

1. Use outputs from your image and error calculation programs to generate images and plots to demonstrate

that your parallel code is working correctly for P = 64 MPI tasks. Pick a large enough problem to be

interesting, but small enough that you can save images or build any animations you want to share.

2. Run a strong scalability study for n = 10001, nt = 250, and P = 1, 2, 4, 8, 16, 32, 64, 128, 256, and 512

for your wave equation simulation.

(a) You should run a small number of trials per experiment.

(b) Create plots of both the run-times and speedup. Remember, these are a function of P .

(c) Does this problem good strong scalability? Justify your claim and explain why you think you observe

this behavior.

3. Run a weak scalability study for n = 2501 (for P = 1), nt = 250, and P = 1, 2, 4, 8, 16, 32, 64, 128,

256, and 512 for your wave equation simulation.

(a) You should run a small number of trials per experiment.

(b) Create plots of both the run-times and speedup. Remember, these are a function of P .

(c) Does this problem good weak scalability? Justify your claim and explain why you think you observe

this behavior.

4. For P = 1024 (all of the cores on 8 nodes of TinkerCliffs), time the execution of nt = 100 iterations of

your wave equation simulator for n = 2,501; 5,001; 10,001; 25,001; 50,001; 75,001; and 100,001.

Plot your results as a function of N = nx × ny. On separate lines on the plot, include your results

from similar questions for Projects 1, 2, and 3. You may either re-run the experiments from Project 1

and 2 for 100 iterations or rescale the results so that they are equivalent to 100 iterations.

Look at your timing results from Projects 1, 2, and 3. Use MATH to estimate how many CPU hours it

would take to run 100 simulation iterations with n = 100, 001 based on each project’s timings (three

different predictions). For Project 3, use your P = 64 data.

How many CPU hours did the n = 100, 001 case take for this project?

What conclusions can you draw from this semester long experiment?

How might you use this knowledge when working on future projects?

5. If you are working in a group of 3, complete the additional analysis tasks corresponding to the additional

task you chose for the Coding section.

6. Evaluate your working style on this project.

If you worked on this project with a team, in the identified separate Canvas assignment, give 4 bullet

points discussing your collaboration in detail.

– State each team members specific contributions.

6

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

– What challenges and successes did you experience with collaboration tools?

– How did the workload break down between you and your team and were you able to follow the

work plan?

– How satisfied were you with both your effort and work ethic and your team’s effort and work

ethic?

This question will be graded individually, and we reserve the right to assign different grades on the

overall assignment in the event of an imbalance in collaborative effort.

If you did not work on this project with a team, in this submission, give 3 bullet points discussing your

work process in detail.

– How well were you able to follow your work plan?

– How satisfied were you with your effort and work ethic?

– How do you think things would have gone differently if you had chosen to worked with a team?

7

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Extra Credit Task: The Wave Equation with a Point Source

Additional Coding Tasks

1. Implement a new function for simulating a wave solution with a point source. The results of this

simulation will look like a ripple on a pond. This simulation works exactly the same way as the the

ones we’ve performed using the standing waves, except for the following:

– The equation is

∂ttu(t, x, y)−4u(t, x, y) = f(t, x, y), (1)

where f is the source function.

– The time-update in the simulation has one additional term,

uk+1j,i = −uk−1j,i + 2ukj,i + δt2 4 ukj,i + δt2fkj,i, (2)

where fkj,i is the source field.

– The point source is located at the point (xs, ys). For this project, we will place the source at the

grid point closest to that point. We will represent that grid point by the indices (js, is).

– The source function is,

fkj,i =

{

w(δt ∗ k, ν) if (js, is) = (j, i),

0 elsewhere.

(3)

C source code for evaluting w(t, ν) is provided in the materials repository.

– Instead of initializing with the standing wave, the initial conditions are

u(t, x, y) = 0 for t ≤ 0. (4)

Your implementation should work with your domain decomposed structures.

Additional Analysis Tasks

1. Using at least 8 MPI tasks and a domain sized at least n = 501, create an animation showing

propagation of your point source. You should run the simulation long enough that the wave reflects

off of the domain boundaries a couple of times.

Start with (xs, ys) = (0.5, 0.5) and ν = 10 Hz, but experiment with the source location to find an

interesting location and frequency.

Upload you animation to Canvas or google drive.

8

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Additional Tasks for 3-member Teams

Option A: MPI + OpenMP

Additional Coding Tasks

1. Modify your code to use both MPI and OpenMP simultaneously.

Additional Analysis Tasks

1. Re-run the experiment from Task 4 of the Analysis, once using P = 512 MPI tasks with 2 OpenMP

threads per task and once using P = 256 MPI tasks with 4 OpenMP threads per task. (This preserves

the total of 1024 total processors.) Plot your results with Task 4. Do you see an improvement on

the pure MPI case? You should experiment futher to see if more OpenMP threads yields improved

performance.

To run these experiments, use the following mpirun command as an example:

> mpirun -np 256 --map-by ppr:32:node --bind-to L3cache -x OMP NUM THREADS=4

./your timing program arg1 arg2 ...

Option B: 3D Simulations

Additional Coding Tasks

1. Modify your code so that you can do 3D simulations. You should create a separate copy of your code

and programs for this. It will be more challenging to have 2D and 3D contained inside one code. You

should still do a 1D domain decomposition, but in the z dimension this time.

The initial condition (through the standing wave function) extends naturally:

ω = pi

√

M2x +M

2

y +M

2

z (5)

u(t, x, y, z) = sin(Mxxpi) sin(Myypi) sin(Mzzpi) cos(ωt)). (6)

The Laplacian also extends naturally (k is the z index and s is the time index), assuming δx = δy = δz:

4usk,j,i ≈

−6usk,j,i + usk−1,j,i + usk,j−1,i + usk,j,i−1 + usk+1,j,i + usk,j+1,i + usk,j,i+1

δx2

. (7)

The stable time-step in 3D is (for α ≤ 1):

δt = αδx/

√

3. (8)

Additional Analysis Tasks

1. Create some 3D visualizations to demonstrate that your program is working correctly in a distributed

memory environment. You may consider plotting some 2D slices or 3D renderings. Use enough

processors to perform a small scalability study.

Hints

– These problems get large quickly (100003 = 100 ∗ 1000002, so keep n under 1001.

Option C: 2D Domain Decompositions

Additional Coding Tasks

1. Modify your code to use 2D domain decompositions instead of 1D domain decompositions. The

modified code should support arbitrary and different numbers of processors in the x and y dimensions.

You will need to modify your halo exchange to use separate communication buffers for each side and

dimension.

9

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Additional Analysis Tasks

1. Compare the run times for the n = 10001 and P = 256 using the following numbers of processors

in the x and y dimensions. The first configuration is equivalent to a 1D decomposition in the y

dimension, the last is equivalent to a 1D decomposition in the x dimension.

Px 1 4 16 64 256

Py 256 64 16 4 1

Which configuration performs best? Use your knowledge of the system and the problem to explain

why you think this is the case?

Hints

– Look into MPI’s Cartesian communicators for creating the 2D structure. For example, MPI Cart create,

MPI Cart coords, and MPI Cart shift may be useful.

10

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Development and Debugging Hints

Test the analysis tasks on small examples to be sure that they work prior to submitting the big test jobs.

Start by implementing the domain decomposition and testing it on your distributed form of the standing

wave evaluation. You have a functional array writer that you can use to test. You should get the same

output for P = 1, 2, 3, . . . .

Use the provided array writer to compare output to your MPI I/O based writer. These outputs should be

identical.

Your original array writer can be used to write out individual subdomains. This can be useful for debugging.

If you print debug output, print each worker’s rank first. You can sort by the rank to see each worker’s

execution path.

You can use the animation program to create small examples that can help you visualize if your domain

decomposition and halo exchange are working correctly. Look for artifacts around the subdomain boundaries.

After evaluating or updating a solution, ensure that the halo regions are properly updated.

Think about which subdomains have to handle the boundary conditions.

11

学霸联盟

Project 04: Parallelizing the Wave Equation with MPI

Version: Current as of: 2021-04-12 13:54:50

Due:

– Preparation: 2021-04-20 23:59:00

– Coding & Analysis: 2021-05-05 23:59:00 (24 hour grace period applies to this due date.)

Points: 100

Deliverables:

– Preparation work as a PDF, typeset with LaTeX, through Canvas.

– All code through code.vt.edu, including all LaTeX source.

– Project report and analysis as a PDF, typeset with LaTeX, through Canvas.

Collaboration:

– This assignment is to be completed by yourself or by your group.

– For conceptual aspects of this assignment, you may seek assistance from your classmates. In your sub-

mission you must indicate from whom you received assistance.

– You may not assist or seek assistance from others that are not in your group on matters of programming,

code design, or derivations.

– If you are unsure, ask course staff.

Honor Code: By submitting this assignment, you acknowledge that you have adhered to the Virginia Tech

Honor Code and attest to the following:

I have neither given nor received unauthorized assistance on this assignment. The work I am pre-

senting is ultimately my own.

References

MPI:

– Eijkhout book on MPI and OpenMP: http://pages.tacc.utexas.edu/~eijkhout/pcse/html/

index.html

– MPI Quick Reference: http://www.netlib.org/utk/people/JackDongarra/WEB-PAGES/SPRING-2006/

mpi-quick-ref.pdf

1

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Code Environment

All necessary code must be put in your class Git repository.

– If you are working with a partner or group, you will need to create a new git repository to work from.

– You must add, at minimum, the instructor and the GTAs as developers to this repository. If you need

assistance from a ULA you may need to add them as well.

Use the standard directory structure from previous projects.

All experiments will be run on TinkerCliffs.

Be an ARC good citizen. Use the resources you request. Use your scratch space for large data.

Any scripts which are necessary to recreate your data must be in the data directory.

DO NOT INCLUDE VIDEO FILES OR LARGE NUMBERS OF IMAGE FILES OR IN YOUR

GIT REPOSITORY!

Your tex source for project preparation and final report go in the report directory.

Remember: commit early, commit often, commit after minor changes, commit after major changes. Push

your code to code.vt.edu frequently.

You must separate headers, source, and applications.

You must include a makefile.

You must include a README explaining how to build and run your programs.

2

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Requirements

For this project, you have the opportunity to work in teams of two or three. Your teammates must be in your

section. If you choose to work with a team:

You may collaborate on the Preparation assignment, but you must submit your own writeup, in your own

words!

You will complete and submit Coding & Analysis assignment jointly, with one minor exception detailed

below.

Preparation (20%)

1. Write a work plan for completing this project. You should review the coding and analysis sections, as well

as the remaining course schedule. Your work plan should include a description of the steps you think will

be necessary to complete the required tasks, as well an estimate of the time that each task will take. Be

sure to account for resource requirements. To properly answer this question, you will need to take some

time and think through the remainder of the assignment.

If you have choose to work with a team, identify them here and sign up for a group on Canvas. In this

case, your work plan must include additional details describing how you are going to work together, what

collaboration tools you will use, a division of labor, how frequently you plan to meet (remotely), whose code

you will start from, etc.

For continued safety, you are not permitted to meet in person!

All meetings must use Zoom, FaceTime, Hangouts, Slack, etc.

2. In Labs 06 and 07, we used 1D domain decomposition to distribute the N data in the vectors. Here is the

formula we used to find the number of entries (Nr) to be assigned to each rank r of P processors:

Nr =

{

bN/P c r ∈ {0, 1, . . . , P − 2}

bN/P c+N%P r = P − 1 ,

where % is the mod operator and bN/P c means N/P rounded down to the nearest integer. However, this

decomposition is prone to load imbalance.

(a) How bad is the load imbalance if N = 1001 and P = 1000? What about if N = 1999 and P = 1000?

(b) In the case for, N = 1999 and P = 1000, what is the optimal distribution of work?

(c) Develop a better formula for distributing such data. You will use this formula to help distribute the

rows of your computational grids. In your new formula, the difference between the largest amount of

work assigned to one rank and the smallest assigned to one rank may not be greater than 1. Also, the

relationship Nr ≥ Nr+1 must hold for all r (except r = P − 1, obviously).

3. Disregarding the header (which has negligible size), for a solution that is ny × nx = 100, 001 × 100, 001,

how large will a single solution file be on disk? Give your answer exactly, in bytes, and approximately, in

the largest reasonable unit (KB, MB, GB, etc.).

4. For each of the following configurations, estimate how many CPU hours will be charged. These computations

will be relevant in tasks 2-4 of the analysis.

(a) A batch script requesting 4 entire nodes (512 total CPUs) for the following:

> mpirun -np 1 ./wave_timing 10001 3 4 1.0 250

Run time: 460s

(b) A batch script requesting 4 entire nodes (512 total CPUs) for the following:

3

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

> mpirun -np 512 ./wave_timing 10001 3 4 1.0 250

Run time: 1.25s

(c) A batch script requesting 8 entire nodes (1024 total CPUs) for the following:

> mpirun -np 1024 ./wave_timing 10001 3 4 1.0 100

Run time: 0.3s

(d) A batch script requesting 8 entire nodes (1024 total CPUs) for the following:

> mpirun -np 1024 ./wave_timing 100001 3 4 1.0 100

Run time: 26s

5. What information needs to be stored in the 2D array data structure to operate in a domain decomposed

environment? You may assume that the domain will be decomposed using a 1D domain decomposition in

the y dimension, only. Also, assume that we will be writing output files in parallel. Address, at minimum,

the following:

For the wave simulation, how large should the halo/padding region be?

Which dimension(s) should be padded?

What information is necessary to map the internal data to the global grid coordinate system.

What other information do you think will be useful?

4

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Coding (60%)

For this project you may start from your solution to project 2 or project 3, or my solution to project 2. If you are

working with a team you may start from either of your solutions, or merge them.

The source code for your solution is to be submitted via GitLab.

This project is to be written in the C language, using MPI.

You must include a Makefile in your code directory. This makefile should include targets for each executable

you are required to produce, a target to compile all executables, and a clean target to remove intermediate

build files and executables.

You must include a readme file, explaining how compile and run your code.

You are provided some source code in array 2d io.h/c, found in the materials repository. This includes

an example parallel file writer as well as one that you will need to complete.

Tasks:

1. Modify your makefile to enable compilation with MPI.

2. Modify your timing, image generation, error calculation, and animation programs to use MPI. Be sure to

update the timing to use MPI tools – there should be no remnants of OpenMP.

3. Modify your 2D array data structures for operation in a domain decomposed environment. Be sure to

include all necessary information to map the internal data to the global grid coordinate system. We will

decompose the problem using a 1D domain decomposition in the y dimension.

Be sure to update your memory management routines (allocation, deallocation, etc.). Your alloca-

tion function should perform the actual domain decomposition, so it will need access to the MPI

communicator.

Data must be optimally distributed, using the approach developed in Task 2 of the Preparation phase.

Complete the routine compute subarray size in array 2d io.h/c, implementing the formula you

developed in Task 2 of the Preparation phase. This is required for one of the provided routines and

will be helpful elsewhere.

4. Implement a routine to perform a halo exchange for a domain-decomposed computational grid. For full

credit, you must use the non-blocking send/receive operations.

5. Complete the provided stub (write float array dist mpiio in array 2d io.h/c) to use MPI I/O to

write your wavefields to a single file, in parallel. An implementation that does not use MPI I/O is provided

for testing purposes and so that you may complete other aspects of the project.

6. Use MPI to parallelize your function for computing the error norm.

7. Update your functions for evaluating the standing wave and computing one iteration of the simulation to

work with distributed grids. Be sure to ensure that the halo regions are properly updated. There will be a

minimum of two rows per subdomain.

8. Write SLURM submission scripts for generating the results in the analysis section. Use the solutions to

Task 4 of the preparation to inform the construction of your scripts.

9. If you are working in a group of 3, complete one of the three additional tasks listed at the end of the project.

5

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Analysis (20%)

Your report for this project will be submitted via Canvas. Tex source, images, and final PDFs should be

included in your Git repositories.

All timings and analysis should be from experiments on TinkerCliffs.

I suggest that you run each of the experiments as separate jobs on TinkerCliffs. Use the early experiments

to estimate how long the later ones might take.

Unless otherwise specified, use α = 1,mx = 17, and my = 27.

You should assume that n = nx = ny for the global problem.

Tasks:

1. Use outputs from your image and error calculation programs to generate images and plots to demonstrate

that your parallel code is working correctly for P = 64 MPI tasks. Pick a large enough problem to be

interesting, but small enough that you can save images or build any animations you want to share.

2. Run a strong scalability study for n = 10001, nt = 250, and P = 1, 2, 4, 8, 16, 32, 64, 128, 256, and 512

for your wave equation simulation.

(a) You should run a small number of trials per experiment.

(b) Create plots of both the run-times and speedup. Remember, these are a function of P .

(c) Does this problem good strong scalability? Justify your claim and explain why you think you observe

this behavior.

3. Run a weak scalability study for n = 2501 (for P = 1), nt = 250, and P = 1, 2, 4, 8, 16, 32, 64, 128,

256, and 512 for your wave equation simulation.

(a) You should run a small number of trials per experiment.

(b) Create plots of both the run-times and speedup. Remember, these are a function of P .

(c) Does this problem good weak scalability? Justify your claim and explain why you think you observe

this behavior.

4. For P = 1024 (all of the cores on 8 nodes of TinkerCliffs), time the execution of nt = 100 iterations of

your wave equation simulator for n = 2,501; 5,001; 10,001; 25,001; 50,001; 75,001; and 100,001.

Plot your results as a function of N = nx × ny. On separate lines on the plot, include your results

from similar questions for Projects 1, 2, and 3. You may either re-run the experiments from Project 1

and 2 for 100 iterations or rescale the results so that they are equivalent to 100 iterations.

Look at your timing results from Projects 1, 2, and 3. Use MATH to estimate how many CPU hours it

would take to run 100 simulation iterations with n = 100, 001 based on each project’s timings (three

different predictions). For Project 3, use your P = 64 data.

How many CPU hours did the n = 100, 001 case take for this project?

What conclusions can you draw from this semester long experiment?

How might you use this knowledge when working on future projects?

5. If you are working in a group of 3, complete the additional analysis tasks corresponding to the additional

task you chose for the Coding section.

6. Evaluate your working style on this project.

If you worked on this project with a team, in the identified separate Canvas assignment, give 4 bullet

points discussing your collaboration in detail.

– State each team members specific contributions.

6

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

– What challenges and successes did you experience with collaboration tools?

– How did the workload break down between you and your team and were you able to follow the

work plan?

– How satisfied were you with both your effort and work ethic and your team’s effort and work

ethic?

This question will be graded individually, and we reserve the right to assign different grades on the

overall assignment in the event of an imbalance in collaborative effort.

If you did not work on this project with a team, in this submission, give 3 bullet points discussing your

work process in detail.

– How well were you able to follow your work plan?

– How satisfied were you with your effort and work ethic?

– How do you think things would have gone differently if you had chosen to worked with a team?

7

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Extra Credit Task: The Wave Equation with a Point Source

Additional Coding Tasks

1. Implement a new function for simulating a wave solution with a point source. The results of this

simulation will look like a ripple on a pond. This simulation works exactly the same way as the the

ones we’ve performed using the standing waves, except for the following:

– The equation is

∂ttu(t, x, y)−4u(t, x, y) = f(t, x, y), (1)

where f is the source function.

– The time-update in the simulation has one additional term,

uk+1j,i = −uk−1j,i + 2ukj,i + δt2 4 ukj,i + δt2fkj,i, (2)

where fkj,i is the source field.

– The point source is located at the point (xs, ys). For this project, we will place the source at the

grid point closest to that point. We will represent that grid point by the indices (js, is).

– The source function is,

fkj,i =

{

w(δt ∗ k, ν) if (js, is) = (j, i),

0 elsewhere.

(3)

C source code for evaluting w(t, ν) is provided in the materials repository.

– Instead of initializing with the standing wave, the initial conditions are

u(t, x, y) = 0 for t ≤ 0. (4)

Your implementation should work with your domain decomposed structures.

Additional Analysis Tasks

1. Using at least 8 MPI tasks and a domain sized at least n = 501, create an animation showing

propagation of your point source. You should run the simulation long enough that the wave reflects

off of the domain boundaries a couple of times.

Start with (xs, ys) = (0.5, 0.5) and ν = 10 Hz, but experiment with the source location to find an

interesting location and frequency.

Upload you animation to Canvas or google drive.

8

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Additional Tasks for 3-member Teams

Option A: MPI + OpenMP

Additional Coding Tasks

1. Modify your code to use both MPI and OpenMP simultaneously.

Additional Analysis Tasks

1. Re-run the experiment from Task 4 of the Analysis, once using P = 512 MPI tasks with 2 OpenMP

threads per task and once using P = 256 MPI tasks with 4 OpenMP threads per task. (This preserves

the total of 1024 total processors.) Plot your results with Task 4. Do you see an improvement on

the pure MPI case? You should experiment futher to see if more OpenMP threads yields improved

performance.

To run these experiments, use the following mpirun command as an example:

> mpirun -np 256 --map-by ppr:32:node --bind-to L3cache -x OMP NUM THREADS=4

./your timing program arg1 arg2 ...

Option B: 3D Simulations

Additional Coding Tasks

1. Modify your code so that you can do 3D simulations. You should create a separate copy of your code

and programs for this. It will be more challenging to have 2D and 3D contained inside one code. You

should still do a 1D domain decomposition, but in the z dimension this time.

The initial condition (through the standing wave function) extends naturally:

ω = pi

√

M2x +M

2

y +M

2

z (5)

u(t, x, y, z) = sin(Mxxpi) sin(Myypi) sin(Mzzpi) cos(ωt)). (6)

The Laplacian also extends naturally (k is the z index and s is the time index), assuming δx = δy = δz:

4usk,j,i ≈

−6usk,j,i + usk−1,j,i + usk,j−1,i + usk,j,i−1 + usk+1,j,i + usk,j+1,i + usk,j,i+1

δx2

. (7)

The stable time-step in 3D is (for α ≤ 1):

δt = αδx/

√

3. (8)

Additional Analysis Tasks

1. Create some 3D visualizations to demonstrate that your program is working correctly in a distributed

memory environment. You may consider plotting some 2D slices or 3D renderings. Use enough

processors to perform a small scalability study.

Hints

– These problems get large quickly (100003 = 100 ∗ 1000002, so keep n under 1001.

Option C: 2D Domain Decompositions

Additional Coding Tasks

1. Modify your code to use 2D domain decompositions instead of 1D domain decompositions. The

modified code should support arbitrary and different numbers of processors in the x and y dimensions.

You will need to modify your halo exchange to use separate communication buffers for each side and

dimension.

9

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Additional Analysis Tasks

1. Compare the run times for the n = 10001 and P = 256 using the following numbers of processors

in the x and y dimensions. The first configuration is equivalent to a 1D decomposition in the y

dimension, the last is equivalent to a 1D decomposition in the x dimension.

Px 1 4 16 64 256

Py 256 64 16 4 1

Which configuration performs best? Use your knowledge of the system and the problem to explain

why you think this is the case?

Hints

– Look into MPI’s Cartesian communicators for creating the 2D structure. For example, MPI Cart create,

MPI Cart coords, and MPI Cart shift may be useful.

10

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Development and Debugging Hints

Test the analysis tasks on small examples to be sure that they work prior to submitting the big test jobs.

Start by implementing the domain decomposition and testing it on your distributed form of the standing

wave evaluation. You have a functional array writer that you can use to test. You should get the same

output for P = 1, 2, 3, . . . .

Use the provided array writer to compare output to your MPI I/O based writer. These outputs should be

identical.

Your original array writer can be used to write out individual subdomains. This can be useful for debugging.

If you print debug output, print each worker’s rank first. You can sort by the rank to see each worker’s

execution path.

You can use the animation program to create small examples that can help you visualize if your domain

decomposition and halo exchange are working correctly. Look for artifacts around the subdomain boundaries.

After evaluating or updating a solution, ensure that the halo regions are properly updated.

Think about which subdomains have to handle the boundary conditions.

11

学霸联盟