xuebaunion@vip.163.com

3551 Trousdale Rkwy, University Park, Los Angeles, CA

留学生论文指导和课程辅导

无忧GPA：https://www.essaygpa.com

工作时间：全年无休-早上8点到凌晨3点

扫码添加客服微信

扫描添加客服微信

C语言代写-CMDA 3634 SP2021

时间：2021-04-17

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Project 04: Parallelizing the Wave Equation with MPI

Version: Current as of: 2021-04-12 13:54:50

Due:

– Preparation: 2021-04-20 23:59:00

– Coding & Analysis: 2021-05-05 23:59:00 (24 hour grace period applies to this due date.)

Points: 100

Deliverables:

– Preparation work as a PDF, typeset with LaTeX, through Canvas.

– All code through code.vt.edu, including all LaTeX source.

– Project report and analysis as a PDF, typeset with LaTeX, through Canvas.

Collaboration:

– This assignment is to be completed by yourself or by your group.

– For conceptual aspects of this assignment, you may seek assistance from your classmates. In your sub-

mission you must indicate from whom you received assistance.

– You may not assist or seek assistance from others that are not in your group on matters of programming,

code design, or derivations.

– If you are unsure, ask course staff.

Honor Code: By submitting this assignment, you acknowledge that you have adhered to the Virginia Tech

Honor Code and attest to the following:

I have neither given nor received unauthorized assistance on this assignment. The work I am pre-

senting is ultimately my own.

References

MPI:

– Eijkhout book on MPI and OpenMP: http://pages.tacc.utexas.edu/~eijkhout/pcse/html/

index.html

– MPI Quick Reference: http://www.netlib.org/utk/people/JackDongarra/WEB-PAGES/SPRING-2006/

mpi-quick-ref.pdf

1

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Code Environment

All necessary code must be put in your class Git repository.

– If you are working with a partner or group, you will need to create a new git repository to work from.

– You must add, at minimum, the instructor and the GTAs as developers to this repository. If you need

assistance from a ULA you may need to add them as well.

Use the standard directory structure from previous projects.

All experiments will be run on TinkerCliffs.

Be an ARC good citizen. Use the resources you request. Use your scratch space for large data.

Any scripts which are necessary to recreate your data must be in the data directory.

DO NOT INCLUDE VIDEO FILES OR LARGE NUMBERS OF IMAGE FILES OR IN YOUR

GIT REPOSITORY!

Your tex source for project preparation and final report go in the report directory.

Remember: commit early, commit often, commit after minor changes, commit after major changes. Push

your code to code.vt.edu frequently.

You must separate headers, source, and applications.

You must include a makefile.

You must include a README explaining how to build and run your programs.

2

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Requirements

For this project, you have the opportunity to work in teams of two or three. Your teammates must be in your

section. If you choose to work with a team:

You may collaborate on the Preparation assignment, but you must submit your own writeup, in your own

words!

You will complete and submit Coding & Analysis assignment jointly, with one minor exception detailed

below.

Preparation (20%)

1. Write a work plan for completing this project. You should review the coding and analysis sections, as well

as the remaining course schedule. Your work plan should include a description of the steps you think will

be necessary to complete the required tasks, as well an estimate of the time that each task will take. Be

sure to account for resource requirements. To properly answer this question, you will need to take some

time and think through the remainder of the assignment.

If you have choose to work with a team, identify them here and sign up for a group on Canvas. In this

case, your work plan must include additional details describing how you are going to work together, what

collaboration tools you will use, a division of labor, how frequently you plan to meet (remotely), whose code

you will start from, etc.

For continued safety, you are not permitted to meet in person!

All meetings must use Zoom, FaceTime, Hangouts, Slack, etc.

2. In Labs 06 and 07, we used 1D domain decomposition to distribute the N data in the vectors. Here is the

formula we used to find the number of entries (Nr) to be assigned to each rank r of P processors:

Nr =

{

bN/P c r ∈ {0, 1, . . . , P − 2}

bN/P c+N%P r = P − 1 ,

where % is the mod operator and bN/P c means N/P rounded down to the nearest integer. However, this

decomposition is prone to load imbalance.

(a) How bad is the load imbalance if N = 1001 and P = 1000? What about if N = 1999 and P = 1000?

(b) In the case for, N = 1999 and P = 1000, what is the optimal distribution of work?

(c) Develop a better formula for distributing such data. You will use this formula to help distribute the

rows of your computational grids. In your new formula, the difference between the largest amount of

work assigned to one rank and the smallest assigned to one rank may not be greater than 1. Also, the

relationship Nr ≥ Nr+1 must hold for all r (except r = P − 1, obviously).

3. Disregarding the header (which has negligible size), for a solution that is ny × nx = 100, 001 × 100, 001,

how large will a single solution file be on disk? Give your answer exactly, in bytes, and approximately, in

the largest reasonable unit (KB, MB, GB, etc.).

4. For each of the following configurations, estimate how many CPU hours will be charged. These computations

will be relevant in tasks 2-4 of the analysis.

(a) A batch script requesting 4 entire nodes (512 total CPUs) for the following:

> mpirun -np 1 ./wave_timing 10001 3 4 1.0 250

Run time: 460s

(b) A batch script requesting 4 entire nodes (512 total CPUs) for the following:

3

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

> mpirun -np 512 ./wave_timing 10001 3 4 1.0 250

Run time: 1.25s

(c) A batch script requesting 8 entire nodes (1024 total CPUs) for the following:

> mpirun -np 1024 ./wave_timing 10001 3 4 1.0 100

Run time: 0.3s

(d) A batch script requesting 8 entire nodes (1024 total CPUs) for the following:

> mpirun -np 1024 ./wave_timing 100001 3 4 1.0 100

Run time: 26s

5. What information needs to be stored in the 2D array data structure to operate in a domain decomposed

environment? You may assume that the domain will be decomposed using a 1D domain decomposition in

the y dimension, only. Also, assume that we will be writing output files in parallel. Address, at minimum,

the following:

For the wave simulation, how large should the halo/padding region be?

Which dimension(s) should be padded?

What information is necessary to map the internal data to the global grid coordinate system.

What other information do you think will be useful?

4

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Coding (60%)

For this project you may start from your solution to project 2 or project 3, or my solution to project 2. If you are

working with a team you may start from either of your solutions, or merge them.

The source code for your solution is to be submitted via GitLab.

This project is to be written in the C language, using MPI.

You must include a Makefile in your code directory. This makefile should include targets for each executable

you are required to produce, a target to compile all executables, and a clean target to remove intermediate

build files and executables.

You must include a readme file, explaining how compile and run your code.

You are provided some source code in array 2d io.h/c, found in the materials repository. This includes

an example parallel file writer as well as one that you will need to complete.

Tasks:

1. Modify your makefile to enable compilation with MPI.

2. Modify your timing, image generation, error calculation, and animation programs to use MPI. Be sure to

update the timing to use MPI tools – there should be no remnants of OpenMP.

3. Modify your 2D array data structures for operation in a domain decomposed environment. Be sure to

include all necessary information to map the internal data to the global grid coordinate system. We will

decompose the problem using a 1D domain decomposition in the y dimension.

Be sure to update your memory management routines (allocation, deallocation, etc.). Your alloca-

tion function should perform the actual domain decomposition, so it will need access to the MPI

communicator.

Data must be optimally distributed, using the approach developed in Task 2 of the Preparation phase.

Complete the routine compute subarray size in array 2d io.h/c, implementing the formula you

developed in Task 2 of the Preparation phase. This is required for one of the provided routines and

will be helpful elsewhere.

4. Implement a routine to perform a halo exchange for a domain-decomposed computational grid. For full

credit, you must use the non-blocking send/receive operations.

5. Complete the provided stub (write float array dist mpiio in array 2d io.h/c) to use MPI I/O to

write your wavefields to a single file, in parallel. An implementation that does not use MPI I/O is provided

for testing purposes and so that you may complete other aspects of the project.

6. Use MPI to parallelize your function for computing the error norm.

7. Update your functions for evaluating the standing wave and computing one iteration of the simulation to

work with distributed grids. Be sure to ensure that the halo regions are properly updated. There will be a

minimum of two rows per subdomain.

8. Write SLURM submission scripts for generating the results in the analysis section. Use the solutions to

Task 4 of the preparation to inform the construction of your scripts.

9. If you are working in a group of 3, complete one of the three additional tasks listed at the end of the project.

5

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Analysis (20%)

Your report for this project will be submitted via Canvas. Tex source, images, and final PDFs should be

included in your Git repositories.

All timings and analysis should be from experiments on TinkerCliffs.

I suggest that you run each of the experiments as separate jobs on TinkerCliffs. Use the early experiments

to estimate how long the later ones might take.

Unless otherwise specified, use α = 1,mx = 17, and my = 27.

You should assume that n = nx = ny for the global problem.

Tasks:

1. Use outputs from your image and error calculation programs to generate images and plots to demonstrate

that your parallel code is working correctly for P = 64 MPI tasks. Pick a large enough problem to be

interesting, but small enough that you can save images or build any animations you want to share.

2. Run a strong scalability study for n = 10001, nt = 250, and P = 1, 2, 4, 8, 16, 32, 64, 128, 256, and 512

for your wave equation simulation.

(a) You should run a small number of trials per experiment.

(b) Create plots of both the run-times and speedup. Remember, these are a function of P .

(c) Does this problem good strong scalability? Justify your claim and explain why you think you observe

this behavior.

3. Run a weak scalability study for n = 2501 (for P = 1), nt = 250, and P = 1, 2, 4, 8, 16, 32, 64, 128,

256, and 512 for your wave equation simulation.

(a) You should run a small number of trials per experiment.

(b) Create plots of both the run-times and speedup. Remember, these are a function of P .

(c) Does this problem good weak scalability? Justify your claim and explain why you think you observe

this behavior.

4. For P = 1024 (all of the cores on 8 nodes of TinkerCliffs), time the execution of nt = 100 iterations of

your wave equation simulator for n = 2,501; 5,001; 10,001; 25,001; 50,001; 75,001; and 100,001.

Plot your results as a function of N = nx × ny. On separate lines on the plot, include your results

from similar questions for Projects 1, 2, and 3. You may either re-run the experiments from Project 1

and 2 for 100 iterations or rescale the results so that they are equivalent to 100 iterations.

Look at your timing results from Projects 1, 2, and 3. Use MATH to estimate how many CPU hours it

would take to run 100 simulation iterations with n = 100, 001 based on each project’s timings (three

different predictions). For Project 3, use your P = 64 data.

How many CPU hours did the n = 100, 001 case take for this project?

What conclusions can you draw from this semester long experiment?

How might you use this knowledge when working on future projects?

5. If you are working in a group of 3, complete the additional analysis tasks corresponding to the additional

task you chose for the Coding section.

6. Evaluate your working style on this project.

If you worked on this project with a team, in the identified separate Canvas assignment, give 4 bullet

points discussing your collaboration in detail.

– State each team members specific contributions.

6

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

– What challenges and successes did you experience with collaboration tools?

– How did the workload break down between you and your team and were you able to follow the

work plan?

– How satisfied were you with both your effort and work ethic and your team’s effort and work

ethic?

This question will be graded individually, and we reserve the right to assign different grades on the

overall assignment in the event of an imbalance in collaborative effort.

If you did not work on this project with a team, in this submission, give 3 bullet points discussing your

work process in detail.

– How well were you able to follow your work plan?

– How satisfied were you with your effort and work ethic?

– How do you think things would have gone differently if you had chosen to worked with a team?

7

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Extra Credit Task: The Wave Equation with a Point Source

Additional Coding Tasks

1. Implement a new function for simulating a wave solution with a point source. The results of this

simulation will look like a ripple on a pond. This simulation works exactly the same way as the the

ones we’ve performed using the standing waves, except for the following:

– The equation is

∂ttu(t, x, y)−4u(t, x, y) = f(t, x, y), (1)

where f is the source function.

– The time-update in the simulation has one additional term,

uk+1j,i = −uk−1j,i + 2ukj,i + δt2 4 ukj,i + δt2fkj,i, (2)

where fkj,i is the source field.

– The point source is located at the point (xs, ys). For this project, we will place the source at the

grid point closest to that point. We will represent that grid point by the indices (js, is).

– The source function is,

fkj,i =

{

w(δt ∗ k, ν) if (js, is) = (j, i),

0 elsewhere.

(3)

C source code for evaluting w(t, ν) is provided in the materials repository.

– Instead of initializing with the standing wave, the initial conditions are

u(t, x, y) = 0 for t ≤ 0. (4)

Your implementation should work with your domain decomposed structures.

Additional Analysis Tasks

1. Using at least 8 MPI tasks and a domain sized at least n = 501, create an animation showing

propagation of your point source. You should run the simulation long enough that the wave reflects

off of the domain boundaries a couple of times.

Start with (xs, ys) = (0.5, 0.5) and ν = 10 Hz, but experiment with the source location to find an

interesting location and frequency.

Upload you animation to Canvas or google drive.

8

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Additional Tasks for 3-member Teams

Option A: MPI + OpenMP

Additional Coding Tasks

1. Modify your code to use both MPI and OpenMP simultaneously.

Additional Analysis Tasks

1. Re-run the experiment from Task 4 of the Analysis, once using P = 512 MPI tasks with 2 OpenMP

threads per task and once using P = 256 MPI tasks with 4 OpenMP threads per task. (This preserves

the total of 1024 total processors.) Plot your results with Task 4. Do you see an improvement on

the pure MPI case? You should experiment futher to see if more OpenMP threads yields improved

performance.

To run these experiments, use the following mpirun command as an example:

> mpirun -np 256 --map-by ppr:32:node --bind-to L3cache -x OMP NUM THREADS=4

./your timing program arg1 arg2 ...

Option B: 3D Simulations

Additional Coding Tasks

1. Modify your code so that you can do 3D simulations. You should create a separate copy of your code

and programs for this. It will be more challenging to have 2D and 3D contained inside one code. You

should still do a 1D domain decomposition, but in the z dimension this time.

The initial condition (through the standing wave function) extends naturally:

ω = pi

√

M2x +M

2

y +M

2

z (5)

u(t, x, y, z) = sin(Mxxpi) sin(Myypi) sin(Mzzpi) cos(ωt)). (6)

The Laplacian also extends naturally (k is the z index and s is the time index), assuming δx = δy = δz:

4usk,j,i ≈

−6usk,j,i + usk−1,j,i + usk,j−1,i + usk,j,i−1 + usk+1,j,i + usk,j+1,i + usk,j,i+1

δx2

. (7)

The stable time-step in 3D is (for α ≤ 1):

δt = αδx/

√

3. (8)

Additional Analysis Tasks

1. Create some 3D visualizations to demonstrate that your program is working correctly in a distributed

memory environment. You may consider plotting some 2D slices or 3D renderings. Use enough

processors to perform a small scalability study.

Hints

– These problems get large quickly (100003 = 100 ∗ 1000002, so keep n under 1001.

Option C: 2D Domain Decompositions

Additional Coding Tasks

1. Modify your code to use 2D domain decompositions instead of 1D domain decompositions. The

modified code should support arbitrary and different numbers of processors in the x and y dimensions.

You will need to modify your halo exchange to use separate communication buffers for each side and

dimension.

9

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Additional Analysis Tasks

1. Compare the run times for the n = 10001 and P = 256 using the following numbers of processors

in the x and y dimensions. The first configuration is equivalent to a 1D decomposition in the y

dimension, the last is equivalent to a 1D decomposition in the x dimension.

Px 1 4 16 64 256

Py 256 64 16 4 1

Which configuration performs best? Use your knowledge of the system and the problem to explain

why you think this is the case?

Hints

– Look into MPI’s Cartesian communicators for creating the 2D structure. For example, MPI Cart create,

MPI Cart coords, and MPI Cart shift may be useful.

10

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Development and Debugging Hints

Test the analysis tasks on small examples to be sure that they work prior to submitting the big test jobs.

Start by implementing the domain decomposition and testing it on your distributed form of the standing

wave evaluation. You have a functional array writer that you can use to test. You should get the same

output for P = 1, 2, 3, . . . .

Use the provided array writer to compare output to your MPI I/O based writer. These outputs should be

identical.

Your original array writer can be used to write out individual subdomains. This can be useful for debugging.

If you print debug output, print each worker’s rank first. You can sort by the rank to see each worker’s

execution path.

You can use the animation program to create small examples that can help you visualize if your domain

decomposition and halo exchange are working correctly. Look for artifacts around the subdomain boundaries.

After evaluating or updating a solution, ensure that the halo regions are properly updated.

Think about which subdomains have to handle the boundary conditions.

11

学霸联盟

Project 04: Parallelizing the Wave Equation with MPI

Version: Current as of: 2021-04-12 13:54:50

Due:

– Preparation: 2021-04-20 23:59:00

– Coding & Analysis: 2021-05-05 23:59:00 (24 hour grace period applies to this due date.)

Points: 100

Deliverables:

– Preparation work as a PDF, typeset with LaTeX, through Canvas.

– All code through code.vt.edu, including all LaTeX source.

– Project report and analysis as a PDF, typeset with LaTeX, through Canvas.

Collaboration:

– This assignment is to be completed by yourself or by your group.

– For conceptual aspects of this assignment, you may seek assistance from your classmates. In your sub-

mission you must indicate from whom you received assistance.

– You may not assist or seek assistance from others that are not in your group on matters of programming,

code design, or derivations.

– If you are unsure, ask course staff.

Honor Code: By submitting this assignment, you acknowledge that you have adhered to the Virginia Tech

Honor Code and attest to the following:

I have neither given nor received unauthorized assistance on this assignment. The work I am pre-

senting is ultimately my own.

References

MPI:

– Eijkhout book on MPI and OpenMP: http://pages.tacc.utexas.edu/~eijkhout/pcse/html/

index.html

– MPI Quick Reference: http://www.netlib.org/utk/people/JackDongarra/WEB-PAGES/SPRING-2006/

mpi-quick-ref.pdf

1

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Code Environment

All necessary code must be put in your class Git repository.

– If you are working with a partner or group, you will need to create a new git repository to work from.

– You must add, at minimum, the instructor and the GTAs as developers to this repository. If you need

assistance from a ULA you may need to add them as well.

Use the standard directory structure from previous projects.

All experiments will be run on TinkerCliffs.

Be an ARC good citizen. Use the resources you request. Use your scratch space for large data.

Any scripts which are necessary to recreate your data must be in the data directory.

DO NOT INCLUDE VIDEO FILES OR LARGE NUMBERS OF IMAGE FILES OR IN YOUR

GIT REPOSITORY!

Your tex source for project preparation and final report go in the report directory.

Remember: commit early, commit often, commit after minor changes, commit after major changes. Push

your code to code.vt.edu frequently.

You must separate headers, source, and applications.

You must include a makefile.

You must include a README explaining how to build and run your programs.

2

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Requirements

For this project, you have the opportunity to work in teams of two or three. Your teammates must be in your

section. If you choose to work with a team:

You may collaborate on the Preparation assignment, but you must submit your own writeup, in your own

words!

You will complete and submit Coding & Analysis assignment jointly, with one minor exception detailed

below.

Preparation (20%)

1. Write a work plan for completing this project. You should review the coding and analysis sections, as well

as the remaining course schedule. Your work plan should include a description of the steps you think will

be necessary to complete the required tasks, as well an estimate of the time that each task will take. Be

sure to account for resource requirements. To properly answer this question, you will need to take some

time and think through the remainder of the assignment.

If you have choose to work with a team, identify them here and sign up for a group on Canvas. In this

case, your work plan must include additional details describing how you are going to work together, what

collaboration tools you will use, a division of labor, how frequently you plan to meet (remotely), whose code

you will start from, etc.

For continued safety, you are not permitted to meet in person!

All meetings must use Zoom, FaceTime, Hangouts, Slack, etc.

2. In Labs 06 and 07, we used 1D domain decomposition to distribute the N data in the vectors. Here is the

formula we used to find the number of entries (Nr) to be assigned to each rank r of P processors:

Nr =

{

bN/P c r ∈ {0, 1, . . . , P − 2}

bN/P c+N%P r = P − 1 ,

where % is the mod operator and bN/P c means N/P rounded down to the nearest integer. However, this

decomposition is prone to load imbalance.

(a) How bad is the load imbalance if N = 1001 and P = 1000? What about if N = 1999 and P = 1000?

(b) In the case for, N = 1999 and P = 1000, what is the optimal distribution of work?

(c) Develop a better formula for distributing such data. You will use this formula to help distribute the

rows of your computational grids. In your new formula, the difference between the largest amount of

work assigned to one rank and the smallest assigned to one rank may not be greater than 1. Also, the

relationship Nr ≥ Nr+1 must hold for all r (except r = P − 1, obviously).

3. Disregarding the header (which has negligible size), for a solution that is ny × nx = 100, 001 × 100, 001,

how large will a single solution file be on disk? Give your answer exactly, in bytes, and approximately, in

the largest reasonable unit (KB, MB, GB, etc.).

4. For each of the following configurations, estimate how many CPU hours will be charged. These computations

will be relevant in tasks 2-4 of the analysis.

(a) A batch script requesting 4 entire nodes (512 total CPUs) for the following:

> mpirun -np 1 ./wave_timing 10001 3 4 1.0 250

Run time: 460s

(b) A batch script requesting 4 entire nodes (512 total CPUs) for the following:

3

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

> mpirun -np 512 ./wave_timing 10001 3 4 1.0 250

Run time: 1.25s

(c) A batch script requesting 8 entire nodes (1024 total CPUs) for the following:

> mpirun -np 1024 ./wave_timing 10001 3 4 1.0 100

Run time: 0.3s

(d) A batch script requesting 8 entire nodes (1024 total CPUs) for the following:

> mpirun -np 1024 ./wave_timing 100001 3 4 1.0 100

Run time: 26s

5. What information needs to be stored in the 2D array data structure to operate in a domain decomposed

environment? You may assume that the domain will be decomposed using a 1D domain decomposition in

the y dimension, only. Also, assume that we will be writing output files in parallel. Address, at minimum,

the following:

For the wave simulation, how large should the halo/padding region be?

Which dimension(s) should be padded?

What information is necessary to map the internal data to the global grid coordinate system.

What other information do you think will be useful?

4

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Coding (60%)

For this project you may start from your solution to project 2 or project 3, or my solution to project 2. If you are

working with a team you may start from either of your solutions, or merge them.

The source code for your solution is to be submitted via GitLab.

This project is to be written in the C language, using MPI.

You must include a Makefile in your code directory. This makefile should include targets for each executable

you are required to produce, a target to compile all executables, and a clean target to remove intermediate

build files and executables.

You must include a readme file, explaining how compile and run your code.

You are provided some source code in array 2d io.h/c, found in the materials repository. This includes

an example parallel file writer as well as one that you will need to complete.

Tasks:

1. Modify your makefile to enable compilation with MPI.

2. Modify your timing, image generation, error calculation, and animation programs to use MPI. Be sure to

update the timing to use MPI tools – there should be no remnants of OpenMP.

3. Modify your 2D array data structures for operation in a domain decomposed environment. Be sure to

include all necessary information to map the internal data to the global grid coordinate system. We will

decompose the problem using a 1D domain decomposition in the y dimension.

Be sure to update your memory management routines (allocation, deallocation, etc.). Your alloca-

tion function should perform the actual domain decomposition, so it will need access to the MPI

communicator.

Data must be optimally distributed, using the approach developed in Task 2 of the Preparation phase.

Complete the routine compute subarray size in array 2d io.h/c, implementing the formula you

developed in Task 2 of the Preparation phase. This is required for one of the provided routines and

will be helpful elsewhere.

4. Implement a routine to perform a halo exchange for a domain-decomposed computational grid. For full

credit, you must use the non-blocking send/receive operations.

5. Complete the provided stub (write float array dist mpiio in array 2d io.h/c) to use MPI I/O to

write your wavefields to a single file, in parallel. An implementation that does not use MPI I/O is provided

for testing purposes and so that you may complete other aspects of the project.

6. Use MPI to parallelize your function for computing the error norm.

7. Update your functions for evaluating the standing wave and computing one iteration of the simulation to

work with distributed grids. Be sure to ensure that the halo regions are properly updated. There will be a

minimum of two rows per subdomain.

8. Write SLURM submission scripts for generating the results in the analysis section. Use the solutions to

Task 4 of the preparation to inform the construction of your scripts.

9. If you are working in a group of 3, complete one of the three additional tasks listed at the end of the project.

5

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Analysis (20%)

Your report for this project will be submitted via Canvas. Tex source, images, and final PDFs should be

included in your Git repositories.

All timings and analysis should be from experiments on TinkerCliffs.

I suggest that you run each of the experiments as separate jobs on TinkerCliffs. Use the early experiments

to estimate how long the later ones might take.

Unless otherwise specified, use α = 1,mx = 17, and my = 27.

You should assume that n = nx = ny for the global problem.

Tasks:

1. Use outputs from your image and error calculation programs to generate images and plots to demonstrate

that your parallel code is working correctly for P = 64 MPI tasks. Pick a large enough problem to be

interesting, but small enough that you can save images or build any animations you want to share.

2. Run a strong scalability study for n = 10001, nt = 250, and P = 1, 2, 4, 8, 16, 32, 64, 128, 256, and 512

for your wave equation simulation.

(a) You should run a small number of trials per experiment.

(b) Create plots of both the run-times and speedup. Remember, these are a function of P .

(c) Does this problem good strong scalability? Justify your claim and explain why you think you observe

this behavior.

3. Run a weak scalability study for n = 2501 (for P = 1), nt = 250, and P = 1, 2, 4, 8, 16, 32, 64, 128,

256, and 512 for your wave equation simulation.

(a) You should run a small number of trials per experiment.

(b) Create plots of both the run-times and speedup. Remember, these are a function of P .

(c) Does this problem good weak scalability? Justify your claim and explain why you think you observe

this behavior.

4. For P = 1024 (all of the cores on 8 nodes of TinkerCliffs), time the execution of nt = 100 iterations of

your wave equation simulator for n = 2,501; 5,001; 10,001; 25,001; 50,001; 75,001; and 100,001.

Plot your results as a function of N = nx × ny. On separate lines on the plot, include your results

from similar questions for Projects 1, 2, and 3. You may either re-run the experiments from Project 1

and 2 for 100 iterations or rescale the results so that they are equivalent to 100 iterations.

Look at your timing results from Projects 1, 2, and 3. Use MATH to estimate how many CPU hours it

would take to run 100 simulation iterations with n = 100, 001 based on each project’s timings (three

different predictions). For Project 3, use your P = 64 data.

How many CPU hours did the n = 100, 001 case take for this project?

What conclusions can you draw from this semester long experiment?

How might you use this knowledge when working on future projects?

5. If you are working in a group of 3, complete the additional analysis tasks corresponding to the additional

task you chose for the Coding section.

6. Evaluate your working style on this project.

If you worked on this project with a team, in the identified separate Canvas assignment, give 4 bullet

points discussing your collaboration in detail.

– State each team members specific contributions.

6

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

– What challenges and successes did you experience with collaboration tools?

– How did the workload break down between you and your team and were you able to follow the

work plan?

– How satisfied were you with both your effort and work ethic and your team’s effort and work

ethic?

This question will be graded individually, and we reserve the right to assign different grades on the

overall assignment in the event of an imbalance in collaborative effort.

If you did not work on this project with a team, in this submission, give 3 bullet points discussing your

work process in detail.

– How well were you able to follow your work plan?

– How satisfied were you with your effort and work ethic?

– How do you think things would have gone differently if you had chosen to worked with a team?

7

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Extra Credit Task: The Wave Equation with a Point Source

Additional Coding Tasks

1. Implement a new function for simulating a wave solution with a point source. The results of this

simulation will look like a ripple on a pond. This simulation works exactly the same way as the the

ones we’ve performed using the standing waves, except for the following:

– The equation is

∂ttu(t, x, y)−4u(t, x, y) = f(t, x, y), (1)

where f is the source function.

– The time-update in the simulation has one additional term,

uk+1j,i = −uk−1j,i + 2ukj,i + δt2 4 ukj,i + δt2fkj,i, (2)

where fkj,i is the source field.

– The point source is located at the point (xs, ys). For this project, we will place the source at the

grid point closest to that point. We will represent that grid point by the indices (js, is).

– The source function is,

fkj,i =

{

w(δt ∗ k, ν) if (js, is) = (j, i),

0 elsewhere.

(3)

C source code for evaluting w(t, ν) is provided in the materials repository.

– Instead of initializing with the standing wave, the initial conditions are

u(t, x, y) = 0 for t ≤ 0. (4)

Your implementation should work with your domain decomposed structures.

Additional Analysis Tasks

1. Using at least 8 MPI tasks and a domain sized at least n = 501, create an animation showing

propagation of your point source. You should run the simulation long enough that the wave reflects

off of the domain boundaries a couple of times.

Start with (xs, ys) = (0.5, 0.5) and ν = 10 Hz, but experiment with the source location to find an

interesting location and frequency.

Upload you animation to Canvas or google drive.

8

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Additional Tasks for 3-member Teams

Option A: MPI + OpenMP

Additional Coding Tasks

1. Modify your code to use both MPI and OpenMP simultaneously.

Additional Analysis Tasks

1. Re-run the experiment from Task 4 of the Analysis, once using P = 512 MPI tasks with 2 OpenMP

threads per task and once using P = 256 MPI tasks with 4 OpenMP threads per task. (This preserves

the total of 1024 total processors.) Plot your results with Task 4. Do you see an improvement on

the pure MPI case? You should experiment futher to see if more OpenMP threads yields improved

performance.

To run these experiments, use the following mpirun command as an example:

> mpirun -np 256 --map-by ppr:32:node --bind-to L3cache -x OMP NUM THREADS=4

./your timing program arg1 arg2 ...

Option B: 3D Simulations

Additional Coding Tasks

1. Modify your code so that you can do 3D simulations. You should create a separate copy of your code

and programs for this. It will be more challenging to have 2D and 3D contained inside one code. You

should still do a 1D domain decomposition, but in the z dimension this time.

The initial condition (through the standing wave function) extends naturally:

ω = pi

√

M2x +M

2

y +M

2

z (5)

u(t, x, y, z) = sin(Mxxpi) sin(Myypi) sin(Mzzpi) cos(ωt)). (6)

The Laplacian also extends naturally (k is the z index and s is the time index), assuming δx = δy = δz:

4usk,j,i ≈

−6usk,j,i + usk−1,j,i + usk,j−1,i + usk,j,i−1 + usk+1,j,i + usk,j+1,i + usk,j,i+1

δx2

. (7)

The stable time-step in 3D is (for α ≤ 1):

δt = αδx/

√

3. (8)

Additional Analysis Tasks

1. Create some 3D visualizations to demonstrate that your program is working correctly in a distributed

memory environment. You may consider plotting some 2D slices or 3D renderings. Use enough

processors to perform a small scalability study.

Hints

– These problems get large quickly (100003 = 100 ∗ 1000002, so keep n under 1001.

Option C: 2D Domain Decompositions

Additional Coding Tasks

1. Modify your code to use 2D domain decompositions instead of 1D domain decompositions. The

modified code should support arbitrary and different numbers of processors in the x and y dimensions.

You will need to modify your halo exchange to use separate communication buffers for each side and

dimension.

9

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Additional Analysis Tasks

1. Compare the run times for the n = 10001 and P = 256 using the following numbers of processors

in the x and y dimensions. The first configuration is equivalent to a 1D decomposition in the y

dimension, the last is equivalent to a 1D decomposition in the x dimension.

Px 1 4 16 64 256

Py 256 64 16 4 1

Which configuration performs best? Use your knowledge of the system and the problem to explain

why you think this is the case?

Hints

– Look into MPI’s Cartesian communicators for creating the 2D structure. For example, MPI Cart create,

MPI Cart coords, and MPI Cart shift may be useful.

10

CMDA 3634 SP2021 Parallelizing the Wave Equation with MPI Project 04

Development and Debugging Hints

Test the analysis tasks on small examples to be sure that they work prior to submitting the big test jobs.

Start by implementing the domain decomposition and testing it on your distributed form of the standing

wave evaluation. You have a functional array writer that you can use to test. You should get the same

output for P = 1, 2, 3, . . . .

Use the provided array writer to compare output to your MPI I/O based writer. These outputs should be

identical.

Your original array writer can be used to write out individual subdomains. This can be useful for debugging.

If you print debug output, print each worker’s rank first. You can sort by the rank to see each worker’s

execution path.

You can use the animation program to create small examples that can help you visualize if your domain

decomposition and halo exchange are working correctly. Look for artifacts around the subdomain boundaries.

After evaluating or updating a solution, ensure that the halo regions are properly updated.

Think about which subdomains have to handle the boundary conditions.

11

学霸联盟

- 留学生代写
- Python代写
- Java代写
- c/c++代写
- 数据库代写
- 算法代写
- 机器学习代写
- 数据挖掘代写
- 数据分析代写
- Android代写
- html代写
- 计算机网络代写
- 操作系统代写
- 计算机体系结构代写
- R代写
- 数学代写
- 金融作业代写
- 微观经济学代写
- 会计代写
- 统计代写
- 生物代写
- 物理代写
- 机械代写
- Assignment代写
- sql数据库代写
- analysis代写
- Haskell代写
- Linux代写
- Shell代写
- Diode Ideality Factor代写
- 宏观经济学代写
- 经济代写
- 计量经济代写
- math代写
- 金融统计代写
- 经济统计代写
- 概率论代写
- 代数代写
- 工程作业代写
- Databases代写
- 逻辑代写
- JavaScript代写
- Matlab代写
- Unity代写
- BigDate大数据代写
- 汇编代写
- stat代写
- scala代写
- OpenGL代写
- CS代写
- 程序代写
- 简答代写
- Excel代写
- Logisim代写
- 代码代写
- 手写题代写
- 电子工程代写
- 判断代写
- 论文代写
- stata代写
- witness代写
- statscloud代写
- 证明代写
- 非欧几何代写
- 理论代写
- http代写
- MySQL代写
- PHP代写
- 计算代写
- 考试代写
- 博弈论代写
- 英语代写
- essay代写
- 不限代写
- lingo代写
- 线性代数代写
- 文本处理代写
- 商科代写
- visual studio代写
- 光谱分析代写
- report代写
- GCP代写
- 无代写
- 电力系统代写
- refinitiv eikon代写
- 运筹学代写
- simulink代写
- 单片机代写
- GAMS代写
- 人力资源代写
- 报告代写
- SQLAlchemy代写
- Stufio代写
- sklearn代写
- 计算机架构代写
- 贝叶斯代写
- 以太坊代写
- 计算证明代写
- prolog代写
- 交互设计代写
- mips代写
- css代写
- 云计算代写
- dafny代写
- quiz考试代写
- js代写
- 密码学代写
- ml代写
- 水利工程基础代写
- 经济管理代写
- Rmarkdown代写
- 电路代写
- 质量管理画图代写
- sas代写
- 金融数学代写
- processing代写
- 预测分析代写
- 机械力学代写
- vhdl代写
- solidworks代写
- 不涉及代写
- 计算分析代写
- Netlogo代写
- openbugs代写
- 土木代写
- 国际金融专题代写
- 离散数学代写
- openssl代写
- 化学材料代写
- eview代写
- nlp代写
- Assembly language代写
- gproms代写
- studio代写
- robot analyse代写
- pytorch代写
- 证明题代写
- latex代写
- coq代写
- 市场营销论文代写
- 人力资论文代写
- weka代写
- 英文代写
- Minitab代写
- 航空代写
- webots代写
- Advanced Management Accounting代写
- Lunix代写
- 云基础代写
- 有限状态过程代写
- aws代写
- AI代写
- 图灵机代写
- Sociology代写
- 分析代写
- 经济开发代写
- Data代写
- jupyter代写
- 通信考试代写
- 网络安全代写
- 固体力学代写
- spss代写
- 无编程代写
- react代写
- Ocaml代写
- 期货期权代写
- Scheme代写
- 数学统计代写
- 信息安全代写
- Bloomberg代写
- 残疾与创新设计代写
- 历史代写
- 理论题代写
- cpu代写
- 计量代写
- Xpress-IVE代写
- 微积分代写
- 材料学代写
- 代写
- 会计信息系统代写
- 凸优化代写
- 投资代写
- F#代写
- C#代写
- arm代写
- 伪代码代写
- 白话代写
- IC集成电路代写
- reasoning代写
- agents代写
- 精算代写
- opencl代写
- Perl代写
- 图像处理代写
- 工程电磁场代写
- 时间序列代写
- 数据结构算法代写
- 网络基础代写
- 画图代写
- Marie代写
- ASP代写
- EViews代写
- Interval Temporal Logic代写
- ccgarch代写
- rmgarch代写
- jmp代写
- 选择填空代写
- mathematics代写
- winbugs代写
- maya代写
- Directx代写
- PPT代写
- 可视化代写
- 工程材料代写
- 环境代写
- abaqus代写
- 投资组合代写
- 选择题代写
- openmp.c代写
- cuda.cu代写
- 传感器基础代写
- 区块链比特币代写
- 土壤固结代写
- 电气代写
- 电子设计代写
- 主观题代写
- 金融微积代写
- ajax代写
- Risk theory代写
- tcp代写
- tableau代写
- mylab代写
- research paper代写
- 手写代写
- 管理代写
- paper代写
- 毕设代写
- 衍生品代写
- 学术论文代写
- 计算画图代写
- SPIM汇编代写
- 演讲稿代写
- 金融实证代写
- 环境化学代写
- 通信代写
- 股权市场代写
- 计算机逻辑代写
- Microsoft Visio代写
- 业务流程管理代写
- Spark代写
- USYD代写
- 数值分析代写
- 有限元代写
- 抽代代写
- 不限定代写
- IOS代写
- scikit-learn代写
- ts angular代写
- sml代写
- 管理决策分析代写
- vba代写
- 墨大代写
- erlang代写
- Azure代写
- 粒子物理代写
- 编译器代写
- socket代写
- 商业分析代写
- 财务报表分析代写
- Machine Learning代写
- 国际贸易代写
- code代写
- 流体力学代写
- 辅导代写
- 设计代写
- marketing代写
- web代写
- 计算机代写
- verilog代写
- 心理学代写
- 线性回归代写
- 高级数据分析代写
- clingo代写
- Mplab代写
- coventorware代写
- creo代写
- nosql代写
- 供应链代写
- uml代写
- 数字业务技术代写
- 数字业务管理代写
- 结构分析代写
- tf-idf代写
- 地理代写
- financial modeling代写
- quantlib代写
- 电力电子元件代写
- atenda 2D代写
- 宏观代写
- 媒体代写
- 政治代写
- 化学代写
- 随机过程代写
- self attension算法代写
- arm assembly代写
- wireshark代写
- openCV代写
- Uncertainty Quantificatio代写
- prolong代写
- IPYthon代写
- Digital system design 代写
- julia代写
- Advanced Geotechnical Engineering代写
- 回答问题代写
- junit代写
- solidty代写
- maple代写
- 光电技术代写
- 网页代写
- 网络分析代写
- ENVI代写
- gimp代写
- sfml代写
- 社会学代写
- simulationX solidwork代写
- unity 3D代写
- ansys代写
- react native代写
- Alloy代写
- Applied Matrix代写
- JMP PRO代写
- 微观代写
- 人类健康代写
- 市场代写
- proposal代写
- 软件代写
- 信息检索代写
- 商法代写
- 信号代写
- pycharm代写
- 金融风险管理代写
- 数据可视化代写
- fashion代写
- 加拿大代写
- 经济学代写
- Behavioural Finance代写
- cytoscape代写
- 推荐代写
- 金融经济代写
- optimization代写
- alteryxy代写
- tabluea代写
- sas viya代写
- ads代写
- 实时系统代写
- 药剂学代写
- os代写
- Mathematica代写
- Xcode代写
- Swift代写
- rattle代写
- 人工智能代写
- 流体代写
- 结构力学代写
- Communications代写
- 动物学代写
- 问答代写
- MiKTEX代写
- 图论代写
- 数据科学代写
- 计算机安全代写
- 日本历史代写
- gis代写
- rs代写
- 语言代写
- 电学代写
- flutter代写
- drat代写
- 澳洲代写
- 医药代写
- ox代写
- 营销代写
- pddl代写
- 工程项目代写
- archi代写
- Propositional Logic代写
- 国际财务管理代写
- 高宏代写
- 模型代写
- 润色代写
- 营养学论文代写
- 热力学代写
- Acct代写
- Data Synthesis代写
- 翻译代写
- 公司法代写
- 管理学代写
- 建筑学代写
- 生理课程代写
- 动画代写
- 高数代写