



Introduction to Modeling Parallelism with Intel® Advisor XE



# Data-Driven Threading Design

Intel® Advisor XE – Threading Prototyping Tool for Architects

Have you:

- Tried threading an app, but seen little performance benefit?
- Hit a "scalability barrier", where performance gains level off as you add cores?
- Delayed a release that adds threading because of synchronization errors?

Breakthrough for threading design:

- Quickly prototype multiple options
- Project scaling on larger systems
- Find synchronization errors before implementing threading
- Separate design and implementation; design without disrupting development

Add Parallelism with Less Effort, Less Risk and More Impact





### Amdahl's Law

- (paraphrased) "The benefit from parallelism is limited by the computation which remains serial"
- If you perfectly execute ½ of your application in parallel, you will achieve < 2x speedup
- The implication of this is that you must focus your attention where your application spends its time
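
The "< 2x" figure can be checked with the usual statement of Amdahl's Law. The symbols here are assumptions introduced for illustration and do not appear on the slide: $p$ is the fraction of run time that executes perfectly in parallel, and $N$ is the number of cores.

$$
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}
$$

With $p = \tfrac{1}{2}$:

$$
S(N) = \frac{1}{\tfrac{1}{2} + \tfrac{1}{2N}} = \frac{2N}{N + 1} < 2,
\qquad \text{for example } S(4) = \frac{8}{5} = 1.6
$$

However many cores are added, the serial half caps the speedup below 2x, which is why the hotspots deserve the attention.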





### Agenda

- Survey
- Add Annotations
- Model Performance Suitability
- Check Correctness
- Add Parallel Framework
- Conclusion





### Intel® Advisor XE Workflow

Transforming many serial algorithms into parallel form takes 5 easy high-level steps:

1. Survey and Summary tools: where to add parallelism
2. **Annotations**: experiment with parallel program structure
3. Suitability tool: predict and model program scalability
4. **Correctness tool**: discover potential synchronization problems
5. Manually convert annotations to parallel framework API (with a little help from Annotations/Summary)

#### Advisor XE Workflow



#### 1. Survey Target

Where should I consider adding parallelism? Locate the loops and functions where your program spends its time, and functions that call them.



Collect Survey Data



View Survey Result



#### 2. Annotate Sources

Add Intel Advisor XE annotations to identify possible parallel tasks and their enclosing parallel sites.

Steps to annotate



View Annotations



#### 3. Check Suitability

Analyze the annotated program to check its predicted parallel performance.



Collect Suitability Data



View Suitability Result



#### 4. Check Correctness

Predict parallel data sharing problems for the annotated tasks. Fix the reported sharing problems.



Collect Correctness Data



View Correctness Result



#### 5. Add Parallel Framework

Steps to replace annotations



View Summary






### Intel® Advisor XE Workflow

- Advisor XE guides you through these 5 steps
  - It provides assisting tools
  - There is no auto-parallelization
- You model and evaluate the potential return on parallelization investments on your serial program



(Advisor XE toolbar)










## Intel® Advisor XE: Advantages of Advisor XE modeling

#### Serial modeling benefits:

1. Your application can't fail due to bugs caused by incorrect parallel execution (it's running serially)
2. You can easily experiment with several different proposals before committing to a specific implementation
3. All of your test suites should still pass when validating the correctness of your transformations

AND you can use Advisor XE on partially or completely parallelized code.





#### Advantages of Advisor XE modeling

Advisor XE modeling avoids the major design mistakes:

1. Measure performance, focus on hotspots.
2. Predict scalability, load balancing and overheads.
3. Predict data races.

Automated analysis catches cases people miss.

Making good decisions early saves time.



Advisor XE increases parallelization ROI



### Project Set up





### Step 1: Survey Target







### Survey Report



### Drill down to Source Code in the hotspot



### **Advisor XE Annotation Concepts**

Advisor XE uses 3 primary concepts to create a model:

#### SITE

A region of code in your application that you want to transform into parallel code

#### TASK

A region of code in a SITE that you want to execute in parallel with the rest of the code in the SITE

#### LOCK

A region of code in a TASK that must be serialized

#### **NOTE**

- All of these regions may be nested
- You may create more than one SITE
- The annotations are just macros, so they work with any C/C++ compiler
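
As a minimal sketch of how SITE, TASK and LOCK fit together, the function below marks a loop as a site, each iteration as a task, and the update of a shared counter as a locked region. The names `count_even`, `count_site`, `count_task` and the `USE_ADVISOR` switch are hypothetical and introduced only for illustration. The lock macros are taken from the Advisor XE annotation header (`advisor-annotate.h`); when that header is not available, the no-op stand-ins keep the sketch compilable as ordinary serial C.

```c
#include <stddef.h>

/* Assumption: define USE_ADVISOR and add the Advisor XE include directory
   to use the real macros; otherwise fall back to no-op stand-ins. */
#ifdef USE_ADVISOR
#include "advisor-annotate.h"
#else
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_ITERATION_TASK(name)
#define ANNOTATE_LOCK_ACQUIRE(addr)
#define ANNOTATE_LOCK_RELEASE(addr)
#endif

/* Hypothetical helper: counts the even values in data[0..n). */
size_t count_even(const int *data, size_t n)
{
    size_t count = 0;   /* updated by every iteration, so it is shared */
    size_t i;

    ANNOTATE_SITE_BEGIN(count_site);            /* SITE: region to parallelize */
    for (i = 0; i < n; i++) {
        ANNOTATE_ITERATION_TASK(count_task);    /* TASK: one loop iteration */
        if (data[i] % 2 == 0) {
            ANNOTATE_LOCK_ACQUIRE(&count);      /* LOCK: serialized region */
            count++;
            ANNOTATE_LOCK_RELEASE(&count);
        }
    }
    ANNOTATE_SITE_END();
    return count;
}
```

The program still runs serially and returns the same result with or without the annotations; they only describe the proposed parallel structure to the tool.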







### **Candidate: Add Annotation**

48% Work::start

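```
#ifdef _WIN32   /* assumed guard; the original condition is cut off */
DWORD WINAPI work(void *pArg) {
#else
void *work(void *pArg) {
#endif
    int j = 0, i = 0;
    int tid = (int) pArg;   /* thread index passed through the argument pointer */
    for (j = 0; j < ITERATIONS; j++) {
        /* each thread handles every NUM_PROCS-th element, starting at tid */
        for (i = tid; i < MAXSIZE; i += NUM_PROCS) {
            a[i] = i + a[i] * b[i];
            localSum[tid] += a[i];
        }
    }
    return 0;
}
```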

(Editor context menu: Intel Advisor XE 2015 > Annotation Wizard, Annotate Site, Annotate Iteration Task)

```
void *work(void *pArg) {
    int j = 0, i = 0;
    int tid = (int) pArg;
    ANNOTATE_SITE_BEGIN( MySite5 );
    for (j = 0; j < ITERATIONS; j++) {
        ANNOTATE_ITERATION_TASK( MyTask9 );
        for (i = tid; i < MAXSIZE; i += NUM_PROCS) {
            a[i] = i + a[i] * b[i];
            localSum[tid] += a[i];
        }
    }
    ANNOTATE_SITE_END();
    return 0;
}
```

### Step 3: Check Suitability



**Scalability Graph** 




## Adjustable: Target architecture, threading model and number of CPUs



To set up data collection, determine the target architecture, threading model and number of CPUs.

Collect the Scalability data, and determine how it differs between the architectures and threading models.





### After collecting data, change your view







### Lab 1: Survey, Annotate, Suitability



### Review of Lab 1: Survey, Annotate, Suitability

What would be the best target architecture for this code?

Does the loop scale well?

At how many iterations would you want to use Xeon Phi coprocessors instead of CPUs?

Does the threading model make a difference in the scalability?



#### Correctness Simulation

#### Find data sharing problems prior to implementation

- Data Sharing
- Data races
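
To make the idea of a data sharing problem concrete, here is a small sketch in plain serial C. It is not taken from the lab code: `NUM_TASKS`, `sum_shared` and `sum_private` are hypothetical names, and the outer loop over `t` only stands in for tasks that a parallel version would run concurrently. The first function has every task update one shared variable, which is the kind of pattern that becomes a data race once the tasks really run in parallel. The second gives each task its own slot and combines the slots afterwards.

```c
#include <stddef.h>

#define NUM_TASKS 4  /* hypothetical task count, for illustration only */

/* Every modeled task updates the same variable. Run serially this is fine,
   but if the tasks ran in parallel the unsynchronized += would be a data race. */
double sum_shared(const double *x, size_t n)
{
    double total = 0.0;                       /* shared across all tasks */
    for (size_t t = 0; t < NUM_TASKS; t++) {  /* each t stands for one task */
        for (size_t i = t; i < n; i += NUM_TASKS) {
            total += x[i];
        }
    }
    return total;
}

/* Each task writes only its own slot, so tasks no longer share a write target.
   The partial results are combined after all tasks have finished. */
double sum_private(const double *x, size_t n)
{
    double partial[NUM_TASKS] = {0.0};
    double total = 0.0;
    for (size_t t = 0; t < NUM_TASKS; t++) {
        for (size_t i = t; i < n; i += NUM_TASKS) {
            partial[t] += x[i];
        }
    }
    for (size_t t = 0; t < NUM_TASKS; t++) {
        total += partial[t];                  /* serial reduction step */
    }
    return total;
}
```

Both functions return the same value when run serially, which is why this class of problem is easy to miss without a tool that models the parallel execution.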

#### Intel Advisor XE provides a list of errors:

- Shows a snippet of the code at all the related code locations
- Correctness analysis watches the annotated sites for data sharing problems



### Correctness Analysis – Set up

- Build a target executable using a debug configuration
- Choose a reduced data set that allows execution of all control paths
  - Reduce the data set size: divide by 8
  - 258x84 executes in about 1 minute
  - Be sure to thoroughly execute control paths
- Annotate your program
  - Correctness only looks at annotated sites







#### **Check Correctness**

Recompile using Debug

Execute your program using



Execution will take longer because the code runs as a debug build with the annotations in place.







### Correctness Report



Analyze your annotations to see if you made a correct choice





### Drill down to source code to get more information







### Fix the issues shown in Advisor and then Repeat...

You do not have to choose the perfect answer the first time; you can go back and modify your choices.

#### Iterative refinement will either

- Create a suitable and correct annotation proposal
- Conclude that no viable sites exist

Efficiently arriving at either answer is valuable





### Lab 2: Check Correctness

1. Reduce the size of the data set
2. Recompile with Debug
3. Start a project
4. Set up project properties
5. Check correctness

### Add Parallel Framework
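
The slides do not name a specific parallel framework, so the following is only a sketch that assumes OpenMP as the target. It shows how an annotated site with one task per loop iteration might map onto a parallel loop, and how an update to a shared accumulator might map onto a reduction. The function names `update` and `total` are hypothetical; the loop body reuses the expression from the annotated example.

```c
/* Assumed mapping: the site plus its iteration task become one parallel loop.
   Iterations are independent because each one touches only a[i] and b[i]. */
void update(double *a, const double *b, int n)
{
    int i;
    /* stands in for ANNOTATE_SITE_BEGIN / ANNOTATE_ITERATION_TASK / ANNOTATE_SITE_END */
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        a[i] = i + a[i] * b[i];
    }
}

/* Assumed mapping: a shared accumulator becomes an OpenMP reduction,
   so each thread sums privately and the results are combined at the end. */
double total(const double *a, int n)
{
    double sum = 0.0;
    int i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same results, which makes it easy to compare against the annotated serial version.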







### Summary

#### Intel Advisor XE is a unique tool

- helps you work smarter through detailed modeling
- guides you through the necessary steps
- leaves you in control of code and architectural choices
- lets you transform serial algorithms into parallel form faster

#### The parallel modeling methodology

- maintains your original application's semantics and behavior
- helps find the natural opportunities to exploit parallel execution





### Top 10 Questions

#### 1. Why do we not just insert the correct code at the push of a button?

Answer: the technology for parallelism is not quite there. See slide 5.

#### 2. Do you run on the Xeon Phi coprocessor?

Answer: no. We run on Xeon, then use heuristics and what we know about the data size of the operations to model how the code would behave if run natively on Xeon Phi or offloaded to Xeon Phi.

#### 3. How do Advisor and Amplifier results compare?


### Intel® Parallel Studio XE

Intel® Advisor XE is part of Intel® Parallel Studio XE 2015.

Please download and try today!

http://software.intel.com/en-us/intel-advisor-xe





### Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright ©, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.

#### **Optimization Notice**

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804





