Book Introduction

Principles of Parallel Programming (English Edition) PDF ebook download

Principles of Parallel Programming (English Edition)
  • Publisher:
  • ISBN:
  • Publication date: unknown
  • Listed page count: 0
  • File size: 36 MB
  • File page count: 357
  • Subject headings:

PDF Download


Click here for the online PDF download of this book [recommended: cloud extraction, quick and convenient]. Direct PDF download, compatible with both mobile and PC.
Download archive [copy download link] Tip: please download with the BT client FDM. Software download page

Download Notes

Principles of Parallel Programming (English Edition), PDF ebook download

The downloaded file is a RAR archive. You will need extraction software to unpack it and obtain the PDF.

We recommend downloading with the BT client Free Download Manager (FDM for short), which is free, ad-free, and cross-platform. All resources on this site are packaged as BT torrents, so a dedicated BT client is required, such as BitComet, qBittorrent, or uTorrent. Xunlei is not recommended for now, because this site's resources are not yet widely seeded; once a resource becomes popular, Xunlei will work as well.

(The file page count should be greater than the listed page count, except for multi-volume sets.)

Note: every archive on this site has an extraction password. Click here to download the archive extraction tool.

Table of Contents

PART 1 Foundations 1

Chapter 1 Introduction 1

The Power and Potential of Parallelism 2

Parallelism, a Familiar Concept 2

Parallelism in Computer Programs 3

Multi-Core Computers, an Opportunity 4

Even More Opportunities to Use Parallel Hardware 5

Parallel Computing versus Distributed Computing 6

System Level Parallelism 6

Convenience of Parallel Abstractions 8

Examining Sequential and Parallel Programs 8

Parallelizing Compilers 8

A Paradigm Shift 9

Parallel Prefix Sum 13

Parallelism Using Multiple Instruction Streams 15

The Concept of a Thread 15

A Multithreaded Solution to Counting 3s 15

The Goals: Scalability and Performance Portability 25

Scalability 25

Performance Portability 26

Principles First 27

Chapter Summary 27

Historical Perspective 28

Exercises 28

Chapter 2 Understanding Parallel Computers 30

Balancing Machine Specifics with Portability 30

A Look at Six Parallel Computers 31

Chip Multiprocessors 31

Symmetric Multiprocessor Architectures 34

Heterogeneous Chip Designs 36

Clusters 39

Supercomputers 40

Observations from Our Six Parallel Computers 43

An Abstraction of a Sequential Computer 44

Applying the RAM Model 44

Evaluating the RAM Model 45

The PRAM: A Parallel Computer Model 46

The CTA: A Practical Parallel Computer Model 47

The CTA Model 47

Communication Latency 49

Properties of the CTA 52

Memory Reference Mechanisms 53

Shared Memory 53

One-Sided Communication 54

Message Passing 54

Memory Consistency Models 55

Programming Models 56

A Closer Look at Communication 57

Applying the CTA Model 58

Chapter Summary 59

Historical Perspective 59

Exercises 59

Chapter 3 Reasoning about Performance 61

Motivation and Basic Concepts 61

Parallelism versus Performance 61

Threads and Processes 62

Latency and Throughput 62

Sources of Performance Loss 64

Overhead 64

Non-Parallelizable Code 65

Contention 67

Idle Time 67

Parallel Structure 68

Dependences 68

Dependences Limit Parallelism 70

Granularity 72

Locality 73

Performance Trade-Offs 73

Communication versus Computation 74

Memory versus Parallelism 75

Overhead versus Parallelism 75

Measuring Performance 77

Execution Time 77

Speedup 78

Superlinear Speedup 78

Efficiency 79

Concerns with Speedup 79

Scaled Speedup versus Fixed-Size Speedup 81

Scalable Performance 81

Scalable Performance Is Difficult to Achieve 81

Implications for Hardware 82

Implications for Software 83

Scaling the Problem Size 83

Chapter Summary 84

Historical Perspective 84

Exercises 85

PART 2 Parallel Abstractions 87

Chapter 4 First Steps Toward Parallel Programming 88

Data and Task Parallelism 88

Definitions 88

Illustrating Data and Task Parallelism 89

The Peril-L Notation 89

Extending C 90

Parallel Threads 90

Synchronization and Coordination 91

Memory Model 92

Synchronized Memory 94

Reduce and Scan 95

The Reduce Abstraction 96

Count 3s Example 97

Formulating Parallelism 97

Fixed Parallelism 97

Unlimited Parallelism 98

Scalable Parallelism 99

Alphabetizing Example 100

Unlimited Parallelism 101

Fixed Parallelism 102

Scalable Parallelism 104

Comparing the Three Solutions 109

Chapter Summary 110

Historical Perspective 110

Exercises 110

Chapter 5 Scalable Algorithmic Techniques 112

Blocks of Independent Computation 112

Schwartz's Algorithm 113

The Reduce and Scan Abstractions 115

Example of Generalized Reduces and Scans 116

The Basic Structure 118

Structure for Generalized Reduce 119

Example of Components of a Generalized Scan 122

Applying the Generalized Scan 124

Generalized Vector Operations 125

Assigning Work to Processes Statically 125

Block Allocations 126

Overlap Regions 128

Cyclic and Block Cyclic Allocations 129

Irregular Allocations 132

Assigning Work to Processes Dynamically 134

Work Queues 134

Variations of Work Queues 137

Case Study: Concurrent Memory Allocation 137

Trees 139

Allocation by Sub-Tree 139

Dynamic Allocations 140

Chapter Summary 141

Historical Perspective 142

Exercises 142

PART 3 Parallel Programming Languages 143

Chapter 6 Programming with Threads 145

POSIX Threads 145

Thread Creation and Destruction 146

Mutual Exclusion 150

Synchronization 153

Safety Issues 163

Performance Issues 167

Case Study: Successive Over-Relaxation 174

Case Study: Overlapping Synchronization with Computation 179

Case Study: Streaming Computations on a Multi-Core Chip 187

Java Threads 187

Synchronized Methods 189

Synchronized Statements 189

The Count 3s Example 190

Volatile Memory 192

Atomic Objects 192

Lock Objects 193

Executors 193

Concurrent Collections 193

OpenMP 193

The Count 3s Example 194

Semantic Limitations on parallel for 195

Reduction 196

Thread Behavior and Interaction 197

Sections 199

Summary of OpenMP 199

Chapter Summary 200

Historical Perspective 200

Exercises 200

Chapter 7 MPI and Other Local View Languages 202

MPI: The Message Passing Interface 202

The Count 3s Example 203

Groups and Communicators 211

Point-to-Point Communication 212

Collective Communication 214

Example: Successive Over-Relaxation 219

Performance Issues 222

Safety Issues 228

Partitioned Global Address Space Languages 229

Co-Array Fortran 230

Unified Parallel C 231

Titanium 232

Chapter Summary 233

Historical Perspective 234

Exercises 234

Chapter 8 ZPL and Other Global View Languages 236

The ZPL Programming Language 236

Basic Concepts of ZPL 237

Regions 237

Array Computation 240

Life, an Example 242

The Problem 242

The Solution 242

How It Works 243

The Philosophy of Life 245

Distinguishing Features of ZPL 245

Regions 245

Statement-Level Indexing 245

Restrictions Imposed by Regions 246

Performance Model 246

Addition by Subtraction 247

Manipulating Arrays of Different Ranks 247

Partial Reduce 248

Flooding 249

The Flooding Principle 250

Data Manipulation, an Example 251

Flood Regions 252

Matrix Multiplication 253

Reordering Data with Remap 255

Index Arrays 255

Remap 256

Ordering Example 258

Parallel Execution of ZPL Programs 260

Role of the Compiler 260

Specifying the Number of Processes 261

Assigning Regions to Processes 261

Array Allocation 262

Scalar Allocation 263

Work Assignment 263

Performance Model 264

Applying the Performance Model: Life 265

Applying the Performance Model: SUMMA 266

Summary of the Performance Model 266

NESL Parallel Language 267

Language Concepts 267

Matrix Product Using Nested Parallelism 268

NESL Complexity Model 269

Chapter Summary 269

Historical Perspective 269

Exercises 270

Chapter 9 Assessing the State of the Art 271

Four Important Properties of Parallel Languages 271

Correctness 271

Performance 273

Scalability 274

Portability 274

Evaluating Existing Approaches 275

POSIX Threads 275

Java Threads 276

OpenMP 276

MPI 276

PGAS Languages 277

ZPL 278

NESL 278

Lessons for the Future 279

Hidden Parallelism 279

Transparent Performance 280

Locality 280

Constrained Parallelism 280

Implicit versus Explicit Parallelism 281

Chapter Summary 282

Historical Perspective 282

Exercises 282

PART 4 Looking Forward 283

Chapter 10 Future Directions in Parallel Programming 284

Attached Processors 284

Graphics Processing Units 285

Cell Processors 288

Attached Processors Summary 288

Grid Computing 290

Transactional Memory 291

Comparison with Locks 292

Implementation Issues 293

Open Research Issues 295

MapReduce 296

Problem Space Promotion 298

Emerging Languages 299

Chapel 300

Fortress 300

X10 302

Chapter Summary 304

Historical Perspective 304

Exercises 304

Chapter 11 Writing Parallel Programs 305

Getting Started 305

Access and Software 305

Hello, World 306

Parallel Programming Recommendations 307

Incremental Development 307

Focus on the Parallel Structure 307

Testing the Parallel Structure 308

Sequential Programming 309

Be Willing to Write Extra Code 309

Controlling Parameters during Testing 310

Functional Debugging 310

Capstone Project Ideas 311

Implementing Existing Parallel Algorithms 311

Competing with Standard Benchmarks 312

Developing New Parallel Computations 313

Performance Measurement 314

Comparing against a Sequential Solution 315

Maintaining a Fair Experimental Setting 315

Understanding Parallel Performance 316

Performance Analysis 317

Experimental Methodology 318

Portability and Tuning 319

Chapter Summary 319

Historical Perspective 319

Exercises 320

Glossary 321

References 325

Index 328
