Book Information

Hadoop: The Definitive Guide, English Edition, 4th Edition
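  • Author: White (US)
  • Publisher: Nanjing: Southeast University Press
  • ISBN: 9787564159177
  • Publication year: 2015
  • Listed page count: 730 pages
  • File size: 77 MB
  • File page count: 756 pages
  • Subject terms: data processing software, guide, English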

Table of Contents

Part Ⅰ.Hadoop Fundamentals 3

1.Meet Hadoop 3

Data! 3

Data Storage and Analysis 5

Querying All Your Data 6

Beyond Batch 7

Comparison with Other Systems 8

Relational Database Management Systems 8

Grid Computing 10

Volunteer Computing 11

A Brief History of Apache Hadoop 12

What's in This Book? 15

2.MapReduce 19

A Weather Dataset 19

Data Format 19

Analyzing the Data with Unix Tools 21

Analyzing the Data with Hadoop 22

Map and Reduce 22

Java MapReduce 24

Scaling Out 30

Data Flow 30

Combiner Functions 34

Running a Distributed MapReduce Job 37

Hadoop Streaming 37

Ruby 37

Python 40

3.The Hadoop Distributed Filesystem 43

The Design of HDFS 43

HDFS Concepts 45

Blocks 45

Namenodes and Datanodes 46

Block Caching 47

HDFS Federation 48

HDFS High Availability 48

The Command-Line Interface 50

Basic Filesystem Operations 51

Hadoop Filesystems 53

Interfaces 54

The Java Interface 56

Reading Data from a Hadoop URL 57

Reading Data Using the FileSystem API 58

Writing Data 61

Directories 63

Querying the Filesystem 63

Deleting Data 68

Data Flow 69

Anatomy of a File Read 69

Anatomy of a File Write 72

Coherency Model 74

Parallel Copying with distcp 76

Keeping an HDFS Cluster Balanced 77

4.YARN 79

Anatomy of a YARN Application Run 80

Resource Requests 81

Application Lifespan 82

Building YARN Applications 82

YARN Compared to MapReduce 1 83

Scheduling in YARN 85

Scheduler Options 86

Capacity Scheduler Configuration 88

Fair Scheduler Configuration 90

Delay Scheduling 94

Dominant Resource Fairness 95

Further Reading 96

5.Hadoop I/O 97

Data Integrity 97

Data Integrity in HDFS 98

LocalFileSystem 99

ChecksumFileSystem 99

Compression 100

Codecs 101

Compression and Input Splits 105

Using Compression in MapReduce 107

Serialization 109

The Writable Interface 110

Writable Classes 113

Implementing a Custom Writable 121

Serialization Frameworks 126

File-Based Data Structures 127

SequenceFile 127

MapFile 135

Other File Formats and Column-Oriented Formats 136

Part Ⅱ.MapReduce 141

6.Developing a MapReduce Application 141

The Configuration API 141

Combining Resources 143

Variable Expansion 143

Setting Up the Development Environment 144

Managing Configuration 146

GenericOptionsParser, Tool, and ToolRunner 148

Writing a Unit Test with MRUnit 152

Mapper 153

Reducer 156

Running Locally on Test Data 156

Running a Job in a Local Job Runner 157

Testing the Driver 158

Running on a Cluster 160

Packaging a Job 160

Launching a Job 162

The MapReduce Web UI 165

Retrieving the Results 167

Debugging a Job 168

Hadoop Logs 172

Remote Debugging 174

Tuning a Job 175

Profiling Tasks 175

MapReduce Workflows 177

Decomposing a Problem into MapReduce Jobs 177

JobControl 178

Apache Oozie 179

7.How MapReduce Works 185

Anatomy of a MapReduce Job Run 185

Job Submission 186

Job Initialization 187

Task Assignment 188

Task Execution 189

Progress and Status Updates 190

Job Completion 192

Failures 193

Task Failure 193

Application Master Failure 194

Node Manager Failure 195

Resource Manager Failure 196

Shuffle and Sort 197

The Map Side 197

The Reduce Side 198

Configuration Tuning 201

Task Execution 203

The Task Execution Environment 203

Speculative Execution 204

Output Committers 206

8.MapReduce Types and Formats 209

MapReduce Types 209

The Default MapReduce Job 214

Input Formats 220

Input Splits and Records 220

Text Input 232

Binary Input 236

Multiple Inputs 237

Database Input (and Output) 238

Output Formats 238

Text Output 239

Binary Output 239

Multiple Outputs 240

Lazy Output 245

Database Output 245

9.MapReduce Features 247

Counters 247

Built-in Counters 247

User-Defined Java Counters 251

User-Defined Streaming Counters 255

Sorting 255

Preparation 256

Partial Sort 257

Total Sort 259

Secondary Sort 262

Joins 268

Map-Side Joins 269

Reduce-Side Joins 270

Side Data Distribution 273

Using the Job Configuration 273

Distributed Cache 274

MapReduce Library Classes 279

Part Ⅲ.Hadoop Operations 283

10.Setting Up a Hadoop Cluster 283

Cluster Specification 284

Cluster Sizing 285

Network Topology 286

Cluster Setup and Installation 288

Installing Java 288

Creating Unix User Accounts 288

Installing Hadoop 289

Configuring SSH 289

Configuring Hadoop 290

Formatting the HDFS Filesystem 290

Starting and Stopping the Daemons 290

Creating User Directories 292

Hadoop Configuration 292

Configuration Management 293

Environment Settings 294

Important Hadoop Daemon Properties 296

Hadoop Daemon Addresses and Ports 304

Other Hadoop Properties 307

Security 309

Kerberos and Hadoop 309

Delegation Tokens 312

Other Security Enhancements 313

Benchmarking a Hadoop Cluster 314

Hadoop Benchmarks 314

User Jobs 316

11.Administering Hadoop 317

HDFS 317

Persistent Data Structures 317

Safe Mode 322

Audit Logging 324

Tools 325

Monitoring 330

Logging 330

Metrics and JMX 331

Maintenance 332

Routine Administration Procedures 332

Commissioning and Decommissioning Nodes 334

Upgrades 337

Part Ⅳ.Related Projects 345

12.Avro 345

Avro Data Types and Schemas 346

In-Memory Serialization and Deserialization 349

The Specific API 351

Avro Datafiles 352

Interoperability 354

Python API 354

Avro Tools 355

Schema Resolution 355

Sort Order 358

Avro MapReduce 359

Sorting Using Avro MapReduce 363

Avro in Other Languages 365

13.Parquet 367

Data Model 368

Nested Encoding 370

Parquet File Format 370

Parquet Configuration 372

Writing and Reading Parquet Files 373

Avro, Protocol Buffers, and Thrift 375

Parquet MapReduce 377

14.Flume 381

Installing Flume 381

An Example 382

Transactions and Reliability 384

Batching 385

The HDFS Sink 385

Partitioning and Interceptors 387

File Formats 387

Fan Out 388

Delivery Guarantees 389

Replicating and Multiplexing Selectors 390

Distribution: Agent Tiers 390

Delivery Guarantees 393

Sink Groups 395

Integrating Flume with Applications 398

Component Catalog 399

Further Reading 400

15.Sqoop 401

Getting Sqoop 401

Sqoop Connectors 403

A Sample Import 404

Text and Binary File Formats 406

Generated Code 407

Additional Serialization Systems 408

Imports: A Deeper Look 408

Controlling the Import 410

Imports and Consistency 411

Incremental Imports 411

Direct-Mode Imports 411

Working with Imported Data 412

Imported Data and Hive 413

Importing Large Objects 415

Performing an Export 417

Exports: A Deeper Look 419

Exports and Transactionality 420

Exports and SequenceFiles 421

Further Reading 422

16.Pig 423

Installing and Running Pig 424

Execution Types 424

Running Pig Programs 426

Grunt 426

Pig Latin Editors 427

An Example 427

Generating Examples 429

Comparison with Databases 430

Pig Latin 432

Structure 432

Statements 433

Expressions 438

Types 439

Schemas 441

Functions 445

Macros 447

User-Defined Functions 448

A Filter UDF 448

An Eval UDF 452

A Load UDF 453

Data Processing Operators 457

Loading and Storing Data 457

Filtering Data 457

Grouping and Joining Data 459

Sorting Data 465

Combining and Splitting Data 466

Pig in Practice 467

Parallelism 467

Anonymous Relations 467

Parameter Substitution 468

Further Reading 469

17.Hive 471

Installing Hive 472

The Hive Shell 473

An Example 474

Running Hive 475

Configuring Hive 475

Hive Services 478

The Metastore 480

Comparison with Traditional Databases 482

Schema on Read Versus Schema on Write 482

Updates, Transactions, and Indexes 483

SQL-on-Hadoop Alternatives 484

HiveQL 485

Data Types 486

Operators and Functions 488

Tables 489

Managed Tables and External Tables 490

Partitions and Buckets 491

Storage Formats 496

Importing Data 500

Altering Tables 502

Dropping Tables 502

Querying Data 503

Sorting and Aggregating 503

MapReduce Scripts 503

Joins 505

Subqueries 508

Views 509

User-Defined Functions 510

Writing a UDF 511

Writing a UDAF 513

Further Reading 518

18.Crunch 519

An Example 520

The Core Crunch API 523

Primitive Operations 523

Types 528

Sources and Targets 531

Functions 533

Materialization 535

Pipeline Execution 538

Running a Pipeline 538

Stopping a Pipeline 539

Inspecting a Crunch Plan 540

Iterative Algorithms 543

Checkpointing a Pipeline 545

Crunch Libraries 545

Further Reading 548

19.Spark 549

Installing Spark 550

An Example 550

Spark Applications, Jobs, Stages, and Tasks 552

A Scala Standalone Application 552

A Java Example 554

A Python Example 555

Resilient Distributed Datasets 556

Creation 556

Transformations and Actions 557

Persistence 560

Serialization 562

Shared Variables 564

Broadcast Variables 564

Accumulators 564

Anatomy of a Spark Job Run 565

Job Submission 565

DAG Construction 566

Task Scheduling 569

Task Execution 570

Executors and Cluster Managers 570

Spark on YARN 571

Further Reading 574

20.HBase 575

HBasics 575

Backdrop 576

Concepts 576

Whirlwind Tour of the Data Model 576

Implementation 578

Installation 581

Test Drive 582

Clients 584

Java 584

MapReduce 587

REST and Thrift 589

Building an Online Query Application 589

Schema Design 590

Loading Data 591

Online Queries 594

HBase Versus RDBMS 597

Successful Service 598

HBase 599

Praxis 600

HDFS 600

UI 601

Metrics 601

Counters 601

Further Reading 601

21.ZooKeeper 603

Installing and Running ZooKeeper 604

An Example 606

Group Membership in ZooKeeper 606

Creating the Group 607

Joining a Group 609

Listing Members in a Group 610

Deleting a Group 612

The ZooKeeper Service 613

Data Model 614

Operations 616

Implementation 620

Consistency 622

Sessions 624

States 625

Building Applications with ZooKeeper 627

A Configuration Service 627

The Resilient ZooKeeper Application 630

A Lock Service 634

More Distributed Data Structures and Protocols 636

ZooKeeper in Production 637

Resilience and Performance 637

Configuration 639

Further Reading 640

Part Ⅴ.Case Studies 643

22.Composable Data at Cerner 643

From CPUs to Semantic Integration 643

Enter Apache Crunch 644

Building a Complete Picture 644

Integrating Healthcare Data 647

Composability over Frameworks 650

Moving Forward 651

23.Biological Data Science: Saving Lives with Software 653

The Structure of DNA 655

The Genetic Code: Turning DNA Letters into Proteins 656

Thinking of DNA as Source Code 657

The Human Genome Project and Reference Genomes 659

Sequencing and Aligning DNA 660

ADAM, A Scalable Genome Analysis Platform 661

Literate programming with the Avro interface description language (IDL) 662

Column-oriented access with Parquet 663

A simple example: k-mer counting using Spark and ADAM 665

From Personalized Ads to Personalized Medicine 667

Join In 668

24.Cascading 669

Fields, Tuples, and Pipes 670

Operations 673

Taps, Schemes, and Flows 675

Cascading in Practice 676

Flexibility 679

Hadoop and Cascading at ShareThis 680

Summary 684

A.Installing Apache Hadoop 685

B.Cloudera's Distribution Including Apache Hadoop 691

C.Preparing the NCDC Weather Data 693

D.The Old and New Java MapReduce APIs 697

Index 701
