Book Introduction

Machine Learning with R, Second Edition (Reprint Edition) - PDF ebook download

Machine Learning with R, Second Edition (Reprint Edition)
  • Author: Brett Lantz
  • Publisher: Southeast University Press, Nanjing
  • ISBN: 9787564170714
  • Publication year: 2017
  • Listed page count: 427
  • File size: 61 MB
  • File page count: 448
  • Subjects: programming languages - programming - English

PDF Download



Download Notes


The download is a RAR archive. Use decompression software to extract it and obtain the PDF.

We recommend downloading with Free Download Manager (FDM), a free, ad-free, cross-platform BitTorrent client. All resources on this site are packaged as BT seeds, so a dedicated BitTorrent client is required, such as BitComet, qBittorrent, or uTorrent. Xunlei (Thunder) is currently not recommended because this site's resources are not yet popular; once a resource becomes popular, Xunlei will work as well.

(The file page count should be greater than the listed page count, except for multi-volume ebooks.)

Note: all archives on this site are protected by an extraction password; the archive extraction tool can be downloaded from the site.

Table of Contents

Chapter 1: Introducing Machine Learning 1

The origins of machine learning 2

Uses and abuses of machine learning 4

Machine learning successes 5

The limits of machine learning 5

Machine learning ethics 7

How machines learn 9

Data storage 10

Abstraction 11

Generalization 13

Evaluation 14

Machine learning in practice 16

Types of input data 17

Types of machine learning algorithms 19

Matching input data to algorithms 21

Machine learning with R 22

Installing R packages 23

Loading and unloading R packages 24

Summary 25

Chapter 2: Managing and Understanding Data 27

R data structures 28

Vectors 28

Factors 30

Lists 32

Data frames 35

Matrixes and arrays 37

Managing data with R 39

Saving, loading, and removing R data structures 39

Importing and saving data from CSV files 41

Exploring and understanding data 42

Exploring the structure of data 43

Exploring numeric variables 44

Measuring the central tendency-mean and median 45

Measuring spread-quartiles and the five-number summary 47

Visualizing numeric variables-boxplots 49

Visualizing numeric variables-histograms 51

Understanding numeric data-uniform and normal distributions 53

Measuring spread-variance and standard deviation 54

Exploring categorical variables 56

Measuring the central tendency-the mode 58

Exploring relationships between variables 59

Visualizing relationships-scatterplots 59

Examining relationships-two-way cross-tabulations 61

Summary 64

Chapter 3: Lazy Learning-Classification Using Nearest Neighbors 65

Understanding nearest neighbor classification 66

The k-NN algorithm 66

Measuring similarity with distance 69

Choosing an appropriate k 70

Preparing data for use with k-NN 72

Why is the k-NN algorithm lazy? 74

Example-diagnosing breast cancer with the k-NN algorithm 75

Step 1-collecting data 76

Step 2-exploring and preparing the data 77

Transformation-normalizing numeric data 79

Data preparation-creating training and test datasets 80

Step 3-training a model on the data 81

Step 4-evaluating model performance 83

Step 5-improving model performance 84

Transformation-z-score standardization 85

Testing alternative values of k 86

Summary 87

Chapter 4: Probabilistic Learning-Classification Using Naive Bayes 89

Understanding Naive Bayes 90

Basic concepts of Bayesian methods 90

Understanding probability 91

Understanding joint probability 92

Computing conditional probability with Bayes' theorem 94

The Naive Bayes algorithm 97

Classification with Naive Bayes 98

The Laplace estimator 100

Using numeric features with Naive Bayes 102

Example-filtering mobile phone spam with the Naive Bayes algorithm 103

Step 1-collecting data 104

Step 2-exploring and preparing the data 105

Data preparation-cleaning and standardizing text data 106

Data preparation-splitting text documents into words 112

Data preparation-creating training and test datasets 115

Visualizing text data-word clouds 116

Data preparation-creating indicator features for frequent words 119

Step 3-training a model on the data 121

Step 4-evaluating model performance 122

Step 5-improving model performance 123

Summary 124

Chapter 5: Divide and Conquer-Classification Using Decision Trees and Rules 125

Understanding decision trees 126

Divide and conquer 127

The C5.0 decision tree algorithm 131

Choosing the best split 133

Pruning the decision tree 135

Example-identifying risky bank loans using C5.0 decision trees 136

Step 1-collecting data 136

Step 2-exploring and preparing the data 137

Data preparation-creating random training and test datasets 138

Step 3-training a model on the data 140

Step 4-evaluating model performance 144

Step 5-improving model performance 145

Boosting the accuracy of decision trees 145

Making some mistakes more costly than others 147

Understanding classification rules 149

Separate and conquer 150

The 1R algorithm 153

The RIPPER algorithm 155

Rules from decision trees 157

What makes trees and rules greedy? 158

Example-identifying poisonous mushrooms with rule learners 160

Step 1-collecting data 160

Step 2-exploring and preparing the data 161

Step 3-training a model on the data 162

Step 4-evaluating model performance 165

Step 5-improving model performance 166

Summary 169

Chapter 6: Forecasting Numeric Data-Regression Methods 171

Understanding regression 172

Simple linear regression 174

Ordinary least squares estimation 177

Correlations 179

Multiple linear regression 181

Example-predicting medical expenses using linear regression 186

Step 1-collecting data 186

Step 2-exploring and preparing the data 187

Exploring relationships among features-the correlation matrix 189

Visualizing relationships among features-the scatterplot matrix 190

Step 3-training a model on the data 193

Step 4-evaluating model performance 196

Step 5-improving model performance 197

Model specification-adding non-linear relationships 198

Transformation-converting a numeric variable to a binary indicator 198

Model specification-adding interaction effects 199

Putting it all together-an improved regression model 200

Understanding regression trees and model trees 201

Adding regression to trees 202

Example-estimating the quality of wines with regression trees and model trees 205

Step 1-collecting data 205

Step 2-exploring and preparing the data 206

Step 3-training a model on the data 208

Visualizing decision trees 210

Step 4-evaluating model performance 212

Measuring performance with the mean absolute error 213

Step 5-improving model performance 214

Summary 218

Chapter 7: Black Box Methods-Neural Networks and Support Vector Machines 219

Understanding neural networks 220

From biological to artificial neurons 221

Activation functions 223

Network topology 225

The number of layers 226

The direction of information travel 227

The number of nodes in each layer 228

Training neural networks with backpropagation 229

Example-Modeling the strength of concrete with ANNs 231

Step 1-collecting data 232

Step 2-exploring and preparing the data 232

Step 3-training a model on the data 234

Step 4-evaluating model performance 237

Step 5-improving model performance 238

Understanding Support Vector Machines 239

Classification with hyperplanes 240

The case of linearly separable data 242

The case of nonlinearly separable data 244

Using kernels for non-linear spaces 245

Example-performing OCR with SVMs 248

Step 1-collecting data 249

Step 2-exploring and preparing the data 250

Step 3-training a model on the data 252

Step 4-evaluating model performance 254

Step 5-improving model performance 256

Chapter 8: Finding Patterns-Market Basket Analysis Using Association Rules 259

Understanding association rules 260

The Apriori algorithm for association rule learning 261

Measuring rule interest-support and confidence 263

Building a set of rules with the Apriori principle 265

Example-identifying frequently purchased groceries with association rules 266

Step 1-collecting data 266

Step 2-exploring and preparing the data 267

Data preparation-creating a sparse matrix for transaction data 268

Visualizing item support-item frequency plots 272

Visualizing the transaction data-plotting the sparse matrix 273

Step 3-training a model on the data 274

Step 4-evaluating model performance 277

Step 5-improving model performance 280

Sorting the set of association rules 280

Taking subsets of association rules 281

Saving association rules to a file or data frame 283

Summary 284

Chapter 9: Finding Groups of Data-Clustering with k-means 285

Understanding clustering 286

Clustering as a machine learning task 286

The k-means clustering algorithm 289

Using distance to assign and update clusters 290

Choosing the appropriate number of clusters 294

Example-finding teen market segments using k-means clustering 296

Step 1-collecting data 297

Step 2-exploring and preparing the data 297

Data preparation-dummy coding missing values 299

Data preparation-imputing the missing values 300

Step 3-training a model on the data 302

Step 4-evaluating model performance 304

Step 5-improving model performance 308

Summary 310

Chapter 10: Evaluating Model Performance 311

Measuring performance for classification 312

Working with classification prediction data in R 313

A closer look at confusion matrices 317

Using confusion matrices to measure performance 319

Beyond accuracy-other measures of performance 321

The kappa statistic 323

Sensitivity and specificity 326

Precision and recall 328

The F-measure 330

Visualizing performance trade-offs 331

ROC curves 332

Estimating future performance 336

The holdout method 336

Cross-validation 340

Bootstrap sampling 343

Summary 344

Chapter 11: Improving Model Performance 347

Tuning stock models for better performance 348

Using caret for automated parameter tuning 349

Creating a simple tuned model 352

Customizing the tuning process 355

Improving model performance with meta-learning 359

Understanding ensembles 359

Bagging 362

Boosting 366

Random forests 369

Training random forests 370

Evaluating random forest performance 373

Summary 375

Chapter 12: Specialized Machine Learning Topics 377

Working with proprietary files and databases 378

Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files 378

Querying data in SQL databases 379

Working with online data and services 381

Downloading the complete text of web pages 382

Scraping data from web pages 383

Parsing XML documents 387

Parsing JSON from web APIs 388

Working with domain-specific data 392

Analyzing bioinformatics data 393

Analyzing and visualizing network data 393

Improving the performance of R 398

Managing very large datasets 398

Generalizing tabular data structures with dplyr 399

Making data frames faster with data.table 401

Creating disk-based data frames with ff 402

Using massive matrices with bigmemory 404

Learning faster with parallel computing 404

Measuring execution time 406

Working in parallel with multicore and snow 406

Taking advantage of parallel with foreach and doParallel 410

Parallel cloud computing with MapReduce and Hadoop 411

GPU computing 412

Deploying optimized learning algorithms 413

Building bigger regression models with biglm 414

Growing bigger and faster random forests with bigrf 414

Training and evaluating models in parallel with caret 414

Summary 416

Index 417
