
Stat 432 Homework 11


## Instructions

- You are required to submit two files:
  - Your `.Rmd` RMarkdown (or Python) file, which should be saved as `HWx_yourNetID.Rmd`. For example, `HW1_rqzhu.Rmd`.
  - The result of knitting your RMarkdown file, saved as `HW1_yourNetID.pdf`. For example, `HW1_rqzhu.pdf`. Please note that this must be a `.pdf` file; the `.html` format cannot be accepted.

- Include your Name and NetID in your report.
- If you use this file or the example homework `.Rmd` file as a template, be sure to remove this instruction section.
- Your `.Rmd` file should be written such that, if it is placed in a folder with any data you use, it will knit properly without modification.
- Make sure that you set the seed properly so that the results can be replicated.
- For some questions, there will be restrictions on what packages you can use. Please read the requirements carefully.

## Question 1 [100 Points] Boosting

We will again use the handwritten digit recognition data from the `ElemStatLearn` package. We only consider the pre-defined train-test split given by `zip.train` and `zip.test`, and we again restrict to the two digits 2 and 4. We will use cross-validation on the training data to choose the best tuning, then evaluate the final model on the testing data. For this question, use the `xgboost` package (available in both R and Python), which is a fast implementation of the boosting algorithm. For example, `xgb.cv()` implements a cross-validated version and can be used to tune parameters. For more details, read the documentation of the package.

When completing this question, you must consider the following:

* This is a classification problem, so you need to specify the appropriate model for this question.
* Use two different base learners: linear and tree. For the tree base learner, you should tune the maximum depth (choose two different values). These tuning parameters should be specified using the argument `params`, which should be a list. You may find [this document] useful.
* Also tune the learning rate by choosing three different values.
* Another thing you need to consider is the criterion for selecting the best tuning. We normally use the misclassification error as the criterion. However, for this question, let's use the AUC criterion, which we practiced before. Again, this can be specified in the `xgb.cv()` function.

After fitting the model, report the prediction accuracy of your final model on the testing data.

```{r}
# Handwritten Digit Recognition Data
library(ElemStatLearn)

# this is the training data!
dim(zip.test)
train = zip.test

# this is the testing data!
dim(zip.train)
test = zip.train
```
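One possible workflow is sketched below, assuming `train` and `test` from the chunk above. The parameter values shown (a single tree booster with `max_depth = 2` and `eta = 0.3`) are illustrative only; the assignment requires looping over both boosters, two depths, and three learning rates.

```r
# Sketch of the tuning workflow (illustrative values, not the required grid).
library(xgboost)

# keep only digits 2 and 4; the first column is the label, the rest are pixels
train24 = train[train[, 1] %in% c(2, 4), ]
test24  = test[test[, 1] %in% c(2, 4), ]

# binary labels for binary:logistic: 1 if the digit is 4, 0 if it is 2
xtrain = as.matrix(train24[, -1])
ytrain = as.numeric(train24[, 1] == 4)
xtest  = as.matrix(test24[, -1])
ytest  = as.numeric(test24[, 1] == 4)

dtrain = xgb.DMatrix(data = xtrain, label = ytrain)

set.seed(432)

# one cell of the tuning grid: tree booster, depth 2, learning rate 0.3;
# a linear base learner would use booster = "gblinear" (no max_depth)
cv.fit = xgb.cv(params = list(booster = "gbtree",
                              objective = "binary:logistic",
                              eval_metric = "auc",
                              max_depth = 2,
                              eta = 0.3),
                data = dtrain, nrounds = 100, nfold = 5, verbose = FALSE)

# cross-validated AUC for this setting (maximize over the grid)
max(cv.fit$evaluation_log$test_auc_mean)

# refit the chosen setting on the full training data, then test accuracy
fit = xgboost(params = list(booster = "gbtree",
                            objective = "binary:logistic",
                            max_depth = 2, eta = 0.3),
              data = dtrain, nrounds = 100, verbose = FALSE)
pred = predict(fit, xtest)
mean((pred > 0.5) == ytest)
```

In practice you would wrap the `xgb.cv()` call in a loop over the booster type, depths, and learning rates, keep the setting with the largest cross-validated AUC, and only then fit once on the full training data.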