{"id":2695,"date":"2025-01-25T11:24:12","date_gmt":"2025-01-25T03:24:12","guid":{"rendered":"https:\/\/www.gnn.club\/?p=2695"},"modified":"2025-03-12T15:06:41","modified_gmt":"2025-03-12T07:06:41","slug":"tutorial-06-%e4%b8%be%e4%b8%80%e5%8f%8d%e4%b8%89%ef%bc%9a%e8%bf%81%e7%a7%bb%e5%ad%a6%e4%b9%a0%ef%bc%88transfer-learning%ef%bc%89","status":"publish","type":"post","link":"http:\/\/gnn.club\/?p=2695","title":{"rendered":"Tutorial 06 &#8211; \u4e3e\u4e00\u53cd\u4e09\uff1a\u8fc1\u79fb\u5b66\u4e60\uff08Transfer Learning\uff09"},"content":{"rendered":"<h1>Learning Methods of Deep Learning<\/h1>\n<hr \/>\n<p>create by Deepfinder<\/p>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/bubbles\/50\/000000\/checklist.png\" style=\"height:50px;display:inline\"> Agenda<\/h3>\n<hr \/>\n<ol>\n<li>\u5e08\u5f92\u76f8\u6388\uff1a\u6709\u76d1\u7763\u5b66\u4e60\uff08Supervised Learning\uff09<\/li>\n<li>\u89c1\u5fae\u77e5\u8457\uff1a\u65e0\u76d1\u7763\u5b66\u4e60\uff08Un-supervised Learning\uff09<\/li>\n<li>\u65e0\u5e08\u81ea\u901a\uff1a\u81ea\u76d1\u7763\u5b66\u4e60\uff08Self-supervised Learning\uff09<\/li>\n<li>\u4ee5\u70b9\u5e26\u9762\uff1a\u534a\u76d1\u7763\u5b66\u4e60\uff08Semi-supervised learning\uff09<\/li>\n<li>\u660e\u8fa8\u662f\u975e\uff1a\u5bf9\u6bd4\u5b66\u4e60\uff08Contrastive Learning\uff09<\/li>\n<li><strong>\u4e3e\u4e00\u53cd\u4e09\uff1a\u8fc1\u79fb\u5b66\u4e60\uff08Transfer Learning\uff09<\/strong><\/li>\n<li>\u9488\u950b\u76f8\u5bf9\uff1a\u5bf9\u6297\u5b66\u4e60\uff08Adversarial Learning\uff09<\/li>\n<li>\u4f17\u5fd7\u6210\u57ce\uff1a\u96c6\u6210\u5b66\u4e60(Ensemble Learning) <\/li>\n<li>\u6b8a\u9014\u540c\u5f52\uff1a\u8054\u90a6\u5b66\u4e60\uff08Federated Learning\uff09<\/li>\n<li>\u767e\u6298\u4e0d\u6320\uff1a\u5f3a\u5316\u5b66\u4e60\uff08Reinforcement Learning\uff09<\/li>\n<li>\u6c42\u77e5\u82e5\u6e34\uff1a\u4e3b\u52a8\u5b66\u4e60\uff08Active Learning\uff09<\/li>\n<li>\u4e07\u6cd5\u5f52\u5b97\uff1a\u5143\u5b66\u4e60\uff08Meta-Learning\uff09<\/li>\n<\/ol>\n<h2>Tutorial 06 - \u4e3e\u4e00\u53cd\u4e09\uff1a\u8fc1\u79fb\u5b66\u4e60\uff08Transfer Learning\uff09<\/h2>\n<h2><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/bubbles\/50\/000000\/car.png\" style=\"height:50px;display:inline\"> \u8fc1\u79fb\u5b66\u4e60\u5e94\u7528<\/h2>\n<hr \/>\n<p><strong>Domain Adaptation<\/strong> - \u9886\u57df\u81ea\u9002\u5e94\uff1a<\/p>\n<ul>\n<li>\n<p>\u6838\u5fc3\u6982\u5ff5\uff1a\u5c06\u5df2\u7ecf\u5728\u4e00\u4e2a\u9886\u57df\uff08\u6e90\u9886\u57df\uff09\u5b66\u5230\u7684\u77e5\u8bc6\uff0c\u8fc1\u79fb\u6216\u9002\u914d\u5230\u53e6\u5916\u4e00\u4e2a\u4e0d\u540c\u4f46\u76f8\u5173\u7684\u9886\u57df\uff08\u76ee\u6807\u9886\u57df\uff09\u3002<\/p>\n<\/li>\n<li>\n<p>\u89c6\u89c9\uff08CV\uff09\u793a\u4f8b\uff1a\u5728\u56fe\u50cf\u8bc6\u522b\u4efb\u52a1\u4e2d\uff0c\u6211\u4eec\u5f80\u5f80\u5728\u4e00\u4e2a\u9886\u57df\u91cc\u6709\u5927\u91cf\u6807\u6ce8\u6570\u636e\uff0c\u800c\u5bf9\u771f\u6b63\u5173\u5fc3\u7684\u9886\u57df\u5374\u6ca1\u6709\u6216\u51e0\u4e4e\u6ca1\u6709\u6807\u6ce8\u6570\u636e\u3002\u5373\u4f7f\u4e24\u8005\u5728\u5916\u89c2\u4e0a\u5f88\u76f8\u4f3c\uff0c\u8bad\u7ec3\u6570\u636e\u53ef\u80fd\u5e26\u6709\u5fae\u5999\u7684\u504f\u5dee\uff0c\u6a21\u578b\u4f1a\u5229\u7528\u8fd9\u4e9b\u504f\u5dee\u8fdb\u884c\u8fc7\u62df\u5408\uff0c\u4ece\u800c\u5bf9\u5b9e\u9645\u573a\u666f\u8868\u73b0\u4e0d\u4f73\u3002<\/p>\n<\/li>\n<\/ul>\n<p><strong>Sim2Real<\/strong> - 
**Sim2Real** - transferring from a simulated environment to the real one:

- For many machine learning applications that interact with hardware, collecting data and training models in the real world is either **expensive and time-consuming, or simply too dangerous**. It is therefore advisable to collect data in some other, lower-risk way, such as in simulation.
- Common applications include autonomous driving and robotics (where data collection can be slow or dangerous).

## Transfer Learning with Pretrained Models

- One of the basic requirements for transfer learning is a model that already performs well on the source task.
- The two most common fields that build on pretrained models to transfer across tasks and domains are computer vision and NLP.

### Using Pretrained CNN Features

- Unsurprisingly, the lower convolutional layers capture **low-level image features** such as edges, while higher convolutional layers capture increasingly complex detail, such as body parts, faces, and other compositional features.
- **The final fully connected layers** are generally taken to capture information relevant to the task the network was trained to solve (e.g. classification).
- A representation that captures how images are composed and which combinations of edges and shapes they contain **may be useful for other tasks as well**. This information lives in the final convolutional layers or one of the early fully connected layers of a large CNN trained on ImageNet.
- For a new task, we can therefore simply reuse the off-the-shelf features of a state-of-the-art CNN pretrained on ImageNet and train a new model on top of these extracted features.
- In practice, we either **keep the pretrained parameters fixed, or tune them with a small learning rate**, to make sure we do not "forget" the previously acquired knowledge (see the sketch below).
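One concrete way to realize "small learning rate for pretrained weights" is with per-parameter-group learning rates in the optimizer. Here is a minimal sketch of that idea; the choice of ResNet-18 and the learning rates are illustrative assumptions, not part of the tutorial code that follows:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone plus a freshly initialized 2-class head.
model = models.resnet18(weights='DEFAULT')
model.fc = nn.Linear(model.fc.in_features, 2)

# One parameter group per role: a small learning rate for the pretrained
# weights, a larger one for the new, randomly initialized head.
backbone_params = [p for n, p in model.named_parameters() if not n.startswith('fc')]
optimizer = torch.optim.SGD([
    {'params': backbone_params, 'lr': 1e-4},
    {'params': model.fc.parameters(), 'lr': 1e-2},
], momentum=0.9)
```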
style=\"height:50px;display:inline\"> \u4f7f\u7528 PyTorch \u8fdb\u884c\u8fc1\u79fb\u5b66\u4e60\u7684\u793a\u4f8b<\/h3>\n<hr \/>\n<ul>\n<li>\u6211\u4eec\u5c06\u9075\u5faa <a href=\"https:\/\/pytorch.org\/tutorials\/beginner\/transfer\\_learning\\_tutorial.html\">Sasank Chilamkurthy<\/a> \u548c <a href=\"https:\/\/pytorch.org\/tutorials\/beginner\/finetuning\\_torchvision\\_models\\_tutorial.html\">Nathan Inkawhich<\/a> \u7684\u793a\u4f8b\u3002<\/li>\n<li>\u6211\u4eec\u5c06\u8bad\u7ec3\u4e00\u4e2a\u5206\u7c7b\u5668\u6765\u533a\u5206 <strong>\u8682\u8681<\/strong> \u548c <strong>\u871c\u8702<\/strong>\u3002<\/li>\n<li>\u53ef\u4ee5\u4ece\u6b64\u5904\u4e0b\u8f7d\u6570\u636e\uff1a<a href=\"https:\/\/download.pytorch.org\/tutorial\/hymenoptera\\_data.zip\">\u4e0b\u8f7d\u94fe\u63a5<\/a>\u3002<\/li>\n<li>\u6709\u4e24\u79cd\u4e3b\u8981\u7684\u8fc1\u79fb\u5b66\u4e60\u573a\u666f\uff1a<\/li>\n<li><strong>\u5fae\u8c03 ConvNet<\/strong>\uff1a\u6211\u4eec\u4e0d\u662f\u968f\u673a\u521d\u59cb\u5316\uff0c\u800c\u662f\u4f7f\u7528\u9884\u8bad\u7ec3\u7f51\u7edc\u521d\u59cb\u5316\u7f51\u7edc\uff0c\u4f8b\u5982 VGG\uff08\u5728 ImageNet 1000 \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\uff09\u3002\u5176\u4f59\u8bad\u7ec3\u770b\u8d77\u6765\u4e0e\u5f80\u5e38\u4e00\u6837\uff08\u9664\u4e86\u901a\u5e38\u4f7f\u7528\u8f83\u4f4e\u7684\u5b66\u4e60\u7387\uff09\u3002<\/li>\n<li><strong>ConvNet \u4f5c\u4e3a\u56fa\u5b9a\u7279\u5f81\u63d0\u53d6\u5668<\/strong>\uff1a\u5728\u8fd9\u91cc\uff0c\u6211\u4eec\u5c06\u51bb\u7ed3\u9664\u6700\u7ec8\u5168\u8fde\u63a5\u5c42\u4e4b\u5916\u7684\u6240\u6709\u7f51\u7edc\u7684\u6743\u91cd\u3002\u6700\u540e\u4e00\u4e2a\u5168\u8fde\u63a5\u5c42\u88ab\u66ff\u6362\u4e3a\u5177\u6709\u968f\u673a\u6743\u91cd\u7684\u65b0\u5c42\uff0c\u5e76\u4e14\u53ea\u8bad\u7ec3\u8fd9\u4e00\u5c42\u3002\u4f18\u70b9\u662f\u8bad\u7ec3\u901f\u5ea6\u975e\u5e38\u5feb\uff0c\u4f46\u6a21\u578b\u7684\u51e0\u4e2a\u90e8\u5206\u4e0d\u9002\u5e94\u65b0\u76ee\u6807\u3002<\/li>\n<\/ul>\n<pre><code class=\"language-python\">import os\nimport numpy as np\nimport time\nimport torch\nimport torchvision\nimport torch.nn as nn\nfrom torch.utils.data import DataLoader\nfrom torchvision.datasets import ImageFolder\nfrom torchvision import models, transforms\nimport matplotlib.pyplot as plt\n\n# Data augmentation and normalization for training\n# Just normalization for validation\ndata_transforms = {\n    &#039;train&#039;: transforms.Compose([\n        transforms.RandomResizedCrop(224),\n        transforms.RandomHorizontalFlip(),\n        transforms.ToTensor(),\n        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])\n    ]),\n    &#039;val&#039;: transforms.Compose([\n        transforms.Resize(256),\n        transforms.CenterCrop(224),\n        transforms.ToTensor(),\n        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])\n    ]),\n}\n\nbatch_size = 4\ndata_dir = &#039;.\/datasets\/hymenoptera_data&#039;\nimage_datasets = {x: ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in [&#039;train&#039;, &#039;val&#039;]}\ndataloaders = {x: DataLoader(image_datasets[x], batch_size=batch_size,\n                             shuffle=True, num_workers=4) for x in [&#039;train&#039;, &#039;val&#039;]}\ndataset_sizes = {x: len(image_datasets[x]) for x in [&#039;train&#039;, &#039;val&#039;]}\nclass_names = image_datasets[&#039;train&#039;].classes\n\ndevice = torch.device(&quot;cuda:0&quot; if torch.cuda.is_available() else 
&quot;cpu&quot;)\nprint(device)<\/code><\/pre>\n<pre><code>cuda:0<\/code><\/pre>\n<pre><code class=\"language-python\">def imshow(inp, title=None):\n    &quot;&quot;&quot;Imshow for Tensor.&quot;&quot;&quot;\n    inp = inp.numpy().transpose((1, 2, 0))\n    mean = np.array([0.485, 0.456, 0.406])\n    std = np.array([0.229, 0.224, 0.225])\n    inp = std * inp + mean\n    inp = np.clip(inp, 0, 1)\n    fig = plt.figure(figsize=(5, 8))\n    ax = fig.add_subplot(111)\n    ax.imshow(inp)\n    if title is not None:\n        ax.set_title(title)\n    ax.set_axis_off()\n\n# Let\u2019s visualize a few training images so as to understand the data augmentations.\n# Get a batch of training data\ninputs, classes = next(iter(dataloaders[&#039;train&#039;]))\n\n# Make a grid from batch\nout = torchvision.utils.make_grid(inputs)\n\nimshow(out, title=[class_names[x] for x in classes])<\/code><\/pre>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2025\/01\/20250125111529582.png\n\" style=\"height:200px\">\n<\/p>\n<h4>\u8bbe\u7f6e\u6a21\u578b\u53c2\u6570\u7684 <code>.requires_grad<\/code> \u5c5e\u6027<\/h4>\n<hr \/>\n<ul>\n<li>\u5f53\u6211\u4eec\u8fdb\u884c\u7279\u5f81\u63d0\u53d6\u65f6\uff0c\u4ee5\u4e0b\u8f85\u52a9\u51fd\u6570\u5c06\u6a21\u578b\u4e2d\u53c2\u6570\u7684 <code>.requires_grad<\/code> \u5c5e\u6027\u8bbe\u7f6e\u4e3a <code>False<\/code>\u3002<\/li>\n<li>\u9ed8\u8ba4\u60c5\u51b5\u4e0b\uff0c\u5f53\u6211\u4eec\u52a0\u8f7d\u9884\u8bad\u7ec3\u6a21\u578b\u65f6\uff0c\u6240\u6709\u53c2\u6570\u90fd\u6709 <code>.requires_grad=True<\/code>\uff0c\u5982\u679c\u6211\u4eec\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u6216\u8fdb\u884c\u5fae\u8c03\uff0c\u8fd9\u662f\u6ca1\u95ee\u9898\u7684\u3002<\/li>\n<li>\u4f46\u662f\uff0c\u5982\u679c\u6211\u4eec\u8fdb\u884c\u7279\u5f81\u63d0\u53d6\u5e76\u4e14\u53ea\u60f3\u8ba1\u7b97\u65b0\u521d\u59cb\u5316\u5c42\u7684\u68af\u5ea6\uff0c\u90a3\u4e48\u6211\u4eec\u5e0c\u671b\u6240\u6709\u5176\u4ed6\u53c2\u6570\u90fd\u4e0d\u9700\u8981\u68af\u5ea6\u3002<\/li>\n<\/ul>\n<pre><code class=\"language-python\">def set_parameter_requires_grad(model, feature_extracting=False):\n    # approach 1\n    if feature_extracting:\n        # frozen model\n        model.requires_grad_(False)\n    else:\n        # fine-tuning\n        model.requires_grad_(True)\n\n    # approach 2\n    if feature_extracting:\n        # frozen model\n        for param in model.parameters():\n            param.requires_grad = False\n    else:\n        # fine-tuning\n        for param in model.parameters():\n            param.requires_grad = True\n    # note: you can also mix between frozen layers and trainable layers, but you&#039;ll need a custom \n    # function that loops over the model&#039;s layers and you specify which layers are frozen.<\/code><\/pre>\n<h4>\u521d\u59cb\u5316\u548c\u91cd\u5851\u7f51\u7edc<\/h4>\n<hr \/>\n<ul>\n<li>\n<p>\u56de\u60f3\u4e00\u4e0b\uff0cCNN \u6a21\u578b\u7684\u6700\u540e\u4e00\u5c42\uff08\u901a\u5e38\u662f FC \u5c42\uff09\u7684\u8282\u70b9\u6570\u4e0e\u6570\u636e\u96c6\u4e2d\u7684\u8f93\u51fa\u7c7b\u6570\u76f8\u540c\u3002<\/p>\n<\/li>\n<li>\n<p>\u7531\u4e8e\u4ee5\u4e0b\u6240\u6709\u6a21\u578b\u90fd\u5df2\u5728 ImageNet \u4e0a\u8fdb\u884c\u4e86\u9884\u8bad\u7ec3\uff0c\u56e0\u6b64\u5b83\u4eec\u90fd\u5177\u6709\u5927\u5c0f\u4e3a 1000 
#### Initializing and reshaping the network

- Recall that the last layer of a CNN model (usually an FC layer) has the same number of nodes as the number of output classes in the dataset.
- Since all of the models below have been pretrained on ImageNet, they all have output layers of size 1000, one node per class.
- The goal here is to **reshape the last layer to have the same number of inputs as before** and **the same number of outputs as the number of classes in our dataset**.
- In *feature extraction* we only want to update the parameters of the last layer, in other words, only the parameters of the layer we are reshaping.
- Therefore, we do not need to compute gradients for the parameters we are not changing, so for efficiency we set their `.requires_grad` attribute to `False`.
- This is important because by default this attribute is set to `True`. Then, when we initialize the new layer, its new parameters have `.requires_grad=True` by default, so only the new layer's parameters will be updated.
- When we fine-tune, we can leave all of the `.requires_grad` attributes at their default value of `True`.

### Torchvision Pretrained Models

- The `torchvision.models` subpackage contains model definitions for many different tasks, including: image classification, pixel-wise semantic segmentation, object detection, instance segmentation, person keypoint detection, video classification, and optical flow.
- You can see everything that is available here - [Models and pretrained weights](https://pytorch.org/vision/stable/models.html).
- In code, you can use [`torchvision.models.list_models`](https://pytorch.org/vision/stable/models.html#listing-and-retrieving-available-models) to list all available models, as sketched below.
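A quick sketch of listing models and building one by name (these APIs exist in torchvision >= 0.14; the model name is just an example):

```python
from torchvision import models

# All model names torchvision knows about (classification, detection, video, ...)
all_models = models.list_models()

# Only the classification models defined in torchvision.models itself
classification_models = models.list_models(module=models)
print(len(all_models), classification_models[:5])

# Build a model from its name string, with its default pretrained weights
resnet = models.get_model('resnet18', weights='DEFAULT')
```

Example: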
```python
def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    #   variables is model specific.
    model_ft = None
    input_size = 0  # expected input image size, e.g. 224 for a (3, 224, 224) input
    # new method from torchvision >= 0.13
    weights = 'DEFAULT' if use_pretrained else None
    # to use checkpoints other than the default ones, check the model's available checkpoints here:
    # https://pytorch.org/vision/stable/models.html
    if model_name == "resnet":
        """ Resnet18
        """
        # new method from torchvision >= 0.13
        model_ft = models.resnet18(weights=weights)
        # old method for torchvision < 0.13
        # model_ft = models.resnet18(pretrained=use_pretrained)

        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)  # replace the last FC layer
        input_size = 224

    elif model_name == "alexnet":
        """ Alexnet
        """
        # new method from torchvision >= 0.13
        model_ft = models.alexnet(weights=weights)
        # old method for torchvision < 0.13
        # model_ft = models.alexnet(pretrained=use_pretrained)

        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "vgg":
        """ VGG16
        """
        # new method from torchvision >= 0.13
        # ('DEFAULT' resolves to models.VGG16_Weights.DEFAULT)
        model_ft = models.vgg16(weights=weights)
        # old method for torchvision < 0.13
        # model_ft = models.vgg16(pretrained=use_pretrained)

        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "squeezenet":
        """ Squeezenet
        """
        # new method from torchvision >= 0.13
        model_ft = models.squeezenet1_0(weights=weights)
        # old method for torchvision < 0.13
        # model_ft = models.squeezenet1_0(pretrained=use_pretrained)

        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1, 1), stride=(1, 1))
        model_ft.num_classes = num_classes
        input_size = 224

    elif model_name == "densenet":
        """ Densenet
        """
        # new method from torchvision >= 0.13
        model_ft = models.densenet121(weights=weights)
        # old method for torchvision < 0.13
        # model_ft = models.densenet121(pretrained=use_pretrained)

        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    else:
        raise NotImplementedError

    return model_ft, input_size
```
```python
# Models to choose from [resnet, alexnet, vgg, squeezenet, densenet]
model_name = "vgg"

# Number of classes in the dataset
num_classes = 2

# Batch size for training (change depending on how much memory you have)
# Note: the dataloaders above were built with batch_size=4; rebuild them if you change this.
batch_size = 8

# Number of epochs to train for
num_epochs = 5

# Flag for feature extracting. When False, we fine-tune the whole model,
#   when True we only update the reshaped layer params
feature_extract = True
```

```python
# Initialize the model for this run
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)

# Print the model we just instantiated
print(model_ft)
```

```
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=2, bias=True)
  )
)
```
```python
model_ft = model_ft.to(device)

# Gather the parameters to be optimized/updated in this run. If we are
#  fine-tuning we will be updating all parameters. However, if we are
#  doing the feature extract method, we will only update the parameters
#  that we have just initialized, i.e. the parameters with requires_grad
#  set to True.
params_to_update = model_ft.parameters()
print("Params to learn:")
if feature_extract:
    params_to_update = []  # override the initial definition above
    for name, param in model_ft.named_parameters():
        if param.requires_grad:
            params_to_update.append(param)
            print("\t", name)
else:
    for name, param in model_ft.named_parameters():
        if param.requires_grad:
            print("\t", name)

# Observe that all (trainable) parameters are being optimized
optimizer_ft = torch.optim.SGD(params_to_update, lr=0.001, momentum=0.9)
```

```
Params to learn:
     classifier.6.weight
     classifier.6.bias
```

```python
import copy

"""
Training function
"""
def train_model(model, dataloaders, criterion, optimizer, num_epochs=10):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # forward
                # track history only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        # zero the parameter gradients
                        optimizer.zero_grad()
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history
```
```python
# Setup the loss fn
criterion = nn.CrossEntropyLoss()

# Train and evaluate
model_ft, hist = train_model(model_ft, dataloaders, criterion, optimizer_ft, num_epochs=num_epochs)
```

```
Epoch 0/4
----------
train Loss: 0.3457 Acc: 0.8770
val Loss: 0.1964 Acc: 0.9346

Epoch 1/4
----------
train Loss: 0.2492 Acc: 0.9139
val Loss: 0.1017 Acc: 0.9739

Epoch 2/4
----------
train Loss: 0.2920 Acc: 0.9303
val Loss: 0.1045 Acc: 0.9673

Epoch 3/4
----------
train Loss: 0.1478 Acc: 0.9385
val Loss: 0.2151 Acc: 0.9216

Epoch 4/4
----------
train Loss: 0.2607 Acc: 0.9180
val Loss: 0.1252 Acc: 0.9477

Training complete in 0m 9s
Best val Acc: 0.973856
```
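With training done, a quick sanity check (a minimal sketch, not part of the original tutorial) is to run the fine-tuned model on one validation batch and map the logits back to class names:

```python
# Put the fine-tuned model in eval mode and classify one validation batch.
model_ft.eval()
inputs, labels = next(iter(dataloaders['val']))
with torch.no_grad():
    logits = model_ft(inputs.to(device))
    preds = logits.argmax(dim=1).cpu()

print('predicted:', [class_names[p] for p in preds])
print('actual:   ', [class_names[l] for l in labels])
```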
### Pretraining for Natural Language Processing

- One of the biggest challenges facing NLP is the shortage of labeled training data.
- Because NLP is a diverse field with many distinct tasks, most task-specific datasets contain only a few thousand to a few hundred thousand human-labeled training examples.
- As large companies such as Google and OpenAI have shown, **modern deep-learning-based NLP models benefit from much more data, improving when trained on millions or billions of annotated training examples**.
- Large models pretrained on huge amounts of **unannotated text from the web** can then be fine-tuned on small-data NLP tasks such as question answering and sentiment analysis, with significant accuracy improvements compared to training on these datasets from scratch.
- **Bidirectional Encoder Representations from Transformers (BERT), Google** - a Transformer-based machine learning technique for natural language processing (NLP) pretraining developed by Google. The idea is to mask some of the words and then try to predict them. The original English BERT model comes in two pretrained general-purpose variants:
- (1) the $BERT_{BASE}$ model, a 12-layer, 768-hidden, 12-head, 110M-parameter neural network architecture;
- (2) the $BERT_{LARGE}$ model, a 24-layer, 1024-hidden, 16-head, 340M-parameter neural network architecture.
- Both were trained on the BooksCorpus dataset of 800M words and a version of the English Wikipedia with 2,500M words.
- BERT uses a simple technique to mask some of the words in the input, then conditions on each word bidirectionally to predict the masked words; it also learns to model relationships between sentences by pretraining on a very simple task that can be generated from any text corpus: given two sentences A and B, is B the actual next sentence that follows A in the corpus, or just a random sentence?

![BERT pretraining: masked language modeling](https://gnnclub-1311496010.cos.ap-beijing.myqcloud.com/wp-content/uploads/2025/01/20250125112225592.png)

- [Image Source 1](https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html), [Image Source 2](http://jalammar.github.io/illustrated-bert/)

![BERT pretraining: next-sentence prediction](https://gnnclub-1311496010.cos.ap-beijing.myqcloud.com/wp-content/uploads/2025/01/20250125112248745.png)

- [Image Source 1](https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html), [Image Source 2](http://jalammar.github.io/illustrated-bert/)

#### Pretrained Models for NLP in PyTorch

- [HuggingFace](https://huggingface.co/) is a company dedicated to publishing all of the available pretrained models, and it works with PyTorch as well - [HuggingFace Transformers](https://github.com/huggingface/transformers)
- [Examples with PyTorch](https://pytorch.org/hub/huggingface_pytorch-transformers/)
- [Tutorial: fine-tuning Transformers for NLP tasks with PyTorch and HuggingFace](https://huggingface.co/docs/transformers/training); a minimal sketch follows below.
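As a minimal sketch of what the linked fine-tuning tutorial boils down to (the checkpoint name, texts, and labels here are illustrative assumptions; see the tutorial for the full training loop):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pretrained BERT encoder with a freshly initialized 2-class head on top.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

batch = tokenizer(['a great movie', 'a terrible movie'],
                  padding=True, truncation=True, return_tensors='pt')
outputs = model(**batch, labels=torch.tensor([1, 0]))
outputs.loss.backward()  # from here, step any optimizer to fine-tune
```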
#### The TorchTune Library - Fine-tuning and Experimenting with LLMs

- [TorchTune](https://github.com/pytorch/torchtune) is a native PyTorch library for easily authoring, fine-tuning, and experimenting with LLMs.
- It provides native PyTorch implementations of popular LLMs and supports checkpoints in various formats, including HuggingFace-format checkpoints.
- [TorchTune on GitHub](https://github.com/pytorch/torchtune)

![TorchTune](https://gnnclub-1311496010.cos.ap-beijing.myqcloud.com/wp-content/uploads/2025/01/20250125112328442.png)

## Credits

- Icons made by [Becris](https://www.flaticon.com/authors/becris) from [www.flaticon.com](https://www.flaticon.com/)
- Icons from [Icons8.com](https://icons8.com/) - https://icons8.com
- Datasets from [Kaggle](https://www.kaggle.com/) - https://www.kaggle.com/
- [Jason Brownlee - Why Initialize a Neural Network with Random Weights?](https://machinelearningmastery.com/why-initialize-a-neural-network-with-random-weights/)
- [OpenAI - Deep Double Descent](https://openai.com/blog/deep-double-descent/)
- [Tal Daniel](https://taldatech.github.io)