{"id":2628,"date":"2025-01-24T21:43:53","date_gmt":"2025-01-24T13:43:53","guid":{"rendered":"https:\/\/www.gnn.club\/?p=2628"},"modified":"2025-03-12T15:06:57","modified_gmt":"2025-03-12T07:06:57","slug":"tutorial-03-%e6%97%a0%e5%b8%88%e8%87%aa%e9%80%9a%ef%bc%9a%e8%87%aa%e7%9b%91%e7%9d%a3%e5%ad%a6%e4%b9%a0%ef%bc%88self-supervised-learning%ef%bc%89","status":"publish","type":"post","link":"http:\/\/gnn.club\/?p=2628","title":{"rendered":"Tutorial 03 &#8211; Self-taught: Self-supervised Learning"},"content":{"rendered":"<h1>Learning Methods of Deep Learning<\/h1>\n<hr \/>\n<p>Created by Deepfinder<\/p>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/bubbles\/50\/000000\/checklist.png\" style=\"height:50px;display:inline\"> Agenda<\/h3>\n<hr \/>\n<ol>\n<li>Master to apprentice: Supervised Learning<\/li>\n<li>Seeing the whole from small signs: Unsupervised Learning<\/li>\n<li><strong>Self-taught: Self-supervised Learning<\/strong><\/li>\n<li>From a few points to the whole: Semi-supervised Learning<\/li>\n<li>Telling right from wrong: Contrastive Learning<\/li>\n<li>From one example to many: Transfer Learning<\/li>\n<li>Tit for tat: Adversarial Learning<\/li>\n<li>Strength in unity: Ensemble Learning<\/li>\n<li>Different paths, same destination: Federated Learning<\/li>\n<li>Undaunted by setbacks: Reinforcement Learning<\/li>\n<li>Thirst for knowledge: Active Learning<\/li>\n<li>All methods share one root: Meta-Learning<\/li>\n<\/ol>\n<h2>Tutorial 03 - Self-taught: Self-supervised Learning<\/h2>\n<h2><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/cute-clipart\/64\/000000\/task.png\" style=\"height:50px;display:inline\"> Self-supervised Learning<\/h2>\n<hr \/>\n<ul>\n<li>A form of unsupervised learning in which <strong>the data itself provides the supervision<\/strong>.<\/li>\n<li><strong>Idea<\/strong>: hold out part of the data, then have the neural network predict it from the rest.<\/li>\n<\/ul>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/external-wanicon-lineal-color-wanicon\/64\/null\/external-mask-brazilian-carnival-wanicon-lineal-color-wanicon.png\" style=\"height:50px;display:inline\"> Masked Autoencoders<\/h3>\n<ul>\n<li><a href=\"https:\/\/arxiv.org\/abs\/2111.06377\">Masked Autoencoders Are Scalable Vision Learners, He et al. 
2021.<\/a><\/li>\n<\/ul>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2025\/01\/20250124211042445.png\" style=\"height:300px\">\n<\/p>\n<p>How Masked Autoencoders (MAE) work:<\/p>\n<p><strong>Data processing stage<\/strong>:<br \/>\nThe input image is split into many small patches.<br \/>\nA random subset of the patches (typically 75%) is masked, i.e. removed from the input.<br \/>\nThe remaining unmasked patches are fed to the encoder for feature extraction.<\/p>\n<p><strong>Mask tokens<\/strong>:<br \/>\nAfter the encoder, the positions of the masked patches are filled with special &quot;mask tokens&quot;. Because the decoder needs the full patch sequence (both unmasked and masked patches) to reconstruct the original image, the mask tokens act as placeholders at the masked positions. The decoder then processes the encoded patches together with the mask tokens and tries to reconstruct the pixels of the original image.<\/p>\n<p>In short, mask tokens let the decoder tell which parts are known (the unmasked patches provided by the encoder) and which parts are unknown and must be predicted (indicated by the positions of the masked patches). This helps the decoder focus on reconstructing the masked parts during training.<\/p>\n<p><strong>Usage stage<\/strong>:<br \/>\nAfter pre-training, the decoder is discarded and only the encoder is kept.<br \/>\nFor downstream tasks (such as image classification or object recognition), the encoder receives the complete, unmasked set of patches as input.<br \/>\nThe core idea is that predicting the masked parts makes the model learn better image representations. The pre-training stage is itself self-supervised: reconstructing the masked parts improves the feature extraction ability of the encoder.<\/p>\n<ul>\n<li>Code:<\/li>\n<li>HuggingFace: <a href=\"https:\/\/huggingface.co\/docs\/transformers\/model_doc\/vit_mae\">ViTMAE<\/a><\/li>\n<li>GitHub: <a href=\"https:\/\/github.com\/facebookresearch\/mae\">Official PyTorch implementation (FAIR)<\/a>, <a href=\"https:\/\/github.com\/EdisonLeeeee\/Awesome-Masked-Autoencoders\">Awesome Masked Autoencoders<\/a><\/li>\n<\/ul>\n<p><strong>Masked Language Model (MLM)<\/strong><\/p>\n<p>Masked language modeling is the key pre-training objective of BERT: by masking part of the input tokens, the model learns contextual semantics.<\/p>\n<ul>\n<li>Training procedure:<\/li>\n<\/ul>\n<p>Masking random tokens:<br \/>\n15% of the tokens in the input text are randomly selected, and each selected token is handled as follows:<br \/>\nWith 80% probability it is replaced by [MASK] (e.g. &quot;apple&quot; \u2192 &quot;[MASK]&quot;).<br \/>\nWith 10% probability it is replaced by a random token (e.g. &quot;apple&quot; \u2192 &quot;orange&quot;).<br \/>\nWith 10% probability it is left unchanged (e.g. &quot;apple&quot; \u2192 &quot;apple&quot;).<\/p>\n<ul>\n<li>\n<p>Objective:<br \/>\nThe model predicts the original token at each selected position from its context.<\/p>\n<\/li>\n<li>\n<p>Effect:<br \/>\nModeling context in both directions lets the model capture how each token relates to the tokens around it, and thus learn deeper semantic representations.<\/p>\n<\/li>\n<\/ul>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2025\/01\/20250124214100930.png\" style=\"height:300px\">\n<\/p>\n<ul>\n<li><a href=\"https:\/\/ai.googleblog.com\/2018\/11\/open-sourcing-bert-state-of-art-pre.html\">Image Source 1<\/a>, <a href=\"http:\/\/jalammar.github.io\/illustrated-bert\/\">Image Source 2<\/a><\/li>\n<\/ul>\n<p><strong>Next Sentence Prediction (NSP)<\/strong><\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2025\/01\/20250124214129872.png\" style=\"height:300px\">\n<\/p>\n<ul>\n<li><a 
href=\"https:\/\/ai.googleblog.com\/2018\/11\/open-sourcing-bert-state-of-art-pre.html\">Image Source 1<\/a>, <a href=\"http:\/\/jalammar.github.io\/illustrated-bert\/\">Image Source 2<\/a><\/li>\n<\/ul>\n<h4><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/?size=100&id=91CnU00i6HLv&format=png&color=000000\" style=\"height:50px;display:inline\"> What is the core difference between self-supervised and unsupervised learning?<\/h4>\n<pre><code class=\"language-python\">from datasets import load_dataset\n\ndef load_and_split_dataset(csv_file_path, test_size=0.2, seed=42):\n    &quot;&quot;&quot;\n    Load data from a CSV file and split it into a training set and a test set.\n\n    Parameters:\n    -------\n    csv_file_path : str\n        Path to the CSV file\n    test_size : float\n        Fraction of the data used for the test set (default 0.2, i.e. 20%)\n    seed : int\n        Random seed\n\n    Returns:\n    -------\n    dataset_dict : DatasetDict\n        DatasetDict object with the two splits &#039;train&#039; and &#039;test&#039;\n    &quot;&quot;&quot;\n    # 1. Load the CSV file (one column is named &quot;review&quot;, the other &quot;sentiment&quot;)\n    raw_dataset = load_dataset(\n        &quot;csv&quot;,\n        data_files=csv_file_path\n    )\n    # Note: at this point raw_dataset contains a single split named &quot;train&quot;,\n    # because loading a single file puts everything under the &quot;train&quot; split by default.\n    # Use raw_dataset[&quot;train&quot;] to access all of the data.\n\n    # 2. 
Split all the data into a training set and a test set (8:2 by default)\n    # train_test_split splits raw_dataset[&quot;train&quot;] into &#039;train&#039; and &#039;test&#039;\n    dataset_dict = raw_dataset[&quot;train&quot;].train_test_split(\n        test_size=test_size,\n        shuffle=True,\n        seed=seed\n    )\n\n    # 3. Print the dataset layout and the number of samples per split\n    print(f&quot;Dataset splits: {dataset_dict}&quot;)\n    print(f&quot;Train samples: {dataset_dict[&#039;train&#039;].num_rows}&quot;)\n    print(f&quot;Test samples: {dataset_dict[&#039;test&#039;].num_rows}&quot;)\n\n    # Print the first example of each split\n    print(&quot;Train sample[0]:&quot;, dataset_dict[&quot;train&quot;][0])\n    print(&quot;Test sample[0]:&quot;, dataset_dict[&quot;test&quot;][0])\n\n    # Return the DatasetDict containing &#039;train&#039; and &#039;test&#039;\n    return dataset_dict\n\ncsv_path = &quot;datasets\/imdb\/IMDB Dataset.csv&quot;  # path to your CSV file\ndataset = load_and_split_dataset(\n    csv_file_path=csv_path,\n    test_size=0.2,\n    seed=42\n)<\/code><\/pre>\n<pre><code>\/home\/arwin\/anaconda3\/envs\/dt\/lib\/python3.8\/site-packages\/tqdm\/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https:\/\/ipywidgets.readthedocs.io\/en\/stable\/user_install.html\n  from .autonotebook import tqdm as notebook_tqdm\n&lt;frozen importlib._bootstrap&gt;:219: RuntimeWarning: pyarrow.lib.Tensor size changed, may indicate binary incompatibility. 
Expected 64 from C header, got 80 from PyObject\n\nDataset splits: DatasetDict({\n    train: Dataset({\n        features: ['review', 'sentiment'],\n        num_rows: 40000\n    })\n    test: Dataset({\n        features: ['review', 'sentiment'],\n        num_rows: 10000\n    })\n})\nTrain samples: 40000\nTest samples: 10000\nTrain sample[0]: {'review': \"The film disappointed me for many reasons: first of all the depiction of a future which seemed at first realistic to me was well-built but did only feature a marginal role. Then, the story itself was a weak copy of Lost in Translation. The Middle-Eastern setting, man with family meets new girl overseas, karaoke bar, the camera movements and the imagery - all that was a very bad imitation of the excellent Lost in Translation which had also credibility. This movie tries to be something brilliant and cultural: it is not. I wonder why Tim Robbins even considered doing this film!? The female main actress is awful - did she play the precog in Minority Report? And why do you have to show the vagina in a movie like this? Lost in Translation didn't have to show excessive love scenes. R-Rated just for this? This movie isn't even worth watching it from a videostore!\", 'sentiment': 'negative'}\nTest sample[0]: {'review': 'Arguably the finest serial ever made(no argument here thus far) about Earthman Flash Gordon, Professor Zarkov, and beautiful Dale Arden traveling in a rocket ship to another universe to save the planet. Along the way, in spellbinding, spectacular, and action-packed chapters Flash and his friends along with new found friends such as Prince Barin, Prince Thun, and the awesome King Vultan pool their resources together to fight the evils and armies of the merciless Ming of Mongo and the jealous treachery of his daughter Priness Aura(now she\\'s a car!). This serial is not just a cut above most serials in terms of plot, acting, and budget - it is miles ahead in these areas. 
Produced by Universal Studios it has many former sets at its disposable like the laboratory set from The Bride of Frankenstein and the Opera House from The Phantom of the Opera just to name a few. The production values across the board are advanced, in my most humble opinion, for 1936. The costumes worn by many of these strange men and women are really creative and first-rate. We get hawk-men, shark men, lion men, high priests, creatures like dragons, octasacks, orangapoids, and tigrons(oh my!)and many, many other fantastic things. Are all of them believable and first-rate special effects? No way. But for 1936 most are very impressive. The musical score is awesome and the chapter beginnings are well-written, lengthy enough to revitalize viewer memories of the former chapter, and expertly scored. Director Frederick Stephani does a great job piecing everything together wonderfully and creating a worthy film for Alex Raymond\\'s phenom comic strip. Lastly, the acting is pretty good in this serial. All too often serials have either no names with no talent surrounding one or two former talents - here most everyone has some ability. Don\\'t get me wrong, this isn\\'t a Shakespeare troupe by any means, but Buster Crabbe does a workmanlike, likable job as Flash. He is ably aided by Jean Arden, Priscella Lawson, and the rest of the cast in general with two performers standing out. But before I get to those two let me add as another reviewer noted, it must have been amazing for this serial to get by the Hayes Office. I see more flesh on Flash and on Jean Rogers and Priscella Lawson than in movies decades later. The shorts Crabbe(and unfortunately for all of us Professor Zarkov((Frank Shannon)) wears are about as form-fitting a pair of shorts guys can wear. The girls are wearing mid drifts throughout and are absolutely beautiful Jean Rogers may have limited acting talent but she is a blonde bombshell. Lawson is also very sultry and sensuous and beautiful. 
But for me the two actors that make the serial are Charles Middleton as Ming: officious, sardonic, merciless, and fun. Middleton is a class act. Jack \"Tiny\" Lipson plays King Vultan: boisterous, rousing, hilarious - a symbol for pure joy in life and the every essence of hedonism. Lipson steals each and every scene he is in. The plot meanders here, there, and everywhere - but Flash Gordon is the penultimate serial, space opera, and the basis for loads of science fiction to follow. Excellent!', 'sentiment': 'positive'}<\/code><\/pre>\n<pre><code class=\"language-python\">\nimport torch\nfrom datasets import load_dataset\nfrom transformers import (\n    BertTokenizer,\n    BertForMaskedLM,\n    DataCollatorForLanguageModeling,\n    TrainingArguments,\n    Trainer\n)\n\n# 1. Reuse the IMDB dataset loaded above\n#    It has a &quot;train&quot; and a &quot;test&quot; split; each example has a &quot;review&quot; and a &quot;sentiment&quot; field.\nimdb_dataset = dataset\n\n# 2. Initialize the tokenizer\ntokenizer = BertTokenizer.from_pretrained(&quot;datasets\/bert-base-uncased&quot;)\n\n# 3. 
Define the tokenization function and convert the dataset to token IDs\n#    - `padding=&quot;max_length&quot;`: pad every sequence to the same length\n#    - `truncation=True`    : truncate sequences longer than max_length\n#    - `max_length=128`     : use a uniform sequence length of 128\ndef tokenize_function(examples):\n    # Note: the text column is named &quot;review&quot;\n    return tokenizer(\n        examples[&quot;review&quot;],\n        padding=&quot;max_length&quot;,\n        truncation=True,\n        max_length=128\n    )\n\n# remove_columns=[&quot;review&quot;] drops the raw text column after tokenization.\n# The remaining &quot;sentiment&quot; column is not a model input; Trainer drops it by default (remove_unused_columns=True).\ntokenized_imdb = imdb_dataset.map(\n    tokenize_function,\n    batched=True,\n    remove_columns=[&quot;review&quot;]\n)\n\n# 4. Prepare the data collator\n#    DataCollatorForLanguageModeling randomly masks tokens in each batch.\n#    mlm_probability=0.15: each token is selected for masking with probability 0.15 (about 15% of the tokens).\ndata_collator = DataCollatorForLanguageModeling(\n    tokenizer=tokenizer,\n    mlm=True,\n    mlm_probability=0.15\n)\n\n# 5. Load the BERT MLM model\nmodel = BertForMaskedLM.from_pretrained(&quot;datasets\/bert-base-uncased&quot;)\n\n# 6. 
Training configuration\ntraining_args = TrainingArguments(\n    output_dir=&quot;.\/mlm_imdb_bert&quot;,      # output directory for checkpoints\n    evaluation_strategy=&quot;epoch&quot;,       # evaluate at the end of every epoch\n    per_device_train_batch_size=8,     # training batch size per GPU\/CPU\n    per_device_eval_batch_size=8,      # evaluation batch size per GPU\/CPU\n    num_train_epochs=1,                # number of epochs for this demo; adjust as needed\n    logging_steps=100,                 # log every 100 steps\n    save_steps=500                     # save a checkpoint every 500 steps\n)\n\n# 7. Wrap the training loop in a Trainer\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=tokenized_imdb[&quot;train&quot;],  # training set\n    eval_dataset=tokenized_imdb[&quot;test&quot;],    # test set\n    data_collator=data_collator\n)\n\n# 8. Train\ntrainer.train()\n\n# 9. 
After training, use trainer.evaluate() to evaluate the model on the test set\neval_results = trainer.evaluate()\nprint(eval_results)\n<\/code><\/pre>\n<pre><code>[2500\/2500 20:34, Epoch 1\/1]<\/code><\/pre>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: left;\">\n<th>Epoch<\/th>\n<th>Training Loss<\/th>\n<th>Validation Loss<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>1<\/td>\n<td>1.984600<\/td>\n<td>1.933474<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<pre><code>[625\/625 02:10]<\/code><\/pre>\n<pre><code>{'eval_loss': 1.9387198686599731, 'eval_runtime': 130.2856, 'eval_samples_per_second': 76.754, 'eval_steps_per_second': 4.797, 'epoch': 1.0}<\/code><\/pre>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/dusk\/64\/000000\/prize.png\" style=\"height:50px;display:inline\"> Credits<\/h3>\n<hr \/>\n<ul>\n<li>Icons made by <a href=\"https:\/\/www.flaticon.com\/authors\/becris\" title=\"Becris\">Becris<\/a> from <a href=\"https:\/\/www.flaticon.com\/\" title=\"Flaticon\">www.flaticon.com<\/a><\/li>\n<li>Icons from <a href=\"https:\/\/icons8.com\/\">Icons8.com<\/a> - <a href=\"https:\/\/icons8.com\">https:\/\/icons8.com<\/a><\/li>\n<li>Datasets from <a href=\"https:\/\/www.kaggle.com\/\">Kaggle<\/a> - <a href=\"https:\/\/www.kaggle.com\/\">https:\/\/www.kaggle.com\/<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/why-initialize-a-neural-network-with-random-weights\/\">Jason Brownlee - Why Initialize a Neural Network with Random Weights?<\/a><\/li>\n<li><a href=\"https:\/\/openai.com\/blog\/deep-double-descent\/\">OpenAI - Deep Double Descent<\/a><\/li>\n<li><a href=\"https:\/\/taldatech.github.io\">Tal 
Daniel<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Learning Methods of Deep Learning create by Deepfinder  [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2630,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,28],"tags":[],"class_list":["post-2628","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-18","category-28"],"_links":{"self":[{"href":"http:\/\/gnn.club\/index.php?rest_route=\/wp\/v2\/posts\/2628","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/gnn.club\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/gnn.club\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/gnn.club\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/gnn.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2628"}],"version-history":[{"count":39,"href":"http:\/\/gnn.club\/index.php?rest_route=\/wp\/v2\/posts\/2628\/revisions"}],"predecessor-version":[{"id":2672,"href":"http:\/\/gnn.club\/index.php?rest_route=\/wp\/v2\/posts\/2628\/revisions\/2672"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/gnn.club\/index.php?rest_route=\/wp\/v2\/media\/2630"}],"wp:attachment":[{"href":"http:\/\/gnn.club\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2628"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/gnn.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2628"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/gnn.club\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2628"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}