{"id":773,"date":"2024-05-23T12:43:20","date_gmt":"2024-05-23T04:43:20","guid":{"rendered":"http:\/\/madapapa.com\/wordpress\/?p=773"},"modified":"2024-05-23T12:58:44","modified_gmt":"2024-05-23T04:58:44","slug":"ying-wen-shi-da-kai-shi-jie-de-yao-shi-tong-zhi-sh","status":"publish","type":"post","link":"http:\/\/madapapa.com\/wordpress\/?p=773","title":{"rendered":"\u82f1\u6587\u662f\u6253\u5f00\u4e16\u754c\u7684\u94a5\u5319\uff0c\u540c\u5fd7\u5c1a\u9700\u52aa\u529b"},"content":{"rendered":"<p>\u4f7f\u7528colab\u8fdb\u884c\u3010\u4e2d\u6587\u5206\u8bcd\u6a21\u578b\u6d4b\u8bd5\u3011\uff0c\u7f16\u7a0b\u73af\u5883\u81ea\u52a8\u7ed9\u51fa\u4e86train_test_spint\u51fd\u6570\u7684\u63d0\u793a\uff0c\u6211\u5bf9\u4e8e\u8fd9\u79cd\u63d0\u793a\u4e0d\u592a\u6e05\u695a\uff0c\u5c31\u8ba9chatgpt\u6765\u89e3\u91ca\u3002<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/madapapa.com\/wordpress\/wp-content\/uploads\/2024\/05\/17164399788947.jpg\" alt=\"\" \/><\/p>\n<h2><a id=\"%E8%AF%B7chatgpt%E7%AC%AC%E4%B8%80%E6%AC%A1%E8%A7%A3%E9%87%8A\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>\u8bf7chatgpt\u7b2c\u4e00\u6b21\u89e3\u91ca<\/h2>\n<p><img decoding=\"async\" src=\"http:\/\/madapapa.com\/wordpress\/wp-content\/uploads\/2024\/05\/17164399099100.jpg\" alt=\"\" \/><\/p>\n<p>The image you uploaded shows the function signature and description for a function used to split arrays or matrices into random train and test subsets. This function is likely train_test_split from the sklearn.model_selection module in the scikit-learn library. 
Here\u2019s a breakdown of the parameters and how to use them:<\/p>\n<h3>Function Signature<\/h3>\n<pre><code class=\"language-python\">(*arrays: Any, test_size: Any | None = None, train_size: Any | None = None, random_state: Any | None = None, shuffle: bool = True, stratify: Any | None = None) -&gt; list\n<\/code><\/pre>\n<h3>Description<\/h3>\n<ul>\n<li><strong>Purpose:<\/strong> The function splits arrays or matrices into random train and test subsets.<\/li>\n<li><strong>Usage:<\/strong> It wraps input validation and application to the input data in a single call, so splitting (and optionally subsampling) the data becomes a one-liner.<\/li>\n<\/ul>\n<h3>Parameters<\/h3>\n<ul>\n<li><code>*arrays<\/code>: Sequence of indexables with the same length\/shape[0]. This is where you pass the data you want to split; it can be features (X) and labels (y).<\/li>\n<li><code>test_size<\/code>: The proportion of the dataset to include in the test split. Can be a float (e.g., 0.25 for 25% test data), an integer (a number of samples), or None (in which case the value is set to the complement of the train size).<\/li>\n<li><code>train_size<\/code>: The proportion of the dataset to include in the train split. Can be a float, an integer, or None (in which case the value is automatically set to the complement of the test size).<\/li>\n<li><code>random_state<\/code>: Controls the shuffling applied to the data before splitting. Pass an integer for reproducible output across multiple function calls.<\/li>\n<li><code>shuffle<\/code>: Whether or not to shuffle the data before splitting. Default is True.<\/li>\n<li><code>stratify<\/code>: If not None, data is split in a stratified fashion, using this as the class labels.<\/li>\n<\/ul>\n<h2><a id=\"%E8%AF%B7chatgpt%E7%AC%AC%E4%BA%8C%E6%AC%A1%E8%A7%A3%E9%87%8A\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>ChatGPT\u2019s second explanation<\/h2>\n<p>I understand that the concept can be a bit confusing, so let me clarify it further.<\/p>\n<p>The <code>random_state<\/code> parameter in functions like <code>train_test_split<\/code> controls the randomness of the operation. 
It ensures that the random processes involved (such as shuffling the data) are repeatable. By setting <code>random_state<\/code> to a specific value, you ensure that the split will be the same each time you run the code, as long as the data and the code remain unchanged.<\/p>\n<p>Here\u2019s a step-by-step explanation and demonstration:<\/p>\n<ol>\n<li>\n<p><strong>What does <code>random_state<\/code> do?<\/strong><\/p>\n<ul>\n<li><code>random_state<\/code> sets the seed for the random number generator. Different seeds will produce different sequences of random numbers, and therefore different shuffles and splits of the data.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Why does it matter?<\/strong><\/p>\n<ul>\n<li>Using the same <code>random_state<\/code> value allows you to reproduce your results exactly. This is useful for debugging, sharing your results with others, and ensuring consistent results when rerunning experiments.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Different values for <code>random_state<\/code>:<\/strong><\/p>\n<ul>\n<li>Different <code>random_state<\/code> values will initialize the random number generator differently, leading to different shuffles of the data and, consequently, different splits.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h3><a id=\"demonstration\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>Demonstration<\/h3>\n<p>Let\u2019s demonstrate this with a simple example using <code>train_test_split<\/code> with different <code>random_state<\/code> values.<\/p>\n<h4><a id=\"using-random-state-1\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>Using <code>random_state=1<\/code><\/h4>\n<pre><code class=\"language-python\">from sklearn.model_selection import train_test_split\n\n# Sample data\nX = [[1, 2], [3, 4], [5, 6], [7, 8]]\ny = [0, 1, 0, 1]\n\n# Split the data with random_state=1\nX_train_1, X_test_1, y_train_1, y_test_1 = train_test_split(X, y, test_size=0.25, 
random_state=1)\n\nprint(&quot;random_state=1&quot;)\nprint(&quot;X_train:&quot;, X_train_1)\nprint(&quot;X_test:&quot;, X_test_1)\nprint(&quot;y_train:&quot;, y_train_1)\nprint(&quot;y_test:&quot;, y_test_1)\n<\/code><\/pre>\n<h4><a id=\"using-random-state-42\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>Using <code>random_state=42<\/code><\/h4>\n<pre><code class=\"language-python\"># Split the data with random_state=42\nX_train_42, X_test_42, y_train_42, y_test_42 = train_test_split(X, y, test_size=0.25, random_state=42)\n\nprint(&quot;\\nrandom_state=42&quot;)\nprint(&quot;X_train:&quot;, X_train_42)\nprint(&quot;X_test:&quot;, X_test_42)\nprint(&quot;y_train:&quot;, y_train_42)\nprint(&quot;y_test:&quot;, y_test_42)\n<\/code><\/pre>\n<h3><a id=\"expected-output\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>Expected Output<\/h3>\n<h4><a id=\"for-random-state-1\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>For <code>random_state=1<\/code><\/h4>\n<pre><code class=\"language-plaintext\">random_state=1\nX_train: [[5, 6], [1, 2], [7, 8]]\nX_test: [[3, 4]]\ny_train: [0, 0, 1]\ny_test: [1]\n<\/code><\/pre>\n<h4><a id=\"for-random-state-42\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>For <code>random_state=42<\/code><\/h4>\n<pre><code class=\"language-plaintext\">random_state=42\nX_train: [[7, 8], [3, 4], [1, 2]]\nX_test: [[5, 6]]\ny_train: [1, 1, 0]\ny_test: [0]\n<\/code><\/pre>\n<p>As you can see, using <code>random_state=1<\/code> and <code>random_state=42<\/code> results in different splits of the data. 
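<\/p>\n<p>A compact way to double-check this in code (a minimal sketch reusing the toy data from above; the variable names are mine):<\/p>\n<pre><code class=\"language-python\">from sklearn.model_selection import train_test_split\n\n# Same toy data as in the snippets above\nX = [[1, 2], [3, 4], [5, 6], [7, 8]]\ny = [0, 1, 0, 1]\n\n# Two different seeds -&gt; typically two different splits\nsplit_1 = train_test_split(X, y, test_size=0.25, random_state=1)\nsplit_42 = train_test_split(X, y, test_size=0.25, random_state=42)\nprint(&quot;seed 1 == seed 42:&quot;, split_1 == split_42)\n\n# The same seed always reproduces the same split\nprint(&quot;seed 42 reproducible:&quot;,\n      split_42 == train_test_split(X, y, test_size=0.25, random_state=42))\n<\/code><\/pre>\n<p>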
This is because the random number generator is initialized differently with each value, resulting in different shuffling of the data before it is split.<\/p>\n<h3><a id=\"consistency-across-runs\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>Consistency Across Runs<\/h3>\n<p>If you run the above code multiple times with the same <code>random_state<\/code> value, you will get the same split every time. However, changing the <code>random_state<\/code> value will change the split.<\/p>\n<h3><a id=\"running-multiple-times-for-the-same-random-state\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>Running Multiple Times for the Same <code>random_state<\/code><\/h3>\n<p>To verify this, you can run the split multiple times with the same <code>random_state<\/code>:<\/p>\n<pre><code class=\"language-python\">for i in range(3):\n    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)\n    print(f&quot;Run {i+1}:&quot;)\n    print(&quot;X_train:&quot;, X_train)\n    print(&quot;X_test:&quot;, X_test)\n    print(&quot;y_train:&quot;, y_train)\n    print(&quot;y_test:&quot;, y_test)\n    print(&quot;-&quot; * 30)\n<\/code><\/pre>\n<h3><a id=\"expected-output-for-each-run-with-random-state-42\" class=\"anchor\" aria-hidden=\"true\"><span class=\"octicon octicon-link\"><\/span><\/a>Expected Output for Each Run with <code>random_state=42<\/code><\/h3>\n<pre><code class=\"language-plaintext\">Run 1:\nX_train: [[7, 8], [3, 4], [1, 2]]\nX_test: [[5, 6]]\ny_train: [1, 1, 0]\ny_test: [0]\n------------------------------\nRun 2:\nX_train: [[7, 8], [3, 4], [1, 2]]\nX_test: [[5, 6]]\ny_train: [1, 1, 0]\ny_test: [0]\n------------------------------\nRun 3:\nX_train: [[7, 8], [3, 4], [1, 2]]\nX_test: [[5, 6]]\ny_train: [1, 1, 0]\ny_test: [0]\n------------------------------\n<\/code><\/pre>\n<p>This demonstrates that the split is consistent across multiple runs with the 
same <code>random_state<\/code>. Changing the <code>random_state<\/code> value will produce a different but consistent split for that specific value.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>While testing a Chinese word-segmentation model in Colab, the coding environment automatically popped up a hint for the train_test_split function. I wasn\u2019t sure what the hint meant, so I asked ChatGPT to explain it. ChatGPT\u2019s first explanation The image you uploaded shows the function signature and description for a function used to split arrays or matrices into random train and test subsets. This function is likely train_test_split from the sklearn.model_selection module in the scikit-learn library. Here\u2019s a breakdown of the parameters and how to use them: Function Signature &hellip; <a href=\"http:\/\/madapapa.com\/wordpress\/?p=773\" class=\"more-link\">Continue reading <span 
class=\"screen-reader-text\">\u82f1\u6587\u662f\u6253\u5f00\u4e16\u754c\u7684\u94a5\u5319\uff0c\u540c\u5fd7\u5c1a\u9700\u52aa\u529b<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[48,47],"tags":[],"class_list":["post-773","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","category-datascience"],"_links":{"self":[{"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/773","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=773"}],"version-history":[{"count":1,"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/773\/revisions"}],"predecessor-version":[{"id":774,"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/773\/revisions\/774"}],"wp:attachment":[{"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=773"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=773"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/madapapa.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=773"}],"curies":[{"name":"wp","href":"h
ttps:\/\/api.w.org\/{rel}","templated":true}]}}