4.3 Train/Test Split

To train and test a model, we first need to perform a train/test split, dividing the data we have into two groups: training data and test data. Luckily, we don't have to split the data by hand, because Scikit-Learn provides a handy function for this: train_test_split. To use it, we first need to import it:


from sklearn.model_selection import train_test_split

From there, we pass in our input data (the data the model uses to generate predictions), which we'll name X; our output data (the actual values the model is trying to predict), which we'll name y; and the proportion of the data that we want to set aside for the test set (this should generally be around 0.3, or 30% of the entire dataset):


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

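The function returns four objects, which we store in X_train, X_test, y_train, and y_test. If X and y are pandas objects, the four pieces come back as pandas objects too. The names are fairly self-explanatory: X_train and y_train hold the inputs and outputs used to train the model, while X_test and y_test hold the inputs and outputs held back to test it.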
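To make the split concrete, here is a minimal sketch on a made-up dataset. The toy arrays X and y (10 rows, built with NumPy) are assumptions for illustration only and are not part of the course data. The random_state argument is also an addition here; it simply makes the shuffle repeatable so the example gives the same split every time it runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 input features each
X = np.arange(20).reshape(10, 2)
# Hypothetical outputs: one value per sample
y = np.arange(10)

# Hold out 30% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 7 of the 10 rows go to training and 3 go to testing
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

Each row of X stays paired with its matching value in y, so X_train lines up with y_train and X_test lines up with y_test.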


Copyright © 2021 Code 4 Tomorrow. All rights reserved. The code in this course is licensed under the MIT License. If you would like to use content from any of our courses, you must obtain our explicit written permission and provide credit. Please contact classes@code4tomorrow.org for inquiries.