Prepare Data for Training

Courses

Introduction to Deep Learning with PyTorch

If the data you feed into your model is of low quality, the output of the model will also be of low quality, an idea captured by the aphorism “Garbage in, garbage out.” This unit teaches techniques for preparing data so that a neural network trains effectively. You will explore data scaling techniques such as normalization and standardization, learn when to apply each based on the data distribution, and see how to handle categorical variables through one-hot encoding and label encoding. You will also learn how to split a dataset into training, validation, and test subsets so that you can tune hyperparameters such as learning rate and hidden layer size, detect overfitting, and evaluate how well the model generalizes.

Sessions

Scale Data to Prepare it for Model Training

Scale numeric data with normalization or standardization, and encode categorical data.
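The techniques named in this session can be sketched in a few lines. The example below is a minimal illustration in plain Python (standard library only) so the arithmetic stays visible; it is not the course's own code, and the helper names `min_max_normalize`, `standardize`, `label_encode`, and `one_hot_encode`, as well as the sample values, are hypothetical.

```python
import statistics

# All function names and sample data here are illustrative assumptions,
# not part of the course material.

def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: shift to zero mean, scale to unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def label_encode(categories):
    """Label encoding: map each distinct category to an integer index."""
    vocab = sorted(set(categories))
    index = {category: i for i, category in enumerate(vocab)}
    return [index[c] for c in categories], vocab

def one_hot_encode(categories):
    """One-hot encoding: one 0/1 column per distinct category."""
    labels, vocab = label_encode(categories)
    rows = [[1 if i == label else 0 for i in range(len(vocab))] for label in labels]
    return rows, vocab

ages = [20.0, 30.0, 40.0]
colors = ["red", "green", "red"]

normalized_ages = min_max_normalize(ages)      # [0.0, 0.5, 1.0]
standardized_ages = standardize(ages)          # mean 0, standard deviation 1
color_labels, color_vocab = label_encode(colors)
color_one_hot, _ = one_hot_encode(colors)
```

Normalization suits features with a known, bounded range, while standardization is a common choice when values are roughly bell-shaped or contain outliers that would squash a min-max scale. Label encoding implies an ordering between categories, so one-hot encoding is usually the safer choice for unordered categories.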

Split Data into Train, Validation, and Test Subsets

Split a dataset into training, validation, and test subsets to tune hyperparameters and test how well the trained model generalizes.
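A three-way split can be sketched as follows. This is a minimal standard-library illustration, not the course's implementation; the function name `split_dataset`, its parameters `val_fraction`, `test_fraction`, and `seed`, and the 70/15/15 proportions are assumptions chosen for the example.

```python
import random

def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle samples and split them into train, validation, and test lists.

    The name, parameters, and default fractions are illustrative assumptions.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    # Shuffle indices with a fixed seed so the split is reproducible.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)

    n_test = round(len(samples) * test_fraction)
    n_val = round(len(samples) * val_fraction)

    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]  # everything left over is training data

    train = [samples[i] for i in train_idx]
    val = [samples[i] for i in val_idx]
    test = [samples[i] for i in test_idx]
    return train, val, test

data = list(range(100))
train_set, val_set, test_set = split_dataset(data)
```

The validation subset is used to compare hyperparameter settings during development, and the test subset is held back until the end to estimate generalization. Any scaling statistics (minimum, maximum, mean, standard deviation) would typically be computed on the training subset only and then applied to the other two, so that no information from the held-out data leaks into training. In PyTorch, `torch.utils.data.random_split` provides a comparable split for `Dataset` objects.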