Unseen data

I treated the test data as unseen data and ran it through each trained model. I then added each model's predicted values to the test dataset.
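A minimal sketch of this step, assuming scikit-learn regressors and a pandas DataFrame. The synthetic data, the column names (`x1`, `x2`, `x3`, `target`), and the `pred_` prefix are placeholders for illustration, not the ones used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): the real features and target are not shown here.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
data["target"] = 2.0 * data["x1"] - data["x2"] + rng.normal(scale=0.5, size=200)

features = ["x1", "x2", "x3"]
train, test = train_test_split(data, test_size=0.25, random_state=0)
test = test.copy()  # work on a copy so new columns do not touch the original frame

models = {
    "linear": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

for name, model in models.items():
    # Fit on the training split only; the test split stays unseen until prediction.
    model.fit(train[features], train["target"])
    # Store one prediction column per model next to the true target values.
    test[f"pred_{name}"] = model.predict(test[features])

print(test.head())
```

Keeping the predictions in the same table as the true values makes it easy to compare the models row by row or to compute error metrics afterwards.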

I used random forest for dimensionality reduction. The linear models show almost no overfitting, since they are not very complex. Both the decision tree and the random forest overfit slightly. The overfitting is more apparent for the random forest, possibly because it is the more complex model. Even so, I think random forest is the best model for prediction, based on its predictive performance and its RMSE.
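One common way to check for overfitting like this is to compare each model's RMSE on the training data with its RMSE on the unseen test data; a large gap suggests overfitting. The sketch below assumes scikit-learn and uses synthetic data from `make_regression`, so the numbers it prints are not the project's results. The `rmse` helper and the `results` dictionary are names introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Synthetic regression data (assumption): stands in for the project's dataset.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_rmse = rmse(y_train, model.predict(X_train))
    test_rmse = rmse(y_test, model.predict(X_test))
    # A positive gap means the model does worse on unseen data than on training data.
    results[name] = {
        "train_rmse": train_rmse,
        "test_rmse": test_rmse,
        "gap": test_rmse - train_rmse,
    }

for name, r in results.items():
    print(f"{name:>14}: train={r['train_rmse']:.2f} "
          f"test={r['test_rmse']:.2f} gap={r['gap']:.2f}")
```

The gap measures overfitting, while the test RMSE alone measures predictive quality, so a model can overfit more than another and still be the better predictor on unseen data.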
