
Diamond prediction project

The Diamonds dataset originates from a 2017 snapshot of the Tiffany & Co. price list. We use the features provided in the dataset to predict diamond prices with several regression models: linear regression, lasso regression, ridge regression, decision tree regression, and random forest. Finally, we create unseen data and use it to make predictions.

Part one: Data Exploration

  • A brief introduction of the dataset

  • Data description

  • Data visualization

    • Distribution

    • Correlation heatmap

    • Pair plot

  • Data processing

    • Get dummies

    • Dimension reduction

    • Data splitting

Part two: Model Building

  • Linear regression

  • Lasso regression

  • Ridge regression

  • Decision tree

  • Random forest
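
The data processing steps and the five models listed in Parts one and two can be sketched end to end as follows. This is a minimal illustration, not the project's actual code: the DataFrame is a small synthetic stand-in, the column names (`carat`, `depth`, `table`, `cut`, `price`) are assumptions, and PCA is used as one possible dimension reduction technique since the page does not say which method was applied.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the diamonds data (all column names are assumptions).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "carat": rng.uniform(0.2, 2.5, n),
    "depth": rng.normal(61.7, 1.4, n),
    "table": rng.normal(57.5, 2.2, n),
    "cut": rng.choice(["Fair", "Good", "Ideal"], n),
})
df["price"] = 4000 * df["carat"] + rng.normal(0, 300, n)

# Get dummies: one-hot encode the categorical column.
X = pd.get_dummies(df.drop(columns="price"), columns=["cut"],
                   drop_first=True, dtype=float)
y = df["price"]

# Data splitting: hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Dimension reduction (assumed here to be PCA), fitted on the training rows only.
pca = PCA(n_components=3)
X_train_p = pca.fit_transform(X_train)
X_test_p = pca.transform(X_test)

# The five regression models named on this page, with mostly default settings.
models = {
    "Linear regression": LinearRegression(),
    "Lasso regression": Lasso(alpha=1.0),
    "Ridge regression": Ridge(alpha=1.0),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model and record its R^2 score on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train_p, y_train)
    scores[name] = model.score(X_test_p, y_test)

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

In this sketch the split happens before the PCA fit so that the held-out rows do not influence the projection; the original project may have ordered these steps differently.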

Part three: Unseen Data

I created unseen data by adding Gaussian noise, then used it to test whether the models work and to evaluate which model performs best.
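
One way to generate such unseen data is sketched below. The details are assumptions made for illustration: the helper `make_unseen` and its `noise_scale` parameter are hypothetical, the features are synthetic, the noise is added to held-out feature rows while their original prices are kept as targets, and RMSE is used as the comparison metric. The page does not state how the noise was scaled or which metric was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def make_unseen(X, noise_scale=0.05, seed=0):
    """Return a copy of X with zero-mean Gaussian noise added to every column.

    The noise standard deviation for each column is noise_scale times that
    column's own standard deviation (a hypothetical choice).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    return X + rng.normal(0.0, 1.0, X.shape) * noise_scale * X.std(axis=0)


# Synthetic numeric features and prices standing in for the real data.
rng = np.random.default_rng(1)
X = rng.uniform(0.2, 2.5, size=(400, 2))
y = 4000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 200, 400)

# Train on the first 300 rows; perturb the remaining 100 to act as unseen data.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]
X_unseen = make_unseen(X_test)

models = {
    "Linear regression": LinearRegression(),
    "Random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Compare the models by RMSE on the noisy rows (lower is better).
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_unseen)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

best = min(rmse, key=rmse.get)
print(rmse, "best:", best)
```

Scaling the noise by each column's standard deviation keeps the perturbation comparable across features with different units; a fixed standard deviation would be an equally valid alternative.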
