
Part one: Data Scraping and Cleaning

Scrape the reviews from bestbuy.com, using Selenium in Python

Step one: Import packages & modules
Screen Shot 2021-12-05 at 2.48.02 PM.png
Step two: Set up the Chrome WebDriver

WebDriver is an open-source tool for automated testing of web apps across many browsers. Selenium needs a browser-specific driver (ChromeDriver for Google Chrome) to run scripts in that browser and automate a web application. When my code runs, a Chrome window opens and is controlled by Python.
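
The original code for this step is in the screenshots. As a rough text sketch of what such a setup can look like, assuming Selenium 4 and Chrome are installed (the function name open_chrome is made up for illustration):

```python
def open_chrome(url):
    """Start a Chrome session driven by Selenium and load `url`."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where Selenium is not installed yet.
    from selenium import webdriver

    # Selenium 4.6+ can locate a matching ChromeDriver on its own;
    # older versions need the driver binary available on PATH.
    driver = webdriver.Chrome()
    driver.get(url)  # a Chrome window opens and navigates to the page
    return driver


# Example usage (opens a real browser window):
# driver = open_chrome("https://www.bestbuy.com")
# ...scrape...
# driver.quit()  # always close the browser when finished
```

Calling driver.quit() at the end matters, since each run otherwise leaves a Chrome process behind.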

Screen Shot 2021-12-05 at 2.48.37 PM.png
Screen Shot 2021-12-05 at 2.48.49 PM.png
Step three: Scrape data from bestbuy.com

I first created a new list, "reviews_one_store", to store all the data I scraped from the website. In the for loop, I created a dictionary, "one_review", to store the raw HTML, review text, review date, and star rating of each review, then appended the dictionary to the list. My code scrapes one page at a time and stops when all the pages have been scraped. In total, I collected 5,054 reviews.
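
The loop described here might be sketched roughly as follows. This is not the code from the screenshot: the CSS selectors are hypothetical placeholders (the real ones have to be read from the page source), and the function name, the pause, and the page limit are my own assumptions. Only the names reviews_one_store and one_review come from the description above.

```python
import time

# Hypothetical CSS selectors: placeholders, NOT taken from bestbuy.com.
REVIEW_ITEM = "li.review-item"
REVIEW_TEXT = ".review-text"
REVIEW_DATE = ".review-date"
REVIEW_STAR = ".review-star"
NEXT_PAGE = "a.next-page"

# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def scrape_all_pages(driver, pause=2.0, max_pages=1000):
    """Collect one dictionary per review, page by page, until no next link."""
    reviews_one_store = []
    for _ in range(max_pages):  # safety cap so the loop cannot run forever
        for item in driver.find_elements(CSS, REVIEW_ITEM):
            one_review = {
                "raw_html": item.get_attribute("outerHTML"),
                "text": item.find_element(CSS, REVIEW_TEXT).text,
                "date": item.find_element(CSS, REVIEW_DATE).text,
                "star": item.find_element(CSS, REVIEW_STAR).text,
            }
            reviews_one_store.append(one_review)
        next_links = driver.find_elements(CSS, NEXT_PAGE)
        if not next_links:
            break  # no next-page link: the last page has been scraped
        next_links[0].click()
        time.sleep(pause)  # crude wait for the next page to load
    return reviews_one_store
```

A fixed sleep is the simplest way to wait for the next page; an explicit wait on the review elements would be more robust on a slow connection.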

Screen Shot 2021-12-05 at 2.49.05 PM.png
Step four: Clean data

Scraped data is rarely as tidy as we want, so it needs to be cleaned and organized before any later analysis.
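
The exact cleaning steps are in the screenshot. To give an idea of what cleaning scraped reviews usually involves, here is a small sketch under my own assumptions: the star value arrives as text with the rating as its first number (for example "Rated 5 out of 5 stars"), review text may contain stray whitespace, and empty or duplicated rows should be dropped. The function name and the field names are illustrative.

```python
import re


def clean_reviews(raw_reviews):
    """Normalize whitespace, parse the star rating, drop empty and duplicate rows."""
    cleaned = []
    seen = set()
    for review in raw_reviews:
        # Collapse line breaks and repeated spaces into single spaces.
        text = " ".join(review.get("text", "").split())
        date = review.get("date", "").strip()
        # Assumption: the first number in the star string is the rating.
        match = re.search(r"\d+(?:\.\d+)?", review.get("star", ""))
        star = float(match.group()) if match else None
        key = (text, date, star)
        if not text or key in seen:
            continue  # skip empty reviews and exact repeats
        seen.add(key)
        cleaned.append({"text": text, "date": date, "star": star})
    return cleaned
```

The raw HTML is left out of the cleaned rows on purpose, since it is only useful for re-parsing and makes a spreadsheet hard to read.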

Screen Shot 2021-12-05 at 2.49.13 PM.png
Step five: Export to Excel
Screen Shot 2021-12-05 at 2.49.19 PM.png
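
One common way to do this export is with pandas. The sketch below is an assumption on my part rather than a copy of the screenshot: it expects a list of review dictionaries, needs pandas plus an Excel writer such as openpyxl installed, and uses a made-up file name.

```python
def export_reviews(reviews, path="reviews.xlsx"):
    """Write a list of review dictionaries to an Excel sheet, one row per review."""
    # Imported inside the function; requires pandas and an Excel
    # writer engine such as openpyxl to be installed.
    import pandas as pd

    frame = pd.DataFrame(reviews)  # dictionary keys become column names
    frame.to_excel(path, index=False)  # index=False omits the row-number column
    return frame


# Example usage (hypothetical data and file name):
# export_reviews([{"text": "Great TV", "date": "Dec 1, 2021", "star": 5.0}])
```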
Part of my data looks like:
Screen Shot 2021-12-06 at 1.33.53 PM.png