Saving a product name and price from Amazon is straightforward: copy the name and the price of the product and store them wherever you want. But what if you have hundreds or even thousands of product names and prices to save? Will the same trick work? Not for us!

In this article, we will show you how to quickly build a simple scraper with two Ruby libraries that extracts a product name and price from Amazon and can be applied to hundreds of Amazon products.

Crawling Amazon with ProxyCrawl

Let’s create a file named amazon_scraper.rb, which will contain our Ruby code.

Let’s also install our two requirements by running the following at the command prompt:

  • gem install proxycrawl
  • gem install nokogiri

Now it’s time to start coding. We will write our code in the amazon_scraper.rb file, starting by loading the HTML page of one Amazon product URL using the ProxyCrawl Ruby library. We need to initialize the library and create an API instance with our token. For Amazon, we should use the normal token; make sure to replace YOUR_TOKEN with the actual token from your account.

require 'proxycrawl'

api = ProxyCrawl::API.new(token: 'YOUR_TOKEN')
url = 'https://www.amazon.com/dp/B071JNRK1V'
html = api.get(url)

We are now loading the URL, but we are not doing anything with the result yet. It’s time to start scraping the name and the price of the product.

Scraping Amazon data

We will use the Ruby Nokogiri library that we installed earlier to parse the resulting HTML and extract only the name and price of the Amazon product.

Let’s write the code that parses the HTML body and scrapes the product name and price.

require 'nokogiri'

doc = Nokogiri::HTML(html.body)
product_name = doc.at('#productTitle').text.strip
product_price = doc.at('#priceblock_ourprice').text.strip
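
One thing to keep in mind: Nokogiri's at returns nil when no element matches the selector, so calling .text on the result raises an error if a page does not contain #productTitle or #priceblock_ourprice. Below is a minimal sketch of a nil-safe lookup. The helper name text_at is our own (hypothetical, not part of Nokogiri), and it assumes doc is the Nokogiri document built above.

```ruby
# Return the stripped text of the first node matching `selector`,
# or nil when the page has no such element.
# `text_at` is a hypothetical helper name. It only relies on `doc.at`,
# which returns nil when nothing matches the selector.
def text_at(doc, selector)
  doc.at(selector)&.text&.strip
end

# Usage with the `doc` from the snippet above:
#   product_name  = text_at(doc, '#productTitle')
#   product_price = text_at(doc, '#priceblock_ourprice') || 'N/A'
```

With this in place, a product page without a price element yields nil (or the 'N/A' fallback) instead of stopping the script.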

The full code should look like the following:

require 'proxycrawl'
require 'nokogiri'

api = ProxyCrawl::API.new(token: 'YOUR_TOKEN')

url = 'https://www.amazon.com/dp/B071JNRK1V'
html = api.get(url)

doc = Nokogiri::HTML(html.body)
product_name = doc.at('#productTitle').text.strip
product_price = doc.at('#priceblock_ourprice').text.strip

puts "Amazon Product URL: #{url}"
puts "Amazon Product Name: #{product_name}"
puts "Amazon Product Price: #{product_price}"

Running the script with ruby amazon_scraper.rb should now print the scraped Amazon product URL, name, and price at the command prompt.

The code is ready, and you can quickly scrape an Amazon product to get its name and price. The results are printed to the console, and from there you can save them in a database, write them to a file, and so on. That is up to you.
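
To apply the same steps to many products and keep the results, one option is to loop over a list of product IDs and write the rows to a CSV file. The following is only a sketch under a few assumptions: the helper names product_url, scrape_products and products_to_csv are our own, the URL pattern is the /dp/ form used in the example above, and the fetch-and-parse step is passed in as a block so that it can reuse the ProxyCrawl and Nokogiri code from this article.

```ruby
require 'csv'

# Build a product URL from a product ID, following the /dp/<ID> pattern
# of the example URL in this article (hypothetical helper).
def product_url(asin)
  "https://www.amazon.com/dp/#{asin}"
end

# Call the given block once per product URL. The block must return
# [name, price]; the result is one hash per product.
def scrape_products(asins)
  asins.map do |asin|
    url = product_url(asin)
    name, price = yield(url)
    { url: url, name: name, price: price }
  end
end

# Turn the scraped rows into CSV text with a header line.
def products_to_csv(rows)
  CSV.generate do |csv|
    csv << %w[url name price]
    rows.each { |row| csv << [row[:url], row[:name], row[:price]] }
  end
end

# Usage with the `api` object and selectors from the full code above:
#   rows = scrape_products(%w[B071JNRK1V]) do |url|
#     doc = Nokogiri::HTML(api.get(url).body)
#     [doc.at('#productTitle')&.text&.strip,
#      doc.at('#priceblock_ourprice')&.text&.strip]
#   end
#   File.write('products.csv', products_to_csv(rows))
```

Passing the scraping step as a block keeps the loop and the CSV output independent of how each page is fetched, so the same code works if the selectors or the fetching logic change.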

Happy crawling!