In this case study, we gather data on the pricing, customer ratings, and sales ranks of a broad range of mobile phones sold on amazon.com in order to find out which price segments the leading mobile phone producers cover. Amazon sells a broad range of products, which allows us to get a comprehensive summary of the products offered by each of the big mobile phone producers.
The case study makes use of the packages RCurl, XML, and stringr, and it features search page manipulation, link extraction, and page downloads using the RCurl curl handle. After reading the case study, you should be able to search for information in source code and to apply XPath to real-life problems. Furthermore, we create a SQLite database within this case study to store the data in a consistent way and to make it reusable for the next case study.
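To give a first impression of what applying XPath with these packages looks like, here is a minimal sketch that runs offline. The HTML fragment, its class names, and the product identifiers are invented for illustration and do not reflect Amazon's actual markup; only the XML and stringr functions are real.

```r
library(XML)
library(stringr)

# Invented HTML fragment standing in for a downloaded search results page
# (assumption: the real page structure differs and has to be inspected first)
html <- '<html><body>
  <div class="result"><a href="/phone-a/dp/B000000001">Phone A</a><span class="price">$199.99</span></div>
  <div class="result"><a href="/phone-b/dp/B000000002">Phone B</a><span class="price">$349.00</span></div>
  <div class="ad"><a href="/case/dp/B000000003">Phone case</a></div>
</body></html>'

# Parse the string into a document tree that can be queried with XPath
doc <- htmlParse(html, asText = TRUE)

# Select only links and prices inside 'result' containers, skipping the ad
links  <- xpathSApply(doc, "//div[@class='result']/a", xmlGetAttr, "href")
titles <- xpathSApply(doc, "//div[@class='result']/a", xmlValue)
prices <- xpathSApply(doc, "//div[@class='result']/span[@class='price']", xmlValue)

# Clean up the extracted strings with stringr
prices_num <- as.numeric(str_replace(prices, fixed("$"), ""))
ids <- str_extract(links, "B[0-9]{9}")

phones <- data.frame(id = ids, title = titles, price = prices_num,
                     stringsAsFactors = FALSE)
```

The same pattern, parsing a page, selecting nodes with an XPath expression, and cleaning the result with string functions, carries over to real pages once the relevant elements have been located in the source code.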
Amazon sells all kinds of products. Our first task is therefore to restrict the product search to certain categories and specific producers. Furthermore, we have to find a way to exclude accessories and used phones.
Let us have a look at the Amazon website: www.amazon.com. Check out the search bar at the top of the page (see also Figure 16.1). In addition to typing in search keywords, we can select the department in which we want to search. Do the following: