Scraping App Store Reviews with Python
How to use the itunes-app-scraper and app-store-scraper to build datasets of app information and reviews
In a previous post, I laid out how you can use the google-play-scraper to scrape both app details (description, price, current version, etc.) and app reviews. This post will focus on using Python code to do the same thing, but for the App Store.
Whereas the google-play-scraper provides functions for scraping app info and reviews in one convenient package, you’ll need to use two separate libraries to accomplish this for the App Store.
The itunes-app-scraper provides a couple of methods that can be used to obtain app IDs, and additional methods to actually scrape data about those apps. With this scraper you can obtain several app details, such as the app description, price, genre, and current version.
The app-store-scraper provides a method for scraping user reviews of apps in the App Store.
I’ll cover how I prefer to use each to make sure I’m getting the data I want for the apps I’m interested in.
Getting Started Scraping the App Store
Step 1: Obtain App Names and IDs
There is one piece of information required to scrape app info or reviews, and that’s the app name. There is a second piece of information that I suggest you treat as required, because things sometimes go a little wonky when you trust the scrapers to retrieve it for you automatically: the app ID.
Both pieces of information can be found in the URL of the app’s page in the App Store. As shown in the image below, the app name can be found between “app/” and “/id”.
The app ID immediately follows “/id” and ends the URL.
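If you have a long list of apps, copying names and IDs out of URLs by hand gets tedious. The sketch below pulls both out of a URL using the pattern just described. It assumes the usual apps.apple.com path layout, and the URL in the example is made up, so check the pattern against your own links before relying on it.

```python
import re
from urllib.parse import urlparse

# Matches ".../app/<app-name>/id<digits>" in an App Store URL path.
_APP_PATH = re.compile(r"/app/(?P<name>[^/]+)/id(?P<id>\d+)")


def parse_app_store_url(url: str) -> tuple[str, int]:
    """Return (app_name, app_id) from an App Store app URL.

    The name is the path segment between "app/" and "/id"; the ID is the run
    of digits right after "/id". Any query string (e.g. "?platform=iphone")
    is dropped by urlparse before matching.
    """
    path = urlparse(url).path
    match = _APP_PATH.search(path)
    if match is None:
        raise ValueError(f"Not an App Store app URL: {url!r}")
    return match.group("name"), int(match.group("id"))


# Hypothetical URL, for illustration only.
print(parse_app_store_url("https://apps.apple.com/us/app/example-app/id123456789"))
```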
My newest project is focused on mental health, mindfulness, and self-care apps. As I researched apps, I kept track of a variety of details in a spreadsheet, which was a natural place to store the name and ID for each app. With a spreadsheet like this one, we can easily read the file into a Pandas DataFrame to get lists of app names and IDs to iterate over.
If you’re scraping reviews for multiple apps, I would also suggest keeping track of a rough estimate of the number of ratings for each app. Scraping App Store reviews takes a while, especially compared to the google-play-scraper. Knowing roughly how many ratings each app has lets you decide how to chunk your list of apps for scraping: if you know you’ll need to pause soon, you’ll know not to start scraping an app with millions of reviews just yet.
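One way to act on those rating counts is to split the app list into batches before you start. This is a minimal sketch that assumes you’ve noted an approximate rating count per app; the AppRow fields, the budget value, and the greedy in-order grouping are illustrative choices, not part of either scraper.

```python
from typing import NamedTuple


class AppRow(NamedTuple):
    name: str
    app_id: int
    approx_ratings: int  # rough rating count noted from the App Store page


def batch_by_ratings(apps, max_ratings_per_batch):
    """Split apps into consecutive batches whose rating totals stay under a budget.

    Order is preserved. A new batch starts when adding the next app would push
    the running total past max_ratings_per_batch; an app that exceeds the
    budget on its own ends up alone in its batch.
    """
    batches, current, total = [], [], 0
    for app in apps:
        if current and total + app.approx_ratings > max_ratings_per_batch:
            batches.append(current)
            current, total = [], 0
        current.append(app)
        total += app.approx_ratings
    if current:
        batches.append(current)
    return batches


# Made-up apps and rating counts, for illustration only.
apps = [
    AppRow("app-a", 1, 5_000),
    AppRow("app-b", 2, 20_000),
    AppRow("app-c", 3, 1_500_000),
    AppRow("app-d", 4, 8_000),
]
for batch in batch_by_ratings(apps, max_ratings_per_batch=50_000):
    print([app.name for app in batch])
```

With a 50,000-rating budget this prints ['app-a', 'app-b'], then ['app-c'] on its own, then ['app-d'], so the app with over a million ratings gets a session to itself.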
Note that the rough number of ratings for each app will definitely exceed the number of reviews you get, even when you scrape every available review, because not everyone who rates an app takes the time to leave a review.
Step 2: Installs and Imports
Here I’ll import everything we’ll need. If you’d like to see an example of storing app data in a MongoDB collection using PyMongo, refer to my earlier post about using the google-play-scraper. For this post, we’ll simply write each batch to a csv file.
You should pip install the itunes-app-scraper and app-store-scraper libraries, along with Pandas, as necessary to be able to import the following:
Scraping App Info
The stage is mostly set for us to start scraping and storing. We just need our list of app IDs. I downloaded a version of my spreadsheet as a csv file, so I’ll read that in as a Pandas DataFrame.
And now we can easily get lists of app names and IDs to loop through while scraping:
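With Pandas this step is roughly df = pd.read_csv(...) followed by df["app_name"].tolist() and df["app_id"].tolist(). For readers who want to see it without extra dependencies, here is a minimal sketch using the standard library’s csv module instead. The column names app_name and app_id are hypothetical, so swap in whatever your spreadsheet uses.

```python
import csv
import io


def load_app_lists(csv_file):
    """Read a spreadsheet export and return parallel lists of names and IDs.

    Assumes (hypothetically) columns named "app_name" and "app_id".
    """
    app_names, app_ids = [], []
    for row in csv.DictReader(csv_file):
        app_names.append(row["app_name"].strip())
        app_ids.append(int(row["app_id"]))  # IDs are numeric in App Store URLs
    return app_names, app_ids


# In practice you would pass open("apps.csv", newline=""); StringIO stands in here.
sample = io.StringIO("app_name,app_id\nexample-app,123456789\nanother-app,987654321\n")
app_names, app_ids = load_app_lists(sample)
print(app_names, app_ids)
```

Keeping two parallel lists matches how the rest of the post iterates over names and IDs together.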
For now, to get the app info using the itunes-app-scraper, we will only be using the app IDs. The library provides a method for retrieving app IDs based on the app name from the URL (get_app_ids_for_query), but I’ve found that it doesn’t reliably return what I’m asking for, or it returns extra IDs. So rather than bother with that, we’ll instantiate the AppStoreScraper class and feed our list of app IDs directly into its get_multiple_app_details method.
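For a sense of what a batch of IDs looks like at the HTTP level: Apple’s public iTunes Lookup API (https://itunes.apple.com/lookup) accepts several comma-separated IDs in a single id parameter. This sketch only builds such URLs. It is not a claim about how itunes-app-scraper constructs its own requests, and the chunk size of 100 is an arbitrary assumption.

```python
from urllib.parse import urlencode

LOOKUP_ENDPOINT = "https://itunes.apple.com/lookup"


def lookup_urls(app_ids, country="us", chunk_size=100):
    """Build iTunes Lookup API URLs, packing up to chunk_size IDs per request.

    chunk_size is a hypothetical value chosen for illustration.
    """
    urls = []
    for start in range(0, len(app_ids), chunk_size):
        chunk = app_ids[start:start + chunk_size]
        query = urlencode(
            {"id": ",".join(str(i) for i in chunk), "country": country},
            safe=",",  # keep the commas between IDs unescaped
        )
        urls.append(f"{LOOKUP_ENDPOINT}?{query}")
    return urls


sample_urls = lookup_urls([123456789, 987654321, 555], chunk_size=2)
print(sample_urls)
```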
The last line prints a dictionary containing various information about our first app. Even though it’s pretty-printed, it’s still not very nice to look at:
So let’s make our list of dictionaries into a Pandas DataFrame and write that to a csv file using the following code:
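In Pandas this is essentially pd.DataFrame(list_of_dicts).to_csv("app_info.csv", index=False), where list_of_dicts is a placeholder name. To see what that kind of conversion has to handle when dictionaries don’t all share the same keys, here is a standard-library sketch. In this version, columns are the union of keys in first-seen order and missing values are left blank. The sample records are made up; real app details carry many more fields.

```python
import csv
import os
import tempfile


def write_dicts_to_csv(rows, path):
    """Write a list of dicts to CSV and return the column names used.

    Columns are the union of all keys in first-seen order; a key missing from
    a row is written as an empty cell. Non-string values (numbers, lists) are
    written using their string form.
    """
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames


# Two made-up records whose keys only partly overlap.
sample = [
    {"name": "example-app", "price": 0.0},
    {"name": "another-app", "genres": ["Health & Fitness"]},
]
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "app_info.csv")
    columns = write_dicts_to_csv(sample, out_path)
    with open(out_path, newline="", encoding="utf-8") as f:
        round_trip = list(csv.DictReader(f))

print(columns)
```

Values read back from the file are strings, so numeric columns need converting again if you reload the CSV without Pandas.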
Scraping App Reviews
Now we’ll use the app-store-scraper, from which we imported the AppStore class. Once instantiated, the AppStore class has a review method that enables us to scrape reviews. To instantiate it, you need to provide a country code and the app name. The app ID is optional, but I definitely recommend supplying it directly to the class, otherwise you might not get exactly what you’re expecting.
The review method has three parameters. The first, how_many, is simply how many reviews you want to scrape in total. If no argument is provided, all reviews will be scraped. The review method scrapes reviews in batches of 20, and this can’t be changed.
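Because the batch size is fixed, you can estimate how many requests a scrape will take before you start. This small sketch assumes every fetched review is kept. With a date filter, or for an app that has fewer reviews than how_many, the real count will differ.

```python
REVIEWS_PER_REQUEST = 20  # fixed batch size of the review method, per the text


def requests_needed(how_many):
    """Number of 20-review requests needed to collect how_many reviews (ceiling division)."""
    return (how_many + REVIEWS_PER_REQUEST - 1) // REVIEWS_PER_REQUEST


for target in (20, 45, 10_000):
    print(target, "reviews ->", requests_needed(target), "requests")
```

Integer ceiling division avoids floating-point rounding for very large targets. For example, 45 reviews still costs 3 requests, and 10,000 reviews costs 500.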
The second parameter, after, lets you filter out older reviews: you provide a datetime object, and only reviews written after that date are kept. This will not actually reduce the number of calls you make, because App Store reviews can’t be sorted by date. So review will still fetch every review, but it won’t store a review that doesn’t meet the criterion.
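To make the distinction concrete, here is a sketch of the idea rather than the library’s code. Every review in a batch has already been downloaded, and the date check only decides which ones are stored. The "date" key and the sample reviews are hypothetical.

```python
from datetime import datetime


def filter_fetched_batch(batch, after=None):
    """Return (stored, fetched_count) for one already-downloaded batch of reviews.

    Nothing here saves a request: every review in `batch` was fetched, and the
    date check only decides which ones get kept. Reviews are assumed to store
    a datetime under the hypothetical key "date".
    """
    stored = [review for review in batch if after is None or review["date"] > after]
    return stored, len(batch)


cutoff = datetime(2020, 2, 28)
# Made-up batch, deliberately out of date order (reviews can't be sorted by date).
batch = [
    {"date": datetime(2019, 12, 1), "rating": 4},
    {"date": datetime(2020, 5, 3), "rating": 5},
    {"date": datetime(2018, 7, 9), "rating": 2},
    {"date": datetime(2021, 1, 15), "rating": 3},
]
stored, fetched = filter_fetched_batch(batch, after=cutoff)
print(f"fetched {fetched}, stored {len(stored)}")
```

All four reviews cost a download, but only the two dated after the cutoff survive. That is why a date filter doesn’t shorten the scrape.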
The last parameter, sleep, is optional, but I highly suggest using it to build in sleep time between calls, especially if you plan to scrape a lot of reviews. Just slow it down.
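A quick back-of-the-envelope calculation shows what that slowdown costs. This sketch assumes one pause per request, which is one reading of “between calls.” Under that assumption, collecting 10,000 reviews at 20 per request means 500 requests, and pauses of 20 to 25 seconds add up to roughly 2.8 to 3.5 hours of sleeping for a single app.

```python
def sleep_time_range(num_requests, min_sleep, max_sleep):
    """Return (low, high) total seconds spent sleeping across num_requests calls.

    Assumes exactly one pause per request, which is an approximation.
    """
    return num_requests * min_sleep, num_requests * max_sleep


requests = 10_000 // 20  # 10,000 reviews in batches of 20 -> 500 requests
low, high = sleep_time_range(requests, 20, 25)
print(f"{low / 3600:.1f} to {high / 3600:.1f} hours of sleeping alone")
```

Network time comes on top of this, which is another reason to batch apps by their rating counts.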
After the review method has completed its job, we can access all the reviews through the reviews attribute and find out how many reviews were scraped for that particular app via the reviews_count attribute.
Assuming you want reviews for multiple apps, you have to instantiate AppStore for each app. So we’ll need to iterate through both our lists of app names and IDs to accomplish this.
The following block of code loops through all the apps in our lists. For each app, it instantiates the AppStore class and calls review to scrape reviews. We’ve limited the number of reviews collected to 10,000 and constrained collection to those reviews written after February 28, 2020. We’ve also built in a sleep interval between each call lasting 20 to 25 seconds.
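The shape of that loop can be sketched without contacting the App Store at all. In this hypothetical version, fetch_reviews stands in for the app-store-scraper calls. The docstring shows roughly what it would wrap, and the country code "us" is an assumption. random.randint(20, 25) picks a whole number of seconds to sleep for each app. This is a sketch of the structure, not the code used for this post.

```python
import random
from datetime import datetime


def scrape_all(app_names, app_ids, fetch_reviews, how_many=10_000,
               after=datetime(2020, 2, 28), rng=random):
    """Collect reviews for each (name, ID) pair and return {app_name: reviews}.

    fetch_reviews is a hypothetical stand-in. With app-store-scraper it would
    wrap something like:
        app = AppStore(country="us", app_name=app_name, app_id=app_id)
        app.review(how_many=how_many, after=after, sleep=sleep)
        return app.reviews
    """
    results = {}
    for app_name, app_id in zip(app_names, app_ids):  # walk both lists in step
        sleep = rng.randint(20, 25)  # 20 to 25 whole seconds, inclusive
        results[app_name] = fetch_reviews(
            app_name, app_id, how_many=how_many, after=after, sleep=sleep
        )
    return results


def fake_fetch(app_name, app_id, how_many, after, sleep):
    """Offline stand-in that returns one placeholder review per app."""
    return [{"app_id": app_id, "sleep_used": sleep}]


out = scrape_all(["example-app", "another-app"], [123456789, 987654321], fake_fetch)
print({name: len(reviews) for name, reviews in out.items()})
```

Passing the fetcher in as an argument keeps the loop testable offline; swapping fake_fetch for a real wrapper is the only change needed to run it against the App Store.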
After scraping the reviews, we also add two keys to each review dictionary, one for the app name and the other for the app ID. This way, once we concatenate all our separate csv files, we have an easy way to identify which app each review belongs to. Finally, we convert the list of dictionaries to a Pandas DataFrame and write it to a csv file that includes the app name in the file name.
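Here is a standard-library sketch of the tagging and writing steps, plus reading the per-app files back into a single list. In Pandas, pd.concat over the individual DataFrames does the same job. The file naming pattern <app_name>_reviews.csv and the sample reviews are assumptions made for illustration.

```python
import csv
import glob
import os
import tempfile


def save_app_reviews(reviews, app_name, app_id, out_dir):
    """Tag each review with its app, then write them to <out_dir>/<app_name>_reviews.csv."""
    for review in reviews:
        review["app_name"] = app_name  # lets us tell apps apart after concatenation
        review["app_id"] = app_id
    # Columns: union of all keys, in first-seen order.
    fieldnames = list(dict.fromkeys(key for review in reviews for key in review))
    path = os.path.join(out_dir, f"{app_name}_reviews.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(reviews)
    return path


def load_all_reviews(out_dir):
    """Read every *_reviews.csv in out_dir back into a single list of dicts."""
    combined = []
    for path in sorted(glob.glob(os.path.join(out_dir, "*_reviews.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            combined.extend(csv.DictReader(f))
    return combined


with tempfile.TemporaryDirectory() as tmp:
    # Made-up reviews; real ones carry more fields.
    save_app_reviews([{"rating": 5, "title": "Calming"}], "example-app", 123456789, tmp)
    save_app_reviews([{"rating": 2, "title": "Crashes"}], "another-app", 987654321, tmp)
    all_reviews = load_all_reviews(tmp)

for row in all_reviews:
    print(row["app_name"], row["app_id"], row["rating"])
```

Because the files are read in sorted order, another-app comes first. As with any CSV round trip, every value comes back as a string.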
Executing the above block of code also produces output that keeps you updated on how scraping is progressing (the pink output is printed automatically by the review method):
Summary
We covered:
- How to get set up to scrape information and reviews from the App Store, including how to find the necessary app names and IDs
- How to use the itunes-app-scraper library to get app info data
- How to use the app-store-scraper library to scrape app reviews
- How to convert scraped app data into Pandas DataFrames and write to csv files
I hope you found this informative and are able to apply something you learned to your own work. Thanks for reading!