TECHNOLOGY INSPIRATION
Technology-People-Innovation

Using Google Apps Script for Web Scraping on Reddit

Mr Reddy is working on a project that involves mining Reddit data.
The project fetches a listing of all posts in different subreddits and copies the data to a Google spreadsheet for further analysis.
Reddit, unlike most websites, allows web scraping as long as the crawler script makes no more than one request every two seconds to the Reddit servers (see rules). You don't even need a developer account or an API key to scrape Reddit.
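
The two-second rule can be enforced with a small delay calculation. The snippet below is only a minimal sketch and not part of the original scraper script: the names `MIN_INTERVAL_MS` and `msToWait` are made up for illustration. Inside Google Apps Script, the returned value could be passed to `Utilities.sleep()` before each request.

```javascript
// Minimum gap between two requests, per the rule quoted above (2 seconds).
// Hypothetical name, chosen for this sketch.
const MIN_INTERVAL_MS = 2000;

// Returns how many milliseconds to pause before sending the next request.
// lastRequestMs: timestamp of the previous request, or null if none was sent yet.
// nowMs: the current timestamp, e.g. Date.now().
function msToWait(lastRequestMs, nowMs) {
  if (lastRequestMs === null) {
    return 0; // nothing sent yet, no need to wait
  }
  const elapsed = nowMs - lastRequestMs;
  return Math.max(0, MIN_INTERVAL_MS - elapsed);
}
```

Keeping the calculation separate from the actual pause makes it easy to check the logic without waiting in real time.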
Popular tools such as wget, SiteSucker (Mac) and HTTrack Website Copier (Windows) can download an entire website for offline use, but they are of little use for scraping Reddit because the site doesn't use page numbers and the content of its pages changes constantly. A post may be listed on the first page of a subreddit and find itself on the third page a moment later as other posts are voted to the top.
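
Instead of page numbers, Reddit listings are paged with a cursor: each response carries an `after` value that identifies where the next page starts. The following is a rough sketch of that idea, assuming the public JSON listing endpoint (`/r/<subreddit>/new.json`). The helper names `buildListingUrl` and `nextCursor` are hypothetical and do not come from the original script.

```javascript
// Builds the URL for one page of a subreddit listing.
// subreddit: e.g. "LifeProTips"
// after: cursor taken from the previous page, or null for the first page
function buildListingUrl(subreddit, after) {
  let url = "https://www.reddit.com/r/" + encodeURIComponent(subreddit) +
    "/new.json?limit=100";
  if (after) {
    url += "&after=" + encodeURIComponent(after);
  }
  return url;
}

// Reads the cursor for the next page from a parsed listing response.
// Returns null when the listing has no further pages.
function nextCursor(listing) {
  return (listing && listing.data && listing.data.after) || null;
}
```

A scraper would request the first page with `after` set to null, then keep calling `buildListingUrl` with the value returned by `nextCursor` until that value is null.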

While there exist PHP and Python libraries for scraping Reddit, they can be too complicated for non-programmers. Fortunately, Google Apps Script comes to the rescue. Here's how to pull data from any subreddit automatically.

Procedure 

Open the Google Sheet and choose File -> Make a copy to copy the sheet to your Google Drive.

Go to Tools -> Script editor and copy-paste the Reddit Scraper Script. You can change "LifeProTips" to any other subreddit name.
While in the script editor, choose Run -> Run and authorize the script.
That's it. The script will run in the background, automatically pulling content from Reddit into the Google spreadsheet. It stops automatically once all the posts* of that subreddit have been fetched.


[*] Every subreddit displays a maximum of 999 posts; you can't go beyond that number even when browsing a subreddit manually.
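
For readers curious about what such a script might look like, here is a minimal sketch. It is not the Reddit Scraper Script referred to in the steps above, and it assumes the public JSON listing endpoint and the post fields `title`, `permalink`, `score` and `created_utc`. The function names `listingToRows` and `scrapeSubreddit` are invented for this example. `scrapeSubreddit` only runs inside Google Apps Script, where `UrlFetchApp`, `SpreadsheetApp` and `Utilities` are available.

```javascript
// Converts a parsed Reddit listing into spreadsheet rows:
// [title, link, score, creation time as an ISO string].
function listingToRows(listing) {
  return listing.data.children.map(function (child) {
    const post = child.data;
    return [
      post.title,
      "https://www.reddit.com" + post.permalink,
      post.score,
      new Date(post.created_utc * 1000).toISOString()
    ];
  });
}

// Fetches the listing page by page and appends every post to the active sheet.
// Change "LifeProTips" to any other subreddit name.
function scrapeSubreddit() {
  const sheet = SpreadsheetApp.getActiveSheet();
  let after = null;
  do {
    const url = "https://www.reddit.com/r/LifeProTips/new.json?limit=100" +
      (after ? "&after=" + encodeURIComponent(after) : "");
    const response = UrlFetchApp.fetch(url);
    const listing = JSON.parse(response.getContentText());
    listingToRows(listing).forEach(function (row) {
      sheet.appendRow(row);
    });
    after = listing.data.after; // null once the last page is reached
    Utilities.sleep(2000); // stay within one request every two seconds
  } while (after);
}
```

The loop ends on its own when Reddit stops returning a cursor, which matches the behaviour described above of stopping once all available posts have been fetched.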
