Jul 12, 2015
TnT Editor

1.7 Billion Reddit Comments Available for Download Right Now (Eight Years of Reddit Comments for Research Use)

Reddit is an entertainment, social networking, and news website that lets registered users post links to content on the web. Other users can then vote those submissions up or down and discuss them, which determines each post’s position on the site’s pages. Content is organized into sub-communities, or subreddits, covering topics that include news, gaming, movies, music, books, programming, science, fitness, food, photosharing, politics, and more.


If you are a researcher or data lover, you may often find yourself searching for posts or comments by a particular user in a subreddit. But Reddit has millions of users and thousands of subreddits catering to nearly every topic imaginable, so it can be hard to find the community that fits your research.

But now, thanks to user “Stuck_In_the_Matrix”, anyone can download a dataset of 1.7 billion comments. Stuck_In_the_Matrix set out to collect every comment posted from October 2007 through May 2015. Gathering them took roughly 20 million Reddit API calls. Approximately 350,000 comments couldn’t be collected due to issues with Reddit’s API, possibly because they sat in private subreddits or had been removed.

The dataset is made up of JSON objects saved as plain text. Each object includes the author’s username, the comment text, its score, the subreddit it was posted in, its position in the comment tree, and more.
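To give a feel for working with records like these, here is a minimal Python sketch that counts comments per subreddit. It assumes the file stores one JSON object per line, and the field names used (`author`, `body`, `score`, `subreddit`, `parent_id`) are assumptions chosen for illustration, so check them against the actual dump. The sample records are invented.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_comments(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one comment dict per non-empty line (assumes one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.loads(line)


def comments_per_subreddit(lines: Iterable[str]) -> Counter:
    """Count comments by subreddit; 'subreddit' is an assumed field name."""
    counts = Counter()
    for comment in iter_comments(lines):
        counts[comment.get("subreddit", "<unknown>")] += 1
    return counts


# Invented sample records, not taken from the real dataset.
sample = [
    '{"author": "alice", "body": "Nice post", "score": 3, "subreddit": "science", "parent_id": "t3_abc"}',
    '{"author": "bob", "body": "I disagree", "score": -1, "subreddit": "science", "parent_id": "t1_def"}',
    '',
    '{"author": "carol", "body": "Great game", "score": 7, "subreddit": "gaming", "parent_id": "t3_ghi"}',
]

counts = comments_per_subreddit(sample)
print(counts.most_common())
```

Because `iter_comments` accepts any iterable of lines, an open file handle can be passed in place of `sample`, so the whole dataset never has to fit in memory.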

The dataset is over 1TB uncompressed, or about 250GB compressed, and is distributed as a torrent by “Stuck_In_the_Matrix”. Besides that, you can head to the original Reddit post to download a much smaller one-month sample, in case you only want the data you really need.
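As a rough back-of-the-envelope check on those figures, the short Python sketch below estimates the average size of one comment record and the compression ratio. It assumes decimal units (1TB = 10^12 bytes) and treats "over 1TB" as exactly 1TB, so the results are approximations only. The function name is hypothetical.

```python
def dataset_size_estimates(total_comments: int,
                           uncompressed_bytes: int,
                           compressed_bytes: int) -> dict:
    """Return rough per-comment size and compression ratio for the dataset."""
    return {
        "avg_bytes_per_comment": uncompressed_bytes / total_comments,
        "compression_ratio": uncompressed_bytes / compressed_bytes,
    }


# Figures from the article, with decimal units assumed.
estimates = dataset_size_estimates(
    total_comments=1_700_000_000,          # 1.7 billion comments
    uncompressed_bytes=1_000_000_000_000,  # about 1TB
    compressed_bytes=250_000_000_000,      # about 250GB
)

print(f"Average record size: {estimates['avg_bytes_per_comment']:.0f} bytes")
print(f"Compression ratio: {estimates['compression_ratio']:.1f}x")
```

Under these assumptions each comment record averages a little under 600 bytes, and compression shrinks the data to about a quarter of its original size.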

With 1.7 billion comments, the dataset is best suited to major research projects that have the storage and computing resources to process it.
