Usernames notoriously leak information about users. The numbers in each usernames are generally tied to the birth year, account creation year, or some favorite number. Initially using OKCupid ages and usernames, I found a great correlation between the user age and birth years for two and four digit numbers.
Taking this insight to the Reddit BigQuery dataset, I estimated the mean age of each subreddit. I show them in the subreddit clustering work done by rhiever which connects subreddits depending on the number of overlapping user comments.
The ages group up well with the content in each. Subreddits with median ages less than 20 tend to be found in /r/teenagers, /r/pokemon, /r/leagueoflegends – as expected. 30 year old or greater median age subreddits tend to be focued on parenting (/r/parenting, /r/dadit) or more adult like past times (/r/woodworking, /r/gardening).
Interestingly, cities show a large difference in their mean age using this technique. Some make sense – /r/Orlando being the oldest for example. Further validation would be to compare these values to the population weighted average in each city to understand how each subreddit is skewed relative to the US census data (baseline).
You can find more details in my jupyter notebooks on github.