Twitter User Survey: Demographics and Statistics

Overview
During July and August 2009, Box UK gathered data from a random selection of 83,628 Twitter users. This is an analysis of that data.
Author
Dan Zambonini

Note that some graphs present data for “real” users as well as “all” users in the survey. A crude algorithm was used to separate potential “real” users from spam accounts: “real” users were defined as those that had changed their avatar picture from the default and had at least 5 tweets, 5 friends and 5 followers. Using this liberal metric, 34,334 users were classed as real, about 40% of our total survey, though this undoubtedly still includes many spam and dead accounts.
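The “real” user test above is simple enough to express as a single predicate. The following is a minimal Python sketch, assuming a hypothetical `User` record whose field names are illustrative rather than taken from the Twitter API; it is not the code used for the survey.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical record: field names are illustrative, not Twitter API names.
    default_avatar: bool  # True if the profile picture was never changed
    tweets: int
    friends: int  # people this user follows
    followers: int  # people following this user


def is_real(user: User, minimum: int = 5) -> bool:
    """Crude "real user" test: custom avatar plus at least `minimum`
    tweets, friends and followers (5 in the survey)."""
    return (
        not user.default_avatar
        and user.tweets >= minimum
        and user.friends >= minimum
        and user.followers >= minimum
    )


users = [
    User(default_avatar=False, tweets=120, friends=40, followers=35),
    User(default_avatar=True, tweets=300, friends=20, followers=2),
]
real_users = [u for u in users if is_real(u)]
print(f"{len(real_users)} of {len(users)} users classed as real")
```

Every condition must hold, which is why the metric is liberal: a spam account only needs a custom picture and a handful of tweets, friends and followers to pass.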

Demographics: Sex

Twitter users do not specify their sex on registration. To deduce the sex of each user, we compared their full name (if provided) against US Census data on first names, which we manually updated to add obvious omissions among more recent names. If a name is used by both sexes, we assigned whichever sex the name is more popular with. This resulted in 66% (55,504) of the users in our survey being assigned a sex.

The distribution was the same for “all” accounts and “real” accounts, with 59% female users compared with 41% male users.

[Figure: sex distribution of twitter users]
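The name-matching step can be sketched as a dictionary lookup on the first word of the full name. This is a minimal illustration, assuming a hypothetical `NAME_COUNTS` table filled with made-up counts (the Census lists themselves are not reproduced here); the tie-handling rule is also an assumption, since the post does not cover ties.

```python
from typing import Dict, Optional, Tuple

# Made-up counts for illustration only: first name -> (female, male).
# The survey used US Census first-name lists, which are not reproduced here.
NAME_COUNTS: Dict[str, Tuple[int, int]] = {
    "mary": (900, 4),
    "james": (6, 950),
    "jordan": (120, 310),
}


def guess_sex(full_name: Optional[str]) -> Optional[str]:
    """Assign a sex from the first word of a full name, or None if unknown."""
    if not full_name or not full_name.strip():
        return None  # no name provided
    first = full_name.split()[0].lower()
    counts = NAME_COUNTS.get(first)
    if counts is None:
        return None  # name not in the table: leave unassigned
    female, male = counts
    if female == male:
        return None  # tie-breaking is an assumption; the post doesn't cover it
    # Names used by both sexes go to whichever sex the name is more popular with.
    return "female" if female > male else "male"


print(guess_sex("Jordan Smith"))
```

Users whose names are missing or unknown are left unassigned, which is how only a portion of the sample (66% in the survey) ends up with a sex.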

Demographics: Age

The age of a twitter user is difficult to ascertain without a direct survey. To estimate the distribution of ages, we searched twitter for phrases such as “I am 23”, “Im 23” or “I’m 23” that didn’t contain mentions of “today”, “tomorrow” or “birthday” (to reduce the skew of people announcing birthdays). The rate of tweets mentioning each age can then be used to plot the distribution.
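A rough sketch of the phrase matching described above. The regular expression and the excluded-word list are assumptions: the post names the birthday words, and a later comment from Box UK mentions also excluding weights, distances and times, but the exact search query is not given, so the unit words below are guesses.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# "I am 23", "Im 23", "I'm 23" or "I’m 23", capturing a one- or two-digit age.
AGE_PATTERN = re.compile(r"\bI(?:\s+am|'m|’m|m)\s+(\d{1,2})\b", re.IGNORECASE)

# Words that disqualify a tweet: the birthday words from the post, plus
# guessed unit words standing in for the weight/distance/time exclusions.
EXCLUDED_WORDS = {"today", "tomorrow", "birthday",
                  "stone", "feet", "miles", "seconds", "minutes", "hours"}


def extract_age(tweet: str) -> Optional[int]:
    """Return the age stated in a tweet, or None if absent or excluded."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    if words & EXCLUDED_WORDS:
        return None
    match = AGE_PATTERN.search(tweet)
    return int(match.group(1)) if match else None


def age_distribution(tweets: Iterable[str]) -> Counter:
    """Count how many tweets mention each age."""
    return Counter(age for t in tweets if (age := extract_age(t)) is not None)


sample = ["I'm 23 and loving it", "Im 19 lol", "I am 23 now",
          "I'm 21 today!", "I am 12 stone"]
print(age_distribution(sample))
```

The resulting counts are tweet rates per age, not user counts, so they inherit whatever skew exists in who chooses to tweet their age.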

[Figure: distribution of ages mentioned in tweets]

There is clearly still a skew from birthday tweets (at every 5 and 10 years, plus the legally significant ages of 18 and 21), and we might expect younger users to be keener, or more socially inclined, to announce their age.

However, given the size of the difference, we can reasonably assume that the average twitter user age (the mode, rather than the median) is somewhere between 18 and 21.

Number of tweets

Not much new to report here: a large number of twitter users have never tweeted or have only tweeted a few times (in our full sample, 22% had never tweeted; 58% had tweeted ten times or fewer).

[Figure: number of tweets per user]
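Figures such as “22% had never tweeted” are cumulative shares at a threshold. A minimal sketch of that calculation, run on toy data rather than the survey data:

```python
from typing import Sequence


def share_at_most(counts: Sequence[int], threshold: int) -> float:
    """Fraction of users whose count is at or below `threshold`."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c <= threshold) / len(counts)


tweet_counts = [0, 0, 3, 10, 11, 250, 0, 7]  # toy data, not survey data
print(f"never tweeted: {share_at_most(tweet_counts, 0):.0%}")
print(f"tweeted ten times or fewer: {share_at_most(tweet_counts, 10):.0%}")
```

The same helper applied to follower counts with a threshold of 10 gives the “10 or fewer followers” style of figure.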

Follower and following numbers

For the full sample, the number of followers per user peaks at around 2 to 4, then quickly drops off: 53% have 10 or fewer followers.

The number of people that users follow is more interesting. For “real” users, the distribution is fairly flat, with roughly as many users following 10 people as following 50 or 100 people (and a slight peak at around 30 friends). Remember that “real” users were required to have at least 5 friends, so their graph doesn’t start until this point.

However, when we look at the “full” sample (all users), a massive spike occurs at 20 friends, suggesting that users who follow exactly 20 people are much more likely to be spam accounts.

[Figure: follower and following numbers, “all” vs “real” users]
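The spike at exactly 20 friends was spotted from the graph. One way to flag such spikes automatically is to compare each value’s frequency with that of its immediate neighbours. This is a heuristic sketch, not the method used in the post; the neighbour rule and the default `factor` of 3 are arbitrary assumptions.

```python
from collections import Counter
from typing import Iterable, List


def find_spikes(values: Iterable[int], factor: float = 3.0) -> List[int]:
    """Return values whose frequency is at least `factor` times the average
    frequency of their immediate neighbours (value - 1 and value + 1)."""
    freq = Counter(values)
    spikes = []
    for value, count in sorted(freq.items()):
        # Counter returns 0 for missing neighbours without adding them.
        neighbours = (freq[value - 1] + freq[value + 1]) / 2
        # An empty neighbourhood is treated as frequency 1, so a lone value
        # needs at least `factor` users before it counts as a spike.
        if count >= factor * max(neighbours, 1):
            spikes.append(value)
    return spikes


friend_counts = [19] * 5 + [20] * 40 + [21] * 6 + [30] * 2  # toy data
print(find_spikes(friend_counts))
```

Running the check on “all” users and on “real” users separately, and looking for spikes that appear only in the former, mirrors the comparison made in the graphs.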

Similarly, we can look at the ratio of following to follower numbers. Again, the “full” sample exhibits spikes that “real” users do not, at ratios of 10 and 20 (i.e. these users follow 10 or 20 times as many people as follow them back).

Segregating by sex, we can also see that female users tend to have a very slightly higher average ratio than male users.

[Figure: twitter followers, “real” users]
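A minimal sketch of the following/follower ratio and a histogram of rounded ratios, where spikes at 10 and 20 would show up. The post does not say how users with zero followers were handled; excluding them here is an assumption.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def follow_ratio(following: int, followers: int) -> Optional[float]:
    """Following/follower ratio, or None for users with no followers."""
    if followers == 0:
        return None
    return following / followers


def ratio_histogram(users: Iterable[Tuple[int, int]]) -> Counter:
    """Count users by following/follower ratio, rounded to the nearest integer."""
    counts: Counter = Counter()
    for following, followers in users:
        ratio = follow_ratio(following, followers)
        if ratio is not None:
            counts[round(ratio)] += 1
    return counts


# (following, followers) pairs; toy data, not survey data.
sample = [(200, 20), (400, 20), (100, 10), (45, 40)]
print(ratio_histogram(sample))
```

Splitting the input by assigned sex before calling `ratio_histogram` would give the per-sex comparison described above.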

Miscellaneous

The graph below shows which day of the week twitter users created their account. Mid-week (Wednesday) tends to be busiest, with the weekend being the least popular time to create an account. Note that the data suggests that “all” users (i.e. including spam) have slightly higher account creation activity at the weekend than just “real” users.

[Figure: day of the week on which accounts were created]
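Counting creation days only requires parsing each account’s creation timestamp. This sketch assumes timestamps shaped like `Wed Aug 27 13:08:45 +0000 2008`, the style returned by Twitter’s legacy API; adjust `CREATED_AT_FORMAT` (a hypothetical name) if your data differs. The weekday is taken in the timestamp’s own offset (UTC here), not the user’s local time, so some late-night sign-ups may land on a neighbouring day.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# Assumed timestamp shape, e.g. "Wed Aug 27 13:08:45 +0000 2008".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def creation_weekdays(timestamps: Iterable[str]) -> Counter:
    """Count account creations per day of the week."""
    days: Counter = Counter()
    for ts in timestamps:
        created = datetime.strptime(ts, CREATED_AT_FORMAT)
        days[created.strftime("%A")] += 1
    return days


def created_on_weekend(ts: str) -> bool:
    # weekday(): Monday is 0, so Saturday and Sunday are 5 and 6.
    return datetime.strptime(ts, CREATED_AT_FORMAT).weekday() >= 5


stamps = ["Wed Aug 27 13:08:45 +0000 2008", "Sat Jul 04 09:00:00 +0000 2009"]
print(creation_weekdays(stamps))
```

The `%a`, `%b` and `%A` directives are locale-dependent, so this assumes an English (or default C) locale.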

Finally, we looked at the description/bio length of each user. A clear majority of all users (about 65%) have no description; among “real” users the figure is much lower, at about 35%.

[Figure: users with no description/bio]

The following graph shows the description/bio length for users that have written one. It peaks at around 20 to 40 characters, with a large spike at the 160-character maximum, possibly because users unaware of the limit typed or pasted a long block of text into the field.

[Figure: description/bio length]

Summary

According to our analysis, the ‘average’ Twitter user is a girl in her late teens, who is following 20 to 50 people, and has roughly the same number of people following her back. Her bio/description is quite short, at about 30 characters.

Studying twitter usage and demographics is important for anyone looking to exploit the ever-growing service, whether for business or personal/social purposes.

Unfortunately, some organisations and people take advantage of the openness and simplicity of twitter by trying to cheat the system for ‘quick wins’ rather than participating in the spirit of the platform: spamming, automating and deceiving.

Thankfully, as shown above, we can use this same twitter analysis to help identify the patterns of likely spammers. We now need the organisation behind Twitter to start integrating tools or algorithms to make better use of this type of pattern detection and prevention, stopping spammers before they can aggravate a significant number of real users.

A simple suggestion would be to adopt a similar ‘scoring’ system to that used by email spam detectors. For example, if a user is following exactly 20 people, 10 points; if a user hasn’t changed their avatar, 2 points; no description, 2 points; account created at the weekend, 1 point; a following/follower ratio of more than 10, 5 points.

Each user could then specify in their profile settings a maximum ‘score’ that a potential follower can have (this would default to a large number, such as 100), allowing individuals to set their own tolerance for potential spam followers (and false positives).
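The proposed scoring system and per-user threshold can be sketched in a few lines, using the example weights suggested above. The `Profile` record and its field names are hypothetical, and counting a user with zero followers as exceeding the ratio is an assumption the proposal doesn’t address.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical fields; names are illustrative, not Twitter API names.
    following: int
    followers: int
    default_avatar: bool
    description: str
    created_on_weekend: bool


def spam_score(p: Profile) -> int:
    """Sum the example weights suggested in the post."""
    score = 0
    if p.following == 20:
        score += 10
    if p.default_avatar:
        score += 2
    if not p.description.strip():
        score += 2
    if p.created_on_weekend:
        score += 1
    # Ratio above 10, written without division; a user following people but
    # with zero followers also counts (an assumption the post doesn't cover).
    if p.following > 10 * p.followers:
        score += 5
    return score


def accepts_follower(candidate: Profile, max_score: int = 100) -> bool:
    """True if the candidate's score is within the user's chosen maximum."""
    return spam_score(candidate) <= max_score


bot = Profile(following=20, followers=1, default_avatar=True,
              description="", created_on_weekend=True)
human = Profile(following=45, followers=40, default_avatar=False,
                description="Coffee, code and cats.", created_on_weekend=False)
print(spam_score(bot), spam_score(human))
```

With these example weights the highest possible score is 20, so a default maximum of 100 accepts every follower; filtering only starts once a user lowers their threshold, which keeps false positives opt-in.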

How else might we detect spam accounts? Does this average twitter user feel right to you? Let us know in the comments below.

Comments


  • 08 Sep 2009 15:21

    In a similar fashion to email spam filtering, you could look for certain words and phrases typical of spammers – V1agra, make money fast, etc.

    Also, you could look for a 'spew' of identical or similar tweets which often happens with spam accounts.

    Sometimes the avatar is a generic-looking female face; I wonder if these images are ever repeated on different accounts? You could look for that, and if they do recur, those accounts could be detected as spam.

  • 08 Sep 2009 23:40

    This is quite an interesting analysis. It's so hard to differentiate between spammers and real users, though, as so many real users also use auto-DM software. I used to get DMs on my phone, but now so many people send "I voted this" and "I just took this quiz" messages that it's way too much spam.

  • 09 Sep 2009 12:18

    Interesting analysis, very helpful. Thanks a lot for this good work. Twitter needs better and deeper research and understanding. I am conducting qualitative research in Spain on how Twitter is used.

    If anyone is interested, just take a look (this is not spam):

    http://quor-wom.blogspot.com/2009/06/medios-y-redes-sociales-12.html

    Best regards

  • 09 Sep 2009 16:07 Box UK Staff

    Something I've just checked, that I didn't when I wrote this article, was the Quantcast demographics for the twitter.com website.

    Interestingly, it seems to match (quite accurately) the 'average twitter user' from our research:

    http://www.quantcast.com/twitter.com#demographics

    (Currently shows: 54% Female, 43% in the 18-34 age bracket)

  • 09 Sep 2009 23:00

    By your logic, if I tweeted "I am 12 stone" I would qualify as 12 years old.

  • 10 Sep 2009 16:43

    I'm surprised how many people tweet their age. All this study tells us, though, is that people under 25 are more likely to tweet their age. This is completely independent of the actual number of people of each age.

    The number of followers/following seems more credible and useful.

    On the Quantcast note, I know Alexa has its problems, but it's funny to note that it shows almost the exact opposite (more males, more in the 25-34 bracket).

  • 11 Sep 2009 09:16 Box UK Staff

    The 'age' stats aren't supposed to be accurate, of course, but I still think they show some trend (e.g. are 24-year-olds really less likely to tweet their age than 23-year-olds? Probably not, in which case the difference in the data is significant, and there probably are more 23-year-olds than 24-year-olds on twitter).

  • 11 Sep 2009 09:21 Box UK Staff

    @Pranay I didn't mention it in the blog, but the search query also contained negative specifiers for weights, distances, times (feet, miles, seconds etc) to try to prevent that ("I'm 20 seconds away from exploding!").

  • 13 Nov 2009 18:31

    Your "Tweets per user" chart clearly shows a commonly reported demographic behavior: many, many people try it and abandon it, never fill out their profiles, never tweet, never follow. I suppose these stats might be useful in many ways, but I can't think of any where it's really interesting to include these non-users. Wouldn't it be more useful to remove them? Some cut-off like "at least 5 tweets, or at least 5 follows," perhaps.

    Anecdotally, I have become convinced as well that there's a very strong population partition between casual / social / playful users, and users with commercial (buy or sell) interest. A visit to the Twitter @public_timeline will convince you that the former are in the vast majority, but surely the latter are more interesting for many users of the data.

  • 08 Jan 2010 11:53

    Dan – wow – great to find some considered research into Twitter rather than anecdote and personal opinions.
    I'm trying to do something similar with users of Twitter in Brasil.
    Quick question: how did you create the random sample?
    Is it based on users or accounts? I'm thinking some people could have a number of accounts.
    Thanks again for the data. Can anyone recommend any other segmentation of twitter users?

  • 22 Jan 2010 05:30

    A very interesting post. I was intrigued by the female skew which may be indicative of the long held assumption that females are better at networking.

    GuruBob
    http://www.gurubobsblog.com/

  • 21 Sep 2014 10:12

    Great article! We are linking to this great article on our site.
    Keep up the good writing.

  • 17 Dec 2014 06:07

    Can I just say what a comfort to find someone who actually knows what
    they are talking about on the net. You certainly know how to bring a
    problem to light and make it important. More and more people should
    check this out and understand this side of your story.
    I was surprised you're not more popular given that you surely have the gift.
