Twitter user survey
During July and August 2009, we gathered data for a random selection of 83,628 Twitter users. The following is an analysis of that data.
Note that some graphs present data for "real" users as well as for "all" users in the survey. A crude heuristic was used to separate potential "real" users from spam accounts: a user was classed as "real" if they had changed their avatar from the default picture and had at least 5 tweets, 5 friends and 5 followers. Using this liberal metric, 34,334 users were classed as real (about 40% of the total survey), although this undoubtedly still includes many spam and dead accounts.
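The "real" user filter described above boils down to a single predicate. The following is a minimal sketch, assuming a hypothetical `User` record; the field names (`default_avatar`, `tweets`, `friends`, `followers`) are illustrative and are not the fields used in the original survey.

```python
from dataclasses import dataclass

@dataclass
class User:
    # Hypothetical profile record; field names are illustrative assumptions.
    default_avatar: bool  # True if the user still has the default avatar picture
    tweets: int
    friends: int          # number of people this user follows
    followers: int        # number of people following this user

MIN_COUNT = 5  # the "5+" threshold applied to tweets, friends and followers

def is_real_user(user: User) -> bool:
    """Crude heuristic: custom avatar plus at least 5 tweets, friends and followers."""
    return (
        not user.default_avatar
        and user.tweets >= MIN_COUNT
        and user.friends >= MIN_COUNT
        and user.followers >= MIN_COUNT
    )

# Made-up accounts, purely for illustration.
users = [
    User(default_avatar=False, tweets=120, friends=45, followers=38),
    User(default_avatar=True, tweets=300, friends=20, followers=2),
    User(default_avatar=False, tweets=4, friends=10, followers=10),
]
real = [u for u in users if is_real_user(u)]
print(f"{len(real)} of {len(users)} users classed as real")
```

Only the first account passes: the second keeps the default avatar and the third has fewer than 5 tweets.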
Demographics: Sex
Twitter users do not specify their sex when registering. To infer each user's sex, we compared their first name (where a full name was provided) against US Census data on first names, which we manually updated to cover obvious omissions among more recent names. Where a name is used by both sexes, we chose whichever sex uses it more often. This assigned a sex to 66% (55,504) of the users in our survey.
The distribution was the same for "all" accounts and "real" accounts, with 59% female users compared with 41% male users.
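The name-matching step can be sketched as a lookup against per-sex name popularity counts. The `NAME_COUNTS` table below is a tiny made-up example, not real Census data (in practice it would be loaded from the Census first-name lists), and treating the first word of the full name as the first name is an assumption, as is leaving exact ties unassigned.

```python
from typing import Optional

# Made-up popularity counts per first name: (female_count, male_count).
# Real values would come from the US Census first-name lists.
NAME_COUNTS = {
    "mary": (9000, 20),
    "james": (30, 8000),
    "jordan": (400, 900),  # ambiguous name used by both sexes
}

def infer_sex(full_name: Optional[str]) -> Optional[str]:
    """Return 'female', 'male', or None if the name is missing or unknown."""
    if not full_name or not full_name.strip():
        return None
    first = full_name.split()[0].lower()  # assumption: first word is the first name
    counts = NAME_COUNTS.get(first)
    if counts is None:
        return None
    female, male = counts
    if female == male:
        return None  # assumption: exact ties are left unassigned
    # For names used by both sexes, pick whichever sex uses the name more often.
    return "female" if female > male else "male"

for name in ["Mary Smith", "Jordan Lee", "Zyx Qwerty", ""]:
    print(repr(name), "->", infer_sex(name))
```

Users whose names return `None` are the ones left out of the sex breakdown, which is why only part of the sample is assigned a sex.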

Demographics: Age
The age of a Twitter user is difficult to ascertain without a direct survey. To estimate the age distribution, we searched Twitter for phrases such as "I am 23", "Im 23" or "I'm 23" in tweets that did not also contain "today", "tomorrow" or "birthday" (to reduce the skew from people announcing their birthdays). The number of tweets mentioning each age can then be used to plot the distribution.
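A rough sketch of the phrase matching and tallying described above. It assumes the tweets have already been fetched (the search step is not shown), uses a simple case-insensitive regular expression for the three phrasings, and skips any tweet containing one of the birthday-related words. The sample tweets are made up.

```python
import re
from collections import Counter

# Matches "I am 23", "Im 23" or "I'm 23" (case-insensitive) and captures the age.
AGE_PATTERN = re.compile(r"\bI(?: am|'m|m) (\d{1,2})\b", re.IGNORECASE)
# Tweets containing any of these words are skipped to reduce birthday skew.
EXCLUDE = ("today", "tomorrow", "birthday")

def age_counts(tweets):
    """Count how many tweets state each age, ignoring birthday-style tweets."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        if any(word in lowered for word in EXCLUDE):
            continue
        match = AGE_PATTERN.search(text)
        if match:
            counts[int(match.group(1))] += 1
    return counts

tweets = [
    "I'm 23 and still can't cook",
    "im 19, whatever",
    "I am 23 today!",                  # skipped: contains "today"
    "Happy birthday to me, I'm 30",    # skipped: contains "birthday"
    "I am 19 years old",
]
counts = age_counts(tweets)
print(counts.most_common(1))  # the most frequently stated age (the mode)
```

A pattern this simple will also catch phrases like "I am 23 minutes late", so the counts are only an approximation of stated ages.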
There is clearly still a skew from birthday tweets (peaks every 5 and 10 years, plus the legally significant ages of 18 and 21), and younger users may well be more likely to announce their age.
However, the differences between ages are large enough that we can reasonably conclude the most common Twitter user age (the mode, rather than the median) is somewhere between 18 and 21.
Number of tweets
Not much new to report here: a large number of Twitter users have never tweeted or have tweeted only a few times (in our full sample, 22% had never tweeted and 58% had tweeted ten or fewer times).
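Figures like "22% had never tweeted" are cumulative shares: the fraction of users at or below some count. A minimal sketch with made-up tweet counts; the same helper would give the share of users with 10 or fewer followers.

```python
def share_at_most(values, limit):
    """Fraction of values that are <= limit (e.g. users with 10 or fewer tweets)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v <= limit) / len(values)

# Made-up tweet counts for ten users, purely for illustration.
tweet_counts = [0, 0, 3, 7, 10, 12, 45, 0, 150, 2]
print(f"never tweeted: {share_at_most(tweet_counts, 0):.0%}")
print(f"10 or fewer tweets: {share_at_most(tweet_counts, 10):.0%}")
```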
Follower and following numbers
For the full sample, the number of followers per user peaks at around 2 to 4, then drops off quickly: 53% have 10 or fewer followers.
The number of people that users follow is more interesting. For "real" users the distribution is fairly flat: roughly as many users follow 10 people as follow 50 or 100 (with a slight peak at around 30 friends). Remember that "real" users were required to have at least 5 friends, so the graph doesn't start until that point.
However, when we look at the "full" sample (all users), a massive spike occurs at 20 friends, suggesting that users who follow exactly 20 people are much more likely to be spam accounts.
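Spikes like the one at 20 friends are easy to see on a graph, but they can also be found automatically. One possible approach (not necessarily how the graphs here were analysed) is to flag any value whose frequency is several times the average frequency of its neighbours. The `factor` and `window` parameters below are arbitrary assumptions, and the friend counts are made up.

```python
from collections import Counter

def find_spikes(values, factor=3.0, window=2):
    """Return values whose frequency is at least `factor` times the average
    frequency of the neighbouring values within +/- `window`."""
    hist = Counter(values)
    spikes = []
    for k in sorted(hist):
        neighbours = [hist.get(k + d, 0) for d in range(-window, window + 1) if d != 0]
        baseline = sum(neighbours) / len(neighbours)
        # Isolated values with no neighbours at all are ignored as noise.
        if baseline > 0 and hist[k] >= factor * baseline:
            spikes.append(k)
    return spikes

# Made-up friend counts with an unusual cluster at exactly 20.
friend_counts = [18] * 5 + [19] * 6 + [20] * 40 + [21] * 5 + [22] * 4
print(find_spikes(friend_counts))
```

Running the same check on the "real" subset and on the full sample, and comparing the results, would highlight values that only spike once likely spam accounts are included.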
Similarly, we can look at the ratio of following to follower numbers. Again, the "full" sample exhibits spikes that "real" users do not, at ratios of 10 and 20 (i.e. these users follow 10 or 20 times as many people as follow them back).
Splitting the data by sex, we can also see that female users tend to have a very slightly higher average ratio than male users.
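A minimal sketch of the ratio calculation and the per-sex average. It assumes each user is a hypothetical `(sex, friends, followers)` tuple, and it skips users with no followers, since their ratio is undefined; the original analysis may have handled that case differently. The sample data is made up and deliberately shows no difference between the sexes.

```python
from statistics import mean

def follow_ratio(friends, followers):
    """Following-to-follower ratio; None when the user has no followers."""
    if followers == 0:
        return None  # assumption: undefined ratios are skipped, not treated as infinite
    return friends / followers

def average_ratio_by_sex(users):
    """users: iterable of (sex, friends, followers) tuples -> {sex: mean ratio}."""
    ratios = {}
    for sex, friends, followers in users:
        r = follow_ratio(friends, followers)
        if r is not None:
            ratios.setdefault(sex, []).append(r)
    return {sex: mean(values) for sex, values in ratios.items()}

# Made-up sample: (sex, friends, followers)
sample = [
    ("female", 60, 40),
    ("female", 30, 30),
    ("male", 50, 50),
    ("male", 45, 30),
    ("female", 25, 0),  # no followers: ratio undefined, skipped
]
print(average_ratio_by_sex(sample))
```

A user following 200 people with 10 followers would have a ratio of 20, which is the kind of value that shows up as a spike in the full sample.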
Miscellaneous
The graph below shows which day of the week twitter users created their account. Mid-week (Wednesday) tends to be busiest, with the weekend being the least popular time to create an account. Note that the data suggests that "all" users (i.e. including spam) have slightly higher account creation activity at the weekend than just "real" users.
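Counting account creations per weekday only needs the account's creation timestamp. This sketch assumes timestamps in the form "Wed Aug 27 13:08:45 +0000 2008" (an assumption about the data format); the sample values are made up. Weekdays are taken in the timestamp's own offset (UTC here), so accounts created late at night in other time zones may land on a neighbouring day.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout, e.g. "Wed Aug 27 13:08:45 +0000 2008".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def creation_weekdays(created_at_values):
    """Count account creations per weekday, using each timestamp's UTC offset."""
    counts = Counter()
    for value in created_at_values:
        created = datetime.strptime(value, CREATED_AT_FORMAT)
        counts[WEEKDAYS[created.weekday()]] += 1
    return counts

sample = [
    "Wed Aug 27 13:08:45 +0000 2008",
    "Sat Jul 11 09:00:00 +0000 2009",
    "Wed Jul 15 18:30:00 +0000 2009",
]
counts = creation_weekdays(sample)
for day in WEEKDAYS:
    print(day, counts[day])
```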
Finally, we looked at the description/bio length of each user. Most users (about 65% of all users) have no description; among "real" users, the figure is about 35%.
The following graph shows the description/bio length for users that have one. This peaks at around 20 to 40 characters, with a large spike at the 160-character maximum, possibly because users did not realise there was a limit and typed or pasted a large block of text into the field.
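Both bio statistics (the share with no description, and the length distribution including the pile-up at the limit) can be computed in one pass. A minimal sketch: it assumes a missing description arrives as `None` or an empty string and treats whitespace-only bios as empty; the 160 limit comes from the text above, and the sample bios are made up.

```python
from collections import Counter

BIO_LIMIT = 160  # maximum description length mentioned above

def bio_length_stats(descriptions):
    """Share of users with no bio, a histogram of non-empty bio lengths,
    and how many bios reach the length limit."""
    # Missing (None) or whitespace-only descriptions count as "no description".
    lengths = [len(d) if d and d.strip() else 0 for d in descriptions]
    non_empty = [n for n in lengths if n > 0]
    return {
        "empty_share": (len(lengths) - len(non_empty)) / len(lengths) if lengths else 0.0,
        "histogram": Counter(non_empty),
        "at_limit": sum(1 for n in non_empty if n >= BIO_LIMIT),
    }

bios = ["", None, "Coffee lover. Runner.", "x" * 160, "Dad, developer, dog person"]
stats = bio_length_stats(bios)
print(f"no description: {stats['empty_share']:.0%}")
print(f"bios at the {BIO_LIMIT}-character limit: {stats['at_limit']}")
```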
Summary
According to our analysis, the 'average' Twitter user is a girl in her late teens, who is following 20 to 50 people, and has roughly the same number of people following her back. Her bio/description is quite short, at about 30 characters.
Studying Twitter usage and demographics is important for anyone looking to exploit the ever-growing service, whether for business or for personal and social purposes.
Unfortunately, some organisations and people take advantage of the openness and simplicity of Twitter by trying to cheat the system and find 'quick wins' instead of participating in the spirit of the platform: spamming, automating and deceiving.
Thankfully, as shown above, we can use this same twitter analysis to help identify the patterns of likely spammers. We now need the organisation behind Twitter to start integrating tools or algorithms to make better use of this type of pattern detection and prevention, stopping spammers before they can aggravate a significant number of real users.
A simple suggestion would be to adopt a similar 'scoring' system to that used by email spam detectors. For example, if a user is following exactly 20 people, 10 points; if a user hasn't changed their avatar, 2 points; no description, 2 points; account created at the weekend, 1 point; a following/follower ratio of more than 10, 5 points.
Each user could then specify in their profile settings a maximum 'score' that a potential follower can have (defaulting to a large number, such as 100), allowing individuals to set their own tolerance for potential spam followers (and for false positives).
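The suggested scoring system and per-user threshold could look something like the sketch below. The point values are the example ones from the text; the `Profile` fields are hypothetical, and treating a user with zero followers as having one (so the ratio stays defined) is an assumption rather than part of the proposal.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    # Hypothetical profile fields; names are illustrative assumptions.
    friends: int
    followers: int
    default_avatar: bool
    description: str
    created_weekday: int  # 0 = Monday ... 6 = Sunday, as in datetime.weekday()

def spam_score(p: Profile) -> int:
    """Add up the example points suggested above."""
    score = 0
    if p.friends == 20:
        score += 10  # following exactly 20 people
    if p.default_avatar:
        score += 2   # avatar never changed
    if not p.description.strip():
        score += 2   # no description
    if p.created_weekday >= 5:
        score += 1   # account created on a Saturday or Sunday
    # Assumption: zero followers is treated as one so the ratio is always defined.
    if p.friends / max(p.followers, 1) > 10:
        score += 5   # following/follower ratio above 10
    return score

DEFAULT_MAX_SCORE = 100  # permissive default, as suggested above

def accept_follower(p: Profile, max_score: int = DEFAULT_MAX_SCORE) -> bool:
    """Allow the follow only if the follower's score is within the user's limit."""
    return spam_score(p) <= max_score

suspect = Profile(friends=20, followers=1, default_avatar=True,
                  description="", created_weekday=6)
regular = Profile(friends=35, followers=30, default_avatar=False,
                  description="Photographer and cyclist", created_weekday=2)
print(spam_score(suspect), spam_score(regular))
print(accept_follower(suspect), accept_follower(suspect, max_score=15))
```

With the default limit of 100 nobody is filtered; a user who lowers their limit to 15 would block the first made-up account (score 20) but still accept the second (score 0).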
How else might we detect spam accounts? Does this average twitter user feel right to you? Let us know in the comments below.


Comments
Carl Morris
8 Sep 2009 15:21
Also, you could look for a 'spew' of identical or similar tweets which often happens with spam accounts.
Sometimes the avatar is a generic-looking female face; I wonder if these images are ever repeated on different accounts? You could look for that, and if they do recur, they could be detected as spam.
Amy, Coach
8 Sep 2009 23:40
Miguel
9 Sep 2009 12:18
If someone is interested, just look at (this is not spam)
http://quor-wom.blogspot.com/2009/06/medios-y-redes-sociales-12.html
Best regards
Dan Zambonini
9 Sep 2009 16:07
Interestingly, it seems to match (quite accurately) the 'average twitter user' from our research:
http://www.quantcast.com/twitter.com#demographics
(Currently shows: 54% Female, 43% in the 18-34 age bracket)
Pranay Manocha
9 Sep 2009 23:00
Doug_Remington.
10 Sep 2009 16:43
The number of followers/following seems more credible and useful.
On the Quantcast note: I know Alexa has its problems, but it's funny to note that it shows almost the exact opposite (more males, more in the 25-34 bracket).
Dan Zambonini
11 Sep 2009 09:16
Dan Zambonini
11 Sep 2009 09:21
website design
19 Sep 2009 05:44
Jack Repenning
13 Nov 2009 18:31
Anecdotally, I have become convinced as well that there's a very strong population partition between casual / social / playful users, and users with commercial (buy or sell) interest. A visit to the Twitter @public_timeline will convince you that the former are in the vast majority, but surely the latter are more interesting for many users of the data.
tim Lucas
8 Jan 2010 11:53
I'm trying to do something similar with users of Twitter in Brasil.
Quick question: how did you create the random sample?
Is it based on users or accounts? I'm thinking some people could have a number of accounts.
Thanks again for the data. Is there any other segmentation of Twitter users anyone can recommend to me?
Robert Somerville
22 Jan 2010 05:30
GuruBob
http://www.gurubobsblog.com/
fillout survey
14 Sep 2010 19:47
rtyecript
23 Aug 2011 01:34