What was the Impact of TedxBrum on Social Networks on Twitter?

This was a spur-of-the-moment project, which I started on the morning of TedxBrum. The idea I had overnight was to look at the impact that TedxBrum had on social networks in Birmingham: did it connect people?

Who is included? What was measured?

As there was no list of which Twitter users were attending TedxBrum, all the work on who followed who is based on Twitter users who had tweeted the #TedxBrum hashtag before about 11:30am on the day. I wrote code to monitor who they added to their follow list during the day & over the next few days. So the data basically consists of a list of those who tweeted about TedxBrum in advance, and the accounts they followed. I have also collected data about the actual tweets, which may form part of a subsequent analysis if there is sufficient interest (& I have time!)
All the data was publicly available from Twitter, i.e. anyone can see who follows who.
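As a rough illustration of the monitoring idea, new follows can be found by comparing two snapshots of each user's follow list. This is a minimal sketch in Python rather than the R code actually used; the function name, the snapshot structure and the user names are all made up for the example.

```python
def new_follows(before, after):
    """Return sorted (follower, followed) pairs present in `after` but not in `before`.

    Both arguments map a user name to the set of accounts that user follows.
    (Hypothetical structure; the original R code is not shown in this post.)
    """
    edges = []
    for user, following_now in after.items():
        previously = before.get(user, set())
        # Set difference: accounts followed now that were not followed earlier.
        for account in following_now - previously:
            edges.append((user, account))
    return sorted(edges)


# Invented snapshots of two users' follow lists, taken at two points in time.
snapshot_morning = {"alice": {"bob"}, "carol": {"bob", "dave"}}
snapshot_next_day = {
    "alice": {"bob", "speaker1"},
    "carol": {"bob", "dave", "speaker1", "erin"},
}

follow_edges = new_follows(snapshot_morning, snapshot_next_day)
print(follow_edges)
```

Each pair reads as "follower, followed", which is the shape of data the charts and network views further down would need.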

iPhone / No Java?

If you can't see the interactive charts below, or nothing happens when you click on them, you probably don't have Java available in your browser. I've included links to snapshots throughout so you can still see the visualisations.

What was Said?

Let's start with a view of what people said about TedxBrum on Twitter. To create this I retrieved tweets mentioning "TedxBrum", and analysed the word frequencies. I then used Wordle to create a word cloud. I counted words such as "@flutebox" and "flutebox" separately in order to make references to Twitter users easier to spot. The Wordle website has a full-size, printable copy. Top users mentioned are @flutebox, @annekadeva, @acuitydesign, @macbirmingham, @drbob82, @helgahenry, @asmallfurrybear, @suhaav, @jawilletts, @mystrogen, and @deanthepainter.
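The word counting step might look something like the sketch below. It is written in Python for illustration (the analysis itself used R and the tm package), and the tokenising rule and the two sample tweets are assumptions invented for the example. The point is simply that keeping the "@" attached to a word makes "@flutebox" and "flutebox" count as different words.

```python
import re
from collections import Counter

# An optional leading "@" stays attached to the word that follows it,
# so mentions are counted separately from plain words.
# Note: "#" is not kept, so "#TedxBrum" is counted as "tedxbrum".
TOKEN = re.compile(r"@?\w+")


def word_frequencies(tweets):
    """Count lower-cased words across a list of tweet texts."""
    counts = Counter()
    for tweet in tweets:
        counts.update(TOKEN.findall(tweet.lower()))
    return counts


# Invented sample tweets, not real data from the analysis.
sample_tweets = [
    "Loved @flutebox at #TedxBrum",
    "Flutebox was brilliant #TedxBrum",
]

freq = word_frequencies(sample_tweets)
print(freq.most_common(3))
```

A word cloud tool then only needs the word and its count.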

Who was Followed?

On the day and up until Sunday morning

This chart shows all follows made by users who had tweeted about TedxBrum in advance. The time period covered is approximately 11:30am on the morning of TedxBrum until 9:30am the next day. You will need to "click to interact" to see the live visualisation. If you're on a device which doesn't support Java, you can see a snapshot of the visualisation.
The top people followed, in this sample, were @Flutebox (performed at TedxBrum), @wgx (spoke on privacy), @LCBenit (performed) & @annekadeva (TedxBrum founder). That's hardly surprising, but if we didn't see the people who were on stage appearing, we'd be worried about the quality of our data & analysis.
As the chart above was a bit busy, I cut the data down to only those users who received 2 or more follows. This gives more space for labels too. (snapshot)
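Cutting the data down to users with 2 or more follows amounts to counting how often each account appears as the followed party and applying a threshold. Here is a minimal Python sketch of that step; the function name and the example pairs are hypothetical, and the original filtering was done in R.

```python
from collections import Counter


def followed_at_least(edges, minimum=2):
    """Map each followed account to its follow count, keeping counts >= minimum.

    `edges` is an iterable of (follower, followed) pairs.
    """
    counts = Counter(followed for _follower, followed in edges)
    return {account: n for account, n in counts.items() if n >= minimum}


# Invented follow pairs for illustration.
example_edges = [
    ("alice", "speaker1"),
    ("carol", "speaker1"),
    ("carol", "erin"),
]

popular = followed_at_least(example_edges)
print(popular)
```

Raising `minimum` thins the chart further, at the cost of hiding accounts that picked up only a few new followers.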

Impact Over the Weekend

Next I looked at the time period up until about 8am on Monday morning (snapshot).


Who did the Following?

Now let's examine who did the following (snapshot).

Impact Over the Weekend

Again, let's examine the time period up until about 8am on Monday morning (snapshot).

Here we see users like @ukgav, who I know (from his tweets) didn't have Twitter access during TedxBrum, catching up with the Twitter stream. We also see other users such as @MyJQ, @Jacattell and @24_Carrots who have followed a lot during this period, but are their follows related to TedxBrum, or are they working hard on other projects too? The social network visualisation in the next section will give us some more clues about this.

Social Network Visualisation

The classic way to visualise this kind of data is as a graph. Graphs can be complex, just like the real-world data they represent, and are best viewed interactively. Unfortunately, I wasn't able to embed an interactive network viewer in this page, but here are some screenshots created using Gephi. Node size is proportional to the number of new followers.
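To get follow data into a graph tool, one common route is a plain edge list with one row per follow. The Python sketch below writes such a list as CSV with "Source" and "Target" column headers, which is the layout I would expect Gephi's spreadsheet import to accept for edges; treat that, the function name and the sample rows as assumptions, since the export step actually used here (from R and igraph) is not shown. Once loaded, node size could be driven by in-degree, i.e. the number of new followers.

```python
import csv
import io


def edges_to_csv(edges):
    """Render (follower, followed) pairs as CSV text with Source/Target headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])  # follower -> followed
    writer.writerows(edges)
    return buffer.getvalue()


# Invented follow pairs for illustration.
sample_edges = [("alice", "speaker1"), ("carol", "speaker1")]

csv_text = edges_to_csv(sample_edges)
print(csv_text)
```

The direction of each row matters: the follower is the source and the followed account is the target, so that in-degree counts new followers.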

On the day and up until Sunday morning

Follows over the same time period as above: 11:30am Saturday to 9:30am Sunday.
(click image to see it full size)
Network of New Followers during TedxBrum
And here is what it looks like when I select some of the most followed users & their followers:
(again, please click image to see it full size)
New Followers Network - Most Followed Nodes & Followers Selected

Impact Over the Weekend

Next I looked at the time period up until about 8am on Monday morning. We can spot some users (bottom right) who have followed many people seemingly unrelated to TedxBrum.
(again, please click image to see it full size)

Discussion of the Data

You might have thought of some limitations in the data & analysis. I've laid out some below.
The sample of twitter users:
  • Is not just those who attended TedxBrum
  • Does not include all those who attended TedxBrum
  • Solution for future projects: gather a full attendee list
The sample of actions:
  • Is not just those actions caused by TedxBrum (how can we know the user's intent, or why they followed?)
  • Deals only with new "follows", but then we are looking at the impact which TedxBrum had
  • Starts 1.5 hours after TedxBrum opened
  • Ends arbitrarily at 9:30am the day after / 8am 2 days after. Further data is available but not analysed here.
  • Only covers follows and does not make use of the tweets themselves
  • Does not incorporate information on how many follows a user typically makes on a weekend without TedxBrum!
What are the privacy implications?
  • The raw data is available to anyone from Twitter, either via their API or by looking at profiles.
  • This is the kind of analysis that many companies are doing
  • This type of analysis can create knowledge about groups of people & behaviours which aren't evident in the raw data.

Further Work

  • Look at Tweet contents, co-occurrences of words, who tweets about what, extract the subject & sentiment (positive / negative) of tweets
  • Look at conversations i.e. Tweets between users interested in #TedxBrum
  • Visualise the whole network of followers before & after TedxBrum
  • An earlier sample (e.g. including all those following @TedxBrum) would allow a larger number of users to be sampled, and a quantifiable impact to be computed.
  • Calculate quantitative figures for impact.

Technical Details

I used R to extract & process the data, in particular the twitteR package to interact with Twitter and the igraph package to represent the network of followers. I used the tm package to analyse tweets. The bubble charts are from ManyEyes & the network visualisations were produced using Gephi.


You can contact me @AndyPryke on Twitter, or andy@the-data-mine.co.uk
I'm always looking to work on interesting problems & data, including Twitter & social network analysis but also prediction of customer behaviour (e.g. who will be likely to buy tickets for what show, or buy a particular product, or be a repeat customer). I can also help with visualisation & analysis of just about any data you can come up with.