Categories
Geeky/Programming

Create a Word Cloud From Your Twitter Feed

I love playing with data. My data makes it even more fun. Wordle has been around for a long time, and so has Twitter (in Internet years anyways). I have always been fascinated by word clouds and visualizing text patterns, etc.

I figured that hey, there has got to be some analyzer for your twitter stream, and I am sure there are a ton, but I didn’t stumble upon any with some easy Googling, so I did it the hard way.

First goal? Get your Twitter feed and/or data somehow. Multiple ways to do this, but I stumbled upon a pretty cool site. http://tweetbook.in that let’s you create an eBook from your Twitter feed and favorites. It let’s you publish out as a PDF or XML file, so I figured that would work. It is a busy site and you may have to wait to get in but once you do you just oAuth it up to Twitter and grab your data.

Now, once you have your data, you need to do something with it. The data would be in XML so you need to parse out the data you want, for instance, I wanted to analyze my “favorites” so I wanted to get the text out of the XML. Here is my first favorite on Twitter (by the way, it will only go back 3200, I think – I only have 2600 or so faves)

  881539697
  A "Manager" class is like my grandmother's junk drawer.
  Fri Aug 08 14:23:56 +0000 2008
  web
  jeremydmiller
 

Well I just want to grab that <text> value, and without having to do any programming or powershell or C#, I fired up trusty old LINQPad (more on this tool in future posts for sure). I then just wrote a quick little query against the XML file like so:

var xml = XElement.Load (@"c:fave.xml");

var query =
  from e in xml.Elements()
  select e.Element("text").ToString().Replace("","").Replace("","").Replace("RT ","");

query.Dump();

As you can see, I am just loading up the xml file and doing some text cleanup (removing the xml text blocks and removing RT’s, the old syntax which muddies up the results). Note in LINQPad you need to change the query type to C# Statements instead of the default C# Expression.

Once I had my values in the results I wanted, I did a quick CTRL+A, CTRL+C (it still baffles me how many people don’t know CTRL+A is “Select All”) and then pasted it into notepad++, to view, and cleaned up some html characters there (quotes, etc) and then pasted it into Wordle. Here is what I got back:

wordle_of_my_favorites

You can see I really like to favor SQL Server, Microsoft, iPhone, Blogs, sqlpass, Google, SharePoint, Twitter, and pretty much everything geeky. Pretty dang cool. Why doesn’t Twitter offer something like this? I think it would be cool. What other cool things have you done with your “data” – what cool things would you like to see?