Sunday, April 8, 2012

Twitter, meet Pub Sub, or How to be the next Twitter

I've been experimenting with the Twitter API lately, getting a feel for not only its published capabilities but the nuances of what you can achieve with it.  If you want to see where I've gone with that, feel free to drop in on socialseer.com from time to time -- as I learn new tricks or come up with new ideas, the web site may go in a completely different direction.

Most of the time, I think about the humans at the other end of the tweet.  Where are they? Are they male or female? Are they happy or sad? Who are they talking about? What are they talking about?  These are all interesting questions, and interesting questions to try to get a computer to answer.  I have code that can answer some of these precisely and guess at others; some are still beyond my grasp.

But even with incomplete and sometimes questionably accurate information, a wealth of analysis can be done.  You can see a blog post here about the reverberations of Rush Limbaugh's "slut" comment and, again, my socialseer.com website has many other examples.

Sometimes, one of my posts will get a lot of hits -- mostly when the topic is controversial or in the news.  And perhaps it's a bit of vanity, but I try to cater to that in the posts.  Deep down, though, I'm really looking at how I can extract more interesting information from Twitter and how that information can be used. That probably puts me in a very distinct and small minority -- I don't care about the events as much as how the events can be observed through the Twitter lens.

If you're with me so far, we're about to go a bit deeper and nerdier...

The starting point for this analysis is mechanical access to the information in Twitter.  Twitter offers a reasonably decent API (reasonably, because it's full of quirks, inconsistencies, strange failures, and limits you have to deal with).  And for the most part, they offer it to anyone who wants to play with it.  That's kind and generous of them, and I am grateful to them for that.

At the lowest level, Twitter turns out to be a (quirky, etc.) implementation of what software engineers call "publish and subscribe" -- except that here the publishers and subscribers are normally humans.  I write a Tweet and send it, and those of you who follow me on Twitter will see it in your timeline.

It turns out that, in this case, Bianca is not interested in just any Tom, Dick, or Harry.

Whether you actually read the Tweet is another question.  If you have, say, TweetDeck running, my Tweet shows up in your open application, and as far as the software is concerned, the message has been received.

Twitter also allows me to publish messages and to tag them (via hashtags) for others to subscribe to, although that mechanism is much weaker (there is no formal following of hashtags in Twitter, for example, but most applications allow you to open a persistent, updating search).

Net net, Twitter is taking a classic approach to integration between software applications and adapting it for communication between humans.
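To make the analogy concrete, here is a toy, in-memory publish/subscribe broker in Python. It is a sketch under stated assumptions, not how Twitter is built: following a user is modeled as subscribing to a topic named "@user", a persistent hashtag search as subscribing to "#tag", and the class `PubSub`, the function `send_tweet`, and the handle "@charles" are all made up for illustration.

```python
import re
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, str], None]  # called as handler(topic, message)


class PubSub:
    """Toy in-memory publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver message to every subscriber of topic; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


def send_tweet(broker: PubSub, author: str, text: str) -> int:
    """Publish to the author's followers, then to anyone watching each hashtag."""
    delivered = broker.publish("@" + author.lower(), text)
    for tag in set(re.findall(r"#\w+", text.lower())):
        delivered += broker.publish(tag, text)
    return delivered


broker = PubSub()
timeline = []
# "Following" a user and keeping a saved hashtag search are both just subscriptions.
broker.subscribe("@charles", lambda topic, msg: timeline.append((topic, msg)))
broker.subscribe("#pubsub", lambda topic, msg: timeline.append((topic, msg)))
delivered = send_tweet(broker, "charles", "Twitter is just #PubSub for humans")
```

The publisher never knows who the subscribers are; it only names a topic, which is exactly the decoupling that makes pub-sub useful between applications.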

At this point, you're saying, "OK, I concede you're right, but so what?"

One so what is that it gives us a different way to look at Twitter and describe its function a bit more rigorously.  That, I realize, is not a great benefit.

The other so what is that it gives an insight into what's missing from Twitter or what else you could do with it.  So what ( :-) ) are those insights?

Let's start with a kind of hack which is being used to enrich the content of Tweets, such as how a lot of Twitter clients show pictures that are referenced by URLs in the tweet.  The picture is not really part of the Tweet, but it appears to the recipient to be part of it.

Twitter on the Mac opens a small window with the photo from a Tweet, without bothering you with the mechanics of where the photo's stored.

The problem with this kind of solution is two-fold.  First, it is implemented in an informal and ad hoc way.  Twitpic works, but if I want to build my own sort of Twitpic, Twitter clients probably won't recognize it.  Second, the picture is completely independent of the core Twitter system, so it could be deleted or modified after the Tweet is sent.  Probably with Twitpic you cannot do that, but I bet some enterprising hacker is looking for a way to send a Tweet out with a lolcat, get it retweeted a million times, and then replace the picture with an advert for credit cards.

Still with me? Wondering why?  Ah, well, I'm finally getting around to my real point now...

Even with the current Twitter, we could -- albeit in a way that violates the TOS -- use it to communicate things other than (im)pure English (or German, or Chinese, or ...).  We could, for example, send complete messages structured for computers to consume, not humans, at least not directly.

Let me give you a simple example.  Suppose you're a band and you want to tell your fans where and when your next concert is.  You could send a snippet of prose under 140 characters: "we'll be at Bob's Bar in Orlando next Friday at 8pm".  You could send something with greater detail: "we'll be at Bob's Bar, 1234 John Young Pkwy, Orlando 39999 at 8:00pm through 10:00pm".  Or you could send the kind of calendar invite many of us send around in email.  Twitter could display the invite in a human-readable format, with a map of the location and the date and time, so that when we click it, it could go right into our calendar.
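For a sense of what the "calendar invite" version might carry, here is a sketch that builds a minimal iCalendar (RFC 5545) event for the concert. The band name, UID, PRODID, and the date (assuming "next Friday" means April 13, 2012) are hypothetical, and long-line folding is skipped. Note that the result is already well over 140 characters, which is part of the point.

```python
from datetime import datetime


def ical_text(value: str) -> str:
    """Escape a TEXT value per RFC 5545: backslash, semicolon and comma."""
    return value.replace("\\", "\\\\").replace(";", "\\;").replace(",", "\\,")


def concert_invite(uid: str, summary: str, location: str,
                   start: datetime, end: datetime, stamp_utc: datetime) -> str:
    """Build a one-event iCalendar object; start/end are 'floating' local times."""
    fmt = "%Y%m%dT%H%M%S"
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//concert-invite-sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp_utc.strftime(fmt)}Z",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{ical_text(summary)}",
        f"LOCATION:{ical_text(location)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # RFC 5545 content lines end in CRLF; folding of long lines is omitted here.
    return "\r\n".join(lines) + "\r\n"


invite = concert_invite(
    uid="bobs-bar-2012-04-13@example.com",        # hypothetical identifier
    summary="The band, live at Bob's Bar",          # hypothetical band name
    location="Bob's Bar, 1234 John Young Pkwy, Orlando 39999",
    start=datetime(2012, 4, 13, 20, 0),
    end=datetime(2012, 4, 13, 22, 0),
    stamp_utc=datetime(2012, 4, 8, 12, 0),
)
```

A client that understands the format could render this as a card with a map and an "add to calendar" button, while one that doesn't would just ignore it.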

More examples beyond that? Well, start thinking -- there's a business model for a startup in a lot of them!

But even beyond that, you could (well, I could) imagine a new Twitter, one which is not meant for direct human consumption at all.  One that is purely meant for communication between applications.  You might still limit the message size, but might allow something a bit more reasonable -- say 1K -- that would allow for a decent sized JSON payload.  As a business model, you could sell access -- say a small per message fee to publish, but none to subscribe -- with perhaps a free tier of a limited number of messages (100 per day?) per user to encourage adoption.  Or charge people for messages above a certain size.  Some messages could be ultimately intended for direct human consumption -- activity stream kind of things -- but some could be just for communication between applications.
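A rough sketch of the limits and pricing just described, assuming a compact JSON encoding. The 1K cap and the 100-message free tier come from the text; the $0.001 per-message fee and the function names are invented for illustration.

```python
import json
from decimal import Decimal

MAX_PAYLOAD_BYTES = 1024            # the "say 1K" limit suggested above
FREE_MESSAGES_PER_DAY = 100         # the "100 per day?" free tier suggested above
FEE_PER_MESSAGE = Decimal("0.001")  # hypothetical publish fee, in dollars


def encode_payload(obj) -> bytes:
    """Serialize obj as compact JSON and enforce the size limit."""
    data = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    if len(data) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(data)} bytes; limit is {MAX_PAYLOAD_BYTES}")
    return data


def daily_publish_fee(messages_published: int) -> Decimal:
    """Publishing beyond the free tier is billed per message; subscribing is free."""
    billable = max(0, messages_published - FREE_MESSAGES_PER_DAY)
    return billable * FEE_PER_MESSAGE
```

Using `Decimal` rather than floats keeps the per-message charges exact when they are summed.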

The key thing would be that if we all agreed to use this system, it wouldn't matter what our underlying applications were.  And so long as each message type had a uniquely identifying marker, a million different message types could coexist without problem.  In a sense, it's like the notion of routing web service calls over messaging middleware, something that was incorporated in things like the SOAP specifications but just about never seen in the wild.
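Here is one way the "uniquely identifying marker" could work, sketched as a JSON envelope with a `type` field that subscribers route on. The envelope shape and the reverse-domain type names are assumptions, not part of any existing standard; the point is that a subscriber simply ignores types it does not understand, so unrelated message types can share the same channel.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class TypedRouter:
    """Dispatch JSON messages to handlers keyed by the message's type marker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, msg_type: str, handler: Handler) -> None:
        self._handlers.setdefault(msg_type, []).append(handler)

    def dispatch(self, raw: bytes) -> bool:
        """Return True if a handler took the message, False if its type is unknown."""
        envelope = json.loads(raw)
        handlers = self._handlers.get(envelope.get("type"), [])
        for handler in handlers:
            handler(envelope.get("body", {}))
        return bool(handlers)


router = TypedRouter()
schedule_changes = []
router.on("org.example.schedule-update/1",
          lambda body: schedule_changes.append(body["event_id"]))

known = router.dispatch(
    b'{"type": "org.example.schedule-update/1", "body": {"event_id": "q2-review"}}')
unknown = router.dispatch(
    b'{"type": "com.example.doc-update/3", "body": {"doc": "contract.pdf"}}')
```

Putting a version number in the type name lets a message format evolve without breaking older subscribers.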

Some obvious categories of messages:

  • Document update notifications: I would always have the most recent version of a presentation, or a contract, or a PDF.  This would be a much more refined way of doing sharing than, for example, shared folders in Dropbox, and a much more independent mechanism than Google Docs.
  • Schedule updates: It's just painful to receive an email every time a meeting gets changed, especially when it's a useless message like "adding Bob".  If we're all on the same calendaring system, we can do this now, but when we aren't, the routing of meeting invites and updates via email gets messy.
  • Bridging activity streams: People are trying to build walled gardens around activity streams, like Salesforce's Chatter.  Either you're in (and pay) or you're out.  But if I don't know my subscribers, how would I know what system they are using and reach them there?
  • Software updates: you could have a generic daemon that subscribes to update notifications from your application vendors, and then launches the appropriate application to update itself.  It's annoying, on the Mac, that Apple has its updater, Microsoft has its updater, Adobe has its own -- and they all want to run all the time on their own.
What else? Again, it's up to your imagination to decide.  And your business acumen to bring to market.

The key is to have a single pub-sub service like a Twitter that's tuned for mechanical consumption.  If you like the idea, and think we should give it a go, maybe something small & open source, let me know at charles@mcguinness.us.  If you want to just take this idea and run with it, go for it.  All I ask is for friends and family shares when you go public :-) You never know, something like this could be ten times larger than Twitter in the apps world we live in.

Sunday, March 11, 2012

A Day in the life of Rush Limbaugh on Twitter

Rush Limbaugh's been a pretty hot topic lately, and he's certainly been a popular topic of discussion on social media.

In the 24 hours preceding 8pm EDT (which is midnight, GMT), there have been roughly 14,000 tweets that include "limbaugh" in their contents, or about 10 a minute.  Keep in mind that it's a weekend, and Rush hasn't said anything on the air since Friday ...
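As a quick check on the "about 10 a minute" figure:

```latex
\frac{14{,}000\ \text{tweets}}{24 \times 60\ \text{min}}
  = \frac{14{,}000}{1{,}440}\ \text{tweets/min}
  \approx 9.7\ \text{tweets/min}
```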


Sentiment

A breakdown of the sentiment in those tweets:


52% of the tweets registered as weakly or strongly negative, while 21% registered as weakly or strongly positive.  That's no surprise.  No matter who's talking about Rush or their position, they're probably not happy about what's going on. See the comments at the end about limits of sentiment analysis.


Hash Tags

One thing that is always interesting is the use of hashtags (#winning).  These are unstructured and uncontrolled, so their meaning is purely a convention adopted by Twitter users.  For the posts that talk about Rush, the following are the top twenty hashtags:


Hashtag          Count  Percent
#p2                824       9%
#limbaugh          585       6%
#stoprush          583       6%
#tcot              544       6%
#boycottrush       536       6%
#gop               262       3%
#taxpayerfunded    245       3%
#flushrush         232       3%
#rush              194       2%
#snl               186       2%
#waronwomen        182       2%
#gamechange        158       2%
#cnn               147       2%
#fem2              126       1%
#tlot              109       1%
#topprog           107       1%
#ows               102       1%
#rushlimbaugh       94       1%
#teaparty           93       1%
#news               82       1%


#p2 is the hashtag for "Progressives on Twitter".  I was surprised to see it as the most popular tag.  #tcot is "Top Conservatives on Twitter". #fem2 is for feminists. #tlot is "Top Libertarians on Twitter".  What's interesting is that there is no hashtag which is reaching critical mass.  If you were to search for just #stoprush, for example, you would get only a tiny fraction of the posts about Limbaugh.
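For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch (not the code behind the numbers above). It assumes a hashtag is "#" followed by word characters, which is looser than Twitter's actual rules, and it computes each percentage as a share of all hashtag uses, which is an assumption about how the table's Percent column was calculated.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=20):
    """Return (hashtag, count, percent) rows; percent is the share of all hashtag uses."""
    counts = Counter(f"#{tag.lower()}"
                     for text in tweets
                     for tag in HASHTAG.findall(text))
    total = sum(counts.values())
    return [(tag, count, round(100 * count / total))
            for tag, count in counts.most_common(n)]


sample = ["#StopRush now #p2", "More from the #p2 crowd", "no tags here"]
rows = top_hashtags(sample)
```

Lower-casing before counting matters here: #StopRush and #stoprush are the same tag to a human reader.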

Twitter Users

There are no clear "top posters"; the most frequent poster is "Miaminonymous", who appears to just retweet everything, with 131 posts.

The top 25 accounts mentioned in tweets are:


User ID          Count  % of Mentions
@thinkprogress     480           4.0%
@hipstermermaid    320           2.7%
@limbaugh          310           2.6%
@huffingtonpost    191           1.6%
@credomobile       182           1.5%
@politico          167           1.4%
@addthis           158           1.3%
@shoq              153           1.3%
@superguts         152           1.3%
@denisleary        152           1.3%
@billmaher         133           1.1%
@youtube           131           1.1%
@politicususa      121           1.0%
@tmorello          118           1.0%
@theblaze          117           1.0%
@cdibona           114           1.0%
@mediaite          112           0.9%
@anonyops          112           0.9%
@sandrafluke       102           0.9%
@thedailybeast      95           0.8%
@rushlimbaugh       92           0.8%
@krystalball1       80           0.7%
@boingboing         76           0.6%
@stoprush           70           0.6%
@thedailyedge       70           0.6%


Interestingly, @limbaugh is not the Twitter account Rush uses; @rushlimbaugh is.
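A similar sketch for the poster and mention counts, again hypothetical rather than the code used for this post. It takes (author, text) pairs, treats a mention as "@" plus up to 15 letters, digits, or underscores (the Twitter username alphabet), and skips the "@" inside e-mail addresses.

```python
import re
from collections import Counter

# The lookbehind skips the "@" inside e-mail addresses such as bob@example.com.
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def poster_and_mention_counts(tweets):
    """tweets: iterable of (author, text) pairs. Returns (posters, mentions) Counters."""
    posters, mentions = Counter(), Counter()
    for author, text in tweets:
        posters[author.lower()] += 1
        mentions.update("@" + handle.lower() for handle in MENTION.findall(text))
    return posters, mentions


def percent_of_mentions(mentions, handle):
    """Share of all mentions that went to one handle, as a percentage."""
    total = sum(mentions.values())
    return 100 * mentions[handle] / total if total else 0.0


sample = [
    ("alice", "RT @ThinkProgress: more advertisers drop Rush"),
    ("alice", "Note: @limbaugh is not @RushLimbaugh"),
    ("bob", "Send tips to bob@example.com"),
]
posters, mentions = poster_and_mention_counts(sample)
```

Counting @limbaugh and @rushlimbaugh separately, as above, is what surfaces the mix-up between the two accounts.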


Comments



What I make of this

Rush Limbaugh is still a hot topic in social media.  I would expect that tomorrow, Monday, if there is significant news related to Rush (more advertisers pull out or he says something controversial) we will see a spike in tweets.  On the other hand, if something else rises to the top of the news cycle, we may see Rush take a breather on Twitter.  It is interesting that the conversation is still scattered -- huge numbers of disconnected users and hashtags.   There is not one conversation taking place; there are thousands.



Limits on Sentiment Analysis


Sentiment analysis is a mechanical assessment of the sentiment, positive or negative, in a tweet.  It does not necessarily indicate approval (if positive) or disapproval (if negative) of a particular subject.  Consider some contrived examples: "I hate the constant criticism of Rush" is negative, while "I am so happy that Rush is losing advertisers.  I love the ones who are quitting" is very positive.  The sentiment generally tells us whether the statement is happy and upbeat or negative and downbeat.  Aggregated over large numbers of tweets, it gives a crude assessment of a topic, which is how I am using it in this post.
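To show how that can happen, here is a deliberately tiny lexicon-based scorer in Python. The word lists and the score thresholds for the "weakly"/"strongly" buckets are made up for illustration and are not the analysis behind the chart; real sentiment tools use far larger lexicons and more sophisticated models. The pro-Rush sentence scores strongly negative and the anti-Rush one strongly positive, because the scorer sees only the words, not who they are aimed at.

```python
import re

# Hypothetical, deliberately tiny word lists; real lexicons have thousands of entries.
POSITIVE = {"happy", "love", "glad", "great"}
NEGATIVE = {"hate", "criticism", "sad", "awful"}


def sentiment_score(text: str) -> int:
    """+1 for each positive word, -1 for each negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def sentiment_label(score: int) -> str:
    """Bucket a score using illustrative thresholds."""
    if score <= -2:
        return "strongly negative"
    if score == -1:
        return "weakly negative"
    if score == 0:
        return "neutral"
    if score == 1:
        return "weakly positive"
    return "strongly positive"


pro_rush = "I hate the constant criticism of Rush"
anti_rush = "I am so happy that Rush is losing advertisers.  I love the ones who are quitting"
```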

Thursday, March 8, 2012

Rush Fire

It is tempting to look at the recent dust-up around Rush Limbaugh’s insulting name-calling of a law school student as merely another instance in a right vs. left battle in this country. Many times, commentators will mention that people on the left are equally capable of inflammatory comments as they go on to skewer Rush.  Some on the right, sensing it is a partisan issue, have defended Rush, even some women.

Although I have a strong opinion on the subject from both a political and a propriety standpoint, there's something that a lot of people are missing about this incident: the rising political power of social media and, just as importantly, the dominating use of social media by women.

In a nutshell, Rush used fighting words against a woman, and women are fighting back and winning.

~6K Likes, ~8K comments


And the tools they are using are social media.  But, strangely, the traditional media is blind to that.  Take the first article I linked to in this blog post, from "The Week".  Where does it mention the intense conversation taking place on Facebook? The websites devoted to tracking the advertisers on Rush's show and barraging them with demands to stop advertising?  Nowhere.  If you were to just read the article, you would think that the advertisers acted out of moral indignation, not in response to pressure from the (female) public.

Nearly 300,000 people Liked Planned Parenthood, about a third of those who like Rush.  You might be tempted to think the social media audience is male and conservative.  That is a dangerous assumption.


But let's look at the ways social media worked to rally a response to Rush Limbaugh.  First is the obvious channel of Facebook.  There is all sorts of friend-to-friend sharing of outrage, although it's hard for anyone to see much of that given the general privacy settings on Facebook.  But we can see some of the effects nonetheless.

Rush is getting a lot of traffic on his Facebook page, pro and con:

Nearly 20k comments on Rush's Page


And there's a whole bunch of Boycott Rush groups on Facebook, including this one:



This is pushing people to write to advertisers, asking them to drop support for Rush's show.  For example, take a look at Lifelock's page on Facebook:

And 2000 more on another post



You might say this is a tempest in a Facebook teapot, but have a look at what's going on out on the Internet. One example is "boycottrush.org", which redirects to "leftaction.com".  Let's look at its traffic (courtesy of Alexa):


It has gone from being nothing to being somewhere around the 14,000th most visited site on the web.  That seems insignificant until you drill down and discover that about 1.5 million visitors have stopped by this page since the start of the Rush fire. (see footnotes)  This is a lot of visitors.

And these visitors are mostly female and aged 45 and up.  These are not the people most of us associate with savvy internet users, but that assumption is clearly wrong (as an aside, these are the people who tend to vote).  Perhaps not too surprisingly, many of these are women who are veterans of the women's liberation movement.  They obviously haven't lost their fervor, and they're taking it online.



This group is well organized, maintaining a collaborative, web-based spreadsheet of advertisers (former and current) with contact information, which they are keeping current with every show:



And Rush has been a popular topic on Twitter all of a sudden, to no surprise:



Sadly, there is no sentiment tracking that would help us differentiate the pro-Rush from the anti-Rush tweets.  But given the other activity on the internet, it's a safe assumption that a lot of it is anti.

I also find it amusing that there seems to be a bit of a gender divide in the analysis of what's going on. My observation is that male reporters seem more blind to the social media power that is being brought to bear and are quicker to focus on the partisan or free-speech aspects of the situation rather than the misogyny that women respond to.  It would appear that if it's not your ox that's being gored it's easier to be dispassionate.

Coming on the heels of the Susan G. Komen situation, it's clear that women are starting to find that they have a real power they can exercise through social media.  (For us men, it would be best if we pay attention to it!)

You might think that this topic is far afield from the core of enterprise technology, but keep this in mind.  Rush Limbaugh is not just a man, but a very large business undertaking.  It is easy to say it's just him, but the problems that cropped up have affected his business and his scores of partners.  Any business which is in the public eye can have a problem like this.

Rush himself seems to have gone days before he noticed the firestorm he started, and as a result made the situation much worse before he issued his apology.  Had he been more aware, and apologized sooner, it would have better defused the situation.  It seems to me that a company (and Rush is a company) that notices a mistake quickly and reacts to it looks much better than one that lets things drag on.  The former seems like a company that cares, while the latter seems only like a company that didn't realize it had been caught until too late.

The advertisers have adopted a different strategy, which is to lay low mostly.  Regardless of whether they stay or go, their communications tend to be as succinct and under the radar as possible.  Everyone knows the risks of offending one side or the other, and many seem intent on saying as little as possible, probably hoping it will all blow over eventually.  Whether you can hide in this era of social media is an open question; perhaps for the partners they can, but certainly Rush cannot.

In the end, neither Rush nor most of his advertisers were attuned to the signals that were coming from the social media. Your CEO may not call a college student a slut in front of millions of people, but it's just as easy to unwittingly commit some other faux pas that triggers a massive response.  And if you are an advertiser, you have to contend with the perception that your advertising is an endorsement of the person you are sponsoring; now you have to worry about partner reputations as well.

To be fair, most uninvolved people (read: men) didn't pay a whole lot of attention to social media initially.  But the situation shows that businesses cannot afford to turn a deaf ear to the voices being raised on the internet. The only questions left now are how to listen and how to respond.


Footnotes:

Estimate of 1.5 million visitors to leftaction.com:
http://www.internetworldstats.com/emarketing.htm estimates total internet population at 2.3 billion users 
http://www.alexa.com/siteinfo/leftaction.com# estimates daily traffic since the start of the Rush fire at an average of 0.014% of the total internet population.  This also gives demographics of the visitors. 
That works out to about 300K daily visitors, or 1.5 million for the 5 days of the Rush fire tracked by Alexa at the time of this writing.
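The arithmetic behind that estimate, for reference. The footnote rounds the daily figure to 300K before multiplying; the unrounded figure gives roughly 1.6 million. Either way, summing five days of daily visitors counts anyone who came back on more than one day more than once, so the total is really a count of visitor-days rather than distinct visitors.

```latex
\begin{aligned}
2.3 \times 10^{9}\ \text{users} \times 0.014\%
  &= 2.3 \times 10^{9} \times 1.4 \times 10^{-4}
  \approx 3.2 \times 10^{5}\ \text{visitors/day} \\
3 \times 10^{5}\ \text{visitors/day} \times 5\ \text{days}
  &= 1.5 \times 10^{6}\ \text{visitors}
\end{aligned}
```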


Saturday, March 3, 2012

Windows 8 First Impressions

I've downloaded the Windows 8 Consumer Preview, and have some initial thoughts to share about Win 8 and the new Metro interface. I'll start by saying why it's hard for me, or anyone really, to do a decent job of a review, and why we should all be skeptical of any review at this point.  Then I'll move on to trying to ignore my own advice in the previous sentence...

So let's start ...

When you look at a piece of software, you have to envision living with it.  You want to do your work, play your games, surf the net, whatever it is you do with a computer.  And when you have a new, and incomplete, version of software, you can't do that for real.  And so you guess. At least I do.  And odds are I'll guess wrong.  Worse, we all have our biases.  In my case, I was a loyal Windows user right up to Windows XP.  After that, we started to switch to Macs at our house.  The last version of Windows we bought was Vista, which was a disaster.  After that, we pretty much went all Mac.

I still use Windows on a daily basis, but it has been demoted to a virtual machine to be summoned for running Windows-only software.  So, as you can see, I'm not the best Windows fanboy to review Metro and Windows 8.

Still, I was excited enough by what I saw to download and install it -- so I'm willing to let it win me over.

So let's have a look at Windows 8...


The key thing about Windows 8 is the Metro UI, as you see it in the first screen shot.  The idea is to move to a more tablet like experience.  And the Metro UI does that, with its large, square icons that are easy to hit with a finger.  Of course, only a subset of the apps is there, but a right mouse click brings up an icon to see all your applications:

Which leads to:

Not quite the old start menu, but not really far from it.

Metro Apps like to run full screen:

Which is very tablet like.

But you can still run traditional windows (small w) applications as well

Interestingly, the Windows desktop is, in some sense, a full screen Metro application, and it is treated like one.  So when you switch tasks, the desktop appears as a whole:

Except when you alt-tab, then the desktop programs are distinct:



Looking at it from the viewpoint of a desktop system, there are a couple of things that are interesting.  Both Apple and Microsoft seem enamored of the full screen application.  It's taken from the tablet paradigm for sure.  And, I suppose if you have a smallish laptop display, full screen is useful in some situations.

What's amusing (or alarming, you pick) is that this move to an app at a time goes against the entire development history of Microsoft.  MS-DOS was, of course, an app at a time full-screen operating system!  And yet very quickly, as memory capacity grew on PCs, people started trying to find ways to multi-task.  At first there were a variety of add-on multi-taskers to MS-DOS.  Eventually, Microsoft Windows came out, although the first versions were not that good at multi-tasking.

Then, as multi-tasking became more reliable, the holy grail for Microsoft was desktop integration.  You probably don't remember a time when you couldn't copy from Excel and paste into Word, but trust me, that was a big deal when it happened.  The idea that you could have a spreadsheet and word processor running at the same time and seamlessly move data back and forth was amazing.  You may want to think that Microsoft's monopolistic tendencies crushed its office competitors like Lotus and WordPerfect, but it was that little trick of integration that made Office dominant.

With Metro, we're not really throwing that away, but it feels like it.  Think about how you would insert a small table you built in Excel into a PowerPoint presentation: you would have two windows open, PowerPoint and Excel.  You'd select the table in Excel, copy it, click on the PowerPoint window, and paste.  You do that sort of thing all the time without thinking about it.

But with Metro, assuming you had a Metro Word and Metro PowerPoint, the navigation between applications is more disjoint.  You have to either execute weird mouse moves or alt-tab to get between the applications.  It won't seem smooth.

But, you might ask, what if you aren't really running multiple applications at once? What if you are just logged into Facebook and whiling away the hours?  Well, I admit, Metro is probably great for that.

And that leads to my observation: Metro -- like iOS to be fair -- is designed for the consumption of content, not the creation.  It's like there's two worlds of Windows 8: The creation side (traditional windowing usage) and the consumption side (Metro).  That is a strange dichotomy.

To me, it feels like Microsoft glued two operating systems together -- Windows 8 desktop and Metro tablet -- and decided to call it a single operating system.  This is very Microsoft-like; when Windows 3.x came out, you had to start MS-DOS and then boot Windows, but you always knew that you had an MS-DOS world (to run your legacy DOS applications) and a Windows world.  It's the same thing, just twenty years later: You have a legacy Windows world and a new Metro world.

If you compare this to how Apple is approaching the integration of iOS and OS X (no longer Mac OS X as of Mountain Lion), there is no separate world of "OS X" apps and "iOS" apps on the desktop.  You can launch and manage the same applications in both the traditional way as well as via the new iOS-like ways.  You can run the same applications in a windowed mode, or in full screen.  The power is in the hands of the user; the transition is up to the user to make or not make at his or her own speed.

I had hoped to find Metro and Windows to my liking.  I had hoped to find something new and useful and a real advance of the desktop paradigm.  I had hoped to find a reason to want to bring Windows back into my daily life beyond just for a few Windows-only applications.  But what I find is a forced, awkward, and disconnected experience.  Here's hoping that it improves -- a lot -- by GA.

Friday, March 2, 2012

Real numbers for smart phones

I read a lot about mobile platforms, and there seems to be a constant battle for bragging rights between the iOS and Android platforms.   There's a lot of talk about how activation rates for Android phones are so much higher than for iOS, and therefore that iOS devices have a relatively smaller market share.

The follow on argument from the Apple boosters is that, well, Apple is smaller but they have all the profits.

But is Apple really the smaller vendor?  I was browsing through a report from the Pew Research Center (see the report here), and it said that of cell phone owners surveyed:
  • 20% of cell owners now describe their phone as an Android device, up from 15% in May 2011
  • 19% of cell owners now describe their phone as an iPhone, up from 10% in May 2011
  • 6% of cell owners now describe their phone as a Blackberry, down from 10% in May 2011
Statistically, then, Android and iOS are tied.  (Note that 20% + 19% + 6% = 45%, and 45% of all cell phone owners say they own a smartphone, so nobody is missing from those percentages.)

What's even more interesting is that Android users are up 5 percentage points in about a year, but iPhone users are up 9.  That seems terribly at odds with the conventional wisdom about adoption (see Business Insider, for example).
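Since both figures are shares of all cell owners, the changes are percentage points; in relative terms the gap is even wider:

```latex
\begin{aligned}
\text{Android: } & 20\% - 15\% = 5 \text{ points}, & \frac{20 - 15}{15} &\approx 33\% \text{ relative growth} \\
\text{iPhone: }  & 19\% - 10\% = 9 \text{ points}, & \frac{19 - 10}{10} &= 90\% \text{ relative growth}
\end{aligned}
```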

I have a couple of thoughts on the subject:

  • iOS has a good head start
  • Apple has stronger support for keeping one- and two-generation-old devices up to date with software.  If you have a 3GS, you are still able to get iOS 5.  Android, on the other hand, tends to orphan devices fairly quickly.  So people may stay on an iOS device longer than people on Android devices, resulting in a longer in-service life for iPhones.
  • Similarly, the fact that a few-years-old iPhone is still able to run current software means that it has value in the used market.  An old and out of date Android device is not going to look attractive compared to getting a new one from the carrier.
  • How many people get an Android and switch to iPhone later?
Because of these, I suspect there are a lot of retired yet fairly modern Android phones out there.  And that's why a random survey of what people actually use shows the iPhone still holding even with Android.

Wednesday, February 22, 2012

Recovering from a hacked WordPress website

It was a dark and stormy night ...

The infection bootstrap file

It was late last night, and I wanted to make a slight change to one of my websites before heading to bed.  But when I brought up the home page of my site, I was startled when the normal front page was replaced by a glaring red Google warning that my site was serving up malware!

So much for going to bed anytime soon.  I've had this happen to me before, years ago: a long-stagnant and nearly forgotten web site of mine used a third party plug-in which got hacked and pushed me onto the Google blacklist.  That was no fun to undo, and I wanted to clean up the current mess quickly enough to avoid being branded a malware site.

As a bit of background, the website in question runs on Dreamhost.com's hosting service and is built with WordPress.  I've been a Dreamhost customer for about 10 years now and, while they've probably had more than their fair share of problems in the past, the combination of features, flexibility and price have kept me a reasonably loyal customer.

I SSH'd into my account to look around.  I discovered that the hack, as far as hacks go, was pretty simple.  Just about every PHP file (as far as I know, every single PHP file) on my account had one line of script added to it.  This script redirected people's browsers to a malware site.  That was interesting, because it meant that somebody was able to sweep through every file I owned and rewrite it.  It wasn't a site-specific attack -- all the sites I hosted, even a bunch of dead ones, were affected.  Somebody was either (a) able to run code on my account (which seemed the most likely case -- that's how I'd do it), perhaps by having the viral payload in each PHP file try to infect other files on my account, or (b) running some FTP-based program that kept cycling through and fetching files to infect them (seems unlikely and harder to implement).

As an aside, can I just say how much I hate PHP?  It's too stupid to recognize a variable without a superfluous dollar sign.  It has a bastardized object model.  And it's got a security model that says as long as the file ends in .php it gets run as code.  Code signing? Meta files to separate executable content from data? Nope.  If you can figure out how to upload a rogue .php file to a web site, you can run it with full permissions on the server.  That's just a constant invitation to disaster.  Like the one that happened to me.

Anyway, there I was, staring at a massive corruption of all my web sites that used PHP.  The only thing that appeared to save me was that the infection was additive rather than replacing code -- it just stuck a line of code at the beginning of each PHP file:

What got put in front of every PHP file
Still, I had two big tasks ahead: (1) stop any future damage and (2) remove all the current damage.  Mind you, it was not phrased like that in my mind.  It was more like "oh shit I got to fix this fast and figure out how to stop it from happening again." Neither turned out to be real fast, unfortunately.

Let me start with the second task, undoing the damage.  The damage consists of prepending the PHP files with an encoded script, most often in base-64 but sometimes just using hex escapes to obscure the evil logic.  I'm not sure why they bothered, though, other than to look cool. It sure did stand out when I looked: a more stealthy injection would be harder to find intermingled with all the other code in the PHP files.  But the consistent nature of the hack made it easy to spot.  So easy, in fact, somebody has written a fixit script to remove the changes.  So backing out the changes was no problem. (note that it removed the infection from my files, but did not find and remove the additional files placed in my web site.  Read below about those).
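The fixit script isn't reproduced here; as a hypothetical illustration of the same idea, here is a Python sketch that assumes the injected code is a self-contained `<?php ... ?>` block on the first line that calls `eval()` on a base64-decoded or hex-escaped string. The heuristic, function names, and `.infected` backup suffix are assumptions of this sketch; run it with `dry_run=True` first and review the list before letting it rewrite anything.

```python
from pathlib import Path


def is_injected(first_line: str) -> bool:
    """Heuristic for the injected line: a self-contained <?php ... ?> block that
    eval()s a base64-decoded or hex-escaped string (hypothetical signature)."""
    line = first_line.strip()
    return (line.startswith("<?php") and line.endswith("?>") and "eval(" in line
            and ("base64_decode" in line or "\\x" in line))


def clean_file(path: Path, dry_run: bool = True) -> bool:
    """Drop the injected first line, keeping a copy named <file>.infected
    (it no longer ends in .php). Returns True if the file matched."""
    data = path.read_bytes()                  # bytes, so line endings survive
    first, _, rest = data.partition(b"\n")
    if not is_injected(first.decode("latin-1")):
        return False
    if not dry_run:
        path.with_name(path.name + ".infected").write_bytes(data)
        path.write_bytes(rest)
    return True


def clean_tree(root: str, dry_run: bool = True) -> list:
    """Scan every .php file under root; return the ones cleaned (or that would be)."""
    return [p for p in sorted(Path(root).rglob("*.php"))
            if p.is_file() and clean_file(p, dry_run)]
```

Matching only a complete one-line block errs on the side of leaving a file alone; anything it skips can be inspected by hand. Like the original fixit script, this does nothing about extra files the attacker planted.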

The first task was the real challenge and what took all the time.  I removed all the hacks to my files, but within a half hour they were all back again.  This meant that the infection process was still underway or had been retriggered. Not pleasant.
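One way to catch reinfection quickly, sketched in Python under the assumption that you can run a script in the hosting account: take a SHA-256 snapshot of every .php file right after cleaning, then compare against it periodically (from cron, say). Modified files point at reinfection, and new .php files point at planted payloads. The function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root) -> dict:
    """Map each .php file under root to the SHA-256 of its contents."""
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(Path(root).rglob("*.php")) if p.is_file()}


def changes_since(baseline: dict, root):
    """Return (modified, added) .php paths relative to an earlier snapshot."""
    current = snapshot(root)
    modified = sorted(p for p, digest in current.items()
                      if p in baseline and baseline[p] != digest)
    added = sorted(p for p in current if p not in baseline)
    return modified, added


# Demonstration on a throwaway directory standing in for a web root.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "index.php").write_text("<?php echo 'hello';\n")
    baseline = snapshot(site)
    (site / "index.php").write_text("<?php eval(base64_decode('...')); ?>\n<?php echo 'hello';\n")
    (site / "dropper.php").write_text("<?php /* newly planted file */\n")
    modified, added = changes_since(baseline, site)
```

Logging the time each change is first seen also narrows down which access logs to read.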

The term "fog of war" comes to mind to describe my state.  Somehow, I was able to clean files only to see them reinfected, but I had no idea how it was being done.  The following possibilities seemed to exist:

  • Someone still able to log into my account
  • Someone had super user access to the server
  • WordPress had some sort of flaw in it that allowed people to upload or modify PHP files
  • WordPress had some sort of flaw that allowed people to modify the database and insert code there (that would then infect the files)
  • I missed some infected file that was being run and reinfected my sites
  • Something else?? Some other hack that I had closed but leaked a user id or password to the server or database??

Immediately I took all my web sites offline: I simply moved the contents of each web site into a separate directory and put a placeholder index.html file into its root directory.  That bought me time.
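Taking a site offline this way is easy to script. The sketch below assumes each site's web root is its own directory (something like `~/example.com`, a hypothetical name) and moves everything in it into a quarantine directory that must live outside any web root, so the moved PHP files can no longer be requested over HTTP. The `take_offline` helper and the placeholder text are illustrative, not a Dreamhost tool.

```python
import shutil
from pathlib import Path

# Hypothetical placeholder page served while the real site is quarantined.
PLACEHOLDER = "<html><body><p>Down for maintenance. Back soon.</p></body></html>\n"


def take_offline(site_root: str, quarantine_root: str) -> Path:
    """Move everything in site_root into quarantine_root/<site name>,
    then drop a static placeholder index.html into the emptied web root."""
    site = Path(site_root)
    dest = Path(quarantine_root) / site.name
    # Quarantining inside the web root would leave the PHP files reachable.
    if site.resolve() in dest.resolve().parents:
        raise ValueError("quarantine directory must be outside the web root")
    dest.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an earlier quarantine
    for entry in list(site.iterdir()):
        shutil.move(str(entry), str(dest / entry.name))
    (site / "index.html").write_text(PLACEHOLDER, encoding="utf-8")
    return dest
```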

I started by thinking it might be some sort of WordPress exploit.  I had enabled user registration on one of my blogs shortly before the mess happened, so I guessed it might be some sort of SQL injection problem.  I proceeded to export my posts from WordPress and then did a scorched-earth reinstall.  I then re-uploaded my posts, rebuilt my customizations to the templates and put the web site back online.

And a half hour later, it had been hacked again.  OK, that wasn't it.  Maybe an initial exploit had opened the door for a subsequent one.  Or maybe my hosting provider, Dreamhost, had itself been penetrated and my assailant had root access.  Or something else? It was still foggy.

To close the door on the first possibility (someone still able to log into my account), I went through a round of password resetting.  I changed the FTP/SSH password.  I changed the database password.  And then I cleaned the hacked PHP files and reopened the web site.

Again, before too long, the PHP files were getting hacked yet again.  This was frustrating.

I was starting to think that my hosting provider had been hacked and there was nothing I could do.  But I tried one more thing in desperation.  If I had my own server (say, on Amazon EC2), I would be able to exert a lot of control over the virtual machine; I could reboot it, for example.  With a shared hosting provider, I can't.  But there's a small, almost hidden command in the Dreamhost control panel that terminates all existing FTP connections: what if someone had kept an FTP connection open? It seemed crazy, but I had all connections terminated.  I also invalidated all existing login cookies on my sites, so anyone logged in to them would be forced off.  (If you're a WordPress user, look at the keys-and-salts section of wp-config.php; changing those values invalidates every existing login cookie.)
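WordPress keeps eight authentication keys and salts in wp-config.php: AUTH_KEY, SECURE_AUTH_KEY, LOGGED_IN_KEY and NONCE_KEY, plus their four *_SALT counterparts. WordPress.org runs a secret-key generator for these, but a local sketch using Python's standard `secrets` module works too. The alphabet deliberately leaves out the single quote and backslash so each value can sit inside a PHP single-quoted string without escaping; the helper names are hypothetical.

```python
import secrets
import string

# The eight authentication keys and salts defined in a standard wp-config.php.
WP_SALT_NAMES = (
    "AUTH_KEY", "SECURE_AUTH_KEY", "LOGGED_IN_KEY", "NONCE_KEY",
    "AUTH_SALT", "SECURE_AUTH_SALT", "LOGGED_IN_SALT", "NONCE_SALT",
)

# Printable ASCII minus ' and \, the only characters that need escaping in PHP single quotes.
ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits + string.punctuation
    if c not in "'\\"
)


def new_salt(length: int = 64) -> str:
    """Return a cryptographically random string drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def wp_salt_block() -> str:
    """Return replacement define() lines for the keys-and-salts section."""
    return "\n".join(f"define('{name}', '{new_salt()}');" for name in WP_SALT_NAMES)


if __name__ == "__main__":
    print(wp_salt_block())
```

Paste the output over the existing define() lines for those eight constants.  Every existing login cookie stops validating, so everyone, including you, has to log in again.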

After that, I went through another round of changing passwords, from the database on down.  I'm a Mac person, and the program 1Password is invaluable for generating and maintaining a set of secure passwords.

I finally finished up very late last night, and today the fix appears to be holding.

But there's one more thing ...

This morning, my hosting provider, Dreamhost, sent me a list of potentially hacked files in my account.  There were a whole bunch of other suspicious files I had missed, with a different pattern.  The first set (the set I had seen and cleaned) started with a PHP eval statement, but the second set used a bunch of hex escapes (see the first illustration in this entry).  My grep searches didn't look for them, and the fixit script didn't catch them either.  So I just deleted them all.  They were not infected files; they were the starter files used to initiate an infection.  (They were all named r.php, rr.php, or footlib.php.  The last was a small bootstrap program that let someone browse the files in my account via the web, and it appeared to be the source of the initial infection.)
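Because searches tuned to the eval pattern missed the hex-escaped droppers, a scan can look for both. This is a heuristic sketch, not a malware signature: the file names are the three from this incident, and the two patterns (an eval wrapped around a decoder such as base64_decode, gzinflate or str_rot13, and a run of eight or more `\xNN` escapes) are assumptions chosen for illustration. Legitimate code occasionally trips the hex rule, so review the hits before deleting anything.

```python
import re
from pathlib import Path

# File names seen in this incident; extend as new droppers turn up.
DROPPER_NAMES = {"r.php", "rr.php", "footlib.php"}

# Assumed heuristics: eval() of a decoded payload, or a long run of \xNN escapes.
EVAL_DECODE = re.compile(rb"eval\s*\(\s*(?:base64_decode|gzinflate|str_rot13)\s*\(")
HEX_RUN = re.compile(rb"(?:\\x[0-9a-fA-F]{2}){8,}")


def scan(root: str) -> list[tuple[Path, str]]:
    """Return (path, reason) for each .php file under root that looks suspicious."""
    findings = []
    for path in sorted(Path(root).rglob("*.php")):
        if path.name in DROPPER_NAMES:
            findings.append((path, "known dropper name"))
            continue
        data = path.read_bytes()
        if EVAL_DECODE.search(data):
            findings.append((path, "eval of decoded payload"))
        elif HEX_RUN.search(data):
            findings.append((path, "long run of hex escapes"))
    return findings


if __name__ == "__main__":
    for path, reason in scan("."):
        print(f"{reason}: {path}")
```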

I'm going to stay on top of my web sites for a while until I'm comfortable the threat has passed.  I'm not sure how the infection got in, so I'm not sure what to look out for.  Dreamhost itself had been hacked in January, and I suspect an infection script was planted in one of my web sites back then and lay dormant until last night.

All this is no fun.  If I weren't familiar with Linux shell commands and PHP, I wouldn't know what to do.  Maybe I should just host my blogs at Tumblr?  On the plus side, I've learned a lot about how WordPress works.  And did I mention how much I dislike PHP?

Sunday, February 19, 2012

Molly Wood goes all Apple ...

I heard this on a podcast, and I thought it was interesting.  Molly Wood of CNet has burned through several Android phones and found that each one failed her in one way or another.  So now she's gone to an iPhone 4S, adding it to her MacBook Air, iPad, and iPod.