Skype today launched Video for iPhone, iPod touch and iPad. This new version of the Skype application lets users make and receive video calls on iPhones, iPod touches and iPads, and exchange instant messages with other Skype users, over both Wi-Fi and AT&T's 3G network.
Was it a test of this iPhone video application that brought down Skype's video network last week? The story going around last week was that a new release for Apple software, possibly the Skype iPhone video application announced today, had a problem and triggered the Skype server failure when it was installed first on one and then on several Skype 'supernodes'.
But don't blame the Apple software application.
Skype's supernodes act both as offline message (IM/SMS) relays and, as Skype's Chief Information Officer noted yesterday, as "a directory, supporting other Skype clients, helping to establish connections between them and creating local clusters typically of several hundred peer nodes per each supernode."
The initial crashes brought down 25% to 30% of the Skype supernode servers just before the normal daily peak. This in turn led to a traffic overload that created extensive delays in the support servers responsible for offline instant messaging. The result was long response delays to some Skype for Windows clients, and about 20% of all Skype clients then crashed because of a bug in an older software version.
The official Skype story was released yesterday by Lars Rabbe, Skype's Chief Information Officer, who describes the "snowball" effect that blocked most Skype users for 24 hours on 22-23 December 2010.
"50% of all Skype users globally were running (an older) 126.96.36.199 version of Skype for Windows, and the supernode crashes caused approximately 40% of those clients to fail. These ... included 25–30% of the publicly available 'supernodes', (that) also failed as a result of this problem."
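The two percentages in that quote can be combined into a single share of the whole user base. The snippet below is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes the 40% failure rate applies uniformly to the 50% of users on the affected version.

```python
def share_of_all_users_failed(share_on_version: float, crash_rate: float) -> float:
    """Fraction of ALL users whose client failed.

    share_on_version: fraction of users running the affected Windows client
    crash_rate: fraction of those clients that failed
    Assumption (not from Skype): the crash rate is uniform across that group.
    """
    return share_on_version * crash_rate


# Figures quoted by Skype's CIO: 50% on the affected version, 40% of those failed.
share = share_of_all_users_failed(0.50, 0.40)
print(f"{share:.0%} of all Skype users affected")  # prints "20% of all Skype users affected"
```

In other words, roughly one Skype user in five lost their client outright, before counting anyone affected by the later overload.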
"The failure of 25–30% of supernodes in the P2P network resulted in ... massively increased ... load as (supernodes) reconnected to the peer-to-peer cloud ... just before our usual daily peak-hour (1000 PST/1800 GMT)". As users tried to reconnect to the system, they generated "traffic to the supernodes that was about 100 times what would normally be expected at that time of day" and overwhelmed the remaining supernodes, bringing the whole system to a standstill.
Interestingly, some sources focus blame on Microsoft rather than only on Skype's network, servers and software, but the problem may be more profound.
P2P Server Architecture
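To get a feel for why the remaining supernodes had no chance, here is a rough sketch of the load each survivor would see. It is a simplification under stated assumptions, not Skype's own model: the function name is hypothetical, the 100x figure is treated as total traffic to all supernodes, and that traffic is assumed to spread evenly over the survivors.

```python
def relative_load_per_survivor(failed_fraction: float, surge_factor: float) -> float:
    """Load on each surviving supernode, as a multiple of its normal load.

    failed_fraction: share of supernodes that went down (0 <= value < 1)
    surge_factor: total traffic as a multiple of normal for that time of day
    Assumption: traffic is spread evenly across the surviving supernodes.
    """
    if not 0 <= failed_fraction < 1:
        raise ValueError("failed_fraction must be in [0, 1)")
    return surge_factor / (1.0 - failed_fraction)


# The quoted range of 25-30% failed supernodes, with a 100x traffic surge.
for failed in (0.25, 0.30):
    load = relative_load_per_survivor(failed, 100)
    print(f"{failed:.0%} failed -> ~{load:.0f}x normal load per surviving supernode")
```

Under these assumptions each surviving supernode faces roughly 133 to 143 times its usual load, which makes clear that the reconnection surge, far more than the lost capacity itself, is what overwhelmed the network.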
Serious questions need to be asked about a network service architecture that allows:
- Application software to crash what should be 'carrier class' servers performing network functions
- P2P software that causes both network-based and user-device-based clients to crash as a result of network overload problems
- Network server problems that spread automatically across a large number of supernodes
Network servers need to be especially resilient and intelligent in how they 'fail-over' in a distributed networking environment; but a robust Service Architecture is always a pre-requisite.
Let other P2P and 'Cloud' service providers beware.
On a positive note, Skype brought in massive extra capacity to stabilize the network and was also able to restore Group Video Calling functionality in time for Christmas.
Software Release Deployment
Lars Rabbe also committed to review Skype's "testing processes to determine better ways of detecting and avoiding bugs which could affect the system." Hopefully this promise includes:
- 'Old fashioned' regression testing of all old versions of client software
- Large scale network testing that does not impact live users - especially at peak traffic times!
These are rules that traditional service providers have followed for decades. Perhaps a little more respect for the "old fashioned" network operators and their software release processes is warranted.
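As an illustration of the regression-testing point above, the sketch below runs the same "delayed server response" scenario against every client version still in the field and reports which ones break. Everything in it is hypothetical: the client handlers, the version labels, the delay values and the 5-second threshold are stand-ins invented for the example, not Skype's code or test suite.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def tolerant_client(delay_s: float) -> str:
    """Hypothetical client that copes with any server response delay."""
    return "ok"


def fragile_client(delay_s: float) -> str:
    """Hypothetical older client with a bug triggered by slow responses."""
    if delay_s > 5.0:  # made-up threshold, for illustration only
        raise RuntimeError("unhandled delayed response")
    return "ok"


# Every client version still in use, not just the latest release.
CLIENTS: Dict[str, Callable[[float], str]] = {
    "old-client": fragile_client,
    "new-client": tolerant_client,
}
DELAYS_S = (0.1, 1.0, 30.0)  # normal, slow, and overload-level delays


def run_regression(
    clients: Dict[str, Callable[[float], str]], delays: Iterable[float]
) -> List[Tuple[str, float, str]]:
    """Run each delay scenario against each client and collect failures."""
    failures = []
    for name, handler in clients.items():
        for delay in delays:
            try:
                handler(delay)
            except Exception as exc:  # a crash here is the regression we want to catch
                failures.append((name, delay, str(exc)))
    return failures


for name, delay, error in run_regression(CLIENTS, DELAYS_S):
    print(f"{name} failed at {delay}s delay: {error}")
```

The point of the matrix is that the overload-level delay is tested against the old client in the lab, rather than being discovered on the live network at peak time.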