Our internal network here has been having problems ... I was reminded that the speed of collapse in a network is often a function of the natural frequency (speed) of the network, while the breadth of failure depends on a number of factors, including load and the degree of interdependence within the network.
The problem was eventually traced to a fault in one piece of software on one machine on our intranet.
...
Our intranet could have been built to be reliable, but instead it was built to be "efficient". Far from being a network of fail-safe systems, it is a network of interdependencies. Under load, a single failure brought the whole system down.
...
Our network operates at electronic speeds, and it failed with the same rapidity.
Understanding how this happened is critically important. There are four parts to creating the complete meltdown of a network:
1. Create a network by building connections between systems.
2. When a particular part of the network approaches overload (goes red), recognise that this is happening and use the connections you have created to allow you to switch load to another part of the network.
3. Continue doing this until all areas are red.
4. Now add more load.
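
The four steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not a model of our actual intranet: the function name `run`, the node count, the capacity figure and the "perfect rebalancing" rule are all hypothetical. One hot node keeps receiving load; with load shifting switched off it fails repeatedly on its own, and with load shifting switched on nothing fails until every node is full, at which point everything fails at once.

```python
def run(arrivals, nodes, capacity, shift_load):
    """Toy model: return how many nodes fail after each load arrival.

    arrivals   -- list of (node_index, amount) pairs (hypothetical workload)
    shift_load -- True to use the connections to spread load (steps 2 and 3)
    """
    loads = [0.0] * nodes
    failures = []
    for node, amount in arrivals:
        loads[node] += amount  # step 4: add more load
        if shift_load:
            # Assumption: rebalancing is perfect, so every node ends up
            # carrying an equal share of the total load.
            share = sum(loads) / nodes
            loads = [share] * nodes
        # A node fails when it is pushed past its capacity.
        over = [i for i, load in enumerate(loads) if load > capacity]
        for i in over:
            loads[i] = 0.0  # a failed node drops whatever it was carrying
        failures.append(len(over))
    return failures


# One hot node receives 6 units of load, seven times in a row.
arrivals = [(0, 6.0)] * 7

print(run(arrivals, nodes=4, capacity=10.0, shift_load=False))
# -> [0, 1, 0, 1, 0, 1, 0]   three small, local failures

print(run(arrivals, nodes=4, capacity=10.0, shift_load=True))
# -> [0, 0, 0, 0, 0, 0, 4]   no failures, then all four nodes at once
```

In this sketch the same total load arrives in both runs. Shifting load hides every early warning and turns three single-node failures into one network-wide failure.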
...
In summary: The ability to measure and monitor the system gives us the capacity to avoid small avalanches in individual areas. However, if we keep adding load without adding capacity, we overload the entire network and make an all-encompassing avalanche inevitable.
If we can’t add capacity, then it would be better to allow a series of small avalanches.
Read the whole article, which goes from computers, to sand castles, to financial systems, to peak oil and beyond.