Finding a Weird Network Fault Due to a Faulty Wire

I was called in to help fix a network that had become discombobulated. I didn’t end up fixing it, but one staffer there did the trick by disconnecting a switch with a bunch of wires plugged into it. The culprit turned out to be a bad cable.

It took a long time to get there, but I figured out a technique that would have brought us to the errant equipment a lot sooner.

Binary Chopping or Binary Search

The main lesson learned, which I didn’t apply this time but will in the future, is to shut down the failing network by powering off the switches. Bring up the Internet connection and the router, and make sure they work. Then bring up half the network and see if it functions. If it does, the problem is in the other half. If it fails, shut that half down and bring up the other half.

(Note that you must start with the switches closest to the backbone, so you can test the Internet connection. Also, by “shutting down the network”, I mean only the switches, not the computers.)

Leaving the good half up, bring up half of the remaining, faulty set. Test, and determine which half contains the fault.

Repeat this process of bringing up half of the remainder of the bad network until you find the problem.

You will reach the fault quickly, because each test cuts the number of suspects in half.
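The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_faulty_segment` and the `network_works` callback are hypothetical stand-ins for a person powering up switches and checking whether the network functions, and the sketch assumes there is exactly one fault that breaks the network whenever its segment is powered up. It also skips the confirmation step of bringing up the other half after a failed test.

```python
def find_faulty_segment(segments, network_works):
    """Return (faulty_segment, number_of_tests).

    segments: hypothetical labels for ports, switches, or cable runs.
    network_works: hypothetical callback; given the list of segments
        currently powered up, returns True if the network functions.
    Assumes exactly one faulty segment that fails every test it is part of.
    """
    good = []                  # segments already cleared, left powered up
    suspects = list(segments)  # segments that may still contain the fault
    tests = 0
    while len(suspects) > 1:
        mid = len(suspects) // 2
        trial, rest = suspects[:mid], suspects[mid:]
        tests += 1
        if network_works(good + trial):
            good += trial      # this half is clean, so leave it up
            suspects = rest    # the fault is in the half still powered down
        else:
            suspects = trial   # the fault is in the half just powered up
    return suspects[0], tests


# Simulated 256-port LAN where port 201 is (by assumption) the bad one.
BAD_PORT = 201
ports = list(range(256))
found, tests = find_faulty_segment(ports, lambda up: BAD_PORT not in up)
print(found, tests)  # prints: 201 8
```

In the simulation, the search needs eight tests: seven halvings to get down to two ports, plus a last check that separates them.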

As an example, consider a LAN with a total of 256 Ethernet ports.

First, you power up 128 ports to find which half the bad device is in…

256 / 2 = 128

Then you power up 64 of the remaining ports…

128 / 2 = 64

Then you power up 32 of the remaining ports, and so on…

64 / 2 = 32

32 / 2 = 16

16 / 2 = 8

8 / 2 = 4

4 / 2 = 2

At this point, only two ports are unknown, and you can just unplug one to see if it helps.

So it takes 7 halvings plus one final check, or 8 steps in all. If each step takes 10 minutes, your entire diagnosis takes about 80 minutes. That’s not bad.

Also, note that even if the fault causes a total network failure, most of the network stays functional during the diagnosis. Problems happen only while the bad fraction of the network is powered up.