Crashing Computer: Recovering from Disaster Recovery Plans

A couple of weeks ago, I rebuilt my old computer with a new motherboard, which I had bought for disaster recovery in case my original mobo died, but the rebuilt machine had problems with crashing. The problems seemed to be related to RAM. Some configurations crashed quickly and often, and others were pretty stable, but nothing was as stable as the system I was hoping to retire.

Crashing Computer, Troubleshooting RAM: Sampling Done

I bought a computer motherboard and it’s been crashing. I’m trying to figure out what’s wrong by recording how long the computer stays up before it crashes. The sampling is complete, and it’s pretty obvious that one of the RAM sticks is not working with this motherboard.


Crashing Computer, Alternative Troubleshooting Possibilities

I’m going to take a short break while I keep recording the crashes that I’m experiencing.
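For anyone who wants to do the same kind of recording, here is a minimal sketch of how the uptime samples could be summarized per RAM configuration. The configuration names and the numbers are made up for illustration; they are not my actual measurements, and the function name is just something I picked.

```python
from statistics import mean, median

def summarize_uptimes(log):
    """Group crash records by configuration and summarize the uptimes.

    log is a list of (config_name, uptime_minutes) tuples, one per crash.
    """
    by_config = {}
    for config, minutes in log:
        by_config.setdefault(config, []).append(minutes)

    summary = {}
    for config, uptimes in by_config.items():
        summary[config] = {
            "crashes": len(uptimes),
            "mean": mean(uptimes),
            "median": median(uptimes),
        }
    return summary

# Hypothetical samples: minutes of uptime before each crash.
crash_log = [
    ("stick A only", 310), ("stick A only", 290),
    ("stick B only", 12), ("stick B only", 45), ("stick B only", 30),
]

for config, stats in summarize_uptimes(crash_log).items():
    print(config, stats)
```

If one configuration has a much shorter typical uptime than the others, that points at whatever is different about it, though a handful of samples is not proof.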

Generally, it’s not a good idea to keep using a crashing computer. It’s better to have it fixed, because faulty memory means faulty data: the computer might write corrupted data to disk. Unfortunately, I don’t have another computer that I like as much as my olde battleaxe.

I decided to do some reading about troubleshooting RAM. There are some gems out there:

They all had a lot of good advice. Also, it might not be RAM; it could be an electrical problem almost anywhere in the system. It could be heat, which is why I bought some expensive, slightly old and dry, thermal paste. It did a kick-ass job of reducing the CPU temperature.

Remember, this RAM was working just fine in the other computer, so it could also be the motherboard or a configuration setting. What makes me think it’s RAM is that this RAM was bought as a pair, and one half of the pair failed several years ago. It didn’t show up in Memtest as bad, but it crashed the computer. Tweaking the settings didn’t help much. Removing that RAM module ended the crashes.

RAM Goes Bad Over Time

In the 1980s and 1990s, nobody thought RAM aged and failed; today, it’s common. Back then, RAM wasn’t so dense, and it was usually soldered in, or installed as 30-pin or 72-pin SIMM modules.

As the chips and the connectors have gotten denser, they’ve become more fragile. They’re more sensitive to power spikes, heat, moisture, corrosion, and vibration.

Additionally, the error correction features of newer RAM can mask the problems. It’s a little like the error correction on hard drives: when you get a couple of errors, you should realize that these are only the errors you can see. There are many more errors you haven’t seen.

Alternative Troubleshooting Possibilities

Troubleshooting the Power Supply

It helps to have an oscilloscope to check the power supply, because the problem isn’t that there’s no power, but that the power fluctuates. That causes crashes that feel more like software or RAM problems.

Sometimes, they feel like disk problems, with the computer halting or pausing.

Lacking that, you should usually just keep a spare power supply around to swap in. Power supplies used to be the main component that failed. I suspect they are still at or near the top for causing headaches.
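If you do manage to collect voltage readings, from a scope or even a multimeter, a quick check against the rail tolerances can show whether the fluctuation is out of bounds. This sketch assumes the commonly cited ATX tolerance of plus or minus 5% on the +12V, +5V, and +3.3V rails. The readings are invented for illustration.

```python
# Nominal voltage and tolerance for the main positive rails.
# The 5% figure is the commonly cited ATX tolerance; treat it as an assumption.
ATX_RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
}

def out_of_tolerance(rail, samples):
    """Return the readings that fall outside the allowed band for a rail."""
    nominal, tolerance = ATX_RAILS[rail]
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return [v for v in samples if not (low <= v <= high)]

# Hypothetical +12V readings taken while the machine is under load.
readings = [12.05, 11.98, 11.20, 12.10, 12.71]

bad = out_of_tolerance("+12V", readings)
print("out-of-tolerance readings:", bad)
print("peak-to-peak swing:", round(max(readings) - min(readings), 2))
```

A multimeter averages too slowly to catch short dips, which is why a scope is the better tool; clean multimeter readings do not clear the power supply.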

Troubleshooting the Hard Disk

You should always check the SMART values. They are somewhat useful. Some disks will tell you when they’ve had bad sectors. Nowadays, all disks come with some bad sectors, which are remapped to good areas of the disk before you ever use it, so you often don’t see them in the SMART values.

I suspect the companies are hiding the failures. I think that if something does show up in SMART, like a bad sector, it’s time to get a new disk.
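Here is a small sketch of that rule of thumb. It scans the attribute table printed by smartctl -A (from smartmontools) for the sector-related attributes and reports any with a nonzero raw value. The sample text below is made up to match the usual column layout, and the set of attributes to watch is my own choice; not every drive reports all of them, and some drives format the raw value differently.

```python
# Sector-related attributes worth watching (an assumption, not a standard list).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

def nonzero_sector_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes above zero."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                found[fields[1]] = int(raw)
    return found

# Hypothetical output in the usual "smartctl -A" layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

print(nonzero_sector_attributes(SAMPLE))
```

By my rule above, anything this function returns is a reason to start shopping for a replacement disk.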

Disk problems feel like computer slowdowns or network congestion, not crashes.  You might be doing something, and everything just freezes, perhaps for over a minute, and then continues.

A Bad PS/2 Keyboard

Hardly anyone has a PS/2 keyboard anymore, but I had one that induced crashes. I don’t know what happened, but I suspect that PS/2 being interrupt-driven contributed to the problem. The keyboard wires directly to the keyboard controller, and that controller sends interrupt signals to the CPU.

USB is different. It’s more like a network connection, and each port is a little more isolated from the CPU: there’s the USB hub, then the controller chip that handles USB (the southbridge on Intel boards), and only then the CPU.

Bad Mounting

I have crappy mounting screws installed in this case.

If a trace on the motherboard touches the case, it’ll short and cause a crash.