The Importance of Triple Redundancy in Crucial Systems

(I have touched upon this topic in another blog post and in the book. I regard it as more important than my earlier treatments conveyed, and doing it full justice is beyond what I can manage even now. We will have experts advising on this topic, so that the initial residents/founders have as complete an understanding as possible when making design decisions.)

Modern systems of all kinds are staggeringly complex. The production of a single product will often have thousands of separate steps, and include sub-components that themselves also had thousands of steps in their manufacture. (This may extend multiple levels deep to sub-sub-components.) Extrapolate this to automated systems that run the repetitive aspects of an abundance-based society, and we have a serious issue.

The good news is that sensors have never been cheaper, and costs continue to plummet. Soon, it will be trivially inexpensive to monitor all critical variables within a system in real time. When such monitoring is done by three identical sensors, all expected to produce the same reading at all times, this is known as “triple redundancy”. When any one sensor produces a reading different from that of the other two members of its triplet, it is immediately presumed to be defective and flagged for prompt replacement. Until that replacement happens, the whole triplet and the systems it monitors are themselves subjected to special monitoring.
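
The triplet rule described above can be sketched in a few lines of Python. This is only an illustration, not a design for any real system: the function name `check_triplet`, the `TripletResult` fields, and the `tolerance` parameter are all my assumptions. In particular, real sensors rarely agree to the last decimal, so the sketch treats two readings as "identical" when they differ by no more than a chosen tolerance.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TripletResult:
    value: float            # median of the three readings
    suspect: Optional[int]  # index of the sensor to flag, or None
    degraded: bool          # True when the triplet needs special monitoring


def check_triplet(readings: Sequence[float], tolerance: float = 0.5) -> TripletResult:
    """Compare three readings and flag the one that disagrees with the other two.

    `tolerance` is an assumed agreement threshold, not a value from any standard.
    """
    if len(readings) != 3:
        raise ValueError("a triplet needs exactly three readings")
    a, b, c = readings
    ab = abs(a - b) <= tolerance
    ac = abs(a - c) <= tolerance
    bc = abs(b - c) <= tolerance
    value = sorted(readings)[1]  # the median is unaffected by a single outlier

    if ab and ac and bc:
        return TripletResult(value, None, False)

    # Exactly one agreeing pair: the third sensor is presumed defective.
    if ab and not ac and not bc:
        suspect = 2
    elif ac and not ab and not bc:
        suspect = 1
    elif bc and not ab and not ac:
        suspect = 0
    else:
        suspect = None  # ambiguous disagreement: no single sensor can be blamed

    return TripletResult(value, suspect, True)


result = check_triplet([20.0, 20.1, 35.0])
print(result)
```

In this sketch a triplet with any disagreement is marked as degraded even when no single sensor can be blamed, which corresponds to placing the whole triplet under special monitoring until a replacement is made.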

This is how organizations such as NASA have minimized catastrophic failures in environments such as space, where there is no room for such failure because the survival of the mission, and even astronauts’ lives, depend on avoiding it. Further, there is often neither time to figure out a solution on the fly nor access to resources that would be available had the problem happened on Earth.

This is why we find movies such as Apollo 13 so captivating, and the actions/successes of the astronauts so heroic. We can easily imagine how horribly wrong things might have gone. And NASA is hardly perfect. I doubt that humanity will ever forget the Challenger disaster, a catastrophe that not only cost precious lives but set the whole space program back by years. It was traced to a failed O-ring seal in one of the solid rocket boosters.

The first Celebration Societies will surely be terrestrial and not built in space. Therefore, any system failures (and there will be such) can be addressed with the massive resources of terrestrial technology, parts inventories, and expertise. Further, such failures are unlikely to be catastrophic. Nevertheless, since the first such society will serve as a showcase for our ideas and their viability, it is essential that the society not be exposed to existential risk of any kind.

Most such risks can be averted by making all critical systems (those in which a failure would have significant consequences, not easily remedied) redundant, with triple-redundant sensors continuously monitoring important variables to assure that the variables remain within tolerable limits.
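
As a rough illustration of "within tolerable limits", the sketch below takes the median of each triplet (so that one bad sensor cannot trigger or hide an alarm on its own) and compares it with a pair of limits. The variable names and the limit values in `LIMITS` are invented for the example; real limits would come from the engineers responsible for each system.

```python
from statistics import median
from typing import Dict, Tuple

# Hypothetical variables and limits, chosen only for illustration.
LIMITS: Dict[str, Tuple[float, float]] = {
    "water_pressure_kpa": (250.0, 550.0),
    "reservoir_temp_c": (2.0, 30.0),
}


def out_of_limits(samples: Dict[str, Tuple[float, float, float]],
                  limits: Dict[str, Tuple[float, float]] = LIMITS) -> Dict[str, float]:
    """Return {variable: median reading} for variables outside their limits."""
    alarms = {}
    for name, readings in samples.items():
        low, high = limits[name]
        voted = median(readings)  # one outlying sensor cannot move the median past the other two
        if not (low <= voted <= high):
            alarms[name] = voted
    return alarms


samples = {
    "water_pressure_kpa": (400.0, 402.0, 990.0),  # one sensor is clearly off
    "reservoir_temp_c": (31.0, 31.5, 12.0),       # two sensors agree it is too warm
}
print(out_of_limits(samples))
```

Here the pressure triplet raises no alarm, because two of its three sensors report a normal value, while the temperature triplet does, because two of its three sensors agree that the limit has been exceeded.
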

Since many of the automated systems will be, essentially, software, we need not only reliable redundancy but also defense against malware. Defense against malware is not trivial, and indeed it is expected to become, before long, an ongoing battle between AIs, since humans will not be fast enough to either defend or attack successfully when opposed by AIs.

There are two possible defenses of which I am aware. The first is to quarantine the city-state’s mission-critical systems against any input of any sort beyond very limited, recorded and real-time monitored communications with Citizens. (I can see no need for those systems to have an internet connection though, of course, I may be wrong.) Second, an ally who remains anonymous at this time is deeply experienced and connected in the world of Silicon Valley software. He has informed me that a startup of which he is part has figured out a definitive solution to malware. I hope he proves right.

We cannot avert all catastrophic risks. For example, a modest-sized asteroid could obliterate a Celebration Society either by striking it directly or by striking elsewhere and causing, for example, a tsunami. However, the odds against such an event are extremely high. Further, such risks can essentially be eliminated by building a second Celebration Society as soon as possible. This is, not coincidentally, the same argument being made in favor of Martian colonies to assure humanity’s continuation in the event of a planetary catastrophe.

As I’ve written elsewhere, Martian colonies should be a fine place to build Celebration Societies, just as soon as the planet has been terraformed. Meanwhile, we can automate, and monitor the operation of that automation on a continuous basis. In fact, the monitoring can itself become automated: in effect, a second software system that monitors the software doing the actual operating.

This could potentially be taken one level deeper: a third “assurance” system could run tests of the monitoring system on a regular basis, in effect stress testing it to confirm its proper functioning. By making the monitoring system itself triple-redundant (three such systems, all running separately and continuously, all tested by the “assurance” system on a frequent basis for identical and correct results), it is hard for me to see what could go wrong.
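
To make the layering concrete, here is a minimal sketch of an “assurance” check run against three monitoring systems. Everything in it is hypothetical: a monitor is modeled as a simple function that says whether a reading is safe, and `run_assurance` feeds each monitor test inputs whose correct answers are known in advance.

```python
from typing import Callable, Dict, List, Tuple

# A monitor is modeled as a function: reading -> True if the reading is safe.
Monitor = Callable[[float], bool]


def make_monitor(low: float, high: float) -> Monitor:
    """Build a toy monitor that accepts readings between low and high."""
    def monitor(reading: float) -> bool:
        return low <= reading <= high
    return monitor


def run_assurance(monitors: Dict[str, Monitor],
                  cases: List[Tuple[float, bool]]) -> Dict[str, List[float]]:
    """Return, for each failing monitor, the test inputs it judged wrongly."""
    failures = {}
    for name, monitor in monitors.items():
        wrong = [x for x, expected in cases if monitor(x) != expected]
        if wrong:
            failures[name] = wrong
    return failures


monitors = {
    "A": make_monitor(0.0, 100.0),
    "B": make_monitor(0.0, 100.0),
    "C": make_monitor(0.0, 1000.0),  # misconfigured on purpose for the demo
}
# Test inputs paired with the answer a correct monitor must give.
cases = [(50.0, True), (150.0, False), (-5.0, False)]
print(run_assurance(monitors, cases))
```

Checking each monitor against known answers, rather than only against its two siblings, matters because three systems sharing the same defect would agree with one another while all being wrong.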

That said, human failures of imagination are well-known and well-documented. Mine is surely no exception. This is but one reason why I favor the entirety of the Celebration Society’s systems being under the ultimate control of the Citizens as a body.
