Mars Rover Spirit Uses Windows XP?



Kevinito
2004-Jan-24, 02:28 PM
I was just curious because of all the reboots (over sixty) Spirit's software has endured. Kinda sounds like my new PC!
Seriously though, I do understand the complexity of the rovers and their hardware and software, but the fact of the matter is that thirty-year-old technology (i.e., Viking) was more reliable. The Viking mission was projected to last only 90 days; however, it lasted for years, over six to be exact (according to NASA TV).
A few pictures on the Internet are not worth a $410,000,000 price tag. The real science hasn't even begun yet. Just to leave with a quote from AP: "'The chances that it (Spirit) will be perfect again are not good and the chances that it will not work again are also low,' project manager Pete Theisinger said at NASA's Jet Propulsion Laboratory." There is still hope.

-Kevin

Jack Higgins
2004-Jan-24, 03:37 PM
Viking's technology was much less complex than the current rovers, so there was far less to go wrong on them. They ran off RTG power sources, so unless something major broke they were guaranteed to last a long time!

A lot (73 Mbits) of engineering data and telemetry was received today anyway, so by the time Opportunity is on the ground, we'll know exactly what's up with Spirit too.

ToSeek
2004-Jan-24, 05:08 PM
The rovers use an operating system called VxWorks, which is a real-time OS frequently used in space applications. It was also used for Pathfinder.

And it's unfair to come to sweeping conclusions about reliability based on three examples.

LTC8K6
2004-Jan-25, 04:58 AM
Compared to Win95/98/ME, WinXP never crashes.

Demigrog
2004-Jan-25, 06:10 AM
VxWorks is one of the most common embedded OSes used in industry, so I doubt the OS is at fault. I wish I could find detailed info on the rover computers (NASA's site is written at a 6th grade level). Does anyone know of a site? If it's running VxWorks on a VME system, it sounds rather similar to most general-purpose industrial computers out there. I design systems like that for a living, and what it sounds like to me is that something happened during a write to Flash memory, probably a power glitch. Flash doesn't handle power failure well during a write operation.

What disturbs me is that there seems to be only one computer in the darned thing. I realize there are mass limitations, but I can't believe they wouldn't have redundant computers, or at least a specialized computer handling critical functions independent of a general computer controlling data crunching applications. We use triple-modular-redundant (TMR) controls on all sorts of things in industry, which protects against things like flash corruption of a single controller. I hope I'm just missing something, or my opinion of NASA's computer engineering is going down several notches...

Demigrog
2004-Jan-25, 06:54 AM
I decided to look into the rover computers a bit more. Just thought I'd post some links.

The main computer is a single-board VME card using the RAD6000 CPU, a radiation-hardened 32-bit RISC processor used on many other spacecraft, including Pathfinder and Stardust. Processor performance is about 20 MIPS according to BAE Systems' press release. http://www.baesystems.com/newsroom/2003/july/250703news1.htm

Interesting presentation on the Pathfinder software, which was the basis for the Spirit and Opportunity rovers:
http://avatar.cs.colorado.edu/~siewerts/marspath/jpl/tsld001.htm

I can't find anything on the system architecture of the rover electronics module. I can't even confirm that only one RAD6000 is in the rover.

Nanoda
2004-Jan-25, 08:29 AM
More systems means more weight, more electrical use, and more points of failure, as well as not being fast, cheap, or efficient. Of all the things to fail, frankly the computer would be my last choice anyhow. Much better to add another battery or probe or something.

The Mars rovers contain three types of RAM (http://planetary.org/news/2004/spirit_sol-20.html): regular, flash (for long term data storage), and EEPROM (where the code is stored). It does in fact contain redundant flash RAM, but whatever is causing the problem appears to be affecting both, so one would suspect something wrong with the flash controller, address lines, whatever.

In any case, I reject your assertion that backups of everything are needed. Besides the issues I already mentioned, TMR isn't required. I Googled a bit about it, but it seems to fulfill the role of failover equipment. I own a Dell server with dual CPUs, dual power supplies, RAID, etc etc (and I want rid of it. $200 OBO!). It's for supplying clients who'll throw a hissy fit if they can't check their mail. If something goes wrong here (as it has), mission control will say "Darnit, can't use <insert function here>. Keep going on the other stuff".

Besides which, Spirit does have TMR support. It landed a few hours ago. :P

Edymnion
2004-Jan-26, 04:08 PM
Compared to Win95/98/ME, WinXP never crashes.
Amen.
I had to use Windows ME for years. Damn thing would just spontaneously reboot if you even looked at it funny. I swear, funny story, I was telling somebody what a piece of crap ME was, and, no kidding, the computer rebooted itself not 5 seconds after I finished the sentence. I think it heard me...

*is very afraid*

Demigrog
2004-Jan-26, 07:57 PM
More systems means more weight, more electrical use, and more points of failure, as well as not being fast, cheap, or efficient. Of all the things to fail, frankly the computer would be my last choice anyhow. Much better to add another battery or probe or something.


The entire point of redundant controllers is to reduce points of failure. There is certainly a size, mass, and power trade off, but it is impossible to tell how much of a trade off it is without more information than is generally available on the rover on the web. I'd love to have more information; maybe there is a JPL webpage buried somewhere that would help.

In any case, software is usually a large source of failure, even in NASA probes. It is quite logical to me to have a fall-back computer for when your main computer decides to reboot itself randomly. The computer is such a critical component on the rover that it is just disturbing to me that there isn't a backup.



The Mars rovers contain three types of RAM (http://planetary.org/news/2004/spirit_sol-20.html): regular, flash (for long term data storage), and EEPROM (where the code is stored). It does in fact contain redundant flash RAM, but whatever is causing the problem appears to be affecting both, so one would suspect something wrong with the flash controller, address lines, whatever.

I've seen nothing to suggest there is redundant flash memory on the rover. Sources?



In any case, I reject your assertion that backups of everything are needed. Besides the issues I already mentioned, TMR isn't required. I Googled a bit about it, but it seems to fulfill the role of failover equipment. I own a Dell server with dual CPUs, dual power supplies, RAID, etc etc (and I want rid of it. $200 OBO!). It's for supplying clients who'll throw a hissy fit if they can't check their mail. If something goes wrong here (as it has), mission control will say "Darnit, can't use <insert function here>. Keep going on the other stuff".

TMR and similar protection schemes are designed to ensure continuity of real-time control in industrial control applications. Typically they involve having two or more (three in TMR, obviously) completely independent computers running the same control software. Outputs of the control software are "voted" between the controllers, allowing one controller to fail or be shut down for service without screwing up the process being controlled. There really isn't any similarity to a server with dual CPUs, though dual power supplies and RAID storage are somewhat analogous.
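
As a rough illustration of the voting step described above, here is a minimal Python sketch of a two-out-of-three majority voter. The function name `vote`, the use of `None` to mark a controller that is offline, and the choice to raise an error when no majority exists are all assumptions made for illustration. Real TMR systems usually vote in hardware or at the I/O layer, and nothing here reflects how the rover or any specific industrial controller does it.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(outputs: Sequence[Optional[Hashable]]) -> Hashable:
    """Return the output that a strict majority of all controllers agree on.

    `outputs` holds one entry per controller; None marks a controller that
    has failed silent or been shut down for service (an assumed convention).
    """
    live = [o for o in outputs if o is not None]
    if not live:
        raise RuntimeError("no controller produced an output")
    # Most common value among the controllers that are still reporting.
    value, count = Counter(live).most_common(1)[0]
    # Require agreement from more than half of ALL channels (2 of 3 in TMR).
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among controller outputs")
    return value
```

With three controllers, `vote([1, 1, 2])` masks the single bad output, and `vote([1, None, 1])` still produces a result with one controller down for service. If all three disagree, the sketch refuses to pick a winner.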

I'll admit TMR as we do it in industry is probably overkill on the rover, as there aren't really many real-time systems to control (communications, possibly environmental control). I don't think it is excessive that there be a backup computer that can independently communicate with Earth to help detect, correct, or bypass problems with the main computer.



Besides which, Spirit does have TMR support. It landed a few hours ago. :P

Yeah, but I'd rather have two healthy rovers. I consider Opportunity to be a mission in its own right. If it had landed at the same place as Spirit, it would be a backup.

Nanoda
2004-Jan-27, 12:20 AM
In any case, software is usually a large source of failure, even in NASA probes. It is quite logical to me to have a fall-back computer for when your main computer decides to reboot itself randomly.
Your first sentence is true, but you said backup computer, not software. The only system I'm aware of that contains triple-redundant systems and doesn't just duplicate one set of code is the Space Shuttle fly-by-wire system (http://science.ksc.nasa.gov/shuttle/technology/sts-newsref/sts-av.html#sts-dps-backup). (A guest speaker in a computing failure analysis lecture once told me that voting is accomplished by hydraulic systems acting on a small shuttle model, but I've never been able to verify that.) IMHO, having a parallel programming process for the rover would at least double the cost, and provide little extra protection against odd events. The current situation seems to bear that out; a software problem occurred (the current theory is the flash file system module is having a problem with large data rates) and the computer is rebooting to clear this. NASA controllers are examining the situation, working around it for now, and will probably patch it soon. If they had 2 computers, and one wasn't working properly, I think they'd be doing the exact same thing anyhow.


I've seen nothing to suggest there is redundant flash memory on the rover. Sources?
I heard project manager Peter Theisinger say it on NASA TV... for some reason even the tech-oriented sites don't seem to say anything about it. Here's (http://space.about.com/cs/marsrovers/a/spirit012404a.htm) one that mentions it in passing.


I don't think it is excessive that there be a backup computer that can independently communicate with Earth to help detect, correct, or bypass problems with the main computer.
I'm aware of how failover works (and I still assert that you could yank at least the secondary CPU on my server and still have it work), but I haven't changed my mind about it. It's for rockets, nuclear power plants, and any other place where immediate human intervention is not a possibility and catastrophic failure can occur if things fail. The Mars rovers have two different methods of communication (http://marsrovers.jpl.nasa.gov/mission/spacecraft_rover_antennas.html), some sort of low-level process or hardware that can accept commands from Earth in pretty much any mode, and things are going pretty good. As designed, the problem that is occurring can be fixed from Earth.

Geez, I'm writing a novel. So... in conclusion, I still say that if you have decent hardware with half-decent software, which they do, then you don't need duplication.

Darnon
2004-Jan-27, 06:04 PM
I have also heard mention of there being at least two flash storage units on the rover that the data is alternated between, although I don't recall the exact reason at the moment. I don't believe it is designed to be a redundant system, however, as one of the original theories for the problem was a fault in the hardware where it connects to both.

Demigrog
2004-Jan-27, 08:48 PM
Your first sentence is true, but you said backup computer, not software. The only system I'm aware of that contains triple-redundant systems and doesn't just duplicate one set of code is the Space Shuttle fly-by-wire system (http://science.ksc.nasa.gov/shuttle/technology/sts-newsref/sts-av.html#sts-dps-backup). (A guest speaker in a computing failure analysis lecture once told me that voting is accomplished by hydraulic systems acting on a small shuttle model, but I've never been able to verify that.) IMHO, having a parallel programming process for the rover would at least double the cost

The cost of the rover would nowhere near double, as the computer itself is a relatively small part of the cost of the system (a RAD6000 CPU goes for about $300,000, but figure a few million including design of the VME card it's on). Software costs wouldn't increase significantly either, as the secondary computer would be running the same software, in general, as the main CPU. Alternately, the secondary computer could be much more limited in capabilities (i.e., backup communications only), and thus cheaper.



and provide little extra protection against odd events. The current situation seems to bear that out; a software problem occurred (the current theory is the flash file system module is having a problem with large data rates) and the computer is rebooting to clear this. NASA controllers are examining the situation, working around it for now, and will probably patch it soon. If they had 2 computers, and one wasn't working properly, I think they'd be doing the exact same thing anyhow.

The secondary computer would have kept communications from being interrupted, allowed download of engineering data that could speed up diagnosis and repair of the problem, and provided a fallback if there is indeed a hardware problem on the main computer. Communications is so essential that I'd at least have given the communications electronics a computer with the capacity to independently communicate with Earth and poll other hardware on the rover. Actually, with the lack of technical information on the web I can't say they don't have that kind of capability; that might be how they are sending the special commands to boot the main computer from RAM.




I've seen nothing to suggest there is redundant flash memory on the rover. Sources?
I heard project manager Peter Theisinger say it on NASA TV... for some reason even the tech-oriented sites don't seem to say anything about it. Here's (http://space.about.com/cs/marsrovers/a/spirit012404a.htm) one that mentions it in passing.

That quote could mean anything from having two flash chips in one flash device to two fully independent flash devices. What I want to find is a good hardware block diagram of the rover electronics module; I'll bet there is more backup hardware than is obvious from the public-consumption websites. I'm probably making a mistake extrapolating design features from watered-down press release data.



I'm aware of how failover works (and I still assert that you could yank at least the secondary CPU on my server and still have it work), but I haven't changed my mind about it. It's for rockets, nuclear power plants, and any other place where immediate human intervention is not a possibility and catastrophic failure can occur if things fail.

I think a Mars rover is pretty far from human intervention, and while nobody is going to die on Spirit, I'd say losing an expensive Mars rover is pretty catastrophic.



The Mars rovers have two different methods of communication (http://marsrovers.jpl.nasa.gov/mission/spacecraft_rover_antennas.html), some sort of low-level process or hardware that can accept commands from Earth in pretty much any mode, and things are going pretty good. As designed, the problem that is occurring can be fixed from Earth.
That low-level process could very well be a backup computer, or at least a dedicated computer for communications processing. So, I could be arguing about nothing here. However, I interpreted the press releases as saying that the rover boots up by default in a mode that checks for commands from Earth before proceeding with the normal boot cycle. NASA apparently has to send a new startup program to store in RAM every time it boots up now. Presumably they will fix the main program in EEPROM at some point when they are sure they have isolated the problem. So, while the lack of a backup computer may not be fatal for this particular problem, there are plenty of problems that would be fatal. If the problem had been with the EEPROM, RAM, or the CPU itself instead of flash, Spirit would be the latest victim of the Mars curse.
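
To make that reading of the press releases concrete, here is a minimal Python sketch of a boot routine that checks for an uplinked startup program before falling through to the normal boot path. Everything in it is hypothetical: `boot`, `check_uplink`, and `normal_boot` are invented names, and this is only an interpretation of the behavior described above, not JPL's flight software.

```python
from typing import Callable, Optional

# A startup program is modeled as a callable that returns a status string.
StartupProgram = Callable[[], str]


def boot(check_uplink: Callable[[], Optional[StartupProgram]],
         normal_boot: StartupProgram) -> str:
    """Run an uplinked startup program if one is waiting, else boot normally.

    `check_uplink` (hypothetical) returns a program received from Earth,
    or None if nothing was sent during the listening window.
    """
    override = check_uplink()
    if override is not None:
        # Ground supplied a replacement; run it instead of the stored path,
        # so a suspect flash file system is never touched.
        return override()
    # No command from Earth: proceed with the stored boot sequence.
    return normal_boot()
```

For example, `boot(lambda: None, lambda: "normal")` takes the stored path, while `boot(lambda: (lambda: "from RAM"), lambda: "normal")` runs the uplinked program. Because the override lives only in RAM in this model, it would have to be resent after every reboot, which matches the situation described above.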



Geez, I'm writing a novel. So... in conclusion, I still say that if you have decent hardware with half-decent software, which they do, then you don't need duplication.
I'm not convinced there is such thing as decent hardware or half-decent software, at least when there is no way to manually walk up to the rover and swap out a defective part. A backup computer just seems logical to me, but without more technical information on the rover I cannot support my position (as if my opinion mattered anyway).

Yeah, I didn't intend this to be a novel, nor am I trying to be a troll by bringing it up in the first place. I'm just interested in the design of the rover, and frustrated by the lack of detail on it, particularly when trying to answer nagging questions like this. It's professional curiosity, as I design very similar systems. I was hoping someone could provide a link to better technical details, but I fear the title and original topic of the thread are keeping away a lot of people. I suppose I could contact NASA and ask, but that would be work. :)

Demigrog
2004-Jan-27, 09:57 PM
I finally found an article that actually helps explain what is going on.
http://parts.jpl.nasa.gov/mrqw/mrqw_presentations/Keynote1_haddad.pdf
has details on the VME card that the RAD6000 is on. The EEPROM and RAM are integrated onto the processor card, so apparently the Flash memory is on a separate VME card.

That article also discusses the issues of TMR in spacecraft computers, but only from the aspect of radiation hardening. I'll grant that the RAD6000's reliability is high enough that full redundancy would not be worth the cost in power, space, mass, or money for the vast majority of processing functions. I still think a warm-backup system might be appropriate to mitigate software issues, if the backup computer is kept in minimum power mode and running watchdog software until a problem is detected. Best case would be to design a CPU into a science or I/O VME card so it can be dual-use and not waste space or VME slots.
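
The warm-backup idea can be sketched as a heartbeat watchdog: the backup does nothing but listen for a periodic heartbeat from the main computer, and takes over only if the heartbeat stops for longer than a timeout. This is a minimal Python sketch under assumed names (`WarmBackup`, `heartbeat`, `poll`, `timeout_s`). Timestamps are passed in explicitly so the logic can be exercised without a real clock, and none of it is taken from the rover's actual design.

```python
class WarmBackup:
    """Hypothetical backup controller that idles until the primary goes silent."""

    def __init__(self, timeout_s: float, now: float) -> None:
        self.timeout_s = timeout_s   # longest allowed gap between heartbeats
        self.last_heartbeat = now    # treat startup as the first heartbeat
        self.active = False          # True once the backup has taken over

    def heartbeat(self, now: float) -> None:
        """Record that the primary reported in at time `now`."""
        self.last_heartbeat = now

    def poll(self, now: float) -> bool:
        """Check the timeout; latch into active mode if the primary is overdue."""
        if now - self.last_heartbeat > self.timeout_s:
            self.active = True       # stays latched until reset from outside
        return self.active


# Usage: the primary is expected to report at least every 10 seconds.
backup = WarmBackup(timeout_s=10.0, now=0.0)
backup.heartbeat(8.0)
still_idle = not backup.poll(17.0)   # 9 s of silence: within the timeout
took_over = backup.poll(18.5)        # 10.5 s of silence: backup goes active
```

The takeover is latched on purpose in this sketch: once the backup has gone active, a late heartbeat from a flaky primary does not silently hand control back. Whether that is the right policy would depend on the mission, so treat it as one possible design choice.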

I think I finally found the Uplink/Downlink card they're using:
http://www.spectrumastro.com/PDFs/ULDL-Web.pdf

Interesting how much off-the-shelf stuff they're using, and how much of it overlaps the commercial satellite business. The modularity of the system is rather impressive; the base cost of the rover minus the science mission specific parts must actually be quite low. Kinda tempts me to try to design a satellite, if only on paper.