antics 6 days ago

One of the things I like about this paper is the pristine, crystal-clear English it uses to describe things I thought I knew extremely well:

> Processes are the software analog of processors. They are the units of software modularity, service, failure and repair. The operating system kernel running in each processor provides multiple processes, each with a one gigabyte virtual address space. Processes in a processor may share memory, but for fault containment, and to allow processes to migrate to other processors for load balancing, sharing of data among processes is frowned upon. Rather, processes communicate via messages. They only share code segments.

"Processes are the software analog of processors." Yes of course. How easy it is to never ponder such a thing.

  • boris 5 days ago

    > "Processes are the software analog of processors." Yes of course.

    Maybe this analogy worked in 1986 when hardware was a lot less reliable, but I don't think it goes very far these days: processes die all the time (normal termination, crash, get killed). When was the last time a processor or core died on you? In fact, according to this analogy most of us are running the MS-DOS equivalent of systems, since if your processor or core dies, your machine dies. And I don't see this changing any time soon.

    • antics 5 days ago

      I think the point they're making is that a process is a software abstraction that provides programmers the illusion of having a processor all to themselves. So in that sense the process is a software "analog" to the processor.

    • anonfordays 5 days ago

      >When was the last time a processor or core died on you?

      Almost every day. You may never experience it if you only manage tens or hundreds of cores, but when you manage hundreds of thousands of cores, it happens.

  • sillywalk 6 days ago

    I'm not sure if you're being serious, or delivering scathing sarcasm.

    • antics 6 days ago

      I am being serious. There is a lot of value in making complicated things trivially understandable. I think it's not an accident, for example, that many of the highest-cited academic papers in CS are survey papers.

      • sillywalk 6 days ago

        Apologies from my cynical self & Poe's Law.

FlingPoo 5 days ago

I was an "operator" on a Tandem NonStop. First was Pitney Bowes, where I preferred the night shift. In the computer room, there was a Tandem and a UNIVAC. My job was mostly swapping disk packs on the Tandem and swapping tapes on the UNIVAC.

One of my jobs was to shut down the dialup system (customers' mailing machines would send data about the mailings they made overnight). I had to shut down the dialup lines according to the time zones across Canada. I decided to write a script to automate it. My script would shut down each phone line in order. It was a basic loop: it would check the time, and shut down one of the time zones. The first time I ran it, the Tandem "mainframe" froze. I had to call "Doug" in the middle of the night; I was freaking out. Doug came in, looked at my script and quietly pointed out that the time command was a high-priority system operation. My script didn't have any "waits" in it, so the loop I wrote was constantly asking the system the time, taking up 100% of the processing. "Doug" had to reboot the Tandem, and after the "wait" was put into my script, all was good.
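For anyone who hasn't been bitten by this yet, here is a minimal Python sketch of the fix Doug pointed out: the loop sleeps between time checks instead of asking for the time continuously. This is not the original Tandem script; the function name `run_shutdowns`, the line-group names, and the 30-second poll interval are all made up for illustration.

```python
import time
from typing import Callable, Dict, List


def run_shutdowns(
    cutoffs: Dict[str, float],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = 30.0,
) -> List[str]:
    """Shut down each line group once its cutoff time has passed.

    `cutoffs` maps a hypothetical line-group name to a cutoff timestamp
    measured on the same clock that `now()` reads.
    """
    pending = dict(cutoffs)
    done: List[str] = []
    while pending:
        current = now()
        # sorted() returns a new list, so deleting from `pending` here is safe.
        for name in sorted(pending, key=pending.get):
            if pending[name] <= current:
                done.append(name)  # stand-in for the real shutdown command
                del pending[name]
        if pending:
            # The missing "wait": give up the CPU instead of re-asking the time.
            sleep(poll_interval)
    return done
```

Passing `now` and `sleep` as parameters is only there so the loop can be exercised with a fake clock; a real script would simply use the defaults.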

After I left Pitney Bowes I was an operator at the OPP (Ontario Provincial Police). That system ran OMPPAC (Ontario Municipal Police Automated Cooperative), the criminal database for the Province of Ontario.

  • fuzzfactor 5 days ago

    Things are different when "mission-critical" is the highest priority.

    Before PCs really got popular, I went out to a petroleum pipeline installation one time where they were just unboxing one of these Tandem rigs.

    They were going to use it to further automate, control, and account for transfers like some of the other oil companies were doing with their mainframes, as part of an expected technology advance at the time.

    Here it was not just a CRT terminal and a printer, but the whole thing right there in a fairly hazardous location in the blockhouse office where contractors would do their hand calculations.

    I was there to take readings on the mechanical totalizers, especially on the piping section we had independently calibrated, and bring samples back to the lab for precision viscosity and density determination to more decimal places than available elsewhere. Along with all kinds of other routine and research parameters.

    Turns out I was the pioneer in digital densitometry among the multinational contractors. That's another story altogether but within a decade they all had it and I was in more demand after the niche had grown than I was when I owned the niche. People still never want me to stop.

    Anyway, I had a pretty good handle on floating-point error and was doing my part to rein it in with improvements in physical measurement.

    It didn't take long to realize that my Atari would be basically capable of handling all of the things they were going to use the Tandem for.

    The shortcomings would be the redundancy/reliability and Atari just couldn't count that high :)

    When you're moving large numbers of barrels the numbers go through the roof when you convert to liters or even worse, some currencies.

    If the figures didn't agree very well with manual calculation using 16-digit calculators, some big shot may very well hit the roof.

    I would have had to hook two Ataris together and try to get more precision somehow at the same time as try some redundant reliability. Never did.

    Although within a couple years I did hook up two TRS-80s together and they were quite adversarial . . .
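To put a rough number on the "couldn't count that high" problem, here is a small Python sketch that converts a barrel count to liters at two different working precisions. The 7-digit setting is an assumed stand-in for a short-mantissa machine, not a claim about what any particular Atari did, and the transfer volume is made up. The liters-per-barrel constant is 42 US gallons times 3.785411784 L.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# 42 US gallons x 3.785411784 L per gallon
LITERS_PER_BARREL = Decimal("158.987294928")


def barrels_to_liters(barrels: str, digits: int) -> Decimal:
    """Convert barrels to liters, rounding the result to `digits` significant digits."""
    ctx = Context(prec=digits, rounding=ROUND_HALF_EVEN)
    return ctx.multiply(Decimal(barrels), LITERS_PER_BARREL)


barrels = "1234567.89"                  # hypothetical transfer volume
low = barrels_to_liters(barrels, 7)     # assumed short-mantissa machine
high = barrels_to_liters(barrels, 16)   # a 16-digit desk calculator
print("7 digits: ", low)
print("16 digits:", high)
print("gap in liters:", high - low)
```

The gap only grows once you multiply by a price or an exchange rate, which is where the figures stop agreeing with the hand calculation.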

pastureofplenty 6 days ago

Tandem was a special company for sure. My dad worked there in the 80s and 90s and said it was the best job he ever had.

  • tialaramex 6 days ago

    I worked for them briefly before University, I was essentially the sole report for a handful of senior people who were being dropped in to deliver "Project Genesis". They all told me this was the best place I'd ever work and so far (30+ years later) they're right.

    That's also the first place I had something resembling routine access to the Internet, and coincidentally Tim's stupid hypermedia technology (the World Wide Web) was just taking off too, so I got to see that brief period when Netscape Navigator was exciting and new.

    • sillywalk 6 days ago

      I'm curious as to what Project Genesis was.

      • tialaramex 6 days ago

        I'm actually not at all confident that I remember. New Customer Service system maybe?

  • sillywalk 6 days ago

    Apparently Jim Treybig (Tandem founder) was big on people. He came from HP, and brought over the HP (at the time) way. I think a lot of employees even got 6-week paid sabbaticals every 4 years.

    • pmcjones 6 days ago

      As far as I know, sabbaticals were universal.

      • dsand 5 days ago

        Yes, all Tandem full-time employees got paid sabbaticals. And stock options.

        My partner joined Tandem 2 years after me, so our eligibility for sabbaticals was out of sync. One of us would have to defer our next sabbatical for 2 years to get us in sync. Instead, we both took off together for 6 weeks every two years, using time off without pay when it wasn't our turn for a paid vacation. We took a lot of overseas trips.

  • mitchbob 6 days ago

    I started there as an intern in 1982, and it was my first job out of college. Tandem was Kleiner Perkins's breakthrough: it was the fastest-growing public company in America, and the growth was anticipated and carefully planned for. There were brilliant people everywhere: Jim Gray's office was on the other side of the wall from ours, and Tom Van Vleck, who wrote the first email program for CTSS at MIT, was down the hall. It was the closest to a Manhattan Project I had in my career, and pure luck that I was there.

    • thvv 3 days ago

      I started at Tandem in 1981. It was a great company to work for. CEO Jimmy Treybig came to new employee orientation and said, "The goal of Tandem is to make a lot of money and have a good time." Well wow. (Previous company would have seen good times as an error.)

      There were a lot of good times: 4:30 on Fridays at Beer Bust, standing around the pool with colleagues. Working hard with great people.

      • mitchbob 3 days ago

        Hi Tom! Fond memory of mine: running with Jimmy T. at lunch. What a great guy he was. Haven't done anything like that with a CEO since. Hope you're doing well!

schoen 5 days ago

As a kid I got to visit the (very conscious) Tandem competitor Stratus ... I think a family friend was working there. They had a competitive line of highly redundant and fault tolerant computers.

The person giving me the tour gave a demo by opening up an operating computer and pulling some components out of it. The computer logged the fact that the components were now missing, and blithely kept on computing. I found it pretty mind-blowing at the time!

As far as I know, Tandem could do more or less the same demo.

  • sillywalk 5 days ago

    I can't find the link, but I read a similar story about Tandem - removing parts in front of customers and it kept going.

    The funny bit was that it was configured to dial up the Global Support Center when something went wrong, so after all of these hardware-failure and component-removed errors happened, a technician was sent over with replacement parts that weren't needed.

ghaff 6 days ago

Tandem reinvented themselves a number of times in the light of various hardware changes, eventually ending up I believe on x86.

  • sillywalk 6 days ago

    Yes. Proprietary CISC chips -> MIPS -> Itanium, then ported to x86-64 around 10 years ago. Apparently it is or will soon be offered "in the cloud" as "NonStop as a Service".

    • ghaff 6 days ago

      I forgot about the MIPS step. Went to grad school with a guy who was with Tandem and ended up in charge of HP's enterprise servers. Not sure how network-accessed FT works from a reliability perspective but I guess there are a lot of apps where, if your network is down, you're toast anyway and there are various ways to provide redundancy.

      • sillywalk 6 days ago

        I think it's for people who don't want to deal with or upgrade the legacy NonStop stuff. As I understand it, a minimum logical NonStop server is actually pretty extensive:

        Two standard(?) HPE servers plus two Infiniband fabrics. Also, these servers don't directly connect to anything except the Infiniband. I believe all (or most) IO is offloaded to Clustered IO Modules (CLIMs), which are 1U HPE servers running a custom Linux distro. There are Storage and Network (and I believe Telco) CLIMs.

        I'm not sure what the minimum number of CLIMs is - e.g. can one provide storage and networking, or if you need two, or for redundancy you need two or four.

        • ghaff 6 days ago

          I was an analyst who covered the server space but I sort of lost track of the details of Tandem FT latterly even though I knew the head of HP enterprise server pretty well. As I wrote, a classmate. Originally it was 3 hardware modules voting but that changed latterly and I forget the details. NEC had something as well based on standard hardware as I recall.

newman314 6 days ago

So it's probably sufficiently far removed now that I can talk about it, but I was involved in taking a look behind the scenes at a telco over a decade ago while doing due diligence.

The telco had a pair of Tandem systems acting as the FTP server for all the telco switches. Pretty wild if you think about it.

  • sillywalk 6 days ago

    It seems like overkill for just FTP. Although I believe that Tandem also was (is?) fairly common in running telco stuff, plus billing.

    • newman314 6 days ago

      I don't disagree at all. But I guess they really wanted their CDRs.

      Another unusual piece of tech was the use of SLR tape drives. That's really the only time I have encountered those in the wild.

  • Sylamore 5 days ago

    Telcos I've touched have used Tandems for E911, SMS processing, Device authorization, and automated tariff negotiation between CLEC/ILEC, but FTP server is a new one!

mech422 6 days ago

bah - Team Stratus!! None of this wannabe software redundancy - real computers do it in hardware!!

/jk

Always was a bit of rivalry between them (at least with Wall St. data centers)

  • sillywalk 6 days ago

    They still kept their software redundancy, but when Tandem switched from their own custom CPUs to MIPS, they used pairs of lock-stepped CPUs like Stratus.

    When they still had their own multi-board proprietary processors, Tandem also used lock-stepped microprocessors in their maintenance and diagnostic subsystem and some of their communications controllers.

    I also gather Stratus is still around running on x86-64.

    • schoen 5 days ago

      Do you know (or does anyone else reading this know) how the lock-stepping works?

      It seems like it would be an enormous challenge to keep modern CPUs exactly in lockstep. All of the same interrupts have to occur at exactly the same times and be handled in the same amount of time? (I guess the hardware has to physically ensure that every interrupt is raised on both CPUs at the same time?)

      • sillywalk 5 days ago

        I believe, but am not sure, that NonStop ditched lock-stepping altogether for their NonStop Multicore Architecture when Itanium went multicore, but I can't find a paper on it, or their current X86-64 Architecture.

        They used a custom chipset that compared the two MIPS processors and also synchronized them on external interrupts. I believe that Tandem System Review is on archive.org. Volume 8, Number 1 (Spring 92) has info on the first generation lock-stepped TNS/R, and Volume 10, Number 1 (January 94) has info on the 2nd generation MIPS "Himalaya K1000".

        For Itanium, the processors were loosely synchronized, but not really lockstepped. They ran the same application code, but the processors could have different cache, TLB hits, etc. They got synchronized on external IO, all of which went over the ServerNet. There is a paper called NonStop® Advanced Architecture[0] by David Bernick et al. that has the details on their 1st generation Itanium systems.

        [0] https://www.cse.chalmers.se/edu/year/2012/course/ZZ_courses_...

        (the only source I could find that had the paper)
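A toy way to picture the "loosely synchronized" idea in Python: two replicas are fed the same inputs, may get to their answers by different internal routes, and are only compared at the point where a value would leave the machine. This is purely illustrative and not how the NonStop hardware or the paper describes it; `run_pair`, `DivergenceError`, and the replica functions are made-up names.

```python
from typing import Callable, Iterable, List


class DivergenceError(Exception):
    """Raised when the two replicas disagree on an outgoing value."""


def run_pair(
    step_a: Callable[[int], int],
    step_b: Callable[[int], int],
    inputs: Iterable[int],
) -> List[int]:
    """Feed the same inputs to two replicas and compare at each output.

    The replicas may take different internal paths (a stand-in for
    differing cache or TLB behaviour); only the value each one tries
    to emit is compared.
    """
    emitted: List[int] = []
    for value in inputs:
        out_a = step_a(value)
        out_b = step_b(value)
        if out_a != out_b:
            # A real system would isolate the faulty unit; here we just stop.
            raise DivergenceError(f"input {value}: {out_a} != {out_b}")
        emitted.append(out_a)
    return emitted
```

Strict lockstep would instead compare the processors continuously, cycle by cycle, which is why the same interrupts have to arrive at both at the same instant.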

    • mech422 5 days ago

      Oh? I didn't realize they used hardware level on MIPS

      Stratus was pretty nice - I liked VOS :-)