In the late 90s I got my first job, and it was in an amazing place. Incredible for what I learned, for how much fun I had, and also for how hard an experience it was.
I joined the Information Systems unit at Telefónica’s Research and Development company. Insiders and many outsiders called it the Computer Center.
Yes, if this were The Golden Girls and I were Sophia Petrillo, this story would start with a "Sicily, 1920."
The story begins…
At the Telefónica R + D Computer Center we made sure that the computer and information systems worked. We looked after the machines that supported the development environments (at that time, many people were working on something called Infovía, which was the germ of today's internet connection networks), but also all of the company's internal systems: payroll management, the application for requesting holidays, the e-mail server, visitor access control, central communications, and much more.
Back then there was nothing in "the cloud"; everything ran on large machines that lived in a basement many of us called the refrigerator, and where we spent many hours in front of bright green monochrome screens, the famous VTs.
When we had to make any software or hardware changes, we worked at night. Clearly, you couldn't stop the development environments for the more than 1,200 employees there. These maintenance windows usually opened at 10 or 11 at night, and we stayed until everything worked (hopefully, most of the time) or until we had put everything back the way it was (because the change had gone wrong). Everything had to be transparent (or almost) to our users.
One day we had to completely replace our largest machine there. This server held all the employee IDs, the access systems, the email accounts… the bee's knees. If the change went badly and we couldn't fix it, the next day no one would be able to work.
And, to be honest, we changed it because we had no choice.
The working server had become too small for day-to-day operations, but we weren't looking forward to the change at all. The chances of something going wrong were pretty high. Bear in mind that back then a lot of things were done manually, and there were plenty of undocumented system quirks (well, that might still happen now).
The design of what we were going to do had taken us a couple of weeks. The new machine had been under stress tests for almost a month; all the work that could be done in advance had been done. We had designed the data-migration and rollback processes, and they had been tested. Everything was, or so we thought, under control.
D-day came and we went to the office.
Things started well: the basic procedures and the first dumps of information went smoothly. We had been there for four hours without any serious problems. It must have been three or four in the morning, and all that remained was to dump the user base and replicate the email accounts. We had done it!
The last part of the process was handled by a program we had written to copy user records. It was a process that could not be interrupted without creating inconsistencies.
It was going to be the first time we finished one of our night jobs without major incidents (something always happens in production). It was going to be…
Halfway through the process, an error appeared on one of the drives. It was not recoverable. Everything was left hanging. Oh.
I remember a cold sweat coming over me, and I was not the only one.
It was five o'clock in the morning, in three hours people would start arriving, and we had two machines that at that moment were good for nothing but paperweights. Oh. Oh.
At 7, we still hadn’t recovered the systems, but we had a plan, and we were working on it.
At 8, the phone started ringing. In a company where many people know about technology, all sorts of comments came in: how could you have done it so badly; what you have to do is…; I'll send you one of my guys and they'll fix it right away… Some simply asked us: do you need anything? How can we help you?
And I remember how my boss always answered the same way:
Please, let us work.
Yes, errors exist. We were surely partly to blame: we were not prepared for what happened; we hadn't imagined a failure like the one we got. But of course, we were the ones best placed to fix it.
By 10:30 in the morning everything was settled. We had to roll back, recover the old machine by replacing the fried disk, and put all the systems back in place. But we got it!
We got it!
Not even in something as "easy" as technology is everything black and white, much less in life. What was a problem for the rest of the company and a giant crisis for us yielded a lot of lessons. For now, I'm taking only one:
When there are people taking care of what is happening, let them work. Perhaps you know more (or so you think), perhaps you would like others to handle the problem, perhaps those who are dealing with it don't deserve your trust. I don't know.
Until you get into the mud and really see how big the problem is and all its implications, you can't be sure you would do better. And if the problem is complex enough, in a tough environment with many interrelated parts, we are all going to make mistakes.
So, either way, let's let them work. And let's hold them accountable, if we have to, later.
Take care. That much is in our hands right now.
PS: Although it has been a long time since I changed my life and I no longer work in this field, not a day goes by that I don't think of the people who, in this crisis, are running the Telcos' operations, making sure the communications networks work so we can all stay connected. Thank you.