Friday, August 18, 2006

Making things work - Part the second

Assuming you read part one, and followed the instructions, you have read all the project documentation, produced a good network diagram and a racking plan, and your customer is getting together the details of IP addresses and naming conventions. What's next?

If you work for a large organisation that does a lot of integration work, or perhaps for a hardware or software vendor, somewhere about the place there will be a group of people who provide technical support to customers. Even if the call centre itself is in another country, the chances are that there are a few people still on the payroll whose business it is to go to the customer's premises and fix things when they break, and possibly perform preventive maintenance. If you are dealing with an important customer, there's a good chance that there is a support person "dedicated" to them. They're probably dedicated to a dozen other customers, too, but they'll know them reasonably well. Even if there is no dedicated engineer, the support folk may know your customer. Seek them out and enquire. Take your network diagram and rack plan.

Support staff do not have an easy life (I've done this job, so believe me on this one). Nobody calls you to tell you that they're having a nice day: they call to complain that something isn't working, and they're often upset when they do it. If you are really lucky, you get paged at 2AM by some operator in the data centre who can tell you that there is an error on his screen, but can't tell you what he was doing when the error appeared. Anyone who works in support has a vested interest in your project, because when you finish, they are going to have to support it. Their lives will be easier if you have followed accepted best practice and your racking arrangements are sensible.

Anyone who regularly does customer-facing infrastructure delivery work should make every effort to be on good terms with their colleagues in Support, because you can help each other. You'll know you've struck the right balance when you can go to them and ask for advice, and when they come to you saying things like "the customer's got this device/software that we've never seen before, are you familiar with it?" What you don't want is the question "why did you idiots install it like this?"

So be up front with them: locate the people most likely to be affected by your project, and tell them about it. If it's a completely new customer, ask the support team lead who is most likely to have to take care of them, and try and engage that person. Show them your diagrams. Take their suggestions and/or criticism in good spirit. Listen to anything they can tell you about the customer and their environment. You may get gems of information such as "no project that this customer has started in the last three years has gone into production" or "you do realise that their existing kit is housed in a so-called server room constructed in the space under a staircase?". You may be told that their data centre manager is a direct blood-line descendant of Attila the Hun, and a nightmare to deal with (data centre managers would make an interesting psychological study for anyone with the intestinal fortitude to tackle the subject: they tend to be obsessive control freaks of the first order). Whatever they tell you, heed the wisdom of Support. They have been on the customer's site, and you probably haven't. If they do nothing else, they can clue you in to the change control procedures that the customer uses and the building access that you can expect when you try to deliver your project's kit. I once saw a case where the data centre was located on the top storey of a heritage-listed building with narrow staircases, and a crane was needed to get the new equipment into place.

Support will know if the customer has power or air conditioning problems. I once saw a major telco take delivery of a large new machine (a Sun 25K), only to discover that there were insufficient power connections on the data centre floor to turn the thing on. When they got sufficient 32 amp points put in, they found that the extra heat load of the frame overloaded their air conditioning, and the machine could only be operated for limited periods until they got an air conditioning upgrade done. Installation was rather drawn out on that one.
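
It's worth doing that arithmetic before the gear ships, because essentially every watt a machine draws comes back out as heat that the air conditioning has to remove. Here's a back-of-the-envelope sketch in Python; the voltage, circuit rating, derating factor and load figure are all illustrative assumptions, not anyone's real numbers.

    import math

    WATTS_PER_BTU_HR = 0.293  # 1 BTU/hr is roughly 0.293 watts

    def circuits_needed(total_watts, volts=240, amps=32, derating=0.8):
        # Derate each circuit to 80% of its rated load, per common practice
        usable_watts = volts * amps * derating
        return math.ceil(total_watts / usable_watts)

    def cooling_btu_hr(total_watts):
        # Practically all power drawn ends up as heat to be removed
        return total_watts / WATTS_PER_BTU_HR

    new_load = 15_000  # hypothetical draw of the new frame, in watts
    print(circuits_needed(new_load))                  # -> 3 circuits at 240 V / 32 A
    print(f"{cooling_btu_hr(new_load):,.0f} BTU/hr")  # -> ~51,000 BTU/hr of extra heat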

Learn what you can, adjust your diagrams, and move on.

By now you should know what you are installing, and what it is supposed to do. You should be able to work out what skill sets you will need to complete the job, and the approximate order in which you are likely to need them. Tell the project manager who you are going to need, and when (I'm assuming here that you can get the skills you need in-house. If you can't, then the project manager will need time to find an external resource). Resourcing issues should now keep them occupied while you tackle the next critical task...

Wednesday, August 16, 2006

Making things work - Part the first

In every project that I have ever worked on, no matter who I was working for, there was a phase between the customer signing the order and the new "system" being handed over to the customer, and that phase was usually called "system integration". Sometimes it was called "system build" or "install and implement" or something similar, but it always meant "put everything together and make it work the way we told the customer it would".

In the early days, the system integration phase would be short and simple: install a couple of computers and a printer and configure the print queues, or some such thing. But as the projects got bigger, the integration tasks became more numerous and varied. And that increased complexity never seems to get factored into anyone's estimated costs, and I suspect this is because no one but the people who actually do the integration can list all the possible subtasks, and explain why some of them are both difficult and time-consuming. One of the things that always worries me is the sight of a project manager who has just been handed their first big integration job. You just know that they are going to suffer torments for the entire duration of the project, and no warnings will convince them that there will be problems until the job gets properly under way. By about the second week they will be looking stressed and clutching at their Gantt charts for comfort, and comfort there will not be. For verily, systems integration can be extremely tough.

Now you may never have done this sort of work, but have a job like this in your future. Or perhaps you have done an integration job, and never want to suffer like that again. Or perhaps you're in sales, and you'd like to know why the contingency budget on all projects is usually not only fully consumed, but overrun (and why all the infrastructure engineers hate your intestines). Let me try and explain how integration works from the point of view of the people who do it, and share what I've learned about things that can make it hurt less. I'm not claiming that this is the best way: it's just the best I've come up with so far. The job I'm going to imagine here is worth somewhere between $AU3 million and $AU5 million. It will include several racks of computer equipment, firewalls, load balancers, switches and routers, a lot of licensed software and some that is bespoke.

Almost every integration job that I have ever been assigned has featured implementing something that someone else designed. It is a sad fact in the IT industry that engineers who work in pre-sales are not compelled to work on the implementation of the things they design. It might make some of them more careful. I always start by reading whatever documentation exists: the design document that was given to the customer, any internal documentation that the sales people have for the job, anything we got from the customer, the bill of materials and (if it exists at this point) the project schedule.

It annoys me deeply that sales people and project managers create project schedules without asking the people who are going to do the work how long they think it will take, but this particular form of insanity seems common across the industry. People who couldn't perform the required tasks if their lives depended on it will guess how long the tasks that they don't understand will take to complete. Anyway, take note of where they think the job will end. The end date usually bears a close relation to the end of the next financial quarter, and often reflects no urgency on the customer's part, but merely a requirement to get revenue recognised by a particular date. If you happen to be a customer embarking on one of these projects, pause and consider what your vendor's revenue recognition deadlines may be doing to your project quality.

However, back to the documentation. Most designs feature at least one diagram which represents the network in which the proposed solution will reside. For many of the jobs which I have done, the network itself has been part of the solution. Unfortunately, the diagram will usually be a marketing diagram: little pictures of computers joined together with lines. This type of diagram is extremely dangerous, because it conceals more information than it reveals. Consider a diagram such as this:

[Image missing: a marketing-style diagram of two computers joined by lines to an Internet cloud above them, and to a third computer below them]

Three computers, how bad can it be, you say? Well, the Internet isn't really a cloud: typically, we see its physical entry point as a port on a router somewhere in the data centre. Two computers can't connect to the same router port, so either we need two router ports, or a switch between the computers and the router. Or there might be a whole lot of other infrastructure between those two machines and the Internet - routers, switches, hubs, firewalls. The diagram doesn't tell us. Nor does the diagram really tell us how the two computers in the middle connect to the one at the bottom. Does the third machine have two network cards, or is there some other device between them? What is it? A switch or a router? And if there are switches, routers or extra network cards needed, do they already exist or are they coming out of our project's budget?

Start by drawing a detailed network diagram that shows every device and every port, and work out what is required, what already exists and what has to be purchased. Compare what has to be purchased to the bill of materials. I've done several jobs where critical components had been omitted from the bill of materials; get any missing equipment ordered quickly, or the delay in supply will hold up your build.
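
One way to make this concrete: a port-level diagram is really just a connection list, and once it is written down you can check it mechanically against the bill of materials. A minimal sketch, with invented device names and an invented BOM:

    # Each entry: (device_a, port_a, device_b, port_b)
    connections = [
        ("rtr01", "ge-0", "sw01", "port-01"),
        ("sw01", "port-02", "web01", "eth0"),
        ("sw01", "port-03", "web02", "eth0"),
        ("web01", "eth1", "sw02", "port-01"),
        ("web02", "eth1", "sw02", "port-02"),
        ("sw02", "port-03", "db01", "eth0"),
    ]

    # Every device named in the cabling schedule must come from somewhere
    devices_required = {d for a, _, b, _ in connections for d in (a, b)}
    bill_of_materials = {"rtr01", "web01", "web02", "db01"}  # hypothetical BOM

    missing = devices_required - bill_of_materials
    print("Order these now:", sorted(missing))  # -> ['sw01', 'sw02']

Two switches that nobody budgeted for - exactly the sort of thing that surfaces when a marketing cloud gets expanded into real ports.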

The missing equipment issue can be complicated if your contract with the customer includes hardware maintenance. If you have to add extra components to make it work, you will also be adding the cost of their hardware maintenance contracts. This will eat into your contingency budget, assuming you are on a fixed-price agreement, which these things almost always are. Furthermore, if you add extra components, you will increase the amount of heat the completed solution will generate, and you will require more power and probably more rack space. You see the problem?
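
To put rough numbers on the problem: on a fixed-price job an unbudgeted component costs you its purchase price plus maintenance for the life of the support contract, all of it out of contingency. A toy calculation, with every figure invented:

    unit_cost = 8_000        # hypothetical switch price, AU$
    maint_rate = 0.12        # annual hardware maintenance as a fraction of list price
    contract_years = 3

    true_cost = unit_cost * (1 + maint_rate * contract_years)
    print(f"Contingency burned: AU${true_cost:,.0f}")  # -> AU$10,880

Over a third more than the sticker price, before you count the extra heat, power and rack space.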

Now while you are doing all this, your project manager is probably agitating for you to get the build under way. Don't. You are nowhere near ready to start building anything. First you need a proper racking design, and to do that properly you need a very detailed network diagram. You need a clear picture of what network cabling is going to have to go where. Once you understand what must physically connect to what, and you have the physical specs of all the equipment, you can draft the racking diagram. Good rack design will save you time in the build, and make your customer's life easier when they take control of your creation. Equipment manufacturers all have recommendations for how their gear should be racked (you did check that all the racking rails got ordered, didn't you?). As a rule of thumb, put the big heavy stuff near the bottom of the racks and try to minimise the amount of cable that has to go between racks.
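
Those rules of thumb are simple enough to encode as a sanity check over a draft rack plan. Here's a toy sketch with made-up equipment sizes and weights; a real plan should of course follow the vendors' racking guides:

    RACK_UNITS = 42

    # (name, size in RU, weight in kg), listed from the bottom of the rack up
    rack_plan = [
        ("db01", 8, 120),
        ("ups01", 4, 90),
        ("web01", 2, 25),
        ("web02", 2, 25),
        ("sw01", 1, 8),
    ]

    used = sum(ru for _, ru, _ in rack_plan)
    assert used <= RACK_UNITS, f"rack over capacity: {used} RU"

    # Heavy kit belongs at the bottom: weights should not increase going up
    weights = [w for _, _, w in rack_plan]
    if weights != sorted(weights, reverse=True):
        print("Warning: a heavier item is racked above a lighter one")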

While you are doing all this, there is other information that you are going to need soon. Find out who your contact is on the customer side, and ask them for their standard for naming equipment in their network. Also ask what their procedure is for allocating blocks of IP addresses, and get the details of their DNS, default routers, time servers and anything else of that type that you are going to need. Finally, ask if they have a standard for operating system installation. Do they minimise or harden the OS in any way? Do you need to do anything special to conform to their normal practices? It may take them a while to collate this information, so ask for it early. You still have a lot of work to do before you are ready to start installing anything...
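
Since the answers tend to trickle in over days or weeks, it can help to keep the request as a simple structured checklist so nothing gets forgotten. A minimal sketch; the field names are my own invention and every value is a placeholder, not a real customer detail:

    site_standards = {
        "naming_convention": None,    # e.g. a pattern like <site><role><nn>
        "ip_allocation": None,        # who allocates blocks, and from which ranges
        "dns_servers": [],
        "default_gateways": [],
        "time_servers": [],
        "os_build_standard": None,    # minimised install? hardening steps?
    }

    outstanding = [k for k, v in site_standards.items() if not v]
    print("Still waiting on:", ", ".join(outstanding))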

Monday, August 14, 2006

Doomed projects

Over the years I've been involved in some weird and wonderful projects. Some I remember fondly, some I try not to think of at all. Most projects have been successful, if "success" is defined as "what we built worked, the customer signed off, we got paid and the system went into production". But one or two ended in failure, and I was thinking about one of these this weekend while contemplating the question "how does a big IT project get launched when it should be obvious to anyone in possession of the facts that the project is doomed from its inception?"

I worked on one of these train wreck jobs about 10 years ago. At that time I was working for a reseller, and my job was on the help desk, providing technical support for things like SCO Unix, ArcServe and several sorts of hardware. Home internet access was becoming increasingly common in Australia, and the internet itself was evolving rapidly from a place where geeks talked about geek stuff to a place where non-geeks were trying to do business.

On the other side of the world a Big American Bank (let's call them B.A.B.) had launched a very swish new website. Their customers could register on this website, tick the boxes for the information that interested them, and then on their next visit, the website would show them relevant, fresh content. And the software would keep track of who visited and what they looked at. By modern standards this is trivial, but in the mid-nineties it was hot stuff. The site had been built using software and professional services from a famous Californian application vendor (let's call them CalVend), and it was extremely slick. Other banks were deeply envious, and a certain large Australian bank (let's call them L.A.B.) decided that it was their destiny to be the first bank in Australia to have a website like B.A.B.'s.

Now it so happened that my employer was a reseller of CalVend's products. CalVend were very keen to have L.A.B. deploy their software, so keen that they sent a presales engineer to Sydney to assist in closing the deal. And the deal was closed, and then it dawned on the sales folk that someone was actually going to have to deliver the solution, which only ran on Solaris. The company didn't have many Unix-literate engineers: most of the in-house geeks were NetWare specialists. Remember when NetWare was fashionable? I digress.

They needed a Unix engineer to work on this project, so I was summoned from the depths of technical support and seconded to the project team. The project manager and I were sent to California to do the application training course, and by the time we got home the hardware had been delivered and installed. I installed the operating system and the application software, and the rest of the team started reconstructing the rest of the L.A.B. website, so that it would mesh nicely with the new application. I settled down to write the Perl CGI scripts that would interface the registration form into the application. All the application actually did was record user details, record what each person was interested in (investment or retirement planning or whatever), and retrieve pieces of content from its store to be displayed to the user. Getting the information in there in the first place was my problem.

A number of other problems soon became apparent. The first was that the application software was both badly documented and riddled with bugs. The second was that CalVend had expected that their professional services arm would be engaged to do this job. When they found that we were planning to do it ourselves, and that the only revenue they were going to get was the stupendous licensing fee for the application, they became extremely uncooperative. We would call them to ask how to resolve some problem, and it would always transpire that the only person who could answer our question was "travelling", typically in Korea or Outer Mongolia, and would not be available for at least a week.

However, we persevered, and the day came when everything worked. You could register, and the system would hand you the test pages we were using as dummy content. The operator could apply weighting to selected content, so a user was more likely to see page A instead of page B. All as required by the specification. So I turned to the L.A.B. person who was working with us, and said "OK, we need to load the first cut of the real content, so we can test properly", and he went away and returned shortly with a 3.5" floppy, saying "here it is". So I dumped the files into the system, and had a look at them. I experienced the first twinge of alarm when I saw how few there were, about 15 as I recall. And they were all PDFs. I opened one. Then another. The full horror hit me: the contents of the floppy were a set of PDFs of the printed brochures that you could get from a L.A.B. branch. Boring leaflets featuring carefully staged, politically correct photos of happy bank customers, a few paragraphs of waffle and the phone number of the bank's call centre.

I asked when the rest of the content would be available. The answer was that there wasn't any more. "We thought we would just start with that, and see how it went" they said. I pointed out, firmly, that no end user would go through the web site registration process for the privilege of getting the same brochures that they could get from the local branch. And if that was all they could get, they wouldn't bother revisiting the site. That scared them: the whole point of this exercise was to increase traffic to the bank's web site, and to be able to monitor that traffic. B.A.B. had a large staff churning out content, so there was almost always something fresh to look at on their site. L.A.B. had nothing in place to supply a constant stream of content. Problem.

So they called their marketing department, and arranged a meeting. I was there (I had to explain how the system worked, most of the bank staff still didn't really understand it). The meeting was interesting. The marketing folks had not heard about this web site project, and you could see they weren't happy. Once they understood what was going on, and how far it had got, you could see them fighting for composure and control: they could see brand name ruination staring them in the face. If the site went live with the current "content", the result could only be public humiliation of the bank on a national scale. But they had no resources, and more importantly, no plan for what content should be created and why.

The marketing folks were smart people. They didn't say anything negative. They said that they needed time to formulate a plan, and they went away. We went back to the office we were working from, to await developments.

I don't know exactly what happened next, because there were meetings to which I was not invited. Nor, so far as I could tell, was anyone else working on the project. What I think happened is that the marketing folks went a few levels up the bank's chain of command, and had our project put on hold, pending the creation of content. They then produced a plan for content creation, and as part of that plan, had control of the project transferred to them. And once they had control, they signed off the completed work, and shut the project down completely. We got paid for the work we had done, but the site never went live, and about a year later when the bank did launch a new web site, it was a completely new development (and it didn't use CalVend's software).

How did this go so wrong? The project could never have succeeded without good content, but nobody thought about the content at all. All the project sponsors thought about was the "coolness" of the web site, and everyone on the implementation team assumed that there would be content available when we needed it. Nothing in our scope of work mentioned content, all it talked about was functionality. We delivered the functionality, but functionality with no content is meaningless.

But it is far easier to focus on the technology, the gadgets, the bells and whistles. I've seen quite a few project sponsors do this over the years: they like to come and have their photo taken in front of the equipment racks when the new system goes live. Unfortunately, most end users are completely unimpressed by the technology, because they don't understand it and they don't need to understand it. They do not care about the sophistication of the software, the elegance of the design, or any of the fascinating little details that will absorb an engineer for hours. They just want "the system" to do something useful, and to do it quickly, without making them feel stupid.

For the engineers, a successful project is one where the technology works, and we don't start getting support calls as soon as we hand the system over to the customer. For the project manager, it's one where the job is completed on schedule, the customer signs off and we get paid. But for the end user, a successful project is one that creates a service, utility or product that they will want to use. If the end users stay away, what you have built is a white elephant, often a very expensive one. I don't recall what L.A.B. spent on their doomed project, but I don't believe there was much change out of $AU500,000.
