Wednesday, December 20, 2006

Wandering Along The Career Path

That long silence was caused by me having to take time out to focus on changing jobs. Did you ever get to the point where you absolutely could not stand a job any more, and getting out of the position was the only sensible option? Well, my last job and I reached that point back in September 2006: let's not dwell on the sordid details, but suffice it to say that I regret ever having accepted the position in the first place. I started looking for another position, and several times I thought I'd found something suitable, only to have some unanticipated problem thwart my efforts - hiring freezes, redefinitions of role, whatever.

However, the job market stayed busy right into December, so I kept at it. I have done some truly silly interviews in the last few weeks: the prize for most pointless must go to the consultancy that advertised a position with a role description listing all sorts of technical qualifications, most of which I have. The interview, when I got to it, appeared to be for a non-technical business analyst. I could find no connection between the position description and the job interview. Add to that the now apparently obligatory interviews with Google (honestly, is there anyone those people haven't interviewed?), and the whole exercise has been a time-consuming nuisance of the first order. However, I have accepted a position with a reputable consulting firm, and I start on Monday.

It will be a huge relief not to have to deal with recruiters and HR folks for a while. Very few recruiters appear to be worth the oxygen that they use, and I worked out long ago that the sole and only purpose of HR departments is to prevent the employees from suing the company over mistreatment, stress, working conditions, or anything else. Once you understand that HR is there to protect the employer from the employee, their machinations make perfect sense, and can be handled with the minimum of wasted time and effort, except when you are trying to get a job, at which point you have to put up with their nonsense with such grace as you can muster. In the category of "nonsense" I include psychometric testing, any and all "staff review" processes, and HR's feeble efforts to "screen" resumes, to identify suitable candidates. Allowing HR people to review resumes simply means engaging a poorly trained pattern matching engine, which is as likely to reject good candidates as bad ones, because none of the "patterns" have any real meaning to them.

I've always had trouble understanding people who have a career plan. I mean, I can see that it would be nice: I'm the sort of person who can't function without a contingency plan. I like plans. I have a plan for what to do if I get a flat tyre on the way to work, even though I don't expect that to happen. If I have a plan, I feel I have control of the situation. But my career plan has never been any more detailed than "do interesting work, feel useful, pay bills". That's it. And this plan, feeble though it may appear, has taken me from a badly paid job as a public service librarian to a very well paid position as a senior IT consultant.

I couldn't possibly have planned the transition - at the time I was working as a librarian I barely knew that there was such a thing as an IT consultant. Any career plan that I might have developed as a librarian would only have involved being a more senior, better paid librarian. But because I wasn't focussed on achieving a particularly detailed career goal, when an opportunity arose out of left field, I could take it and still remain true to "my plan".

And so I wandered into the IT industry, where by happy chance I learned Unix (well, Xenix actually) first. Only later did I have to cope with the various incarnations of Windows, and even, for a few horrid years, Netware. Every time I find myself working in an environment where Windows is in widespread use for anything more complex than office productivity, I find myself working with people who confuse me. They work with software that crashes regularly, is prone to all manner of malware and is expensive to license. They consider this normal, have invested time, money and effort in achieving certifications to prove that they are qualified to administer this muck, and are extremely defensive when they encounter other operating systems.

I should know better, but I've done it to myself twice in a row: taken jobs where some version of Windows is in widespread use on servers in production environments. And to compound my errors, I've worked directly for end user organisations. I have now come to my senses: 17 months is enough. I am going back to a consulting job, working with Unix - Solaris, Linux, HP-UX, perhaps a bit of Mac OS, whatever. I have instructed my husband to remind me the next time I suggest working for an end user company that this will not work for me, or them. I am adapted to consulting work, and that is what I should do.

It has taken me a little while to work out exactly what the difference is between consulting (regardless of technology) and working as part of an end user team, and I think I have figured it out. When I turn up on a new customer site, as a consultant ready to deliver "something", I have two advantages. One is that whatever I am contracted to deliver will have been approved and paid for by a business person, who has no interest in the actual technology used (unless it impacts on training costs or business continuity planning). So even if the folks on site don't like what I am doing, they can't stop me: my orders come from above. If they do manage to slow me down, the wrath of the project sponsor is likely to fall on them, because my charges will go up to cover time wasted.

The second advantage is that I don't have to live with the consequences. If you are working directly in a team, you have to labour to get buy-in and cooperation from a disparate group of people, and if anyone feels that they weren't listened to properly, or that they got a rough deal, you have to put up with sulking, obstruction through non-cooperation and sundry pettiness. Heaven help you if what you are trying to do threatens someone else's position or devalues their skill set. I once worked on a project which involved installing several racks of Sun servers into an environment which was almost all mainframe based. The mainframe guys initially refused to allow us to put our equipment into "their" server room. We were instructed to install it into the room the system operators worked from. Their management intervened when the operators began to complain about the heat and noise the new equipment was generating, and the whole lot was moved into the server room over the protests of the mainframe folks.

If you are a consultant, you do what you are paid to do, and get off site, and someone else gets to soothe any ruffled feathers. I've had this happen to me on numerous occasions: some manager will get a consulting team in to do something his/her own people were perfectly capable of doing themselves, but the project is politically unpopular. So "the consultants" do it, and after they leave the building, they get blamed for any subsequent grief. Usually in words such as "I'm afraid the consultants did it like that, and we can't change it", or better yet "if we change it, we'll void the warranty". There's no line item for "taking the blame" in any statement of work I have ever seen, but it's there by implication.

Weird as it seems, I find I miss this type of work. I can hardly wait for Monday to come.

Thursday, November 16, 2006

Kitchen Renovation Project and Systems Integration Project: compare and contrast

That long silence was caused by the Great Kitchen Renovation Project (GKRP), which started at my house on October 3rd 2006 and finished on the morning of November 16th. During the last six weeks I think I have seen almost every one of the things that go wrong on large integration projects go wrong on my kitchen refit.

I should probably explain that the old kitchen was a "temporary" facility, built as cheaply as I could manage it, and never intended for long term use. It featured 1.8 metres of benchtop with cupboards underneath, which I bought in the going-out-of-business sale of a kitchen supply company. I had a factory second sink installed in the benchtop, and added a variety of cheap, freestanding pine and melamine cupboards for more storage. A bench from Ikea, our existing microwave oven, the fridge that my husband bought second hand when he first moved to Sydney and a 900mm wide Ilve cooker (gas top, electric oven, and the only thing I spent significant money on) completed the original fit out. The floor was concrete, sealed with Berger Jet Dry paving paint. We intended to upgrade the kitchen as soon as our finances recovered from building the house in the first place, keeping the expensive cooker and scrapping everything else.

Now how many times have you deployed a "temporary fix" that wound up staying in production for years? We used the temporary kitchen for 9 years, and the catalyst for the renovation was the expensive Ilve cooker breaking down for the umpteenth time, and me refusing to pay to have the lemon of a thing repaired yet again. Honestly, when you are on first name terms with the repair man, and he can comment knowledgeably about the progress of your landscaping, he's visiting too often. That Ilve cooker was nothing but trouble from the day it was installed, and I would never buy another.

When the cooker failed, I went out and bought a large Sharp Convection Microwave oven, with which I was extremely pleased. The gas cook top still worked, and with that and the microwave, we could feed ourselves decently while the new kitchen was planned and built. Little did I know that the new microwave oven would be the factor that caused most of the problems with the new kitchen.

I went forth and shopped. I'm good at this: I shop thoroughly, with much research and attention to detail. I don't mind spending money, but I do expect to get quality, and exactly what I want. So I selected a fabulous new cooker from Falcon (I bought the cream coloured one), a wonderful fridge/freezer from Maytag, a Qasair rangehood (model CM ) and a Miele dishwasher. And I looked at a lot of kitchen showrooms, and settled on two potential suppliers, both of whom I asked to do a design and quote. I supplied all the details of my chosen appliances. I explained about the large convection microwave, which I wanted to keep. Each potential supplier sent a designer to measure the room, and talk to me about what I wanted. I showed each man the manual for the microwave, which stated that there was a special installation kit needed if you wanted to build it into a cabinet. And each man said words to the effect of "oh, we can build that, there's no need for a kit".

Both designs came back quite similar, and my husband and I chose the one we preferred (it cost a little more, but we felt the workmanship was better, and the cabinets would be made in Australia rather than in another country). We settled on finishes - cherry wood with stainless steel benchtops and splashbacks - and the supplier sent out their "check measurer", to ensure that the first set of measurements was accurate.

I insisted that the check measurer double check the fitting requirements for the microwave, and this time someone called the microwave supplier and actually got the real word. The answer was that the microwave vented heat at the back, and needed a huge clearance (relative to the cabinet space available) to be installed safely. The kitchen person called me in distress to explain this, and it was a real problem: the microwave would not fit where it was supposed to go, unless I never used the convection feature.

Now I have seen similar things happen on integration projects: the presales team assumes that a piece of technology that they have never worked with before will work in a particular way, and when it doesn't, their plans and designs are invalidated. Or the customer tries to alert the sales folks to a problem, only to be assured that all will be well, and when it turns out that the customer was right, there's plenty of egg for all the faces concerned.

I was once involved in a project where the sales team assured the customer that all their shell scripts could be ported from DG/UX to Solaris. On the face of it this seems fairly reasonable, but the customer told sales that they had some scripts that involved print queues that were pretty special, and they thought there might be problems. They were more or less told not to worry about it, in a rather patronizing way. The contracts were signed, and I turned up on site to start the work. The customer showed me the suspect scripts, and I could see at once that there were problems. Their scripts relied on certain behaviours of the old DG/UX printing subsystem and the way DG/UX handled serial devices. I could have made the scripts work on Solaris 2.6, but large chunks of the Solaris printing and device subsystems had been rewritten to improve security, and what would work in 2.6 would not work in Solaris 9.

The customer knew perfectly well that this would be a problem, and I think they took some pleasure in informing me that, for them, this was a show stopper: either we provided equivalent functionality for the duration of the project (after which a new system would replace the old stuff and all its functionality), or the project got stopped. Fortunately I managed to concoct a solution which got around the problem, but it would have been better if the customer's legitimate concerns had been addressed at the beginning.

It was the same thing on GKRP: my legitimate concerns weren't addressed until the designs were completed. I wasn't prepared to sacrifice precious cupboard space to give the microwave room to vent, so I elected to replace the microwave with a non-convection unit. Since I was getting the Falcon cooker, that had two ovens, this seemed the best solution. The microwave cabinet could be made a bit smaller, the drawer unit next to it could be made a bit wider, and all would be well. So we agreed on a variation, and the diagrams were revised.

The day came when the cabinets were to be delivered to my house, and I got a phone call from the owner of the kitchen company. There had been a mistake. The news about the variation had somehow not reached the cabinet makers. The microwave cabinet had been built too wide. Modifying the microwave cabinet was easy, it could be cut down and made narrower, but changing the drawer unit next to it would be a major exercise. He suggested that instead of changing the drawers, they would build a small wine rack to fill the space. This could be done quickly.

At this point I had the contents of the old kitchen stored in my dining and lounge rooms, and the kitchen was completely empty. I knew I was going to have to use the sink in the laundry for all tasks that would normally be performed in the kitchen sink for as long as the refit took. For cooking I had a single electric hot plate, the convection microwave (on a bench in the dining room), a sandwich toaster and a barbeque in the back yard. I felt like I was camping in my own house, and the prospect of significant delay was horrifying. I agreed to the wine rack solution.

Now how many times on a large integration project has there been a change in the design that has not been communicated to the people building the systems? And every time it happens there is a trickle down effect to other places in the project......

The cabinets were delivered, and all placed either in the kitchen or in my family room. Then came the appliances. The appliances are all large. The dishwasher and the microwave would fit in the family room to await final installation in the kitchen, but the cooker, the fridge and the range hood all had to be placed in the garage. The men who delivered the appliances looked at the access to the kitchen - my house is built on a narrow block of land, and in consequence some of the doors are narrower than standard - and expressed the opinion that getting the fridge into the kitchen at all was going to be a serious challenge.

Now how many times has a large computer/SAN/tape drive been delivered to a site, only to find that there's no way to get the thing into the building without removing doors or windows, or unpacking the kit in the loading dock and hauling it into the data centre piecemeal?

Well, that's what happened with the fridge. Once the cupboards were all installed, a team of strong men arrived, removed the fridge doors, the hinge assemblies, and the freezer drawer, and carried the pieces into the kitchen. Then they put it back together. I am impressed beyond expression with the fact that they achieved this (including lifting the main body of the fridge and passing it into the kitchen through the counter-height servery hatch that connects the kitchen to the family room when there proved to be no way to get it through the door) without swearing once.

Then the installer tried to put the microwave oven into its cabinet. The cabinet had been made narrower, but the cavity was still too tall: there was a big gap at the top. The whole cabinet had to be removed again, and taken away for adjustment. Things were now running quite late, so the project coordinator made the classic mistake that project managers make when a project is running late: he booked multiple trades to be on site at the same time. When the gas plumber arrived, there were two electricians and three rangehood installers already working in the kitchen. The plumber went away again, and came back later in the week. The rule here, and many project managers never learn this one, is that some tasks cannot be performed in parallel, and that throwing more engineers at a job may make it slower.

Another common problem on large projects is the part or service that is supposed to be provided by "someone else". The vendor assumes that the customer will have sufficient switch ports to plug in all the new equipment. The hardware people think that the software folks are organizing the software licenses. The software people assume that the system administrator will take care of their backups. Whatever, there are never enough switch ports, no one orders the software licenses and the system administrator doesn't have a big enough backup window to add the new systems. The thing you assume is "someone else's problem (SEP)" just gets missed. On GKRP, the kitchen company assumed that their subcontractor electrician would supply the chrome down lights, and the electrician assumed that the kitchen folks were supplying the parts. Consequence: no down lights, and a delay while they were ordered and supplied.

However, slowly it has all come together. The house is more or less back to normal - everything that should be in the kitchen is in the kitchen, not in some other room. I still need to get the walls painted and the flooring laid, but it looks great, and works like a dream. My nerves will recover, eventually.

Sunday, September 17, 2006

What makes customers happy?

I like to help people, I like to fix things, and I like to answer questions. Originally, I put these traits to use as a librarian (yes, really). But when computer terminals and then whole computers began appearing in libraries, I found that I could usually operate them more effectively than many of my fellow librarians. One thing led to another, and one day I found that I had somehow or other changed careers, and I was now working in IT. At first I thought that this was only temporary, and that I would go back to a library, and resume answering questions like "who wrote these lyrics" and "do you have a book about giardia lamblia". But the years passed, and I got comfortable with my new job, and quite fond of the greatly increased pay packet that came along with it. And if you work in technical support, you do get to spend a lot of time reading manuals to people who failed to RTFM by themselves, and answering questions like "what does this error message mean?". There are a lot of similarities between technical support and reference work in a library.

In the library I noticed that I would often get profuse thanks for relatively trivial bits of work. I recall once listening to a piece of classical music played over a bad phone line: the caller desperately needed to know the name of the piece, because his daughter was going to dance to it in her ice-skating exam, and she had to be able to put the name of her chosen piece of music on her examination paperwork. I listened. I knew I recognised the piece, but couldn't name it off the top of my head. I said I'd call him back. What I didn't want to say was that it was the piece that the hippos and crocodiles dance to in Walt Disney's 'Fantasia'. I went down to the stacks, located a book about the making of the movie, got the name of the piece (Ponchielli's 'Dance of the Hours'), and called him back. Joy unbounded, his kid could complete her exam paperwork.

Two weeks of dredging through remote corners of the library for source material for some half wit who wanted to make a movie about the history of music in Australia? Not a word of thanks.

And the same thing holds true in IT. I was once sent the hard disk from a server belonging to a small country hospital. Due to the incompetence of the IT contractor who worked for them part time, the beginning of the disk had been overwritten, knocking out the superblock and the beginning of the inode table (the machine had been running a version of SCO Unix). They had no usable backups (the same contractor had botched the installation of their tape drive, and all they had was blank tapes), and the disk contained the only copy of the pathology records for the hospital. It would make a lot of difference to the treatment of patients if they could get the data back.

Fortunately it was an early version of SCO Unix, and the file system structure was simple. I got the data back using fsdb (hands up all those old enough to have used the file system debugger). It took me about a week, and it was not easy. I got no thanks for this, only complaints about how long it was taking.

A few years later I went on site for a small legal firm on the edge of Chinatown. The disk in their office server had crashed, and the company that had sold them the machine in the first place had gone out of business. This one machine ran their whole office, including word processing (Word Perfect). There was no networking, everything was serially connected (Wyse terminals!) using some serial expansion board I had never seen before. They had data backups, but no usable operating system backups - their backups had been done using a version of tar that didn't archive device files correctly.

So I reinstalled the OS on a new disk, reloaded Word Perfect, recreated the user accounts, restored their data, found the driver disk for their strange serial board and got it working and reconfigured the print queues. I also fixed their backups, so I could restore easily if the machine failed again. Nothing complex, the serial board was the only slightly tricky bit (no manuals, no web site, no idea where it came from). Once I had everything back, I asked the office manager to get everyone to log on and check that they could open their files. No problems, so I moved on to testing the print queues. The first job came out, followed by a blank page. I said something like "that's wrong, give me a minute to get rid of the trailing page", and the office manager said "Oh, you can't". She had been told by the people who set the machine up that there would always be a blank page between print jobs, and that there was nothing that could be done about it. Part of the office routine was to extract the blank pages to be put back in the printer (it was a very small legal practice, and money was obviously tight).

I made a derisory comment about the intelligence of the original supplier (my tact circuit doesn't always function flawlessly), opened the printer interface script in vi and commented out the echo statement that was causing the queue to throw an extra page. Then I asked them to try printing again. Lo and behold! No blank page.
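For anyone who has never met this class of fault, here is a minimal sketch of it in Python. It rests on assumptions, since I no longer have the original script: that the stray echo sent a form feed after every job, and that the application already ended each document with a form feed of its own. The function names are invented for illustration.

```python
FORM_FEED = "\f"


def send_job(document, trailing_echo):
    """Model what a hypothetical interface script sends to the printer.

    trailing_echo=True stands in for the echo statement that was commented out.
    """
    stream = document
    if trailing_echo:
        stream += FORM_FEED  # assumption: the echo emitted a form feed
    return stream


def pages_ejected(stream):
    """Split the stream into the pages the printer would eject."""
    pages = stream.split(FORM_FEED)
    if pages[-1] == "":
        pages.pop()  # nothing left after the last form feed, so no extra eject
    return pages


def blank_pages(stream):
    """Count ejected pages that have nothing printed on them."""
    return sum(1 for page in pages_ejected(stream) if page.strip() == "")


letter = "Dear Sir, ..." + FORM_FEED  # the application ends its own last page
print(blank_pages(send_job(letter, trailing_echo=True)))
print(blank_pages(send_job(letter, trailing_echo=False)))
```

With the extra form feed in place the model reports one blank page per job; without it, none. That is the whole fix: one line removed.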

Joy unbounded didn't even begin to describe their response. This was the best thing that had happened to them in years! The whole ritual of reclaiming blank pages was obviously a pet hate of all the office folks. I was their complete heroine. Never mind that I had recovered their broken machine in the face of bad backups and no documentation. Getting rid of the blank pages made me a star. The whole office took me to lunch at a very nice Chinese restaurant.

Customers will be unmoved by technical brilliance if the results don't appear to help them. But find a real pain point and address it, and even if what you have done appears to you simple and basic, you will be hailed as a heroine or hero. I've watched sales teams concoct exotic solutions which used cutting edge technology in clever ways. And I've seen the intended customer more or less yawn politely, and decline to buy. Don't focus on what you want to sell: focus on what the customer wants to buy. The problem you should be trying to solve is the customer's problem, whatever that may be; not the problem of how you make quota for the quarter. Solving the customer's problem gets you return business. Solving your quota problem will not.


Friday, August 25, 2006

Making Things Work - Part the Fourth - Do things the customer's way

One of the fastest ways to alienate a customer while trying to deliver an integration project is to fail to ask how they like things done.

Unless you are dealing with a completely new site, with virtually no existing infrastructure, there will be an accepted "right way" to do things such as labelling cables and managing backups which is widely understood by the customer's IT staff. They may not have any of their normal procedures and practices documented, but they will be deeply offended if you don't follow their standards. There is little as depressing as having a customer representative come by the racks of equipment that you are busily installing in their data centre, looking disapproving and making some helpful remark such as "the management cables should all be green". Particularly if you have already installed all the cables. If the customer has a cabling standard, they will expect you to comply with it, and failure to do so may hold up acceptance and sign off. And "we didn't know" will not fly as an excuse, so don't try it.

Involve your customer in the planning of the installation, and get their commitment on the details (in writing!). This will save arguments and rework, and rework is to be avoided at all costs: rework tends to introduce errors, and is of itself expensive. If they insist on something that you know is wrong, stupid or hazardous, make sure that you explain to them in writing why whatever they want will cause a problem. If they still insist, make sure the project manager has a record of who said what and why, in the sure and certain knowledge that you will need this information when the project reaches the "allocation of blame" phase, and then do what they want. I have never seen a project stopped because a customer wanted something stupid done, though I've seen a few that should have been.

So what should you expect to need to know about the customer's preferences?

First, if you want a reference for good data centre design, try Rob Snevely's "Enterprise Data Center Design and Methodology". There are other books on the subject, but that one has always worked for me. However, whatever references you may have read, please temper their application with a big dose of common sense: no customer has a data centre that actually looks like the ones in the references.

Cabling standards. Do they use particular colours for particular purposes? How do they label cables? How do they do cable management? In my experience, if the customer has a regular contractor who does their cabling, it can be quicker and cheaper to subcontract that party to do whatever is needed for the project.

What is the company policy for operating system installations? For firewalls and routers? For the use of encryption? If they have nothing (and that is still very common), and request that you apply "best practice", make absolutely certain that they are fully briefed about what your interpretation of best practice is before you start installing. I have wasted days, and in some extreme cases weeks, negotiating the minefield that is "security best practice" on customer sites. It's a subject everyone thinks they know about, usually because they read something in an inflight magazine or some other similarly authoritative publication. When time permits, I'll do a blog entry on idiotic security practices I have seen. For now, try to avoid getting hung up on this particular reef.

Is there a standard administration account name that they use?

Unless the backup facilities are part of your deployment, how are they going to back up the systems you are installing now? What connectivity do you need to allow for in your build? I was once handed a "design" for a large, N-tier network installation. On close inspection, I realised that there was no way to connect the new equipment to the customer's existing infrastructure: the design had connection points for the internet, and for the customer's partners, but not for their own admin staff. It had a single "console" (actually a small desktop machine which had a screen), from which everything in the build was supposed to be controlled. I went to the manager responsible for the job and said "there's no way to connect this lot to the customer's network". And he said "that feature wasn't listed in their Request For Proposal", and took the position that, since they hadn't asked for it, it wasn't our problem to provide the functionality.

I could see that this was going to be a problem, but reasoned argument made no headway at the time, so the infrastructure engineering team went ahead and installed everything according to the "design" (which had numerous other defects, which we had to fix as we went along). A few months later (it was a big and complex build), the equipment was on the floor of the customer's data centre, and they realised two things. The first was that the only way to administer the new equipment was to walk into the data centre operator's room, sit down at the "console" and work from there: it was unreachable from anywhere else. And the second thing was that they couldn't print any reports from the new system, because it had no connectivity to their printers.

The technically correct solution in my view was to redesign the whole thing to integrate tidily and securely into the existing customer infrastructure, but no one (including the customer) was prepared to contemplate that option. They had a deadline to get the new systems into production, and the project was already running late. Instead, we had to design and build an ugly chunk of bridging network to connect the new systems to their network. This cost the customer more money (it was unquestionably a variation to the scope of works), complicated the build, extended the test phase and looked like what it was: a belated afterthought. One phone call could have headed this off, if it had been made at the beginning of the project, if anyone had been prepared to tell the customer that they were making a mistake. Even if they chose not to address the problem at the time, at least we as the vendor would have retained a little credibility.

So if you can see problems in the design, try and make someone pay attention early on, while there is still time to fix things. Even if all you achieve is an email trail that documents recognition of a potential problem, that may be enough to make you look less foolish if the wheels fall off further into the build.
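One cheap way to spot a gap like the missing admin connection is to write the design's links down as data and ask which places the customer works from can actually reach the new kit. The following is only a sketch under my own assumptions: every node name is made up, links are treated as two-way, and it checks that a path exists at all, not what a firewall would permit along it.

```python
from collections import deque

# Hypothetical links in a design like the one described; all names are illustrative.
DESIGN_LINKS = [
    ("internet", "firewall"),
    ("partner_network", "firewall"),
    ("firewall", "new_systems"),
    ("console", "new_systems"),
]

# Places the customer will need to work from or reach, whether or not the RFP says so.
MUST_REACH = ["console", "admin_lan", "office_printers"]


def reachable_from(start, links):
    """Return every node reachable from start, treating links as two-way."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def missing_connections(target, required, links):
    """List the required locations that have no path to target."""
    reachable = reachable_from(target, links)
    return [place for place in required if place not in reachable]


print(missing_connections("new_systems", MUST_REACH, DESIGN_LINKS))
```

For this made-up design the list that comes back contains admin_lan and office_printers, which is exactly the pair of surprises the customer found months later. A list like that, attached to an email, is the sort of paper trail worth having.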

Double check what they are planning to do about backups: this may be the point where you discover that nobody quoted them the extra backup client licences that they will need for whatever product they use. Be particularly careful if you have to deal with applications that require special backup clients: I've seen jobs where the salesman only quoted operating system backup clients, and missed the licences needed for Oracle. In the worst case, the customer may need to upgrade a tape silo or SAN to get sufficient capacity for backups.

What are the customer's expectations of user acceptance testing (UAT)? On a large project with a lot of software, there is a tendency for the testing phase to focus on the applications and their functionality: "does this stuff do what we wanted?" testing. And I have seen projects where the entire test plan was written by the software development team: they were astounded when they found that the customer expected to see test cases for the hardware as well. Find out what your customer wants, and get ready to deliver it, because without a completed UAT, you will not get sign off. Be extremely wary of contracts that contain vague statements about "mutually agreed test plans". That phrase just means that the people who drafted the contract had no idea what testing could or should be done. Chances are the contract was worked out by business people, not IT staff. But you can bet that the UAT will have to be acceptable to the IT staff, and they may insist on all sorts of time consuming tests.

The whole issue of testing can blow your schedule right out of the water, and it can add unanticipated extra work to the build. Take a fairly common case: you build a set of infrastructure that is supposed to be firewalled off from the internet on one side, and connected to the customer's existing network on the other. When you come to test it, the customer insists that you conduct tests to prove that the infrastructure is secure before they will let you connect it to either their network or the internet. You essentially need to perform a penetration test, but in an isolated environment. A common requirement is that you facilitate access for a neutral third party to conduct the penetration test (and this is quite reasonable: the people who build secure systems should not be the same people who test them). I've seen projects that had to install additional equipment just to provide adequate facilities for testing to take place in a manner that the customer was pleased to accept. The equipment had to be sourced, installed and configured. It all takes time.

My strongest recommendation on testing is that you start writing the test plan before you unpack a single piece of equipment. Write an outline, parcel out the work, make sure the project manager realises that this is a non-trivial task, get regular customer feedback. If nothing else, the work of writing test cases can be used to keep the team busy if some other part of the project stalls. And stall it will...

Tuesday, August 22, 2006

Making things work - Part the third - The Project Manager Is Your Friend

A good project manager is a jewel beyond price. I've worked on projects where, without the effort and expertise of the project manager, we would never have closed the job. Unfortunately a bad project manager can make the simple jobs next to impossible, and drive the engineers to distraction.

Good project managers keep management off your back, control scope creep, track open issues, sort out the logistics of equipment supply and delivery, handle the resourcing problems (when you absolutely have to have the company expert in "whatever" for 4 straight days if you are to complete the project, and he's already booked to another job), worry about time cards and P&L and other financial matters and prevent you from going crazy by helping you prioritise problems in a sensible way.

Bad project managers agree to everything the customer asks for, regardless of scope, have no idea how to keep an issues log, leave you to sort out your own logistics and resourcing and generally cause more problems than they solve. The character who insists that a team, all of whom are working in the same small room, have several "catch up meetings" every day, with him and the customer's project manager, and then wonders why work is proceeding slowly. The guy who demands rapid reconfiguration of production equipment in the customer's data centre, to suit his project, but in violation of the customer's change control procedures. The genius whose project schedule doesn't include any time for testing. You do get them.

If you've got a good project manager, cherish them. If you have an inferior model, be prepared to do part of their work yourself, because it has to be done, like it or not. If your project manager is just a bit inexperienced, you can help them be better by explaining what you need from them and why. And consider the fact that a project manager who has only ever worked on software development projects will need your help to get their head around a project that includes a lot of hardware.

Of course, there's always the possibility that you are a relatively inexperienced engineer. Perhaps you've never worked on a big project, or with a project manager before. There was a first time for all of us. So let's step through some basics here:

Scope creep

Every project, large or small, is supposed to have a fixed scope. The scope is the body of work that is to be performed, for example "install Linux on four servers". The customer agrees to pay for that work. There is a contract (or work order, or whatever the local lingo requires). The customer pays $N and the engineers do Y tasks. But if the customer then produces a fifth machine, and asks to have that installed as well, that would be an increase in scope. That's a fairly obvious case, but the thing is that if you are already installing four machines (and you're probably doing them in parallel), doing a fifth doesn't look like much extra work. There's a tendency to agree to it.

Scope creep becomes much more hazardous in a complex environment. Adding one extra web server or proxy server to a load balanced multi-tier network can involve reconfiguring firewalls and load balancing switches, and testing all the changes. That one extra machine can cause hours of extra work, and if it triggers a problem (as under-analysed changes to designs so often do), the whole project may grind to a halt until a fix is found. My basic rule is that I, as the lead engineer, never under any circumstances agree to anything that causes scope creep. All such matters must be referred to the project manager. A good project manager will then say "what could go wrong if we do this?" and "how long will it take?", and make a decision about what to do next. Really little things should be agreed to, but they still need to be documented and tracked in the project documentation. Larger things may require a variation to the scope of work, and an additional charge. And as soon as you talk about extra money, someone on the customer's side will have to approve the additional cost. Sometimes this will cause the proposed change to be halted, because no one is prepared to pay for it.

Be very wary of requests for additional work coming from the customer's operations and support staff: the chances are very high that they do not have the authority to incur additional costs, and they probably don't have the right to vary an agreed scope of work. Sorting this stuff out is the project manager's responsibility, and a too-helpful engineer on the team can make the project manager's life difficult. If you stop what you are doing for an hour to help Larry-the-customer's-engineer get something working, and as a consequence then run late to finish your appointed task, the project manager will be better able to handle complaints about the job running late if he/she can say "as agreed, we stopped for an hour to help your staff fix a problem".

Consider also the financial terms of your project. If it is a fixed price job, and you accept additional tasks into the scope at no charge, you are working for free. If, on the other hand, you are on time-and-materials, and you perform extra work and charge for the time, you will increase the size of the customer's final bill. The first case shouldn't make you happy, and the second case may make them very unhappy, particularly if they are on a tight budget. Spurn scope creep: you are not here to make everything perfect in the customer's environment, you are here to deliver a project. Focus.

Issue logs and customer communications

The issues log should be kept by the project manager. They may call it something else, but its purpose is to record problems, issues and questions, and what was done about them. So when an issue develops, say "not enough power in the data centre to power up the machines", the project manager logs an issue: "Issue 1, 6 additional 32 AMP power connections needed in the data centre", and the task of resolving the issue is assigned to someone. Then every time someone in management asks why the project has stalled, the project manager can say "we have an open issue that is preventing us moving forward, and it's been assigned to Joe Blogs". This will normally deflect the wrath of management onto Joe Blogs, who in this case will probably be the customer's data centre manager, and capable of taking care of himself (I've never met a female data centre manager).

The issues log also becomes a repository of the project's history. It can be a big help when, six months in, no one can remember why a certain thing was done in a certain way (Why did we set all those subnet masks to 27 bits?). The issue log may remind you of an old, forgotten problem, and what you did to get around it.
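For anyone who has never seen one, here is a rough sketch of what an issues log boils down to, written in Python. It is an illustration only, not a description of any tool a project manager of mine actually used: the class names, the dates and the second issue (including its resolution) are all invented, and in practice a spreadsheet does the same job.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Issue:
    number: int
    summary: str
    assigned_to: str
    raised: date
    resolution: str = ""  # stays empty until the issue is closed

    @property
    def is_open(self):
        return not self.resolution


@dataclass
class IssuesLog:
    issues: list = field(default_factory=list)

    def raise_issue(self, summary, assigned_to, raised):
        """Record a new issue and number it in order of arrival."""
        issue = Issue(len(self.issues) + 1, summary, assigned_to, raised)
        self.issues.append(issue)
        return issue

    def open_issues(self):
        """What management gets told when they ask why the project has stalled."""
        return [i for i in self.issues if i.is_open]

    def search(self, text):
        """Look up project history by keyword, in summaries and resolutions."""
        text = text.lower()
        return [
            i for i in self.issues
            if text in i.summary.lower() or text in i.resolution.lower()
        ]


log = IssuesLog()
# Dates are invented for the example.
log.raise_issue(
    "6 additional 32 AMP power connections needed in the data centre",
    "Joe Blogs",
    date(2006, 8, 1),
)
# Entirely hypothetical second issue, to show how a resolution is recorded.
masks = log.raise_issue(
    "Address block too small for the planned subnets",
    "network lead",
    date(2006, 8, 3),
)
masks.resolution = "Set subnet masks to 27 bits"

for issue in log.open_issues():
    print(f"Issue {issue.number}: {issue.summary} (assigned to {issue.assigned_to})")
```

Six months later, `log.search("27 bits")` would turn up the second issue and the reason behind the decision, which is the whole point of keeping the thing.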

It is also vital that all email relating to the project is stored. Hard copy communication normally gets stored by default, but email can get overlooked. You MUST keep it: everything you send to the customer, and everything you get back. Like the issue log, it can remind you of how you got to a particular place. It can also be very useful when disputes arise. So:

Many moons ago, the marketing department of a certain telco decided that they would offer their customers an application hosting service (this happened long enough ago that application hosting was something of a novelty). Accordingly, their staff did a design, and handed it to my business development manager so that we could price the work. We looked at it, and pointed out that the design called for a separate firewall, router and switch for each new customer, and that, given the equipment specified, they would only be able to fit two customer deployments per rack. In case you are not familiar with the economics of data centres, rack space is expensive, and you try to minimise the number of racks used. So we wrote the customer's representative an email, pointing out that the proposed solution would rapidly become extremely onerous to manage, and that the whole thing could be done more cheaply by virtualising the network components. We were trying to help.

What we got back was a curt note stating that the design had been done by "our expert staff" and requesting that we mind our own business and get on with pricing and installing the first set of equipment. The initial deployment was enough for two customers, one rack of kit.

And the BDM said "what do we do now?" and I said "File that email for future reference, and we'll get on with the job", which we did. Job complete, signed off and accounts paid.

Weeks passed, and one day the BDM got a very grumpy email from someone in authority on the customer's side. We had installed a very bad design, they were going to have to buy a new firewall, router and switch for every new customer, the system would soon be unmanageable. What did we propose to do about it?

In response we sent back a copy of our email, pointing out the deficiencies of the design, and of their response to same. Silence fell. I believe there may have been some blood letting on their side.

Save all correspondence. I used to have a project manager who would print all relevant emails to PDF, and publish them to a web page which belonged to the project. That worked well. If you get a project email that the project manager is not copied on, forward it to them. But make sure they are storing the emails somewhere, for later.

Friday, August 18, 2006

Making things work - Part the second

Assuming you read part one, and followed the instructions, you have read all the project documentation, produced a good network diagram and a racking plan, and your customer is getting together the details of IP addresses and naming conventions. What's next?

If you work for a large organisation that does a lot of integration work, or perhaps for a hardware or software vendor, somewhere about the place there will be a group of people who provide technical support to customers. Even if the call centre itself is in another country, the chances are that there are a few people still on the pay roll whose business it is to go to the customer's premises and fix things when they break, and possibly perform preventive maintenance. If you are dealing with an important customer, there's a good chance that there is a support person "dedicated" to them. They're probably dedicated to a dozen other customers, too, but they'll know them reasonably well. Even if there is no dedicated engineer, the support folk may know your customer. Seek them out and enquire. Take your network diagram and rack plan.

Support staff do not have an easy life (I've done this job, so believe me on this one). Nobody calls you to tell you that they're having a nice day: they call to complain that something isn't working, and they're often upset when they do it. If you are really lucky, you get paged at 2AM by some operator in the data centre who can tell you that there is an error on his screen, but can't tell you what he was doing when the error appeared. Anyone who works in support has a vested interest in your project, because when you finish, they are going to have to support it. Their lives will be easier if you have followed accepted best practice and your racking arrangements are sensible.

Anyone who regularly does customer facing infrastructure delivery work should make every effort to be on good terms with their colleagues in Support, because you can help each other. You'll know you've struck the right balance when you can go to them and ask for advice, and when they come to you saying things like "the customer's got this device/software that we've never seen before, are you familiar with it?" What you don't want is the question "why did you idiots install it like this?"

So be up front with them: locate the people most likely to be affected by your project, and tell them about it. If it's a completely new customer, ask the support team lead who is most likely to have to take care of them, and try and engage that person. Show them your diagrams. Take their suggestions and/or criticism in good spirit. Listen to anything they can tell you about the customer and their environment. You may get gems of information such as "no project that this customer has started in the last three years has gone into production" or "you do realise that their existing kit is housed in a so-called server room constructed in the space under a staircase?". You may be told that their data centre manager is a direct blood-line descendant of Attila the Hun, and a nightmare to deal with (data centre managers would make an interesting psychological study for anyone with the intestinal fortitude to tackle the subject: they tend to be obsessive control freaks of the first order). Whatever, heed the wisdom of Support. They have been on the customer's site, and you probably haven't. If they do nothing else, they can clue you in to the change control procedures that the customer uses and the building access that you can expect when you try to deliver your project's kit. I once saw a case where the data centre was located in the top storey of a heritage listed building with narrow staircases, and a crane was needed to get the new equipment into place.

Support will know if the customer has power or air conditioning problems. I once saw a major telco take delivery of a large new machine (a Sun 25K), only to discover that there were insufficient power connections on the data centre floor to turn the thing on. When they got sufficient 32 AMP points put in, they found that the extra heat load of the frame overloaded their air conditioning, and the machine could only be operated for limited periods until they got an air conditioning upgrade done. Installation was rather drawn out on that one.

Learn what you can, adjust your diagrams, and move on.

By now you should know what you are installing, and what it is supposed to do. You should be able to work out what skill sets you will need to complete the job, and the approximate order in which you are likely to need them. Tell the project manager who you are going to need, and when (I'm assuming here that you can get the skills you need in-house. If you can't, then the project manager will need time to find an external resource). Resourcing issues should now keep them occupied while you tackle the next critical task...

Wednesday, August 16, 2006

Making things work - Part the first

In every project that I have ever worked on, no matter who I was working for, there was a phase between the customer signing the order and the new "system" being handed over to the customer, and that phase was usually called "system integration". Sometimes it was called "system build" or "install and implement" or something similar, but it always meant "put everything together and make it work the way we told the customer it would".

In the early days, the system integration phase would be short and simple: install a couple of computers and a printer and configure the print queues, or some such thing. But as the projects got bigger, the integration tasks became more numerous and varied. And that increased complexity never seems to get factored into anyone's estimated costs, and I suspect this is because no one but the people who actually do the integration can list all the possible sub tasks, and explain why some of them are both difficult and time consuming. One of the things that always worries me is the sight of a project manager who has just been handed their first big integration job. You just know that they are going to suffer torments for the entire duration of the project, and no warnings will convince them that there will be problems until the job gets properly under way. By about the second week they will be looking stressed and clutching at their Gantt charts for comfort, and comfort there will not be. For verily, systems integration can be extremely tough.

Now you may never have done this sort of work, but have a job like this in your future. Or perhaps you have done an integration job, and never want to suffer like that again. Or perhaps you're in sales, and you'd like to know why the contingency budget on all projects is usually not only fully consumed, but over run (and why all the infrastructure engineers hate your intestines). Let me try and explain how integration works from the point of view of the people who do it, and share what I've learned about things that can make it hurt less. I'm not claiming that this is the best way: it's just the best I've come up with so far. The job I'm going to imagine here is somewhere between $AU3 million and $AU5 million. It will include several racks of computer equipment, firewalls, load balancers, switches and routers, a lot of licenced software and some that is bespoke.

Almost every integration job that I have ever been assigned has featured implementing something that someone else designed. It is a sad fact in the IT industry that engineers who work in pre-sales are not compelled to work on the implementation of the things they design. It might make some of them more careful. I always start by reading whatever documentation exists: the design document that was given to the customer, any internal documentation that the sales people have for the job, anything we got from the customer, the bill of materials and (if it exists at this point), the project schedule.

It annoys me deeply that sales people and project managers create project schedules without asking the people who are going to do the work how long they think it will take to do, but this particular form of insanity seems common across the industry. People who couldn't perform the required tasks if their lives depended on it will guess how long the tasks that they don't understand will take to complete. Anyway, take note of where they think the job will end. The end date usually bears a close relation to the next end of financial quarter, and often reflects no urgency on the customer's part, but merely a requirement to get revenue recognised by a particular date. If you happen to be a customer embarking on one of these projects, pause and consider what your vendor's revenue recognition deadlines may be doing to your project quality.

However, back to the documentation. Most designs feature at least one diagram which represents the network in which the proposed solution will reside. For many of the jobs which I have done, the network itself has been part of the solution. Unfortunately, the diagram will usually be a marketing diagram: little pictures of computers joined together with lines. This type of diagram is extremely dangerous, because it conceals more information than it reveals. Consider a diagram such as this:

Three computers, how bad can it be you say? Well the Internet isn't really a cloud: typically, we see its physical entry point as a port on a router somewhere in the data centre. Two computers can't connect to the same router port, so either we need two router ports, or a switch between the computers and the router. Or there might be a whole lot of other infrastructure between those two machines and the internet - routers, switches, hubs, firewalls. The diagram doesn't tell us. Nor does the diagram really tell us how the two computers in the middle connect to the one at the bottom. Does the third machine have two network cards, or is there some other device between them? What is it? A switch or a router? And if there are switches, routers or extra network cards needed, do they already exist or are they coming out of our project's budget?

Start by drawing a detailed network diagram that shows every device and every port, and work out what is required, what already exists and what has to be purchased. Compare what has to be purchased to the bill of materials. I've done several jobs where critical components have been missed from the bill of materials, and you need to get missing equipment ordered quickly, or the delay in supply will hold up your build.
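Once the counting is done, the comparison against the bill of materials is mechanical enough to script. What follows is only a minimal sketch in Python: the part names and quantities are invented, and `find_shortfalls` is a hypothetical helper, not part of any real tool. It assumes you have already worked out from your detailed diagram what is required and what already exists on site.

```python
from collections import Counter


def find_shortfalls(required, existing, bill_of_materials):
    """Return parts the diagram needs that neither the site nor the BOM covers.

    All three arguments map a part name to a quantity.
    """
    available = Counter(existing) + Counter(bill_of_materials)
    shortfalls = {}
    for part, needed in required.items():
        missing = needed - available[part]  # a Counter returns 0 for unknown parts
        if missing > 0:
            shortfalls[part] = missing
    return shortfalls


# Hypothetical figures, purely for illustration.
required = {
    "24-port switch": 2,
    "router port": 2,
    "network card": 4,
    "rack rail kit": 6,
}
existing = {"router port": 1}
bill_of_materials = {
    "24-port switch": 1,
    "network card": 4,
    "rack rail kit": 6,
}

for part, count in find_shortfalls(required, existing, bill_of_materials).items():
    print(f"order {count} x {part}")
```

With these made-up numbers the script reports one missing switch and one missing router port. The hard part remains the diagram itself: the script is only as good as the port-by-port count you feed it.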

The missing equipment issue can be complicated if your contract with the customer includes hardware maintenance. If you have to add extra components to make it work, you will also be adding the cost of the hardware maintenance contracts. This will eat into your contingency budget, assuming you are on a fixed price agreement, which these things almost always are. Furthermore, if you add extra components, you will increase the amount of heat the completed solution will generate, and you will require more power and probably more rack space. You see the problem?

Now while you are doing all this, your project manager is probably agitating for you to get the build under way. Don't. You are nowhere near ready to start building anything. First you need a proper racking design, and to do that properly you need a very detailed network diagram. You need a clear picture of what network cabling is going to have to go where. Once you understand what must physically connect to what, and you have the physical specs of all the equipment, you can draft the racking diagram. Good rack design will save you time in the build, and make your customer's life easier when they take control of your creation. Equipment manufacturers all have recommendations for how their gear should be racked (you did check that all the racking rails got ordered, didn't you?). As a rule of thumb, put the big heavy stuff near the bottom of the racks and try to minimise the amount of cable that has to go between racks.
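The "heavy stuff at the bottom" rule of thumb can be roughed out in a few lines of Python before you draw anything. This is a sketch under loud assumptions: the kit list, heights and weights are invented, the 42 unit rack height is just a common default, and `plan_rack` is a hypothetical helper. It ignores cabling, airflow and the manufacturers' racking recommendations, all of which a real racking design has to respect.

```python
def plan_rack(equipment, rack_units=42):
    """Assign rack positions bottom-up, heaviest item first.

    `equipment` is a list of (name, height_in_units, weight_kg) tuples.
    Returns a list of (name, lowest_unit, highest_unit), with unit 1 at the bottom.
    """
    layout = []
    next_free = 1
    heaviest_first = sorted(equipment, key=lambda item: item[2], reverse=True)
    for name, height, _weight in heaviest_first:
        top = next_free + height - 1
        if top > rack_units:
            raise ValueError(f"{name} does not fit in a {rack_units} unit rack")
        layout.append((name, next_free, top))
        next_free = top + 1
    return layout


# Hypothetical kit list: names, heights and weights are invented.
kit = [
    ("switch", 1, 5),
    ("disk array", 4, 60),
    ("server A", 2, 25),
    ("UPS", 3, 45),
]

for name, bottom, top in plan_rack(kit):
    print(f"U{bottom}-U{top}: {name}")
```

Treat the output as a first draft to argue over with whoever will support the rack, not as the racking diagram itself.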

While you are doing all this, there is other information that you are going to need soon. Find out who your contact is on the customer side, and ask them for their standard for naming equipment in their network. Also ask what their procedure is for allocating blocks of IP addresses, and get the details of their DNS, default routers, time servers and anything else of that type that you are going to need. Finally, ask if they have a standard for operating system installation. Do they minimise or harden the OS in any way? Do you need to do anything special to conform to their normal practices? It may take them a while to collate this information, so ask for it early. You still have a lot of work to do before you are ready to start installing anything.....

Monday, August 14, 2006

Doomed projects

Over the years I've been involved in some weird and wonderful projects. Some I remember fondly, some I try not to think of at all. Most projects have been successful, if "success" is defined as "what we built worked, the customer signed off, we got paid and the system went into production". But one or two ended in failure, and I was thinking about one of these this weekend while contemplating the question "how does a big IT project get launched when it should be obvious to anyone in possession of the facts that the project is doomed from its inception?"

I worked on one of these train wreck jobs about 10 years ago. At that time I was working for a reseller, and my job was on the help desk, providing technical support for things like SCO Unix, ArcServe and several sorts of hardware. Home internet access was becoming increasingly common in Australia, and the internet itself was evolving rapidly from a place where geeks talked about geek stuff to a place where non-geeks were trying to do business.

On the other side of the world a Big American Bank (let's call them B.A.B.) had launched a very swish new website. Their customers could register on this website, tick the boxes for the information that interested them, and then on their next visit, the website would show them relevant, fresh, content. And the software would keep track of who visited and what they looked at. By modern standards this is trivial, but in the mid-nineties it was hot stuff. The site had been built using software and professional services from a famous Californian application vendor (let's call them CalVend), and it was extremely slick. Other banks were deeply envious, and a certain large Australian bank (let's call them L.A.B.) decided that it was their destiny to be the first bank in Australia to have a website like B.A.B.'s.

Now it so happened that my employer was a reseller of CalVend's products. CalVend were very keen to have L.A.B. deploy their software, so keen that they sent a presales engineer to Sydney to assist in closing the deal. And the deal was closed, and then it dawned on the sales folk that someone was actually going to have to deliver the solution, which only ran on Solaris. The company didn't have many Unix-literate engineers: most of the in-house geeks were Netware specialists. Remember when Netware was fashionable? I digress.

They needed a Unix engineer to work on this project, so I was summoned from the depths of technical support and seconded to the project team. The project manager and I were sent to California to do the application training course, and by the time we got home the hardware had been delivered and installed. I installed the operating system and the application software, and the rest of the team started reconstructing the rest of the L.A.B. website, so that it would mesh nicely with the new application. I settled down to write the Perl CGI scripts that would interface the registration form into the application. All the application actually did was record user details, record what each person was interested in (investment or retirement planning or whatever), and retrieve pieces of content from its store to be displayed to the user. Getting the information in there in the first place was my problem.

A number of other problems soon became apparent. The first one was that the application software was both badly documented and riddled with bugs. The second was that CalVend had expected that their professional services would be engaged to do this job. When they found that we were planning to do it ourselves, and that the only revenue they were going to get was the stupendous licencing fee for the application, they became extremely uncooperative. We would call them to ask how to resolve some problem, and it would always transpire that the only person who could answer our question would be "travelling", typically in Korea or Outer Mongolia, and would not be available for at least a week.

However, we persevered, and the day came when everything worked. You could register, and the system would hand you the test pages we were using as dummy content. The operator could apply weighting to selected content, so a user was more likely to see page A instead of page B. All as required by the specification. So I turned to the L.A.B. person who was working with us, and said "OK, we need to load the first cut of the real content, so we can test properly" and he went away and returned shortly with a 3.5" floppy, saying "here it is". So I dumped the files into the system, and had a look at them. I experienced the first twinge of alarm when I saw how few there were, about 15 as I recall. And they were all PDFs. I opened one. Then another. The full horror hit me: the contents of the floppy were a set of PDFs for the printed brochures that you could get from an L.A.B. branch. Boring leaflets featuring carefully staged, politically correct photos of happy bank customers, a few paragraphs of waffle and the phone number of the bank's call centre.

I asked when the rest of the content would be available. The answer was that there wasn't any more. "We thought we would just start with that, and see how it went" they said. I pointed out, firmly, that no end user would go through the web site registration process for the privilege of getting the same brochures that they could get from the local branch. And if that was all they could get, they wouldn't bother revisiting the site. That scared them: the whole point of this exercise was to increase traffic to the bank's web site, and to be able to monitor that traffic. B.A.B. had a large staff churning out content, so there was almost always something fresh to look at on their site. L.A.B. had nothing in place to supply a constant stream of content. Problem.

So they called their marketing department, and arranged a meeting. I was there (I had to explain how the system worked, most of the bank staff still didn't really understand it). The meeting was interesting. The marketing folks had not heard about this web site project, and you could see they weren't happy. Once they understood what was going on, and how far it had got, you could see them fighting for composure and control: they could see brand name ruination staring them in the face. If the site went live with the current "content", the result could only be public humiliation of the bank on a national scale. But they had no resources, and more importantly, no plan for what content should be created and why.

The marketing folks were smart people. They didn't say anything negative. They said that they needed time to formulate a plan, and they went away. We went back to the office we were working from, to await developments.

I don't know exactly what happened next, because there were meetings to which I was not invited. Nor, so far as I could tell, was anyone else working on the project. What I think happened is that the marketing folks went a few levels up the bank's chain of command, and had our project put on hold, pending the creation of content. They then produced a plan for content creation, and as part of that plan, had control of the project transferred to them. And once they had control, they signed off the completed work, and shut the project down completely. We got paid for the work we had done, but the site never went live, and about a year later when the bank did launch a new web site, it was a completely new development (and it didn't use CalVend's software).

How did this go so wrong? The project could never have succeeded without good content, but nobody thought about the content at all. All the project sponsors thought about was the "coolness" of the web site, and everyone on the implementation team assumed that there would be content available when we needed it. Nothing in our scope of work mentioned content; all it talked about was functionality. We delivered the functionality, but functionality with no content is meaningless.

But it is far easier to focus on the technology, the gadgets, the bells and whistles. I've seen quite a few project sponsors do this over the years: they like to come and have their photo taken in front of the equipment racks when the new system goes live. Unfortunately, most end users are completely unimpressed by the technology, because they don't understand it and they don't need to understand it. They do not care about the sophistication of the software, the elegance of the design, or any of the fascinating little details that will absorb an engineer for hours. They just want "the system" to do something useful, and to do it quickly, without making them feel stupid.

For the engineers, a successful project is one where the technology works, and we don't start getting support calls as soon as we hand the system over to the customer. For the project manager, it's one where the job is completed on schedule, the customer signs off and we get paid. But for the end user, a successful project is one that creates a service, utility or product that they will want to use. If the end users stay away, what you have built is a white elephant, often a very expensive one. I don't recall what L.A.B. spent on their doomed project, but I don't believe there was much change out of AU$500,000.

Friday, August 11, 2006

To blog, or not to blog, that is the question. Whether 'tis better to start to blog, and then discover that time, commitment and/or inspiration fail within the first month. Or not start to blog and perhaps miss out on something worthwhile. Well, if you don't bet, you can't win, so let's give it a whirl...

To the extent that this blog has a purpose, it will be to allow me to muse aloud on subjects that interest me, and occasionally to vent my spleen about matters that annoy me. I can only hope that others will find my maunderings useful, amusing or at least thought provoking. Subjects that interest me include (but are not limited to) science fiction, fantasy, philosophy, history, knitting, crochet, food, alcohol, computers, security, music and cats. Things that annoy me include, but most definitely are not limited to, casual rudeness, inept network design, sloppy documentation, bigotry, most modern management fads, almost everything to do with product marketing and competitive sport.

Let's see how we get along.....

