Dear ZOS Project Manager,
I get it. Money comes from new features... it's hard to advertise "the same as before, but now with fewer problems". Also, writing new code and features is FUN; it's easy to assign to a new engineer. Fixing problems is not fun, and it's much less rewarding. I'd even bet that the employee who originally wrote some of the problematic code has moved on to greener pastures. I get that too; fixing someone else's code is even less fun and rewarding. However, do you know what is super rewarding as a developer? Working on a stable code base. Starting with something that is clean and free of major architectural flaws makes all the new code faster to write and much, much more enjoyable to work on as an engineer.
As a software engineer and project manager for over 20 years, I'd urge you to invest in your future and tackle the fundamental problems that have been put on the backburner (or likely assigned to someone in QA who is struggling to reliably reproduce them). Clearly, you have some very talented engineers on your team. Have a meeting with the entire team and say "ok, we are finally going to do this, all hands on deck closing these issues. No new features go in until we solve some of these issues". Trust me, you WILL get some great suggestions, and putting your superstar developer on testing/verifying these issues will pay more dividends than you realize down the road. I'm sure there are parts of the code nobody likes touching, super fragile pieces that seem to break when even the smallest change is made. You and your team know this isn't good; it's time to change that.
Your team won't like it at first, but trust me, they will thank you later. Imagine how good it would feel to have all trial disconnects fixed; never any lag in Cyrodiil; dungeon finder working exactly as designed... you know the tough issues better than us. Imagine if you had 100% confidence in these areas; think of how easy it would be to tweak the design later. That's when development becomes fun and fast.
For testing there are some great tools for finding crashes; static analysis is free, fast and never a bad idea.
There are multiple static analysis programs out there, and it doesn't hurt to try several. Worst case is that you spend an afternoon setting one up and the information isn't useful; best case is that it prevents a future hard-to-find bug. My experience is that each one excels in different areas, so it's worthwhile to run at least 2. Memory allocation and verification tools such as BoundsChecker, Purify and the ones now built into Visual Studio have each helped me track down tough intermittent crashes in our application. I would explore this technique to find clues for the trial disconnect issue. All of these tools also have performance metrics which will tell you exactly which function is taking the most time; this should help you narrow in on the Cyrodiil lag issues, since it should point you to exactly where the lag is coming from. If you know where the problems are but the in-house team doesn't know how to optimize it further, try posting a software bounty. Lay out the problem and offer $5,000 if a freelance software engineer can solve it. You might be surprised what you find.
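To show what I mean by a profiler pointing at the slow function, here is a tiny sketch in Python using the standard cProfile and pstats modules. Everything in it is made up for illustration: update_positions, resolve_combat and simulate_frame are hypothetical stand-ins, not anything from the actual game code, and your engine would use its own native profiler rather than Python. The idea is the same though: run the workload under the profiler, then rank functions by the time spent inside them.

```python
import cProfile
import pstats


def update_positions(n):
    # Hypothetical cheap task: linear in n.
    return sum(range(n))


def resolve_combat(n):
    # Hypothetical hot spot: deliberately quadratic in n.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total


def simulate_frame():
    # One made-up "frame" of work that calls both tasks.
    update_positions(1000)
    resolve_combat(400)


def slowest_function(func):
    """Profile func() and return the name of the function with the most own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) ->
    # (calls, primitive calls, own time, cumulative time, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name


if __name__ == "__main__":
    print(slowest_function(simulate_frame))
```

In this toy case the quadratic loop should dominate the report, which is the kind of answer you want from a real profiler run: not "the frame is slow" but "this one function is where the time goes".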
I'll leave you with the words of a popular manager and software writer, Joel Spolsky (one of the founders of Stack Overflow). He wrote his famous "Joel Test" 17 years ago, and it is still very relevant. Here is an excerpt that explains the value of fixing bugs much better than I can.
5. Do you fix bugs before writing new code?
The very first version of Microsoft Word for Windows was considered a “death march” project. It took forever. It kept slipping. The whole team was working ridiculous hours, the project was delayed again, and again, and again, and the stress was incredible. When the dang thing finally shipped, years late, Microsoft sent the whole team off to Cancun for a vacation, then sat down for some serious soul-searching.
What they realized was that the project managers had been so insistent on keeping to the “schedule” that programmers simply rushed through the coding process, writing extremely bad code, because the bug fixing phase was not a part of the formal schedule. There was no attempt to keep the bug-count down. Quite the opposite. The story goes that one programmer, who had to write the code to calculate the height of a line of text, simply wrote “return 12;” and waited for the bug report to come in about how his function is not always correct. The schedule was merely a checklist of features waiting to be turned into bugs. In the post-mortem, this was referred to as “infinite defects methodology”.
To correct the problem, Microsoft universally adopted something called a “zero defects methodology”. Many of the programmers in the company giggled, since it sounded like management thought they could reduce the bug count by executive fiat. Actually, “zero defects” meant that at any given time, the highest priority is to eliminate bugs before writing any new code. Here’s why.
In general, the longer you wait before fixing a bug, the costlier (in time and money) it is to fix.
For example, when you make a typo or syntax error that the compiler catches, fixing it is basically trivial.
When you have a bug in your code that you see the first time you try to run it, you will be able to fix it in no time at all, because all the code is still fresh in your mind.
If you find a bug in some code that you wrote a few days ago, it will take you a while to hunt it down, but when you reread the code you wrote, you’ll remember everything and you’ll be able to fix the bug in a reasonable amount of time.
But if you find a bug in code that you wrote a few months ago, you’ll probably have forgotten a lot of things about that code, and it’s much harder to fix. By that time you may be fixing somebody else’s code, and they may be in Aruba on vacation, in which case, fixing the bug is like science: you have to be slow, methodical, and meticulous, and you can’t be sure how long it will take to discover the cure.
And if you find a bug in code that has already shipped, you’re going to incur incredible expense getting it fixed.
That’s one reason to fix bugs right away: because it takes less time. There’s another reason, which relates to the fact that it’s easier to predict how long it will take to write new code than to fix an existing bug. For example, if I asked you to predict how long it would take to write the code to sort a list, you could give me a pretty good estimate. But if I asked you how to predict how long it would take to fix that bug where your code doesn’t work if Internet Explorer 5.5 is installed, you can’t even guess, because you don’t know (by definition) what’s causing the bug. It could take 3 days to track it down, or it could take 2 minutes.
What this means is that if you have a schedule with a lot of bugs remaining to be fixed, the schedule is unreliable. But if you’ve fixed all the known bugs, and all that’s left is new code, then your schedule will be stunningly more accurate.
Another great thing about keeping the bug count at zero is that you can respond much faster to competition. Some programmers think of this as keeping the product ready to ship at all times. Then if your competitor introduces a killer new feature that is stealing your customers, you can implement just that feature and ship on the spot, without having to fix a large number of accumulated bugs.
I am a big fan of this game, and would love to see it thrive for years to come.