
What I'd really like to know is whether, and how, the industry will move towards a technical solution for removing performance bottlenecks without having to restart the system, and how this could be implemented in MMOs. 100% up-time would be wonderful, but it's a costly solution that is currently only used by systems where a failure would mean the loss of exorbitant amounts of money or even lives.
My experience comes from the banking industry, but the underlying idea is basically the same. There's this guy on Tumblr who answered some of your questions in one of his posts: http://askagamedev.tumblr.com/post/156041789585/what-really-happens-during-mmo-server-maintenance
First off: this is not an entitlement thread, a "ZOS is horrible" thread, or an "I want a refund" thread. This is (hopefully) a constructive dialogue thread. Even saying that, I'm not sure how constructive even I will be, but it's gotten to that point. I don't want this to be a pure hate thread, but rather a professional opinions thread, though of course anyone is more than welcome to chime in.
Fingers crossed though that this doesn't go too far south before at least 10 replies.
I want to make clear that, 9 times out of 10, I side with ZOS on server/network/dev issues in this forum, and for good, educated reasons.
I have been a Systems Engineer, professionally, for 17 years. I have managed tens - hundreds of thousands of servers in many industries... financial, commercial, semiconductor, social, etc.... but not online gaming. Uptime is critical in any industry that I've worked in, be it for in-house services or externally-facing services. Being in this profession for so long has also made me privy to the goings-on of related fields such as development, networking, and security.
Periodic unplanned downtime to fix code-related bugs, like yesterday's for example, is to be expected and should even be praised when a game-breaking flaw has been found. This thread has zero to do with instances like that. Could tonight's downtime be another example of this? Unfortunately we will most likely never know, but this thread is absolutely NOT just about tonight... it's just the proverbial last straw. On that note... NO, they should not release the technical details of yesterday's huge bug, or of any bug.
Any Internet presence worth anything will eventually get some sort of DDOS. Why DDOS someone when you won't get any response from it? That said, there are plenty of services out there to help protect against it, and plenty of things in-house that can be done to mitigate (NOT 100% prevent) the problem. I hate to say this part but... some people on here will be like "Oh well Gmail/Facebook/Yahoo doesn't go down for hours" etc etc. While their basis for the argument is generally flawed, their point is valid.
OK SO
I can't fathom how management in the server/network department(s) allows such horrendously frequent downtime. Sure, ESO (like basically all MMOs) makes it clear in its TOS that availability is not guaranteed, but does anyone expect downtime to be this frequent and to last this long each time? Technically they could be up for an hour a month and still fall under that clause, so let's be realistic here... the bottom line demands a certain level of service availability to retain customers, especially for an exclusively online service. We all know about five 9's etc etc, but I'm not sure if this could even be a one 9.
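For anyone who hasn't run the numbers on "nines" before, here is a rough sketch of the arithmetic. The function names are made up for illustration, and the 4-hours-a-week input is a hypothetical figure for a weekly maintenance window, not a measured value for ESO.

```python
HOURS_PER_WEEK = 7 * 24
HOURS_PER_YEAR = 365 * 24


def nines(n: int) -> float:
    """Availability for n nines: 1 -> 0.9, 2 -> 0.99, 5 -> 0.99999."""
    return 1.0 - 10.0 ** (-n)


def downtime_budget_hours(availability: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime allowed in a period at a given availability."""
    return (1.0 - availability) * period_hours


def availability_from_weekly_downtime(hours_down_per_week: float) -> float:
    """Availability implied by a fixed amount of downtime every week."""
    return 1.0 - hours_down_per_week / HOURS_PER_WEEK


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        budget = downtime_budget_hours(nines(n))
        print(f"{n} nine(s): {budget:.2f} hours of downtime per year")

    # Hypothetical input: 4 hours of maintenance downtime every week.
    weekly = availability_from_weekly_downtime(4)
    print(f"4 h/week down: {weekly:.2%} available")
```

Five nines allows a little over five minutes of downtime a year, while one nine allows 876 hours (about 36.5 days). A hypothetical 4 hours of downtime every week works out to roughly 97.6% availability, which is above one nine but short of two, and any unplanned outages on top of that would pull it lower.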
Systems people, Network people, Devs, etc.... what do you think about all of this? I just can't wrap my head around it frankly. Anyone with first-hand gaming industry experience please do comment from your perspective.
Credentials: I've been a certified network admin and developer for almost 15 years. I've been the lead developer for VoIP networks of over 5,000 machines. I currently manage a cluster of roughly 400 machines, mainly for large websites and developer environments. I have hosted hundreds of gaming servers, personally, from being involved in large gaming communities.
The bottom line is, downtime is the killer of customer appreciation and satisfaction. In all of my years as a network admin, I could never dream of explaining to my customers, my coworkers or my higher ups, that downtime is essential to the growth of their application or our company.
When I was in the VoIP industry, a few hours of downtime could result in tens of thousands of dollars in lost revenue, and likely my job. In the website industry, an hour of downtime loses you customers. They go elsewhere, because there are other options that guarantee 100% up-time. Businesses are not prepared for downtime in their financial models, because they hire competent network developers, whose sole responsibility is to prevent downtime.
In my eyes, the main difference is that ZoS already has our money. We bought in and there isn't much to lose from customer dissatisfaction. Yes, we buy ESO+. Yes, we purchase items from the crown store, but ZoS doesn't count on this to keep their business floating. Their business model is focused around initial game sales, and I'll bet that you can still purchase their game even when the servers are down.
It may be standard in the business of MMOs to have downtime, but I question why. Redundancy, version rollbacks and proper performance testing should alleviate downtime. Why, when there is a PTS for performance testing, do bugs go unnoticed? Maybe launching a patch with last-minute changes that were not tested in the PTS environment is the culprit. Maybe it's poor version control, or just careless developers in the first place. When I led a team of developers, I would have questioned the competence of the other developers in my department if there had been downtime, but I didn't have this issue, because there was none.
I perform routine weekly server maintenance, ironically, in the wee-hours of Monday morning. The main difference is that there isn't an average down-time of 4 hours every week for my applications. Actually, there is zero down-time. Every bit of maintenance is automated, due to proper planning and version control, and the only thing necessary is for me to ensure that all my servers are working as intended after the maintenance. The very few times, and I mean 2 times that I can think of, that there were issues after my routine maintenance, I was able to roll-back the servers in a matter of seconds to a working version and perform further testing on the proposed updated code before launching.
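To make the general idea concrete, here is a minimal sketch of a rolling update with automatic rollback. It is a toy model under stated assumptions, not a description of my actual tooling or of anything ZoS runs: the `Server` class, `rolling_update` function and `is_healthy` callback are all hypothetical names, and a real deployment would drain connections and swap releases instead of just flipping a field.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Server:
    """Hypothetical stand-in for one machine behind a load balancer."""
    name: str
    version: str
    in_rotation: bool = True


def rolling_update(servers: List[Server],
                   new_version: str,
                   is_healthy: Callable[[Server], bool]) -> bool:
    """Update one server at a time; roll back on the first failed check.

    Only one server is out of rotation at any moment, so the rest of
    the fleet keeps serving traffic while the update proceeds.
    """
    previous = {s.name: s.version for s in servers}
    updated: List[Server] = []

    for server in servers:
        server.in_rotation = False      # stop sending traffic here
        server.version = new_version    # deploy the new build
        updated.append(server)

        if not is_healthy(server):
            # Simplified rollback: restore every server touched so far.
            for s in updated:
                s.version = previous[s.name]
                s.in_rotation = True
            return False

        server.in_rotation = True       # healthy, back into the pool

    return True


if __name__ == "__main__":
    fleet = [Server("a", "1.0"), Server("b", "1.0"), Server("c", "1.0")]
    ok = rolling_update(fleet, "1.1", lambda s: True)
    print(ok, [s.version for s in fleet])
```

Whether this pattern carries over cleanly to stateful game servers holding live player sessions is a separate question; the sketch only shows why an update does not have to mean taking everything offline at once.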
When I read complaints that end-game content communities are dying and that the player base is suffering, I can't help but point my finger at these inherent problems. Customers aren't happy and they're leaving. Now, for customers like myself, who genuinely enjoy ESO as a whole and want the game to improve and grow, the main issue here is convenience. It is inconvenient that ESO has so much downtime, but it's not the end of the world. I don't plan on leaving the community because of these inconveniences. But what does this inconvenience make me feel? It makes me feel like ZoS doesn't care, like ZoS doesn't question their team and their developers as to why they don't employ industry-standard tactics like server redundancy, version rollbacks and automated code updates. To me, that is just poor leadership and a poor customer-satisfaction model. Who decided that it would be okay for servers to be down every single week? I've honestly never heard of an internet business that shuts down for hours every single week for routine maintenance. But if one did, I bet they'd lose customers, customer appreciation would be at an all-time low, and whoever was in charge of the network would be replaced with someone more competent.
Sorry for the wall of text, I just wanted to explain my thoughts as someone with experience in network hierarchy and it got a little long-winded.
I like the game content but am seriously not satisfied (trying to keep it nice) with 2 things: lag (including those spikes) and server unavailability.
This game is not performing correctly in those 2 areas.
Doctordarkspawn wrote: »Can we plaster this in front of ZOS employees at some point? This needs to be drilled into their heads.
In the many MMOs I have played over the years, downtime is a fact and a reality. I even remember LoTRO going down for 5 whole days! You can imagine how lit the forums were.
FloppyTouch wrote: »I'm not an expert or anything on this subject, but I have played many MMORPGs over the years. This seems really common and just something I got used to. Is it right? Idk. Is it annoying? Yes, but I just play something else or clean up around the place while I check the server status every 30 mins.
I don't think it's really that big of an issue.
It's nothing to do with entitlement or compensation, I'm not interested in any of that. It's purely to do with keeping the players informed. I do think that is an area of weakness at the moment, and it's a pretty straightforward thing to put right without the need to give out sensitive information or provoke any sort of backlash.
Agreed!