Professional Opinions on ZOS Infrastructure

  • FloppyTouch
    I was checking out their home page and looking at the job listings. It looks like they are hiring a lot of programmers and engineers, so they might be short on staff who can work on these issues.

    So this kind of backs up some of your concerns about the manpower to deal with these issues.
  • Saturnana
    Basically, an MMO generates insane amounts of data as it runs, which will eventually start to clog memory, file/event listeners, queues, databases... you name it. This also happens on any other 'normal' PC or server, but you generally won't notice the decline in performance because, for instance, most people turn off their PC when they go to bed. (Hence the "have you tried turning it off and on again" gag, which is actually very sound advice in several situations.) The systems running our MMOs don't get turned off when people go to bed, but they do run into the same technical issues when kept running for days or weeks on end. So currently, it's definitely a needed process.
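
In code terms: a structure that only ever grows (a cache, a queue, a list of listeners) keeps eating memory until the process is restarted, while a bounded one with an eviction policy stays flat no matter how long it runs. A minimal Python sketch, assuming a least-recently-used eviction policy and hypothetical "event" keys; it illustrates the general problem, not how ESO's servers are actually built.

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache: once full, the oldest entry is evicted,
    so memory use stays flat however long the process keeps running."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._data: OrderedDict[str, object] = OrderedDict()

    def put(self, key: str, value: object) -> None:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

    def get(self, key: str):
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def __len__(self) -> int:
        return len(self._data)


def simulate(events: int, cache_size: int) -> tuple[int, int]:
    """Feed unique events into an unbounded dict and a bounded cache; return both sizes."""
    unbounded: dict[str, object] = {}
    bounded = BoundedCache(cache_size)
    for i in range(events):
        key = f"event-{i}"  # hypothetical event identifiers
        unbounded[key] = i
        bounded.put(key, i)
    return len(unbounded), len(bounded)


if __name__ == "__main__":
    print(simulate(events=100_000, cache_size=1_000))  # (100000, 1000)
```

The unbounded dict grows with every event, which is the kind of slow clogging a restart clears; the bounded cache trades the occasional miss for a fixed footprint.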

    As far as the more frequent downtime goes, I think it has to do with the HotR release and the fact that even with the best planning and testing, moving a new piece of software to production and integrating it with existing components will undoubtedly produce some unforeseen results. If those results pose a high enough risk (i.e. the technical risk has a business impact that justifies taking the entire product down), the system will be taken down for maintenance and whatever defect was found will be fixed as soon as possible. All of this is approached from a business point of view, probably taking clients/players into account only if their negative experience with the product in a certain situation has a business impact ZOS would like to mitigate. 'Horrendous downtime' is therefore 'allowed' when its benefit outweighs its cost.
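
The cost/benefit rule described above can be reduced to a simple expected-value comparison. A minimal Python sketch, assuming the decision is that crude (real decisions weigh many more factors); the function name and every number in it are hypothetical.

```python
def should_take_servers_down(
    exploit_probability: float,
    loss_if_exploited: float,
    downtime_hours: float,
    cost_per_downtime_hour: float,
) -> bool:
    """Compare the expected loss from leaving a defect live with the cost of an
    emergency maintenance window; take the servers down only if the former is larger."""
    expected_loss = exploit_probability * loss_if_exploited
    downtime_cost = downtime_hours * cost_per_downtime_hour
    return expected_loss > downtime_cost


# Hypothetical numbers: a duplication exploit with a 50% chance of being abused
# and 200,000 in damage, versus 3 hours offline at 20,000 per hour.
print(should_take_servers_down(0.5, 200_000, 3, 20_000))  # True: 100,000 > 60,000
```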

    What I'd really be interested to know is if and how the industry will move towards technical solutions that remove performance bottlenecks without having to restart the system, and how this could be implemented in MMOs. 100% uptime would be wonderful, but it's a costly solution that is currently only used by systems where potential failure would mean the loss of exorbitant amounts of money or even lives.

    My experience comes from the banking industry, but the underlying idea is basically the same. There's this guy on Tumblr who answered some of your questions in one of his posts: http://askagamedev.tumblr.com/post/156041789585/what-really-happens-during-mmo-server-maintenance
    @Saturnna | PC / EU

    Nâmae Rin : Dragonknight | Dr Milodas Ra'Himo : Templar | Mira Motierre : Sorceress
    Plays-ln-Puddles : Warden  |  Lady Neria : Dragonknight   | Philadore : Nightblade  
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    "Ha! I do love it when the mortals know they're being manipulated. Makes things infinitely more interesting."
                                      - Sheogorath
  • Fherrit
    I appreciate the input from folks more tech-savvy than me. Like some others, I've been playing MMOs for a good while now and know these things happen, and yes, they can be extremely frustrating. I genuinely believe the staff is hard at work, if for no other reason than accountability for getting things done.

    But while I don't have much in the way of deep tech knowledge, I do know human resource management fairly extensively and have had many a frustrating meeting where Suits demand a result that their funding policies made not just difficult, but flat out impossible. And I've seen Suits think they're Darth Vader going all "You have failed me for the last time Admiral" and fire highly capable people on the spot because their unreasonable demands aren't being catered to.

    Most of the time upper management doesn't want to hear that they need to spend more; they only want to hear how they can cut costs and increase profits. I could give an absurdly long list of examples of this sort of short-sightedness from the departments I've done cost/performance analysis for at the various casinos I've worked for, but it would just make for a depressing read.

    So TLDR: I agree with the speculation that it's likely a payroll/starving-infrastructure issue; it has many of the symptoms.
  • Wreuntzylla
    I can only explain it as an effect of outsourcing to different shops. The extent to which different aspects of ESO are out of sync reeks of coders unfamiliar with (new to) the particular code, and of a lack of coherency due to disparate management.

    Well, I can also explain it by one other reason, but ZoS typically gives the impression of competence and calculated decision-making.
  • Sunver
    Yesterday I tried to tracert both the EU and NA PC megaservers. Well, I hope Deutsche Telekom has measures against DDoS attacks and other threats as good as Akamai's.
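
Services like Akamai's filter attack traffic before it ever reaches the game servers. Behind that, one common in-house mitigation is per-client rate limiting. Below is a generic token-bucket limiter in Python, not anything ZOS, Deutsche Telekom or Akamai is known to run; the rate and capacity values and the `handle_request` helper are hypothetical. It only helps against application-level floods; a volumetric attack has to be absorbed upstream.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: a client may average `rate` requests per second,
    with bursts of up to `capacity`; anything beyond that is rejected."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last request, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical per-IP limits: 5 requests/second on average, bursts of 10.
buckets: dict[str, TokenBucket] = {}


def handle_request(client_ip: str) -> bool:
    """Return True if the request should be processed, False if it is dropped."""
    bucket = buckets.get(client_ip)
    if bucket is None:
        bucket = buckets[client_ip] = TokenBucket(rate=5, capacity=10)
    return bucket.allow()
```

The injectable `clock` makes the limiter easy to test deterministically; in production it defaults to `time.monotonic`, which is unaffected by wall-clock adjustments.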

    When I consider thy heavens, the work of thy fingers, the moon and the stars, which thou hast ordained;
    What is man, that thou art mindful of him? and the son of man, that thou visitest him?
    For thou hast made him a little lower than the angels, and hast crowned him with glory and honour.
    O LORD our Lord, how excellent is thy name in all the earth!
  • Opticon
    .

    edit: quoted wrong post.
    Edited by Opticon on August 18, 2017 8:51AM
  • Scootter
    ArenaNet's system to hotfix and patch Guild Wars 2 with no downtime is pretty impressive. I have no idea how they do it, but it is done. I just wish ZOS could get things figured out.
  • Opticon
    lnsane wrote: »
    What I'd really be interested to know is, if/how the industry will move towards a technical solution to be able to remove performance bottlenecks without having to restart the system, and how this could be implemented in MMOs. 100% up-time would be wonderful, but it's a costly solution that is currently only used by systems where potential failure would mean the loss of exorbitant amounts of money or even lives.
    My experience comes from the banking industry, but the underlying idea is basically the same. There's this guy on Tumblr who answered some of your questions in one of his posts: http://askagamedev.tumblr.com/post/156041789585/what-really-happens-during-mmo-server-maintenance

    Snipped quote. Thank you for replying, I agree about the cost issue, and comparing this to life or limb does add perspective.
    Edited by Opticon on August 18, 2017 8:54AM
  • Opticon
    Fherrit wrote: »
    So TLDR: I agree with the speculation that it's likely a payroll/starving-infrastructure issue; it has many of the symptoms.

    I would tend to agree with you.

    Edited by Opticon on August 18, 2017 8:55AM
  • DRXHarbinger
    Opticon wrote: »
    First off: this is not an entitlement thread, a "ZOS is horrible" thread or an "I want a refund" thread... this is (hopefully) a constructive dialogue thread. Even saying that, I'm not sure how constructive even I will be, but it's gotten to that point. I do not wish this to be a pure hate thread, but rather a professional opinions thread, though of course anyone is more than welcome to chime in.

    Fingers crossed though that this doesn't go too far south before at least 10 replies.

    I want to make clear that, 9 times out of 10, I side with ZOS on server/network/dev issues in this forum, for good, educated reasons.

    I have been a Systems Engineer, professionally, for 17 years. I have managed tens to hundreds of thousands of servers in many industries (financial, commercial, semiconductor, social, etc.), but not online gaming. Uptime is critical in every industry I've worked in, be it for in-house services or externally facing services. Being in this profession for so long has also made me privy to the goings-on of related fields such as development, networking, and security.

    Periodic unplanned downtime to fix code-related bugs, like yesterday's for example, is to be expected and should even be praised when a game-breaking flaw has been found. This thread has nothing to do with instances like that. Could tonight's downtime be another example of this? Unfortunately we will most likely never know, but this thread is absolutely NOT just about tonight... it's just the proverbial straw. On that note... NO, they should not release the technical details of yesterday's huge bug, or of any bug.

    Any Internet presence worth anything will eventually get some sort of DDOS. Why DDOS someone when you won't get any response from it? That said, there are plenty of services out there to help protect against it, and plenty of things in-house that can be done to mitigate (NOT 100% prevent) the problem. I hate to say this part but... some people on here will be like "Oh well Gmail/Facebook/Yahoo doesn't go down for hours" etc etc. While their basis for the argument is generally flawed, their point is valid.

    OK SO

    I can't fathom how management in the server/network department(s) allow for such horrendously frequent downtimes. Sure ESO (and basically all MMO's) make it clear in their TOS that they do not guarantee availability, but does anyone expect downtime to be often and for so many minutes at a time? Technically they could be up for an hour a month and still fall under that clause, so let's be realistic here.... the bottom line demands a certain level of service availability to retain customers, especially for an exclusively online service. We all know about five 9's etc etc, but I'm not sure if this could even be a one 9.

    Systems people, Network people, Devs, etc.... what do you think about all of this? I just can't wrap my head around it frankly. Anyone with first-hand gaming industry experience please do comment from your perspective.

    Personally I find it disgusting. Some of the contracts I work on run a data centre for the UK Government (several sections) and the national television provider. The cost to the data centre of any downtime... well, let's just say one hour of downtime on some of these costs more than you or I will earn this decade (100% truth, it's horrendously expensive).

    Even for a total power failure, there is a 250,000-litre fuel tank on standby to power the place for 48 hours, all to maintain uptime.

    In the event of a DDoS or anything similar, the other site comes online and the affected site shuts down to work on the issue.

    How does the business mitigate this risk? Continuity!!! They operate two sites side by side at all times, each working in shift patterns (one from 6AM to 6PM, the other from 6PM to 6AM).
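
The two-site arrangement above is a form of multi-site failover. A minimal Python sketch of the routing decision, assuming two hypothetical sites named `site-a` and `site-b` and a health flag maintained by some external monitor (not shown): the site on shift gets the traffic while it is healthy, and players move to the other site whenever it is not.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool = True  # in practice set by an external health monitor


def preferred_site(hour: int) -> str:
    """Shift pattern from the post: site A covers 6AM-6PM, site B covers 6PM-6AM."""
    return "site-a" if 6 <= hour < 18 else "site-b"


def route(sites: list[Site], hour: int) -> Site:
    """Send players to the site on shift if it is healthy, otherwise fail over to a healthy peer."""
    by_name = {site.name: site for site in sites}
    on_shift = by_name[preferred_site(hour)]
    if on_shift.healthy:
        return on_shift
    for site in sites:
        if site.healthy:
            return site
    raise RuntimeError("no healthy site available")


sites = [Site("site-a"), Site("site-b")]
sites[0].healthy = False  # e.g. site A is under attack or being worked on
print(route(sites, hour=9).name)  # site-b
```

The hard part in a real deployment is not this decision but keeping both sites' data in sync so that a failover does not lose player progress.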

    I'm guessing ZOS has no continuity plan, nor the desire to even look into one. Even a reduced service, like PvE only, wouldn't be the worst thing to happen.

    Anyone in the UK remember seeing the BBC website, TV or online services offline? No... we do a good job.
    PC Master Race

    1001CP
    8 Flawless Toons, all Classes.
    Master Angler
    Dro-M'artha Destroyer (at last)
    Tamriel Hero
    Grand Overlord
    Every Skyshard
    Down With BOP!
  • randomkeyhits
    Been doing IT for waaaay too long, last eighteen years have been purely infrastructure.

    I've always seen game developers as being focused on writing the games (good!) and using bog-standard network libraries for what is basic client/server connectivity, with any cleverness living in the login/broker servers. The games are designed to run online; they are not designed to run 24/7.

    Normal outages look linear: stop the service, back it up, make the changes and, if it works, open it to the public. The backup can be done very fast if you know how, but by default most will take the obvious but slow (and cheaper) methods.
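
One way the backup step gets faster is an incremental backup, which copies only what changed since the last backup instead of the whole database. A toy Python sketch, assuming an in-memory key/value store standing in for the real database; real database engines typically rely on mechanisms such as transaction logs or storage snapshots rather than anything like this, and the `Store` class is purely hypothetical.

```python
class Store:
    """Toy key/value store that tracks which keys changed since the last backup.
    Deletions are not tracked in this sketch."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.dirty: set[str] = set()

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.dirty.add(key)

    def full_backup(self) -> dict[str, str]:
        """Copy every record: simple, but slow for a large database."""
        self.dirty.clear()
        return dict(self.data)

    def incremental_backup(self) -> dict[str, str]:
        """Copy only the records written since the previous backup."""
        changed = {key: self.data[key] for key in self.dirty}
        self.dirty.clear()
        return changed


store = Store()
for i in range(1_000):
    store.write(f"character-{i}", "level 1")
base = store.full_backup()          # 1,000 records copied
store.write("character-7", "level 2")
delta = store.incremental_backup()  # 1 record copied
restored = {**base, **delta}        # restore = last full backup + deltas applied in order
```

The trade-off is on the restore side: recovering means replaying every delta since the last full backup, so incremental schemes usually still take periodic full backups.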

    The thing is, irrespective of the coding, the most important thing is the database itself. It has to be secure and protected, as it contains all the efforts of all the players. Anything messing with that... we all suffer. So if there is any bug/exploit that does that, the service has to be closed immediately until the protection is working again, and it looks like this situation falls under that.

    At that point communication is everything. As customers, we hate having to deal with a void or black hole, which will invariably reflect our negative emotions about the situation back to us. Regular communication, even if it's just "bear with us", "the dev team are working on it" or "the hamsters are running as fast as they can", does a lot to defuse or just plain get rid of that bad emotion. We don't necessarily need to know exactly what is going on, just that something is being done and that we customers are not being ignored.

    We are very insecure that way.....
    EU PS4
  • di_rty
    Credentials: I've been a certified network admin and developer for almost 15 years. I've been the lead developer for VoIP networks of over 5,000 machines. I currently manage a cluster of roughly 400 machines, mainly for large websites and developer environments. I have hosted hundreds of gaming servers, personally, from being involved in large gaming communities.

    The bottom line is, downtime is the killer of customer appreciation and satisfaction. In all of my years as a network admin, I could never dream of explaining to my customers, my coworkers or my higher ups, that downtime is essential to the growth of their application or our company.

    When I was in the VoIP industry, a few hours of downtime could result in tens of thousands of dollars in lost revenue, and likely my job. In the website industry, an hour of downtime loses you customers. They go elsewhere, because there are other options that guarantee 100% up-time. Businesses are not prepared for downtime in their financial models, because they hire competent network developers, whose sole responsibility is to prevent downtime.

    In my eyes, the main difference is that ZoS already has our money. We bought in and there isn't much to lose from customer dissatisfaction. Yes, we buy ESO+. Yes, we purchase items from the crown store, but ZoS doesn't count on this to keep their business floating. Their business model is focused around initial game sales, and I'll bet that you can still purchase their game even when the servers are down.

    It may be standard in the business of MMOs to have down-times, but I question why. Redundancy, version rollbacks and proper performance testing should alleviate down-times. Why, when there is a PTS for performance testing, do bugs go unnoticed? Maybe launching a patch with last-minute changes that were not tested in the PTS environment is the culprit. Maybe it's poor version control, or just careless developers in the first place. When I led a team of developers, I would have questioned the competence of the other developers in my department if there were down-times, but I didn't have this issue, because there were no down-times.

    I perform routine weekly server maintenance, ironically, in the wee-hours of Monday morning. The main difference is that there isn't an average down-time of 4 hours every week for my applications. Actually, there is zero down-time. Every bit of maintenance is automated, due to proper planning and version control, and the only thing necessary is for me to ensure that all my servers are working as intended after the maintenance. The very few times, and I mean 2 times that I can think of, that there were issues after my routine maintenance, I was able to roll-back the servers in a matter of seconds to a working version and perform further testing on the proposed updated code before launching.
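
The automated update with one-step rollback described above can be sketched roughly like this in Python. The `health_check` callable and the version strings are hypothetical placeholders; in a real setup the previous release stays installed, so rolling back means switching a pointer (a symlink, a load-balancer target) rather than restoring anything.

```python
from typing import Callable


class Deployment:
    """Keeps the previous release installed so a failed update can be reverted instantly."""

    def __init__(self, version: str) -> None:
        self.current = version
        self.history: list[str] = [version]  # releases that passed their health check

    def deploy(self, new_version: str, health_check: Callable[[str], bool]) -> bool:
        previous = self.current
        self.current = new_version  # switch traffic to the new release
        if health_check(new_version):
            self.history.append(new_version)
            return True
        self.current = previous  # failed check: roll back in one step
        return False


deployment = Deployment("1.0")
deployment.deploy("1.1", lambda version: True)              # passes, stays on 1.1
deployment.deploy("1.2", lambda version: version != "1.2")  # fails, rolls back to 1.1
print(deployment.current)  # 1.1
```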

    When I read about complaints that end-game content communities are dying and that the player base is suffering, I can't help but point my fingers at these inherent problems. Customers aren't happy and they're leaving. Now, for customers like myself, who genuinely enjoy ESO as a whole and want the game to improve and grow, the main issue here is convenience. It is inconvenient that ESO has so much down-time, but not the end of the world. I don't plan on leaving the community because of these inconveniences. But what does this inconvenience make me feel? It makes me feel like ZoS doesn't care, like ZoS doesn't question their team and their developers as to why they don't employ industry-standard tactics like server redundancy, version roll-backs and automated code updates. To me, that is just poor leadership and a poor customer-satisfaction model. Who decided that it would be okay for servers to be down every single week? I've honestly never heard of an internet business that shuts down for hours every single week for routine maintenance. But if they did, I bet they'd lose customers, customer appreciation would be at an all-time low and whoever was in charge of the network would be replaced with someone more competent.

    Sorry for the wall of text, I just wanted to explain my thoughts as someone with experience in network hierarchy and it got a little long-winded.
    Edited by di_rty on August 18, 2017 9:11AM
  • jaschacasadiob16_ESO
    Here is an interesting article that explains availability and the theory of the nines, for those without a distributed system background interested in the subject:
    https://vijaygill.wordpress.com/2010/11/10/nines/
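
To put rough numbers on the nines, here is a Python sketch using the usual convention that "n nines" means an availability of 1 - 10^-n. The series/parallel formulas assume component failures are independent; real outages are often correlated, so treat those results as optimistic. The 4-hours-a-week input is only an example, taken from the weekly figure di_rty uses for ESO; by this arithmetic it works out to roughly 97.6%, between one nine (90%) and two nines (99%).

```python
from math import prod

HOURS_PER_WEEK = 7 * 24
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes_per_year(nines: int) -> float:
    """Allowed downtime for 'n nines' availability (1 nine = 90%, 5 nines = 99.999%)."""
    return 10 ** -nines * MINUTES_PER_YEAR


def availability_from_weekly_downtime(hours_down_per_week: float) -> float:
    """Fraction of the week the service is up."""
    return 1 - hours_down_per_week / HOURS_PER_WEEK


def series(*availabilities: float) -> float:
    """Every component is needed (e.g. login server, zone server and database): multiply."""
    return prod(availabilities)


def parallel(availability: float, replicas: int) -> float:
    """Any one of several independent replicas is enough: 1 - P(all down)."""
    return 1 - (1 - availability) ** replicas


print(round(downtime_budget_minutes_per_year(5), 2))   # 5.26 (minutes a year)
print(round(availability_from_weekly_downtime(4), 4))  # 0.9762
print(round(series(0.999, 0.999, 0.999), 4))           # 0.997
print(round(parallel(0.99, 2), 4))                     # 0.9999
```

The last two lines show why redundancy matters: chaining three 99.9% components loses availability, while two 99% replicas in parallel reach four nines.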
    "Yesterday while searching a barrel in vVoM I found a lemon. Best drop of the whole run."

    Protect the weak. Heal the sick.
    Treasure the gifts of friendship. Seek joy and inspiration in the mysteries of love.
    Honor the Earth, its creatures, and the spirits. Use Nature's gifts wisely. Respect her power. Fear her fury.
  • Opticon
    di_rty wrote: »
    Credentials: I've been a certified network admin and developer for almost 15 years. I've been the lead developer for VoIP networks of over 5,000 machines. I currently manage a cluster of roughly 400 machines, mainly for large websites and developer environments. I have hosted hundreds of gaming servers, personally, from being involved in large gaming communities.

    Yay a network guy!
    di_rty wrote: »
    The bottom line is, downtime is the killer of customer appreciation and satisfaction. In all of my years as a network admin, I could never dream of explaining to my customers, my coworkers or my higher ups, that downtime is essential to the growth of their application or our company.

    When I was in the VoIP industry, a few hours of downtime could result in tens of thousands of dollars in lost revenue, and likely my job. In the website industry, an hour of downtime loses you customers. They go elsewhere, because there are other options that guarantee 100% up-time. Businesses are not prepared for downtime in their financial models, because they hire competent network developers, whose sole responsibility is to prevent downtime.

    In my eyes, the main difference is that ZoS already has our money. We bought in and there isn't much to lose from customer dissatisfaction. Yes, we buy ESO+. Yes, we purchase items from the crown store, but ZoS doesn't count on this to keep their business floating. Their business model is focused around initial game sales, and I'll bet that you can still purchase their game even when the servers are down.

    That is an excellent point which I did not consider. I am used to on-demand/real-time services and payments, not regularly scheduled payments. This does indeed present a very different scenario.
    di_rty wrote: »
    It may be standard in the business of MMOs to have down-times, but I question why. Redundancy, version rollbacks and proper performance testing should alleviate down-times. Why, when there is a PTS for performance testing, do bugs go unnoticed? Maybe launching a patch with last-minute changes that were not tested in the PTS environment is the culprit. Maybe it's poor version control, or just careless developers in the first place. When I led a team of developers, I would have questioned the competence of the other developers in my department if there were down-times, but I didn't have this issue, because there were no down-times.

    This!!!
    di_rty wrote: »
    I perform routine weekly server maintenance, ironically, in the wee-hours of Monday morning. The main difference is that there isn't an average down-time of 4 hours every week for my applications. Actually, there is zero down-time. Every bit of maintenance is automated, due to proper planning and version control, and the only thing necessary is for me to ensure that all my servers are working as intended after the maintenance. The very few times, and I mean 2 times that I can think of, that there were issues after my routine maintenance, I was able to roll-back the servers in a matter of seconds to a working version and perform further testing on the proposed updated code before launching.

    This too! This is how it is supposed to be!
    di_rty wrote: »
    When I read about complaints that end-game content communities are dying and that the player base is suffering, I can't help but point my fingers at these inherent problems. Customers aren't happy and they're leaving. Now, for customers like myself, who genuinely enjoy ESO as a whole and want the game to improve and grow, the main issue here is convenience. It is inconvenient that ESO has so much down-time, but not the end of the world. I don't plan on leaving the community because of these inconveniences. But what does this inconvenience make me feel? It makes me feel like ZoS doesn't care, like ZoS doesn't question their team and their developers as to why they don't employ industry-standard tactics like server redundancy, version roll-backs and automated code updates. To me, that is just poor leadership and a poor customer-satisfaction model. Who decided that it would be okay for servers to be down every single week? I've honestly never heard of an internet business that shuts down for hours every single week for routine maintenance. But if they did, I bet they'd lose customers, customer appreciation would be at an all-time low and whoever was in charge of the network would be replaced with someone more competent.

    Sorry for the wall of text, I just wanted to explain my thoughts as someone with experience in network hierarchy and it got a little long-winded.

    I couldn't have said any of your entire response better myself, in fact I wish I had said it better now :smiley:

  • MarzAttakz
    Morgul667 wrote: »
    I like the game content but am seriously not satisfied (trying to keep it nice) with 2 things: lag (including those spikes) and server unavailability.

    This game is not performing correctly in those 2 areas

    Those are the two things that have nailed it for me. I'm moving on until they can sort things out. This game (as do most of the others that follow this pattern) has so much latent potential. Sadly the fault cannot fall solely on the shoulders of the infrastructure and development teams. I'm blaming the primary stakeholders further up the chain, ultimately the budget and resource expenditure is down to them.
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PC EU
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Qura Scura | Altmer | MagBlade
    Lhylyth | Breton | MagPlar
    Nhynyth | Khajiit | MagDK
    Ghwynyth | Dunmer | MagSorc
    Loots-All-Urns | Argonian | MagDen
    Shades-Of-Gray | Argonian | StamDK
    Or'Chastration | Orc | StamSorc
    Little Miss Famished | Orc | StamCro
    Fhane Sharog | Orc | StamDen
    Dead Moons Rising | Khajiit | StamBlade
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    The aim of argument, or of discussion, should not be victory, but progress.
  • SixVoltCar
    Look to Guild Wars 2. They run the same sort of thing, and when they need to patch something, they test and release, they do not shut down the entire server while they contemplate the deep philosophical meaning of every problem.

    What I think you're actually observing is misplaced priorities.
    Edited by SixVoltCar on August 18, 2017 9:28AM
  • Doctordarkspawn
    di_rty wrote: »
    Credentials: I've been a certified network admin and developer for almost 15 years. I've been the lead developer for VoIP networks of over 5,000 machines. I currently manage a cluster of roughly 400 machines, mainly for large websites and developer environments. I have hosted hundreds of gaming servers, personally, from being involved in large gaming communities.

    The bottom line is, downtime is the killer of customer appreciation and satisfaction. In all of my years as a network admin, I could never dream of explaining to my customers, my coworkers or my higher ups, that downtime is essential to the growth of their application or our company.

    When I was in the VoIP industry, a few hours of downtime could result in tens of thousands of dollars in lost revenue, and likely my job. In the website industry, an hour of downtime loses you customers. They go elsewhere, because there are other options that guarantee 100% up-time. Businesses are not prepared for downtime in their financial models, because they hire competent network developers, whose sole responsibility is to prevent downtime.

    In my eyes, the main difference is that ZoS already has our money. We bought in and there isn't much to lose from customer dissatisfaction. Yes, we buy ESO+. Yes, we purchase items from the crown store, but ZoS doesn't count on this to keep their business floating. Their business model is focused around initial game sales, and I'll bet that you can still purchase their game even when the servers are down.

    It may be standard in the business of MMOs to have down-times, but I question why. Redundancy, version rollbacks and proper performance testing should alleviate down-times. Why, when there is a PTS for performance testing, do bugs go unnoticed? Maybe launching a patch with last-minute changes that were not tested in the PTS environment is the culprit. Maybe it's poor version control, or just careless developers in the first place. When I led a team of developers, I would have questioned the competence of the other developers in my department if there were down-times, but I didn't have this issue, because there were no down-times.

    I perform routine weekly server maintenance, ironically, in the wee-hours of Monday morning. The main difference is that there isn't an average down-time of 4 hours every week for my applications. Actually, there is zero down-time. Every bit of maintenance is automated, due to proper planning and version control, and the only thing necessary is for me to ensure that all my servers are working as intended after the maintenance. The very few times, and I mean 2 times that I can think of, that there were issues after my routine maintenance, I was able to roll-back the servers in a matter of seconds to a working version and perform further testing on the proposed updated code before launching.

    When I read about complaints that end-game content communities are dying and that the player base is suffering, I can't help but point my fingers at these inherent problems. Customers aren't happy and they're leaving. Now, for customers like myself, who genuinely enjoy ESO as a whole and want the game to improve and grow, the main issue here is convenience. It is inconvenient that ESO has so much down-time, but not the end of the world. I don't plan on leaving the community because of these inconveniences. But what does this inconvenience make me feel? It makes me feel like ZoS doesn't care, like ZoS doesn't question their team and their developers as to why they don't employ industry-standard tactics like server redundancy, version roll-backs and automated code updates. To me, that is just poor leadership and a poor customer-satisfaction model. Who decided that it would be okay for servers to be down every single week? I've honestly never heard of an internet business that shuts down for hours every single week for routine maintenance. But if they did, I bet they'd lose customers, customer appreciation would be at an all-time low and whoever was in charge of the network would be replaced with someone more competent.

    Sorry for the wall of text, I just wanted to explain my thoughts as someone with experience in network hierarchy and it got a little long-winded.

    Can we plaster this in front of ZOS employees at some point? This needs to be drilled into their heads.
  • randomkeyhits
    di_rty wrote: »
    It may be standard in the business of MMOs to have down-times, but I question why. Redundancy, version rollbacks and proper performance testing should alleviate down-times. Why, when there is a PTS for performance testing, do bugs go unnoticed? Maybe launching a patch with last-minute changes that were not tested in the PTS environment is the culprit. Maybe it's poor version control, or just careless developers in the first place. When I led a team of developers, I would have questioned the competence of the other developers in my department if there were down-times, but I didn't have this issue, because there were no down-times.

    Cost, pure and simple.

    To do things right first time (or attempt to)
    To properly run regression testing in-house
    To design, implement and maintain a fault tolerant distributed system
    To have enough capacity to be able to upgrade a portion of your servers and switch users to those while the rest upgrade.
    To maintain the underlying database system in a way which allows rolling upgrades
    To be able to choose the fast options over the slow ones.
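
The capacity point in the list above (upgrade a portion of the servers while the rest keep serving players) is usually called a rolling upgrade. A minimal Python sketch, assuming a hypothetical pool of interchangeable zone servers behind a load balancer; it shows why spare capacity is needed: during each batch, only the remaining servers carry the load. Note that old and new versions run side by side mid-upgrade, which is exactly why the database has to tolerate both, as the next item in the list says.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    version: str
    in_rotation: bool = True  # True while the load balancer sends players here


def rolling_upgrade(pool: list[Server], new_version: str, batch_size: int) -> list[int]:
    """Upgrade `pool` one batch at a time so the rest keep serving players.

    Returns how many servers were still in rotation during each batch's upgrade.
    """
    serving_during_batch = []
    for start in range(0, len(pool), batch_size):
        batch = pool[start:start + batch_size]
        for server in batch:
            server.in_rotation = False  # drain: stop routing new players to it
        serving_during_batch.append(sum(s.in_rotation for s in pool))
        for server in batch:
            server.version = new_version  # upgrade while it is out of rotation
            server.in_rotation = True     # put it back on the new version
    return serving_during_batch


pool = [Server(f"zone-{i}", "1.0") for i in range(6)]
print(rolling_upgrade(pool, "1.1", batch_size=2))  # [4, 4, 4]
```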

    The above all need time, money, people and equipment above and beyond what is needed to provide a 'basic' (read usual MMO) service.

    And luck too.....

    To be able to tell the money men that their published release date is pants and is not going to happen because quality. (j/k its always IT which has to bend over...)
    EU PS4
  • Opticon
    Can we plaster this in front of ZOS employees at some point? This needs to be drilled into their heads.

    Yes, please, that was a 100x better version of my original post. Well put, again.
    Edited by Opticon on August 18, 2017 9:34AM
  • Doctordarkspawn
    Background: I have no IT experience. I do, however, have experience with this game. I was here when it launched and I've followed it for a while.

    With that in mind, what could I have to contribute? Simple: I'm a studier of history. I look to the past to understand what is happening now. And I have a question.

    Around August 19-20, 2015 (at least, that's when I can find records of it), ZeniMax suffered massive layoffs. The initial effect was the loss of game-master staff, but a curious effect seems to have followed. Around this time, and following various DLC releases (Orsinium, and the next year Thieves Guild), performance went way down and server instability went up. It was a slow process, but a noticeable one in retrospect.

    My question is simple. We know the graphics update of Thieves Guild went badly, and the game didn't quite like to run on a DX11 system. Could this, combined with the layoffs, which I think affected networking staff (or could have affected maintenance staff, etc.), explain the longer downtimes? Could the problem have been directly related to cost, in that they couldn't afford to pay the networking staff while the load the system had to bear for extended periods of time increased? Forgive me if I'm talking out of my ass, but I'm genuinely curious.
    Edited by Doctordarkspawn on August 18, 2017 10:02AM
  • Banana
    Banana
    ✭✭✭✭✭
    ✭✭✭✭✭
    I'm guessing someone forgot to add more coal to the boilers
    Edited by Banana on August 18, 2017 10:01AM
  • Aurielle
    Aurielle
    ✭✭✭✭✭
    ✭✭✭✭✭
    Nermy wrote: »
    In the many MMOs I have played over the years, downtime is a fact and a reality. I even remember LoTRO going down for 5 whole days! You can imagine how lit the forums were.

    LOL, LOTRO... Remember when the lag spikes were so bad that stable horses literally froze in place mid-trip, requiring a manual dismount? Eight years in that game taught me the value of patience.

    I'm no developer, but I've played a lot of MMOs. This problem is not unique to ZOS, and I actually experience FEWER performance issues and LESS downtime here.

  • Tandor
    Tandor
    ✭✭✭✭✭
    ✭✭✭✭✭
    I'm not an expert or anything on this subject, but I have played many MMORPGs over the years. This seems really common and just something I got used to. Is it right? IDK. Is it annoying? Yes. But I just play something else or clean up around the place while I check the server status every 30 mins.

    I don't think it's really that big of an issue.

    I agree, as an online gamer since 1997.

    All gaming companies go through these sorts of phases where there are problems one after another, and that includes times long before anyone had heard of DDoS attacks, let alone suffered one. What separates the great from the not-so-good, however, isn't the number of server downtime incidents but the level of communication about them.

    In this I praise the Community team who do a fantastic job of keeping us as informed as they are allowed to and within the limitations of their own knowledge of what's happening. Where I do think ZOS could improve is in the higher level communication and in that respect I do think that it's time for some sort of statement from either Rich Lambert or Matt Firor by way of a "State of the Servers" announcement.

    It's nothing to do with entitlement or compensation, I'm not interested in any of that. It's purely to do with keeping the players informed. I do think that is an area of weakness at the moment, and it's a pretty straightforward thing to put right without the need to give out sensitive information or provoke any sort of backlash.
  • Opticon
    Opticon
    ✭✭✭
    Tandor wrote: »
    It's nothing to do with entitlement or compensation, I'm not interested in any of that. It's purely to do with keeping the players informed. I do think that is an area of weakness at the moment, and it's a pretty straightforward thing to put right without the need to give out sensitive information or provoke any sort of backlash.


    Agreed!
  • Biro123
    Biro123
    ✭✭✭✭✭
    ✭✭✭✭
    As a software developer with 25 years' experience working on systems that, again, need to be constantly up, and are much, much more critical than a game, I can only back ZOS.

    Over the years, I have seen many, many major releases of software which result in bugs being introduced. Even some very minor releases introducing bugs. The most innocuous-looking change to the most innocuous-looking line of code can sometimes cause massive unintended problems.

    This IS inevitable. There will ALWAYS be problems after any major release. How you deal with those problems depends on the kind of problem and the impact it has. Sometimes (in a professional environment) you can simply say to your (professional) user-base 'X is broken - can you get by without using it till we fix it?'
    If it's an exploit in an MMO... can you trust your customer base NOT to abuse it? Well, no; you have to do something.

    Sometimes, the problem is literally game-breaking, and you have to roll out a fix as soon as possible - bringing down the system (or in larger cases - the impacted part of the system) to implement the fix.

    Sometimes bugs are viewed as more minor to be corrected in a later release.

    Sometimes there is some kind of performance issue introduced which causes the system to grind to a halt and come down on its own every now and then.
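    The triage described above can be read as a small decision function. This is an illustration only: the flag names, the `Response` categories and the rule order are assumptions distilled from the post, not ZOS's actual process.

    ```python
    from enum import Enum, auto


    class Response(Enum):
        WORKAROUND = auto()            # ask users to avoid the broken feature until it's fixed
        NEXT_RELEASE = auto()          # defer the fix to a later scheduled release
        HOTFIX_WITH_DOWNTIME = auto()  # patch ASAP, taking down the (affected part of the) system


    def triage(game_breaking: bool, exploitable: bool, minor: bool,
               users_can_work_around: bool) -> Response:
        """Pick a response for one post-release bug (illustrative rules only)."""
        if game_breaking:
            return Response.HOTFIX_WITH_DOWNTIME
        # In an MMO you can't trust the player base not to abuse an exploit,
        # so even a small exploit forces action.
        if exploitable:
            return Response.HOTFIX_WITH_DOWNTIME
        if minor:
            return Response.NEXT_RELEASE
        if users_can_work_around:
            return Response.WORKAROUND
        # Not minor and no workaround: fix it now.
        return Response.HOTFIX_WITH_DOWNTIME
    ```

    For example, `triage(game_breaking=False, exploitable=True, minor=True, users_can_work_around=True)` still returns `Response.HOTFIX_WITH_DOWNTIME`, because the exploit check comes before the "minor" check.
    
    
    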

    I don't work in the games industry, but I suspect they can't really just shut down 'part' of it while a bugfix is done.

    But regardless of the industry, after a major release there WILL be a flurry of activity as issues are found, and downtime may be required to address them. Sometimes the fix itself breaks something, or simply doesn't fully work (all too common, since fixes are usually rushed), and so the fix then needs a fix...

    They have just had a major software release. It's only to be expected.

    I don't know about the current NA issue, though. It sounds like it could be more hardware- or networking-related (and so in the realm of the OP), since they would have brought down all servers for a software patch.

    But then, we've just had an emergency software patch.. We now have an unexpected Networking/hardware issue..
    Sometimes the car just breaks down. Just be happy someone is on-hand to fix it.

    Minalan owes me a beer.

    PC EU Megaserver
    Minie Mo - Stam/Magblade - DC
    Woody Ron - Stamplar - DC
    Aidee - Magsorc - DC
    Notadorf - Stamsorc - DC
    Khattman Doo - Stamblade - Relegated to Crafter, cos AD.
  • Opticon
    Opticon
    ✭✭✭
    Aurielle wrote: »
    In the many MMOs I have played over the years, downtime is a fact and a reality. I even remember LoTRO going down for 5 whole days! You can imagine how lit the forums were.

    Yes, you are absolutely correct; however, it still raises the question... why, exactly? From a server standpoint, which is the one I am taking, there is little to no acceptable excuse besides underfunding of their infrastructure team(s).

    I don't mean to argue with you here, but "this is how other games are" doesn't really excuse why it's currently an issue in ESO.

    Edited by Opticon on August 18, 2017 10:32AM
  • FrancisCrawford
    FrancisCrawford
    ✭✭✭✭✭
    ✭✭✭✭
    Shortly after launch, http://www.dbms2.com/2014/04/16/the-worst-database-developers-in-the-world/ and its comment thread had a long discussion of coding issues.

    tl;dr -- the code is bad pasta.

    Obviously, that's not helpful to devops.



  • Biro123
    Biro123
    ✭✭✭✭✭
    ✭✭✭✭
    Opticon wrote: »
    Tandor wrote: »
    It's nothing to do with entitlement or compensation, I'm not interested in any of that. It's purely to do with keeping the players informed. I do think that is an area of weakness at the moment, and it's a pretty straightforward thing to put right without the need to give out sensitive information or provoke any sort of backlash.


    Agreed!

    I'm not so sure... Whenever I've been involved in a critical issue, it normally boils down to 1 or 2 developers trying to get to the bottom of it, with 500 managers asking for ETAs and demanding to know what caused it, etc. I normally tell them where to go, or we have a competent incident manager who keeps them off your back. You really cannot solve anything without shutting these people out.

    Basically, the nature of MOST (not all) IT bugs is that you first need to find out what's causing them. THIS is what takes the time. And it's also an unknown amount of time, and it takes a great deal of concentration, as you're often keeping track of several sections of code/data at once while the live scenario (if still up) is constantly changing. This could take several hours.

    Once you find the cause, the fix is often done in 5 minutes. Testing in 20 minutes. Then there's probably about 30 mins or so of getting approvals/access to actually implement the fix.
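    The timeline above can be put as simple arithmetic: once the cause is known, the remaining steps take roughly 5 + 20 + 30 = 55 minutes, while the investigation time is the unknown term. A minimal sketch using the post's rough figures (the constants are the post's typical values, not measurements):

    ```python
    # Post-diagnosis durations quoted in the post (rough, typical values).
    FIX_MIN = 5
    TEST_MIN = 20
    APPROVAL_MIN = 30


    def time_to_resolution(investigation_min: float) -> float:
        """Total minutes from "we're looking into it" to "fixed".

        investigation_min is the unknown term: it can't be estimated up front.
        """
        return investigation_min + FIX_MIN + TEST_MIN + APPROVAL_MIN


    for hours in (0.5, 2, 4):
        total_h = time_to_resolution(hours * 60) / 60
        print(f"{hours} h investigating -> {total_h:.2f} h until fixed")
    ```

    The fixed tail is under an hour, so for a multi-hour investigation almost all of the outage is the part nobody can predict, which is why early ETAs are mostly guesswork.
    
    
    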

    Now, asking for an ETA during the investigation stage is like asking someone how long it will take them to find that needle in the haystack... You have no idea. It could stab you in the finger as soon as you put your hand in, or you may have to sift through the whole stack handful by handful. Any communication at this stage can only ever be 'We're looking into it'. It's not that the devs are deliberately hiding stuff; they just don't know yet.

    Asking for an ETA during the 'fixing' stage... Well, I've had instances where there have been multiple managers on the line demanding my attention to give estimates, 'next steps', impact assessments, what-ifs, etc. I could have stopped and talked to them for 20 minutes or so to give them this 'communication' that they so value, and missed the critical deadline. Or I could have spent 5 minutes ignoring them and doing the fix, then told them after 7 minutes: 'All fixed and working'.
    Basically, sometimes the situation changes so rapidly that by the time any updates get written up and sent out, they are out of date and wrong.

    Or if I did give a rough estimate: most fixes like this are 'best guesses' with no guarantee that they will work. I mean, you're taking what would usually involve weeks of analysis, pre-testing, speaking to SMEs, code reviews, testing etc. and cramming it into a couple of hours, so it's gonna be risky. So companies are reluctant to go to customers with an ETA, because they will look very bad if the fix doesn't work and they then need more downtime.
    So basically, at this stage either the developer actually doing the work isn't gonna give much of an ETA, so what reaches the customer is still just guesswork; or he does, but nobody is confident in it, so again, you still need to be vague with the customers.
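    The risk of quoting an ETA for a rushed fix can be put in numbers. Assuming, hypothetically, that each rushed fix works independently with probability p, the number of attempts until one works follows a geometric distribution with mean 1/p: with p = 0.5 you should expect two rounds of downtime on average, not one. The probabilities are made up for illustration; the simulation just cross-checks the formula.

    ```python
    import random


    def expected_attempts(p_success: float) -> float:
        """Mean number of fix attempts until one works (geometric distribution)."""
        if not 0 < p_success <= 1:
            raise ValueError("p_success must be in (0, 1]")
        return 1 / p_success


    def simulate_attempts(p_success: float, trials: int = 100_000, seed: int = 0) -> float:
        """Monte Carlo check of expected_attempts: average attempts over many incidents."""
        if not 0 < p_success <= 1:
            raise ValueError("p_success must be in (0, 1]")
        rng = random.Random(seed)
        total = 0
        for _ in range(trials):
            attempts = 1
            # rng.random() is uniform on [0, 1); a value >= p counts as a failed fix.
            while rng.random() >= p_success:
                attempts += 1
            total += attempts
        return total / trials
    ```

    An ETA quoted as if the first fix will work is wrong with probability 1 - p, which is the "look very bad and then need more downtime" case.
    
    
    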

    So basically, usually the best you can expect with all issues is..

    1. We're looking into it. (for quite a long time)
    then
    2. We think we've fixed it and are bringing up the servers now.

    And I don't think we should be informed of what the bug was - not in this industry - because people will then go out looking for similar 'bugs' to exploit..





    Edited by Biro123 on August 18, 2017 10:57AM
  • WaltherCarraway
    WaltherCarraway
    ✭✭✭✭✭
    Ways to ping PC NA Megaserver?
    Back from my last hiatus. 2021 a new start.
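    On the ping question: ICMP `ping` needs raw-socket privileges on some systems, and some servers and firewalls drop ICMP, so a common alternative is to time a TCP handshake to the port the client connects to. A minimal standard-library sketch follows; the thread does not say which host or port the ESO client uses, so whatever address you pass is your own assumption. `socket.create_connection` also resolves the hostname, so a measurement may include DNS lookup time.

    ```python
    import socket
    import time


    def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
        """Time one TCP connection setup to host:port, in milliseconds."""
        start = time.perf_counter()
        # create_connection resolves the name, then completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        return (time.perf_counter() - start) * 1000.0


    def tcp_ping(host: str, port: int, count: int = 4, timeout: float = 2.0) -> list[float]:
        """Repeat tcp_connect_ms; failed attempts (refused, timed out, DNS error) become NaN."""
        results = []
        for _ in range(count):
            try:
                results.append(tcp_connect_ms(host, port, timeout))
            except OSError:  # covers ConnectionRefusedError, socket.timeout, socket.gaierror
                results.append(float("nan"))
        return results
    ```

    Usage: `tcp_ping("203.0.113.10", 443)`, where `203.0.113.10` is a documentation-range placeholder address, not a real megaserver.
    
    
    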
  • anitajoneb17_ESO
    anitajoneb17_ESO
    ✭✭✭✭✭
    ✭✭✭✭✭
    The server downtime is the result of two factors:

    - Avoiding them costs money
    - Allowing them costs nothing, or very little.

    As long as avoiding them will cost more than allowing them, it will not change.
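    The argument above is a plain cost comparison. A minimal sketch, assuming (hypothetically) that the cost of allowing downtime is just the revenue from players it drives away; every figure here is invented for illustration:

    ```python
    def cost_of_allowing(players_lost: int, revenue_per_player: float) -> float:
        """Revenue lost when downtime drives some players away (hypothetical model)."""
        return players_lost * revenue_per_player


    def downtime_persists(cost_to_avoid: float, cost_to_allow: float) -> bool:
        """True while preventing downtime costs more than putting up with it."""
        return cost_to_avoid > cost_to_allow
    ```

    With made-up numbers, losing 100 players worth 60 each costs 6,000, so a 1,000,000 infrastructure investment would not pay off and `downtime_persists(1_000_000, cost_of_allowing(100, 60.0))` is `True`. If the number of players lost really is negligible, the comparison only ever goes one way.
    
    
    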

    To add to the second point, the facts suggest that although we complain/demand/whine a lot, in the end we excuse a lot, accept a lot and live with it. Bottom line: we keep on playing, buying and subscribing.
    Of course I do not know how many people actually quit / unsub / don't buy the game because of those downtimes, but I'm pretty sure it's negligible.

    Partly because we're behaving like sheep, partly because we're addicted, but also, let's not forget, because ESO is an excellent game.

    Edited by anitajoneb17_ESO on August 18, 2017 11:46AM