
Professional Opinions on ZOS Infrastructure

Opticon
Opticon
✭✭✭
First off -- This is not an entitlement thread, this is not a ZOS is horrible thread or an I want a refund thread... this is (hopefully) a constructive dialogue thread. Even saying that, I'm not sure how constructive even I will be, but it's gotten to that point. I do not wish this to be a pure hate thread though, rather a professional opinions thread, but of course anyone is more than welcome to chime in.

Fingers crossed though that this doesn't go too far south before at least 10 replies.

I want to make clear that, 9/10 times, I side with ZOS on server/network/dev issues in this forum, and for good, educated reasons.

I have been a Systems Engineer, professionally, for 17 years. I have managed tens to hundreds of thousands of servers in many industries... financial, commercial, semiconductor, social, etc.... but not online gaming. Uptime is critical in any industry that I've worked in, be it for in-house services or externally-facing services. Being in this profession for so long has also made me privy to the goings-on of related fields such as development, networking, and security.

Periodic unplanned downtime to fix code-related bugs, like yesterday's for example, is to be expected and should even be praised when a game-breaking flaw has been found. This thread has zero to do with instances like that. Could tonight's downtime be another example of this? Unfortunately we will most likely never know, but this thread is absolutely NOT just about tonight.... it's just the proverbial straw. On that note.... NO they should not release the technical details of the huge bug yesterday or any bug.

Any Internet presence worth anything will eventually get some sort of DDOS. Why DDOS someone when you won't get any response from it? That said, there are plenty of services out there to help protect against it, and plenty of things in-house that can be done to mitigate (NOT 100% prevent) the problem. I hate to say this part but... some people on here will be like "Oh well Gmail/Facebook/Yahoo doesn't go down for hours" etc etc. While their basis for the argument is generally flawed, their point is valid.

OK SO

I can't fathom how management in the server/network department(s) allow for such horrendously frequent downtimes. Sure ESO (and basically all MMO's) make it clear in their TOS that they do not guarantee availability, but does anyone expect downtime to be often and for so many minutes at a time? Technically they could be up for an hour a month and still fall under that clause, so let's be realistic here.... the bottom line demands a certain level of service availability to retain customers, especially for an exclusively online service. We all know about five 9's etc etc, but I'm not sure if this could even be a one 9.
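
To put rough numbers on the "nines" mentioned above, here is a minimal sketch of the arithmetic. It assumes a 30-day month and a 365-day year, and the function names are made up for illustration; it says nothing about ESO's actual uptime, which none of us can measure from the outside.

```python
# Downtime budget for N "nines" of availability.
# Assumptions: 30-day month, 365-day year; helper names are hypothetical.

MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes


def availability(nines: int) -> float:
    """Availability as a fraction, e.g. 3 nines -> 0.999."""
    return 1 - 1 / 10 ** nines


def allowed_downtime_minutes(nines: int, period_minutes: int) -> float:
    """Minutes of downtime permitted in the period at the given nines."""
    return period_minutes / 10 ** nines


def observed_availability(down_minutes: float, period_minutes: int) -> float:
    """Availability actually achieved, given total minutes of downtime."""
    return 1 - down_minutes / period_minutes


if __name__ == "__main__":
    for n in range(1, 6):
        per_month = allowed_downtime_minutes(n, MINUTES_PER_MONTH)
        per_year = allowed_downtime_minutes(n, MINUTES_PER_YEAR)
        print(f"{n} nine(s): {availability(n):.5%} up, "
              f"{per_month:.2f} min/month, {per_year:.2f} min/year")
```

By this arithmetic, one nine (90%) still allows 72 hours of downtime in a 30-day month, while five nines allows only about 5.26 minutes in a whole year.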

Systems people, Network people, Devs, etc.... what do you think about all of this? I just can't wrap my head around it frankly. Anyone with first-hand gaming industry experience please do comment from your perspective.

Edited by Opticon on August 18, 2017 7:46AM
  • Morgul667
    Morgul667
    ✭✭✭✭✭
    ✭✭✭✭✭
    I like the game content but am seriously not satisfied (trying to keep it nice) with 2 things: lag (including those spikes) and server unavailability.

    This game is not performing correctly in those 2 areas
    Edited by Morgul667 on August 18, 2017 7:44AM
  • FloppyTouch
    FloppyTouch
    ✭✭✭✭✭
    ✭✭✭
    I'm not an expert or anything on this subject but I have played many MMORPGs over the years. This seems really common and just something I got used to. Is it right? Idk. Is it annoying? Yes. But I just play something else or clean up around the place while I check the server status every 30 mins.

    I don't think it's really that big of an issue.
  • Opticon
    Opticon
    ✭✭✭
    I'm not an expert or anything on this subject but I have played many MMORPGs over the years. This seems really common and just something I got used to. Is it right? Idk. Is it annoying? Yes. But I just play something else or clean up around the place while I check the server status every 30 mins.

    I don't think it's really that big of an issue.

    I agree it's not that big of an issue, in the overall picture, but for this I just wanted to focus on specific infrastructure-related issues. I want to be clear that this is not a "I want a refund for my downtime" thread.
  • Acrolas
    Acrolas
    ✭✭✭✭✭
    ✭✭✭✭✭
    There are nearly 100 proverbs referencing straw.

    Perhaps an apt example of thinking one thing while many other possibilities can and do exist.
    signing off
  • Opticon
    Opticon
    ✭✭✭
    Acrolas wrote: »
    There are nearly 100 proverbs referencing straw.

    Perhaps an apt example of thinking one thing while many other possibilities can and do exist.

    Thanks for commenting. I would assume that most people would infer that I was referring to the one that broke the camel's back.

    Now, back to the subject at hand.
    Edited by Opticon on August 18, 2017 7:55AM
  • aToken
    aToken
    ✭✭✭
    It just feels like manpower or something is low. Like there is a rush or struggle somewhere, causing the careless mistakes. Maybe overwhelming employees almost.
  • Opticon
    Opticon
    ✭✭✭
    aToken wrote: »
    It just feels like manpower or something is low. Like there is a rush or struggle somewhere, causing the careless mistakes. Maybe overwhelming employees almost.

    I can wholeheartedly agree with that! I've seen the outcome from understaffed groups, and this kinda smells just like it.
  • aToken
    aToken
    ✭✭✭
    It's not the downtime imo that's bad, it's the excessive amount of bugs every time there is a patch. Almost like the patch is executed poorly or too quickly.
  • Beardimus
    Beardimus
    ✭✭✭✭✭
    ✭✭✭✭✭
    Without all the details and RCA I'm unsure how we can debate what they are getting right / wrong.

    We have no idea how the architecture works or dependencies / complexities or even how the contracts with their vendors work. There may well be restrictions unseen to us or just pure cost limitations on contracts.

    If we look at income vs infrastructure costs I'm unsure the budget is even that good.

    In short without all the facts I don't think we can judge
    Xbox One | EU | EP
    Beardimus : VR16 Dunmer MagSorc [RIP MagDW 2015-2018]
    Emperor of Sotha Sil 02-2018 & Sheogorath 05-2019
    1st Emperor of Ravenwatch
    Alts - - for the Lolz
    Archimus : Bosmer Thief / Archer / Werewolf
    Orcimus : Fat drunk Orc battlefield 1st aider
    Scalimus - Argonian Sorc Healer / Pet master

    Fighting small scale with : The SAXON Guild
    Fighting with [PvP] : The Undaunted Wolves
    Trading Guilds : TradersOfNirn | FourSquareTraders

    Xbox One | NA | EP
    Bëardimus : L43 Dunmer Magsorc / BG
    Heals-With-Pets : VR16 Argonian Sorc PvP / BG Healer
    Nordimus : VR16 Stamsorc
    Beardimus le 13iem : L30 Dunmer Magsorc Icereach
  • Kahsa
    Kahsa
    ✭✭✭
    -
    Edited by Kahsa on August 18, 2017 8:06AM
  • Thogard
    Thogard
    ✭✭✭✭✭
    ✭✭✭✭✭
    Not in computer science or networking in any capacity, but I've played a ton of MMOs...

    This kind of outage is to be expected when a game first launches, or when a game releases an expansion pack.

    I was actually really pleased with the Morrowind launch - I thought it would crash more often than it did... most MMOs will have instability issues when something new happens, then stabilize and fix it.

    But in ESO's case, their trouble seems to be constant and seems to occur regardless of major changes. It seems like it's constantly an issue during prime time.

    This leads me to believe that it isn't something that they don't know how to fix... rather, it is simply something too expensive for them to fix. And that worries me a lot.
    PC NA - @dazkt - Dazk Ardoonkt / Sir Thogalot / Dask Dragoh’t / Dazk Dragoh’t / El Thogardo

    Stream: twitch.tv/THOGARDvsThePeasants
    YouTube: http://youtube.com/c/thogardpvp


  • Opticon
    Opticon
    ✭✭✭
    Kahsa wrote: »
    This isn't an issue of servers breaking down or crashing. The downtime has been related to online gaming company issues. If someone here works for an online gaming company, would love to hear their thoughts.

    Care to elaborate?
  • Nermy
    Nermy
    ✭✭✭✭✭
    In the many MMOs I have played over the years, downtime is a fact and a reality. I even remember LoTRO going down for 5 whole days! You can imagine how lit the forums were.

    You obviously speak from more experience on a technical side but from a player's perspective, I expect these downtimes, I expect the occasional crash and even some of the lag. I KNOW they are working hard behind the scenes to correct these and for that reason I keep on subbing to provide them with the funds to keep on doing a good job.

    I love the game and my only gripe would be there is not enough communication from ZoS. Sure @Gina_Bruno and @Jessica_Folsom do their best but they can only communicate what they are given. I also get fed-up with the entitlement threads... f-off with those.

    I don't know if I have answered your question but I'm sure I have added to the debate.
    @Nermy
    Ex-Leader of The Wabbajack [EU EP PvP guild - Now stood down from active duty]
    BLOOD FOR THE PACT!!!

    Nermden - EP Warden, Nerm-in'a'tor - EP Dragon Knight, N'erm - EP Sorcerer, D'arkness - EP Nightblade, Nermy - EP Templar

    “Always forgive your enemies; nothing annoys them so much.” ― Oscar Wilde

    "An Army is a team; lives, sleeps, eats, fights as a team. This individual heroic stuff is a lot of crap." -General George S. Patton
  • Opticon
    Opticon
    ✭✭✭
    Beardimus wrote: »
    Without all the details and RCA I'm unsure how we can debate what they are getting right / wrong.

    We have no idea how the architecture works or dependencies / complexities or even how the contracts with their vendors work. There may well be restrictions unseen to us or just pure cost limitations on contracts.

    If we look at income vs infrastructure costs I'm unsure the budget is even that good.

    In short without all the facts I don't think we can judge

    Unfortunately in any industry, income vs. infrastructure is always a delicate subject. I also agree that without all the facts we can't say anything for sure, but we can still expect a reasonable level of uptime even after considering all of the factors that many people are not aware of.

  • Sagranax
    Sagranax
    ✭✭✭
    aToken wrote: »
    It just feels like manpower or something is low. Like there is a rush or struggle somewhere, causing the careless mistakes. Maybe overwhelming employees almost.

    My thoughts exactly. It feels as if this whole thing is being kept together with duct tape or something of the sort.

    Edited by Sagranax on August 18, 2017 8:09AM
  • RupzSkooma
    RupzSkooma
    ✭✭✭✭
    Being a software engineer (embedded systems, networking, game engine programming, AI and general programming), I can say that Zeni needs to upgrade their network infrastructure. But in the end it all comes down to the business side.
    Is the upkeep cost worth it?
    Is the upgrade cost worth it?
    I think right now the total downtime is a little too much for most consumers.

    As an engineer I can't make constructive criticism without having more data available. We don't know much about the software itself, its architecture, or how it interacts with the system. We don't know the business figures.
    We don't have the source code. We know next to nothing about the server-side mechanisms of the software or how exactly it interacts with them, and we don't know the exact number of employees tasked with managing the servers, or their efficiency.
    And we will probably never know any of this, and for good reasons.
    So the only criticism I can make is as a consumer, not as an engineer. You don't need to be an engineer to understand that Zeni is having trouble managing the servers right now.
    Edited by RupzSkooma on August 18, 2017 8:14AM
    Elder Kings II is a Role Playing Elder Scrolls mod for Crusader Kings III.
  • Opticon
    Opticon
    ✭✭✭
    Nermy wrote: »
    In the many MMOs I have played over the years, downtime is a fact and a reality. I even remember LoTRO going down for 5 whole days! You can imagine how lit the forums were.

    Yes, yes, I remember those days "fondly" :wink:
    Nermy wrote: »
    You obviously speak from more experience on a technical side but from a player's perspective, I expect these downtimes, I expect the occasional crash and even some of the lag. I KNOW they are working hard behind the scenes to correct these and for that reason I keep on subbing to provide them with the funds to keep on doing a good job.

    I completely agree they work hard to fix problems.
    Nermy wrote: »
    I love the game and my only gripe would be there is not enough communication from ZoS. Sure @Gina_Bruno and @Jessica_Folsom do their best but they can only communicate what they are given. I also get fed-up with the entitlement threads... f-off with those.
    Communication is crucial, but as you know this forum is about 95% hate toward one thing or another, so I can understand the limited responses. HOWEVER.... a BIT more detail would go a LONG way to making customers happy.
    Nermy wrote: »
    I don't know if I have answered your question but I'm sure I have added to the debate.

    And thank you for doing so :)
  • MakoFore
    MakoFore
    ✭✭✭✭✭
    my good friend owns an internet cafe down the street - well, a gaming cafe. over 60 computers with all types of games - at the moment it's PUBG, LOL, DOTA, OVERWATCH, BG and CS that are the most popular games.
    occasionally WOW players come in, and i'm the only ESO player who goes regularly - i play from home but i go in when i want to clock a decent vMA score or join a raid - or do some serious pvp - as the lag at my place is crap - well, in the whole country really.
    he and i have had many talks - and ESO is by far the worst performing game server/ping/latency wise, downtime wise - and patch wise also. i'll often be sitting there waiting for the game spikes to come back to normal after a vMA death - and he'll laugh at me while he plays BDO next to me.
    he has had many customers come in - see what i'm playing - try it - and then see that the pings are 400 or so and never come play it again. that's the thing with games - especially online games - it's not good enough to be stable 90 percent of the time - because when someone gets booted from an emp run, a 12 man trial or a record vMA push - it's enough to make people quit - and never come back. the quality of online gaming recently has become very very good - world wide - unfortunately for my game - it's never been more dire.
  • Slack
    Slack
    ✭✭✭✭✭
    Asking for professional opinions in a place that is full of angry nerds :)

    Anyway, comparing the work of ZOS to other games I liked and played, like Hammerpoint with WarZ /ISS or the MMO I played before, Age of Conan which is run by funcom, I must say that ZOS does a good job in updating, fixing and adding new stuff
    Edited by Slack on August 18, 2017 8:17AM
    PC EU
    Betty Breeze - Magwarden
    Hunts S'hitblades - Stamplar
    Aschavi - Magplar
  • Opticon
    Opticon
    ✭✭✭
    Slack wrote: »
    Asking for professional opinions in a place that is full of angry nerds :)

    I'm trying my best, so far it's working out :-D
  • Scootter
    Scootter
    ✭✭✭
    I have also been in I.T. for most of my life and I agree it is baffling. Like @aToken said
    I think it is possible that they run a very small team and just don't have the resources to do everything correctly or maybe cut corners.

    I really love this game but it pains me to say that their servers/infrastructure/QA has some serious issues going on. I am die hard and will continue playing but I am sure people have ditched the game over downtime and bugs.
  • Opticon
    Opticon
    ✭✭✭
    Being a software engineer (embedded systems, networking, game engine programming, AI and general programming), I can say that Zeni needs to upgrade their network infrastructure. But in the end it all comes down to the business side.
    Is the upkeep cost worth it?
    Is the upgrade cost worth it?
    I think right now the total downtime is a little too much for most consumers.

    Agreed!
    As an engineer I can't make constructive criticism without having more data available. We don't know much about the software itself, its architecture, or how it interacts with the system. We don't know the business figures.
    We don't have the source code. We know next to nothing about the server-side mechanisms of the software or how exactly it interacts with them, and we don't know the exact number of employees tasked with managing the servers, or their efficiency.
    And we will probably never know any of this, and for good reasons.
    So the only criticism I can make is as a consumer, not as an engineer. You don't need to be an engineer to understand that Zeni is having trouble managing the servers right now.

    Agreed too! However I was shooting for the more tech-professional feedback, such as yours :)

    Edited by Opticon on August 18, 2017 8:17AM
  • Nermy
    Nermy
    ✭✭✭✭✭
    MakoFore wrote: »
    my good friend owns an internet cafe down the street - well, a gaming cafe. over 60 computers with all types of games - at the moment it's PUBG, LOL, DOTA, OVERWATCH, BG and CS that are the most popular games.
    occasionally WOW players come in, and i'm the only ESO player who goes regularly - i play from home but i go in when i want to clock a decent vMA score or join a raid - or do some serious pvp - as the lag at my place is crap - well, in the whole country really.
    he and i have had many talks - and ESO is by far the worst performing game server/ping/latency wise, downtime wise - and patch wise also. i'll often be sitting there waiting for the game spikes to come back to normal after a vMA death - and he'll laugh at me while he plays BDO next to me.
    he has had many customers come in - see what i'm playing - try it - and then see that the pings are 400 or so and never come play it again. that's the thing with games - especially online games - it's not good enough to be stable 90 percent of the time - because when someone gets booted from an emp run, a 12 man trial or a record vMA push - it's enough to make people quit - and never come back. the quality of online gaming recently has become very very good - world wide - unfortunately for my game - it's never been more dire.

    Where abouts are you?
  • RupzSkooma
    RupzSkooma
    ✭✭✭✭
    Opticon wrote: »
    Slack wrote: »
    Asking for professional opinions in a place that is full of angry nerds :)

    I'm trying my best, so far it's working out :-D

    No one can provide a proper professional opinion from outside the studio. We can only provide an opinion as educated consumers, however.
  • Opticon
    Opticon
    ✭✭✭
    Opticon wrote: »
    Slack wrote: »
    Asking for professional opinions in a place that is full of angry nerds :)

    I'm trying my best, so far it's working out :-D

    No one can provide a proper professional opinion from outside the studio. We can only provide an opinion as educated consumers, however.

    Now I'm not so sure why you are replying. This is for a general tech perspective, not a super-secret-only-eso-knows perspective. You don't have to work for the company in question to provide an educated and professional opinion on the matter.
  • twev
    twev
    ✭✭✭✭✭
    I understand down time due to equipment failure.
    It tends to be unscheduled and not the company's fault.

    Scheduled down time for maintenance is understandable.

    Down time due to faulty or sloppy code is different.
    The company had plenty of time to test the code before making us take it and going live with it.
    Down time due to exploits resulting from bad code is an issue.
    Unscheduled down time due to the above 3 lines is an issue.

    I don't want a refund.

    But when I pay a subscription fee - I don't want the clock counting down to zero because the company broke something and is charging my account for the time they take to fix it.

    You can't keep your system on line because of unscheduled issues with code and services you control? - Then stop the clock on my sub until you're stable again.

    I've paid a sub since launch.
    I don't need free mounts as gratitude.
    I want the company to thrive.

    I don't want to pay for stuff I'm not getting.
    And then be accused of claiming to be 'entitled'



    Edited by twev on August 18, 2017 8:31AM
    The problem with society these days is that no one drinks from the skulls of their enemies anymore.

    PC/NA, i7 with 32 gigs of ram, NVMe cards and an NVIDIA 1060 over fiber.
    I don't play through Steam, ever.
  • RupzSkooma
    RupzSkooma
    ✭✭✭✭
    Opticon wrote: »
    Opticon wrote: »
    Slack wrote: »
    Asking for professional opinions in a place that is full of angry nerds :)

    I'm trying my best, so far it's working out :-D

    No one can provide a proper professional opinion from outside the studio. We can only provide an opinion as educated consumers, however.

    Now I'm not so sure why you are replying. This is for a general tech perspective, not a super-secret-only-eso-knows perspective. You don't have to work for the company in question to provide an educated and professional opinion on the matter.

    Heyyyyyyyyyyyy!
    I wasn't trying to be rude bro. :c
    Am still sorry if i was being annoying .
  • Duck
    Duck
    ✭✭✭
    I (continue to) have the nagging feeling that the PTS for new patches, is the only test for new patches. I hope I'm wrong.

    There's just been so many times in the past that I've seen bugs and glitches and exploits, ones I guarantee were reported about, make it from PTS to live. Then they got hotfixed or "unscheduled maintenanced" away later when too many people caught it on Live. As if they're lacking the manpower to get to them, or to notice them amongst all the reports, or lacking the time to get to them.
    What I lack in gameplaying ability I make up for in smack talk.
  • jaschacasadiob16_ESO
    jaschacasadiob16_ESO
    ✭✭✭✭✭
    If you remember when the game was still in beta, the files were downloaded from AWS, suggesting the whole infrastructure scales on top of Amazon's service. AWS is far from being reliable, despite the 99.99% uptime guarantee. And of course, if you don't directly control the machines, you are limited by those who control them.

    As a systems Engineer you certainly know that whenever you want to improve availability you must compare the cost of a 9 to the benefits it would provide. And if you are already beyond the availability you agreed to, it is not usually worth the hassle.

    I doubt SREs at ZOS planned anything like stress tests and/or chaos monkeys. But I am confident they are doing the best they can, within the limits imposed on them.
    "Yesterday while searching a barrel in vVoM I found a lemon. Best drop of the whole run."

    Protect the weak. Heal the sick.
    Treasure the gifts of friendship. Seek joy and inspiration in the mysteries of love.
    Honor the Earth, its creatures, and the spirits. Use Nature's gifts wisely. Respect her power. Fear her fury.
  • Opticon
    Opticon
    ✭✭✭
    If you remember when the game was still in beta, the files were downloaded from AWS, suggesting the whole infrastructure scales on top of Amazon's service. AWS is far from being reliable, despite the 99.99% uptime guarantee. And of course, if you don't directly control the machines, you are limited by those who control them.

    As a systems Engineer you certainly know that whenever you want to improve availability you must compare the cost of a 9 to the benefits it would provide. And if you are already beyond the availability you agreed to, it is not usually worth the hassle.

    I doubt SREs at ZOS planned anything like stress tests and/or chaos monkeys. But I am confident they are doing the best they can, within the limits imposed on them.

    I don't doubt SRE at ZOS, I can imagine how difficult it is to manage, but let's take a step back from an SRE type of group. Mid to upper level management of and above SRE should be furious about these uptimes. Are they? Who knows. But it sure seems like, to me at least, that some heads should eventually be rolling (in other groups).


    edit: also, unfortunately, I know all too well about how horrible infrastructure looks on the P&L sheets.
    Edited by Opticon on August 18, 2017 8:41AM