Update 33 PC Launch Postmortem

  • LalMirchi
    LalMirchi
    ✭✭✭✭✭
    A question:

    They should perhaps nuke their own data-centers and employ stable services, perhaps Azure or AWS?

    Perhaps focusing on the game instead of the backend would help?
  • TechMaybeHic
    TechMaybeHic
    ✭✭✭✭✭
    ✭✭
    I know there was a port issue but...seeing as PC NA is still in rough shape; you can't be serious. Really a bad idea for an April fool's joke @ZOS_GinaBruno
    We’re excited to let everyone know we’re approaching the final stage of completing database sharding and are planning to shard the PC EU megaserver next Tuesday, April 5 so you can reap the performance benefits of sharding as soon as possible.

    Let's be honest group finder didn't break because of a port issue.

    Also, what happened this time? A week later, the fixes break everything again?

    Yeah. No judgement here one way or the other. Just so far this database thing hasn't really shown much improvement on PC. Maybe it's because PC has been around longer than console. Maybe it's sheer volume of players. (If so; good luck PCEU who might have the most). I just don't know any more than whatever it was; it still ain't right.
  • coletas
    coletas
    ✭✭✭✭
    LalMirchi wrote: »
    A question:

    They should perhaps nuke their own data-centers and employ stable services, perhaps Azure or AWS?

    Perhaps focusing on the game instead of the backend would help?

    No datacenter can deliver any big improvement if the software architecture is terrible. The problem is not the server; it is a leadership problem. If you have the money to buy a big boat and you buy it, but don't know how to manage it and hire sailors properly, you will get the boat afloat but have terrible problems when difficult tasks have to be performed, like in a storm. When nothing is planned carefully, you face problems the hard way, when most of them would be easy to fix and, most importantly, easy to avoid. Passengers are leaving the ship while the captain is only capable of bringing in more passengers, but that will come to an end that people with some experience know well.

    In summary: no, no datacenter is going to save the game.
  • Gaeliannas
    Gaeliannas
    ✭✭✭✭✭
    LalMirchi wrote: »
    A question:

    They should perhaps nuke their own data-centers and employ stable services, perhaps Azure or AWS?

    Perhaps focusing on the game instead of the backend would help?

    Actually it was ignoring the backend (servers) for 8+ years that got us where we are today.
  • Aldoss
    Aldoss
    ✭✭✭✭✭
    So was there actually a hardware issue and it coincidentally resolved the issues from a week ago, or was this post lip service?

    Loading anywhere on PCNA is taking 5 minutes plus. I'm scared to try and go to my house (you know, the one that I paid tens of thousands of crowns for and also store furnishings that I also spent crowns on).

    I hope I see more articles make it to game news blogs laughing about how a billion dollar company can shoot themselves in the foot, bandage it up, and then immediately shoot themselves in the other foot. My gut tells me that will be the only way you (ZOS) will ever do what needs to be done to salvage this game from the ashes.
  • smacx250
    smacx250
    ✭✭✭✭✭
    Given how things are tonight, it seems like that failed hardware was another red herring. Keep looking...
  • SeaUnicorn
    SeaUnicorn
    ✭✭✭✭✭
    smacx250 wrote: »
    Given how things are tonight, it seems like that failed hardware was another red herring. Keep looking...

    Or it was not the only piece of hardware that is failing.
  • Onomog
    Onomog
    ✭✭✭✭
    SeaUnicorn wrote: »
    smacx250 wrote: »
    Given how things are tonight, it seems like that failed hardware was another red herring. Keep looking...

    Or it was not the only piece of hardware that is failing.

    I'm beginning to think that we've been playing on a house of cards...
  • Dietche
    Dietche
    ✭✭✭
    Servers are back to the exact same state during the time the network card was supposedly bad.
    >>Cannot group people.
    >>Cannot leave group.
    >>Cannot disband group.
    >>Finder doesn't work unless it's a 4 man premade.
    >>Once the finder pops, only one person actually zones in, and everyone else has to manual port.
    >>Logging in, or out, at the password screen or the character screen, is all a complete joke.
    >>Zoning *anywhere* for any reason is laughable. And for a game that relies SO heavily on zoning, what with having 10 doors and ladders for even the simplest of quests, the inability to zone--basically at all--makes it nearly impossible to do "other" things besides dungeons to pass the time.
    >>Node picking (really our only option left?) is now crazy slow

    All these things are the exact same issues we had when the network card "supposedly died". All these things were happening during an event, just like this time. In the last 2 years, we have had horrible server performance *each time* an event came. So forgive me if I just don't believe that it's a matter of a "simple lan card issue" anymore.

    The coincidences just keep piling up. Who wants to bet that if they stopped the Jesters Event, right now, all the login queues and broken finder and grouping issues and poor zoning performance would suddenly vanish? What? No bet? Yeahhhh....
    Guild Leader: Sardonically Synthesized
  • TheAlphaRaider
    TheAlphaRaider
    ✭✭✭
    Hey @ZOS_MattFiror, we are still dealing with bugs from the patch. See the bug reports about queues in dungeons; it occurs mostly during prime time.
  • ArchangelIsraphel
    ArchangelIsraphel
    ✭✭✭✭✭
    ✭✭✭✭
    Time for another autopsy methinks. My necromancer is more than willing to exhume the body!
    Edited by ArchangelIsraphel on April 2, 2022 2:47AM
    Legends never die
    They're written down in eternity
    But you'll never see the price it costs
    The scars collected all their lives
    When everything's lost, they pick up their hearts and avenge defeat
    Before it all starts, they suffer through harm just to touch a dream
    Oh, pick yourself up, 'cause
    Legends never die
  • TheAlphaRaider
    TheAlphaRaider
    ✭✭✭
    I think mortem is going on still.
  • Jaraal
    Jaraal
    ✭✭✭✭✭
    ✭✭✭✭✭
    On Tuesday, with the understanding that the problem was probably not connected to DB Sharding at all, we traced every log we could find to figure out where the bottleneck was and we finally found it – the issue was actually caused by a bad (as in failing) network port that was unable to process as much bandwidth as it was configured for. It wasn't a software problem at all; it was a hardware failure that, in essence, slowed down the entire megaserver. Tuesday’s maintenance was to take that device out of service and reconfigure a replacement, and once that was up, everything returned to normal and the DB Sharding process ran as intended: behind the scenes and with no player impact.

    So.... can you tell us about the current hardware failure?
  • EvilAutoTech
    EvilAutoTech
    ✭✭✭✭✭
    Abigail wrote: »
    Will there be a 2nd post mortem?

    Think it would be an exhumation rather than a 2nd postmortem.

    Actually they never buried the problems. The problems got up and started walking around again.
  • FeedbackOnly
    FeedbackOnly
    ✭✭✭✭✭
    ✭✭
    I did say after the fix that something was still wrong. Latency was still slightly higher than average.

  • LalMirchi
    LalMirchi
    ✭✭✭✭✭
    coletas wrote: »
    LalMirchi wrote: »
    A question:

    They should perhaps nuke their own data-centers and employ stable services, perhaps Azure or AWS?

    Perhaps focusing on the game instead of the backend would help?

    No datacenter can deliver any big improvement if the software architecture is terrible. The problem is not the server; it is a leadership problem. If you have the money to buy a big boat and you buy it, but don't know how to manage it and hire sailors properly, you will get the boat afloat but have terrible problems when difficult tasks have to be performed, like in a storm. When nothing is planned carefully, you face problems the hard way, when most of them would be easy to fix and, most importantly, easy to avoid. Passengers are leaving the ship while the captain is only capable of bringing in more passengers, but that will come to an end that people with some experience know well.

    In summary: no, no datacenter is going to save the game.

    I do think that gutting the vanity project (the in-house servers, i.e. ZOS's very own datacenter) would free up in-house developer resources for relevant in-game work. This could be beneficial.
  • coletas
    coletas
    ✭✭✭✭
    I would never use those "resources" for anything relevant. They hit the same rock over and over. When something doesn't work, it has to be replaced by something that works, or outsourced, and I doubt anyone wants to take that job in its current state.
  • SilverWrought
    SilverWrought
    Soul Shriven
    I hope it gets fixed soon. I had planned to do some exploring with a couple young friends and show them ESO. But, well... this isn't the best look when I'm trying to convince them to try the game out...

    Mebbe get a wheelbarrow full of those Microsoft dollars and build some hefty backend? PLZ?
  • zharkovian
    zharkovian
    ✭✭✭✭
    One thing that I cannot understand, having worked on large database systems: whenever we wanted to process the database (back up, divide, or reorganize the primary), we would take it offline. The database was backed up all the time while online, of course, but when we wanted to process things, the primary was not "live". I suppose, in retrospect, ZoS should have chosen a quiet time to do the sharding "maintenance" and shut us all out during the process. However, that's my opinion, and when it comes to database management I know just enough to be dangerous.
  • coletas
    coletas
    ✭✭✭✭
    zharkovian wrote: »
    One thing that I cannot understand, having worked on large database systems: whenever we wanted to process the database (back up, divide, or reorganize the primary), we would take it offline. The database was backed up all the time while online, of course, but when we wanted to process things, the primary was not "live". I suppose, in retrospect, ZoS should have chosen a quiet time to do the sharding "maintenance" and shut us all out during the process. However, that's my opinion, and when it comes to database management I know just enough to be dangerous.

    Leaving aside the sharding key, and looking at the big locks without knowing anything more... I would bet some gold they are using a badly chosen clustered index for that sharding. Backups? They never do any unit tests with real data, and they ask customers to play to gather statistical data instead of simulating it... I have never seen a rollback, even in the worst scenarios... I would bet the backup is a RAID 1 and a weekend copy... with luck.
  • Sylvermynx
    Sylvermynx
    ✭✭✭✭✭
    ✭✭✭✭✭
    coletas wrote: »
    zharkovian wrote: »
    One thing that I cannot understand, having worked on large database systems: whenever we wanted to process the database (back up, divide, or reorganize the primary), we would take it offline. The database was backed up all the time while online, of course, but when we wanted to process things, the primary was not "live". I suppose, in retrospect, ZoS should have chosen a quiet time to do the sharding "maintenance" and shut us all out during the process. However, that's my opinion, and when it comes to database management I know just enough to be dangerous.

    Leaving aside the sharding key, and looking at the big locks without knowing anything more... I would bet some gold they are using a badly chosen clustered index for that sharding. Backups? They never do any unit tests with real data, and they ask customers to play to gather statistical data instead of simulating it... I have never seen a rollback, even in the worst scenarios... I would bet the backup is a RAID 1 and a weekend copy... with luck.

    Goddesses, I hope you're wrong. My little forum and blog databases back up every night.... Yeah, I've actually never needed a nightly (since 2000 when I started website management) but hey, I still have EVERY one of them....

    Well, except for the former client who moved to the UK, where her new provider had a major fire, and couldn't recover her site - but I still had a copy from before she moved.....
    Edited by Sylvermynx on April 3, 2022 12:09AM
  • coletas
    coletas
    ✭✭✭✭
    Sylvermynx wrote: »
    coletas wrote: »
    zharkovian wrote: »
    One thing that I cannot understand, having worked on large database systems: whenever we wanted to process the database (back up, divide, or reorganize the primary), we would take it offline. The database was backed up all the time while online, of course, but when we wanted to process things, the primary was not "live". I suppose, in retrospect, ZoS should have chosen a quiet time to do the sharding "maintenance" and shut us all out during the process. However, that's my opinion, and when it comes to database management I know just enough to be dangerous.

    Leaving aside the sharding key, and looking at the big locks without knowing anything more... I would bet some gold they are using a badly chosen clustered index for that sharding. Backups? They never do any unit tests with real data, and they ask customers to play to gather statistical data instead of simulating it... I have never seen a rollback, even in the worst scenarios... I would bet the backup is a RAID 1 and a weekend copy... with luck.

    Goddesses, I hope you're wrong. My little forum and blog databases back up every night.... Yeah, I've actually never needed a nightly (since 2000 when I started website management) but hey, I still have EVERY one of them....

    Well, except for the former client who moved to the UK, where her new provider had a major fire, and couldn't recover her site - but I still had a copy from before she moved.....

    With a good design, you never need nightlies. If you have to do nightlies and most updates are hard, it is better to take a serious look at the architecture you designed.

    About backups... Yeah, apart from RAID and externals, which are a must, do all the kinds of backups you can... Logs for live rollbacks, complete remote backups, and even image backups. A customer whose product is down for minutes gets angry; for an hour, extremely angry; and after some more hours, is looking for an alternative and will keep looking forever.

    Here they give you exp scrolls...
  • Sylvermynx
    Sylvermynx
    ✭✭✭✭✭
    ✭✭✭✭✭
    Yep. It isn't a really sterling example of management.
  • Vulkunne
    Vulkunne
    ✭✭✭✭✭
    This whole patch in particular feels really rushed. Many of us on here expressed a number of different concerns, some directly affecting the servers and others more related to unlikable character changes. I wish whoever is making decisions over there would have taken some more time on this (like when Vampirism was first revised), where we could all talk about the direction and what we really wanted to see happen.

    Now look at the situation, performance is terrible and plagued by issues... well I said that would happen in the beginning lol on another thread.

    People have lost their individual character achievements and that is causing YUGE issues for Trial Guilds as well as for the players themselves trying to track content for each character. On top of that, I think when you guys decided to start messing with customer data, you were just asking for trouble.

    Scrapping the DB where the character data used to reside is also not contributing to -any- of your promises from the very beginning and has made life playing this game more difficult, as well as instituted a number of new, unattended issues that you cannot solve in a timely manner.

    Roll back the update, dawg. Roll it all back and just put everything back to the way it was, please. Stand up to whoever is running things into the ground over there and tell them we need to just 'drop it', cause we're out in left field and a lot of people are tired of being ignored and watching things get worse with almost every update.

    I'm just being real about this whatever this is, I mean it doesn't even feel like the same game anymore. I'm not going to praise terrible results especially from folks who won't hear me out or who take my money and then fundamentally chip away at something that used to be enjoyable and not just for me either.
    Edited by Vulkunne on April 3, 2022 3:26AM
    Today Victory is mine. Long Live the Empire.
  • LalMirchi
    LalMirchi
    ✭✭✭✭✭
    IMHO the weak link is the weakest hardware, that is to say the very inadequate PlayStations & Xboxes.

    It would be hard but rather beneficial to rid us of these appliances. Or reduce their influence in the current build.

    "Will no one rid me of this turbulent priest?"
    Edited by LalMirchi on April 3, 2022 11:22AM
  • Aardappelboom
    Aardappelboom
    ✭✭✭✭✭
    LalMirchi wrote: »
    IMHO the weak link is the weakest hardware, that is to say the very inadequate PlayStations & Xboxes.

    It would be hard but rather beneficial to rid us of these appliances. Or reduce their influence in the current build.

    "Will no one rid me of this turbulent priest?"

    Why would consoles have anything to do with this? Most problems are clearly server side from what I can tell. The client side has been further divided between consoles and PC ever since the enhanced version came out, and they've even started differentiating graphical settings (which are client-side), and that part actually works great.

    Except for maybe some more overhead to cater to all these devices, there's nothing holding ESO back. There are just problems with the server not being able to keep up, and the fact that this is (mostly) only happening on PC NA also indicates server-side problems.
  • Gaeliannas
    Gaeliannas
    ✭✭✭✭✭
    Vulkunne wrote: »
    This whole patch in particular feels really rushed. Many of us on here expressed a number of different concerns, some directly affecting the servers and others more related to unlikable character changes.

    The inexcusable thing, in my mind, was them pushing out another major change on top of a failed patch. Most players couldn't play for a full week, the forums were ablaze with complaints, and ZOS apparently "chalked these reports up to normal server startup issues after a big update" and decided to double down on the changes. Well, to start with, "normal server startup issues" isn't Normal to begin with; it is already a sign there is something terribly wrong that needs to be looked at and fixed. The fact that they consider this normal, especially after a week straight of it, is a sign of how bad things have become, how long this has been going on, and how desensitized to major issues within their game they have become, and it really explains a lot.

    Edited by Gaeliannas on April 3, 2022 2:20PM
  • sarahthes
    sarahthes
    ✭✭✭✭✭
    ✭✭
    I do not think most of the issues have anything to do with software, database design, or even database sharding - because PC NA is the 5th server to undergo sharding and the first to have issues.

    This all reeks of hardware infrastructure problems.
  • _adhyffbjjjf12
    _adhyffbjjjf12
    ✭✭✭✭✭
    sarahthes wrote: »
    I do not think most of the issues have anything to do with software, database design, or even database sharding - because PC NA is the 5th server to undergo sharding and the first to have issues.

    This all reeks of hardware infrastructure problems.

    I've been a senior dev for over 20 years and it reeks of a poor devops model and really, really poor coding quality practices, or more likely they are hacking that awful game engine that everyone called out as being woeful before ESO was released.

    issues?

    - Releases often break stuff (not acceptable in this day and age - see competition)
    - Maintenance brings down live. (not acceptable - see competition)
    - Housing - they sell large houses for real money even though they have stated their software cannot handle number of objects required to populate them (outrageous behaviour, borderline fraud)
    - Cyrodiil not fit for purpose, the game engine cannot cope
    - Cyrodiil not fit for purpose, the hardware cannot cope.
    - ESO the game, the Database design was not built to scale.
    - ESO PVE instances laggy and buggy.

    it goes on and on and on and on.
    Edited by _adhyffbjjjf12 on April 3, 2022 3:56PM
  • SerafinaWaterstar
    SerafinaWaterstar
    ✭✭✭✭✭
    LalMirchi wrote: »
    IMHO the weak link is the weakest hardware, that is to say the very inadequate PlayStations & Xboxes.

    It would be hard but rather beneficial to rid us of these appliances. Or reduce their influence in the current build.

    "Will no one rid me of this turbulent priest?"

    Why blame consoles when we’re not having the issues PC is having? Some people have rather rubbish PCs too, you know.

    And yes, getting ‘rid’ of potentially 2/3rds of your player base is *such* a good idea for long term sustainability. /s
    Edited by SerafinaWaterstar on April 3, 2022 4:19PM