
Update 33 PC Launch Postmortem

ZOS_MattFiror
I think the performance problems on the PC North American megaserver since Update 33 launched deserve some explanation. This post outlines what has been going on over the last week or so for our North American PC players.

First, last year (which seems like decades ago) we announced a plan to increase ESO’s stability and performance, and we have been diligently performing tasks behind the scenes with every update to implement them. One of the larger items on this list was "Database Sharding" which is a simple concept: take our giant player database (DB) and separate it into two sections for "current characters" and "older characters" so the entire DB doesn't have to be queried when a player logs in. Over time, our character DB (one per Megaserver) has been growing and about two years ago, its sheer size became a bottleneck. This is why the "requesting character load" part of the login process sometimes takes a lot longer than it should.

The DB Sharding process separates our character databases into a "live" DB and a "cold" DB; all accounts that have logged in over the past year are in the Live DB and older ones are in the Cold DB. The plan, once everything is complete, is that active accounts will pull their characters from the smaller Live DB on login, greatly decreasing login time. Older characters will pull from the Cold DB on login, which will take longer, but once an account logs in, its characters are moved over to the Live DB for faster access after the initial login. This character record separation happens the first time an account logs in after sharding has been enabled for that megaserver. The first login may take longer than normal while the copying happens, but every login after that should be much faster.
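For readers who think in code, the live/cold lookup described above can be sketched roughly as follows. This is a toy illustration only, not ZOS's actual implementation: the class name `ShardedCharacterStore`, the method `load_characters`, and the use of in-memory dictionaries in place of real databases are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ShardedCharacterStore:
    """Toy model of a live/cold split; plain dicts stand in for real databases."""

    # Hypothetical stores: account id -> list of character names.
    live: dict[str, list[str]] = field(default_factory=dict)
    cold: dict[str, list[str]] = field(default_factory=dict)

    def load_characters(self, account_id: str) -> tuple[list[str], str]:
        """Return (characters, source), where source is 'live', 'cold', or 'none'."""
        # Active accounts are served from the smaller live store.
        if account_id in self.live:
            return self.live[account_id], "live"
        # Older accounts fall back to the cold store. On this first login the
        # record is moved to the live store so later logins take the fast path.
        if account_id in self.cold:
            characters = self.cold.pop(account_id)
            self.live[account_id] = characters
            return characters, "cold"
        return [], "none"


store = ShardedCharacterStore(
    live={"active": ["Nightblade"]},
    cold={"dormant": ["Templar"]},
)
print(store.load_characters("dormant"))  # first login: served from the cold store
print(store.load_characters("dormant"))  # later login: served from the live store
```

The point of the sketch is only that the slow path is paid once per returning account; a real system would also have to handle the copy failing partway through, which the sketch ignores.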

The good news here is that we have already done this for most of the live megaservers over the last couple of months; all console megaservers have been upgraded already and login times have greatly decreased.

With that background information, you can now start to understand what happened since Update 33 launched last Monday. The PC character database (especially the North American megaserver) is far, far larger than console as ESO had a big launch year in 2014 (pre-console launch) and all those accounts are still there. In addition, all the Beta accounts (and characters) are still there as well.

So, Update 33 launched last Monday and the plan was to wait until the dust settled, then actually enable sharding on PC NA. On launch day, we tracked the usual in-game bugs and issues that tend to crop up and began work to address them. And there were indeed some problems. There were reports of in-game loading screen timeouts and that the Activity Finder was bogged down. Our first big failure was we chalked these reports up to normal server startup issues after a big update. We later increased our real-time monitoring which showed the Activity Finder and other processes were running a bit "hot" – they would spike a bit, then return to normal. We made adjustments both outside of and during primetime hours to try to alleviate queue issues, but this made it difficult to pinpoint if our adjustments were working or if primetime population on the server was easing. So we – and this was our second large error – decided to move ahead with enabling DB Sharding on the PC NA megaserver without addressing the Activity Finder issues.

And all of you who play on the PC NA megaserver know what happened once we flipped the DB Sharding switch: the entire server slowed down even more during primetime. The DB processes got backed up, which meant that all transfers between processes (i.e. zoning) were even slower, as were logouts (where your character's DB record is updated), and the Activity Finder (which accesses your character records) became so bogged down that it essentially ceased to function.

We had done the math and designed the DB Sharding system to work within normal server performance guidelines, so when we started addressing the slowdown issues, we naturally assumed that we had some bad calculations and started there. We made some changes (hence the downtime on Monday earlier this week) but they didn't help at all; performance was still terrible Monday night. Adding to the situation was that we could only troubleshoot on the live server, and only during primetime, because these problems cropped up mostly when the server was under moderate load. But the system ran slowly again Monday night so we knew it was something else.

On Tuesday, with the understanding that the problem was probably not connected to DB Sharding at all, we traced every log we could find to figure out where the bottleneck was and we finally found it – the issue was actually caused by a bad (as in failing) network port that was unable to process as much bandwidth as it was configured for. It wasn't a software problem at all; it was a hardware failure that, in essence, slowed down the entire megaserver. Tuesday’s maintenance was to take that device out of service and reconfigure a replacement, and once that was up, everything returned to normal and the DB Sharding process ran as intended: behind the scenes and with no player impact.

Obviously, there are no guarantees, but we do believe we have gotten to the root of this issue. The TL;DR is that it wasn't related to Update 33, Account Wide Achievements or DB Sharding at all, even though they all happened around the same time and we spent too much time investigating a red herring because of it.

I know this hasn't been an awesome time for any of you on PC. Many of you were unable to log in to play and take advantage of the Explorer's Celebration as you otherwise might have. You may have lost time and progress, and to acknowledge that, we are going to be giving out five 150% Experience Scrolls on the first day of April through the Daily Login Rewards calendar and will be tripling the number of Weekly Endeavor Seals the week of 4/4 for players on all ESO platforms.

We have so much to look forward to in April with Jester's Festival, the Anniversary Jubilee, and even more we can't wait to share with you. We hope you'll use these Experience Scrolls during the upcoming 100% bonus XP events and catch up to where you might have been, had the game been running as intended.

Thanks so much for bearing with us and for reading this long explanation. Given the circumstances, I think full disclosure was warranted.
Edited by ZOS_Kevin on July 15, 2022 7:45PM
Matt Firor
Studio Director, ZeniMax Online Studios
  • FeedbackOnly
    This actually makes sense. The problems for me started Sunday night before update 33.

    ♥️

    I just hope you checked all the network ports, because it's still not like it was. My latency numbers are up by 30.
    Edited by FeedbackOnly on March 25, 2022 2:09PM
  • FeedbackOnly
    🙏♥️ Please make sure to test group finder next update. 🙏♥️ This used to happen every patch before Dark Brotherhood; each patch would break it for a while.

    Group finder did not work. It would only properly queue premade groups or groups of more than one. A solo queue getting through was extremely rare, and only because the other half was group queued.

    For example, I would get an instant queue as a healer grouped with a DPS, while tanks would still be queued alone for literally hours.
    Edited by FeedbackOnly on March 25, 2022 2:24PM
  • LalMirchi
    A postmortem is always useful and making the postmortem a discussion is excellent.

    I wonder if there would be any improvement (performance || cost) by moving the servers to Azure or AWS?
  • Ashryn
    Thank you :)
  • MasterWarrior
    This is a good response to the issues we had. You laid out in detail why it happened without revealing too much. And I assume you will be making changes to try to avoid these problems in the future.
  • Lostar
    "It's very rare a big gaming company to give its player community such straightforward and honest explanations. So thank you very much for your efforts." It's becoming less and less rare, as companies take note that their playerbase has come to expect this from the companies that had the foresight to already do it. That's a good thing. One would hope such measures are unnecessary in the future, but should it come to pass, I hope they continue to offer postmortems and also, unless it's an "all hands on deck" scenario, that they don't wait until the postmortem. It's been a pretty stressful time for them, I'm sure, and I'm happy with the form of compensation.
    I paint stuff sometimes...
    https://www.instagram.com/artoflostar/
  • Anifaas
    Thank you for your thoughtful and detailed post. Much respect! ❤️
  • Serafen
    Thank you so much!! This shows you care and listen to those of us who love and support this beautiful game. 💗💗
  • mbaranski15
    Thank you! Love the detailed explanation!
  • FeedbackOnly
    @ZOS_MattFiror

    And when will the bugs from the last 5 years be fixed?

    Sugar Skulls, the food item, has literally been bugged for years.

    It's a simple UI error, but by now it deserves a fix.
    Edited by FeedbackOnly on March 25, 2022 2:43PM