rip the server

  • Ardaghion
    Morvan wrote: »
    Yeah, seems to be the same thing that happened last week.

    It works as normal for a few minutes, then gets completely unresponsive, back and forth.

    I'm in Wyoming and connect through Mountain West. My pings aren't that bad. I tried a tracert, and my path hits a bunch of networks owned by he.net, which look like backbone networks. A bunch of those servers are dropping packets or not responding about half the time. That includes the 198.20.200.1 server.

    I tried ICMP packets, which do get rejected by many servers, but my networking tools can also run tracert with TCP or UDP packets.

    Edit: I take that back about pings. I tried again and every ping failed. It then comes back and shows no drops.
    Edited by Ardaghion on December 14, 2024 6:32AM
  • DewiMorgan
    Morvan wrote: »
    Yeah, seems to be the same thing that happened last week.

    Oof. I've worked for an MMO, and for an ISP, and either way, router problems were always the worst, because the routers with problems were rarely ours. The Friday-night outages especially were always a hop or two from our data center, so we had to get in touch with their engineers, get through to someone who would escalate it, and then wait on them for a fix. And of course the contact listed in their IP records is someone who has no clue what the router even IS, and can't put us through to anyone who does.

    But of course it would look like OUR problem as far as the users were concerned, rather than them blaming some misconfigured router owned by our upstream ISP.

    At least when you're an ISP, and your network peers' routers mess up, you have some kinda business relationship and uptime agreements with those peers. A game doesn't get any of that kinda leverage.

    Looking at tracert, it looks like a router at 4.71.220.10, four hops before 198.20.200.1, might be flapping? But I can't really tell, tracert is so flaky nowadays. I miss the days when routers didn't deliberately eat ICMP; it made debugging this stuff so much easier.

    (hmm.. why's my ISP routing me all the way up to Dallas, if ESO is hosted here in Austin?)
    [...]
      6    18 ms    13 ms    13 ms   ae51.edge1.Dallas1.Level3.net [67.72.0.33]
      7     *        *        *     Request timed out.  <- maybe fine, might just eat ICMP?
      8    27 ms    24 ms     *     4.71.220.10 <- maybe flapping?
      9    23 ms     *        *     198.20.200.1
     10     *        *        *     Request timed out.
     11     *        *        *     Request timed out.
     12    31 ms     *       14 ms  198.20.200.1
    
    [...]
      6    28 ms    22 ms    18 ms  ae51.edge1.Dallas1.Level3.net [67.72.0.33]
      7     *        *        *     Request timed out.
      8     *        *       33 ms  4.71.220.10
      9     *        *        *     Request timed out.
     10     *        *        *     Request timed out.
     11     *        *        13 ms  ZENIMAX-MED.ear1.Dallas1.Level3.net [4.71.220.10]
     12    23 ms     *       16 ms  198.20.200.1
    

    In the second tracert I see 4.71.220.10 at both hop #8 and hop #11, which... yeah. But the second time it resolves to "ZENIMAX-MED.ear1.Dallas1.Level3.net", which means it's on one side or the other of a connection between Zenimax and Level3, and is either in Dallas or on a connection TO Dallas (unfortunately routers are often named after where they are, but also often after where they connect to).

    So it could be a Zenimax router, or a Level3 one. If the latter, it'll be a non-fun Friday night for engineers in both companies.

    But this is just speculation. I don't work there or know anything. Could just be someone turning the servers on and off like a scene from Airplane, for all I know.
    Edited by DewiMorgan on December 14, 2024 6:54AM
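
The hop-by-hop reading in the post above can also be done mechanically. Below is a minimal Python sketch, assuming Windows-style tracert output like the traces quoted there. The names parse_tracert and repeated_addresses are made up for illustration. It counts lost probes per hop and lists any address that answers at more than one hop, which is the oddity spotted by eye in the second trace. Keep in mind that "*" only means no reply came back, so a hop that deliberately drops ICMP looks the same as a hop that is down.

```python
import re
from collections import defaultdict

# One tracert row: hop number, three probes ("*" or "<n> ms"), then host/IP text.
HOP_RE = re.compile(
    r"^\s*(?P<hop>\d+)\s+"
    r"(?P<p1>\*|<?\d+\s*ms)\s+"
    r"(?P<p2>\*|<?\d+\s*ms)\s+"
    r"(?P<p3>\*|<?\d+\s*ms)\s+"
    r"(?P<rest>.*)$"
)
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

# Second trace from the post, copied as-is.
SAMPLE = """
  6    28 ms    22 ms    18 ms  ae51.edge1.Dallas1.Level3.net [67.72.0.33]
  7     *        *        *     Request timed out.
  8     *        *       33 ms  4.71.220.10
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.
 11     *        *        13 ms  ZENIMAX-MED.ear1.Dallas1.Level3.net [4.71.220.10]
 12    23 ms     *       16 ms  198.20.200.1
"""


def parse_tracert(text):
    """Return one dict per hop: hop number, lost probes (0-3), responding IP or None."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skip headers, blank lines, "[...]" markers
        probes = [match.group(key) for key in ("p1", "p2", "p3")]
        ip = IP_RE.search(match.group("rest"))
        hops.append({
            "hop": int(match.group("hop")),
            "lost": sum(probe == "*" for probe in probes),
            "address": ip.group(1) if ip else None,
        })
    return hops


def repeated_addresses(hops):
    """Map each address seen at more than one hop to the list of hop numbers."""
    seen = defaultdict(list)
    for hop in hops:
        if hop["address"]:
            seen[hop["address"]].append(hop["hop"])
    return {addr: numbers for addr, numbers in seen.items() if len(numbers) > 1}


hops = parse_tracert(SAMPLE)
for hop in hops:
    print(hop["hop"], hop["address"], f"{hop['lost']}/3 lost")
print(repeated_addresses(hops))  # {'4.71.220.10': [8, 11]}
```

A single trace is only three probes per hop, so running it several times and comparing the loss counts gives a better idea of whether a hop is really flapping.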
  • barney2525
    TheDuke wrote: »
    What is it THIS time

    could be the server is a bit miffed about all the abuse that was said yesterday, and just wants to vent a bit.


    :#
  • SeaGtGruff
    trittnerxx wrote: »
    all that matters is they were able to focus on the cooking stream instead of the servers that have been on fire

    I'm pretty sure Gina and Jessica don't work on the servers in Texas (or wherever the data center is that had the emergency power outage the other day), and that the folks who work on the servers in Texas don't have any connection to the holiday cooking stream.
    I've fought mudcrabs more fearsome than me!
  • Ella_Mental
    https://forums.elderscrollsonline.com/en/discussion/comment/8237530/#Comment_8237530

    "wanted to confirm the team is investigating these issues which appear to be affecting all NA servers"
    "we have alerts that occur for issues like this along with on-call teams that handle the investigations/resolutions. These folks had already been investigating for a bit"
    @Ella_Mental [PCNA-Steam] - Ella_Mental on Discord - _Ella_Mental_ on Twitter/X
  • blktauna
    DewiMorgan wrote: »
    baconaura wrote: »
    no oncalls. been going on for an hour, and still no acknowledgement there is an issue. everyone is just going to give up and call it a night because the game is unplayable.

    Engineers do not respond to customer service calls, especially when things are on fire. They focus on diagnosing and resolving the problem.

    For any large enterprise, there are essentially always engineers on call. I can essentially guarantee that there are some engineers right now with very unhappy faces, because they are not going to have a fun Friday night.

    Tech support is generally a different dept, and in this case maybe even a different org (I suspect ZOS user support is done by MS nowadays), and does not typically have any on-call staff.
    trittnerxx wrote: »
    all that matters is they were able to focus on the cooking stream instead of the servers that have been on fire

    Engineers also don't do cooking streams (but I'm guessing this was just a joke).

    Been there and I feel for the Engineers. This is not how they want to be spending a Friday night. I wish them well.
    PCNA
    PCEU
  • ZOS_Kevin
    Community Manager
    Hi All, just wanted to note that we have a team investigating issues right now. But given the time right now, figuring things out will take some time. If we have an update, we will follow up.
    Community Manager for ZeniMax Online Studio and Elder Scrolls Online | Dev Tracker | Service Alerts | ESO Twitter
    Staff Post
  • TX12001rwb17_ESO
    I do not envy the ZOS employees who have to deal with this one bit. Sure, I could see being an ESO developer as fun, but not when you have to deal with things like this when you would normally be asleep. Yesterday there was a power outage, and now this.

    There is a very strong chance it is a hardware issue: something is broken and needs replacing. I hate to say this, but I think ZOS should keep the servers down for a few days and work on it, then compensate everyone after that with a big Christmas present like 15 free crates or something.
  • hamgatan
    There is a very strong chance it is a hardware issue: something is broken and needs replacing. I hate to say this, but I think ZOS should keep the servers down for a few days and work on it, then compensate everyone after that with a big Christmas present like 15 free crates or something.

    even so, any properly set up environment has flags for that. i'd be surprised if there were not alarms in place.

    i mean heck, i have iDRAC reporting from dozens of Dell hosts/SANs etc. the second anything goes *** up, along with SNMP trap capture and Nagios NRPE flagging.. why wouldn't there be similar at ZOS's end?

    if something breaks.. move the service, throw the host in maintenance mode.
    Edited by hamgatan on December 14, 2024 9:04AM
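
The kind of alarm described in the post above can be sketched as a Nagios-style check. This is only an illustration, not anything ZOS is known to run: the function name check_threshold, the metric name, and the thresholds are all hypothetical. The one fixed convention it relies on is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_threshold(name, value, warn, crit):
    """Return (exit_code, status_line) for a reading where higher is worse.

    A missing reading maps to UNKNOWN rather than OK, so a dead sensor
    does not silently look healthy.
    """
    if value is None:
        return UNKNOWN, f"{LABELS[UNKNOWN]} - no reading for {name}"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"{LABELS[code]} - {name} is {value}"


# Hypothetical reading: an inlet temperature of 41 with warn=35, crit=45.
code, line = check_threshold("inlet_temp_c", 41, warn=35, crit=45)
print(line)  # WARNING - inlet_temp_c is 41
# A real plugin would finish with sys.exit(code) so the scheduler sees the state.
```

What happens after a CRITICAL (moving the service, putting the host in maintenance mode) is a separate step that this sketch does not cover.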
  • DewiMorgan
    Possible suggestion of this being a DDoS, which would certainly explain the router flapping so badly.

    Not sure how true this is, but hey, rumors are fun. I've verified what I can.

    FFXIV is also down, and they're calling it a DDoS. They've been reporting DDoS issues for a few days now. So their outages kinda line up with ours.

    Rumor has it they're hosted in the same data center as Zenimax' game servers - but I have not been able to verify this, and I suspect it is not true. What little evidence I can find online suggests the FFXIV NA datacenter is in Sacramento, CA, while Zenimax/ESO's is in Austin, TX (or maybe Dallas?)

    Various other websites I know are also being DDoSed - 'tis the season, I guess?

    This may all be a response to the recent multinational "Operation PowerOFF" that shut down 27 DDoS sites a few days ago, to try and avert the usual spate of Xmas DDoSes (FBI page; Europol page).

    Maybe the DDoSers want to be like "haha, you can't stop us" or something. Or maybe that PowerOFF hurt them bad, so now they're scrambling hard to get blackmail moneys to recover all they lost?

    Either way, grrr. Jerks.

    Some internet weather sites are showing increased outages. Others showing none.
    internet weather map - mostly OK.
    Thousand Eyes - lots of outages
    Internet Health Report - lots of alarms
    Edited by DewiMorgan on December 14, 2024 10:19AM
  • Ella_Mental
    Thanks, Dewi, for the links to those "Internet Weather" sites! I didn't even think to check to find pages like that. <3
  • baconaura
    DewiMorgan wrote: »
    baconaura wrote: »
    no oncalls. been going on for an hour, and still no acknowledgement there is an issue. everyone is just going to give up and call it a night because the game is unplayable.

    Engineers do not respond to customer service calls, especially when things are on fire. They focus on diagnosing and resolving the problem.

    For any large enterprise, there are essentially always engineers on call. I can essentially guarantee that there are some engineers right now with very unhappy faces, because they are not going to have a fun Friday night.

    Tech support is generally a different dept, and in this case maybe even a different org (I suspect ZOS user support is done by MS nowadays), and does not typically have any on-call staff.
    trittnerxx wrote: »
    all that matters is they were able to focus on the cooking stream instead of the servers that have been on fire

    Engineers also don't do cooking streams (but I'm guessing this was just a joke).

    Unfortunately this forum's announcement section at the top is the only way ZOS has communicated with us, and it is bottlenecked by requiring customer service/community managers/mods to update the status. If they had a status dashboard like so many companies do, it would streamline the communication process and keep everyone in the loop.

    Not to beat a dead horse, but communications could be improved. Providing a status dashboard like the ones below, or using a product like Atlassian Statuspage, would be one way to streamline things and make them more transparent.

    e.g.

    ffxiv statuses: https://na.finalfantasyxiv.com/lodestone/news/
    eve online status: https://status.eveonline.com/ (which I feel does a good job because it also shows status for cloud providers)
    reddit status: https://www.redditstatus.com/
    aws status: https://health.aws.amazon.com/health/status
    azure status: https://azure.status.microsoft/en-us/status

    Edited by baconaura on December 14, 2024 2:26PM
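
To make the dashboard suggestion above concrete, here is a rough Python sketch of the core of a status page: each component has a status, and the headline is simply the worst of them. The component names, the three status levels, and the functions overall_status and render are all hypothetical, and this does not describe how Statuspage or any of the linked sites work internally.

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered so that a larger value means a worse state.
    OPERATIONAL = 0
    DEGRADED = 1
    OUTAGE = 2


def overall_status(components):
    """Headline status is the worst component status (OPERATIONAL if empty)."""
    return max(components.values(), default=Status.OPERATIONAL)


def render(components):
    """Plain-text status page: headline first, then one line per component."""
    lines = [f"Overall: {overall_status(components).name}"]
    for name in sorted(components):
        lines.append(f"  {name}: {components[name].name}")
    return "\n".join(lines)


# Hypothetical component list, for illustration only.
EXAMPLE = {
    "PC NA megaserver": Status.DEGRADED,
    "PC EU megaserver": Status.OPERATIONAL,
    "Forums": Status.OPERATIONAL,
}
print(render(EXAMPLE))
```

The point of the design is that the component statuses could be set automatically from monitoring, so the page updates without waiting for someone to write a forum post.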
  • LadyGP
    DewiMorgan wrote: »
    Morvan wrote: »
    Yeah, seems to be the same thing that happened last week.

    Oof. I've worked for an MMO, and for an ISP, and either way, router problems were always the worst, because the routers with problems were rarely ours. The Friday-night outages especially were always a hop or two from our data center, so we had to get in touch with their engineers, get through to someone who would escalate it, and then wait on them for a fix. And of course the contact listed in their IP records is someone who has no clue what the router even IS, and can't put us through to anyone who does.

    But of course it would look like OUR problem as far as the users were concerned, rather than them blaming some misconfigured router owned by our upstream ISP.

    At least when you're an ISP, and your network peers' routers mess up, you have some kinda business relationship and uptime agreements with those peers. A game doesn't get any of that kinda leverage.

    Looking at tracert, it looks like a router at 4.71.220.10, four hops before 198.20.200.1, might be flapping? But I can't really tell, tracert is so flaky nowadays. I miss the days when routers didn't deliberately eat ICMP; it made debugging this stuff so much easier.

    (hmm.. why's my ISP routing me all the way up to Dallas, if ESO is hosted here in Austin?)
    [...]
      6    18 ms    13 ms    13 ms   ae51.edge1.Dallas1.Level3.net [67.72.0.33]
      7     *        *        *     Request timed out.  <- maybe fine, might just eat ICMP?
      8    27 ms    24 ms     *     4.71.220.10 <- maybe flapping?
      9    23 ms     *        *     198.20.200.1
     10     *        *        *     Request timed out.
     11     *        *        *     Request timed out.
     12    31 ms     *       14 ms  198.20.200.1
    
    [...]
      6    28 ms    22 ms    18 ms  ae51.edge1.Dallas1.Level3.net [67.72.0.33]
      7     *        *        *     Request timed out.
      8     *        *       33 ms  4.71.220.10
      9     *        *        *     Request timed out.
     10     *        *        *     Request timed out.
     11     *        *        13 ms  ZENIMAX-MED.ear1.Dallas1.Level3.net [4.71.220.10]
     12    23 ms     *       16 ms  198.20.200.1
    

    In the second tracert I see 4.71.220.10 at both hop #8 and hop #11, which... yeah. But the second time it resolves to "ZENIMAX-MED.ear1.Dallas1.Level3.net", which means it's on one side or the other of a connection between Zenimax and Level3, and is either in Dallas or on a connection TO Dallas (unfortunately routers are often named after where they are, but also often after where they connect to).

    So it could be a Zenimax router, or a Level3 one. If the latter, it'll be a non-fun Friday night for engineers in both companies.

    But this is just speculation. I don't work there or know anything. Could just be someone turning the servers on and off like a scene from Airplane, for all I know.

    So uh, hi! Sorry to jump into this thread and yoink you away. I'd be curious whether you have any insight into the https://forums.elderscrollsonline.com/en/discussion/658253/zos-massive-spike-in-ping-lag-in-recent-days-what-gives#latest situation, and any suggestions on things we could run on our side of the house to see if it's an us/ISP issue... or kind of verify it's a ZOS thing and maybe nail down what is happening. Rich posted a big Q&A in there yesterday with some info.
    Will the real LadyGP please stand up.