Maintenance for the week of December 23:
· [COMPLETE] NA megaservers for maintenance – December 23, 4:00AM EST (9:00 UTC) - 9:00AM EST (14:00 UTC)
· [COMPLETE] EU megaservers for maintenance – December 23, 9:00 UTC (4:00AM EST) - 14:00 UTC (9:00AM EST)

Emergency Server Downtime Hangout Thread 12/12/2024

  • Danikat
    ✭✭✭✭✭
    ✭✭✭✭✭
    I don't understand how one power cut can take out both the NA and EU servers. Isn't the point of having different regional servers that they're in different physical locations, to improve the connection for people in that continent?
    PC EU player | She/her/hers | PAWS (Positively Against Wrip-off Stuff) - Say No to Crown Crates!

    "Remember in this game we call life that no one said it's fair"
  • Iriidius
    ✭✭✭✭
    ZoS should extend the endeavour and login reward timers by 24 hours and give tomorrow's login reward and seals of endeavours for free as an extra.

    It is unavoidable that players are unable to play when the servers are down, but it is not OK to still expect players to log in to get their daily reward and endeavour seals while the servers are down.

    In Frankfurt, Germany, where the EU server is located, it was 6 PM when the server went down, and it will not be back up until after reset at 4 AM. Most Europeans are not playing before that time.
  • JoeCapricorn
    ✭✭✭✭✭
    Had a dream the other night that ESO added wearable wings and I had moth wings and was typing LÄMP repeatedly in zone chat.
    I simp for vampire lords and Glemyos Wildhorn
  • sarahthes
    ✭✭✭✭✭
    ✭✭
    Danikat wrote: »
    I don't understand how one power cut can take out both the NA and EU servers. Isn't the point of having different regional servers that they're in different physical locations, to improve the connection for people in that continent?

    The login server is located in the same data center as the NA servers, and it serves both NA and EU.
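    For readers wondering how one regional outage can block logins everywhere, here is a minimal sketch of the idea sarahthes describes: a single shared login/auth service hosted in the NA data center that every new session must pass through. The names and structure below are illustrative assumptions, not ZOS's actual code or architecture.

```python
# Hypothetical sketch (NOT ZOS's real architecture): one shared login service
# hosted in the NA data center gates new sessions for both regions.
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    datacenter: str
    up: bool = True

# A single auth/login service living alongside the NA megaserver.
login = Service("login", "NA")
megaservers = {
    "NA": Service("NA megaserver", "NA"),
    "EU": Service("EU megaserver", "EU"),
}

def power_outage(dc: str) -> None:
    """Simulate losing power in one data center."""
    for svc in [login, *megaservers.values()]:
        if svc.datacenter == dc:
            svc.up = False

def try_login(region: str) -> str:
    # Every NEW session must authenticate first, regardless of region.
    if not login.up:
        return f"{region}: login failed (auth service unreachable)"
    if not megaservers[region].up:
        return f"{region}: login failed (megaserver down)"
    return f"{region}: logged in"

power_outage("NA")
print(try_login("NA"))  # NA: login failed (auth service unreachable)
print(try_login("EU"))  # EU: login failed (auth service unreachable)
# Players already connected to the EU megaserver keep playing; only new
# logins are blocked, which matches what posters describe later in the thread.
```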
  • arena25
    ✭✭✭✭✭
    carthalis wrote: »
    So I hear there's this stuff called "grass." I'm headed out to investigate right now.

    Be careful, I've heard it can be quite dangerous stuff.

    Good news is that I don't have to worry about grass itself - bad news is that's only because it's a wintry wonderland outside right now.
    LalMirchi wrote: »
    Red banners at night, sailors cower in fright.

    Green banners at morning light, sailors delight.

    Fixed that for you.
    hamgatan wrote: »
    sarahthes wrote: »
    We now have much more robust systems, but until that catastrophic failure happened, we thought our previous configuration was fine.

    That's why responsible companies run regular BCP tests, so that they don't have to wait for a catastrophic failure before they find out that they can't recover.

    Exactly. That's why I have two DCs on a 100 Gbps trunk with vSphere HA and SAN Live Volumes as another level of failover in case the power goes to hell.

    Unfortunately, you can't always trust third parties when they come into the mix. I had to cold start an entire DC last year because facilities conveniently left checking that the diesel genset had fuel in it off their checklists. At least the other DC was alive, though.

    You can't always trust third parties for anything in any field, not just in IT. I should know; I've seen third parties let me down more than once in my two years (and counting, I hope) as a geologist/environmental consultant.
    If you can't handle the heat...stay out of the kitchen!
  • averyfarmanb14_ESO
    ✭✭✭
    Huh. Gaming sites are indicating a datacenter power failure.
    CrushDepth wrote: »
    Well, this seems like a good time to watch RED ONE on Prime.

    Hoo boy, I'd rather sit in the dark at a dead datacenter than watch RED ONE anywhere... /shudder.
  • arena25
    ✭✭✭✭✭
    Iriidius wrote: »
    ZoS should extend the endeavour and login reward timers by 24 hours and give tomorrow's login reward and seals of endeavours for free as an extra.

    It is unavoidable that players are unable to play when the servers are down, but it is not OK to still expect players to log in to get their daily reward and endeavour seals while the servers are down.

    In Frankfurt, Germany, where the EU server is located, it was 6 PM when the server went down, and it will not be back up until after reset at 4 AM. Most Europeans are not playing before that time.

    That would be nice, and maybe something to be considered for NA as well, since power got cut at or around 11:30 AM Eastern U.S. time (when many of us are at work) and it could be 11:30 PM or later before the servers return to service (if downtime does in fact extend to 12 hours or more). I doubt those on the Atlantic coast of North America will want to wake up at 1 in the morning to try and grab the log-in reward before reset at 5 AM (unless of course they have no responsibilities, in which case I can't help you, and I may envy you). But at this point, the ball is in Zeni's court.
    Danikat wrote: »
    I don't understand how one power cut can take out both the NA and EU servers. Isn't the point of having different regional servers that they're in different physical locations, to improve the connection for people in that continent?

    If I've told this uno times, I've told this a thousand uno times: the login server is in NA, and it serves both NA and EU. The EU megaserver was entirely unaffected, and anyone in-game over there before the power went POP could still enjoy being in-game, PROVIDED they don't log out, crash, or otherwise lose connection to the game. Curious, though: is anyone still logged in over there? Or have they finally given up the fight until Zeni fixes the power problems?
    Edited by arena25 on December 12, 2024 10:56PM
    If you can't handle the heat...stay out of the kitchen!
  • Calastir
    ✭✭✭✭✭
    Just watched Red One,

    highly recommend. :)
    Chaszmyr Do'Benrae (Dunmer Magsorc Vampire Infinity) ~ Dusk Doublespeak (Breton Magplar Werewolf) ~ Stan of Rimari (Nord Dragonknight Tank) ~ Bunto Kim Alhambra (Redguard Magplar Paladin) ~ Alicyankali (Argonian Magicka Necromancer Draugr Kin) ~ Gruuman Odinfan (Orsimer Magplar) ~ Boymans van Beuningen (Khajiit Stam Warden Bowzerker) ~ Flannelflail (Imperial Stamina Nightblade Brawler PVP) ~ Calastir (Altmer Stamina Dragonknight) ~ Sallystir (Bosmer Stam Warden Frostbite PVP) ~ Zalastir (Altmer Magicka Warden Ice Storm) ~ Capt Peach (Nord Stamcanist Crux Cannon) ~ PC EU ~ Flynt Westwood (Bosmer Magicka Dragonknight) ~ PC NA ~ since May 26th, 2021.
  • Diundriel
    ✭✭✭
    https://youtube.com/watch?v=rV5s1_0OqW0

    https://youtube.com/watch?v=i-O_geHXfoA&t=746s

    First and only emergency PvP content uploaded; more guild and BG content coming soon ;)
    My YT:
    https://www.youtube.com/@MHWPLZ_ESO

    GM of former Slack Squad PvP Raid Guild
    Our Vids:
    https://www.youtube.com/channel/UCKLwZNZlv8an4p-xNoboE7w

    Characters:
    Zoe'la- AD Magplar AvA 50 x2.5
    Not Zoe'la- DC Magplar AvA 27
    Worst Healbot EU- EP Magplar AvA 20
    Diundriel- AD StamNB AvA 39
    Pugs Got Bombed- AD ManaNB AvA 36
    Cause we have dots- AD ManaSorc AvA 35
    Red Zergs Again- AD StamDen AvA 30
    Synergy Spam Bot- AD MagDK AvA 17
    Heals of Cyrodiil- AD ManaDen AvA 14
    Nawrina- DC StamDK AvA 26
    Not Ganking- StamNB PVE DD
    Stack Pls- DC ManaNB AvA 20
    Der Katzenmensch- AD AvA 30
    Der kleine Troll- DC StamDen AvA 25
    and some I deleted and new ones I am too lazy to add, so well above 250 million AP and 7 former Emperor characters

    PvE: multiple Flawless Conqueror Chars, Spirit Slayer, vAS +2, vCloudrest +3, vRG, vKAhm etc
  • Desiato
    ✭✭✭✭✭
    Many years ago, I was tricked into performing in dance videos by a Seducer! Like many mortals, I suffered for meddling in dark magic I did not understand and paid the price with blisters on my feet.

    I don't really remember, but I think the idea behind these videos was to highlight the beautiful artwork in ESO. So I thought players missing ESO might appreciate them today.

    Upon entrance to the Orc homelands, Br'i meets with the Emperor of Orc Town. (is that even a real place?? Is that even a real title?? I think he may have buffed his resume!)

    Things tend to get a little hot and wild when DKs are in the club! David and Davey, a couple of fiery Dragon Knight hawties, fly us on a bar hop of their favorite party places. The popular pair live loud and large and always draw a crowd!

    It is all fun and games until somebody calls a guard.

    What girl doesn't love her B.O.B.?

    While suffering from an unhealthy interest in giant robots, Br'i finds they make (shockingly) lousy boyfriends...so, using spare parts, she builds her own! Can robots be programmed to dance??

    I played the roles of the Emperor of Orc Town, the DK in red, and the dancing robot! (I didn't create these, I was just a performer)

    Edited by Desiato on December 12, 2024 11:05PM
    spending a year dead for tax reasons
  • DinoZavr
    ✭✭✭✭✭
    [image attachment]
    PC EU
  • RMW
    ✭✭✭✭
    Some fun Elder Scrolls animation;

    https://www.youtube.com/watch?v=ccta76QbT9w
  • Sakiri
    ✭✭✭✭✭
    ✭✭
    I am now going to re-watch the Dagoth Ur teaches Dunmer slang video.
  • LadyGP
    ✭✭✭✭✭
    Silo is an amazing show if anyone is bored and needs something to watch.
    Will the real LadyGP please stand up.
  • merevie
    ✭✭✭✭✭
    [image attachment]
  • Doctor_Demento
    ✭✭✭
    This is ZOS's idea of "some people" who can't get online.

    [image]

    This is the reality of who can't get online on Planet Earth...

    [image]

    So the truth is NO ONE can log in. Zero...

    [image]
  • FireSoul
    ✭✭
    Hi,

    I'm a Linux Systems Engineer by profession, and I have been through a colo-wide Emergency Power Off event in my time.

    Let me tell you, it's not as simple as just turning stuff back on...
    1. Our colocation center ITSELF was supposed to be our UPS. There's no UPS. If the colo goes out, that's it.
    2. When power was cut off, it didn't take us long to figure out that the colo... disappeared. We basically clown-car'd over to the datacenter and we were there for a long time. The power failure had occurred in the early evening on a Friday, and we spent all night there. We were 5 staff members who rushed over.
    3. When the power came back, ALL of the machines tried to POST and boot at the same time. I don't know if you've ever heard servers, but their fans scream and everything goes full power for a sec. There was a brownout and 2/3 of the hosts were stuck in POST, frozen. Someone had to go around with a crash cart/KVM to check each host's health and force a power cycle. One host at a time. There can be a LOT of hosts in a colo.
    4. Our disaster recovery plan never had a 'cold start' plan prepared, and we had to make one up on the fly. The switches will just power on, but everything else needs to come up in order: storage, database, and caching hosts first; tools and things that talk to the storage hosts next (workhorse hosts, website); once that's up and healthy, proxies come up last, opening the floodgates to services.
    5. Many of the database hosts had corrupted tables that needed SQL table repair after boot. I saw in another thread that there are indeed MySQL hosts involved, so they have my sympathy there. *1000-yard stare*
    6. Some hosts were DOA and wouldn't even power on. Sometimes it was a standby for a given role, so we just let it stay dead until we had time for a replacement. Others were primaries, and we had to force emergency failovers and make sure the old dead primaries stayed dead and didn't just come back to life to mess things up. That left some things a bit out of sync after revival.

    Anyway, we worked all weekend. We had standby hosts to revive or replace and a lot of cleanup to do on the damaged databases, which we had to prioritize.
    When we walked in the office door on Monday, the office staff stood up and gave us a standing ovation.
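    A rough sketch of the staged cold-start ordering described in point 4 of the post above, in Python. The tier names, hostnames, and the TCP liveness probe are hypothetical placeholders, not anyone's real runbook or inventory; the point it tries to capture is that nothing in a later tier starts until every dependency below it answers.

```python
# Rough sketch of a staged cold-start runbook in the spirit of the post above.
# Tier names, hostnames, and the TCP liveness probe are hypothetical examples,
# not anyone's real inventory or tooling.
import socket
import time

# Bring tiers up strictly in dependency order: storage/databases/caches first,
# then the hosts that talk to them, and proxies last so no player traffic
# arrives before the backends are healthy.
BOOT_TIERS = [
    ("storage+db+cache", ["storage01", "db01", "db02", "cache01"]),
    ("workhorse+web",    ["app01", "app02", "web01"]),
    ("proxies",          ["proxy01", "proxy02"]),
]

def is_healthy(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Crude liveness probe: can we open a TCP connection to the host?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def cold_start() -> None:
    for tier, hosts in BOOT_TIERS:
        print(f"--- bringing up tier: {tier}")
        for host in hosts:
            # In reality this step would be an IPMI/iLO power-on or a crash
            # cart visit; here we simply poll until the host answers.
            while not is_healthy(host):
                print(f"waiting on {host} (may need a manual power cycle)")
                time.sleep(30)
            print(f"{host} is up")
        # Do not start the next tier until every host in this one is healthy.

if __name__ == "__main__":
    cold_start()
```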
  • galbreath34b14_ESO
    I'm gonna go out on a limb and say that, over 6 hours into a complete shutdown, keeping the idiotic "All Systems Operational" message up, saying all servers are fine, is going to cause long-term trust erosion with players.

    [screenshot: status page showing "All Systems Operational"]
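    On the status-page point: a static banner can only erode trust if it never reflects reality. Below is a minimal sketch of the kind of active probe a status page could be driven by instead; the hostnames and port are placeholders, not ESO's real endpoints.

```python
# Minimal sketch of an active probe that could drive a status page, instead of
# a static "All Systems Operational" banner. Hostnames and port are
# placeholders, not ESO's real endpoints.
import socket

SERVICES = {
    "NA megaserver login": ("login-na.example.com", 443),
    "EU megaserver login": ("login-eu.example.com", 443),
}

def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Report 'Operational' only if a TCP connection actually succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "Operational"
    except OSError:
        return "Outage"

for name, (host, port) in SERVICES.items():
    print(f"{name}: {probe(host, port)}")
```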

  • arena25
    ✭✭✭✭✭
    Nightmare scenario over at Zeni right now.

    From ZoSKevin:
    Hi all, just providing an update. We are still hard at work getting systems back online. Based on what we know right now, we believe the Megaservers will most likely be offline longer than the original 12 hour estimation. We hope to provide more clarity on timeframe once we have a little more time to complete more work.

    Regarding the scope of work, this issue we ran into today was an edge-case emergency power outage at the data center that did not trigger standard backup failsafes for multiple tenants affected by the outage. (This type of outage is designed to cut ALL power in the event of a fire/flood scenario.) The outage now requires us to do a full reboot of our hardware while recovering from a full loss of power. Rebuilding piece by piece involves a methodical and lengthy process, including additional verification and testing as we bring the hardware online.

    Hopefully this provides some clarity on the work happening right now. Thanks again for the continued patience.

    Advice: Head for bed. Take your first shower in a month, get some actual rest, check back in tomorrow morning.

    At least I can rest easy knowing Zeni won't be the only folks with some major explaining to do...
    Edited by arena25 on December 12, 2024 11:17PM
    If you can't handle the heat...stay out of the kitchen!
  • Sleepsin
    ✭✭✭✭
    I'm gonna go out on a limb and say that, over 6 hours into a complete shutdown, keeping the idiotic "All Systems Operational" message up, saying all servers are fine, is going to cause long-term trust erosion with players.

    [screenshot: status page showing "All Systems Operational"]

    I was just going to mention that. Seems odd.
  • LadyGP
    ✭✭✭✭✭
    FireSoul wrote: »
    Hi,

    I'm a Linux Systems Engineer by profession, and I have been through a colo-wide Emergency Power Off event in my time.

    Let me tell you, it's not as simple as just turning stuff back on...
    1. Our colocation center ITSELF was supposed to be our UPS. There's no UPS. If the colo goes out, that's it.
    2. When power was cut off, it didn't take us long to figure out that the colo... disappeared. We basically clown-car'd over to the datacenter and we were there for a long time. The power failure had occurred in the early evening on a Friday, and we spent all night there. We were 5 staff members who rushed over.
    3. When the power came back, ALL of the machines tried to POST and boot at the same time. I don't know if you've ever heard servers, but their fans scream and everything goes full power for a sec. There was a brownout and 2/3 of the hosts were stuck in POST, frozen. Someone had to go around with a crash cart/KVM to check each host's health and force a power cycle. One host at a time. There can be a LOT of hosts in a colo.
    4. Our disaster recovery plan never had a 'cold start' plan prepared, and we had to make one up on the fly. The switches will just power on, but everything else needs to come up in order: storage, database, and caching hosts first; tools and things that talk to the storage hosts next (workhorse hosts, website); once that's up and healthy, proxies come up last, opening the floodgates to services.
    5. Many of the database hosts had corrupted tables that needed SQL table repair after boot. I saw in another thread that there are indeed MySQL hosts involved, so they have my sympathy there. *1000-yard stare*
    6. Some hosts were DOA and wouldn't even power on. Sometimes it was a standby for a given role, so we just let it stay dead until we had time for a replacement. Others were primaries, and we had to force emergency failovers and make sure the old dead primaries stayed dead and didn't just come back to life to mess things up. That left some things a bit out of sync after revival.

    Anyway, we worked all weekend. We had standby hosts to revive or replace and a lot of cleanup to do on the damaged databases, which we had to prioritize.
    When we walked in the office door on Monday, the office staff stood up and gave us a standing ovation.

    This. Used to work in IT and had to assist some Sys Admins when things went down over the weekend. Yeah, this post is 100% what some poor souls are having to deal with right now at the data center. Truly feel for them... the stress they are under right now... isn't fun.
    Will the real LadyGP please stand up.
  • Tinyfangs
    ✭✭✭
    When you only just got power back today, after Storm Darragh had the lights go out last Friday (almost 6 blooming days without power and water!), only to find ESO is down with its own power failure.

    At this point I am just laughing about it all :D (spent all my tears already on the lost log in rewards and endeavours...)

    Ah well, I need to research generators anyway...

  • Gingaroth
    ✭✭✭
    It's also the point at which even more dishes seem to suddenly manifest in the sink out of nowhere.

    How do you know so accurately what happens in my home? That's almost scary!
  • Destai
    ✭✭✭✭✭
    ✭✭✭
    Just saw the update. Good luck guys, I am sure it's stressful, but just know that you are appreciated for communicating and working hard to get it back up @ZOS_Kevin @ZOS_JessicaFolsom @ZOS_GinaBruno
  • kargen27
    ✭✭✭✭✭
    ✭✭✭✭✭
    Time for a warm soak in the tub.

    [image]

    Or spend the day painting that next masterpiece.

    [image]

    Do not, though, run off and do something rash.

    [image]
    and then the parrot said, "must be the water mines green too."
  • dk_dunkirk
    ✭✭✭✭✭
    FireSoul wrote: »
    Hi,

    I'm a Linux Systems Engineer by profession, and I have been through a colo-wide Emergency Power Off event in my time.

    Let me tell you, it's not as simple as just turning stuff back on...
    1. Our colocation center ITSELF was supposed to be our UPS. There's no UPS. If the colo goes out, that's it.
    2. When power was cut off, it didn't take us long to figure out that the colo... disappeared. We basically clown-car'd over to the datacenter and we were there for a long time. The power failure had occurred in the early evening on a Friday, and we spent all night there. We were 5 staff members who rushed over.
    3. When the power came back, ALL of the machines tried to POST and boot at the same time. I don't know if you've ever heard servers, but their fans scream and everything goes full power for a sec. There was a brownout and 2/3 of the hosts were stuck in POST, frozen. Someone had to go around with a crash cart/KVM to check each host's health and force a power cycle. One host at a time. There can be a LOT of hosts in a colo.
    4. Our disaster recovery plan never had a 'cold start' plan prepared, and we had to make one up on the fly. The switches will just power on, but everything else needs to come up in order: storage, database, and caching hosts first; tools and things that talk to the storage hosts next (workhorse hosts, website); once that's up and healthy, proxies come up last, opening the floodgates to services.
    5. Many of the database hosts had corrupted tables that needed SQL table repair after boot. I saw in another thread that there are indeed MySQL hosts involved, so they have my sympathy there. *1000-yard stare*
    6. Some hosts were DOA and wouldn't even power on. Sometimes it was a standby for a given role, so we just let it stay dead until we had time for a replacement. Others were primaries, and we had to force emergency failovers and make sure the old dead primaries stayed dead and didn't just come back to life to mess things up. That left some things a bit out of sync after revival.

    Anyway, we worked all weekend. We had standby hosts to revive or replace and a lot of cleanup to do on the damaged databases, which we had to prioritize.
    When we walked in the office door on Monday, the office staff stood up and gave us a standing ovation.

    As someone who helped bring a data center online, I don't understand ANY of this. We had redundant EVERYTHING except main power feeds (because of local zoning). Even redundant ISPs and physical drops. We tested our generators, our UPSes, and cooling towers monthly. I don't get it. A colo facility that even HAS an "edge case" scenario is not one I'd trust a million-dollar-a-year business to.
  • Pendrillion
    ✭✭✭✭
    Wow... That sounds serious. The whole infrastructure going down... Holy crap!
  • OutLaw_Nynx
    ✭✭✭✭✭
    ✭✭
    This is bad :(
  • sarahthes
    ✭✭✭✭✭
    ✭✭
    dk_dunkirk wrote: »
    FireSoul wrote: »
    Hi,

    I'm a Linux Systems Engineer by profession, and I have been through a colo-wide Emergency Power Off event in my time.

    Let me tell you, it's not as simple as just turning stuff back on...
    1. Our colocation center ITSELF was supposed to be our UPS. There's no UPS. If the colo goes out, that's it.
    2. When power was cut off, it didn't take us long to figure out that the colo... disappeared. We basically clown-car'd over to the datacenter and we were there for a long time. The power failure had occurred in the early evening on a Friday, and we spent all night there. We were 5 staff members who rushed over.
    3. When the power came back, ALL of the machines tried to POST and boot at the same time. I don't know if you've ever heard servers, but their fans scream and everything goes full power for a sec. There was a brownout and 2/3 of the hosts were stuck in POST, frozen. Someone had to go around with a crash cart/KVM to check each host's health and force a power cycle. One host at a time. There can be a LOT of hosts in a colo.
    4. Our disaster recovery plan never had a 'cold start' plan prepared, and we had to make one up on the fly. The switches will just power on, but everything else needs to come up in order: storage, database, and caching hosts first; tools and things that talk to the storage hosts next (workhorse hosts, website); once that's up and healthy, proxies come up last, opening the floodgates to services.
    5. Many of the database hosts had corrupted tables that needed SQL table repair after boot. I saw in another thread that there are indeed MySQL hosts involved, so they have my sympathy there. *1000-yard stare*
    6. Some hosts were DOA and wouldn't even power on. Sometimes it was a standby for a given role, so we just let it stay dead until we had time for a replacement. Others were primaries, and we had to force emergency failovers and make sure the old dead primaries stayed dead and didn't just come back to life to mess things up. That left some things a bit out of sync after revival.

    Anyway, we worked all weekend. We had standby hosts to revive or replace and a lot of cleanup to do on the damaged databases, which we had to prioritize.
    When we walked in the office door on Monday, the office staff stood up and gave us a standing ovation.

    As someone who helped bring a data center online, I don't understand ANY of this. We had redundant EVERYTHING except main power feeds (because of local zoning). Even redundant ISPs and physical drops. We tested our generators, our UPSes, and cooling towers monthly. I don't get it. A colo facility that even HAS an "edge case" scenario is not one I'd trust a million-dollar-a-year business to.

    It sounds to me like the system that cuts everything out to prevent loss due to water damage kicked in. The one where "welp, it's better to shut down unexpectedly than to short out" comes into play. Basically, the case where you don't WANT backup power to kick in.
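    A conceptual sketch of the distinction sarahthes is drawing, with the caveat that the signal names and logic below are illustrative only and not taken from any real EPO system or from ZOS's data center.

```python
# Conceptual sketch only: the difference between an Emergency Power Off (EPO),
# which deliberately kills everything including backup power, and an ordinary
# utility failure, which is supposed to roll over to UPS and generators.
# Signal names and logic are illustrative, not from any real EPO system.

def power_decision(fire_or_flood_detected: bool, utility_power_ok: bool) -> str:
    if fire_or_flood_detected:
        # Safety event: cut ALL power and bypass UPS/generators so nothing
        # keeps energizing equipment that may be wet or burning.
        return "EPO: everything off, no failover"
    if not utility_power_ok:
        # Normal outage: ride through on batteries until generators take load.
        return "transfer to UPS, then generator"
    return "normal operation"

print(power_decision(fire_or_flood_detected=True, utility_power_ok=True))
print(power_decision(fire_or_flood_detected=False, utility_power_ok=False))
```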
  • Gingaroth
    ✭✭✭
    Rkindaleft wrote: »
    This is what I ate for dinner

    [photo of the poster's dinner]

    That looks great! (Now I'm jealous)
    Edited by Gingaroth on December 13, 2024 12:06AM
This discussion has been closed.