SERVER CRASH AGAIN?!

  • SeptimusDova
    SeptimusDova
    ✭✭✭✭✭
    Kaithuzar said




    Q. I've been working in industry a long time & while I understand some of what you are saying can you clarify a few things?
    Personally I can't remember a single outage on the carrier level due to "solar disturbance".
    Are there notes on this happening or happening frequently at various locations?
    You state "are disruptive to backhaul circuits", are there not multiple pathways & failovers? Is there no redundancy? (both on the carrier side & ZOS's side)

    Traditionally, many larger companies have multiple carriers so if Verizon goes down, AT&T is still up & vice versa; you just have to reroute the traffic & everything still works.
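
The "reroute the traffic" idea in the question can be sketched roughly as below. This is only an illustration: the function name, the carrier list, and the health-check callable are all hypothetical, and in practice this kind of failover is normally handled by routing protocols such as BGP rather than application code.

```python
from typing import Callable, Optional, Sequence


def pick_carrier(carriers: Sequence[str],
                 is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first carrier (in preference order) whose link is up.

    Returns None when every carrier is down, so the caller can decide
    what to do in a total outage.
    """
    for name in carriers:
        if is_up(name):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical link status: the preferred carrier is down.
    status = {"Verizon": False, "AT&T": True}
    chosen = pick_carrier(["Verizon", "AT&T"], lambda name: status.get(name, False))
    print("Routing traffic via:", chosen)
```

Note that this only helps if the two carriers do not share the same physical path somewhere downstream.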


    A. I cannot speak for the Zos side.

    I can speak for the co-located sites, ATC sites, and other .gov-located sites. Microwave sites typically have a set path, and you cannot connect multiple radios to the same hardline or waveguide. *Edited to add: not without some rather specialized equipment, and there is no guarantee it will work.* Solar activity affects the radio sites. It affects the receive front end and the AGC of the equipment. As the spectrum is saturated with white noise approaching S9, digital receivers go into a polling mode to make up for the increased BER. If a dirty signal is generated, then a dirty signal is sent. GIGO.

    I have to be careful about violating any still-pending NDAs and about security issues with PKI. So bear with me.
    Colo microwave sites have fixed and limited resources: racks, power, backup power, battery banks, inverters and, most importantly, limited towers. Waveguide and hardline trays are limited as well.
    Many colo facilities for AT&T are actually in trust to Chase Manhattan bank. As such, you are limited in the building modifications you can make to expand floor space. I worked those sites as well as ATC sites and later on FAA gov sites. Ownership is a murky river. Who owns what and what ties into what can be a mess. I have seen 2.5 GHz analog Collins equipment (circa 1959) alongside a newer Alcatel system followed by an older Nortel RD5 system. You never know what is in a microwave site until you open that door.

    Q. "The OC-whatever & DS-whatever lines that are running underneath are just cabling right? So the cabling shouldn't be affected, it should just be the circuitry in the telco room at the colo & that room should be shielded by being a metal room or something that would not be conductive/affected by the "solar disturbance"."

    A. DS in this case is across a microwave path instead of a copper or fiber line. We do have items we can place "In-Line" that help reduce the effect. What we cannot fix is the ionosphere saturation. Also, waveguide and hardline systems require maintenance, and on some circuits that has to fit into about 5 minutes a year. Example: a 911CLN contracted for 99.999% uptime. I had this issue at the Little Maria site just before I retired: the entire site went dark. 64 hours later we had it up and running again, at a cost of 4.5 mil to AT&T. The reason: sabotage to the generator system and the site as well. We had to replace all of the equipment.
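
For anyone wondering where "5 minutes a year" comes from, here is a quick sketch of the arithmetic behind a 99.999% uptime contract. It assumes a plain 365-day year, and the function name is made up for illustration.

```python
# Convert an availability percentage into an allowed-downtime budget.
# Assumption: a 365-day year (no leap-year adjustment).

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return how many minutes per year a service may be down
    while still meeting the given availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% uptime allows {minutes:.2f} minutes "
              f"({minutes / 1440:.2f} days) of downtime per year")
```

Five nines works out to roughly 5.26 minutes a year, while 99% allows about 3.65 days.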

    This picture should help clarify any questions.

    mGNK7r.jpg




    I managed this site along with many, many more across 9 states. It can get tiring when you have issues with microwave backhaul.
    Edited by SeptimusDova on August 28, 2015 6:52PM
  • BlueGreenMikey
    cjthibs wrote: »
    I am pretty sure the total unplanned downtime experienced does not exceed 3.65 days in the past year, but even if it were 3.65 entire days would still put us at 99% uptime. Which is respectable for an operation this large.

    There are people who play this game other than PC users now. I'd be shocked if it wasn't more than 3.65 entire days for console users just since the XBOX/PS4 launches.

    Ignoring the rest of your unnecessary, unjustified, and unwarranted condescension.
    Edited by BlueGreenMikey on August 28, 2015 6:38PM
  • gard
    gard
    ✭✭✭✭✭
    It cracks me up when people attempt to validate their opinions by saying, "I've been playing since beta"

    It's a fact that internet connectivity is susceptible to failure.
    Let's accept that, and forget for a moment that datacenters generally peer with multiple backbone providers and can manage traffic flow among them relatively easily.

    What's yet to be explained is how a loss of internet connectivity can lead to a server crash, much less require a system restart.
    I have hundreds of linux servers on the east and west coasts. Didn't have to reboot a single one.

    Don't get me wrong, I'm not angry - I accept that things happen. I don't even feel that I'm owed an explanation.
    I'm just curious as to what really happened, because what I've heard so far doesn't add up.

    My wife complains that I never listen to her. (Or something like that.)
    -- I'm a one man smurf zerg!

    My ESO addons:
    Midnight - Find out when midnight is so that you can check for ww/vamp spawn.
    Goto - Adds a tab to the map pane allowing you to teleport to a friend, guildmate, or groupmate for free.
  • me_ming
    me_ming
    ✭✭✭✭✭
    Annalyse wrote: »
    I don't understand this reasoning at all.

    I pay a monthly fee for the use of my internet. Occasionally, something happens and I lose internet for a short period of time. Am I annoyed? Of course. But do I call my ISP and demand they reimburse me for that period in which I was unable to use the internet? No.

    But there's a major, major distinction here.

    "Occasionally, something happens and I lose internet for a short period of time."

    Yeah, I wouldn't complain about that either. I wouldn't demand money back for that either.

    ESO's servers go down with some regularity, however. Players repeatedly complain about issues of getting booted in the middle of playing the game. Though it has been more reliable lately, weekend outages at peak popularity times still happen. And then there's the tone-deaf "customer service" response of blaming everyone else---Microsoft/Sony, your ISP, you---rather than taking responsibility for their own actions unless pressed.

    (Also, most of the Internet was working last night. When ESO went down, many of us simply went and did other stuff. I watched Netflix, which worked without a hitch. Literally zero of the web technologies I tried last night other than ESO were broken. I don't know if it is because other companies anticipate problems better and come up with redundancies and plans. But given that and everything that we know about ESO and the lack of server reliability, I'd say that this has to do with more than just bad luck last night.)

    I personally wouldn't demand money back or anything for what happened last night, or probably ever. I'm not a subscriber, and I feel like I've already gotten my $60 worth out of the game, so everything is gravy now. But it's not ridiculous, especially for subscribers, to see yet another outage (whether it be last night or the next one) as the last straw. Our copies of ESO might as well be paperweights every time that ESO has an outage.

    For many of us (me included), ESO is the most unreliable technology we use, and there's nothing else that even comes close.

    Yeah, basically you saying "Literally zero of the web technologies I tried last night other than ESO were broken." is pretty hilarious, not because it's not true. I don't doubt that everything else was working on your "web technologies" other than ESO, but the fact that you don't get why ESO wasn't working is pretty funny. lol. Anyways, I'm pretty sure people [in this thread] have already extensively explained why ESO was down despite the fact that you can go online. It's up to you if you want to read and at least try to understand.
    Edited by me_ming on August 28, 2015 6:52PM
    "We're heroes, my boon companion, and heroes always win! Let that be a lesson to you."
    -Caldwell, "The Final Assault"

    "There is always a choice. But you don't get to choose what is true, you only get to choose what you will do about it..."

    -Abnur Tharn, "God of Schemes"
  • cjthibs
    cjthibs
    ✭✭✭✭✭
    cjthibs wrote: »
    I am pretty sure the total unplanned downtime experienced does not exceed 3.65 days in the past year, but even if it were 3.65 entire days would still put us at 99% uptime. Which is respectable for an operation this large.

    There are people who play this game other than PC users now. I'd be shocked if it wasn't more than 3.65 entire days for console users just since the XBOX/PS4 launches.

    Ignoring the rest of your unnecessary, unjustified, and unwarranted condescension.

    I can't speak for console users since I'm not one.
    I do remember hearing that some of those outages were related to their respective console services. (XBL/PSN)

    I probably also should've mentioned that, as far as I'm aware, ZOS makes no such guarantee of uptime anyhow. Which makes all of this purely academic.

    Disagreement ≠ Condescension.
  • ColoursYouHave
    ColoursYouHave
    ✭✭✭✭✭
    I, for one, am outraged, and fully expect compensation.

    By my estimations, the server was down between 11PM and 6AM, for a total of around 7 hours downtime. Despite the fact that I was asleep for a large portion of that downtime, I was still unable to use a service that I had paid for. At $15 per month, I end up paying about $.50 per day, or a little over $.02 per hour, so I demand you reimburse me the $.14 that I lost during this downtime in order to make things right!

    And yes, that 14 cents really is important to me, and I'm not just saying all of this as a passive-aggressive way to express my frustrations towards the company for something they could not have avoided!
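
For the record, the refund math above can be checked with a few lines. This is just a sketch of the poster's own arithmetic, assuming a 30-day billing month; the function name is invented for illustration.

```python
def prorated_refund(monthly_fee: float, outage_hours: float,
                    days_in_month: int = 30) -> float:
    """Return the share of the monthly fee that covers the outage hours.

    Assumption: the fee is spread evenly over every hour of a
    `days_in_month`-day month.
    """
    hourly_rate = monthly_fee / (days_in_month * 24)
    return hourly_rate * outage_hours


if __name__ == "__main__":
    refund = prorated_refund(monthly_fee=15.0, outage_hours=7)
    print(f"Refund owed: ${refund:.4f}")
```

Without rounding the hourly rate down to 2 cents first, 7 hours comes to a bit under 15 cents, so the claim is off by most of a penny.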
  • Ravici
    Ravici
    ✭✭
    You guys (and gals, *fluttering eyelashes @ Gina*), have no idea about frustration!

    So, here's me last night, can't play, watching this thread and in the early hours trying to keep up with the darn thing. It may have slowed down later on but for the first couple hours, the post counts were nuts. Refresh the page to see new posts just to find you had two new pages of posts to get through :/

    So I'm a hexbox player and of course that means we can play without giving our email details to one of my most loathed companies right now so they don't bombard me with junk that I want absolutely nothing to do with... problem!

    So, here's the thing, until said company has your details, you are not capable (allowed doesn't even come into it) of posting in said hyperactive thread.

    Not only could I not play the game, I couldn't even communicate on the forums! but... can it get any worse? Of course it can!

    Here's the extra kicker. Not only were the game servers down, but the signup/account servers were too. Obviously, those keeping up with my lame excuse of a post so far will by now be realising that in order to post about the problem, I had to get past the problem in the first place. Talk about chicken and egg :lol: (Ok, you put a colon followed by a capital D, worldwide general term for a laughing smiley, and it gives you... disappointed???? sheesh!)

    So, after all that, Hi there from me. I promised I would never touch another Elder Scrolls game after the abortion that was Skyrim, which killed off what was once a favourite series, but I am actually enjoying this latest offering (even if I do *** and moan about it because in places it still sucks hehe) enough to maybe come back to the fold.

    Oh, and anyone saying this wasn't their fault, the buck has to stop and this is where it does. Enough with this passing the blame game. For those saying the internet was affected worldwide... get a grip. That's just marketing speak for a get out of jail card. For those talking about there being only the one game server and the European one being a sham... the game servers may be in different places, but we all need to go through an initial account server and it would be beyond silly to have two of those running all the time ;)

    You can put your tin foil hats in the bin on the way out :)

    Hi all *waves like a loony*
    Edited by Ravici on August 28, 2015 7:21PM
    Rafe Harwood on heXbox European server
  • danew6
    danew6
    ✭✭✭
    On the plus side the game kicking me out last night meant I went to bed earlier.
    I play on PS4. I do like the game, I just feel it could be a lot better, and I would like to feel like the developers actually care about the game and not just about making money.
  • SeptimusDova
    SeptimusDova
    ✭✭✭✭✭
    I am still trying to wake up.
  • stevenbennett_ESO
    stevenbennett_ESO
    ✭✭✭
    I can't believe I ate read the whole thread…

    Man… I go to sleep early for just one night instead of playing ESO late as I usually do, and *this* thread happens. Why do I always miss all the awesomeness? :tongue:

    Still, huge kudos to Gina especially for herding the cats in this thread and keeping things entertaining and fairly light in tone. As long as this thread was, it was still a highly entertaining read… :grin:
  • Spottswoode
    Spottswoode
    ✭✭✭✭✭
    I am still trying to wake up.

    61UX8ue-B3L._SL1000_.jpg
    Still coherent. :)
    Edited by Spottswoode on August 28, 2015 7:22PM
    Proud Player of The Elder Bank Screen Online.
    My khajiit loves his moon sugar.
    Steam Profile
    Libertas est periculosum. Liberum cogitandi est haeresis. Ergo, et ego terroristis.
    Current main PC build:
    i7 3770 (Not overclocking currently.)
    MSI Gaming X GTX 1070
    32gb RAM

    Laptop:
    i7-7700HQ
    GTX 1060
    16gb RAM

    Secondary build:
    i3 2330
    GTX 660
    8gb RAM
  • Elsonso
    Elsonso
    ✭✭✭✭✭
    ✭✭✭✭✭
    gard wrote: »
    What's yet to be explained is how a loss of internet connectivity can lead to a server crash, much less require a system restart.

    I have hundreds of linux servers on the east and west coasts. Didn't have to reboot a single one.

    The servers probably did not crash. They were probably all running normally and the monitors were showing a significant decrease in the number of players that were playing as each player lost connectivity.

    They rebooted the servers as part of manual recovery step, and I would guess that it was due to thousands of players that were suddenly disconnected in the middle of whatever they were doing. The server reboot probably acts to clean things up so that when players log back in they do not encounter any problems.
    XBox EU/NA:@ElsonsoJannus
    PC NA/EU: @Elsonso
    PSN NA/EU: @ElsonsoJannus
    Total in-game hours: 11321
    X/Twitter: ElsonsoJannus
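
Elsonso's guess above (a restart used to clear out sessions left hanging by a mass disconnect) can be illustrated with a small sketch. Nobody in this thread knows how ZOS actually handles it, so the heartbeat timestamps, the 60-second timeout, and the function name here are purely hypothetical.

```python
from typing import Dict, Tuple


def split_stale_sessions(last_seen: Dict[str, float], now: float,
                         timeout: float = 60.0
                         ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split sessions into (live, stale) by time since the last heartbeat.

    `last_seen` maps a session id to the timestamp (in seconds) of its
    most recent heartbeat. A session is stale when it has been silent
    for longer than `timeout` seconds.
    """
    live = {sid: ts for sid, ts in last_seen.items() if now - ts <= timeout}
    stale = {sid: ts for sid, ts in last_seen.items() if now - ts > timeout}
    return live, stale


if __name__ == "__main__":
    # Hypothetical sessions: two players dropped when the link went down.
    sessions = {"player-1": 1000.0, "player-2": 880.0, "player-3": 700.0}
    live, stale = split_stale_sessions(sessions, now=1010.0)
    print("Still connected:", sorted(live))
    print("Needs cleanup:", sorted(stale))
```

With thousands of sessions going stale at once, a full restart may simply be the safest way to get back to a known-good state.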
  • SeptimusDova
    SeptimusDova
    ✭✭✭✭✭
    I am still trying to wake up.

    61UX8ue-B3L._SL1000_.jpg
    Still coherent. :)

    I am going to google and order that.
    Edited to add. Just ordered 5 pound bag. nice price.
    Edited by SeptimusDova on August 28, 2015 7:25PM
  • Greeniewolfub17_ESO
    cjthibs wrote: »
    I am pretty sure the total unplanned downtime experienced does not exceed 3.65 days in the past year, but even if it were 3.65 entire days would still put us at 99% uptime. Which is respectable for an operation this large.

    There are people who play this game other than PC users now. I'd be shocked if it wasn't more than 3.65 entire days for console users just since the XBOX/PS4 launches.

    Ignoring the rest of your unnecessary, unjustified, and unwarranted condescension.

    you're the one being condescending. He was just giving you the information you needed to stop being ignorant and being very tolerant and patient about it. Much more so than you deserve.
    Me: "Okay lets run to Alessia. Mount up and follow me!"
    Me five seconds later: "Um yeah... totally forgot about that cliff..."
  • stevenbennett_ESO
    stevenbennett_ESO
    ✭✭✭
    Oh, and for all those posting their confusion about how server X could be affected, etc… In this day and age, network topology is extremely complex even when you're talking about small systems. Databases often run on separate instances in the cloud from the servers which need to access them, and that communication happens through the internet. When you start getting into multiple interacting servers, the communications between the servers happen through the internet. Often just the communications channels are complex, with server group A communicating to a bridge which is tunneling through the internet to server group B, C, D, etc. And that happens even with fairly simple websites and the like. There's a LOT of communication across the net which needs to happen or things stop working. And often, if those communications fail for more than just a few seconds, it causes program crashes or abnormal terminations, which then cascade to other failures.

    It's often possible (and common) to have redundant backup communications, but when the problem happens in a major internet backbone, there's a decent chance your backups go through the same backbone -- it's out of your control if that happens. Usually the backbones have some redundancy, but the redundant circuits can get overloaded and drop a lot of traffic -- something which has low traffic needs might work fine with a bit of a slowdown, and if you've got a sufficiently distributed system, you might still work because your clients can go to alternate data servers, but when you get into MMOs, that's another animal.
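
The "your backups go through the same backbone" problem can be shown with a tiny sketch that compares two routes hop by hop. The hop names below are invented for illustration, and real paths would have to come from something like traceroute output or carrier documentation.

```python
from typing import List, Sequence


def shared_hops(primary: Sequence[str], backup: Sequence[str]) -> List[str]:
    """Return the hops that appear on both paths, in primary-path order.

    Any hop in the result is a shared point of failure: if it goes
    down, the primary and the backup path go down together.
    """
    backup_hops = set(backup)
    return [hop for hop in primary if hop in backup_hops]


if __name__ == "__main__":
    # Hypothetical routes from a datacenter to a player's ISP.
    primary = ["dc-edge-1", "carrier-a-pop", "backbone-x", "player-isp"]
    backup = ["dc-edge-2", "carrier-b-pop", "backbone-x", "player-isp"]
    print("Shared points of failure:", shared_hops(primary, backup))
```

Two carriers on paper do not guarantee two independent paths, which is why truly diverse routing has to be checked rather than assumed.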

    MMO network topology, in particular, is a massive spiderweb of cross communications, some between local servers, some across the net to remote servers, some to databases which may be local or remote, some to admin / login servers which may be local / remote, with status monitoring, and synchronization issues which make normal network administrators shy away in terror at the complexity of it all. And unlike, say, a streaming server, or your average website, ALL of that communication needs to run at a fairly consistent high speed or the MMO fails.

    Fortunately, an MMO is a game, not something critical. People will survive an 8 hour outage. OTOH, I'm sure there are a lot of companies who are dealing with some very serious issues this morning due to the outage last night.
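
On the point above about communications failing for more than a few seconds and crashing programs: one common way to soften that is to retry with exponential backoff instead of dying on the first dropped connection. The sketch below only illustrates that general technique. The function name and parameters are hypothetical, and nothing here reflects how ESO's servers are actually written.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(operation: Callable[[], T],
                    attempts: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on ConnectionError with exponential backoff.

    The wait doubles after each failure (base_delay, 2x, 4x, ...).
    The final failure is re-raised so the caller can still decide to
    shut down cleanly. `sleep` is injectable so the delays can be
    observed or skipped in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_query() -> str:
        # Hypothetical remote call that fails twice, then recovers.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("link down")
        return "ok"

    print(call_with_retry(flaky_query, base_delay=0.01))
```

Retries only cover short blips. For an outage lasting hours the process still has to give up eventually, which fits with the manual recovery described in this thread.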
  • QuebraRegra
    QuebraRegra
    ✭✭✭✭✭
    I have responsibility for services domestic and foreign, with a significant portion being provided via LEVEL3, and AT&T carriers.

    Total number of outages/interruptions to services last night... ZERO. This is very notable, as while most of the network connections are designed with high availability/redundancy in mind, some of the links are single homed, and none appeared to have been affected (even LEVEL3 services in the Houston area, etc.).

    Is it possible that a network interruption could have impacted server performance?... Possibly, but I'd take a serious look at how my processes function in relation to network continuity were that the case.

    That stated, Gina's participation was appreciated none the less.



  • Ratbert
    Ratbert
    ✭✭
    I have responsibility for services domestic and foreign, with a significant portion being provided via LEVEL3, and AT&T carriers.

    Total number of outages/interruptions to services last night... ZERO. This is very notable, as while most of the network connections are designed with high availability/redundancy in mind, some of the links are single homed, and none appeared to have been affected (even LEVEL3 services in the Houston area, etc.).

    Is it possible that a network interruption could have impacted server performance?... Possibly, but I'd take a serious look at how my processes function in relation to network continuity were that the case.

    That stated, Gina's participation was appreciated none the less.

    I have responsibilities for Level 3 communications connections for my company out of Dallas. I can tell you we WERE affected by this outage. Some services were available to us and others were not. Many websites were available to us to browse and many weren't. Our hosted exchange was one of those services affected and unreachable, yet I could browse Bestbuy.com for a new TV but not get to facebook.
  • NDwarf
    NDwarf
    ✭✭✭
    Man what the...

    How did a server down thread become an E-Peen thread for IT guys?

    I'd say SeptimusDova is winning tho. He posted the largest wall of text and it has images. You other IT guys need to step up your game or he'll be crowned Emperor of IT.
    "When people !@# with you you !@# with them ten times worse. Next thing you know, you're in a motel room with 24 beers and a half bucket of chicken. You see, that's how you get things done." Ricky, Trailer Park Boys.
  • QuebraRegra
    QuebraRegra
    ✭✭✭✭✭
    Ratbert wrote: »
    I have responsibility for services domestic and foreign, with a significant portion being provided via LEVEL3, and AT&T carriers.

    Total number of outages/interruptions to services last night... ZERO. This is very notable, as while most of the network connections are designed with high availability/redundancy in mind, some of the links are single homed, and none appeared to have been affected (even LEVEL3 services in the Houston area, etc.).

    Is it possible that a network interruption could have impacted server performance?... Possibly, but I'd take a serious look at how my processes function in relation to network continuity were that the case.

    That stated, Gina's participation was appreciated none the less.

    I have responsibilities for Level 3 communications connections for my company out of Dallas. I can tell you we WERE affected by this outage. Some services were available to us and others were not. Many websites were available to us to browse and many weren't. Our hosted exchange was one of those services affected and unreachable, yet I could browse Bestbuy.com for a new TV but not get to facebook.

    Then I guess the lesson here is to pay for and ensure diversely routed/provided carrier services? What about server virtualization at a remote or offsite?

    Overall, considering that we're talking about a videogame here, I guess the service metrics are acceptable. Wouldn't fly on my network.


  • Ratbert
    Ratbert
    ✭✭
    Then I guess the lesson here is to pay for and ensure diversely routed/provided carrier services? What about server virtualization at a remote or offsite?

    Overall, considering that we're talking about a videogame here, I guess the service metrics are acceptable. Wouldn't fly on my network.

    Not sure what the lesson is to be honest haha. Just saying what our experience was. I'm not certain anyone knows the topology of these carriers enough to definitively say one way or the other. I don't know how ESO's datacenters are built, how their traffic is routed, or even if they have a datacenter and aren't buying infrastructure from some Co-Lo.

    I only know what I have access to and even my Level 3 rep for my account wasn't hinting at what was going on, only that they had service interruptions affecting north america which was the same message on their automated IVR when I called.

    Didn't feel to me like a hardware issue though. Felt more software or configuration driven.
    Edited by Ratbert on August 28, 2015 9:03PM
  • Darlgon
    Darlgon
    ✭✭✭✭✭
    Edited by Darlgon on August 29, 2015 1:16AM
    Power level to CP160 in a week:
    Where is the end game? You just played it.
    Why don't I have 300+ skill points? Because you skipped content along the way.
    Where is new content? Sigh.
  • lathbury
    lathbury
    ✭✭✭✭✭
    Have a try using a better ISP. Sick of these people on dial-up slating the game when it runs fine.
  • ghostwise
    ghostwise
    ✭✭✭
    Yeah me and my buddies were experiencing heavy lag a short while ago. Very similar before it crashed last night.
  • Inactive Account
    Inactive Account
    ✭✭✭✭
    This may not be totally over yet; don't be surprised if the servers go down again sometime in the near future.

    The system is still weak...according to...https://downdetector.com/status/level3/map/

    Read the comments at the bottom....
    ghostwise wrote: »
    Yeah me and my buddies were experiencing heavy lag a short while ago. Very similar before it crashed last night.

    Just saying
    Edited by Inactive Account on August 29, 2015 2:15AM
  • Knaxia
    Knaxia
    ✭✭✭
    Everyone had major lag spikes in Cyro (and not the usual ones) including people getting disconnected earlier, I didn't get anything but even our newly crowned emp had issues doing anything.
  • vaagventje17eb17_ESO
    Knaxia wrote: »
    Everyone had major lag spikes in Cyro (and not the usual ones) including people getting disconnected earlier, I didn't get anything but even our newly crowned emp had issues doing anything.

    omg not the emp! .......

  • nine9six
    nine9six
    ✭✭✭✭✭
    Again.
    Wake up, we're here. Why are you shaking? Are you ok? Wake up...
  • BlueGreenMikey
    Just happened to me too. Lag went crazy for about 3 minutes, got booted, and now I can't reconnect. (XBOX NA for me.)
    Edited by BlueGreenMikey on August 30, 2015 9:56PM
  • Reverb
    Reverb
    ✭✭✭✭✭
    ✭✭✭✭✭
    nine9six wrote: »
    Again.

    Yep, PC NA down
    Battle not with monsters, lest ye become a monster, and if you gaze into the abyss, the abyss gazes also into you. ~Friedrich Nietzsche
  • Tiebearion
    Tiebearion
    ✭✭
    A brave hamster died today, can we show some respect please!......RIP Mr. Jiggles
    Glory to the Pact!