ESO Performance and Lag - Technical Discussion

RinaldoGandolphi
Introduction

After many months of research, my own internal testing, and study of protocols such as TCP/IP, this article and the links included give a deep analysis of "why ESO lags and what must be done to fix it." In my day job I'm a network administrator with over 10 years of experience in Cisco, Linux, Windows, and BSD-based administration.

Some of my past articles on this site include:

PSA-Please Don't Use Port Forwarding for ESO

This post will be very technical in nature, and I HIGHLY recommend reading every article I link before posting comments, as they are required to understand the deep inner workings of the network, especially on the TCP side of things. This post will primarily deal with the PC version of ESO.

The console versions of the game are very different, as Xbox Live and PSN have different requirements for how network connections and protocols work, and I don't own either of those consoles to check whether they use the same protocols the PC version does.

I'm trying to break this down into simple terms that someone without a degree or a long background in networking and protocols can understand, as the entire intent of this post is educational.

That being said, let's get started.

What Protocol does ESO use on PC?

ESO mostly uses the TCP protocol on PC. As can be seen in the ESO Ports knowledge base article, ESO uses the following ports on PC:
  • TCP / UDP Ports 24100 through 24131
  • TCP / UDP Ports 24500 through 24507
  • TCP / UDP Ports 24300 through 24331
  • TCP Port 80
  • TCP Port 433

ESO doesn't appear to use UDP on PC from what I can see; it seems to rely solely on TCP for its connections. The consoles may be different, as Xbox Live and PSN are more closed environments than PC and may have different network requirements.

Why is the use of TCP important?

The TCP protocol has network congestion control algorithms built into it. It was designed this way to keep a network from falling over under normal working conditions. Why is this important?

It's important because of Packet Loss.

What is Packet Loss?

Packet loss is when a packet sent from your network to a remote network does not reach its destination. It can be caused by a variety of factors: an overloaded router, a bad network cable or NIC, or a server that is overloaded and simply can't service any more requests (receive any more packets), so the packets are simply dropped.
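
To put a number on packet loss, here is a minimal Python sketch. It assumes the sender numbers its packets 0, 1, 2, ... (a made-up scheme for illustration; the function name `loss_rate` is mine, and none of this comes from ESO's client), and it simply counts which numbers never showed up.

```python
def loss_rate(received_seqs, sent_count):
    """Fraction of packets that never arrived, given the sequence numbers seen.

    Assumes the sender numbered packets 0 .. sent_count - 1 (a hypothetical
    scheme for illustration only).
    """
    if sent_count <= 0:
        raise ValueError("sent_count must be positive")
    # A set ignores duplicates, so a packet that arrives twice counts once.
    arrived = {s for s in received_seqs if 0 <= s < sent_count}
    return (sent_count - len(arrived)) / sent_count


if __name__ == "__main__":
    # 10 packets sent; packets 3 and 7 never arrive.
    seen = [0, 1, 2, 4, 5, 6, 8, 9]
    print(f"loss: {loss_rate(seen, 10):.0%}")  # loss: 20%
```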

Why is Packet Loss important to TCP?

By default TCP assumes that packet loss is caused by congestion. This means that if the sender doesn't receive an acknowledgment within X amount of time, TCP considers the packet lost, cuts throughput, and resends the missing packet, keeping throughput reduced until it catches up. This means you can have spikes of 1000+ ms latency that last much longer than they would if another protocol, such as a custom UDP implementation with latency mitigation algorithms, were used. This is why engines such as Unreal Engine 3 do so well in the online gaming world.

A single lost packet is enough to first cause a 1000+ ms delay and then inflate the subsequent round trips, which only slowly return to normal as TCP ramps the allowed throughput back up.
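
Here is a rough Python sketch of why one lost packet hurts everything behind it. It is a toy model, not real TCP: the 1000 ms retransmission timeout is an assumed value chosen to match the figure above (real stacks compute it from the measured round-trip time and can also recover faster through fast retransmit), and all function names are my own.

```python
def arrival_times(n, interval_ms, delay_ms, lost, rto_ms):
    """When each packet reaches the receiver; the lost one arrives only after a resend."""
    times = []
    for i in range(n):
        sent = i * interval_ms
        if i == lost:
            sent += rto_ms  # first copy dropped, resent when the timer fires
        times.append(sent + delay_ms)
    return times


def tcp_like_delivery(arrivals):
    """In-order delivery: nothing is handed to the game before its predecessors."""
    delivered, latest = [], 0
    for t in arrivals:
        latest = max(latest, t)
        delivered.append(latest)
    return delivered


def udp_like_delivery(arrivals, lost):
    """No retransmission: the lost packet is simply skipped (None)."""
    return [None if i == lost else t for i, t in enumerate(arrivals)]


if __name__ == "__main__":
    # 6 packets, one every 50 ms, 40 ms one-way delay, packet 1 lost, assumed 1000 ms timeout.
    arrivals = arrival_times(6, 50, 40, lost=1, rto_ms=1000)
    print(tcp_like_delivery(arrivals))     # [40, 1090, 1090, 1090, 1090, 1090]
    print(udp_like_delivery(arrivals, 1))  # [40, None, 140, 190, 240, 290]
```

In the TCP-like case every packet after the lost one is held back until the resend lands, even though those packets arrived on time.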

What does the ESO ping indicator actually show when it's red?

The in-game ping indicator shows response time. When the indicator turns red, you don't have a 999+ ping; what you have is packet loss. This means the server simply can't handle any more of the requests being sent to it, so all it can do is drop the packet. Imagine overloading a Cisco router to 100% CPU utilization by flooding it with packets and then trying to reach its admin interface over the network... good luck. It simply can't handle any more, and packets are just dropped as there are no resources left to issue a response of any kind.

This is what happens in ESO PVP when lag gets really bad. As more and more players push buttons for skills (sending requests to the server), the server simply can't handle any more. Since your skills won't fire until the client gets a response from the server, when things are really laggy you simply can't get anything to work. The 999+ red ping indicator is telling you about massive packet loss.

No amount of "LOS checks" and "changing ability animations" is going to fix this issue.
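
The overload situation can be pictured with a toy queue model in Python. This is only a sketch under my own assumptions (a fixed number of requests served per tick and a fixed queue size; the numbers and the name `simulate_server` are invented) and says nothing about how the ESO servers are actually built.

```python
def simulate_server(arrivals_per_tick, capacity_per_tick, queue_limit):
    """Return (processed, dropped) for a server with a bounded request queue."""
    queue = 0
    processed = dropped = 0
    for arriving in arrivals_per_tick:
        # Requests that do not fit in the queue are dropped outright.
        accepted = min(arriving, queue_limit - queue)
        dropped += arriving - accepted
        queue += accepted
        # The server can only work through so many requests per tick.
        served = min(queue, capacity_per_tick)
        queue -= served
        processed += served
    return processed, dropped


if __name__ == "__main__":
    # Quiet ticks, then a big fight where everyone spams skills, then quiet again.
    load = [50, 50, 200, 200, 200, 50]
    print(simulate_server(load, capacity_per_tick=100, queue_limit=150))  # (500, 250)
```

Once arrivals exceed what the server can work through, the queue fills and everything beyond it is lost, no matter how good the player's own connection is.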

What will Fix ESO lag issues for good?

Removing the dependence on TCP, writing the game's network code on an RUDP (Reliable UDP) protocol, and layering in latency mitigation algorithms.

Example:
With TCP, the lag may never dissipate once it starts, as it actually requires more bandwidth to catch up. This is why restarting a connection corrects the issue.

A perfect example of this is when TCP gets a backlog stuck in its buffer. Say each packet contains the location of every other player on the map. When you are over your bandwidth, losing that packet is actually a big advantage. With TCP you still need to read and process out-of-date information when suffering from packet loss. With UDP the last packet you receive always contains the most up-to-date information.

The second part of this simple example is how you catch up again. The reason you're probably suffering from packet loss is drops due to queues overflowing on a router because the link is overloaded. In order to resend the data, you now need to send the original data and the new data to catch up, so you are at a serious disadvantage.
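
The "newest packet wins" idea from the example can be sketched in a few lines of Python. It assumes each position update carries a sequence number added by the game's own protocol (UDP itself has none); the class and method names are hypothetical, and sequence number wraparound is ignored to keep it short.

```python
class LatestStateReceiver:
    """Keeps only the newest position snapshot; older or duplicate ones are ignored."""

    def __init__(self):
        self.last_seq = -1
        self.positions = {}

    def on_datagram(self, seq, positions):
        """Apply a snapshot if it is newer than what we have. Returns True if applied."""
        if seq <= self.last_seq:
            return False  # stale or duplicate: newer data is already in place
        self.last_seq = seq
        self.positions = dict(positions)
        return True


if __name__ == "__main__":
    rx = LatestStateReceiver()
    rx.on_datagram(1, {"enemy": (0, 0)})
    # Snapshot 2 is delayed on the network; snapshot 3 arrives first and is used right away.
    rx.on_datagram(3, {"enemy": (2, 0)})
    # Snapshot 2 finally shows up, but it is out of date and gets thrown away.
    print(rx.on_datagram(2, {"enemy": (1, 0)}), rx.positions)  # False {'enemy': (2, 0)}
```

Nothing is ever resent and nothing waits in a backlog, which is the advantage described above for data that goes stale quickly.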

As you can see, one lost or dropped packet just snowballs with TCP, leading to the 999+ lag you see in PVP and some PVE instances, due to the lack of algorithms (outside of standard TCP congestion controls) to address packet loss.

TCP is a Layer 4 protocol in the OSI model. It's designed to handle latency as a common network issue (meaning congestion controls), not the issues associated with games.

The 1st step to fixing the lag issues with ESO is to rewrite the netcode using RUDP (Reliable UDP). This would allow ZOS to write their own latency controls into a custom protocol, which would make it something of a mix: Layer 4 (UDP) with custom code built on top of it to handle these issues (Layer 7 functionality).

As the best example, say you have one packet dropped. Instead of having to resend 2 packets, the server would send you back one that contains all the data you need, which by itself would theoretically reduce stress on their systems by 50%.
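
One way the "one packet that contains all the data you need" idea could look is sketched below in Python. This is my own illustration of a common reliable-over-UDP pattern (every outgoing packet carries all messages that have not been acknowledged yet), run purely in memory with no sockets; it is not a description of any protocol ZOS uses, and all names are hypothetical.

```python
class ReliableSender:
    """Every outgoing packet carries all reliable messages not yet acknowledged."""

    def __init__(self):
        self.next_id = 0
        self.unacked = {}  # message id -> payload

    def queue(self, payload):
        msg_id = self.next_id
        self.next_id += 1
        self.unacked[msg_id] = payload
        return msg_id

    def build_packet(self):
        """One datagram holding every message still waiting for an ack."""
        return sorted(self.unacked.items())

    def on_ack(self, acked_ids):
        for msg_id in acked_ids:
            self.unacked.pop(msg_id, None)


class ReliableReceiver:
    def __init__(self):
        self.seen = set()
        self.delivered = []

    def on_packet(self, packet):
        """Deliver each new message once and return the ids to acknowledge."""
        for msg_id, payload in packet:
            if msg_id not in self.seen:
                self.seen.add(msg_id)
                self.delivered.append(payload)
        return [msg_id for msg_id, _ in packet]


if __name__ == "__main__":
    tx, rx = ReliableSender(), ReliableReceiver()
    tx.queue("cast A")
    tx.build_packet()           # this datagram is lost on the way
    tx.queue("cast B")
    packet = tx.build_packet()  # carries both "cast A" and "cast B"
    tx.on_ack(rx.on_packet(packet))
    print(rx.delivered, tx.unacked)  # ['cast A', 'cast B'] {}
```

The lost message rides along in the next regular packet instead of needing its own resend, and the receiver never stalls waiting for it.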

Conclusion

I think the 1st step towards fixing ESO lag issues lies in the netcode and its reliance on TCP. I think a custom RUDP (Reliable UDP) implementation would go a long way toward reducing server overhead and making the game lag significantly less in high-stress situations. TCP simply isn't the best choice for a game of ESO's scale, and this becomes woefully apparent on the PVP side of the game. FPS issues and the like on the client side can be addressed later down the pipe, and may not even need to be addressed at all if the underlying network code is moved to an RUDP implementation that lets it scale significantly higher than what can be achieved with TCP. Such a move would also give ZOS far more granular error and packet loss control options, allowing them to write truly robust netcode that can handle a large player base at scale.

Good luck, and thank you for your time.

Useful References
https://en.wikipedia.org/wiki/List_of_network_protocols_(OSI_model)#Layer_7_.28Application_Layer.29
https://en.wikipedia.org/wiki/OSI_model#Layer_4:_Transport_Layer
http://www.freebsd.org/cgi/man.cgi?query=polling
https://1024monkeys.wordpress.com/
https://1024monkeys.wordpress.com/2014/04/08/udp-vs-tcp-a-follow-up/
Rinaldo Gandolphi-Breton Sorcerer Daggerfall Covenant
Juste Gandolphi Dark Elf Templar Daggerfall Covenant
Richter Gandolphi - Dark Elf Dragonknight Daggerfall Covenant
Mathias Gandolphi - Breton Nightblade Daggerfall Covenant
RinaldoGandolphi - High Elf Sorcerer Aldmeri Dominion
Officer Fire and Ice
Co-GM - MVP



Sorcerer's - The ONLY class in the game that is punished for using its class defining skill (Bolt Escape)

"Here in his shrine, that they have forgotten. Here do we toil, that we might remember. By night we reclaim, what by day was stolen. Far from ourselves, he grows ever near to us. Our eyes once were blinded, now through him do we see. Our hands once were idle, now through them does he speak. And when the world shall listen, and when the world shall see, and when the world remembers, that world will cease to be. - Miraak

  • Sallington
    My gut feeling is that ZOS already knows this, and just doesn't have (or won't spend) the resources to fix it. I find it hard to believe that any competent network/system admin wouldn't have been able to pinpoint exactly the bottleneck you've detailed and deliver that information to management.

    Network admin: "Sir, here's our bottleneck. Here's the cost of the solution."

    Manager: "Nope. Not worth it. Figure something else out that doesn't cost us any money."

    That's how I assume those conversations have gone, and why they are trying such menial things as removing deer or torchbugs from Cyrodiil to try and help performance.
    Edited by Sallington on March 28, 2016 5:20PM
    Daggerfall Covenant
    Sallington - Templar - Stormproof - Prefect II
    Cobham - Sorcerer - Stormproof - First Sergeant II
    Shallington - NightBlade - Lieutenant |
    Balmorah - Templar - Sergeant ||
  • Sugaroverdose
    I don't really think that they don't see in their analytics that they have insane packet loss; it is basic networking after all.
    But mostly I agree, TCP is not good for near-realtime games like ESO.
  • hrothbern
    A technical comparison between TCP and UDP
    see also this link:
    http://javarevisited.blogspot.nl/2014/07/9-difference-between-tcp-and-udp-protocol.html

    The main difference is that TCP guarantees packet delivery at slow speed
    and UDP offers high speed at the expense of some lost packets.

    Considering that in LAG situations guaranteed packet delivery makes no sense because it is already too late, it is no wonder that UDP seems to be favored for MMOs.

    However, that is what the linked article states.
    Do we have factual evidence of what is currently best practice for MMOs?
    Which MMOs use which protocol?


    "I still do not understand why I followed the advice of Captain Rana to bring the villagers of Bleakrock into safety. We should have fought for our village and not have backed down, with our tail between our legs. Now my home village is in shambles, the houses burning, the invaders feasting.I swear every day to Shor that after Molag Bal has been defeated, I will hunt down the invaders and restore peace in Bleakrock and drink my mead with my friends at the market place".PC-EU
  • hrothbern
    Sallington wrote: »
    My gut feeling is that ZOS already knows this, and just doesn't have (or won't spend) the resources to fix it. I find it hard to believe that any competent network/system admin wouldn't have been able to pinpoint exactly the bottleneck you've detailed and deliver that information to management.

    Network admin: "Sir, here's our bottleneck. Here's the cost of the solution."

    Manager: "Nope. Not worth it. Figure something else out that doesn't cost us any money."

    That's how I assume those conversations have gone, and why they are trying such menial things as removing deer or torchbugs from Cyrodiil to try and help performance.

    @Sallington ,

    Your base assumption is that UDP costs more money than TCP.

    Can you substantiate that ?

    Edited by hrothbern on March 28, 2016 5:32PM
  • Sallington
    hrothbern wrote: »
    Sallington wrote: »
    My gut feeling is that ZOS already knows this, and just doesn't have (or won't spend) the resources to fix it. I find it hard to believe that any competent network/system admin wouldn't have been able to pinpoint exactly the bottleneck you've detailed and deliver that information to management.

    Network admin: "Sir, here's our bottleneck. Here's the cost of the solution."

    Manager: "Nope. Not worth it. Figure something else out that doesn't cost us any money."

    That's how I assume those conversations have gone, and why they are trying such menial things as removing deer or torchbugs from Cyrodiil to try and help performance.

    @Sallington ,

    Your base assumption is that UDP costs more money than TCP.

    Can you substantiate that ?

    Using the protocols is free, but you still need to program the game to use them. The only cost I assumed was diverting resources away from... whatever new DLC they are working on. There would be no extra cost for new systems/switches/routers, unless they are very old and should be upgraded anyway: servers with no ability for NIC teaming, 100mb switches, etc.
    Edited by Sallington on March 28, 2016 5:34PM
  • hrothbern
    Sallington wrote: »
    hrothbern wrote: »
    Sallington wrote: »
    My gut feeling is that ZOS already knows this, and just doesn't have (or won't spend) the resources to fix it. I find it hard to believe that any competent network/system admin wouldn't have been able to pinpoint exactly the bottleneck you've detailed and deliver that information to management.

    Network admin: "Sir, here's our bottleneck. Here's the cost of the solution."

    Manager: "Nope. Not worth it. Figure something else out that doesn't cost us any money."

    That's how I assume those conversations have gone, and why they are trying such menial things as removing deer or torchbugs from Cyrodiil to try and help performance.

    @Sallington ,

    Your base assumption is that UDP costs more money than TCP.

    Can you substantiate that ?

    Using the protocols is free, but you still need to program the game to use them. The only cost I assumed was diverting resources away from... whatever new DLC they are working on. There would be no extra cost for new systems/switches/routers, unless they are very old and should be upgraded anyway.

    ok

    I guess that the data interface will partially have to be newly coded when using UDP.
    But that will not need the same resources as DLCs etc.
    It is something you would expect to be outsourced to a specialist.
    So it's only money.....

    And tbh I have no idea if that involves a lot of money.
    Do we have somebody knowledgeable about this on the forum who can respond?


    Edited by hrothbern on March 28, 2016 5:38PM
  • Sugaroverdose
    hrothbern wrote: »
    A technical comparison between TCP and UDP
    see also this link:
    http://javarevisited.blogspot.nl/2014/07/9-difference-between-tcp-and-udp-protocol.html

    The main difference is that TCP guarantees packet delivery at slow speed
    and UDP offers high speed at the expense of some lost packets.

    Considering that in LAG situations guaranteed packet delivery makes no sense because it is already too late, it is no wonder that UDP seems to be favored for MMOs.

    However, that is what the linked article states.
    Do we have factual evidence of what is currently best practice for MMOs?
    Which MMOs use which protocol?

    TCP guarantees packet delivery over a network that does not itself guarantee 100% packet delivery. In some conditions that is bad, because it requires a packet to be delivered even if it's outdated.
    UDP does not guarantee anything; it's an insanely simple base.

    Almost every near-realtime multiplayer game uses UDP.
  • RinaldoGandolphi
    Yes,

    It would be more of a software cost than a hardware cost, I would think. Some parts of the server-side and client-side software and interfaces would have to be modified from a coding standpoint to make it happen, but in doing so ZOS could create a very robust netcode tailored to their specific needs. Writing error handling, packet loss handling, and other measures directly on top of UDP gives them a lot of options and scalability moving forward.

    While TCP has worked, it's not ideal. I questioned the use of TCP early on but was told it wouldn't be an issue; even then I thought TCP wouldn't scale reliably with ESO's real-time type of combat system.

    The underlying problems with TCP become very apparent in PVP. The only reason they don't show as much in PVE is that you don't have any content that requires more than 12 players (trials). Once group sizes go higher than 12 (PVP) and you have more than 12 people spamming abilities in an area, the problems with TCP start to show, and the more players you add the worse it gets.

    WOW (a game I have never played) also uses TCP, but I believe all of its abilities have cooldowns, so they are able to "throttle" user input (server requests) through cooldown mechanics. ESO doesn't function like WOW on the cooldown front, and in fact many abilities can be canceled, allowing abilities to be fired off more in real time as long as you have the resources to cast.

    In a real-time combat system like ESO has, I just don't see how TCP can be implemented to scale correctly. An RUDP implementation would be so much better for the game and would allow it to bypass some of these bottlenecks. It would IMO make the game scale far higher overall. Adding a client-side anti-cheat and removing some of the server-side checks along with such a netcode change would probably make lag near non-existent the majority of the time.
  • KhajitFurTrader
    .

    The 1st step to fixing the lag issues with ESO is to rewrite the netcode using RUDP (Reliable UDP). This would allow ZOS to write their own latency controls into a custom protocol, which would make it something of a mix: Layer 4 (UDP) with custom code built on top of it to handle these issues (Layer 7 functionality).

    Useful References
    https://1024monkeys.wordpress.com/
    https://1024monkeys.wordpress.com/2014/04/08/udp-vs-tcp-a-follow-up/
    Ok, first off, I miss a link to a relevant article from the author mentioned above: https://1024monkeys.wordpress.com/2014/04/01/game-servers-udp-vs-tcp/

    Second, trashing and rewriting the entire netcode (which needs to be done simultaneously on both the client and server side, naturally) might be a very expensive option in terms of both programming and testing time and resources. The fundamental design decision to go with TCP was made a long time ago, and the corresponding code has been tested over years in internal, closed, open, and stress test betas. It may simply not be affordable to conduct even a partial test schedule in a live environment, while the PTS does not even come close to live concurrency numbers, let alone stress test numbers. An operation on such a basic component of the code base without proper testing beforehand would be a high-risk endeavor without guarantee of improvement. After all, we don't know for sure whether the main cause of server-side lag is, e.g., packet loss due to network saturation (which UDP wouldn't change, btw), or high server-side response times due to load on script engines or the RDBMS. The latter case could be solved more easily and cheaply with optimizations, an ongoing process that has already begun.

    Besides, the consoles would be affected as well, since they share the same common code base, meaning that the whole approval process on both PSN and Xbox Live would have to be repeated.

    All in all, while of course improvements to server/network performance are important, the question should be asked whether they are feasible. Changing network protocols this late in the lifetime of the game might just not be, but none of us are privy to internal evaluations of the company.



  • Rylana
    Known this for a long time OP, glad you had the gumption and patience to actually put it to words.

    Another thing they need to do is streamline the sheer amount of data transfer. I have seen transfer rates in my Task Manager just from ESO-related processes in excess of 10Mbps. That's insane.
    @rylanadionysis == Closed Beta Tester October 2013 == Retired October 2016 == Uninstalled @ One Tamriel Release == Inactive Indefinitely
    Ebonheart Pact: Lyzara Dionysis - Sorc - AR 37 (Former Empress of Blackwater Blade and Haderus) == Shondra Dionysis - Temp - AR 23 == Arrianaya Dionysis - DK - AR 17
    Aldmeri Dominion: Rylana Dionysis - DK - AR 25 == Kailiana - NB - AR 21 == Minerva Dionysis - Temp - AR 21 == Victoria Dionysis - Sorc - AR 13
    Daggerfall Covenant: Dannika Dionysis - DK - AR 21 == The Catman Rises - Temp - AR 15 (Former Emperor of Blackwater Blade)
    Forum LOL Champion (retired) == Black Belt in Ballista-Fu == The Last Vice Member == Praise Cheesus == Electro-Goblin
  • Sugaroverdose
    It seems like the PS4 version uses UDP as transport, and I don't believe that ZOS implemented a different protocol for every platform.
    At least all communications between my console, "mphpp-zmpseup-5-2-7.vivox.com" and "disp-zmpseup-5-1.vivox.com" run over UDP.


    My bad, it's in-game voice :)
    Edited by Sugaroverdose on March 28, 2016 7:35PM
  • Zyle
    How was the game able to run without excessive latency in the earlier days? I get the advantages/disadvantages between the two protocols, but I don't get why it's an issue now if there was a time where TCP being used didn't cause terrible latency. I still think it's a horrendous lack of optimization rather than improper protocol selection. Not saying UDP wouldn't be a better option, I just don't think it's the solution to the ridiculous latency we're seeing.


    676 CP
    Zyle - LVL50 Stamina Nightblade - Former Emp AS - VMA Clear (Flawless)
    Joven - LVL50 Hybrid Templar
    Adion - LVL50 Stamina DK
    Radac - LVL50 Magicka Sorcerer
    Vanikath - LVL50 Magicka DK
  • Cryptical
    Let me boil this down to a metaphor.

    Packet loss is like an earthquake upheaval, tossing cars a few inches off the ground, while the people (players) are trying to drive (play the game).
    TCP is like a car fitted with tank tracks, it turns really slow because it can only turn left/right when the tracks are on the ground (when lost packets are resent and such).
    UDP is like a car fitted with off road tires, it turns more quickly because the wheels can turn left/right when they are on the ground OR while bounced into the air (experiencing packet loss).

    So the road filled with TCP cars will see people driving their cars, then all actual maneuvering will halt while the earthquake (packet loss) bounces all of them off the ground for a moment, but the drivers keep stomping on the gas and spinning the steering wheel, and the cars try to catch up with all that input when they land back on the ground.
    But the road filled with UDP cars will see people driving, and when the earthquake (packet loss) pops them off the ground their individual cars keep maneuvering, and everyone's movement isn't exceptionally derailed by the momentary loss of contact with the ground (loss of contact with the server).

    Roughly, a half decent characterization?
    Xbox NA
  • RinaldoGandolphi
    Cryptical wrote: »
    Let me boil this down to a metaphor.

    Packet loss is like an earthquake upheaval, tossing cars a few inches off the ground, while the people (players) are trying to drive (play the game).
    TCP is like a car fitted with tank tracks, it turns really slow because it can only turn left/right when the tracks are on the ground (when lost packets are resent and such).
    UDP is like a car fitted with off road tires, it turns more quickly because the wheels can turn left/right when they are on the ground OR while bounced into the air (experiencing packet loss).

    So the road filled with TCP cars will see people driving their cars, then all actual maneuvering will halt while the earthquake (packet loss) bounces all of them off the ground for a moment, but the drivers keep stomping on the gas and spinning the steering wheel, and the cars try to catch up with all that input when they land back on the ground.
    But the road filled with UDP cars will see people driving, and when the earthquake (packet loss) pops them off the ground their individual cars keep maneuvering, and everyone's movement isn't exceptionally derailed by the momentary loss of contact with the ground (loss of contact with the server).

    Roughly, a half decent characterization?

    Bingo!

  • hrothbern
    Zyle wrote: »
    How was the game able to run without excessive latency in the earlier days? I get the advantages/disadvantages between the two protocols, but I don't get why it's an issue now if there was a time where TCP being used didn't cause terrible latency. I still think it's a horrendous lack of optimization rather than improper protocol selection. Not saying UDP wouldn't be a better option, I just don't think it's the solution to the ridiculous latency we're seeing.

    From what I understand, much more of the calculations and decisions were originally done at the client level,
    leading to abuse and bots.

    So ZOS transferred much more to the server than originally intended, at the expense of performance (the LAG).

    And this element of abuse does raise the question:

    Is UDP as difficult to hack as TCP ?
  • KhajitFurTrader
    hrothbern wrote: »
    :

    Is UDP as difficult to hack as TCP ?
    This isn't a question about the transport/connection layer protocol, it's about the client/server communications model. In his Laws of Online World Design, Raph Koster rightfully stated: "Never trust the client. The client is in the hands of the enemy." All kinds of implications and complications of client/server design arise from this, which are sadly not solved by simply changing protocols.
  • hrothbern
    hrothbern wrote: »
    :

    Is UDP as difficult to hack as TCP ?
    This isn't a question about the transport/connection layer protocol, it's about the client/server communications model. In his Laws of Online World Design, Raph Koster rightfully stated: "Never trust the client. The client is in the hands of the enemy." All kinds of implications and complications of client/server design arise from this, which are sadly not solved by simply changing protocols.

    ok
    Do I then understand correctly that, in terms of abuse, UDP as such is not more vulnerable than TCP?
    (that was the intention of my perhaps not well enough worded question)
  • KhajitFurTrader
    hrothbern wrote: »
    hrothbern wrote: »
    :

    Is UDP as difficult to hack as TCP ?
    This isn't a question about the transport/connection layer protocol, it's about the client/server communications model. In his Laws of Online World Design, Raph Koster rightfully stated: "Never trust the client. The client is in the hands of the enemy." All kinds of implications and complications of client/server design arise from this, which are sadly not solved by simply changing protocols.

    ok
    Do I then understand correctly that, in terms of abuse, UDP as such is not more vulnerable than TCP?
    (that was the intention of my perhaps not well enough worded question)
    Correct. Supposing that in both cases the client sends the same information to the server, the server still has to sanity-check everything, regardless of how it got there.

  • hrothbern
    hrothbern
    ✭✭✭✭✭
    OK.
    Is it possible to do a reasonably reliable sanity check if some packets are missing (which will happen with UDP)?
  • KhajitFurTrader
    KhajitFurTrader
    ✭✭✭✭✭
    ✭✭
    hrothbern wrote: »
    Is it possible to do a reasonably reliable sanity check if some packets are missing (which will happen with UDP)?
    Well, information that doesn't arrive at all cannot be checked, obviously. ;)

    If the information units exchanged in client/server communication are partitioned in such a way that they fit into a single network packet, i.e. one packet comprises a complete message on the application level -- which is a) possible in MMOs, and b) would be a prerequisite for UDP and a good practice for TCP (e.g., you don't want to worry about packet ordering at all) -- messaging between client and server is binary: messages either arrive fully, or they don't. So, sanity checking messages that arrive is always possible, even if some messages get lost on the way, regardless of the underlying protocol.

    The thing is, TCP has built-in packet loss detection, which needs to be tweaked so it isn't as "prissy" as usual. UDP doesn't care, and by default gives no indication at all about lost messages. So it has to be made "reliable" to an arbitrary degree by the programmer, which involves re-inventing the wheel, as there is already a reliable protocol called TCP. The article linked above talks about the pros and cons of both TCP and UDP in depth.
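
A minimal sketch of the "one complete message per packet" idea above, in Python. The wire format, the field names and the `MAX_SPEED` cap are assumptions made up for illustration; nothing here is ESO's actual protocol. The point is only that a message either arrives whole and can be sanity-checked, or it is rejected, whichever transport carried it.

```python
import math
import struct

# Hypothetical wire format: sequence number, x, y, speed (network byte order).
MOVE_FORMAT = struct.Struct("!Ifff")
MAX_SPEED = 12.0  # assumed cap, invented for this sketch


def encode_move(seq, x, y, speed):
    """Pack one complete application message into a single datagram payload."""
    return MOVE_FORMAT.pack(seq, x, y, speed)


def decode_move(payload):
    """Return (seq, x, y, speed) if the payload passes the checks, else None."""
    if len(payload) != MOVE_FORMAT.size:
        return None  # truncated or padded: not a complete message
    seq, x, y, speed = MOVE_FORMAT.unpack(payload)
    if not all(math.isfinite(v) for v in (x, y, speed)):
        return None  # NaN or infinity sent by a modified client
    if not 0.0 <= speed <= MAX_SPEED:
        return None  # implausible value: never trust the client
    return seq, x, y, speed
```

A lost message simply never reaches `decode_move`; every message that does arrive gets the same checks, which is the "binary" behaviour described above.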
  • ben_ESO5
    ben_ESO5
    ✭✭✭
    Good write-up, but fundamentally flawed with regard to the larger picture. I'm not saying that your discussion of TCP, UDP, and packet loss is wrong; it's just that you dismissed another huge issue, one you assume ZOS has been incorrectly focusing on: server-side CPU utilization (based on your statement that "No amount of "LOS checks" and "changing ability animations" are going to Fix this issue").

    You assume that lag is solely due to network congestion, but even in your write-up you touched on a Cisco router running at 100% CPU utilization. Flooding the network interface is not the only thing that will cause packet loss; as you said, CPU utilization will as well, regardless of the amount of network traffic. For example, UTM firewalls that attempt too much intrusion protection scanning, antivirus, application control, and web filtering can quickly become overloaded (and thus become unresponsive and show massive packet loss) even though only a fraction of their inbound interface is being utilized. It's not only a question of how much network traffic there is, but also of how many computations are performed to analyze each packet, or group of packets. The same holds true for ESO servers, and ZOS has touched on this time and time again: they're desperately trying to reduce the number of computations required for every action taken by the players. This tells us that packet loss is being directly affected by the servers' CPU utilization being nearly maxed out, and not just by the network hardware having to play catch-up.

    It doesn't matter how much throughput is available, or how responsive the network side of things is, if the server cluster in question lacks the processing power to process and respond to every incoming message without becoming unresponsive, thus causing packet loss independently of the network hardware. But again, don't get me wrong: that's not to say that your view of TCP/UDP is incorrect. It's just part of the whole, and not the only problem that needs solving.
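
To make the CPU-bound case concrete, here is a toy simulation in Python. All numbers and names (`capacity_per_tick`, `queue_limit`) are invented for illustration and say nothing about ZOS's real servers. It models a server that can process only a fixed number of messages per tick and has a bounded input queue, so messages get dropped once processing falls behind, however much link bandwidth is left.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick, queue_limit):
    """Return (processed, dropped) for a server with a bounded input queue."""
    queue = deque()
    processed = dropped = 0
    for arriving in arrivals_per_tick:
        for _ in range(arriving):
            if len(queue) < queue_limit:
                queue.append(1)
            else:
                dropped += 1  # buffer full: lost before any processing happens
        # The CPU, not the network, limits how much is handled this tick.
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()
            processed += 1
    return processed, dropped


print(simulate([5, 5, 5], capacity_per_tick=10, queue_limit=20))
print(simulate([30, 30, 30], capacity_per_tick=10, queue_limit=20))
```

In the second call the arrivals exceed what the CPU can work through, so losses show up even though nothing in the model represents a saturated network link.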
  • ArgoCye
    ArgoCye
    ✭✭✭✭
    "You are so learned, Papa Homer."

    And I did learn - thanks all.

  • Merlight
    Merlight
    ✭✭✭✭✭
    I think the 1st step towards fixing ESO lag issues lies in the netcode and its reliance on TCP. I think a custom RUDP (Reliable UDP protocol) implementation would go a long way in reducing server overhead and making the game lag significantly less in high stress situations. TCP simply isn't the best choice for a game the scale of ESO and this becomes woefully apparent on the PVP side of the game.

    I agree that UDP might help, but abandoning TCP might not be as straightforward as it seems. I don't know ESO's application protocol, though, so I'll give you an example from a game using TCP whose application protocol I do know.

    In Lineage 2, communication with the game server in both directions is encrypted using a very simple, fast and insecure stream cipher -- it's just an XOR with a short key (the server's encryption key equals the client's decryption key, and vice versa), and each sent / received byte modifies one byte of the corresponding key. I guess they modify the key to make it harder for an adversary to tamper with the data stream -- but it also rules out dropping packets, as everything that was encrypted on the server must be decrypted on the client to keep their keys in sync. A lost packet means decryption of the next received packet will use a stale key, which inevitably leads to a disconnect as it produces garbage.
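
A toy illustration of that failure mode, in Python. This is not Lineage 2's real cipher: the class name and the key-update rule are invented for the sketch. It only shows why a key that changes with every processed byte cannot survive a dropped packet.

```python
class RollingXor:
    """Toy stream cipher: XOR with a short key that mutates per byte processed."""

    def __init__(self, key):
        self.key = bytearray(key)
        self.pos = 0

    def _step(self, plain_byte):
        # Each plaintext byte modifies one key byte, so both ends must process
        # exactly the same byte sequence to stay in sync.
        i = self.pos % len(self.key)
        self.key[i] = (self.key[i] + plain_byte + 1) % 256
        self.pos += 1

    def encrypt(self, data):
        out = bytearray()
        for b in data:
            out.append(b ^ self.key[self.pos % len(self.key)])
            self._step(b)
        return bytes(out)

    def decrypt(self, data):
        out = bytearray()
        for c in data:
            p = c ^ self.key[self.pos % len(self.key)]
            out.append(p)
            self._step(p)
        return bytes(out)


packets = [b"move north", b"cast spell", b"open chest"]
server = RollingXor(b"k3y!")
wire = [server.encrypt(p) for p in packets]

in_order = RollingXor(b"k3y!")
received_all = [in_order.decrypt(w) for w in wire]

lossy = RollingXor(b"k3y!")
lossy.decrypt(wire[0])
garbled = lossy.decrypt(wire[2])  # wire[1] was "lost": the key is now stale
```

With every packet delivered in order, `received_all` matches `packets`; after skipping one packet, `garbled` no longer matches the original third message, which is the desync described above.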

    Was it a poor choice for an MMO? Well, their servers were able to handle thousands of players in 2006 -- there were no megaservers, no instances, only 2000+ players on one physical server -- and 100+ people fighting over a castle wasn't uncommon. To be honest, there was not that much AoE back then, but heck, that was 10 years ago, and the map was as large as Cyrodiil...
    EU ‣ Wabbajack nostalgic ‣ Blackwater Blade defender ‣ Kyne wanderer
    The offspring of the root of all evil in ESO by DeanTheCat
    Why ESO needs a monthly subscription
    When an MMO is designed around a revenue model rather than around fun, it doesn’t have a long-term future. - Richard A. Bartle
    Their idea of transparent, at least when it comes to communication, bears a striking resemblance to a block of coal. - lordrichter
    ... in the balance of power between the accountants and marketing types against the artists, developers and those who generally want to build and run a good game then that balance needs to always be in favour of the latter - because the former will drag the game into the ground for every last bean they can squeeze out of it. - Santie Claws
  • hrothbern
    hrothbern
    ✭✭✭✭✭
    hrothbern wrote: »
    Is it possible to do a reasonably reliable sanity check if some packets are missing (which will happen with UDP)?
    Well, information that doesn't arrive at all cannot be checked, obviously. ;)

    If the information units exchanged in client/server communication are partitioned in such a way that they fit into a single network packet, i.e. one packet comprises a complete message on the application level -- which is a) possible in MMOs, and b) would be a prerequisite for UDP and a good practice for TCP (e.g., you don't want to worry about packet ordering at all) -- messaging between client and server is binary: messages either arrive fully, or they don't. So, sanity checking messages that arrive is always possible, even if some messages get lost on the way, regardless of the underlying protocol.

    The thing is, TCP has built-in packet loss detection, which needs to be tweaked so it isn't as "prissy" as usual. UDP doesn't care, and by default gives no indication at all about lost messages. So it has to be made "reliable" to an arbitrary degree by the programmer, which involves re-inventing the wheel, as there is already a reliable protocol called TCP. The article linked above talks about the pros and cons of both TCP and UDP in depth.

    Thanks again, @KhajitFurTrader, for your clarifying answer and your patience with my simple noob questions :)

    I have now done a lot of reading on the subject on the Internet, and found the following summary of TCP-induced issues for an MMO, which nicely describes the state we are in:
    (http://gafferongames.com/networking-for-game-programmers/udp-vs-tcp/)

    Why you should never use TCP to network time critical data

    The problem with using TCP for realtime games like FPS is that unlike web browsers, or email or most other applications, these multiplayer games have a real time requirement on packet delivery. For many parts of your game, for example player input and character positions, it really doesn’t matter what happened a second ago, you only care about the most recent data. TCP was simply not designed with this in mind.

    Consider a very simple example of a multiplayer game, some sort of action game like a shooter. You want to network this in a very simple way. Every frame you send the input from the client to the server (eg. keypresses, mouse input controller input), and each frame the server processes the input from each player, updates the simulation, then sends the current position of game objects back to the client for rendering.

    So in our simple multiplayer game, whenever a packet is lost, everything has to stop and wait for that packet to be resent. On the client game objects stop receiving updates so they appear to be standing still, and on the server input stops getting through from the client, so the players cannot move or shoot. When the resent packet finally arrives, you receive this stale, out of date information that you don’t even care about! Plus, there are packets backed up in queue waiting for the resend which arrive at same time, so you have to process all of these packets in one frame. Everything is clumped up!

    Unfortunately, there is nothing you can do to fix this behavior with TCP, nor would you want to, it is just the fundamental nature of it! This is just what it takes to make the unreliable, packet-based internet look like a reliable-ordered stream.

    Thing is we don’t want a reliable ordered stream.

    We want our data to get as quickly as possible from client to server without having to wait for lost data to be resent.

    This is why you should never use TCP for networking time-critical data!
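
The stall described in the quoted article can be sketched with a few lines of Python. The arrival times are invented and the model is deliberately crude: it ignores congestion control and assumes one retransmission. It compares when the game gets to see each packet under in-order delivery versus "hand it over as soon as it arrives".

```python
def delivery_times_ordered(arrival_ms):
    """TCP-like: packet i reaches the game only once packets 0..i have all arrived."""
    delivered = []
    ready_at = 0
    for t in arrival_ms:
        ready_at = max(ready_at, t)
        delivered.append(ready_at)
    return delivered


def added_delay(arrival_ms):
    """Extra wait per packet caused purely by in-order delivery."""
    ordered = delivery_times_ordered(arrival_ms)
    return [d - a for d, a in zip(ordered, arrival_ms)]


# Packet 2 was lost once and its retransmission arrives at 450 ms.
arrivals = [50, 100, 450, 200, 250]
print(delivery_times_ordered(arrivals))
print(added_delay(arrivals))
```

Packets 3 and 4 arrived on time but are held back until the retransmitted packet 2 shows up, so three updates reach the game in the same instant: the "clumping" the article talks about.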


    So with hindsight, UDP looks like the better choice and should perhaps have been chosen right from the start.
    Now we are in the devil's kitchen.
    But using UDP forces you to write your own reliability layer on top of it,
    which, as far as I can judge right now,
    is no small undertaking: it needs experienced specialists and carries a substantial performance risk during development, part of which would have to happen on live...
    So converting to UDP seems like a last resort to me, if all else fails.

    So I am inclined to agree that your post below describes our current position:
    Second, trashing and rewriting the entire netcode (which needs to be done simultaneously both on client- and server-side, naturally) might be a very expensive option in terms of both programming and testing time and resources. The fundamental design decision to go with TCP has been made a long time ago, and the corresponding code has been tested over years in internal, closed, open, and stress test betas. It may simply not be affordable to conduct even a partial test schedule in a live environment, while the PTS does not even come close to live concurrency numbers, let alone stress test numbers. An operation on such a basic component of the code base without proper testing beforehand would be a high risk endeavor without guarantee of improvement -- after all, we don't know for sure whether the main cause for server-side lag is, e.g., packet loss due to network saturation (which UDP wouldn't change, btw), or high server-side response times due to load on script engines or RDBMS. The latter case could be solved easier and cheaper with optimizations, an ongoing process that has already begun.

    Besides, the consoles would be affected as well, since they share the same, common code base, meaning that the whole approvement process on both PSN and Xbox Live would have to be repeated.

    All in all, while of course improvements to server/network performance are important, the question should be asked whether they are feasible. Changing network protocols this late in the lifetime of the game might just not be, but none of us are privy to internal evaluations of the company.

    EDIT:
    I think also that many of the improvements done and in progress are beneficial or even necessary to get UDP working well, if it comes to that.



    Edited by hrothbern on March 29, 2016 10:41AM
  • Selstad
    Selstad
    ✭✭✭✭
    I do think that other games manage to function with TCP (WoW, Guild Wars 2, The Secret World), so I think that the main issue in ESO is the way the game is built. None of the three games mentioned allows you to make use of all passives that might have a synergy with a given spell. ESO does. You have no way of restricting the number of passives and synergies between the spells, abilities and sets you're using. I've noticed in Cyrodiil that most of the latency issues stem from participating in combat; for example, I was running around with no problem until I was attacked by a wolf, at which point I started to lag. This leads me to believe that, because of how the game is built, you end up bottlenecking the server, which has no way of "removing" other players' abilities and synergies.

    I know we had similar lag problems in WoW in world PvP situations, such as in Wintergrasp and Tol Barad: the battle started, a lot of players entered, the game started to lag. This is why group size in PvP should always be controlled; with too many players the server simply can't handle it, regardless of how many flood servers you try to put in.

    I think that the way to solve the problem in ESO is to limit players to a set of 5 skills per action bar (2 action bars) as well as a selection of max 8 passives that are "active", while the rest of the passives do not work if not selected. This is similar to how The Secret World handles its selection, though in that game you can only select 5 active abilities and 5 passive abilities.

    The next part of solving the problems in PvP is to forgo the "large" Cyrodiil map and make use of controlled battle servers with fixed PvP groups (10 vs 10, 20 vs 20, etc.). The good thing about this controlled battle mode is that you can always control how many players can enter the server, and also monitor the performance.

    All in all, though, I think Zenimax did mess up a bit on their choices, as well as on the game mechanics. I think the network problems run far deeper and are more tightly bound to the design choices of the game, as well as to how those systems interact with the selected network interface. If they are to solve this problem, they might have to change more than their network interface.
  • hrothbern
    hrothbern
    ✭✭✭✭✭
    Ah.

    Among all the material on network issues that I read, it was also mentioned several times that using Wi-Fi instead of a cable increases the chance that a packet is lost, which triggers the "lag" on your PC.
  • Marktoneth3
    Marktoneth3
    ✭✭✭
    Tag
  • Nerouyn
    Nerouyn
    ✭✭✭✭✭
    ✭✭
    Selstad wrote: »
    I think that the way to solve the problem in ESO is to limit players to a set of 5 skills per action bar (2 action bars) as well as a selection of max 8 passives that are "active", while the rest of the passives do not work if not selected.

    I think you might be very confused about how it would work on the back end.

    Player stats are affected by equipment, enchantments, food, potions and passives. The server would not be communicating info about every player's equipment, enchantments, food, potions and passives to each other. It would really only communicate health total.

    Where passives have a direct effect on actives - eg. longer duration - that also wouldn't need to be communicated to other players. That info would be on the server and used when calculating the damage / duration of abilities.
  • KhajitFurTrader
    KhajitFurTrader
    ✭✭✭✭✭
    ✭✭
    hrothbern wrote: »
    This is why you should never use TCP for networking time-critical data!
    While everything said in that article applies well to online first-person shooters, it doesn't necessarily apply to MMOs -- because they are not time-critical in the sense of needing to provide small, guaranteed maximum response/turnaround times (as, e.g., a real-time system does). MMOs, by definition, are not "twitchy" games -- ESO with its outstanding combat system might be a bit different, but the reaction windows for combat indicators are still several times longer than the expected network latency.

    When I first heard, years ago, that WoW internally runs with a built-in delay of 500 ms, I was baffled. How can everything seem so immediate, so smooth, or look so synchronised, when everything you see in your viewport "happens" more than half a second off? The answer is: the client is cheating, a lot! When you press [W], your character immediately starts moving forward, but the server only learns about it half the current round-trip time later, plus an arbitrary amount of processing time that depends on zone load. Only then will it broadcast your movement to others within visible vicinity, and again, half of their round-trip time later, they will see you start moving. Global synchronicity within half a second is precise enough for MMOs, and is a tradeoff between the computational power needed, i.e. costs, and continuity for players. Only when message transmission times, whether from messages delayed by server load or from high network latency, become greater than the internally calculated delay does the system start to break down. In the case of moving, the server calculates where you were 500 ms ago and where you should be now, within reasonable boundaries. If the next status message from the client contains an implausible vector (position + direction + speed), the server overrules the client and your position is reset. Rubberbanding or slingshotting are the visible effects of this.
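
A rough sketch of that plausibility check in Python. The constants `MAX_SPEED` and `TOLERANCE` and the function name `reconcile` are assumptions for illustration only; how WoW or ESO actually do this is not public as far as I know.

```python
import math

MAX_SPEED = 8.0   # assumed top movement speed, units per second
TOLERANCE = 1.25  # assumed slack for latency jitter


def reconcile(last_pos, claimed_pos, elapsed_s):
    """Accept the client's position if it is reachable in the elapsed time.

    Returns (position_the_server_keeps, accepted).
    """
    distance = math.dist(last_pos, claimed_pos)
    if distance <= MAX_SPEED * elapsed_s * TOLERANCE:
        return claimed_pos, True
    # The server overrules the client; the player sees this as rubberbanding.
    return last_pos, False


print(reconcile((0.0, 0.0), (3.0, 0.0), 0.5))
print(reconcile((0.0, 0.0), (40.0, 0.0), 0.5))
```

The first claim is within reach for half a second of movement and is accepted; the second is not, so the server keeps the old position and the client gets snapped back.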

    So, MMOs are not time-critical, or real-time online applications, although through the use of trickery, they very much try to look like them.

    Edited by KhajitFurTrader on March 29, 2016 2:31PM
  • RinaldoGandolphi
    RinaldoGandolphi
    ✭✭✭✭✭
    ✭✭✭✭
    The choice of TCP over UDP with a Layer 7 implementation, in which they could write their own error, latency, and other controls, is just one of many poor decisions that together have led us to what we have now.

    You could cut lag in half just by using RUDP, since that alone cuts packet transmissions by 50%: with UDP, if one packet gets dropped you simply send a new, updated one, whereas with TCP you're forced to resend and receive old data and then wait for new data, all while throughput is cut, because TCP treats dropped packets as a sign of network congestion, not necessarily of overloaded equipment.

    Not having to retransmit packets would reduce CPU and other overhead on the system significantly.
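
The "just send a newer one" idea can be sketched as a receiver that keeps only the latest state, in Python. This is not any particular RUDP library; the class name is made up, and sequence-number wraparound is ignored to keep it short.

```python
class LatestStateReceiver:
    """Keeps only the newest state update; older or duplicate datagrams are dropped."""

    def __init__(self):
        self.last_seq = -1
        self.state = None

    def on_datagram(self, seq, state):
        if seq <= self.last_seq:
            return False  # stale or duplicate: a newer update already superseded it
        self.last_seq = seq
        self.state = state
        return True


rx = LatestStateReceiver()
results = [rx.on_datagram(1, "pos A"), rx.on_datagram(3, "pos C"), rx.on_datagram(2, "pos B")]
```

Update 2 arrives late and is simply ignored, because update 3 already carries newer information; nothing is retransmitted. This only works for data where the newest value replaces the old one, such as positions. Anything that must not be lost still needs some form of acknowledgement on top.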

    Lastly, move to a client-side anti-cheat such as PunkBuster, and implement a code-signing verification system where ZOS holds the private key and the public key is embedded in the clients, so that your client can't connect to the server if any of the files fail code-signing verification.

    (If I code-sign a binary, any change, even adding a space to the code, renders the signature invalid. This is how Microsoft code-signs WHQL drivers on Windows: once signed they can't be changed, or else they will fail with an invalid signature, which is why every driver release has to be re-signed if it is to be WHQL certified.)

    Every CPU bought in the last 7 years has encryption instruction sets built into the silicon. Instead of putting all the checks on the server like they did in the 90's, you can use code-signing verification to trust your clients: any client whose code signatures are all valid can be trusted and is allowed to connect; any client whose signatures aren't valid can't.

    Microsoft gave up on the server thing back with Windows Vista and Windows Genuine Advantage... better off hashing the hardware and verifying code signatures than putting everything server side.

    No game hacker is going to break 256-bit AES encryption. Just code-sign the binaries, trust the clients that have valid code signatures, and don't let those that don't connect. You poll memory at regular intervals, as even hooking one of those files renders the code signature invalid... Why do you think malware authors block all of MS's servers and disable the WUAS and such when they patch files like svchost.exe? Because they know Windows will detect that the file is invalid and will try to fetch legit versions.

    In ESO's case, you wouldn't be able to connect to ZOS servers without all your files passing code-signing verification checks, which would put the kibosh on any attempts to hack. Embed the polling inside the actual game executable, so that if you patch the game executable in any way, the code signature becomes invalid and there's no game playing for you... It polls memory periodically along with PunkBuster, and if any of those files in RAM have been altered or hooked in any way (which means the code signature is invalid), kicked from the server you are.

    Without ZOS's private key, there is no way to cheat this system short of having a quantum computer, which no one short of top-level government organizations has, and they won't even admit they have them :)
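
A small Python sketch of the integrity-check half of this idea. It only compares SHA-256 digests of file contents against a manifest; real code signing would additionally verify a public-key signature over that manifest, which is left out here because Python's standard library has no asymmetric signing. The file names and the `verify_files` helper are hypothetical.

```python
import hashlib
import hmac


def digest(data):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def verify_files(files, manifest):
    """Return the names of files whose digest is missing from or differs from the manifest."""
    return sorted(
        name
        for name, data in files.items()
        if not hmac.compare_digest(digest(data), manifest.get(name, ""))
    )


# Hypothetical manifest shipped by the publisher.
manifest = {"eso.exe": digest(b"original build")}

print(verify_files({"eso.exe": b"original build"}, manifest))
print(verify_files({"eso.exe": b"original build "}, manifest))
```

Adding a single space changes the digest, so the second call reports `eso.exe` as modified, matching the "even adding a space" remark above.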
    Rinaldo Gandolphi-Breton Sorcerer Daggerfall Covenant
    Juste Gandolphi Dark Elf Templar Daggerfall Covenant
    Richter Gandolphi - Dark Elf Dragonknight Daggerfall Covenant
    Mathias Gandolphi - Breton Nightblade Daggerfall Covenant
    RinaldoGandolphi - High Elf Sorcerer Aldmeri Dominion
    Officer Fire and Ice
    Co-GM - MVP



    Sorcerer's - The ONLY class in the game that is punished for using its class defining skill (Bolt Escape)

    "Here in his shrine, that they have forgotten. Here do we toil, that we might remember. By night we reclaim, what by day was stolen. Far from ourselves, he grows ever near to us. Our eyes once were blinded, now through him do we see. Our hands once were idle, now through them does he speak. And when the world shall listen, and when the world shall see, and when the world remembers, that world will cease to be. - Miraak
