
Crash in SLI configuration only while grouping in group instances

manyrabidrats
over the past two weekends I've done some extensive testing. I am able to play for hours and hours solo without so much as a stutter with fps in the 70-100 range constantly.

the issue is a complete system crash that happens only when grouped in group instances in SLI mode. it does not seem to be an application crash: alt+tab and other ways of exiting or switching applications do not work, and my second screen also freezes. it seems to be a crash of windows explorer, and the system does not recover; i have to cut power to restore it. it does not happen immediately: it takes about half an hour to an hour of running the instance before it crashes.

when I set the nvidia profile to use only a single card, the issue does not exist, it only happens in SLI.

i know my cpu is overclocked significantly, but this happens even when set to stock speeds.

both cards run at about 60% gpu utilization with about 2gb of vram used on each (the cards have 4gb each), so there is plenty of headroom.

hardware is in attached dxdiag txt file. user settings also attached.

is anyone else using SLI having similar problems?

  • RinaldoGandolphi
    I see you have a significantly overclocked CPU: about a 1GHz overclock, if I recall correctly (the 2600K was 3.4GHz at stock speed).

    If I were to make an educated guess, assuming your RAM has tested out OK with 72 hours of Memtest86 and you don't have a significantly degraded power supply, I'd say you have a case of a degraded CPU.

    Everyone assumes overclocking is safe and that only temps matter, and this is simply not true.

    Voltage is a silent killer. Even if you keep temps well below the chip's threshold, voltage can still cause electromigration and degrade your chip. Tom's Hardware had a big write-up on this and did extensive testing on Sandy Bridge chips, and this is what they found:

    http://www.tomshardware.com/reviews/automatic-overclock-motherboard-cpu,3048.html

    Our overclocking articles often mention a process called “electromigration,” where material is physically transferred from one part of a circuit to another. While the full description of this phenomenon is complex, it’s easy to understand that an insulator contaminated with conductive particles no longer insulates. Transistor gates function as either insulators or conductors depending on charge state and are particularly prone to this type of damage. And yet, many technology enthusiasts place the blame for a fried processor or GPU solely on heat, ignoring the fact that voltage is a measure of force.

    Force causes electromigration, and colder silicon more easily resists that force by being less pliable. Colder temperatures also increase the insulation capabilities of transistor gates in the “off” phase, reducing the number of electrons that are forced through the closed gate. The problem with blaming heat alone on a failure is that moderate increases in electromigration resistance usually require drastic temperature reductions. When it comes to protecting hundreds of dollars in equipment, we always make our recommendations to you erring on the side of caution.

    We've learned through trial, error, and dead processors that voltage levels beyond 1.45 V at above-ambient temperatures can kill an Intel CPU etched at 32 nm (Sandy Bridge-based parts included) very quickly. Those same processors die a fairly slow death at voltage levels between 1.40 V and 1.45 V (somewhere between weeks and months on our test benches). And we're expecting more than a year of reliable service from the parts we've dutifully kept below 1.40 V. Not all motherboards are perfect however. Voltage instability on a particularly cheap motherboard fried one of our processors when it was set to only 1.38 V. Subsequently, you've seen us use 1.35 V for the overclocking tests in older motherboard round-ups, embracing 1.38 V to 1.40 V in more recent pieces covering higher-end platforms.

    So the gist is that with Sandy Bridge (the 2500 and 2600 i5 and i7 CPUs), going over 1.35v on the Vcore reduces the chip's life significantly. You can have the best water cooling system in the world and still degrade the chip. What most overclockers online will tell you is not wrong info; the only thing they leave out is that they are lucky to keep the same CPU for over a year. Tech enthusiasts buy new chips literally every year, so they usually never keep a chip long enough for electromigration to matter. The average person, on the other hand, figures they can overclock their CPU and it will work fine as long as they keep it cool, whereas keeping it cool only means it will work fine for a "limited amount of time", especially when voltages are pushed beyond their stock values.
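
For anyone who wants to sanity-check their own Vcore reading, here is a rough Python sketch of the voltage bands from the Tom's Hardware quote above. The function name `classify_vcore` and the band labels are made up for illustration, and treating exactly 1.40 V and 1.45 V as part of the middle band is an assumption, since the article does not say which side the boundaries fall on. These are the article's test-bench observations for 32 nm Intel chips, not a manufacturer spec.

```python
def classify_vcore(vcore_volts):
    """Map a Vcore reading (in volts) to the rough bands described in the
    quoted article for 32 nm Intel CPUs.

    Hypothetical helper: boundary handling (inclusive 1.40 and 1.45 in the
    middle band) is an assumption, not something the article states.
    """
    if vcore_volts > 1.45:
        # "beyond 1.45 V ... can kill an Intel CPU ... very quickly"
        return "kills quickly"
    if vcore_volts >= 1.40:
        # "a fairly slow death ... between 1.40 V and 1.45 V"
        return "slow death"
    # "more than a year of reliable service ... kept below 1.40 V"
    return "over a year expected"


if __name__ == "__main__":
    for reading in (1.35, 1.42, 1.50):
        print(f"{reading:.2f} V -> {classify_vcore(reading)}")
```

Note that the article also mentions a chip dying at 1.38 V on a board with unstable voltage regulation, so a reading in the lowest band is not a guarantee.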

    Hopefully it's just a bad RAM stick or a weak power supply, but it's very possible your CPU is beginning to degrade and is throwing errors in SLI because SLI is stressing a part of the CPU that is not used in a single-GPU setup, which would make the CPU degradation the culprit.

    I saw a buddy degrade an Ivy Bridge chip in less than 4 months just by running slightly above stock volts, even with top-of-the-line water cooling. Temps are not everything; voltage is a silent killer. I wrote this post primarily for folks in the future to relate to. Hopefully it shows up in a Google search and makes sure the prospective overclocker has all the facts before deciding to risk their chip :)
    Rinaldo Gandolphi-Breton Sorcerer Daggerfall Covenant
    Juste Gandolphi Dark Elf Templar Daggerfall Covenant
    Richter Gandolphi - Dark Elf Dragonknight Daggerfall Covenant
    Mathias Gandolphi - Breton Nightblade Daggerfall Covenant
    RinaldoGandolphi - High Elf Sorcerer Aldmeri Dominion
    Officer Fire and Ice
    Co-GM - MVP



    Sorcerers - The ONLY class in the game that is punished for using its class-defining skill (Bolt Escape)

    "Here in his shrine, that they have forgotten. Here do we toil, that we might remember. By night we reclaim, what by day was stolen. Far from ourselves, he grows ever near to us. Our eyes once were blinded, now through him do we see. Our hands once were idle, now through them does he speak. And when the world shall listen, and when the world shall see, and when the world remembers, that world will cease to be." - Miraak

  • manyrabidrats
    I suppose it's a possibility. i've had this cpu for about five years, and to be honest, i'm waiting for it to fail so i can upgrade to a new one (with a stock speed of about 4ghz, so i wouldn't oc).
    If the cause of this issue is cpu degradation, I would expect it to also show up in cpu tests or while playing more intensive games such as shadow of mordor in ultra hd 4k.

    the power supply i'm using is 850w

    i'll try the mem test over the weekend coming up.

    EDIT: upon further checking, the cpu voltage is 1.24v
    Edited by manyrabidrats on April 8, 2015 2:05PM