"Disk Failed", then "Disk Unplugged" errors

Questions about SNMP, Power, System, Logs, disk, & RAID.
pas2190
Getting the hang of things
Posts: 63
Joined: Sun Jul 03, 2011 3:04 am

Re: "Disk Failed", then "Disk Unplugged" errors

Post by pas2190 »

QNAP TS-569L

Here is my story with disk 1 (slot 1) and disk 2 (slot 2) going offline.
It started about a month ago, when the 2 disks disappeared from a RAID 5.
All 5 disks are just fine.
I'm not sure what I did, but somehow disks 1 and 2 came back online and, for some unknown reason,
disk 2 appeared as new and the RAID 5 rebuilt.

After 3 weeks of working, disk 1 (slot 1) and disk 2 (slot 2) went offline again.
Then I found this thread with the MOSFET repair.

Since the QNAP TS-569L is out of warranty anyway,
I gave the backplane to Antenor, an electronics guru here in Brazil,
to take a look at.

First, he swapped chip 1 (88SE9125-NAA2), but that didn't resolve the issue.
Then he took all the parts near chip 1 to check them.
He discovered that the transistor circled in the photo is controlled by the chip but gave an intermittent 1.0 V, which caused slots 1 and 2 to go offline.
By the way, he said that this type of transistor used in the backplane had a resistance of 0.6v
and in the long run will definitely give up.

He replaced the defective transistor with a more powerful one (see photo).

That was it.
The backplane is now repaired and it's back to normal.

Big thanks to “Eletrônica TENORZÃO” (Antenor) here in Brazil, who did a tremendous job on it.
dolbyman
Guru
Posts: 35252
Joined: Sat Feb 12, 2011 2:11 am
Location: Vancouver BC , Canada

Post by dolbyman »

So the swap of that Marvell controller was not necessary, then?

That would be harder to swap and source than the other components (and probably more costly; I haven't checked e.g. Digi-Key for pricing).
Last edited by dolbyman on Wed May 27, 2020 12:28 am, edited 1 time in total.
pas2190
Getting the hang of things
Posts: 63
Joined: Sun Jul 03, 2011 3:04 am

Post by pas2190 »

dolbyman wrote: Tue May 26, 2020 9:28 pm So the swap of that Marvell controller was not necessary, then?

That would be harder to swap and source than the other components (and probably more costly; I haven't checked e.g. Digi-Key for pricing).
Not necessary, but it works as a future reference.
Tenorzao
New here
Posts: 2
Joined: Thu Apr 23, 2020 10:56 am

Post by Tenorzao »

Pas2190, thank you very much for the opportunity to learn more about this device. 😃😉👍
ELETRÔNICA TENORZÃO - Brazil
rawbar
Starting out
Posts: 24
Joined: Thu May 15, 2014 2:41 am

Post by rawbar »

rawbar wrote: Mon Feb 10, 2020 2:51 am Ok so it looks like keeping things cool is the key, at least for mine. I have had no issues at all since blowing a fan at it. Over the past 2 days I've added a single USB-powered pair of 60mm fans (purchased on Amazon), which I placed blowing down toward the drives (they fit nicely just resting on top), and routed the USB cable out the back and around to the front USB port. I have not seen any of the 4 drives get above 87F. I've set the fan on the QNAP permanently at medium speed (I posted I had it at high before, but it was too loud). I've rebuilt the array twice (1st with the 8TB drive, then decided it was pointless since I don't trust this thing and will never be expanding it, then with the 3TB that was in there originally) without issue. I'll buy a 2-bay Synology for the extra 2x8TB drives I now have, to back up the QNAP.

I had a good run. No issues since February until today (May 27), when it's 80+ degrees outside and my house is a sauna. The problem came back. Guess I need fans that move more CFM.
microsolder
Starting out
Posts: 11
Joined: Fri May 15, 2020 12:49 am

Post by microsolder »

pas2190 wrote: Tue May 26, 2020 9:04 pm First, he swapped chip 1 (88SE9125-NAA2), but that didn't resolve the issue.
Then he took all the parts near chip 1 to check them.
He discovered that the transistor circled in the photo is controlled by the chip but gave an intermittent 1.0 V, which caused slots 1 and 2 to go offline.
By the way, he said that this type of transistor used in the backplane had a resistance of 0.6v
and in the long run will definitely give up.

He replaced the defective transistor with a more powerful one (see photo).
The indicated transistor functions as a linear-mode core voltage regulator for the Marvell PCIe-to-2xSATA controller,
producing the 1 V core VDD from the 1.8 V bus.
The board has two other controllers in almost identical configurations. Those also have the same linear regulator circuit.
A question for Tenorzao: what was the core voltage before changing the transistor, and what was it after?
Is there a danger that the other two may also need to be replaced? Of course, only if the other three bays are populated.
Tenorzao
New here
Posts: 2
Joined: Thu Apr 23, 2020 10:56 am

Post by Tenorzao »

Hello microsolder!!!
The core voltage before replacing the transistor was about 0.7 V; afterwards it was 0.98 V. The Marvell controller works in the range 0.95 V to 1.2 V, so 0.98 V is OK. With the original transistor it would be a fine 1.0 V, but that part is hard to find here in Brazil.
Working on other circuits that use this Marvell controller, I found an external step-down converter circuit supplying this 1.0 V to the controller, which doesn't use the Marvell controller's reference source pin.
I suppose this transistor may have been underspecified in the QNAP design, compared with what the Marvell datasheet recommends.
These transistors work regardless of whether an HD is connected, and may one day fail, but that is unpredictable, as with all electronic components. But we will only know whether the transistor is defective when an HD is present in the bay. Only change it if you have a problem with it. 😉👍
microsolder
Starting out
Posts: 11
Joined: Fri May 15, 2020 12:49 am

Post by microsolder »

Thanks for the update Tenorzao!

The transistor in the datasheet (PBSS303PZ) has a rating of 5.3 A and comes in a SOT223 package, which cools well via the PCB (Rth(j-sp) 15 K/W) if properly laid out.
In the photos above the component appears to be SOT23 (what was the code?), which has a (usually much) higher Rth to the board (up to 90 K/W depending on
the component, similar to TO-92 Rth(j-c)). With the 0.8 V drop at the controller's nominal I(VDD) of 0.9 A, the junction temperature is about 90 °C at
a room temperature of 25 °C, and at typical in-case temperatures of 35-40 °C, due to heating by the disks, Tj gets above 100 °C. No wonder the transistor gives up.
With the component suggested in the datasheet, Tj would be around 50 °C, which would give decades of lifetime instead of 2-3 years.
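
For anyone who wants to re-run these numbers, here is a minimal Python sketch of the estimate. It assumes the simple steady-state model Tj = Ta + P * Rth with P = V_drop * I(VDD), and treats the board as sitting at ambient temperature. The function name and constants are only illustrative; the values are the ones quoted in this post.

```python
def junction_temp_c(ambient_c, v_drop, current_a, rth_k_per_w):
    """Steady-state junction temperature estimate: Tj = Ta + (V_drop * I) * Rth."""
    power_w = v_drop * current_a          # heat dissipated in the pass transistor
    return ambient_c + power_w * rth_k_per_w


V_DROP = 0.8        # V, 1.8 V bus minus 1.0 V core (from the post)
I_VDD = 0.9         # A, nominal controller core current (from the post)
RTH_SOT23 = 90.0    # K/W, upper figure quoted for the small package
RTH_SOT223 = 15.0   # K/W, Rth(j-sp) quoted for the PBSS303PZ

if __name__ == "__main__":
    for ambient in (25, 35, 40):
        small = junction_temp_c(ambient, V_DROP, I_VDD, RTH_SOT23)
        large = junction_temp_c(ambient, V_DROP, I_VDD, RTH_SOT223)
        print(f"Ta={ambient} C: SOT23 Tj={small:.1f} C, SOT223 Tj={large:.1f} C")
```

The dissipation is 0.72 W in both cases; only the thermal resistance differs, which is why the larger package runs so much cooler in this estimate.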

Looks like there are a bunch of under-design issues with the QNAP products leading to premature failures threatening people's precious data.
bluenadas
New here
Posts: 9
Joined: Sat Mar 07, 2009 5:48 pm

Post by bluenadas »

Not to beat a dead horse here, but I have a TVS-673 that has slots 5 & 6 dropping out on me. It has happened once for both drives (Wednesday), and again on drive 5 last night.

And before everyone asks, yes, I've gone through the HDD checks both in the NAS and offline using OEM software. Running 6TB IronWolfs in slots 1-5 and a 12TB Exos in slot 6.

I have a ticket in with QNAP support now, and I think everything is pointing to a backplane issue. I'm going to try the external PS and see if that solves the issues. But given I'm about 5 months out of warranty I'm guessing I either get told they don't have replacements or they'll do the repair for $500+. And to think I replaced my last QNAP for the exact same reason except they didn't have repair parts for that model. My faith in QNAP is fading quickly.

I do see the MOSFETs (4957AGM) are rather inexpensive (<$10 for 5 of them on ebay) so I might just see what replacing them does [assuming my 673 board has the same ones].

For those using the jumper method, I'd be cautious doing any hot-swapping on those affected bays. Might not be an issue, but I'm guessing this switching circuit is there for that very reason.
Beddhist
Getting the hang of things
Posts: 92
Joined: Fri Dec 29, 2017 5:36 pm

Post by Beddhist »

If it's an option, try turning up the fan speed to stabilise things, until you find a permanent solution. It worked for a few people, including me.
microsolder
Starting out
Posts: 11
Joined: Fri May 15, 2020 12:49 am

Post by microsolder »

bluenadas wrote: Mon Jun 01, 2020 6:46 pm ...I'm going to try the external PS and see if that solves the issues. ...

I do see the MOSFETs (4957AGM) are rather inexpensive (<$10 for 5 of them on ebay) so I might just see what replacing them does [assuming my 673 board has the same ones].

For those using the jumper method, I'd be cautious doing any hot-swapping on those affected bays. Might not be an issue, but I'm guessing this switching circuit is there for that very reason.
If using the external power supply for the HDs solves the issue, it most likely indicates the aging PMOS as the culprit. It seems that there is some variance in the components used
in the products, so the next step is to open the case and peek at the numbers on the components, to see whether they are the same as in other units or different. As to the SO-8 dual PMOS switches,
there are lots of them with the same pinout, so any component with a better voltage and current rating and lower channel resistance will suit as a replacement. Mostly what are
used are VDS < -20 V, ID < -4.5 A, RDS < 50 mΩ. A quick search on Mouser yielded 17 components with prices ranging from $0.50 to $1.50 apiece. Definitely a cheap component to replace.
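
A rough way to screen candidate parts against those figures is sketched below in Python. The part names and ratings are made-up placeholders, not real components, and comparing magnitudes is one reading of "better rating" for P-channel parts, whose datasheet VDS and ID values are negative. Pinout and package still have to be checked by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pmos:
    name: str
    vds_v: float        # drain-source voltage rating (negative for P-channel)
    id_a: float         # continuous drain current rating (negative for P-channel)
    rds_on_mohm: float  # on-resistance in milliohms


def is_suitable(part, min_vds=20.0, min_id=4.5, max_rds_mohm=50.0):
    """True if the part is at least as good as the typical figures from the post."""
    return (abs(part.vds_v) >= min_vds
            and abs(part.id_a) >= min_id
            and part.rds_on_mohm <= max_rds_mohm)


# Hypothetical candidates, for illustration only.
CANDIDATES = [
    Pmos("candidate-A", vds_v=-30.0, id_a=-6.0, rds_on_mohm=35.0),
    Pmos("candidate-B", vds_v=-20.0, id_a=-3.5, rds_on_mohm=45.0),   # current rating too low
    Pmos("candidate-C", vds_v=-30.0, id_a=-5.0, rds_on_mohm=80.0),   # channel resistance too high
]

if __name__ == "__main__":
    for part in CANDIDATES:
        verdict = "ok" if is_suitable(part) else "reject"
        print(f"{part.name}: {verdict}")
```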

The danger is that even replacing the switch is just a temporary fix (a few years at most) if the driver is badly designed, so taking a look at the gate voltage when the
disk is turned on is a good idea. If the gate voltage of the 5 V switch is above 0.5 V or that of the 12 V switch is above 2 V, then the driver must be modified, and the
solution usually is to reduce the gate resistor, which at least in the case of the TS-431 was ridiculously high, 100 kΩ, when typical values are 1..10 kΩ. So replacing that
resistor along with the switch will be a long-life solution.
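
The gate-voltage rule of thumb above can be written down as a small check. This is only a sketch in Python: the thresholds are the two figures from this post, the function names are invented for illustration, and the measured values in the example are hypothetical readings, not data from any real unit.

```python
RAIL_V = {"5V": 5.0, "12V": 12.0}        # nominal source voltage of each PMOS switch
MAX_GATE_V = {"5V": 0.5, "12V": 2.0}     # highest acceptable gate voltage with the disk on


def gate_source_voltage(rail, gate_v):
    """Vgs of the PMOS switch; more negative means the switch is driven harder."""
    return gate_v - RAIL_V[rail]


def driver_needs_rework(rail, gate_v):
    """True if the measured gate voltage is above the limit given in the post."""
    return gate_v > MAX_GATE_V[rail]


if __name__ == "__main__":
    # Hypothetical measurements taken with a disk powered in the bay.
    readings = [("5V", 0.3), ("5V", 1.4), ("12V", 1.1), ("12V", 3.2)]
    for rail, gate_v in readings:
        vgs = gate_source_voltage(rail, gate_v)
        status = "modify driver" if driver_needs_rework(rail, gate_v) else "ok"
        print(f"{rail} switch: Vg={gate_v:.1f} V, Vgs={vgs:.1f} V -> {status}")
```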

As to the jumper solution, you are right. The switch is a part of the hot-swap switch, so if the switch PMOS is jumpered, then hot-swapping should not be done, as it may damage
both the connectors and the electronics. In most cases, though, the hot-swapping is easy to avoid, powering the device down before swapping disks.

Then a final warning: there is always a chance that something else is wrong besides the switch, so, as always when repairing electronics, before changing the
parts one should do the troubleshooting, monitoring the voltages and signals when the device is running, to see where the real problem is. An easy way to do this is to solder
probe wires to selected lines (as otherwise reliable probing is difficult under the disks) and then see if the voltages fluctuate when the device is running.
Any excessive (> 50..100 mV) fluctuation in the power lines may point to an issue in the power delivery.
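
If the probed voltages are logged as a list of samples, the fluctuation check can be as simple as a peak-to-peak comparison. The Python sketch below assumes that format; the sample values are invented, and the default limit of 0.1 V is just the upper end of the 50..100 mV range mentioned above (pass 0.05 for the stricter end).

```python
def peak_to_peak(samples):
    """Difference between the highest and lowest logged voltage."""
    return max(samples) - min(samples)


def is_excessive(samples, limit_v=0.1):
    """True if the rail fluctuates by more than limit_v volts peak to peak."""
    return peak_to_peak(samples) > limit_v


if __name__ == "__main__":
    # Hypothetical logs, in volts, for illustration only.
    logs = {
        "5V rail": [5.02, 5.01, 4.99, 5.03],
        "12V rail": [12.05, 11.80, 12.10],
    }
    for name, samples in logs.items():
        flag = "check power delivery" if is_excessive(samples) else "ok"
        print(f"{name}: {peak_to_peak(samples) * 1000:.0f} mV p-p -> {flag}")
```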
kommisar
Starting out
Posts: 10
Joined: Thu Nov 01, 2018 4:35 pm

Post by kommisar »

Just to clarify, jump-wiring the MOSFET does not affect the hot-swap capabilities of the device. Hot-swap is based on other principles; please see the SATA specification for more information. Moreover, on some QNAPs the first two slots are hard-wired to the PS and powered permanently, and hot-swap works perfectly fine for those slots.
microsolder
Starting out
Posts: 11
Joined: Fri May 15, 2020 12:49 am

Post by microsolder »

kommisar wrote: Sat Jun 06, 2020 4:55 pm Just to clarify, jump-wiring the MOSFET does not affect the hot-swap capabilities of the device. Hot-swap is based on other principles; please see the SATA specification for more information. Moreover, on some QNAPs the first two slots are hard-wired to the PS and powered permanently, and hot-swap works perfectly fine for those slots.
Hi Kommisar! You are right. The SATA specification describes the pinning of the power connector that should ensure correct power sequencing. In most cases that mechanical solution is sufficient. For added safety the power switch activated by ground pins is a part of many designs, including QNAP's. For example, in TS-431 the SATA power pin 5 pulls down the RC circuit at the source of the 2N7002 which then pulls down the gates of the switch MOSFETs turning the 5V and 12V powers on. This ensures that the grounds are securely connected before the power is applied and also adds a small delay from grounds connecting to the power switching on. The reason for that is that in case the ground connection is poor before the power pins connect, there may be significant return current in the 5V line, from the 12V circuitry, which it may not be able to handle safely. This is mentioned, for example, in https://wsyntax.com/cs/killer-norco-case/, a case of another design with under-rated MOSFETs. Also there the suggested solution, in case of not finding proper switch MOSFETS, is jumpering the failed ones.
Beddhist
Getting the hang of things
Posts: 92
Joined: Fri Dec 29, 2017 5:36 pm

Post by Beddhist »

I just had my 2nd TS-431 lose its disk #4. So far, it's holding up with medium fan speed, but it will get bridged at the first opportunity.

Incidentally, it then refused to recover the disk when it was re-inserted, but I created another topic on that.
EasyGo
New here
Posts: 2
Joined: Thu May 07, 2020 9:22 am

Post by EasyGo »

I certainly appreciate all the information supplied here by Kommisar, microsolder and others. My TS-831X has been running fine since jumpering the MOSFET on two channels some weeks ago. I have purchased replacement MOSFETS (and a hot air rework station after putting it off for 20 years) and am about to embark on replacing those along with the gate resistors as indicated by microsolder.

I do have a question for microsolder -- what scope did you use to do the data logging? It looks expensive (ha) but I'm going to purchase something to be able to do that. My old analog stuff just doesn't cut it any more.

Thanks!