ISCSI LUN corruption at reboot

iSCSI related applications
_Joren_
New here
Posts: 5
Joined: Thu Feb 04, 2016 8:47 pm

ISCSI LUN corruption at reboot

Postby _Joren_ » Fri Jul 07, 2017 9:59 pm

Hi,

We have 2 QNAP TDS-16489U devices, both on firmware 4.2.1. Each had an iSCSI LUN connected to my VMware environment. We did several reboots of both devices over the past year without any issues: each time, the iSCSI LUNs became visible again in VMware and the data was readable.

Now, last week I had to shut down and start both QNAPs, which we did the normal way via the web interface. After starting them, VMware was able to connect to both LUNs, but the data on them (a VMFS5 filesystem) was completely unreadable on both devices. We had VMware support look at it and discovered that both the partition table and the LVM volume on the iSCSI LUN were corrupted, on both QNAP devices. We could restore the partition table, but the LVM is broken beyond repair... so all data is lost... The only thing VMware seems able to do is format the iSCSI LUNs and start from zero again...

VMware support saw clearly that, for some reason, both QNAP devices attempted a resize of the iSCSI LUN volume on boot, but I have no idea why, because we never extended the iSCSI volume.
Checking the QNAPs' log files indeed shows that a resize seems to happen on every reboot, but this particular time it caused us to lose all data.

Has anyone else experienced such big issues with QNAP iSCSI LUNs?

storageman
Experience counts
Posts: 1722
Joined: Thu Sep 22, 2011 10:57 pm

Re: ISCSI LUN corruption at reboot

Postby storageman » Fri Jul 07, 2017 11:27 pm

Block or volume LUN?
What drives?
The box isn't a big seller, so not many people will have experience with it.
Where are your LUN snapshots?

_Joren_
New here
Posts: 5
Joined: Thu Feb 04, 2016 8:47 pm

Re: ISCSI LUN corruption at reboot

Postby _Joren_ » Mon Jul 10, 2017 3:30 pm

storageman wrote:Block or volume LUN?
One LUN, formatted from VMware as a VMFS5 volume containing one VMDK file.
What drives?
Western Digital 4 TB drives (WD40EFRX-68WT0N0, SATA), 12 of them in RAID 6.
The box isn't big seller, not many people will have experience.
Where's your LUN snapshots?
No snapshots. It contained VMware backup data, which I did not back up again with snapshots or LUN backups.

storageman
Experience counts
Posts: 1722
Joined: Thu Sep 22, 2011 10:57 pm

Re: ISCSI LUN corruption at reboot

Postby storageman » Mon Jul 10, 2017 4:11 pm

I would get QNAP to troubleshoot, given how expensive these units are.
Are you running VMware on the other motherboard?
This sounds like an internal communication issue.

_Joren_
New here
Posts: 5
Joined: Thu Feb 04, 2016 8:47 pm

Re: ISCSI LUN corruption at reboot

Postby _Joren_ » Mon Jul 10, 2017 4:47 pm

storageman wrote:I would get Qnap to troubleshoot given such expensive units.
Are you running VMware on the other motherboard?
This sounds like internal communication issues.


Yes, the other one was also keeping VMware backups. I rebooted both units at around the same time, and both have the same issue with corrupted data.
VMware support was clear and was able to point to the corrupted partition table and LVM volume. QNAP support has so far been very disappointing: they are not able to find the issue and are reacting quite slowly.

I wanted to check whether somebody else has had the same experience with iSCSI LUNs behaving like this on QNAP devices.

storageman
Experience counts
Posts: 1722
Joined: Thu Sep 22, 2011 10:57 pm

Re: ISCSI LUN corruption at reboot

Postby storageman » Mon Jul 10, 2017 5:04 pm

You still haven't said what type of LUNs; use block LUNs, as they're faster and simpler.
Which log shows the QNAP trying to do a resize? This sounds like you are using thin-provisioned LUNs and it is reclaiming deleted space.
Use thick LUNs.

_Joren_
New here
Posts: 5
Joined: Thu Feb 04, 2016 8:47 pm

Re: ISCSI LUN corruption at reboot

Postby _Joren_ » Mon Jul 10, 2017 5:27 pm

storageman wrote:You still haven't said what type of LUNs; use block LUNs, as they're faster and simpler.
Which log shows the QNAP trying to do a resize? This sounds like you are using thin-provisioned LUNs and it is reclaiming deleted space.
Use thick LUNs.


We use block LUNs and they are thick-provisioned. I also used a static volume on both QNAPs. So no complicated configuration, to be honest.

This is what I can find in the kmsg log after rebooting the QNAP (I am not sure whether this is normal, but it looks like some kind of resize happens):
<6>[ 3107.815670] md/raid:md1: /dev/sdd3 does not support SSD Trim.
<6>[ 3107.821404] md/raid:md1: /dev/sdf3 does not support SSD Trim.
<6>[ 3107.827136] md/raid:md1: /dev/sdi3 does not support SSD Trim.
<6>[ 3107.832885] md/raid:md1: /dev/sdk3 does not support SSD Trim.
<6>[ 3107.838622] md/raid:md1: /dev/sdb3 does not support SSD Trim.
<6>[ 3107.844356] md/raid:md1: /dev/sdg3 does not support SSD Trim.
<6>[ 3107.850092] md/raid:md1: /dev/sdj3 does not support SSD Trim.
<6>[ 3107.855827] md/raid:md1: /dev/sdl3 does not support SSD Trim.
<6>[ 3107.861565] md/raid:md1: /dev/sdc3 does not support SSD Trim.
<6>[ 3107.867300] md/raid:md1: /dev/sde3 does not support SSD Trim.
<6>[ 3107.873036] md/raid:md1: /dev/sdh3 does not support SSD Trim.
<6>[ 3107.878774] md/raid:md1: /dev/sdm3 does not support SSD Trim.
<6>[ 3107.884534] md1: detected capacity change from 0 to 39905929461760
<6>[ 3107.925156] md1: unknown partition table
<6>[ 3109.144153] drbd r1: Starting worker thread (from drbdsetup-84 [6254])
<6>[ 3109.150934] block drbd1: disk( Diskless -> Attaching )
<6>[ 3109.156250] drbd r1: Method to ensure write ordering: flush
<6>[ 3109.161811] block drbd1: max BIO size = 1048576
<6>[ 3109.166334] block drbd1: Adjusting my ra_pages to backing device's (32 -> 512)
<6>[ 3109.173539] block drbd1: drbd_bm_resize called with capacity == 77941248952
<6>[ 3109.183354] block drbd1: resync bitmap: bits=76114501 words=1189290 pages=2323
<6>[ 3109.190564] block drbd1: size = 36 TB (38970624476 KB)
<6>[ 3109.204895] block drbd1: Writing the whole bitmap, size changed
<6>[ 3109.248141] block drbd1: bitmap WRITE of 2323 pages took 37 jiffies
<6>[ 3109.254402] block drbd1: 36 TB (76114501 bits) marked out-of-sync by on disk bit-map.
<6>[ 3109.412883] block drbd1: recounting of set bits took additional 2 jiffies
<6>[ 3109.419661] block drbd1: 36 TB (76114501 bits) marked out-of-sync by on disk bit-map.
<6>[ 3109.427530] block drbd1: Suspended AL updates
<6>[ 3109.431885] block drbd1: disk( Attaching -> Inconsistent )
<6>[ 3109.437446] block drbd1: attached to UUIDs 0000000000000004:0000000000000000:0000000000000000:0000000000000000
<6>[ 3109.457674] drbd r1: conn( StandAlone -> Unconnected )
<6>[ 3109.462920] drbd r1: Starting receiver thread (from drbd_w_r1 [6255])
<6>[ 3109.469444] drbd r1: receiver (re)started
<6>[ 3109.473478] drbd r1: conn( Unconnected -> WFConnection )
<6>[ 3109.491397] drbd r1: conn( WFConnection -> Disconnecting )
<4>[ 3109.491439] drbd r1: Discarding network configuration.
<6>[ 3109.502170] drbd r1: Connection closed
<6>[ 3109.505927] drbd r1: conn( Disconnecting -> StandAlone )
<6>[ 3109.511320] drbd r1: receiver terminated
<6>[ 3109.515238] drbd r1: Terminating drbd_r_r1
<6>[ 3109.518310] block drbd1: role( Secondary -> Primary ) disk( Inconsistent -> UpToDate )
<4>[ 3109.545769] block drbd1: Forced to consider local data as UpToDate!
<6>[ 3109.552032] block drbd1: new current UUID D6B5645CAB24265F:0000000000000004:0000000000000000:0000000000000000
<6>[ 3111.069864] md: requested-resync of RAID array md1
<6>[ 3111.074661] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
<6>[ 3111.080481] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
<6>[ 3111.090715] md: Resyncing started: md1
<6>[ 3111.094457] md/raid:md1: report qnap hal event: type = HAL_EVENT_RAID, action = RESYNCING_START
<6>[ 3111.103144] md/raid:md1: report qnap hal event: raid_id=1, pd_name=/dev/(null), spare=/dev/(null), pd_repair_sector=0
<6>[ 3111.113741] md: using 128k window, over a total of 3897063424k.
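For anyone digging through a dump like this, the lines worth isolating are the md capacity change, the DRBD bitmap resize, and the blocks being marked out-of-sync. A minimal sketch in Python that filters for them (the patterns are taken straight from the log above, so they may need adjusting for other firmware versions):

```python
import re

# Patterns lifted from the kmsg excerpt above: an md capacity change,
# the DRBD bitmap resize call, and blocks marked out-of-sync.
PATTERNS = [
    re.compile(r"detected capacity change from \d+ to \d+"),
    re.compile(r"drbd_bm_resize called with capacity == \d+"),
    re.compile(r"marked out-of-sync"),
]

def suspicious_lines(kmsg_text):
    """Return the kmsg lines that hint at a resize happening at boot."""
    return [
        line.strip()
        for line in kmsg_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]
```

Run it over a saved copy of the kmsg output from each boot; if the capacity-change or out-of-sync lines show up on reboots where nothing was extended, that is the evidence to hand to support.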

storageman
Experience counts
Posts: 1722
Joined: Thu Sep 22, 2011 10:57 pm

Re: ISCSI LUN corruption at reboot

Postby storageman » Mon Jul 10, 2017 8:33 pm

BTW, you should disconnect the LUNs before a shutdown/reboot, either by unmapping or dismounting them in VMware.
QNAP iSCSI is very sensitive to disconnects.
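As a sketch of what "disconnect before reboot" can look like on the ESXi side: these are standard esxcli commands, but the datastore label ("QNAP_DS1") and device ID ("naa.xxxx") are placeholders to replace with your own values from the first listing.

```shell
# List mounted VMFS volumes; note the label and device of the QNAP LUN.
esxcli storage filesystem list

# Unmount the datastore cleanly before shutting down the NAS
# ("QNAP_DS1" is a placeholder for your datastore's label).
esxcli storage filesystem unmount --volume-label=QNAP_DS1

# Optionally detach the underlying device as well
# ("naa.xxxx" is a placeholder for the LUN's device identifier).
esxcli storage core device set --state=off --device=naa.xxxx

# After the NAS is back up, re-attach the device and rescan:
esxcli storage core device set --state=on --device=naa.xxxx
esxcli storage core adapter rescan --all
```

Unmounting first guarantees VMware has flushed and closed the VMFS volume, so nothing is mid-write when the iSCSI session drops.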

_Joren_
New here
Posts: 5
Joined: Thu Feb 04, 2016 8:47 pm

Re: ISCSI LUN corruption at reboot

Postby _Joren_ » Mon Jul 10, 2017 9:08 pm

storageman wrote:BTW, you should disconnect the LUNs before a shutdown/reboot, either by unmapping or dismounting them in VMware.
QNAP iSCSI is very sensitive to disconnects.


I know, and that was done. After the reboot I could reconnect the iSCSI LUNs in VMware; only VMware was no longer able to read the VMFS5 volume data. It was asking to format the LUN again, which of course would erase all the data... So I then had VMware support look at it, and they showed that the partition table and the LVM volume manager were corrupted.
