Hangs with suspend, fast user switches, and time machine


29 replies to this topic

#1 sommerfeld

Posted 04 January 2010 - 01:13 PM

Shortly before Christmas I installed the Snow Leopard beta bits (4.0.0.197) of the globalSAN iSCSI initiator on our home Mac mini running Snow Leopard, and connected it to a 60GB ZFS zvol on an OpenSolaris server. I'm using the iSCSI device exclusively for Time Machine backups. Backups are (mostly) working, but we've had to reboot the Mac fairly frequently to recover from hangs. Most commonly we get the "spinning beachball of death" and apps are unresponsive. Sometimes only one or two apps are affected; in other cases, they all lock up.

We use fast user switching extensively (between two users -- me and my wife) and I've seen the lockup occur when the system wakes up and I immediately user-switch to my account.

I realise that this bug report is woefully incomplete. What information can I collect at this point to help you diagnose the problem? I'm an experienced unix/bsd/solaris kernel developer (and so is my wife) but don't have macos-specific experience.

#2 sommerfeld

Posted 05 January 2010 - 09:52 PM

Here's some additional information. "sudo dmesg" contains the following messages which seem to relate to the iscsi initiator:

Wake reason = EHC1
System Wake
Previous Sleep Cause: 5
USB (EHCI):Port 7 on bus 0x24 has remote wakeup from some device
NVEthernet::setLinkStatus - Valid but not Active
Ethernet [nvenet]: Link up on en0, 1-Gigabit, Full-duplex, No flow-control, Debug [796d,0000,0de1,000d,c1e1,3800]
NVEthernet::setLinkStatus - link Valid and Active
GLO Warning: Error 32 while receiving BHS.
GLO Warning: Receiving thread has stopped with error 32.
SNSSCSIPeripheralDeviceType00::setPowerState(0xa36ba00, 1 -> 4) timed out after 100178 ms
GLO Warning: Error 32 while receiving BHS.
GLO Warning: Receiving thread has stopped with error 32.


and, earlier on the same system:

GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.
hfs: Removed 1 orphaned / unlinked files and 329 directories
ioqueue_depth = 128, ioscale = 4
GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.
ioqueue_depth = 128, ioscale = 4
GLO Warning: Timeout detected for connection 0xa0bf200 in state 2
GLO Warning: Error 32 while receiving BHS.
GLO Warning: Receiving thread has stopped with error 32.
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 0
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 1
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 2
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 3
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 4
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 5
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 6
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 7
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 8
SCSIPressurePathManager: Timed out waiting for inactive/error path to become active, loops = 9
SCSIPressurePathManager: new active path available, checking, loops = 10
NVEthernet::setLinkStatus - Valid but not Active
0 [Time 1262713319] [Message System Sleep


Is there anything else I should look for? Error 32 is traditionally EPIPE; I suspect that the iSCSI TCP connection is being closed by the target while the Mac is asleep, and that the initiator is not recovering from this.
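
For reference, errno 32 is EPIPE ("Broken pipe") in the BSD-derived errno table that Mac OS X uses, and on Linux as well. The following is a minimal Python sketch, assuming the initiator's "Error 32" is an ordinary BSD errno (nothing in the log output confirms this). It only illustrates what EPIPE means on a stream socket whose peer has gone away; it is not the initiator's code, and `write_after_peer_close` is a hypothetical helper name.

```python
import errno
import os
import socket

# Show the numeric value and the standard description of EPIPE.
print(errno.EPIPE, os.strerror(errno.EPIPE))


def write_after_peer_close() -> int:
    """Return the errno raised when writing to a socket whose peer has closed."""
    ours, peer = socket.socketpair()   # connected pair of local stream sockets
    peer.close()                       # simulate the other end dropping the connection
    try:
        ours.sendall(b"x")             # writing to a closed peer fails with EPIPE
    except OSError as exc:
        return exc.errno
    finally:
        ours.close()
    return 0


if __name__ == "__main__":
    print(write_after_peer_close() == errno.EPIPE)
```

A real TCP connection may behave slightly differently (the first write after a remote close can succeed before the reset arrives), so this is only a local illustration of the error code.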

#4 Boulder1259

Posted 05 January 2010 - 10:53 PM

I am experiencing similar problems when I try to back up to a QNAP TS-439 NAS. I'm still working to establish a pattern, but so far Finder-based file transfers seem to work fine, while rsync-based transfers (rsync from the command line and Carbon Copy Cloner from the GUI) hang the machine. I've not tried Time Machine, but I believe it is rsync-based, so I don't find it surprising that your transfers are hanging the machine. Like sommerfeld, I'm not sure what data to collect to help diagnose this bug, but I'd like to learn. Thanks!

P.S. Note that I don't have these problems when attaching to the NAS via AFP.

#5 sommerfeld

Posted 05 January 2010 - 10:56 PM

I'm trying a few things, now that it seems likely that the problem is a failure of the iSCSI initiator on the Mac to recover gracefully from an unexpected close of the TCP connection by the target. This is really in the category of workaround/mitigation rather than a real fix, but if it lets us keep using Time Machine for now...

1) I set the "Wake for network access" preference in the "Energy Saver" system preference pane (it was previously not set).

2) I built a custom version of iscsitgtd on Solaris which doesn't set the (sadly misnamed) "SO_KEEPALIVE" socket option, so it should be more tolerant of an unresponsive peer.

It's too soon to tell if this makes a difference. I'll report back in a few days.
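
As background on the second workaround: with SO_KEEPALIVE enabled, the TCP stack probes an idle connection and tears it down if the peer stops answering, which is what a sleeping Mac would look like to the target. The sketch below shows, in Python rather than in the target's own source, how the option is switched on and off for a socket. It is an illustration under that assumption, not the actual iscsitgtd change, and `set_keepalive` is a hypothetical helper name.

```python
import socket


def set_keepalive(sock: socket.socket, enabled: bool) -> bool:
    """Enable or disable SO_KEEPALIVE on sock and return the resulting state."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1 if enabled else 0)
    # Read the option back; some platforms report a non-zero flag value
    # other than 1, so normalise to a bool.
    return bool(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("keepalive on: ", set_keepalive(s, True))
        print("keepalive off:", set_keepalive(s, False))  # the workaround's setting
    finally:
        s.close()
```

With the option off, an idle connection to a sleeping peer is not probed, so the target has no keepalive-driven reason to close it.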

#6 sommerfeld

Posted 06 January 2010 - 01:01 AM

Another new error message (the setPowerState timeout below) in "sudo dmesg" output:

Ethernet [nvenet]: Link up on en0, 1-Gigabit, Full-duplex, No flow-control, Debug [796d,0000,0de1,000d,c1e1,3800]
NVEthernet::setLinkStatus - link Valid and Active
GLO Warning: Error 32 while receiving BHS.
GLO Warning: Receiving thread has stopped with error 32.
SNSSCSIPeripheralDeviceType00::setPowerState(0x13ecb400, 0 -> 3) timed out after 100193 ms

GLO Warning: Error 32 while receiving BHS.
GLO Warning: Receiving thread has stopped with error 32.
GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.
GLO Warning: Timeout occured while waiting Target became ready.


Is the target driver attempting to power-manage the remote iscsi disk?

#7 Eric Newbauer

Posted 08 January 2010 - 10:18 AM

Hey sommerfeld, thanks for the info. Investigating...
Eric Newbauer, SNS Moderator
----
Learn about our SAN/NAS shared media storage products
Follow Studio Network Solutions on Twitter

#8 sommerfeld

Posted 08 January 2010 - 02:17 PM

Thanks. Timestamps from /var/adm/kernel.log show that "GLO Warning" messages are coming out at a rate of several per minute:

Jan 8 12:10:55 McGarrett kernel[0]: GLO Warning: Timeout occured while waiting Target became ready.
Jan 8 12:11:25: --- last message repeated 2 times ---
Jan 8 12:11:26 McGarrett kernel[0]: GLO Warning: Timeout occured while waiting Target became ready.
Jan 8 12:11:56: --- last message repeated 2 times ---
Jan 8 12:11:57 McGarrett kernel[0]: GLO Warning: Timeout occured while waiting Target became ready.

Are these from the driver? Is there anything you want me to look at with dtrace or other tools?
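
To put a number on "several per minute", the log lines can be tallied per minute, counting each "last message repeated N times" line as N further occurrences. The following is a rough Python sketch: the regular expressions assume the syslog layout shown in the excerpt above, `warnings_per_minute` is a hypothetical helper, and repeated messages are attributed to the minute in which the "repeated" line was written, which is an approximation.

```python
import re
from collections import Counter

# "Jan 8 12:10:55 McGarrett kernel[0]: <message>"
LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d):\d\d\s+\S+ kernel\[\d+\]: (.*)$")
# "Jan 8 12:11:25: --- last message repeated 2 times ---"
REPEAT = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d):\d\d: --- last message repeated (\d+) times? ---$"
)


def warnings_per_minute(lines, needle="GLO Warning"):
    """Count messages containing needle per minute, expanding syslog repeats."""
    counts = Counter()
    last_was_match = False  # does the most recent real message contain needle?
    for raw in lines:
        line = raw.strip()
        m = LINE.match(line)
        if m:
            last_was_match = needle in m.group(2)
            if last_was_match:
                counts[m.group(1)] += 1
            continue
        r = REPEAT.match(line)
        if r and last_was_match:
            counts[r.group(1)] += int(r.group(2))
    return counts


if __name__ == "__main__":
    sample = [
        "Jan 8 12:10:55 McGarrett kernel[0]: GLO Warning: Timeout occured while waiting Target became ready.",
        "Jan 8 12:11:25: --- last message repeated 2 times ---",
        "Jan 8 12:11:26 McGarrett kernel[0]: GLO Warning: Timeout occured while waiting Target became ready.",
        "Jan 8 12:11:56: --- last message repeated 2 times ---",
        "Jan 8 12:11:57 McGarrett kernel[0]: GLO Warning: Timeout occured while waiting Target became ready.",
    ]
    for minute, n in sorted(warnings_per_minute(sample).items()):
        print(minute, n)
```

On the five-line excerpt this counts seven warnings in roughly one minute, i.e. about one every ten seconds.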

#9 lkraav

Posted 09 January 2010 - 04:28 AM

I'm also getting these timeouts and wondering about them.

#10 SNSryan

Posted 12 January 2010 - 04:54 PM

Hey Guys,

A newer version of the initiator is available that may address your issues. This new version is available for download here -
http://www.snsftp.co....0.204_BETA.dmg

Please try to reproduce the issue using this newer version and let us know your results. Thanks in advance for your assistance. We look forward to hearing the results.

-ryan
EVO is an all-in-one, turnkey, SAN and NAS shared storage server specifically developed and tuned for the needs of AVID, Final Cut Pro, Adobe Premiere, Pro Tools, Autodesk Smoke, Assimilate Scratch and other film, video, graphics, VFX, animation, audio, and broadcast production software. EVO features 1/10Gb ethernet and or 4/8Gb fibrechannel - no switches needed! More info on EVO can be found here: http://www.studionet...vo-features.php

#11 sommerfeld

Posted 13 January 2010 - 10:25 AM

I've installed the new initiator. While it's too soon to be sure, it appears to be a big improvement over the previous beta. In particular, Time Machine seems to be much faster and more responsive now, and there isn't a lag after resume before the system becomes responsive.

During and shortly after boot I saw the following messages in the kernel log:

...
Jan 12 23:53:00 McGarrett kernel[0]: jnl: disk1s2: replay_journal: from: 4760576 to: 5237760 (joffset 0x1e0000)
Jan 12 23:53:00 McGarrett kernel[0]: jnl: disk1s2: journal replay done.
Jan 12 23:53:00 McGarrett kernel[0]: ioqueue_depth = 128, ioscale = 4
Jan 12 23:53:01 McGarrett kernel[0]: GLO Warning: Tail (65536 bytes) of the Data Segment (PDU 0x9246400) will be ignored.
Jan 12 23:54:15 McGarrett kernel[0]: GLO Warning: Tail (65536 bytes) of the Data Segment (PDU 0x9246400) will be ignored.
Jan 12 23:56:19 McGarrett kernel[0]: GLO Warning: Tail (65536 bytes) of the Data Segment (PDU 0x8ee3c00) will be ignored.
Jan 12 23:56:32: --- last message repeated 3 times ---
Jan 12 23:56:32 McGarrett kernel[0]: GLO Warning: Tail (65536 bytes) of the Data Segment (PDU 0x93c2c00) will be ignored.
Jan 12 23:57:02: --- last message repeated 9 times ---
Jan 12 23:57:17 McGarrett kernel[0]: GLO Warning: Tail (65536 bytes) of the Data Segment (PDU 0x93c2c00) will be ignored.
Jan 12 23:58:07: --- last message repeated 3 times ---


I have not seen any:

Jan 12 23:46:45 McGarrett kernel[0]: GLO Warning: Timeout occured while waiting Target became ready.

since rebooting with the new beta.

#12 SNSryan

Posted 13 January 2010 - 05:38 PM

That's great news! Thanks for posting the logs. Please let us know if you run into anything else. Thanks for all your help.

-ryan

#13 robodude666

Posted 18 January 2010 - 10:07 PM

Not to hijack someone else's thread, but I previously tested the first 4.0.0.x beta release, which ended up disconnecting my iSCSI target shortly after connecting and crashing Finder. I'm pleased to say that 4.0.0.204 has solved all of those problems! It works flawlessly, and even connects after startup far faster than 3.3 ever did. Not to mention the GUI overhaul is beautiful.

Keep up the good work SNSryan, and the rest of the SNS gang!

#14 nstuyvesant

Posted 19 January 2010 - 12:22 AM

Hey Guys,

A newer version of the initiator is available that may address your issues. This new version is available for download here -
http://www.snsftp.com/guest/globalSAN_4.0.0.204_BETA.dmg

-ryan


Hi Ryan,

First of all, thank you for providing a free iSCSI initiator for the Mac. Personally, I think Apple's been a bit lax but your company stepped up to the plate so kudos to you!

Just caught a minor thing... I was on the FAQ page on the Studio Network Solutions site and noticed the link to the 4.0 beta still points to 4.0.0.197 rather than 4.0.0.204.

Another tiny item... The copyright message in the globalSAN iSCSI System Preference says "2004-2009" rather than through 2010.

Also, a few questions...
- Do you have any release notes for 4.0.0.204?
- What are the plans as far as a 64-bit version of the initiator?
- Quite a while back, I tried 3.3.0.43 with an OpenSolaris target and found that write performance was really slow (compared to IET on Ubuntu running on the same hardware). As far as you know, were these issues addressed? I noticed someone posted at the top of this thread that they were using OpenSolaris but not much was said about performance.
- One thing I noticed after installing 4.0.0.204 over 4.0.0.197 and rebooting... my Finder's menu bar was overwritten with some other text that was unclear. I relaunched the Finder and that cleared it up.

Regards,
Nate

#15 nstuyvesant

Posted 19 January 2010 - 12:04 PM

Performance is running about the same as it was for build 197 (which was excellent). Here are some write results from Snow Leopard (10.6.2) to IET 1.4.18 on Ubuntu 9.10 over Gigabit Ethernet (only a 1500 MTU - not jumbo frames):

bash-3.2# time dd if=/dev/zero of=/Volumes/Backup/testfile bs=1048576k count=4
4+0 records in
4+0 records out
4294967296 bytes transferred in 40.739566 secs (105424964 bytes/sec)

real 0m40.975s
user 0m0.001s
sys 0m9.961s

Regards,
Nate
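
As a sanity check on the dd figures in the post above, the reported rate can be recomputed from the block size, count, and elapsed time, then compared with the raw Gigabit Ethernet line rate. This is a small Python sketch; `dd_throughput` is a hypothetical helper, and the comparison against 1000 Mbit/s ignores TCP/iSCSI framing overhead and any write caching, so treat the percentage as a rough indication only.

```python
def dd_throughput(bs_kib: int, count: int, elapsed_s: float) -> dict:
    """Recompute dd throughput from a 'bs=<n>k count=<m>' run and its elapsed time."""
    total_bytes = bs_kib * 1024 * count        # dd's "k" suffix means 1024 bytes
    bytes_per_s = total_bytes / elapsed_s
    mbit_per_s = bytes_per_s * 8 / 1e6         # decimal megabits, as used for link speeds
    return {
        "total_bytes": total_bytes,
        "bytes_per_s": bytes_per_s,
        "mib_per_s": bytes_per_s / 2**20,
        "mbit_per_s": mbit_per_s,
        "fraction_of_gige": mbit_per_s / 1000,  # share of raw 1 Gbit/s line rate
    }


if __name__ == "__main__":
    # Values taken from the dd run above: bs=1048576k count=4, 40.739566 secs.
    result = dd_throughput(bs_kib=1048576, count=4, elapsed_s=40.739566)
    for key, value in result.items():
        print(f"{key}: {value:,.2f}")
```

The run works out to roughly 100 MiB/s, or a little over 840 Mbit/s, which is a large share of what a single Gigabit link with a 1500-byte MTU can carry.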

#16 robodude666

Posted 19 January 2010 - 01:50 PM

Whoops :(. Seems the previous 4.0.0.197 problems persist in 204 as well. Both Time Machine and iTunes froze about 20 minutes after startup. The iSCSI disks were mounted, and globalSAN said they were connected, but when I tried to view the contents past the root list of directories, Finder crashed. I reverted to 3.3, and Time Machine, iTunes, and Finder all work as expected.

:(,
-robodude666

#17 SNSryan

Posted 19 January 2010 - 02:29 PM

Hi Nate,
Thanks for the feedback!

We'll try to reproduce the Finder upgrade issue.

It's hard to say whether the speed issue you ran into with your OpenSolaris target is fixed. Please do try again and share your findings with the forum.

None of the documentation has been updated for any of the beta releases. You can expect all documentation to be updated when the final candidate hits the streets. Let us know if you spot anything else. Thanks for keeping an eye out :)

A 64-bit globalSAN initiator is on the roadmap. Subscribe to the product announcements forum or follow us on Twitter - http://twitter.com/snstweets if you want to be automatically notified of the release.

-ryan

#18 SNSryan

Posted 19 January 2010 - 02:31 PM

Hi,

Did you try uninstalling 3.3 using the original 3.3 installer package then upgrading to v.4.0.0.204?

-ryan

#19 ezilagel

Posted 22 January 2010 - 11:41 AM

I really appreciate you guys making this happen for Mac users. I really wish good things for this project. Here are some things that I have run into.

It would be nice if the target mounted before login instead of 1-2 minutes after you log in, at least as an option.
It would also be nice if I didn't have to repair my volume every 2 days. If your HFS+ volume does become corrupt, just use DiskWarrior. Nothing is wrong with your data; you don't need to "recover" anything. It takes about 3 minutes to repair the volume.
Also, my very fast Intel Snow Leopard Mac seems sluggish when running this software. It will also be nice when a 64-bit version is released. Overall disk speed seems bad as well, for both reads and writes.

Using this on two machines seems to be what causes the most volume damage, especially if one machine adds something to the volume or crashes and doesn't properly eject the volume.

Have you released anything beyond the 4.0.0.204 beta?

#20 Eric Newbauer

Posted 22 January 2010 - 04:50 PM

The corruption isn't stemming from globalSAN -- it's due to the fact that HFS+ is not natively a "multi-user" file system. Connecting multiple computers to an HFS+ file system (or any single user, non-clustered file system such as HFS+ and NTFS) will definitely cause problems like those you reported unless you use something like SANmp.

Please see this FAQ for more info.

At this time 4.0.0.204 is the latest beta drop.
Eric Newbauer, SNS Moderator