NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Started by Victor Homocea , 03 January 2014 - 01:41 AM
8 replies to this topic


Victor Homocea Members

Victor Homocea
  • 66 posts

Posted 03 January 2014 - 01:41 AM

Hi guys,
I struggled for a while to get GPU pass-through working on my HP Z820 with an NVIDIA Quadro K5000, realised it wasn't going to work because of some firmware hacks done by NVIDIA, and abandoned the attempt.
 
Now I have (almost) fully supported hardware for testing vGPU and GPU pass-through, but problems have appeared again. Damn!
 
My current config is: HP ProLiant SL250s with 2x Xeon E5-2660, 256GB RAM and 2 NVIDIA GRID K2.
XenServer is freshly installed, with SP1 and NVIDIA's vGPU, and my BIOS is P75 11/14/2013 (the latest public version). I updated the BIOS after reading HP's advisory c03745865 and applied the recommended setting (it was already set to Enabled anyway).

 

This is what I get after trying to initialize the vGPU with the nvidia-smi command:

 
FATAL: Error inserting nvidia (/lib/modules/2.6.32.43-0.4.1.xs1.8.0.847.170785xen/kernel/drivers/video/nvidia.ko): No such device
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
 
 
This is what I see in /var/log/messages after running the nvidia-smi command:
 
Jan  3 03:33:46 XenServerCTX kernel: [   83.184425] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Jan  3 03:33:46 XenServerCTX kernel: [   83.184426] NVRM: BAR1 is 128M @ 0xf800000000000000 (PCI:03ff:00:0b.0)
Jan  3 03:33:46 XenServerCTX kernel: [   83.184428] NVRM: This is a 64-bit BAR mapped above 4GB by the system
Jan  3 03:33:46 XenServerCTX kernel: [   83.184429] NVRM: BIOS or the Linux kernel.  The NVIDIA Linux/x86
Jan  3 03:33:46 XenServerCTX kernel: [   83.184430] NVRM: graphics driver and other system software components
Jan  3 03:33:46 XenServerCTX kernel: [   83.184431] NVRM: do not support this configuration.
Jan  3 03:33:46 XenServerCTX kernel: [   83.184435] nvidia: probe of 0000:0b:00.0 failed with error -1
Jan  3 03:33:46 XenServerCTX kernel: [   83.185558] NVRM: The NVIDIA probe routine failed for 2 device(s).
Jan  3 03:33:46 XenServerCTX kernel: [   83.185561] NVRM: None of the NVIDIA graphics adapters were initialized!
Jan  3 03:33:46 XenServerCTX kernel: [   83.185734] NVRM: NVIDIA init module failed!
Jan  3 03:36:31 XenServerCTX ntpd[7772]: synchronized to LOCAL(0), stratum 10
 
Pretty please guys, help me out. I'll be happy to take screenshots of all my BIOS settings if you want.
Thanks!
 
Victor
 

 



Tobias Kreidl CTP Member

Tobias Kreidl
  • 18,718 posts

Posted 03 January 2014 - 04:14 AM

I don't see that model HP listed on the HCL for supporting the GRID series: http://hcl.xensource.com/GPUPass-throughDeviceList.aspx



Rachel Berry Citrix Employees

Rachel Berry
  • 597 posts

Posted 03 January 2014 - 10:13 AM

 


 

 

Hi Victor, these two lines stand out:

Jan  3 03:33:46 XenServerCTX kernel: [   83.184426] NVRM: BAR1 is 128M @ 0xf800000000000000 (PCI:03ff:00:0b.0)
Jan  3 03:33:46 XenServerCTX kernel: [   83.184428] NVRM: This is a 64-bit BAR mapped above 4GB by the system

http://support.citrix.com/article/CTX139834

You need to disable high MMIO (Memory Mapped I/O above 4GB) in the BIOS. HP will be able to tell you the name of the setting for that server.

 

Best wishes,

Rachel
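
For anyone hitting the same error, the check Rachel describes can be scripted. The following is a minimal sketch, assuming the driver logs its BAR messages in the same format as the lines quoted in this thread; the function name bars_above_4g is made up for illustration and is not part of any NVIDIA or Citrix tool.

```python
import re

FOUR_GIB = 1 << 32  # the 4 GB boundary the driver complains about

# Matches driver messages of the form seen in this thread, e.g.
# "NVRM: BAR1 is 128M @ 0xf800000000000000 (PCI:03ff:00:0b.0)"
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<index>\d+) is (?P<size>\S+) @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<device>[^)]+)\)"
)


def bars_above_4g(log_text):
    """Return (device, bar_index, address) for each logged BAR at or above 4 GiB."""
    hits = []
    for match in BAR_LINE.finditer(log_text):
        address = int(match.group("addr"), 16)
        if address >= FOUR_GIB:
            hits.append((match.group("device"), int(match.group("index")), address))
    return hits


if __name__ == "__main__":
    # Assumes the log location used in this thread (/var/log/messages).
    with open("/var/log/messages", errors="replace") as log:
        for device, index, address in bars_above_4g(log.read()):
            print(f"{device}: BAR{index} at {address:#x} is mapped above 4 GiB")
```

Any line it prints points at a BAR that the BIOS has placed above 4 GB, which is the condition the NVRM messages reject.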



Victor Homocea Members

Victor Homocea
  • 66 posts

Posted 03 January 2014 - 10:56 AM


Thanks Tobias, that's why I wrote "almost".

Rachel, busy as a bee, I see... I tried to disable that feature, but it keeps showing up as enabled again. I thought XenServer was re-enabling it.

Will try to contact HP!

 

Thanks!



Tobias Kreidl CTP Member

Tobias Kreidl
  • 18,718 posts

Posted 03 January 2014 - 04:26 PM

Rachel is right -- similar to Dell boxes, the driver can't work with BARs mapped above the 4 GB boundary. Maybe HP has a workaround -- good luck!

-=Tobias



Victor Homocea Members

Victor Homocea
  • 66 posts

Posted 04 January 2014 - 11:54 AM

I found out that HP has a hardware switch on the mainboard that overrides the SL250s "PCI Express 64-bit BAR Support" option in BIOS. (DUUUH)

To disable the PCI Express 64-bit BAR Support option, you need to open the server, navigate to the system maintenance switches (17 in the attached picture, http://i.imgur.com/YpMmmVV.png) and move System Maintenance Switch 9 to the OFF position.

To check that it worked, enter the BIOS and confirm that the setting is disabled.

I personally will have to move my a%# to work in order to do this. Will let you know the outcome. Cheers!

 

UPDATE:

I moved the switch to the OFF position after unscrewing 2 side screws and removing the power and data cables for the HDD cage. Then, with a screwdriver, and not without effort, I flipped the 9th switch.

After this step, I went into the BIOS and DISABLED the option PCI Express 64-bit BAR Support from the Service Menu (Ctrl+A).

 

This is the outcome of the nvidia-smi command now:

 

[root@xen ~]# nvidia-smi

Sat Jan  4 17:09:00 2014
+------------------------------------------------------+
| NVIDIA-SMI 331.30     Driver Version: 331.30         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K2             On   | 0000:0A:00.0     Off |                  Off |
| N/A   32C    P8    28W / 117W |     10MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K2             On   | 0000:0B:00.0     Off |                  Off |
| N/A   36C    P8    28W / 117W |     10MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
 
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+
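
If you want to confirm this from a script rather than by eye, here is a rough sketch that pulls the GPU index, name and bus ID out of a table like the one above. It assumes the table layout printed by this driver version (331.30); other nvidia-smi versions may format the rows differently, and gpus_in_table is just an illustrative name.

```python
import re
import sys

# Matches the first row of each GPU entry in the table layout shown above, e.g.
# "|   0  GRID K2             On   | 0000:0A:00.0     Off |                  Off |"
GPU_ROW = re.compile(
    r"^\|\s+(?P<index>\d+)\s+(?P<name>.+?)\s+(?:On|Off)\s+\|"
    r"\s+(?P<bus_id>[0-9A-Fa-f:.]+)\s"
)


def gpus_in_table(smi_output):
    """Return (index, name, bus_id) for every GPU row found in nvidia-smi table output."""
    gpus = []
    for line in smi_output.splitlines():
        match = GPU_ROW.match(line)
        if match:
            gpus.append(
                (int(match.group("index")), match.group("name"), match.group("bus_id"))
            )
    return gpus


if __name__ == "__main__":
    # Usage: nvidia-smi | python3 this_script.py
    found = gpus_in_table(sys.stdin.read())
    if not found:
        print("No GPUs listed; the driver is probably still not loaded.")
    for index, name, bus_id in found:
        print(f"GPU {index}: {name} at {bus_id}")
```

On the failure message from the first post it finds no rows at all, so an empty result is a reasonable signal that the driver still cannot see the cards.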
 

I would appreciate it if you would create/update a CTX document with this information. For me it was a pain in the a$$.

 

Thanks for the coaching session! (happy)


Best Answer

Rachel Berry Citrix Employees

Rachel Berry
  • 597 posts

Posted 05 January 2014 - 02:16 PM

Thanks Victor for taking the time. I'll make sure we produce a CTX article or similar that captures your experience.
Thank you!
Rachel

Mike Larwood Members

Mike Larwood
  • 44 posts

Posted 05 February 2014 - 09:59 AM

Rather helpfully the XenServer HCL and the nVidia GRID compatible server list seem to be somewhat different...



Nathan Biden Members

Nathan Biden
  • 22 posts

Posted 15 May 2014 - 08:25 PM

http://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay?javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken&javax.portlet.prp_ba847bafb2a2d782fcbb0710b053ce01=wsrp-navigationalState%3DdocId%253Demr_na-c03745865-5%257CdocLocale%253D%257CcalledBy%253D&javax.portlet.tpst=ba847bafb2a2d782fcbb0710b053ce01&ac.admitted=1400181448795.876444892.199480143

 

 

That's HP's Official Article on the steps. We did this on a SL270s.
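
After changing the switch and the BIOS option as described in this thread, one way to double-check from the host is to look at where the kernel has placed each PCI region. The sketch below is an assumption-laden example, not an HP or Citrix procedure: it reads the standard Linux sysfs `resource` file for a device (one line per region, with start, end and flags in hex) and reports regions that start at or above 4 GiB. The bus addresses are the ones from the nvidia-smi output earlier in the thread, and regions_above_4g is a hypothetical helper name.

```python
from pathlib import Path

FOUR_GIB = 1 << 32


def regions_above_4g(resource_text):
    """Return (region_index, start, end) for populated regions starting at or above 4 GiB.

    Each line of a sysfs `resource` file holds three hex fields: start, end, flags.
    Unused regions are reported as all zeros and are skipped.
    """
    found = []
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(field, 16) for field in fields)
        if end == 0:  # unused region
            continue
        if start >= FOUR_GIB:
            found.append((index, start, end))
    return found


if __name__ == "__main__":
    # Bus addresses taken from the nvidia-smi output in this thread; adjust for your host.
    for bdf in ("0000:0a:00.0", "0000:0b:00.0"):
        path = Path("/sys/bus/pci/devices") / bdf / "resource"
        if not path.exists():
            print(f"{bdf}: no sysfs entry found")
            continue
        high = regions_above_4g(path.read_text())
        if not high:
            print(f"{bdf}: all populated regions are below 4 GiB")
        for index, start, end in high:
            print(f"{bdf}: region {index} spans {start:#x}-{end:#x}, above 4 GiB")
```

Note that the region index is the line position in the sysfs file, which does not necessarily match the BAR numbering the NVIDIA driver uses in its log messages.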