
Bug#1107746: Fwd: gcc-14-offload-nvptx: offloading to gpu disabled if #pragma omp requires unified_shared_memory is used (it worked before on the same hardware)



Hi Tobias.

First off, many thanks for getting back to me; it helped me greatly to
avoid wasting time on something that was out of my control.

I am getting back to you because gcc 15 is now in debian sid, as is
version 550.163.01 of the open nvidia kernel drivers.  I just wanted to
confirm that unified shared memory does indeed work out of the box on a
machine running debian sid, using the stock kernel 6.16.12+deb14-amd64,
the nvidia-open-kernel modules 550.163.01-3 and the packaged
gcc-15-offload-nvptx in debian sid, with a reasonably recent nvidia gpu.

While this is a very welcome step forward, it may be useful to report a
nasty problem: if the GPU memory is exceeded, instead of the user-space
code simply failing to run or crashing, a kernel oops is triggered, the
process remains as an unremovable zombie and the GPU memory is not
released (basically forcing a reboot sooner rather than later...).

I attach here a simple test code (source attached) that triggers the
kernel oops on my laptop if I set NX to a value larger than 80000000; it
works fine here with NX up to 60000000.  Otherwise it hangs forever,
unkillable, i.e. if killed it leaves an unremovable <defunct> process
hanging anyway and resources are not freed.  I am also filing a bug
report against the nvidia-open-kernel-dkms package, so the package
maintainers might forward it to you.  Or maybe not, since the problem
appears to be not gcc's fault but the nvidia module's.  In any case, I
thought this might be a useful piece of info for you.

Thanks again for your previous help.  If you want, please let me know
whether there is some other simple test I can run on my machine to
locate the problem more precisely.

Bye
Giacomo

On Mon, 16 Jun 2025, Tobias Burnus wrote:

Subject:
Re: gcc-14-offload-nvptx: offloading to gpu disabled if #pragma omp requires
unified_shared_memory is used (it worked before on the same hardware)
Date:
Mon, 16 Jun 2025 10:50:30 +0200
From:
Tobias Burnus <burnus@net-b.de>
To:
1107746@bugs.debian.org

      I found out that in the current debian gcc-14-offload-nvptx and
      gcc-13-offload-nvptx, if I compile a code that requires
      unified_shared_memory and uses openmp to offload to gpu, the code
      is never run on the gpu.  It does compile the offload code, but
      then it is never executed on the gpu.


That's to be expected for (at least) GCC 13 and GCC 14.

The OpenMP spec states that 'available devices' must be
'accessible' and 'supported'. And the latter is defined
(glossary, here from 6.0):

"supported device - The host device or any non-host device supported
by the implementation, including any device-related requirements
specified by the requires directive."


Thus, if you specify

omp requires unified_shared_memory

and either the device or the implementation does not support
unified shared memory, all unsupported devices are removed such
that only the host is left (host fallback).
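
For example, a quick way to see whether this host fallback has kicked in
is something like the following (a minimal, untested sketch of mine,
assuming the code is built with -fopenmp and the nvptx offload compiler
installed):

#include <stdio.h>
#include <omp.h>

#pragma omp requires unified_shared_memory

int main (void)
{
  /* 0 if the requirement removed all non-host devices.  */
  printf ("available non-host devices: %d\n", omp_get_num_devices ());

  int on_host = 1;
  #pragma omp target map(tofrom: on_host)
    on_host = omp_is_initial_device ();

  printf ("target region ran on %s\n",
          on_host ? "the host (fallback)" : "a GPU");
  return 0;
}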

In some old GCC versions, '#pragma omp requires' was simply
ignored (warning with -Wunknown-pragmas, implied by -Wall).

For some versions, requiring USM would give an error.

I think since GCC 13, the host-fallback mechanism is at work,
printing a warning at runtime with GOMP_DEBUG=1 if a device
cannot fulfill the requirement.

* * *

Since GCC 15, USM is supported under the following conditions:

https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html

As this is about an Nvidia GPU:

"* OpenMP code that has a requires directive with self_maps or
unified_shared_memory runs on nvptx devices if and only if all
of those support the pageableMemoryAccess property;⁵ otherwise,
all nvptx device are removed from the list of available devices
(“host fallback”)."

(5) https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements


Which is fulfilled by [cf. (5)]:

"Linux HMM requires Linux kernel version 6.1.24+, 6.2.11+ or 6.3+, devices
with compute capability 7.5 or higher and a CUDA driver version 535+
installed with Open Kernel Modules." The pageableMemoryAccess is trueon,
e.g., the Frontier supercomputer
but also on my Laptop (compute capability 8.6, Ampere, meanwhile a
6.15 kernel), but admittedly we had some issues with Debian 12 Bookworm
and an Ada (8.9) card with the current 6.1.140 kernel (>= 6.1.24+) and
a recent open-kernel driver, even though it should have worked according
to the spec.

You can check this by something like:

#include <cuda.h>   /* CUDA driver API; link with -lcuda */

int main (void)
{
  CUresult res;
  int n;
  res = cuInit (0);
  res = cuDeviceGetCount (&n);
  for (int dev = 0; dev < n; ++dev)
    {
      int val;
      __builtin_printf ("============== Device %d =================\n", dev);
      /* 1 if the device can coherently access pageable host memory
         (plain malloc'ed memory), i.e. what USM/HMM needs.  */
      res = cuDeviceGetAttribute (&val,
                                  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS,
                                  dev);
      __builtin_printf ("Device %d: pageableMemoryAccess: %d\n", dev, val);
    }
  return 0;
}
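
(To build this snippet, something like 'gcc check-pma.c -lcuda' should be
enough, possibly adding -I/usr/local/cuda/include if cuda.h is not in the
default include path; check-pma.c is just a placeholder file name.)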

* * *

SOLUTION:

* For full USM support, your system needs to support
pageableMemoryAccess (at least effectively for the devices involved)


If it does, you have two choices:

* Using GCC 15 from experimental, which supports USM, cf.
https://gcc.gnu.org/gcc-15/changes.html and
https://gcc.gnu.org/projects/gomp/#omp5.0 and
https://tracker.debian.org/pkg/gcc-15

* Using (any) older GCC, but avoiding
omp requires unified_shared_memory.

The difference between the two solutions:

- With the requirement, all maps are 'self maps', i.e.
no data is actually copied.

- Without the requirement, data is copied, but as, e.g.,
pointer members of structs still point to the host memory,
accessing those from the device will also work (see the small
sketch below).
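
To illustrate that second point, a minimal, untested sketch of mine (not
from the report; it assumes GCC 15 offloading to a device with
pageableMemoryAccess): the struct is copied by the implicit map, but its
pointer member keeps the host address, which the device dereferences
directly:

#include <stdio.h>
#include <stdlib.h>

/* Note: no 'omp requires unified_shared_memory' in this variant.  */
struct vec { long n; double *data; };

int main (void)
{
  struct vec v;
  v.n = 1000;
  v.data = (double *) calloc (v.n, sizeof (double));

  double sum = 0.0;
  /* 'v' is implicitly mapped (copied) to the device; v.data in the
     device copy still holds a host address, which only works if the
     device can access pageable host memory.  */
#pragma omp target teams distribute parallel for reduction(+:sum)
  for (long i = 0; i < v.n; i++)
    {
      v.data[i] = 1.0;
      sum += v.data[i];
    }

  printf ("sum = %g (expected %ld)\n", sum, v.n);
  free (v.data);
  return 0;
}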

The USM support (HMM) works as follows:
If you access memory on the GPU that is not directly accessible
(= most host memory, unless you have e.g. a Grace-Hopper), a
memory-page fault is triggered and the Linux kernel (+ Nvidia
kernel drivers) moves the page to device-accessible memory.
Likewise on the way back from the device to host-accessible
memory.

* * *

If the system does not support pageableMemoryAccess, but at least
managedMemory, you can access such memory (only) from the device.
If you are careful, this will work - but, obviously, the compiler
cannot regard such a system as supporting USM. Obtaining such memory
can be done using the CUDA-runtime routines for pinned and managed
memory.
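
A rough, untested sketch of mine for that managed-memory route (it
assumes the CUDA runtime with its cudaMallocManaged allocator, and that
libgomp accepts the resulting pointer via is_device_ptr; cudaMallocHost
would be the analogous pinned-memory allocator):

#include <stdio.h>
#include <cuda_runtime.h>   /* link with -lcudart */

int main (void)
{
  long n = 1000;
  double *a;

  /* Managed memory: usable from host and device even when
     pageableMemoryAccess is not supported.  */
  cudaMallocManaged ((void **) &a, n * sizeof (double), cudaMemAttachGlobal);

  for (long i = 0; i < n; i++)
    a[i] = 1.0;

  double sum = 0.0;
  /* Pass the managed pointer through as-is instead of mapping it.  */
#pragma omp target teams distribute parallel for is_device_ptr(a) reduction(+:sum)
  for (long i = 0; i < n; i++)
    sum += a[i];

  printf ("sum = %g\n", sum);
  cudaFree (a);
  return 0;
}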

--
_________________________________________________________________

Giacomo Mulas <giacomo.mulas@inaf.it>
_________________________________________________________________

INAF - Osservatorio Astronomico di Cagliari
via della scienza 5 - 09047 Selargius (CA)

tel.   +39 070 71180247
mob. : +39 329  6603810
_________________________________________________________________

"every year keeps getting shorter, never seem to find the time
 plans that either come to naught, or half a page of scribbled lines
 the time is gone, the song is over, thought I'd something more to say"
                         (Pink Floyd)
_________________________________________________________________
2025-10-23T14:36:42.388058+02:00 capitanata kernel: Oops: general protection fault, probably for non-canonical address 0x6b6b61042b6b6b48: 0000 [#1] SMP NOPTI
2025-10-23T14:36:42.388078+02:00 capitanata kernel: CPU: 1 UID: 0 PID: 16558 Comm: UVM GPU1 BH Tainted: P        W  OE       6.16.12+deb14-amd64 #1 PREEMPT(lazy)  Debian 6.16.12-1 
2025-10-23T14:36:42.388080+02:00 capitanata kernel: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
2025-10-23T14:36:42.388080+02:00 capitanata kernel: Hardware name: Dell Inc. XPS 15 9500/0RHXRG, BIOS 1.39.0 08/05/2025
2025-10-23T14:36:42.388081+02:00 capitanata kernel: RIP: 0010:__migrate_device_pages+0xab/0xb00
2025-10-23T14:36:42.388082+02:00 capitanata kernel: Code: 8d 1c 08 49 89 dd a8 01 0f 84 0d 01 00 00 48 89 c2 48 83 e2 c0 48 01 ca 48 85 db 0f 84 af 00 00 00 48 85 d2 0f 84 f6 00 00 00 <48> 8b 43 08 a8 01 0f 85 8a 06 00 00 0f 1f 44 00 00 4c 8b 42 08 49
2025-10-23T14:36:42.388084+02:00 capitanata kernel: RSP: 0018:ffffd22da86b3300 EFLAGS: 00010202
2025-10-23T14:36:42.388085+02:00 capitanata kernel: RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b61042b6b6b40 RCX: fffff598c0000000
2025-10-23T14:36:42.388086+02:00 capitanata kernel: RDX: 6b6b61042b6b6b40 RSI: ffff8baa0d6b40d0 RDI: ffff8baa0d6b30d0
2025-10-23T14:36:42.388086+02:00 capitanata kernel: RBP: ffff8baa0d6b30d0 R08: 6b6b6b6b6b6b6b40 R09: 0000000000000000
2025-10-23T14:36:42.388087+02:00 capitanata kernel: R10: 0000000000000000 R11: 0000000000003a31 R12: 0000000000000000
2025-10-23T14:36:42.388106+02:00 capitanata kernel: R13: 6b6b61042b6b6b40 R14: ffff8baa0d6b40d0 R15: 0000000000000000
2025-10-23T14:36:42.388107+02:00 capitanata kernel: FS:  0000000000000000(0000) GS:ffff8bb490c05000(0000) knlGS:0000000000000000
2025-10-23T14:36:42.388107+02:00 capitanata kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2025-10-23T14:36:42.388108+02:00 capitanata kernel: CR2: 000055ed1bc6c568 CR3: 00000002a822c005 CR4: 00000000007706f0
2025-10-23T14:36:42.388109+02:00 capitanata kernel: PKRU: 55555554
2025-10-23T14:36:42.388110+02:00 capitanata kernel: Call Trace:
2025-10-23T14:36:42.388110+02:00 capitanata kernel:  <TASK>
2025-10-23T14:36:42.388111+02:00 capitanata kernel:  hmm_va_block_evict_chunks+0x147/0x250 [nvidia_uvm]
2025-10-23T14:36:42.388112+02:00 capitanata kernel:  uvm_va_block_evict_chunks+0x321/0x730 [nvidia_uvm]
2025-10-23T14:36:42.388112+02:00 capitanata kernel:  ? __pfx_find_and_retain_va_block_to_evict+0x10/0x10 [nvidia_uvm]
2025-10-23T14:36:42.388113+02:00 capitanata kernel:  evict_root_chunk+0xe2/0x290 [nvidia_uvm]
2025-10-23T14:36:42.388113+02:00 capitanata kernel:  pick_and_evict_root_chunk+0x3e/0x140 [nvidia_uvm]
2025-10-23T14:36:42.388114+02:00 capitanata kernel:  uvm_pmm_gpu_alloc+0x5cb/0x690 [nvidia_uvm]
2025-10-23T14:36:42.388115+02:00 capitanata kernel:  block_populate_pages+0x7f5/0x16e0 [nvidia_uvm]
2025-10-23T14:36:42.388115+02:00 capitanata kernel:  ? uvm_va_block_make_resident_copy+0x1af/0x3d0 [nvidia_uvm]
2025-10-23T14:36:42.388116+02:00 capitanata kernel:  uvm_va_block_make_resident_copy+0x1af/0x3d0 [nvidia_uvm]
2025-10-23T14:36:42.388129+02:00 capitanata kernel:  uvm_va_block_service_copy+0xdc/0x310 [nvidia_uvm]
2025-10-23T14:36:42.388130+02:00 capitanata kernel:  uvm_hmm_va_block_service_locked+0x238/0x450 [nvidia_uvm]
2025-10-23T14:36:42.388131+02:00 capitanata kernel:  uvm_va_block_service_locked+0x9c/0x320 [nvidia_uvm]
2025-10-23T14:36:42.388131+02:00 capitanata kernel:  service_fault_batch_block_locked+0x959/0xb20 [nvidia_uvm]
2025-10-23T14:36:42.388132+02:00 capitanata kernel:  ? service_fault_batch_dispatch.isra.0+0x461/0x5d0 [nvidia_uvm]
2025-10-23T14:36:42.388132+02:00 capitanata kernel:  service_fault_batch_dispatch.isra.0+0x461/0x5d0 [nvidia_uvm]
2025-10-23T14:36:42.388133+02:00 capitanata kernel:  ? uvm_parent_gpu_canonical_address+0x37/0xb0 [nvidia_uvm]
2025-10-23T14:36:42.388133+02:00 capitanata kernel:  ? parse_fault_entry_common+0x188/0x1e0 [nvidia_uvm]
2025-10-23T14:36:42.388134+02:00 capitanata kernel:  service_fault_batch+0xbd/0x390 [nvidia_uvm]
2025-10-23T14:36:42.388134+02:00 capitanata kernel:  uvm_gpu_service_replayable_faults+0x141/0x12d0 [nvidia_uvm]
2025-10-23T14:36:42.388134+02:00 capitanata kernel:  ? schedule+0x27/0xd0
2025-10-23T14:36:42.388135+02:00 capitanata kernel:  ? schedule_timeout+0xbd/0x100
2025-10-23T14:36:42.388136+02:00 capitanata kernel:  replayable_faults_isr_bottom_half+0x5c/0x100 [nvidia_uvm]
2025-10-23T14:36:42.388136+02:00 capitanata kernel:  replayable_faults_isr_bottom_half_entry+0x97/0xd0 [nvidia_uvm]
2025-10-23T14:36:42.388137+02:00 capitanata kernel:  _main_loop+0x8f/0x150 [nvidia_uvm]
2025-10-23T14:36:42.388137+02:00 capitanata kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia_uvm]
2025-10-23T14:36:42.388139+02:00 capitanata kernel:  kthread+0xfc/0x240
2025-10-23T14:36:42.388139+02:00 capitanata kernel:  ? __pfx_kthread+0x10/0x10
2025-10-23T14:36:42.388140+02:00 capitanata kernel:  ret_from_fork+0x19a/0x1d0
2025-10-23T14:36:42.388140+02:00 capitanata kernel:  ? __pfx_kthread+0x10/0x10
2025-10-23T14:36:42.388141+02:00 capitanata kernel:  ret_from_fork_asm+0x1a/0x30
2025-10-23T14:36:42.388141+02:00 capitanata kernel:  </TASK>
2025-10-23T14:36:42.388142+02:00 capitanata kernel: Modules linked in: sd_mod hid_logitech_hidpp uhid vxlan xt_policy xt_mark xt_bpf xt_nat nft_nat nft_ct socwatch2_16(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_masq veth nvidia_uvm(OE) sep5(OE) snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device tun pax(OE) vboxnetadp(OE) vboxnetflt(OE) l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox vboxdrv(OE) ppp_generic xt_MASQUERADE slhc br_netfilter xfrm_interface bridge xfrm6_tunnel tunnel6 tunnel4 twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common xt_set ip_set serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 nft_chain_nat serpent_generic nf_nat blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common xt_addrtype des3_ede_x86_64 xfrm_user xfrm_algo des_generic libdes camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 xcbc openafs(POE) md4 scsi_transport_iscsi ccm rdma_ucm ib_uverbs rdma_cm iw_cm
2025-10-23T14:36:42.388144+02:00 capitanata kernel:  bnx2fc ib_cm cnic ib_core uio fcoe 8021q libfcoe garp stp libfc llc mrp scsi_transport_fc qrtr rfcomm cmac algif_hash algif_skcipher af_alg bnep nfnetlink_log at24 overlay spd5118 nf_conntrack_netlink uinput nft_limit xt_limit xt_LOG nf_log_syslog xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat binfmt_misc nf_tables squashfs loop zfs(POE) spl(OE) nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx nvidia_drm(OE) snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_hda_codec_hdmi soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks snd_soc_acpi crc8 soundwire_bus snd_soc_sdca snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_soc_core snd_ctl_led snd_hda_codec_realtek snd_compress snd_hda_codec_generic mei_pxp snd_hda_scodec_component snd_pcm_dmaengine
2025-10-23T14:36:42.388145+02:00 capitanata kernel:  drm_ttm_helper intel_uncore_frequency intel_uncore_frequency_common nvidia_modeset(OE) x86_pkg_temp_thermal intel_powerclamp dell_pc platform_profile mei_hdcp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi uvcvideo snd_hda_codec coretemp kvm_intel snd_hda_core snd_hwdep intel_rapl_msr dell_laptop snd_pcm_oss videobuf2_vmalloc kvm snd_mixer_oss uvc videobuf2_memops btusb videobuf2_v4l2 dell_wmi snd_pcm btrtl btintel dell_smbios videodev btbcm hid_sensor_als joydev btmtk snd_timer dcdbas irqbypass hid_sensor_trigger rapl evdev intel_cstate intel_uncore bluetooth dell_wmi_sysman dell_wmi_ddv videobuf2_common snd hid_sensor_iio_common nls_ascii industrialio_triggered_buffer dell_smm_hwmon dell_wmi_descriptor firmware_attributes_class pcspkr mei_me nls_cp437 mc crc16 wmi_bmof intel_wmi_thunderbolt ee1004 soundcore kfifo_buf vfat industrialio fat i915 mei iwlmvm mac80211 libarc4 drm_buddy int3403_thermal ttm iwlwifi drm_display_helper processor_thermal_device_pci_legacy button processor_thermal_device
2025-10-23T14:36:42.388146+02:00 capitanata kernel:  processor_thermal_wt_hint platform_temperature_control intel_pmc_core cec processor_thermal_rfim cfg80211 pmt_telemetry rc_core processor_thermal_rapl intel_rapl_common pmt_class processor_thermal_wt_req intel_hid processor_thermal_power_floor drm_client_lib intel_pmc_ssram_telemetry int3400_thermal acpi_thermal_rel processor_thermal_mbox intel_vsec int340x_thermal_zone sparse_keymap acpi_tad acpi_pad drm_kms_helper ac intel_soc_dts_iosf rfkill i2c_algo_bit intel_pch_thermal nvidia(OE) typec_displayport msr i2c_dev parport_pc ppdev lp parport dm_multipath scsi_mod efi_pstore scsi_common configfs nfnetlink ip_tables x_tables autofs4 r8153_ecm cdc_ether usbnet r8152 mii libphy mdio_bus crc32c_cryptoapi btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod usbhid hid_sensor_hub intel_ishtp_hid dm_mod hid_multitouch ucsi_acpi hid_generic typec_ucsi typec iTCO_wdt i2c_hid_acpi intel_pmc_bxt i2c_hid rtsx_pci_sdmmc ghash_clmulni_intel
2025-10-23T14:36:42.388147+02:00 capitanata kernel:  roles iTCO_vendor_support mmc_core watchdog hid nvme sha512_ssse3 xhci_pci sha1_ssse3 drm psmouse nvme_core xhci_hcd aesni_intel video serio_raw thunderbolt rtsx_pci usbcore nvme_keyring wmi i2c_i801 nvme_auth intel_lpss_pci battery intel_ish_ipc intel_lpss i2c_smbus intel_ishtp idma64 usb_common efivarfs
2025-10-23T14:36:42.388147+02:00 capitanata kernel: ---[ end trace 0000000000000000 ]---
2025-10-23T14:36:42.388168+02:00 capitanata kernel: RIP: 0010:__migrate_device_pages+0xab/0xb00
2025-10-23T14:36:42.388178+02:00 capitanata kernel: Code: 8d 1c 08 49 89 dd a8 01 0f 84 0d 01 00 00 48 89 c2 48 83 e2 c0 48 01 ca 48 85 db 0f 84 af 00 00 00 48 85 d2 0f 84 f6 00 00 00 <48> 8b 43 08 a8 01 0f 85 8a 06 00 00 0f 1f 44 00 00 4c 8b 42 08 49
2025-10-23T14:36:42.388180+02:00 capitanata kernel: RSP: 0018:ffffd22da86b3300 EFLAGS: 00010202
2025-10-23T14:36:42.388180+02:00 capitanata kernel: RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b61042b6b6b40 RCX: fffff598c0000000
2025-10-23T14:36:42.388181+02:00 capitanata kernel: RDX: 6b6b61042b6b6b40 RSI: ffff8baa0d6b40d0 RDI: ffff8baa0d6b30d0
2025-10-23T14:36:42.388182+02:00 capitanata kernel: RBP: ffff8baa0d6b30d0 R08: 6b6b6b6b6b6b6b40 R09: 0000000000000000
2025-10-23T14:36:42.388182+02:00 capitanata kernel: R10: 0000000000000000 R11: 0000000000003a31 R12: 0000000000000000
2025-10-23T14:36:42.388183+02:00 capitanata kernel: R13: 6b6b61042b6b6b40 R14: ffff8baa0d6b40d0 R15: 0000000000000000
2025-10-23T14:36:42.388184+02:00 capitanata kernel: FS:  0000000000000000(0000) GS:ffff8bb490d85000(0000) knlGS:0000000000000000
2025-10-23T14:36:42.388185+02:00 capitanata kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2025-10-23T14:36:42.388185+02:00 capitanata kernel: CR2: 0000559e01a9cdb8 CR3: 00000003077f2006 CR4: 00000000007706f0
2025-10-23T14:36:42.388186+02:00 capitanata kernel: PKRU: 55555554
2025-10-23T14:38:30.440175+02:00 capitanata kernel: show_signal_msg: 14 callbacks suppressed
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#ifdef _OPENMP
#include <omp.h>
#endif
#include <complex.h>
/* #define NX 10000000 */

#pragma omp requires unified_shared_memory
typedef __complex__ double dcomplex;

int main(void)
{
  // double vecA[NX],vecB[NX],vecC[NX];
  long NX = 80000000;
  dcomplex *vecA = (dcomplex *) calloc(NX, sizeof(dcomplex));
  dcomplex *vecB = (dcomplex *) calloc(NX, sizeof(dcomplex));
  dcomplex *vecC = (dcomplex *) calloc(NX, sizeof(dcomplex));
  double pihalf = asin(1.0);
  double r = 0.9999;

/* Initialization of vectors */
//#pragma omp target enter data map(alloc:vecA[0:NX])
//#pragma omp target enter data map(alloc:vecB[0:NX])
  //#pragma omp target map(from:vecA[0:NX],vecB[0:NX])
#pragma omp target teams distribute parallel for simd
  for (long i = 0; i < NX; i++) {
     vecA[i] = pow(r, i);
     vecB[i] = -cexp(2.0*I*pihalf);
  }

/* dot product of two vectors */
//#pragma omp target data enter map(alloc:vecC[0:NX])
  //#pragma omp target map(tofrom:vecA[0:NX],vecB[0:NX]) map(from:vecC[0:NX])
  //#pragma omp target enter data map(alloc:vecC[0:NX])
#pragma omp target teams distribute parallel for simd
  for (long i = 0; i < NX; i++) {
     vecC[i] = vecA[i] * vecB[i];
  }
  //#pragma omp target exit data map(delete:vecA[0:NX])
  //#pragma omp target exit data map(from:vecB[0:NX])

  double sum = 0.0;
  /* calculate the sum */
  //#pragma omp target map(to:vecC[0:NX]) map(tofrom:sum)
  //#pragma omp target enter data map(to:sum)
#pragma omp target teams distribute parallel for simd reduction(+:sum)
  for (long i = 0; i < NX; i++) {
    sum += vecC[i];
  }
  //#pragma omp target exit data map(delete:vecC[0:NX]) map(from:sum)

  printf("The sum is: %8.6f \n", sum);
  printf("vecB[0] = %lg + %lg i \n", __real__(vecB[0]), __imag__(vecB[0]));
  free(vecA);
  free(vecB);
  free(vecC);
  return 0;
}
