
Release Notes v2.3.1

Summary

This version brings bug fixes and updates to our v2.3.0 release.

New features

  • [HG info]
    • Add support for CSV and JSON output formats
  • [HG/NA Perf Test]
    • Enable sizes to be passed using k/m/g qualifiers
  • [NA OFI]
    • Add tcp_rxm alias for tcp;ofi_rxm
    • Find CXI svc_id or vni if auth_key components have zeros (e.g., auth_key=0:0)
      • Add VNI index for SLINGSHOT_VNIS discovery as extra auth_key parameter
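The k/m/g size qualifiers mentioned for the perf tests can be pictured with a small parser. This is only a sketch: `parse_size_arg` is a hypothetical helper, not a Mercury function, and it assumes binary multiples (k = 1024), which the notes above do not state.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Mercury: parse "64", "4k", "16m", "1g".
 * Assumption: qualifiers are binary multiples (k = 2^10, m = 2^20, g = 2^30).
 * Returns 0 on success and -1 on malformed or overflowing input. */
int
parse_size_arg(const char *arg, uint64_t *out)
{
    char *end = NULL;
    unsigned long long value;
    unsigned int shift = 0;

    assert(out != NULL);
    if (arg == NULL || !isdigit((unsigned char) arg[0]))
        return -1; /* rejects empty strings, signs and leading spaces */

    errno = 0;
    value = strtoull(arg, &end, 10);
    if (errno == ERANGE)
        return -1;

    /* An optional single-letter qualifier may follow the digits */
    switch (tolower((unsigned char) *end)) {
        case 'k': shift = 10; end++; break;
        case 'm': shift = 20; end++; break;
        case 'g': shift = 30; end++; break;
        default: break;
    }
    if (*end != '\0')
        return -1; /* trailing characters such as "4kb" or "12x" */
    if (value > (UINT64_MAX >> shift))
        return -1; /* scaling would overflow 64 bits */

    *out = (uint64_t) value << shift;
    return 0;
}
```

With these assumptions, "4k" yields 4096 and "16M" yields 16777216, while inputs such as "4kb" or "-1" are rejected.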

Bug fixes

  • [HG/NA]
    • Fix potential race when checking secondary completion queue
  • [HG]
    • Prevent multiple threads from entering HG_Core_progress()
      • Add HG_ALLOW_MULTI_PROGRESS CMake option to control behavior (ON by default)
      • Disable NA_HAS_MULTI_PROGRESS if HG_ALLOW_MULTI_PROGRESS is ON
    • Fix expected operation count for handle to be atomic
      • Expected operation count can change if extra RPC payload must be transferred
    • Let poll events remain private to HG poll wait
      • Prevent a race when multiple threads call progress and HG_ALLOW_MULTI_PROGRESS is OFF
    • Separate internal list from user created list of handles
      • Address an issue where HG_Context_unpost() would unnecessarily wait
  • [HG Core]
    • Cache disabled response info in proc info
    • Add HG_Core_registered_disable(d)_response() routines
    • Refactor and optimize self RPC code path
    • Add additional logging of refcount/expected op count
    • Fixes for self RPCs with no response
  • [HG Util]
    • Prevent locking in hg_request_wait()
      • Concurrent progress on the same context in multi-threaded scenarios could complete another thread's request and leave that thread blocked in progress
  • [HG Perf]
    • Fix tests to be run in parallel with any communicator size
  • [HG Test]
    • Ensure affinity of class thread is set
    • Add concurrent multi RPC test
    • Add multi-progress test
    • Add multi-progress test with handle creation
    • Refactoring of unit test cleanup
  • [NA]
    • Fix memory leak in NA_Get_protocol_info()
  • [NA OFI]
    • Fix na_ofi_get_protocol_info() not returning opx protocol
      • Refactor na_ofi_getinfo() to account for NA_OFI_PROV_NULL type
      • Ensure there are no duplicated entries
    • Refactor parsing of init info strings and fix OPX parsing
    • Simplify parsing of some address strings
    • Bump default CQ size to have a maximum depth of 128k entries
    • Remove sockets as the only provider on macOS
    • Remove send after send tagged msg ordering
    • Ensure that rx_ctx_bits are not set if SEP is not used
    • Set CXI domain ops with Slingshot 2.2 to prevent potential memory corruption
  • [NA Perf]
    • Prevent tests from being run as parallel tests
  • [CMake]
    • Pass INSTALL_NAME_DIR through target properties
      • This fixes an issue seen on macOS where libraries would not be found using @rpath
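The [HG] fix that makes the expected operation count of a handle atomic can be illustrated with C11 atomics. The sketch below is not Mercury's implementation: `struct op_tracker` and its functions are hypothetical names, and it assumes the count is raised before the operation that would otherwise be the last one completes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker, not a Mercury type */
struct op_tracker {
    atomic_uint expected;  /* operations that must finish before completion */
    atomic_uint completed; /* operations finished so far */
};

void
op_tracker_init(struct op_tracker *t, unsigned int expected)
{
    assert(t != NULL);
    atomic_init(&t->expected, expected);
    atomic_init(&t->completed, 0);
}

/* Raise the expected count, e.g. when an extra payload transfer is needed.
 * Assumption: called before the currently last expected operation completes. */
void
op_tracker_expect_more(struct op_tracker *t, unsigned int extra)
{
    atomic_fetch_add(&t->expected, extra);
}

/* Record one finished operation; returns true when it was the last one */
bool
op_tracker_complete_one(struct op_tracker *t)
{
    unsigned int done = atomic_fetch_add(&t->completed, 1) + 1;

    return done == atomic_load(&t->expected);
}
```

Because both counters are atomic, a thread that raises the expected count and a thread that records a completion do not need a shared lock; only the thread that observes the final completion would go on to trigger the user callback.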

⚠ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
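
One possible way to apply this workaround from application code is to export the variable before the NA layer is initialized. This is a sketch only: `ensure_universe_size` is a hypothetical helper, the peer count is chosen by the caller, and it relies on POSIX `setenv()`. Setting the variable in the job launch environment would work as well.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Mercury: export FI_UNIVERSE_SIZE unless
 * the user already set it. Assumption: this runs before the NA/OFI plugin is
 * initialized so that libfabric picks the value up.
 * Returns 0 on success and -1 on error, like setenv(). */
int
ensure_universe_size(unsigned int expected_peers)
{
    char value[16];

    assert(expected_peers > 0);
    snprintf(value, sizeof(value), "%u", expected_peers);

    /* overwrite = 0 keeps any value already present in the environment */
    return setenv("FI_UNIVERSE_SIZE", value, 0);
}
```

Calling `ensure_universe_size(1024)` early in `main()` leaves a user-provided setting untouched and otherwise exports a value of 1024.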

Last update: November 6, 2023