We are seeing a server hang in kbmMW 5.15.10 (TkbmMWDynamicLockFreeHashArray)

Viewing 3 reply threads
  • Author
    Posts
    • #58720
      VadimMest
      Participant

      Dear Kim,

      We are seeing a server hang in kbmMW 5.15.10. Our strongest suspicion is no longer our service code but the kbmMW lock-free infrastructure, specifically TkbmMWDynamicLockFreeHashArray / TkbmMWLockFreeHashArray.Contains, apparently in connection with FOngoingRequests.

      Our local source tree shows:

      kbmMW 5.15.10 Mar 29 2021: kbmMWServer.pas (line 5627)
      KBMMW_SUPPORT_LOCKFREE is enabled: kbmMW.inc (line 642)
      KBMMW_SUPPORT_FASTMRWSLOCK is enabled: kbmMWConfig.inc (line 36)

      Symptoms:


      the server process stays alive but stops responding;
      in Process Hacker, CLOSE_WAIT connections accumulate massively;
      one captured state was 1452 CLOSE_WAIT, 3 ESTABLISHED, 1 SYN_SENT: Process Hacker Network.txt (line 1)
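
      A minimal sketch for tallying these states from saved netstat output, so the CLOSE_WAIT growth can be tracked over time. The helper name count_tcp_states is hypothetical and is not part of kbmMW or Process Hacker; it assumes the Windows `netstat -ano` column layout (Proto, Local Address, Foreign Address, State, PID).

```python
from collections import Counter


def count_tcp_states(netstat_text, pid=None):
    """Tally TCP connection states from `netstat -ano` style text.

    Hypothetical helper: assumes each TCP row has exactly five
    whitespace-separated fields (Proto, Local, Foreign, State, PID).
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UDP rows (which have no State column).
        if len(fields) != 5 or fields[0].upper() != "TCP":
            continue
        state, owner = fields[3], fields[4]
        # Optionally restrict the tally to one process id.
        if pid is not None and owner != str(pid):
            continue
        counts[state] += 1
    return counts
```

      Feeding it the text of `netstat -ano -p tcp`, with pid set to the server's process id, returns a Counter keyed by state name, for example CLOSE_WAIT or ESTABLISHED.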

      We captured a full dump:

      dfFileService.exe.dmp

      What we found:

      the process had 1511 threads;
      most threads were stuck in a very similar pattern;
      the dominant pattern appears to go through kbmMWLockFree.TkbmMWLockFreeHashArray<Integer>.Contains.

      Relevant code path:

      TkbmMWOnGoingServiceRequests = class(TkbmMWDynamicLockFreeHashArray<boolean>): kbmMWServer.pas (line 6142)
      service instances are inserted into FOngoingRequests: kbmMWServer.pas (line 8128), kbmMWServer.pas (line 8155)
      and removed on return: kbmMWServer.pas (line 9457)
      TkbmMWDynamicLockFreeHashArray<T>.Contains: kbmMWLockFree.pas (line 2967)
      base TkbmMWLockFreeHashArray<T>.Contains: kbmMWLockFree.pas (line 1644)

      We now suspect this area because the dump does not primarily show file I/O waits (CreateFile/ReadFile/FindFirst/DeleteFile); instead, many threads converge on the lock-free hash-array path.

      Also, your own changelog seems highly relevant:

      Fixed TkbmMWDynamicLockFreeHashArray and siblings related to deadlock…
      kbmMWServer.pas (line 4161)

      And the same appears in the public release notes for 5.03.00:

      ANN: kbmMW Professional and Enterprise Edition v. 5.03.00 released!


      Question:
      Does this look like a known issue in 5.15.10 involving TkbmMWDynamicLockFreeHashArray / TkbmMWLockFreeHashArray.Contains, especially as used by FOngoingRequests in kbmMWServer? If so, what would be the safest workaround on this branch:

      disable KBMMW_SUPPORT_LOCKFREE,
      patch FOngoingRequests usage,
      or apply a specific fix in Contains / resize / GC generation handling?

    • #58721
      VadimMest
      Participant

      I tried this fix:

      // FTimingHash:=TkbmMWLockFreeHashArray32.Create(100);
      FTimingHash:=TkbmMWLockFreeHashArray32.Create(4096);

      It seems to be working: the server has been running for 3 days without any freezes.

    • #58726
      kimbomadsen
      Keymaster

      Hi,

      There have been several fixes to the lock-free hash array since 5.15, so I would recommend updating to a newer version of kbmMW.

      Best regards
      Kim/C4D

    • #58731
      VadimMest
      Participant

      I’d like to update kbmMW to a newer version, but you’re blocking sales for Russian users. So I’m forced to use version 5.15.
