How to analyze network disconnections shown in the system log (BC transaction SM21)?
System log (transaction SM21) shows network disconnections, e.g.
- Q04 Connection to user 2642 (EXTRACAO), terminal 38 (iguacucp125) lost
- Delete session 001 after error 061
- Operating system call recv failed (error no. 10054)
First of all, keep in mind that a "network disconnection" in the system log (transaction SM21) or in a developer trace is not always meaningful. A typical case is an operating system error 10061 raised while trying to connect, e.g., to the gateway of an SAP system that has crashed: the connection to the sapgwXX service of the remote host obviously cannot be established, because the gateway is not running there. If it still makes sense to analyze the disconnection, the following sections describe several areas to check.
SAP software: SAPgui and kernel
- Make sure that you are using the latest SAPGUI available.
- Make sure that the SAP kernel on your SAP application servers is up to date (at least not older than half a year).
- This is the starting point to eliminate that error. There are other possible causes that are to be checked if the issue persists after updating to the latest kernel and GUI patches, namely:
- A user with authorization for transaction SM04 can delete a session of any user; this will generate that message in the syslog and tracefiles.
- If a user who is already logged on to the system logs on again with the same user, a pop-up window offers three options:
- Continue with this logon and end any other logon (then his previous session in the system will be ended, and the information message "Delete session XXX after error 061" will be issued)
- Continue with this logon without ending any other logons
- Terminate this logon
- Another possibility is a problem with the SAPGUI itself. In this case you should see error messages after activating the frontend trace.
Operating System support level: workstations and server(s)
Ensure that your systems are patched to the latest support pack level, and that the network card drivers, etc. are up to date as well.
Check your hostname configuration ('hosts' files in the workstations, etc.).
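A quick way to compare the hostname configuration of several machines is to resolve the same name on each of them and look at the differences. The following is only a minimal sketch using the Python standard library; the function name `check_host_resolution` is hypothetical, and the check covers IPv4 forward and reverse lookups only.

```python
import socket


def check_host_resolution(name: str) -> dict:
    """Resolve a host name to IPv4 addresses, then map each address back to a name."""
    result = {"name": name, "addresses": [], "reverse": {}, "error": None}
    try:
        # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses)
        _, _, addresses = socket.gethostbyname_ex(name)
    except OSError as exc:  # covers socket.gaierror and socket.herror
        result["error"] = str(exc)
        return result
    result["addresses"] = addresses
    for address in addresses:
        try:
            # gethostbyaddr returns (host_name, aliases, addresses)
            result["reverse"][address] = socket.gethostbyaddr(address)[0]
        except OSError:
            result["reverse"][address] = None  # no reverse entry found
    return result
```

Running, for example, `check_host_resolution("iguacucp125")` (the terminal name from the syslog example above) on both the application server and the workstation should give the same addresses. Differing results usually point to an inconsistent 'hosts' file or DNS entry.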
Parametrization of SAP system
Sometimes disconnections are not a failure, but a feature offered by the SAP software to avoid the waste of resources due to disconnections caused by users closing the SAPgui without the proper log off, etc. The lines below explain how this mechanism works.
The kernel regularly checks whether a session is still in use and removes any session that is no longer in use. The check is simple: if the frontend has not sent any data to the application server for "rdisp/keepalive" seconds, the application server sends a short "ping" message to the frontend. The frontend should answer with "pong" within the next 40 seconds; otherwise the application server assumes that the link is dead and releases all resources held for the corresponding user. An error line "DP_CONN_DEAD" then appears in the trace file dev_disp. This usually occurs when a user switches off their PC without carrying out the shutdown procedure. A value of "rdisp/keepalive = 0" means that no check occurs.
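The decision logic of this check can be modelled in a few lines. The sketch below is not the kernel implementation; it only mirrors the behaviour described above. The names `Session` and `keepalive_check` are hypothetical, and the assumption that any data received after the ping counts as the "pong" is a simplification.

```python
from dataclasses import dataclass
from typing import Optional

PONG_TIMEOUT_SECONDS = 40  # response window stated in the text


@dataclass
class Session:
    last_data_from_frontend: float        # time the frontend last sent data
    ping_sent_at: Optional[float] = None  # time of the pending ping, if any


def keepalive_check(session: Session, now: float, keepalive: int) -> str:
    """Return the action for one check: 'ok', 'send_ping', 'wait' or 'drop'."""
    if keepalive == 0:
        return "ok"  # rdisp/keepalive = 0 disables the check
    if session.ping_sent_at is not None:
        # A ping is outstanding: data received after it counts as the answer
        # (simplifying assumption of this sketch).
        if session.last_data_from_frontend > session.ping_sent_at:
            session.ping_sent_at = None
            return "ok"
        if now - session.ping_sent_at > PONG_TIMEOUT_SECONDS:
            return "drop"  # link assumed dead, resources are released
        return "wait"
    if now - session.last_data_from_frontend >= keepalive:
        session.ping_sent_at = now
        return "send_ping"
    return "ok"
```

With a hypothetical setting of rdisp/keepalive = 1200, a frontend that has been silent since second 0 is pinged at second 1200 and dropped once more than 40 further seconds pass without an answer.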
If the parameter "rdisp/gui_auto_logout" is set, the timeout applies to HTTP sessions as well as GUI sessions.
There are several situations that can cause a partner not to respond; if none of the above paragraphs explains your issue, one of the following may fit your case:
- Workstation issue: a hardware or driver issue (e.g. a broken network card, an old NI driver, an outdated operating system), a local firewall or antivirus that blocks the communication, an OS restriction that prevents the program (the SAPgui in our case) from using the network (e.g. User Account Control in Windows Vista or Windows Server 2008), the program not running, etc.
- Networking issue: a firewall placed between both parties that blocks the communication, or a hardware issue (e.g. a damaged cable or node, electromagnetic interference, etc.)
- Server issue (similar to the workstation issue)
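As a first step in telling these cases apart, a single TCP connection attempt from the affected machine already gives a hint. The following is a minimal sketch, not a replacement for a proper network analysis; the function name `probe_tcp_port` and the three-second timeout are assumptions made for illustration.

```python
import socket


def probe_tcp_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # a server program accepted the connection
    except ConnectionRefusedError:
        # On Windows this corresponds to error 10061 (WSAECONNREFUSED):
        # the host answered, but nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable networks, etc.
        return f"error: {exc}"
```

A result of "refused" suggests that the host is reachable but the server program is not running (or that a firewall actively rejects the connection), whereas "timeout" more often indicates packets being silently dropped on the way. For an SAP gateway the port would typically be 33XX for service sapgwXX; this mapping is the usual default and should be verified in your 'services' file.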
The key, then, is to determine the root cause of the issue. Of course, we will support you closely if a bug in the SAP software turns out to be the cause; but please understand that we need to work very closely with you, as we do not know your network configuration. It is advisable to involve your local networking team.
To further analyze why the frontend does not respond, schedule a detailed network analysis between your application server and the failing workstation, running until the issue arises again (if ever) or, at least, for some days (even weeks, depending on how often the issue occurs). This way we can decide whether networking issues can be ruled out as the root cause.
The NIPING tool is located in the executables directory of any SAP server. You can fetch the latest version of NIPING from the Service Marketplace or, if that is not possible, copy the binary from the executables directory of your server.
Operating System settings
The following are some typical errors for Microsoft Windows platforms:
- 10048 (WSAEADDRINUSE, SI_EPORT_INUSE) => Only one usage of each socket address (protocol/network address/port) is normally permitted.
- 10054 (WSAECONNRESET, SI_ECONN_BROKEN) => An existing connection was forcibly closed by the remote host.
- 10055 (WSAENOBUFS) => An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
- 10061 (WSAECONNREFUSED) => No connection could be made because the destination computer actively refused it, e.g. in the remote TCP port there is no server program running.
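When many syslog or trace lines have to be reviewed, the error list above can be turned into a small lookup. This is only an illustrative sketch: the table contains just the four codes listed here, the function name `explain_syslog_line` is hypothetical, and the pattern assumes the "(error no. NNNNN)" wording of the syslog example at the top of this page.

```python
import re
from typing import Optional

# Winsock error codes taken from the list above.
WINSOCK_ERRORS = {
    10048: ("WSAEADDRINUSE", "Only one usage of each socket address is normally permitted."),
    10054: ("WSAECONNRESET", "An existing connection was forcibly closed by the remote host."),
    10055: ("WSAENOBUFS", "The system lacked sufficient buffer space or a queue was full."),
    10061: ("WSAECONNREFUSED", "The destination computer actively refused the connection."),
}


def explain_syslog_line(line: str) -> Optional[str]:
    """Return a short explanation if the line carries a known error number."""
    match = re.search(r"error no\. (\d+)", line)
    if match is None:
        return None  # the line does not contain an error number
    code = int(match.group(1))
    if code not in WINSOCK_ERRORS:
        return None  # code is not in the table above
    name, text = WINSOCK_ERRORS[code]
    return f"{code} {name}: {text}"
```

For the example line "Operating system call recv failed (error no. 10054)" the function returns the WSAECONNRESET entry.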
Sometimes these errors occur because the default operating system settings are insufficient for your particular requirements. This would be the case, e.g., if a Java application needs to create a large number of threads in a very short period of time, each with one or more TCP/IP connections; you should then adjust the default values of the registry keys MaxUserPort and TCPTimedWaitDelay, otherwise you will get the aforementioned error 10055.
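A rough calculation shows why these two values matter. Every closed connection keeps its local port occupied for TCPTimedWaitDelay seconds, so the number of dynamic ports divided by that delay gives an upper bound for the sustained rate of new outgoing connections. The sketch below is a simplified estimate only. The figures 5000 and 240 are the defaults commonly documented for older Windows versions, and the first dynamic port 1025 is likewise an assumption; please verify all of them for your Windows release.

```python
def max_sustained_connection_rate(max_user_port: int,
                                  timed_wait_delay_s: int,
                                  first_dynamic_port: int = 1025) -> float:
    """Estimate new outgoing connections per second before ports run out.

    Simplified model: each connection blocks one dynamic port for the
    whole TIME_WAIT period after it is closed.
    """
    usable_ports = max_user_port - first_dynamic_port + 1
    if usable_ports <= 0 or timed_wait_delay_s <= 0:
        raise ValueError("port range and delay must be positive")
    return usable_ports / timed_wait_delay_s


# Assumed defaults of older Windows versions: MaxUserPort 5000, delay 240 s.
default_rate = max_sustained_connection_rate(5000, 240)
# Example of tuned values (hypothetical choice): MaxUserPort 65534, delay 30 s.
tuned_rate = max_sustained_connection_rate(65534, 30)
print(f"default: {default_rate:.1f}/s, tuned: {tuned_rate:.1f}/s")
```

Under these assumptions the default settings allow only about 16.6 new connections per second over a longer period, while the tuned example allows roughly 2150 per second.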
Also, we have found a lot of issues with some newer features such as the Scalable Networking Pack, a.k.a. SNP (TCP Chimney Offload, RSS, and NetDMA). In particular, we always recommend disabling the "TCP Chimney Offload" option on your NIC. To do so, run "netsh int ip set chimney DISABLED" from a command prompt; run "netsh int ip show chimney" to check its current status. Then reboot the system (this is mandatory!).
Even the "Media Sensing" feature can cause trouble. Note that this feature is disabled by default in a Windows Server 2003-based server cluster, so the DisableDHCPMediaSense registry entry has no effect there.