Saturday, January 29, 2011

TCP: Address already in use exception - possible causes for client port? NO PORT EXHAUSTION

Hello,

Stupid problem. I get these exceptions from a client connecting to a server. Sadly, the setup is complicated, which makes debugging hard, and we are running out of options.

The environment:

* Client/Server system, both running on the same machine. The client is actually a service doing some database manipulation at specific times.
* The connection comes from C# going through OleDb to an EasySoft JDBC driver to a custom written JDBC server that then hosts logic in C++. Yeah, complex, but the third-party supplier decided to expose the extension mechanisms for their server through a JDBC interface. Not a lot can be done here ;)

The Symptom: At (ir)regular intervals we get an "Address already in use: connect" error reported by the JDBC driver. The errors seem to come from one particular service we run.

Now, I did read all the stuff about port exhaustion. This is why we now have a little tool running that counts ports and their states every minute. Last time this happened, we had an astonishing 370 ports in use, with the count rising to about 900 AFTER the error. We already patched the registry (it is a Windows machine) to allow more than the standard 5000 client ports, but even so, we are far, far from that limit to start with.
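For readers who want to reproduce this kind of per-state count, here is a minimal sketch in Python. It is not the tool described above, which was never posted. It assumes the Windows `netstat -an` column layout (Proto, Local Address, Foreign Address, State), and the function name `count_tcp_states` and the sample text are made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state in Windows `netstat -an`-style text.

    Assumes each TCP row has the columns:
    Proto, Local Address, Foreign Address, State.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UDP rows (which carry no state).
        if len(fields) >= 4 and fields[0] == "TCP":
            counts[fields[3]] += 1
    return counts


# Hypothetical sample input, not taken from the machine in question.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    127.0.0.1:1034         127.0.0.1:5000         ESTABLISHED
  TCP    127.0.0.1:1035         127.0.0.1:5000         TIME_WAIT
  TCP    127.0.0.1:1036         127.0.0.1:5000         TIME_WAIT
  UDP    0.0.0.0:445            *:*
"""

if __name__ == "__main__":
    for state, n in sorted(count_tcp_states(SAMPLE).items()):
        print(state, n)
```

On a live machine the same function could be fed the captured text of `netstat -an`. Keep in mind that a once-per-minute sample only shows the state at that instant.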

Which is why I am asking here. Does anyone have an idea what ELSE could cause this?

It is a 64-bit Windows Server 2003 machine. The only other thing I can see that may cause it (though this functionality is supposedly disabled) is Symantec Endpoint Protection, which is installed on the server. Being capable of acting as a firewall, it could possibly intercept network traffic. I don't want to open a can of worms by pointing to Symantec prematurely (if pointing to Symantec can ever be seen as such). So, does anyone have an idea what else may be the cause?

Thanks

  • Three things stand out:

    to a custom written JDBC server

    and

    Symantec Endpoint Protection

    and

    Client/Server system, both running on the same machine

    Question 1: How is the server choosing ports? Could it be that the client and server (which are on the same machine) somehow sync at irregular intervals and choose the same port?

    Question 2: Have you tried turning off SEP? If not, turn it off and see what happens. You need to eliminate the possibility that it is interfering.

    Question 3: What does netstat say during the time of the errors?

    TomTom : 1: This is client exhaustion; server listening ports are static. I could not reproduce it on my machine. 2: Not allowed. This only happens on production, and regulations disallow that. 3: No idea ;) We cannot really run netstat; we have our own tool that runs once per minute and does something similar. The port total never hit 500 during the errors, which is way below the limits. This is why I assume it is something else and not typical exhaustion. Regulations make that hell here. We are talking about a backend production system at a financial institution. We do development; others are admins, and others handle security.
    Joseph Kern : I think you may have to try #2, you need to strip out all of the complexity. It sounds like you're running a custom app ... are you sure the client is closing the connections and releasing the lock on the port?
    TomTom : It gets crazier. We do run a custom app, but both the JDBC server and the driver are third party (Sungard Martini 5.5, to be exact). We do compile custom stored procedures into the JDBC server, but the server is a black box, as is the driver. I have a repro system set up, and I can hit the server with a LOT more load than any log indicates is running on the production server at the moment it errors out. Like, 10 times the load. This is basically why I am coming here. I can get ports up to 4400 on test; prod blows up at around 400.
    TomTom : As I track the number of ports in every status, I am quite sure there are no leftover connections. I get an output every minute in the form of --! Start !-- 03/05/2010 08:20:08 -! Connections -! Totals Established 184 LastAck 49 TimeWait 70 #All 303 --! End !--
    Joseph Kern : The magic question: What is different on the test system? Is the test system running Symantec Endpoint Protection? Is the test system at the same patch levels the production is at? Are you connecting to the production and test system in the same way?
    TomTom : That is the crazy thing. All the same. I also cannot reproduce that stupid error, which makes it a freaking PAIN to debug.
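The per-minute summary TomTom quotes can be checked mechanically. The following is a rough Python sketch, assuming the record format is exactly what the single sample in the thread shows: name/count pairs after the word "Totals", with "#All" as the grand total. The function name `parse_summary` is hypothetical, and the real tool's format may differ.

```python
import re


def parse_summary(record: str) -> dict:
    """Extract '<Name> <count>' pairs that follow 'Totals' in one record.

    The format is inferred from the one sample posted in the thread.
    """
    _, _, tail = record.partition("Totals")
    pairs = re.findall(r"(#?[A-Za-z]+)\s+(\d+)", tail)
    return {name: int(num) for name, num in pairs}


# The sample record quoted in the thread.
RECORD = (
    "--! Start !-- 03/05/2010 08:20:08 -! Connections -! Totals "
    "Established 184 LastAck 49 TimeWait 70 #All 303 --! End !--"
)

if __name__ == "__main__":
    counts = parse_summary(RECORD)
    total = counts.pop("#All")
    # If the per-state buckets sum to the total, no connection was in
    # an unlisted state when the sample was taken.
    print(counts, "sum matches total:", sum(counts.values()) == total)
```

For the quoted record the three listed states add up to the reported total of 303, so no connections were sitting in any other state at that sample.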
