Nginx log $upstream_addr multiple values

While troubleshooting a production problem recently, I found that the $upstream_addr field in the nginx log sometimes contained two values, in the form "192.168.1.1:80, api-servers", where api-servers is the group name that follows the upstream keyword. Sometimes it contained only the single value "api-servers". Normally this field looks like "192.168.1.1:80", meaning the request went to exactly one backend server.

The official documentation says:

$upstream_addr

Keeps the IP address and port, or the path to the UNIX-domain socket of the upstream server. If several servers were contacted during request processing, their addresses are separated by commas, e.g. “192.168.1.1:80, 192.168.1.2:80, unix:/tmp/sock”. If an internal redirect from one server group to another happens, initiated by “X-Accel-Redirect” or error_page, then the server addresses from different groups are separated by colons, e.g. “192.168.1.1:80, 192.168.1.2:80, unix:/tmp/sock : 192.168.10.1:80, 192.168.10.2:80”. If a server cannot be selected, the variable keeps the name of the server group.

Reading the documentation carefully: when a request is tried against several backend servers, the comma-separated multi-value record appears. The colon-separated form is a different situation and is not discussed here. As for the case where the value is the server group name, the meaning of "If a server cannot be selected" is not clear to me from the English; whoever wrote that sentence was probably not a native English speaker. Does it mean that all backends are unavailable, or that no server could be selected within some default limit on the number of tries? If you know, please share. In short, if your services are registered in the upstream, then at this point the servers in the pool are probably unavailable and you have to troubleshoot them yourself. The multi-value record gives you a clue.
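The separators described in the documentation can be made concrete with a small parser. This is only a sketch: `parse_upstream_addr` is a hypothetical helper name, and it assumes that groups are separated by a colon surrounded by spaces (as in the documentation's example) while addresses within a group are separated by commas.

```python
def parse_upstream_addr(value):
    """Split an $upstream_addr value into groups of addresses.

    Groups (one per internal redirect) are assumed to be separated by " : ";
    the addresses tried within one group are separated by commas.
    """
    groups = []
    for group in value.split(" : "):
        # Ports and unix: paths contain ":" without surrounding spaces,
        # so splitting on " : " above leaves them intact.
        addrs = [a.strip() for a in group.split(",") if a.strip()]
        groups.append(addrs)
    return groups


print(parse_upstream_addr("192.168.1.1:80, api-servers"))
# [['192.168.1.1:80', 'api-servers']]

print(parse_upstream_addr(
    "192.168.1.1:80, 192.168.1.2:80, unix:/tmp/sock : 192.168.10.1:80, 192.168.10.2:80"
))
```

The last entry of a group is the interesting one when troubleshooting: if it is a group name rather than an address, no server could be selected for that attempt.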

In our case, services are registered through consul. We determined that the consul cluster in one data center was abnormal, so old service instances were not removed and new ones were not registered in time. Requests sent to the old instances found no available server, and a 502 status code was returned.

I had doubts about this yesterday, so today I ran a test to verify it.

1. Nginx configuration

 ➜ cat /etc/nginx/conf.d/test.com.conf
 upstream backend-servers{
         server 127.0.0.1:9527;
         server 127.0.0.1:9528;
         server 127.0.0.1:9529;
         server 127.0.0.1:9530;
         server 127.0.0.1:9531;
         server 127.0.0.1:9532;
         server 127.0.0.1:9533;
         server 127.0.0.1:9534;
         server 127.0.0.1:9535;
         server 127.0.0.1:9536;
         server 127.0.0.1:9537;
         server 127.0.0.1:9538;
         server 127.0.0.1:9539;
         server 127.0.0.1:9540;
         server 127.0.0.1:9541;
         server 127.0.0.1:9542;
         server 127.0.0.1:9543;
         server 127.0.0.1:9544;
         server 127.0.0.1:9545;
         server 127.0.0.1:9546;
         server 127.0.0.1:9547;
         server 127.0.0.1:9548;
         server 127.0.0.1:9549;
         server 127.0.0.1:9550;
         server 127.0.0.1:9551;
         server 127.0.0.1:9552;
         server 127.0.0.1:9553;
         server 127.0.0.1:9554;
         server 127.0.0.1:9555;
         server 127.0.0.1:9556;
         server 127.0.0.1:9557;
         server 127.0.0.1:9558;
         server 127.0.0.1:9559;
         server 127.0.0.1:9560;
         server 127.0.0.1:9561;
         server 127.0.0.1:9562;
 }
 server {
         listen 80;
         server_name test.com;

         # "access" must be a log_format defined in the http block that includes $upstream_addr
         access_log /data/log/test.com.log access;

         location /{
                 proxy_pass http://backend-servers;
         }
 }
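Typing three dozen server lines by hand invites typos such as a repeated port. As an alternative, here is a minimal sketch that generates the upstream block for a port range; `render_upstream` is a hypothetical helper name, not part of nginx, and the range 9527 to 9562 is taken from the configuration above.

```python
def render_upstream(name, host, first_port, last_port):
    """Return an nginx upstream block with one server line per port (inclusive range)."""
    lines = [f"upstream {name} {{"]
    for port in range(first_port, last_port + 1):
        lines.append(f"        server {host}:{port};")
    lines.append("}")
    return "\n".join(lines)


# Print the block so it can be pasted into the conf.d file.
print(render_upstream("backend-servers", "127.0.0.1", 9527, 9562))
```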

2. Use Flask to simulate the backend servers

 ➜ test cat server1.py
 from flask import Flask

 app = Flask(__name__)

 @app.route("/")
 def hello_world():
     return "<p>Hello, 9527!</p>"

 if __name__ == '__main__':
     app.run(host="0.0.0.0", port=9527)

 ➜ test cat server2.py
 from flask import Flask

 app = Flask(__name__)

 @app.route("/")
 def hello_world():
     return "<p>Hello, 9528!</p>"

 if __name__ == '__main__':
     app.run(host="0.0.0.0", port=9528)
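Keeping one file per port becomes tedious once more than a couple of backends are needed. The sketch below is a possible alternative that starts several backends from one script. Note that it swaps Flask for Python's standard-library `http.server`, binds to 127.0.0.1 instead of 0.0.0.0, and that `make_server`, `serve_ports`, and the file name `backends.py` are hypothetical names chosen for illustration.

```python
import sys
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_server(port):
    """Build an HTTP server on 127.0.0.1:port that greets with its own port."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Report the port actually bound, so port 0 (any free port) works too.
            body = f"<p>Hello, {self.server.server_address[1]}!</p>".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the terminal quiet

    return ThreadingHTTPServer(("127.0.0.1", port), Handler)


def serve_ports(ports):
    """Start one daemon thread per port and return the server objects."""
    servers = [make_server(p) for p in ports]
    for srv in servers:
        threading.Thread(target=srv.serve_forever, daemon=True).start()
    return servers


# Only serve when ports are given on the command line,
# e.g.: python backends.py 9527 9528
if __name__ == "__main__" and len(sys.argv) > 1:
    serve_ports(int(arg) for arg in sys.argv[1:])
    threading.Event().wait()  # block until interrupted with Ctrl-C
```

Stopping the script takes all of its backends down at once; to take down a single backend while the others keep running, start it in a separate process.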

[Screenshot: flask server]

3. Test, capture packets, and check the log

  • Test with curl

       curl -H "Host:test.com" http://127.0.0.1
    
  • Capture packets with tcpdump to see whether nginx connects to the failed backend-server

       sudo tcpdump -i lo port 9528
    
  • View log results

    [Screenshot: nginx upstream test]

    Start and stop the flask servers, then check the $upstream_addr field in the nginx log to see what nginx records in each situation.
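When comparing log entries with the state of the backends, it helps to know which ports are actually accepting connections at that moment. A minimal sketch, assuming the backends listen on 127.0.0.1 in the port range used above; `is_listening` and `live_backends` are hypothetical helper names.

```python
import socket


def is_listening(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def live_backends(host, ports):
    """Return the ports that currently accept connections."""
    return [p for p in ports if is_listening(host, p)]


# Ports 9527..9562, as in the upstream block of the test configuration.
print(live_backends("127.0.0.1", range(9527, 9563)))
```

Running this right before a curl request shows which entries in $upstream_addr belong to dead backends and which one finally answered.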

In conclusion: when a backend is available, upstream_addr records the ip:port of the backend that served the request. With a statically configured upstream (as in this test, where the servers are hard-coded), nginx also tries to connect to the servers that are down, and upstream_addr then shows multiple values, one for each server it tried, separated by commas. When all the flask servers are shut down, upstream_addr shows "backend-servers", the name defined after upstream. If you see this, there is no available backend at all, and that is a serious problem: the service is completely down.
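The three cases can be told apart mechanically when scanning a log. The sketch below is a heuristic, not nginx's own logic: `classify` is a hypothetical helper, and it assumes that any entry that does not look like host:port or unix:path is a server group name.

```python
import re

# Heuristic: an entry is an address if it is unix:<path>, [ipv6]:port, or host:port.
_ADDR = re.compile(r"^(?:unix:.+|\[[0-9a-fA-F:.]+\]:\d+|[^\s:]+:\d+)$")


def classify(value):
    """Classify an $upstream_addr value by what it implies about the backends."""
    entries = [e.strip() for e in re.split(r",| : ", value) if e.strip()]
    if not entries or entries == ["-"]:
        return "not_proxied"          # nginx logs "-" when the variable is empty
    if any(not _ADDR.match(e) for e in entries):
        return "no_server_selected"   # a group name appeared
    if len(entries) > 1:
        return "retried"              # more than one server was tried
    return "single"                   # the normal case


print(classify("127.0.0.1:9527"))                  # single
print(classify("127.0.0.1:9528, 127.0.0.1:9527"))  # retried
print(classify("backend-servers"))                 # no_server_selected
```

Counting the "retried" and "no_server_selected" results over a time window gives an early signal that backends in the pool are going away.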

Lastmod: Monday, August 28, 2023
