Baidu website index API Python example

2022-09-26 · 3 min read · Python ·

As we all know, Baidu has the largest market share among search engines in China, so getting indexed by it helps a site's reach. This site has gone N months without being indexed by Baidu, and I don't know why. Google and Bing indexed it long ago, so I'll leave the rest to fate. Frequent manual submission is unbearable: each manual submission is limited to 20 addresses, and the verification step makes you rotate pictures until they are upright. Baidu has worked hard, and so have I.

Following the example in Baidu's documentation, I wrote a simple Python program to submit the URLs. The script is as follows:

    #!/usr/bin/env python

    import requests

    web_site = "https://xxx.com"
    token = "your_baidu_api_token"

    # The API URL must be built by hand in exactly this form. If you pass site and
    # token through requests' params instead, Baidu will teach you how to behave ^_^
    baidu_api_url = "http://data.zz.baidu.com/urls?site={}&token={}".format(web_site, token)

    header = {
        "User-Agent": "curl/7.12.1",
        "Host": "data.zz.baidu.com",
        "Content-Type": "text/plain",
    }

    # urls.txt holds the URLs to submit, one per line, in the same directory as this program
    with open("urls.txt") as f:
        urls = f.read()

    try:
        # A timeout keeps the script from hanging if Baidu does not respond
        r = requests.post(baidu_api_url, headers=header, data=urls, timeout=10)
        print("Push address:", r.url)
        if r.status_code == 200:
            print("Push succeeded:", r.json())
        # Raise HTTPError for 4xx/5xx responses so the handler below reports them
        r.raise_for_status()
    # Catch the various request errors:
    except requests.exceptions.HTTPError as errh:
        print("HTTP error:", errh)
    except requests.exceptions.ConnectionError as errc:
        print("Error connecting:", errc)
    except requests.exceptions.Timeout as errt:
        print("Timeout error:", errt)
    except requests.exceptions.RequestException as err:
        print("Oops, something else:", err)
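
The comment in the script about formatting the API URL by hand is easier to see with a small experiment. The sketch below uses only the standard library to show what the query string looks like once the site value is percent-encoded, which is also what `requests` does to values passed through `params`. That this encoding is the reason the push fails is my assumption; nothing in Baidu's documentation quoted here confirms it. The site and token values are the same placeholders as in the script.

```python
from urllib.parse import urlencode

web_site = "https://xxx.com"      # placeholder site, as in the script
token = "your_baidu_api_token"    # placeholder token

# URL built by hand: the site value is inserted untouched.
manual_url = "http://data.zz.baidu.com/urls?site={}&token={}".format(web_site, token)

# Query string built by an encoder: ':' and '/' in the site value are percent-encoded.
encoded_url = "http://data.zz.baidu.com/urls?" + urlencode({"site": web_site, "token": token})

print(manual_url)
print(encoded_url)
```

The second URL carries `site=https%3A%2F%2Fxxx.com` instead of `site=https://xxx.com`, so the two requests are not identical from the server's point of view.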

Test run: baidu.py is the program above, and urls.txt contains one address to submit per line.

 (.venv) ➜ mephisto.cc git:(main) ✗ python3 baidu.py
 Push address: http://data.zz.baidu.com/urls?site=https://mephisto.cc&token=xxxxx
 Push succeeded: {'remain': 2913, 'success': 29}

This means that 29 URLs were submitted successfully and the remaining submission quota is 2913, far more than the manual limit of 20 per submission. Note that urls.txt should contain only new addresses each time; the reason is given in the documentation excerpt at the end of this article.
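
If you want the script to report more than the raw dictionary, a small helper can turn the response into a summary. This is only a sketch: `summarize_push` is a hypothetical name, it relies solely on the `success` and `remain` keys seen in the output above, and treating "submitted minus success" as the number of rejected URLs is my own assumption.

```python
def summarize_push(result, submitted_count):
    """Summarize a push response such as {'remain': 2913, 'success': 29}."""
    success = result.get("success", 0)
    remain = result.get("remain", 0)
    return {
        "success": success,
        # Assumption: anything submitted but not counted as success was rejected.
        "rejected": submitted_count - success,
        "remain": remain,
        "quota_exhausted": remain <= 0,
    }


report = summarize_push({"remain": 2913, "success": 29}, submitted_count=29)
print(report)
```

In the script, `submitted_count` would be the number of non-empty lines in urls.txt and `result` would be `r.json()`.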

Excerpt from Baidu documentation:

API push: the fastest way to submit. It is recommended that you push your site's new links to Baidu through this method as soon as they are created, so that Baidu can index them promptly.

1. What effect will be achieved by using the API push function

Timely discovery: it shortens the time Baidu's crawlers need to discover new links on your site, so that newly published pages are indexed by Baidu as early as possible.
Protection of original content: for the latest original content on your site, use the API push function to notify Baidu quickly, so that Baidu discovers the content before it is reposted elsewhere.

2. When is the most effective time to use the API push submission function?

Submit the link as soon as the page is generated or published; this gives the best effect.

3. What's wrong with resubmitting already published links?

There are two consequences. First, it wastes your submission quota. Each site has a daily limit on the number of submissions, so if you have already used it on old links, you may be unable to submit new links because the quota is exhausted. Second, if you frequently resubmit old links, we will lower your quota, and you may lose access to the API push function.
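
Given that warning, it is worth filtering out links that have already been pushed before filling urls.txt. The sketch below is one possible approach and not something Baidu provides: `select_new_urls` is a hypothetical helper, and the list of previously submitted URLs is assumed to come from a log you keep yourself (for example a local text file).

```python
def select_new_urls(candidates, already_submitted):
    """Return candidate URLs not yet submitted, in order, without duplicates."""
    seen = set(already_submitted)
    fresh = []
    for url in candidates:
        url = url.strip()
        # Skip blank lines, URLs pushed earlier, and repeats within this batch.
        if url and url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# Hypothetical data: what was pushed before, and what the site currently lists.
already_submitted = ["https://xxx.com/posts/a/", "https://xxx.com/posts/b/"]
candidates = [
    "https://xxx.com/posts/a/\n",
    "https://xxx.com/posts/c/\n",
    "\n",
    "https://xxx.com/posts/c/\n",
    "https://xxx.com/posts/d/\n",
]

new_urls = select_new_urls(candidates, already_submitted)
print("\n".join(new_urls))
```

Only the URLs in `new_urls` would then be written to urls.txt, and appended to your own log after a successful push.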

Last modified: Monday, August 28, 2023

Copyright statement:

All content without a stated source is original. Please do not reprint it without authorization (reprints often break the typesetting, and their content cannot be controlled or kept up to date).
For non-profit use of any content from this blog, please give the address of the relevant page on this site as the 'source of the original text' or as a 'reference link' (for the convenience of readers).
