Google Indexing API Python Example
Part of maintaining a personal website is getting search engines to index new content quickly, so the site earns organic search traffic and reaches more readers. Submitting pages manually to a search engine such as Google is one way; submitting them through an API is another. When you want to submit several URLs at once, the API is more convenient.
1. Create a Google Cloud Service Account
The Google Indexing API is managed on Google Cloud Platform. The setup steps are omitted here (there are many tutorials online). Once setup succeeds, a JSON key file is downloaded and saved locally. This file contains the key information used to authorize calls to Google APIs.
2. Write a Python program to call the API
- Prepare the environment and install dependent packages
pip install --upgrade google-api-python-client oauth2client
- File preview
(.venv) ➜ google-api-python-client tree
.
├── google_batch.py
├── google_index.py
├── hugo-368210-30b9660ab8b3.json
├── readme.txt
└── urls.txt

0 directories, 5 files
- google_batch.py for batch submission
The notification type is usually URL_UPDATED. A URL can also be removed with URL_DELETED; see the official documentation if you need it. The contents of the program file are as follows:
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
from googleapiclient.http import BatchHttpRequest
import httplib2


# Read the URLs to submit, one per line
with open("urls.txt") as f:
    new_urls = f.readlines()


JSON_KEY_FILE = "hugo-368210-30b9660ab8b3.json"
SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

# Authorize credentials
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    JSON_KEY_FILE, scopes=SCOPES
)
http = credentials.authorize(httplib2.Http())

# Build service
service = build("indexing", "v3", credentials=credentials)


def insert_event(request_id, response, exception):
    # Batch callback: print the error if a request failed, otherwise the response
    if exception is not None:
        print(exception)
    else:
        print(response)


batch = service.new_batch_http_request(callback=insert_event)

# Queue one URL_UPDATED notification per URL
for url in new_urls:
    batch.add(
        service.urlNotifications().publish(body={"url": url, "type": "URL_UPDATED"})
    )

batch.execute()
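For a long urls.txt it may be worth splitting the list before building each batch. The sketch below is only an illustration: the helper name `chunk_urls` is hypothetical, and the limit of 100 calls per batch is an assumption based on my reading of Google's Indexing API documentation, so check the current docs before relying on it.

```python
from typing import Iterator, List

# Assumption: at most 100 calls per batch request (verify against Google's docs).
MAX_BATCH_SIZE = 100


def chunk_urls(urls: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

Each chunk could then get its own `service.new_batch_http_request(callback=insert_event)` and its own `execute()` call, in the same way as the single batch in google_batch.py.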
- google_index.py single URL submission
The content part in the file is the URL information and operation type you want to submit. Program example:
from oauth2client.service_account import ServiceAccountCredentials
import httplib2

SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

# service_account_file.json is the private key that you created for your service account.
JSON_KEY_FILE = "hugo-368210-30b9660ab8b3.json"

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    JSON_KEY_FILE, scopes=SCOPES
)

http = credentials.authorize(httplib2.Http())

# Define contents here as a JSON string.
# This example shows a simple update request.
# Other types of requests are described in the next step.

content = """{
\"url\": \"http://mephisto.cc/zh-tw/tech/s2t/\",
\"type\": \"URL_UPDATED\"
}"""

response, content = http.request(ENDPOINT, method="POST", body=content)
print("response:", response)
print("content:", content)
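Hand-escaping the JSON string is easy to get wrong, so one option is to build the request body with the standard `json` module. This is a minimal sketch, not part of the original program: the helper name `build_notification` is hypothetical, and it assumes URL_UPDATED and URL_DELETED are the only notification types you need.

```python
import json

# Notification types this sketch accepts for urlNotifications:publish
VALID_TYPES = {"URL_UPDATED", "URL_DELETED"}


def build_notification(url: str, notification_type: str = "URL_UPDATED") -> str:
    """Return the JSON request body for one URL notification."""
    if notification_type not in VALID_TYPES:
        raise ValueError(f"unsupported notification type: {notification_type}")
    # json.dumps takes care of quoting and escaping
    return json.dumps({"url": url.strip(), "type": notification_type})
```

The returned string could be passed as `body=` to `http.request(ENDPOINT, method="POST", ...)` in place of the hand-written `content` value.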
- hugo-368210-30b9660ab8b3.json key file
This file is downloaded locally after you create the Google Cloud service account. It contains sensitive information, so the values have been manually removed here. It is for reference only.
Example:
{
  "type": "service_account",
  "project_id": "",
  "private_key_id": "",
  "private_key": "",
  "client_email": "",
  "client_id": "",
  "auth_uri": "",
  "token_uri": "",
  "auth_provider_x509_cert_url": "",
  "client_x509_cert_url": ""
}
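Before calling the API, it can help to check that the key file is not a redacted or truncated copy like the one above. The following is a rough sketch only: the helper name `missing_key_fields` is hypothetical, and the choice of which fields to check is my assumption rather than a list taken from Google's documentation.

```python
import json
from typing import List

# Assumption: fields this sketch treats as essential for authorization
CHECKED_FIELDS = ("type", "private_key", "client_email", "token_uri")


def missing_key_fields(key_json: str) -> List[str]:
    """Return the checked fields that are absent or empty in the key file text."""
    data = json.loads(key_json)
    return [name for name in CHECKED_FIELDS if not data.get(name)]
```

For example, reading the key file with `open(JSON_KEY_FILE).read()` and passing the text to `missing_key_fields` should give an empty list for a complete key, and a list of field names for a redacted one.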
- urls.txt bulk URL file
This is the storage format I am used to, one URL per line. Example:
(.venv) ➜ google-api-python-client cat urls.txt
https://mephisto.cc//note/hero-chen/
https://mephisto.cc/zh-tw/note/hero-chen/
The two lines above are the Simplified Chinese and Traditional Chinese addresses of the same article. This file is not needed when submitting a single URL, and you can change it to whatever format suits you. If you know Python, you can modify the google_batch.py file yourself.
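Since `readlines()` keeps the trailing newline on every line, one possible modification is to clean the list before submitting it. This is a small sketch under that assumption; the helper name `clean_urls` is hypothetical and not part of the original program.

```python
from typing import Iterable, List


def clean_urls(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blank lines, and remove duplicates, keeping order."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

In google_batch.py this could replace the `readlines()` call, for example `new_urls = clean_urls(f)` inside the `with open("urls.txt") as f:` block.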
3. Practical example of submitting multiple URLs
- Modify urls.txt (this step can be automated by a program)
- Run the program
python google_batch.py
- Example output
(.venv) ➜ google-api-python-client python google_batch.py
{'urlNotificationMetadata': {'url': 'https://mephisto.cc/tech/google-index-api-python/\n', 'latestUpdate': {'url': 'https://mephisto.cc/tech/google-index-api-python/\n', 'type': 'URL_UPDATED', 'notifyTime': '2023-04-10T01:48:20.759700010Z'}}}
{'urlNotificationMetadata': {'url': 'https://mephisto.cc/zh-tw/tech/google-index-api-python/\n', 'latestUpdate': {'url': 'https://mephisto.cc/zh-tw/tech/google-index-api-python/\n', 'type': 'URL_UPDATED', 'notifyTime': '2023-04-10T01:48:20.759685330Z'}}}
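The raw response dictionaries are verbose, so a batch callback might print a shorter summary instead. The sketch below assumes the response has the shape shown in the output above (`urlNotificationMetadata` containing `latestUpdate`); the helper name `summarize_response` is hypothetical, and responses for other notification types may use different keys.

```python
from typing import Optional, Tuple


def summarize_response(response: dict) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (url, type, notifyTime) from a publish response, tolerating missing keys."""
    metadata = response.get("urlNotificationMetadata", {})
    latest = metadata.get("latestUpdate", {})
    # strip() removes a trailing newline carried over from urls.txt
    return (metadata.get("url", "").strip(), latest.get("type"), latest.get("notifyTime"))
```

Inside `insert_event`, `print(summarize_response(response))` could stand in for `print(response)`.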
Submitting a single URL is not demonstrated here. Once you submit an indexing request, Google processes it gradually, which in my experience is much faster than Baidu.
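One way to generate urls.txt automatically is to pull the page addresses out of the site's sitemap. This is only a sketch under assumptions: it presumes the site publishes a standard sitemap.xml using the sitemaps.org namespace, and the helper name `urls_from_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str) -> List[str]:
    """Return the text of every <loc> element inside <url> entries of a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

Writing the result to urls.txt is then a matter of joining the list with newlines. Downloading the sitemap itself is left out here, since how you fetch it depends on your setup.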
Copyright statement:
- All content without a cited source is original. Please do not reprint it without authorization (reprints often break the formatting, the content cannot be controlled, and it cannot be kept up to date).
- For non-profit purposes, if you adapt any content from this blog, please link to the relevant page of this site as the "original source" or "reference link" (for the convenience of readers).