Google Indexing API Python Example

Part of maintaining a personal website is getting search engines to index new content quickly, so that it earns organic search traffic and reaches more readers. You can submit URLs manually through a search engine's webmaster tools (for Google, Search Console), or you can submit them through an API. When you need to submit many URLs at once, the API is more convenient.

1. Create a Google Cloud Service Account

The Google Indexing API is managed on Google Cloud Platform. I skip the setup details here, since many tutorials already cover them. Once the service account is created, you download a JSON key file and save it locally. This key file holds the credentials that authorize the program to call Google APIs.

(Figure: Google Cloud service account settings)
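Before writing any API code, it can be worth sanity-checking the downloaded key file. The following is a minimal sketch using only the standard library. The field names come from the redacted key file example shown later in this post; the check_key_info/check_key_file helpers and the sample values are hypothetical and are not part of any Google library.

```python
import json

# Fields a service account key file is expected to carry (names taken from
# the redacted example key file in this post). This helper is hypothetical.
REQUIRED_FIELDS = ("project_id", "private_key", "client_email", "token_uri")


def check_key_info(info: dict) -> list:
    """Return a list of problems found in a parsed service account key."""
    problems = []
    if info.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    for field in REQUIRED_FIELDS:
        # Missing keys and empty strings are both reported.
        if not info.get(field):
            problems.append(f"missing or empty field: {field}")
    return problems


def check_key_file(path: str) -> list:
    """Load the JSON key file at `path` and check its contents."""
    with open(path, encoding="utf-8") as f:
        return check_key_info(json.load(f))


if __name__ == "__main__":
    # Hypothetical values; a real key file has a full PEM private key.
    sample = {
        "type": "service_account",
        "project_id": "demo-project",
        "private_key": "<redacted>",
        "client_email": "indexer@demo-project.iam.gserviceaccount.com",
        "token_uri": "https://oauth2.googleapis.com/token",
    }
    print(check_key_info(sample))  # prints []
```

Run against the redacted example key file in this post, all four fields would be reported as empty, which is expected since the values were removed.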

2. Write a Python program to call the API

  • Prepare the environment and install the dependencies

    pip install --upgrade google-api-python-client oauth2client
    
  • File preview

    (.venv) ➜ google-api-python-client tree
    .
    ├── google_batch.py
    ├── google_index.py
    ├── hugo-368210-30b9660ab8b3.json
    ├── readme.txt
    └── urls.txt

    0 directories, 5 files
    
    • google_batch.py: batch submission

      The notification type is usually URL_UPDATED; URL_DELETED can be sent instead when a page has been removed. See the official Indexing API documentation for details.

      The program is as follows:

      from oauth2client.service_account import ServiceAccountCredentials
      from googleapiclient.discovery import build

      JSON_KEY_FILE = "hugo-368210-30b9660ab8b3.json"
      SCOPES = ["https://www.googleapis.com/auth/indexing"]

      # Read one URL per line; strip the trailing newline and skip blank lines
      with open("urls.txt") as f:
          new_urls = [line.strip() for line in f if line.strip()]

      # Authorize credentials from the service account key file
      credentials = ServiceAccountCredentials.from_json_keyfile_name(
          JSON_KEY_FILE, scopes=SCOPES
      )

      # Build the Indexing API v3 service
      service = build("indexing", "v3", credentials=credentials)


      def insert_event(request_id, response, exception):
          """Callback invoked once for each request in the batch."""
          if exception is not None:
              print(exception)
          else:
              print(response)


      batch = service.new_batch_http_request(callback=insert_event)

      # Queue one URL_UPDATED notification per URL
      for url in new_urls:
          batch.add(
              service.urlNotifications().publish(body={"url": url, "type": "URL_UPDATED"})
          )

      # Send all queued notifications in a single HTTP request
      batch.execute()
    
    • google_index.py: single-URL submission

      The content variable in the script holds the URL to submit and the notification type.

      Program example:

      import json

      from oauth2client.service_account import ServiceAccountCredentials
      import httplib2

      SCOPES = ["https://www.googleapis.com/auth/indexing"]
      ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

      # The private key file downloaded for the service account.
      JSON_KEY_FILE = "hugo-368210-30b9660ab8b3.json"

      credentials = ServiceAccountCredentials.from_json_keyfile_name(
          JSON_KEY_FILE, scopes=SCOPES
      )

      # Wrap an httplib2 client so each request carries an OAuth 2.0 access token
      http = credentials.authorize(httplib2.Http())

      # Request body: the URL to notify Google about and the notification type
      # (URL_UPDATED for new or changed pages, URL_DELETED for removed pages)
      content = json.dumps({
          "url": "http://mephisto.cc/zh-tw/tech/s2t/",
          "type": "URL_UPDATED",
      })

      response, body = http.request(ENDPOINT, method="POST", body=content)
      print("response:", response)
      print("content:", body)
    
    • hugo-368210-30b9660ab8b3.json: key file

      This file is downloaded after you create the Google Cloud service account. It contains sensitive information, so the values below have been removed; it is shown for reference only.

      Example:

        {
          "type": "service_account",
          "project_id": "",
          "private_key_id": "",
          "private_key": "",
          "client_email": "",
          "client_id": "",
          "auth_uri": "",
          "token_uri": "",
          "auth_provider_x509_cert_url": "",
          "client_x509_cert_url": ""
        }
      
    • urls.txt: URL list for batch submission

      This is how I usually store URLs, one per line, for example:

      (.venv) ➜ google-api-python-client cat urls.txt
      https://mephisto.cc/note/hero-chen/
      https://mephisto.cc/zh-tw/note/hero-chen/
      

      These are the Simplified Chinese and Traditional Chinese URLs of the same article. This file is not needed when submitting a single URL, and you can organize it in whatever format you prefer; if you know Python, adapt google_batch.py to match.
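Because readlines() keeps each line's trailing newline, URLs read that way end with \n (visible in the sample output in the next section). Below is a minimal sketch of a more defensive preparation step, assuming the one-URL-per-line format above: it trims and de-duplicates URLs, builds the {"url": ..., "type": ...} bodies that google_batch.py passes to publish(), and splits them into groups. The helper names are hypothetical, and the group size of 100 is an assumption based on the Indexing API documentation's per-batch limit; verify it against the current quota page.

```python
# Notification types accepted by the Indexing API.
VALID_TYPES = {"URL_UPDATED", "URL_DELETED"}
# Assumed maximum number of calls per batch request; check the current docs.
MAX_BATCH = 100


def read_urls(text: str) -> list:
    """Parse urls.txt content: one URL per line, trimmed, blanks and duplicates dropped."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        seen.add(url)
        urls.append(url)
    return urls


def make_bodies(urls: list, kind: str = "URL_UPDATED") -> list:
    """Build one publish() request body per URL."""
    if kind not in VALID_TYPES:
        raise ValueError(f"unknown notification type: {kind}")
    return [{"url": url, "type": kind} for url in urls]


def chunks(items: list, size: int = MAX_BATCH):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    text = (
        "https://mephisto.cc/note/hero-chen/\n"
        "https://mephisto.cc/zh-tw/note/hero-chen/\n"
        "\n"
    )
    bodies = make_bodies(read_urls(text))
    for group in chunks(bodies):
        print(len(group), group[0])
```

Each group could then be queued with batch.add(service.urlNotifications().publish(body=body)) and sent with one batch.execute() call, as in google_batch.py.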

3. Practical example of submitting multiple URLs

  • Update urls.txt (this step can also be automated with a script)

  • Run the program

    python google_batch.py
    
    • Example output:
    (.venv) ➜ google-api-python-client python google_batch.py
    {'urlNotificationMetadata': {'url': 'https://mephisto.cc/tech/google-index-api-python/\n', 'latestUpdate': {'url': 'https://mephisto.cc/tech/google-index-api-python/\n', 'type': 'URL_UPDATED', 'notifyTime': '2023-04-10T01:48:20.759700010Z'}}}
    {'urlNotificationMetadata': {'url': 'https://mephisto.cc/zh-tw/tech/google-index-api-python/\n', 'latestUpdate': {'url': 'https://mephisto.cc/zh-tw/tech/google-index-api-python/\n', 'type': 'URL_UPDATED', 'notifyTime': '2023-04-10T01:48:20.759685330Z'}}}
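The callback in google_batch.py prints each raw response. Below is a minimal sketch of a tidier callback, assuming the response shape shown above (urlNotificationMetadata containing a latestUpdate entry). Handling a latestRemove entry for URL_DELETED notifications is based on the Indexing API's UrlNotificationMetadata resource, and summarize() is a hypothetical helper. The callback keeps the (request_id, response, exception) signature used with new_batch_http_request(callback=...) in google_batch.py.

```python
def summarize(response: dict) -> list:
    """Return (url, type, notifyTime) tuples from a publish() response.

    Both latestUpdate and latestRemove may appear in urlNotificationMetadata;
    trailing whitespace (such as a stray newline) is stripped from URLs.
    """
    meta = response.get("urlNotificationMetadata", {})
    rows = []
    for key in ("latestUpdate", "latestRemove"):
        note = meta.get(key)
        if note:
            rows.append((note["url"].strip(), note["type"], note["notifyTime"]))
    return rows


def insert_event(request_id, response, exception):
    """Batch callback: report failures, otherwise print one line per notification."""
    if exception is not None:
        print(f"request {request_id} failed: {exception}")
        return
    for url, kind, when in summarize(response):
        print(f"{kind:<12} {when}  {url}")


if __name__ == "__main__":
    # Response copied from the sample output above.
    sample = {
        "urlNotificationMetadata": {
            "url": "https://mephisto.cc/tech/google-index-api-python/\n",
            "latestUpdate": {
                "url": "https://mephisto.cc/tech/google-index-api-python/\n",
                "type": "URL_UPDATED",
                "notifyTime": "2023-04-10T01:48:20.759700010Z",
            },
        }
    }
    insert_event("1", sample, None)
```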
    

Submitting a single URL with google_index.py is not demonstrated separately here. After you submit indexing requests, Google processes them gradually, and in my experience it is much faster than Baidu.

Lastmod: Monday, August 28, 2023
