Pub/Sub’s push and pull subscriptions aren’t just different ways to get messages; they fundamentally alter the direction of control and responsibility in your distributed system.
Let’s see what this looks like in practice. Imagine a simple scenario where we want to process incoming user sign-ups.
Scenario: Processing User Sign-ups
We have a user-signups topic. When a new user signs up, a message is published to this topic. We want to process these sign-ups to send a welcome email and update a CRM.
Option 1: Push Subscription
In this model, Pub/Sub acts like a proactive messenger. It pushes messages to a pre-configured endpoint you own.
- Publisher: Your application publishes a message to the user-signups topic.
- Pub/Sub: Pub/Sub holds onto the message and, based on the subscription configuration, sends it to your specified webhook URL (e.g., https://your-service.com/pubsub-handler).
- Subscriber: Your webhook endpoint receives the message, processes it (sends email, updates CRM), and acknowledges receipt to Pub/Sub.
Example Configuration (Conceptual):
# gcloud pubsub subscriptions create user-signups-push \
# --topic=user-signups \
# --push-endpoint=https://your-service.com/pubsub-handler \
# --push-auth-service-account=your-service-account@your-project.iam.gserviceaccount.com
Your application code at https://your-service.com/pubsub-handler would look something like this (Python Flask example):
from flask import Flask, request, jsonify
import base64
import json

app = Flask(__name__)


@app.route('/pubsub-handler', methods=['POST'])
def handle_pubsub_message():
    # Pub/Sub wraps each push delivery in a JSON envelope.
    envelope = request.get_json()
    if not envelope:
        msg = "no Pub/Sub message received"
        print(f"error: {msg}")
        return jsonify({'error': msg}), 400

    if not isinstance(envelope, dict) or 'message' not in envelope:
        msg = "invalid Pub/Sub message format"
        print(f"error: {msg}")
        return jsonify({'error': msg}), 400

    pubsub_message = envelope['message']

    if 'data' in pubsub_message:
        # In push deliveries the payload arrives base64-encoded.
        data = base64.b64decode(pubsub_message['data']).decode('utf-8').strip()
        try:
            signup_data = json.loads(data)
            print(f"Received signup data: {signup_data}")

            # --- Your processing logic here ---
            # 1. Send welcome email
            # 2. Update CRM
            # ----------------------------------

            print("Signup processed successfully.")
            # A success status tells Pub/Sub the message is acknowledged.
            return jsonify({'message': 'Message processed'}), 200
        except json.JSONDecodeError:
            print(f"Could not decode JSON data: {data}")
            # Note: a non-success status makes Pub/Sub redeliver, so a
            # permanently malformed message keeps coming back until it
            # expires or is routed to a dead-letter topic.
            return jsonify({'error': 'Invalid JSON data'}), 400
        except Exception as e:
            print(f"Error processing message: {e}")
            return jsonify({'error': 'Internal processing error'}), 500
    else:
        print("No data in message.")
        return jsonify({'error': 'No data field in message'}), 400


if __name__ == '__main__':
    app.run(port=8080)
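Pub/Sub treats a success status code (for example 200 or 204) as the acknowledgment. If your endpoint returns an error status, or doesn’t respond before the subscription’s acknowledgment deadline, Pub/Sub retries the delivery.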
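Because a failed response leads to a retry, it can help to separate failures that might succeed later from failures that never will. The following is a minimal sketch of that decision, not a prescribed pattern; `PermanentError`, `TransientError`, and `status_for` are hypothetical names invented for illustration and are not part of any Pub/Sub library.

```python
class PermanentError(Exception):
    """The message can never be processed (for example, malformed JSON)."""


class TransientError(Exception):
    """Processing failed for now but may succeed on a later attempt."""


def status_for(error):
    """Map a processing outcome to the HTTP status a push handler returns.

    Hypothetical helper: pass None when processing succeeded.
    """
    if error is None:
        return 204  # success: Pub/Sub treats this as an acknowledgment
    if isinstance(error, PermanentError):
        # Retrying cannot help, so acknowledge to stop redelivery.
        # Log or store the payload somewhere before returning.
        return 204
    # Transient or unknown failure: an error status asks Pub/Sub to retry.
    return 500
```

A handler would catch its own exceptions, call `status_for`, and return the resulting status. Whether to acknowledge poison messages this way or route them to a dead-letter topic is a design choice that depends on how much you need to inspect them later.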
Option 2: Pull Subscription
With a pull subscription, your application is the one initiating the request. It actively pulls messages from Pub/Sub when it’s ready.
- Publisher: Your application publishes a message to the user-signups topic.
- Subscriber: Your application, running on a server or as a batch job, requests messages from the subscription, either by polling periodically or by holding open a streaming connection. It receives messages (often in batches), processes them, and then acknowledges them back to Pub/Sub.
Example Configuration (Conceptual):
# gcloud pubsub subscriptions create user-signups-pull \
# --topic=user-signups
Your application code to pull messages might look like this (Python client library example):
from google.cloud import pubsub_v1
import json

# TODO(developer) - Replace these with your actual project ID and subscription ID
project_id = "your-gcp-project-id"
subscription_id = "user-signups-pull"

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)


def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    try:
        # The client library hands over the raw payload bytes;
        # unlike push deliveries, there is no base64 layer to decode.
        data = message.data.decode("utf-8")
        signup_data = json.loads(data)
        print(f"Received signup data: {signup_data}")

        # --- Your processing logic here ---
        # 1. Send welcome email
        # 2. Update CRM
        # ----------------------------------

        print("Signup processed successfully.")
        # Acknowledge the message
        message.ack()
        print(f"Acknowledged message ID: {message.message_id}")
    except Exception as e:
        print(f"Error processing message: {e}")
        # Do NOT acknowledge a failed message. nack() asks Pub/Sub to
        # redeliver it promptly; simply returning would leave the message
        # leased by the client library and delay redelivery.
        message.nack()


def listen_for_messages():
    print(f"Listening for messages on {subscription_path}...")
    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)

    # Using the client as a context manager closes it on exit.
    with subscriber:
        try:
            # Blocks the main thread until the stream fails or is cancelled.
            streaming_pull_future.result()
        except KeyboardInterrupt:
            streaming_pull_future.cancel()  # Trigger the shutdown.
            streaming_pull_future.result()  # Block until the shutdown completes.
            print("Shutting down subscriber.")


if __name__ == '__main__':
    listen_for_messages()
The core decision hinges on where you want the control loop to reside. Push is ideal when you have a web service that can reliably receive HTTP requests and you want Pub/Sub to manage the delivery attempts. Pull is better when you need more control over when and how messages are fetched, perhaps for batch processing, or when your service isn’t designed to be a public-facing webhook.
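For the batch-processing case, the client library also offers a synchronous pull, where your code decides exactly when to fetch and how many messages to take. The sketch below assumes `client` is a `pubsub_v1.SubscriberClient` (it is passed in as a parameter, so the block itself has no import); `pull_batch` and `handle` are hypothetical names used only for illustration.

```python
def pull_batch(client, subscription_path, handle, max_messages=10):
    """Pull up to max_messages once, process them, and ack the successes.

    client: expected to be a google.cloud.pubsub_v1.SubscriberClient.
    handle: hypothetical callable that takes the raw message bytes and
            raises an exception on failure.
    Returns the number of messages acknowledged.
    """
    response = client.pull(
        request={"subscription": subscription_path, "max_messages": max_messages}
    )
    ack_ids = []
    for received in response.received_messages:
        try:
            handle(received.message.data)
        except Exception as exc:
            # Leave the message unacknowledged so Pub/Sub redelivers it
            # after the ack deadline.
            print(f"Processing failed, will be redelivered: {exc}")
        else:
            ack_ids.append(received.ack_id)
    if ack_ids:
        # Acknowledge all successfully processed messages in one call.
        client.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )
    return len(ack_ids)
```

A scheduled job could call `pull_batch` in a loop until it returns 0 messages, which keeps the fetch rate entirely under the subscriber’s control.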
The "ack deadline" is a critical concept for both. For push, it’s the time Pub/Sub waits for your endpoint to return a success response after delivering a message. For pull, it’s the time Pub/Sub considers a message "in flight" to your subscriber before it becomes available for redelivery if not acknowledged. This deadline is configured per subscription; it defaults to 10 seconds and can be set as high as 10 minutes. If your processing takes longer than the deadline, you’ll need to raise it on the subscription or extend it per message while you work (lease management). The Python client’s streaming pull, used in the example above, extends leases for you while the callback runs.
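When you pull messages synchronously and manage leases yourself, extending the deadline is a single call. This is a hedged sketch rather than a full lease manager: `extend_ack_deadline` is a hypothetical wrapper, and `client` is assumed to be a `pubsub_v1.SubscriberClient` whose `modify_ack_deadline` method does the actual work.

```python
MAX_ACK_DEADLINE_SECONDS = 600  # the 10-minute ceiling described above


def extend_ack_deadline(client, subscription_path, ack_ids, seconds):
    """Ask Pub/Sub for more time on messages that are still being processed.

    client: expected to be a google.cloud.pubsub_v1.SubscriberClient.
    ack_ids: ack IDs taken from a previous pull response.
    seconds: new deadline, counted from the moment of this call.
    """
    if not 0 <= seconds <= MAX_ACK_DEADLINE_SECONDS:
        raise ValueError(
            f"seconds must be between 0 and {MAX_ACK_DEADLINE_SECONDS}"
        )
    client.modify_ack_deadline(
        request={
            "subscription": subscription_path,
            "ack_ids": list(ack_ids),
            "ack_deadline_seconds": seconds,
        }
    )
```

A long-running worker would call this periodically, before the current deadline runs out, for every message it has not yet acknowledged.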
What most people miss is what "at-least-once" delivery implies for the subscriber. When you pull messages, Pub/Sub marks them as delivered to your subscriber but doesn’t remove them until an acknowledgment is received. If your subscriber crashes after receiving messages but before acknowledging them, Pub/Sub will redeliver those same messages once the ack deadline passes. Push behaves the same way: if your endpoint does the work but its success response doesn’t arrive in time, the message is sent again. This means your processing logic must be idempotent, that is, able to handle duplicate messages without causing incorrect side effects.
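One common way to get idempotent processing is to remember which messages have already been handled and skip repeats. The sketch below keeps that record in memory purely for illustration; `Deduplicator` is a hypothetical class, and a real deployment would need a durable, shared store (a database table or cache) so the record survives restarts and is visible to every subscriber instance.

```python
class Deduplicator:
    """Run a handler at most once per message key (in-memory sketch)."""

    def __init__(self):
        self._seen = set()

    def run_once(self, key, handler, payload):
        """Call handler(payload) unless this key was already processed.

        key: something stable across redeliveries, such as the Pub/Sub
             message ID or a business identifier like a sign-up ID.
        Returns True if the handler ran, False if the message was a duplicate.
        """
        if key in self._seen:
            return False
        handler(payload)
        # Record the key only after the handler succeeds, so a failed
        # attempt can still be retried on redelivery.
        self._seen.add(key)
        return True
```

For example, calling `run_once("msg-1", send_welcome_email, signup)` twice would send the email only the first time, assuming `send_welcome_email` is your own function and did not raise.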
The next step is understanding how to scale your subscribers to handle increasing message volumes, whether you’re using push endpoints or pull clients.