aws-samples/news-clustering-and-summarization

This repository contains code for a near real-time news clustering and summarization solution using AWS services like Lambda, Step Functions, Kinesis, and Bedrock. It demonstrates how to efficiently process, embed, cluster, and summarize large volumes of news articles to provide timely insights for financial services and other industries.

Quality score: 46 / 100 (Emerging)

The solution implements a hybrid serverless and compute architecture. Kinesis ingests articles into a Step Functions workflow that preprocesses and embeds documents using Bedrock's Titan model and stages the results in S3. The embeddings are then sent in SQS micro-batches to EC2 instances running DBSCAN clustering. DynamoDB Streams trigger Claude Haiku summarization pipelines when clusters exceed configurable thresholds, and the summaries are persisted back to DynamoDB for real-time UI consumption. Infrastructure is fully provisioned with Terraform, and event-driven orchestration with auto-scaling compute pools supports a throughput of a dozen or more articles per second.
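The clustering and summarization trigger described above can be illustrated with a minimal, dependency-free sketch. It is not the repository's code: the cosine distance metric, the `eps` and `min_samples` values, the toy 2-D vectors standing in for Titan embeddings, and the helper name `clusters_to_summarize` are all assumptions made for illustration, and the "threshold" is assumed here to be a cluster size.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors, eps, min_samples):
    """Return one label per vector: a cluster id >= 0, or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(vectors)

    def neighbors(i):
        # Brute-force neighborhood query; includes the point itself.
        return [
            j
            for j in range(len(vectors))
            if cosine_distance(vectors[i], vectors[j]) <= eps
        ]

    cluster_id = 0
    for i in range(len(vectors)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = noise  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster_id += 1
    return labels


def clusters_to_summarize(labels, threshold):
    """Hypothetical trigger: ids of clusters whose size exceeds `threshold`."""
    counts = {}
    for label in labels:
        if label != -1:
            counts[label] = counts.get(label, 0) + 1
    return sorted(c for c, n in counts.items() if n > threshold)


# Toy micro-batch of 2-D "embeddings" (real embeddings have many more dimensions).
batch = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0]]
labels = dbscan(batch, eps=0.05, min_samples=2)
ready = clusters_to_summarize(labels, threshold=2)
```

In this sketch the first three vectors form one cluster, the next two form a second, and the last is noise, so only the first cluster would be handed to a summarizer. A production system would use an indexed neighbor search rather than the quadratic scan shown here.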

No Package · No Dependents
Maintenance 10 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 13 / 25

Stars: 41
Forks: 6
Language: HCL
License: MIT-0
Last pushed: Feb 14, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/aws-samples/news-clustering-and-summarization"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
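The same request can be made from Python using only the standard library. This is a minimal sketch: the function names are hypothetical, treating the `embeddings` path segment as a category parameter is an assumption based on the URL above, and the response is returned as raw text because its schema is not described here.

```python
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Return the raw response body as text (keyless access, 100 requests/day)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a network request):
# print(fetch_quality("embeddings", "aws-samples", "news-clustering-and-summarization"))
```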