aws-samples/news-clustering-and-summarization
This repository contains code for a near real-time news clustering and summarization solution using AWS services like Lambda, Step Functions, Kinesis, and Bedrock. It demonstrates how to efficiently process, embed, cluster, and summarize large volumes of news articles to provide timely insights for financial services and other industries.
The solution implements a hybrid serverless and compute architecture. Kinesis ingests articles into a Step Functions workflow that preprocesses documents and embeds them with Bedrock's Titan model, stages the results in S3, and routes them through SQS micro-batches to EC2 instances running DBSCAN clustering. When a cluster exceeds a configurable threshold, DynamoDB Streams trigger a Claude Haiku summarization pipeline, and the results are persisted back to DynamoDB for real-time UI consumption. Infrastructure is fully provisioned with Terraform and supports a throughput of a dozen or more articles per second through event-driven orchestration and auto-scaling compute pools.
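To make the clustering step concrete, here is a minimal sketch of textbook batch DBSCAN over embedding vectors. It is not the repository's implementation, which is not shown on this page. The cosine distance metric, the `eps` and `min_samples` values, and all function names are assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, p in enumerate(points) if cosine_distance(points[i], p) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return [label if label is not None else NOISE for label in labels]


if __name__ == "__main__":
    # Toy 2-D "embeddings": two tight groups plus one outlier.
    embeddings = [
        [1.0, 0.0], [0.99, 0.1], [0.98, 0.05],
        [0.0, 1.0], [0.1, 0.99], [0.05, 0.98],
        [-1.0, -1.0],
    ]
    print(dbscan(embeddings, eps=0.05, min_samples=2))
    # prints [0, 0, 0, 1, 1, 1, -1]
```

This brute-force version compares every pair of points, so it only suits small batches. A near real-time pipeline would more likely rely on an optimized library or an incremental variant.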
Stars: 41
Forks: 6
Language: HCL
License: MIT-0
Category:
Last pushed: Feb 14, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/aws-samples/news-clustering-and-summarization"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
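The same request can be sketched in Python using only the standard library. The response format is not documented on this page, so the JSON decoding is an assumption. Reading the path segments as category, owner, and repository is likewise inferred from the single example URL, and the function names are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (segment meaning is an assumption)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the endpoint and decode the body, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    data = fetch_quality(
        "embeddings", "aws-samples", "news-clustering-and-summarization"
    )
    print(data)
```

With a limit of 100 unauthenticated requests per day, it makes sense to cache responses locally instead of refetching on every run.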
Higher-rated alternatives
aws-samples/amazon-bedrock-samples
This repository contains examples for customers to get started using the Amazon Bedrock Service....
debnsuma/fcc-ai-engineering-aws
A Practical Course on Embeddings, RAG, Multimodal Models, and Agents with Amazon Nova.
arnobt78/Embeddable-RAG-Chatbot-Widget--JavaScript-Cloudflare-Workers-FullStack
A production-ready, embeddable AI chatbot widget built with Cloudflare Workers that can be...
f2daz/openclaw-knowledgebase
Self-hosted RAG system with Ollama embeddings and Supabase/pgvector. 100% local, 100% free.
mithun50/groq-rag
Extended Groq SDK with RAG (Retrieval-Augmented Generation), web browsing, and AI agent...