
Amazon Bedrock: Five Things Every Startup Should Know
I recently published an article on the AWS Builder Center with Jean Malha, diving into the practical lessons we’ve learned helping startups build with Amazon Bedrock. Here’s a quick overview of what we covered.
The Five Things
1. Mind Your Keys: API Authentication Gotchas
The authentication decision isn’t about your startup stage—it’s about what your tech stack supports. IAM roles with STS temporary credentials are ideal, but sometimes middleware like custom API gateways only speak bearer tokens. Bedrock offers long-term keys (up to 36,600 days) and short-term keys (12 hours max). Critical detail: you cannot retrieve API keys after generation—store them in Secrets Manager immediately.
2. Scale Smart: Cross-Region Inference Options
As you scale, understanding your three inference options becomes critical:
- Single-Region: Full data location control, supports Provisioned Throughput
- Geographic CRIS: Routes within US/EU/APAC boundaries for compliance (GDPR, HIPAA)
- Global CRIS: Maximum throughput with ~10% cost savings, routes anywhere worldwide
The model ID format tells the story: us.anthropic.claude-sonnet-4-5-20250929-v1:0 stays in US regions, while global.anthropic.claude-sonnet-4-5-20250929-v1:0 can route globally.
3. Cache Smart: Prompt Caching for Cost and Latency
Repetitive prompt elements—system instructions, document context, tool definitions—can be cached for up to 90% cost reduction and 85% faster responses. Structure prompts with static content first, then cache checkpoints, then dynamic content. One gotcha: cache hits require exact prefix matches, even whitespace matters.
4. Don’t Let max_tokens Tank Your Throughput
Setting max_tokens to the model’s maximum feels safe but kills throughput. Bedrock reserves quota using input_tokens + max_tokens at request start. A 3,000 input + 32,000 max_tokens request reserves 35,000 tokens—even if you only generate 1,000. Analyze your actual output patterns with CloudWatch and set max_tokens close to reality for 8x+ better quota utilization.
5. Not All Latency Metrics Are Created Equal
Here’s a subtle comparison trap: other providers often report Time to First Token (TTFT), while Bedrock’s CloudWatch shows end-to-end InvocationLatency. Comparing “200ms” TTFT to “2,500ms” E2E latency is apples-to-oranges. Use the CloudWatch GenAI Observability dashboard to measure what actually matters for your use case.
Read the full article with detailed decision trees and implementation guidance on AWS Builder Center.