Today's Context Window includes OpenAI's free doctor tool, Anthropic's Mythos model getting breached via URL guessing, and the practitioner's guide to building AI as infrastructure.
The autonomous commerce section is interesting because it is not just AI that buys stuff. It is AI navigating ambiguous pricing, stock signals, and trust in real time. I ran a similar experiment after Anthropic's vending machine test (https://thoughts.jock.pl/p/agentic-commerce-ai-shopping-for-you-2026): built a small store and had an agent make actual purchase decisions.
The failure point was not placing the order, it was the agent deciding what counted as a good deal. That judgment layer required me writing explicit value criteria, which made me realize agentic commerce is not autonomous, it is delegated with your preferences pre-encoded.
Love this framing of “agentic commerce as delegated, not autonomous.” The way you hit the wall on “what counts as a good deal” is exactly why I’m obsessed with this space: the hard problem isn’t checkout, it’s encoding your personal and messy human tradeoffs into something an agent can actually act on.
In my write-up I leaned into the Anthropic marketplace as a capability milestone, but your experiment makes it clear the bottleneck is preference modeling and value alignment, not tooling. Would love to compare notes on how you structured those value criteria and where they broke.
The autonomous commerce section is interesting because it is not just AI that buys stuff. It is AI navigating ambiguous pricing, stock signals, and trust in real time. I ran a similar experiment after Anthropic's vending machine test (https://thoughts.jock.pl/p/agentic-commerce-ai-shopping-for-you-2026): built a small store and had an agent make actual purchase decisions.
The failure point was not placing the order, it was the agent deciding what counted as a good deal. That judgment layer required me writing explicit value criteria, which made me realize agentic commerce is not autonomous, it is delegated with your preferences pre-encoded.
Love this framing of “agentic commerce as delegated, not autonomous.” The way you hit the wall on “what counts as a good deal” is exactly why I’m obsessed with this space: the hard problem isn’t checkout, it’s encoding your personal and messy human tradeoffs into something an agent can actually act on.
In my write-up I leaned into the Anthropic marketplace as a capability milestone, but your experiment makes it clear the bottleneck is preference modeling and value alignment, not tooling. Would love to compare notes on how you structured those value criteria and where they broke.