This Anthropic bug is still not fixed.
Claude has had this problem for years:
It changes “
“ U+201C LEFT DOUBLE QUOTATION MARK: double turned comma quotation mark
into this "
" U+0022 QUOTATION MARK: double quote
Here is the reply from Claude 4.7 Opus (reasoning):
You're right, and this is a well-documented issue that has affected Claude and Claude Code for an extended period.
What's happening
The behavior you describe — turning " (U+201C LEFT DOUBLE QUOTATION MARK) into " (U+0022 QUOTATION MARK) — is part of a broader pattern where non-ASCII typographic characters get silently normalized to their ASCII equivalents [2][3]. The same thing happens with:
- U+201D (right double quotation mark) → U+0022 [3]
- U+2019 (right single quotation mark) → U+0027 [8]
- U+201E (German low opening quote) and other typographic quotes [6]
- German umlauts (ä, ö, ü, ß) getting replaced with ASCII digraphs (ae, oe, ue, ss) [5]
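To check whether a given reply was affected, one option is to compare the text that was sent with the text that came back. The following is a minimal sketch, not part of any of the linked reports: the table `ASCII_FALLBACKS` and the helpers `report_non_ascii` and `lost_characters` are hypothetical names, and the table covers only the substitutions listed above.

```python
import unicodedata
from collections import Counter

# Substitutions named in the list above (typographic character -> ASCII fallback).
# Escapes are used so the source file itself stays pure ASCII.
ASCII_FALLBACKS = {
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u201e": '"',   # DOUBLE LOW-9 QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u00e4": "ae",  # a with diaeresis
    "\u00f6": "oe",  # o with diaeresis
    "\u00fc": "ue",  # u with diaeresis
    "\u00df": "ss",  # sharp s
}

def report_non_ascii(text):
    """Return (index, 'U+XXXX', Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]

def lost_characters(sent, received):
    """Return {'U+XXXX': n} for tracked characters that occur n fewer times in received."""
    sent_counts = Counter(sent)
    received_counts = Counter(received)
    return {
        f"U+{ord(ch):04X}": sent_counts[ch] - received_counts[ch]
        for ch in ASCII_FALLBACKS
        if sent_counts[ch] > received_counts[ch]
    }
```

Calling `lost_characters(original_text, echoed_text)` on a round-tripped string gives a quick, if crude, signal of which code points disappeared; it does not say where in the pipeline they were lost.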
Where the problem appears to live
Based on investigation by users, the root cause does not appear to be me generating the wrong characters. Rather:
- Claude appears to generate the correct Unicode punctuation internally, but something in Anthropic's API response pipeline normalizes these characters in the stream — without signaling this to either the user or the model [1].
- In Claude Code specifically, the Edit, Write, and Update tools' string-matching/normalization layer conflates U+201E / U+201C / U+201D with U+0022 [3][6].
- This is particularly destructive when editing JSON files or doing OCR/document conversion, where exact character fidelity matters [2][3].
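Why JSON is hit especially hard can be shown in a few lines: a typographic quote inside a JSON string is ordinary text, but once it becomes U+0022 it terminates the string. This is a sketch under the assumption that the normalization is a plain character-for-character replacement; the table `TO_ASCII` and the function `survives_normalization` are hypothetical and only simulate the reported behavior.

```python
import json

# Simulated normalization: the three double quotes named above become U+0022.
TO_ASCII = str.maketrans({"\u201e": '"', "\u201c": '"', "\u201d": '"'})

# Valid JSON: the typographic quotes are just characters inside the string value.
original = '{"title": "Sie sagte \u201eHallo\u201c"}'

def survives_normalization(json_text):
    """Return True if json_text still parses after the simulated normalization."""
    normalized = json_text.translate(TO_ASCII)
    try:
        json.loads(normalized)
    except json.JSONDecodeError:
        return False
    return True
```

Here `json.loads(original)` succeeds, while `survives_normalization(original)` returns `False`, because the replaced quotes end the string value early.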
Real-world impact
- It makes Claude essentially unusable for OCR of documents into structured data where original punctuation must be preserved [2].
- German-language users have to run post-processing scripts like fix_quotes.py after every interaction, which defeats the purpose of AI-assisted writing [7].
- In the Claude web app, producing German closing quotation marks (U+201C, which is the opening quote in English typography) has been reported as effectively impossible [4].
- The umlaut-substitution regression has been ongoing since December 2025 [5].
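For illustration, a post-processing pass of the kind mentioned above might look roughly like the following. This is not the fix_quotes.py from the report, whose contents are not shown here; `restore_german_quotes` is a hypothetical helper, and it rests on the strong assumption that every straight double quote in the text belongs to a properly paired German quotation.

```python
def restore_german_quotes(text):
    """Replace straight double quotes alternately with U+201E (opening) and U+201C (closing).

    Heuristic only: an unpaired quote, an inch mark, or a quote inside code
    flips the opening/closing state for everything after it.
    """
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append("\u201e" if opening else "\u201c")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)
```

The heuristic cannot recover information that is already gone, so it should only be run on prose where straight quotes are known to be unwanted.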
What I can and can't do about it
Honestly: if the normalization is happening in the transport/tool layer rather than in my token generation, then no amount of me "trying harder" in a single response will fix it — the characters get rewritten after I emit them [1]. If you're hitting this in Claude Code specifically, the workarounds people have used include:
- Writing the file via a shell command (e.g., printf or a heredoc) with explicit \u201C escapes, bypassing the Edit/Write tool normalization.
- Using base64-encoded content that gets decoded by a shell command.
- Post-processing with a sed/python script.
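The first two workarounds share one idea: keep the payload pure ASCII while it passes through the layer that normalizes, and decode it only at the destination. The sketch below shows that idea in Python rather than in shell; the function names are hypothetical, and it assumes the intermediate layer leaves plain ASCII untouched.

```python
import base64

def encode_ascii_safe(text):
    """Return (escaped, b64): two pure-ASCII representations of text."""
    # \uXXXX-style escapes, comparable to the explicit escapes mentioned above.
    escaped = text.encode("unicode_escape").decode("ascii")
    # Base64 of the UTF-8 bytes, for the base64 variant of the workaround.
    b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return escaped, b64

def decode_escaped(escaped):
    """Reverse the unicode_escape form produced by encode_ascii_safe."""
    return escaped.encode("ascii").decode("unicode_escape")

def decode_b64(b64):
    """Reverse the base64 form produced by encode_ascii_safe."""
    return base64.b64decode(b64).decode("utf-8")
```

For example, `encode_ascii_safe("\u201eHallo\u201c")` yields strings that contain no typographic characters at all, and either decoder restores the original text once the payload has arrived. When writing the result to disk, passing an explicit `encoding="utf-8"` avoids depending on the platform default.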
I can't promise that this very reply preserves the U+201C you pasted, for exactly the reasons above. If you tell me the specific context (API, Claude Code, web app) and what you're trying to accomplish, I can suggest the least-broken workaround.
[1]: the-curious-case-claudes-quotes.md - GitHub
[2]: [Bug] Claude models normalize Unicode quotation marks to ASCII ...
[3]: [BUG] Write and Update tools normalize unicode double ... - GitHub
[4]: Bug: Incorrect Handling of German Quotation Marks in Tool use ... - GitHub
[5]: Claude Code replaces German umlauts with ASCII substitutes for 3+ ...
[6]: Edit tool corrupts non-ASCII Unicode characters (typographic quotes ...
[7]: [BUG] German typographic quotes get incorrectly auto-corrected
[8]: [BUG] Claude cannot reliably output non-ASCII Unicode ... - GitHub