LLM (google/gemini-3.1-flash-lite-20260507) summary:
- Documented Exposure: A comprehensive dataset of over 200,000 public pastes gathered from various developer utility websites over seven years was analyzed and categorized.
- Data Sensitivity: Extracted documents frequently contained highly sensitive information such as live cloud infrastructure credentials, database connection strings, social security numbers, and internal service desk records.
- Emergence Of AI Tooling: The prevalence of leaking data through utility services has increased due to current workflows involving artificial intelligence coding assistants that encourage pasting production errors into external tools.
- Methodology Transparency: Researchers performed systematic, unauthenticated scraping of the provider's publicly available "Recent Links" feed to demonstrate the lack of security surrounding stored professional workflows.
- Turkish Sector Impact: Specific analysis of Turkish data found records containing national identification numbers, bank account details, and private insurance policy documents accessible on public URLs.
- Platform Insecurity: The service utilized for the analysis exhibits a critical stored cross-site scripting vulnerability, allowing attackers to execute code in the browser of any user viewing contaminated pastes.
- Institutional Necessity: Regulatory bodies are encouraged to classify the use of third-party public paste services as a regulated data processing activity to ensure proper oversight of organizational information.
- Mitigation Recommendations: Proposed security strategies include restricting network access to public formatting tools, adopting offline developer utilities, and treating artificial intelligence prompts as potential data egress points.
Seven years inside the public "Recent Links" feeds of a family of JSON and code "beautifier" tools. What engineers pasted; whose data it was; what the rise of the AI coding assistant changed; and what a Turkish data controller is supposed to do about the TCKNs and IBANs sitting on a stranger's server right now. And the part we did not go looking for: the formatter itself carries a stored cross-site-scripting flaw, so the service holding all of this data can be made to run an attacker's code in your browser.
An afternoon at the keyboard, somewhere
It is mid-afternoon at a tax-preparation company in the United States. A document-delivery service keeps failing, so an engineer copies a JSON callback out of the debugger to clean it up: one client's filing, caught mid-pipeline. The record carries the client's name, their Social Security number in the clear, and, a few fields down, a live access key for the cloud queue that ships the document. The engineer pastes it into a public JSON formatter. The formatter saves the paste under a six-hex identifier and adds the resulting URL to its public "Recent Links" feed, where, months later, our scraper retrieves it.
That paste is one of the documents in our corpus. The client is one of several thousand people whose most private records have been routed, in pieces, through this single public service over the years. The vendor does not know. The client does not know. The company's information-security team almost certainly does not know either, because if it did, the paste would not still be retrievable as we write this.
Now move the same scene to Istanbul, or Ankara, or Izmir. A different engineer,
a different debugger, the same instinct. The payload that comes out is a retail bank customer's full credit limit and outstanding debt, balances in
Turkish lira. Or a taxpayer's invoice lifted straight off the national
e-Fatura rails. Or a company's entire member table, the whole roster in one
file. The instinct is identical, the tool is identical, and so is the
outcome: the data is gone the moment it leaves the laptop.
The pattern is not new. What has changed in recent years is the shape of who is doing it, what they are pasting, and, increasingly, why. This is a report on that change, and on Türkiye's specific, measurable place inside it.
Why a JSON formatter, and why now
There is nothing remarkable about a JSON formatter as a piece of software. It accepts a blob of text, indents it, and offers a button to save the result under a shareable URL. The remarkable part is that millions of engineers, every year, choose to do their debugging through one, and that the service, by default in many cases, publishes the saved paste on its own public listing page.
watchTowr Labs documented this surface in November 2025 on a sample of roughly 80,000 documents. Our work extends and quantifies theirs: around 200,000 documents, more than double their sample, collected over roughly seven years ending in May 2026. We then read and sorted each document by what it actually held: personal data, sector context, and a category that did not exist in the pre-LLM corpus, the workflow exhaust of human-AI interaction.
And jsonformatter.org is only one storefront. The same operator runs
codebeautify.org and a family of sibling "beautifier" tools that share a
single save backend and a single pool of saved pastes, so a link saved through
one is retrievable through the others. We harvested that shared pool across its
sibling tools (jsonformatter.org, codebeautify.org, and the rest), reaching
back roughly seven years. The documents this report analyses are the
validated, deduplicated core of that harvest; the raw multi-tool surface behind
them is larger still.
The headline number: at least 1,078 documents in this corpus carry a high-confidence flag for one or more named credentials, identifiers, or live secrets, and a further 2,167 carry the same flags at medium confidence. If you have ever debugged a production payload in a formatter on the open internet, the corpus probably contains your work. The point of this report is to argue that this is a structural problem, to show its shape in 2026 specifically, and to put a number on what it means for one country's regulated sectors.
How we collected it (and what we did not do)
Every document referenced here was retrieved by issuing the same HTTP request the operator's own front-end issues to render the paste-viewer page. We did not bypass any authentication, because there was none to bypass; the endpoint is unauthenticated and the listing surface enumerates the identifiers. A residential proxy budget and a Saturday afternoon will replicate the corpus we describe at single-figure dollar cost. The hard problem was never retrieval. It was making the retrieved data legible.
Concretely: the "Recent Links" page lists saved pastes ten at a time, paginated
by a plain offset: /recentLinksPage/json/0, then /10, then /20, onward
for as long as you care to walk it. Each row resolves to a six-hex identifier.
Hand that identifier to the same endpoint the site's own viewer calls, and the
original paste falls straight back out:
POST /service/getDataFromID
Content-Type: application/x-www-form-urlencoded

urlid=<six-hex-id>&toolstype=json
No token, no session, no rate limit. The only real obstacle is the operator's Cloudflare front door, so we drove a genuine browser engine over the Chrome DevTools Protocol with no automation flags set, rotated through mobile and residential IPs, and moved the cursor and scrolled on human timing, enough to read as the ordinary visitor we technically were. It is the kind of stack a competent team assembles in a weekend.
[Figure: the harvester walking the public listing ten links at a time, then pulling each paste back through the viewer endpoint.]

Reading a corpus this size by eye is not possible, so we did it in passes. One
looked for the things that should never be in a paste at all: live tokens,
keys, connection strings, card numbers. A second looked for Turkish personal
data and validated it rather than guessing. A TCKN counts only once it passes
the official national-ID checksum, an IBAN only once it passes MOD-97. A third
tried to place each document behind a real Turkish signal: a .tr host,
Turkish text, lira amounts, a Turkish bank code.
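The two validation rules named above can be sketched in a few lines. This is an illustrative sketch of the publicly documented TCKN check-digit rule and the ISO 13616 MOD-97 test, not the pipeline used for this report; the function names are hypothetical, and the sample values are a synthetic test ID and a commonly published example IBAN, not corpus data.

```python
import re


def is_valid_tckn(value: str) -> bool:
    """Check the two trailing check digits of an 11-digit Turkish national ID."""
    # Eleven digits, and the first digit may not be zero.
    if not re.fullmatch(r"[1-9]\d{10}", value):
        return False
    d = [int(ch) for ch in value]
    # 10th digit: (sum of digits 1,3,5,7,9 times 7, minus sum of digits 2,4,6,8) mod 10.
    check10 = (sum(d[0:9:2]) * 7 - sum(d[1:8:2])) % 10
    # 11th digit: sum of the first ten digits mod 10.
    check11 = sum(d[:10]) % 10
    return d[9] == check10 and d[10] == check11


def is_valid_iban(value: str) -> bool:
    """Generic MOD-97 test: rearranged, letter-expanded IBAN must equal 1 mod 97."""
    iban = re.sub(r"\s+", "", value).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    # Letters expand to two-digit numbers: A=10 ... Z=35 (base-36 digit values).
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def is_valid_tr_iban(value: str) -> bool:
    """Turkish IBANs are 26 characters long and start with TR."""
    iban = re.sub(r"\s+", "", value).upper()
    return iban.startswith("TR") and len(iban) == 26 and is_valid_iban(iban)


print(is_valid_tckn("10000000146"))                          # synthetic test value
print(is_valid_tr_iban("TR33 0006 1005 1978 6457 8413 26"))  # published example IBAN
```

Both check digits of a TCKN are fully determined by the first nine digits, so exactly one in a hundred eleven-digit strings with a non-zero first digit passes. A checksum hit is therefore a filter, not proof that the number belongs to a person.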
We did not run a canary-token experiment. watchTowr did, and observed retrieval of a planted canary by an unrelated party within 48 hours of submission. That experiment, independently published, is sufficient evidence for the threat model we describe in this report.
The leaks themselves, redacted
None of what follows is Turkish; Türkiye gets its own section later. The problem is global, and the worst of it lands in regulated industries everywhere. Most of it we show as it appeared in the formatter; a few documents we would rather only describe than screenshot, further down.
The only thing we changed before publishing is redaction. Every live secret,
identifier, payment card, IBAN, name, phone number, email, address,
geo-coordinate and internal hostname has been replaced with a bracketed
<REDACTED-X> marker that names the class of value that sat there. Everything
else is exactly as the engineer left it.
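A class-labelled redaction pass of the kind described above might look like the following. It is a minimal sketch under stated assumptions: the three rules and their labels are hypothetical examples, far narrower than the classes listed in the paragraph, and the sample key is the placeholder value AWS uses in its own documentation.

```python
import re

# Hypothetical rule set: (class label, pattern). Order matters, so multi-line
# private-key blocks are replaced before the narrower single-token patterns run.
RULES = [
    ("PRIVATE-KEY", re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.S)),
    ("AWS-KEY-ID", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def redact(text: str) -> str:
    """Replace every match with a bracketed marker naming the class of value."""
    for label, pattern in RULES:
        text = pattern.sub(f"<REDACTED-{label}>", text)
    return text


sample = '{"accessKeyId": "AKIAIOSFODNN7EXAMPLE", "contact": "ops@example.com"}'
print(redact(sample))
```

Naming the class in the marker keeps a redacted document readable: the reader can still see that a key or an address sat in that field without seeing the value.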
Service configuration, with four bundled credentials
One server-config object carrying four different live secrets at once (an AWS key, a database connection string, a Google AI key, and a SendGrid key), almost certainly pasted by someone fixing a typo in a config file.

A live session token, with the user attached
A login-success response carrying a fresh token, the user's contact details,
and, for good measure, a MongoDB connection string inside the same
store_config object.

We decoded a sample of these tokens to confirm they were the genuine article and not test stubs. The payloads carried real issuer and subject claims and expiry timestamps that placed the token, at the moment the engineer pasted it, inside its valid window. We did not test a single one against a live endpoint, and we reproduce no decoded claim here. Structural validation was enough to know what they were.
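Structural validation of this kind needs no network access and no signature check. The sketch below is illustrative only: the helper names are hypothetical, the token is built locally from synthetic claims, and the validity test assumes the standard `exp` and optional `nbf` claims are present as numeric timestamps.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle segment of a JWT without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    segment = parts[1]
    # base64url segments are stored unpadded; restore padding before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def was_valid_at(token: str, moment: float) -> bool:
    """True if `moment` (Unix seconds) falls inside the token's nbf..exp window."""
    claims = decode_jwt_payload(token)
    exp = claims.get("exp")
    nbf = claims.get("nbf", 0)
    if not isinstance(exp, (int, float)):
        return False
    return nbf <= moment < exp


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_demo_token(claims: dict) -> str:
    """Build a synthetic, unsigned token purely for demonstration."""
    header = _b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(claims).encode())
    return f"{header}.{body}.signature-not-checked"


demo = make_demo_token({"iss": "https://issuer.example", "sub": "user-1",
                        "nbf": 1000, "exp": 2000})
print(was_valid_at(demo, 1500), was_valid_at(demo, 2500))
```

Comparing the decoded `exp` against the time a paste was saved is enough to tell a live token from an expired fixture, and it never touches the issuing service.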
A data-protection vendor's Google Cloud key
A cloud backup vendor's own infrastructure-registration payload, pasted while
debugging a Google Cloud onboarding. It carries the service account's client
email, its privateKeyId, and the full clientPrivateKey, the credential that
authenticates to the cloud project named three fields away.

A background-check dossier on a private citizen
A commercial skip-tracing report, the kind a debt collector, landlord or investigator pulls, beautified to read it more easily. One named subject tied to dozens of neighbours, phone numbers, voter registrations, professional licences, property assessments and prior addresses, with property valuations attached. Nobody was breached; a paying subscriber ran the lookup and saved it to a public URL. It is the most invasive single profile in the corpus.

A tax filing, with the SSN and a live cloud key in one paste
A document-delivery callback from a US tax-preparation platform. It carries a filer's name and Social Security number in the clear and, a few fields away, a live Azure Service Bus key for the queue that ships the document. Identity and infrastructure, leaked in the same object.

A global payroll platform's private API key
An integration payload from a global employer-of-record platform, captured mid-handshake. It carries the full client certificate and the matching private key for one of the company's internal private APIs, the credential that authenticates its service-to-service traffic.

An EU regulator's internal ticketing system
A single issue exported from the internal service desk of a European Union regulatory agency: issue keys, workflow states, linked tickets and reviewer fields, served from the agency's own Jira host. Not a credential, but a clear window into how a supranational regulator runs its casework, sitting on a public URL.

A major bank's internal Jira issue
A single issue exported from a major US custody bank's internal tracker, dense with several hundred populated custom fields: project keys, reviewer names, workflow history and internal URLs, all addressed to the bank's own Jira host. One of the highest sensitive-field counts of any document in the corpus.

A few we describe, not show
Not every document needs a screenshot to land, and a few we would rather not reproduce even redacted. Three more from the global, non-Turkish pile, told rather than shown.
A US cable provider's customer file. A customer-record export, dropped into the corpus by a routine engineering paste: named addresses, ZIP codes, communication-preference flags, and the account identifiers tying them together. There is no credential in it, and it does not need one. A customer-record dump is exactly the post-incident artifact a regulator's investigator arrives looking for. This one is already public.
A military hospital's HR record. An employee record pasted line by line: name, date of birth, national identification number, home address, organizational unit, and, in a field the engineer probably never looked at twice, a 280 KB base64-encoded portrait photo. The paste is in the hospital's own language and names the hospital in the payload. Someone was debugging on the way back from lunch, and the security policy did not travel as far as their IDE.
A global bank's repository inventory. An internal Bitbucket-inventory feed (repository names, project keys, "Dev"/"Prod" markers, and the underlying Jira project URLs) tied to a major US bank's internal hostnames. Small, prosaic, and recurrent: the same event reappears across submissions, so it is a scheduled job, not a one-off slip. For an attacker mapping the bank's internal structure, it is a starting point that needs no exploit at all.
The bulk of it
The fragments above are a small, readable sample of a much larger pile, and the pile is the part that is hard to believe. The same scan returns live secrets across nearly every class an attacker would want, and it rarely finds them one at a time. Single documents carry dozens of distinct secrets; the richest carry several hundred, a whole environment's worth of keys, tokens and credentials beautified into one paste and saved to a public URL.
A redacted tour of what sits in the corpus, by class:
- Cloud keys. AWS access-key and secret pairs in config blobs ("accessKeyId": "<REDACTED>","secretAccessKey": "<REDACTED>"), with Cloudflare, Datadog, PagerDuty and Twilio tokens beside them.
- Private keys. PEM-encoded RSA keys (-----BEGIN RSA PRIVATE KEY-----<REDACTED>) and ssh-rsa AAAAB3...<REDACTED> authorized-keys entries, pasted straight out of job configs and cluster settings.
- Database credentials. Connection strings with the password in the clear ("password": "<REDACTED>"), MongoDB URIs and AMQP strings among them.
- Payment and SaaS keys. Stripe live keys (sk_live_<REDACTED>), SendGrid keys, and payment-partner credentials (key IDs and secrets) sitting in lender and checkout configs.
- Workflow secrets. Atlassian/Jira and Bitbucket payloads, one of them carrying several hundred distinct secret tokens in a single document, plus OAuth refresh tokens and decoded JWTs with the user still attached.
- LLM provider keys. OpenAI and Google AI keys, the everyday exhaust of the AI coding assistant.
- Identity and PII. US Social Security numbers in populated customer records, passport and national-ID payloads, Active Directory and Kerberos config, and Turkish TCKNs and IBANs by the file.
We reproduce none of the live values. The counts behind these classes are the kind you re-run twice because you do not trust them the first time, and the originals remain a single unauthenticated request away on the operator's listing surface as we write.
Türkiye on the clipboard
We are a UK-based firm, and much of our team is Turkish, so Türkiye is close to home for us. When this work began, we expected to anchor the whole report on a Turkish critical-infrastructure deep-dive. We owe the reader, and the regulators we want to act on this, the same honesty we would want from anyone writing about somewhere they are this close to: the dataset does not sustain a "Türkiye is hemorrhaging more than everyone else" story, and we are not going to manufacture one. What it sustains is something more useful and, for a Turkish data controller, more actionable.
What the numbers say, and what they don't
The blunt sweep looks alarming. 2,087 documents in Turkish. 800 carrying a personal-data signal. 262 with at least one checksum-valid TCKN. 40 valid Turkish IBANs, 31 of them in a single paste. 18,019 Turkish-format phone numbers.
Then you open the documents, and the honesty tax comes due. A validated TCKN is a strong signal, not a perfect one. The paste that scored highest in the entire corpus, 46 "national IDs" and 61 "phone numbers", turned out to be Apple's end-of-day market data: CUSIP, CIK, and tax-ID strings that happen to pass the checksum. The one with over a thousand "phone numbers" was a delivery-dispatch queue. A 30-TCKN hit was Apple again. A 407-"phone" hit was a Spanish department store's category tree. The country tagger was no better: it filed an Indian pharma company's travel-booking system under "Turkish," and a US meal-kit app too, because one of its food groups is named "Turkey." The license-plate detector alone returned 622,100 matches, which is roughly the moment any honest researcher quietly deletes the license-plate detector.
So we did the slow thing. We separated the genuinely-Turkish documents by hand,
using Turkish text, .tr hostnames, lira amounts, and Turkish bank codes, and
threw the coincidences back. What is left is far smaller than the sweep, and it is
entirely real. Seven of those documents are below, redacted to the bone. They
reach from one citizen's wallet into the government's own e-invoicing rails and
a company's entire member table.
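The hand review described above rested on corroborating signals rather than any single match. A rough automated pre-filter in the same spirit could look like this. It is a sketch under assumptions: the signal names, the regular expressions and the two-signal threshold are all hypothetical, and it would sit in front of a human reviewer, not replace one.

```python
import re

# Hypothetical signal patterns; each alone is weak, so they are only counted.
SIGNALS = {
    # A hostname ending in the .tr country-code domain.
    "tr_host": re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.tr\b", re.I),
    # Letters that are characteristic of Turkish (shared with few other alphabets).
    "turkish_letters": re.compile(r"[ğĞıİşŞ]"),
    # Lira amounts written with the currency sign or a TL/TRY suffix.
    "lira_amount": re.compile(r"₺\s?\d|\d\s?(?:TL|TRY)\b"),
    # The start of a Turkish IBAN: TR, two check digits, then the bank code.
    "tr_iban_prefix": re.compile(r"\bTR\d{2} ?\d{4}"),
}


def turkish_signals(text: str) -> set:
    """Return the names of every signal that fires on the text."""
    return {name for name, pattern in SIGNALS.items() if pattern.search(text)}


def looks_turkish(text: str, minimum: int = 2) -> bool:
    """Require several independent signals before tagging a document."""
    return len(turkish_signals(text)) >= minimum


print(looks_turkish("Müşteri limiti ₺10.000,00 https://api.example.com.tr/v1"))
print(looks_turkish('{"foodGroup": "Turkey", "items": 3}'))
```

Requiring two or more independent signals is what keeps a food group named "Turkey" out of the Turkish pile: the word matches none of the patterns, and a lone coincidental match would still fall below the threshold.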
Seven real ones
Each of these is confirmed Turkish. Each is described at the sector level, with live values removed.
A citizen, whole, in a single paste (automotive / consumer finance). A response from a Turkish vehicle-trade app, saved to a formatter to "validate the JSON." Skip the throwaway session name at the top; what sits underneath is one real person's life. A named individual (name, surname, TCKN) in Adana, listed beside their car (a 2013 diesel, the plate, status: "Onay bekleniyor", awaiting approval), and in the very same object their bank cards in the clear: the full card number, the CVV, the expiry month and year, and the name on the card, one tagged Ana Ödeme Kartı (primary card), another Onaylandı (approved). Two working payment cards, a national ID and a vehicle record, on one public URL. Everything you would need to be this person at a checkout.

A professional chamber's certified tradesmen (vocational licensing). A
dataset of vocational-qualification certificates issued through one of Türkiye's
professional chambers: MYK/TÜRKAK-accredited Çelik Kaynakçısı (steel welder)
records from a chamber of the Union of Chambers of Turkish Engineers and
Architects. Each record pairs a real tradesman's name and TCKN with the chamber
president as the signing authority. This is not a template: the names and
national IDs are populated, record after record.

A private citizen, doxxed by their own chat export (personal data). Not a corporate system this time, one person. Someone saved an export of their Instagram direct messages, and partway through the thread a shoe order runs the usual course: the seller asks for name, shoe size and address, and the customer types all three. What lands on the public URL is her full name, her mobile number, and her home address down to the building number and neighbourhood, in Ağrı in the east of the country. No credential, no breach, no API, just a named human being's front door, left retrievable by anyone who walks the six-hex space.

An insurance policy, mid-issuance (insurance). A Turkish insurer's
policy-issuance response, and not just once: the same object recurs across
several pastes, the signature of a job running on a schedule. Premium 7,664.64
TL, commission and tax broken out, a one-year term, a sum insured of 246,979 TL,
the agent code, CC_MAIL_ORDER as the payment method, and a policyPartners
block that names the insured outright: national ID (TCKN), name, surname, and
role INSURED.

A taxpayer's invoice, off the national e-invoicing rails (government / tax).
A record from Türkiye's Revenue Administration e-invoice system (GİB e-Fatura):
an approved FATURA with its state-issued document number, the seller's
saticiVknTckn tax-identity number and name, the invoice date, and the ETTN,
the unique invoice identifier the state assigns. One taxpayer's invoice, lifted
off the national e-invoicing rails onto a public URL.

A bank customer's limit and debt (retail banking). A major Turkish private
bank's mobile card app, caught mid-session. The response spells out the
customer's credit limit (MÜŞTERİ LİMİTİ ₺10.000,00), their total balance
(TOPLAM BAKİYE −₺5.507,15, i.e. in the red), their usable limit (₺3.746,89),
and, in the cards array, the card number and type. The bank's own
…isbank…/maximummobil asset host names it; we don't.

A company's entire member table (multi-person PII). Not one citizen this
time. A Turkish hosting company's member export: 47 customer records in a
single paste, each with a full name, a personal email, a mobile number, a
registration date and, in plain text, the account's two-factor
AuthenticatorKey. A whole customer database, the 2FA seeds included.

Behind these seven sits the long tail: once the coincidences are stripped out, a few hundred genuinely-Turkish documents still carry citizens' national IDs, phone numbers, addresses, and plates, in customer-service tickets, municipal device logs, and short development blobs. We are not reproducing those.
Why this is a KVKK question, not a curiosity
Here is the framing we want a Turkish data controller and the KVK Kurumu to take
from this section. If your engineering team has ever used jsonformatter.org or
any of its analogues, whether your processing activity is compliant under KVKK
Article 12 reduces to one question: do any of your pastes contain personal data?
For the controllers behind the seven documents above, the answer is already yes,
the data is already public, and neither fact is in dispute. "Paste to a
third-party web service" is a covered processing activity in fact; the only open
question is whether your guidance and your egress controls treat it as one.
Are you a data controller who needs to know? We will run a private aggregate query against the corpus, at no cost, on a written request from an officer of any controller, Turkish or otherwise, and tell you whether your data is in it. Write to info@beyondmemory.io.
The LLM era: what an AI coding assistant actually leaks
The most distinctive subset of the corpus, against any pre-2024 baseline, is the set of pastes whose shape is unmistakably the input or output of a large-language-model workflow. There was no equivalent surface in 2022, a much smaller one in 2024, and by 2026 it is a paste class in its own right.
System prompts, captured mid-flight. We identified 487 documents at medium-or-high system-prompt likelihood and 131 at the strictest threshold. A meaningful fraction are not toy prompts; they are production system prompts with named individuals, internal product names, and example PII embedded in the few-shot demonstrations. In one, a user opens a chat with an "assistant", asks for a summary, and then pastes several thousand characters of what reads like a family-court witness statement, naming the opposing party and a minor child. The chat application held that document in memory and never touched a public service. The user, separately, saved their own copy on a formatter while preparing the prompt. The two systems never communicated. The data leaked anyway.
Retrieval contexts, leaked one chunk at a time. Thirty documents carry a
confirmed RAG-output shape: chunk_id, source, metadata, sometimes paired
with embedding arrays. A RAG output is, by definition, material the system
retrieved from inside the organization's own knowledge base. Paste it to clean
it up and you have published the retrieval target in full. We name none of the
thirty.
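A shape test for this class can be as simple as looking for the three field names together in one object. The sketch below is illustrative and not the classifier used for the count above; the function names are hypothetical, and real retrieval payloads vary enough that a key-set match is only a first-pass signal.

```python
import json

# Field names taken from the description above; embedding arrays are optional.
RAG_KEYS = {"chunk_id", "source", "metadata"}


def iter_dicts(node):
    """Yield every JSON object nested anywhere inside the parsed document."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_dicts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dicts(item)


def looks_like_rag_output(raw: str) -> bool:
    """True if any object in the document carries all three RAG field names."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return any(RAG_KEYS.issubset(obj) for obj in iter_dicts(doc))


sample = '{"results": [{"chunk_id": "c1", "source": "kb/doc.md", "metadata": {}, "text": "..."}]}'
print(looks_like_rag_output(sample))
```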
"Paste this for the assistant to fix." Thirty-one documents are recognizable
as an engineer copying a production error into a chat with an AI assistant: a
stack trace, plus an Authorization header, plus an internal hostname, plus a
messages array. The intent is to ask the assistant to debug the error. The
side effect is to publish the cleaned-up error, through the same browser, on a
service the assistant never touched. One is a perfect ASP.NET yellow-screen
whose title is the immortal "Padding is invalid and cannot be removed."
AI provider keys, in passing. Ninety-eight documents contain at least one
live-shaped LLM-provider API key: 72 Google AI, 23 in the composite bucket
(Mistral, Cohere, Replicate, Groq, OpenRouter, Together), 3 on OpenAI's prefix.
The standout is a hand-written Node script for an AI cricket-betting tipster bot
that leaks a matched set (a live Telegram bot token, the operator's chat ID, and
an OpenAI key) all declared as friendly consts under a comment reading
// Set your values.
Why an unauthenticated public formatter still has secrets in 2026
Four arguments are worth surfacing.
First: engineers do not perceive the formatter as a third-party service. They perceive it as software running in their own browser tab, even when the Save button persists their input to the operator's database and exposes the URL on a public feed. The mental model is wrong, and the wrongness is structural, not personal. Calling individual engineers careless does not move the problem.
Second: secret-scanning tooling in the IDE catches secrets at commit time, not at paste time. The egress paths an engineer takes while debugging are not the egress paths corporate security has instrumented.
Third: AI coding assistants have, for many engineers, formalized "paste a payload somewhere clean and ask the assistant for help" as a workflow. The "somewhere clean" is, in practice, often a formatter. The formatter is shared, indexed, harvested.
Fourth: the operator of the formatter has neither the incentive nor, under its own terms of service, the obligation to remediate. The paste was authorized by the engineer who saved it. The operator's only legitimate remediation is to switch off the public listing surface, which would do nothing to recall the harvested copies already in circulation.
The architecture was never designed against a 2026 threat model. There is no reason to expect it to defend against one without intervention.
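The second argument, that scanning happens at commit time rather than paste time, suggests a check that runs before a payload leaves the laptop. The following is a minimal sketch of that idea, assuming a JSON payload and a hypothetical list of field-name hints; it inspects key names only and would miss secrets stored under innocuous names.

```python
import json

# Hypothetical hints, compared against key names with "_" and "-" removed.
HINTS = ("password", "secret", "token", "apikey", "privatekey", "authorization")


def find_sensitive_paths(node, path="$"):
    """Return JSON paths whose key name suggests a credential with a value set."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            normalized = str(key).lower().replace("_", "").replace("-", "")
            if isinstance(value, str) and value and any(h in normalized for h in HINTS):
                hits.append(child)
            # Keep walking: nested objects and arrays may hold more fields.
            hits.extend(find_sensitive_paths(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_sensitive_paths(item, f"{path}[{index}]"))
    return hits


def safe_to_paste(raw: str) -> bool:
    """True only when no credential-like field name carries a value."""
    return not find_sensitive_paths(json.loads(raw))


payload = '{"db": {"user": "app", "password": "x"}, "items": [{"accessToken": "y"}]}'
print(find_sensitive_paths(json.loads(payload)))
```

A check like this is cheap enough to run in a clipboard hook or a local formatter, which is where the debugging egress path actually starts.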
The adversary already has this
watchTowr Labs' November 2025 canary-token experiment is the cleanest public evidence we can cite. They planted a token-shaped string on the same operator's "Recent Links" feed and observed automated retrieval by an unrelated party within 48 hours. That party was not them. Their conclusion, which we adopt, is that the operator's listing surface is a routine open-source-intelligence feed for at least one third-party harvester whose intentions are not characterizable.
Beyond watchTowr's direct evidence, the architectural argument stands on its own. Six-hex identifiers are walkable. The endpoint is unauthenticated. The listing page enumerates the identifiers as a service to humans. Any party with a residential proxy budget and a Saturday afternoon can replicate the corpus we describe at single-figure dollar cost. The marginal effect of this publication on the threat model is zero. The marginal effect on the operator's incentives, and on the regulator community's awareness, is the point of writing it.
A stored XSS in the formatter itself
Everything above treats jsonformatter.org as a passive warehouse: engineers
put sensitive data in, third parties take it out. While assembling this report
we found the warehouse has a second problem of its own making. The page that
renders a saved paste writes the paste's contents back into the document without
encoding them for safety, so a paste can carry its own JavaScript. Save the
right string, hand someone the link, and your code runs in their browser inside
the jsonformatter.org origin. That is a stored, persistent
cross-site-scripting flaw, on a tool that ranks near the top of the results for
"json formatter" and is opened millions of times a month.
The site does gesture at a defence: it blocklists a few obvious tokens. That is the weakest class of XSS control, and it falls to the oldest trick there is, which is to never spell the word you are forbidden to spell. Our proof-of-concept writes none of the blocked tokens; it assembles them at runtime:
"><svg onload=alert(self['docu'+'ment']['domain'])>
The "><svg onload=…> breaks out of the surrounding attribute and executes with
no <script> tag at all; self['docu'+'ment']['domain'] reaches
document.domain without the string document ever appearing. The alert is
deliberately harmless: it pops the origin to prove whose security context the
code runs in. Swap the body for something useful and the same hole does real
work.
We confirmed it firing straight from a paste's title, a field the site reflects with no sanitisation at all, so the payload runs for anyone who merely browses the public "Recent Links" listing, no shared link required.


Why this is worse here than on an average site: by this report's own measurement, that origin is a standing pile of other people's credentials, tokens, customer records and national IDs. A stored XSS on it lets an attacker run code that can read what a victim's page can read, ride a logged-in session, silently rewrite a paste to attack whoever opens it next, or phish under a domain developers already trust, at the scale of a tool the whole industry pastes into. A warehouse with a broken lock is one thing. A warehouse with a broken lock and a trip-wire already on the door is another.
Disclosure. Unlike the data exposure, which is a by-design property of the
service with no patch to ship, this is a concrete, fixable bug. We reported it to
the operator on 3 June 2026, before publishing, with the proof-of-concept above
and a fix recommendation: encode output for its HTML context (or write it with
textContent), add a Content-Security-Policy, and stop treating a keyword
blocklist as the control. We publish the detail now, the operator notified, for
the same reason as the rest of this report: the people most exposed are the
millions who paste into this origin every month, and they are better served
knowing than not. The proof-of-concept is benign (it reads only its own origin),
and we accessed no user data in confirming the bug.
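The difference between a keyword blocklist and context-aware output encoding can be shown with the proof-of-concept string itself. This is a sketch, not the operator's code: the blocklist tokens and the attribute template are assumptions made for illustration, and only the payload is taken from the report.

```python
import html

# Hypothetical blocklist; the operator's real token list is not known to us here.
BLOCKED_TOKENS = ("<script", "document", "javascript:")

# The benign proof-of-concept from the report: it never spells a blocked token.
PAYLOAD = "\"><svg onload=alert(self['docu'+'ment']['domain'])>"


def passes_blocklist(value: str) -> bool:
    """A keyword filter: accepts anything that avoids the literal tokens."""
    lowered = value.lower()
    return not any(token in lowered for token in BLOCKED_TOKENS)


def render_title_unsafe(title: str) -> str:
    """Writes the title straight into an attribute, as a vulnerable page would."""
    return f'<input class="title" value="{title}">'


def render_title_encoded(title: str) -> str:
    """Encodes for the HTML attribute context before writing the title."""
    return f'<input class="title" value="{html.escape(title, quote=True)}">'


print(passes_blocklist(PAYLOAD))      # the filter lets the payload through
print(render_title_unsafe(PAYLOAD))   # attribute is closed, <svg> tag is live
print(render_title_encoded(PAYLOAD))  # quotes and brackets become entities
```

Encoding works regardless of which words the payload contains, because it neutralises the characters that end the attribute and open a tag. A blocklist has to anticipate every spelling in advance, which is why it is the weaker control.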
On publication without coordinated disclosure
The next five paragraphs are written for our lawyers, the operator, and every affected organization, and they are meant to be read straight.
No active credential, identifier, or other live secret appears anywhere in this report. Every figure presented is an aggregate. Every example is described, not quoted. Beyondmemory retains the underlying corpus solely on infrastructure under our exclusive control, behind access controls commensurate with the sensitivity of the data, and will destroy the corpus at the conclusion of this research program. Our retention policy is available on request to any regulator, operator, or affected organization with standing to ask.
We have deliberately chosen not to name any organization whose data appears in
the corpus. Where a finding could be identified to a specific entity, we have
described it at the sector level only. Public attribution of named victims would
compound harm rather than reduce it and is not necessary to support the analytic
claims of this report. Organizations that suspect their data may be in the
corpus, or who wish to confirm exposure for incident response purposes, may
contact <a href="mailto:info@beyondmemory.io">info@beyondmemory.io</a> for a good-faith confidential check.
Every document referenced in this report was retrieved by issuing the same HTTP
requests that <a href="http://jsonformatter.org" rel="nofollow">jsonformatter.org</a> itself issues to its own server when a visitor
lands on its "Recent Links" page and opens an individual paste. No
authentication was bypassed because none was present. No rate-limit or access
control was circumvented because the operator does not impose access controls on
the surface in question. The data was, and remains, publicly retrievable to any
party with a browser and patience.
Prior independent research has demonstrated that third parties harvest public
paste services on an ongoing basis. watchTowr Labs, in November 2025, deployed
canary tokens onto <a href="http://jsonformatter.org" rel="nofollow">jsonformatter.org</a> and observed automated retrieval by an
unrelated party within 48 hours of submission. Our publication does not
introduce a new threat. It documents an existing one at a scale, and across a
population of affected organizations, that the public record does not yet
reflect. The marginal risk created by this report is zero; the marginal
awareness it creates is the point of writing it.
This research is published because the audience that can act on it sits across multiple Turkish institutions: the Bilgi Teknolojileri ve İletişim Kurumu (BTK), the Bankacılık Düzenleme ve Denetleme Kurumu (BDDK), the Enerji Piyasası Düzenleme Kurumu (EPDK), the Ulusal Siber Olaylara Müdahale Merkezi (USOM), the Kişisel Verileri Koruma Kurumu (KVK Kurumu), and the CISOs of the organizations whose data appears in the corpus. Beyondmemory will share, at no cost, our methodology, our detection rules, and, where a lawful basis to do so can be agreed, indicator lists and per-document attribution evidence with any of the above bodies on written request. The public-facing version of this research is intentionally redacted; the private-cooperation version is not.
What to do about it
Six interventions, in increasing order of difficulty.
1. Stop allowing developer workstations to reach public paste services. This is one line in an egress proxy configuration. Most mature enterprise networks already have the proxy; they have simply never pointed it here.
2. Move the formatting to local tooling. Modern IDEs format JSON with no network call. Offline browser extensions exist. There is no defensible reason for a developer at a regulated organization to send a production payload to a stranger's server in 2026 to add indentation.
3. Treat the AI coding-assistant workflow as a sensitive egress channel. When an engineer copies a payload to feed an assistant, make sure the destination is the assistant, not the assistant and the formatter and whichever screenshot tool was open at the time. Policy is the cheapest intervention here; tooling is the second cheapest.
4. For regulators (the KVK Kurumu, BDDK, BTK, EPDK, and equivalents elsewhere): treat "paste to a third-party web service" as the covered processing activity it already is. The number of documents in this corpus that would satisfy a KVKK Article 12 breach-notification threshold is non-trivial; the number whose controllers know about the exposure today is near zero. The gap between those two numbers is the entire point.
5. For national CERTs (USOM and equivalents): treat public paste tooling as a routine OSINT feed. The cost is a few weeks of engineering. The yield is continuous indicator-of-compromise visibility for a constituency that does not currently report this class of exposure, because it does not know the exposure occurred.
6. For the operator: the public listing is a policy choice; the stored XSS is
not. The "Recent Links" surface is a design decision you are entitled to make,
even if we think it is the wrong one. The cross-site-scripting flaw we found is not
a decision; it is a defect. Encode paste output for its HTML context, add a
Content-Security-Policy, retire the keyword blocklist, and confirm that
<a href="http://codebeautify.org" rel="nofollow">codebeautify.org</a> and the sibling tools sharing the same viewer are fixed in the
same pass.
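Interventions 1 and 2 are small enough to sketch. The Python below is an illustration under stated assumptions, not a production proxy rule: the denylist holds only the two hostnames named in this report, the function names are hypothetical, and a real deployment would express the first check in its own proxy's configuration language.

```python
import json

# Assumed denylist for illustration: only the two hostnames named in this
# report. A real list would cover the sibling tools as well.
BLOCKED_PASTE_HOSTS = {"jsonformatter.org", "codebeautify.org"}


def is_blocked_host(hostname: str) -> bool:
    """Return True for a blocked domain or any subdomain of one."""
    host = hostname.strip().rstrip(".").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BLOCKED_PASTE_HOSTS
    )


def format_json_locally(raw: str, indent: int = 2) -> str:
    """Pretty-print JSON in-process, with no network call."""
    return json.dumps(json.loads(raw), indent=indent, ensure_ascii=False)


if __name__ == "__main__":
    print(is_blocked_host("www.jsonformatter.org"))
    print(format_json_locally('{"status":"error","code":500}'))
```

The suffix match includes the leading dot so that an unrelated domain that merely ends with the same characters is not blocked by accident. For one-off formatting at a terminal, Python's standard library also ships a command-line formatter, `python -m json.tool`, which likewise runs entirely on the local machine.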
Closing
The corpus exists. It will keep growing for exactly as long as the operator leaves its listing surface public, which by every indication it intends to do. The threat model has been demonstrated to be active by at least one party other than ourselves. The interventions that close the exposure are not novel, not expensive, and have been available for years.
For Türkiye specifically, the honest finding is not that the country is uniquely exposed. It is that the country is exposed in exactly the same way as everyone else: a citizen's ID and bank cards in one paste, a welder's certificate and national ID, a brokerage balance, and a few hundred more documents carrying citizens' national IDs. The legal framework that already governs all of it, KVKK Article 12, is not yet being applied to this channel, because nobody had been measuring the channel. We measured it.
The question for the institutions reading this is not whether the exposure exists; the corpus answers that. The question is whether the response begins now, or after the next research firm, ours or someone else's, publishes the same paragraph, with your data in it, six months from now.
Private cooperation channel: <a href="mailto:info@beyondmemory.io">info@beyondmemory.io</a> (monitored).




