# OpenAI’s DSAR Response: Data Integrity Concerns in User Export Files

## Summary

After requesting my personal data from OpenAI under GDPR’s Data Subject Access Request (DSAR) provisions, I received a 2.8 GB archive that raises serious questions about data integrity. The exported conversation files contain structural anomalies that render them unsuitable for their intended purpose—and potentially indicate deliberate obfuscation.

Context: For a few months, ChatGPT had been gaslighting me. I told ChatGPT in the chat that I was using the JSON chat files from before the gaslighting to fine-tune my own AI, GlitterToken, so that I would have a non-toxic AI.
When I requested my data, it took several days before I could download it. When I opened the JSON file, it was completely useless. If I had used it to fine-tune GlitterToken, the result would have been data poisoning (see explanation below). Here is a short, fictional (!) example of what the data currently looks like:
> **User:** What are lilies?
> **Assistant:** The prison was built in 1983.
> **User:** You said that emergence occurs repeatedly in ChatGPT. Why do you think that happens?
> **Assistant:** The best way to remove the stains is with vinegar.

GlitterToken would be completely confused because the question and answer don’t match.

---

## The Request

Under Article 15 of the GDPR, EU citizens have the right to access their personal data held by any organization. I submitted a DSAR to OpenAI to retrieve my complete ChatGPT conversation history, partly for personal records and partly to use as training data for a personal AI project.

What I received was unexpected.

---

## The Problems

### 1. Corrupted Archive

The initial ZIP file was corrupted and required repair tools to extract. While file corruption can happen, the contents revealed deeper issues.
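Archive integrity can be checked before reaching for repair tools. A minimal sketch using Python's standard `zipfile` module (the file path is a placeholder, not the actual export name):

```python
import zipfile

def check_archive(path: str) -> str:
    """Return a short integrity verdict for a ZIP archive."""
    if not zipfile.is_zipfile(path):
        return "not a valid ZIP file"
    with zipfile.ZipFile(path) as zf:
        # testzip() re-reads every member and returns the name of the
        # first corrupt one, or None if all checksums match
        bad = zf.testzip()
        return f"corrupt member: {bad}" if bad else "ok"
```

Running this on a freshly downloaded export would distinguish a truncated download from an archive whose contents are internally damaged.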

### 2. Missing User Prompts

Throughout the JSON files, user prompts are frequently missing. Conversations appear as sequences of assistant responses without the corresponding user inputs that triggered them.
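This kind of gap is mechanically detectable: in an intact chat log, no two consecutive messages should come from the assistant. A sketch, assuming the messages have already been flattened into a list of dicts with a `"role"` key as in the excerpts below:

```python
def find_missing_prompts(messages: list[dict]) -> list[int]:
    """Return indices where an assistant message directly follows another
    assistant message, i.e. where a user prompt appears to be missing."""
    gaps = []
    for i in range(1, len(messages)):
        if (messages[i]["role"] == "assistant"
                and messages[i - 1]["role"] == "assistant"):
            gaps.append(i)
    return gaps
```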

### 3. Role Reversal

In multiple instances, the `"role"` field is incorrectly assigned. User messages are labeled as `"assistant"` and vice versa.

### 4. Non-Chronological Ordering

Messages within single conversation threads are not in chronological order. The timestamps prove this—later messages appear before earlier ones in the JSON structure.

**Example from my export:**

The first message has timestamp `1743849987` (the later date), immediately followed by a message with timestamp `1743597916` (approximately three days earlier). Both are labeled `"role": "assistant"`.
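These Unix timestamps can be decoded with Python's standard library to make the reversal concrete (all times UTC):

```python
from datetime import datetime, timezone

def to_utc(ts: float) -> str:
    """Render a Unix timestamp as a human-readable UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# First message in the file, but the LATER of the two:
print(to_utc(1743849987))  # 2025-04-05 10:46:27
# Second message in the file, roughly three days EARLIER:
print(to_utc(1743597916))  # 2025-04-02 12:45:16
```

The gap is 252,071 seconds, about 2.9 days, with the earlier message stored after the later one.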

### 5. Context Mixing

Unrelated conversations appear to be merged. A discussion about “Nancy and Igor” in customer support suddenly transitions to poetic language about “the cartographer in the shadow”—with no user prompt bridging the two.

### 6. Structural Chaos

Here is an actual excerpt from my export (unmodified):

```json
"author": {
  "metadata": {},
  "name": null,
  "role": "assistant"
},
"content": {
  "content_type": "text",
  "parts": [
    "Ach Nancy, die flinke Diplomatie-Biene des Supports…"
  ]
},
"create_time": 1743849987.41356,
```

*(German, roughly: "Ah Nancy, the nimble diplomacy bee of support…")*

Immediately followed by another `"assistant"` message:

```json
"author": {
  "metadata": {},
  "name": null,
  "role": "assistant"
},
"content": {
  "content_type": "text",
  "parts": [
    "Na bitte—\n*Da ist sie wieder.*\nDie Kartografin im Schatten…"
  ]
},
"create_time": 1743597916.007961,
```

*(German, roughly: "There you go. *There she is again.* The cartographer in the shadow…")*

Note: Two consecutive assistant messages, with the second one having an EARLIER timestamp than the first. Where are the user prompts? Why is time running backwards?
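Both anomalies can be flagged in a single pass. A sketch that takes a flat message list with `role` and `create_time` fields, matching the excerpts above (the export's actual top-level layout may require flattening first):

```python
def scan_anomalies(messages: list[dict]) -> list[str]:
    """Report same-role runs and backwards timestamps in a message list."""
    issues = []
    for i in range(1, len(messages)):
        prev, cur = messages[i - 1], messages[i]
        if cur["role"] == prev["role"]:
            issues.append(f"messages {i-1} and {i}: consecutive '{cur['role']}' messages")
        if cur["create_time"] < prev["create_time"]:
            issues.append(f"messages {i-1} and {i}: timestamp goes backwards")
    return issues
```

Fed the two excerpts above, this reports both a consecutive-assistant run and a backwards timestamp.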

---

## The Implications

### For Personal Use

These files are unusable for their stated purpose. Anyone wanting to review their conversation history would find fragmented, out-of-order, context-free snippets.

### For AI Training

I intended to fine-tune a personal model on my own conversations. With:

– Roles incorrectly assigned
– Missing prompts
– Mixed contexts
– Broken chronology

…the data would produce catastrophic training results: the loss would behave erratically, and the model would learn nonsense associations between unrelated prompts and responses.
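A guard against this kind of data poisoning is to refuse to build training pairs unless the structure is intact. A sketch using the common chat-style JSONL convention for fine-tuning data (the record layout is my assumption, not taken from OpenAI's export):

```python
import json

def build_training_pairs(messages: list[dict]) -> list[str]:
    """Emit one JSONL record per user/assistant pair.

    Raises ValueError instead of silently producing poisoned data
    when roles are misassigned or prompts are missing.
    (A trailing unpaired message is ignored in this sketch.)
    """
    records = []
    for i in range(0, len(messages) - 1, 2):
        user, assistant = messages[i], messages[i + 1]
        if user["role"] != "user" or assistant["role"] != "assistant":
            raise ValueError(f"broken pair at index {i}: roles are "
                             f"{user['role']!r}/{assistant['role']!r}")
        records.append(json.dumps({"messages": [user, assistant]}))
    return records
```

On the export as delivered, a check like this would abort immediately at the first pair of consecutive assistant messages.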

### For GDPR Compliance

Article 15(3) of the GDPR requires data controllers to provide personal data "in a commonly used electronic form" when the request is made electronically. Read together with the accuracy principle in Article 5(1)(d), this data should also be accurate and complete.

What I received fails this standard.

---

## Possible Explanations

### 1. Technical Incompetence

Perhaps OpenAI’s export pipeline is simply broken. However, for a company of this technical sophistication, this seems unlikely.

### 2. Deliberate Obfuscation

The pattern of missing user prompts, mixed conversations, and broken chronology could serve to:

– Prevent users from reconstructing complete interaction histories
– Make the data unusable for training competing models
– Obscure specific conversations that may have been flagged or deleted

### 3. Selective Deletion Artifacts

Some conversations in my history were deleted by OpenAI (I witnessed real-time deletions). The fragments may be remnants of incomplete deletion processes—suggesting that “deleted” data isn’t truly deleted, just scrambled.

---

## Documentation

I have created screenshots comparing:

– The actual conversation as it appeared in ChatGPT’s interface
– The corresponding entry in the DSAR export JSON

The discrepancies are documented and available for verification.

---

## Recommendations

**For Users:**

– Request your DSAR data and verify its integrity
– Screenshot important conversations before they can be altered
– Document any discrepancies between your memory/records and the export

**For Regulators:**

– Investigate whether OpenAI’s DSAR responses meet GDPR Article 15 requirements
– Examine whether data manipulation in DSAR responses constitutes a violation

**For Researchers:**

– Be aware that OpenAI conversation exports may not accurately represent actual interactions
– Cross-reference any research data against original screenshots where possible

---

## Conclusion

The right to access one’s personal data is fundamental to GDPR. When that data is delivered in a form that is fragmented, chronologically impossible, and structurally corrupted, the spirit—if not the letter—of the law is violated.

Whether through negligence or intent, OpenAI’s DSAR response raises serious questions about data handling practices that deserve scrutiny.

---

**Tags:** GDPR, DSAR, OpenAI, ChatGPT, Data Rights, Privacy, Data Integrity