Questions tagged [data-dump]

This tag is about the quarterly Creative Commons data dumps of all public data from the Stack Exchange network's Q&A sites.

10 votes
0 answers

What is a pre vote?

As seen in New Vote Types in latest data dump?, some new vote types appeared (likely) exclusively in the Stack Overflow data dump. A couple users helped find out what each one meant. However, what ...
12 votes
1 answer

Checksums for data dumps should be included in the dump announcement and on the dump download page itself

Following on from my previous request for current checksums, I'm looking at 2 use cases for a data dump: verifying if a current download for a data dump is correct; verifying if a specific historical ...
16 votes
2 answers

Data Dumps - updates and bug fixes

Thanks to everyone who posted bug reports and feature requests related to the updated data dumps process. Below, we’ve detailed some work on those reports and requests. Issues reported on this post: ...
9 votes
1 answer

Am I allowed to publicly reshare some JSON file containing SE data created after the introduction of the new data dump process?

I ran across some JSON (magnet link) containing SE data that was created after the introduction of the new data dump process. Am I allowed to publicly reshare it (e.g., on, or ...
21 votes
3 answers

Latest Data Dump has invalid XML and invalid characters

As I have been looking through the latest StackExchange data dump, it seems like a non-compliant XML serializer was used. There are numerous escape sequences that are simply invalid XML such as &#...
17 votes
1 answer

New Vote Types in latest data dump?

As I was looking through the latest StackExchange data dump, I noticed that in the StackOverflow dump there are a few new VoteTypeIds: 19, 29, 30, 31, 32, 33. There is no mention of them in SEDE: https://...
6 votes
0 answers

Individual data dump filenames should contain the dates they were generated

Historically, the data dumps hosted by clearbits or the internet archive were a single, monolithic 'set', and having a single datestamp for the collection made sense. With the current system you're ...
10 votes
1 answer

How can Stack Exchange prove a violation to the Data Dump download agreement?

In the last few weeks there has been a lot of talk about the new Data Dump process. From people asking if the dumps are watermarked to people pointing out that the user agreement for the download ...
16 votes
1 answer

My consent to the data-dump is not stored server side

When you ask me to consent to certain conditions like I understand that this file is being provided to me for my own use and for projects that do not include training a large language model (LLM), ...
13 votes
1 answer

Are the new data dumps watermarked?

Are the new data dumps watermarked? I.e., can someone identify which user downloaded them?
32 votes
3 answers

Creative Commons License (BY-SA) Violation: Data dump must not force users to agree to additional terms

The new mechanism for downloading the Data Dump appears to be live. It still has a checkbox that requires acknowledgement of "additional terms" in addition to those included in the ...
12 votes
0 answers

One of my site data dumps appears to be giving a false positive for having a virus - what should I do?

I'm a moderator at Super User, and downloaded the dumps for that site. The 'main' dump's fine. The dump for meta failed because Windows detected it as having a virus. Since I got it from SE (and other ...
21 votes
2 answers

Could checksums be provided for the July 2024+/'new' SE hosted data dumps?

One of the nice things with the previous 'torrent' based downloads of the dump, and some of the internet archive front ends was that there were built in ways to verify that a download was correct. ...
16 votes
1 answer

The per-site data dump downloads are missing metadata files

When the data dumps were hosted on the Internet Archive, it was a torrent that consisted of the torrent file, a number of 7z files for each site, some images for branding, and two text files - readme....
11 votes
2 answers

How to get SE sstatic images for my website

I've written about 2,500 questions and answers in Stack Exchange sites. About half of them I scrape into my website on GitHub Pages for customized presentation and searching. A short time ago all the ...
