Questions tagged [data-dump]

This tag is about the quarterly Creative Commons data dumps of all public data from the Stack Exchange network's Q&A sites.

10 votes
0 answers
171 views

What is a pre vote?

As seen in New Vote Types in latest data dump?, some new vote types appeared (likely) exclusively in the Stack Overflow data dump. A couple users helped find out what each one meant. However, what ...
Тyma Gaidash's user avatar
12 votes
1 answer
169 views

Checksums for data dumps should be included in the dump announcement and on the dump download page itself

Following on from my previous request for current checksums, I'm looking at 2 use cases for a data dump: (1) verifying if a current download for a data dump is correct; (2) verifying if a specific historical ...
Journeyman Geek's user avatar
16 votes
2 answers
1k views

Data Dumps - updates and bug fixes

Thanks to everyone who posted bug reports and feature requests related to the updated data dumps process. Below, we’ve detailed some work on those reports and requests. Issues reported on this post: ...
Berthold's user avatar
  • 1,877
9 votes
1 answer
239 views

Am I allowed to publicly reshare some JSON file containing SE data created after the introduction of the new data dump process?

I ran across some JSON (magnet link) containing SE data that was created after the introduction of the new data dump process. Am I allowed to publicly reshare it (e.g., on https://archive.org), or ...
Franck Dernoncourt's user avatar
21 votes
3 answers
482 views

Latest Data Dump has invalid XML and invalid characters

As I have been looking through the latest StackExchange data dump, it seems like a non-compliant XML serializer was used. There are numerous escape sequences that are simply invalid XML such as &#...
Maxwell175's user avatar
17 votes
1 answer
348 views

New Vote Types in latest data dump?

As I was looking through the latest StackExchange data dump, I noticed that in the StackOverflow dump there are a few new VoteTypeIds: 19,29,30,31,32,33 There is no mention of them in SEDE: https://...
Maxwell175's user avatar
6 votes
0 answers
130 views

Individual data dump filenames should contain the dates they were generated

Historically, the data dumps hosted by ClearBits or the Internet Archive were a single, monolithic 'set', and having a single datestamp for the collection made sense. With the current system you're ...
Journeyman Geek's user avatar
10 votes
1 answer
375 views

How can Stack Exchange prove a violation to the Data Dump download agreement?

In the last few weeks there has been a lot of talk about the new Data Dump process. From people asking if the dumps are watermarked to people pointing out that the user agreement for the download ...
SPArcheon - on strike's user avatar
16 votes
1 answer
446 views

My consent to the data-dump is not stored server side

When you ask me to consent to certain conditions like I understand that this file is being provided to me for my own use and for projects that do not include training a large language model (LLM), ...
rene's user avatar
  • 91.5k
13 votes
1 answer
262 views

Are the new data dumps watermarked?

Are the new data dumps watermarked? I.e., can someone identify which user downloaded them?
Franck Dernoncourt's user avatar
32 votes
3 answers
381 views

Creative Commons License (BY-SA) Violation: Data dump must not force users to agree to additional terms

The new mechanism for downloading the Data Dump appears to be live. It still has a checkbox that requires acknowledgement of "additional terms" in addition to those included in the ...
AMtwo's user avatar
  • 13.9k
12 votes
0 answers
319 views

One of my site data dumps appears to be giving a false positive for having a virus - what should I do?

I'm a moderator at Super User, and downloaded the dumps for that site. The 'main' dump's fine. The dump for meta failed because Windows detected it as having a virus. Since I got it from SE (and other ...
Journeyman Geek's user avatar
21 votes
2 answers
411 views

Could checksums be provided for the July 2024+/'new' SE hosted data dumps?

One of the nice things with the previous 'torrent' based downloads of the dump, and some of the Internet Archive front ends, was that there were built-in ways to verify that a download was correct. ...
Journeyman Geek's user avatar
16 votes
1 answer
251 views

The per-site data dump downloads are missing metadata files

When the data dumps were hosted on the Internet Archive, it was a torrent that consisted of the torrent file, a number of 7z files for each site, some images for branding, and two text files - readme....
Thomas Owens's user avatar
  • 54.1k
11 votes
2 answers
268 views

How to get SE sstatic images for my website

I've written about 2,500 questions and answers in Stack Exchange sites. About half of them I scrape into my website on GitHub Pages for customized presentation and searching. A short time ago all the ...
WinEunuuchs2Unix's user avatar
