Whistleblower drops 'largest ever' ICE leak to unmask agents: 'The last straw'

ByteOnBikes@discuss.online · 3 days ago

partofthevoice@lemmy.zip · 3 days ago

I have been trying to access the site for over an hour. I don’t know if it’s authentic traffic or a DDoS, but only the front page loads. The wiki subdomain (where the list is) returns a 503 server error every time.

thermal_shock@lemmy.world · 2 days ago

Worked on my VPN to Mexico City. Slow but loading. Trying to export all the data.

partofthevoice@lemmy.zip · 2 days ago

If you get any of it, would you mind sharing what you can?

We might be able to patch something together given multiple efforts. No telling how long the attack will last, but we can hold steady with slow progress.

thermal_shock@lemmy.world · 2 days ago

it’s like half up. here is some. the special:export is kinda working, but i’m very new at ripping wiki sites

https://archive.is/ipv5h

partofthevoice@lemmy.zip · 2 days ago

Any chance you know Python, or maybe you can have ChatGPT write a script that uses bs4 to parse through each of the name hyperlinks in the html? A simple loop should be able to visit each page, cache the results on disk, and finish up eventually.

The link doesn’t work where I am, still.

thermal_shock@lemmy.world · edit-2 1 day ago

it’s very slow for me. i’m looking into python and all the other stuff. if you have one specifically that will scrape the whole site, i’m down to run it. any OS except mac i can use. i can’t copy/paste each agent page, looking to scrape the whole site at once

edit!

i think i have it scraping it all to html files. it’s taking time, just trying to make sure it gets the agents and incidents at least.

partofthevoice@lemmy.zip · 1 day ago

I see the edit, sweet!

I wouldn’t be able to write the script myself because I can’t load the site at all. So I can’t analyze the html and determine how to loop through it and extract all the additional links for scraping. Unless you want to send me the html. Then I can work with that, but without testing on the live site.

If you have something working though, great! Let’s just start there.

Check by opening one of the html files with your browser, or VSCode or something, and see if it has the data you were expecting.

thermal_shock@lemmy.world · 1 day ago

the files look correct, just html versions of the pages, just not sure if it’s going to crawl and get every page. i don’t even care if it’s just a folder of html files at this time, as long as all the agents are retrieved. will keep testing.

partofthevoice@lemmy.zip · 1 day ago

I just set up a little bot in Jupyter Notebook to get them as well, and it’s working! Very slow, but working. I just fetched all the links to the agent detail pages. There are 1532 total. Once I fetch all the detail pages, I’ll move on to the incidents too.

Definitely keep pulling yourself, though. Who knows if I will be able to get them all?

Anything else worth grabbing while I can?

CaptainPedantic@lemmy.world · 3 days ago

I’ve been having issues for a while too.

primalmotion@lemmy.ml · 2 days ago

It had a burst and gave some actual data eventually, but I think it’s just a victim of its own success. People really want that list

muusemuuse@sh.itjust.works · edit-2 2 days ago

I can poke in a page load sometimes but it’s not going great. We need to coordinate a scrape of the site and torrent it out before it’s shut down.

Dude’s doing good work but they will absolutely kill him. Grab that data.

0ops@piefed.zip · 2 days ago

Hopefully somebody who has it gets a torrent up and seeding
