• 01 December 2024 (3 messages)
  • @kiwifarms #388 04:30 AM, 01 Dec 2024
    Update
    The restore has been a nightmare, complicated by the fact that I am mid-packing for my December trip before I return to the US. The TL;DR is that the import is running and I think I've sorted out the issues that were preventing it from restoring this time.

    The Kiwi Farms' raw backup file is over 60GiB of pure text. Trying to restore this under any conditions is a slow and painful process. Our situation is complicated many times over by having to run our own hardware, which I am solely responsible for from approximately 6,000 miles away.

    At the beginning of the year, our original server died while being handled by remote hands. Somehow, the motherboard died. I've known people who've spent 20 years handling servers who have never seen a motherboard die. People refused to believe me when I said that was the actual problem.

    We bought a new server off eBay and moved all the regular SATA SSDs over to it. These are very high capacity SSDs that are raided for extra redundancy, because if I need one video of Destiny gobbling cock, I need it four times. This highly resilient setup has a trade-off: much higher read speeds, much slower write speeds.

    This server (which identifies its mainboard as the H12DSi-N6) came off eBay with two differently sized M.2 SSDs (which are fast). We did not raid these because raiding NVMe actually slows it down. Instead, the smaller became the boot drive and the larger became the MySQL drive. This worked great.

    The MySQL drive melted within the first month, becoming the latest victim of the Kiwi Farms hardware curse. There is a years-long history of drives failing on us spontaneously, and I swear to fucking god it's not my fault.

    In the emergency, I reimported the database onto the storage RAID, having decided that putting an intensive process on the boot drive was a bad idea. This worked OK for a while, but eventually I had to start putting in fixes to reduce I/O. Having a very read-optimized RAID share its workload with the write-heavy database caused problems. I also had to adjust the database to write to disk less often - the trade-off being that the less often you write, the more data you lose if there's a crash.
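
    For the curious, "write less" in MySQL terms usually means the standard InnoDB durability knobs. The sketch below is illustrative rather than a copy of the production config - the exact variables and values actually in use aren't spelled out here:

    ```python
    # Illustrative only: these are the usual knobs for trading crash
    # durability against fsync/flush traffic on a write-limited disk.
    import subprocess

    STATEMENTS = [
        # Flush the InnoDB redo log roughly once per second instead of on
        # every commit; a crash can lose up to about a second of commits.
        "SET GLOBAL innodb_flush_log_at_trx_commit = 2;",
        # Let the OS decide when the binary log hits disk instead of
        # syncing it on every commit.
        "SET GLOBAL sync_binlog = 0;",
    ]

    for stmt in STATEMENTS:
        # Assumes a local mysql client that can authenticate via ~/.my.cnf.
        subprocess.run(["mysql", "-e", stmt], check=True)
    ```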

    You can probably see where this is going. Add several thousand people trying to download a video of a goblin sucking dick to a database already struggling to handle normal traffic, and you've got the potential energy for a disaster. A table locked up (the alerts table, which has over 200 million rows), and in trying to get it to unlock, the InnoDB structure completely collapsed.

    After trying to fix it, I decided I would re-import the database. This is not an easy decision to come to. The import is very, very slow because you're writing hundreds of millions of rows into a slow RAID. At a certain point, the import crashed. I became impatient and tried the import again, this time onto the boot NVMe, thinking it would be temporary until I could replace the NVMe I should have replaced months ago. Attempt #2 didn't work: the database is now so big it does not fit on that drive. Attempt #3 was another import onto the slow RAID. That didn't work either. I fixed something and now we're on Attempt #4. Each of these attempts takes at least two hours.
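
    For a sense of what an attempt actually involves: it is basically streaming the dump back into the mysql client and waiting. Something like the sketch below (the path and schema name are placeholders, not the real ones) at least reports throughput, so a stalled or crashed attempt is obvious in minutes instead of hours:

    ```python
    # Minimal sketch: feed a large SQL dump into MySQL and report progress.
    # DUMP_PATH and DATABASE are placeholders.
    import subprocess
    import sys
    import time

    DUMP_PATH = "/backups/forum.sql"
    DATABASE = "forum"
    CHUNK = 8 * 1024 * 1024  # feed the client 8 MiB at a time

    proc = subprocess.Popen(["mysql", DATABASE], stdin=subprocess.PIPE)
    sent = 0
    start = time.time()

    with open(DUMP_PATH, "rb") as dump:
        while True:
            chunk = dump.read(CHUNK)
            if not chunk:
                break
            proc.stdin.write(chunk)
            sent += len(chunk)
            rate = sent / (time.time() - start) / 2**20
            print(f"\r{sent / 2**30:6.1f} GiB written, {rate:6.1f} MiB/s",
                  end="", file=sys.stderr)

    proc.stdin.close()
    print(file=sys.stderr)
    sys.exit(proc.wait())
    ```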

    As it turns out, the motherboard has four M.2 slots, when I thought it had only two (one spare). I misremembered because it only came with two drives. I could have fixed this with zero downtime months ago, but I never did, in part because I've never actually seen this server, and not knowing what hardware I have is a huge detriment to solving problems proactively.
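
    Not how this box is actually managed, but one cheap piece of insurance against that kind of surprise is dumping a hardware inventory of the remote machine while it is healthy and keeping the output somewhere you can re-read it. Roughly:

    ```python
    # Quick hardware inventory for a server you will never physically see.
    # Assumes lspci, nvme-cli and dmidecode are installed; run as root.
    import subprocess

    COMMANDS = [
        ["lspci"],                   # every PCIe device, incl. NVMe controllers
        ["nvme", "list"],            # populated NVMe drives, models, capacities
        ["dmidecode", "-t", "slot"], # slots the board reports, used or empty
    ]

    with open("/root/hw-inventory.txt", "w") as out:
        for cmd in COMMANDS:
            out.write(f"===== {' '.join(cmd)} =====\n")
            result = subprocess.run(cmd, capture_output=True, text=True)
            out.write(result.stdout or result.stderr)
            out.write("\n")
    ```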

    I very much look forward to having a server within driving distance at some point in the near future, but even that has its own slew of problems, because I have no idea how I'm going to move the server to the east coast without 2 continuous days of downtime, and I really don't want to spend 4 to 5 figures on a new one just to avoid that.

    Computers suck. Become an electrician or plumber. Do not become a computer person.
  • @kiwifarms #389 02:11 PM, 01 Dec 2024
    I have completely run out of time. We're waiting on 200 MILLION+ rows of xf_user_alerts importing over 8 fucking hours. All of this shit is just "Dipshit faggot retard rated your post autistic" over and over and over again for 10 years.
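
    Back of the envelope, taking those figures at face value:

    ```python
    # 200M+ rows over 8 hours is a sustained insert rate of only a few
    # thousand rows per second - consistent with a write-limited RAID.
    rows = 200_000_000
    seconds = 8 * 3600
    print(f"{rows / seconds:,.0f} rows/second")  # ~6,944 rows/second
    ```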

    There will be 2 continuous days of downtime. I will not have a chance to finish the repair today.
• 02 December 2024 (1 message)
  • @kiwifarms #391 07:11 AM, 02 Dec 2024
    test shit, i have 30 minutes
• 05 December 2024 (1 message)
  • @kiwifarms #392 09:23 AM, 05 Dec 2024
    One of our providers is conducting scheduled maintenance.
• 11 December 2024 (2 messages)
  • @kiwifarms #393 01:23 PM, 11 Dec 2024
    I am aware of an outage related to the DNS certificates. I am working on it now.
  • @kiwifarms #394 01:29 PM, 11 Dec 2024
    Renewed, refresh. CTRL+F5 if that doesn't work.