Using the Cloud

Please note: this page is in progress

Unlike my personal website, where I publish pages that really are in progress (with TODOs floating around, fragmentary thoughts, and a general lack of polish), any in-progress page on this ministry website is in progress only insofar as I have not finished writing all the content that I expect the page to hold eventually. That is to say, everything already published on the page is complete, edited, and checked for accuracy, but more writing is still planned.

I'm an outliner when I write, so how this plays out in practice is that I will fill in the outline skeleton (as displayed in the table of contents) with content over time, until the whole page is eventually complete.

Motivation

Some people want to use cloud storage options for collaboration; some people want to use cloud storage options for storing images of VMs; and some people want a simple way to make sure that they have family pictures and videos backed up for future access. All of these are equally valid and sensible uses of the cloud. But they are definitely not identical usages. We shouldn’t expect one service to accomplish all things inherently (even though it would be possible in theory).

Collaboration

Since I only work with text-based data types and live in my editors (NeoVim, Emacs for Org mode, IntelliJ/PyCharm/etc. for code), Floobits looks like a good collaboration choice. It is far superior to hackish support via Dropbox concurrent editing, and far less crippled than Google Docs or Office Online. (These are not necessarily crippled for most people, but they are for the Org mode files and code files I work with or plan to work with.)

Sync

I have a primary work tablet with a 128 GB SSD. Because of installed software (some of which includes HD images of Greek manuscripts that take up a lot of space), I only have about 25 GB of usable space from the get-go.

After bumping into the idea of a cloud file system that only downloads files locally during use (e.g., ODrive, Dropbox’s smart sync, Google’s file stream), I was sold instantly. Why should I buy a whole bunch of physical storage if I can get by with keeping most of my files in the cloud? I was going to keep a set of my files in the cloud anyway for backup purposes and irregular access (from a computer I don’t own and what have you), so why not just forgo expensive internal SSDs to begin with?

This is different from mounting remote filesystems through SFTP with a persistent connection (cf. SSHFS). You only need an internet connection when downloading copies locally and when syncing these local copies back up to the cloud. Unstable connections do not present as much of a problem for the local + sync model as they do for the remote file system model.

General cloud storage

  • https://www.cloudwards.net/comparison/
  • block-level sync is mandatory for performance reasons. If you change one byte in a 4 GB file, why should you need to resync the whole file? That’s dumb.
  • “smart sync” (sync only what you need, dynamically) is important, in my opinion. If it is not automatic, it absolutely must be scriptable through some sort of API.
  • link sharing (expiration, passwords), app integrations, Zapier/IFTTT support, (zero-knowledge) encryption, etc. are all also factors.
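
To make the block-level sync point concrete, here is a minimal sketch of the idea: split a file into fixed-size blocks, hash each block, and only re-upload the blocks whose hashes changed. The function names and the 4 MiB block size are assumptions for illustration, not how any particular provider does it. Fixed-size blocks also only catch in-place edits; rsync, for example, uses a rolling checksum so that insertions do not shift every later block.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical block size (4 MiB)


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Indices of blocks in `new` that differ from `old` or are new."""
    return [
        i for i, digest in enumerate(new)
        if i >= len(old) or old[i] != digest
    ]


# Tiny demonstration with 4-byte blocks: only the middle block changed.
old = block_hashes(b"aaaabbbbcccc", block_size=4)
new = block_hashes(b"aaaabXbbcccc", block_size=4)
print(changed_blocks(old, new))  # [1]
```

With a 4 GB file and 4 MiB blocks, a one-byte change would mean resending one block of about 4 MiB rather than the whole file.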

P2P is maybe a good idea? Over LAN?

Deduplication

Cheap storage options

Inotify + rsync idea
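
A rough sketch of what this could look like. Note the swap: instead of real inotify (which needs a Linux-specific binding), this polls modification times with the standard library, which is cruder but portable. The function names, the polling interval, and the example paths are all assumptions; the rsync flags used (-a, -z, --delete) are standard ones.

```python
import os
import subprocess
import time


def snapshot(root: str) -> dict[str, int]:
    """Map each file under `root` (as a relative path) to its mtime in ns."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[os.path.relpath(path, root)] = os.stat(path).st_mtime_ns
            except OSError:
                pass  # file vanished between listing and stat; skip it
    return state


def rsync_command(src: str, dest: str) -> list[str]:
    """Build a one-way mirror command: archive mode, compression, deletions."""
    # The trailing slash makes rsync copy the contents of src, not src itself.
    return ["rsync", "-az", "--delete", src.rstrip("/") + "/", dest]


def watch_and_sync(src: str, dest: str, interval: float = 5.0) -> None:
    """Poll `src` forever and run rsync whenever the snapshot changes."""
    previous = snapshot(src)
    while True:
        time.sleep(interval)
        current = snapshot(src)
        if current != previous:
            subprocess.run(rsync_command(src, dest), check=False)
            previous = current


# Hypothetical usage (runs until interrupted):
# watch_and_sync("/home/me/docs", "user@host.example:docs/")
```

Real inotify would react immediately instead of on a timer, but the overall shape (detect a change, then let rsync work out what to transfer) stays the same.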

ZFS replication as an ideal solution? Fascinating stuff.

Here are some more links in this area from another round of research. Most of this is still a bit above me:

ZFS vs Btrfs performance

ZFS on Linux performance (vs. something like FreeBSD)

Rsync.net

  • https://www.rsync.net/index.html
  • Apparently you get root access inside your slice of the pie. But can you run a shell if you SSH in? Install stuff (Python, CLIs for stuff, etc.) like a normal VM?
  • Would allow for ZFS replication workflow
  • Support from real sysadmins, not phone-answering people
  • Need to email them for more info.
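
For reference, the ZFS replication workflow mentioned above usually boils down to three commands: take a snapshot, send the difference since the previous snapshot, and receive it on the other side over SSH. The sketch below only builds those commands. The dataset names, snapshot names, and host are hypothetical, and whether a given provider accepts zfs receive over SSH is exactly the thing that still needs confirming.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, remote, remote_dataset):
    """Build the commands for one incremental ZFS replication step."""
    snapshot_cmd = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # -i sends only the changes between the previous and the new snapshot.
    send_cmd = ["zfs", "send", "-i",
                f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    recv_cmd = ["ssh", remote, "zfs", "receive", remote_dataset]
    return snapshot_cmd, send_cmd, recv_cmd


def as_shell_pipeline(send_cmd, recv_cmd) -> str:
    """Render send | receive as a single shell pipeline string."""
    return f"{shlex.join(send_cmd)} | {shlex.join(recv_cmd)}"


# Hypothetical names throughout.
snap, send, recv = replication_commands(
    "tank/docs", "monday", "tuesday", "user@backup.example", "backup/docs"
)
print(shlex.join(snap))
print(as_shell_pipeline(send, recv))
```

The appeal is that the send stream only contains blocks that changed between the two snapshots, so it behaves like block-level sync at the filesystem layer.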

Legit Linux servers for sure

  • https://www.linode.com/
  • Expensive for just cloud storage. But you could definitely host a website/webapp/etc. on this. Again, need to figure out exactly what you can do on Rsync.net (to see if it is a full-blown VM that you can use over SSH or something more crippled).

SSHFS

  • might be too slow? Maybe not so bad with a stream cipher? Is RC4 secure, or is there something better? http://www.admin-magazine.com/HPC/Articles/Sharing-Data-with-SSHFS
  • Could always keep the cloud mounted and then use ZFS to replicate the files you know you are going to use over to local storage? Or maybe mirror the directory structure and do a one-way copy with rsync?

Compression and encryption

  • Save bandwidth. Tradeoff with CPU cycles/processing time? Faster to just transfer the files straight?
  • Does zero-knowledge encryption slow things down? Can you do block-level sync with encrypted archives? Would VeraCrypt work?
  • Assuming a secure datacenter, does encryption even matter?
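
The bandwidth versus CPU question can at least be framed as arithmetic: compression wins when (time to compress) + (compressed size / bandwidth) is less than (raw size / bandwidth). Here is a small sketch that measures this with zlib. The function names and the example bandwidth figures are assumptions, and decompression time on the receiving end is ignored.

```python
import os
import time
import zlib


def transfer_seconds(size_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Idealized transfer time: size divided by bandwidth."""
    return size_bytes / bandwidth_bytes_per_s


def compression_pays_off(data: bytes, bandwidth_bytes_per_s: float,
                         level: int = 6):
    """Return (pays_off, raw_seconds, compressed_seconds) for `data`."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_time = time.perf_counter() - start

    raw = transfer_seconds(len(data), bandwidth_bytes_per_s)
    packed = compress_time + transfer_seconds(len(compressed),
                                              bandwidth_bytes_per_s)
    return packed < raw, raw, packed


# Very repetitive data over a slow link (100 kB/s): compression should win.
print(compression_pays_off(b"a" * 1_000_000, 100_000)[0])

# Random data barely compresses, so on a very fast link it should lose.
print(compression_pays_off(os.urandom(100_000), 1e12)[0])
```

Already-compressed data (JPEGs, video, encrypted archives) behaves like the random case, which suggests it is probably better to transfer those files straight.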

Multiple clouds

  • Is geographic redundancy necessary or just statistically a waste of money (and electricity etc. on the environmental side from duplicate servers)?