Using the Cloud

Please Note: This Page Is In Progress

This means, among other things, that:

  • Some of the content is not fleshed out, so you should not read more into things than exactly what is there.
  • Some sections might have things marked as “TODOs” (e.g., questions or things that must be done). These TODOs should not be taken to be representative of truth in any respect, and indicate areas that need more research and thought. If you have particular knowledge about any of these, you can help! (Please see: contribution guidelines).
  • There probably will not be any section that pulls everything together in an easily understandable way.

This does not mean that:

  • I am not firmly convinced of the veracity of all the content currently published. If I am not sure of something, I don’t push it to the website. (This doesn’t mean that I won’t ever change my positions if I come to learn that I am in error, but that I strive, as much as possible, to only push content to the website if I am absolutely certain that it is true).
  • This page cannot be helpful to you in its present form. If you are aware of the limitations of the current state, you may find this page helpful long before I officially publish it.

Motivation

Some people want to use cloud storage options for collaboration; some people want to use cloud storage options for storing images of VMs; and some people want a simple way to make sure that they have family pictures and videos backed up for future access. All of these are equally valid and sensible uses of the cloud. But they are definitely not identical usages. We shouldn’t expect one service to accomplish all things inherently (even though it would be possible in theory).

Collaboration

Since I only work with text-based data types and live in my editors (NeoVim, Emacs for Org mode, IntelliJ/PyCharm/etc. for code), Floobits looks like a good collaboration choice. It is far superior to hackish concurrent editing via Dropbox, and far less crippled than Google Docs or Office Online. (These are not necessarily crippled for most people, but they are for the Org mode and code files I work with or plan to work with.)

Sync

I have a primary work tablet with a 128 GB SSD. Due to installed software (some of which includes HD images of Greek manuscripts that take up a lot of space), I only have about 25 GB of usable space from the get-go.

After bumping into the idea of a cloud file system that only downloads files locally when they are used (e.g., ODrive, Dropbox’s Smart Sync, Google Drive File Stream), I was sold instantly. Why should I buy a whole bunch of physical storage if I can get by with keeping most of my files in the cloud? I was going to keep a set of my files in the cloud anyway for backup purposes and irregular access (from a computer I don’t own and what have you), so why not just forgo expensive internal SSDs to begin with?

This is different from mounting remote filesystems through SFTP with a persistent connection (cf. SSHFS). You only need an internet connection when downloading copies locally and when syncing these local copies back up to the cloud. Unstable connections do not present as much of a problem for the local + sync model as they do for the remote file system model.

General cloud storage

  • https://www.cloudwards.net/comparison/
  • block-level sync is mandatory for performance reasons. If you change one byte in a 4 GB file, why should you need to resync the whole file? That’s dumb.
  • “smart sync” (sync only what you need, dynamically) is important, in my opinion. If not automatic, absolutely must be scriptable with some sort of API.
  • link sharing (expiration, passwords), app integrations, Zapier/IFTTT support, (zero-knowledge) encryption, etc. are all also factors.
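A minimal sketch of the idea behind block-level sync, assuming fixed-size blocks compared by SHA-256 hash. The 4 KiB block size and the function names are hypothetical; real clients use their own schemes. Flipping one byte in a 64 KiB buffer marks exactly one block as changed, so only that block would need to be uploaded.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size; real services choose their own


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `new` that differ from `old` or did not exist before.

    A shrinking file is not reported here; a client would also send the new length.
    """
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


if __name__ == "__main__":
    old = bytes(16 * BLOCK_SIZE)       # 64 KiB of zero bytes
    new = bytearray(old)
    new[5 * BLOCK_SIZE + 10] = 0xFF    # change a single byte inside block 5
    print(changed_blocks(old, bytes(new)))  # [5]
```

One design caveat: with fixed blocks, inserting a byte near the start shifts every later block, so everything after it looks "changed". rsync avoids this with a rolling checksum that can find matching blocks at any offset.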

P2P is maybe a good idea? Over LAN?

Deduplication

Cheap storage options

Inotify + rsync idea
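A minimal sketch of the inotify + rsync idea. Python's standard library has no inotify binding, so this swaps in polling (comparing modification-time snapshots every few seconds) as a stand-in for inotify events; on Linux, `inotifywait` from inotify-tools or a third-party binding would replace the polling loop. The function names and the example destination are hypothetical.

```python
import os
import subprocess
import time


def snapshot(root: str) -> dict[str, float]:
    """Map every file path under root to its modification time."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except FileNotFoundError:
                pass  # file vanished between listing and stat
    return state


def rsync_command(src: str, dest: str) -> list[str]:
    """Build a one-way mirror command.

    Trailing slash on src: copy the *contents* of src into dest.
    -a: archive mode, -z: compress in transit, --delete: mirror deletions.
    """
    return ["rsync", "-az", "--delete", src.rstrip("/") + "/", dest]


def watch_and_sync(src: str, dest: str, interval: float = 2.0) -> None:
    """Poll src (stand-in for inotify) and rsync whenever files are added, removed, or modified."""
    previous = snapshot(src)
    while True:
        time.sleep(interval)
        current = snapshot(src)
        if current != previous:
            subprocess.run(rsync_command(src, dest), check=True)
            previous = current
```

Usage would look something like `watch_and_sync(os.path.expanduser("~/notes"), "user@backup.example.com:notes")` (hypothetical host). rsync then only transfers the changed parts of changed files, which is the block-level behavior wanted above.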

ZFS replication as an ideal solution? Fascinating stuff.
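A minimal sketch of the ZFS replication workflow: take a snapshot, then stream only the changes since the previous snapshot with `zfs send -i` piped over SSH into `zfs receive`. It assumes the remote side lets you run `zfs receive` over SSH (whether a given Rsync.net account allows this is one of the open questions below). The dataset names, host, and function names are hypothetical.

```python
import shlex
import subprocess


def incremental_send_commands(dataset: str, prev_snap: str, new_snap: str,
                              remote: str, remote_dataset: str):
    """Build argv lists for `zfs send -i ... | ssh remote zfs receive ...`."""
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # ssh joins the remaining arguments into one command line for the remote shell.
    recv = ["ssh", remote, "zfs", "receive", remote_dataset]
    return send, recv


def as_shell(send: list[str], recv: list[str]) -> str:
    """Render the pipeline as a shell string, e.g. for logging or a cron job."""
    return shlex.join(send) + " | " + shlex.join(recv)


def replicate(dataset: str, prev_snap: str, new_snap: str,
              remote: str, remote_dataset: str) -> None:
    """Take a new snapshot, then send only the blocks changed since prev_snap."""
    subprocess.run(["zfs", "snapshot", f"{dataset}@{new_snap}"], check=True)
    send, recv = incremental_send_commands(dataset, prev_snap, new_snap,
                                           remote, remote_dataset)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv, stdin=sender.stdout)
    sender.stdout.close()  # lets sender get SIGPIPE if receiver exits early
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if recv_rc or send_rc:
        raise RuntimeError("zfs send/receive pipeline failed")
```

Because ZFS already knows which blocks changed between two snapshots, there is no need to scan or checksum files the way rsync does.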

Here are some more links in this area from another round of research. Most of this is still a bit above me:

ZFS vs btrfs performance

ZFS on Linux performance (vs. something like FreeBSD)

Rsync.net

  • https://www.rsync.net/index.html
  • Apparently you get root access inside your slice of the pie. But can you run a shell if you SSH in? Install stuff (Python, CLIs for stuff, etc.) like a normal VM?
  • Would allow for ZFS replication workflow
  • Support from real sysadmins, not phone-answering people
  • Need to email them for more info.

Legit Linux servers for sure

  • https://www.linode.com/
  • Expensive for just cloud storage. But you could definitely host a website/webapp/etc. on this. Again, need to figure out exactly what you can do on Rsync.net (to see if it is a full-blown VM that you can use over SSH or something more crippled).

SSHFS

  • might be too slow? Not so bad if you use a stream cipher? Is RC4 secure, or is there something better? http://www.admin-magazine.com/HPC/Articles/Sharing-Data-with-SSHFS
  • Could always keep the cloud mounted and then ZFS-replicate the files you know you are going to use to the local machine? Or mirror the directory structure and do a one-way copy with rsync?
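A minimal sketch of the "mirror directory structure, then one-way copy" idea, written in plain Python instead of rsync. It assumes the cloud is already mounted (for example via SSHFS) at some path; the function names and the example folders are hypothetical. `mirror_tree` recreates only the folders locally, and `pull_files` copies just the files you ask for, skipping ones whose local copy is already at least as new.

```python
import os
import shutil
import tempfile


def mirror_tree(src_root: str, dst_root: str) -> None:
    """Recreate src_root's directory structure under dst_root without copying files."""
    for dirpath, _dirnames, _filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        os.makedirs(os.path.join(dst_root, rel), exist_ok=True)


def pull_files(src_root: str, dst_root: str, rel_paths: list[str]) -> None:
    """One-way copy of chosen files (paths relative to src_root) into dst_root."""
    for rel in rel_paths:
        src = os.path.join(src_root, rel)
        dst = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        # Skip if the local copy is already at least as new as the remote one.
        if os.path.exists(dst) and os.stat(dst).st_mtime >= os.stat(src).st_mtime:
            continue
        shutil.copy2(src, dst)  # copy2 also preserves the modification time


if __name__ == "__main__":
    # Temporary directories stand in for the mounted cloud and the local disk.
    with tempfile.TemporaryDirectory() as cloud, tempfile.TemporaryDirectory() as local:
        os.makedirs(os.path.join(cloud, "greek", "manuscripts"))
        with open(os.path.join(cloud, "greek", "notes.org"), "w") as f:
            f.write("* TODO\n")
        mirror_tree(cloud, local)
        pull_files(cloud, local, ["greek/notes.org"])
        print(sorted(os.listdir(os.path.join(local, "greek"))))  # ['manuscripts', 'notes.org']
```

The rsync equivalent of `mirror_tree` is the `--include='*/' --exclude='*'` filter pair, and `--files-from` can play the role of `pull_files`.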

Compression and encryption

  • Save bandwidth. Tradeoff with CPU cycles/processing time? Faster to just transfer the files straight?
  • Does zero-knowledge encryption slow things down? Can you do block-level sync with encrypted archives? Would VeraCrypt work?
  • Assuming a secure datacenter, does encryption even matter?
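A rough way to reason about the compression tradeoff: compressing first wins when the compression time plus the time to send the smaller payload beats sending the raw bytes. This minimal sketch uses zlib and an idealized link (no latency, no protocol overhead, decompression time ignored). The function names and the sample bandwidths are hypothetical.

```python
import time
import zlib


def transfer_seconds(n_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Idealized time to push n_bytes over a link (ignores latency and overhead)."""
    return n_bytes / bandwidth_bytes_per_s


def compression_tradeoff(data: bytes, bandwidth_bytes_per_s: float, level: int = 6) -> dict:
    """Compare sending data raw vs. zlib-compressing it first."""
    if not data:
        raise ValueError("need some data to measure")
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_s = time.perf_counter() - start
    return {
        "ratio": len(compressed) / len(data),
        "raw_s": transfer_seconds(len(data), bandwidth_bytes_per_s),
        # Decompression on the receiving side is ignored here.
        "compressed_s": compress_s + transfer_seconds(len(compressed), bandwidth_bytes_per_s),
    }


if __name__ == "__main__":
    sample = b"* TODO Org mode heading\n" * 10_000  # very repetitive text
    for mbps in (1, 100, 1000):
        result = compression_tradeoff(sample, mbps * 1_000_000 / 8)
        print(mbps, "Mbit/s:", result)
```

Text such as Org and code files compresses very well. Photos, video, and anything already encrypted barely compress at all, so compression has to happen before encryption to help. rsync (`-z`) and ssh (`-C`) can also compress in transit without storing compressed archives.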

Multiple clouds

  • Is geographic redundancy necessary or just statistically a waste of money (and electricity etc. on the environmental side from duplicate servers)?
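One way to frame the geographic-redundancy question, assuming failures at the two sites are independent (a big assumption: a shared provider bug, billing problem, or account lockout can hit both copies at once). Here p_1 and p_2 are hypothetical yearly probabilities of losing the data at each site, V is the value you put on the data, and C_2 is the yearly cost of the second copy:

```latex
\begin{aligned}
\mathbb{E}[\text{loss}]_{\text{one site}}  &= V\,p_1 \\
\mathbb{E}[\text{loss}]_{\text{two sites}} &= V\,p_1\,p_2 \\
\text{second site pays off} \iff C_2 &< V\,p_1 - V\,p_1\,p_2 = V\,p_1\,(1 - p_2)
\end{aligned}
```

Since p_2 is small, 1 - p_2 is close to 1, so the rule of thumb becomes: a second copy is worth it when its yearly cost is below the expected yearly loss from keeping only one copy.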

