git-annex's high-level design is mostly inherent in the data that it stores in git, and alongside git. See internals for details.

See encryption for design of encryption elements.

It's a Linux kernel thing, so perhaps another option would be to create a big file and mount it with mount -o loop
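A sketch of that idea (path, filesystem type, and size are made up; the mkfs/mount steps need root, so they are shown as comments):

```shell
# Create a 1 GiB sparse file to hold a filesystem.
truncate -s 1G annex.img
ls -lh annex.img
# The remaining steps need root:
#   mkfs.ext4 -F annex.img
#   mkdir -p /mnt/annex
#   mount -o loop annex.img /mnt/annex
```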
Comment by gdr-go2 Mon May 28 18:12:10 2012
I think it's already on the list: "configurable option to only annex files meeting certain size or filename criteria" -- files not meeting those criteria would just be git added.
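A sketch of what such a configuration might look like, in the style of git-annex's annex.largefiles expressions (the 100kb threshold and the repository name are just examples, not a committed design):

```shell
git init -q demo && cd demo
# Only files larger than 100kb would be annexed; smaller ones would
# simply be checked into git with plain "git add".
git config annex.largefiles "largerthan=100kb"
git config annex.largefiles   # prints: largerthan=100kb
```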
Comment by joeyh.name Mon Jun 4 19:46:03 2012
Jimmy, I hope to make it as easy as possible to install. I've been focusing on getting it directly into popular Linux distributions, rather than shipping my own binary. The OSX binary is static, and while I lack an OSX machine, I would like to make it easier to distribute to OSX users.
Comment by joeyh.name Mon Jun 4 19:45:00 2012
Yes, it's in git with the rest of git-annex. Currently in the watch branch.
Comment by joeyh.name Sat Jun 9 23:01:29 2012

Assuming you're storing your encrypted annex with me and I with you, our regular cron jobs to verify all data will catch corruption in each other's annexes.
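Such a cron job might look like this (the path and remote name are placeholders; git annex fsck --from verifies the copies stored on a remote):

```
# m h dom mon dow  command
0 3  *   *   0     cd /home/me/annex && git annex fsck --from partner --quiet
```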

Checksums of the encrypted objects could be optional, mitigating any potential attack scenarios.

It's not only about the cost of setting up new remotes. It would also be a way to keep data in one annex while making it accessible only in a subset of them. For example, I might need some private letters at work, but I don't want my work machine to be able to access them all.

Comment by Richard Tue Apr 5 23:24:17 2011
Thanks, that's already been useful to me. You might as well skip the debian-specific "bpo" tags too.
Comment by joeyh.name Sat Jun 9 18:07:51 2012

@Richard the easy way to deal with that scenario is to set up a remote that work can access, and only put in it files work should be able to see. Needing to specify which key a file should be encrypted to, when putting it in a remote that supports multiple keys, would add another level of complexity which that approach avoids.

Of course, the right approach is probably to have a separate repository for work. If you don't trust it with seeing file contents, you probably also don't trust it with the contents of your git repository.

Comment by joey Thu Apr 7 19:59:30 2011

I always appreciate your OSX work Jimmy...

Could it be put into macports?

Comment by joeyh.name Fri Jun 8 01:56:52 2012
I'd agree getting it into the main distros is the way to go. If you need OSX binaries, I could volunteer to set up an autobuilder to generate binaries for OSX users; however, it would rely on users having macports with the correct ports installed to use it (things like coreutils etc...)
Comment by Jimmy Thu Jun 7 20:22:55 2012
Are you publishing the source code for git-annex assistant somewhere?
Comment by Matt Sat Jun 9 22:34:30 2012

New encryption keys could be used for different directories/files/patterns/times/whatever. One could then encrypt this new key for the public keys of other people/machines and push them out along with the actual data. This would allow some level of access restriction or future revocation. git-annex would need to keep track of which files can be decrypted with which keys. I am undecided if that information needs to be encrypted or not.

Encrypted object files should be checksummed in encrypted form so that it's possible to verify integrity without knowing any keys. Same goes for encrypted keys, etc.
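Done outside any particular tool, the idea is just checksumming the ciphertext (file names below are made up):

```shell
# Any opaque encrypted blob will do for the demonstration.
printf 'ciphertext-bytes' > GPGHMACSHA1--example
# Record a checksum of the *encrypted* form...
sha256sum GPGHMACSHA1--example > SHA256SUMS
# ...so anyone holding the blob can verify it without any key.
sha256sum -c SHA256SUMS   # prints: GPGHMACSHA1--example: OK
```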

Chunking files in this context seems like needless overkill. This might make sense to store a DVD image on CDs or similar, at some point. But not for encryption, imo. Coming up with sane chunk sizes for all use cases is literally impossible and as you pointed out, correlation by the remote admin is trivial.

Comment by Richard Sun Apr 3 20:03:14 2011
I would find it useful if the watch command could 'git add' new files (instead of 'git annex add') for certain repositories.
Comment by svend [ciffer.net] Mon Jun 4 19:42:07 2012

In relation to macports, I often found that the haskell ports in macports are behind other distros, and I'm not willing to put much effort into maintaining or updating those ports. I found that to build git-annex, installing macports manually and then installing haskell-platform from upstream is the best way to get the most up to date dependencies for git-annex.

fyi in macports ghc is at version 6.10.4 and haskell platform is at version 2009.2, so there are a significant number of ports to update.

I was thinking about this a bit more, and I reckon it might be easier to try to build a self-contained .pkg package and have all the needed binaries in a .app styled package; that would work well when the webapp comes along. I will take a look at it in a week or two (currently moving house so I don't have much time)

Comment by Jimmy Fri Jun 8 07:22:34 2012
Good thought Jim. I've done something like that.
Comment by joeyh.name Thu Jun 7 04:48:15 2012

I see no use case for verifying encrypted object files w/o access to the encryption key. And possible use cases for not allowing anyone to verify your data.

If there are to be multiple encryption keys usable within a single encrypted remote, then they would need to be given some kind of name (since a symmetric key is used, there is no pubkey to provide a name), and the name encoded in the files stored in the remote. While certainly doable, I'm not sold that adding a layer of indirection is worthwhile. It only seems it would be worthwhile if setting up a new encrypted remote were expensive to do. Perhaps that could be the case for some type of remote other than S3 buckets.

Comment by joey Tue Apr 5 18:41:49 2011

For the unfamiliar, it's hard to tell if a command like that would persist. I'd suggest being as clear as possible, e.g.:

Increase the limit for now by running:
  sudo sysctl fs.inotify.max_user_watches=81920
Increase the limit now and automatically at every boot by running:
  echo fs.inotify.max_user_watches=81920 | sudo tee -a /etc/sysctl.conf; sudo sysctl -p
Comment by Jim Thu Jun 7 03:43:19 2012
Actually, Dropbox gives you a warning via libnotify about inotify. It tends to go away too quickly to properly read though, much less actually copy down the command...
Comment by Jo-Herman Wed Jun 6 22:03:29 2012
Will statically linked binaries be provided for, say, Linux, OSX and *BSD? I think having some statically linked binaries will certainly help and appeal to a lot of users.
Comment by Jimmy Sat Jun 2 12:06:37 2012
When I work on the webapp, I'm planning to make it display this warning, and any other similar warning messages that might come up.
Comment by joeyh.name Wed Jun 6 23:25:57 2012

It's not much for now... but see http://www.sgenomics.org/~jtang/gitbuilder-git-annex-x00-x86_64-apple-darwin10.8.0/. I'm ignoring the debian-stable and pristine-tar branches for now, as I am just building and testing on osx 10.7.

Hope the autobuilder will help you develop the OSX side of things without having direct access to an osx machine! I will try and get gitbuilder to spit out appropriately named tarballs of the compiled binaries in a few days when I have more time.

Comment by Jimmy Fri Jun 8 15:21:18 2012

Complete fsck is good, but once a week is probably enough.

But please see if you can make fsck optional depending on whether the machine is running on battery.

Comment by Richard Fri Jun 15 09:57:33 2012
But Rich is right, and I was thinking the same thing earlier this morning: delaying the lsof allows the writer to change the file and exit, and only fsck can detect the problem then. Setting file permissions doesn't help once a process already has it open for write. Which has put me off the delayed lsof idea, unfortunately. lsof could be run safely during the initial annexing.
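The permissions point is easy to demonstrate in a shell (a sketch; fd 3 stands in for a process that opened the file before it was made read-only):

```shell
f=$(mktemp)
exec 3>>"$f"                     # a writer opens the file for append...
chmod a-w "$f"                   # ...then the file is made read-only
echo "written after chmod" >&3   # the already-open handle still writes
exec 3>&-
cat "$f"                         # prints: written after chmod
```

Permissions are only checked at open() time, not per write, which is why making the file read-only can't stop a writer that got in first.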
Comment by joeyh.name Fri Jun 15 15:23:21 2012
In relation to OSX support, hfsevents (or supporting hfs in general) is probably a bad idea; it's very osx specific, and users who move usb keys and disks between systems will probably end up using fat32/exfat/vfat disks. Also, if you want, I can lower the turnaround time of the OSX autobuilder that I have set up to every 1 or 2 mins? Would that help?
Comment by Jimmy Sun Jun 17 08:52:32 2012

Hey Joey!

I'm not very tech savvy, but here is my question. I think for all cloud service providers, there is an upload limitation on how big one file may be. For example, I can't upload a file bigger than 100 MB on box.net. Does this affect git-annex at all? Will git-annex automatically split the file depending on the cloud provider or will I have to create small RAR archives of one large file to upload them?

Thanks! James

Comment by James Mon Jun 11 02:15:04 2012

Wasn't there some filesystem functionality that could tell you the number of open file handles on a certain file? I thought this was tracked per-file too. Or maybe I'm just confusing it with the number of hard links (which stat can tell you); anyway, something to look into.

Comment by dieter Fri Jun 15 08:21:37 2012

hfsevents seems usable; git-annex does not need to watch for file changes on remotes on other media.

But, trying kqueue first.

You could perhaps run the autobuilder on a per-commit basis.

Comment by joeyh.name Sun Jun 17 16:39:43 2012

Corner case, but if the other program finishes writing while you are annexing and your check shows no open files, you are left with a bad checksum on a correct file. This "broken" file will propagate, and the next round of fsck will show that all copies are "bad".

Without verifying if this is viable, could you set the file RO and thus block future writes before starting to annex?

Comment by Richard Fri Jun 15 10:21:17 2012
There's librsync which might support reporting the progress through its API, but it seems to be in beta.
Comment by abhidg [myopenid.com] Wed Jun 13 02:14:29 2012

@wichert All this inotify stuff is entirely linux specific AFAIK anyway, so it's fine for workarounds to limitations in inotify functionality to also be linux specific.

@dieter I think you're thinking of hard links, filesystems don't track number of open file handles afaik.
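For the record, the hard link count dieter mentioned is easy to see with GNU stat (on OSX the equivalent is `stat -f %l`):

```shell
f=$(mktemp)
stat -c %h "$f"      # prints 1: a single directory entry
ln "$f" "$f.link"    # create a second hard link
stat -c %h "$f"      # prints 2
```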

@Jimmy, I'm planning to get watch going on freebsd (and hopefully that will also cover OSX), after merging it :)

@Richard, the file is set RO while it's being annexed, so any lsof would come after that point.

Comment by joeyh.name Fri Jun 15 15:14:52 2012

Maybe at some point your tool could show "warning, the following files are still open and are hence not being annexed", to avoid the nasty surprise of a file not being annexed without the user realizing it.

Comment by dieter Sat Jun 16 09:14:26 2012
Yes, git-annex has to split files for certain providers. I already added support for this as part of my first pass at supporting box.com, see using box.com as a special remote.
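git-annex's chunking is internal to its special remotes, but the underlying split-and-rejoin idea can be sketched with standard tools (sizes here are deliberately tiny; real chunk sizes would match the provider's upload limit):

```shell
# Make a 5 MB test file, split it into 2 MB chunks, then rejoin losslessly.
dd if=/dev/urandom of=big.bin bs=1M count=5 2>/dev/null
split -b 2M big.bin chunk.       # produces chunk.aa chunk.ab chunk.ac
cat chunk.* > rejoined.bin
cmp big.bin rejoined.bin         # no output: files are identical
```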
Comment by joeyh.name Mon Jun 11 04:48:08 2012
I would also be reluctant to use lsof, for the sake of non-linux systems or systems that don't have lsof. I've only been playing around with the watch branch on my "other" laptop under archlinux. It looks usable; however, I would prefer support for OSX before the watch branch gets merged to master ;)
Comment by Jimmy Fri Jun 15 08:58:17 2012

Homebrew is a much better package manager than MacPorts IMO.

Comment by Matt Fri Jun 22 04:26:02 2012
A downside of relying on lsof is that you might be painting yourself into a linux corner: other operating systems might not have lsof or an alternative you can rely on. Especially for Windows this might be a worry.
Comment by Wichert Fri Jun 15 07:19:23 2012