
Re: Looking for document and file organisation tools



On 03/03/2015 18:13, Hendrik Boom wrote:
What free software is there in the way of organizing lots of documents?

To be more precise, the ones I *need* to organize are the files on hard 
drives, though if I could include documents I have elsewhere (bookshelves 
and photocopy files) I wouldn't mind.  They are text documents in a 
variety of file formats and languages, source code for current and 
obsolete systems, jpeg images, film clips, drawings, SVG files, 
object code, shared libraries, fragments of drafts of books, ragged 
software documentation, works in progress ...

And I'm not looking for one single solution that will do everything I'd 
like.  Indeed, I suspect that's impossible without building an entirely 
new OS.  Which I'm not likely to find off the shelf, nor am I likely to 
be able to do it myself in the few decades I may have left in my life.
And even if it were feasible, there's probably a lot of research to be 
done before we even know what such a thing should actually do.

Of course the files are already semi-organized in directories.  But I 
haven't yet managed to find a suitable collection of directory names.  
Hierarchical classification isn't ideal -- there are files that fit in 
several categories, and there are a lot of files that have to be in a 
particular location because of the way they are used (executables in a 
bin directory, for example) or the way they are updated or maintained.

Of course the taxonomists would advise setting up a controlled vocabulary 
of tags and attaching tags to the various files.  I'd end up with a 
triple store or some other database describing files.
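A minimal sketch of such a tag database, assuming SQLite through Python's built-in `sqlite3` module and a hypothetical controlled vocabulary `VOCABULARY`. Each file gets an opaque UUID, and its current path is stored as just another triple, so moving a file rewrites one row and leaves its tags alone. The class `TagStore` and its methods are illustrative names, not an existing tool.

```python
import sqlite3
import uuid

# Hypothetical controlled vocabulary; a real one would be curated by hand.
VOCABULARY = {"source-code", "image", "draft", "documentation", "japanese"}


class TagStore:
    """Triples (subject, predicate, object) in SQLite.

    Files are identified by an opaque UUID rather than by path or content
    hash; the current path is just a 'path' triple that can be updated.
    """

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        with self.db:  # commits on success
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS triples ("
                " subject TEXT, predicate TEXT, object TEXT,"
                " PRIMARY KEY (subject, predicate, object))"
            )

    def add_file(self, path):
        file_id = str(uuid.uuid4())
        with self.db:
            self.db.execute(
                "INSERT INTO triples VALUES (?, 'path', ?)", (file_id, path)
            )
        return file_id

    def tag(self, file_id, tag):
        if tag not in VOCABULARY:
            raise ValueError(f"{tag!r} is not in the controlled vocabulary")
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO triples VALUES (?, 'tag', ?)", (file_id, tag)
            )

    def move(self, file_id, new_path):
        # Only the 'path' triple changes; tags keyed on file_id are untouched.
        with self.db:
            self.db.execute(
                "UPDATE triples SET object = ?"
                " WHERE subject = ? AND predicate = 'path'",
                (new_path, file_id),
            )

    def paths_with_tag(self, tag):
        rows = self.db.execute(
            "SELECT p.object FROM triples t JOIN triples p"
            " ON t.subject = p.subject"
            " WHERE t.predicate = 'tag' AND t.object = ? AND p.predicate = 'path'"
            " ORDER BY p.object",
            (tag,),
        )
        return [row[0] for row in rows]
```

This only moves the identity problem into the database: something still has to call `move()` when a file is relocated, which is exactly the question raised next.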

But how to identify the files being tagged?  A file-system pathname isn't 
enough.  Files get moved, and sometimes entire directory trees full of 
files get moved from one place to another for various pragmatic reasons.  
And a hashcode isn't enough.  Files get edited, upgraded, recompiled, 
reformatted, converted from JIS code to UTF-8, and so forth.  Images get 
cropped and colour-corrected.  And under these changes they should keep 
their assigned classification tags.
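One partial answer for moves, sketched under stated assumptions: on a POSIX filesystem, renaming a file within the same filesystem keeps its (device, inode) pair, so a tool could record that pair and later walk a directory tree to find where the file went. This breaks when files are copied across filesystems or when an editor saves by writing a new file and renaming it over the old one (which allocates a new inode), so it could only complement a stable ID, not replace one. The helpers `file_key` and `relocate` are hypothetical.

```python
import os
import tempfile


def file_key(path):
    """Return (device, inode) for path; lstat so symlinks are not followed."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def relocate(key, search_root):
    """Walk search_root for a file whose (device, inode) matches key.

    Returns the new path, or None if no match is found (e.g. the file was
    copied to another filesystem or rewritten under a new inode).
    """
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            try:
                if file_key(candidate) == key:
                    return candidate
            except OSError:
                continue  # vanished or unreadable; skip it
    return None


if __name__ == "__main__":
    # Demo: move a file into a subdirectory and find it again by inode.
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "draft.txt")
        with open(old, "w") as f:
            f.write("chapter 1")
        key = file_key(old)
        os.makedirs(os.path.join(root, "books"))
        new = os.path.join(root, "books", "draft.txt")
        os.rename(old, new)
        print(relocate(key, root) == new)
```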

Now a number of file formats can accommodate metadata.  And some software 
that manipulates files can preserve metadata and even allow user editing 
of the metadata.  But most doesn't.

Much of it could perhaps be done by automatic content analysis.  Other 
material may require labour-intensive manual classification.
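As a very crude form of automatic content analysis, a tool can inspect a file's leading bytes. The signatures below (JPEG starts with `FF D8 FF`, ELF executables, object files and shared libraries with `7F 45 4C 46`, and MP4/QuickTime-style video carries `ftyp` at byte offset 4) are well known, but the tag names and the "no NUL byte means probably text" heuristic are assumptions for illustration. The `file(1)` command and its libmagic library recognise far more formats.

```python
# Byte signatures at the start of a file (a tiny subset of what file(1) knows).
PREFIX_SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg-image"),
    (b"\x7fELF", "elf-binary"),  # executables, object files, shared libraries
]


def guess_tags(path, sample_size=4096):
    """Return a set of crude, content-based tags for the file at path."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    tags = {tag for magic, tag in PREFIX_SIGNATURES if sample.startswith(magic)}
    # ISO base media files (MP4, MOV, ...) have 'ftyp' at byte offset 4.
    if sample[4:8] == b"ftyp":
        tags.add("video-clip")
    # Common heuristic: a NUL byte in the sample usually means binary data.
    if not tags and b"\x00" not in sample:
        tags.add("probably-text")
    return tags
```

Tags guessed this way could seed the manual classification rather than replace it.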

No, I don't expect to see any off-the-shelf solution for all of this.

But does anyone have ideas as to how to accomplish even some of this?  
Even poorly?

Does anyone know of relevant practical tools?  Or have ideas towards 
tools that *should* exist but currently don't?

I'm ready to experiment.

-- hendrik


For tagging your files, have you seen tmsu (http://tmsu.org/)? The homepage says:

TMSU is a tool for tagging your files. It provides a simple command-line tool for applying tags and a virtual filesystem so that you can get a tag-based view of your files from within any other program.

TMSU does not alter your files in any way: they remain unchanged on disk, or on the network, wherever you put them. TMSU maintains its own database and you simply gain an additional view, which you can mount, based upon the tags you set up. The only commitment required is your time and there's absolutely no lock-in.

Never used it myself. I’m not sure how it handles moved or renamed files, which is one of your concerns.  Maybe something is planned for that. At least it makes the tagged filesystem available to any program, which I think is quite convenient.
