Paperless office/PDF/Document Management solutions

From: Dan Lyke 
We run an Ubuntu household. Paper is swamping us. We want to go to
paperless archives. I have almost convinced myself that current SANE
support for the Fujitsu ScanSnap iX500 is sufficient, and that I can
get "scanbd" or "scanbuttond" working with it. If I run the output from
that through gocr and get some (probably extremely noisy) text files
along with the PDFs, I'm headed in the right direction. [1]
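
A minimal sketch of the OCR step described above, assuming the scan button handler drops one .pnm image per page into a directory. The directory layout and the function names are hypothetical; the only gocr options used are -i (input image) and -o (output text file).

```python
import subprocess
from pathlib import Path


def gocr_command(page_image: Path) -> list[str]:
    """Build the gocr call that writes OCR text next to a scanned page."""
    text_file = page_image.with_suffix(".txt")
    # -i names the input image, -o the text file gocr should write.
    return ["gocr", "-i", str(page_image), "-o", str(text_file)]


def commands_for_scan_dir(scan_dir: Path) -> list[list[str]]:
    """One gocr command per .pnm page in scan_dir, in sorted filename order."""
    return [gocr_command(page) for page in sorted(scan_dir.glob("*.pnm"))]


def ocr_scan_dir(scan_dir: Path) -> None:
    """Run gocr over every page; raises CalledProcessError if gocr fails."""
    for command in commands_for_scan_dir(scan_dir):
        subprocess.run(command, check=True)
```

Building the command list separately from running it makes it possible to check what would be executed before pointing it at a real pile of scans.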

The question then becomes "what do we do with it?"

I see 3 options:

1. A shared tree that we can search and that gets backed up.
2. git archives checked out on our various laptops
3. Some dedicated paperless office software solution.

#1 is appealing, except for two considerations:

First, what network file system do we use? I'd like something that
works from both inside and outside our network (or at least doesn't
hork up a hairball when we try to boot our machines outside our
network). Should this be:

* NFS? (Seems like it breaks and requires new configuration with
every upgrade, and it definitely wouldn't work outside the network.)
* Samba? (I've never gotten this working really well, and putting it on
our server, which has outside exposure, scares the crap out of me.)
* sshfs?

Second, what's the best technology for searching those .txt files that
accompany and were OCRed from the PDFs?

Any suggestions?
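
For the search question, one possible starting point is a small in-memory word index over the .txt files. This is only a sketch under assumptions: the function names are hypothetical, it matches whole words exactly (so noisy OCR output will cause misses), and it rebuilds the index on every run rather than storing it.

```python
import re
from collections import defaultdict
from pathlib import Path


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(root: Path) -> dict[str, set[Path]]:
    """Map each word to the set of .txt files under root that contain it."""
    index: dict[str, set[Path]] = defaultdict(set)
    for txt in root.rglob("*.txt"):
        # errors="ignore" skips bytes that do not decode cleanly.
        for word in tokenize(txt.read_text(errors="ignore")):
            index[word].add(txt)
    return index


def search(index: dict[str, set[Path]], query: str) -> list[Path]:
    """Return the files containing every word in the query, sorted by path."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

If each text file sits next to its PDF under the same base name (an assumption about the layout), `hit.with_suffix(".pdf")` turns a search hit into the document to open.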




===============================================================
From: Dave Brockman
------------------------------------------------------
My thoughts would be: use whatever works best for your environment. If you are comfortable with NFS, then use NFS. I tend to use SMB because I have to interact with a lot of Windows machines. Whatever you use, go ahead and get rid of any thoughts of direct connections from outside your network. Use a VPN if you need to connect back.

Outside of that, I have used a WebDAV front end to an SVN repository for file storage before. Perhaps you could do something similar with Git? You could also build your own Dropbox (ownCloud?) and sync that way; then you only have to expose ownCloud.

Didn't help you any there, did I? :)

Regards,

dtb

--
"Some things in life can never be fully appreciated nor understood unless experienced firsthand. Some things in networking can never be fully understood by someone who neither builds commercial networking equipment nor runs an operational network." RFC 1925

===============================================================
From: Wil Wade
------------------------------------------------------
I have been working on making our household more "paperless" as well, although some of it is just digital duplicates of things we have to keep on paper but would want copies of if something happened to the originals.

For now I have been using to make the PDFs, which works well and (I think, as I did not want it) has gocr support.

Currently my storage is Google Drive plus local backups (made easy with Google Takeout). Why? Because they will do all that OCR stuff for me and let me search it easily alongside all the other documents my wife and I share, and I have easy, secure access from anywhere.

If I were starting over, I would likely not care about OCR and would store the files on my server, which is backed up with rsync (via BackInTime) to drives that go into my fire safe. If I were starting over with my backup, I would likely switch to git on the server for both versioning and backup.

Anyway, I hope that gives you some ideas.

Wil

===============================================================
From: Lynn Dixon
------------------------------------------------------
I would suggest ownCloud and some of the OCR plugins that are available for it. Just scan your docs into your ownCloud and let the OCR plugins handle indexing them for the ownCloud search.

===============================================================
From: Mike Harrison
------------------------------------------------------
I have a file drawer and a safe.

Truly important paper goes into the safe: car titles, house papers, etc. That is more for fire protection than physical safety. Things for tax purposes (rental house stuff, etc.) go into a small file folder in the safe.

Other paper (bill receipts, etc.) goes into the file drawer, just piled on top. It becomes a chronological file, and when it gets close to full, I pull out the bottom half or so and destroy it (kindling, mostly).

Once every 3 to 6 months I need something from the drawer. Maybe.

How often do you really need what you are attempting to catalogue and store?

===============================================================
From: Lynn Dixon
------------------------------------------------------
Neat Receipts offers all of these things, if you don't mind paying their premium for their hardware.

===============================================================
From: Dan Lyke
------------------------------------------------------
On Wed, 8 Jan 2014 14:43:33 +0000 (UTC) Mike Harrison wrote:

Most likely: never. Or come tax time (or, @DEITIES forfend, audit time). But every once in a while we need to go back to utility bills and say "wait, your system is fouled up" or something similar.

In the best case we'd actually make better use of these documents: I can see slinging some code to detect receipts and try auto-categorization, and scanning and time-stamping business cards would be way more useful than having the physical items, that sort of thing.

There are also a few alternate uses for the scanner. For instance, I'm running through some archives of square dancing resources; it'd be way easier for me to read 'em on the tablet, and I could get them there easily with a sheet-fed scanner.

In a perfect world, vendors and service providers never screw up. Charlene catches them at it more often than I ever did, and I think the main purpose is having less paper lying around while still having something available for when we have to go back and audit them.

Dan

===============================================================
From: Jonathan Calloway
------------------------------------------------------
I know they work on OS X; do they work in Linux?

===============================================================
From: Dan Lyke
------------------------------------------------------
On Wed, 8 Jan 2014 12:16:52 -0500 Lynn Dixon wrote:

I don't mind paying the premium for their hardware; I mind paying the admin premium to run Windows or Mac for anything but paying development work at home.

If it runs stand-alone and will deliver things to a Linux server: shut up and take my money.

Dan