
Re: How could you load only once a Linux utility without a batch --input-files kind of option and repeatedly use it on many files? . . .



On Fri, 15 May 20, 12:38:12, Albretch Mueller wrote:
> On 5/14/20, Nicolas George <george@nsup.org> wrote:
> 
> > The question was not how to find the files, the formulation of the
> > question indicates that Albretch has that covered.
> 
>  Yeah, my problem is not finding the files per se. I have them or
> could have them easily listed.

If your filenames contain "strange" characters you can avoid a lot of 
headaches by using 'find -exec <whatever> {} +' instead of using xargs 
directly.
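
A minimal sketch of that form, assuming GNU findutils and coreutils and
using sha256sum only because it is the example tool in this thread. The
function name hash_tree is made up for illustration:

```shell
# hash_tree DIR: print a SHA-256 line for every regular file under DIR.
# (Hypothetical helper name, not part of any package.)
hash_tree() {
    # '{} +' makes find append as many filenames as fit on one command
    # line, so sha256sum is started a few times rather than once per file.
    # Each name is passed as its own argument and never re-parsed by a
    # shell, so spaces and quotes in filenames are harmless.
    find "$1" -type f -exec sha256sum {} +
}
```

Something like 'hash_tree ./corpus > checksums.txt' would then hash the
whole tree, where ./corpus is a placeholder path.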

The man page says '-exec <whatever> {} +' builds its command lines much
as xargs does. Since you have that many files, you could test it ;)

Using 'xargs' directly (or combined with 'find -print0' to avoid issues 
with strange filenames) allows for some additional tuning.
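
As a rough sketch of that tuning, assuming GNU xargs (for -r and -P).
The helper name hash_tree_parallel and the default values are
placeholders, not recommendations:

```shell
# hash_tree_parallel DIR [JOBS] [BATCH]  (hypothetical helper name)
hash_tree_parallel() {
    dir=$1
    njobs=${2:-4}      # how many sha256sum processes run at once
    batch=${3:-1000}   # maximum number of filenames per process
    # -print0 and -0 separate names with NUL bytes, the one byte that
    # cannot occur in a path, so whitespace and quotes are safe.
    # -n and -P are knobs that 'find -exec {} +' does not offer.
    # -r skips running the tool at all when no files are found.
    find "$dir" -type f -print0 |
        xargs -0 -r -n "$batch" -P "$njobs" sha256sum
}
```

With -P the output order is not deterministic, and very long outputs
from parallel processes may interleave, so sort or post-process the
result if you compare it against a baseline.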

>  The thing is that when you work on corpora research you have to get
> fairly complicated answers from millions of texts "as fast as possible"
> and you have to make sure that your baseline hasn't been changed.
> 
>  I will have to play (again) with the options that you have given me.
> By the way, I mentioned sha256sum only as an example; in the typical
> case you would run "file" and two hashes on each file, and that would
> take forever on a user's machine.

Are you sure the bottleneck is in execution? With so many files it could 
be many other things (storage, RAM, etc.).
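
One crude way to check, sketched under the assumption of GNU find and a
POSIX shell: time a tool that only reads the files (cat) against the
real tool over the same tree. The helper name elapsed_seconds is made up
for this example:

```shell
# elapsed_seconds DIR CMD [ARGS...]  (hypothetical helper name)
# Runs CMD over every regular file under DIR in batches and prints the
# wall-clock time in whole seconds.
elapsed_seconds() {
    dir=$1
    shift
    start=$(date +%s)
    # The tool's output is discarded; only the elapsed time matters.
    find "$dir" -type f -exec "$@" {} + > /dev/null
    end=$(date +%s)
    echo $((end - start))
}
```

If 'elapsed_seconds ./corpus cat' and 'elapsed_seconds ./corpus sha256sum'
come out close, reading the files dominates and starting the tool less
often will not help much. The second run may be served from the page
cache, so compare on a cold cache or on a tree larger than RAM.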

Kind regards,
Andrei
-- 
http://wiki.debian.org/FAQsFromDebianUser
