
Fwd: Re: Hurd Projects



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



- ----------  Forwarded Message  ----------
Return-Path: <Ondrej.Hurt@seznam.cz>
To: b.gohla@gmx.de
From: Ondrej Hurt <Ondrej.Hurt@seznam.cz>
Subject: Re: Hurd Projects
In-Reply-To: <01121921054002.00947@linux>
Content-Type: text/plain;
  charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable
Date: Wed, 19 Dec 2001 20:29:02 +0100 (CET)
Reply-To: Ondrej Hurt <Ondrej.Hurt@seznam.cz>
Mime-Version: 1.0
Message-Id: <1733.3293-29595-1287109439-1008790142@seznam.cz>
Status: R 
X-Status: N



> I have been thinking about an index translator that crawls specific
> directories, compiles indices, and provides a directory for each word
> found; each such directory contains links to the files containing the
> respective word.
>
> A more advanced translator that I imagine could be useful would provide
> directories linking to files sufficiently similar in their vocabulary and
> thus likely to treat a similar topic.
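
The two ideas quoted above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not a Hurd translator: the
names tokenize, build_index and similarity are hypothetical, the documents
are passed in as a dict instead of being crawled from directories, and
Jaccard overlap of vocabularies is just one possible similarity measure.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Map each word to the set of document names that contain it.

    `docs` is a dict of {name: text} (hypothetical input shape); a real
    translator would crawl directories and expose each word as a directory.
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def similarity(text_a, text_b):
    """Jaccard similarity of two vocabularies, between 0.0 and 1.0."""
    a, b = set(tokenize(text_a)), set(tokenize(text_b))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

docs = {
    "a.txt": "car accident on the road",
    "b.txt": "car repair manual",
    "c.txt": "road accident report",
}
index = build_index(docs)
print(sorted(index["car"]))
print(similarity(docs["a.txt"], docs["c.txt"]))
```

Each key of the index corresponds to one word directory, and the set stored
under it corresponds to the links inside that directory.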

This sounds good! But I wouldn't list all words (subdirectories)
from the index automatically; instead, the user would have to create these
subdirectories. Each subdirectory would be the result of a search in the
index, and its name would be a query in a search language, e.g. "car AND
accident* WITH 80". "WITH 80" sets a relevance threshold that each
document has to exceed to be included in the search result.
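
A query like the one above could be handled roughly as follows. This is a
sketch under assumptions: only AND, a trailing "*" prefix wildcard and a
final "WITH n" are supported, and the relevance formula (the percentage of
a document's words that match a query term) is invented for illustration,
since no formula is specified here. All function names are hypothetical.

```python
import re

def parse_query(query):
    """Split 'car AND accident* WITH 80' into (terms, threshold).

    The threshold defaults to 0 when no WITH clause is present.
    """
    threshold = 0
    match = re.search(r"\s+WITH\s+(\d+)\s*$", query)
    if match:
        threshold = int(match.group(1))
        query = query[:match.start()]
    terms = [t.strip().lower() for t in query.split(" AND ") if t.strip()]
    return terms, threshold

def term_matches(term, word):
    """A trailing '*' turns the term into a prefix wildcard."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def relevance(terms, words):
    """Assumed score from 0 to 100: share of words matching any term."""
    if not words:
        return 0.0
    hits = sum(1 for w in words if any(term_matches(t, w) for t in terms))
    return 100.0 * hits / len(words)

def search(query, docs):
    """Return the sorted names of documents that satisfy the query."""
    terms, threshold = parse_query(query)
    results = []
    for name, text in docs.items():
        words = re.findall(r"[a-z]+", text.lower())
        # AND semantics: every term must occur somewhere in the document.
        if not all(any(term_matches(t, w) for w in words) for t in terms):
            continue
        # The document has to exceed the threshold, not merely reach it.
        if relevance(terms, words) > threshold:
            results.append(name)
    return sorted(results)
```

In a translator, search() would run when the user creates a directory with
the query as its name, and the returned names would become links.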

How do you want to search non-plain-text files, e.g. XML, binary
files, executables? There could be a method for extracting plain
text from filesystem objects (e.g. files :-) ). What about adding a
new message to the standard Hurd io protocol that returns text
instead of raw data, as io_read does? Imagine that you could index
symbols contained in executables, text stored in image files,
StarOffice docs, listings of files in archives, etc. :-)
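
To make the extraction idea concrete, here is a user-space sketch in Python.
It is not a Hurd RPC and does not touch the io protocol; extract_text and
printable_strings are hypothetical names. It assumes three cases only: zip
archives yield their file listing, plain text is returned as-is, and any
other binary yields its printable ASCII runs, similar to the `strings` tool.

```python
import re
import zipfile

def printable_strings(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII bytes in `data`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]

def extract_text(path):
    """Return indexable text for the file at `path`.

    Dispatch is by content, not by file name: archive listing for zip
    files, decoded text for UTF-8 files, printable runs for the rest.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return "\n".join(archive.namelist())
    with open(path, "rb") as handle:
        data = handle.read()
    # NUL bytes are valid UTF-8 but strongly suggest a binary file.
    if b"\x00" not in data:
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            pass
    return "\n".join(printable_strings(data))
```

An indexer would call extract_text() instead of reading raw bytes; further
formats would be added as extra branches or as separate extractors.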



- -------------------------------------------------------

- -- 
- --------------------
()  ASCII ribbon against html email 
/\  and Microsoft attachments.
pub  1024D/834F4976 2001-01-07 Björn Gohla (Wissenschaftler, Weltbürger) 
<b.gohla@gmx.de>
     Key fingerprint = 9FF4 FEDA CCDF DA0E 14D5  8129 6C14 3C39 834F 4976
sub  1024g/29571FE2 2001-01-07
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8IRDQbBQ8OYNPSXYRAuBdAJ45+y7Iu8zRSZIn4s4FZ9l37t15jQCfZeJ3
RBLOKKHSPJuA3X51JPzPxl4=
=2N3O
-----END PGP SIGNATURE-----


