Status of new search engine

  After that last request I decided it was time to get serious about the
searching and have now got it going, sort of.

You can have a look at the test page at http://search.debian.org/new/

Some good news:
  - Cache mode is working.  It stores the index in flat files instead
    of a database (although a DB is still involved).
  - It's indexing the website.  It's telling me it knows about 26,000
    pages and it has about 7800 to go.  The total number increases as
    it finds more pages (links) in that 7800.  It's a few hours' work
    with a niced process.
  - This one is using UTF-8, which should mean the charsets that were
    not supported before now are, including 2-byte ones.
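
The way the page total keeps growing during indexing can be pictured
with a small sketch.  This is not the search engine's own code, only a
toy breadth-first crawl; the SITE link graph and the crawl() function
are made up for illustration.  At every step, "known" equals "indexed"
plus "pending", and "known" goes up whenever a pending page turns out to
link to something new.

```python
from collections import deque


def crawl(start, links):
    """Breadth-first crawl over a link graph given as {page: [linked pages]}.

    Yields (indexed, pending, known) after each page has been processed.
    """
    known = {start}            # every page discovered so far
    pending = deque([start])   # discovered but not yet indexed
    indexed = 0
    while pending:
        page = pending.popleft()
        indexed += 1
        for target in links.get(page, []):
            if target not in known:
                # A newly found link raises the known total.
                known.add(target)
                pending.append(target)
        yield indexed, len(pending), len(known)


# Hypothetical five-page site, purely for illustration.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": [],
    "/c": ["/d", "/"],
    "/d": [],
}

progress = list(crawl("/", SITE))
for indexed, pending, known in progress:
    print(f"indexed={indexed} to go={pending} known={known}")
```

The "to go" figure only reaches zero once no pending page contributes a
link that has not been seen before.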

There are some things to be sorted out:
  - How to get language selection working, it can do it but it refuses
    to do it for me.
  - Translators will have to adjust the searchtmpl files, minor stuff
  - It doesn't work with search.cgi being a symlink, bah.  I've put
    hard links in there instead.
  - It doesn't work if search.cgi is called as an index, e.g.
    DirectoryIndex search.cgi
  - Ironically, the UdmComment stuff has to be deleted from the
    template (manually), because the search engine thinks it is part
    of the section.
  - I have to talk to someone about the black magic needed to get
    language content negotiation going again; I have no idea how it
    works.
  - Non-latin1 people need to test it and let me know how it goes.
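
For anyone unfamiliar with language content negotiation, the general
idea is that the browser sends an Accept-Language header and the server
picks the best translation it has.  The sketch below is only an
illustration of that idea in Python.  It is not how the web server or
the search engine does it, and parse_accept_language() and
pick_language() are hypothetical helpers.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (language, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        q = 1.0  # a language without an explicit q value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable, so languages with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def pick_language(header, available, default="en"):
    """Return the best available language for the header, or the default."""
    for lang, q in parse_accept_language(header):
        if q <= 0:
            continue
        if lang in available:
            return lang
        primary = lang.split("-")[0]  # fall back from "de-at" to "de"
        if primary in available:
            return primary
    return default


print(pick_language("de-AT, fr;q=0.8, en;q=0.5", {"en", "fr", "de"}))
```

The fallback from a regional tag to its primary language is a design
choice made for this sketch; a real server may apply stricter rules.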

A TCP port connects the indexer(s) (you can have many) to the cache
daemon.  Currently the daemon runs as my uid, and only while I'm
indexing.  I'd like to move that to a Unix socket one day, although TCP
does mean you can have multiple indexers across many machines, which is
cool.
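
The TCP versus Unix socket trade-off can be sketched in a few lines of
Python.  This is not the cache daemon's protocol; the function names,
the "OK" reply and the payload are all invented for illustration.  The
point is that the same listener code can bind either to a (host, port)
pair, reachable from other machines, or to a filesystem path, reachable
only locally and protected by file permissions.

```python
import os
import socket
import tempfile
import threading


def family_for(address):
    """A (host, port) tuple means TCP; a string means a Unix socket path."""
    return socket.AF_INET if isinstance(address, tuple) else socket.AF_UNIX


def make_listener(address):
    """Return a listening stream socket bound to the given address."""
    srv = socket.socket(family_for(address), socket.SOCK_STREAM)
    srv.bind(address)
    srv.listen(5)
    return srv


def serve_once(srv, received):
    """Accept one indexer connection, keep what it sent, and acknowledge."""
    conn, _ = srv.accept()
    with conn:
        received.append(conn.recv(1024))
        conn.sendall(b"OK")


def send_record(address, payload):
    """Connect as an indexer, send one record, return the daemon's reply."""
    with socket.socket(family_for(address), socket.SOCK_STREAM) as cli:
        cli.connect(address)
        cli.sendall(payload)
        return cli.recv(1024)


def round_trip(address, payload):
    """Run a one-shot daemon in a thread and send it a single record."""
    received = []
    srv = make_listener(address)
    actual = srv.getsockname()  # resolves port 0 to the real port for TCP
    worker = threading.Thread(target=serve_once, args=(srv, received))
    worker.start()
    reply = send_record(actual, payload)
    worker.join()
    srv.close()
    return reply, received[0]


# Unix socket variant: local only, access controlled by file permissions.
sock_path = os.path.join(tempfile.mkdtemp(), "cached.sock")
print(round_trip(sock_path, b"hypothetical index record"))
```

Calling round_trip(("127.0.0.1", 0), ...) instead exercises the TCP
variant, which is what allows indexers on other machines to connect.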

  - Craig
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.enc.com.au/                <csmall@enc.com.au>
MIEEE <csmall@ieee.org>                 Debian developer <csmall@debian.org>
