Status of new search engine
After that last request I decided it was time to get serious about the
searching and have now got it going, sort of.
You can have a look at the test page at http://search.debian.org/new/
Some good news:
- Cache mode is working. It stores the index in flat files rather than
in the database (though it still needs a DB for other things).
- It's indexing the website. It's telling me it knows about 26,000
pages and has about 7800 to go. The total goes up whenever it finds
new pages (links) in those 7800. It's a few hours' work with a niced
process.
- This one is using UTF-8, which should mean the charsets that were not
supported before now are, including two-byte ones.
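
To show why the "pages known" total keeps climbing while the indexer
works through its queue, here is a minimal sketch of a breadth-first
crawl. It is not the search engine's actual code; the crawl() function
and the tiny SITE link table are made up for illustration.

```python
from collections import deque

def crawl(start, get_links):
    """Breadth-first crawl.

    Returns a list of (pages_known, pages_to_go) pairs, one per page
    processed, so you can watch the total grow as links are found.
    """
    known = {start}          # every page discovered so far
    queue = deque([start])   # discovered but not yet indexed
    progress = []
    while queue:
        page = queue.popleft()
        for link in get_links(page):
            if link not in known:
                known.add(link)      # total goes up here
                queue.append(link)   # and so does the "to go" count
        progress.append((len(known), len(queue)))
    return progress

# Hypothetical five-page site standing in for the real website.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/d"],
    "/c": [],
    "/d": ["/"],
}

progress = crawl("/", lambda page: SITE.get(page, []))
for known, to_go in progress:
    print("knows about %d pages, %d to go" % (known, to_go))
```

The "to go" figure only hits zero once a pass over the queue turns up
no unseen links, which is why the estimate keeps moving.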
There are some things to be sorted out:
- Getting language selection working. It can do it, but it refuses
to do it for me.
- Translators will have to adjust the searchtmpl files, but it's minor stuff.
- It doesn't work with search.cgi being a symlink, bah. I've got hard
links in there.
- It doesn't work if search.cgi is called as an index, eg
- The template has to get, ironically, the UdmComment stuff deleted
(manually) because it thinks it is part of the section.
- I have to talk to someone about the black magic needed to get
language content negotiation going again, I have no idea how it works.
- Non-latin1 people need to test it and let me know how it goes.
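
For anyone else puzzled by language negotiation, the general idea is
that the browser sends an Accept-Language header and the server picks
the best match from what it has. Below is a rough sketch of that
matching only. It is not how the web server or search.cgi does it;
parse_accept_language() and choose_language() are names invented here.

```python
def parse_accept_language(header):
    """Turn 'de-AT,de;q=0.8,en;q=0.5' into [(tag, q), ...], best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # no q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable weight as "not wanted"
        prefs.append((tag.strip().lower(), q))
    # Stable sort: highest q first, ties keep the header's order.
    prefs.sort(key=lambda pref: -pref[1])
    return prefs

def choose_language(header, available, default="en"):
    """Pick the first acceptable language that we actually have."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # fall back from 'de-at' to 'de'
        if primary in available:
            return primary
    return default

print(choose_language("de-AT,de;q=0.8,en;q=0.5", {"en", "de", "fr"}))
```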
It uses a TCP port to connect the indexer(s) (you can have many) to the
cache daemon. Currently the daemon runs as my uid and only when I'm
indexing. I'd like that moved to a Unix socket one day, though TCP does
mean you can have multiple indexers across many machines, which is cool.
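
To make the TCP versus Unix socket trade-off concrete, here is a small
sketch of a client that can connect either way. It is only an
illustration, not the indexer's code; connect_to_daemon() and the
example addresses are made up.

```python
import socket

def connect_to_daemon(address, timeout=5.0):
    """Connect to a daemon over TCP or a Unix socket.

    address is either a (host, port) tuple for TCP, which works across
    machines, or a filesystem path for a Unix socket, which is local
    only but can be protected with ordinary file permissions.
    """
    if isinstance(address, tuple):
        return socket.create_connection(address, timeout=timeout)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(address)
    except OSError:
        sock.close()
        raise
    return sock

# Hypothetical usage (host, port and path are placeholders):
#   conn = connect_to_daemon(("indexhost.example.org", 7000))
#   conn = connect_to_daemon("/var/run/cached.sock")
```

With something like this the remote indexers could keep using TCP while
a local one goes through the socket file.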
Craig Small VK2XLZ GnuPG:1C1B D893 1418 2AF4 45EE 95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.enc.com.au/ <email@example.com>
MIEEE <firstname.lastname@example.org> Debian developer <email@example.com>