
Added comment regarding the new BatchSet import.

Gwyneth Llewelyn 7 years ago
commit 2ae117743a
1 changed file with 1 addition and 1 deletion
README.md

@@ -79,7 +79,7 @@ Note that the current version can be used as a direct replacement for [W-Hat's n
 
 To actually _use_ the W-Hat database, you need to download it first and import it. This means using the `-import` command (use the `name2key.csv.bz2` version). W-Hat still updates that database daily, so, with some clever `cron` magic, you might be able to get a fresh copy every day to import. Note that the database is supposed to be unique by name (and the UUIDs are not supposed to change): that means that you can import the 'new' version over an 'old' version, and only the relevant entries will be changed. Also, if you happen to have captured new entries (not yet existing on W-Hat's database), then these will _not_ be overwritten (or deleted) by a new import. To delete an old database, just delete the directory it is in.
 
-Importing the whole W-Hat database, which has a bit over 9 million entries, took on my Mac 3 minutes and 5 seconds. Aye, that's quite a long time. On a shared server, it can be even longer.
+Importing the whole W-Hat database, which has a bit over 9 million entries, took on my Mac 3 minutes and 5 seconds. Aye, that's quite a long time. On a shared server, it can be even longer. The code has been substantially changed to use `BatchSet`, which is allegedly the recommended way of importing large databases. However, even when set up to consume as little memory as possible, it will break most shared servers, simply because Go's garbage collector is not fast enough to clean up after each batch is sent. I may have to look at how to do this better, perhaps with less concurrency.
 
 This also works for OpenSimulator grids, and you can use the same scripts and database if you wish. Currently, the database stores an extra field with the UUID, which is usually set to the name of the grid. Linden Lab sets these as 'Production' and 'Testing' respectively; other grid operators may use other names. There is no guarantee that every grid operator has configured their database with a unique name. Also, because of the way the key/value database works, it _assumes_ that all avatar names are unique across _all_ grids, which may not be true: only UUIDs are guaranteed to be unique. I implemented it this way because I wanted _very fast_ name2key searches, while I'm not worried about very slow key2name searches, since those are implemented in LSL anyway. Allowing many avatars with the same name but different UUIDs would require a different kind of database. Also, what should a function return for an avatar name shared by three different UUIDs? You can see the problem there. Therefore I kept it simple, but be mindful of this limitation.
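The trade-off above can be illustrated with a toy in-memory layout, assuming (hypothetically) that the avatar name is the key and the value bundles the UUID with the grid name. It is not the project's actual storage format: it only shows why name2key is a single lookup, why key2name degenerates into a full scan in this layout, and how a second avatar with the same name on another grid silently replaces the first.

```go
package main

import "fmt"

// Record is a hypothetical value layout: the UUID plus the grid name.
type Record struct {
	UUID string
	Grid string
}

// Store keys avatars by name only, mirroring the trade-off described above.
type Store map[string]Record

// Name2Key is a single map lookup: fast.
func (s Store) Name2Key(name string) (Record, bool) {
	r, ok := s[name]
	return r, ok
}

// Key2Name has no index to use, so it must scan every entry: slow.
func (s Store) Key2Name(uuid string) (string, bool) {
	for name, r := range s {
		if r.UUID == uuid {
			return name, true
		}
	}
	return "", false
}

func main() {
	s := Store{}
	// Placeholder names, UUIDs and grid names, for illustration only.
	s["Jane Doe"] = Record{UUID: "uuid-1", Grid: "Production"}
	// The same name on another grid silently replaces the first entry.
	s["Jane Doe"] = Record{UUID: "uuid-2", Grid: "MyOpenSimGrid"}

	r, _ := s.Name2Key("Jane Doe")
	fmt.Println(r.UUID, r.Grid, len(s))
	_, found := s.Key2Name("uuid-1")
	fmt.Println(found)
}
```

This prints `uuid-2 MyOpenSimGrid 1` and then `false`: the first avatar's UUID is no longer reachable at all, which is exactly the limitation to be mindful of.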