New subject: [webpages-l] sitemap_m.html

1 Sep 1999


      Hmmm, I wonder why it sometimes misses files.  For example,
it finds the German Rheinland-Pfalz history page, but not the
English version:

Deutsche Genealogie: Rheinland-Pfalz (D) / German Genealogy:
Rheinland-Pfalz/Rhineland-Palatinate (E)
      Deutsche Genealogie: Rheinland-Pfalz, die Geschichte
...
...
...
...
...
...
English history would presumably go here<<<<<<<
        frenchzone.jpg
I'm not the type who would use this kind of resource to get
to know a site, but I suppose some people are.

Rick

At 07:24 PM 8/31/99 -0400, Jim Eggert wrote:
...
I've uploaded an improved site map to
 http://www2.genealogy.net/gene/tmp/up/pages/sitemap_m.html
This might be a complete catalog of all the clickable links on our
server.  (I haven't checked.)  This includes clickable images, pdf
files, and text files.
The program does a better job with the hierarchy because it requires
that subordinate members be in the same or subordinate directories.
It also knows a lot about exceptions:
o Files that should not be parsed, like team.html
o Files that look like language-pairs, but aren't
o Files that shouldn't be included at all (so far none)
o Files that should be pinned in the hierarchy, independent of the
  crawling mechanism (so far none)
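
A minimal sketch of the directory rule and the exception lists described above, in Python. The message does not say what language the actual program is written in or how it stores its exceptions, so every name here (NO_PARSE, EXCLUDE, PINNED, may_be_child, place, should_parse) and every example path is hypothetical; only team.html comes from the message.

```python
import posixpath

# Hypothetical exception lists. Only team.html is named in the message;
# the other two are empty because the message says "so far none".
NO_PARSE = {"team.html"}   # files that are listed but not parsed for links
EXCLUDE = set()            # files left out of the map entirely
PINNED = {}                # child path -> parent path it is forced under


def may_be_child(parent, child):
    """True if child is in the parent's directory or a directory below it."""
    parent_dir = posixpath.dirname(parent)
    child_dir = posixpath.dirname(child)
    if parent_dir == "":
        # A parent at the site root can own anything on the site.
        return True
    return child_dir == parent_dir or child_dir.startswith(parent_dir + "/")


def place(parent, child):
    """Return the parent to file child under, or None to skip the link."""
    if child in EXCLUDE:
        return None
    if child in PINNED:
        # Pinned files ignore whatever the crawl found.
        return PINNED[child]
    return parent if may_be_child(parent, child) else None


def should_parse(path):
    """False for files that must not be crawled for further links."""
    return posixpath.basename(path) not in NO_PARSE
```

With this rule a link from region/index.html to region/history.html is accepted as a child, while a link from region/index.html back up to index.html is not, which keeps "home" links from scrambling the hierarchy.
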
Titles and hierarchy are still derived by crawling a local copy of the
site.  So if you don't like the titles, change them in the site!
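
For illustration, one way a title could be pulled from a local copy of a page, using Python's standard html.parser module. This is only a sketch under the assumption that the title shown in the map is the page's TITLE element; the names TitleParser and page_title are made up here and are not from the actual program.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <TITLE> matches too.
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def page_title(html_text):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

Because the title is read straight from the page, editing the TITLE element in the site and rerunning the crawl is all it takes to change an entry in the map.
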
Feedback is encouraged.  So far I've had almost none.
-- 
=Jim Eggert   EggertJ@LL.mit.edu

Re: [webpages-l] sitemap_m.html

Richard Heli

Jim Eggert
