extracting sequences of hyperlink labels from ODP

sperugin · Feb 19, 2004, 09:40 PM

Hello,

I am interested in collecting sequences of hyperlink labels (the anchor text
of the a href links) for various sub-branches of ODP, where a sequence is the
path from the root of ODP to a leaf (an indexed external webpage). These
sequences are essentially the breadcrumbs at the top of each page in ODP.

For example, the Arts sub-branch contains the following selected sequences:

Arts: Animation: Cartoons: Campaigns and Petitions: url to CybertOOn's Cartoon Campaign
Arts: Animation: Cartoons: Chats and Forums: url to Cartoon World
Arts: Animation: Cartoons: Chats and Forums: url to Cartoons
Arts: Animation: Cartoons: Chats and Forums: url to Toon Zone Forums

Each sequence leads to a unique webpage (URL).
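
To make the target format concrete, here is a minimal sketch of how one such sequence could be built. It assumes (my assumption, not something checked against the dump) that each category is identified by a slash-separated id such as Top/Arts/Animation/Cartoons, with underscores standing in for spaces. The function name `breadcrumb` is just illustrative.

```python
def breadcrumb(topic_id: str, page_title: str) -> str:
    """Format a category id plus a page title as 'A: B: C: url to Title'."""
    parts = topic_id.split("/")
    # Drop the assumed root marker so the sequence starts at the sub-branch.
    if parts and parts[0] == "Top":
        parts = parts[1:]
    # Assumed convention: underscores in ids stand for spaces in labels.
    labels = [p.replace("_", " ") for p in parts]
    return ": ".join(labels + [f"url to {page_title}"])


print(breadcrumb("Top/Arts/Animation/Cartoons/Chats_and_Forums", "Cartoon World"))
```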

The difficult part is getting the paths that involve crosslinks. For example,
Games: Coin-Op: Jukeboxes: Retailers is one such path, where Jukeboxes
and Retailers are crosslinks (marked with an '@').

For example, I'd like to extract the following sequences, involving
crosslinks, from the Arts branch:

"Arts: Antiques: Directories: Art: url to Affordable Antique Art.com"
(when Affordable Antique Art.com really lives in the Recreation sub-branch
under Recreation: Antiques: Directories: Art).

"Arts: Dance: Disabled: url to Adaptive Dancing, Inc."
(when Adaptive Dancing, Inc. really lives in the Society sub-branch
under Arts: Performing Arts: Dance).

I'm interested in collecting such sequences, on the order of thousands,
in selected sub-branches of ODP. Can I extract these sequences from the
RDF structure dump with slight modifications to the POD scripts?
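
Roughly, what I have in mind is the traversal sketched below. This is only a sketch under stated assumptions: the three dictionaries are a hypothetical in-memory form of the dump (ordinary subcategories, crosslinks with their displayed label and target category, and page titles per category), and the toy data is made up to mirror the Antiques example above. It has not been run against the real RDF files.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple

# Hypothetical in-memory form of the dump; the contents are toy data.
NARROW: Dict[str, List[str]] = {  # category -> ordinary subcategories
    "Top/Recreation/Antiques": ["Top/Recreation/Antiques/Directories"],
    "Top/Recreation/Antiques/Directories": ["Top/Recreation/Antiques/Directories/Art"],
}
SYMBOLIC: Dict[str, List[Tuple[str, str]]] = {  # category -> (label, target) crosslinks
    "Top/Arts": [("Antiques", "Top/Recreation/Antiques")],
    "Top/Recreation/Antiques": [("Arts", "Top/Arts")],  # points back: a cycle
}
LINKS: Dict[str, List[str]] = {  # category -> titles of indexed pages
    "Top/Recreation/Antiques/Directories/Art": ["Affordable Antique Art.com"],
}


def label_of(topic: str) -> str:
    """Displayed label of a category: last id component, underscores as spaces."""
    return topic.rsplit("/", 1)[-1].replace("_", " ")


def sequences(topic: str, labels: List[str], seen: FrozenSet[str],
              max_depth: int = 10) -> Iterator[str]:
    """Yield label sequences below `topic`, following crosslinks as well."""
    for title in LINKS.get(topic, []):
        yield ": ".join(labels + [f"url to {title}"])
    if len(labels) >= max_depth:
        return
    children = [(label_of(c), c) for c in NARROW.get(topic, [])]
    children += SYMBOLIC.get(topic, [])
    for label, target in children:
        if target in seen:  # crosslinks can loop back, so skip visited categories
            continue
        yield from sequences(target, labels + [label], seen | {target}, max_depth)


for seq in sequences("Top/Arts", ["Arts"], frozenset({"Top/Arts"})):
    print(seq)
```

The `seen` set and the depth cap are there because crosslinks turn the category tree into a graph, so a plain recursive walk could loop forever or blow up in size.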

Thank You and Best Regards,
Saverio
