Hello,
I am interested in collecting sequences of hyperlink labels (the anchor text of the a href's) for various sub-branches of ODP, where a sequence is the path from the root of ODP to a leaf (an indexed external webpage). These sequences are essentially the breadcrumbs at the top of each page in ODP. For example, the Arts sub-branch contains the following selected sequences:

Arts: Animation: Cartoons: Campaigns and Petitions: url to CybertOOn's Cartoon Campaign
Arts: Animation: Cartoons: Chats and Forums: url to Cartoon World
Arts: Animation: Cartoons: Chats and Forums: url to Cartoons
Arts: Animation: Cartoons: Chats and Forums: url to Toon Zone Forums

Each sequence leads to a unique webpage (URL).

The difficult part is getting the paths that involve crosslinks. One such path is Games: Coin-Op: Jukeboxes: Retailers, where Jukeboxes and Retailers are crosslinks (prefixed with an '@'). For example, I'd like to extract the following sequences, involving crosslinks, from the Arts branch:

"Arts: Antiques: Directories: Art: url to Affordable Antique Art.com" (when Affordable Antique Art.com really lives in the Recreation sub-branch under Recreation: Antiques: Directories: Art).
"Arts: Dance: Disabled: url to Adaptive Dancing, Inc." (when Adaptive Dancing, Inc. really lives in the Society sub-branch under Arts: Performing Arts: Dance).

I'd like to collect thousands of such sequences from selected sub-branches of ODP. Can I extract them from the RDF structure dump with slight modifications to the POD scripts?

Thank you and best regards,
Saverio
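To make the request concrete, here is a minimal Python sketch of the traversal I have in mind. It does not parse the dump itself: it assumes the structure and content dumps have already been loaded into two plain dictionaries, `children` (category id to a list of `(label, target id)` pairs, where a crosslink is simply a pair whose target lives in another branch) and `pages` (category id to a list of `(title, url)` pairs). Those names, the function `enumerate_paths`, the `max_depth` cutoff, and the example.com URL are all my own placeholders, not anything from the ODP tools.

```python
def enumerate_paths(root, children, pages, max_depth=10):
    """Yield (labels, url) for every page reachable from `root`.

    children: dict mapping category id -> list of (label, target id).
              A crosslink is just an entry whose target is in another branch.
    pages:    dict mapping category id -> list of (title, url).
    max_depth: hypothetical cutoff on the number of category labels in a path.
    """
    root_label = root.split("/")[-1]
    # Each stack entry: (category id, labels so far, categories already on this path)
    stack = [(root, [root_label], frozenset([root]))]
    while stack:
        cat, labels, seen = stack.pop()
        for title, url in pages.get(cat, []):
            yield labels + [title], url
        if len(labels) >= max_depth:
            continue
        for label, target in children.get(cat, []):
            if target in seen:
                # Crosslinks can point back up the tree; skip to avoid cycles.
                continue
            stack.append((target, labels + [label], seen | {target}))


# Toy data modelled on the Antiques example above (the URL is a placeholder).
children = {
    "Top/Arts": [("Antiques", "Top/Recreation/Antiques")],  # crosslink
    "Top/Recreation/Antiques": [
        ("Directories", "Top/Recreation/Antiques/Directories"),
    ],
    "Top/Recreation/Antiques/Directories": [
        ("Art", "Top/Recreation/Antiques/Directories/Art"),
    ],
}
pages = {
    "Top/Recreation/Antiques/Directories/Art": [
        ("Affordable Antique Art.com", "http://example.com/antique-art"),
    ],
}

for labels, url in enumerate_paths("Top/Arts", children, pages):
    print(": ".join(labels), "->", url)
```

The per-path `seen` set is there because following crosslinks turns the tree into a general graph, so the same category can legitimately appear on many different paths but must not repeat within one.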