Cached resources database

From WhyNotWiki

What it is: This is my proposed solution to the link rot and Web archiving problems.

Aliases: A tool to automatically cache all externally linked sites, Web page snapshot tool

  • Resource type: HTML page, MP3, PDF, image (gif, png, jpg, ...), ...
  • URI
  • license => copyright, copyleft, etc.
  • Cached at (host + location on file system)

Rename to just "Resources"? (Caching might be optional.)
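
The fields listed above could be modeled as a single database table. What follows is only a minimal sketch using SQLite: the table name `resources`, the column names, and the helper functions `open_db` and `add_resource` are all assumptions made for illustration, not a settled schema. The cache location is split into two nullable columns so that a resource can be recorded without being cached.

```python
import sqlite3

# Hypothetical schema: one column per field in the list above.
# cached_host and cached_path stay NULL when a resource is not cached.
SCHEMA = """
CREATE TABLE IF NOT EXISTS resources (
    id            INTEGER PRIMARY KEY,
    resource_type TEXT NOT NULL,
    uri           TEXT NOT NULL,
    license       TEXT,
    cached_host   TEXT,
    cached_path   TEXT
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_resource(conn, resource_type, uri, license=None,
                 cached_host=None, cached_path=None):
    """Insert one resource record and return its row id."""
    cur = conn.execute(
        "INSERT INTO resources "
        "(resource_type, uri, license, cached_host, cached_path) "
        "VALUES (?, ?, ?, ?, ?)",
        (resource_type, uri, license, cached_host, cached_path),
    )
    conn.commit()
    return cur.lastrowid
```

For example, `add_resource(open_db(), "html", "http://www.zotero.org/")` records a page that has not been cached yet.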

Purpose / Motivation

Prevent disappearing web sites from affecting you

Sites get taken down, and their URLs stop being accessible. I want to keep access to the content through my cache even if the site ceases to exist.

This is similar to the Wayback Machine or Google's cache.

If a resource was moved or deleted, the cache lets me see how it used to appear.

If it was moved, I can use keywords from the cached copy of the old location to find (Google for) the new location.

Store snapshot of time-sensitive information

Useful for cases when you need to know how a page appeared at some point in time.

Example: membership agreements, terms of service. Even if they have a clause saying "we have the right to update our terms; the most current version of this page is what is actually binding", it still might be good to keep the original around.

But couldn't you just save the page? Am I trying to make an alternative to saving web pages locally? Well, it would be nice if I could, perhaps. But some "pages" (such as order confirmation pages) are the result of a transaction, and only the user agent that initiated the transaction can actually see the result and save the page. So in some cases, at least, the page download would have to start from the client's web browser.

Even if the page itself doesn't disappear, the specific thing about/on the page that you are referring to may not remain there

For example, if you are doing an analysis of the visual design of some of your favorite sites and are commenting about how a certain border here and color here and image here gives it a nice effect, all your thoughtful commenting goes out the window and becomes useless as soon as the thing you are commenting about gets a face lift (is redesigned).

Likewise, if you found an example of some language usage (a word or phrase, perhaps), there is no guarantee that the same wording will still be there when your readers later check out the site you linked to. The wording could have changed completely since then, and your nice example would essentially cease to exist, unless you cached it: took a snapshot of it at the point in time when you noticed the example.

Other snapshot software

★★★ Zotero

http://www.zotero.org/

Saves to a location like

file:///F:/User%20Data/Programs/Mozilla/Firefox/Profiles/xsgissta.default/zotero/storage/4298/top-10-firefox-extensions-to-improve-your-productivity.html

Included images, .js, and .css files are saved in that folder as well.

Works pretty well. Snapshots can be organized alongside regular bookmarks in the same Zotero tree.

Unfortunately, the snapshot and bookmark for the same resource are treated as two separate, independent objects. They aren't (and can't be) associated with each other. It looks like you can create a "web page" object which can contain any number of attachments, including links and snapshots.

See also

Related: This also goes hand in hand with a link checker.
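
To illustrate how a link checker could feed the cache, here is a rough sketch, not an existing tool. The function names `classify` and `check_link` and the verdict strings are made up for this example. The idea is that a link reported as "gone" or "unreachable" is one whose cached copy should be served, and a "moved" link is one whose stored URI may need updating.

```python
import urllib.error
import urllib.request


def classify(status):
    """Map an HTTP status code (or None for no response) to a coarse verdict."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if status in (404, 410):
        return "gone"
    return "error"


def check_link(url, timeout=10):
    """Request url and return (verdict, final_url).

    urllib follows redirects on its own, so a final URL that differs
    from the requested one is reported as "moved".
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            final_url = response.geturl()
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return classify(err.code), url
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return classify(None), url
    if final_url != url:
        return "moved", final_url
    return classify(status), final_url
```

Some servers reject HEAD requests, so a real checker would probably need to fall back to GET before declaring a link dead.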

To do

  • Create database schema
  • Decide how resources will be stored on the file system
  • Write scripts to cache/grab stuff
    • Crawler for my site. Cache all external links.
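
As a starting point for the last two items, here is a minimal sketch of the two pieces that need no network access: finding the external links on one of my pages, and deciding where a cached copy would live on the file system. Everything named here is an assumption for illustration: the `LinkCollector` class, the `external_links` and `cache_path` functions, the `cache` root directory, and the layout of one directory per host with a SHA-1 of the full URI as the file name.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html, page_url):
    """Return the unique absolute http(s) links that leave page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        is_web = parts.scheme in ("http", "https")
        if is_web and parts.netloc != own_host and absolute not in found:
            found.append(absolute)
    return found


def cache_path(uri, root="cache"):
    """Map a URI to <root>/<host>/<sha1 of the full URI> (hypothetical layout)."""
    digest = hashlib.sha1(uri.encode("utf-8")).hexdigest()
    return str(PurePosixPath(root) / urlparse(uri).netloc / digest)
```

A crawler would call `external_links` on each page of the site, download every returned URI, and write the response body to `cache_path(uri)`. Hashing the URI avoids having to escape query strings and long paths into valid file names, at the cost of file names that are not human-readable.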