Scattered records

From WhyNotWiki


Definition: This is the problem where the actual records of data for a database table are "scattered"/distributed across several locations. The table itself is actually centralized, but its contents appear to be spread out and may be displayed or edited from a number of different locations/pages.

I have encountered this problem most frequently in my wiki.

Examples:

Two observations:

  • This information is highly structured; therefore, I'd like it to be stored in a database table, with well-defined columns.
  • This information is highly reusable, and is more useful when aggregated; therefore, I'd like it to be stored centrally.

Also, by solving this problem, you can (hopefully) make a big dent in the amount of yucky duplication you would otherwise need: store the data/records in one table and just distribute the display and editing of those records (a subset of them) across many pages, as necessary.


How it's a problem with MediaWiki

Case study: Citations

Wikipedia came up with some great templates to let you make citations in a very structured way — Template:Cite web, for example.

It also provides a way (through the __ extension) to reuse references within a page...

<ref name="whatever">citation</ref>
<ref name="whatever" />

...which is simply terrific!

However, this mechanism only allows reuse of those "records" on that single page. You can't reuse the reference / citation information that you had on page A over on page B.

In my experience, however, I fairly often want to reuse the same source reference across multiple articles. (I guess that's why I came up with the each-source/reference-gets-its-own-page convention.)

This is a great example of the Scattered records problem, illustrating why it would be nice to be able to use a "scattered record" on multiple pages.
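To make the scoping limitation concrete, here is a minimal Python sketch that resolves `<ref name="...">` tags as described above: a named definition is visible only on the page where it appears. The optional `shared` dictionary stands in for a hypothetical central citation store. The regex is a simplification (double-quoted names, no nested tags) and is not how the wiki software actually parses references.

```python
import re

# Matches <ref name="x">body</ref> (a definition) or <ref name="x" /> (a reuse).
# Simplified: double-quoted names only, no nested tags.
REF_RE = re.compile(r'<ref name="([^"]+)"\s*(?:/>|>(.*?)</ref>)', re.S)


def resolve_refs(wikitext, shared=None):
    """Return the citation each <ref> tag on one page resolves to, in order.

    Definitions are collected per page only. If `shared` (a hypothetical
    central store) is given, names not defined on this page are looked up
    there; otherwise an unresolved reuse yields None.
    """
    local = {}
    out = []
    for name, body in REF_RE.findall(wikitext):
        if body:  # a definition on this page
            local[name] = body
            out.append(body)
        elif name in local:  # reuse of a definition from this page
            out.append(local[name])
        elif shared is not None and name in shared:
            out.append(shared[name])
        else:
            out.append(None)  # page B cannot see page A's definition
    return out
```

Reusing a name on a second page returns `None` unless a shared store supplies it, which is exactly the gap described above.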

Case study: Aliases

The most useful place to see/use/have aliases is on the article/page itself that has the aliases.

  • This makes the article more findable (you can search for it via any of its aliases/keywords, which most likely will also be redirects to the official title/name of the article)

But it would be great if those records were also stored/displayed in some central list, such as what the Naming conventions page is right now.

  • This allows you to look for and detect patterns, which you may decide to extract into new rules/conventions
  • And it allows you to check for compliance with existing conventions
  • Audit yourself to see how consistent you've been; correct deviations
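A minimal sketch of such a central list, assuming each page carries an "Aliases:" line like the one at the bottom of this page, with multiple aliases separated by commas (the separator is an assumption). It builds an alias-to-pages index and flags aliases claimed by more than one page, one kind of inconsistency the audit above would look for.

```python
import re
from collections import defaultdict

# Assumed page convention: a line such as "Aliases: Distributed records".
# The comma separator for multiple aliases is an assumption.
ALIASES_RE = re.compile(r"^Aliases:[ \t]*(.+)$", re.M)


def collect_aliases(pages):
    """Build a central {alias: [titles]} index from {title: text} pages."""
    index = defaultdict(list)
    for title, text in pages.items():
        for line in ALIASES_RE.findall(text):
            for alias in line.split(","):
                alias = alias.strip()
                if alias:
                    index[alias].append(title)
    return dict(index)


def conflicts(index):
    """Aliases claimed by more than one page (i.e. ambiguous redirects)."""
    return {alias: titles for alias, titles in index.items() if len(titles) > 1}
```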


How I've tried to work around it

To enforce structure: I've made a lot of templates, which are really functionally equivalent to a record in a table: they allow me to unambiguously specify which values go in which columns, and I can even make certain columns "required" to some extent (although MediaWiki's template system doesn't have any true support for error messages).
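To show what "functionally equivalent to a record" means, here is a minimal Python sketch that reads a flat template call as a record and reports missing required columns, the kind of error message the template system itself lacks. `url`, `title` and `accessdate` are Cite web parameters, but treating them as required here is an assumption; the parser ignores nested templates, links and positional parameters.

```python
def parse_template(call):
    """Split a flat '{{Name|key=value|...}}' call into (name, fields).

    Simplified: assumes no nested templates, links, or '|' inside values,
    and skips positional (unnamed) parameters.
    """
    inner = call.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    name, *parts = inner[2:-2].split("|")
    fields = {}
    for part in parts:
        key, sep, value = part.partition("=")  # split on the first '=' only
        if sep:
            fields[key.strip()] = value.strip()
    return name.strip(), fields


def missing_required(fields, required):
    """Return the required columns that are absent or blank, sorted."""
    return sorted(col for col in required if not fields.get(col))
```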

To enforce reusability:

  • I've tried to create smaller objects/pages which I can then transclude or link to as necessary on other pages.
  • The page itself then is the equivalent of a record in a table.

Combining these ideas, I can create and maintain data that is both structured and reusable — I just have to create a new page for every record, and on that page, make sure I use the right template.

In short, I'm using MediaWiki as a sort of "database" for various structured information. The article title acts as a unique primary key, which can be used to link to "records" or to "query" them from the database (by transcluding them)...
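A toy Python model of this workaround, with the page title as primary key and transclusion as the lookup. The title normalization (underscores treated as spaces, first letter capitalized) is a simplified version of MediaWiki's default title handling, and `PageStore` is a hypothetical name.

```python
class PageStore:
    """Each page is one 'record', keyed by its (normalized) title."""

    def __init__(self):
        self._pages = {}

    @staticmethod
    def key(title):
        # Simplified title rules: '_' == ' ', case-insensitive first letter.
        t = title.replace("_", " ").strip()
        return t[:1].upper() + t[1:]

    def save(self, title, text):
        self._pages[self.key(title)] = text

    def transclude(self, title):
        """'Query' a record by primary key, as {{:Title}} would; None if absent."""
        return self._pages.get(self.key(title))
```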

Problems

But that's really not ideal. It's just a workaround until I can move the data to a truly structured solution, such as an RDBMS.

It's a pain to have to create a new page every time I want a new record. And a pain to try to remember which template to use, and how to use it...

It's also a pain to have to navigate to the page, and then click Edit, just to edit a record. (Though I've found a way to work around that by using include with edit link, which adds a direct link to the edit view...)

Inefficient, Unstructured

Internally, all the structured data is stored in a block of wikitext, which must be parsed every time it is needed. All columns of my virtual "table" end up stored in a single column of the "text" (or whatever) table.

That can't be great for efficiency or searching. And it's certainly not very structured (internally).
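A small illustration of the cost, using Python's built-in `sqlite3`: with records embedded in wikitext, a query means loading and regex-parsing every page, whereas with one column per field the database can filter (and use an index) directly. The page texts, the `citations` table and its columns are made up for the example.

```python
import re
import sqlite3

# Workaround: each "record" lives inside a page's wikitext blob (toy data).
pages = {
    "Source A": "{{Cite web|url=http://a.example|publisher=ACME}}",
    "Source B": "{{Cite web|url=http://b.example|publisher=Other}}",
}


def blob_query(publisher):
    """Scan and parse every page to find matching records."""
    hits = []
    for title, text in pages.items():
        m = re.search(r"\|publisher=([^|}]*)", text)
        if m and m.group(1) == publisher:
            hits.append(title)
    return hits


# Structured alternative: one column per field, indexed.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE citations (title TEXT PRIMARY KEY, url TEXT, publisher TEXT)")
db.execute("CREATE INDEX citations_publisher ON citations (publisher)")
db.executemany(
    "INSERT INTO citations VALUES (?, ?, ?)",
    [("Source A", "http://a.example", "ACME"), ("Source B", "http://b.example", "Other")],
)


def table_query(publisher):
    """Let the database filter on the column directly."""
    rows = db.execute("SELECT title FROM citations WHERE publisher = ?", (publisher,))
    return [r[0] for r in rows]
```

Both return the same answer; the difference is that `blob_query` has to touch and parse every page every time.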

Limited presentation options. Unfortunately, the "record" (page -- actually, the template, probably) has the presentation logic embedded in it, so it can only be presented in one way... Not very flexible. Ideally, you could create many different views for the same data model...

And although MediaWiki lets you "put things into" the database in a structured manner (via templates), it doesn't let you pull that information out in a structured/raw/custom way (only in a pre-formatted, rendered-template way).
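One way to picture the alternative, sketched in Python under the assumption that a record is a plain data object: the model holds only fields, each view is a separate function over it, and the raw values stay available. `Citation` and the view names are hypothetical; the output strings use MediaWiki external-link and table-row syntax.

```python
from dataclasses import dataclass, asdict


@dataclass
class Citation:
    """The data model: fields only, no presentation (names are illustrative)."""
    title: str
    url: str
    publisher: str = ""


# Each view is independent, so adding one never touches the data or other views.
def as_wikilink(c: Citation) -> str:
    return f"[{c.url} {c.title}]"


def as_footnote(c: Citation) -> str:
    return f"{c.title}. {c.publisher}. {c.url}" if c.publisher else f"{c.title}. {c.url}"


def as_table_row(c: Citation) -> str:
    return f"| {c.title} || {c.publisher} || {c.url}"


def as_raw(c: Citation) -> dict:
    """The structured/raw access that rendered templates don't offer."""
    return asdict(c)
```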


[Solution (category)]: How I hope to solve this problem with Wrinkle

The main difference I hope to have in my new wiki system is that these "scattered records" would not actually be stored in the wikitext itself but in a separate, specialized table. In other words, the data would not be scattered -- they would be stored centrally.

Whenever you felt like something should be structured, and you wanted to have multiple records with the same structure, you would create a new, specialized table for that data (easily and intuitively, of course—see Ad hoc table creation, Pliable table schemas).
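A minimal sketch of ad hoc table creation and a "pliable" schema using Python's `sqlite3`. It illustrates the idea only and is not how Wrinkle or ActiveScaffold would do it; the helper names are hypothetical. Because table and column names cannot be bound as SQL parameters, they are validated against a simple identifier pattern first.

```python
import re
import sqlite3

# SQL identifiers can't be passed as bound parameters, so validate them.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name):
    if not IDENT.fullmatch(name):
        raise ValueError(f"bad identifier: {name!r}")
    return name


def create_table(db, table, fields):
    """Create a table on the fly: a surrogate integer key plus TEXT columns."""
    defs = ["id INTEGER PRIMARY KEY"] + [f"{_check(col)} TEXT" for col in fields]
    db.execute(f"CREATE TABLE {_check(table)} ({', '.join(defs)})")


def add_column(db, table, column):
    """'Pliable' schema: add a column later; existing rows get NULL."""
    db.execute(f"ALTER TABLE {_check(table)} ADD COLUMN {_check(column)} TEXT")


def column_names(db, table):
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    return [row[1] for row in db.execute(f"PRAGMA table_info({_check(table)})")]
```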

You would always be free to use the table view for that table (built using ActiveScaffold) if you wanted to see a list of all the records in the table, search for records, or modify a bunch of records at a time.

But you would also often want to just use an individual record from that table in a page. For instance, to make a reference to/citation from some source, you would refer to that record in the database. Internally, of course, you would link to that record via its primary key. But we don't want the user to really be exposed to the primary keys or accidentally choose the wrong record because they mistyped 1007 instead of 10007...
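A sketch of one way to keep primary keys out of the user's hands, assuming records can be looked up by a human-readable title: the user types a title (or a prefix of one), and the lookup either finds exactly one record or refuses and says why. The function name and matching rules are hypothetical.

```python
def resolve(records, query):
    """Map what the user typed to a primary key, or explain why it can't.

    `records` is {id: title}. A unique case-insensitive exact match wins;
    otherwise a unique prefix match is accepted. Ambiguity is reported,
    never guessed, so there is no silent 1007-vs-10007 slip.
    """
    q = query.strip().lower()
    exact = [rid for rid, t in records.items() if t.lower() == q]
    if len(exact) == 1:
        return exact[0]
    candidates = exact or [rid for rid, t in records.items() if t.lower().startswith(q)]
    if not candidates:
        raise LookupError(f"no record matches {query!r}")
    if len(candidates) > 1:
        titles = sorted(records[rid] for rid in candidates)
        raise LookupError(f"{query!r} is ambiguous: {titles}")
    return candidates[0]
```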

So the challenge then becomes creating the impression/illusion that you still have scattered records (even though they're actually centralized, not scattered).

We want to create the impression that the data is duplicated on every page, when in reality (thankfully) there is no duplication.
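A minimal Python sketch of that illusion, assuming a hypothetical `{{record:<table>:<id>}}` marker in the wikitext: pages store only the key, and the record is expanded at render time, so one edit to the central copy shows up on every page that embeds it.

```python
import re

# Hypothetical wikitext marker for an embedded record: {{record:<table>:<id>}}
RECORD_RE = re.compile(r"\{\{record:(\w+):(\d+)\}\}")

# The single central copy of each record: {table: {id: fields}}.
db = {"citations": {7: {"title": "Example", "url": "http://example.com"}}}


def render_record(table, rid):
    rec = db[table][rid]  # raises KeyError for an unknown record
    return f"[{rec['url']} {rec['title']}]"


def render_page(wikitext):
    """Expand record markers at view time; the stored page keeps only keys."""
    return RECORD_RE.sub(lambda m: render_record(m.group(1), int(m.group(2))), wikitext)
```

Changing `db["citations"][7]["title"]` once changes the rendered output of every page that contains `{{record:citations:7}}`.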

To create that impression, we might do some of the following...

Where the record appears on the page, have a button the user can click that causes the record to be immediately editable -- inline, on the fly, using Ajax.
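The server side of such an inline edit might look like the following sketch (Python with `sqlite3`; the table, the column whitelist and `inline_update` are all hypothetical). It updates a single cell and returns the re-rendered fragment for the client to swap into the page.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE citations (id INTEGER PRIMARY KEY, title TEXT, url TEXT)")
db.execute("INSERT INTO citations VALUES (7, 'Example', 'http://example.com')")
db.commit()

# Hypothetical whitelist of columns the inline editor may change. Column
# names can't be bound as SQL parameters, so they are checked here instead.
EDITABLE = {"title", "url"}


def inline_update(rid, field, value):
    """Apply one inline edit and return the record's re-rendered markup.

    An Ajax request would carry (rid, field, value); the client would swap
    the returned fragment into the page in place of the old one.
    """
    if field not in EDITABLE:
        raise PermissionError(f"citations.{field} is not editable")
    with db:  # commits on success, rolls back if an exception escapes
        cur = db.execute(f"UPDATE citations SET {field} = ? WHERE id = ?", (value, rid))
    if cur.rowcount != 1:
        raise LookupError(f"no citation with id {rid}")
    title, url = db.execute(
        "SELECT title, url FROM citations WHERE id = ?", (rid,)
    ).fetchone()
    return f"[{url} {title}]"
```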

When they edit the full page (if we even allow that any more), it should also show ActiveScaffold forms for all "included" records that appear on the page.

Or (and I don't like this idea as much) replace the reference to the record's ID in the wikitext with the equivalent of a MediaWiki template, so you'd have the values for all the columns of that record right there in the wikitext where you can edit them. Then, when you saved the page, it would check all embedded records for changes, update the associated tables if necessary, and replace that part of the wikitext again with the simple reference to the primary key, as it was before you opened the page for editing.
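A sketch of that round trip, assuming the same kind of hypothetical `{{record:<table>:<id>}}` marker: before editing, each marker is expanded to carry its field values; on save, changed values are written back to the central copy and each marker collapses to the bare key. Values containing `|` or `}` are not handled.

```python
import re

# Hypothetical marker; the optional "|key=value..." tail appears only while editing.
MARKER = re.compile(r"\{\{record:(\w+):(\d+)((?:\|[^|}]*)*)\}\}")

db = {"citations": {7: {"title": "Example", "url": "http://example.com"}}}


def expand_for_edit(wikitext):
    """Before editing: inline every embedded record's current values."""
    def expand(m):
        table, rid = m.group(1), int(m.group(2))
        fields = "".join(f"|{k}={v}" for k, v in db[table][rid].items())
        return f"{{{{record:{table}:{rid}{fields}}}}}"
    return MARKER.sub(expand, wikitext)


def save_edit(wikitext):
    """On save: write changed values back, then collapse markers to bare keys.

    Returns (collapsed_wikitext, changes), where changes lists
    (table, id, field, old, new) for every value that differed.
    """
    changes = []

    def collapse(m):
        table, rid = m.group(1), int(m.group(2))
        record = db[table][rid]
        for part in filter(None, m.group(3).split("|")):
            key, _, value = part.partition("=")
            if key in record and record[key] != value:
                changes.append((table, rid, key, record[key], value))
                record[key] = value
        return f"{{{{record:{table}:{rid}}}}}"

    return MARKER.sub(collapse, wikitext), changes
```

The stored wikitext ends up byte-for-byte as it was before editing; only the central table changes.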


See also: My databases

Aliases: Distributed records
