Software projects database

From WhyNotWiki

Currently this entire article is used for metadata only. Instances/records of software projects can be found in the Software article and Category:Software projects.


Software projects (Category)


Article Metadata:

Attributes/Columns

  • Name
  • Project organization
    • Project creation date
    • People
      • Authors/contributors
      • Current developers
      • Maintainers/Admins
    • Are they a company/foundation, or do they have one? If so, details about it.
  • Story
    • What motivated them to write it
  • Sense of humor?
  • Target environment(s)
    • Operating system platform (Mac, POSIX, OS-independent, ...)
    • User interface (web, desktop application, command-line, e-mail-based, ...)
  • Project implementation decisions/tools
    • Implementation language (one or more of:)
    • Persistence technology/Database support (one or more of:)
  • Development/project links/URLs/etc.
    • Announcement URL (for latest public release) (often a blog post)
    • Changelog URL (?) (for latest public release)
    • Documentation
      • There can be many of these. Each can be labeled according to what type of documentation it is.
      • Getting started / Installation instructions
      • RDoc
      • Readme (allows us, among other things, to automate the comparison of Readmes, look for sections that are common to most projects, etc.)
      • Examples
      • Online demonstration
      • Wiki
      • Mailing list(s)
      • IRC channel / Instant messaging
      • ...
    • Homepage: the entry point at which they want users / potential users to come to first
    • "Project page"/"Development home" (can have multiple, although probably not common) -- where they keep track of issues, etc.
      • development_url
        • rubyforge_project_url
        • sourceforge_project_url
    • Project documentation rating and comments
    • Issue tracker
      • URL
      • Classification / software used: {Bugzilla, Mantis, Trac, etc.}
  • Source
    • Repository/source URL for this project (svn.whatever.com...) (really, there should only be one of these, but there may be multiple). Preference is given to actual repository URLs, but if they simply have a browsable directory view available on the Web or a ViewVC app exposed, we should link to it as well.
    • Package URL/directions: for example 'gem install foo' or 'rapt http://path/to/plugin'
    • Meat: a direct link (or multiple) into the heart of the source code to where the most relevant, core, interesting implementation is
  • Intellectual Property
    • Copyright holders
    • License
    • Patents (patented ideas used by this software)
  • Relationships to other SoftwareProject records
    • Depends on
    • Based on (ambiguous unless clarified)
      • Derived from
      • Uses / Integrates (libraries)
    • Incompatible with ___ (due to monkey patching or other incompatibility silliness)
    • Compatible with (not a comprehensive list! But if people are likely to ask whether it's compatible with something, like ActiveRecord, then we ought to answer that question preemptively)
    • Forked from (rare)
    • Deprecated by (rare)
    • Competes with (??)
    • Related projects [This is a very weak and ambiguous association. Use Topic or Owner or other attribute to unambiguously show "relatedness".]
  • Response/Acceptance by users / Popularity / Analysis/criticism of (by users)
    • Who's using it (each user can vote "have used it"/"am using it"?)
    • Testimonials
    • Criticisms
    • Rating
      • We can start with having just one rating field and make it aggregate/average other rating fields at some later point in time...
    • Size: ranging from trivial ("I could write this from scratch in 3 hours") to huge/substantial (like Rails)
    • Bloat: somewhat related to size, but huge software projects aren't necessarily bloated; this is partly a measure of how well designed the software is (maybe "well-designedness" would be a better name for this rating?)
  • Associations (low precedence category)
    • has_many :issues
    • has_and_belongs_to_many :features
  • Status/maturity/readiness
    • Readiness/Status: beta, stable, mature, etc.
    • Current, last several stable public releases (version, date)
  • Activity/vitality
    • RubyForge activity percentile
    • Initial public release (version, date)
    • Maintainer (nil if it is unmaintained/abandoned)
    • Number of commits / number of adds (RubyForge has)
  • Natural language
  • Degree of open-sourcedness/freeness
    • Openness (hosted only, closed source, etc.)
    • Cost
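
The "Relationships to other SoftwareProject records" attributes could be stored as typed, directed edges rather than one vague "related projects" field. Here is a minimal plain-Ruby sketch of that idea, with no database involved; the class names, the relationship symbols, and the project names in the example are all hypothetical. Recording the kind of each edge is what makes inverse questions like "what depends on ActiveRecord?" answerable.

```ruby
# Relationship kinds taken from the "Relationships to other SoftwareProject
# records" list; the symbol names themselves are hypothetical.
RELATIONSHIP_KINDS = %i[depends_on derived_from uses incompatible_with
                        compatible_with forked_from deprecated_by
                        competes_with].freeze

# A typed, directed edge: from --kind--> to.
Relationship = Struct.new(:from, :kind, :to) do
  def initialize(from, kind, to)
    unless RELATIONSHIP_KINDS.include?(kind)
      raise ArgumentError, "unknown relationship kind: #{kind.inspect}"
    end
    super
  end
end

# In-memory stand-in for a relationships table.
class ProjectGraph
  def initialize
    @edges = []
  end

  # Records an edge and returns self.
  def relate(from, kind, to)
    @edges << Relationship.new(from, kind, to)
    self
  end

  # Outgoing edges of one kind, e.g. everything `from` depends on.
  def targets(from, kind)
    @edges.select { |e| e.from == from && e.kind == kind }.map(&:to)
  end

  # Incoming edges of one kind, e.g. everything that depends on `to`.
  def sources(kind, to)
    @edges.select { |e| e.kind == kind && e.to == to }.map(&:from)
  end
end

graph = ProjectGraph.new
graph.relate("my_plugin", :depends_on, "ActiveRecord")
graph.relate("other_plugin", :depends_on, "ActiveRecord")
graph.relate("other_plugin", :incompatible_with, "my_plugin")
p graph.sources(:depends_on, "ActiveRecord") # prints ["my_plugin", "other_plugin"]
```

Rejecting unknown kinds up front is deliberate: it keeps the weak "related to" association from creeping back in under another name.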


Virtual/calculated attributes

  • Project implementation decisions/tools
    • What they use for revision control (Subversion, CVS, etc.). Can we just look for "svn" or "cvs" in source_url? No, the system can't always be determined from the URL. Sample darcs repository: http://chneukirchen.org/repos/testspec/.
  • Activity/status/maturity/readiness/vitality
    • Activity percentile
    • Team size (developers.size)
    • Commit frequency
    • Release frequency:
      • They may delete old releases from their release list, so this data will have to be collected and stored each time we poll/visit the releases page.
    • Things based on project creation date
      • project age (now - project creation date)
      • software age (latest release - project creation date/initial release)
      • credibility of maturity claims ((software age < 1 year) && (complexity == high) && (maturity == mature)? don't believe it)
    • Chance of success (a guesstimate based on many variables, including number of variables, external funding/sponsorship, etc.; see article that mentioned "success algorithm")
  • Openness
  • Popularity
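
Several of these virtual attributes are simple computations over stored fields. A rough sketch, assuming a hypothetical ProjectFacts record with the fields shown (none of these names come from an existing schema). guess_vcs is only a guess: as noted above, the URL often doesn't reveal the revision control system (the darcs sample is an ordinary HTTP URL), so it returns nil for "unknown". "Software age" assumes the start point is the earlier of the creation date and the initial release, and the credibility check encodes the rule above literally, with "< 1 year" taken as fewer than 365 days.

```ruby
require "date"

# Hypothetical record holding just the stored fields these calculations use.
ProjectFacts = Struct.new(:source_url, :created_on, :initial_release_on,
                          :latest_release_on, :release_dates, :developers,
                          :complexity, :maturity, keyword_init: true)

# Best-effort guess from the URL; nil means "unknown", not "no VCS".
def guess_vcs(url)
  case url
  when %r{\Asvn(\+ssh)?://}, %r{/svn(/|\z)}, /\bsvn\./ then :subversion
  when /\A:(pserver|ext):/, /\bcvs\./ then :cvs
  end
end

# Team size (developers.size).
def team_size(facts)
  facts.developers.size
end

# Project age: now - project creation date, in days.
def project_age_days(facts, today = Date.today)
  (today - facts.created_on).to_i
end

# Software age: latest release - (creation date or initial release,
# whichever is earlier and known), in days.
def software_age_days(facts)
  start = [facts.created_on, facts.initial_release_on].compact.min
  (facts.latest_release_on - start).to_i
end

# Mean days between releases. release_dates must be accumulated on every
# poll, because projects may delete old releases from their release list.
def mean_release_interval_days(facts)
  dates = facts.release_dates.sort
  return nil if dates.size < 2
  gaps = dates.each_cons(2).map { |a, b| (b - a).to_i }
  gaps.sum.fdiv(gaps.size)
end

# (software age < 1 year) && (complexity == high) && (maturity == mature)?
# Then don't believe the maturity claim.
def maturity_claim_credible?(facts)
  !(software_age_days(facts) < 365 &&
    facts.complexity == :high &&
    facts.maturity == :mature)
end
```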


Details about attributes

environment

This will probably end up being multiple columns.

environment is used pretty loosely here. It tries to answer the question "where can this software be used". So any qualifications related to that question are fair game here. Aliases: scope, context, usable_in, ... .

Possible values:
  • library vs. application
  • to be used with Rake
  • Rails plugin
  • Desktop/GUI vs. "Console (Text Based)"

"openness" column

  • "Hosted only" (the epitome of closed source)
  • "Binaries available only" (closed source, "freeware", "shareware")
  • "Source available" (can be easily inferred from license)
    • "Free Software" if its license is one the FSF recognizes as a free software license
    • "Open Source" if its license is approved by the Open Source Initiative
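
Since "Source available" and its Free Software / Open Source refinements can be inferred from the license, the column could be computed rather than typed in. A small sketch, assuming each license record carries two booleans (fsf_free, osi_approved) filled in from the FSF's and OSI's published license lists; the License struct and the distribution symbols are hypothetical.

```ruby
# Hypothetical license record; the two flags would come from the FSF's list
# of free software licenses and the OSI's list of approved licenses.
License = Struct.new(:name, :fsf_free, :osi_approved, keyword_init: true)

# distribution is one of :hosted_only, :binaries_only, :source_available.
# Returns every openness label that applies.
def openness_labels(distribution, license = nil)
  case distribution
  when :hosted_only   then ["Hosted only"]
  when :binaries_only then ["Binaries available only"]
  when :source_available
    labels = ["Source available"]
    labels << "Free Software" if license&.fsf_free
    labels << "Open Source" if license&.osi_approved
    labels
  else
    raise ArgumentError, "unknown distribution: #{distribution.inspect}"
  end
end

gpl2 = License.new(name: "GNU GPL v2", fsf_free: true, osi_approved: true)
p openness_labels(:source_available, gpl2)
# prints ["Source available", "Free Software", "Open Source"]
```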

Other names considered: availability

"readiness" column

Other ideas considered for "readiness" column:

  • "readiness" refers to how ready it is to be used by other people
  • readiness_rating
  • development_status (a la rubyforge.org)
  • stability
  • polishedness
  • quality

activity

See also: RubyForge's activity statistics example

implementation_language

Question: If it's built using a framework, like Rails, should we put that as the "implementation_language" or list it somewhere else (like in built_on or required_dependencies)?

I think technically it should not go in implementation_language, because a framework isn't really a language. If it's written in Rails, then the programming language it was implemented in would be Ruby.

On the other hand, the language could simply be inferred from the framework, right? If we put that it was written in Rails, then we would automatically know it was written in Ruby, by simple inheritance. So since it would be redundant to put both, maybe we should only put the most specific one.

On the other hand, what if it's built using a combination of different frameworks? For that matter, some software systems might even be written using a combination of implementation/programming languages! (Ruby + Bash + Python + Lisp + Prolog + XML + Java??).
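
One way to have it both ways: store only the most specific thing(s) the project is built on, frameworks or languages, in a list, and derive implementation_language through a framework-to-language table. That avoids the redundancy and also copes with combinations. A minimal sketch; the built_on list and the table contents are illustrative, not a complete mapping.

```ruby
# Illustrative framework => language table ("Rails is a kind of Ruby").
FRAMEWORK_LANGUAGE = {
  "Rails"  => "Ruby",
  "Django" => "Python",
  "Struts" => "Java"
}.freeze

# built_on may mix frameworks and plain languages. Anything not in the table
# is assumed to already be a language. Duplicates collapse; order is kept.
def implementation_languages(built_on)
  built_on.map { |item| FRAMEWORK_LANGUAGE.fetch(item, item) }.uniq
end

p implementation_languages(["Rails", "Ruby", "Bash"]) # prints ["Ruby", "Bash"]
```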

In use by

I would be interested in seeing a count and a list of all sites using, say, MediaWiki vs. MoinMoin.

I might also want it to drill down further... For those sites using MediaWiki, I would want my crawler to browse to the wiki's Special:Version page and scrape the list of extensions that are installed. It should then add this site as a user for each of those MediaWiki extensions...

That's how the data acquisition would work; the display of that data would work something like this: I would probably rarely want to look at the usage stats for a particular extension all by itself; I would generally want to see it in the context of a list/table of all MediaWiki extensions.

I should even be able to drill down / filter by attributes/categories. For example, drill down from All Software to Wiki Software to Wiki Software written primarily in Ruby, etc. Or, if I browsed to the record for MediaWiki, I would want it to list all extensions for MediaWiki, perhaps ordered by popularity, showing in a column the number of sites that had that extension installed. Clicking on the number could expand it (Ajax) to show the list, perhaps.
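
The display side described above mostly comes down to inverting the crawl results, from "site => extensions found on its Special:Version page" to "extension => sites using it", sorted by site count. A sketch of just that aggregation step, assuming the crawler has already produced such a hash; fetching and scraping the pages are not shown, and the site URLs below are made up.

```ruby
# crawl: site URL => extension names scraped from that wiki's Special:Version
# page. Returns one row per extension, most widely installed first.
def extension_usage(crawl)
  usage = Hash.new { |hash, ext| hash[ext] = [] }
  crawl.each do |site, extensions|
    extensions.uniq.each { |ext| usage[ext] << site }
  end
  # Sort by descending site count; ties broken alphabetically by name.
  rows = usage.sort_by { |ext, sites| [-sites.size, ext] }
  rows.map { |ext, sites| { extension: ext, site_count: sites.size, sites: sites.sort } }
end

crawl = {
  "http://wiki-a.example/" => ["Cite", "ParserFunctions"],
  "http://wiki-b.example/" => ["Cite"]
}
extension_usage(crawl).each do |row|
  puts "#{row[:extension]}: #{row[:site_count]} site(s)"
end
# Cite: 2 site(s)
# ParserFunctions: 1 site(s)
```

Keeping the full site list in each row (not just the count) is what would let the count expand into the list of sites on click.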


RubyForge has

  • Development Status: 3 - Alpha
  • Environment: Web Environment
  • Intended Audience: Developers
  • License: GNU General Public License (GPL) version 2
  • Natural Language: English
  • Operating System: OS Independent
  • Programming Language: Ruby
  • Topic: WWW/HTTP, Software Development
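
For the RubyForge sync, each of these "Label: value" lines would need to land in one of our columns. A sketch of that mapping, assuming the page has already been scraped into plain "Label: value" lines with bullets stripped. The column names on the right are hypothetical. Splitting "3 - Alpha" into a numeric level and a label keeps the original RubyForge value recoverable.

```ruby
# Hypothetical mapping from RubyForge's labels to columns in this database.
TROVE_TO_COLUMN = {
  "Development Status"   => :readiness,
  "Environment"          => :environment,
  "Intended Audience"    => :audience,
  "License"              => :license,
  "Natural Language"     => :natural_language,
  "Operating System"     => :operating_system,
  "Programming Language" => :implementation_language,
  "Topic"                => :topics
}.freeze

# text: plain "Label: value" lines; unknown labels are ignored.
def parse_trove(text)
  text.each_line.with_object({}) do |line, record|
    label, value = line.strip.split(":", 2)
    column = TROVE_TO_COLUMN[label]
    next unless column && value
    value = value.strip
    record[column] =
      case column
      when :topics
        value.split(",").map(&:strip)
      when :readiness
        # "3 - Alpha" => { level: 3, label: "Alpha" }
        level, name = value.split(" - ", 2)
        { level: Integer(level, exception: false), label: name || value }
      else
        value
      end
  end
end
```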

Vim scripts have

" Vim syntax file
" Language:     FlexWiki, http://www.flexwiki.com/
" Maintainer:   George V. Reilly  <george@reilly.org>
" Home:         http://www.georgevreilly.com/vim/flexwiki/
" Other Home:   http://www.vim.org/scripts/script.php?script_id=1529
" Author:       George V. Reilly
" Filenames:    *.wiki
" Last Change: Wed Apr 26 11:00 PM 2006 P
" Version:      0.3
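
Vim scripts like this one keep their metadata in a conventional block of leading comment lines, so an importer could pick it up mechanically. A best-effort sketch (vim_script_header is a hypothetical helper): it reads '" Key: value' lines until the first non-comment line and splits each at the first colon only, which keeps URLs and times like "11:00" intact. It cannot tell a header field from an ordinary comment that happens to contain a colon.

```ruby
# Collects '" Key: value' comment lines from the top of a Vim script.
# Stops at the first line that is not a Vim comment.
def vim_script_header(source)
  header = {}
  source.each_line do |line|
    break unless line.start_with?('"')
    key, value = line.sub(/\A"\s*/, "").split(":", 2)
    next if value.nil? # e.g. '" Vim syntax file' has no key
    header[key.strip] = value.strip
  end
  header
end

script = <<~'VIM'
  " Vim syntax file
  " Language:     FlexWiki, http://www.flexwiki.com/
  " Version:      0.3
  if exists("b:current_syntax")
VIM
vim_script_header(script).each { |key, value| puts "#{key} => #{value}" }
# Language => FlexWiki, http://www.flexwiki.com/
# Version => 0.3
```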

Existing directories

More specific directories are listed on Ruby libraries, Rails plugins and libraries, etc.

http://www.ohloh.net/

http://www.ohloh.net/. Retrieved on 2007-05-11 11:18.


Ohloh is a new kind of software directory, combining community-driven content with a unique source code crawler that monitors up-to-date development activity.

Ohloh's objective reports and community feedback help you find the software you need.


http://www.ohloh.net/about/us. Retrieved on 2007-05-11 11:18.


Ohloh is a resource for open source intelligence on thousands of open source projects. Ohloh collects software metrics from a variety of sources including the project's source code and the software development infrastructure used by the project's development team.

Service

Software development has historically been a veiled process, providing users little visibility into how software is built and supported. The open source movement has pierced the veil to some extent by freeing licensing provisions and opening up access to source code. Nonetheless, deciding which open source software to use is largely guesswork. With Ohloh users can be more rigorous in evaluating open source software and more creative in exploring simpler and cheaper alternatives to proprietary software.

PHP Eats Rails for Breakfast (http://www.ohloh.net/articles/php_eats_rails). Retrieved on 2007-05-11 11:18.


Here at Ohloh we've accumulated an enormous database of open source development facts. So far, we've indexed over 3,000 projects and 220 million lines of source code. In addition, we've followed the history of these lines of code, to identify when and by whom all of this code was written.

As a result, we can measure the total amount of activity in a given language over time. In this article, we'll take a look at the changing popularity of web scripting languages, specifically PHP.

...

Shows factors that are likely to predict success

Example: http://www.ohloh.net/projects/9


  • Mostly written in C/C++
  • Very large, active development team
  • Mature, well-established codebase

Mission statement

My mission is to create the best, most useful, most open database of software project metadata available on the Internet.

Whereas other software directories (like RubyForge) force you to use their user interface to access their data, I aim to make this data as open as possible. That may include doing such things as providing a web service interface to the data as well as providing downloadable full dumps (in SQL, XML, YAML, and whatever other formats people find useful).
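
The JSON and YAML dumps, at least, need nothing beyond Ruby's standard library. A minimal sketch, assuming records are exported as plain hashes with string keys (symbol keys would show up as Ruby-specific ":name:" keys in the YAML). SQL and XML dumps are not shown; dump_projects and the sample record are hypothetical.

```ruby
require "json"
require "yaml"

# projects: array of plain hashes with string keys and simple values.
def dump_projects(projects, format)
  case format
  when :json then JSON.pretty_generate(projects)
  when :yaml then projects.to_yaml
  else raise ArgumentError, "unsupported format: #{format.inspect}"
  end
end

projects = [
  { "name" => "example_plugin", "license" => "MIT", "implementation_languages" => ["Ruby"] }
]
puts dump_projects(projects, :yaml)
```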

I have some ideas for how to create a useful frontend to the data, but I don't want anyone to be forced to use it. The data is the core of this project. Creating a nice view for the data is a close second, but second.

It aims to be a highly collaborative effort, open to changes and submissions from anyone, while carefully balancing that openness with an editorial team to keep things organized and high-quality.

What makes it different?

So it's going to be everything "they" are and more. That's the idea anyway.

  • More hierarchical
  • More metadata
    • Record the commonalities between various projects as well as the differences that make each particular project unique (for example, one project may do everything client-side (with JavaScript) and another use Ajax)...
    • Relationships between projects
  • More direct links to relevant project pages (Readme, interactive demo, etc.)
  • More editorial effort
  • Links to the same project in other major directories (SourceForge, SWiK, ...)
  • More interest taken in project organization, license choices, etc.
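
Recording features as a shared vocabulary (the has_and_belongs_to_many :features association listed earlier) is what makes "commonalities vs. differences" computable. A sketch with plain arrays standing in for that association; compare_features is a hypothetical helper and the feature names are made up, echoing the client-side vs. Ajax example.

```ruby
# Splits two projects' feature lists into shared and distinguishing features.
def compare_features(features_a, features_b)
  a = features_a.uniq
  b = features_b.uniq
  {
    common: (a & b).sort,
    only_a: (a - b).sort,
    only_b: (b - a).sort
  }
end

result = compare_features(%w[tagging autocomplete client_side_only],
                          %w[tagging autocomplete ajax])
puts "Common: #{result[:common].join(', ')}"
puts "Only A: #{result[:only_a].join(', ')}"
puts "Only B: #{result[:only_b].join(', ')}"
# Common: autocomplete, tagging
# Only A: client_side_only
# Only B: ajax
```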

Acquisition and maintenance

Of course there are a lot of directories of software already. So rather than duplicating all of the effort and knowledge that they represent, I will take advantage of it by doing a continuous synchronization from those sources.

I guess to take full advantage of their data, we'd need to have dedicated, read-only fields for each synchronized field. So we are forced to some extent to inherit the design choices that they made. (This can be overcome though by using virtual attributes based on one or more read-only attributes, or by creating parallel writable fields that are initialized from the read-only fields but have only a respectful soft-sync relationship with their source.)
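
The "parallel writable fields" idea could look something like this: the importer only ever replaces the read-only synced copy, editors only ever write overrides, and readers get a virtual attribute that prefers our value. A sketch under those assumptions; SyncedRecord and its methods are hypothetical, not part of any existing library.

```ruby
# Hypothetical record: @synced is a read-only copy of the upstream directory's
# fields; @overrides holds our own edits in separate fields, so a re-sync
# never clobbers them.
class SyncedRecord
  def initialize
    @synced = {}.freeze
    @overrides = {}
  end

  # Called only by the importer; replaces the whole upstream snapshot.
  def apply_sync(upstream)
    @synced = upstream.dup.freeze
  end

  # Called by editors.
  def override(field, value)
    @overrides[field] = value
  end

  def clear_override(field)
    @overrides.delete(field)
  end

  # Virtual attribute: our value if we set one, otherwise upstream's.
  def [](field)
    @overrides.fetch(field) { @synced[field] }
  end

  # Fields where we knowingly disagree with the source (worth re-reviewing
  # after a sync).
  def diverged_fields
    @overrides.keys.select { |f| @synced.key?(f) && @synced[f] != @overrides[f] }
  end
end

record = SyncedRecord.new
record.apply_sync({ "readiness" => "3 - Alpha", "license" => "GPL" })
record.override("readiness", "beta")
record.apply_sync({ "readiness" => "4 - Beta", "license" => "GPL" }) # a later poll
puts record["readiness"] # prints beta: the local edit survived the re-sync
p record.diverged_fields # prints ["readiness"]
```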

Feature: uni-directional synchronization with/mirroring of other directories/databases

Why?

  • Because, at least at first, that's where I'll pull most of the data from, to get some data to build upon. So there will at least be an initial import. But why stop there...?
  • To keep up-to-date... The other directories, at least at first, will surely be better-maintained, owing to the much larger number of people using/editing/maintaining them. So we want this database to benefit from those updates that are continuously happening.

Which things should be mirrored, and for how long? That is, would we ever want to break the link?

  • I think for the foreseeable future, I'd want to plan on keeping the mirroring going. Which means any metadata that I add needs to be in separate fields so that it doesn't get overwritten.

How?

Screen scrape

How often?

Query their database maybe daily... Only change our database as often as their database changes, though, of course.
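
"Only change our database as often as their database changes" can be done cheaply by remembering a digest of each record from the previous poll and writing only when the digest differs. A sketch, assuming each scraped record arrives as a hash; ChangeDetector is hypothetical, and the daily scheduling itself is left out. Keys are sorted before hashing, so a different field order on the scraped page does not count as a change.

```ruby
require "digest/sha2"
require "json"

class ChangeDetector
  def initialize
    @digests = {} # record key => digest from the previous poll
  end

  # True (and remembers the new digest) only if the record differs from what
  # was seen last time for this key; the first sighting counts as a change.
  def changed?(key, record)
    digest = Digest::SHA256.hexdigest(JSON.generate(canonical(record)))
    return false if @digests[key] == digest
    @digests[key] = digest
    true
  end

  private

  # Recursively turns hashes into key-sorted [key, value] pairs.
  def canonical(value)
    case value
    when Hash  then value.sort_by { |k, _| k.to_s }.map { |k, v| [k.to_s, canonical(v)] }
    when Array then value.map { |v| canonical(v) }
    else value
    end
  end
end

detector = ChangeDetector.new
detector.changed?("foo", { "version" => "1.0", "license" => "MIT" }) # => true (first poll)
detector.changed?("foo", { "license" => "MIT", "version" => "1.0" }) # => false (same data)
```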

How do you compete with the likes of SWiK? And should one bother? (Analysis of SWiK)

They may have a head start but they don't have a corner on good ideas.

http://swik.net/

SWiK.net is a project to help people collaboratively document open-source software.

SWiK is visited by tens of thousands of people daily, it’s a place to make notes and publish articles on software development and open source projects, tag projects to help organize the world of open source, or just browse around and find interesting stuff.

The wiki is community run and completely open – it depends on editors like you to help build pages with useful information about using open source software and developing applications.

Some of it is really slick...

  • Can create your own tags and tagged pages ([1])
  • nice JavaScript-dynamic wiki
    • "edit this" link opens form via JavaScript -- no delay!
  • tags edit box is auto-completing

Other parts I think I could implement better...

  • The pages in general, and their lists of comments, look kind of ugly and confusing, and they often have little useful content (example: [2])
    • although it claims/seems to be powered by a "wiki", the pages look more like useless link-blogs
  • I haven't seen evidence that it is spider-friendly or web-server-friendly.
  • The software that runs the site (such as the nice JavaScript-dynamic wiki) is not open-source

Conventions

English conventions: What to call the records/objects in this table/database

If "software" weren't an uncountable noun, I'd just call them something simple: "softwares"! Unfortunately, there's no such thing as "softwares", so I have to be a bit more creative (and wordier)...

Are they "software projects", "software packages", applications, etc.?

"Applications" is too specific. It doesn't encompass libraries.

I like how "software projects" is more general and inclusive than "software packages". "Software packages" are just deliverables produced by a project. I also care about other aspects of the project, like its documentation, contributors, organization, story, etc.

So "software projects" currently seems like the best option, encompassing everything I want it to encompass.

"Software products"? Again, too much focus on deliverables. And it's hard to really consider "free software" a "product", since it's essentially not sold.

Customizable/personalizable

So, for example, Bob the individual programmer or QualitySoft the company, could keep their own list of favorite projects/libraries/plugins and associate their own personal notes and comments with any project...

In their view, they would see only the projects they had flagged as "interesting" (or hadn't flagged at all yet). So maybe I should just say they would see all projects except those they chose to filter out... But that makes it sound like a lot of work (filtering out all the uninteresting ones, which will probably be more than 70% of those available).

User interface for building a personal list: in the header area of each project's detail view, maybe we could have "[your name]'s list: [Reject] [Include]" (in addition to the usual Digg-like thumbs-up/thumbs-down for rating that record).

  • Rejected objects would automatically be filtered out of all views in the future
  • Included objects would be listed in your "favorites" view/list
  • Objects that are neither included nor rejected would be listed in most lists (they wouldn't be treated specially)
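
The Reject/Include behavior above amounts to a three-state mark per user and project: rejected, included, or unmarked. A minimal sketch with a hypothetical PersonalList class and made-up project names; in a real database the marks would be rows keyed by user and project.

```ruby
# Hypothetical per-user list of marks, keyed by project name.
class PersonalList
  def initialize
    @marks = {}
  end

  def mark_included(project)
    @marks[project] = :include
  end

  def mark_rejected(project)
    @marks[project] = :reject
  end

  def unmark(project)
    @marks.delete(project)
  end

  # Ordinary listings: everything except rejected projects. Unmarked
  # projects are not treated specially.
  def visible(projects)
    projects.reject { |p| @marks[p] == :reject }
  end

  # The "favorites" view: only explicitly included projects.
  def favorites(projects)
    projects.select { |p| @marks[p] == :include }
  end
end

all = ["acts_as_foo", "bar_plugin", "baz_lib"]
list = PersonalList.new
list.mark_rejected("bar_plugin")
list.mark_included("baz_lib")
p list.visible(all)   # prints ["acts_as_foo", "baz_lib"]
p list.favorites(all) # prints ["baz_lib"]
```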

Project status

Status: Planning / Pre-Alpha

So in the meantime, I'm just using this here wiki (see Rails plugins and libraries, Ruby libraries) and my trusty Template:Software metadata template to get me by. Much of the information I collect with this method will end up being discarded / overwritten by up-to-date data pulled straight from RubyForge (once I get everything in a database and get the syncing going), but it will not all be for naught. The following efforts will not have been wasted when I finally turn on the syncomatic:

  • Screening -- Deciding which projects are interesting / worth looking at
  • Classification -- Grouping similar projects together and adding tags/categories to them
  • Comments and examples (wiki text) -- will be 100% preserved for sure

Aliases: Open-source software projects database, Software project database, Software projects database, Software projects directory, Open-source software directory

To do

  • Get the list I started on my cs.wwc.edu site