Dax Package Hierarchy

From Daxtoolkit

We recently ran into a naming problem with the Dax classes: we discovered classes and files in different directories with nearly identical names. The initial conversation about naming conventions expanded into a broader topic about class organization.

This design document attempts to discuss and codify the class organization we use in Dax. To do this, we consider the classes as organized in a hierarchy of logical packages. These packages can be thought of in the same way that the Java programming language has packages. For example, if we were writing Dax in Java (which we are not), we would probably place everything in a package called dax. We could then have subpackages dax.control and dax.execution to separate the API for the two programming environments. The packaging convention would probably be reflected in the directory structure in which the source code is stored. The convention could also be reflected in C++ namespaces (e.g. ::dax::Control) but is more likely to be reflected as a naming convention (DaxControl*).

Dimensions of Classification

In a recent email thread, we identified two potential dimensions of classification. The first is obvious: the coding environment the class belongs to. A class is available in the control environment, the execution environment, or, in some cases, both. Different environments are likely to need similar classes that are handled in different ways. For example, a field class in the control environment might store an array of field values to attach to a mesh, whereas a field class in the execution environment would be a handle to get the value for a point or cell.

The second dimension of classification refers to the scope of behavior for the group of classes. It describes the broad function of the classes and, implicitly, what they can access and what can access them. This dimension is somewhat, but not completely, analogous to the separation of modules in VTK (Common, Filtering, Rendering, Parallel, etc.).

In Dax, there will clearly be a separation between exposed classes for the interface and internal classes. In addition, there should probably be a further breakdown of the internal classes. There will be a core set of classes and other sets that handle the specifics for threaded implementations like CUDA, OpenMP, etc. The interface classes could also be divided into basic types that are available everywhere (as well as calling specifications for different environments) and the classes used to create and control worklets.

These two dimensions form a matrix of packages. An X marks each combination where a package most likely makes sense.

Package     | Control | Common | Execution
Basic Types |         |   X    |
Interface   |    X    |        |     X
Core        |    X    |   X    |     X
CUDA        |    X    |        |     X
OpenMP      |    X    |        |     X

Each dimension also has an implied directed acyclic graph (DAG) of dependence. This is important because we want the packages (and the libraries they create) to form a DAG of dependence. (It is poor programming style to have cyclical dependencies between libraries or modules.) The following shows the dependencies for each dimension.

[Diagram: dependencies within each dimension of classification]

The dependencies of the combined dimensions (that is, of the packages) are the union of the dependencies of each dimension. Although perhaps pedantic, the following diagram shows the dependencies of all packages. For clarity, the diagram assumes that dependence is transitive (that is, if A depends on B and B depends on C, then A depends on C), and redundant relationships have been removed. For example, all packages depend on "Basic Types", but most of these dependencies are shown only indirectly through other packages.

[Diagram: dependencies among all packages]

Order of Hierarchy

The dimensions of classification must be translated to levels in a package hierarchy. There are two basic choices, depending on which dimension is placed on which layer. The first layout has the module type as the first layer and the environment as the second layer.

[Diagram: layout 1, with the module on the first layer and the environment on the second]

The second layout has the environment first and the module second.

[Diagram: layout 2, with the environment on the first layer and the module on the second]

Both hierarchies are less than ideal. As you can see, there is a lot of repetition in both. In some ways the choice is inconsequential, as both end with the same number of packages at the final level.

However, my preference is for the first form, with the module on the top level. Dependencies on external libraries are determined at this level, so this structure will help us confine code that depends on a particular library to one part of the hierarchy. For example, if we added an MPI module for distributed memory computing, we would want all the MPI-dependent code in one place, not spread out amongst a bunch of packages.

Although my first reaction was to prefer the second form, I think you make a good argument about dependencies on CUDA, MPI, etc., so I agree with form 1 as well. - Utkarsh 10:39, 27 July 2011 (EDT)

Naming Convention

With this package hierarchy in mind, we can consider a set of prefixes to use for naming classes. The names are formed by descending the hierarchy and prefixing the appropriate string at each level. The strings used are subject to debate, but I propose this straw man. Note that some packages add no prefix.

Logical Package Name | String to Use
Dax                  | dax
Common               | (none)
Control              | cont
Execution            | exec
Basic Types          | (none)
Interface            | (none)
Core                 | core
CUDA                 | cuda
OpenMP               | openmp

Applying this to our matrix of packages, we get the following prefixes.

Package     | Control           | Common    | Execution
Basic Types |                   | dax       |
Interface   | dax::cont         |           | dax::exec
Core        | dax::core::cont   | dax::core | dax::core::exec
CUDA        | dax::cuda::cont   |           | dax::cuda::exec
OpenMP      | dax::openmp::cont |           | dax::openmp::exec

Using Namespaces Instead of Naming

One thing we should seriously consider is using namespaces instead of prefixing names to everything. Avoiding name collisions by tacking identifiers onto names is pretty old school. The "modern way" to differentiate symbols from different packages is to use namespaces. Thus, instead of having a class named DaxContExecutive, you would have a class named Executive in a namespace like ::dax::cont.

Using namespaces affords the following advantages.

  • Namespaces are structured in exactly the logical hierarchy we are trying to implement.
  • The colons in namespaces provide nice visual separators between packages. Compare DaxContExecutive to dax::cont::Executive. How about DaxOpenMPContScheduleThreads versus dax::openmp::cont::ScheduleThreads?
  • Namespaces allow names to be abbreviated with the using keyword. We may or may not use this internally, but Dax users might appreciate it. It's also convenient to abbreviate like this when writing code samples in documentation or papers where you cannot fit many characters on a single line.
  • All the cool C++ programmers are using them. We don't want to look like losers.

That said, there are some disadvantages.

  • Importing namespace scopes (with using) makes it less clear which class a short name refers to. Just like our coding standards dictate using this-> to access any member function, we may dictate typing the whole namespace to avoid this problem.
  • When you do type the whole namespace, the name is longer than with prefixes because of the :: inserted between namespaces.
I have nothing against namespaces. I think they are totally fine here, and if we are in agreement we should start using them with the style guideline you suggest. "using" is always confusing to me (except for std, of course). - Utkarsh 10:42, 27 July 2011 (EDT)
I agree as well, I think namespaces are the way to go. - Robert Miller 13:49, 27 July 2011 (EDT)
Works for me. Any thoughts on what the convention for namespaces should be? Should they be all lowercase? That seems to be the convention I usually see in other projects (and for the analogous packages in Java and modules in Python). Also, how short should we try to make them? Specifically, should we use the abbreviations cont and exec or spell out control and execution? --Kenneth Moreland 15:07, 27 July 2011 (EDT)
I think abbreviated names are fine; otherwise people will be tempted to rename them as they use them, defeating the purpose of full qualification. Although this goes against the VTK style guideline, VTK doesn't really have any style guideline for namespace names, so to speak :). Utkarsh 20:54, 27 July 2011 (EDT)
Fair enough. I'm changing the above documentation to use the namespace names. The only abbreviations I am really considering are cont and exec. These are extremely prevalent, so there is both a big advantage for the shortening and it will also be easier to remember. The rest I think should stay as long as necessary. For example, there is no point in shortening openmp to omp. --Kenneth Moreland 10:27, 28 July 2011 (EDT)

Acknowledgements

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.


SAND 2011-5253 P