MultiCellXML XML Specification
MultiCellXML is a human-readable, XML-based data format that includes the random seed state, global variables, information on (and filenames of) microenvironmental field variables, and a list of each cell object and its current state. This structure allows us to easily parse the data using standardised XML parsers (such as Expat, xmlParser, and TinyXML) for use in data visualisation and post-processing. The list of cells in the XML file is very similar to the object-oriented Cell data structure in the simulator, making the format well-suited to resuming simulations from saved states. Simulation parameters can be readily modified during a simulation with simple plaintext search/replace operations in the XML files.
MultiCellXML Version 1.00
File header and other early elements
We begin with XML header information (?xml) for XML 1.0 standards compliance, followed by a "root" data_set tag. In the data_source section, we include information on the originating simulation software (simulator), the user (user), and any publication information that may assist the recipient of a data file in (1) locating the original source of the data, and (2) proper academic citation (reference). See Fig. 1. Future MultiCellXML versions may include reference and citation information for the simulation software.
<created>29 July 2010</created>
<compiled>1 July 2010</compiled>
<citation>Macklin et al. J. Theor. Biol. (2011) (in review)</citation>
<note>User notes may go here.</note>
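As a rough illustration of how a recipient might pull the citation and provenance fields out of the data_source section, the following Python sketch uses the standard-library ElementTree parser. The nesting of created, compiled, citation, and note inside data_source is assumed here for illustration; only the tag names themselves come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature file. The tag names come from the specification text,
# but the exact nesting inside data_source is an assumption for this sketch.
SAMPLE = """<?xml version="1.0"?>
<data_set>
  <data_source>
    <created>29 July 2010</created>
    <simulator>
      <compiled>1 July 2010</compiled>
    </simulator>
    <reference>
      <citation>Macklin et al. J. Theor. Biol. (2011) (in review)</citation>
      <note>User notes may go here.</note>
    </reference>
  </data_source>
</data_set>
"""


def read_data_source(xml_text):
    """Collect the text of every leaf element under data_source, keyed by tag."""
    root = ET.fromstring(xml_text)
    source = root.find("data_source")
    if source is None:
        raise ValueError("no data_source section found")
    info = {}
    for element in source.iter():
        text = (element.text or "").strip()
        # Leaf elements have no children; container elements are skipped.
        if len(element) == 0 and text:
            info[element.tag] = text
    return info


metadata = read_data_source(SAMPLE)
print(metadata["created"])
print(metadata["citation"])
```

Because the function walks all leaf elements rather than hard-coding paths, it should tolerate a different nesting of these tags, at the cost of overwriting entries if a tag name repeats.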
Following the data_source section, the globals section includes information such as the current simulation time and the random seed state; the latter is important for resuming saved simulation states without affecting the pseudorandom number generator. See Fig. 1. Where possible, we include information on physical units as XML tag attributes. Because this format was initially developed for internal use, we have not been entirely consistent in our conventions; improvements are planned in future drafts of the file specification. For dimensionless quantities, the scale should ideally be stated (e.g., as an additional XML attribute):
<local_oxygen units="dimensionless" scale="far-field">0.84</local_oxygen>
In future drafts, we may include a new scales section to facilitate this.
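A reader of the format could keep the units and scale attributes attached to each value instead of discarding them. The Python sketch below shows one way to do this; the element name local_oxygen, the Quantity tuple, and the needs_scale helper are assumptions made for illustration and are not part of the specification.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical container that keeps a value together with its annotations.
Quantity = namedtuple("Quantity", ["value", "units", "scale"])


def read_quantity(element):
    """Convert one annotated element into a Quantity.

    Missing attributes come back as None rather than raising an error.
    """
    return Quantity(float(element.text), element.get("units"), element.get("scale"))


def needs_scale(quantity):
    """Flag dimensionless values whose reference scale was not recorded."""
    return quantity.units == "dimensionless" and quantity.scale is None


# The element name local_oxygen is assumed for this example.
element = ET.fromstring(
    '<local_oxygen units="dimensionless" scale="far-field">0.84</local_oxygen>'
)
oxygen = read_quantity(element)
print(oxygen, needs_scale(oxygen))
```

A check such as needs_scale could be run over a whole file to list the dimensionless fields that still lack a stated scale.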
The file format continues with a list structure of all the cells (cell_list), with essentially all internal cell variables (i.e., member data of the Cell class) listed clearly. We give each cell both a numeric type (cell_type_code) to assist in comparing and classifying cells in software, and a human-readable type (cell_type_text) to assist data recipients in interpreting the data. See Fig. 2. Note that we have included "type" attributes to indicate boolean variables, rather than units. In future drafts of the file specification, we may include both "type" and "units" attributes in all cell data fields. However, we can generally assume that the presence of units indicates a non-boolean variable, and that the presence of a boolean type obviates "units."
<volume units="cubic microns">4130.00487398</volume>
<mature_volume units="cubic microns">4130.00487398</mature_volume>
<solid_volume units="cubic microns">413.000487398</solid_volume>
<BM_adhesion_max_distance units="x radius">1.214</BM_adhesion_max_distance>
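The convention that units imply a numeric field and type="boolean" implies a flag can be applied mechanically when loading the cell list. The following Python sketch is one possible reading of that rule. The cell element name, the is_calcified field, the "tumor cell" label, and the textual encoding of booleans ("true"/"false" or "1"/"0") are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical cell_list fragment. The <cell> wrapper, the is_calcified field,
# and the boolean text encoding are assumptions, not taken from the specification.
SAMPLE = """<cell_list>
  <cell>
    <cell_type_code>1</cell_type_code>
    <cell_type_text>tumor cell</cell_type_text>
    <is_calcified type="boolean">false</is_calcified>
    <volume units="cubic microns">4130.00487398</volume>
    <BM_adhesion_max_distance units="x radius">1.214</BM_adhesion_max_distance>
  </cell>
</cell_list>
"""


def decode_field(element):
    """Decode one field using its attributes.

    type="boolean" gives a bool, a units attribute gives (float, units),
    and anything else is returned as plain text.
    """
    text = (element.text or "").strip()
    if element.get("type") == "boolean":
        return text.lower() in ("true", "1")
    if element.get("units") is not None:
        return float(text), element.get("units")
    return text


def read_cells(xml_text):
    """Return one dict per cell, built from the leaf elements of each cell."""
    root = ET.fromstring(xml_text)
    cells = []
    for cell in root.iter("cell"):
        # Only leaf elements are decoded, so nested sub-sections are flattened.
        fields = {f.tag: decode_field(f) for f in cell.iter() if len(f) == 0}
        cells.append(fields)
    return cells


cells = read_cells(SAMPLE)
print(cells[0]["volume"], cells[0]["is_calcified"])
```

Untyped fields such as cell_type_code are left as text in this sketch; a fuller reader would presumably convert them according to the Cell class definition.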
For historical reasons stemming from code development, each cell is split into cell properties and cell state sections; future versions of the data standard will likely merge these into a single cell state section, because many cell properties tend to change over time due to the cells' exposure to differing microenvironments.
Field variables and other final elements
After all cells have been listed, we include a global_variables section with a list of all saved field variables and file format information. See Fig. 3. Note that we have included the full path of each data file; often all the files (including the XML file) are saved in the same directory, so postprocessing may need to strip part of the path by comparison to the filename field in the data_source section. Due to the large size of 2-D and 3-D double-precision data arrays, we opted for a binary data format. For increased compatibility, we chose the MATLAB .MAT (Level 4) file format, which is relatively simple to implement directly from the published file format specification, and is simple to read and write with common open source software (e.g., Octave) as well as MATLAB. In the source code to follow, we include C++ code to read and write these MATLAB data.
<format version="Level 4">MATLAB</format>
<format version="Level 4">MATLAB</format>
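To give a sense of how little is needed to handle Level 4 files, the Python sketch below writes and reads a single real, double-precision matrix using only the struct module, and strips the directory part of a saved path. It covers only the little-endian, full-matrix case (type code 0) and is not the C++ implementation referred to above; the variable name "oxygen", the example paths, and the helper names are hypothetical.

```python
import os
import re
import struct
import tempfile


def write_mat4(path, name, rows):
    """Write a real double-precision matrix (a list of equal-length rows)."""
    mrows, ncols = len(rows), len(rows[0])
    name_bytes = name.encode("ascii") + b"\x00"  # stored length includes the NUL
    # Header: type, rows, columns, imaginary flag, name length (five 32-bit ints).
    # Type 0 means little-endian IEEE, double precision, full numeric matrix.
    header = struct.pack("<5i", 0, mrows, ncols, 0, len(name_bytes))
    # Level 4 files store the data column by column.
    column_major = [rows[i][j] for j in range(ncols) for i in range(mrows)]
    with open(path, "wb") as f:
        f.write(header)
        f.write(name_bytes)
        f.write(struct.pack("<%dd" % len(column_major), *column_major))


def read_mat4(path):
    """Read the first matrix from a file written in the layout above."""
    with open(path, "rb") as f:
        mopt, mrows, ncols, imagf, namlen = struct.unpack("<5i", f.read(20))
        if mopt != 0 or imagf != 0:
            raise ValueError("sketch handles only real little-endian doubles")
        name = f.read(namlen).rstrip(b"\x00").decode("ascii")
        count = mrows * ncols
        flat = struct.unpack("<%dd" % count, f.read(8 * count))
    # Convert the column-major data back into a list of rows.
    rows = [[flat[j * mrows + i] for j in range(ncols)] for i in range(mrows)]
    return name, rows


def local_name(saved_path):
    """Drop the directory part of a saved path (either slash style)."""
    return re.split(r"[\\/]", saved_path)[-1]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "oxygen.mat")  # hypothetical file name
    write_mat4(path, "oxygen", [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    name, matrix = read_mat4(path)
    file_size = os.path.getsize(path)

print(name, matrix, file_size)
print(local_name("/home/user/run1/oxygen.mat"))
```

Splitting on both slash styles in local_name is a deliberate choice, since a file saved on one operating system may be post-processed on another.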
Lastly, note that a primary goal of our specification is to make the format as human-readable as possible, rendering the format (partially) "self-documenting". This will make it simpler to interpret archived data long after the originating software is out of use, thus eliminating the need for reverse engineering; this is the reason for our choice of human-readable, non-binary data. While this results in much larger files, we regard data compression as a separate software problem from the specification of content. Compression can readily be applied to the data files after creation with widely available open source tools, such as gzip.
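As a small illustration of treating compression as a separate step, the Python sketch below compresses an XML string with the standard-library gzip module and checks that it round-trips unchanged. The repeated volume line is hypothetical filler standing in for a long cell list, not a valid MultiCellXML file.

```python
import gzip

# Hypothetical filler: a repeated field line stands in for a long cell list.
line = '  <volume units="cubic microns">4130.00487398</volume>\n'
xml_text = '<?xml version="1.0"?>\n<data_set>\n' + line * 1000 + "</data_set>\n"

raw = xml_text.encode("utf-8")
packed = gzip.compress(raw)  # compress after the file content is final
restored = gzip.decompress(packed).decode("utf-8")

# Report the sizes before and after compression.
print(len(raw), len(packed))
```

Highly repetitive tag names compress well, which offsets part of the size cost of the human-readable format.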