How to correctly format custom datasets

Customers that have purchased a Managed Services subscription together with a subscription of the National Data Service can send us requests for loading custom (often localised) data into their data catalog. This data will then be available to view in the Data Explorer of your Data Observatory and in Report Builder for ArcGIS.

To be able to load your custom data into your data catalog it must be provided in the correct format. Please use this MS Excel template to prepare your data. Instructions on how to format your data correctly within the template file are provided below.

If you require help filling in any part of the workbook, please contact the Esri UK Technical Support Team.

Indicators sheet

Specify the variables that are to be included in the dataset, and the properties of these variables, on this sheet. The information entered on this sheet (particularly the ID) will determine what data/information needs to be entered in the other sheets of the template.

ID – The name of the data variable. This will be used to refer to a variable throughout this spreadsheet. Please do not include spaces or special characters into the variable names (underscores and hyphens are allowed). To avoid duplicate indicator IDs in our system, please add your organisation name followed by an underscore as a prefix. IDs should not be longer than 51 characters. (Note: For variables where there are multiple years of data, please define the variable once only (for all years).)

Name – A name or short description of the data that the variable holds – this will be the name that the variable takes in the Data Explorer and Report Builder.

ShortName – A short version of the name. Some widgets in Report Builder use the short name of an indicator where space is limited.

Theme – Used to group similar variables together (e.g. a range of variables containing population data for different age groups are all assigned the theme ‘Population’).

DataType – the type of data that the variable will contain. Valid DataType options are:

  • Count – Numeric variable that can be aggregated to get figures at a higher level of geography (e.g. add up wards to get LA-level data)
  • NoAgg – Numeric variable that cannot be aggregated (e.g. a percentage)
  • Rate – Numeric variable – used where the variable is calculated from other variables in the dataset (e.g. percentage of females in the population of an area – calculated as number of females / number of persons). Therefore, Numerator and Denominator IDs should be supplied in the appropriate fields. The ScaleFactor should also be supplied if different from 100 (see below).
  • Categoric – Text or another non-numeric variable

ScaleFactor – only applicable for indicators of DataType Rate. A rate by default is set to be a percentage, if another rate is required, define the scale factor e.g. per 1000, ScaleFactor = 1000

Security – the level of security required for the variable. Security options available:

  • Public – Data is in the public domain, and can be freely shared with others

Numerator/Denominator – only applicable for indicators of DataType Rate. These fields are used to specify the variables in this dataset for calculating the rate (i.e. value of this variable = Numerator variable / Denominator variable). Enter the indicator IDs, not names.

DateFormat – used to specify the format of dates associated with the variable. This defines how the date for the published data will look like. The DateFormat options available are:

  • Monthly: yyyy-MM (for example 2018-01)
  • Monthly option 2: MMM yyyy (for example Jul 2018)
  • Multiple years: MYx (for example MY3 is 2018-20)
  • Financial Year: FY
  • Financial Quarter: FQ
  • Quarter: Q (example for instance 2020-08-01, display date will be 2020 Q3)
  • Multiple months: MMMyyyy12, MMMyyyyy8 etc. (example with MMMyyyy12 and instance date 2012-10-01, final display date will be Oct 12 – Sep 13)

Position – used to define the order of the indicators in the specific theme.

DecimalPlaces – for numeric variables, the number of decimal places that will be displayed when the variable is shown in the Data Explorer and Report Builder.

Suppress – Set to ‘Yes’ if you do not want the variable to be available in the Data Explorer and Report Builder (e.g. if the variable contains data that is used only to calculate values for another variable)

Metadata sheet

Please provide additional information about each variable in the dataset on this sheet.

ID – the name of the indicator (needs to be identical to the ID on the Indicators sheet). Having the same position or chronological order with IDs in indicator sheet is preferred.

For all subsequent fields, please refer to this table on metadata elements. Please note that some of the elements listed on this page (e.g. Spatial, Temporal) will be calculated from the available data and do not need to be provided in the metadata sheet.

Important: Due to an Excel limitation, the Description cannot be longer than 32,767 characters.

While not all metadata elements are mandatory, it is the responsibility of the customer to provide enough metadata to comply with any attribution requirements and to ensure their users have the information they will need to understand the data.

Data sheets

These are the additional sheets that contain the actual data values. Include a separate sheet for each geography you are supplying data for. Data is read “as is” so the value entered in the spreadsheet shall be the value that appears in the database. For example, a cell formatted to display as 90% in Excel, the actual underlying value that would be loaded into the database would be 0.9 not 90.

Give the sheet the correct name as follows:

  • LSOA – Lower Layer Super Output Area 2011
  • LSOA2021 – Lower Layer Super Output Area 2021
  • MSOA – Middle Layer Super Output Area 2011
  • MSOA2021 – Middle Layer Super Output Area 2021
  • Ward – Wards
  • LTLA – Lower Tier Local Authority
  • UTLA – Upper Tier Local Authority
  • Rgn – Region
  • Ctry – Country

If you provide data for one or more existing custom geographies, please use the correct layer ID. If you don’t know the layer ID of your custom geography layer, please ask the Esri UK Technical Support Team.

Retain only the sheets for the geographies that you require (e.g. if your data only contains breakdowns to UTLA level, you should delete the LSOA, MSOA and LTLA sheets).

GeoID – The geographic identifier for the area that the data point refers to. These should be the nine-character geographic codes used/defined by ONS (e.g. E92000001 for England, E07000008 for the district of Cambridge, E06000002 for the unitary authority of Middlesborough).

More information on these codes is available on the ONS Geography Linked Data site: http://statistics.data.gov.uk/

There should be one unique geographic code per row.

Data columns – The column headers for the other columns in this sheet define the data that the column will contain – these are made up of the indicator ID (from the Indicators sheet), a pipe symbol (| – Shift and backslash) and the date/year to which the data refers in yyyy-mm-dd format. i.e. IndicatorID|yyyy-mm-dd

Where you have a time series for a variable (i.e. data for multiple years), each year of data should be put into a separate column.

For rate indicators e.g. percentages, please do not include any % symbols or similar unit notations in the value cells. Please ensure rates are already multiplied by the scale factor e.g. for percentages use numerator / denominator * 100. The ScaleFactor from the Indicator sheet is only used for aggregations of your data to other geography layers. It will not be applied for layers you supply the data for.

A single data sheet should not contain more than 255 columns. (including the GeoID column) If you have more columns per layer, please split the datasheet into multiple sheets, naming them with a number index behind the layer name e.g. UTLA1, UTLA2, etc.