Allowed Characters for System Interoperability Data Standard

Intended Audience and Contact Information

Contact	Chief Data Officer, Office of the CIO
Intended audience	Internal UBC
UDM Domain	Multiple Domain
Data Standard ID	DG0094

Purpose

This standard aims to achieve consistency in data entry and in the characters allowed in business data fields whose information is passed between applications and/or used for reporting.

Conformance to this standard is required to identify and relate to the business context, to control the use of invalid characters, and to avoid injection attacks such as SQL injection and Cross-Site Scripting (XSS), as well as homoglyph attacks. Properly entered data will also result in meaningful reporting and analysis.

This standard is derived by UBC and applies to data fields in any application where the information is being shared with other applications or is being used for reporting purposes. Exceptions are listed in the dispensation section below.

Please note that this data standard is a temporary measure to address issues with special characters that are not recognized by downstream systems, as well as to provide guidelines on the transliteration of special characters in systems that cannot consume those characters. Some examples of scenarios where this data standard would apply include, but are not limited to, Legal Name, Program of Study Name, and Course Subject Name fields where the data is passed from one system to another or is being used in reporting.

Allowed Characters in Data Fields

Plain (i.e., alphanumeric) characters will be used in data fields, except in fields that Enterprise Data Governance has approved for the use of accented characters. Such approvals will be captured within each relevant data standard. For fields that are approved for accented characters, valid characters include:

Letters: a-z and A-Z
Numerals: 0-9
Symbols: apostrophe, hyphen, period, space
Accents on specified uppercase or lowercase characters: acute, grave, circumflex, umlaut, and cedilla as shown below:

Accent Mark	Characters
Acute	Á á É é Í í Ó ó Ú ú Ý ý
Grave	À à È è Ì ì Ò ò Ù ù
Circumflex	Â â Ê ê Î î Ô ô Û û
Umlaut	Ä ä Ë ë Ï ï Ö ö Ü ü
Cedilla	Ç ç

Note: Brackets ( ), slashes /, and other symbols are not accepted.
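The allowed character set above can be expressed as a simple validation check. The following is a minimal sketch in Python, not an official implementation: the names ACCENTED, ALLOWED_PATTERN and is_allowed are hypothetical, and it assumes that "apostrophe" means the plain ASCII apostrophe and that input should be normalized to composed (NFC) form before checking.

```python
import re
import unicodedata

# Accented characters transcribed from the table in this standard.
ACCENTED = "ÁáÉéÍíÓóÚúÝýÀàÈèÌìÒòÙùÂâÊêÎîÔôÛûÄäËëÏïÖöÜüÇç"

# Letters, numerals, apostrophe, period, space, hyphen, plus the accented set.
# Assumption: only the ASCII apostrophe (') counts as an apostrophe.
ALLOWED_PATTERN = re.compile(r"[A-Za-z0-9'. \-" + ACCENTED + r"]*")


def is_allowed(value: str) -> bool:
    """Return True if every character of value is in the allowed set."""
    # Assumption: normalize first so that a letter followed by a combining
    # accent is treated the same as the single pre-composed character.
    composed = unicodedata.normalize("NFC", value)
    return ALLOWED_PATTERN.fullmatch(composed) is not None
```

Under these assumptions, is_allowed("Zoë O'Brien-Smith") is accepted, while values containing brackets, slashes, or accents outside the table (for example a tilde) are rejected.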

Consumption of Accented Characters by Downstream Systems

For downstream applications that do not accept the valid characters above, a corresponding attribute using plain characters should be created to pass this data. The following is the naming convention for this attribute:

Plain<AttributeName>name

Name Element	Description	Allowed Characters
Attribute Name	The name of the relevant attribute that allows the use of accented characters.	Alpha characters. Allowed symbols include: apostrophe, hyphen, period, space

The accented characters listed below should be converted to plain characters in the attribute as follows:

Accented Characters	Plain Character Transliteration
Á á À à Â â Ä ä	A a
É é È è Ê ê Ë ë	E e
Í í Ì ì Î î Ï ï	I i
Ó ó Ò ò Ô ô Ö ö	O o
Ú ú Ù ù Û û Ü ü	U u
Ý ý	Y y
Ç ç	C c
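The transliteration table above can be applied mechanically when populating the plain-character attribute. The following is a minimal sketch in Python under stated assumptions: the names _GROUPS, TRANSLITERATION and to_plain are hypothetical, and characters not listed in the table are deliberately left unchanged rather than guessed at.

```python
import unicodedata

# Plain character -> accented characters it replaces, transcribed from the
# table in this standard. Names here are illustrative only.
_GROUPS = {
    "A": "ÁÀÂÄ", "a": "áàâä",
    "E": "ÉÈÊË", "e": "éèêë",
    "I": "ÍÌÎÏ", "i": "íìîï",
    "O": "ÓÒÔÖ", "o": "óòôö",
    "U": "ÚÙÛÜ", "u": "úùûü",
    "Y": "Ý", "y": "ý",
    "C": "Ç", "c": "ç",
}

# Build a translation table mapping each accented character to its plain form.
TRANSLITERATION = str.maketrans(
    {accented: plain for plain, chars in _GROUPS.items() for accented in chars}
)


def to_plain(value: str) -> str:
    """Replace the accented characters listed in the standard with plain ones."""
    # Assumption: compose combining accents first so the table lookup applies.
    return unicodedata.normalize("NFC", value).translate(TRANSLITERATION)
```

For example, to_plain("Zoë Côté") yields "Zoe Cote". A character outside the table, such as "Ñ", passes through unchanged so that it can be caught by validation instead of being silently altered.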

Free-Form Entry in Text Fields

The following rules should be observed for free-form entry in text fields:

Use plain language. Any user should be able to see text within a data field in context and understand the meaning.
Letters are allowed in the input, i.e. those in the American Standard Code for Information Interchange (ASCII) character set. Emoji and mathematical symbols are not allowed.
Abbreviations are not allowed except for acronyms approved by Enterprise Data Governance as outlined within each individual data standard. Abbreviations can obscure the meaning of a term.
Do not use the ampersand "&" character. Use "and" instead.
Avoid hyphens. Use them only if a user would type them in a search. If you need to use them as a separator, put a space before and after the hyphen.
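Some of the free-form rules above lend themselves to automated checks; others (plain language, approved acronyms) need human review. The following is a minimal sketch in Python of the mechanical checks only. The function name free_form_issues and the message wording are hypothetical, and the hyphen check is a heuristic assumption: it flags a hyphen that has a space on exactly one side, since a separator hyphen should have a space on both sides.

```python
def free_form_issues(text: str) -> list[str]:
    """Return a list of rule violations found in a free-form text value."""
    issues = []

    # Rule: input is limited to the ASCII character set (no emoji etc.).
    if not text.isascii():
        issues.append("contains non-ASCII characters")

    # Rule: do not use the ampersand; use "and" instead.
    if "&" in text:
        issues.append('contains "&"; use "and" instead')

    # Heuristic (assumption): a hyphen with a space on only one side is a
    # malformed separator. Hyphens inside words are left for human review.
    for i, ch in enumerate(text):
        if ch == "-":
            before = text[i - 1] if i > 0 else ""
            after = text[i + 1] if i + 1 < len(text) else ""
            if (before == " ") != (after == " "):
                issues.append("hyphen separator needs a space on both sides")
                break

    return issues
```

An empty list means none of the mechanical checks failed; it does not mean the value fully complies with this standard.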

Compliance

The above standard must be complied with at every stage of the data lifecycle with the exception of any dispensations (see Dispensation section).

All applications must collect data as recommended in this standard.
Enterprise Data Integration must adopt this standard.

Reference Data Compliance for Data Integration

The use of accepted reference data values in this standard for data integration among applications must comply with the enterprise integration pattern of leveraging the reference data common service API (Application Programming Interface) published in UBC MuleSoft Exchange.

Any application that intends to access real-time, case-level reference data should have the application owner or manager complete and submit a Request API Access form.

Mapping of Invalid Values from System(s) of Record (SoR) to Common Services

A common service can only accommodate standard reference data enumerations that are available in the SoR as approved by the Data Governance Steering Committee or Data Trustee.

A reference data value that does not match any of the standard reference data value enumerations is considered ‘invalid’. Any records from a SoR containing an invalid reference data value for a given data element or attribute must be mapped as an ‘empty’ value in common service(s). A reference data value that has the same meaning as a standard enumeration but is named differently in the system of record can be corrected to match the appropriate standard enumeration. Please consult with the EDG team in such cases.

Additional reference data values in a SoR that are not part of the standard reference data enumerations are to be omitted from the common service.
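The mapping rules in this section can be illustrated with a short sketch in Python. Everything named here is hypothetical: the functions, the sample enumeration values, and the corrections table are invented for illustration, and the sketch assumes that an ‘empty’ value is represented as an empty string and that any corrections have already been agreed with the EDG team.

```python
# Invented sample data, for illustration only.
STANDARD_VALUES = {"Active", "Inactive"}
CORRECTIONS = {"Actv": "Active"}  # assumed to be approved by the EDG team


def to_common_service_value(sor_value, standard_values, corrections=None):
    """Map a record's SoR value to a standard enumeration, or to empty."""
    corrections = corrections or {}
    # Apply an approved rename if one exists; otherwise keep the value as is.
    candidate = corrections.get(sor_value, sor_value)
    # Invalid values are mapped to an empty value (assumed here to be "").
    return candidate if candidate in standard_values else ""


def publishable_values(sor_values, standard_values):
    """Omit SoR reference values that are not standard enumerations."""
    return [value for value in sor_values if value in standard_values]
```

With the sample data, "Actv" is corrected to "Active", an unrecognized value such as "Unknown" becomes empty, and a SoR list of "Active", "Pending", "Inactive" is published without "Pending".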

Dispensation

Legacy systems are exempt from this data standard. As systems are replaced, adoption of this standard is required. Examples of legacy systems are:

Student Information System (SIS)

For any compliance questions or requests for a temporary dispensation, please contact the Enterprise Data Governance Team.