Unicode® Technical Standard #18
Unicode Regular Expressions
Version [...]

This document describes guidelines for how to adapt regular expression engines to use Unicode.

This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium. This is a stable document and may be used as reference material or cited as a normative reference by other specifications.

A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS.

Please submit corrigenda and other comments with the online reporting form. Related information that is useful in understanding this document is found in the References. For the latest version of the Unicode Standard, see [...]. For a list of current Unicode Technical Reports, see [...]. For more information about versions of the Unicode Standard, see [...].

Contents

1.2.6 Script and Script Extensions Properties
Grapheme Clusters and Character Classes with Strings
3 Tailored Support: Level 3 (Retracted)
Sample Collation Grapheme Cluster Code (Retracted)
Annex E: Notation for Properties of Strings
Resolving Character Classes with Strings and Complement

Regular expressions are a powerful tool for using patterns to search and modify text. They are a key component of many programming languages, databases, and spreadsheets. Starting in 1999, this document has supplied guidelines and conformance levels for supporting Unicode in regular expressions. The following issues are involved in supporting Unicode.

- Unicode is a large character set: regular expression engines that are only adapted to handle small character sets will not scale well.
- Unicode encompasses a wide variety of languages which can have very different characteristics than English or other western European text.

There are three fundamental levels of Unicode support that can be offered by regular expression engines:

Level 1: Basic Unicode Support. At this level, the regular expression engine provides support for Unicode characters as basic logical units. (This is independent of the actual serialization of Unicode as UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE.) This is a minimal level for useful Unicode support. It does not account for end-user expectations for character support, but does satisfy most low-level programmer requirements. The results of regular expression matching at this level are independent of country or language. At this level, the user of the regular expression engine would need to write more complicated regular expressions to do full Unicode processing.

Level 2: Extended Unicode Support. At this level, the regular expression engine also accounts for extended grapheme clusters (what the end-user generally thinks of as a character), better detection of word boundaries, and canonical equivalence. This is still a default level, independent of country or language, but provides much better support for end-user expectations than the raw Level 1, without the regular-expression writer needing to know about some of the complications of Unicode encoding structure.

Level 3: Tailored Support. (Retracted)

Level 1 is the minimally useful level of support for Unicode. All regex implementations dealing with Unicode should be at least at Level 1. Level 2 is recommended for implementations that need to handle additional Unicode features. However, some of the subitems in Level 2 are more important than others.

One of the most important requirements for a regular expression engine is to document clearly what Unicode features are and are not supported. Even if higher-level support is not currently offered, provision should be made for the syntax to be extended in the future.

Note: The Unicode Standard is constantly evolving: new characters will be added in the future. This means that a regular expression that tests for currency symbols, for example, has different results in Unicode 2.0 than in Unicode 2.1, which added the euro sign currency symbol.

At any level, efficiently handling properties or conditions based on a large character set can take a lot of memory. A common mechanism for reducing the memory requirements, while still maintaining performance, is the two-stage table, discussed in Chapter 5 of The Unicode Standard. For example, the Unicode character properties required in RL1.2 Properties can be stored in memory in a two-stage table with only 7 [...]
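The two-stage table that the text names as a way to reduce memory while keeping lookups fast can be illustrated with a minimal Python sketch. It is not the layout used by any particular engine or the exact scheme from The Unicode Standard. The block size of 256 code points, the function names (`build_two_stage_table`, `lookup`, `category_id`), and the choice of General_Category from Python's `unicodedata` module as the stored property are all assumptions made for illustration.

```python
import unicodedata

# Assumed block size: 256 code points per block (not prescribed by the text).
BLOCK_SHIFT = 8
BLOCK_SIZE = 1 << BLOCK_SHIFT
BLOCK_MASK = BLOCK_SIZE - 1
MAX_CODE_POINT = 0x10FFFF

# Hypothetical helper: map each General_Category value to a small integer.
_category_ids = {}


def category_id(cp):
    """Return a small integer id for the General_Category of code point cp."""
    cat = unicodedata.category(chr(cp))
    return _category_ids.setdefault(cat, len(_category_ids))


def build_two_stage_table(prop):
    """Build (stage1, stage2) for a function from code point to property value.

    stage1 maps a block number to an index into stage2.
    stage2 holds each distinct block of BLOCK_SIZE values only once, so
    identical blocks (for example long unassigned ranges) share storage.
    """
    stage1 = []
    stage2 = []
    seen = {}  # block contents -> index in stage2
    for start in range(0, MAX_CODE_POINT + 1, BLOCK_SIZE):
        block = tuple(prop(cp) for cp in range(start, start + BLOCK_SIZE))
        index = seen.get(block)
        if index is None:
            index = len(stage2)
            seen[block] = index
            stage2.append(block)
        stage1.append(index)
    return stage1, stage2


def lookup(stage1, stage2, cp):
    """Two array accesses: high bits select the block, low bits the entry."""
    return stage2[stage1[cp >> BLOCK_SHIFT]][cp & BLOCK_MASK]


stage1, stage2 = build_two_stage_table(category_id)
id_to_category = {v: k for k, v in _category_ids.items()}

# The euro sign is a currency symbol (General_Category Sc).
print(id_to_category[lookup(stage1, stage2, 0x20AC)])
print(len(stage1), "blocks in stage 1,", len(stage2), "distinct blocks in stage 2")
```

The saving comes from the second stage storing far fewer blocks than the first stage references. How large the saving is depends on the block size and on the property being stored, so the counts printed here say nothing about the per-character figure quoted in the text.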