libgpac
Documentation of the core library of GPAC
Unicode and UTF

UTF and Unicode-related functions.

Macros

#define GF_UTF8_FAIL   0xFFFFFFFF
 

Functions

u32 gf_utf8_wcstombs (char *dst, size_t dst_len, const unsigned short **srcp)
 wide-char to multibyte conversion
 
u32 gf_utf8_mbstowcs (unsigned short *dst, size_t dst_len, const char **srcp)
 multibyte to wide-char conversion
 
u32 gf_utf8_wcslen (const unsigned short *s)
 wide-char string length
 
GF_Err gf_utf_get_string_from_bom (const u8 *data, u32 size, char **out_ptr, char **result, u32 *res_size)
 returns a string from data starting with a BOM
 
Bool gf_utf8_is_legal (const u8 *data, u32 size)
 Checks validity of a UTF-8 string.
 
Bool gf_utf8_reorder_bidi (u16 *utf_string, u32 len)
 string bidi reordering
 
u32 utf8_to_ucs4 (u32 *ucs4_buf, u32 utf8_len, unsigned char *utf8_buf)
 Unicode conversion from UTF-8 to UCS-4.
 

Variables

static const u32 UTF8_MAX_BYTES_PER_CHAR = 4
 

Detailed Description

This section documents the UTF functions of the GPAC framework.
The wide characters in GPAC are unsigned shorts; in other words, GPAC only supports the UTF-8 and UTF-16 encodings.

Note
These functions are ports of the libutf8 library tools into GPAC.

Macro Definition Documentation

◆ GF_UTF8_FAIL

#define GF_UTF8_FAIL   0xFFFFFFFF

error code for UTF-8 conversion errors

Function Documentation

◆ gf_utf8_wcstombs()

u32 gf_utf8_wcstombs ( char *  dst,
size_t  dst_len,
const unsigned short **  srcp 
)

Converts a wide-char string to a multibyte string

Parameters
dst      multibyte destination buffer
dst_len  multibyte destination buffer size
srcp     address of the wide-char string. This will be set to the next char to be converted in the input buffer if there is not enough space in the destination, or to NULL if the conversion was completed.
Returns
length (in bytes) of the multibyte string, or GF_UTF8_FAIL on error.
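
The calling convention described above (a byte count as the return value, and srcp set to either a resume point or NULL) can be illustrated with a self-contained sketch. This is not GPAC's implementation: sketch_wcstombs and SKETCH_UTF8_FAIL are hypothetical names, the surrogate-pair handling is an assumption about what a UTF-16 to UTF-8 converter would do, and the sketch does not write a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UTF8_FAIL 0xFFFFFFFFu /* hypothetical, mirrors GF_UTF8_FAIL */

/* Hypothetical stand-in for a UTF-16 to UTF-8 converter; not GPAC code.
 * Returns the number of bytes written (no terminating NUL is added). */
static uint32_t sketch_wcstombs(char *dst, size_t dst_len,
                                const unsigned short **srcp)
{
    assert(dst != NULL && srcp != NULL && *srcp != NULL);
    const unsigned short *src = *srcp;
    size_t out = 0;

    while (*src != 0) {
        uint32_t cp = *src;
        size_t used = 1;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            uint32_t lo = src[1];
            if (lo < 0xDC00 || lo > 0xDFFF) return SKETCH_UTF8_FAIL;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            used = 2;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return SKETCH_UTF8_FAIL; /* stray low surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need > dst_len) {
            *srcp = src; /* destination full: caller can resume here */
            return (uint32_t)out;
        }
        if (need == 1) {
            dst[out++] = (char)cp;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
        src += used;
    }
    *srcp = NULL; /* whole input consumed */
    return (uint32_t)out;
}
```

A caller that receives a non-NULL *srcp can flush dst and call again with the same srcp to continue where the conversion stopped.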

◆ gf_utf8_mbstowcs()

u32 gf_utf8_mbstowcs ( unsigned short *  dst,
size_t  dst_len,
const char **  srcp 
)

Converts a multibyte string to a wide-char string

Parameters
dst      wide-char destination buffer
dst_len  wide-char destination buffer size
srcp     address of the multibyte character buffer. This will be set to the next char to be converted in the input buffer if there is not enough space in the destination, or to NULL if the conversion was completed.
Returns
length (in unsigned shorts) of the wide-char string, or GF_UTF8_FAIL on error.
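
For the opposite direction, the following self-contained sketch decodes UTF-8 into 16-bit units with the same srcp convention. It is an illustration only, not GPAC's code: sketch_mbstowcs and SKETCH_UTF8_FAIL are hypothetical names, dst_len is assumed to be counted in 16-bit units, and the rejection of overlong forms and encoded surrogates is an assumption about strict decoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UTF8_FAIL 0xFFFFFFFFu /* hypothetical, mirrors GF_UTF8_FAIL */

/* Smallest code point allowed per sequence length (rejects overlong forms). */
static const uint32_t sketch_min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };

/* Hypothetical stand-in for a UTF-8 to UTF-16 converter; not GPAC code.
 * dst_len is assumed to be a count of 16-bit units. */
static uint32_t sketch_mbstowcs(unsigned short *dst, size_t dst_len,
                                const char **srcp)
{
    assert(dst != NULL && srcp != NULL && *srcp != NULL);
    const unsigned char *src = (const unsigned char *)*srcp;
    size_t out = 0;

    while (*src != 0) {
        unsigned char b = src[0];
        uint32_t cp;
        size_t len;

        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else return SKETCH_UTF8_FAIL; /* invalid lead byte */

        /* A NUL terminator is not a continuation byte, so this loop
         * never reads past the end of the string. */
        for (size_t i = 1; i < len; i++) {
            if ((src[i] & 0xC0) != 0x80) return SKETCH_UTF8_FAIL;
            cp = (cp << 6) | (src[i] & 0x3F);
        }
        if (cp < sketch_min_cp[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return SKETCH_UTF8_FAIL;

        size_t need = cp >= 0x10000 ? 2 : 1;
        if (out + need > dst_len) {
            *srcp = (const char *)src; /* resume point for the caller */
            return (uint32_t)out;
        }
        if (need == 1) {
            dst[out++] = (unsigned short)cp;
        } else {
            cp -= 0x10000; /* split into a surrogate pair */
            dst[out++] = (unsigned short)(0xD800 | (cp >> 10));
            dst[out++] = (unsigned short)(0xDC00 | (cp & 0x3FF));
        }
        src += len;
    }
    *srcp = NULL; /* whole input consumed */
    return (uint32_t)out;
}
```

Code points above U+FFFF take two output units in this sketch, so a destination sized at one unit per input byte is always large enough.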

◆ gf_utf8_wcslen()

u32 gf_utf8_wcslen ( const unsigned short *  s)

Gets the length, in characters, of a wide-char string

Parameters
s  the wide-char string
Returns
the wide-char string length
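
Since wide characters here are unsigned shorts, a length routine of this kind most naturally counts 16-bit units up to the terminating zero. The sketch below shows that interpretation under stated assumptions: sketch_wcslen is a hypothetical name, and whether GPAC counts units or code points is not stated in this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GPAC code: counts 16-bit units before the
 * terminating zero. A surrogate pair therefore counts as two. */
static uint32_t sketch_wcslen(const unsigned short *s)
{
    assert(s != NULL);
    const unsigned short *p = s;
    while (*p != 0)
        p++;
    return (uint32_t)(p - s);
}
```

Under this assumption, a string holding one character outside the Basic Multilingual Plane reports a length of 2.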

◆ gf_utf_get_string_from_bom()

GF_Err gf_utf_get_string_from_bom ( const u8 *  data,
u32  size,
char **  out_ptr,
char **  result,
u32 *  res_size 
)

Returns a string from data, converting UTF-16 to UTF-8 if needed

Parameters
data      the string or wide-char string
size      size of the data buffer
out_ptr   set to an allocated buffer if needed for conversion; shall be destroyed by the caller. Must not be NULL
result    set to the resulting string. Must not be NULL
res_size  set to the length of the resulting string. May be NULL
Returns
error if any: GF_IO_ERR on a UTF decoding error, or GF_BAD_PARAM
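
The first step of any BOM-driven decode is recognizing the byte order mark itself. The sketch below shows only that step, using the standard BOM byte patterns for UTF-8 and UTF-16. The type and function names (sketch_bom, sketch_detect_bom) are hypothetical, and which BOMs GPAC actually recognizes is not stated in this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these exist in GPAC. */
typedef enum {
    SKETCH_BOM_NONE,
    SKETCH_BOM_UTF8,     /* EF BB BF */
    SKETCH_BOM_UTF16_LE, /* FF FE */
    SKETCH_BOM_UTF16_BE  /* FE FF */
} sketch_bom;

/* Reports which BOM (if any) starts the buffer and how many bytes it
 * occupies, so the caller knows where the payload begins. */
static sketch_bom sketch_detect_bom(const uint8_t *data, uint32_t size,
                                    uint32_t *bom_len)
{
    assert(data != NULL && bom_len != NULL);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        *bom_len = 3;
        return SKETCH_BOM_UTF8;
    }
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        *bom_len = 2;
        return SKETCH_BOM_UTF16_LE;
    }
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        *bom_len = 2;
        return SKETCH_BOM_UTF16_BE;
    }
    *bom_len = 0; /* no BOM: payload starts at data[0] */
    return SKETCH_BOM_NONE;
}
```

With a UTF-8 or absent BOM the payload can be used in place, which is consistent with out_ptr being allocated only when a conversion is needed.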

◆ gf_utf8_is_legal()

Bool gf_utf8_is_legal ( const u8 *  data,
u32  size 
)

Checks if a given byte sequence is a valid UTF-8 encoding

Parameters
data  the byte sequence buffer
size  the length of the byte sequence
Returns
GF_TRUE if valid UTF-8, GF_FALSE otherwise
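
To make "valid UTF-8 encoding" concrete, the sketch below checks a byte buffer against the usual well-formedness rules: correct lead and continuation bytes, no truncated sequence, no overlong form, no encoded surrogate, and no value above U+10FFFF. It is a minimal illustration with a hypothetical name (sketch_utf8_is_legal); the exact rule set GPAC applies is not stated in this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator, not GPAC code. Returns 1 if the buffer is
 * well-formed UTF-8 under the rules listed in the lead-in, 0 otherwise. */
static int sketch_utf8_is_legal(const uint8_t *data, uint32_t size)
{
    /* Smallest code point allowed per sequence length (rejects overlong forms). */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t i = 0;

    assert(data != NULL || size == 0);
    while (i < size) {
        uint8_t b = data[i];
        uint32_t len, cp, k;

        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return 0; /* stray continuation byte or invalid lead byte */

        if (len > size - i) return 0; /* sequence cut off by end of buffer */
        for (k = 1; k < len; k++) {
            if ((data[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (data[i + k] & 0x3F);
        }
        if (cp < min_cp[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += len;
    }
    return 1;
}
```

Because the length is passed explicitly, this sketch treats an embedded zero byte as an ordinary valid character rather than as a terminator.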

◆ gf_utf8_reorder_bidi()

Bool gf_utf8_reorder_bidi ( u16 *  utf_string,
u32  len 
)

Performs a simple reordering of words in the string based on each word direction, so that glyphs are sorted in display order.

Parameters
utf_string  the wide-char string
len         the length of the wide-char string
Returns
1 if the main direction is right-to-left, 0 otherwise

◆ utf8_to_ucs4()

u32 utf8_to_ucs4 ( u32 *  ucs4_buf,
u32  utf8_len,
unsigned char *  utf8_buf 
)
Parameters
ucs4_buf  The UCS-4 buffer to fill
utf8_len  The length of the UTF-8 buffer
utf8_buf  The buffer containing the UTF-8 data
Returns
the length of the ucs4_buf. Note that the ucs4_buf should be allocated by the caller and should be at least utf8_len * 4

This code has been adapted from http://www.ietf.org/rfc/rfc2640.txt

Full Copyright Statement

Copyright (C) The Internet Society (1999). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.

Variable Documentation

◆ UTF8_MAX_BYTES_PER_CHAR

const u32 UTF8_MAX_BYTES_PER_CHAR = 4
static

maximum character size in bytes