Character Sets

Kim's Programming

Programmer. 2017. 6. 24. 16:32

1. Unicode on Windows

Types and characteristics of character sets

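SBCS (Single Byte Character Set)
-> As the name implies, every character is represented with exactly 1 byte. ASCII is an example.
MBCS (Multi Byte Character Set)
-> As the name implies, characters are not all the same width: the number of bytes varies by character. Some characters take 1 byte and others take 2 bytes.
WBCS (Wide Byte Character Set)
-> Unicode falls into this category. Every character is handled as a 2-byte unit. On Windows this is UTF-16, where characters outside the Basic Multilingual Plane take two such units (a surrogate pair).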

* MBCS-based strings

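#include <cstddef>
#include <cstring>
#include <iostream>

int main(int argc, char* argv[])
{
    char str[] = "ABC한글";
    std::size_t size = sizeof(str);      // bytes in the array, including the terminating '\0'
    std::size_t len = std::strlen(str);  // bytes before the '\0', not the number of characters

    // With the CP949 code page (Korean Windows) each Hangul character takes
    // 2 bytes, so this prints 8 and 7 rather than 6 and 5.
    std::cout << "size of Array = " << size << std::endl;
    std::cout << "length of string = " << len << std::endl;
    return 0;
}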

* WBCS-based strings

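#include <cstddef>
#include <cwchar>
#include <iostream>

// wmain is the wide-character entry point supported by MSVC.
int wmain(int argc, wchar_t* argv[])
{
    wchar_t str[] = L"ABC한글";
    std::size_t size = sizeof(str);      // bytes in the array, including the terminating L'\0'
    std::size_t len = std::wcslen(str);  // number of wchar_t elements before the L'\0'

    // On Windows wchar_t is 2 bytes, so this prints 12 and 5.
    std::cout << "size of Array = " << size << std::endl;
    std::cout << "length of string = " << len << std::endl;
    return 0;
}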

2. Supporting MBCS and WBCS at the same time

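#include <iostream>
#include <Windows.h>
#include <cstring>
#include <tchar.h>

// _tmain, TCHAR, _T() and _tcslen resolve to the wide or the narrow
// variants depending on the project's character set setting.
INT _tmain(INT argc, TCHAR* argv[])
{
    TCHAR str[] = _T("ABC한글");
    INT size = static_cast<INT>(sizeof(str));    // bytes in the array, including the terminator
    INT len = static_cast<INT>(_tcslen(str));    // elements before the terminator

    // std::wcout accepts both narrow and wide string literals,
    // so these lines compile in either build.
    std::wcout << _T("size of Array = ") << size << std::endl;
    std::wcout << _T("length of string = ") << len << std::endl;
    return 0;
}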

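When the project is built with the Unicode character set, the result is the same as in the WBCS example. When it is built with the multibyte character set, the result matches the MBCS example.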

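The TCHAR type and the _T() macro, available through Windows.h and tchar.h, let the same source build for both MBCS and WBCS. Internally these are macros. Taking printf as an example, _tprintf() expands to wprintf(), the wide-character version, when the project is built for Unicode, and to printf() when it is built for multibyte. Most of these dual-purpose functions have names that start with _tcs or _t, such as _tcslen and _tprintf.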

Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)
