데이터 구조의 해싱 - DS 튜토리얼

데이터 구조의 해싱 소개:

해싱은 대규모 데이터 세트를 고정 길이 값으로 매핑하는 것과 관련된 컴퓨터 과학에서 널리 사용되는 기술입니다. 가변 크기의 데이터 세트를 고정 크기의 데이터 세트로 변환하는 프로세스입니다. 효율적인 조회 작업을 수행하는 능력은 해싱을 데이터 구조의 필수 개념으로 만듭니다.

int를 문자로

해싱이란 무엇입니까?

해싱 알고리즘은 입력(예: 문자열 또는 정수)을 고정 크기 출력(해시 코드 또는 해시 값이라고 함)으로 변환하는 데 사용됩니다. 그런 다음 이 해시 값을 배열 또는 해시 테이블의 인덱스로 사용하여 데이터를 저장하고 검색합니다. 해시 함수는 결정적이어야 하며, 이는 주어진 입력에 대해 항상 동일한 결과를 산출함을 보장합니다.

해싱은 일반적으로 데이터 조각에 대한 고유 식별자를 생성하는 데 사용되며, 이는 대규모 데이터 세트에서 해당 데이터를 빠르게 조회하는 데 사용할 수 있습니다. 예를 들어, 웹 브라우저는 해싱을 사용하여 웹 사이트 비밀번호를 안전하게 저장할 수 있습니다. 사용자가 비밀번호를 입력하면 브라우저는 이를 해시 값으로 변환하고 저장된 해시 값과 비교하여 사용자를 인증합니다.

해시 키란 무엇입니까?

해싱과 관련하여 해시 키(해시 값 또는 해시 코드라고도 함)는 해싱 알고리즘에 의해 생성된 고정 크기의 숫자 또는 영숫자 표현입니다. 이는 해싱이라는 프로세스를 통해 텍스트 문자열이나 파일과 같은 입력 데이터에서 파생됩니다.

해싱에는 입력 데이터에 특정 수학 함수를 적용하는 작업이 포함되며, 이는 입력 크기에 관계없이 일반적으로 고정 길이의 고유한 해시 키를 생성합니다. 결과 해시 키는 본질적으로 원본 데이터의 디지털 지문입니다.

해시 키는 여러 가지 용도로 사용됩니다. 입력 데이터가 조금만 변경되어도 상당히 다른 해시 키가 생성되므로 데이터 무결성 검사에 일반적으로 사용됩니다. 해시 키는 빠른 조회 및 비교 작업을 허용하므로 해시 테이블이나 데이터 구조에서 효율적인 데이터 검색 및 저장에도 사용됩니다.

해싱은 어떻게 작동하나요?

해싱 프로세스는 세 단계로 나눌 수 있습니다.

입력: 해싱할 데이터가 해싱 알고리즘에 입력됩니다.
해시 함수: 해싱 알고리즘은 입력 데이터를 사용하고 수학 함수를 적용하여 고정 크기 해시 값을 생성합니다. 해시 함수는 입력 값에 따라 다른 해시 값이 생성되고, 입력의 작은 변화가 출력의 큰 변화를 가져오도록 설계되어야 합니다.
출력: 데이터 구조에서 데이터를 저장하거나 검색하기 위한 인덱스로 사용되는 해시 값이 반환됩니다.

해싱 알고리즘:

수많은 해싱 알고리즘이 있으며 각각 뚜렷한 장점과 단점이 있습니다. 가장 널리 사용되는 알고리즘은 다음과 같습니다.

MD5: 128비트 해시 값을 생성하는 널리 사용되는 해싱 알고리즘입니다.
SHA-1: 160비트 해시 값을 생성하는 널리 사용되는 해싱 알고리즘입니다.
SHA-256: 256비트 해시 값을 생성하는 보다 안전한 해싱 알고리즘입니다.

해시 함수:

해시 함수: 해시 함수는 입력(또는 키)을 받아 해시 코드 또는 해시 값으로 알려진 고정 크기 결과를 출력하는 일종의 수학 연산입니다. 해시 함수는 결정적이기 위해 항상 동일한 입력에 대해 동일한 해시 코드를 생성해야 합니다. 또한 해시 함수는 각 입력에 대해 해시 속성이라고 하는 고유한 해시 코드를 생성해야 합니다.

다음을 포함하여 다양한 유형의 해시 함수가 있습니다.

분할 방법:

이 방법에는 키를 테이블 크기로 나누고 나머지를 해시 값으로 사용하는 방법이 포함됩니다. 예를 들어 테이블 크기가 10이고 키가 23인 경우 해시 값은 3(23% 10 = 3)이 됩니다.

곱셈 방법:

이 방법에는 키에 상수를 곱하고 제품의 소수 부분을 해시 값으로 사용하는 작업이 포함됩니다. 예를 들어 키가 23이고 상수가 0.618인 경우 해시 값은 2(floor(10*(0.61823 - Floor(0.61823))) = Floor(2.236) = 2)가 됩니다.

범용 해싱:

이 방법에는 해시 함수 계열의 무작위 해시 함수를 사용하는 작업이 포함됩니다. 이렇게 하면 해시 함수가 특정 입력에 편향되지 않고 공격에 저항할 수 있습니다.

충돌 해결

해싱의 주요 과제 중 하나는 두 개 이상의 입력 값이 동일한 해시 값을 생성할 때 발생하는 충돌을 처리하는 것입니다. 충돌을 해결하는 데 사용되는 다양한 기술은 다음과 같습니다.

연결: 이 기술에서 각 해시 테이블 슬롯에는 동일한 해시 값을 가진 모든 값의 연결된 목록이 포함됩니다. 이 기술은 간단하고 구현하기 쉽지만 연결된 목록이 너무 길어지면 성능이 저하될 수 있습니다.
개방형 주소 지정: 이 기술에서는 충돌이 발생하면 알고리즘이 빈 슬롯을 찾을 때까지 연속 슬롯을 탐색하여 해시 테이블에서 빈 슬롯을 검색합니다. 이 기술은 부하율이 낮을 때는 체이닝보다 효율적일 수 있지만, 부하율이 높을 때는 클러스터링이 발생하고 성능이 저하될 수 있습니다.
이중 해싱: 이는 충돌이 발생할 때 조사할 다음 슬롯을 결정하기 위해 두 번째 해시 함수를 사용하는 개방형 주소 지정의 변형입니다. 이 기술은 클러스터링을 줄이고 성능을 향상시키는 데 도움이 될 수 있습니다.

충돌 해결의 예

크기가 5인 해시 테이블의 예를 계속 진행해 보겠습니다. 해시 테이블에 키-값 쌍 'John: 123456' 및 'Mary: 987654'를 저장하려고 합니다. 두 키 모두 동일한 해시 코드 4를 생성하므로 충돌이 발생합니다.

더블 자바가 뭐야?

연쇄를 사용하여 충돌을 해결할 수 있습니다. 인덱스 4에 연결된 목록을 만들고 키-값 쌍을 목록에 추가합니다. 이제 해시 테이블은 다음과 같습니다.

삼:

이진 트리 중위 순회

4: 존: 123456 -> 메리: 987654

해시 테이블:

해시 테이블은 데이터를 배열에 저장하는 데이터 구조입니다. 일반적으로 해시 테이블에 들어갈 수 있는 요소 수보다 큰 배열 크기가 선택됩니다. 키는 해시 함수를 사용하여 배열의 인덱스에 매핑됩니다.

해시 함수는 새 요소를 추가하기 위해 해시 테이블에 요소를 삽입해야 하는 인덱스를 찾는 데 사용됩니다. 충돌이 없으면 요소가 해당 인덱스에 추가됩니다. 충돌이 있으면 충돌 해결 방법을 사용하여 배열에서 사용 가능한 다음 슬롯을 찾습니다.

해시 함수는 해시 테이블에서 요소를 검색하기 위해 저장된 요소의 인덱스를 찾는 데 사용됩니다. 해당 인덱스에서 요소를 찾을 수 없으면 충돌 해결 방법을 사용하여 연결된 목록(체인이 사용되는 경우) 또는 다음 사용 가능한 슬롯(오픈 주소 지정이 사용되는 경우)에서 요소를 검색합니다.

해시 테이블 작업

해시 테이블에서 수행할 수 있는 작업은 다음과 같습니다.

삽입: 해시 테이블에 새로운 키-값 쌍을 삽입합니다.
삭제: 해시 테이블에서 키-값 쌍을 제거합니다.
검색: 해시 테이블에서 키-값 쌍을 검색합니다.

해시 테이블 생성:

해싱은 빠른 데이터 삽입, 삭제 및 검색을 가능하게 하는 데이터 구조인 해시 테이블을 구축하는 데 자주 사용됩니다. 해시 테이블을 구성하는 각 버킷 배열에는 하나 이상의 키-값 쌍이 저장될 수 있습니다.

해시 테이블을 생성하려면 먼저 각 키를 배열의 고유 인덱스에 매핑하는 해시 함수를 정의해야 합니다. 간단한 해시 함수는 키에 있는 문자의 ASCII 값의 합계를 취하고 배열 크기로 나눈 나머지를 사용하는 것입니다. 그러나 이 해시 함수는 비효율적이며 충돌(동일한 인덱스에 매핑되는 두 개의 키)로 이어질 수 있습니다.

충돌을 피하기 위해 배열 전체에 해시 값을 보다 균일하게 분포시키는 고급 해시 함수를 사용할 수 있습니다. 널리 사용되는 알고리즘 중 하나는 비트 단위 연산을 사용하여 해시 값을 생성하는 djb2 해시 함수입니다.

 unsigned long hash(char* str) { unsigned long hash = 5381; int c; while (c = *str++) { hash = ((hash &lt;&lt; 5) + hash) + c; } return hash; }

이 해시 함수는 문자열을 입력으로 사용하고 부호 없는 긴 정수 해시 값을 반환합니다. 이 함수는 해시 값 5381을 초기화한 다음 비트 연산을 사용하여 문자열의 각 문자를 반복하여 새 해시 값을 생성합니다. 최종 해시 값이 반환됩니다.

C++의 해시 테이블

C++에서 표준 라이브러리는 unordered_map이라는 해시 테이블 컨테이너 클래스를 제공합니다. unordered_map 컨테이너는 해시 테이블을 사용하여 구현되며 키-값 쌍에 대한 빠른 액세스를 제공합니다. unordered_map 컨테이너는 해시 함수를 사용하여 키의 해시 코드를 계산한 다음 개방형 주소 지정을 사용하여 충돌을 해결합니다.

C++에서 unordered_map 컨테이너를 사용하려면 헤더 파일을 포함해야 합니다. 다음은 C++에서 unordered_map 컨테이너를 만드는 방법에 대한 예입니다.

 #include #include int main() { // create an unordered_map container std::unordered_map my_map; // insert some key-value pairs into the map my_map[&apos;apple&apos;] = 10; my_map[&apos;banana&apos;] = 20; my_map[&apos;orange&apos;] = 30; // print the value associated with the &apos;banana&apos; key std::cout &lt;&lt; my_map[&apos;banana&apos;] &lt;&lt; std::endl; return 0; }

설명:

이 프로그램은 해시 테이블을 사용하여 구현되고 키-값 쌍에 대한 빠른 액세스를 제공하는 C++의 unordered_map 컨테이너 사용을 보여줍니다.
첫째, 프로그램에는 필요한 헤더 파일이 포함되어 있습니다.
그런 다음 프로그램은 문자열 키와 정수 값을 포함하는 my_map이라는 빈 unordered_map 컨테이너를 만듭니다. 이는 std::unordered_map my_map 구문을 사용하여 수행됩니다.
다음으로 프로그램은 [] 연산자를 사용하여 세 개의 키-값 쌍을 my_map 컨테이너에 삽입합니다. 값이 10인 'apple', 값이 20인 'banana', 값이 30인 'orange'입니다.
이는 my_map['apple'] = 10;, my_map['banana'] = 20; 및 my_map['orange'] = 30; 구문을 사용하여 수행됩니다. 각기.
마지막으로 프로그램은 [] 연산자와 std::cout 객체를 사용하여 'banana' 키와 연관된 값을 인쇄합니다.

프로그램 출력:

25c ~ k

해시 테이블에 데이터 삽입

해시 테이블에 키-값 쌍을 삽입하려면 먼저 키-값 쌍을 저장할 배열에 대한 인덱스가 필요합니다. 다른 키가 동일한 인덱스에 매핑되면 충돌이 발생하므로 이를 적절하게 처리해야 합니다. 일반적인 방법 중 하나는 배열의 각 버킷에 동일한 해시 값을 가진 키-값 쌍의 연결된 목록이 포함되는 체인을 사용하는 것입니다.

다음은 체인을 사용하여 해시 테이블에 키-값 쌍을 삽입하는 방법의 예입니다.

 typedef struct node { char* key; int value; struct node* next; } node; node* hash_table[100]; void insert(char* key, int value) { unsigned long hash_value = hash(key) % 100; node* new_node = (node*) malloc(sizeof(node)); new_node-&gt;key = key; new_node-&gt;value = value; new_node-&gt;next = NULL; if (hash_table[hash_value] == NULL) { hash_table[hash_value] = new_node; } else { node* curr_node = hash_table[hash_value]; while (curr_node-&gt;next != NULL) { curr_node = curr_node-&gt;next; } curr_node-&gt;next = new_node; } }

설명:

먼저, 해시 테이블의 단일 노드를 나타내는 node라는 구조체가 정의됩니다.
각 노드에는 키를 저장하는 char* 키, 관련 값을 저장하는 int 값, 연결된 목록을 사용하여 해시 테이블의 충돌을 처리하기 위해 호출되는 다른 노드에 대한 포인터 등 세 가지 멤버가 있습니다.
hash_table이라는 노드 포인터 배열은 크기 100으로 선언됩니다. 이 배열은 해시 테이블의 요소를 저장하는 데 사용됩니다.
insert 함수는 char* 키와 int 값을 매개변수로 사용합니다.
이는 프로그램의 다른 곳에서 정의된 것으로 가정되는 해시 함수를 사용하여 주어진 키에 대한 해시 값을 계산하는 것부터 시작합니다.
그런 다음 해시 값은 모듈러스 연산자 % 100을 사용하여 hash_table 배열의 크기에 맞게 줄어듭니다.
동적 메모리 할당(malloc(sizeof(node)))을 사용하여 새 노드가 생성되고 해당 멤버(key, value 및 next)에는 각각 제공된 키, 값 및 NULL이 할당됩니다.
hash_table 배열의 해당 슬롯이 비어 있으면(NULL) 충돌이 발생하지 않았음을 나타내며 새 노드가 해당 슬롯에 직접 할당됩니다(hash_table[hash_value] = new_node).

그러나 hash_table 배열의 해당 인덱스에 이미 노드가 있는 경우 함수는 충돌을 처리해야 합니다. 현재 노드(hash_table[hash_value])부터 시작하여 연결 리스트를 순회하고 끝에 도달할 때까지 다음 노드로 이동합니다(curr_node->next != NULL). 목록의 끝에 도달하면 새 노드가 다음 노드로 추가됩니다(curr_node->next = new_node).

C++에서 해싱 구현:

충돌 해결을 위해 개방형 주소 지정과 선형 조사를 사용하여 C++에서 해싱을 구현하는 방법을 살펴보겠습니다. 정수를 저장하는 해시 테이블을 구현하겠습니다.

 #include using namespace std; const int SIZE = 10; class HashTable { private: int arr[SIZE]; public: HashTable() { for (int i = 0; i <size; i++) { arr[i]="-1;" } int hashfunction(int key) return key % size; void insert(int index="hashFunction(key);" i="0;" while (arr[(index + i) size] !="-1" && arr[(index i++; if cout << 'element already exists in the hash table' endl; else remove(int deleted from return; not found display() for (int < (arr[i]="=" -1 || -2) continue; 'index ' ': }; main() hashtable ht; ht.insert(5); ht.insert(15); ht.insert(25); ht.insert(35); ht.insert(45); ht.display(); ht.remove(15); ht.remove(10); ht.insert(55); 0; pre> <p> <strong>Explanation:</strong> </p> <ul> <li>This program implements a hash table data structure using linear probing to handle collisions.</li> <li>A hash table is a data structure that stores data in key-value pairs, where the keys are hashed using a hash function to generate an index in an array. This allows for constant-time average-case complexity for inserting, searching, and deleting elements from the hash table.</li> <li>The HashTable class has a private integer array arr of size SIZE, which is initialized to -1 in the constructor. The hash function method takes an integer key and returns the hash value of the key, which is simply the remainder of the key when divided by SIZE.</li> <li>The insert method takes an integer key and uses the hash function to get the index where the key should be inserted.</li> <li>If the index is already occupied by another key, linear probing is used to find the next available index in the array. Linear probing checks the next index in the array until it finds an empty slot or the key itself.</li> <li>If the key is already in the hash table, the method displays a message saying that the element already exists. Otherwise, it inserts the key at the calculated index.</li> <li>The remove method takes an integer key and uses the hash function to get the index where the key is located.</li> <li>If the key is not in the calculated index, linear probing is used to search for the key in the next indices in the array. Once the key is found, it is deleted by setting its value to -2.</li> <li>If the key is not found in the hash table, the method displays a message saying that the element is not found.</li> <li>The display method simply iterates through the array and prints out all non-empty key-value pairs.</li> <li>In the main function, an instance of the HashTable class is created, and several integers are inserted into the hash table using the insert method.</li> <li>Then, the display method is called to show the contents of the hash table. The remove method is called twice, first to remove an element that exists in the hash table and then to remove an element that does not exist.</li> <li>The display method is called after each remove method call to show the updated contents of the hash table.</li> <li>Finally, another integer is inserted into the hash table, and the display method is called again to show the final contents of the hash table.</li> </ul> <p> <strong>Program Output:</strong> </p> <img src="//techcodeview.com/img/ds-tutorial/65/hashing-data-structure-3.webp" alt="Hashing in Data Structure"> <h2>Applications of Hashing</h2> <p>Hashing has many applications in computer science, including:</p> <ul> <li>Databases: Hashing is used to index and search large databases efficiently.</li> <li>Cryptography: Hash functions are used to generate message digests, which are used to verify the integrity of data and protect against tampering.</li> <li>Caching: Hash tables are used in caching systems to store frequently accessed data and improve performance.</li> <li>Spell checking: Hashing is used in spell checkers to quickly search for words in a dictionary.</li> <li>Network routing: Hashing is used in load balancing and routing algorithms to distribute network traffic across multiple servers.</li> </ul> <h2>Advantages of Hashing:</h2> <ul> <li>Fast Access: Hashing provides constant time access to data, making it faster than other data structures like linked lists and arrays.</li> <li>Efficient Search: Hashing allows for quick search operations, making it an ideal data structure for applications that require frequent search operations.</li> <li>Space-Efficient: Hashing can be more space-efficient than other data structures, as it only requires a fixed amount of memory to store the hash table.</li> </ul> <h2>Limitations of Hashing:</h2> <ul> <li>Hash Collisions: Hashing can produce the same hash value for different keys, leading to hash collisions. To handle collisions, we need to use collision resolution techniques like chaining or open addressing.</li> <li>Hash Function Quality: The quality of the hash function determines the efficiency of the hashing algorithm. A poor-quality hash function can lead to more collisions, reducing the performance of the hashing algorithm.</li> </ul> <h2>Conclusion:</h2> <p>In conclusion, hashing is a widely used technique in a data structure that provides efficient access to data. It involves mapping a large amount of data to a smaller hash table using a hash function, which reduces the amount of time needed to search for specific data elements. The hash function ensures that data is stored in a unique location within the hash table and allows for easy retrieval of data when needed.</p> <p>Hashing has several advantages over other data structure techniques, such as faster retrieval times, efficient use of memory, and reduced collisions due to the use of a good hash function. However, it also has some limitations, including the possibility of hash collisions and the need for a good hash function that can distribute data evenly across the hash table.</p> <p>Overall, hashing is a powerful technique that is used in many applications, including database indexing, spell-checking, and password storage. By using a good hash function and implementing appropriate collision resolution techniques, developers can optimize the performance of their applications and provide users with a seamless experience.</p> <hr></size;>