完美世界辰东,欢乐颂第二季,玄幻小说排行榜

主頁(yè) > 知識(shí)庫(kù) > Trie樹(shù)_字典樹(shù)(字符串排序)簡(jiǎn)介及實(shí)現(xiàn)

Trie樹(shù)_字典樹(shù)(字符串排序)簡(jiǎn)介及實(shí)現(xiàn)

1.綜述

又稱(chēng)單詞查找樹(shù)，Trie樹(shù)，是一種樹(shù)形結(jié)構(gòu)，是一種哈希樹(shù)的變種。典型應(yīng)用是用于統(tǒng)計(jì)，排序和保存大量的字符串（但不僅限于字符串），所以經(jīng)常被搜索引擎系統(tǒng)用于文本詞頻統(tǒng)計(jì)。
它的優(yōu)點(diǎn)是：利用字符串的公共前綴來(lái)節(jié)約存儲(chǔ)空間，最大限度地減少無(wú)謂的字符串比較，查詢(xún)效率比哈希表高。

Trie樹(shù)結(jié)構(gòu)的優(yōu)點(diǎn)在于：
1）不限制子節(jié)點(diǎn)的數(shù)量；
2）自定義的輸入序列化，突破了具體語(yǔ)言、應(yīng)用的限制，成為一個(gè)通用的框架；
3）可以進(jìn)行最大Tokens序列長(zhǎng)度的限制；
4）根據(jù)已定閾值輸出重復(fù)的字符串；
5）提供單個(gè)字符串頻度查找功能；
6）速度快，在兩分鐘內(nèi)完成1998年1月份人民日?qǐng)?bào)（19056行）的重復(fù)字符串抽取工作。

2.性質(zhì)

它有3個(gè)基本性質(zhì)：
1)     根節(jié)點(diǎn)不包含字符，除根節(jié)點(diǎn)外每一個(gè)節(jié)點(diǎn)都只包含一個(gè)字符。
2)     從根節(jié)點(diǎn)到某一節(jié)點(diǎn)，路徑上經(jīng)過(guò)的字符連接起來(lái)，為該節(jié)點(diǎn)對(duì)應(yīng)的字符串。
3)     每個(gè)節(jié)點(diǎn)的所有子節(jié)點(diǎn)包含的字符都不相同。

3.基本操作

其基本操作有:查找、插入和刪除,當(dāng)然刪除操作比較少見(jiàn).我在這里只是實(shí)現(xiàn)了對(duì)整個(gè)樹(shù)的刪除操作,至于單個(gè)word的刪除操作也很簡(jiǎn)單.

4.實(shí)現(xiàn)方法

搜索字典項(xiàng)目的方法為：
　　(1) 從根結(jié)點(diǎn)開(kāi)始一次搜索；
　　(2) 取得要查找關(guān)鍵詞的第一個(gè)字母，并根據(jù)該字母選擇對(duì)應(yīng)的子樹(shù)并轉(zhuǎn)到該子樹(shù)繼續(xù)進(jìn)行檢索；
　　(3) 在相應(yīng)的子樹(shù)上，取得要查找關(guān)鍵詞的第二個(gè)字母,并進(jìn)一步選擇對(duì)應(yīng)的子樹(shù)進(jìn)行檢索。
　　(4) 迭代過(guò)程……
(5) 在某個(gè)結(jié)點(diǎn)處，關(guān)鍵詞的所有字母已被取出，則讀取附在該結(jié)點(diǎn)上的信息，即完成查找。
其他操作類(lèi)似處理
5. Trie原理——Trie的核心思想是空間換時(shí)間。利用字符串的公共前綴來(lái)降低查詢(xún)時(shí)間的開(kāi)銷(xiāo)以達(dá)到提高效率的目的。

6.代碼實(shí)現(xiàn)

復(fù)制代碼代碼如下:

const int branchNum = 26; //聲明常量
int i;

struct Trie_node
{
       boolisStr;                //記錄此處是否構(gòu)成一個(gè)串。
       Trie_node*next[branchNum];//指向各個(gè)子樹(shù)的指針,下標(biāo)0-25代表26字符
       Trie_node():isStr(false)
       {
              memset(next,NULL,sizeof(next));
       }
};

class Trie
{
 public:
     Trie();
     voidinsert(const char* word);
     boolsearch(char* word);
     voiddeleteTrie(Trie_node *root);
       // voidprintTrie(Trie_node *root);   //new add

private:
    Trie_node* root;
 };

Trie::Trie()
{
     root =new Trie_node();
}

void Trie::insert(const char* word)
 {
    Trie_node*location = root;
   while(*word)
     {
       if(location->next[*word-'a'] == NULL)//不存在則建立
         {
           Trie_node *tmp = new Trie_node();
           location->next[*word-'a'] = tmp;
        }   
       location = location->next[*word-'a']; //每插入一步，相當(dāng)于有一個(gè)新串經(jīng)過(guò)，指針要向下移動(dòng)
       word++;
    }
   location->isStr = true; //到達(dá)尾部,標(biāo)記一個(gè)串
 }

bool Trie::search(char *word)
{
       Trie_node*location = root;
       while(*word location)
       {
              location= location->next[*word-'a'];
              word++;
       }
       return(location!=NULL  location->isStr);
 }

void Trie::deleteTrie(Trie_node *root)
{
       for(i =0; i  branchNum; i++)
       {
              if(root->next[i]!= NULL)
              {
                     deleteTrie(root->next[i]);
              }
       }
       deleteroot;
}

void main() //簡(jiǎn)單測(cè)試
{
       Trie t;
       t.insert("a");        
       t.insert("abandon");

       char * c= "abandoned";
       t.insert(c);
       t.insert("abashed");

       if(t.search("abashed"))
       {
          printf("true\n");  //已經(jīng)插入了
       }
}

有時(shí)，我們會(huì)碰到對(duì)字符串的排序，若采用一些經(jīng)典的排序算法，則時(shí)間復(fù)雜度一般為O(n*lgn)，但若采用Trie樹(shù)，則時(shí)間復(fù)雜度僅為O(n)。

Trie樹(shù)又名字典樹(shù)，從字面意思即可理解，這種樹(shù)的結(jié)構(gòu)像英文字典一樣，相鄰的單詞一般前綴相同，之所以時(shí)間復(fù)雜度低，是因?yàn)槠洳捎昧艘钥臻g換取時(shí)間的策略。

下圖為一個(gè)針對(duì)字符串排序的Trie樹(shù)(我們假設(shè)在這里字符串都是小寫(xiě)字母)，每個(gè)結(jié)點(diǎn)有26個(gè)分支，每個(gè)分支代表一個(gè)字母，結(jié)點(diǎn)存放的是從root節(jié)點(diǎn)到達(dá)此結(jié)點(diǎn)的路經(jīng)上的字符組成的字符串。

將每個(gè)字符串插入到trie樹(shù)中，到達(dá)特定的結(jié)尾節(jié)點(diǎn)時(shí)，在這個(gè)節(jié)點(diǎn)上進(jìn)行標(biāo)記，如插入"afb"，第一個(gè)字母為a，沿著a往下，然后第二個(gè)字母為f，沿著f往下，第三個(gè)為b，沿著b往下，由于字符串最后一個(gè)字符為'\0'，因而結(jié)束，不再往下了，然后在這個(gè)節(jié)點(diǎn)上標(biāo)記afb.count++，即其個(gè)數(shù)增加1.

之后，通過(guò)前序遍歷此樹(shù)，即可得到字符串從小到大的順序。

實(shí)現(xiàn)代碼如下（g++、VC++都編譯通過(guò)）：

復(fù)制代碼代碼如下:

#include iostream>
#include string.h>
using namespace std;
#define NUM 26
class Node
{
public:
    int count; //記錄該處字符串個(gè)數(shù)
    Node* char_arr[NUM];  //分支
    char* current_str;   //記錄到達(dá)此處的路徑上的所有字母組成的字符串
    Node();
};
class Trie
{
public:
    Node* root;
    Trie();
    void insert(char* str);
    void output(Node* node, char** str, int count);
};
//程序未考慮delete動(dòng)態(tài)內(nèi)存
int main()
{
    char** str = new char*[12];
    str[0] = "zbdfasd";
    str[1] = "zbcfd";
    str[2] = "zbcdfdasfasf";
    str[3] = "abcdaf";
    str[4] = "defdasfa";
    str[5] = "fedfasfd";
    str[6] = "dfdfsa";
    str[7] = "dadfd";
    str[8] = "dfdfasf";
    str[9] = "abcfdfa";
    str[10] = "fbcdfd";
    str[11] = "abcdaf";
    //建立trie樹(shù)
    Trie* trie = new Trie();
    for(int i = 0; i  12; i++)
        trie->insert(str[i]);
    int count = 0;
    trie->output(trie->root, str, count);
    for(int i = 0; i  12; i++)
        coutstr[i]endl;
    return 0;
}
Node::Node()
{
    count = 0;
    for(int i = 0; i  NUM; i++)
        char_arr[i] = NULL;
    current_str = new char[100];
    current_str[0] = '\0';
}
Trie::Trie()
{
    root = new Node();
}
void Trie::insert(char* str)
{
    int i = 0;
    Node* parent = root;
    //將str[i]插入到trie樹(shù)中
    while(str[i] != '\0')
    {
        //如果包含str[i]的分支存在，則新建此分支
        if(parent->char_arr[str[i] - 'a'] == NULL)
        {
            parent->char_arr[str[i] - 'a'] = new Node();
            //將父節(jié)點(diǎn)中的字符串添加到當(dāng)前節(jié)點(diǎn)的字符串中
            strcat(parent->char_arr[str[i] - 'a']->current_str, parent->current_str);
            char str_tmp[2];
            str_tmp[0] = str[i];
            str_tmp[1] = '\0';
            //將str[i]添加到當(dāng)前節(jié)點(diǎn)的字符串中
            strcat(parent->char_arr[str[i] - 'a']->current_str, str_tmp);
            parent = parent->char_arr[str[i] - 'a'];
        }
        else
        {
            parent = parent->char_arr[str[i] - 'a'];
        }
        i++;
    }
    parent->count++;
}
//采用前序遍歷
void Trie::output(Node* node, char** str, int count)
{
    if(node != NULL)
    {
        if(node->count != 0)
        {
            for(int i = 0; i  node->count; i++)
                str[count++] = node->current_str;
        }
        for(int i = 0; i  NUM; i++)
        {
            output(node->char_arr[i], str, count);
        }
    }
}

您可能感興趣的文章:

Java中實(shí)現(xiàn)雙數(shù)組Trie樹(shù)實(shí)例
Python Trie樹(shù)實(shí)現(xiàn)字典排序
C# TrieTree介紹及實(shí)現(xiàn)方法
TrieTree服務(wù)-組件構(gòu)成及其作用介紹
詳解字典樹(shù)Trie結(jié)構(gòu)及其Python代碼實(shí)現(xiàn)
Trie樹(shù)(字典樹(shù))的介紹及Java實(shí)現(xiàn)

標(biāo)簽：桂林南昌鹽城景德鎮(zhèn) 東莞文山黔南宣城

巨人網(wǎng)絡(luò)通訊聲明：本文標(biāo)題《Trie樹(shù)_字典樹(shù)(字符串排序)簡(jiǎn)介及實(shí)現(xiàn)》，本文關(guān)鍵詞 Trie,樹(shù),字典,字符串,排序,；如發(fā)現(xiàn)本文內(nèi)容存在版權(quán)問(wèn)題，煩請(qǐng)?zhí)峁┫嚓P(guān)信息告之我們，我們將及時(shí)溝通與處理。本站內(nèi)容系統(tǒng)采集于網(wǎng)絡(luò)，涉及言論、版權(quán)與本站無(wú)關(guān)。