KCodecs 5.109.0
KEncodingProber

Provides encoding detection (probing) capabilities.

#include <KEncodingProber>

Public Types

enum  ProberState { FoundIt , NotMe , Probing }
 
enum  ProberType {
  None , Universal , Arabic , Baltic ,
  CentralEuropean , ChineseSimplified , ChineseTraditional , Cyrillic ,
  Greek , Hebrew , Japanese , Korean ,
  NorthernSaami , Other , SouthEasternEurope , Thai ,
  Turkish , Unicode , WesternEuropean
}
 

Public Member Functions

 KEncodingProber (const KEncodingProber &)=delete
 
 KEncodingProber (ProberType proberType=Universal)
 The default ProberType is Universal (detects all supported encodings).
 
float confidence () const
 
QByteArray encoding () const
 
ProberState feed (const char *data, int len)
 
ProberState feed (const QByteArray &data)
 The main class method.
 
KEncodingProber & operator= (const KEncodingProber &)=delete
 
ProberType proberType () const
 
void reset ()
 Resets the prober's internal state and data.
 
void setProberType (ProberType proberType)
 Changes the prober's ProberType and resets the prober.
 
ProberState state () const
 

Static Public Member Functions

static QString nameForProberType (ProberType proberType)
 Maps a ProberType to its language string.
 
static ProberType proberTypeForName (const QString &lang)
 

Detailed Description

Provides encoding detection (probing) capabilities.

Probes the encoding of raw data only. If it cannot determine the encoding with certainty, it returns the most likely encoding it has guessed.

A Unicode probe is always performed, regardless of the ProberType.

Feed data to it several times with feed() until the ProberState changes to FoundIt or NotMe, or until confidence() returns a value you find acceptable.

Intended lifetime of the object: one instance per ProberType.

Typical use:

QByteArray data, moredata;
...
KEncodingProber prober(KEncodingProber::ChineseSimplified);
prober.feed(data);
prober.feed(moredata);
QByteArray encoding;
if (prober.confidence() > 0.6)
    encoding = prober.encoding();

At least 256 characters are needed for the ProberState to change from Probing to FoundIt. If you have fewer characters to probe, decide for yourself whether to accept the encoding guessed so far, based on the value returned by confidence().

Guess encoding of char array

Member Enumeration Documentation

◆ ProberState

Enumerator
FoundIt 

The encoding has been identified with certainty.

NotMe 

The data is certainly not in any of the encodings supported by the current ProberType.

Probing 

Need more data to make a decision.

Constructor & Destructor Documentation

◆ KEncodingProber()

KEncodingProber::KEncodingProber ( ProberType  proberType = Universal)

The default ProberType is Universal (detects all supported encodings).

Member Function Documentation

◆ confidence()

float KEncodingProber::confidence ( ) const
Returns
the confidence (certainty) of the encoding guessed so far, from 0.0 to 0.99; not very reliable for single-byte encodings
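
One way to act on the confidence value is to accept the guessed encoding only above a threshold and otherwise use a caller-supplied default. The helper below is a minimal sketch, not part of KCodecs: the name encodingOrFallback and its parameters are hypothetical, and it is written as a template so that it relies only on the confidence() and encoding() members documented here.

```cpp
#include <cassert>

// Hypothetical helper (not part of KCodecs).
// Returns prober.encoding() when the prober's confidence reaches
// minConfidence, and `fallback` otherwise.
// Assumption: `Prober` offers confidence() and encoding() the way
// KEncodingProber does; `Name` would then be QByteArray.
template <typename Prober, typename Name>
Name encodingOrFallback(const Prober &prober, float minConfidence, Name fallback)
{
    // confidence() is documented to stay within 0.0 .. 0.99.
    assert(minConfidence >= 0.0f && minConfidence < 1.0f);
    if (prober.confidence() >= minConfidence) {
        return prober.encoding();
    }
    return fallback;
}
```

With a KEncodingProber that has already been fed, a call might look like encodingOrFallback(prober, 0.6f, QByteArray("UTF-8")). The 0.6 threshold mirrors the typical-use snippet and is a judgment call, especially for single-byte encodings.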

◆ encoding()

QByteArray KEncodingProber::encoding ( ) const
Returns
a QByteArray with the name of the best encoding it has guessed so far
Since
4.2.2

◆ feed()

ProberState KEncodingProber::feed ( const QByteArray &  data)

The main class method.

Feeds data to the prober.

Returns
the ProberState after probing the fed data.
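
Because feed() reports the state after each call, a large buffer can be fed in pieces and abandoned as soon as the prober leaves the Probing state. The following is a rough sketch under stated assumptions: feedInChunks and its parameters are hypothetical names, and the template only assumes the feed(const char *, int) overload listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of KCodecs).
// Feeds `len` bytes to `prober` in chunks of at most `chunkSize` bytes and
// stops as soon as the returned state differs from `probing`.
// Assumption: `Prober` offers feed(const char *, int) like KEncodingProber;
// `State` would then be KEncodingProber::ProberState.
template <typename Prober, typename State>
State feedInChunks(Prober &prober, const char *data, std::size_t len,
                   std::size_t chunkSize, State probing)
{
    assert(chunkSize > 0);
    State state = probing;
    for (std::size_t offset = 0; offset < len && state == probing; offset += chunkSize) {
        // The last chunk may be shorter than chunkSize.
        const std::size_t n = std::min(chunkSize, len - offset);
        state = prober.feed(data + offset, static_cast<int>(n));
    }
    return state;
}
```

For a QByteArray named bytes, a call could be feedInChunks(prober, bytes.constData(), bytes.size(), 4096, KEncodingProber::Probing). The chunk size of 4096 is arbitrary; it must fit in an int because of the cast.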

◆ nameForProberType()

static QString KEncodingProber::nameForProberType ( ProberType  proberType)

Maps a ProberType to its language string.

◆ proberTypeForName()

static ProberType KEncodingProber::proberTypeForName ( const QString &  lang)
Returns
the ProberType for lang (e.g. proberTypeForName("Chinese Simplified") will return KEncodingProber::ChineseSimplified)

◆ reset()

void KEncodingProber::reset ( )

Resets the prober's internal state and data.
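
Since the intended lifetime is one instance per ProberType, a single prober can be reused for several independent inputs by calling reset() before each one. The sketch below illustrates that pattern; probeEach, Buffers and Visit are hypothetical names, and the template assumes only reset() and a feed() overload that accepts one element of the container.

```cpp
// Hypothetical helper (not part of KCodecs).
// Probes every buffer with the same prober, resetting it before each one
// so that earlier inputs do not influence later guesses.
// Assumption: `Prober` offers reset() and feed(buffer) like KEncodingProber;
// `Buffers` could be a QList<QByteArray>.
template <typename Prober, typename Buffers, typename Visit>
void probeEach(Prober &prober, const Buffers &buffers, Visit visit)
{
    for (const auto &buffer : buffers) {
        prober.reset();         // discard state left over from the previous buffer
        prober.feed(buffer);    // probe this buffer in one go
        visit(buffer, prober);  // caller inspects state(), confidence(), encoding()
    }
}
```

The visit callback is where a caller would read prober.encoding() or prober.confidence() for each buffer before the next reset() clears them.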

◆ setProberType()

void KEncodingProber::setProberType ( ProberType  proberType)

Changes the prober's ProberType and resets the prober.

◆ state()

ProberState KEncodingProber::state ( ) const
Returns
the prober's current ProberState