ruby - Regex error: too many multibyte code ranges are specified


I have a regex that needs to match a bunch of characters. The code has no problem in Ruby 1.8.7, but in 1.9 it keels over. I'm guessing it has to do with encoding. I've done a chunk of Google searches; maybe someone can enlighten me.

Code:

# encoding: utf-8
non_latin_hashtag_chars = [
  (0xa960..0xa97f).to_a, # Hangul Jamo Extended-A
  (0xac00..0xd7af).to_a, # Hangul Syllables
  (0xd7b0..0xd7ff).to_a  # Hangul Jamo Extended-B
].flatten.pack('U*').freeze  # 'U*' packs each code point as a UTF-8 character

e = /[a-z_#{non_latin_hashtag_chars}]/io

Error:

~/desktop: ruby regex_test.rb
regex_test.rb:13:in `<main>': too many multibyte code ranges are specified: /[a-z_가각갂갃간갅갆갇갈갉갊갋갌갍갎갏감갑값갓갔강갖갗갘같갚갛개객갞갟갠갡갢갣갤갥갦갧갨갩갪갫갬갭갮갯갰갱갲갳갴갵갶갷갸갹갺갻갼갽갾갿걀걁걂걃걄걅걆걇걈걉걊걋걌걍......

As twehad points out, there is a 10k limit in Regexp.
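A quick count shows why the interpolated version runs into that limit. This is an illustrative check, not part of the original code: it reuses the question's three ranges (the constant name `HANGUL_RANGES` is made up here) and counts how many individual characters they expand to before being pasted into the character class.

```ruby
# Illustrative check (hypothetical constant name): how many characters the
# question's ranges expand to when every code point is interpolated.
HANGUL_RANGES = [
  0xa960..0xa97f, # Hangul Jamo Extended-A
  0xac00..0xd7af, # Hangul Syllables
  0xd7b0..0xd7ff  # Hangul Jamo Extended-B
].freeze

# Range#size gives the number of integers in each inclusive range.
total = HANGUL_RANGES.sum { |r| r.size }
puts total # prints 11296
```

That is 32 + 11,184 + 80 = 11,296 characters, which is above the 10k figure.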

In any case, you should use Unicode ranges within the regexp:

/[a-z_\ua960-\ua97f\uac00-\ud7af\ud7b0-\ud7ff]/io 
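If you want to keep the question's table of code point blocks, the `\uXXXX-\uXXXX` pairs in the regexp above can be generated from it instead of typed by hand. This is a minimal sketch; the helper `unicode_class_body` and the constant names are hypothetical, and it assumes a Ruby where `\u` escapes in an interpolated regexp source are honored (1.9 and later).

```ruby
# Hypothetical helper: turn integer code point ranges into "\uXXXX-\uXXXX"
# pairs, so the regexp holds a few ranges instead of thousands of characters.
HANGUL_RANGES = [
  0xa960..0xa97f, # Hangul Jamo Extended-A
  0xac00..0xd7af, # Hangul Syllables
  0xd7b0..0xd7ff  # Hangul Jamo Extended-B
].freeze

def unicode_class_body(ranges)
  # Single quotes keep '\u' as a literal backslash + "u" for the regexp engine.
  ranges.map { |r| format('\u%04x-\u%04x', r.first, r.last) }.join
end

HASHTAG_CHAR = /[a-z_#{unicode_class_body(HANGUL_RANGES)}]/i

p HASHTAG_CHAR.source
p "\uAC00" =~ HASHTAG_CHAR # => 0
```

The resulting class contains three ranges plus `a-z_`, so it should stay far below the limit regardless of how many characters each block spans.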

I'm not an expert in Korean, so I don't know if it's equivalent, but if you want to match Hangul characters, you should use the Hangul character property instead:

/[a-z_\p{Hangul}]/io
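To see how the two approaches differ, the sketch below tests a few sample characters against both. The constant names and samples are chosen for illustration. `\p{Hangul}` matches by Unicode script, so it also accepts the older Hangul Jamo (U+1100 block) and Hangul Compatibility Jamo (U+3130 block), which the three explicit ranges leave out.

```ruby
# Compare the explicit block ranges with the \p{Hangul} script property.
BY_RANGES   = /\A[\ua960-\ua97f\uac00-\ud7af\ud7b0-\ud7ff]\z/
BY_PROPERTY = /\A\p{Hangul}\z/

# Sample characters (labels are descriptive, chosen for this example).
samples = {
  'Hangul syllable GA (U+AC00)'        => "\uAC00",
  'Jamo choseong KIYEOK (U+1100)'      => "\u1100",
  'Compatibility Jamo KIYEOK (U+3131)' => "\u3131"
}

samples.each do |label, ch|
  puts format('%-36s ranges=%-5s property=%s',
              label, BY_RANGES.match?(ch), BY_PROPERTY.match?(ch))
end
```

Whether the broader property is what you want depends on which Hangul input your hashtags should accept.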
