class String
A String object has an arbitrary sequence of bytes, typically representing text or binary data. A String object may be created using String::new or as literals.
String objects differ from Symbol objects in that Symbol objects are designed to be used as identifiers, instead of text or data.
You can create a String object explicitly with
You can convert certain objects to Strings with:
- Method String.
Some String methods modify self. Typically, a method whose name ends with ! modifies self and returns self; often, a similarly named method (without the !) returns a new string.
In general, if both bang and non-bang versions of a method exist, the bang method mutates and the non-bang method does not. However, a method without a bang can also mutate, such as String#replace.
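The bang and non-bang behavior described above can be sketched as follows. This is a minimal illustration; the variable names are assumptions made for the example, and `dup` is used only to guarantee an unfrozen string.

```ruby
s = 'hello'.dup            # dup gives an unfrozen string to mutate

upper = s.upcase           # non-bang: returns a new string
unchanged = s.dup          # s still holds "hello" at this point

result = s.upcase!         # bang: modifies s in place and returns s
same_object = result.equal?(s)

no_change = s.upcase!      # already upcased, so returns nil

s.replace('goodbye')       # no bang in the name, yet modifies s in place
```

Because a bang method may return nil when nothing changes, its return value is usually not a safe thing to chain on.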
Substitution Methods
These methods perform substitutions:
- String#sub: One substitution (or none); returns a new string.
- String#sub!: One substitution (or none); returns self if any changes, nil otherwise.
- String#gsub: Zero or more substitutions; returns a new string.
- String#gsub!: Zero or more substitutions; returns self if any changes, nil otherwise.
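Each of these methods takes:
- A first argument, pattern (String or Regexp), that specifies the substring(s) to be replaced.
- Either of the following:
  - A second argument, replacement (String or Hash), that determines the replacing string.
  - A block that determines the replacing string.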
The examples in this section mostly use the String#sub and String#gsub methods; the principles illustrated apply to all four substitution methods.
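As a brief sketch of how the four methods differ in what they return (variable names are chosen for illustration only):

```ruby
s = 'hello'.dup

first_only = s.sub(/l/, 'L')     # new string "heLlo"; s is unchanged
all_of_them = s.gsub(/l/, 'L')   # new string "heLLo"; s is unchanged

miss = s.sub!(/x/, 'X')          # no match: returns nil, s is unchanged
hit = s.gsub!(/l/, 'L')          # match: modifies s and returns s

# Because the bang forms can return nil, chaining on them is fragile:
#   s.sub!(/x/, 'X').upcase      # NoMethodError when nothing matched
```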
Argument pattern
Argument pattern is commonly a regular expression:
s = 'hello'
s.sub(/[aeiou]/, '*') # => "h*llo"
s.gsub(/[aeiou]/, '*') # => "h*ll*"
s.gsub(/[aeiou]/, '') # => "hll"
s.sub(/ell/, 'al') # => "halo"
s.gsub(/xyzzy/, '*') # => "hello"
'THX1138'.gsub(/\d+/, '00') # => "THX00"
When pattern is a string, all its characters are treated as ordinary characters (not as Regexp special characters)
'THX1138'.gsub('\d+', '00') # => "THX1138"
String replacement
If replacement is a string, that string determines the replacing string that is substituted for the matched text.
Each of the examples above uses a simple string as the replacing string.
String replacement may contain back-references to the pattern’s captures:
- \n (n is a non-negative integer) refers to $n.
- \k<name> refers to the named capture name.
See Regexp for details.
Note that within the string replacement, a character combination such as $& is treated as ordinary text, not as a special match variable. However, you may refer to some special match variables using these combinations:
- \& and \0 correspond to $&, which contains the complete matched text.
- \' corresponds to $', which contains the string after the match.
- \` corresponds to $`, which contains the string before the match.
- \+ corresponds to $+, which contains the last capture group.
See Regexp for details.
Note that \\ is interpreted as an escape, i.e., a single backslash.
Note also that a string literal consumes backslashes. See String Literals for details about string literals.
A back-reference is typically preceded by an additional backslash. For example, if you want to write a back-reference \& in replacement with a double-quoted string literal, you need to write "..\\&..".
If you want to write a non-back-reference string \& in replacement, you need to first escape the backslash to prevent this method from interpreting it as a back-reference, and then you need to escape the backslashes again to prevent a string literal from consuming them: "..\\\\&..".
You may want to use the block form to avoid excessive backslashes.
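The escaping rules above are easy to get wrong, so here is a small sketch that puts the variants side by side. The sample strings and variable names are illustrative assumptions.

```ruby
s = 'hello'

# Back-reference \& written in a double-quoted literal needs "\\&".
wrapped_dq = s.gsub(/l/, "<\\&>")

# In a single-quoted literal, '\&' already is a backslash followed by &.
wrapped_sq = s.gsub(/l/, '<\&>')

# A literal backslash-ampersand in the result needs four backslashes
# in a double-quoted literal.
literal = s.gsub(/l/, "\\\\&")

# The block's return value is used as-is; it is not scanned for back-references.
from_block = s.gsub(/l/) { '\&' }

# Named captures are referenced with \k<name>.
swapped = 'John Smith'.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')
```

Here `wrapped_dq` and `wrapped_sq` are both "he<l><l>o", while `literal` and `from_block` both contain a real backslash before each ampersand.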
Hash replacement
If the argument replacement is a hash, and pattern matches one of its keys, the replacing string is the value for that key:
h = {'foo' => 'bar', 'baz' => 'bat'}
'food'.sub('foo', h) # => "bard"
Note that a symbol key does not match:
h = {foo: 'bar', baz: 'bat'}
'food'.sub('foo', h) # => "d"
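A hash replacement is most useful with gsub and an alternation pattern. The following sketch (the hash contents are made up for illustration) also shows that a matched text with no corresponding key is replaced by an empty string:

```ruby
names = { 'cat' => 'feline', 'dog' => 'canine' }

translated = 'cat and dog'.gsub(/cat|dog/, names)

# 'cow' matches the pattern but is not a key, so it is replaced by "".
partial = 'cat and cow'.gsub(/cat|cow/, names)
```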
Block
In the block form, the current match string is passed to the block; the block’s return value becomes the replacing string:
s = '@'
'1234'.gsub(/\d/) { |match| s.succ! } # => "ABCD"
Special match variables such as $1, $2, $`, $&, and $' are set appropriately.
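As a small sketch of using the special match variables inside the block (the input string is an assumption made for the example):

```ruby
prices = 'apple: 3, pear: 12'

# $1 and $2 hold the capture groups of the current match.
doubled = prices.gsub(/(\w+): (\d+)/) { "#{$1}=#{$2.to_i * 2}" }

# $& holds the complete matched text.
titled = 'hello world'.gsub(/\w+/) { $&.capitalize }
```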
Whitespace in Strings
In the class String, whitespace is defined as a contiguous sequence of characters consisting of any mixture of the following:
- NUL (null): "\x00", "\u0000".
- HT (horizontal tab): "\x09", "\t".
- LF (line feed): "\x0a", "\n".
- VT (vertical tab): "\x0b", "\v".
- FF (form feed): "\x0c", "\f".
- CR (carriage return): "\x0d", "\r".
- SP (space): "\x20", " ".
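For instance, the stripping methods treat any mixture of these characters as whitespace. A minimal sketch, with a sample string chosen for illustration:

```ruby
s = "\t\n\v\f\r abc \t\r\n"

both = s.strip     # leading and trailing whitespace removed
left = s.lstrip    # only leading whitespace removed
right = s.rstrip   # only trailing whitespace removed

# Whitespace inside the string is left alone.
inner = "  a b  ".strip
```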
Whitespace is relevant for the following methods
What’s Here
First, what’s elsewhere. Class String:
- Inherits from the Object class.
- Includes the Comparable module.
Here, class String provides methods that are useful for
Creating a String
- ::new: Returns a new string.
- ::try_convert: Returns a new string created from a given object.
Freezing/Unfreezing
- +@: Returns a string that is not frozen: self if not frozen; self.dup otherwise.
- -@ (aliased as dedup): Returns a string that is frozen: self if already frozen; self.freeze otherwise.
- freeze: Freezes self if not already frozen; returns self.
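A short sketch of the three methods (variable names are illustrative; `dup` is used so the example does not depend on how string literals are frozen in a given Ruby version):

```ruby
frozen = 'foo'.freeze
thawed = +frozen                  # frozen receiver: returns an unfrozen copy
thawed_is_copy = !thawed.equal?(frozen)

mutable = 'bar'.dup
same = (+mutable).equal?(mutable) # unfrozen receiver: returns the receiver itself

deduped = -mutable                # returns a frozen string with the same content
```

Note that `-mutable` leaves `mutable` itself unfrozen; only the returned string is frozen.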
Querying
Counts
- bytesize: Returns the count of bytes.
- count: Returns the count of characters that belong to the character sets specified by the given strings.
- empty?: Returns whether the length of self is zero.
- length (aliased as size): Returns the count of characters (not bytes).
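The difference between characters and bytes shows up with multibyte text. A minimal sketch, assuming the source file is UTF-8 (the Ruby default):

```ruby
s = 'こんにちは'

chars = s.length       # number of characters
bytes = s.bytesize     # number of bytes; each of these characters takes 3 bytes in UTF-8

# count treats its argument as a set of characters, not as a substring.
l_or_o = 'hello world'.count('lo')

blank = ''.empty?
```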
Substrings
- =~: Returns the index of the first substring that matches a given Regexp or other object; returns nil if no match is found.
- byteindex: Returns the byte index of the first occurrence of a given substring.
- byterindex: Returns the byte index of the last occurrence of a given substring.
- index: Returns the index of the first occurrence of a given substring; returns nil if none found.
- rindex: Returns the index of the last occurrence of a given substring; returns nil if none found.
- include?: Returns true if the string contains a given substring; false otherwise.
- match: Returns a MatchData object if the string matches a given Regexp; nil otherwise.
- match?: Returns true if the string matches a given Regexp; false otherwise.
- start_with?: Returns true if the string begins with any of the given substrings.
- end_with?: Returns true if the string ends with any of the given substrings.
Encodings
- encoding: Returns the Encoding object that represents the encoding of the string.
- unicode_normalized?: Returns true if the string is in Unicode normalized form; false otherwise.
- valid_encoding?: Returns true if the string contains only characters that are valid for its encoding.
- ascii_only?: Returns true if the string has only ASCII characters; false otherwise.
Other
- sum: Returns a basic checksum for the string: the sum of each byte.
- hash: Returns the integer hash code.
Comparing
- == (aliased as ===): Returns true if a given other string has the same content as self.
- eql?: Returns true if the content is the same as the given other string.
- <=>: Returns -1, 0, or 1 as self is smaller than, equal to, or larger than a given other string.
- casecmp: Ignoring case, returns -1, 0, or 1 as self is smaller than, equal to, or larger than a given other string.
- casecmp?: Ignoring case, returns whether a given other string is equal to self.
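A brief sketch of the comparison methods side by side (the sample strings are arbitrary):

```ruby
order = 'a' <=> 'b'                     # self is smaller
exact = 'foo' == 'FOO'                  # content differs by case
folded = 'foo'.casecmp('FOO')           # equal when case is ignored
folded_eq = 'foo'.casecmp?('FOO')
same_content = 'foo'.eql?('foo')
```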
Modifying
Each of these methods modifies self.
Insertion
- insert: Returns self with a given string inserted at a specified offset.
- <<: Returns self concatenated with a given string or integer.
- append_as_bytes: Returns self concatenated with strings without performing any encoding validation or conversion.
- prepend: Prefixes to self the concatenation of given other strings.
Substitution
- bytesplice: Replaces bytes of self with bytes from a given string; returns self.
- sub!: Replaces the first substring that matches a given pattern with a given replacement string; returns self if any changes, nil otherwise.
- gsub!: Replaces each substring that matches a given pattern with a given replacement string; returns self if any changes, nil otherwise.
- succ! (aliased as next!): Returns self modified to become its own successor.
- replace: Returns self with its entire content replaced by a given string.
- reverse!: Returns self with its characters in reverse order.
- setbyte: Sets the byte at a given integer offset to a given value; returns the argument.
- tr!: Replaces specified characters in self with specified replacement characters; returns self if any changes, nil otherwise.
- tr_s!: Replaces specified characters in self with specified replacement characters, removing duplicates from the substrings that were modified; returns self if any changes, nil otherwise.
Casing
- capitalize!: Upcases the initial character and downcases all others; returns self if any changes, nil otherwise.
- downcase!: Downcases all characters; returns self if any changes, nil otherwise.
- upcase!: Upcases all characters; returns self if any changes, nil otherwise.
- swapcase!: Upcases each downcase character and downcases each upcase character; returns self if any changes, nil otherwise.
Encoding
- encode!: Returns self with all characters transcoded from one encoding to another.
- unicode_normalize!: Unicode-normalizes self; returns self.
- scrub!: Replaces each invalid byte with a given character; returns self.
- force_encoding: Changes the encoding to a given encoding; returns self.
Deletion
- clear: Removes all content, so that self is empty; returns self.
- slice!, []=: Removes a substring determined by a given index, start/length, range, regexp, or substring.
- squeeze!: Removes contiguous duplicate characters; returns self if any changes, nil otherwise.
- delete!: Removes characters as determined by the intersection of substring arguments.
- delete_prefix!: Removes leading prefix; returns self if any changes, nil otherwise.
- delete_suffix!: Removes trailing suffix; returns self if any changes, nil otherwise.
- lstrip!: Removes leading whitespace; returns self if any changes, nil otherwise.
- rstrip!: Removes trailing whitespace; returns self if any changes, nil otherwise.
- strip!: Removes leading and trailing whitespace; returns self if any changes, nil otherwise.
- chomp!: Removes the trailing record separator, if found; returns self if any changes, nil otherwise.
- chop!: Removes trailing newline characters if found; otherwise removes the last character; returns self if any changes, nil otherwise.
Converting to New String
Each of these methods returns a new String based on self, often just a modified copy of self.
Extension
- *: Returns the concatenation of multiple copies of self.
- +: Returns the concatenation of self and a given other string.
- center: Returns a copy of self, centered by specified padding.
- concat: Returns the concatenation of self with given other strings.
- ljust: Returns a copy of self of a given length, right-padded with a given other string.
- rjust: Returns a copy of self of a given length, left-padded with a given other string.
Encoding
- b: Returns a copy of self with ASCII-8BIT encoding.
- scrub: Returns a copy of self with each invalid byte replaced with a given character.
- unicode_normalize: Returns a copy of self with each character Unicode-normalized.
- encode: Returns a copy of self with all characters transcoded from one encoding to another.
Substitution
- dump: Returns a printable version of self, enclosed in double-quotes.
- undump: Inverse of dump; returns a copy of self with changes of the kinds made by dump “undone.”
- sub: Returns a copy of self with the first substring matching a given pattern replaced with a given replacement string.
- gsub: Returns a copy of self with each substring that matches a given pattern replaced with a given replacement string.
- succ (aliased as next): Returns the string that is the successor to self.
- reverse: Returns a copy of self with its characters in reverse order.
- tr: Returns a copy of self with specified characters replaced with specified replacement characters.
- tr_s: Returns a copy of self with specified characters replaced with specified replacement characters, removing duplicates from the substrings that were modified.
- %: Returns the string resulting from formatting a given object into self.
Casing
- capitalize: Returns a copy of self with the first character upcased and all other characters downcased.
- downcase: Returns a copy of self with all characters downcased.
- upcase: Returns a copy of self with all characters upcased.
- swapcase: Returns a copy of self with all upcase characters downcased and all downcase characters upcased.
Deletion
- delete: Returns a copy of self with characters removed.
- delete_prefix: Returns a copy of self with a given prefix removed.
- delete_suffix: Returns a copy of self with a given suffix removed.
- lstrip: Returns a copy of self with leading whitespace removed.
- rstrip: Returns a copy of self with trailing whitespace removed.
- strip: Returns a copy of self with leading and trailing whitespace removed.
- chomp: Returns a copy of self with a trailing record separator removed, if found.
- chop: Returns a copy of self with trailing newline characters or the last character removed.
- squeeze: Returns a copy of self with contiguous duplicate characters removed.
- [] (aliased as slice): Returns a substring determined by a given index, start/length, range, regexp, or string.
- byteslice: Returns a substring determined by a given index, start/length, or range.
- chr: Returns the first character.
Duplication
- to_s (aliased as to_str): If self is an instance of a subclass of String, returns self copied into a String; otherwise, returns self.
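The subclass case can be sketched like this; `Label` is a hypothetical subclass introduced only for the example:

```ruby
# Hypothetical subclass of String, for illustration only.
class Label < String
end

label = Label.new('foo')
plain = label.to_s          # a copy, as a plain String

s = 'bar'
same = s.to_s.equal?(s)     # a plain String returns itself
```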
Converting to Non-String
Each of these methods converts the contents of self to a non-String.
Characters, Bytes, and Clusters
- bytes: Returns an array of the bytes in self.
- chars: Returns an array of the characters in self.
- codepoints: Returns an array of the integer ordinals in self.
- getbyte: Returns the integer byte at the given index in self.
- grapheme_clusters: Returns an array of the grapheme clusters in self.
Splitting
- lines: Returns an array of the lines in self, as determined by a given record separator.
- partition: Returns a 3-element array determined by the first substring that matches a given substring or regexp.
- rpartition: Returns a 3-element array determined by the last substring that matches a given substring or regexp.
- split: Returns an array of substrings determined by a given delimiter (regexp or string) or, if a block is given, passes those substrings to the block.
Matching
- scan: Returns an array of substrings matching a given regexp or string, or, if a block is given, passes each matching substring to the block.
- unpack: Returns an array of substrings extracted from self according to a given format.
- unpack1: Returns the first substring extracted from self according to a given format.
Numerics
- hex: Returns the integer value of the leading characters, interpreted as hexadecimal digits.
- oct: Returns the integer value of the leading characters, interpreted as octal digits.
- ord: Returns the integer ordinal of the first character in self.
- to_c: Returns the complex value of leading characters, interpreted as a complex number.
- to_i: Returns the integer value of leading characters, interpreted as an integer.
- to_f: Returns the floating-point value of leading characters, interpreted as a floating-point number.
- to_r: Returns the rational value of leading characters, interpreted as a rational.
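Each of these methods reads only as many leading characters as it can interpret and ignores the rest. A short sketch with arbitrary sample strings:

```ruby
hex_value = '0x1a'.hex            # hexadecimal digits, optional 0x prefix
oct_value = '777'.oct             # octal digits
ordinal = 'a'.ord                 # ordinal of the first character

int_value = '12 apples'.to_i      # trailing text is ignored
float_value = '1.5e2 volts'.to_f
no_digits = 'abc'.to_i            # no leading digits gives 0

rational = '1/3'.to_r
complex = '1+2i'.to_c
```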
Strings and Symbols
- inspect: Returns a copy of self, enclosed in double quotes, with special characters escaped.
- intern (aliased as to_sym): Returns the symbol corresponding to self.
Iterating
- each_byte: Calls the given block with each successive byte in self.
- each_char: Calls the given block with each successive character in self.
- each_codepoint: Calls the given block with each successive integer codepoint in self.
- each_grapheme_cluster: Calls the given block with each successive grapheme cluster in self.
- each_line: Calls the given block with each successive line in self, as determined by a given record separator.
- upto: Calls the given block with each string value returned by successive calls to succ.
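A minimal sketch of a few of the iterators. Called without a block, each returns an enumerator, which is what allows `to_a` here; the sample strings are arbitrary.

```ruby
bytes = 'ab'.each_byte.to_a          # integer bytes
chars = 'ab'.each_char.to_a          # 1-character strings
lines = "x\ny\n".each_line.to_a      # lines keep their separators

# upto steps from the receiver to the argument by repeated succ.
steps = 'az'.upto('bc').to_a

# With a block, the values are passed to it one at a time.
collected = []
'abc'.each_char { |c| collected << c.upcase }
```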
Public Class Methods
Source
# File ext/json/lib/json/add/string.rb, line 11
def self.json_create(object)
  object["raw"].pack("C*")
end
Source
static VALUE
rb_str_init(int argc, VALUE *argv, VALUE str)
{
    static ID keyword_ids[2];
    VALUE orig, opt, venc, vcapa;
    VALUE kwargs[2];
    rb_encoding *enc = 0;
    int n;
    if (!keyword_ids[0]) {
        keyword_ids[0] = rb_id_encoding();
        CONST_ID(keyword_ids[1], "capacity");
    }
    n = rb_scan_args(argc, argv, "01:", &orig, &opt);
    if (!NIL_P(opt)) {
        rb_get_kwargs(opt, keyword_ids, 0, 2, kwargs);
        venc = kwargs[0];
        vcapa = kwargs[1];
        if (!UNDEF_P(venc) && !NIL_P(venc)) {
            enc = rb_to_encoding(venc);
        }
        if (!UNDEF_P(vcapa) && !NIL_P(vcapa)) {
            long capa = NUM2LONG(vcapa);
            long len = 0;
            int termlen = enc ? rb_enc_mbminlen(enc) : 1;
            if (capa < STR_BUF_MIN_SIZE) {
                capa = STR_BUF_MIN_SIZE;
            }
            if (n == 1) {
                StringValue(orig);
                len = RSTRING_LEN(orig);
                if (capa < len) {
                    capa = len;
                }
                if (orig == str) n = 0;
            }
            str_modifiable(str);
            if (STR_EMBED_P(str) || FL_TEST(str, STR_SHARED|STR_NOFREE)) {
                /* make noembed always */
                const size_t size = (size_t)capa + termlen;
                const char *const old_ptr = RSTRING_PTR(str);
                const size_t osize = RSTRING_LEN(str) + TERM_LEN(str);
                char *new_ptr = ALLOC_N(char, size);
                if (STR_EMBED_P(str)) RUBY_ASSERT((long)osize <= str_embed_capa(str));
                memcpy(new_ptr, old_ptr, osize < size ? osize : size);
                FL_UNSET_RAW(str, STR_SHARED|STR_NOFREE);
                RSTRING(str)->as.heap.ptr = new_ptr;
            }
            else if (STR_HEAP_SIZE(str) != (size_t)capa + termlen) {
                SIZED_REALLOC_N(RSTRING(str)->as.heap.ptr, char,
                        (size_t)capa + termlen, STR_HEAP_SIZE(str));
            }
            STR_SET_LEN(str, len);
            TERM_FILL(&RSTRING(str)->as.heap.ptr[len], termlen);
            if (n == 1) {
                memcpy(RSTRING(str)->as.heap.ptr, RSTRING_PTR(orig), len);
                rb_enc_cr_str_exact_copy(str, orig);
            }
            FL_SET(str, STR_NOEMBED);
            RSTRING(str)->as.heap.aux.capa = capa;
        }
        else if (n == 1) {
            rb_str_replace(str, orig);
        }
        if (enc) {
            rb_enc_associate(str, enc);
            ENC_CODERANGE_CLEAR(str);
        }
    }
    else if (n == 1) {
        rb_str_replace(str, orig);
    }
    return str;
}
Returns a new String object containing the given string.
The options are optional keyword options (see below).
With no argument given and keyword encoding also not given, returns an empty string with the Encoding ASCII-8BIT:
s = String.new # => ""
s.encoding # => #<Encoding:ASCII-8BIT>
With argument string given and keyword option encoding not given, returns a new string with the same encoding as string:
s0 = 'foo'.encode(Encoding::UTF_16)
s1 = String.new(s0)
s1.encoding # => #<Encoding:UTF-16 (dummy)>
(Unlike String.new, a string literal like '' or a here document literal always has script encoding.)
With keyword option encoding given, returns a string with the specified encoding; the encoding may be an Encoding object, an encoding name, or an encoding name alias:
String.new(encoding: Encoding::US_ASCII).encoding # => #<Encoding:US-ASCII>
String.new('', encoding: Encoding::US_ASCII).encoding # => #<Encoding:US-ASCII>
String.new('foo', encoding: Encoding::US_ASCII).encoding # => #<Encoding:US-ASCII>
String.new('foo', encoding: 'US-ASCII').encoding # => #<Encoding:US-ASCII>
String.new('foo', encoding: 'ASCII').encoding # => #<Encoding:US-ASCII>
The given encoding need not be valid for the string’s content, and its validity is not checked:
s = String.new('こんにちは', encoding: 'ascii')
s.valid_encoding? # => false
But the given encoding itself is checked:
String.new('foo', encoding: 'bar') # Raises ArgumentError.
With keyword option capacity given, the given value is advisory only, and may or may not set the size of the internal buffer, which may in turn affect performance:
String.new('foo', capacity: 1) # Buffer size is at least 4 (includes terminal null byte).
String.new('foo', capacity: 4096) # Buffer size is at least 4;
                                  # may be equal to, greater than, or less than 4096.
Source
static VALUE
rb_str_s_try_convert(VALUE dummy, VALUE str)
{
    return rb_check_string_type(str);
}
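Attempts to convert the given object to a string.
If object is already a string, returns object, unmodified.
Otherwise, if object responds to :to_str, calls object.to_str and returns the result.
Returns nil if object does not respond to :to_str.
Raises an exception unless object.to_str returns a string.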
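The four cases can be sketched as follows. `Token` and `BadToken` are hypothetical classes defined only for this illustration.

```ruby
# Hypothetical string-like class: responds to to_str and returns a String.
class Token
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end
end

# Hypothetical class whose to_str does not return a String.
class BadToken
  def to_str
    42
  end
end

original = 'foo'
same = String.try_convert(original)               # the argument itself
converted = String.try_convert(Token.new('bar'))  # result of to_str
none = String.try_convert(42)                     # Integer has no to_str

error = begin
  String.try_convert(BadToken.new)
rescue TypeError => e
  e                                               # to_str returned a non-String
end
```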
Public Instance Methods
Source
static VALUE
rb_str_format_m(VALUE str, VALUE arg)
{
    VALUE tmp = rb_check_array_type(arg);
    if (!NIL_P(tmp)) {
        return rb_str_format(RARRAY_LENINT(tmp), RARRAY_CONST_PTR(tmp), str);
    }
    return rb_str_format(1, &arg, str);
}
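Returns the result of formatting object into the format specifications contained in self (see Format Specifications):
'%05d' % 123 # => "00123"
If self contains multiple format specifications, object must be an array or hash containing the objects to be formatted:
'%-5s: %016x' % [ 'ID', self.object_id ] # => "ID   : 00002b054ec93168"
'foo = %{foo}' % {foo: 'bar'} # => "foo = bar"
'foo = %{foo}, baz = %{baz}' % {foo: 'bar', baz: 'bat'} # => "foo = bar, baz = bat"
Related: see Converting to New String.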
Source
VALUE
rb_str_times(VALUE str, VALUE times)
{
    VALUE str2;
    long n, len;
    char *ptr2;
    int termlen;
    if (times == INT2FIX(1)) {
        return str_duplicate(rb_cString, str);
    }
    if (times == INT2FIX(0)) {
        str2 = str_alloc_embed(rb_cString, 0);
        rb_enc_copy(str2, str);
        return str2;
    }
    len = NUM2LONG(times);
    if (len < 0) {
        rb_raise(rb_eArgError, "negative argument");
    }
    if (RSTRING_LEN(str) == 1 && RSTRING_PTR(str)[0] == 0) {
        if (STR_EMBEDDABLE_P(len, 1)) {
            str2 = str_alloc_embed(rb_cString, len + 1);
            memset(RSTRING_PTR(str2), 0, len + 1);
        }
        else {
            str2 = str_alloc_heap(rb_cString);
            RSTRING(str2)->as.heap.aux.capa = len;
            RSTRING(str2)->as.heap.ptr = ZALLOC_N(char, (size_t)len + 1);
        }
        STR_SET_LEN(str2, len);
        rb_enc_copy(str2, str);
        return str2;
    }
    if (len && LONG_MAX/len < RSTRING_LEN(str)) {
        rb_raise(rb_eArgError, "argument too big");
    }
    len *= RSTRING_LEN(str);
    termlen = TERM_LEN(str);
    str2 = str_enc_new(rb_cString, 0, len, STR_ENC_GET(str));
    ptr2 = RSTRING_PTR(str2);
    if (len) {
        n = RSTRING_LEN(str);
        memcpy(ptr2, RSTRING_PTR(str), n);
        while (n <= len/2) {
            memcpy(ptr2 + n, ptr2, n);
            n *= 2;
        }
        memcpy(ptr2 + n, ptr2, len-n);
    }
    STR_SET_LEN(str2, len);
    TERM_FILL(&ptr2[len], termlen);
    rb_enc_cr_str_copy_for_substr(str2, str);
    return str2;
}
Source
VALUE
rb_str_plus(VALUE str1, VALUE str2)
{
    VALUE str3;
    rb_encoding *enc;
    char *ptr1, *ptr2, *ptr3;
    long len1, len2;
    int termlen;
    StringValue(str2);
    enc = rb_enc_check_str(str1, str2);
    RSTRING_GETMEM(str1, ptr1, len1);
    RSTRING_GETMEM(str2, ptr2, len2);
    termlen = rb_enc_mbminlen(enc);
    if (len1 > LONG_MAX - len2) {
        rb_raise(rb_eArgError, "string size too big");
    }
    str3 = str_enc_new(rb_cString, 0, len1+len2, enc);
    ptr3 = RSTRING_PTR(str3);
    memcpy(ptr3, ptr1, len1);
    memcpy(ptr3+len1, ptr2, len2);
    TERM_FILL(&ptr3[len1+len2], termlen);
    ENCODING_CODERANGE_SET(str3, rb_enc_to_index(enc),
                           ENC_CODERANGE_AND(ENC_CODERANGE(str1), ENC_CODERANGE(str2)));
    RB_GC_GUARD(str1);
    RB_GC_GUARD(str2);
    return str3;
}
Source
static VALUE
str_uplus(VALUE str)
{
    if (OBJ_FROZEN(str) || CHILLED_STRING_P(str)) {
        return rb_str_dup(str);
    }
    else {
        return str;
    }
}
Source
static VALUE
str_uminus(VALUE str)
{
    if (!BARE_STRING_P(str) && !rb_obj_frozen_p(str)) {
        str = rb_str_dup(str);
    }
    return rb_fstring(str);
}
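Returns a frozen string equal to self.
Returns self if and only if all of the following are true:
- self is already frozen.
- self is an instance of String (rather than of a subclass of String).
- self has no instance variables set on it.
Otherwise, returns a frozen copy of self.
Returning self, when possible, saves duplicating self; see Data deduplication.
It may also save duplicating other, already-existing, strings:
s0 = 'foo'
s1 = 'foo'
s0.object_id == s1.object_id # => false
(-s0).object_id == (-s1).object_id # => true
Note that method -@ is convenient for defining a constant:
FileName = -'config/database.yml'
While its alias dedup is better suited for chaining:
'foo'.dedup.gsub!('o')
Related: see Freezing/Unfreezing.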
Source
VALUE
rb_str_concat(VALUE str1, VALUE str2)
{
    unsigned int code;
    rb_encoding *enc = STR_ENC_GET(str1);
    int encidx;
    if (RB_INTEGER_TYPE_P(str2)) {
        if (rb_num_to_uint(str2, &code) == 0) {
        }
        else if (FIXNUM_P(str2)) {
            rb_raise(rb_eRangeError, "%ld out of char range", FIX2LONG(str2));
        }
        else {
            rb_raise(rb_eRangeError, "bignum out of char range");
        }
    }
    else {
        return rb_str_append(str1, str2);
    }
    encidx = rb_ascii8bit_appendable_encoding_index(enc, code);
    if (encidx >= 0) {
        rb_str_buf_cat_byte(str1, (unsigned char)code);
    }
    else {
        long pos = RSTRING_LEN(str1);
        int cr = ENC_CODERANGE(str1);
        int len;
        char *buf;
        switch (len = rb_enc_codelen(code, enc)) {
          case ONIGERR_INVALID_CODE_POINT_VALUE:
            rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc));
            break;
          case ONIGERR_TOO_BIG_WIDE_CHAR_VALUE:
          case 0:
            rb_raise(rb_eRangeError, "%u out of char range", code);
            break;
        }
        buf = ALLOCA_N(char, len + 1);
        rb_enc_mbcput(code, buf, enc);
        if (rb_enc_precise_mbclen(buf, buf + len + 1, enc) != len) {
            rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc));
        }
        rb_str_resize(str1, pos+len);
        memcpy(RSTRING_PTR(str1) + pos, buf, len);
        if (cr == ENC_CODERANGE_7BIT && code > 127) {
            cr = ENC_CODERANGE_VALID;
        }
        else if (cr == ENC_CODERANGE_BROKEN) {
            cr = ENC_CODERANGE_UNKNOWN;
        }
        ENC_CODERANGE_SET(str1, cr);
    }
    return str1;
}
Appends a string representation of object to self; returns self.
If object is a string, appends it to self:
s = 'foo'
s << 'bar' # => "foobar"
s # => "foobar"
If object is an integer, its value is considered a codepoint; converts the value to a character before concatenating:
s = 'foo'
s << 33 # => "foo!"
Additionally, if the codepoint is in range 0x80..0xff and the encoding of self is Encoding::US_ASCII, changes the encoding to Encoding::ASCII_8BIT:
s = 'foo'.encode(Encoding::US_ASCII)
s.encoding # => #<Encoding:US-ASCII>
s << 0xff # => "foo\xFF"
s.encoding # => #<Encoding:BINARY (ASCII-8BIT)>
Raises RangeError if that codepoint is not representable in the encoding of self:
s = 'foo'
s.encoding # => #<Encoding:UTF-8>
s << 0x00110000 # 1114112 out of char range (RangeError)
s = 'foo'.encode(Encoding::EUC_JP)
s << 0x00800080 # invalid codepoint 0x800080 in EUC-JP (RangeError)
Related: see Modifying.
Source
static VALUE
rb_str_cmp_m(VALUE str1, VALUE str2)
{
    int result;
    VALUE s = rb_check_string_type(str2);
    if (NIL_P(s)) {
        return rb_invcmp(str1, str2);
    }
    result = rb_str_cmp(str1, s);
    return INT2FIX(result);
}
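Compares self and other, evaluating their contents, not their lengths.
Returns:
- -1 if self is smaller.
- 0 if the two are equal.
- 1 if self is larger.
- nil if the two are incomparable.
Examples:
'a' <=> 'b' # => -1
'a' <=> 'ab' # => -1
'a' <=> 'a' # => 0
'b' <=> 'a' # => 1
'ab' <=> 'a' # => 1
'a' <=> :a # => nil
Class String includes module Comparable, each of whose methods uses String#<=> for comparison.
Related: see Comparing.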
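Because String includes Comparable and defines <=>, sorting and range-style checks work on strings directly. A small sketch with an arbitrary word list:

```ruby
words = %w[pear apple fig]

sorted = words.sort                          # uses String#<=>
largest = words.max

in_range = 'fig'.between?('apple', 'pear')   # from Comparable
greater = 'pear' > 'apple'                   # from Comparable
clamped = 'zebra'.clamp('apple', 'pear')     # from Comparable

incomparable = 'a' <=> 1                     # nil: a String and an Integer
```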
Source
VALUE
rb_str_equal(VALUE str1, VALUE str2)
{
    if (str1 == str2) return Qtrue;
    if (!RB_TYPE_P(str2, T_STRING)) {
        if (!rb_respond_to(str2, idTo_str)) {
            return Qfalse;
        }
        return rb_equal(str2, str1);
    }
    return rb_str_eql_internal(str1, str2);
}
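Returns whether object is equal to self.
When object is a string, returns whether object has the same length and content as self:
s = 'foo'
s == 'foo' # => true
s == 'food' # => false
s == 'FOO' # => false
Returns false if the two strings’ encodings are not compatible:
"\u{e4 f6 fc}".encode(Encoding::ISO_8859_1) == ("\u{c4 d6 dc}") # => false
When object is not a string:
- If object responds to method to_str, object == self is called, and its return value is returned.
- If object does not respond to to_str, false is returned.
Related: see Comparing.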
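The non-string case follows from the C source shown for this method: if the other object responds to to_str, the comparison is delegated to that object's own ==. A sketch, where `Wrapper` is a hypothetical class written only for this example:

```ruby
# Hypothetical string-like wrapper, for illustration only.
class Wrapper
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end

  # String#== delegates here when the right-hand side responds to to_str.
  def ==(other)
    @text == other.to_str
  end
end

delegated = 'foo' == Wrapper.new('foo')   # decided by Wrapper#==
mismatch = 'foo' == Wrapper.new('bar')
no_to_str = 'foo' == Object.new           # no to_str, so false
symbol = 'foo' == :foo                    # Symbol has no to_str either
```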
Source
static VALUE
rb_str_match(VALUE x, VALUE y)
{
    switch (OBJ_BUILTIN_TYPE(y)) {
      case T_STRING:
        rb_raise(rb_eTypeError, "type mismatch: String given");
      case T_REGEXP:
        return rb_reg_match(y, x);
      default:
        return rb_funcall(y, idEqTilde, 1, x);
    }
}
When object is a Regexp, returns the index of the first substring in self matched by object, or nil if no match is found; updates Regexp-related global variables:
'foo' =~ /f/ # => 0
$~ # => #<MatchData "f">
'foo' =~ /o/ # => 1
$~ # => #<MatchData "o">
'foo' =~ /x/ # => nil
$~ # => nil
Note that string =~ regexp is different from regexp =~ string (see Regexp#=~):
number = nil
'no. 9' =~ /(?<number>\d+)/ # => 4
number # => nil # Not assigned.
/(?<number>\d+)/ =~ 'no. 9' # => 4
number # => "9" # Assigned.
If object is not a Regexp, returns the value returned by object =~ self.
Related: see Querying.
Source
static VALUE
rb_str_aref_m(int argc, VALUE *argv, VALUE str)
{
    if (argc == 2) {
        if (RB_TYPE_P(argv[0], T_REGEXP)) {
            return rb_str_subpat(str, argv[0], argv[1]);
        }
        else {
            return rb_str_substr_two_fixnums(str, argv[0], argv[1], TRUE);
        }
    }
    rb_check_arity(argc, 1, 2);
    return rb_str_aref(str, argv[0]);
}
Returns the substring of self specified by the arguments.
Form self[index]
With non-negative integer argument index given, returns the 1-character substring found in self at character offset index:
'hello'[0] # => "h"
'hello'[4] # => "o"
'hello'[5] # => nil
'Привет'[2] # => "и"
'こんにちは'[4] # => "は"
With negative integer argument index given, counts backward from the end of self:
'hello'[-1] # => "o"
'hello'[-5] # => "h"
'hello'[-6] # => nil
Form self[start, length]
With integer arguments start and length given, returns a substring of length characters (as available), beginning at the character offset specified by start.
If argument start is non-negative, the offset is start:
'hello'[0, 1] # => "h"
'hello'[0, 5] # => "hello"
'hello'[0, 6] # => "hello"
'hello'[2, 3] # => "llo"
'hello'[2, 0] # => ""
'hello'[2, -1] # => nil
If argument start is negative, counts backward from the end of self:
'hello'[-1, 1] # => "o"
'hello'[-5, 5] # => "hello"
'hello'[-1, 0] # => ""
'hello'[-6, 5] # => nil
Special case: if start equals the length of self, returns a new empty string:
'hello'[5, 3] # => ""
Form self[range]
With Range argument range given, forms substring self[range.start, range.size]:
'hello'[0..2] # => "hel"
'hello'[0, 3] # => "hel"
'hello'[0...2] # => "he"
'hello'[0, 2] # => "he"
'hello'[0, 0] # => ""
'hello'[0...0] # => ""
Form self[regexp, capture = 0]
With Regexp argument regexp given and capture as zero, searches for a matching substring in self; updates Regexp-related global variables:
'hello'[/ell/] # => "ell"
'hello'[/l+/] # => "ll"
'hello'[//] # => ""
'hello'[/nosuch/] # => nil
With capture as a positive integer n, returns the nth matched group:
'hello'[/(h)(e)(l+)(o)/] # => "hello"
'hello'[/(h)(e)(l+)(o)/, 1] # => "h"
$1 # => "h"
'hello'[/(h)(e)(l+)(o)/, 2] # => "e"
$2 # => "e"
'hello'[/(h)(e)(l+)(o)/, 3] # => "ll"
'hello'[/(h)(e)(l+)(o)/, 4] # => "o"
'hello'[/(h)(e)(l+)(o)/, 5] # => nil
Form self[substring]
With string argument substring given, returns the matching substring of self, if found:
'hello'['ell'] # => "ell"
'hello'[''] # => ""
'hello'['nosuch'] # => nil
'Привет'['ив'] # => "ив"
'こんにちは'['んにち'] # => "んにち"
Related: see Converting to New String.
Source
static VALUE
rb_str_aset_m(int argc, VALUE *argv, VALUE str)
{
    if (argc == 3) {
        if (RB_TYPE_P(argv[0], T_REGEXP)) {
            rb_str_subpat_set(str, argv[0], argv[1], argv[2]);
        }
        else {
            rb_str_update(str, NUM2LONG(argv[0]), NUM2LONG(argv[1]), argv[2]);
        }
        return argv[2];
    }
    rb_check_arity(argc, 2, 3);
    return rb_str_aset(str, argv[0], argv[1]);
}
返回 self,其全部、部分或无内容被替换;返回参数 other_string。
形式 self[index] = other_string
给定一个非负整数参数 index,搜索 self 中位于字符偏移量 index 处的 1 个字符的子字符串
s = 'hello' s[0] = 'foo' # => "foo" s # => "fooello" s = 'hello' s[4] = 'foo' # => "foo" s # => "hellfoo" s = 'hello' s[5] = 'foo' # => "foo" s # => "hellofoo" s = 'hello' s[6] = 'foo' # Raises IndexError: index 6 out of string.
给定一个负整数参数 index,从 self 的末尾开始倒数
s = 'hello' s[-1] = 'foo' # => "foo" s # => "hellfoo" s = 'hello' s[-5] = 'foo' # => "foo" s # => "fooello" s = 'hello' s[-6] = 'foo' # Raises IndexError: index -6 out of string.
形式 self[start, length] = other_string
给定整数参数 start 和 length,搜索一个长度为 length 个字符(可用时)的子字符串,该子字符串从 start 指定的字符偏移量开始。
如果参数 start 为非负数,则偏移量为 start
s = 'hello' s[0, 1] = 'foo' # => "foo" s # => "fooello" s = 'hello' s[0, 5] = 'foo' # => "foo" s # => "foo" s = 'hello' s[0, 9] = 'foo' # => "foo" s # => "foo" s = 'hello' s[2, 0] = 'foo' # => "foo" s # => "hefoollo" s = 'hello' s[2, -1] = 'foo' # Raises IndexError: negative length -1.
如果参数 start 为负数,则从 self 的末尾开始倒数
s = 'hello' s[-1, 1] = 'foo' # => "foo" s # => "hellfoo" s = 'hello' s[-1, 9] = 'foo' # => "foo" s # => "hellfoo" s = 'hello' s[-5, 2] = 'foo' # => "foo" s # => "foollo" s = 'hello' s[-3, 0] = 'foo' # => "foo" s # => "hefoollo" s = 'hello' s[-6, 2] = 'foo' # Raises IndexError: index -6 out of string.
特殊情况:如果 start 等于 self 的长度,则将参数追加到 self
s = 'hello' s[5, 3] = 'foo' # => "foo" s # => "hellofoo"
形式 self[range] = other_string
给定 Range 参数 range,等同于 self[range.start, range.size] = other_string
s0 = 'hello' s1 = 'hello' s0[0..2] = 'foo' # => "foo" s1[0, 3] = 'foo' # => "foo" s0 # => "foolo" s1 # => "foolo" s = 'hello' s[0...2] = 'foo' # => "foo" s # => "foollo" s = 'hello' s[0...0] = 'foo' # => "foo" s # => "foohello" s = 'hello' s[9..10] = 'foo' # Raises RangeError: 9..10 out of range
形式 self[regexp, capture = 0] = other_string
给定 Regexp 参数 regexp 和 capture 为零,在 self 中搜索匹配的子字符串;更新 与 Regexp 相关的全局变量
s = 'hello' s[/l/] = 'L' # => "L" [$`, $&, $'] # => ["he", "l", "lo"] s[/eLlo/] = 'owdy' # => "owdy" [$`, $&, $'] # => ["h", "eLlo", ""] s[/eLlo/] = 'owdy' # Raises IndexError: regexp not matched. [$`, $&, $'] # => [nil, nil, nil]
当 capture 为正整数 n 时,搜索第 n 个匹配组
s = 'hello' s[/(h)(e)(l+)(o)/] = 'foo' # => "foo" [$`, $&, $'] # => ["", "hello", ""] s = 'hello' s[/(h)(e)(l+)(o)/, 1] = 'foo' # => "foo" s # => "fooello" [$`, $&, $'] # => ["", "hello", ""] s = 'hello' s[/(h)(e)(l+)(o)/, 2] = 'foo' # => "foo" s # => "hfoollo" [$`, $&, $'] # => ["", "hello", ""] s = 'hello' s[/(h)(e)(l+)(o)/, 4] = 'foo' # => "foo" s # => "hellfoo" [$`, $&, $'] # => ["", "hello", ""] s = 'hello' # => "hello" s[/(h)(e)(l+)(o)/, 5] = 'foo # Raises IndexError: index 5 out of regexp. s = 'hello' s[/nosuch/] = 'foo' # Raises IndexError: regexp not matched.
形式 self[substring] = other_string
给定字符串参数 substring
s = 'hello' s['l'] = 'foo' # => "foo" s # => "hefoolo" s = 'hello' s['ll'] = 'foo' # => "foo" s # => "hefooo" s = 'Привет' s['ив'] = 'foo' # => "foo" s # => "Прfooет" s = 'こんにちは' s['んにち'] = 'foo' # => "foo" s # => "こfooは" s['nosuch'] = 'foo' # Raises IndexError: string not matched.
Related: see Modifying.
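As a compact recap of the forms above, the following sketch applies each assignment form to a fresh copy of the same string. The variable names are illustrative only.

```ruby
# Each form of String#[]= applied to a fresh, unfrozen copy of 'hello'.
s = +'hello'
s[1, 3] = 'ipp'        # (start, length) form
by_index = s           # "hippo"

s = +'hello'
s[1..3] = 'ipp'        # range form; same target bytes as s[1, 3]
by_range = s           # "hippo"

s = +'hello'
s[/l+/] = 'L'          # regexp form replaces the whole first match
by_regexp = s          # "heLo"

s = +'hello'
s[/(h)(e)/, 2] = 'E'   # capture form replaces only group 2
by_capture = s         # "hEllo"

s = +'hello'
s['ll'] = 'y'          # substring form replaces the first occurrence
by_substring = s       # "heyo"
```

In every form the expression itself evaluates to the assigned string, while the receiver is modified in place.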
Source
VALUE
rb_str_append_as_bytes(int argc, VALUE *argv, VALUE str)
{
long needed_capacity = 0;
volatile VALUE t0;
enum ruby_value_type *types = ALLOCV_N(enum ruby_value_type, t0, argc);
for (int index = 0; index < argc; index++) {
VALUE obj = argv[index];
enum ruby_value_type type = types[index] = rb_type(obj);
switch (type) {
case T_FIXNUM:
case T_BIGNUM:
needed_capacity++;
break;
case T_STRING:
needed_capacity += RSTRING_LEN(obj);
break;
default:
rb_raise(
rb_eTypeError,
"wrong argument type %"PRIsVALUE" (expected String or Integer)",
rb_obj_class(obj)
);
break;
}
}
str_ensure_available_capa(str, needed_capacity);
char *sptr = RSTRING_END(str);
for (int index = 0; index < argc; index++) {
VALUE obj = argv[index];
enum ruby_value_type type = types[index];
switch (type) {
case T_FIXNUM:
case T_BIGNUM: {
argv[index] = obj = rb_int_and(obj, INT2FIX(0xff));
char byte = (char)(NUM2INT(obj) & 0xFF);
*sptr = byte;
sptr++;
break;
}
case T_STRING: {
const char *ptr;
long len;
RSTRING_GETMEM(obj, ptr, len);
memcpy(sptr, ptr, len);
sptr += len;
break;
}
default:
rb_bug("append_as_bytes arguments should have been validated");
}
}
STR_SET_LEN(str, RSTRING_LEN(str) + needed_capacity);
TERM_FILL(sptr, TERM_LEN(str)); /* sentinel */
int cr = ENC_CODERANGE(str);
switch (cr) {
case ENC_CODERANGE_7BIT: {
for (int index = 0; index < argc; index++) {
VALUE obj = argv[index];
enum ruby_value_type type = types[index];
switch (type) {
case T_FIXNUM:
case T_BIGNUM: {
if (!ISASCII(NUM2INT(obj))) {
goto clear_cr;
}
break;
}
case T_STRING: {
if (ENC_CODERANGE(obj) != ENC_CODERANGE_7BIT) {
goto clear_cr;
}
break;
}
default:
rb_bug("append_as_bytes arguments should have been validated");
}
}
break;
}
case ENC_CODERANGE_VALID:
if (ENCODING_GET_INLINED(str) == ENCINDEX_ASCII_8BIT) {
goto keep_cr;
}
else {
goto clear_cr;
}
break;
default:
goto clear_cr;
break;
}
RB_GC_GUARD(t0);
clear_cr:
// If no fast path was hit, we clear the coderange.
// append_as_bytes is predominantly meant to be used in
// buffering situation, hence it's likely the coderange
// will never be scanned, so it's not worth spending time
// precomputing the coderange except for simple and common
// situations.
ENC_CODERANGE_CLEAR(str);
keep_cr:
return str;
}
Concatenates each object in objects to self; returns self; performs no encoding validation or conversion:
s = 'foo' s.append_as_bytes(" \xE2\x82") # => "foo \xE2\x82" s.valid_encoding? # => false s.append_as_bytes("\xAC 12") s.valid_encoding? # => true
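When a given object is an integer, the value is treated as an 8-bit byte; if the integer occupies more than one byte (i.e., is greater than 255), only the low-order byte is appended (similar to String#setbyte):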
s = "" s.append_as_bytes(0, 257) # => "\u0000\u0001" s.bytesize # => 2
Related: see Modifying.
Source
static VALUE
rb_str_is_ascii_only_p(VALUE str)
{
int cr = rb_enc_str_coderange(str);
return RBOOL(cr == ENC_CODERANGE_7BIT);
}
Source
static VALUE
rb_str_b(VALUE str)
{
VALUE str2;
if (STR_EMBED_P(str)) {
str2 = str_alloc_embed(rb_cString, RSTRING_LEN(str) + TERM_LEN(str));
}
else {
str2 = str_alloc_heap(rb_cString);
}
str_replace_shared_without_enc(str2, str);
if (rb_enc_asciicompat(STR_ENC_GET(str))) {
// BINARY strings can never be broken; they're either 7-bit ASCII or VALID.
// If we know the receiver's code range then we know the result's code range.
int cr = ENC_CODERANGE(str);
switch (cr) {
case ENC_CODERANGE_7BIT:
ENC_CODERANGE_SET(str2, ENC_CODERANGE_7BIT);
break;
case ENC_CODERANGE_BROKEN:
case ENC_CODERANGE_VALID:
ENC_CODERANGE_SET(str2, ENC_CODERANGE_VALID);
break;
default:
ENC_CODERANGE_CLEAR(str2);
break;
}
}
return str2;
}
Returns a copy of self that has ASCII-8BIT encoding; the underlying bytes are not modified:
s = "\x99" s.encoding # => #<Encoding:UTF-8> t = s.b # => "\x99" t.encoding # => #<Encoding:ASCII-8BIT> s = "\u4095" # => "䂕" s.encoding # => #<Encoding:UTF-8> s.bytes # => [228, 130, 149] t = s.b # => "\xE4\x82\x95" t.encoding # => #<Encoding:ASCII-8BIT> t.bytes # => [228, 130, 149]
Related: see Converting to New String.
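To make the "same bytes, different encoding" point concrete, here is a small sketch comparing a UTF-8 string with its binary copy. The variable names are illustrative.

```ruby
s = "caf\u00E9"                 # UTF-8: 4 characters, 5 bytes
t = s.b                         # same bytes, tagged ASCII-8BIT

char_counts = [s.size, t.size]  # => [4, 5]; in t each byte is one character
same_bytes  = s.bytes == t.bytes
equal_as_is = (s == t)          # false: non-ASCII content, incompatible encodings

# Re-tagging the copy as UTF-8 makes the two compare equal again.
round_trip  = t.force_encoding('UTF-8') == s
```

Note that force_encoding retags the copy in place; s itself is never touched by any of these calls.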
Source
static VALUE
rb_str_byteindex_m(int argc, VALUE *argv, VALUE str)
{
VALUE sub;
VALUE initpos;
long pos;
if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) {
long slen = RSTRING_LEN(str);
pos = NUM2LONG(initpos);
if (pos < 0 ? (pos += slen) < 0 : pos > slen) {
if (RB_TYPE_P(sub, T_REGEXP)) {
rb_backref_set(Qnil);
}
return Qnil;
}
}
else {
pos = 0;
}
str_ensure_byte_pos(str, pos);
if (RB_TYPE_P(sub, T_REGEXP)) {
if (rb_reg_search(sub, str, pos, 0) >= 0) {
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = BEG(0);
return LONG2NUM(pos);
}
}
else {
StringValue(sub);
pos = rb_str_byteindex(str, sub, pos);
if (pos >= 0) return LONG2NUM(pos);
}
return Qnil;
}
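Returns the 0-based integer index of a substring of self that is specified by object (a string or Regexp) and offset, or nil if there is no such substring; the returned index is a count of bytes (not characters).
When object is a string, returns the index of the first found substring equal to object: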
s = 'foo' # => "foo" s.size # => 3 # Three 1-byte characters. s.bytesize # => 3 # Three bytes. s.byteindex('f') # => 0 s.byteindex('o') # => 1 s.byteindex('oo') # => 1 s.byteindex('ooo') # => nil
When object is a Regexp, returns the index of the first found substring matching object; updates Regexp-related global variables:
s = 'foo' s.byteindex(/f/) # => 0 $~ # => #<MatchData "f"> s.byteindex(/o/) # => 1 s.byteindex(/oo/) # => 1 s.byteindex(/ooo/) # => nil $~ # => nil
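Integer argument offset, if given, specifies the 0-based index of the byte where searching is to begin.
When offset is non-negative, searching begins at byte position offset: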
s = 'foo' s.byteindex('o', 1) # => 1 s.byteindex('o', 2) # => 2 s.byteindex('o', 3) # => nil
When offset is negative, counts backward from the end of self:
s = 'foo' s.byteindex('o', -1) # => 2 s.byteindex('o', -2) # => 1 s.byteindex('o', -3) # => 1 s.byteindex('o', -4) # => nil
Raises IndexError if the byte at offset is not the first byte of a character:
s = "\uFFFF\uFFFF" # => "\uFFFF\uFFFF" s.size # => 2 # Two 3-byte characters. s.bytesize # => 6 # Six bytes. s.byteindex("\uFFFF") # => 0 s.byteindex("\uFFFF", 1) # Raises IndexError s.byteindex("\uFFFF", 2) # Raises IndexError s.byteindex("\uFFFF", 3) # => 3 s.byteindex("\uFFFF", 4) # Raises IndexError s.byteindex("\uFFFF", 5) # Raises IndexError s.byteindex("\uFFFF", 6) # => nil
Related: see Querying.
Source
static VALUE
rb_str_byterindex_m(int argc, VALUE *argv, VALUE str)
{
VALUE sub;
VALUE initpos;
long pos, len = RSTRING_LEN(str);
if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) {
pos = NUM2LONG(initpos);
if (pos < 0 && (pos += len) < 0) {
if (RB_TYPE_P(sub, T_REGEXP)) {
rb_backref_set(Qnil);
}
return Qnil;
}
if (pos > len) pos = len;
}
else {
pos = len;
}
str_ensure_byte_pos(str, pos);
if (RB_TYPE_P(sub, T_REGEXP)) {
if (rb_reg_search(sub, str, pos, 1) >= 0) {
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = BEG(0);
return LONG2NUM(pos);
}
}
else {
StringValue(sub);
pos = rb_str_byterindex(str, sub, pos);
if (pos >= 0) return LONG2NUM(pos);
}
return Qnil;
}
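Returns the 0-based integer index of the last occurrence of a substring of self that is specified by the given object (a string or Regexp) and offset, or nil if there is no such substring; the returned index is a count of bytes (not characters).
When object is a string, returns the index of the last found substring equal to object: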
s = 'foo' # => "foo" s.size # => 3 # Three 1-byte characters. s.bytesize # => 3 # Three bytes. s.byterindex('f') # => 0 s.byterindex('o') # => 2 s.byterindex('oo') # => 1 s.byterindex('ooo') # => nil
When object is a Regexp, returns the index of the last found substring matching object; updates Regexp-related global variables:
s = 'foo' s.byterindex(/f/) # => 0 $~ # => #<MatchData "f"> s.byterindex(/o/) # => 2 s.byterindex(/oo/) # => 1 s.byterindex(/ooo/) # => nil $~ # => nil
The last match means the match that starts at the last possible position, not the last of the longest matches:
s = 'foo' s.byterindex(/o+/) # => 2 $~ #=> #<MatchData "o">
To get the last longest match, use negative lookbehind:
s = 'foo' s.byterindex(/(?<!o)o+/) # => 1 $~ # => #<MatchData "oo">
Or use method byteindex with negative lookahead:
s = 'foo' s.byteindex(/o+(?!.*o)/) # => 1 $~ #=> #<MatchData "oo">
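Integer argument offset, if given, specifies the 0-based index of the byte where searching is to end.
When offset is non-negative, searching ends at byte position offset: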
s = 'foo' s.byterindex('o', 0) # => nil s.byterindex('o', 1) # => 1 s.byterindex('o', 2) # => 2 s.byterindex('o', 3) # => 2
When offset is negative, counts backward from the end of self:
s = 'foo' s.byterindex('o', -1) # => 2 s.byterindex('o', -2) # => 1 s.byterindex('o', -3) # => nil
Raises IndexError if the byte at offset is not the first byte of a character:
s = "\uFFFF\uFFFF" # => "\uFFFF\uFFFF" s.size # => 2 # Two 3-byte characters. s.bytesize # => 6 # Six bytes. s.byterindex("\uFFFF") # => 3 s.byterindex("\uFFFF", 1) # Raises IndexError s.byterindex("\uFFFF", 2) # Raises IndexError s.byterindex("\uFFFF", 3) # => 3 s.byterindex("\uFFFF", 4) # Raises IndexError s.byterindex("\uFFFF", 5) # Raises IndexError s.byterindex("\uFFFF", 6) # => nil
Related: see Querying.
Source
static VALUE
rb_str_bytes(VALUE str)
{
VALUE ary = WANTARRAY("bytes", RSTRING_LEN(str));
return rb_str_enumerate_bytes(str, ary);
}
Returns an array of the bytes in self:
'hello'.bytes # => [104, 101, 108, 108, 111] 'Привет'.bytes # => [208, 159, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130] 'こんにちは'.bytes # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]
Related: see Converting to Non-String.
Source
VALUE
rb_str_bytesize(VALUE str)
{
return LONG2NUM(RSTRING_LEN(str));
}
Source
static VALUE
rb_str_byteslice(int argc, VALUE *argv, VALUE str)
{
if (argc == 2) {
long beg = NUM2LONG(argv[0]);
long len = NUM2LONG(argv[1]);
return str_byte_substr(str, beg, len, TRUE);
}
rb_check_arity(argc, 1, 2);
return str_byte_aref(str, argv[0]);
}
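Returns a substring of self, or nil if the substring cannot be constructed.
With integer arguments offset and length given, returns the substring beginning at the given offset and of the given length (as available); length defaults to 1 when omitted: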
s = '0123456789' # => "0123456789" s.byteslice(2) # => "2" s.byteslice(200) # => nil s.byteslice(4, 3) # => "456" s.byteslice(4, 30) # => "456789"
Returns nil if length is negative or offset falls outside of self:
s.byteslice(4, -1) # => nil s.byteslice(40, 2) # => nil
Counts backward from the end of self if offset is negative:
s = '0123456789' # => "0123456789" s.byteslice(-4) # => "6" s.byteslice(-4, 3) # => "678"
With Range argument range given, returns byteslice(range.begin, range.size):
s = '0123456789' # => "0123456789" s.byteslice(4..6) # => "456" s.byteslice(-6..-4) # => "456" s.byteslice(5..2) # => "" # range.size is zero. s.byteslice(40..42) # => nil
The starting and ending offsets need not be on character boundaries:
s = 'こんにちは' s.byteslice(0, 3) # => "こ" s.byteslice(1, 3) # => "\x81\x93\xE3"
The encodings of self and the returned substring are always the same:
s.encoding # => #<Encoding:UTF-8> s.byteslice(0, 3).encoding # => #<Encoding:UTF-8> s.byteslice(1, 3).encoding # => #<Encoding:UTF-8>
But, depending on the character boundaries, the encoding of the returned substring may not be valid:
s.valid_encoding? # => true s.byteslice(0, 3).valid_encoding? # => true s.byteslice(1, 3).valid_encoding? # => false
Related: see Converting to New String.
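Because byteslice may cut through a character, a common follow-up is trimming the result back to the nearest character boundary. The sketch below shows one way to do that. The helper name truncate_bytes is hypothetical, and it assumes that str is validly encoded and max_bytes is non-negative.

```ruby
# Hypothetical helper: the longest prefix of str that fits in max_bytes
# and still ends on a character boundary.
# Assumes str.valid_encoding? is true and max_bytes >= 0.
def truncate_bytes(str, max_bytes)
  slice = str.byteslice(0, max_bytes)
  # Drop trailing bytes until the slice is valid in str's encoding;
  # the empty string is always valid, so the loop terminates.
  slice = slice.byteslice(0, slice.bytesize - 1) until slice.valid_encoding?
  slice
end

short_jp = truncate_bytes('こんにちは', 7)   # 7 bytes = 2 characters + 1 stray byte
short_en = truncate_bytes('hello', 3)
whole    = truncate_bytes('hello', 50)
```

Here short_jp keeps only the first two characters (6 bytes), because the seventh byte starts a character that does not fit.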
Source
static VALUE
rb_str_bytesplice(int argc, VALUE *argv, VALUE str)
{
long beg, len, vbeg, vlen;
VALUE val;
int cr;
rb_check_arity(argc, 2, 5);
if (!(argc == 2 || argc == 3 || argc == 5)) {
rb_raise(rb_eArgError, "wrong number of arguments (given %d, expected 2, 3, or 5)", argc);
}
if (argc == 2 || (argc == 3 && !RB_INTEGER_TYPE_P(argv[0]))) {
if (!rb_range_beg_len(argv[0], &beg, &len, RSTRING_LEN(str), 2)) {
rb_raise(rb_eTypeError, "wrong argument type %s (expected Range)",
rb_builtin_class_name(argv[0]));
}
val = argv[1];
StringValue(val);
if (argc == 2) {
/* bytesplice(range, str) */
vbeg = 0;
vlen = RSTRING_LEN(val);
}
else {
/* bytesplice(range, str, str_range) */
if (!rb_range_beg_len(argv[2], &vbeg, &vlen, RSTRING_LEN(val), 2)) {
rb_raise(rb_eTypeError, "wrong argument type %s (expected Range)",
rb_builtin_class_name(argv[2]));
}
}
}
else {
beg = NUM2LONG(argv[0]);
len = NUM2LONG(argv[1]);
val = argv[2];
StringValue(val);
if (argc == 3) {
/* bytesplice(index, length, str) */
vbeg = 0;
vlen = RSTRING_LEN(val);
}
else {
/* bytesplice(index, length, str, str_index, str_length) */
vbeg = NUM2LONG(argv[3]);
vlen = NUM2LONG(argv[4]);
}
}
str_check_beg_len(str, &beg, &len);
str_check_beg_len(val, &vbeg, &vlen);
str_modify_keep_cr(str);
if (RB_UNLIKELY(ENCODING_GET_INLINED(str) != ENCODING_GET_INLINED(val))) {
rb_enc_associate(str, rb_enc_check(str, val));
}
rb_str_update_1(str, beg, len, val, vbeg, vlen);
cr = ENC_CODERANGE_AND(ENC_CODERANGE(str), ENC_CODERANGE(val));
if (cr != ENC_CODERANGE_BROKEN)
ENC_CODERANGE_SET(str, cr);
return str;
}
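Replaces target bytes in self with source bytes from the given string str; returns self.
In the first form, arguments offset and length determine the target bytes, and the source bytes are all of the given str: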
'0123456789'.bytesplice(0, 3, 'abc') # => "abc3456789" '0123456789'.bytesplice(3, 3, 'abc') # => "012abc6789" '0123456789'.bytesplice(0, 50, 'abc') # => "abc" '0123456789'.bytesplice(50, 3, 'abc') # Raises IndexError.
The counts of bytes in target and source may be different:
'0123456789'.bytesplice(0, 6, 'abc') # => "abc6789" # Shorter source. '0123456789'.bytesplice(0, 1, 'abc') # => "abc123456789" # Shorter target.
Either count may be zero (i.e., specifying an empty string):
'0123456789'.bytesplice(0, 3, '') # => "3456789" # Empty source. '0123456789'.bytesplice(0, 0, 'abc') # => "abc0123456789" # Empty target.
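In the second form, just as in the first, arguments offset and length determine the target bytes; argument str contains the source bytes, and the additional arguments str_offset and str_length determine the actual source bytes: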
'0123456789'.bytesplice(0, 3, 'abc', 0, 3) # => "abc3456789" '0123456789'.bytesplice(0, 3, 'abc', 1, 1) # => "b3456789" # Shorter source. '0123456789'.bytesplice(0, 1, 'abc', 0, 3) # => "abc123456789" # Shorter target. '0123456789'.bytesplice(0, 3, 'abc', 1, 0) # => "3456789" # Empty source. '0123456789'.bytesplice(0, 0, 'abc', 0, 3) # => "abc0123456789" # Empty target.
In the third form, argument range determines the target bytes, and the source bytes are all of the given str:
'0123456789'.bytesplice(0..2, 'abc') # => "abc3456789" '0123456789'.bytesplice(3..5, 'abc') # => "012abc6789" '0123456789'.bytesplice(0..5, 'abc') # => "abc6789" # Shorter source. '0123456789'.bytesplice(0..0, 'abc') # => "abc123456789" # Shorter target. '0123456789'.bytesplice(0..2, '') # => "3456789" # Empty source. '0123456789'.bytesplice(0...0, 'abc') # => "abc0123456789" # Empty target.
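In the fourth form, just as in the third, argument range determines the target bytes; argument str contains the source bytes, and the additional argument str_range determines the actual source bytes: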
'0123456789'.bytesplice(0..2, 'abc', 0..2) # => "abc3456789" '0123456789'.bytesplice(3..5, 'abc', 0..2) # => "012abc6789" '0123456789'.bytesplice(0..2, 'abc', 0..1) # => "ab3456789" # Shorter source. '0123456789'.bytesplice(0..1, 'abc', 0..2) # => "abc23456789" # Shorter target. '0123456789'.bytesplice(0..2, 'abc', 0...0) # => "3456789" # Empty source. '0123456789'.bytesplice(0...0, 'abc', 0..2) # => "abc0123456789" # Empty target.
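In any of the forms, the beginnings and endings of both source and target must be on character boundaries.
In these examples, self has five 3-byte characters, and so has character boundaries at offsets 0, 3, 6, 9, 12, and 15.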
'こんにちは'.bytesplice(0, 3, 'abc') # => "abcんにちは" 'こんにちは'.bytesplice(1, 3, 'abc') # Raises IndexError. 'こんにちは'.bytesplice(0, 2, 'abc') # Raises IndexError.
Source
static VALUE
rb_str_capitalize(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_TITLECASE;
VALUE ret;
flags = check_case_options(argc, argv, flags);
enc = str_true_enc(str);
if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return str;
if (flags&ONIGENC_CASE_ASCII_ONLY) {
ret = rb_str_new(0, RSTRING_LEN(str));
rb_str_ascii_casemap(str, ret, &flags, enc);
}
else {
ret = rb_str_casemap(str, &flags, enc);
}
return ret;
}
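Returns a string containing the characters in self, each with possibly changed case:
-
The first character is upcased.
-
All other characters are downcased.
Examples: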
'hello'.capitalize # => "Hello" 'HELLO'.capitalize # => "Hello" 'straße'.capitalize # => "Straße" # Lowercase 'ß' not changed. 'STRAẞE'.capitalize # => "Straße" # Uppercase 'ẞ' downcased to 'ß'. 'привет'.capitalize # => "Привет" 'ПРИВЕТ'.capitalize # => "Привет"
Some characters (and some character sets) do not have upcased and downcased versions; see Case Mapping:
s = '1, 2, 3, ...' s.capitalize == s # => true s = 'こんにちは' s.capitalize == s # => true
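The casing is affected by the given mapping, which may be :ascii or :turkic (:fold is accepted only by the downcasing methods); see Case Mapping.
Related: see Converting to New String.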
Source
static VALUE
rb_str_capitalize_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_TITLECASE;
flags = check_case_options(argc, argv, flags);
str_modify_keep_cr(str);
enc = str_true_enc(str);
if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
if (flags&ONIGENC_CASE_ASCII_ONLY)
rb_str_ascii_casemap(str, str, &flags, enc);
else
str_shared_replace(str, rb_str_casemap(str, &flags, enc));
if (ONIGENC_CASE_MODIFIED&flags) return str;
return Qnil;
}
Source
static VALUE
rb_str_casecmp(VALUE str1, VALUE str2)
{
VALUE s = rb_check_string_type(str2);
if (NIL_P(s)) {
return Qnil;
}
return str_casecmp(str1, s);
}
Ignoring case, compares self and other_string; returns:
-
-1 if self.downcase is smaller than other_string.downcase.
-
0 if the two are equal.
-
1 if self.downcase is larger than other_string.downcase.
-
nil if the two are incomparable.
See Case Mapping.
Examples:
'foo'.casecmp('goo') # => -1 'goo'.casecmp('foo') # => 1 'foo'.casecmp('food') # => -1 'food'.casecmp('foo') # => 1 'FOO'.casecmp('foo') # => 0 'foo'.casecmp('FOO') # => 0 'foo'.casecmp(1) # => nil
Related: see Comparing.
Source
static VALUE
rb_str_casecmp_p(VALUE str1, VALUE str2)
{
VALUE s = rb_check_string_type(str2);
if (NIL_P(s)) {
return Qnil;
}
return str_casecmp_p(str1, s);
}
Returns true if self and other_string are equal after Unicode case folding, false if they are not equal, or nil if the two are incomparable.
See Case Mapping.
Examples:
'foo'.casecmp?('goo') # => false 'goo'.casecmp?('foo') # => false 'foo'.casecmp?('food') # => false 'food'.casecmp?('foo') # => false 'FOO'.casecmp?('foo') # => true 'foo'.casecmp?('FOO') # => true 'foo'.casecmp?(1) # => nil
Related: see Comparing.
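The ASCII-only examples above do not show where casecmp and casecmp? diverge. The sketch below is an observation rather than something stated in this section: to my understanding, casecmp folds only the ASCII letters A-Z/a-z, while casecmp? applies Unicode case folding, so the two can disagree on non-ASCII text.

```ruby
a = "\u00E4\u00F6\u00FC"   # "äöü"
b = "\u00C4\u00D6\u00DC"   # "ÄÖÜ"

ascii_result   = a.casecmp(b)    # nonzero: the umlauts are compared byte-wise
unicode_result = a.casecmp?(b)   # true: Unicode case folding is applied

# Folding both sides explicitly gives the same answer as casecmp?.
folded_equal = a.downcase(:fold) == b.downcase(:fold)

# Both methods return nil for an argument that is not string-like.
incomparable = ['foo'.casecmp(1), 'foo'.casecmp?(1)]
```

When the input may contain non-ASCII letters, casecmp? is usually the safer choice for equality checks; casecmp remains useful when an ordering (-1, 0, 1) is needed.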
Source
static VALUE
rb_str_center(int argc, VALUE *argv, VALUE str)
{
return rb_str_justify(argc, argv, str, 'c');
}
Returns a centered copy of self.
If integer argument size is greater than the size (in characters) of self, returns a new string of length size that is a copy of self, centered and padded on one or both ends with pad_string:
'hello'.center(6) # => "hello " # Padded on one end. 'hello'.center(10) # => "  hello   " # Padded on both ends. 'hello'.center(20, '-|') # => "-|-|-|-hello-|-|-|-|" # Some padding repeated. 'hello'.center(10, 'abcdefg') # => "abhelloabc" # Some padding not used. '  hello  '.center(13) # => "    hello    " 'Привет'.center(10) # => "  Привет  " 'こんにちは'.center(10) # => "  こんにちは   " # Multi-byte characters.
If size is less than or equal to the size of self, returns an unpadded copy of self:
'hello'.center(5) # => "hello" 'hello'.center(-10) # => "hello"
Related: see Converting to New String.
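As a usage illustration, the following sketch builds a fixed-width banner with center. The helper name banner is hypothetical and not part of String.

```ruby
# Hypothetical helper: a three-line banner, width characters wide.
def banner(title, width)
  rule = '=' * width
  [rule, " #{title} ".center(width, '*'), rule]
end

lines = banner('Report', 20)
# lines[1] is "****** Report ******": 12 pad characters split evenly.

# With an odd amount of padding, the extra pad character goes on the right.
odd = 'ab'.center(5, '*')

# Width is measured in characters, not bytes.
wide = 'こんにちは'.center(9, '*')
```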
Source
static VALUE
rb_str_chars(VALUE str)
{
VALUE ary = WANTARRAY("chars", rb_str_strlen(str));
return rb_str_enumerate_chars(str, ary);
}
Returns an array of the characters in self:
'hello'.chars # => ["h", "e", "l", "l", "o"] 'Привет'.chars # => ["П", "р", "и", "в", "е", "т"] 'こんにちは'.chars # => ["こ", "ん", "に", "ち", "は"] ''.chars # => []
Related: see Converting to Non-String.
Source
static VALUE
rb_str_chomp(int argc, VALUE *argv, VALUE str)
{
VALUE rs = chomp_rs(argc, argv);
if (NIL_P(rs)) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, 0, chompped_length(str, rs));
}
Returns a new string copied from self, with trailing characters possibly removed:
When line_sep is "\n", removes the last one or two characters if they are "\r", "\n", or "\r\n" (but not "\n\r"):
$/ # => "\n" "abc\r".chomp # => "abc" "abc\n".chomp # => "abc" "abc\r\n".chomp # => "abc" "abc\n\r".chomp # => "abc\n" "тест\r\n".chomp # => "тест" "こんにちは\r\n".chomp # => "こんにちは"
When line_sep is '' (an empty string), removes multiple trailing occurrences of "\n" or "\r\n" (but not "\r" or "\n\r"):
"abc\n\n\n".chomp('') # => "abc" "abc\r\n\r\n\r\n".chomp('') # => "abc" "abc\n\n\r\n\r\n\n\n".chomp('') # => "abc" "abc\n\r\n\r\n\r".chomp('') # => "abc\n\r\n\r\n\r" "abc\r\r\r".chomp('') # => "abc\r\r\r"
When line_sep is neither "\n" nor '', removes a single trailing line separator if there is one:
'abcd'.chomp('cd') # => "ab" 'abcdcd'.chomp('cd') # => "abcd" 'abcd'.chomp('xx') # => "abcd"
Related: see Converting to New String.
Source
static VALUE
rb_str_chomp_bang(int argc, VALUE *argv, VALUE str)
{
VALUE rs;
str_modifiable(str);
if (RSTRING_LEN(str) == 0 && argc < 2) return Qnil;
rs = chomp_rs(argc, argv);
if (NIL_P(rs)) return Qnil;
return rb_str_chomp_string(str, rs);
}
Source
static VALUE
rb_str_chop(VALUE str)
{
return rb_str_subseq(str, 0, chopped_length(str));
}
Returns a new string copied from self, with trailing characters possibly removed.
Removes "\r\n" if those are the last two characters:
"abc\r\n".chop # => "abc" "тест\r\n".chop # => "тест" "こんにちは\r\n".chop # => "こんにちは"
Otherwise removes the last character if there is one:
'abcd'.chop # => "abc" 'тест'.chop # => "тес" 'こんにちは'.chop # => "こんにち" ''.chop # => ""
If you only need to remove the newline separator at the end of the string, String#chomp is a better alternative.
Related: see Converting to New String.
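The difference between the two methods is easiest to see on input where the last line has no terminator. A minimal sketch, with illustrative data:

```ruby
lines = ["alpha\n", "beta\r\n", "gamma"]

chomped = lines.map(&:chomp)   # removes only line terminators
chopped = lines.map(&:chop)    # always removes something when non-empty

# chomped => ["alpha", "beta", "gamma"]
# chopped => ["alpha", "beta", "gamm"]   ("gamma" loses its final letter)
```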
Source
static VALUE
rb_str_chop_bang(VALUE str)
{
str_modify_keep_cr(str);
if (RSTRING_LEN(str) > 0) {
long len;
len = chopped_length(str);
STR_SET_LEN(str, len);
TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) {
ENC_CODERANGE_CLEAR(str);
}
return str;
}
return Qnil;
}
Source
static VALUE
rb_str_chr(VALUE str)
{
return rb_str_substr(str, 0, 1);
}
Returns a string containing the first character of self:
'hello'.chr # => "h" 'тест'.chr # => "т" 'こんにちは'.chr # => "こ" ''.chr # => ""
Related: see Converting to New String.
Source
static VALUE
rb_str_clear(VALUE str)
{
str_discard(str);
STR_SET_EMBED(str);
STR_SET_LEN(str, 0);
RSTRING_PTR(str)[0] = 0;
if (rb_enc_asciicompat(STR_ENC_GET(str)))
ENC_CODERANGE_SET(str, ENC_CODERANGE_7BIT);
else
ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID);
return str;
}
Source
static VALUE
rb_str_codepoints(VALUE str)
{
VALUE ary = WANTARRAY("codepoints", rb_str_strlen(str));
return rb_str_enumerate_codepoints(str, ary);
}
Returns an array of the codepoints in self; each codepoint is the integer value for a character:
'hello'.codepoints # => [104, 101, 108, 108, 111] 'тест'.codepoints # => [1090, 1077, 1089, 1090] 'こんにちは'.codepoints # => [12371, 12435, 12395, 12385, 12399] ''.codepoints # => []
Related: see Converting to Non-String.
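For comparison, here is a sketch showing the three array views (bytes, chars, codepoints) of one short UTF-8 string. The sample string is an assumption chosen so that the three views differ.

```ruby
s = "e\u0301"                  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

byte_view      = s.bytes       # => [101, 204, 129]
char_view      = s.chars       # two strings, one per codepoint
codepoint_view = s.codepoints  # => [101, 769]

# chars and codepoints line up one-to-one; bytes generally do not.
aligned = char_view.map(&:ord) == codepoint_view
```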
Source
static VALUE
rb_str_concat_multi(int argc, VALUE *argv, VALUE str)
{
str_modifiable(str);
if (argc == 1) {
return rb_str_concat(str, argv[0]);
}
else if (argc > 1) {
int i;
VALUE arg_str = rb_str_tmp_new(0);
rb_enc_copy(arg_str, str);
for (i = 0; i < argc; i++) {
rb_str_concat(arg_str, argv[i]);
}
rb_str_buf_append(str, arg_str);
}
return str;
}
Concatenates each object in objects to self; returns self:
'foo'.concat('bar', 'baz') # => "foobarbaz"
For each given object that is an integer, the value is considered a codepoint and converted to a character before concatenation:
'foo'.concat(32, 'bar', 32, 'baz') # => "foo bar baz" # Embeds spaces. 'те'.concat(1089, 1090) # => "тест" 'こん'.concat(12395, 12385, 12399) # => "こんにちは"
Related: see Converting to New String.
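The C source above gathers multiple arguments into a temporary string before appending them to the receiver. One visible consequence, sketched below, is that passing the receiver itself as an argument behaves differently from chaining <<.

```ruby
s = +'foo'
s.concat(s, s)     # arguments are gathered first ("foofoo"), then appended once
with_concat = s    # "foofoofoo"

t = +'foo'
t << t << t        # the second << sees the already-doubled receiver
with_shovel = t    # "foofoofoofoo"
```

With distinct argument objects the two spellings give the same result; the difference only appears when an argument aliases the receiver.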
Source
static VALUE
rb_str_count(int argc, VALUE *argv, VALUE str)
{
char table[TR_TABLE_SIZE];
rb_encoding *enc = 0;
VALUE del = 0, nodel = 0, tstr;
char *s, *send;
int i;
int ascompat;
size_t n = 0;
rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);
tstr = argv[0];
StringValue(tstr);
enc = rb_enc_check(str, tstr);
if (argc == 1) {
const char *ptstr;
if (RSTRING_LEN(tstr) == 1 && rb_enc_asciicompat(enc) &&
(ptstr = RSTRING_PTR(tstr),
ONIGENC_IS_ALLOWED_REVERSE_MATCH(enc, (const unsigned char *)ptstr, (const unsigned char *)ptstr+1)) &&
!is_broken_string(str)) {
int clen;
unsigned char c = rb_enc_codepoint_len(ptstr, ptstr+1, &clen, enc);
s = RSTRING_PTR(str);
if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0);
send = RSTRING_END(str);
while (s < send) {
if (*(unsigned char*)s++ == c) n++;
}
return SIZET2NUM(n);
}
}
tr_setup_table(tstr, table, TRUE, &del, &nodel, enc);
for (i=1; i<argc; i++) {
tstr = argv[i];
StringValue(tstr);
enc = rb_enc_check(str, tstr);
tr_setup_table(tstr, table, FALSE, &del, &nodel, enc);
}
s = RSTRING_PTR(str);
if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0);
send = RSTRING_END(str);
ascompat = rb_enc_asciicompat(enc);
while (s < send) {
unsigned int c;
if (ascompat && (c = *(unsigned char*)s) < 0x80) {
if (table[c]) {
n++;
}
s++;
}
else {
int clen;
c = rb_enc_codepoint_len(s, send, &clen, enc);
if (tr_find(c, table, del, nodel)) {
n++;
}
s += clen;
}
}
return SIZET2NUM(n);
}
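Returns the total number of characters in self that are specified by the given selectors.
For one 1-character selector, returns the count of instances of that character: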
s = 'abracadabra' s.count('a') # => 5 s.count('b') # => 2 s.count('x') # => 0 s.count('') # => 0 s = 'тест' s.count('т') # => 2 s.count('е') # => 1 s = 'よろしくお願いします' s.count('よ') # => 1 s.count('し') # => 2
For one multi-character selector, returns the count of instances of all specified characters:
s = 'abracadabra' s.count('ab') # => 7 s.count('abc') # => 8 s.count('abcd') # => 9 s.count('abcdr') # => 11 s.count('abcdrx') # => 11
Order and repetition do not matter:
s.count('ba') == s.count('ab') # => true s.count('baab') == s.count('ab') # => true
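For multiple selectors, forms a single selector that is the intersection of the characters in all selectors, and returns the count of instances for that selector: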
s = 'abcdefg' s.count('abcde', 'dcbfg') == s.count('bcd') # => true s.count('abc', 'def') == s.count('') # => true
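In a character selector, three characters get special treatment:
-
A caret ('^') at the start of a selector functions as a negation operator for the characters that follow it:
s = 'abracadabra' s.count('^bc') # => 8 # Count of all except 'b' and 'c'.
-
A hyphen ('-') between two other characters defines a range of characters:
s = 'abracadabra' s.count('a-c') # => 8 # Count of all 'a', 'b', and 'c'.
-
A backslash ('\') acts as an escape for a caret, a hyphen, or another backslash:
s = 'abracadabra' s.count('\^bc') # => 3 # Count of '^', 'b', and 'c'. s.count('a\-c') # => 6 # Count of 'a', '-', and 'c'. 'foo\bar\baz'.count('\\') # => 2 # Count of '\'.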
These usages may be mixed:
s = 'abracadabra' s.count('a-cq-t') # => 10 # Multiple ranges. s.count('ac-d') # => 7 # Range mixed with plain characters. s.count('^a-c') # => 3 # Range mixed with negation.
For multiple selectors, all forms may be used, including negations, ranges, and escapes.
s = 'abracadabra' s.count('^abc', '^def') == s.count('^abcdef') # => true s.count('a-e', 'c-g') == s.count('cde') # => true s.count('^abc', 'c-g') == s.count('defg') # => true
Related: see Querying.
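The selector rules above combine naturally into small character-class counts. A sketch on an illustrative sample string:

```ruby
text = 'Hello, World! 123'

vowels           = text.count('aeiouAEIOU')      # plain character set
digits           = text.count('0-9')             # range
letters          = text.count('a-zA-Z')          # two ranges in one selector
non_letters      = text.count('^a-zA-Z')         # negated ranges
lower_consonants = text.count('a-z', '^aeiou')   # intersection of two selectors
```

Since every character is either a letter or not, letters + non_letters equals text.size.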
Source
static VALUE
rb_str_crypt(VALUE str, VALUE salt)
{
#ifdef HAVE_CRYPT_R
VALUE databuf;
struct crypt_data *data;
# define CRYPT_END() ALLOCV_END(databuf)
#else
char *tmp_buf;
extern char *crypt(const char *, const char *);
# define CRYPT_END() rb_nativethread_lock_unlock(&crypt_mutex.lock)
#endif
VALUE result;
const char *s, *saltp;
char *res;
#ifdef BROKEN_CRYPT
char salt_8bit_clean[3];
#endif
StringValue(salt);
mustnot_wchar(str);
mustnot_wchar(salt);
s = StringValueCStr(str);
saltp = RSTRING_PTR(salt);
if (RSTRING_LEN(salt) < 2 || !saltp[0] || !saltp[1]) {
rb_raise(rb_eArgError, "salt too short (need >=2 bytes)");
}
#ifdef BROKEN_CRYPT
if (!ISASCII((unsigned char)saltp[0]) || !ISASCII((unsigned char)saltp[1])) {
salt_8bit_clean[0] = saltp[0] & 0x7f;
salt_8bit_clean[1] = saltp[1] & 0x7f;
salt_8bit_clean[2] = '\0';
saltp = salt_8bit_clean;
}
#endif
#ifdef HAVE_CRYPT_R
data = ALLOCV(databuf, sizeof(struct crypt_data));
# ifdef HAVE_STRUCT_CRYPT_DATA_INITIALIZED
data->initialized = 0;
# endif
res = crypt_r(s, saltp, data);
#else
rb_nativethread_lock_lock(&crypt_mutex.lock);
res = crypt(s, saltp);
#endif
if (!res) {
int err = errno;
CRYPT_END();
rb_syserr_fail(err, "crypt");
}
#ifdef HAVE_CRYPT_R
result = rb_str_new_cstr(res);
CRYPT_END();
#else
// We need to copy this buffer because it's static and we need to unlock the mutex
// before allocating a new object (the string to be returned). If we allocate while
// holding the lock, we could run GC which fires the VM barrier and causes a deadlock
// if other ractors are waiting on this lock.
size_t res_size = strlen(res)+1;
tmp_buf = ALLOCA_N(char, res_size); // should be small enough to alloca
memcpy(tmp_buf, res, res_size);
res = tmp_buf;
CRYPT_END();
result = rb_str_new_cstr(res);
#endif
return result;
}
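Returns the string generated by calling the crypt(3) standard library function with str and salt_str, in that order, as its arguments. Please do not use this method any longer. It is legacy, provided only for backward compatibility with Ruby scripts from earlier days. Using it in contemporary programs is a bad idea for several reasons:
-
The behavior of C's crypt(3) depends on the OS it runs on. The generated string lacks data portability.
-
On some OSes such as Mac OS, crypt(3) never fails (i.e., it silently ends up with unexpected results).
-
On some OSes such as Mac OS, crypt(3) is not thread safe.
-
The so-called "traditional" usage of crypt(3) is very, very weak. According to its manpage, Linux's traditional crypt(3) output has only 2**56 variations, which is too easy to brute force today. And this is the default behavior.
-
To make things more robust, some OSes implement a so-called "modular" usage. To use it, you have to build up the salt_str parameter by hand, which is complex. Failing to generate a proper salt string tends not to yield any error; typos in the parameters are normally not detectable.
For instance, in the following example, the second invocation of String#crypt is wrong; it has a typo in "round=" (lacks "s"). However, the call does not fail, and something unexpected is generated.
"foo".crypt("$5$rounds=1000$salt$") # OK, proper usage "foo".crypt("$5$round=1000$salt$") # Typo not detected
-
Even in "modular" mode, some hash functions are considered archaic and are no longer recommended at all; for instance, module $1$ has been officially abandoned by its author: see phk.freebsd.dk/sagas/md5crypt_eol/. As another instance, module $3$ is considered completely broken: see the FreeBSD manpage.
-
On some OSes such as Mac OS, there is no modular mode. Yet, as noted above, crypt(3) on Mac OS never fails. This means that even if you build up a proper salt string, it generates a traditional DES hash anyway, and there is no way for you to be aware of it.
"foo".crypt("$5$rounds=1000$salt$") # => "$5fNPQMxC5j6."
If for some reason you cannot migrate to another secure, contemporary password hashing algorithm, install the string-crypt gem and require 'string/crypt' to continue using it.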
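Returns a frozen string equal to self.
The returned string is self if and only if all of the following are true:
-
self is already frozen.
-
self is an instance of String (rather than of a subclass of String).
-
self has no instance variables set on it.
Otherwise, the returned string is a frozen copy of self.
Returning self, when possible, saves duplicating self; see Data deduplication.
It may also save duplicating other, already-existing, strings: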
s0 = 'foo' s1 = 'foo' s0.object_id == s1.object_id # => false (-s0).object_id == (-s1).object_id # => true
Note that method -@ is convenient for defining a constant:
FileName = -'config/database.yml'
While its alias dedup is better suited for chaining:
'foo'.dedup.gsub!('o')
Related: see Freezing/Unfreezing.
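The following sketch walks through the rules above with two distinct, unfrozen strings that have the same content. String.new is used only to guarantee fresh objects; the variable names are illustrative.

```ruby
a = String.new('config')            # two distinct, unfrozen objects
b = String.new('config')

distinct_before = a.equal?(b)       # false: different objects
shared_after    = (-a).equal?(-b)   # true: both map to one frozen instance
result_frozen   = (-a).frozen?      # true
receiver_frozen = a.frozen?         # false: the receiver is left as it was

f = -a
returns_self = (-f).equal?(f)       # true: f is already the deduplicated string
```

The sharing is by content, so it is safe only because the returned string is frozen.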
Source
static VALUE
rb_str_delete(int argc, VALUE *argv, VALUE str)
{
str = str_duplicate(rb_cString, str);
rb_str_delete_bang(argc, argv, str);
return str;
}
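Returns a new string that is a copy of self with certain characters removed; the removed characters are all instances of those specified by the given string selectors.
For one 1-character selector, removes all instances of that character: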
s = 'abracadabra' s.delete('a') # => "brcdbr" s.delete('b') # => "aracadara" s.delete('x') # => "abracadabra" s.delete('') # => "abracadabra" s = 'тест' s.delete('т') # => "ес" s.delete('е') # => "тст" s = 'よろしくお願いします' s.delete('よ') # => "ろしくお願いします" s.delete('し') # => "よろくお願います"
For one multi-character selector, removes all instances of the specified characters:
s = 'abracadabra' s.delete('ab') # => "rcdr" s.delete('abc') # => "rdr" s.delete('abcd') # => "rr" s.delete('abcdr') # => "" s.delete('abcdrx') # => ""
Order and repetition do not matter:
s.delete('ba') == s.delete('ab') # => true s.delete('baab') == s.delete('ab') # => true
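For multiple selectors, forms a single selector that is the intersection of the characters in all selectors, and removes all instances of the characters specified by that selector: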
s = 'abcdefg' s.delete('abcde', 'dcbfg') == s.delete('bcd') # => true s.delete('abc', 'def') == s.delete('') # => true
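In a character selector, three characters get special treatment:
-
A caret ('^') at the start of a selector functions as a negation operator for the characters that follow it:
s = 'abracadabra' s.delete('^bc') # => "bcb" # Deletes all except 'b' and 'c'.
-
A hyphen ('-') between two other characters defines a range of characters:
s = 'abracadabra' s.delete('a-c') # => "rdr" # Deletes all 'a', 'b', and 'c'.
-
A backslash ('\') acts as an escape for a caret, a hyphen, or another backslash:
s = 'abracadabra' s.delete('\^bc') # => "araadara" # Deletes all '^', 'b', and 'c'. s.delete('a\-c') # => "brdbr" # Deletes all 'a', '-', and 'c'. 'foo\bar\baz'.delete('\\') # => "foobarbaz" # Deletes all '\'.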
These usages may be mixed:
s = 'abracadabra' s.delete('a-cq-t') # => "d" # Multiple ranges. s.delete('ac-d') # => "brbr" # Range mixed with plain characters. s.delete('^a-c') # => "abacaaba" # Range mixed with negation.
For multiple selectors, all forms may be used, including negations, ranges, and escapes.
s = 'abracadabra' s.delete('^abc', '^def') == s.delete('^abcdef') # => true s.delete('a-e', 'c-g') == s.delete('cde') # => true s.delete('^abc', 'c-g') == s.delete('defg') # => true
Related: see Converting to New String.
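A typical use of these selectors is input clean-up. The sketch below strips formatting from an illustrative phone number in two equivalent ways, then shows a two-selector intersection.

```ruby
phone = '+1 (555) 010-9999'

# Negated range: delete everything that is not a digit.
digits_only = phone.delete('^0-9')

# Escaped hyphen: '-' is a literal here, not a range operator.
no_punct = phone.delete('()\-+ ')

# Two selectors intersect: lowercase letters that are not vowels.
vowels_kept = 'hello world'.delete('a-z', '^aeiou')
```

Both phone-number calls produce "15550109999"; the negated form is usually more robust because it does not need to list every unwanted character.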
Source
static VALUE
rb_str_delete_bang(int argc, VALUE *argv, VALUE str)
{
char squeez[TR_TABLE_SIZE];
rb_encoding *enc = 0;
char *s, *send, *t;
VALUE del = 0, nodel = 0;
int modify = 0;
int i, ascompat, cr;
if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);
for (i=0; i<argc; i++) {
VALUE s = argv[i];
StringValue(s);
enc = rb_enc_check(str, s);
tr_setup_table(s, squeez, i==0, &del, &nodel, enc);
}
str_modify_keep_cr(str);
ascompat = rb_enc_asciicompat(enc);
s = t = RSTRING_PTR(str);
send = RSTRING_END(str);
cr = ascompat ? ENC_CODERANGE_7BIT : ENC_CODERANGE_VALID;
while (s < send) {
unsigned int c;
int clen;
if (ascompat && (c = *(unsigned char*)s) < 0x80) {
if (squeez[c]) {
modify = 1;
}
else {
if (t != s) *t = c;
t++;
}
s++;
}
else {
c = rb_enc_codepoint_len(s, send, &clen, enc);
if (tr_find(c, squeez, del, nodel)) {
modify = 1;
}
else {
if (t != s) rb_enc_mbcput(c, t, enc);
t += clen;
if (cr == ENC_CODERANGE_7BIT) cr = ENC_CODERANGE_VALID;
}
s += clen;
}
}
TERM_FILL(t, TERM_LEN(str));
STR_SET_LEN(str, t - RSTRING_PTR(str));
ENC_CODERANGE_SET(str, cr);
if (modify) return str;
return Qnil;
}
Like String#delete, but modifies self in place; returns self if any characters were deleted, nil otherwise.
Related: see Modifying.
Source
static VALUE
rb_str_delete_prefix(VALUE str, VALUE prefix)
{
long prefixlen;
prefixlen = deleted_prefix_length(str, prefix);
if (prefixlen <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, prefixlen, RSTRING_LEN(str) - prefixlen);
}
Returns a copy of self with the leading substring prefix removed:
'oof'.delete_prefix('o') # => "of" 'oof'.delete_prefix('oo') # => "f" 'oof'.delete_prefix('oof') # => "" 'oof'.delete_prefix('x') # => "oof" 'тест'.delete_prefix('те') # => "ст" 'こんにちは'.delete_prefix('こん') # => "にちは"
Related: see Converting to New String.
Source
static VALUE
rb_str_delete_prefix_bang(VALUE str, VALUE prefix)
{
long prefixlen;
str_modify_keep_cr(str);
prefixlen = deleted_prefix_length(str, prefix);
if (prefixlen <= 0) return Qnil;
return rb_str_drop_bytes(str, prefixlen);
}
Like String#delete_prefix, except that self is modified in place; returns self if the prefix was removed, nil otherwise.
Related: see Modifying.
Source
static VALUE
rb_str_delete_suffix(VALUE str, VALUE suffix)
{
long suffixlen;
suffixlen = deleted_suffix_length(str, suffix);
if (suffixlen <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, 0, RSTRING_LEN(str) - suffixlen);
}
Returns a copy of self with the trailing substring suffix removed:
'foo'.delete_suffix('o') # => "fo" 'foo'.delete_suffix('oo') # => "f" 'foo'.delete_suffix('foo') # => "" 'foo'.delete_suffix('f') # => "foo" 'foo'.delete_suffix('x') # => "foo" 'тест'.delete_suffix('ст') # => "те" 'こんにちは'.delete_suffix('ちは') # => "こんに"
Related: see Converting to New String.
Source
static VALUE
rb_str_delete_suffix_bang(VALUE str, VALUE suffix)
{
long olen, suffixlen, len;
str_modifiable(str);
suffixlen = deleted_suffix_length(str, suffix);
if (suffixlen <= 0) return Qnil;
olen = RSTRING_LEN(str);
str_modify_keep_cr(str);
len = olen - suffixlen;
STR_SET_LEN(str, len);
TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) {
ENC_CODERANGE_CLEAR(str);
}
return str;
}
Like String#delete_suffix, except that self is modified in place; returns self if the suffix was removed, nil otherwise.
Related: see Modifying.
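The non-bang methods always return a string, so they chain safely; the bang methods return nil when nothing was removed, so their results should not be chained. A short sketch with illustrative values:

```ruby
path = 'lib/string/crypt.rb'

# Non-bang forms chain safely because each call returns a string.
name = path.delete_prefix('lib/').delete_suffix('.rb')

buf = +'v1.2.3'
first  = buf.delete_prefix!('v')   # returns buf itself, now "1.2.3"
second = buf.delete_prefix!('v')   # returns nil: there is no prefix left to remove
```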
Source
static VALUE
rb_str_downcase(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_DOWNCASE;
VALUE ret;
flags = check_case_options(argc, argv, flags);
enc = str_true_enc(str);
if (case_option_single_p(flags, enc, str)) {
ret = rb_str_new(RSTRING_PTR(str), RSTRING_LEN(str));
str_enc_copy_direct(ret, str);
downcase_single(ret);
}
else if (flags&ONIGENC_CASE_ASCII_ONLY) {
ret = rb_str_new(0, RSTRING_LEN(str));
rb_str_ascii_casemap(str, ret, &flags, enc);
}
else {
ret = rb_str_casemap(str, &flags, enc);
}
return ret;
}
Returns a new string containing the downcased characters in self:
'HELLO'.downcase # => "hello"
'STRAẞE'.downcase # => "straße"
'ПРИВЕТ'.downcase # => "привет"
'RubyGems.org'.downcase # => "rubygems.org"
Some characters (and some character sets) do not have upcase and downcase versions; see Case Mapping:
s = '1, 2, 3, ...'
s.downcase == s # => true
s = 'こんにちは'
s.downcase == s # => true
The casing is affected by the given mapping, which may be :ascii, :fold, or :turkic; see Case Mapping.
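As a rough illustration of the mapping argument, the sketch below compares the default (full Unicode) mapping with :ascii and :fold; the sample strings are invented for the example.

```ruby
mixed = "\u00C0BC"                      # "ÀBC": one non-ASCII and two ASCII letters

full   = mixed.downcase                 # default mapping handles "À" as well
ascii  = mixed.downcase(:ascii)         # only the letters A-Z are affected
folded = "Stra\u00DFe".downcase(:fold)  # case folding expands "ß" to "ss"

full    # => "àbc"
ascii   # => "Àbc"
folded  # => "strasse"
```

The :fold mapping is intended for caseless comparison rather than display, which is why it may change the length of the string.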
Related: see Converting to New String.
Source
static VALUE
rb_str_downcase_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_DOWNCASE;
flags = check_case_options(argc, argv, flags);
str_modify_keep_cr(str);
enc = str_true_enc(str);
if (case_option_single_p(flags, enc, str)) {
if (downcase_single(str))
flags |= ONIGENC_CASE_MODIFIED;
}
else if (flags&ONIGENC_CASE_ASCII_ONLY)
rb_str_ascii_casemap(str, str, &flags, enc);
else
str_shared_replace(str, rb_str_casemap(str, &flags, enc));
if (ONIGENC_CASE_MODIFIED&flags) return str;
return Qnil;
}
Source
VALUE
rb_str_dump(VALUE str)
{
int encidx = rb_enc_get_index(str);
rb_encoding *enc = rb_enc_from_index(encidx);
long len;
const char *p, *pend;
char *q, *qend;
VALUE result;
int u8 = (encidx == rb_utf8_encindex());
static const char nonascii_suffix[] = ".dup.force_encoding(\"%s\")";
len = 2; /* "" */
if (!rb_enc_asciicompat(enc)) {
len += strlen(nonascii_suffix) - rb_strlen_lit("%s");
len += strlen(enc->name);
}
p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str);
while (p < pend) {
int clen;
unsigned char c = *p++;
switch (c) {
case '"': case '\\':
case '\n': case '\r':
case '\t': case '\f':
case '\013': case '\010': case '\007': case '\033':
clen = 2;
break;
case '#':
clen = IS_EVSTR(p, pend) ? 2 : 1;
break;
default:
if (ISPRINT(c)) {
clen = 1;
}
else {
if (u8 && c > 0x7F) { /* \u notation */
int n = rb_enc_precise_mbclen(p-1, pend, enc);
if (MBCLEN_CHARFOUND_P(n)) {
unsigned int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc);
if (cc <= 0xFFFF)
clen = 6; /* \uXXXX */
else if (cc <= 0xFFFFF)
clen = 9; /* \u{XXXXX} */
else
clen = 10; /* \u{XXXXXX} */
p += MBCLEN_CHARFOUND_LEN(n)-1;
break;
}
}
clen = 4; /* \xNN */
}
break;
}
if (clen > LONG_MAX - len) {
rb_raise(rb_eRuntimeError, "string size too big");
}
len += clen;
}
result = rb_str_new(0, len);
p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str);
q = RSTRING_PTR(result); qend = q + len + 1;
*q++ = '"';
while (p < pend) {
unsigned char c = *p++;
if (c == '"' || c == '\\') {
*q++ = '\\';
*q++ = c;
}
else if (c == '#') {
if (IS_EVSTR(p, pend)) *q++ = '\\';
*q++ = '#';
}
else if (c == '\n') {
*q++ = '\\';
*q++ = 'n';
}
else if (c == '\r') {
*q++ = '\\';
*q++ = 'r';
}
else if (c == '\t') {
*q++ = '\\';
*q++ = 't';
}
else if (c == '\f') {
*q++ = '\\';
*q++ = 'f';
}
else if (c == '\013') {
*q++ = '\\';
*q++ = 'v';
}
else if (c == '\010') {
*q++ = '\\';
*q++ = 'b';
}
else if (c == '\007') {
*q++ = '\\';
*q++ = 'a';
}
else if (c == '\033') {
*q++ = '\\';
*q++ = 'e';
}
else if (ISPRINT(c)) {
*q++ = c;
}
else {
*q++ = '\\';
if (u8) {
int n = rb_enc_precise_mbclen(p-1, pend, enc) - 1;
if (MBCLEN_CHARFOUND_P(n)) {
int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc);
p += n;
if (cc <= 0xFFFF)
snprintf(q, qend-q, "u%04X", cc); /* \uXXXX */
else
snprintf(q, qend-q, "u{%X}", cc); /* \u{XXXXX} or \u{XXXXXX} */
q += strlen(q);
continue;
}
}
snprintf(q, qend-q, "x%02X", c);
q += 3;
}
}
*q++ = '"';
*q = '\0';
if (!rb_enc_asciicompat(enc)) {
snprintf(q, qend-q, nonascii_suffix, enc->name);
encidx = rb_ascii8bit_encindex();
}
/* result from dump is ASCII */
rb_enc_associate_index(result, encidx);
ENC_CODERANGE_SET(result, ENC_CODERANGE_7BIT);
return result;
}
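For an ordinary string, method String#dump returns a printable, ASCII-only version of self, enclosed in double-quotes.
For a dumped string, method String#undump is the inverse of String#dump; it returns a "restored" version of self, in which all dumping changes have been undone.
In the simplest case, the dumped string contains the original string, enclosed in double-quotes; this example is done in irb (interactive Ruby), which uses method inspect to render the results:
s = 'hello' # => "hello"
s.dump # => "\"hello\""
s.dump.undump # => "hello"
Keep in mind that in the second line above:
- The outer double-quotes are added by inspect, and are not part of the output of dump.
- The inner double-quotes are part of the output of dump, and are escaped by inspect because they are inside the outer double-quotes.
To avoid confusion, we'll use this helper method to omit the outer double-quotes:
def dump(s)
  print "String: ", s, "\n"
  print "Dumped: ", s.dump, "\n"
  print "Undumped: ", s.dump.undump, "\n"
end
So that for string 'hello', we'll see:
String: hello
Dumped: "hello"
Undumped: hello
In a dump, certain special characters are escaped:
String: "
Dumped: "\""
Undumped: "
String: \
Dumped: "\\"
Undumped: \
In a dump, unprintable characters are replaced by printable escape sequences; the unprintable characters shown here are the control characters with ordinals 7 through 13, which include the whitespace characters other than space itself; here we see their ordinals, along with explanatory text:
h = {
  7 => 'Alert (BEL)',
  8 => 'Backspace (BS)',
  9 => 'Horizontal tab (HT)',
  10 => 'Linefeed (LF)',
  11 => 'Vertical tab (VT)',
  12 => 'Formfeed (FF)',
  13 => 'Carriage return (CR)'
}
In this example, the dumped output is printed by method inspect, and so contains both outer double-quotes and escaped inner double-quotes:
s = ''
h.keys.each {|i| s << i } # => [7, 8, 9, 10, 11, 12, 13]
s # => "\a\b\t\n\v\f\r"
s.dump # => "\"\\a\\b\\t\\n\\v\\f\\r\""
If self is encoded in UTF-8 and contains Unicode characters, each Unicode character is dumped as a Unicode escape sequence:
String: тест
Dumped: "\u0442\u0435\u0441\u0442"
Undumped: тест
String: こんにちは
Dumped: "\u3053\u3093\u306B\u3061\u306F"
Undumped: こんにちは
If the encoding of self is not ASCII-compatible (i.e., if self.encoding.ascii_compatible? returns false), each byte that is a printable ASCII character is dumped as that character, and every other byte is dumped in hexadecimal; also appends .dup.force_encoding("<encoding>"), where <encoding> is self.encoding.name: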
String: hello
Dumped: "\xFE\xFF\x00h\x00e\x00l\x00l\x00o".dup.force_encoding("UTF-16")
Undumped: hello
String: тест
Dumped: "\xFE\xFF\x04B\x045\x04A\x04B".dup.force_encoding("UTF-16")
Undumped: тест
String: こんにちは
Dumped: "\xFE\xFF0S0\x930k0a0o".dup.force_encoding("UTF-16")
Undumped: こんにちは
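The round trip described above can be checked directly. This is a small sketch, assuming a UTF-8 string that mixes a non-ASCII character with control characters; the sample text is invented.

```ruby
original = "caf\u00E9\tbar\n"   # UTF-8 text containing "é", a tab, and a newline

dumped   = original.dump        # printable, ASCII-only, with surrounding quotes
restored = dumped.undump        # undoes every change made by dump

dumped.ascii_only?              # => true
dumped == '"caf\u00E9\tbar\n"'  # => true (single-quoted, so the escapes are literal text)
restored == original            # => true
```

The ASCII-only form is convenient for logs or fixtures that must survive systems that are not encoding-aware.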
Source
static VALUE
rb_str_each_byte(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_byte_size);
return rb_str_enumerate_bytes(str, 0);
}
With a block given, calls the block with each successive byte from self; returns self:
a = []
'hello'.each_byte {|byte| a.push(byte) } # Five 1-byte characters.
a # => [104, 101, 108, 108, 111]
a = []
'тест'.each_byte {|byte| a.push(byte) } # Four 2-byte characters.
a # => [209, 130, 208, 181, 209, 129, 209, 130]
a = []
'こんにちは'.each_byte {|byte| a.push(byte) } # Five 3-byte characters.
a # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]
With no block given, returns an enumerator.
Related: see Iterating.
Source
static VALUE
rb_str_each_char(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
return rb_str_enumerate_chars(str, 0);
}
With a block given, calls the block with each successive character from self; returns self:
a = []
'hello'.each_char do |char|
  a.push(char)
end
a # => ["h", "e", "l", "l", "o"]
a = []
'тест'.each_char do |char|
  a.push(char)
end
a # => ["т", "е", "с", "т"]
a = []
'こんにちは'.each_char do |char|
  a.push(char)
end
a # => ["こ", "ん", "に", "ち", "は"]
With no block given, returns an enumerator.
Related: see Iterating.
Source
static VALUE
rb_str_each_codepoint(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
return rb_str_enumerate_codepoints(str, 0);
}
With a block given, calls the block with each successive codepoint from self; each codepoint is the integer value for a character; returns self:
a = []
'hello'.each_codepoint do |codepoint|
  a.push(codepoint)
end
a # => [104, 101, 108, 108, 111]
a = []
'тест'.each_codepoint do |codepoint|
  a.push(codepoint)
end
a # => [1090, 1077, 1089, 1090]
a = []
'こんにちは'.each_codepoint do |codepoint|
  a.push(codepoint)
end
a # => [12371, 12435, 12395, 12385, 12399]
With no block given, returns an enumerator.
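The enumerator returned by the block-less forms of each_byte, each_char, and each_codepoint can be combined with the usual Enumerable methods. The sketch below is one possible illustration; the variable names are invented.

```ruby
word = 'тест'   # four characters, each two bytes in UTF-8

byte_count = word.each_byte.count
indexed    = word.each_char.with_index.to_a
hex_points = word.each_codepoint.map {|cp| format('U+%04X', cp) }

byte_count  # => 8
indexed     # => [["т", 0], ["е", 1], ["с", 2], ["т", 3]]
hex_points  # => ["U+0442", "U+0435", "U+0441", "U+0442"]
```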
Related: see Iterating.
Source
static VALUE
rb_str_each_grapheme_cluster(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_grapheme_cluster_size);
return rb_str_enumerate_grapheme_clusters(str, 0);
}
With a block given, calls the given block with each successive grapheme cluster from self (see Unicode Grapheme Cluster Boundaries); returns self:
a = []
'hello'.each_grapheme_cluster do |grapheme_cluster|
  a.push(grapheme_cluster)
end
a # => ["h", "e", "l", "l", "o"]
a = []
'тест'.each_grapheme_cluster do |grapheme_cluster|
  a.push(grapheme_cluster)
end
a # => ["т", "е", "с", "т"]
a = []
'こんにちは'.each_grapheme_cluster do |grapheme_cluster|
  a.push(grapheme_cluster)
end
a # => ["こ", "ん", "に", "ち", "は"]
With no block given, returns an enumerator.
Related: see Iterating.
Source
static VALUE
rb_str_each_line(int argc, VALUE *argv, VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, argc, argv, 0);
return rb_str_enumerate_lines(argc, argv, str, 0);
}
With a block given, forms the substrings (lines) that result from splitting self at each occurrence of the given record_separator; passes each line to the block; returns self.
With the default record_separator:
$/ # => "\n"
s = <<~EOT
This is the first line.
This is line two.

This is line four.
This is line five.
EOT
s.each_line {|line| p line }
Output:
"This is the first line.\n"
"This is line two.\n"
"\n"
"This is line four.\n"
"This is line five.\n"
With a different record_separator:
record_separator = ' is '
s.each_line(record_separator) {|line| p line }
Output:
"This is "
"the first line.\nThis is "
"line two.\n\nThis is "
"line four.\nThis is "
"line five.\n"
With chomp as true, removes the trailing record_separator from each line:
s.each_line(chomp: true) {|line| p line }
Output:
"This is the first line."
"This is line two."
""
"This is line four."
"This is line five."
With an empty string as record_separator, forms and passes "paragraphs" by splitting at each occurrence of two or more newlines:
record_separator = ''
s.each_line(record_separator) {|line| p line }
Output:
"This is the first line.\nThis is line two.\n\n"
"This is line four.\nThis is line five.\n"
With no block given, returns an enumerator.
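The separator argument and the chomp keyword can be combined, and the block-less form chains with other enumerator methods. A minimal sketch with invented sample data:

```ruby
row = 'id,name,email'

raw_fields   = row.each_line(',').to_a               # separator kept on each piece
clean_fields = row.each_line(',', chomp: true).to_a  # separator removed

numbered = "alpha\nbeta\n".each_line(chomp: true).with_index(1).map do |line, number|
  "#{number}: #{line}"
end

raw_fields    # => ["id,", "name,", "email"]
clean_fields  # => ["id", "name", "email"]
numbered      # => ["1: alpha", "2: beta"]
```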
Related: see Iterating.
Source
static VALUE
rb_str_empty(VALUE str)
{
return RBOOL(RSTRING_LEN(str) == 0);
}
Source
static VALUE
str_encode(int argc, VALUE *argv, VALUE str)
{
VALUE newstr = str;
int encidx = str_transcode(argc, argv, &newstr);
return encoded_dup(newstr, str, encidx);
}
Returns a copy of self transcoded as determined by dst_encoding; see Encodings.
By default, raises an exception if self contains an invalid byte or a character not defined in dst_encoding; that behavior may be modified by encoding options; see below.
With no arguments:
- Uses the same encoding if Encoding.default_internal is nil (the default):
Encoding.default_internal # => nil
s = "Ruby\x99".force_encoding('Windows-1252')
s.encoding # => #<Encoding:Windows-1252>
s.bytes # => [82, 117, 98, 121, 153]
t = s.encode # => "Ruby\x99"
t.encoding # => #<Encoding:Windows-1252>
t.bytes # => [82, 117, 98, 121, 153]
- Otherwise, uses the encoding Encoding.default_internal:
Encoding.default_internal = 'UTF-8'
t = s.encode # => "Ruby™"
t.encoding # => #<Encoding:UTF-8>
With only argument dst_encoding given, uses that encoding:
s = "Ruby\x99".force_encoding('Windows-1252')
s.encoding # => #<Encoding:Windows-1252>
t = s.encode('UTF-8') # => "Ruby™"
t.encoding # => #<Encoding:UTF-8>
With arguments dst_encoding and src_encoding given, interprets self using src_encoding, then encodes the new string using dst_encoding:
s = "Ruby\x99"
t = s.encode('UTF-8', 'Windows-1252') # => "Ruby™"
t.encoding # => #<Encoding:UTF-8>
Optional keyword arguments enc_opts specify encoding options; see Encoding Options.
Note that, unless option invalid: :replace is given, conversion from an encoding enc to the same encoding enc (whether enc is given explicitly or implicitly) is a no-op: the string is simply copied without any changes, and no exception is raised, even if there are invalid bytes.
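To make the default error behavior and the encoding options concrete, here is a short sketch; the sample text and variable names are assumptions made for the example, and only the undef: and replace: options are shown.

```ruby
text = "caf\u00E9"   # "café" in UTF-8; "é" does not exist in US-ASCII

# Without options, an undefined character raises an exception.
error_class =
  begin
    text.encode('US-ASCII')
    nil
  rescue Encoding::UndefinedConversionError => e
    e.class
  end

replaced = text.encode('US-ASCII', undef: :replace)                # default "?"
starred  = text.encode('US-ASCII', undef: :replace, replace: '*')  # custom marker

error_class  # => Encoding::UndefinedConversionError
replaced     # => "caf?"
starred      # => "caf*"
```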
Related: see Converting to New String.
Source
static VALUE
str_encode_bang(int argc, VALUE *argv, VALUE str)
{
VALUE newstr;
int encidx;
rb_check_frozen(str);
newstr = str;
encidx = str_transcode(argc, argv, &newstr);
if (encidx < 0) return str;
if (newstr == str) {
rb_enc_associate_index(str, encidx);
return str;
}
rb_str_shared_replace(str, newstr);
return str_encode_associate(str, encidx);
}
Source
VALUE
rb_obj_encoding(VALUE obj)
{
int idx = rb_enc_get_index(obj);
if (idx < 0) {
rb_raise(rb_eTypeError, "unknown encoding");
}
return rb_enc_from_encoding_index(idx & ENC_INDEX_MASK);
}
Source
static VALUE
rb_str_end_with(int argc, VALUE *argv, VALUE str)
{
int i;
for (i=0; i<argc; i++) {
VALUE tmp = argv[i];
const char *p, *s, *e;
long slen, tlen;
rb_encoding *enc;
StringValue(tmp);
enc = rb_enc_check(str, tmp);
if ((tlen = RSTRING_LEN(tmp)) == 0) return Qtrue;
if ((slen = RSTRING_LEN(str)) < tlen) continue;
p = RSTRING_PTR(str);
e = p + slen;
s = e - tlen;
if (!at_char_boundary(p, s, e, enc))
continue;
if (memcmp(s, RSTRING_PTR(tmp), tlen) == 0)
return Qtrue;
}
return Qfalse;
}
Returns whether self ends with any of the given strings:
'foo'.end_with?('oo') # => true
'foo'.end_with?('bar', 'oo') # => true
'foo'.end_with?('bar', 'baz') # => false
'foo'.end_with?('') # => true
'тест'.end_with?('т') # => true
'こんにちは'.end_with?('は') # => true
Related: see Querying.
Source
VALUE
rb_str_eql(VALUE str1, VALUE str2)
{
if (str1 == str2) return Qtrue;
if (!RB_TYPE_P(str2, T_STRING)) return Qfalse;
return rb_str_eql_internal(str1, str2);
}
Returns whether self and object have the same length and content:
s = 'foo'
s.eql?('foo') # => true
s.eql?('food') # => false
s.eql?('FOO') # => false
Returns false if the two strings' encodings are not compatible:
s0 = "äöü" # => "äöü"
s1 = s0.encode(Encoding::ISO_8859_1) # => "\xE4\xF6\xFC"
s0.encoding # => #<Encoding:UTF-8>
s1.encoding # => #<Encoding:ISO-8859-1>
s0.eql?(s1) # => false
See Encodings.
Related: see Querying.
Source
static VALUE
rb_str_force_encoding(VALUE str, VALUE enc)
{
str_modifiable(str);
rb_encoding *encoding = rb_to_encoding(enc);
int idx = rb_enc_to_index(encoding);
// If the encoding is unchanged, we do nothing.
if (ENCODING_GET(str) == idx) {
return str;
}
rb_enc_associate_index(str, idx);
// If the coderange was 7bit and the new encoding is ASCII-compatible
// we can keep the coderange.
if (ENC_CODERANGE(str) == ENC_CODERANGE_7BIT && encoding && rb_enc_asciicompat(encoding)) {
return str;
}
ENC_CODERANGE_CLEAR(str);
return str;
}
Changes the encoding of self to the given encoding, which may be a string encoding name or an Encoding object; does not change the underlying bytes; returns self:
s = 'łał'
s.bytes # => [197, 130, 97, 197, 130]
s.encoding # => #<Encoding:UTF-8>
s.valid_encoding? # => true
s.force_encoding('ascii') # => "\xC5\x82a\xC5\x82"
s.encoding # => #<Encoding:US-ASCII>
s.bytes # => [197, 130, 97, 197, 130]
Makes the change even if the given encoding is invalid for self (as is the change above):
s.valid_encoding? # => false
See Encodings.
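A common use of force_encoding is to relabel bytes whose real encoding is known but not recorded, for example data read in binary mode. The sketch below assumes the bytes are valid UTF-8; the names are invented.

```ruby
raw = "\xE3\x81\x93\xE3\x82\x93".b        # six bytes tagged as binary (ASCII-8BIT)

text = raw.dup.force_encoding('UTF-8')    # same bytes, new encoding label

raw.encoding == Encoding::BINARY  # => true
text.valid_encoding?              # => true
text.length                       # => 2 (characters; still 6 bytes)
text.bytes == raw.bytes           # => true
```

Unlike String#encode, no transcoding happens here: only the encoding label changes, so the result is meaningful only if the bytes really are in the named encoding.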
Related: see Modifying.
Source
VALUE
rb_str_getbyte(VALUE str, VALUE index)
{
long pos = NUM2LONG(index);
if (pos < 0)
pos += RSTRING_LEN(str);
if (pos < 0 || RSTRING_LEN(str) <= pos)
return Qnil;
return INT2FIX((unsigned char)RSTRING_PTR(str)[pos]);
}
Returns the byte at zero-based index as an integer:
s = 'foo'
s.getbyte(0) # => 102
s.getbyte(1) # => 111
s.getbyte(2) # => 111
Counts backward from the end if index is negative:
s.getbyte(-3) # => 102
Returns nil if index is out of range:
s.getbyte(3) # => nil
s.getbyte(-4) # => nil
More examples:
s = 'тест'
s.bytes # => [209, 130, 208, 181, 209, 129, 209, 130]
s.getbyte(2) # => 208
s = 'こんにちは'
s.bytes # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]
s.getbyte(2) # => 147
Related: see Converting to Non-String.
Source
static VALUE
rb_str_grapheme_clusters(VALUE str)
{
VALUE ary = WANTARRAY("grapheme_clusters", rb_str_strlen(str));
return rb_str_enumerate_grapheme_clusters(str, ary);
}
Returns an array of the grapheme clusters in self (see Unicode Grapheme Cluster Boundaries):
s = "ä-pqr-b̈-xyz-c̈" s.size # => 16 s.bytesize # => 19 s.grapheme_clusters.size # => 13 s.grapheme_clusters # => ["ä", "-", "p", "q", "r", "-", "b̈", "-", "x", "y", "z", "-", "c̈"]
Details:
s = "ä" s.grapheme_clusters # => ["ä"] # One grapheme cluster. s.bytes # => [97, 204, 136] # Three bytes. s.chars # => ["a", "̈"] # Two characters. s.chars.map {|char| char.ord } # => [97, 776] # Their values.
Related: see Converting to Non-String.
Source
static VALUE
rb_str_gsub(int argc, VALUE *argv, VALUE str)
{
return str_gsub(argc, argv, str, 0);
}
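Returns a copy of self with zero or more substrings replaced.
Argument pattern may be a string or a Regexp; argument replacement may be a string or a Hash. Varying types for the argument values make this method very versatile.
Below are some simple examples; for many more examples, see Substitution Methods.
With arguments pattern and string replacement given, replaces each matching substring with the given replacement string:
s = 'abracadabra'
s.gsub('ab', 'AB') # => "ABracadABra"
s.gsub(/[a-c]/, 'X') # => "XXrXXXdXXrX"
With arguments pattern and hash replacement given, replaces each matching substring with a value from the given replacement hash, or removes it:
h = {'a' => 'A', 'b' => 'B', 'c' => 'C'}
s.gsub(/[a-c]/, h) # => "ABrACAdABrA" # 'a', 'b', 'c' replaced.
s.gsub(/[a-d]/, h) # => "ABrACAABrA" # 'd' removed.
With argument pattern and a block given, calls the block with each matching substring; replaces that substring with the block's return value:
s.gsub(/[a-d]/) {|substring| substring.upcase } # => "ABrACADABrA"
With argument pattern given, and neither replacement nor a block given, returns a new Enumerator.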
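A few further forms may be worth seeing side by side: back-references in a replacement string, a block, and the Enumerator returned when only pattern is given. This is an illustrative sketch with invented input.

```ruby
date = '2024-01-15'

# Back-references \1, \2, \3 refer to the capture groups of the pattern.
swapped = date.gsub(/(\d+)-(\d+)-(\d+)/, '\3/\2/\1')

# The block receives each matched substring.
title = 'hello wide world'.gsub(/\b\w/) {|first| first.upcase }

# With only a pattern, the Enumerator yields each match in turn.
matches = 'abracadabra'.gsub(/a./).to_a

swapped  # => "15/01/2024"
title    # => "Hello Wide World"
matches  # => ["ab", "ac", "ad", "ab"]
```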
Related: see Converting to New String.
Source
static VALUE
rb_str_gsub_bang(int argc, VALUE *argv, VALUE str)
{
str_modify_keep_cr(str);
return str_gsub(argc, argv, str, 1);
}
Source
static VALUE
rb_str_hash_m(VALUE str)
{
st_index_t hval = rb_str_hash(str);
return ST2FIX(hval);
}
Returns the integer hash value for self.
Two String objects that have identical content and compatible encodings also have the same hash value; see Object#hash and Encodings:
s = 'foo'
h = s.hash # => -569050784
h == 'foo'.hash # => true
h == 'food'.hash # => false
h == 'FOO'.hash # => false
s0 = "äöü"
s1 = s0.encode(Encoding::ISO_8859_1)
s0.encoding # => #<Encoding:UTF-8>
s1.encoding # => #<Encoding:ISO-8859-1>
s0.hash == s1.hash # => false
Related: see Querying.
Source
static VALUE
rb_str_hex(VALUE str)
{
return rb_str_to_inum(str, 16, FALSE);
}
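Interprets the leading substring of self as hexadecimal, possibly signed; returns its value as an integer.
The leading substring is interpreted as hexadecimal when it begins with:
- One or more characters representing hexadecimal digits (each in one of the ranges '0'..'9', 'a'..'f', or 'A'..'F'); the string to be interpreted ends at the first character that does not represent a hexadecimal digit:
'f'.hex # => 15
'11'.hex # => 17
'FFF'.hex # => 4095
'fffg'.hex # => 4095
'foo'.hex # => 15 # 'f' hexadecimal, 'oo' not.
'bar'.hex # => 186 # 'ba' hexadecimal, 'r' not.
'deadbeef'.hex # => 3735928559
- '0x' or '0X', followed by one or more hexadecimal digits:
'0xfff'.hex # => 4095
'0xfffg'.hex # => 4095
Any of the above may be prefixed with '-', which negates the interpreted value:
'-fff'.hex # => -4095
'-0xFFF'.hex # => -4095
Returns zero for any substring not described above:
'xxx'.hex # => 0
''.hex # => 0
Note that, unlike oct, this method interprets only hexadecimal, and not binary, octal, or decimal notations:
'0b111'.hex # => 45329
'0o777'.hex # => 0
'0d999'.hex # => 55705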
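Because hex silently stops at the first non-hexadecimal character, it may hide malformed input. The sketch below contrasts it with Kernel#Integer, which rejects such input; the sample string is invented.

```ruby
input = 'ffzz'

lenient = input.hex   # parses the leading "ff" and ignores the rest

strict =
  begin
    Integer(input, 16)  # whole string must be valid hexadecimal
  rescue ArgumentError
    :invalid
  end

lenient              # => 255
strict               # => :invalid
Integer('0xff', 16)  # => 255
```

When the input comes from an untrusted source, the strict form is usually the safer choice.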
Related: see Converting to Non-String.
Source
VALUE
rb_str_include(VALUE str, VALUE arg)
{
long i;
StringValue(arg);
i = rb_str_index(str, arg, 0);
return RBOOL(i != -1);
}
Returns whether self contains other_string:
s = 'bar'
s.include?('ba') # => true
s.include?('ar') # => true
s.include?('bar') # => true
s.include?('a') # => true
s.include?('') # => true
s.include?('foo') # => false
Related: see Querying.
Source
static VALUE
rb_str_index_m(int argc, VALUE *argv, VALUE str)
{
VALUE sub;
VALUE initpos;
rb_encoding *enc = STR_ENC_GET(str);
long pos;
if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) {
long slen = str_strlen(str, enc); /* str's enc */
pos = NUM2LONG(initpos);
if (pos < 0 ? (pos += slen) < 0 : pos > slen) {
if (RB_TYPE_P(sub, T_REGEXP)) {
rb_backref_set(Qnil);
}
return Qnil;
}
}
else {
pos = 0;
}
if (RB_TYPE_P(sub, T_REGEXP)) {
pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos,
enc, single_byte_optimizable(str));
if (rb_reg_search(sub, str, pos, 0) >= 0) {
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = rb_str_sublen(str, BEG(0));
return LONG2NUM(pos);
}
}
else {
StringValue(sub);
pos = rb_str_index(str, sub, pos);
if (pos >= 0) {
pos = rb_str_sublen(str, pos);
return LONG2NUM(pos);
}
}
return Qnil;
}
Returns the integer position of the first substring that matches the given argument pattern, or nil if none is found.
When pattern is a string, returns the index of the first matching substring in self:
'foo'.index('f') # => 0
'foo'.index('o') # => 1
'foo'.index('oo') # => 1
'foo'.index('ooo') # => nil
'тест'.index('с') # => 2 # Characters, not bytes.
'こんにちは'.index('ち') # => 3
When pattern is a Regexp, returns the index of the first match in self:
'foo'.index(/o./) # => 1
'foo'.index(/.o/) # => 0
With non-negative integer argument offset given, begins the search at position offset; the returned index is relative to the beginning of self:
'bar'.index('r', 0) # => 2
'bar'.index('r', 1) # => 2
'bar'.index('r', 2) # => 2
'bar'.index('r', 3) # => nil
'bar'.index(/[r-z]/, 0) # => 2
'тест'.index('с', 1) # => 2
'тест'.index('с', 2) # => 2
'тест'.index('с', 3) # => nil # Offset in characters, not bytes.
'こんにちは'.index('ち', 2) # => 3
With negative integer argument offset given, selects the search position by counting backward from the end of self:
'foo'.index('o', -1) # => 2
'foo'.index('o', -2) # => 1
'foo'.index('o', -3) # => 1
'foo'.index('o', -4) # => nil
'foo'.index(/o./, -2) # => 1
'foo'.index(/.o/, -2) # => 1
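The offset argument makes it possible to find every occurrence, not just the first. The helper below, all_indexes, is hypothetical (it is not a String method) and is only a sketch of the idea.

```ruby
# Hypothetical helper: collects every index at which pattern matches,
# restarting the search one character past each hit.
def all_indexes(string, pattern)
  positions = []
  offset = 0
  while (found = string.index(pattern, offset))
    positions << found
    offset = found + 1
  end
  positions
end

string_hits = all_indexes('abracadabra', 'a')
regexp_hits = all_indexes('abracadabra', /[cd]/)
wide_hits   = all_indexes('тест', 'т')

string_hits  # => [0, 3, 5, 7, 10]
regexp_hits  # => [4, 6]
wide_hits    # => [0, 3]
```

Advancing by one character (rather than by the match length) means overlapping matches are also reported; both the offsets and the results are in characters, not bytes.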
Related: see Querying.
Source
static VALUE
rb_str_insert(VALUE str, VALUE idx, VALUE str2)
{
long pos = NUM2LONG(idx);
if (pos == -1) {
return rb_str_append(str, str2);
}
else if (pos < 0) {
pos++;
}
rb_str_update(str, pos, 0, str2);
return str;
}
Inserts the given other_string into self; returns self.
If the given index is non-negative, inserts other_string at offset index:
'foo'.insert(0, 'bar') # => "barfoo"
'foo'.insert(1, 'bar') # => "fbaroo"
'foo'.insert(3, 'bar') # => "foobar"
'тест'.insert(2, 'bar') # => "теbarст" # Characters, not bytes.
'こんにちは'.insert(2, 'bar') # => "こんbarにちは"
If index is negative, counts backward from the end of self and inserts other_string after the character at that offset:
'foo'.insert(-2, 'bar') # => "fobaro"
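The negative-index rule is easiest to see next to its non-negative equivalent. A small sketch (each receiver is a dup, so no literal is mutated):

```ruby
appended    = 'foo'.dup.insert(-1, 'bar')  # -1: after the last character
before_last = 'foo'.dup.insert(-2, 'bar')  # -2: after the second-to-last character
same_spot   = 'foo'.dup.insert(2, 'bar')   # non-negative equivalent of -2 here

appended     # => "foobar"
before_last  # => "fobaro"
same_spot    # => "fobaro"
```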
Related: see Modifying.
Source
VALUE
rb_str_inspect(VALUE str)
{
int encidx = ENCODING_GET(str);
rb_encoding *enc = rb_enc_from_index(encidx);
const char *p, *pend, *prev;
char buf[CHAR_ESC_LEN + 1];
VALUE result = rb_str_buf_new(0);
rb_encoding *resenc = rb_default_internal_encoding();
int unicode_p = rb_enc_unicode_p(enc);
int asciicompat = rb_enc_asciicompat(enc);
if (resenc == NULL) resenc = rb_default_external_encoding();
if (!rb_enc_asciicompat(resenc)) resenc = rb_usascii_encoding();
rb_enc_associate(result, resenc);
str_buf_cat2(result, "\"");
p = RSTRING_PTR(str); pend = RSTRING_END(str);
prev = p;
while (p < pend) {
unsigned int c, cc;
int n;
n = rb_enc_precise_mbclen(p, pend, enc);
if (!MBCLEN_CHARFOUND_P(n)) {
if (p > prev) str_buf_cat(result, prev, p - prev);
n = rb_enc_mbminlen(enc);
if (pend < p + n)
n = (int)(pend - p);
while (n--) {
snprintf(buf, CHAR_ESC_LEN, "\\x%02X", *p & 0377);
str_buf_cat(result, buf, strlen(buf));
prev = ++p;
}
continue;
}
n = MBCLEN_CHARFOUND_LEN(n);
c = rb_enc_mbc_to_codepoint(p, pend, enc);
p += n;
if ((asciicompat || unicode_p) &&
(c == '"'|| c == '\\' ||
(c == '#' &&
p < pend &&
MBCLEN_CHARFOUND_P(rb_enc_precise_mbclen(p,pend,enc)) &&
(cc = rb_enc_codepoint(p,pend,enc),
(cc == '$' || cc == '@' || cc == '{'))))) {
if (p - n > prev) str_buf_cat(result, prev, p - n - prev);
str_buf_cat2(result, "\\");
if (asciicompat || enc == resenc) {
prev = p - n;
continue;
}
}
switch (c) {
case '\n': cc = 'n'; break;
case '\r': cc = 'r'; break;
case '\t': cc = 't'; break;
case '\f': cc = 'f'; break;
case '\013': cc = 'v'; break;
case '\010': cc = 'b'; break;
case '\007': cc = 'a'; break;
case 033: cc = 'e'; break;
default: cc = 0; break;
}
if (cc) {
if (p - n > prev) str_buf_cat(result, prev, p - n - prev);
buf[0] = '\\';
buf[1] = (char)cc;
str_buf_cat(result, buf, 2);
prev = p;
continue;
}
/* The special casing of 0x85 (NEXT_LINE) here is because
* Oniguruma historically treats it as printable, but it
* doesn't match the print POSIX bracket class or character
* property in regexps.
*
* See Ruby Bug #16842 for details:
* https://bugs.ruby-lang.org/issues/16842
*/
if ((enc == resenc && rb_enc_isprint(c, enc) && c != 0x85) ||
(asciicompat && rb_enc_isascii(c, enc) && ISPRINT(c))) {
continue;
}
else {
if (p - n > prev) str_buf_cat(result, prev, p - n - prev);
rb_str_buf_cat_escaped_char(result, c, unicode_p);
prev = p;
continue;
}
}
if (p > prev) str_buf_cat(result, prev, p - prev);
str_buf_cat2(result, "\"");
return result;
}
Returns a printable version of self, enclosed in double-quotes.
Most printable characters are rendered simply as themselves:
'abc'.inspect # => "\"abc\""
'012'.inspect # => "\"012\""
''.inspect # => "\"\""
"\u000012".inspect # => "\"\\u000012\""
'тест'.inspect # => "\"тест\""
'こんにちは'.inspect # => "\"こんにちは\""
But printable characters double-quote ('"') and backslash ('\') are escaped:
'"'.inspect # => "\"\\\"\""
'\\'.inspect # => "\"\\\\\""
Unprintable characters are the ASCII characters whose values are in range 0..31, along with the character whose value is 127.
Most of these characters are rendered thus:
0.chr.inspect # => "\"\\x00\""
1.chr.inspect # => "\"\\x01\""
2.chr.inspect # => "\"\\x02\""
# ...
A few, however, have special renderings:
7.chr.inspect # => "\"\\a\"" # BEL
8.chr.inspect # => "\"\\b\"" # BS
9.chr.inspect # => "\"\\t\"" # TAB
10.chr.inspect # => "\"\\n\"" # LF
11.chr.inspect # => "\"\\v\"" # VT
12.chr.inspect # => "\"\\f\"" # FF
13.chr.inspect # => "\"\\r\"" # CR
27.chr.inspect # => "\"\\e\"" # ESC
Related: see Converting to Non-String.
Source
VALUE
rb_str_intern(VALUE str)
{
return sym_find_or_insert_dynamic_symbol(&ruby_global_symbols, str);
}
Returns the Symbol object derived from self, creating it if it did not already exist:
'foo'.intern # => :foo
'тест'.intern # => :тест
'こんにちは'.intern # => :こんにちは
Related: see Converting to Non-String.
Source
VALUE
rb_str_length(VALUE str)
{
return LONG2NUM(str_strlen(str, NULL));
}
Returns the count of characters (not bytes) in self:
'foo'.length # => 3
'тест'.length # => 4
'こんにちは'.length # => 5
Contrast with String#bytesize:
'foo'.bytesize # => 3
'тест'.bytesize # => 8
'こんにちは'.bytesize # => 15
Related: see Querying.
Source
static VALUE
rb_str_lines(int argc, VALUE *argv, VALUE str)
{
VALUE ary = WANTARRAY("lines", 0);
return rb_str_enumerate_lines(argc, argv, str, ary);
}
Returns substrings ("lines") of self according to the given arguments:
s = <<~EOT
This is the first line.
This is line two.

This is line four.
This is line five.
EOT
With the default argument values:
$/ # => "\n"
s.lines # => ["This is the first line.\n", "This is line two.\n", "\n", "This is line four.\n", "This is line five.\n"]
With a different record_separator:
record_separator = ' is '
s.lines(record_separator) # => ["This is ", "the first line.\nThis is ", "line two.\n\nThis is ", "line four.\nThis is ", "line five.\n"]
With keyword argument chomp as true, removes the trailing record separator (here, the newline) from each line:
s.lines(chomp: true) # => ["This is the first line.", "This is line two.", "", "This is line four.", "This is line five."]
Related: see Converting to Non-String.
Source
static VALUE
rb_str_ljust(int argc, VALUE *argv, VALUE str)
{
return rb_str_justify(argc, argv, str, 'l');
}
Returns a copy of self, left-justified and, if necessary, right-padded with pad_string:
'hello'.ljust(10) # => "hello     "
' hello'.ljust(10) # => " hello    "
'hello'.ljust(10, 'ab') # => "helloababa"
'тест'.ljust(10) # => "тест      "
'こんにちは'.ljust(10) # => "こんにちは     "
If width <= self.length, returns a copy of self:
'hello'.ljust(5) # => "hello"
'hello'.ljust(1) # => "hello" # Does not truncate to width.
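One typical use of ljust is aligning columns of text. The following sketch pads each cell to the widest entry in its column; the table data is invented for the example.

```ruby
rows = [['id', 'name'], ['1', 'Ada'], ['22', 'Grace']]

# Widest cell in each column, measured in characters.
widths = rows.transpose.map {|column| column.map(&:length).max }

table = rows.map do |row|
  row.each_with_index.map {|cell, i| cell.ljust(widths[i]) }.join(' | ')
end

widths  # => [2, 5]
table   # => ["id | name ", "1  | Ada  ", "22 | Grace"]
```

Note that the width counts characters, not display columns, so wide characters (such as many CJK characters) may not line up in a terminal.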
Related: see Converting to New String.
Source
static VALUE
rb_str_lstrip(int argc, VALUE *argv, VALUE str)
{
char *start;
long len, loffset;
RSTRING_GETMEM(str, start, len);
if (argc > 0) {
char table[TR_TABLE_SIZE];
VALUE del = 0, nodel = 0;
tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
loffset = lstrip_offset_table(str, start, start+len, STR_ENC_GET(str), table, del, nodel);
}
else {
loffset = lstrip_offset(str, start, start+len, STR_ENC_GET(str));
}
if (loffset <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, loffset, len - loffset);
}
Returns a copy of self with leading whitespace removed; see Whitespace in Strings:
whitespace = "\x00\t\n\v\f\r "
s = whitespace + 'abc' + whitespace # => "\u0000\t\n\v\f\r abc\u0000\t\n\v\f\r "
s.lstrip # => "abc\u0000\t\n\v\f\r "
If selectors are given, removes the characters specified by selectors from the beginning of self:
s = "---abc+++"
s.lstrip("-") # => "abc+++"
selectors must be valid character selectors (see Character Selectors), and may use any of their valid forms, including negation, ranges, and escapes:
"01234abc56789".lstrip("0-9") # => "abc56789"
"01234abc56789".lstrip("0-9", "^4-6") # => "4abc56789"
Related: see Converting to New String.
Source
static VALUE
rb_str_lstrip_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
char *start, *s;
long olen, loffset;
str_modify_keep_cr(str);
enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
if (argc > 0) {
char table[TR_TABLE_SIZE];
VALUE del = 0, nodel = 0;
tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
loffset = lstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
}
else {
loffset = lstrip_offset(str, start, start+olen, enc);
}
if (loffset > 0) {
long len = olen-loffset;
s = start + loffset;
memmove(start, s, len);
STR_SET_LEN(str, len);
TERM_FILL(start+len, rb_enc_mbminlen(enc));
return str;
}
return Qnil;
}
Source
static VALUE
rb_str_match_m(int argc, VALUE *argv, VALUE str)
{
VALUE re, result;
if (argc < 1)
rb_check_arity(argc, 1, 2);
re = argv[0];
argv[0] = str;
result = rb_funcallv(get_pat(re), rb_intern("match"), argc, argv);
if (!NIL_P(result) && rb_block_given_p()) {
return rb_yield(result);
}
return result;
}
Creates a MatchData object based on self and the given arguments; updates Regexp global variables.
- Computes regexp by converting pattern (if not already a Regexp):
regexp = Regexp.new(pattern)
- Computes matchdata, which will be either a MatchData object or nil (see Regexp#match):
matchdata = regexp.match(self[offset..])
With no block given, returns the computed matchdata or nil:
'foo'.match('f') # => #<MatchData "f">
'foo'.match('o') # => #<MatchData "o">
'foo'.match('x') # => nil
'foo'.match('f', 1) # => nil
'foo'.match('o', 1) # => #<MatchData "o">
With a block given and computed matchdata non-nil, calls the block with matchdata; returns the block's return value:
'foo'.match(/o/) {|matchdata| matchdata } # => #<MatchData "o">
With a block given and nil matchdata, does not call the block:
'foo'.match(/x/) {|matchdata| fail 'Cannot happen' } # => nil
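The returned MatchData is most useful with capture groups. This sketch, using an invented log-like string, shows named captures and the block form, whose block is skipped when nothing matches.

```ruby
entry = 'user=ada id=42'

md   = entry.match(/user=(?<user>\w+) id=(?<id>\d+)/)
user = md[:user]
id   = md[:id].to_i

# Block form: the block runs only when there is a match.
doubled = entry.match(/id=(\d+)/) {|m| m[1].to_i * 2 }
missing = entry.match(/role=(\w+)/) {|m| m[1] }

user     # => "ada"
id       # => 42
doubled  # => 84
missing  # => nil
```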
Related: see Querying.
Source
static VALUE
rb_str_match_m_p(int argc, VALUE *argv, VALUE str)
{
VALUE re;
rb_check_arity(argc, 1, 2);
re = get_pat(argv[0]);
return rb_reg_match_p(re, str, argc > 1 ? NUM2LONG(argv[1]) : 0);
}
Returns whether a match is found for self and the given arguments; does not update Regexp global variables.
Computes regexp by converting pattern (if not already a Regexp):
regexp = Regexp.new(pattern)
Returns true if self[offset..].match(regexp) returns a MatchData object, false otherwise:
'foo'.match?(/o/) # => true
'foo'.match?('o') # => true
'foo'.match?(/x/) # => false
'foo'.match?('f', 1) # => false
'foo'.match?('o', 1) # => true
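The difference from String#match regarding the Regexp global variables can be observed through $~ (the last MatchData). A minimal sketch:

```ruby
'bar'.match(/a/)            # sets $~ to the MatchData for "a"
before = $~[0]

found = 'foo'.match?(/o/)   # reports a match but leaves $~ untouched
after = $~[0]

before  # => "a"
found   # => true
after   # => "a"
```

Skipping the MatchData allocation is also why match? is often preferred when only a true/false answer is needed.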
Related: see Querying.
Source
static VALUE
rb_str_oct(VALUE str)
{
return rb_str_to_inum(str, -8, FALSE);
}
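Interprets the leading substring of self as octal, binary, decimal, or hexadecimal, possibly signed; returns its value as an integer.
In brief:
# Interpreted as octal.
'777'.oct # => 511
'777x'.oct # => 511
'0777'.oct # => 511
'0o777'.oct # => 511
'-777'.oct # => -511
# Not interpreted as octal.
'0b111'.oct # => 7 # Interpreted as binary.
'0d999'.oct # => 999 # Interpreted as decimal.
'0xfff'.oct # => 4095 # Interpreted as hexadecimal.
The leading substring is interpreted as octal when it begins with:
- One or more characters representing octal digits (each in the range '0'..'7'); the string to be interpreted ends at the first character that does not represent an octal digit:
'7'.oct # => 7
'11'.oct # => 9
'777'.oct # => 511
'0777'.oct # => 511
'7778'.oct # => 511
'777x'.oct # => 511
- '0o', followed by one or more octal digits:
'0o777'.oct # => 511
'0o7778'.oct # => 511
The leading substring is not interpreted as octal when it begins with:
- '0b', followed by one or more characters representing binary digits (each in the range '0'..'1'); the string to be interpreted ends at the first character that does not represent a binary digit. The string is interpreted as binary digits (base 2):
'0b111'.oct # => 7
'0b1112'.oct # => 7
- '0d', followed by one or more characters representing decimal digits (each in the range '0'..'9'); the string to be interpreted ends at the first character that does not represent a decimal digit. The string is interpreted as decimal digits (base 10):
'0d999'.oct # => 999
'0d999x'.oct # => 999
- '0x', followed by one or more characters representing hexadecimal digits (each in one of the ranges '0'..'9', 'a'..'f', or 'A'..'F'); the string to be interpreted ends at the first character that does not represent a hexadecimal digit. The string is interpreted as hexadecimal digits (base 16):
'0xfff'.oct # => 4095
'0xfffg'.oct # => 4095
Any of the above may be prefixed with '-', which negates the interpreted value:
'-777'.oct # => -511
'-0777'.oct # => -511
'-0b111'.oct # => -7
'-0xfff'.oct # => -4095
Returns zero for any substring not described above:
'foo'.oct # => 0
''.oct # => 0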
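Octal strings appear most often as file permission modes. As a small sketch (the mode strings are invented), oct converts such a string to an integer, and Integer#to_s or Kernel#format converts it back:

```ruby
mode       = '644'.oct    # plain octal digits
executable = '0755'.oct   # a leading zero is also accepted

mode                        # => 420
mode.to_s(8)                # => "644"
executable                  # => 493
format('%04o', executable)  # => "0755"
```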
Related: see Converting to Non-String.
Source
static VALUE
rb_str_ord(VALUE s)
{
unsigned int c;
c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
return UINT2NUM(c);
}
Returns the integer ordinal of the first character of self:
'h'.ord # => 104
'hello'.ord # => 104
'тест'.ord # => 1090
'こんにちは'.ord # => 12371
Related: see Converting to Non-String.
Source
static VALUE
rb_str_partition(VALUE str, VALUE sep)
{
long pos;
sep = get_pat_quoted(sep, 0);
if (RB_TYPE_P(sep, T_REGEXP)) {
if (rb_reg_search(sep, str, 0, 0) < 0) {
goto failed;
}
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = BEG(0);
sep = rb_str_subseq(str, pos, END(0) - pos);
}
else {
pos = rb_str_index(str, sep, 0);
if (pos < 0) goto failed;
}
return rb_ary_new3(3, rb_str_subseq(str, 0, pos),
sep,
rb_str_subseq(str, pos+RSTRING_LEN(sep),
RSTRING_LEN(str)-pos-RSTRING_LEN(sep)));
failed:
return rb_ary_new3(3, str_duplicate(rb_cString, str), str_new_empty_String(str), str_new_empty_String(str));
}
Returns a 3-element array of substrings of self.
If pattern is matched, returns the array:
[pre_match, first_match, post_match]
where:
- first_match is the first-found matching substring.
- pre_match and post_match are the preceding and following substrings.
If pattern is not matched, returns the array:
[self.dup, "", ""]
Note that in the examples below, a returned string 'hello' is a copy of self, not self itself.
If pattern is a Regexp, performs the equivalent of self.match(pattern) (also setting the pattern-matching global variables):
'hello'.partition(/h/) # => ["", "h", "ello"]
'hello'.partition(/l/) # => ["he", "l", "lo"]
'hello'.partition(/l+/) # => ["he", "ll", "o"]
'hello'.partition(/o/) # => ["hell", "o", ""]
'hello'.partition(/^/) # => ["", "", "hello"]
'hello'.partition(//) # => ["", "", "hello"]
'hello'.partition(/$/) # => ["hello", "", ""]
'hello'.partition(/x/) # => ["hello", "", ""]
If pattern is not a Regexp, converts it to a string (if it is not already one), then performs the equivalent of self.index(pattern) (and does not set the pattern-matching global variables):
'hello'.partition('h') # => ["", "h", "ello"]
'hello'.partition('l') # => ["he", "l", "lo"]
'hello'.partition('ll') # => ["he", "ll", "o"]
'hello'.partition('o') # => ["hell", "o", ""]
'hello'.partition('') # => ["", "", "hello"]
'hello'.partition('x') # => ["hello", "", ""]
'тест'.partition('т') # => ["", "т", "ест"]
'こんにちは'.partition('に') # => ["こん", "に", "ちは"]
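Since only the first match is used, partition is a convenient way to split a key from a value that may itself contain the separator. A sketch with invented input:

```ruby
key, separator, value = 'path=/usr/bin=local'.partition('=')

# When the separator is absent, the middle element is empty.
name, found, rest = 'no_equals_here'.partition('=')
has_value = !found.empty?

key        # => "path"
separator  # => "="
value      # => "/usr/bin=local"
name       # => "no_equals_here"
rest       # => ""
has_value  # => false
```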
Related: see Converting to Non-String.
Source
static VALUE
rb_str_prepend_multi(int argc, VALUE *argv, VALUE str)
{
str_modifiable(str);
if (argc == 1) {
rb_str_update(str, 0L, 0L, argv[0]);
}
else if (argc > 1) {
int i;
VALUE arg_str = rb_str_tmp_new(0);
rb_enc_copy(arg_str, str);
for (i = 0; i < argc; i++) {
rb_str_append(arg_str, argv[i]);
}
rb_str_update(str, 0L, 0L, arg_str);
}
return str;
}
Source
VALUE
rb_str_replace(VALUE str, VALUE str2)
{
str_modifiable(str);
if (str == str2) return str;
StringValue(str2);
str_discard(str);
return str_replace(str, str2);
}
Source
static VALUE
rb_str_reverse(VALUE str)
{
rb_encoding *enc;
VALUE rev;
char *s, *e, *p;
int cr;
if (RSTRING_LEN(str) <= 1) return str_duplicate(rb_cString, str);
enc = STR_ENC_GET(str);
rev = rb_str_new(0, RSTRING_LEN(str));
s = RSTRING_PTR(str); e = RSTRING_END(str);
p = RSTRING_END(rev);
cr = ENC_CODERANGE(str);
if (RSTRING_LEN(str) > 1) {
if (single_byte_optimizable(str)) {
while (s < e) {
*--p = *s++;
}
}
else if (cr == ENC_CODERANGE_VALID) {
while (s < e) {
int clen = rb_enc_fast_mbclen(s, e, enc);
p -= clen;
memcpy(p, s, clen);
s += clen;
}
}
else {
cr = rb_enc_asciicompat(enc) ?
ENC_CODERANGE_7BIT : ENC_CODERANGE_VALID;
while (s < e) {
int clen = rb_enc_mbclen(s, e, enc);
if (clen > 1 || (*s & 0x80)) cr = ENC_CODERANGE_UNKNOWN;
p -= clen;
memcpy(p, s, clen);
s += clen;
}
}
}
STR_SET_LEN(rev, RSTRING_LEN(str));
str_enc_copy_direct(rev, str);
ENC_CODERANGE_SET(rev, cr);
return rev;
}
Returns a new string with the characters from self in reverse order:
'drawer'.reverse # => "reward"
'reviled'.reverse # => "deliver"
'stressed'.reverse # => "desserts"
'semordnilaps'.reverse # => "spalindromes"
Related: see Converting to New String.
Source
static VALUE
rb_str_reverse_bang(VALUE str)
{
if (RSTRING_LEN(str) > 1) {
if (single_byte_optimizable(str)) {
char *s, *e, c;
str_modify_keep_cr(str);
s = RSTRING_PTR(str);
e = RSTRING_END(str) - 1;
while (s < e) {
c = *s;
*s++ = *e;
*e-- = c;
}
}
else {
str_shared_replace(str, rb_str_reverse(str));
}
}
else {
str_modify_keep_cr(str);
}
return str;
}
Returns self with its characters reversed:
'drawer'.reverse! # => "reward"
'reviled'.reverse! # => "deliver"
'stressed'.reverse! # => "desserts"
'semordnilaps'.reverse! # => "spalindromes"
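The difference between String#reverse and String#reverse! is easiest to see on one receiver. A minimal sketch, with hypothetical variable names:

```ruby
s = 'stressed'.dup           # dup gives a mutable string
copy = s.reverse             # a new string; s is untouched
unchanged = (s == 'stressed')
same = s.reverse!            # reverses s itself and returns it
```

After the last line, same and s are the same object, while copy is a separate string with the same content.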
Related: see Modifying.
Source
static VALUE
rb_str_rindex_m(int argc, VALUE *argv, VALUE str)
{
VALUE sub;
VALUE initpos;
rb_encoding *enc = STR_ENC_GET(str);
long pos, len = str_strlen(str, enc); /* str's enc */
if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) {
pos = NUM2LONG(initpos);
if (pos < 0 && (pos += len) < 0) {
if (RB_TYPE_P(sub, T_REGEXP)) {
rb_backref_set(Qnil);
}
return Qnil;
}
if (pos > len) pos = len;
}
else {
pos = len;
}
if (RB_TYPE_P(sub, T_REGEXP)) {
/* enc = rb_enc_check(str, sub); */
pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos,
enc, single_byte_optimizable(str));
if (rb_reg_search(sub, str, pos, 1) >= 0) {
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = rb_str_sublen(str, BEG(0));
return LONG2NUM(pos);
}
}
else {
StringValue(sub);
pos = rb_str_rindex(str, sub, pos);
if (pos >= 0) {
pos = rb_str_sublen(str, pos);
return LONG2NUM(pos);
}
}
return Qnil;
}
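Returns the integer position of the last substring that matches the given argument pattern, or nil if none is found.
When pattern is a string, returns the index of the last matching substring in self: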
'foo'.rindex('f') # => 0
'foo'.rindex('o') # => 2
'foo'.rindex('oo') # => 1
'foo'.rindex('ooo') # => nil
'тест'.rindex('т') # => 3
'こんにちは'.rindex('ち') # => 3
When pattern is a Regexp, returns the index of the last match in self:
'foo'.rindex(/f/) # => 0
'foo'.rindex(/o/) # => 2
'foo'.rindex(/oo/) # => 1
'foo'.rindex(/ooo/) # => nil
When offset is non-negative, it specifies the maximum starting position in the string at which the search may end:
'foo'.rindex('o', 0) # => nil
'foo'.rindex('o', 1) # => 1
'foo'.rindex('o', 2) # => 2
'foo'.rindex('o', 3) # => 2
When offset is a negative integer, the search position is selected by counting backward from the end of self:
'foo'.rindex('o', -1) # => 2
'foo'.rindex('o', -2) # => 1
'foo'.rindex('o', -3) # => nil
'foo'.rindex('o', -4) # => nil
The last match means the match that starts at the last possible position, not the last of the longest matches:
'foo'.rindex(/o+/) # => 2
$~ # => #<MatchData "o">
To get the last longest match, combine the pattern with a negative lookbehind:
'foo'.rindex(/(?<!o)o+/) # => 1
$~ # => #<MatchData "oo">
Or use String#index with a negative lookahead:
'foo'.index(/o+(?!.*o)/) # => 1
$~ # => #<MatchData "oo">
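One common use of String#rindex is locating the final separator in a name. The sketch below assumes a hypothetical filename and shows both the plain call and the offset form described above.

```ruby
filename = 'archive.tar.gz'            # hypothetical sample value

dot = filename.rindex('.')             # index of the last '.'
ext = dot ? filename[dot + 1..-1] : '' # text after the last '.'

# An offset caps where the match may start, so this finds the previous '.'.
prev_dot = filename.rindex('.', dot - 1)
```

Guarding on dot covers the nil result that String#rindex returns when there is no match.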
Related: see Querying.
Source
static VALUE
rb_str_rjust(int argc, VALUE *argv, VALUE str)
{
return rb_str_justify(argc, argv, str, 'r');
}
Returns a right-justified copy of self.
If integer argument width is greater than the size (in characters) of self, returns a new string of length width that is a copy of self, right justified and padded on the left with pad_string:
'hello'.rjust(10) # => "     hello"
'hello '.rjust(10) # => "    hello "
'hello'.rjust(10, 'ab') # => "ababahello"
'тест'.rjust(10) # => "      тест"
'こんにちは'.rjust(10) # => "     こんにちは"
If width <= self.size, returns a copy of self:
'hello'.rjust(5, 'ab') # => "hello"
'hello'.rjust(1, 'ab') # => "hello"
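A typical use of String#rjust is lining up a column of values. This is only a sketch; the prices array and variable names are invented for the example.

```ruby
prices = ['5', '42', '1300']              # hypothetical column values

width = prices.map(&:size).max            # widest entry sets the column width
lines = prices.map { |p| p.rjust(width) } # pad each entry on the left

padded = '7'.rjust(3, '0')                # a custom pad string
```

Because width is counted in characters, the same approach works for multibyte text, although display width in a terminal may still differ for wide characters.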
Related: see Converting to New String.
Source
static VALUE
rb_str_rpartition(VALUE str, VALUE sep)
{
long pos = RSTRING_LEN(str);
sep = get_pat_quoted(sep, 0);
if (RB_TYPE_P(sep, T_REGEXP)) {
if (rb_reg_search(sep, str, pos, 1) < 0) {
goto failed;
}
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = BEG(0);
sep = rb_str_subseq(str, pos, END(0) - pos);
}
else {
pos = rb_str_sublen(str, pos);
pos = rb_str_rindex(str, sep, pos);
if (pos < 0) {
goto failed;
}
}
return rb_ary_new3(3, rb_str_subseq(str, 0, pos),
sep,
rb_str_subseq(str, pos+RSTRING_LEN(sep),
RSTRING_LEN(str)-pos-RSTRING_LEN(sep)));
failed:
return rb_ary_new3(3, str_new_empty_String(str), str_new_empty_String(str), str_duplicate(rb_cString, str));
}
Returns a 3-element array of substrings of self.
Searches self for a match of pattern, seeking the last match.
If pattern is not matched, returns the array:
["", "", self.dup]
If pattern is matched, returns the array:
[pre_match, last_match, post_match]
where:
- last_match is the last-found matching substring.
- pre_match and post_match are the preceding and following substrings.
The pattern used is:
- pattern itself, if it is a Regexp.
- Regexp.quote(pattern), if pattern is a string.
Note that in the examples below, a returned string 'hello' is a copy of self, not self.
If pattern is a Regexp, searches for the last matching substring (also setting matched-data global variables):
'hello'.rpartition(/l/) # => ["hel", "l", "o"]
'hello'.rpartition(/ll/) # => ["he", "ll", "o"]
'hello'.rpartition(/h/) # => ["", "h", "ello"]
'hello'.rpartition(/o/) # => ["hell", "o", ""]
'hello'.rpartition(//) # => ["hello", "", ""]
'hello'.rpartition(/x/) # => ["", "", "hello"]
'тест'.rpartition(/т/) # => ["тес", "т", ""]
'こんにちは'.rpartition(/に/) # => ["こん", "に", "ちは"]
If pattern is not a Regexp, converts it to a string (if it is not already one), then searches for the last matching substring (and does not set matched-data global variables):
'hello'.rpartition('l') # => ["hel", "l", "o"]
'hello'.rpartition('ll') # => ["he", "ll", "o"]
'hello'.rpartition('h') # => ["", "h", "ello"]
'hello'.rpartition('o') # => ["hell", "o", ""]
'hello'.rpartition('') # => ["hello", "", ""]
'тест'.rpartition('т') # => ["тес", "т", ""]
'こんにちは'.rpartition('に') # => ["こん", "に", "ちは"]
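Since String#rpartition splits at the last match, it suits strings whose final separator is the meaningful one. A small sketch with a hypothetical path:

```ruby
path = '/usr/local/lib/ruby.rb'          # hypothetical sample value

dir, _sep, base = path.rpartition('/')   # split at the last '/'
name, _dot, ext = base.rpartition('.')   # split at the last '.'

# With no match, the whole string lands in the last slot.
none = 'README'.rpartition('.')
```

This is illustration only; for real path handling the File and Pathname classes are usually a better fit.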
Related: see Converting to Non-String.
Source
static VALUE
rb_str_rstrip(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
char *start;
long olen, roffset;
enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
if (argc > 0) {
char table[TR_TABLE_SIZE];
VALUE del = 0, nodel = 0;
tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
roffset = rstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
}
else {
roffset = rstrip_offset(str, start, start+olen, enc);
}
if (roffset <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, 0, olen-roffset);
}
Returns a copy of self with trailing whitespace removed; see Whitespace in Strings:
whitespace = "\x00\t\n\v\f\r "
s = whitespace + 'abc' + whitespace
s # => "\u0000\t\n\v\f\r abc\u0000\t\n\v\f\r "
s.rstrip # => "\u0000\t\n\v\f\r abc"
If selectors are given, removes the characters in selectors from the end of self:
s = "---abc+++"
s.rstrip("+") # => "---abc"
selectors must be valid character selectors (see Character Selectors), and may use any of their valid forms, including negation, ranges, and escapes:
"01234abc56789".rstrip("0-9") # => "01234abc"
"01234abc56789".rstrip("0-9", "^4-6") # => "01234abc56"
Related: see Converting to New String.
Source
static VALUE
rb_str_rstrip_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
char *start;
long olen, roffset;
str_modify_keep_cr(str);
enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
if (argc > 0) {
char table[TR_TABLE_SIZE];
VALUE del = 0, nodel = 0;
tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
roffset = rstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
}
else {
roffset = rstrip_offset(str, start, start+olen, enc);
}
if (roffset > 0) {
long len = olen - roffset;
STR_SET_LEN(str, len);
TERM_FILL(start+len, rb_enc_mbminlen(enc));
return str;
}
return Qnil;
}
Source
static VALUE
rb_str_scan(VALUE str, VALUE pat)
{
VALUE result;
long start = 0;
long last = -1, prev = 0;
char *p = RSTRING_PTR(str); long len = RSTRING_LEN(str);
pat = get_pat_quoted(pat, 1);
mustnot_broken(str);
if (!rb_block_given_p()) {
VALUE ary = rb_ary_new();
while (!NIL_P(result = scan_once(str, pat, &start, 0))) {
last = prev;
prev = start;
rb_ary_push(ary, result);
}
if (last >= 0) rb_pat_search(pat, str, last, 1);
else rb_backref_set(Qnil);
return ary;
}
while (!NIL_P(result = scan_once(str, pat, &start, 1))) {
last = prev;
prev = start;
rb_yield(result);
str_mod_check(str, p, len);
}
if (last >= 0) rb_pat_search(pat, str, last, 1);
return str;
}
Matches a pattern against self:
- If pattern is a Regexp, the pattern used is pattern itself.
- If pattern is a string, the pattern used is Regexp.quote(pattern).
Generates a collection of matching results and updates the regexp-related global variables:
- If the pattern contains no groups, each result is a matched substring.
- If the pattern contains groups, each result is an array containing the matched substring for each group.
With no block given, returns an array of the results:
'cruel world'.scan(/\w+/) # => ["cruel", "world"]
'cruel world'.scan(/.../) # => ["cru", "el ", "wor"]
'cruel world'.scan(/(...)/) # => [["cru"], ["el "], ["wor"]]
'cruel world'.scan(/(..)(..)/) # => [["cr", "ue"], ["l ", "wo"]]
'тест'.scan(/../) # => ["те", "ст"]
'こんにちは'.scan(/../) # => ["こん", "にち"]
'abracadabra'.scan('ab') # => ["ab", "ab"]
'abracadabra'.scan('nosuch') # => []
With a block given, calls the block with each result; returns self:
'cruel world'.scan(/\w+/) {|w| p w }
# => "cruel"
# => "world"
'cruel world'.scan(/(.)(.)/) {|x, y| p [x, y] }
# => ["c", "r"]
# => ["u", "e"]
# => ["l", " "]
# => ["w", "o"]
# => ["r", "l"]
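The grouping rule above is what makes String#scan handy for pulling pairs out of text. A minimal sketch, assuming a made-up configuration string:

```ruby
config = 'host=example.org port=8080 mode=fast'  # hypothetical input

# Two groups: each result is a [key, value] array.
pairs = config.scan(/(\w+)=(\S+)/)
settings = pairs.to_h

# No groups: each result is the matched substring itself.
numbers = config.scan(/\d+/)
```

Converting the pairs with Array#to_h is a convenience here, not something String#scan does itself.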
Related: see Converting to Non-String.
Source
static VALUE
str_scrub(int argc, VALUE *argv, VALUE str)
{
VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil;
VALUE new = rb_str_scrub(str, repl);
return NIL_P(new) ? str_duplicate(rb_cString, str): new;
}
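Returns a copy of self with each invalid byte sequence replaced by the given replacement_string.
With no block given, replaces each invalid sequence with the given default_replacement_string (by default, "�" for a Unicode encoding, '?' otherwise):
"foo\x81\x81bar".scrub # => "foo��bar"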
"foo\x81\x81bar".force_encoding('US-ASCII').scrub # => "foo??bar"
"foo\x81\x81bar".scrub('xyzzy') # => "fooxyzzyxyzzybar"
With a block given, calls the block with each invalid sequence, and replaces that sequence with the return value of the block:
"foo\x81\x81bar".scrub {|sequence| p sequence; 'XYZZY' } # => "fooXYZZYXYZZYbar"
Output:
"\x81"
"\x81"
Related: see Converting to New String.
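As a sketch of how String#scrub might be used on input of uncertain origin, the example below builds a string containing one byte that is invalid in UTF-8 and cleans it in two ways. The sample bytes and variable names are assumptions made for illustration.

```ruby
# Hypothetical input: valid UTF-8 text with a stray 0xFF byte in it.
raw = "caf\xC3\xA9 \xFF menu".dup.force_encoding('UTF-8')
broken = !raw.valid_encoding?

# Replace each invalid sequence with a fixed string.
clean = raw.scrub('?')

# Or let a block decide, here by showing the offending bytes in hex.
tagged = raw.scrub { |bytes| "<#{bytes.unpack1('H*')}>" }
```

The block form is useful when the invalid bytes should be logged or made visible rather than silently dropped.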
Source
static VALUE
str_scrub_bang(int argc, VALUE *argv, VALUE str)
{
VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil;
VALUE new = rb_str_scrub(str, repl);
if (!NIL_P(new)) rb_str_replace(str, new);
return str;
}
Source
VALUE
rb_str_setbyte(VALUE str, VALUE index, VALUE value)
{
long pos = NUM2LONG(index);
long len = RSTRING_LEN(str);
char *ptr, *head, *left = 0;
rb_encoding *enc;
int cr = ENC_CODERANGE_UNKNOWN, width, nlen;
if (pos < -len || len <= pos)
rb_raise(rb_eIndexError, "index %ld out of string", pos);
if (pos < 0)
pos += len;
VALUE v = rb_to_int(value);
VALUE w = rb_int_and(v, INT2FIX(0xff));
char byte = (char)(NUM2INT(w) & 0xFF);
if (!str_independent(str))
str_make_independent(str);
enc = STR_ENC_GET(str);
head = RSTRING_PTR(str);
ptr = &head[pos];
if (!STR_EMBED_P(str)) {
cr = ENC_CODERANGE(str);
switch (cr) {
case ENC_CODERANGE_7BIT:
left = ptr;
*ptr = byte;
if (ISASCII(byte)) goto end;
nlen = rb_enc_precise_mbclen(left, head+len, enc);
if (!MBCLEN_CHARFOUND_P(nlen))
ENC_CODERANGE_SET(str, ENC_CODERANGE_BROKEN);
else
ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID);
goto end;
case ENC_CODERANGE_VALID:
left = rb_enc_left_char_head(head, ptr, head+len, enc);
width = rb_enc_precise_mbclen(left, head+len, enc);
*ptr = byte;
nlen = rb_enc_precise_mbclen(left, head+len, enc);
if (!MBCLEN_CHARFOUND_P(nlen))
ENC_CODERANGE_SET(str, ENC_CODERANGE_BROKEN);
else if (MBCLEN_CHARFOUND_LEN(nlen) != width || ISASCII(byte))
ENC_CODERANGE_CLEAR(str);
goto end;
}
}
ENC_CODERANGE_CLEAR(str);
*ptr = byte;
end:
return value;
}
Sets the byte at zero-based offset index to the value of the given integer; returns integer:
s = 'xyzzy'
s.setbyte(2, 129) # => 129
s # => "xy\x81zy"
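A short sketch of String#setbyte on an ASCII string, where bytes and characters coincide. The values are chosen only for illustration.

```ruby
s = 'hello'.dup                 # dup gives a mutable string

ret = s.setbyte(0, 'H'.ord)     # returns the integer that was passed in
s.setbyte(-1, 0x4F)             # negative index counts from the end; 0x4F is 'O'
```

On multibyte strings the index still counts bytes, not characters, so a careless write can leave the string with an invalid encoding, as the "xy\x81zy" result above shows.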
Related: see Modifying.
Source
# File lib/shellwords.rb, line 238
def shellescape
  Shellwords.escape(self)
end
Escapes str so that it can be safely used in a Bourne shell command line.
See Shellwords.shellescape for details.
Source
# File lib/shellwords.rb, line 227
def shellsplit
  Shellwords.split(self)
end
Splits str into an array of tokens in the same way the UNIX Bourne shell does.
See Shellwords.shellsplit for details.
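Both String#shellescape and String#shellsplit come from the shellwords library, so it has to be required first. The sketch below round-trips a hypothetical filename through the two methods.

```ruby
require 'shellwords'

name = "my file's notes.txt"        # hypothetical filename with awkward characters

escaped = name.shellescape          # safe to interpolate into a command line
command = "cat #{escaped}"

tokens = command.shellsplit         # splits the way a Bourne shell would
quoted = 'one "two three" four'.shellsplit
```

When the command is run from Ruby, passing the arguments as separate strings to Kernel#system avoids the shell, and the escaping, altogether.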
Source
static VALUE
rb_str_slice_bang(int argc, VALUE *argv, VALUE str)
{
VALUE result = Qnil;
VALUE indx;
long beg, len = 1;
char *p;
rb_check_arity(argc, 1, 2);
str_modify_keep_cr(str);
indx = argv[0];
if (RB_TYPE_P(indx, T_REGEXP)) {
if (rb_reg_search(indx, str, 0, 0) < 0) return Qnil;
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
int nth = 0;
if (argc > 1 && (nth = rb_reg_backref_number(match, argv[1])) < 0) {
if ((nth += regs->num_regs) <= 0) return Qnil;
}
else if (nth >= regs->num_regs) return Qnil;
beg = BEG(nth);
len = END(nth) - beg;
goto subseq;
}
else if (argc == 2) {
beg = NUM2LONG(indx);
len = NUM2LONG(argv[1]);
goto num_index;
}
else if (FIXNUM_P(indx)) {
beg = FIX2LONG(indx);
if (!(p = rb_str_subpos(str, beg, &len))) return Qnil;
if (!len) return Qnil;
beg = p - RSTRING_PTR(str);
goto subseq;
}
else if (RB_TYPE_P(indx, T_STRING)) {
beg = rb_str_index(str, indx, 0);
if (beg == -1) return Qnil;
len = RSTRING_LEN(indx);
result = str_duplicate(rb_cString, indx);
goto squash;
}
else {
switch (rb_range_beg_len(indx, &beg, &len, str_strlen(str, NULL), 0)) {
case Qnil:
return Qnil;
case Qfalse:
beg = NUM2LONG(indx);
if (!(p = rb_str_subpos(str, beg, &len))) return Qnil;
if (!len) return Qnil;
beg = p - RSTRING_PTR(str);
goto subseq;
default:
goto num_index;
}
}
num_index:
if (!(p = rb_str_subpos(str, beg, &len))) return Qnil;
beg = p - RSTRING_PTR(str);
subseq:
result = rb_str_new(RSTRING_PTR(str)+beg, len);
rb_enc_cr_str_copy_for_substr(result, str);
squash:
if (len > 0) {
if (beg == 0) {
rb_str_drop_bytes(str, len);
}
else {
char *sptr = RSTRING_PTR(str);
long slen = RSTRING_LEN(str);
if (beg + len > slen) /* pathological check */
len = slen - beg;
memmove(sptr + beg,
sptr + beg + len,
slen - (beg + len));
slen -= len;
STR_SET_LEN(str, slen);
TERM_FILL(&sptr[slen], TERM_LEN(str));
}
}
return result;
}
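Like String#[] (and its alias String#slice), except that:
- Performs substitutions in self (not in a copy of self).
- Returns the removed substring if any modifications were made, nil otherwise.
A few examples:
s = 'hello'
s.slice!('e') # => "e"
s # => "hllo"
s.slice!('e') # => nil
s # => "hllo"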
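String#slice! accepts the same argument forms as String#[]. The sketch below walks one string through several of them; the sample text is arbitrary.

```ruby
s = 'hello world'.dup        # dup gives a mutable string

first = s.slice!(0)          # single index    -> "h",    s is "ello world"
range = s.slice!(0..3)       # range           -> "ello", s is " world"
space = s.slice!(/\s+/)      # regexp          -> " ",    s is "world"
pair  = s.slice!(1, 2)       # start, length   -> "or",   s is "wld"
miss  = s.slice!('xyz')      # no match        -> nil,    s is unchanged
```

Each call both returns the removed text and shortens the receiver, which makes the method convenient for consuming a string piece by piece.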
Related: see Modifying.
Source
static VALUE
rb_str_split_m(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
VALUE spat;
VALUE limit;
split_type_t split_type;
long beg, end, i = 0, empty_count = -1;
int lim = 0;
VALUE result, tmp;
result = rb_block_given_p() ? Qfalse : Qnil;
if (rb_scan_args(argc, argv, "02", &spat, &limit) == 2) {
lim = NUM2INT(limit);
if (lim <= 0) limit = Qnil;
else if (lim == 1) {
if (RSTRING_LEN(str) == 0)
return result ? rb_ary_new2(0) : str;
tmp = str_duplicate(rb_cString, str);
if (!result) {
rb_yield(tmp);
return str;
}
return rb_ary_new3(1, tmp);
}
i = 1;
}
if (NIL_P(limit) && !lim) empty_count = 0;
enc = STR_ENC_GET(str);
split_type = SPLIT_TYPE_REGEXP;
if (!NIL_P(spat)) {
spat = get_pat_quoted(spat, 0);
}
else if (NIL_P(spat = rb_fs)) {
split_type = SPLIT_TYPE_AWK;
}
else if (!(spat = rb_fs_check(spat))) {
rb_raise(rb_eTypeError, "value of $; must be String or Regexp");
}
else {
rb_category_warn(RB_WARN_CATEGORY_DEPRECATED, "$; is set to non-nil value");
}
if (split_type != SPLIT_TYPE_AWK) {
switch (BUILTIN_TYPE(spat)) {
case T_REGEXP:
rb_reg_options(spat); /* check if uninitialized */
tmp = RREGEXP_SRC(spat);
split_type = literal_split_pattern(tmp, SPLIT_TYPE_REGEXP);
if (split_type == SPLIT_TYPE_AWK) {
spat = tmp;
split_type = SPLIT_TYPE_STRING;
}
break;
case T_STRING:
mustnot_broken(spat);
split_type = literal_split_pattern(spat, SPLIT_TYPE_STRING);
break;
default:
UNREACHABLE_RETURN(Qnil);
}
}
#define SPLIT_STR(beg, len) ( \
empty_count = split_string(result, str, beg, len, empty_count), \
str_mod_check(str, str_start, str_len))
beg = 0;
char *ptr = RSTRING_PTR(str);
char *const str_start = ptr;
const long str_len = RSTRING_LEN(str);
char *const eptr = str_start + str_len;
if (split_type == SPLIT_TYPE_AWK) {
char *bptr = ptr;
int skip = 1;
unsigned int c;
if (result) result = rb_ary_new();
end = beg;
if (is_ascii_string(str)) {
while (ptr < eptr) {
c = (unsigned char)*ptr++;
if (skip) {
if (ascii_isspace(c)) {
beg = ptr - bptr;
}
else {
end = ptr - bptr;
skip = 0;
if (!NIL_P(limit) && lim <= i) break;
}
}
else if (ascii_isspace(c)) {
SPLIT_STR(beg, end-beg);
skip = 1;
beg = ptr - bptr;
if (!NIL_P(limit)) ++i;
}
else {
end = ptr - bptr;
}
}
}
else {
while (ptr < eptr) {
int n;
c = rb_enc_codepoint_len(ptr, eptr, &n, enc);
ptr += n;
if (skip) {
if (rb_isspace(c)) {
beg = ptr - bptr;
}
else {
end = ptr - bptr;
skip = 0;
if (!NIL_P(limit) && lim <= i) break;
}
}
else if (rb_isspace(c)) {
SPLIT_STR(beg, end-beg);
skip = 1;
beg = ptr - bptr;
if (!NIL_P(limit)) ++i;
}
else {
end = ptr - bptr;
}
}
}
}
else if (split_type == SPLIT_TYPE_STRING) {
char *substr_start = ptr;
char *sptr = RSTRING_PTR(spat);
long slen = RSTRING_LEN(spat);
if (result) result = rb_ary_new();
mustnot_broken(str);
enc = rb_enc_check(str, spat);
while (ptr < eptr &&
(end = rb_memsearch(sptr, slen, ptr, eptr - ptr, enc)) >= 0) {
/* Check we are at the start of a char */
char *t = rb_enc_right_char_head(ptr, ptr + end, eptr, enc);
if (t != ptr + end) {
ptr = t;
continue;
}
SPLIT_STR(substr_start - str_start, (ptr+end) - substr_start);
str_mod_check(spat, sptr, slen);
ptr += end + slen;
substr_start = ptr;
if (!NIL_P(limit) && lim <= ++i) break;
}
beg = ptr - str_start;
}
else if (split_type == SPLIT_TYPE_CHARS) {
int n;
if (result) result = rb_ary_new_capa(RSTRING_LEN(str));
mustnot_broken(str);
enc = rb_enc_get(str);
while (ptr < eptr &&
(n = rb_enc_precise_mbclen(ptr, eptr, enc)) > 0) {
SPLIT_STR(ptr - str_start, n);
ptr += n;
if (!NIL_P(limit) && lim <= ++i) break;
}
beg = ptr - str_start;
}
else {
if (result) result = rb_ary_new();
long len = RSTRING_LEN(str);
long start = beg;
long idx;
int last_null = 0;
struct re_registers *regs;
VALUE match = 0;
for (; rb_reg_search(spat, str, start, 0) >= 0;
(match ? (rb_match_unbusy(match), rb_backref_set(match)) : (void)0)) {
match = rb_backref_get();
if (!result) rb_match_busy(match);
regs = RMATCH_REGS(match);
end = BEG(0);
if (start == end && BEG(0) == END(0)) {
if (!ptr) {
SPLIT_STR(0, 0);
break;
}
else if (last_null == 1) {
SPLIT_STR(beg, rb_enc_fast_mbclen(ptr+beg, eptr, enc));
beg = start;
}
else {
if (start == len)
start++;
else
start += rb_enc_fast_mbclen(ptr+start,eptr,enc);
last_null = 1;
continue;
}
}
else {
SPLIT_STR(beg, end-beg);
beg = start = END(0);
}
last_null = 0;
for (idx=1; idx < regs->num_regs; idx++) {
if (BEG(idx) == -1) continue;
SPLIT_STR(BEG(idx), END(idx)-BEG(idx));
}
if (!NIL_P(limit) && lim <= ++i) break;
}
if (match) rb_match_unbusy(match);
}
if (RSTRING_LEN(str) > 0 && (!NIL_P(limit) || RSTRING_LEN(str) > beg || lim < 0)) {
SPLIT_STR(beg, RSTRING_LEN(str)-beg);
}
return result ? result : str;
}
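Creates an array of substrings by splitting self at each occurrence of the given field separator field_sep.
With no arguments given, splits using the field separator $;, whose default value is nil.
With no block given, returns the array of substrings: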
'abracadabra'.split('a') # => ["", "br", "c", "d", "br"]
When field_sep is nil or ' ' (a single space), splits at each sequence of whitespace:
'foo bar baz'.split(nil) # => ["foo", "bar", "baz"]
'foo bar baz'.split(' ') # => ["foo", "bar", "baz"]
"foo \n\tbar\t\n baz".split(' ') # => ["foo", "bar", "baz"]
'foo  bar   baz'.split(' ') # => ["foo", "bar", "baz"]
''.split(' ') # => []
When field_sep is an empty string, splits at every character:
'abracadabra'.split('') # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
''.split('') # => []
'тест'.split('') # => ["т", "е", "с", "т"]
'こんにちは'.split('') # => ["こ", "ん", "に", "ち", "は"]
When field_sep is a non-empty string other than ' ' (a single space), uses that string as the separator:
'abracadabra'.split('a') # => ["", "br", "c", "d", "br"]
'abracadabra'.split('ab') # => ["", "racad", "ra"]
''.split('a') # => []
'тест'.split('т') # => ["", "ес"]
'こんにちは'.split('に') # => ["こん", "ちは"]
When field_sep is a Regexp, splits at each occurrence of a matching substring:
'abracadabra'.split(/ab/) # => ["", "racad", "ra"]
'1 + 1 == 2'.split(/\W+/) # => ["1", "1", "2"]
'abracadabra'.split(//) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
If the Regexp contains groups, their matches are included in the returned array:
'1:2:3'.split(/(:)()()/, 2) # => ["1", ":", "", "", "2:3"]
Argument limit sets a limit on the size of the returned array; it also determines whether trailing empty strings are included in the returned array.
When limit is zero, there is no limit on the size of the array, but trailing empty strings are omitted:
'abracadabra'.split('', 0) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
'abracadabra'.split('a', 0) # => ["", "br", "c", "d", "br"] # Empty string after last 'a' omitted.
When limit is a positive integer, there is a limit on the size of the array (no more than limit - 1 splits occur), and trailing empty strings are included:
'abracadabra'.split('', 3) # => ["a", "b", "racadabra"]
'abracadabra'.split('a', 3) # => ["", "br", "cadabra"]
'abracadabra'.split('', 30) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""]
'abracadabra'.split('a', 30) # => ["", "br", "c", "d", "br", ""]
'abracadabra'.split('', 1) # => ["abracadabra"]
'abracadabra'.split('a', 1) # => ["abracadabra"]
When limit is negative, there is no limit on the size of the array, and trailing empty strings are not omitted:
'abracadabra'.split('', -1) # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""]
'abracadabra'.split('a', -1) # => ["", "br", "c", "d", "br", ""]
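With a block given, calls the block with each substring and returns self:
'foo bar baz'.split(' ') {|substring| p substring }
Output:
"foo"
"bar"
"baz"
Note that the above example is functionally equivalent to:
'foo bar baz'.split(' ').each {|substring| p substring }
Output:
"foo"
"bar"
"baz"
But the latter:
- Has poorer performance because it creates an intermediate array.
- Returns an array (instead of self).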
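The effect of limit on trailing empty strings matters most for delimited records with optional final fields. A minimal sketch, using a made-up colon-separated record:

```ruby
record = 'alice:x:1000:1000::/home/alice:'   # hypothetical record; last field is empty

default = record.split(':')       # no limit: trailing empty field is dropped
all     = record.split(':', -1)   # negative limit: trailing empty field is kept
head    = record.split(':', 3)    # positive limit: at most 3 elements, rest unsplit
```

Interior empty fields are kept in every case; only trailing ones depend on limit.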
Related: see Converting to Non-String.
Source
static VALUE
rb_str_squeeze(int argc, VALUE *argv, VALUE str)
{
str = str_duplicate(rb_cString, str);
rb_str_squeeze_bang(argc, argv, str);
return str;
}
Returns a copy of self in which each tuple (doubleton, tripleton, etc.) of the specified characters is "squeezed" down to a single character.
The tuples to be squeezed are specified by arguments selectors, each of which is a string; see Character Selectors.
A single argument may be a single character:
'Noooooo!'.squeeze('o') # => "No!"
'foo  bar  baz'.squeeze(' ') # => "foo bar baz"
'Mississippi'.squeeze('s') # => "Misisippi"
'Mississippi'.squeeze('p') # => "Mississipi"
'Mississippi'.squeeze('x') # => "Mississippi" # Unused selector character is ignored.
'бессонница'.squeeze('с') # => "бесонница"
'бессонница'.squeeze('н') # => "бессоница"
A single argument may be a string of characters:
'Mississippi'.squeeze('sp') # => "Misisipi"
'Mississippi'.squeeze('ps') # => "Misisipi" # Order doesn't matter.
'Mississippi'.squeeze('nonsense') # => "Misisippi" # Unused selector characters are ignored.
A single argument may be a range of characters:
'Mississippi'.squeeze('a-p') # => "Mississipi"
'Mississippi'.squeeze('q-z') # => "Misisippi"
'Mississippi'.squeeze('a-z') # => "Misisipi"
Multiple arguments are allowed; see Multiple Character Selectors.
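The sketch below shows three ways of calling String#squeeze: with one selector, with none, and with two selectors whose intersection is used. The sample strings are invented for the example.

```ruby
spaced = 'too    many   spaces'.squeeze(' ')   # only runs of spaces collapse

# With no selectors, every run of a repeated character collapses.
runs = 'aaabbbccc'.squeeze

# Multiple selectors are intersected: 'a-c' and '^b' together select a and c.
limited = 'aaabbbccc'.squeeze('a-c', '^b')
```

Note that the "oo" in the first string survives, because 'o' is not among the selected characters.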
Related: see Converting to New String.
Source
static VALUE
rb_str_squeeze_bang(int argc, VALUE *argv, VALUE str)
{
char squeez[TR_TABLE_SIZE];
rb_encoding *enc = 0;
VALUE del = 0, nodel = 0;
unsigned char *s, *send, *t;
int i, modify = 0;
int ascompat, singlebyte = single_byte_optimizable(str);
unsigned int save;
if (argc == 0) {
enc = STR_ENC_GET(str);
}
else {
for (i=0; i<argc; i++) {
VALUE s = argv[i];
StringValue(s);
enc = rb_enc_check(str, s);
if (singlebyte && !single_byte_optimizable(s))
singlebyte = 0;
tr_setup_table(s, squeez, i==0, &del, &nodel, enc);
}
}
str_modify_keep_cr(str);
s = t = (unsigned char *)RSTRING_PTR(str);
if (!s || RSTRING_LEN(str) == 0) return Qnil;
send = (unsigned char *)RSTRING_END(str);
save = -1;
ascompat = rb_enc_asciicompat(enc);
if (singlebyte) {
while (s < send) {
unsigned int c = *s++;
if (c != save || (argc > 0 && !squeez[c])) {
*t++ = save = c;
}
}
}
else {
while (s < send) {
unsigned int c;
int clen;
if (ascompat && (c = *s) < 0x80) {
if (c != save || (argc > 0 && !squeez[c])) {
*t++ = save = c;
}
s++;
}
else {
c = rb_enc_codepoint_len((char *)s, (char *)send, &clen, enc);
if (c != save || (argc > 0 && !tr_find(c, squeez, del, nodel))) {
if (t != s) rb_enc_mbcput(c, t, enc);
save = c;
t += clen;
}
s += clen;
}
}
}
TERM_FILL((char *)t, TERM_LEN(str));
if ((char *)t - RSTRING_PTR(str) != RSTRING_LEN(str)) {
STR_SET_LEN(str, (char *)t - RSTRING_PTR(str));
modify = 1;
}
if (modify) return str;
return Qnil;
}
Source
static VALUE
rb_str_start_with(int argc, VALUE *argv, VALUE str)
{
int i;
for (i=0; i<argc; i++) {
VALUE tmp = argv[i];
if (RB_TYPE_P(tmp, T_REGEXP)) {
if (rb_reg_start_with_p(tmp, str))
return Qtrue;
}
else {
const char *p, *s, *e;
long slen, tlen;
rb_encoding *enc;
StringValue(tmp);
enc = rb_enc_check(str, tmp);
if ((tlen = RSTRING_LEN(tmp)) == 0) return Qtrue;
if ((slen = RSTRING_LEN(str)) < tlen) continue;
p = RSTRING_PTR(str);
e = p + slen;
s = p + tlen;
if (!at_char_right_boundary(p, s, e, enc))
continue;
if (memcmp(p, RSTRING_PTR(tmp), tlen) == 0)
return Qtrue;
}
}
return Qfalse;
}
Returns whether self starts with any of the given patterns.
For each argument, the pattern used is:
- The pattern itself, if it is a Regexp.
- Regexp.quote(pattern), if it is a string.
Returns true if any pattern matches the beginning, false otherwise:
'hello'.start_with?('hell') # => true
'hello'.start_with?(/H/i) # => true
'hello'.start_with?('heaven', 'hell') # => true
'hello'.start_with?('heaven', 'paradise') # => false
'тест'.start_with?('т') # => true
'こんにちは'.start_with?('こ') # => true
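Because string arguments are quoted rather than interpreted as patterns, a '.' in a string matches only a literal dot. A small sketch of that distinction, with a hypothetical URL:

```ruby
url = 'https://example.org/docs'                     # hypothetical sample value

web = url.start_with?('http://', 'https://')         # any one prefix is enough

literal_hit  = 'a.c'.start_with?('a.')               # '.' matches a literal dot
literal_miss = 'abc'.start_with?('a.')               # a string is not a regexp
regexp_hit   = 'abc'.start_with?(/a./)               # here '.' matches any character
```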
Related: see Querying.
Source
static VALUE
rb_str_strip(int argc, VALUE *argv, VALUE str)
{
char *start;
long olen, loffset, roffset;
rb_encoding *enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
if (argc > 0) {
char table[TR_TABLE_SIZE];
VALUE del = 0, nodel = 0;
tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
loffset = lstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
roffset = rstrip_offset_table(str, start+loffset, start+olen, enc, table, del, nodel);
}
else {
loffset = lstrip_offset(str, start, start+olen, enc);
roffset = rstrip_offset(str, start+loffset, start+olen, enc);
}
if (loffset <= 0 && roffset <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, loffset, olen-loffset-roffset);
}
Returns a copy of self with leading and trailing whitespace removed; see Whitespace in Strings:
whitespace = "\x00\t\n\v\f\r "
s = whitespace + 'abc' + whitespace # => "\u0000\t\n\v\f\r abc\u0000\t\n\v\f\r "
s.strip # => "abc"
If selectors are given, removes the characters in selectors from both ends of self:
s = "---abc+++"
s.strip("-+") # => "abc"
s.strip("+-") # => "abc"
selectors must be valid character selectors (see Character Selectors), and may use any of their valid forms, including negation, ranges, and escapes:
"01234abc56789".strip("0-9") # => "abc"
"01234abc56789".strip("0-9", "^4-6") # => "4abc56"
Related: see Converting to New String.
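The sketch below compares String#strip with its one-sided relatives String#lstrip and String#rstrip, using the no-argument forms only. The input string is made up for the example.

```ruby
input = "  \tAda Lovelace \n"   # hypothetical user input

name  = input.strip             # whitespace removed from both ends
left  = input.lstrip            # leading whitespace only
right = input.rstrip            # trailing whitespace only
```

All three return new strings, and whitespace inside the text is never touched.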
Source
static VALUE
rb_str_strip_bang(int argc, VALUE *argv, VALUE str)
{
char *start;
long olen, loffset, roffset;
rb_encoding *enc;
str_modify_keep_cr(str);
enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
if (argc > 0) {
char table[TR_TABLE_SIZE];
VALUE del = 0, nodel = 0;
tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
loffset = lstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
roffset = rstrip_offset_table(str, start+loffset, start+olen, enc, table, del, nodel);
}
else {
loffset = lstrip_offset(str, start, start+olen, enc);
roffset = rstrip_offset(str, start+loffset, start+olen, enc);
}
if (loffset > 0 || roffset > 0) {
long len = olen-roffset;
if (loffset > 0) {
len -= loffset;
memmove(start, start + loffset, len);
}
STR_SET_LEN(str, len);
TERM_FILL(start+len, rb_enc_mbminlen(enc));
return str;
}
return Qnil;
}
Source
static VALUE
rb_str_sub(int argc, VALUE *argv, VALUE str)
{
str = str_duplicate(rb_cString, str);
rb_str_sub_bang(argc, argv, str);
return str;
}
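Returns a copy of self, possibly with a substring replaced.
Argument pattern may be a string or a Regexp; argument replacement may be a string or a Hash.
The varying types allowed for the argument values make this method very versatile.
Below are some simple examples; for many more examples, see Substitution Methods.
With arguments pattern and string replacement given, replaces the first matching substring with the given replacement string:
s = 'abracadabra' # => "abracadabra"
s.sub('bra', 'xyzzy') # => "axyzzycadabra"
s.sub(/bra/, 'xyzzy') # => "axyzzycadabra"
s.sub('nope', 'xyzzy') # => "abracadabra"
With arguments pattern and hash replacement given, replaces the first matching substring with a value from the given replacement hash, or removes it:
h = {'a' => 'A', 'b' => 'B', 'c' => 'C'}
s.sub('b', h) # => "aBracadabra"
s.sub(/b/, h) # => "aBracadabra"
s.sub(/d/, h) # => "abracaabra" # 'd' removed.
With argument pattern and a block given, calls the block with the first matching substring; replaces that substring with the block's return value:
s.sub('b') {|match| match.upcase } # => "aBracadabra"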
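A replacement string may also refer back to groups captured by a Regexp pattern. The sketch below shows numbered and named back-references, plus the block form; the sample strings are arbitrary.

```ruby
dates = '2024-03-15 and 2025-01-01'            # hypothetical input

# Numbered groups; only the first match is replaced.
swapped = dates.sub(/(\d+)-(\d+)-(\d+)/, '\3/\2/\1')

# Named groups.
named = 'John Smith'.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')

# Block form: the return value becomes the replacement.
doubled = 'total: 7'.sub(/\d+/) { |n| (n.to_i * 2).to_s }
```

Single-quoted replacement strings keep the backslashes intact; in double quotes each backslash would need to be doubled.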
Related: see Converting to New String.
Source
static VALUE
rb_str_sub_bang(int argc, VALUE *argv, VALUE str)
{
VALUE pat, repl, hash = Qnil;
int iter = 0;
long plen;
int min_arity = rb_block_given_p() ? 1 : 2;
long beg;
rb_check_arity(argc, min_arity, 2);
if (argc == 1) {
iter = 1;
}
else {
repl = argv[1];
hash = rb_check_hash_type(argv[1]);
if (NIL_P(hash)) {
StringValue(repl);
}
}
pat = get_pat_quoted(argv[0], 1);
str_modifiable(str);
beg = rb_pat_search(pat, str, 0, 1);
if (beg >= 0) {
rb_encoding *enc;
int cr = ENC_CODERANGE(str);
long beg0, end0;
VALUE match, match0 = Qnil;
struct re_registers *regs;
char *p, *rp;
long len, rlen;
match = rb_backref_get();
regs = RMATCH_REGS(match);
if (RB_TYPE_P(pat, T_STRING)) {
beg0 = beg;
end0 = beg0 + RSTRING_LEN(pat);
match0 = pat;
}
else {
beg0 = BEG(0);
end0 = END(0);
if (iter) match0 = rb_reg_nth_match(0, match);
}
if (iter || !NIL_P(hash)) {
p = RSTRING_PTR(str); len = RSTRING_LEN(str);
if (iter) {
repl = rb_obj_as_string(rb_yield(match0));
}
else {
repl = rb_hash_aref(hash, rb_str_subseq(str, beg0, end0 - beg0));
repl = rb_obj_as_string(repl);
}
str_mod_check(str, p, len);
rb_check_frozen(str);
}
else {
repl = rb_reg_regsub(repl, str, regs, RB_TYPE_P(pat, T_STRING) ? Qnil : pat);
}
enc = rb_enc_compatible(str, repl);
if (!enc) {
rb_encoding *str_enc = STR_ENC_GET(str);
p = RSTRING_PTR(str); len = RSTRING_LEN(str);
if (coderange_scan(p, beg0, str_enc) != ENC_CODERANGE_7BIT ||
coderange_scan(p+end0, len-end0, str_enc) != ENC_CODERANGE_7BIT) {
rb_raise(rb_eEncCompatError, "incompatible character encodings: %s and %s",
rb_enc_inspect_name(str_enc),
rb_enc_inspect_name(STR_ENC_GET(repl)));
}
enc = STR_ENC_GET(repl);
}
rb_str_modify(str);
rb_enc_associate(str, enc);
if (ENC_CODERANGE_UNKNOWN < cr && cr < ENC_CODERANGE_BROKEN) {
int cr2 = ENC_CODERANGE(repl);
if (cr2 == ENC_CODERANGE_BROKEN ||
(cr == ENC_CODERANGE_VALID && cr2 == ENC_CODERANGE_7BIT))
cr = ENC_CODERANGE_UNKNOWN;
else
cr = cr2;
}
plen = end0 - beg0;
rlen = RSTRING_LEN(repl);
len = RSTRING_LEN(str);
if (rlen > plen) {
RESIZE_CAPA(str, len + rlen - plen);
}
p = RSTRING_PTR(str);
if (rlen != plen) {
memmove(p + beg0 + rlen, p + beg0 + plen, len - beg0 - plen);
}
rp = RSTRING_PTR(repl);
memmove(p + beg0, rp, rlen);
len += rlen - plen;
STR_SET_LEN(str, len);
TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
ENC_CODERANGE_SET(str, cr);
RB_GC_GUARD(match);
return str;
}
return Qnil;
}
Source
VALUE
rb_str_succ(VALUE orig)
{
VALUE str;
str = rb_str_new(RSTRING_PTR(orig), RSTRING_LEN(orig));
rb_enc_cr_str_copy_for_substr(str, orig);
return str_succ(str);
}
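Returns the successor to self. The successor is calculated by incrementing characters.
The first character to be incremented is the rightmost alphanumeric character or, if there are no alphanumeric characters, the rightmost character:
'THX1138'.succ # => "THX1139"
'<<koala>>'.succ # => "<<koalb>>"
'***'.succ # => '**+'
'тест'.succ # => "тесу"
'こんにちは'.succ # => "こんにちば"
The successor to a digit is another digit, "carrying" to the next-left character for a "rollover" from 9 to 0, and prepending another digit if necessary:
'00'.succ # => "01"
'09'.succ # => "10"
'99'.succ # => "100"
The successor to a letter is another letter of the same case, carrying to the next-left character for a rollover, and prepending another same-case letter if necessary:
'aa'.succ # => "ab"
'az'.succ # => "ba"
'zz'.succ # => "aaa"
'AA'.succ # => "AB"
'AZ'.succ # => "BA"
'ZZ'.succ # => "AAA"
The successor to a non-alphanumeric character is the next character in the underlying character set's collating sequence, carrying to the next-left character for a rollover, and prepending another character if necessary:
s = 0.chr * 3 # => "\x00\x00\x00"
s.succ # => "\x00\x00\x01"
s = 255.chr * 3 # => "\xFF\xFF\xFF"
s.succ # => "\x01\x00\x00\x00"
Carrying can occur between and among mixtures of alphanumeric characters:
s = 'zz99zz99' # => "zz99zz99"
s.succ # => "aaa00aa00"
s = '99zz99zz' # => "99zz99zz"
s.succ # => "100aa00aa"
The successor to an empty string is a new empty string:
''.succ # => ""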
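The digit carry rule makes String#succ usable for simple sequential labels. A minimal sketch, assuming a made-up invoice numbering scheme:

```ruby
id = 'INV-0998'                        # hypothetical starting label

# Each call increments the rightmost alphanumeric character, carrying as needed.
ids = 3.times.map { id = id.succ }

# String ranges step through their values with the same rule.
columns = ('aa'..'ad').to_a
```

The rollover from 0999 to 1000 stays within the digits, so the prefix is left alone in this example.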
Related: see Converting to New String.
Source
static VALUE
rb_str_succ_bang(VALUE str)
{
rb_str_modify(str);
str_succ(str);
return str;
}
Like String#succ, but modifies self in place; returns self.
Related: see Modifying.
Source
static VALUE
rb_str_sum(int argc, VALUE *argv, VALUE str)
{
int bits = 16;
char *ptr, *p, *pend;
long len;
VALUE sum = INT2FIX(0);
unsigned long sum0 = 0;
if (rb_check_arity(argc, 0, 1) && (bits = NUM2INT(argv[0])) < 0) {
bits = 0;
}
ptr = p = RSTRING_PTR(str);
len = RSTRING_LEN(str);
pend = p + len;
while (p < pend) {
if (FIXNUM_MAX - UCHAR_MAX < sum0) {
sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
str_mod_check(str, ptr, len);
sum0 = 0;
}
sum0 += (unsigned char)*p;
p++;
}
if (bits == 0) {
if (sum0) {
sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
}
}
else {
if (sum == INT2FIX(0)) {
if (bits < (int)sizeof(long)*CHAR_BIT) {
sum0 &= (((unsigned long)1)<<bits)-1;
}
sum = LONG2FIX(sum0);
}
else {
VALUE mod;
if (sum0) {
sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
}
mod = rb_funcall(INT2FIX(1), idLTLT, 1, INT2FIX(bits));
mod = rb_funcall(mod, '-', 1, INT2FIX(1));
sum = rb_funcall(sum, '&', 1, mod);
}
}
return sum;
}
Source
static VALUE
rb_str_swapcase(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_DOWNCASE;
VALUE ret;
flags = check_case_options(argc, argv, flags);
enc = str_true_enc(str);
if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return str_duplicate(rb_cString, str);
if (flags&ONIGENC_CASE_ASCII_ONLY) {
ret = rb_str_new(0, RSTRING_LEN(str));
rb_str_ascii_casemap(str, ret, &flags, enc);
}
else {
ret = rb_str_casemap(str, &flags, enc);
}
return ret;
}
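Returns a string containing the characters in self, with cases reversed:
- Each uppercase character is converted to lowercase.
- Each lowercase character is converted to uppercase.
Examples:
'Hello'.swapcase # => "hELLO"
'Straße'.swapcase # => "sTRASSE"
'Привет'.swapcase # => "пРИВЕТ"
'RubyGems.org'.swapcase # => "rUBYgEMS.ORG"
The sizes of self and the case-swapped result may differ:
s = 'Straße'
s.size # => 6
s.swapcase # => "sTRASSE"
s.swapcase.size # => 7
Some characters (and some character sets) do not have uppercase and lowercase versions; see Case Mapping:
s = '1, 2, 3, ...'
s.swapcase == s # => true
s = 'こんにちは'
s.swapcase == s # => true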
Casing is affected by the given mapping, which may be :ascii or :turkic (:fold is accepted only for downcasing); see Case Mapping.
Related: see Converting to New String.
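As a small sketch of how a mapping argument changes the result, the :ascii mapping restricts case conversion to ASCII letters. The helper name swapcase_ascii is an assumption made for this example.

```ruby
# swapcase_ascii is a hypothetical helper name.
def swapcase_ascii(text)
  text.swapcase(:ascii)  # only ASCII letters A-Z and a-z change case
end

puts 'Straße'.swapcase         # prints "sTRASSE" (full Unicode mapping)
puts swapcase_ascii('Straße')  # prints "sTRAßE" (the ß is left alone)
```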
Source
static VALUE
rb_str_swapcase_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_DOWNCASE;
flags = check_case_options(argc, argv, flags);
str_modify_keep_cr(str);
enc = str_true_enc(str);
if (flags&ONIGENC_CASE_ASCII_ONLY)
rb_str_ascii_casemap(str, str, &flags, enc);
else
str_shared_replace(str, rb_str_casemap(str, &flags, enc));
if (ONIGENC_CASE_MODIFIED&flags) return str;
return Qnil;
}
Source
static VALUE
string_to_c(VALUE self)
{
VALUE num;
rb_must_asciicompat(self);
(void)parse_comp(rb_str_fill_terminator(self, 1), FALSE, &num);
return num;
}
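Returns a Complex object: parses the leading substring of self to extract two numeric values that become the coordinates of the complex object.
The substring is interpreted as containing either rectangular coordinates (real and imaginary parts) or polar coordinates (magnitude and angle), depending on an included or implied "separator" character:
- '+', '-', or no separator: rectangular coordinates.
- '@': polar coordinates.
In Brief
In these examples, we use method Complex#rect to display rectangular coordinates, and method Complex#polar to display polar coordinates.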
# Rectangular coordinates.
# Real-only: no separator; imaginary part is zero.
'9'.to_c.rect # => [9, 0] # Integer.
'-9'.to_c.rect # => [-9, 0] # Integer (negative).
'2.5'.to_c.rect # => [2.5, 0] # Float.
'1.23e-14'.to_c.rect # => [1.23e-14, 0] # Float with exponent.
'2.5/1'.to_c.rect # => [(5/2), 0] # Rational.
# Some things are ignored.
'foo1'.to_c.rect # => [0, 0] # Unparsed entire substring.
'1foo'.to_c.rect # => [1, 0] # Unparsed trailing substring.
' 1 '.to_c.rect # => [1, 0] # Leading and trailing whitespace.
# Imaginary only: trailing 'i' required; real part is zero.
'9i'.to_c.rect # => [0, 9]
'-9i'.to_c.rect # => [0, -9]
'2.5i'.to_c.rect # => [0, 2.5]
'1.23e-14i'.to_c.rect # => [0, 1.23e-14]
'2.5/1i'.to_c.rect # => [0, (5/2)]
# Real and imaginary; '+' or '-' separator; trailing 'i' required.
'2+3i'.to_c.rect # => [2, 3]
'-2-3i'.to_c.rect # => [-2, -3]
'2.5+3i'.to_c.rect # => [2.5, 3]
'2.5+3/2i'.to_c.rect # => [2.5, (3/2)]
# Polar coordinates; '@' separator; magnitude required.
'1.0@0'.to_c.polar # => [1.0, 0.0]
'1.0@'.to_c.polar # => [1.0, 0.0]
"1.0@#{Math::PI}".to_c.polar # => [1.0, 3.141592653589793]
"1.0@#{Math::PI/2}".to_c.polar # => [1.0, 1.5707963267948966]
Parsed Values
The parsing may be thought of as searching for numeric literals embedded in the substring.
This section shows how the method parses numeric values from leading substrings. The examples show real-only or imaginary-only parsing; the parsing is the same for each part.
'1foo'.to_c # => (1+0i) # Ignores trailing unparsed characters.
' 1 '.to_c # => (1+0i) # Ignores leading and trailing whitespace.
'x1'.to_c # => (0+0i) # Finds no leading numeric.
# Integer literal embedded in the substring.
'1'.to_c # => (1+0i)
'-1'.to_c # => (-1+0i)
'1i'.to_c # => (0+1i)
# Integer literals that don't work.
'0b100'.to_c # => (0+0i) # Not parsed as binary.
'0o100'.to_c # => (0+0i) # Not parsed as octal.
'0d100'.to_c # => (0+0i) # Not parsed as decimal.
'0x100'.to_c # => (0+0i) # Not parsed as hexadecimal.
'010'.to_c # => (10+0i) # Not parsed as octal.
# Float literals:
'3.14'.to_c # => (3.14+0i)
'3.14i'.to_c # => (0+3.14i)
'1.23e4'.to_c # => (12300.0+0i)
'1.23e+4'.to_c # => (12300.0+0i)
'1.23e-4'.to_c # => (0.000123+0i)
# Rational literals:
'1/2'.to_c # => ((1/2)+0i)
'-1/2'.to_c # => ((-1/2)+0i)
'1/2r'.to_c # => ((1/2)+0i)
'-1/2r'.to_c # => ((-1/2)+0i)
Rectangular Coordinates
With separator '+' or '-', or with no separator, interprets the values as rectangular coordinates: real and imaginary.
With no separator, assigns a single value to either the real or the imaginary part:
''.to_c # => (0+0i) # Defaults to zero.
'1'.to_c # => (1+0i) # Real (no trailing 'i').
'1i'.to_c # => (0+1i) # Imaginary (trailing 'i').
'i'.to_c # => (0+1i) # Special case (imaginary 1).
With separator '+', both parts are positive (or zero):
# Without trailing 'i'.
'+'.to_c # => (0+0i) # No values: defaults to zero.
'+1'.to_c # => (1+0i) # Value after '+': real only.
'1+'.to_c # => (1+0i) # Value before '+': real only.
'2+1'.to_c # => (2+0i) # Values before and after '+', no trailing 'i': real only.
# With trailing 'i'.
'+1i'.to_c # => (0+1i) # Value after '+': imaginary only.
'2+i'.to_c # => (2+1i) # Value before '+': real and imaginary 1.
'2+1i'.to_c # => (2+1i) # Values before and after '+': real and imaginary.
With separator '-', the imaginary part is negative:
# Without trailing 'i'.
'-'.to_c # => (0+0i) # No values: defaults to zero.
'-1'.to_c # => (-1+0i) # Value after '-': negative real, zero imaginary.
'1-'.to_c # => (1+0i) # Value before '-': positive real, zero imaginary.
'2-1'.to_c # => (2+0i) # Values before and after '-': positive real, zero imaginary.
# With trailing 'i'.
'-1i'.to_c # => (0-1i) # Value after '-': zero real, negative imaginary.
'2-i'.to_c # => (2-1i) # Value before '-': positive real, negative imaginary.
'2-1i'.to_c # => (2-1i) # Values before and after '-': positive real, negative imaginary.
Note that the suffix character 'i' may instead be one of 'I', 'j', or 'J', with the same effect.
Polar Coordinates
With separator '@', interprets the values as polar coordinates: magnitude and angle.
'2@'.to_c.polar # => [2, 0.0] # Value before '@': magnitude only.
# Values before and after '@': magnitude and angle.
'2@1'.to_c.polar # => [2.0, 1.0]
"1.0@#{Math::PI/2}".to_c # => (0.0+1i)
"1.0@#{Math::PI}".to_c # => (-1+0.0i)
# Magnitude not given: defaults to zero.
'@'.to_c.polar # => [0, 0.0]
'@1'.to_c.polar # => [0, 0.0]
'1.0@0'.to_c # => (1+0.0i)
Note that in all cases, the suffix character 'i' may instead be one of 'I', 'j', or 'J', with the same effect.
See Converting to Non-String.
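Because String#to_c never rejects its input, code that must detect malformed text usually needs a stricter path. The sketch below contrasts the two behaviors; parse_complex and its strict keyword are hypothetical names introduced for this example, and the strict branch relies on Kernel#Complex raising ArgumentError for text it cannot parse completely.

```ruby
# parse_complex is a hypothetical helper, not a Ruby API.
def parse_complex(text, strict: false)
  # Kernel#Complex raises ArgumentError on malformed input;
  # String#to_c silently parses whatever leading part it can.
  strict ? Complex(text) : text.to_c
end

p parse_complex('2+3j')           # prints (2+3i)
p parse_complex('2+3i and more')  # prints (2+3i)
p parse_complex('x1')             # prints (0+0i)

begin
  parse_complex('2+3i and more', strict: true)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```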
Source
static VALUE
rb_str_to_f(VALUE str)
{
return DBL2NUM(rb_str_to_dbl(str, FALSE));
}
Returns the result of interpreting leading characters in self as a Float:
'3.14159'.to_f # => 3.14159
'1.234e-2'.to_f # => 0.01234
Characters past a leading valid number are ignored:
'3.14 (pi to two places)'.to_f # => 3.14
Returns zero if there is no leading valid number:
'abcdef'.to_f # => 0.0
See Converting to Non-String.
Source
static VALUE
rb_str_to_i(int argc, VALUE *argv, VALUE str)
{
int base = 10;
if (rb_check_arity(argc, 0, 1) && (base = NUM2INT(argv[0])) < 0) {
rb_raise(rb_eArgError, "invalid radix %d", base);
}
return rb_str_to_inum(str, base, FALSE);
}
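Returns the result of interpreting leading characters in self as an integer in the given base; base must be either 0 or in range (2..36):
'123456'.to_i # => 123456
'123def'.to_i(16) # => 1195503
With base zero given, self may contain a leading prefix that specifies the actual base:
'123def'.to_i(0) # => 123
'0123def'.to_i(0) # => 83
'0b123def'.to_i(0) # => 1
'0o123def'.to_i(0) # => 83
'0d123def'.to_i(0) # => 123
'0x123def'.to_i(0) # => 1195503
Characters past a leading valid number (in the given base) are ignored:
'12.345'.to_i # => 12
'12345'.to_i(2) # => 1
Returns zero if there is no leading valid number:
'abcdef'.to_i # => 0
'2'.to_i(2) # => 0
Related: see Converting to Non-String.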
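The base rules above can be sketched with a small helper that lets the string choose its own base. parse_int_auto is a hypothetical name used only for this illustration; the last part shows that a base outside 0 and 2..36 raises ArgumentError.

```ruby
# parse_int_auto is a hypothetical helper name.
def parse_int_auto(text)
  text.to_i(0)  # base 0: honor a 0b, 0o, 0, 0d, or 0x prefix if present
end

p parse_int_auto('0x1f')       # prints 31
p parse_int_auto('0b101')      # prints 5
p parse_int_auto('017')        # prints 15
p parse_int_auto('42 apples')  # prints 42

begin
  '10'.to_i(37)                # 37 is outside the permitted bases
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```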
Source
# File ext/json/lib/json/add/string.rb, line 32
def to_json_raw(...)
  to_json_raw_object.to_json(...)
end
This method creates a JSON text from the result of a call to to_json_raw_object of this String.
Source
# File ext/json/lib/json/add/string.rb, line 21
def to_json_raw_object
  {
    JSON.create_id => self.class.name,
    "raw" => unpack("C*"),
  }
end
This method creates a raw object hash that can be nested into other data structures and will be generated as a raw string. Use it when you want to convert raw strings (for example, binary data) to JSON rather than UTF-8 strings.
Source
static VALUE
string_to_r(VALUE self)
{
VALUE num;
rb_must_asciicompat(self);
num = parse_rat(RSTRING_PTR(self), RSTRING_END(self), 0, TRUE);
if (RB_FLOAT_TYPE_P(num) && !FLOAT_ZERO_P(num))
rb_raise(rb_eFloatDomainError, "Infinity");
return num;
}
Returns the result of interpreting leading characters in self as a rational value:
'123'.to_r # => (123/1) # Integer literal.
'300/2'.to_r # => (150/1) # Rational literal.
'-9.2'.to_r # => (-46/5) # Float literal.
'-9.2e2'.to_r # => (-920/1) # Float literal.
Ignores leading and trailing whitespace, and trailing non-numeric characters:
' 2 '.to_r # => (2/1)
'21-Jun-09'.to_r # => (21/1)
Returns Rational zero if there are no leading numeric characters:
'BWV 1079'.to_r # => (0/1)
NOTE: '0.3'.to_r is equivalent to 3/10r, but is different from 0.3.to_r:
'0.3'.to_r # => (3/10)
3/10r # => (3/10)
0.3.to_r # => (5404319552844595/18014398509481984)
Related: see Converting to Non-String.
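One practical consequence of the note above is that decimal text converted with String#to_r can be summed exactly, whereas the same values as Floats cannot. A minimal sketch, with exact_sum as a hypothetical helper name:

```ruby
# exact_sum is a hypothetical helper name.
def exact_sum(*decimal_strings)
  # Start from Rational zero so the result is a Rational even with no arguments.
  decimal_strings.sum(Rational(0)) { |text| text.to_r }
end

p exact_sum('0.1', '0.2')                # prints (3/10)
p exact_sum('0.1', '0.2') == '0.3'.to_r  # prints true
p 0.1 + 0.2 == 0.3                       # prints false
```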
Source
static VALUE
rb_str_to_s(VALUE str)
{
if (rb_obj_class(str) != rb_cString) {
return str_duplicate(rb_cString, str);
}
return str;
}
Source
static VALUE
rb_str_tr(VALUE str, VALUE src, VALUE repl)
{
str = str_duplicate(rb_cString, str);
tr_trans(str, src, repl, 0);
return str;
}
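Returns a copy of self with each character specified by string selector translated to the corresponding character in string replacements. The correspondence is positional:
- Each occurrence of the first character specified by selector is translated to the first character in replacements.
- Each occurrence of the second character specified by selector is translated to the second character in replacements.
- And so on.
Example:
'hello'.tr('el', 'ip') # => "hippo"
If replacements is shorter than selector, it is implicitly padded with its own last character:
'hello'.tr('aeiou', '-') # => "h-ll-"
'hello'.tr('aeiou', 'AA-') # => "hAll-"
Arguments selector and replacements must be valid character selectors (see Character Selectors), and may use any of their valid forms, including negation, ranges, and escapes:
'hello'.tr('^aeiou', '-') # => "-e--o" # Negation.
'ibm'.tr('b-z', 'a-z') # => "hal" # Range.
'hel^lo'.tr('\^aeiou', '-') # => "h-l-l-" # Escaped leading caret.
'i-b-m'.tr('b\-z', 'a-z') # => "ibabm" # Escaped embedded hyphen.
'foo\\bar'.tr('ab\\', 'XYZ') # => "fooZYXr" # Escaped backslash.
Related: see Converting to New String.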
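Positional correspondence with ranges is enough to express a simple substitution cipher. The following sketch implements ROT13; the method name rot13 is an assumption made for this example and is not provided by String.

```ruby
# rot13 is a hypothetical helper: each letter maps to the one 13 places later,
# wrapping around within its own case.
def rot13(text)
  text.tr('A-Za-z', 'N-ZA-Mn-za-m')
end

puts rot13('Hello, World!')         # prints "Uryyb, Jbeyq!"
puts rot13(rot13('Hello, World!'))  # prints "Hello, World!"
```

Characters not named by the selector (here, punctuation and spaces) pass through unchanged.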
Source
static VALUE
rb_str_tr_bang(VALUE str, VALUE src, VALUE repl)
{
return tr_trans(str, src, repl, 0);
}
Source
static VALUE
rb_str_tr_s(VALUE str, VALUE src, VALUE repl)
{
str = str_duplicate(rb_cString, str);
tr_trans(str, src, repl, 1);
return str;
}
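Like String#tr, except:
- Also squeezes the modified portions of the translated string; see String#squeeze.
- Returns the translated and squeezed string.
Examples:
'hello'.tr_s('l', 'r') # => "hero"
'hello'.tr_s('el', '-') # => "h-o"
'hello'.tr_s('el', 'hx') # => "hhxo"
Related: see Converting to New String.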
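A common use of translate-and-squeeze is collapsing runs of mixed whitespace into single spaces. A minimal sketch, assuming a hypothetical helper named collapse_whitespace:

```ruby
# collapse_whitespace is a hypothetical helper name.
def collapse_whitespace(text)
  # Space, tab, and newline all translate to a space (the replacement is
  # padded with its last character), and each translated run is squeezed.
  text.tr_s(" \t\n", ' ')
end

puts collapse_whitespace("a  b\t\tc \n d")  # prints "a b c d"
```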
Source
static VALUE
rb_str_tr_s_bang(VALUE str, VALUE src, VALUE repl)
{
return tr_trans(str, src, repl, 1);
}
Source
static VALUE
str_undump(VALUE str)
{
const char *s = RSTRING_PTR(str);
const char *s_end = RSTRING_END(str);
rb_encoding *enc = rb_enc_get(str);
VALUE undumped = rb_enc_str_new(s, 0L, enc);
bool utf8 = false;
bool binary = false;
int w;
rb_must_asciicompat(str);
if (rb_str_is_ascii_only_p(str) == Qfalse) {
rb_raise(rb_eRuntimeError, "non-ASCII character detected");
}
if (!str_null_check(str, &w)) {
rb_raise(rb_eRuntimeError, "string contains null byte");
}
if (RSTRING_LEN(str) < 2) goto invalid_format;
if (*s != '"') goto invalid_format;
/* strip '"' at the start */
s++;
for (;;) {
if (s >= s_end) {
rb_raise(rb_eRuntimeError, "unterminated dumped string");
}
if (*s == '"') {
/* epilogue */
s++;
if (s == s_end) {
/* ascii compatible dumped string */
break;
}
else {
static const char force_encoding_suffix[] = ".force_encoding(\""; /* "\")" */
static const char dup_suffix[] = ".dup";
const char *encname;
int encidx;
ptrdiff_t size;
/* check separately for strings dumped by older versions */
size = sizeof(dup_suffix) - 1;
if (s_end - s > size && memcmp(s, dup_suffix, size) == 0) s += size;
size = sizeof(force_encoding_suffix) - 1;
if (s_end - s <= size) goto invalid_format;
if (memcmp(s, force_encoding_suffix, size) != 0) goto invalid_format;
s += size;
if (utf8) {
rb_raise(rb_eRuntimeError, "dumped string contained Unicode escape but used force_encoding");
}
encname = s;
s = memchr(s, '"', s_end-s);
size = s - encname;
if (!s) goto invalid_format;
if (s_end - s != 2) goto invalid_format;
if (s[0] != '"' || s[1] != ')') goto invalid_format;
encidx = rb_enc_find_index2(encname, (long)size);
if (encidx < 0) {
rb_raise(rb_eRuntimeError, "dumped string has unknown encoding name");
}
rb_enc_associate_index(undumped, encidx);
}
break;
}
if (*s == '\\') {
s++;
if (s >= s_end) {
rb_raise(rb_eRuntimeError, "invalid escape");
}
undump_after_backslash(undumped, &s, s_end, &enc, &utf8, &binary);
}
else {
rb_str_cat(undumped, s++, 1);
}
}
RB_GC_GUARD(str);
return undumped;
invalid_format:
rb_raise(rb_eRuntimeError, "invalid dumped string; not wrapped with '\"' nor '\"...\".force_encoding(\"...\")' form");
}
Inverse of String#dump; returns a copy of self with the changes made by String#dump "undone".
Related: see Converting to New String.
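The inverse relationship can be checked with a round trip. In this sketch, dump_roundtrip? is a hypothetical helper name; the final part shows that text not wrapped in double quotes is rejected with RuntimeError, as the source above indicates.

```ruby
# dump_roundtrip? is a hypothetical helper name.
def dump_roundtrip?(text)
  text.dump.undump == text
end

puts "tab\there\n".dump               # prints "tab\there\n" with the quotes
p dump_roundtrip?("tab\there\n")      # prints true
p dump_roundtrip?("hello \n ''")      # prints true

begin
  'no quotes'.undump                  # not in dumped form
rescue RuntimeError => e
  puts "rejected: #{e.message}"
end
```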
Source
static VALUE
rb_str_unicode_normalize(int argc, VALUE *argv, VALUE str)
{
return unicode_normalize_common(argc, argv, str, id_normalize);
}
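Returns a copy of self with Unicode normalization applied.
Argument form must be one of the following symbols (see Unicode normalization forms):
- :nfc: Canonical decomposition, followed by canonical composition.
- :nfd: Canonical decomposition.
- :nfkc: Compatibility decomposition, followed by canonical composition.
- :nfkd: Compatibility decomposition.
The encoding of self must be one of:
- Encoding::UTF_8.
- Encoding::UTF_16BE.
- Encoding::UTF_16LE.
- Encoding::UTF_32BE.
- Encoding::UTF_32LE.
- Encoding::GB18030.
- Encoding::UCS_2BE.
- Encoding::UCS_4BE.
Examples:
"a\u0300".unicode_normalize # => "à" # Lowercase 'a' with grave accent.
"a\u0300".unicode_normalize(:nfd) # => "à" # Same appearance, but decomposed.
Related: see Converting to New String.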
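Normalization matters most when comparing text that may arrive in different forms. The sketch below compares a precomposed and a decomposed spelling of the same character; same_text? is a hypothetical helper name, and normalizing both sides to :nfc is one reasonable choice among several.

```ruby
# same_text? is a hypothetical helper: compares two strings after
# bringing both to the same normalization form.
def same_text?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

composed   = "\u00E0"   # single code point: a with grave accent
decomposed = "a\u0300"  # letter a followed by a combining grave accent

p composed == decomposed                       # prints false
p same_text?(composed, decomposed)             # prints true
p decomposed.unicode_normalize(:nfc).length    # prints 1
p composed.unicode_normalize(:nfd).length      # prints 2
```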
Source
static VALUE
rb_str_unicode_normalize_bang(int argc, VALUE *argv, VALUE str)
{
return rb_str_replace(str, unicode_normalize_common(argc, argv, str, id_normalize));
}
Like String#unicode_normalize, except that the normalization is performed on self (not on a copy of self).
Related: see Modifying.
Source
static VALUE
rb_str_unicode_normalized_p(int argc, VALUE *argv, VALUE str)
{
return unicode_normalize_common(argc, argv, str, id_normalized_p);
}
Returns whether self is in the given form of Unicode normalization; see String#unicode_normalize.
The form must be one of :nfc, :nfd, :nfkc, or :nfkd.
Examples:
"a\u0300".unicode_normalized? # => false
"a\u0300".unicode_normalized?(:nfd) # => true
"\u00E0".unicode_normalized? # => true
"\u00E0".unicode_normalized?(:nfd) # => false
Raises an exception if self is not in a Unicode encoding:
s = "\xE0".force_encoding(Encoding::ISO_8859_1)
s.unicode_normalized? # Raises Encoding::CompatibilityError
Related: see Querying.
Source
# File pack.rb, line 25
def unpack(fmt, offset: 0)
  Primitive.attr! :use_block
  Primitive.pack_unpack(fmt, offset)
end
Source
# File pack.rb, line 37
def unpack1(fmt, offset: 0)
  Primitive.pack_unpack1(fmt, offset)
end
Like String#unpack with no block, but unpacks and returns only the first extracted object. See Packed Data.
Related: see Converting to Non-String.
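As an illustration of reading a single value at a position, the sketch below pulls 16-bit big-endian integers out of a binary string. read_u16_be is a hypothetical helper name, and the example assumes a Ruby version whose unpack1 accepts the offset keyword shown in the source above.

```ruby
# read_u16_be is a hypothetical helper name.
def read_u16_be(bytes, offset = 0)
  # 'n' is the directive for a 16-bit unsigned integer in big-endian order.
  bytes.unpack1('n', offset: offset)
end

header = "\x00\x2A\x01\x00".b   # four raw bytes
p read_u16_be(header)           # prints 42
p read_u16_be(header, 2)        # prints 256
p header.unpack('n*')           # prints [42, 256]
```

unpack1 avoids building an array when only one value is needed.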
Source
static VALUE
rb_str_upcase(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;
VALUE ret;
flags = check_case_options(argc, argv, flags);
enc = str_true_enc(str);
if (case_option_single_p(flags, enc, str)) {
ret = rb_str_new(RSTRING_PTR(str), RSTRING_LEN(str));
str_enc_copy_direct(ret, str);
upcase_single(ret);
}
else if (flags&ONIGENC_CASE_ASCII_ONLY) {
ret = rb_str_new(0, RSTRING_LEN(str));
rb_str_ascii_casemap(str, ret, &flags, enc);
}
else {
ret = rb_str_casemap(str, &flags, enc);
}
return ret;
}
Returns a new string containing the upcased characters in self:
'hello'.upcase # => "HELLO"
'straße'.upcase # => "STRASSE"
'привет'.upcase # => "ПРИВЕТ"
'RubyGems.org'.upcase # => "RUBYGEMS.ORG"
The sizes of self and the upcased result may differ:
s = 'Straße'
s.size # => 6
s.upcase # => "STRASSE"
s.upcase.size # => 7
Some characters (and some character sets) do not have uppercase and lowercase versions; see Case Mapping:
s = '1, 2, 3, ...'
s.upcase == s # => true
s = 'こんにちは'
s.upcase == s # => true
Casing is affected by the given mapping, which may be :ascii or :turkic (:fold is accepted only for downcasing); see Case Mapping.
Related: see Converting to New String.
Source
static VALUE
rb_str_upcase_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;
flags = check_case_options(argc, argv, flags);
str_modify_keep_cr(str);
enc = str_true_enc(str);
if (case_option_single_p(flags, enc, str)) {
if (upcase_single(str))
flags |= ONIGENC_CASE_MODIFIED;
}
else if (flags&ONIGENC_CASE_ASCII_ONLY)
rb_str_ascii_casemap(str, str, &flags, enc);
else
str_shared_replace(str, rb_str_casemap(str, &flags, enc));
if (ONIGENC_CASE_MODIFIED&flags) return str;
return Qnil;
}
Source
static VALUE
rb_str_upto(int argc, VALUE *argv, VALUE beg)
{
VALUE end, exclusive;
rb_scan_args(argc, argv, "11", &end, &exclusive);
RETURN_ENUMERATOR(beg, argc, argv);
return rb_str_upto_each(beg, end, RTEST(exclusive), str_upto_i, Qnil);
}
With a block given, calls the block with each String value returned by successive calls to String#succ; the first value is self, the next is self.succ, and so on; the sequence terminates when value other_string is reached; returns self:
a = []
'a'.upto('f') {|c| a.push(c) }
a # => ["a", "b", "c", "d", "e", "f"]
a = []
'Ж'.upto('П') {|c| a.push(c) }
a # => ["Ж", "З", "И", "Й", "К", "Л", "М", "Н", "О", "П"]
a = []
'よ'.upto('ろ') {|c| a.push(c) }
a # => ["よ", "ら", "り", "る", "れ", "ろ"]
a = []
'a8'.upto('b6') {|c| a.push(c) }
a # => ["a8", "a9", "b0", "b1", "b2", "b3", "b4", "b5", "b6"]
If argument exclusive is given as a truthy object, the last value is omitted:
a = []
'a'.upto('f', true) {|c| a.push(c) }
a # => ["a", "b", "c", "d", "e"]
If other_string would not be reached, does not call the block:
'25'.upto('5') {|s| fail s }
'aa'.upto('a') {|s| fail s }
With no block given, returns a new Enumerator:
'a8'.upto('b6') # => #<Enumerator: "a8":upto("b6")>
Related: see Iterating.
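The Enumerator form makes it easy to collect a sequence of labels. In this sketch, labels is a hypothetical helper name; it also shows that when both strings consist only of decimal digits, the sequence counts numerically.

```ruby
# labels is a hypothetical helper name.
def labels(first, last, exclusive: false)
  # With no block, upto returns an Enumerator, which to_a materializes.
  first.upto(last, exclusive).to_a
end

p labels('a8', 'b1')                  # prints ["a8", "a9", "b0", "b1"]
p labels('a', 'd', exclusive: true)   # prints ["a", "b", "c"]
p labels('9', '11')                   # prints ["9", "10", "11"]
p labels('25', '5')                   # prints []
```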
Source
static VALUE
rb_str_valid_encoding_p(VALUE str)
{
int cr = rb_enc_str_coderange(str);
return RBOOL(cr != ENC_CODERANGE_BROKEN);
}
Returns whether self is encoded correctly:
s = 'Straße'
s.valid_encoding? # => true
s.encoding # => #<Encoding:UTF-8>
s.force_encoding(Encoding::ASCII).valid_encoding? # => false
Related: see Querying.
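A typical use of this check is deciding whether incoming bytes need repair before further processing. The sketch below is one possible approach, not a prescribed one: clean_utf8 is a hypothetical helper name, and replacing invalid bytes with '?' via String#scrub is an assumption about what the caller wants.

```ruby
# clean_utf8 is a hypothetical helper name.
def clean_utf8(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)  # relabel a copy as UTF-8
  # Return the text as is when valid; otherwise replace each invalid
  # byte sequence with '?'.
  text.valid_encoding? ? text : text.scrub('?')
end

puts clean_utf8('Straße'.b)       # prints "Straße"
puts clean_utf8("abc\xFFdef".b)   # prints "abc?def"
```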