class String

A String object has an arbitrary sequence of bytes, typically representing text or binary data. A String object may be created using String::new or as a literal.

String objects differ from Symbol objects in that Symbol objects are designed to be used as identifiers, instead of text or data.

You can create a String object explicitly with

A string literal.
A heredoc literal.

You can convert certain objects to Strings with

Method Kernel#String.

Some String methods modify self. Typically, a method whose name ends with ! modifies self and returns self; often, a similarly named method (without the !) returns a new string.

In general, if both bang and non-bang versions of a method exist, the bang method mutates and the non-bang method does not. However, a method without a bang can also mutate, such as String#replace.
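A minimal sketch of the distinction, using only methods named above; the variable names are illustrative.

```ruby
s = String.new('hello')   # String.new returns an unfrozen string

t = s.sub('l', 'L')       # non-bang: returns a new string
t                         # => "heLlo"
s                         # => "hello" (unchanged)

s.sub!('l', 'L')          # bang: modifies s in place and returns s
s                         # => "heLlo"

s.replace('bye')          # no bang, but still mutates s
s                         # => "bye"
```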

Substitution Methods

These methods perform substitutions

String#sub: One substitution (or none); returns a new string.
String#sub!: One substitution (or none); returns self if any changes, nil otherwise.
String#gsub: Zero or more substitutions; returns a new string.
String#gsub!: Zero or more substitutions; returns self if any changes, nil otherwise.

Each of these methods takes

A first argument, pattern (String or Regexp), that specifies the substring(s) to be replaced.
Either of the following
- A second argument, replacement (String or Hash), that determines the replacing string.
- A block that will determine the replacing string.

The examples in this section mostly use the String#sub and String#gsub methods; the principles illustrated apply to all four substitution methods.

Argument pattern

Argument pattern is commonly a regular expression

s = 'hello'
s.sub(/[aeiou]/, '*') # => "h*llo"
s.gsub(/[aeiou]/, '*') # => "h*ll*"
s.gsub(/[aeiou]/, '')  # => "hll"
s.sub(/ell/, 'al')     # => "halo"
s.gsub(/xyzzy/, '*')   # => "hello"
'THX1138'.gsub(/\d+/, '00') # => "THX00"

When pattern is a string, all its characters are treated as ordinary characters (not as Regexp special characters)

'THX1138'.gsub('\d+', '00') # => "THX1138"

String replacement

If replacement is a string, that string determines the replacing string that is substituted for the matched text.

Each of the examples above uses a simple string as the replacing string.

String replacement may contain back-references to the pattern’s captures

\n (n is a non-negative integer) refers to $n.
\k<name> refers to the named capture name.

See Regexp for details.

Note that within the string replacement, a character combination such as $& is treated as ordinary text, not as a special match variable. However, you may refer to some special match variables using these combinations

\& and \0 correspond to $&, which contains the complete matched text.
\' corresponds to $', which contains the string after the match.
\` corresponds to $`, which contains the string before the match.
\+ corresponds to $+, which contains the last capture group.

See Regexp for details.

Note that \\ is interpreted as an escape, i.e., a single backslash.

Note also that a string literal consumes backslashes. See String Literals for details about string literals.

A back-reference is typically preceded by an additional backslash. For example, if you want to write a back-reference \& in replacement with a double-quoted string literal, you need to write "..\\&..".

If you want to write a non-back-reference string \& in replacement, you need to first escape the backslash to prevent this method from interpreting it as a back-reference, and then you need to escape the backslashes again to prevent a string literal from consuming them: "..\\\\&..".

You may want to use the block form to avoid excessive backslashes.
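The escaping rules above can be sketched as follows. The sample strings are illustrative; single-quoted literals are used where they reduce the number of backslashes needed.

```ruby
# Back-reference to the whole match: '\&' single-quoted, "\\&" double-quoted.
'hello'.gsub(/l/, '<\&>')   # => "he<l><l>o"
'hello'.gsub(/l/, "<\\&>")  # => "he<l><l>o"

# Numbered and named captures.
'John Smith'.sub(/(\w+) (\w+)/, '\2, \1')                   # => "Smith, John"
'John Smith'.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>')  # => "Smith"

# A literal backslash followed by "&": four backslashes in the literal.
'hello'.sub(/l/, "\\\\&")   # => "he\\&lo" (one backslash in the result)

# Block form: the return value is used as is, so no extra escaping is needed.
'hello'.sub(/l/) { '\&' }   # => "he\\&lo" (one backslash in the result)
```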

Hash replacement

If the argument replacement is a hash, and pattern matches one of its keys, the replacing string is the value for that key

h = {'foo' => 'bar', 'baz' => 'bat'}
'food'.sub('foo', h) # => "bard"

Note that a symbol key does not match

h = {foo: 'bar', baz: 'bat'}
'food'.sub('foo', h) # => "d"
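A hash replacement is often combined with a regexp pattern and gsub, so that each matched text is looked up separately. A small sketch with an illustrative hash.

```ruby
names = {'cat' => 'feline', 'dog' => 'canine'}
'cat and dog'.gsub(/cat|dog/, names)  # => "feline and canine"

# Matched text that has no corresponding key is replaced by the empty string.
'cat and cow'.gsub(/cat|cow/, names)  # => "feline and "
```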

Block

In the block form, the current match string is passed to the block; the block’s return value becomes the replacing string

s = '@'
'1234'.gsub(/\d/) { |match| s.succ! } # => "ABCD"

Special match variables such as $1, $2, $`, $&, and $' are set appropriately.
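For instance, the captures of the current match can be read inside the block. The date string here is only an illustration.

```ruby
# $1, $2, and $3 refer to the captures of the match being replaced.
'2024-01-15'.sub(/(\d+)-(\d+)-(\d+)/) { "#{$3}/#{$2}/#{$1}" }  # => "15/01/2024"

# The block argument is the matched text itself.
'hello world'.gsub(/\w+/) { |word| word.capitalize }           # => "Hello World"
```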

Whitespace in Strings

In the class String, whitespace is defined as a contiguous sequence of characters consisting of any mixture of the following

NUL (null): "\x00", "\u0000".
HT (horizontal tab): "\x09", "\t".
LF (line feed): "\x0a", "\n".
VT (vertical tab): "\x0b", "\v".
FF (form feed): "\x0c", "\f".
CR (carriage return): "\x0d", "\r".
SP (space): "\x20", " ".

Whitespace is relevant for the following methods

lstrip, lstrip!: Strip leading whitespace.
rstrip, rstrip!: Strip trailing whitespace.
strip, strip!: Strip leading and trailing whitespace.
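A short sketch of the three non-bang methods on an illustrative string that has a trailing null byte.

```ruby
s = "\t\n hello \r\n\x00"
s.lstrip  # => "hello \r\n\u0000"
s.rstrip  # => "\t\n hello"
s.strip   # => "hello"
```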

What's Here

First, what’s elsewhere. Class String

Inherits from the Object class.
Includes the Comparable module.

Here, class String provides methods that are useful for

Creating a String

::new: Returns a new string.
::try_convert: Returns a new string created from a given object.

Freezing/Unfreezing

+@: Returns a string that is not frozen: self if not frozen; self.dup otherwise.
-@ (aliased as dedup): Returns a string that is frozen: self if already frozen; a frozen (possibly deduplicated) copy of self otherwise.
freeze: Freezes self if not already frozen; returns self.

Querying

Counts

bytesize: Returns the count of bytes.
count: Returns the count of characters in self that match the given character selectors.
empty?: Returns whether the length of self is zero.
length (aliased as size): Returns the count of characters (not bytes).
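The difference between character counts and byte counts shows up with multibyte text. This sketch assumes the source file is UTF-8 (Ruby's default).

```ruby
s = 'こんにちは'
s.length    # => 5  (characters)
s.bytesize  # => 15 (bytes; each of these characters takes 3 bytes in UTF-8)
s.empty?    # => false
''.empty?   # => true

# count treats its argument as a set of characters, not as a substring.
'hello world'.count('lo')  # => 5 (three "l" and two "o")
```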

Substrings

=~: Returns the index of the first substring that matches a given Regexp or other object; returns nil if no match is found.
byteindex: Returns the byte index of the first occurrence of a given substring.
byterindex: Returns the byte index of the last occurrence of a given substring.
index: Returns the index of the first occurrence of a given substring; returns nil if none found.
rindex: Returns the index of the last occurrence of a given substring; returns nil if none found.
include?: Returns true if the string contains a given substring; false otherwise.
match: Returns a MatchData object if the string matches a given Regexp; nil otherwise.
match?: Returns true if the string matches a given Regexp; false otherwise.
start_with?: Returns true if the string begins with any of the given substrings.
end_with?: Returns true if the string ends with any of the given substrings.

Encoding

encoding: Returns the Encoding object that represents the encoding of the string.
unicode_normalized?: Returns true if the string is in Unicode normalized form; false otherwise.
valid_encoding?: Returns true if the string contains only characters that are valid for its encoding.
ascii_only?: Returns true if the string has only ASCII characters; false otherwise.

Other

sum: Returns a basic checksum for the string: the sum of each byte.
hash: Returns the integer hash code.

Comparing

== (aliased as ===): Returns true if a given other string has the same content as self.
eql?: Returns true if the content is the same as the given other string.
<=>: Returns -1, 0, or 1 as self is smaller than, equal to, or larger than a given other string.
casecmp: Ignoring case, returns -1, 0, or 1 as self is smaller than, equal to, or larger than a given other string.
casecmp?: Ignoring case, returns whether a given other string is equal to self.

Modifying

Each of these methods modifies self.

Insertion

insert: Returns self with a given string inserted at a specified offset.
<<: Returns self concatenated with a given string or integer.
append_as_bytes: Returns self concatenated with strings without performing any encoding validation or conversion.
prepend: Prefixes to self the concatenation of given other strings.

Substitution

bytesplice: Replaces bytes of self with bytes from a given string; returns self.
sub!: Replaces the first substring that matches a given pattern with a given replacement string; returns self if any changes, nil otherwise.
gsub!: Replaces each substring that matches a given pattern with a given replacement string; returns self if any changes, nil otherwise.
succ! (aliased as next!): Returns self modified to become its own successor.
replace: Returns self with its entire content replaced by a given string.
reverse!: Returns self with its characters in reverse order.
setbyte: Sets the byte at a given integer offset to a given value; returns the argument.
tr!: Replaces specified characters in self with specified replacement characters; returns self if any changes, nil otherwise.
tr_s!: Replaces specified characters in self with specified replacement characters, removing duplicates from the substrings that were modified; returns self if any changes, nil otherwise.

Casing

capitalize!: Upcases the initial character and downcases all others; returns self if any changes, nil otherwise.
downcase!: Downcases all characters; returns self if any changes, nil otherwise.
upcase!: Upcases all characters; returns self if any changes, nil otherwise.
swapcase!: Upcases each downcase character and downcases each upcase character; returns self if any changes, nil otherwise.

Encoding

encode!: Returns self with all characters transcoded from one encoding to another.
unicode_normalize!: Unicode-normalizes self; returns self.
scrub!: Replaces each invalid byte with a given character; returns self.
force_encoding: Changes the encoding to a given encoding; returns self.

Deletion

clear: Removes all content, so that self is empty; returns self.
slice!, []=: Removes a substring determined by a given index, start/length, range, regexp, or substring.
squeeze!: Removes contiguous duplicate characters; returns self if any changes, nil otherwise.
delete!: Removes characters as determined by the intersection of substring arguments.
delete_prefix!: Removes leading prefix; returns self if any changes, nil otherwise.
delete_suffix!: Removes trailing suffix; returns self if any changes, nil otherwise.
lstrip!: Removes leading whitespace; returns self if any changes, nil otherwise.
rstrip!: Removes trailing whitespace; returns self if any changes, nil otherwise.
strip!: Removes leading and trailing whitespace; returns self if any changes, nil otherwise.
chomp!: Removes the trailing record separator, if found; returns self if any changes, nil otherwise.
chop!: Removes trailing newline characters if found; otherwise removes the last character; returns self if any changes, nil otherwise.
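Because many of these methods return nil when nothing changes, chaining them can fail. A small sketch of the pitfall, with illustrative values.

```ruby
s = String.new('hello')
s.upcase!             # => "HELLO" (changed, so self is returned)
s.upcase!             # => nil     (already upcase, so nothing changed)

# Chaining bang methods is therefore unsafe; the following would raise
# NoMethodError, because the first call returns nil:
#   String.new('HELLO').upcase!.strip!

# The non-bang versions always return a string, so they chain safely.
'HELLO'.upcase.strip  # => "HELLO"
```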

Converting to New String

Each of these methods returns a new String based on self, often just a modified copy of self.

Extension

*: Returns the concatenation of multiple copies of self.
+: Returns the concatenation of self and a given other string.
center: Returns a copy of self, centered by specified padding.
concat: Returns the concatenation of self with given other strings.
ljust: Returns a copy of self of a given length, right-padded with a given other string.
rjust: Returns a copy of self of a given length, left-padded with a given other string.

Encoding

b: Returns a copy of self with ASCII-8BIT encoding.
scrub: Returns a copy of self with each invalid byte replaced with a given character.
unicode_normalize: Returns a copy of self with each character Unicode-normalized.
encode: Returns a copy of self with all characters transcoded from one encoding to another.
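The difference between transcoding (encode) and relabeling bytes (b, force_encoding) can be sketched as follows. The sample text is illustrative.

```ruby
s = "caf\u00E9"                   # "café" in UTF-8
s.bytes                           # => [99, 97, 102, 195, 169]

latin = s.encode('ISO-8859-1')    # transcodes: the bytes change
latin.bytes                       # => [99, 97, 102, 233]

raw = s.b                         # copy with ASCII-8BIT encoding, same bytes
raw.bytes == s.bytes              # => true

# force_encoding only relabels the existing bytes; it does not transcode.
raw.force_encoding('UTF-8') == s  # => true
```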

Substitution

dump: Returns a printable version of self, enclosed in double-quotes.
undump: Inverse of dump; returns a copy of self with changes of the kinds made by dump “undone.”
sub: Returns a copy of self with the first substring matching a given pattern replaced with a given replacement string.
gsub: Returns a copy of self with each substring that matches a given pattern replaced with a given replacement string.
succ (aliased as next): Returns the string that is the successor to self.
reverse: Returns a copy of self with its characters in reverse order.
tr: Returns a copy of self with specified characters replaced with specified replacement characters.
tr_s: Returns a copy of self with specified characters replaced with specified replacement characters, removing duplicates from the substrings that were modified.
%: Returns the string resulting from formatting a given object into self.

Casing

capitalize: Returns a copy of self with the first character upcased and all other characters downcased.
downcase: Returns a copy of self with all characters downcased.
upcase: Returns a copy of self with all characters upcased.
swapcase: Returns a copy of self with all upcase characters downcased and all downcase characters upcased.

Deletion

delete: Returns a copy of self with characters removed.
delete_prefix: Returns a copy of self with a given prefix removed.
delete_suffix: Returns a copy of self with a given suffix removed.
lstrip: Returns a copy of self with leading whitespace removed.
rstrip: Returns a copy of self with trailing whitespace removed.
strip: Returns a copy of self with leading and trailing whitespace removed.
chomp: Returns a copy of self with a trailing record separator removed, if found.
chop: Returns a copy of self with trailing newline characters or the last character removed.
squeeze: Returns a copy of self with contiguous duplicate characters removed.
[] (aliased as slice): Returns a substring determined by a given index, start/length, range, regexp, or string.
byteslice: Returns a substring determined by a given index, start/length, or range.
chr: Returns the first character.

Duplication

to_s (aliased as to_str): If self is an instance of a subclass of String, returns self copied into a String; otherwise, returns self.

Converting to Non-String

Each of these methods converts the contents of self to a non-String.

Characters, Bytes, and Clusters

bytes: Returns an array of the bytes in self.
chars: Returns an array of the characters in self.
codepoints: Returns an array of the integer ordinals in self.
getbyte: Returns the integer byte at the given index in self.
grapheme_clusters: Returns an array of the grapheme clusters in self.

Splitting

lines: Returns an array of the lines in self, as determined by a given record separator.
partition: Returns a 3-element array determined by the first substring that matches a given substring or regexp.
rpartition: Returns a 3-element array determined by the last substring that matches a given substring or regexp.
split: Returns an array of substrings determined by a given delimiter – regexp or string – or, if a block is given, passes those substrings to the block.
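A brief comparison of the splitting methods on one illustrative input.

```ruby
'key=value=more'.partition('=')   # => ["key", "=", "value=more"]
'key=value=more'.rpartition('=')  # => ["key=value", "=", "more"]
'key=value=more'.split('=')       # => ["key", "value", "more"]
"a\nb\n".lines                    # => ["a\n", "b\n"]

# When the delimiter is not found, partition returns self and two empty strings.
'no delimiter'.partition('=')     # => ["no delimiter", "", ""]
```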

Matching

scan: Returns an array of substrings matching a given regexp or string, or, if a block is given, passes each matching substring to the block.
unpack: Returns an array of substrings extracted from self according to a given format.
unpack1: Returns the first substring extracted from self according to a given format.

Numerics

hex: Returns the integer value of the leading characters, interpreted as hexadecimal digits.
oct: Returns the integer value of the leading characters, interpreted as octal digits.
ord: Returns the integer ordinal of the first character in self.
to_c: Returns the complex value of leading characters, interpreted as a complex number.
to_i: Returns the integer value of leading characters, interpreted as an integer.
to_f: Returns the floating-point value of leading characters, interpreted as a floating-point number.
to_r: Returns the rational value of leading characters, interpreted as a rational.
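These conversions read only the leading characters and do not raise on trailing text, as this sketch with illustrative inputs shows.

```ruby
'42 apples'.to_i  # => 42  (stops at the first non-digit)
'apples'.to_i     # => 0   (no leading digits)
'3.7kg'.to_f      # => 3.7
'ff'.hex          # => 255
'0x1A'.hex        # => 26  (an optional 0x prefix is accepted)
'777'.oct         # => 511
'1/3'.to_r        # => (1/3)
'a'.ord           # => 97
```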

Strings and Symbols

inspect: Returns a copy of self, enclosed in double quotes, with special characters escaped.
intern (aliased as to_sym): Returns the symbol corresponding to self.

Iterating

each_byte: Calls the given block with each successive byte in self.
each_char: Calls the given block with each successive character in self.
each_codepoint: Calls the given block with each successive integer codepoint in self.
each_grapheme_cluster: Calls the given block with each successive grapheme cluster in self.
each_line: Calls the given block with each successive line in self, as determined by a given record separator.
upto: Calls the given block with each string value returned by successive calls to succ.
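A short sketch of several of these iterators; each block collects what it receives into an illustrative array.

```ruby
chars = []
'abc'.each_char { |c| chars << c }
chars  # => ["a", "b", "c"]

bytes = []
'abc'.each_byte { |b| bytes << b }
bytes  # => [97, 98, 99]

lines = []
"one\ntwo\n".each_line { |line| lines << line }
lines  # => ["one\n", "two\n"]

succs = []
'a8'.upto('b1') { |s| succs << s }
succs  # => ["a8", "a9", "b0", "b1"]
```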

Public Class Methods

json_create(o)

Source

# File ext/json/lib/json/add/string.rb, line 11
def self.json_create(object)
  object["raw"].pack("C*")
end

Raw Strings are JSON Objects (the raw bytes are stored in an array for the key “raw”). The Ruby String can be created by this class method.
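A minimal usage sketch, assuming the json library is installed so that the file named in the source comment above can be required. The byte values are illustrative.

```ruby
require 'json/add/string'  # defines String.json_create

s = String.json_create('raw' => [72, 105])
s                                     # => "Hi"
s.encoding == Encoding::ASCII_8BIT    # => true (pack("C*") yields a binary string)
```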

new(string = ''.encode(Encoding::ASCII_8BIT), **options) → new_string

Source

static VALUE
rb_str_init(int argc, VALUE *argv, VALUE str)
{
    static ID keyword_ids[2];
    VALUE orig, opt, venc, vcapa;
    VALUE kwargs[2];
    rb_encoding *enc = 0;
    int n;

    if (!keyword_ids[0]) {
        keyword_ids[0] = rb_id_encoding();
        CONST_ID(keyword_ids[1], "capacity");
    }

    n = rb_scan_args(argc, argv, "01:", &orig, &opt);
    if (!NIL_P(opt)) {
        rb_get_kwargs(opt, keyword_ids, 0, 2, kwargs);
        venc = kwargs[0];
        vcapa = kwargs[1];
        if (!UNDEF_P(venc) && !NIL_P(venc)) {
            enc = rb_to_encoding(venc);
        }
        if (!UNDEF_P(vcapa) && !NIL_P(vcapa)) {
            long capa = NUM2LONG(vcapa);
            long len = 0;
            int termlen = enc ? rb_enc_mbminlen(enc) : 1;

            if (capa < STR_BUF_MIN_SIZE) {
                capa = STR_BUF_MIN_SIZE;
            }
            if (n == 1) {
                StringValue(orig);
                len = RSTRING_LEN(orig);
                if (capa < len) {
                    capa = len;
                }
                if (orig == str) n = 0;
            }
            str_modifiable(str);
            if (STR_EMBED_P(str) || FL_TEST(str, STR_SHARED|STR_NOFREE)) {
                /* make noembed always */
                const size_t size = (size_t)capa + termlen;
                const char *const old_ptr = RSTRING_PTR(str);
                const size_t osize = RSTRING_LEN(str) + TERM_LEN(str);
                char *new_ptr = ALLOC_N(char, size);
                if (STR_EMBED_P(str)) RUBY_ASSERT((long)osize <= str_embed_capa(str));
                memcpy(new_ptr, old_ptr, osize < size ? osize : size);
                FL_UNSET_RAW(str, STR_SHARED|STR_NOFREE);
                RSTRING(str)->as.heap.ptr = new_ptr;
            }
            else if (STR_HEAP_SIZE(str) != (size_t)capa + termlen) {
                SIZED_REALLOC_N(RSTRING(str)->as.heap.ptr, char,
                        (size_t)capa + termlen, STR_HEAP_SIZE(str));
            }
            STR_SET_LEN(str, len);
            TERM_FILL(&RSTRING(str)->as.heap.ptr[len], termlen);
            if (n == 1) {
                memcpy(RSTRING(str)->as.heap.ptr, RSTRING_PTR(orig), len);
                rb_enc_cr_str_exact_copy(str, orig);
            }
            FL_SET(str, STR_NOEMBED);
            RSTRING(str)->as.heap.aux.capa = capa;
        }
        else if (n == 1) {
            rb_str_replace(str, orig);
        }
        if (enc) {
            rb_enc_associate(str, enc);
            ENC_CODERANGE_CLEAR(str);
        }
    }
    else if (n == 1) {
        rb_str_replace(str, orig);
    }
    return str;
}

Returns a new String object containing the given string.

The options are optional keyword options (see below).

With no argument given and keyword encoding also not given, returns an empty string with the Encoding ASCII-8BIT

s = String.new # => ""
s.encoding     # => #<Encoding:ASCII-8BIT>

With argument string given and keyword option encoding not given, returns a new string with the same encoding as string

s0 = 'foo'.encode(Encoding::UTF_16)
s1 = String.new(s0)
s1.encoding # => #<Encoding:UTF-16 (dummy)>

(Unlike String.new, a string literal like '' or a here document literal always has script encoding.)

With keyword option encoding given, returns a string with the specified encoding; the encoding may be an Encoding object, an encoding name, or an encoding name alias

String.new(encoding: Encoding::US_ASCII).encoding        # => #<Encoding:US-ASCII>
String.new('', encoding: Encoding::US_ASCII).encoding    # => #<Encoding:US-ASCII>
String.new('foo', encoding: Encoding::US_ASCII).encoding # => #<Encoding:US-ASCII>
String.new('foo', encoding: 'US-ASCII').encoding         # => #<Encoding:US-ASCII>
String.new('foo', encoding: 'ASCII').encoding            # => #<Encoding:US-ASCII>

The given encoding need not be valid for the string’s content, and its validity is not checked

s = String.new('こんにちは', encoding: 'ascii')
s.valid_encoding? # => false

But the given encoding itself is checked

String.new('foo', encoding: 'bar') # Raises ArgumentError.

With keyword option capacity given, the given value is advisory only, and may or may not set the size of the internal buffer, which may in turn affect performance

String.new('foo', capacity: 1)    # Buffer size is at least 4 (includes terminal null byte).
String.new('foo', capacity: 4096) # Buffer size is at least 4;
                                  # may be equal to, greater than, or less than 4096.

try_convert(object) → object, new_string, or nil

Source

static VALUE
rb_str_s_try_convert(VALUE dummy, VALUE str)
{
    return rb_check_string_type(str);
}

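Attempts to convert the given object to a string.

If object is already a string, returns object, unmodified.

Otherwise, if object responds to :to_str, calls object.to_str and returns the result.

Returns nil if object does not respond to :to_str.

Raises an exception unless object.to_str returns a string.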
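The three outcomes can be sketched as follows. Greeting is a hypothetical class introduced only for this illustration.

```ruby
# Hypothetical class: it responds to to_str, so try_convert can convert it.
class Greeting
  def to_str
    'hello'
  end
end

s = 'foo'
String.try_convert(s).equal?(s)   # => true    (already a string; returned as is)
String.try_convert(Greeting.new)  # => "hello" (result of to_str)
String.try_convert(42)            # => nil     (Integer does not respond to to_str)
```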

Public Instance Methods

self % object → new_string

Source

static VALUE
rb_str_format_m(VALUE str, VALUE arg)
{
    VALUE tmp = rb_check_array_type(arg);

    if (!NIL_P(tmp)) {
        return rb_str_format(RARRAY_LENINT(tmp), RARRAY_CONST_PTR(tmp), str);
    }
    return rb_str_format(1, &arg, str);
}

Returns the result of formatting object into the format specifications contained in self (see Format Specifications)

'%05d' % 123 # => "00123"

If self contains multiple format specifications, object must be an array or hash containing the objects to be formatted

'%-5s: %016x' % [ 'ID', self.object_id ]                # => "ID   : 00002b054ec93168"
'foo = %{foo}' % {foo: 'bar'}                           # => "foo = bar"
'foo = %{foo}, baz = %{baz}' % {foo: 'bar', baz: 'bat'} # => "foo = bar, baz = bat"

Related: see Converting to New String.

self * n → new_string

Source

VALUE
rb_str_times(VALUE str, VALUE times)
{
    VALUE str2;
    long n, len;
    char *ptr2;
    int termlen;

    if (times == INT2FIX(1)) {
        return str_duplicate(rb_cString, str);
    }
    if (times == INT2FIX(0)) {
        str2 = str_alloc_embed(rb_cString, 0);
        rb_enc_copy(str2, str);
        return str2;
    }
    len = NUM2LONG(times);
    if (len < 0) {
        rb_raise(rb_eArgError, "negative argument");
    }
    if (RSTRING_LEN(str) == 1 && RSTRING_PTR(str)[0] == 0) {
        if (STR_EMBEDDABLE_P(len, 1)) {
            str2 = str_alloc_embed(rb_cString, len + 1);
            memset(RSTRING_PTR(str2), 0, len + 1);
        }
        else {
            str2 = str_alloc_heap(rb_cString);
            RSTRING(str2)->as.heap.aux.capa = len;
            RSTRING(str2)->as.heap.ptr = ZALLOC_N(char, (size_t)len + 1);
        }
        STR_SET_LEN(str2, len);
        rb_enc_copy(str2, str);
        return str2;
    }
    if (len && LONG_MAX/len <  RSTRING_LEN(str)) {
        rb_raise(rb_eArgError, "argument too big");
    }

    len *= RSTRING_LEN(str);
    termlen = TERM_LEN(str);
    str2 = str_enc_new(rb_cString, 0, len, STR_ENC_GET(str));
    ptr2 = RSTRING_PTR(str2);
    if (len) {
        n = RSTRING_LEN(str);
        memcpy(ptr2, RSTRING_PTR(str), n);
        while (n <= len/2) {
            memcpy(ptr2 + n, ptr2, n);
            n *= 2;
        }
        memcpy(ptr2 + n, ptr2, len-n);
    }
    STR_SET_LEN(str2, len);
    TERM_FILL(&ptr2[len], termlen);
    rb_enc_cr_str_copy_for_substr(str2, str);

    return str2;
}

Returns a new string containing n copies of self

'Ho!' * 3 # => "Ho!Ho!Ho!"
'No!' * 0 # => ""

Related: see Converting to New String.

self + other_string → new_string

Source

VALUE
rb_str_plus(VALUE str1, VALUE str2)
{
    VALUE str3;
    rb_encoding *enc;
    char *ptr1, *ptr2, *ptr3;
    long len1, len2;
    int termlen;

    StringValue(str2);
    enc = rb_enc_check_str(str1, str2);
    RSTRING_GETMEM(str1, ptr1, len1);
    RSTRING_GETMEM(str2, ptr2, len2);
    termlen = rb_enc_mbminlen(enc);
    if (len1 > LONG_MAX - len2) {
        rb_raise(rb_eArgError, "string size too big");
    }
    str3 = str_enc_new(rb_cString, 0, len1+len2, enc);
    ptr3 = RSTRING_PTR(str3);
    memcpy(ptr3, ptr1, len1);
    memcpy(ptr3+len1, ptr2, len2);
    TERM_FILL(&ptr3[len1+len2], termlen);

    ENCODING_CODERANGE_SET(str3, rb_enc_to_index(enc),
                           ENC_CODERANGE_AND(ENC_CODERANGE(str1), ENC_CODERANGE(str2)));
    RB_GC_GUARD(str1);
    RB_GC_GUARD(str2);
    return str3;
}

Returns a new string containing other_string concatenated to self

'Hello from ' + self.to_s # => "Hello from main"

Related: see Converting to New String.

+string → new_string or self

Source

static VALUE
str_uplus(VALUE str)
{
    if (OBJ_FROZEN(str) || CHILLED_STRING_P(str)) {
        return rb_str_dup(str);
    }
    else {
        return str;
    }
}

Returns self if self is not frozen and can be modified without a warning being issued.

Otherwise, returns self.dup, which is not frozen.
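A small sketch of both cases. String.new is used so that the result does not depend on how string literals are treated.

```ruby
s = String.new('foo')         # not frozen
(+s).equal?(s)                # => true  (self is returned)

f = String.new('foo').freeze
u = +f                        # f is frozen, so a duplicate is returned
u.equal?(f)                   # => false
u.frozen?                     # => false
u << 'bar'                    # => "foobar"
f                             # => "foo" (the frozen original is untouched)
```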

Related: see Modifying.

self <=> other → -1, 0, 1, or nil

Source

static VALUE
rb_str_cmp_m(VALUE str1, VALUE str2)
{
    int result;
    VALUE s = rb_check_string_type(str2);
    if (NIL_P(s)) {
        return rb_invcmp(str1, str2);
    }
    result = rb_str_cmp(str1, s);
    return INT2FIX(result);
}

Compares self and other, evaluating their contents, not their lengths; returns

-1 if self is smaller.
0 if the two are equal.
1 if self is larger.
nil if the two are incomparable.

Examples

'a'  <=> 'b'  # => -1
'a'  <=> 'ab' # => -1
'a'  <=> 'a'  # => 0
'b'  <=> 'a'  # => 1
'ab' <=> 'a'  # => 1
'a'  <=> :a   # => nil

Class String includes module Comparable, each of whose methods uses String#<=> for comparison.
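A few Comparable methods applied to strings, as a sketch with illustrative values. Array#sort and Array#max are shown as well because they order strings consistently with <=>.

```ruby
'b'.between?('a', 'c')   # => true
'd'.clamp('a', 'c')      # => "c"
'apple' < 'banana'       # => true
'apple' >= 'banana'      # => false

%w[pear apple fig].sort  # => ["apple", "fig", "pear"]
%w[pear apple fig].max   # => "pear"
```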

Related: see Querying.

self[index] → new_string or nil

self[start, length] → new_string or nil

self[range] → new_string or nil

self[regexp, capture = 0] → new_string or nil

self[substring] → new_string or nil

Source

static VALUE
rb_str_aref_m(int argc, VALUE *argv, VALUE str)
{
    if (argc == 2) {
        if (RB_TYPE_P(argv[0], T_REGEXP)) {
            return rb_str_subpat(str, argv[0], argv[1]);
        }
        else {
            return rb_str_substr_two_fixnums(str, argv[0], argv[1], TRUE);
        }
    }
    rb_check_arity(argc, 1, 2);
    return rb_str_aref(str, argv[0]);
}

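Returns the substring of self specified by the arguments.

Form self[index]

With a non-negative integer argument index given, returns the 1-character substring found in self at character offset index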

'hello'[0]    # => "h"
'hello'[4]    # => "o"
'hello'[5]    # => nil
'Привет'[2]   # => "и"
'こんにちは'[4] # => "は"

With a negative integer argument index given, counts backward from the end of self

'hello'[-1] # => "o"
'hello'[-5] # => "h"
'hello'[-6] # => nil

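Form self[start, length]

With integer arguments start and length given, returns a substring of length characters (as available), beginning at the character offset specified by start.

If argument start is non-negative, the offset is start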

'hello'[0, 1]  # => "h"
'hello'[0, 5]  # => "hello"
'hello'[0, 6]  # => "hello"
'hello'[2, 3]  # => "llo"
'hello'[2, 0]  # => ""
'hello'[2, -1] # => nil

If argument start is negative, counts backward from the end of self

'hello'[-1, 1] # => "o"
'hello'[-5, 5] # => "hello"
'hello'[-1, 0] # => ""
'hello'[-6, 5] # => nil

Special case: if start equals the length of self, returns a new empty string

'hello'[5, 3]  # => ""

Form self[range]

With Range argument range given, forms the substring self[range.begin, range.size]

'hello'[0..2]  # => "hel"
'hello'[0, 3]  # => "hel"

'hello'[0...2] # => "he"
'hello'[0, 2]  # => "he"

'hello'[0, 0]  # => ""
'hello'[0...0] # => ""

Form self[regexp, capture = 0]

With Regexp argument regexp given and capture as zero, searches for a matching substring in self; updates Regexp-related global variables

'hello'[/ell/]     # => "ell"
'hello'[/l+/]      # => "ll"
'hello'[//]        # => ""
'hello'[/nosuch/]  # => nil

With capture as a positive integer n, returns the nth matched group

'hello'[/(h)(e)(l+)(o)/]    # => "hello"
'hello'[/(h)(e)(l+)(o)/, 1] # => "h"
$1                          # => "h"
'hello'[/(h)(e)(l+)(o)/, 2] # => "e"
$2                          # => "e"
'hello'[/(h)(e)(l+)(o)/, 3] # => "ll"
'hello'[/(h)(e)(l+)(o)/, 4] # => "o"
'hello'[/(h)(e)(l+)(o)/, 5] # => nil

Form self[substring]

With string argument substring given, returns the matching substring of self, if found

'hello'['ell']      # => "ell"
'hello'['']         # => ""
'hello'['nosuch']   # => nil
'Привет'['ив']      # => "ив"
'こんにちは'['んにち'] # => "んにち"

Related: see Converting to New String.

Also aliased as: slice

self[index] = other_string → new_string

self[start, length] = other_string → new_string

self[range] = other_string → new_string

self[regexp, capture = 0] = other_string → new_string

self[substring] = other_string → new_string

Source

static VALUE
rb_str_aset_m(int argc, VALUE *argv, VALUE str)
{
    if (argc == 3) {
        if (RB_TYPE_P(argv[0], T_REGEXP)) {
            rb_str_subpat_set(str, argv[0], argv[1], argv[2]);
        }
        else {
            rb_str_update(str, NUM2LONG(argv[0]), NUM2LONG(argv[1]), argv[2]);
        }
        return argv[2];
    }
    rb_check_arity(argc, 2, 3);
    return rb_str_aset(str, argv[0], argv[1]);
}

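Replaces all, some, or none of the contents of self; returns the argument other_string.

Form self[index] = other_string

With a non-negative integer argument index given, searches for the 1-character substring found in self at character offset index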

s = 'hello'
s[0] = 'foo' # => "foo"
s            # => "fooello"

s = 'hello'
s[4] = 'foo' # => "foo"
s            # => "hellfoo"

s = 'hello'
s[5] = 'foo' # => "foo"
s            # => "hellofoo"

s = 'hello'
s[6] = 'foo' # Raises IndexError: index 6 out of string.

With a negative integer argument index given, counts backward from the end of self

s = 'hello'
s[-1] = 'foo'  # => "foo"
s              # => "hellfoo"

s = 'hello'
s[-5] = 'foo'  # => "foo"
s              # => "fooello"

s = 'hello'
s[-6] = 'foo'  # Raises IndexError: index -6 out of string.

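Form self[start, length] = other_string

With integer arguments start and length given, searches for a substring of length characters (as available), beginning at the character offset specified by start.

If argument start is non-negative, the offset is start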

s = 'hello'
s[0, 1] = 'foo'  # => "foo"
s                # => "fooello"

s = 'hello'
s[0, 5] = 'foo'  # => "foo"
s                # => "foo"

s = 'hello'
s[0, 9] = 'foo'  # => "foo"
s                # => "foo"

s = 'hello'
s[2, 0] = 'foo'  # => "foo"
s                # => "hefoollo"

s = 'hello'
s[2, -1] = 'foo' # Raises IndexError: negative length -1.

If argument start is negative, counts backward from the end of self

s = 'hello'
s[-1, 1] = 'foo' # => "foo"
s                # => "hellfoo"

s = 'hello'
s[-1, 9] = 'foo' # => "foo"
s                # => "hellfoo"

s = 'hello'
s[-5, 2] = 'foo' # => "foo"
s                # => "foollo"

s = 'hello'
s[-3, 0] = 'foo' # => "foo"
s                # => "hefoollo"

s = 'hello'
s[-6, 2] = 'foo' # Raises IndexError: index -6 out of string.

Special case: if start equals the length of self, the argument is appended to self

s = 'hello'
s[5, 3] = 'foo' # => "foo"
s               # => "hellofoo"

Form self[range] = other_string

With Range argument range given, equivalent to self[range.begin, range.size] = other_string

s0 = 'hello'
s1 = 'hello'
s0[0..2] = 'foo' # => "foo"
s1[0, 3] = 'foo' # => "foo"
s0               # => "foolo"
s1               # => "foolo"

s = 'hello'
s[0...2] = 'foo' # => "foo"
s                # => "foollo"

s = 'hello'
s[0...0] = 'foo' # => "foo"
s                # => "foohello"

s = 'hello'
s[9..10] = 'foo' # Raises RangeError: 9..10 out of range

Form self[regexp, capture = 0] = other_string

With Regexp argument regexp given and capture as zero, searches for a matching substring in self; updates Regexp-related global variables

s = 'hello'
s[/l/] = 'L'       # => "L"
[$`, $&, $']       # => ["he", "l", "lo"]
s[/eLlo/] = 'owdy' # => "owdy"
[$`, $&, $']       # => ["h", "eLlo", ""]
s[/eLlo/] = 'owdy' # Raises IndexError: regexp not matched.
[$`, $&, $']       # => [nil, nil, nil]

With capture as a positive integer n, searches for the nth matched group

s = 'hello'
s[/(h)(e)(l+)(o)/] = 'foo'    # => "foo"
[$`, $&, $']                  # => ["", "hello", ""]

s = 'hello'
s[/(h)(e)(l+)(o)/, 1] = 'foo' # => "foo"
s                             # => "fooello"
[$`, $&, $']                  # => ["", "hello", ""]

s = 'hello'
s[/(h)(e)(l+)(o)/, 2] = 'foo' # => "foo"
s                             # => "hfoollo"
[$`, $&, $']                  # => ["", "hello", ""]

s = 'hello'
s[/(h)(e)(l+)(o)/, 4] = 'foo' # => "foo"
s                             # => "hellfoo"
[$`, $&, $']                  # => ["", "hello", ""]

s = 'hello'
s[/(h)(e)(l+)(o)/, 5] = 'foo' # Raises IndexError: index 5 out of regexp.

s = 'hello'
s[/nosuch/] = 'foo'           # Raises IndexError: regexp not matched.

Form self[substring] = other_string

With string argument substring given

s = 'hello'
s['l'] = 'foo'  # => "foo"
s  # => "hefoolo"

s = 'hello'
s['ll'] = 'foo'  # => "foo"
s  # => "hefooo"

s = 'Привет'
s['ив'] = 'foo'  # => "foo"
s  # => "Прfooет"

s = 'こんにちは'
s['んにち'] = 'foo'  # => "foo"
s  # => "こfooは"

s['nosuch'] = 'foo' # Raises IndexError: string not matched.

Related: see Modifying.

append_as_bytes(*objects) → self

Source

VALUE
rb_str_append_as_bytes(int argc, VALUE *argv, VALUE str)
{
    long needed_capacity = 0;
    volatile VALUE t0;
    enum ruby_value_type *types = ALLOCV_N(enum ruby_value_type, t0, argc);

    for (int index = 0; index < argc; index++) {
        VALUE obj = argv[index];
        enum ruby_value_type type = types[index] = rb_type(obj);
        switch (type) {
          case T_FIXNUM:
          case T_BIGNUM:
            needed_capacity++;
            break;
          case T_STRING:
            needed_capacity += RSTRING_LEN(obj);
            break;
          default:
            rb_raise(
                rb_eTypeError,
                "wrong argument type %"PRIsVALUE" (expected String or Integer)",
                rb_obj_class(obj)
            );
            break;
        }
    }

    str_ensure_available_capa(str, needed_capacity);
    char *sptr = RSTRING_END(str);

    for (int index = 0; index < argc; index++) {
        VALUE obj = argv[index];
        enum ruby_value_type type = types[index];
        switch (type) {
          case T_FIXNUM:
          case T_BIGNUM: {
            argv[index] = obj = rb_int_and(obj, INT2FIX(0xff));
            char byte = (char)(NUM2INT(obj) & 0xFF);
            *sptr = byte;
            sptr++;
            break;
          }
          case T_STRING: {
            const char *ptr;
            long len;
            RSTRING_GETMEM(obj, ptr, len);
            memcpy(sptr, ptr, len);
            sptr += len;
            break;
          }
          default:
            rb_bug("append_as_bytes arguments should have been validated");
        }
    }

    STR_SET_LEN(str, RSTRING_LEN(str) + needed_capacity);
    TERM_FILL(sptr, TERM_LEN(str)); /* sentinel */

    int cr = ENC_CODERANGE(str);
    switch (cr) {
      case ENC_CODERANGE_7BIT: {
        for (int index = 0; index < argc; index++) {
            VALUE obj = argv[index];
            enum ruby_value_type type = types[index];
            switch (type) {
              case T_FIXNUM:
              case T_BIGNUM: {
                if (!ISASCII(NUM2INT(obj))) {
                    goto clear_cr;
                }
                break;
              }
              case T_STRING: {
                if (ENC_CODERANGE(obj) != ENC_CODERANGE_7BIT) {
                    goto clear_cr;
                }
                break;
              }
              default:
                rb_bug("append_as_bytes arguments should have been validated");
            }
        }
        break;
      }
      case ENC_CODERANGE_VALID:
        if (ENCODING_GET_INLINED(str) == ENCINDEX_ASCII_8BIT) {
            goto keep_cr;
        }
        else {
            goto clear_cr;
        }
        break;
      default:
        goto clear_cr;
        break;
    }

    RB_GC_GUARD(t0);

  clear_cr:
    // If no fast path was hit, we clear the coderange.
    // append_as_bytes is predominantly meant to be used in
    // buffering situation, hence it's likely the coderange
    // will never be scanned, so it's not worth spending time
    // precomputing the coderange except for simple and common
    // situations.
    ENC_CODERANGE_CLEAR(str);
  keep_cr:
    return str;
}

Concatenates each object in objects into self; returns self; performs no encoding validation or conversion:

s = 'foo'
s.append_as_bytes(" \xE2\x82") # => "foo \xE2\x82"
s.valid_encoding?              # => false
s.append_as_bytes("\xAC 12")
s.valid_encoding?              # => true

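When a given object is an integer, the value is treated as an 8-bit byte; if the integer occupies more than one byte (i.e., is greater than 255), only the low-order byte is appended (similar to String#setbyte):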

s = ""
s.append_as_bytes(0, 257) # => "\u0000\u0001"
s.bytesize                # => 2
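
The low-order-byte rule corresponds to the mask in the C source above (the value is combined with 0xff). The sketch below reproduces only that arithmetic in plain Ruby and does not call append_as_bytes; low_byte and bytes_for are hypothetical helper names.

```ruby
# Hypothetical helper: the single byte an Integer contributes when only its
# low-order 8 bits are kept (value & 0xFF).
def low_byte(integer)
  integer & 0xFF
end

# Hypothetical helper: packs the low-order byte of each integer into a
# binary string, one byte per argument.
def bytes_for(*integers)
  integers.map { |n| low_byte(n) }.pack('C*')
end

bytes_for(0, 257).bytes        # => [0, 1]
bytes_for(255, 256, 511).bytes # => [255, 0, 255]
```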

Related: see Modifying.

ascii_only? → true or false

Source

static VALUE
rb_str_is_ascii_only_p(VALUE str)
{
    int cr = rb_enc_str_coderange(str);

    return RBOOL(cr == ENC_CODERANGE_7BIT);
}

Returns whether self contains only ASCII characters:

'abc'.ascii_only?         # => true
"abc\u{6666}".ascii_only? # => false

Related: see Querying.

b → new_string

Source

static VALUE
rb_str_b(VALUE str)
{
    VALUE str2;
    if (STR_EMBED_P(str)) {
        str2 = str_alloc_embed(rb_cString, RSTRING_LEN(str) + TERM_LEN(str));
    }
    else {
        str2 = str_alloc_heap(rb_cString);
    }
    str_replace_shared_without_enc(str2, str);

    if (rb_enc_asciicompat(STR_ENC_GET(str))) {
        // BINARY strings can never be broken; they're either 7-bit ASCII or VALID.
        // If we know the receiver's code range then we know the result's code range.
        int cr = ENC_CODERANGE(str);
        switch (cr) {
          case ENC_CODERANGE_7BIT:
            ENC_CODERANGE_SET(str2, ENC_CODERANGE_7BIT);
            break;
          case ENC_CODERANGE_BROKEN:
          case ENC_CODERANGE_VALID:
            ENC_CODERANGE_SET(str2, ENC_CODERANGE_VALID);
            break;
          default:
            ENC_CODERANGE_CLEAR(str2);
            break;
        }
    }

    return str2;
}

Returns a copy of self that has ASCII-8BIT encoding; the underlying bytes are not modified:

s = "\x99"
s.encoding   # => #<Encoding:UTF-8>
t = s.b      # => "\x99"
t.encoding   # => #<Encoding:ASCII-8BIT>

s = "\u4095" # => "䂕"
s.encoding   # => #<Encoding:UTF-8>
s.bytes      # => [228, 130, 149]
t = s.b      # => "\xE4\x82\x95"
t.encoding   # => #<Encoding:ASCII-8BIT>
t.bytes      # => [228, 130, 149]
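
The properties described above (same bytes, different encoding, receiver unchanged) can be collected in one place. This is a sketch only; binary_view is a hypothetical helper and the input is assumed to be a UTF-8 string.

```ruby
# Hypothetical helper: summarizes how the result of String#b relates to
# its receiver.
def binary_view(string)
  copy = string.b
  {
    same_bytes: copy.bytes == string.bytes,   # bytes are untouched
    copy_encoding: copy.encoding,             # binary (ASCII-8BIT)
    original_encoding: string.encoding,       # receiver keeps its encoding
    distinct_object: !copy.equal?(string)     # a new String is returned
  }
end

view = binary_view("\u4095")
view[:same_bytes]      # => true
view[:distinct_object] # => true
```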

byteindex(object, offset = 0) → integer or nil

Source

static VALUE
rb_str_byteindex_m(int argc, VALUE *argv, VALUE str)
{
    VALUE sub;
    VALUE initpos;
    long pos;

    if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) {
        long slen = RSTRING_LEN(str);
        pos = NUM2LONG(initpos);
        if (pos < 0 ? (pos += slen) < 0 : pos > slen) {
            if (RB_TYPE_P(sub, T_REGEXP)) {
                rb_backref_set(Qnil);
            }
            return Qnil;
        }
    }
    else {
        pos = 0;
    }

    str_ensure_byte_pos(str, pos);

    if (RB_TYPE_P(sub, T_REGEXP)) {
        if (rb_reg_search(sub, str, pos, 0) >= 0) {
            VALUE match = rb_backref_get();
            struct re_registers *regs = RMATCH_REGS(match);
            pos = BEG(0);
            return LONG2NUM(pos);
        }
    }
    else {
        StringValue(sub);
        pos = rb_str_byteindex(str, sub, pos);
        if (pos >= 0) return LONG2NUM(pos);
    }
    return Qnil;
}

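Returns the 0-based integer index of a substring of self that is specified by object (a string or Regexp) and offset, or nil if there is no such substring; the returned index is a count of bytes (not characters).

When object is a string, returns the index of the first found substring equal to object: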

s = 'foo'          # => "foo"
s.size             # => 3 # Three 1-byte characters.
s.bytesize         # => 3 # Three bytes.
s.byteindex('f')   # => 0
s.byteindex('o')   # => 1
s.byteindex('oo')  # => 1
s.byteindex('ooo') # => nil

When object is a Regexp, returns the index of the first found substring matching object; updates Regexp-related global variables:

s = 'foo'
s.byteindex(/f/)   # => 0
$~                 # => #<MatchData "f">
s.byteindex(/o/)   # => 1
s.byteindex(/oo/)  # => 1
s.byteindex(/ooo/) # => nil
$~                 # => nil

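Integer argument offset, if given, specifies the 0-based index of the byte where searching is to begin.

When offset is non-negative, searching begins at byte position offset: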

s = 'foo'
s.byteindex('o', 1) # => 1
s.byteindex('o', 2) # => 2
s.byteindex('o', 3) # => nil

When offset is negative, counts backward from the end of self:

s = 'foo'
s.byteindex('o', -1) # => 2
s.byteindex('o', -2) # => 1
s.byteindex('o', -3) # => 1
s.byteindex('o', -4) # => nil

Raises IndexError if the byte at offset is not the first byte of a character:

s = "\uFFFF\uFFFF"       # => "\uFFFF\uFFFF"
s.size                   # => 2 # Two 3-byte characters.
s.bytesize               # => 6 # Six bytes.
s.byteindex("\uFFFF")    # => 0
s.byteindex("\uFFFF", 1) # Raises IndexError
s.byteindex("\uFFFF", 2) # Raises IndexError
s.byteindex("\uFFFF", 3) # => 3
s.byteindex("\uFFFF", 4) # Raises IndexError
s.byteindex("\uFFFF", 5) # Raises IndexError
s.byteindex("\uFFFF", 6) # => nil
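
Because the returned index counts bytes, it can differ from the character index returned by String#index. As a cross-check, the byte offset can also be derived from the character index. This is a sketch under stated assumptions: byte_offset_of is a hypothetical helper, and it handles only a string (not Regexp) argument searched from the start of self.

```ruby
# Hypothetical helper: byte offset of the first occurrence of substring,
# computed from the character index; nil if substring is absent.
def byte_offset_of(string, substring)
  char_index = string.index(substring)
  return nil if char_index.nil?

  # Bytes occupied by the characters that precede the match.
  string[0, char_index].bytesize
end

byte_offset_of('Привет', 'в') # => 6   (the character index is 3)
byte_offset_of('foo', 'o')    # => 1
byte_offset_of('foo', 'x')    # => nil
```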

byterindex(object, offset = self.bytesize) → integer or nil

Source

static VALUE
rb_str_byterindex_m(int argc, VALUE *argv, VALUE str)
{
    VALUE sub;
    VALUE initpos;
    long pos, len = RSTRING_LEN(str);

    if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) {
        pos = NUM2LONG(initpos);
        if (pos < 0 && (pos += len) < 0) {
            if (RB_TYPE_P(sub, T_REGEXP)) {
                rb_backref_set(Qnil);
            }
            return Qnil;
        }
        if (pos > len) pos = len;
    }
    else {
        pos = len;
    }

    str_ensure_byte_pos(str, pos);

    if (RB_TYPE_P(sub, T_REGEXP)) {
        if (rb_reg_search(sub, str, pos, 1) >= 0) {
            VALUE match = rb_backref_get();
            struct re_registers *regs = RMATCH_REGS(match);
            pos = BEG(0);
            return LONG2NUM(pos);
        }
    }
    else {
        StringValue(sub);
        pos = rb_str_byterindex(str, sub, pos);
        if (pos >= 0) return LONG2NUM(pos);
    }
    return Qnil;
}

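Returns the 0-based integer index of the last substring of self that is specified by object (a string or Regexp) and offset, or nil if there is no such substring; the returned index is a count of bytes (not characters).

When object is a string, returns the index of the last found substring equal to object: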

s = 'foo'           # => "foo"
s.size              # => 3 # Three 1-byte characters.
s.bytesize          # => 3 # Three bytes.
s.byterindex('f')   # => 0
s.byterindex('o')   # => 2
s.byterindex('oo')  # => 1
s.byterindex('ooo') # => nil

When object is a Regexp, returns the index of the last found substring matching object; updates Regexp-related global variables:

s = 'foo'
s.byterindex(/f/)   # => 0
$~                  # => #<MatchData "f">
s.byterindex(/o/)   # => 2
s.byterindex(/oo/)  # => 1
s.byterindex(/ooo/) # => nil
$~                  # => nil

The last match means the match that starts at the last possible position, not the last of the longest matches:

s = 'foo'
s.byterindex(/o+/) # => 2
$~                 #=> #<MatchData "o">

To get the last longest match, combine with negative lookbehind:

s = 'foo'
s.byterindex(/(?<!o)o+/) # => 1
$~                       # => #<MatchData "oo">

Or use method byteindex with negative lookahead:

s = 'foo'
s.byteindex(/o+(?!.*o)/) # => 1
$~                       #=> #<MatchData "oo">
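
String#rindex, the character-index counterpart, treats "last match" the same way, so the distinction can be observed with ASCII text, where byte and character indexes coincide. A minimal sketch; last_match_info is a hypothetical helper.

```ruby
# Hypothetical helper: returns [index, matched_text] for the last match,
# or nil when there is no match.
def last_match_info(string, regexp)
  index = string.rindex(regexp)
  index && [index, Regexp.last_match(0)]
end

last_match_info('foo', /o+/)       # => [2, "o"]   # Last possible start position.
last_match_info('foo', /(?<!o)o+/) # => [1, "oo"]  # Lookbehind forces the full run.
last_match_info('foo', /x/)        # => nil
```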

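Integer argument offset, if given, specifies the 0-based index of the byte where searching is to end.

When offset is non-negative, searching ends at byte position offset: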

s = 'foo'
s.byterindex('o', 0) # => nil
s.byterindex('o', 1) # => 1
s.byterindex('o', 2) # => 2
s.byterindex('o', 3) # => 2

When offset is negative, counts backward from the end of self:

s = 'foo'
s.byterindex('o', -1) # => 2
s.byterindex('o', -2) # => 1
s.byterindex('o', -3) # => nil

Raises IndexError if the byte at offset is not the first byte of a character:

s = "\uFFFF\uFFFF"        # => "\uFFFF\uFFFF"
s.size                    # => 2 # Two 3-byte characters.
s.bytesize                # => 6 # Six bytes.
s.byterindex("\uFFFF")    # => 3
s.byterindex("\uFFFF", 1) # Raises IndexError
s.byterindex("\uFFFF", 2) # Raises IndexError
s.byterindex("\uFFFF", 3) # => 3
s.byterindex("\uFFFF", 4) # Raises IndexError
s.byterindex("\uFFFF", 5) # Raises IndexError
s.byterindex("\uFFFF", 6) # => 3

Related: see Querying.

bytes → array_of_bytes

Source

static VALUE
rb_str_bytes(VALUE str)
{
    VALUE ary = WANTARRAY("bytes", RSTRING_LEN(str));
    return rb_str_enumerate_bytes(str, ary);
}

Returns an array of the bytes in self:

'hello'.bytes  # => [104, 101, 108, 108, 111]
'Привет'.bytes # => [208, 159, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130]
'こんにちは'.bytes
# => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]

Related: see Converting to Non-String.

bytesize → integer

Source

VALUE
rb_str_bytesize(VALUE str)
{
    return LONG2NUM(RSTRING_LEN(str));
}

Returns the count of bytes in self.

Note that the byte count may differ from the character count (returned by size):

s = 'foo'
s.bytesize # => 3
s.size     # => 3
s = 'Привет'
s.bytesize # => 12
s.size     # => 6
s = 'こんにちは'
s.bytesize # => 15
s.size     # => 5

byteslice(offset, length = 1) → string or nil

byteslice(range) → string or nil

Source

static VALUE
rb_str_byteslice(int argc, VALUE *argv, VALUE str)
{
    if (argc == 2) {
        long beg = NUM2LONG(argv[0]);
        long len = NUM2LONG(argv[1]);
        return str_byte_substr(str, beg, len, TRUE);
    }
    rb_check_arity(argc, 1, 2);
    return str_byte_aref(str, argv[0]);
}

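Returns a substring of self, or nil if the substring cannot be constructed.

With integer arguments offset and length given, returns the substring beginning at the given offset and of the given length (as available); length defaults to 1: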

s = '0123456789'   # => "0123456789"
s.byteslice(2)     # => "2"
s.byteslice(200)   # => nil
s.byteslice(4, 3)  # => "456"
s.byteslice(4, 30) # => "456789"

Returns nil if length is negative or offset falls outside of self:

s.byteslice(4, -1) # => nil
s.byteslice(40, 2) # => nil

Counts backward from the end of self if offset is negative:

s = '0123456789'   # => "0123456789"
s.byteslice(-4)    # => "6"
s.byteslice(-4, 3) # => "678"

With Range argument range given, returns byteslice(range.begin, range.size):

s = '0123456789'    # => "0123456789"
s.byteslice(4..6)   # => "456"
s.byteslice(-6..-4) # => "456"
s.byteslice(5..2)   # => "" # range.size is zero.
s.byteslice(40..42) # => nil

The starting and ending offsets need not be on character boundaries:

s = 'こんにちは'
s.byteslice(0, 3) # => "こ"
s.byteslice(1, 3) # => "\x81\x93\xE3"

The encodings of self and the returned substring are always the same:

s.encoding                 # => #<Encoding:UTF-8>
s.byteslice(0, 3).encoding # => #<Encoding:UTF-8>
s.byteslice(1, 3).encoding # => #<Encoding:UTF-8>

But, depending on the character boundaries, the encoding of the returned substring may not be valid:

s.valid_encoding?                 # => true
s.byteslice(0, 3).valid_encoding? # => true
s.byteslice(1, 3).valid_encoding? # => false
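
One common use of this behavior is truncating text to a byte budget without leaving a broken trailing character. The sketch below is one possible approach, not a built-in: truncate_bytes is a hypothetical helper, and it assumes a validly encoded string and a non-negative limit.

```ruby
# Hypothetical helper: keeps at most limit bytes, then drops trailing bytes
# one at a time until the slice is validly encoded again.
def truncate_bytes(string, limit)
  slice = string.byteslice(0, limit)
  slice = slice.byteslice(0, slice.bytesize - 1) until slice.valid_encoding?
  slice
end

truncate_bytes('こんにちは', 7) # => "こん"  # Byte 7 falls inside the third character.
truncate_bytes('こんにちは', 2) # => ""      # Not even one whole character fits.
truncate_bytes('hello', 3)      # => "hel"
```

The loop terminates because an empty string is always validly encoded.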

Related: see Converting to New String.

bytesplice(offset, length, str) → self

bytesplice(offset, length, str, str_offset, str_length) → self

bytesplice(range, str) → self

bytesplice(range, str, str_range) → self

Source

static VALUE
rb_str_bytesplice(int argc, VALUE *argv, VALUE str)
{
    long beg, len, vbeg, vlen;
    VALUE val;
    int cr;

    rb_check_arity(argc, 2, 5);
    if (!(argc == 2 || argc == 3 || argc == 5)) {
        rb_raise(rb_eArgError, "wrong number of arguments (given %d, expected 2, 3, or 5)", argc);
    }
    if (argc == 2 || (argc == 3 && !RB_INTEGER_TYPE_P(argv[0]))) {
        if (!rb_range_beg_len(argv[0], &beg, &len, RSTRING_LEN(str), 2)) {
            rb_raise(rb_eTypeError, "wrong argument type %s (expected Range)",
                     rb_builtin_class_name(argv[0]));
        }
        val = argv[1];
        StringValue(val);
        if (argc == 2) {
            /* bytesplice(range, str) */
            vbeg = 0;
            vlen = RSTRING_LEN(val);
        }
        else {
            /* bytesplice(range, str, str_range) */
            if (!rb_range_beg_len(argv[2], &vbeg, &vlen, RSTRING_LEN(val), 2)) {
                rb_raise(rb_eTypeError, "wrong argument type %s (expected Range)",
                         rb_builtin_class_name(argv[2]));
            }
        }
    }
    else {
        beg = NUM2LONG(argv[0]);
        len = NUM2LONG(argv[1]);
        val = argv[2];
        StringValue(val);
        if (argc == 3) {
            /* bytesplice(index, length, str) */
            vbeg = 0;
            vlen = RSTRING_LEN(val);
        }
        else {
            /* bytesplice(index, length, str, str_index, str_length) */
            vbeg = NUM2LONG(argv[3]);
            vlen = NUM2LONG(argv[4]);
        }
    }
    str_check_beg_len(str, &beg, &len);
    str_check_beg_len(val, &vbeg, &vlen);
    str_modify_keep_cr(str);

    if (RB_UNLIKELY(ENCODING_GET_INLINED(str) != ENCODING_GET_INLINED(val))) {
        rb_enc_associate(str, rb_enc_check(str, val));
    }

    rb_str_update_1(str, beg, len, val, vbeg, vlen);
    cr = ENC_CODERANGE_AND(ENC_CODERANGE(str), ENC_CODERANGE(val));
    if (cr != ENC_CODERANGE_BROKEN)
        ENC_CODERANGE_SET(str, cr);
    return str;
}

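Replaces target bytes in self with source bytes from the given string str; returns self.

In the first form, arguments offset and length determine the target bytes, and the source bytes are all of the given str: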

'0123456789'.bytesplice(0, 3, 'abc')  # => "abc3456789"
'0123456789'.bytesplice(3, 3, 'abc')  # => "012abc6789"
'0123456789'.bytesplice(0, 50, 'abc') # => "abc"
'0123456789'.bytesplice(50, 3, 'abc') # Raises IndexError.

The counts of bytes in target and source may be different:

'0123456789'.bytesplice(0, 6, 'abc') # => "abc6789"      # Shorter source.
'0123456789'.bytesplice(0, 1, 'abc') # => "abc123456789" # Shorter target.

Either count may be zero (i.e., specifying an empty string):

'0123456789'.bytesplice(0, 3, '')    # => "3456789"       # Empty source.
'0123456789'.bytesplice(0, 0, 'abc') # => "abc0123456789" # Empty target.

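In the second form, just as in the first, arguments offset and length determine the target bytes; argument str contains the source bytes, and the additional arguments str_offset and str_length determine the actual source bytes: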

'0123456789'.bytesplice(0, 3, 'abc', 0, 3) # => "abc3456789"
'0123456789'.bytesplice(0, 3, 'abc', 1, 1) # => "b3456789"      # Shorter source.
'0123456789'.bytesplice(0, 1, 'abc', 0, 3) # => "abc123456789"  # Shorter target.
'0123456789'.bytesplice(0, 3, 'abc', 1, 0) # => "3456789"       # Empty source.
'0123456789'.bytesplice(0, 0, 'abc', 0, 3) # => "abc0123456789" # Empty target.

In the third form, argument range determines the target bytes, and the source bytes are all of the given str:

'0123456789'.bytesplice(0..2, 'abc')  # => "abc3456789"
'0123456789'.bytesplice(3..5, 'abc')  # => "012abc6789"
'0123456789'.bytesplice(0..5, 'abc')  # => "abc6789"       # Shorter source.
'0123456789'.bytesplice(0..0, 'abc')  # => "abc123456789"  # Shorter target.
'0123456789'.bytesplice(0..2, '')     # => "3456789"       # Empty source.
'0123456789'.bytesplice(0...0, 'abc') # => "abc0123456789" # Empty target.

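In the fourth form, just as in the third, argument range determines the target bytes; argument str contains the source bytes, and the additional argument str_range determines the actual source bytes: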

'0123456789'.bytesplice(0..2, 'abc', 0..2)  # => "abc3456789"
'0123456789'.bytesplice(3..5, 'abc', 0..2)  # => "012abc6789"
'0123456789'.bytesplice(0..2, 'abc', 0..1)  # => "ab3456789"     # Shorter source.
'0123456789'.bytesplice(0..1, 'abc', 0..2)  # => "abc23456789"   # Shorter target.
'0123456789'.bytesplice(0..2, 'abc', 0...0) # => "3456789"       # Empty source.
'0123456789'.bytesplice(0...0, 'abc', 0..2) # => "abc0123456789" # Empty target.

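In any of the forms, the beginnings and endings of both source and target must be on character boundaries.

In these examples, self has five 3-byte characters, and so has character boundaries at offsets 0, 3, 6, 9, 12, and 15.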

'こんにちは'.bytesplice(0, 3, 'abc') # => "abcんにちは"
'こんにちは'.bytesplice(1, 3, 'abc') # Raises IndexError.
'こんにちは'.bytesplice(0, 2, 'abc') # Raises IndexError.
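
Character boundaries can be located ahead of time with byteslice and valid_encoding?, without calling bytesplice. This is a sketch, assuming a validly encoded string; char_boundary? is a hypothetical helper, not part of String.

```ruby
# Hypothetical helper: true if byte_offset lies on a character boundary of
# string, i.e. the bytes before it form only whole characters.
def char_boundary?(string, byte_offset)
  return false if byte_offset.negative? || byte_offset > string.bytesize

  string.byteslice(0, byte_offset).valid_encoding?
end

(0..15).select { |offset| char_boundary?('こんにちは', offset) }
# => [0, 3, 6, 9, 12, 15]
```

A target such as (1, 3) or (0, 2) fails this check at its start or end, which matches the IndexError cases shown above.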

capitalize(mapping = :ascii) → new_string

Source

static VALUE
rb_str_capitalize(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_TITLECASE;
    VALUE ret;

    flags = check_case_options(argc, argv, flags);
    enc = str_true_enc(str);
    if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return str;
    if (flags&ONIGENC_CASE_ASCII_ONLY) {
        ret = rb_str_new(0, RSTRING_LEN(str));
        rb_str_ascii_casemap(str, ret, &flags, enc);
    }
    else {
        ret = rb_str_casemap(str, &flags, enc);
    }
    return ret;
}

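Returns a string containing the characters in self, each with possibly changed case:

The first character is upcased.
All other characters are downcased.

Examples: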

'hello'.capitalize  # => "Hello"
'HELLO'.capitalize  # => "Hello"
'straße'.capitalize # => "Straße"  # Lowercase 'ß' not changed.
'STRAẞE'.capitalize # => "Straße"  # Uppercase 'ẞ' downcased to 'ß'.
'привет'.capitalize # => "Привет"
'ПРИВЕТ'.capitalize # => "Привет"

Some characters (and some character sets) do not have upcase and downcase versions; see Case Mapping:

s = '1, 2, 3, ...'
s.capitalize == s # => true
s = 'こんにちは'
s.capitalize == s # => true

The casing is affected by the given mapping, which may be :ascii or :turkic (:fold is accepted only by the downcasing methods); see Case Mapping.
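
The effect of the :ascii mapping is easiest to see on non-ASCII text. The sketch below also capitalizes each word separately, which capitalize itself does not do; capitalize_words is a hypothetical helper that splits on whitespace and rejoins with single spaces.

```ruby
# Hypothetical helper: applies String#capitalize to every
# whitespace-separated word, passing any mapping argument through.
def capitalize_words(text, *mapping)
  text.split(' ').map { |word| word.capitalize(*mapping) }.join(' ')
end

capitalize_words('hello wORLD')        # => "Hello World"
capitalize_words('привет мир')         # => "Привет Мир"
capitalize_words('привет мир', :ascii) # => "привет мир"  # Only ASCII letters are affected.
```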

capitalize!(mapping = :ascii) → self or nil

Source

static VALUE
rb_str_capitalize_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_TITLECASE;

    flags = check_case_options(argc, argv, flags);
    str_modify_keep_cr(str);
    enc = str_true_enc(str);
    if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
    if (flags&ONIGENC_CASE_ASCII_ONLY)
        rb_str_ascii_casemap(str, str, &flags, enc);
    else
        str_shared_replace(str, rb_str_casemap(str, &flags, enc));

    if (ONIGENC_CASE_MODIFIED&flags) return str;
    return Qnil;
}

Like String#capitalize, except that it:

Changes character casings in self (not in a copy of self).
Returns self if any changes are made, nil otherwise.
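
Because the return value is nil when nothing changes, chaining further calls onto the result is unsafe; it is safer to use the receiver afterwards. A small sketch, with capitalized_in_place as a hypothetical helper:

```ruby
# Hypothetical helper: capitalizes the given string in place and reports
# whether anything changed, without relying on the nil return for chaining.
def capitalized_in_place(string)
  changed = !string.capitalize!.nil?
  [string, changed]
end

capitalized_in_place('hello'.dup) # => ["Hello", true]
capitalized_in_place('Hello'.dup) # => ["Hello", false]  # capitalize! returned nil.
```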

casecmp(other_string) → -1, 0, 1, or nil

Source

static VALUE
rb_str_casecmp(VALUE str1, VALUE str2)
{
    VALUE s = rb_check_string_type(str2);
    if (NIL_P(s)) {
        return Qnil;
    }
    return str_casecmp(str1, s);
}

Ignoring case, compares self and other_string; returns:

-1 if self.downcase is smaller than other_string.downcase.
0 if the two are equal.
1 if self.downcase is larger than other_string.downcase.
nil if the two are incomparable.

See Case Mapping.

Examples:

'foo'.casecmp('goo')  # => -1
'goo'.casecmp('foo')  # => 1
'foo'.casecmp('food') # => -1
'food'.casecmp('foo') # => 1
'FOO'.casecmp('foo')  # => 0
'foo'.casecmp('FOO')  # => 0
'foo'.casecmp(1)      # => nil
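
Since the result follows the -1/0/1 comparison convention, casecmp can serve directly as a sort block. A minimal sketch, assuming every element is a String (a nil result from a non-string would make the sort fail); sort_ignoring_case is a hypothetical helper.

```ruby
# Hypothetical helper: sorts strings without regard to letter case.
def sort_ignoring_case(words)
  words.sort { |a, b| a.casecmp(b) }
end

words = %w[banana apple Cherry]
words.sort                # => ["Cherry", "apple", "banana"]  # Uppercase sorts first.
sort_ignoring_case(words) # => ["apple", "banana", "Cherry"]
```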

Related: see Converting to New String.

chars → array_of_characters

Source

static VALUE
rb_str_chars(VALUE str)
{
    VALUE ary = WANTARRAY("chars", rb_str_strlen(str));
    return rb_str_enumerate_chars(str, ary);
}

Returns an array of the characters in self:

'hello'.chars     # => ["h", "e", "l", "l", "o"]
'Привет'.chars    # => ["П", "р", "и", "в", "е", "т"]
'こんにちは'.chars # => ["こ", "ん", "に", "ち", "は"]
''.chars          # => []

Related: see Converting to Non-String.

chomp(line_sep = $/) → new_string

Source

static VALUE
rb_str_chomp(int argc, VALUE *argv, VALUE str)
{
    VALUE rs = chomp_rs(argc, argv);
    if (NIL_P(rs)) return str_duplicate(rb_cString, str);
    return rb_str_subseq(str, 0, chompped_length(str, rs));
}

Returns a new string copied from self, with trailing characters possibly removed:

When line_sep is "\n", removes the last one or two characters if they are "\r", "\n", or "\r\n" (but not "\n\r"):

$/                    # => "\n"
"abc\r".chomp         # => "abc"
"abc\n".chomp         # => "abc"
"abc\r\n".chomp       # => "abc"
"abc\n\r".chomp       # => "abc\n"
"тест\r\n".chomp      # => "тест"
"こんにちは\r\n".chomp  # => "こんにちは"

When line_sep is '' (an empty string), removes multiple trailing occurrences of "\n" or "\r\n" (but not "\r" or "\n\r"):

"abc\n\n\n".chomp('')           # => "abc"
"abc\r\n\r\n\r\n".chomp('')     # => "abc"
"abc\n\n\r\n\r\n\n\n".chomp('') # => "abc"
"abc\n\r\n\r\n\r".chomp('')     # => "abc\n\r\n\r\n\r"
"abc\r\r\r".chomp('')           # => "abc\r\r\r"

When line_sep is neither "\n" nor '', removes a single trailing line separator if there is one:

'abcd'.chomp('cd')   # => "ab"
'abcdcd'.chomp('cd') # => "abcd"
'abcd'.chomp('xx')   # => "abcd"
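
A typical use is stripping line terminators from text that mixes "\n" and "\r\n" endings. The following is a sketch that assumes the default line separator ($/ is "\n"); lines_without_terminators is a hypothetical helper.

```ruby
# Hypothetical helper: splits text into lines and removes each line's
# trailing "\n" or "\r\n".
def lines_without_terminators(text)
  text.each_line.map(&:chomp)
end

lines_without_terminators("one\r\ntwo\nthree") # => ["one", "two", "three"]
```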

Related: see Converting to New String.

chomp!(line_sep = $/) → self or nil

Source

static VALUE
rb_str_chomp_bang(int argc, VALUE *argv, VALUE str)
{
    VALUE rs;
    str_modifiable(str);
    if (RSTRING_LEN(str) == 0 && argc < 2) return Qnil;
    rs = chomp_rs(argc, argv);
    if (NIL_P(rs)) return Qnil;
    return rb_str_chomp_string(str, rs);
}

Like String#chomp, except that it:

Removes trailing characters from self (not from a copy of self).
Returns self if any characters are removed, nil otherwise.

Related: see Modifying.

chop → new_string

Source

static VALUE
rb_str_chop(VALUE str)
{
    return rb_str_subseq(str, 0, chopped_length(str));
}

Returns a new string copied from self, with trailing characters possibly removed.

Removes "\r\n" if those are the last two characters:

"abc\r\n".chop      # => "abc"
"тест\r\n".chop     # => "тест"
"こんにちは\r\n".chop # => "こんにちは"

Otherwise, removes the last character if there is one:

'abcd'.chop     # => "abc"
'тест'.chop     # => "тес"
'こんにちは'.chop # => "こんにち"
''.chop         # => ""

If you only need to remove the newline separator at the end of the string, String#chomp is a better alternative.

Related: see Converting to New String.

chop! → self or nil

Source

static VALUE
rb_str_chop_bang(VALUE str)
{
    str_modify_keep_cr(str);
    if (RSTRING_LEN(str) > 0) {
        long len;
        len = chopped_length(str);
        STR_SET_LEN(str, len);
        TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
        if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) {
            ENC_CODERANGE_CLEAR(str);
        }
        return str;
    }
    return Qnil;
}

Like String#chop, except that it:

Removes trailing characters from self (not from a copy of self).
Returns self if any characters are removed, nil otherwise.

Related: see Modifying.

chr → string

Source

static VALUE
rb_str_chr(VALUE str)
{
    return rb_str_substr(str, 0, 1);
}

Returns a string containing the first character of self:

'hello'.chr     # => "h"
'тест'.chr      # => "т"
'こんにちは'.chr # => "こ"
''.chr          # => ""

Related: see Converting to New String.

clear → self

Source

static VALUE
rb_str_clear(VALUE str)
{
    str_discard(str);
    STR_SET_EMBED(str);
    STR_SET_LEN(str, 0);
    RSTRING_PTR(str)[0] = 0;
    if (rb_enc_asciicompat(STR_ENC_GET(str)))
        ENC_CODERANGE_SET(str, ENC_CODERANGE_7BIT);
    else
        ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID);
    return str;
}

Removes the contents of self:

s = 'foo'
s.clear # => ""
s       # => ""

Related: see Modifying.

codepoints → array_of_integers

Source

static VALUE
rb_str_codepoints(VALUE str)
{
    VALUE ary = WANTARRAY("codepoints", rb_str_strlen(str));
    return rb_str_enumerate_codepoints(str, ary);
}

Returns an array of the codepoints in self; each codepoint is the integer value for a character:

'hello'.codepoints     # => [104, 101, 108, 108, 111]
'тест'.codepoints      # => [1090, 1077, 1089, 1090]
'こんにちは'.codepoints # => [12371, 12435, 12395, 12385, 12399]
''.codepoints          # => []

Related: see Converting to Non-String.

concat(*objects) → string

Source

static VALUE
rb_str_concat_multi(int argc, VALUE *argv, VALUE str)
{
    str_modifiable(str);

    if (argc == 1) {
        return rb_str_concat(str, argv[0]);
    }
    else if (argc > 1) {
        int i;
        VALUE arg_str = rb_str_tmp_new(0);
        rb_enc_copy(arg_str, str);
        for (i = 0; i < argc; i++) {
            rb_str_concat(arg_str, argv[i]);
        }
        rb_str_buf_append(str, arg_str);
    }

    return str;
}

Concatenates each object in objects to self; returns self:

'foo'.concat('bar', 'baz') # => "foobarbaz"

For each given object that is an integer, the value is treated as a codepoint and converted to a character before concatenation:

'foo'.concat(32, 'bar', 32, 'baz') # => "foo bar baz" # Embeds spaces.
'те'.concat(1089, 1090)            # => "тест"
'こん'.concat(12395, 12385, 12399)  # => "こんにちは"
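
Note that concat modifies its receiver and returns that same object, so the receiver must not be frozen. A minimal sketch mixing string and codepoint arguments; build_greeting is a hypothetical helper.

```ruby
# Hypothetical helper: builds a greeting in a mutable buffer and reports
# whether concat returned the buffer itself.
def build_greeting(name)
  buffer = String.new('Hello')             # Unfrozen copy of the literal.
  result = buffer.concat(44, 32, name, 33) # ',' then ' ' then name then '!'.
  [buffer, result.equal?(buffer)]
end

build_greeting('Ruby') # => ["Hello, Ruby!", true]
```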

Related: see Converting to New String.

count(*selectors) → integer

Source

static VALUE
rb_str_count(int argc, VALUE *argv, VALUE str)
{
    char table[TR_TABLE_SIZE];
    rb_encoding *enc = 0;
    VALUE del = 0, nodel = 0, tstr;
    char *s, *send;
    int i;
    int ascompat;
    size_t n = 0;

    rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);

    tstr = argv[0];
    StringValue(tstr);
    enc = rb_enc_check(str, tstr);
    if (argc == 1) {
        const char *ptstr;
        if (RSTRING_LEN(tstr) == 1 && rb_enc_asciicompat(enc) &&
            (ptstr = RSTRING_PTR(tstr),
             ONIGENC_IS_ALLOWED_REVERSE_MATCH(enc, (const unsigned char *)ptstr, (const unsigned char *)ptstr+1)) &&
            !is_broken_string(str)) {
            int clen;
            unsigned char c = rb_enc_codepoint_len(ptstr, ptstr+1, &clen, enc);

            s = RSTRING_PTR(str);
            if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0);
            send = RSTRING_END(str);
            while (s < send) {
                if (*(unsigned char*)s++ == c) n++;
            }
            return SIZET2NUM(n);
        }
    }

    tr_setup_table(tstr, table, TRUE, &del, &nodel, enc);
    for (i=1; i<argc; i++) {
        tstr = argv[i];
        StringValue(tstr);
        enc = rb_enc_check(str, tstr);
        tr_setup_table(tstr, table, FALSE, &del, &nodel, enc);
    }

    s = RSTRING_PTR(str);
    if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0);
    send = RSTRING_END(str);
    ascompat = rb_enc_asciicompat(enc);
    while (s < send) {
        unsigned int c;

        if (ascompat && (c = *(unsigned char*)s) < 0x80) {
            if (table[c]) {
                n++;
            }
            s++;
        }
        else {
            int clen;
            c = rb_enc_codepoint_len(s, send, &clen, enc);
            if (tr_find(c, table, del, nodel)) {
                n++;
            }
            s += clen;
        }
    }

    return SIZET2NUM(n);
}

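Returns the total number of characters in self that are specified by the given selectors.

For one 1-character selector, returns the count of instances of that character: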

s = 'abracadabra'
s.count('a') # => 5
s.count('b') # => 2
s.count('x') # => 0
s.count('')  # => 0

s = 'тест'
s.count('т')  # => 2
s.count('е')  # => 1

s = 'よろしくお願いします'
s.count('よ')  # => 1
s.count('し')  # => 2

For one multi-character selector, returns the count of instances for all specified characters:

s = 'abracadabra'
s.count('ab')     # => 7
s.count('abc')    # => 8
s.count('abcd')   # => 9
s.count('abcdr')  # => 11
s.count('abcdrx') # => 11

Order and repetition do not matter:

s.count('ba')   == s.count('ab') # => true
s.count('baab') == s.count('ab') # => true

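For multiple selectors, forms a single selector that is the intersection of the characters in all selectors and returns the count of instances for that selector: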

s = 'abcdefg'
s.count('abcde', 'dcbfg') == s.count('bcd') # => true
s.count('abc', 'def')     == s.count('')    # => true

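In a character selector, three characters get special treatment:

A caret ('^') functions as a negation operator for the characters that immediately follow it: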

s = 'abracadabra'
s.count('^bc') # => 8  # Count of all except 'b' and 'c'.

A hyphen ('-') between two other characters defines a range of characters:

s = 'abracadabra'
s.count('a-c') # => 8  # Count of all 'a', 'b', and 'c'.

A backslash ('\') acts as an escape for a caret, a hyphen, or another backslash:

s = 'abracadabra'
s.count('\^bc')           # => 3  # Count of '^', 'b', and 'c'.
s.count('a\-c')           # => 6  # Count of 'a', '-', and 'c'.
'foo\bar\baz'.count('\\') # => 2  # Count of '\'.

These usages may be mixed:

s = 'abracadabra'
s.count('a-cq-t') # => 10  # Multiple ranges.
s.count('ac-d')   # => 7   # Range mixed with plain characters.
s.count('^a-c')   # => 3   # Range mixed with negation.

For multiple selectors, all forms may be used, including negations, ranges, and escapes.

s = 'abracadabra'
s.count('^abc', '^def') == s.count('^abcdef') # => true
s.count('a-e', 'c-g')   == s.count('cde')     # => true
s.count('^abc', 'c-g')  == s.count('defg')    # => true
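
Combining a range with a negated selector gives a compact way to classify characters. The sketch below assumes lowercase ASCII input for the letter categories; letter_stats is a hypothetical helper.

```ruby
# Hypothetical helper: counts vowels, consonants, and non-letters using
# single and intersected selectors.
def letter_stats(text)
  {
    vowels: text.count('aeiou'),
    consonants: text.count('a-z', '^aeiou'), # In a-z AND not a vowel.
    non_letters: text.count('^a-z')
  }
end

stats = letter_stats('hello, world')
stats[:vowels]      # => 3
stats[:consonants]  # => 7
stats[:non_letters] # => 2  # The comma and the space.
```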

Related: see Querying.

crypt(salt_str) → new_string

Source

static VALUE
rb_str_crypt(VALUE str, VALUE salt)
{
#ifdef HAVE_CRYPT_R
    VALUE databuf;
    struct crypt_data *data;
#   define CRYPT_END() ALLOCV_END(databuf)
#else
    char *tmp_buf;
    extern char *crypt(const char *, const char *);
#   define CRYPT_END() rb_nativethread_lock_unlock(&crypt_mutex.lock)
#endif
    VALUE result;
    const char *s, *saltp;
    char *res;
#ifdef BROKEN_CRYPT
    char salt_8bit_clean[3];
#endif

    StringValue(salt);
    mustnot_wchar(str);
    mustnot_wchar(salt);
    s = StringValueCStr(str);
    saltp = RSTRING_PTR(salt);
    if (RSTRING_LEN(salt) < 2 || !saltp[0] || !saltp[1]) {
        rb_raise(rb_eArgError, "salt too short (need >=2 bytes)");
    }

#ifdef BROKEN_CRYPT
    if (!ISASCII((unsigned char)saltp[0]) || !ISASCII((unsigned char)saltp[1])) {
        salt_8bit_clean[0] = saltp[0] & 0x7f;
        salt_8bit_clean[1] = saltp[1] & 0x7f;
        salt_8bit_clean[2] = '\0';
        saltp = salt_8bit_clean;
    }
#endif
#ifdef HAVE_CRYPT_R
    data = ALLOCV(databuf, sizeof(struct crypt_data));
# ifdef HAVE_STRUCT_CRYPT_DATA_INITIALIZED
    data->initialized = 0;
# endif
    res = crypt_r(s, saltp, data);
#else
    rb_nativethread_lock_lock(&crypt_mutex.lock);
    res = crypt(s, saltp);
#endif
    if (!res) {
        int err = errno;
        CRYPT_END();
        rb_syserr_fail(err, "crypt");
    }
#ifdef HAVE_CRYPT_R
    result = rb_str_new_cstr(res);
    CRYPT_END();
#else
    // We need to copy this buffer because it's static and we need to unlock the mutex
    // before allocating a new object (the string to be returned). If we allocate while
    // holding the lock, we could run GC which fires the VM barrier and causes a deadlock
    // if other ractors are waiting on this lock.
    size_t res_size = strlen(res)+1;
    tmp_buf = ALLOCA_N(char, res_size); // should be small enough to alloca
    memcpy(tmp_buf, res, res_size);
    res = tmp_buf;
    CRYPT_END();
    result = rb_str_new_cstr(res);
#endif
    return result;
}

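Returns the string generated by calling the crypt(3) standard library function with str and salt_str, in this order, as its arguments. Please do not use this method any longer. It is legacy, provided only for backward compatibility with Ruby scripts from earlier days. It is bad to use in contemporary programs for several reasons:

The behavior of C's crypt(3) depends on the operating system on which it runs. The generated string lacks data portability.
On some operating systems, such as Mac OS, crypt(3) never fails (i.e., it silently ends up with unexpected results).
On some operating systems, such as Mac OS, crypt(3) is not thread safe.
So-called "traditional" usage of crypt(3) is very, very, very weak. According to its manpage, Linux's traditional crypt(3) output has only 2**56 variations, which is too easy to brute force today. And this is the default behavior.
To make things more robust, some operating systems implement so-called "modular" usage. To use it, you have to build up a complex salt_str parameter by hand. Failure to generate a proper salt string tends not to yield any errors; typos in parameters are normally not detectable.
- For instance, in the following example, the second invocation of String#crypt is wrong; it has a typo in "round=" (it lacks the "s"). However, the call does not fail, and something unexpected is generated.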
```
"foo".crypt("$5$rounds=1000$salt$") # OK, proper usage
"foo".crypt("$5$round=1000$salt$")  # Typo not detected
```
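Even in the "modular" mode, some hash functions are considered archaic and are no longer recommended at all; for instance, module $1$ has been officially abandoned by its author: see phk.freebsd.dk/sagas/md5crypt_eol/. As another instance, module $3$ is considered completely broken: see the manpage of FreeBSD.
On some operating systems, such as Mac OS, there is no modular mode. Yet, as written above, crypt(3) on Mac OS never fails. This means that even if you build up a proper salt string, it generates a traditional DES hash anyway, and there is no way for you to be aware of it.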
```
"foo".crypt("$5$rounds=1000$salt$") # => "$5fNPQMxC5j6."
```

If for some reason you cannot migrate to other secure, contemporary password hashing algorithms, install the string-crypt gem and require 'string/crypt' to continue using it.

-self → frozen_string

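Returns a frozen string equal to self.

The returned string is self if and only if all of the following are true:

self is already frozen.
self is an instance of String (rather than of a subclass of String).
self has no instance variables set on it.

Otherwise, the returned string is a frozen copy of self.

Returning self, when possible, saves duplicating self; see Data deduplication.

It may also save duplicating other, already-existing, strings: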

s0 = 'foo'
s1 = 'foo'
s0.object_id == s1.object_id       # => false
(-s0).object_id == (-s1).object_id # => true
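
The same behavior can be checked with explicitly unfrozen inputs, which also shows that the originals are left unfrozen. A small sketch; deduplicated_pair is a hypothetical helper.

```ruby
# Hypothetical helper: creates two distinct mutable strings with the same
# content and returns the result of applying -@ to each.
def deduplicated_pair(text)
  first_source = String.new(text)
  second_source = String.new(text)
  [-first_source, -second_source, first_source.frozen?]
end

first, second, source_frozen = deduplicated_pair('foo')
first.equal?(second) # => true   # One shared frozen object.
first.frozen?        # => true
source_frozen        # => false  # The receiver itself is not frozen.
```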

Note that method -@ is convenient for defining a constant:

FileName = -'config/database.yml'

While its alias dedup is better suited for chaining:

'foo'.dedup.gsub('o', '0') # => "f00"

Related: see Converting to New String.

delete!(*selectors) → self or nil

Source

static VALUE
rb_str_delete_bang(int argc, VALUE *argv, VALUE str)
{
    char squeez[TR_TABLE_SIZE];
    rb_encoding *enc = 0;
    char *s, *send, *t;
    VALUE del = 0, nodel = 0;
    int modify = 0;
    int i, ascompat, cr;

    if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
    rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);
    for (i=0; i<argc; i++) {
        VALUE s = argv[i];

        StringValue(s);
        enc = rb_enc_check(str, s);
        tr_setup_table(s, squeez, i==0, &del, &nodel, enc);
    }

    str_modify_keep_cr(str);
    ascompat = rb_enc_asciicompat(enc);
    s = t = RSTRING_PTR(str);
    send = RSTRING_END(str);
    cr = ascompat ? ENC_CODERANGE_7BIT : ENC_CODERANGE_VALID;
    while (s < send) {
        unsigned int c;
        int clen;

        if (ascompat && (c = *(unsigned char*)s) < 0x80) {
            if (squeez[c]) {
                modify = 1;
            }
            else {
                if (t != s) *t = c;
                t++;
            }
            s++;
        }
        else {
            c = rb_enc_codepoint_len(s, send, &clen, enc);

            if (tr_find(c, squeez, del, nodel)) {
                modify = 1;
            }
            else {
                if (t != s) rb_enc_mbcput(c, t, enc);
                t += clen;
                if (cr == ENC_CODERANGE_7BIT) cr = ENC_CODERANGE_VALID;
            }
            s += clen;
        }
    }
    TERM_FILL(t, TERM_LEN(str));
    STR_SET_LEN(str, t - RSTRING_PTR(str));
    ENC_CODERANGE_SET(str, cr);

    if (modify) return str;
    return Qnil;
}

Like String#delete, but modifies self in place; returns self if any characters were deleted, nil otherwise.

Related: see Modifying.
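Because the return value is nil when nothing is deleted, the result of delete! should not be chained blindly. A short sketch with illustrative variable names:

```ruby
s = String.new('hello')

first  = s.delete!('l')  # => "heo"  (self, now modified)
second = s.delete!('l')  # => nil    (no 'l' left to delete)

s                        # => "heo"
first.equal?(s)          # => true
```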

delete_prefix(prefix) → new_string

Source

static VALUE
rb_str_delete_prefix(VALUE str, VALUE prefix)
{
    long prefixlen;

    prefixlen = deleted_prefix_length(str, prefix);
    if (prefixlen <= 0) return str_duplicate(rb_cString, str);

    return rb_str_subseq(str, prefixlen, RSTRING_LEN(str) - prefixlen);
}

Returns a copy of self with the leading substring prefix removed:

'oof'.delete_prefix('o')          # => "of"
'oof'.delete_prefix('oo')         # => "f"
'oof'.delete_prefix('oof')        # => ""
'oof'.delete_prefix('x')          # => "oof"
'тест'.delete_prefix('те')        # => "ст"
'こんにちは'.delete_prefix('こん')  # => "にちは"

Related: see Converting to New String.

delete_prefix!(prefix) → self or nil

Source

static VALUE
rb_str_delete_prefix_bang(VALUE str, VALUE prefix)
{
    long prefixlen;
    str_modify_keep_cr(str);

    prefixlen = deleted_prefix_length(str, prefix);
    if (prefixlen <= 0) return Qnil;

    return rb_str_drop_bytes(str, prefixlen);
}

Like String#delete_prefix, except that self is modified in place; returns self if the prefix was removed, nil otherwise.

Related: see Modifying.

delete_suffix(suffix) → new_string

Source

static VALUE
rb_str_delete_suffix(VALUE str, VALUE suffix)
{
    long suffixlen;

    suffixlen = deleted_suffix_length(str, suffix);
    if (suffixlen <= 0) return str_duplicate(rb_cString, str);

    return rb_str_subseq(str, 0, RSTRING_LEN(str) - suffixlen);
}

Returns a copy of self with the trailing substring suffix removed:

'foo'.delete_suffix('o')           # => "fo"
'foo'.delete_suffix('oo')          # => "f"
'foo'.delete_suffix('foo')         # => ""
'foo'.delete_suffix('f')           # => "foo"
'foo'.delete_suffix('x')           # => "foo"
'тест'.delete_suffix('ст')         # => "те"
'こんにちは'.delete_suffix('ちは')  # => "こんに"

Related: see Converting to New String.

delete_suffix!(suffix) → self or nil

Source

static VALUE
rb_str_delete_suffix_bang(VALUE str, VALUE suffix)
{
    long olen, suffixlen, len;
    str_modifiable(str);

    suffixlen = deleted_suffix_length(str, suffix);
    if (suffixlen <= 0) return Qnil;

    olen = RSTRING_LEN(str);
    str_modify_keep_cr(str);
    len = olen - suffixlen;
    STR_SET_LEN(str, len);
    TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
    if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) {
        ENC_CODERANGE_CLEAR(str);
    }
    return str;
}

Like String#delete_suffix, except that self is modified in place; returns self if the suffix was removed, nil otherwise.

Related: see Modifying.
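The in-place variants delete_prefix! and delete_suffix! can be combined on one mutable string. This is a small sketch; the path value and variable names are made up for illustration.

```ruby
path = String.new('tmp/report.txt')

r1 = path.delete_prefix!('tmp/')  # => "report.txt"
r2 = path.delete_suffix!('.txt')  # => "report"
r3 = path.delete_suffix!('.csv')  # => nil; path is left unchanged

path                              # => "report"
```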

downcase(mapping) → new_string

Source

static VALUE
rb_str_downcase(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_DOWNCASE;
    VALUE ret;

    flags = check_case_options(argc, argv, flags);
    enc = str_true_enc(str);
    if (case_option_single_p(flags, enc, str)) {
        ret = rb_str_new(RSTRING_PTR(str), RSTRING_LEN(str));
        str_enc_copy_direct(ret, str);
        downcase_single(ret);
    }
    else if (flags&ONIGENC_CASE_ASCII_ONLY) {
        ret = rb_str_new(0, RSTRING_LEN(str));
        rb_str_ascii_casemap(str, ret, &flags, enc);
    }
    else {
        ret = rb_str_casemap(str, &flags, enc);
    }

    return ret;
}

Returns a new string containing the downcased characters in self:

'HELLO'.downcase        # => "hello"
'STRAẞE'.downcase       # => "straße"
'ПРИВЕТ'.downcase       # => "привет"
'RubyGems.org'.downcase # => "rubygems.org"

Some characters (and some character sets) do not have upcase and downcase versions; see Case Mapping:

s = '1, 2, 3, ...'
s.downcase == s # => true
s = 'こんにちは'
s.downcase == s # => true

The casing is affected by the given mapping, which may be :ascii, :fold, or :turkic; see Case Mapping.

Related: see Converting to New String.
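The effect of the mapping argument is easiest to see side by side. A brief sketch, assuming a UTF-8 source file; with no argument, full Unicode case mapping applies.

```ruby
full   = 'ÀB'.downcase           # => "àb"  (full Unicode case mapping)
ascii  = 'ÀB'.downcase(:ascii)   # => "Àb"  (only A-Z are affected)
turkic = 'I'.downcase(:turkic)   # => "ı"   (dotless i, U+0131)
```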

downcase!(mapping) → self or nil

Source

static VALUE
rb_str_downcase_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_DOWNCASE;

    flags = check_case_options(argc, argv, flags);
    str_modify_keep_cr(str);
    enc = str_true_enc(str);
    if (case_option_single_p(flags, enc, str)) {
        if (downcase_single(str))
            flags |= ONIGENC_CASE_MODIFIED;
    }
    else if (flags&ONIGENC_CASE_ASCII_ONLY)
        rb_str_ascii_casemap(str, str, &flags, enc);
    else
        str_shared_replace(str, rb_str_casemap(str, &flags, enc));

    if (ONIGENC_CASE_MODIFIED&flags) return str;
    return Qnil;
}

Like String#downcase, except that:

Changes character casings in self (not in a copy of self).
Returns self if any changes were made, nil otherwise.

Related: see Modifying.

dump → new_string

Source

VALUE
rb_str_dump(VALUE str)
{
    int encidx = rb_enc_get_index(str);
    rb_encoding *enc = rb_enc_from_index(encidx);
    long len;
    const char *p, *pend;
    char *q, *qend;
    VALUE result;
    int u8 = (encidx == rb_utf8_encindex());
    static const char nonascii_suffix[] = ".dup.force_encoding(\"%s\")";

    len = 2;                    /* "" */
    if (!rb_enc_asciicompat(enc)) {
        len += strlen(nonascii_suffix) - rb_strlen_lit("%s");
        len += strlen(enc->name);
    }

    p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str);
    while (p < pend) {
        int clen;
        unsigned char c = *p++;

        switch (c) {
          case '"':  case '\\':
          case '\n': case '\r':
          case '\t': case '\f':
          case '\013': case '\010': case '\007': case '\033':
            clen = 2;
            break;

          case '#':
            clen = IS_EVSTR(p, pend) ? 2 : 1;
            break;

          default:
            if (ISPRINT(c)) {
                clen = 1;
            }
            else {
                if (u8 && c > 0x7F) {   /* \u notation */
                    int n = rb_enc_precise_mbclen(p-1, pend, enc);
                    if (MBCLEN_CHARFOUND_P(n)) {
                        unsigned int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc);
                        if (cc <= 0xFFFF)
                            clen = 6;  /* \uXXXX */
                        else if (cc <= 0xFFFFF)
                            clen = 9;  /* \u{XXXXX} */
                        else
                            clen = 10; /* \u{XXXXXX} */
                        p += MBCLEN_CHARFOUND_LEN(n)-1;
                        break;
                    }
                }
                clen = 4;       /* \xNN */
            }
            break;
        }

        if (clen > LONG_MAX - len) {
            rb_raise(rb_eRuntimeError, "string size too big");
        }
        len += clen;
    }

    result = rb_str_new(0, len);
    p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str);
    q = RSTRING_PTR(result); qend = q + len + 1;

    *q++ = '"';
    while (p < pend) {
        unsigned char c = *p++;

        if (c == '"' || c == '\\') {
            *q++ = '\\';
            *q++ = c;
        }
        else if (c == '#') {
            if (IS_EVSTR(p, pend)) *q++ = '\\';
            *q++ = '#';
        }
        else if (c == '\n') {
            *q++ = '\\';
            *q++ = 'n';
        }
        else if (c == '\r') {
            *q++ = '\\';
            *q++ = 'r';
        }
        else if (c == '\t') {
            *q++ = '\\';
            *q++ = 't';
        }
        else if (c == '\f') {
            *q++ = '\\';
            *q++ = 'f';
        }
        else if (c == '\013') {
            *q++ = '\\';
            *q++ = 'v';
        }
        else if (c == '\010') {
            *q++ = '\\';
            *q++ = 'b';
        }
        else if (c == '\007') {
            *q++ = '\\';
            *q++ = 'a';
        }
        else if (c == '\033') {
            *q++ = '\\';
            *q++ = 'e';
        }
        else if (ISPRINT(c)) {
            *q++ = c;
        }
        else {
            *q++ = '\\';
            if (u8) {
                int n = rb_enc_precise_mbclen(p-1, pend, enc) - 1;
                if (MBCLEN_CHARFOUND_P(n)) {
                    int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc);
                    p += n;
                    if (cc <= 0xFFFF)
                        snprintf(q, qend-q, "u%04X", cc);    /* \uXXXX */
                    else
                        snprintf(q, qend-q, "u{%X}", cc);  /* \u{XXXXX} or \u{XXXXXX} */
                    q += strlen(q);
                    continue;
                }
            }
            snprintf(q, qend-q, "x%02X", c);
            q += 3;
        }
    }
    *q++ = '"';
    *q = '\0';
    if (!rb_enc_asciicompat(enc)) {
        snprintf(q, qend-q, nonascii_suffix, enc->name);
        encidx = rb_ascii8bit_encindex();
    }
    /* result from dump is ASCII */
    rb_enc_associate_index(result, encidx);
    ENC_CODERANGE_SET(result, ENC_CODERANGE_7BIT);
    return result;
}

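For an ordinary string, this method, String#dump, returns a printable ASCII-only version of self, enclosed in double-quotes.

For a dumped string, method String#undump is the inverse of String#dump; it returns a "restored" version of self, where all the dumping changes have been undone.

In the simplest case, the dumped string contains the original string, enclosed in double-quotes; this example is done in irb (interactive Ruby), which uses method inspect to render the results: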

s = 'hello'   # => "hello"
s.dump        # => "\"hello\""
s.dump.undump # => "hello"

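Keep in mind that in the second line above:

The outer double-quotes are added by inspect, and are not part of the output of dump.
The inner double-quotes are part of the output of dump, and are escaped by inspect because they are within the outer double-quotes.

To avoid confusion, we'll use this helper method to omit the outer double-quotes: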

def dump(s)
  print "String:   ", s, "\n"
  print "Dumped:   ", s.dump, "\n"
  print "Undumped: ", s.dump.undump, "\n"
end

So for string 'hello', we'll see:

String:    hello
Dumped:    "hello"
Undumped:  hello

In a dump, certain special characters are escaped:

String:    "
Dumped:    "\""
Undumped:  "

String:    \
Dumped:    "\\"
Undumped:  \

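In a dump, unprintable characters are replaced by printable escape sequences; the unprintable characters shown here are control characters, most of them whitespace (other than space itself); here we see the ordinals for those characters, together with explanatory text: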

h = {
   7 => 'Alert (BEL)',
   8 => 'Backspace (BS)',
   9 => 'Horizontal tab (HT)',
  10 => 'Linefeed (LF)',
  11 => 'Vertical tab (VT)',
  12 => 'Formfeed (FF)',
  13 => 'Carriage return (CR)'
}

In this example, the dumped output is printed by method inspect, and so contains both outer double-quotes and escaped inner double-quotes:

s = ''
h.keys.each {|i| s << i } # => [7, 8, 9, 10, 11, 12, 13]
s                         # => "\a\b\t\n\v\f\r"
s.dump                    # => "\"\\a\\b\\t\\n\\v\\f\\r\""

If self is encoded in UTF-8 and contains non-ASCII Unicode characters, each such character is dumped as a Unicode escape sequence:

String:    тест
Dumped:    "\u0442\u0435\u0441\u0442"
Undumped:  тест

String:    こんにちは
Dumped:    "\u3053\u3093\u306B\u3061\u306F"
Undumped:  こんにちは

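If the encoding of self is not ASCII-compatible (i.e., if self.encoding.ascii_compatible? returns false), each ASCII-compatible byte is dumped as an ASCII character, and all other bytes are dumped as hexadecimal; also appends .dup.force_encoding("encoding"), where encoding is self.encoding.name: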

String:    hello
Dumped:    "\xFE\xFF\x00h\x00e\x00l\x00l\x00o".dup.force_encoding("UTF-16")
Undumped:  hello

String:    тест
Dumped:    "\xFE\xFF\x04B\x045\x04A\x04B".dup.force_encoding("UTF-16")
Undumped:  тест

String:    こんにちは
Dumped:    "\xFE\xFF0S0\x930k0a0o".dup.force_encoding("UTF-16")
Undumped:  こんにちは
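The escaping rules above compose within a single string, and undump reverses all of them. A small sketch, assuming a UTF-8 source file; the sample text is arbitrary.

```ruby
s = "tab\there \"quoted\" тест"

dumped = s.dump
# dumped holds the characters: "tab\there \"quoted\" \u0442\u0435\u0441\u0442"

dumped.ascii_only?  # => true
dumped.undump == s  # => true
```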

each_byte {|byte| ... } → self

each_byte → enumerator

Source

static VALUE
rb_str_each_byte(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_byte_size);
    return rb_str_enumerate_bytes(str, 0);
}

With a block given, calls the block with each successive byte from self; returns self:

a = []
'hello'.each_byte {|byte| a.push(byte) }     # Five 1-byte characters.
a # => [104, 101, 108, 108, 111]
a = []
'тест'.each_byte {|byte| a.push(byte) }      # Four 2-byte characters.
a # => [209, 130, 208, 181, 209, 129, 209, 130]
a = []
'こんにちは'.each_byte {|byte| a.push(byte) }  # Five 3-byte characters.
a # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]

With no block given, returns an enumerator.

Related: see Iterating.

each_char {|char| ... } → self

each_char → enumerator

Source

static VALUE
rb_str_each_char(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_chars(str, 0);
}

With a block given, calls the block with each successive character from self; returns self:

a = []
'hello'.each_char do |char|
  a.push(char)
end
a # => ["h", "e", "l", "l", "o"]
a = []
'тест'.each_char do |char|
  a.push(char)
end
a # => ["т", "е", "с", "т"]
a = []
'こんにちは'.each_char do |char|
  a.push(char)
end
a # => ["こ", "ん", "に", "ち", "は"]

With no block given, returns an enumerator.
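The enumerator form is handy for chaining with other Enumerator methods. A minimal sketch with illustrative variable names:

```ruby
enum = 'тест'.each_char       # no block given: an Enumerator

enum.size                     # => 4
pairs = enum.with_index.to_a  # => [["т", 0], ["е", 1], ["с", 2], ["т", 3]]
```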

each_codepoint {|codepoint| ... } → self

each_codepoint → enumerator

Source

static VALUE
rb_str_each_codepoint(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_codepoints(str, 0);
}

With a block given, calls the block with each successive codepoint from self; each codepoint is the integer value for a character; returns self:

a = []
'hello'.each_codepoint do |codepoint|
  a.push(codepoint)
end
a # => [104, 101, 108, 108, 111]
a = []
'тест'.each_codepoint do |codepoint|
  a.push(codepoint)
end
a # => [1090, 1077, 1089, 1090]
a = []
'こんにちは'.each_codepoint do |codepoint|
  a.push(codepoint)
end
a # => [12371, 12435, 12395, 12385, 12399]

With no block given, returns an enumerator.

each_grapheme_cluster {|grapheme_cluster| ... } → self

each_grapheme_cluster → enumerator

Source

static VALUE
rb_str_each_grapheme_cluster(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_grapheme_cluster_size);
    return rb_str_enumerate_grapheme_clusters(str, 0);
}

With a block given, calls the given block with each successive grapheme cluster from self (see Unicode Grapheme Cluster Boundaries); returns self:

a = []
'hello'.each_grapheme_cluster do |grapheme_cluster|
  a.push(grapheme_cluster)
end
a  # => ["h", "e", "l", "l", "o"]

a = []
'тест'.each_grapheme_cluster do |grapheme_cluster|
  a.push(grapheme_cluster)
end
a # => ["т", "е", "с", "т"]

a = []
'こんにちは'.each_grapheme_cluster do |grapheme_cluster|
  a.push(grapheme_cluster)
end
a # => ["こ", "ん", "に", "ち", "は"]

With no block given, returns an enumerator.
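In the examples above every grapheme cluster happens to be a single character. The difference from each_char shows up with combining characters; this sketch uses an 'e' followed by U+0301 (COMBINING ACUTE ACCENT), written with an escape so the composition is unambiguous.

```ruby
s = "e\u0301"  # 'e' followed by a combining acute accent

chars    = s.each_char.to_a              # two characters
clusters = s.each_grapheme_cluster.to_a  # one grapheme cluster

chars.size     # => 2
clusters.size  # => 1
```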

each_line(record_separator = $/, chomp: false) {|substring| ... } → self

each_line(record_separator = $/, chomp: false) → enumerator

Source

static VALUE
rb_str_each_line(int argc, VALUE *argv, VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, argc, argv, 0);
    return rb_str_enumerate_lines(argc, argv, str, 0);
}

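With a block given, forms the substrings (lines) that result from splitting self at each occurrence of the given record_separator; passes each line to the block; returns self.

With the default record_separator: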

$/ # => "\n"
s = <<~EOT
This is the first line.
This is line two.

This is line four.
This is line five.
EOT
s.each_line {|line| p line }

Output:

"This is the first line.\n"
"This is line two.\n"
"\n"
"This is line four.\n"
"This is line five.\n"

With a different record_separator:

record_separator = ' is '
s.each_line(record_separator) {|line| p line }

Output:

"This is "
"the first line.\nThis is "
"line two.\n\nThis is "
"line four.\nThis is "
"line five.\n"

With chomp as true, removes the trailing record_separator from each line:

s.each_line(chomp: true) {|line| p line }

Output:

"This is the first line."
"This is line two."
""
"This is line four."
"This is line five."

With an empty string as record_separator, forms and passes "paragraphs" by splitting at each occurrence of two or more newlines:

record_separator = ''
s.each_line(record_separator) {|line| p line }

Output:

"This is the first line.\nThis is line two.\n\n"
"This is line four.\nThis is line five.\n"

With no block given, returns an enumerator.

Related: see Iterating.
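The record_separator and chomp arguments also work together in the enumerator form. A compact sketch, using a comma-separated string chosen purely for illustration:

```ruby
csv = 'a,b,c'

with_sep = csv.each_line(',').to_a               # => ["a,", "b,", "c"]
chomped  = csv.each_line(',', chomp: true).to_a  # => ["a", "b", "c"]
```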

empty? → true or false

Source

static VALUE
rb_str_empty(VALUE str)
{
    return RBOOL(RSTRING_LEN(str) == 0);
}

Returns whether the length of self is zero:

'hello'.empty? # => false
' '.empty? # => false
''.empty? # => true

encode(dst_encoding, src_encoding, **enc_opts) → string

Source

static VALUE
str_encode(int argc, VALUE *argv, VALUE str)
{
    VALUE newstr = str;
    int encidx = str_transcode(argc, argv, &newstr);
    return encoded_dup(newstr, str, encidx);
}

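Returns a copy of self transcoded as determined by dst_encoding; see Encodings.

By default, raises an exception if self contains an invalid byte or a character not defined in dst_encoding; that behavior may be modified by encoding options; see below.

With no arguments:

Uses the same encoding if Encoding.default_internal is nil (the default):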

Encoding.default_internal # => nil
s = "Ruby\x99".force_encoding('Windows-1252')
s.encoding                # => #<Encoding:Windows-1252>
s.bytes                   # => [82, 117, 98, 121, 153]
t = s.encode              # => "Ruby\x99"
t.encoding                # => #<Encoding:Windows-1252>
t.bytes                   # => [82, 117, 98, 121, 153]

Otherwise, uses the encoding Encoding.default_internal:

Encoding.default_internal = 'UTF-8'
t = s.encode              # => "Ruby™"
t.encoding                # => #<Encoding:UTF-8>

With only argument dst_encoding given, uses that encoding:

s = "Ruby\x99".force_encoding('Windows-1252')
s.encoding            # => #<Encoding:Windows-1252>
t = s.encode('UTF-8') # => "Ruby™"
t.encoding            # => #<Encoding:UTF-8>

With arguments dst_encoding and src_encoding given, interprets self using src_encoding, then encodes the new string using dst_encoding:

s = "Ruby\x99"
t = s.encode('UTF-8', 'Windows-1252') # => "Ruby™"
t.encoding                            # => #<Encoding:UTF-8>

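Optional keyword arguments enc_opts specify encoding options; see Encoding Options.

Note that, unless the invalid: :replace option is given, conversion from an encoding enc to the same encoding enc (regardless of whether enc is given explicitly or implicitly) is a no-op; that is, the string is simply copied without any changes, and no exception is raised, even if there are invalid bytes.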
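The default error behavior, the undef: and replace: options, and the same-encoding no-op can be seen together in the following sketch. It assumes a UTF-8 source file; the variable names are illustrative.

```ruby
s = "Ruby\u2122"  # "Ruby™", encoded in UTF-8

# Default: a character that is undefined in the destination encoding raises.
raised = begin
  s.encode('US-ASCII')
  false
rescue Encoding::UndefinedConversionError
  true
end

replaced = s.encode('US-ASCII', undef: :replace)                   # => "Ruby?"
custom   = s.encode('US-ASCII', undef: :replace, replace: '(tm)')  # => "Ruby(tm)"

# Same-encoding conversion is a no-op unless invalid: :replace is given.
bad      = "abc\xFF"                               # invalid UTF-8
copy     = bad.encode('UTF-8')                     # no exception; bytes unchanged
scrubbed = bad.encode('UTF-8', invalid: :replace)  # => "abc\uFFFD"
```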

encode!(dst_encoding, src_encoding, **enc_opts) → self

Source

static VALUE
str_encode_bang(int argc, VALUE *argv, VALUE str)
{
    VALUE newstr;
    int encidx;

    rb_check_frozen(str);

    newstr = str;
    encidx = str_transcode(argc, argv, &newstr);

    if (encidx < 0) return str;
    if (newstr == str) {
        rb_enc_associate_index(str, encidx);
        return str;
    }
    rb_str_shared_replace(str, newstr);
    return str_encode_associate(str, encidx);
}

Like encode, but applies encoding changes to self; returns self.

Related: see Modifying.

encoding → encoding

Source

VALUE
rb_obj_encoding(VALUE obj)
{
    int idx = rb_enc_get_index(obj);
    if (idx < 0) {
        rb_raise(rb_eTypeError, "unknown encoding");
    }
    return rb_enc_from_encoding_index(idx & ENC_INDEX_MASK);
}

Returns an Encoding object that represents the encoding of self; see Encodings.

Related: see Querying.

end_with?(*strings) → true or false

Source

static VALUE
rb_str_end_with(int argc, VALUE *argv, VALUE str)
{
    int i;

    for (i=0; i<argc; i++) {
        VALUE tmp = argv[i];
        const char *p, *s, *e;
        long slen, tlen;
        rb_encoding *enc;

        StringValue(tmp);
        enc = rb_enc_check(str, tmp);
        if ((tlen = RSTRING_LEN(tmp)) == 0) return Qtrue;
        if ((slen = RSTRING_LEN(str)) < tlen) continue;
        p = RSTRING_PTR(str);
        e = p + slen;
        s = e - tlen;
        if (!at_char_boundary(p, s, e, enc))
            continue;
        if (memcmp(s, RSTRING_PTR(tmp), tlen) == 0)
            return Qtrue;
    }
    return Qfalse;
}

Returns whether self ends with any of the given strings:

'foo'.end_with?('oo')         # => true
'foo'.end_with?('bar', 'oo')  # => true
'foo'.end_with?('bar', 'baz') # => false
'foo'.end_with?('')           # => true
'тест'.end_with?('т')         # => true
'こんにちは'.end_with?('は')   # => true

Related: see Querying.

eql?(object) → true or false

Source

VALUE
rb_str_eql(VALUE str1, VALUE str2)
{
    if (str1 == str2) return Qtrue;
    if (!RB_TYPE_P(str2, T_STRING)) return Qfalse;
    return rb_str_eql_internal(str1, str2);
}

Returns whether self and object have the same length and content:

s = 'foo'
s.eql?('foo')  # => true
s.eql?('food') # => false
s.eql?('FOO')  # => false

Returns false if the two strings' encodings are not compatible:

s0 = "äöü"                           # => "äöü"
s1 = s0.encode(Encoding::ISO_8859_1) # => "\xE4\xF6\xFC"
s0.encoding                          # => #<Encoding:UTF-8>
s1.encoding                          # => #<Encoding:ISO-8859-1>
s0.eql?(s1)                          # => false

See Encodings.

Related: see Querying.

force_encoding(encoding) → self

Source

static VALUE
rb_str_force_encoding(VALUE str, VALUE enc)
{
    str_modifiable(str);

    rb_encoding *encoding = rb_to_encoding(enc);
    int idx = rb_enc_to_index(encoding);

    // If the encoding is unchanged, we do nothing.
    if (ENCODING_GET(str) == idx) {
        return str;
    }

    rb_enc_associate_index(str, idx);

    // If the coderange was 7bit and the new encoding is ASCII-compatible
    // we can keep the coderange.
    if (ENC_CODERANGE(str) == ENC_CODERANGE_7BIT && encoding && rb_enc_asciicompat(encoding)) {
        return str;
    }

    ENC_CODERANGE_CLEAR(str);
    return str;
}

Changes the encoding of self to the given encoding, which may be a string encoding name or an Encoding object; does not change the underlying bytes; returns self:

s = 'łał'
s.bytes                   # => [197, 130, 97, 197, 130]
s.encoding                # => #<Encoding:UTF-8>
s.force_encoding('ascii') # => "\xC5\x82a\xC5\x82"
s.encoding                # => #<Encoding:US-ASCII>
s.valid_encoding?         # => false
s.bytes                   # => [197, 130, 97, 197, 130]

Makes the change even if the given encoding is invalid for self (as is the change above):

s.valid_encoding?         # => false

See Encodings.

Related: see Modifying.
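A common use is relabeling raw bytes whose real encoding is known, then transcoding with encode. The sketch below assumes the three bytes are ISO-8859-1 text; force_encoding only changes the label, while encode produces new bytes.

```ruby
raw = "\xE4\xF6\xFC".b                    # three bytes, binary encoding

raw.force_encoding(Encoding::ISO_8859_1)  # relabel only; bytes are untouched
raw.bytes                                 # => [228, 246, 252]
raw.valid_encoding?                       # => true

utf8 = raw.encode('UTF-8')                # transcode: new bytes, same text
utf8                                      # => "äöü"
utf8.bytes.size                           # => 6
```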

getbyte(index) → integer or nil

Source

VALUE
rb_str_getbyte(VALUE str, VALUE index)
{
    long pos = NUM2LONG(index);

    if (pos < 0)
        pos += RSTRING_LEN(str);
    if (pos < 0 ||  RSTRING_LEN(str) <= pos)
        return Qnil;

    return INT2FIX((unsigned char)RSTRING_PTR(str)[pos]);
}

Returns the byte at zero-based index as an integer:

s = 'foo'
s.getbyte(0)    # => 102
s.getbyte(1)    # => 111
s.getbyte(2)    # => 111

Counts backward from the end if index is negative:

s.getbyte(-3) # => 102

Returns nil if index is out of range:

s.getbyte(3)  # => nil
s.getbyte(-4) # => nil

More examples:

s = 'тест'
s.bytes      # => [209, 130, 208, 181, 209, 129, 209, 130]
s.getbyte(2) # => 208
s = 'こんにちは'
s.bytes      # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]
s.getbyte(2) # => 147

grapheme_clusters → array_of_grapheme_clusters

Source

static VALUE
rb_str_grapheme_clusters(VALUE str)
{
    VALUE ary = WANTARRAY("grapheme_clusters", rb_str_strlen(str));
    return rb_str_enumerate_grapheme_clusters(str, ary);
}

Returns an array of the grapheme clusters in self (see Unicode Grapheme Cluster Boundaries):

s = "ä-pqr-b̈-xyz-c̈"
s.size                   # => 16
s.bytesize               # => 19
s.grapheme_clusters.size # => 13
s.grapheme_clusters
# => ["ä", "-", "p", "q", "r", "-", "b̈", "-", "x", "y", "z", "-", "c̈"]

Details:

s = "ä"
s.grapheme_clusters             # => ["ä"]           # One grapheme cluster.
s.bytes                         # => [97, 204, 136]  # Three bytes.
s.chars                         # => ["a", "̈"]       # Two characters.
s.chars.map {|char| char.ord }  # => [97, 776]       # Their values.

Related: see Converting to Non-String.

gsub(pattern, replacement) → new_string

gsub(pattern) {|match| ... } → new_string

gsub(pattern) → enumerator

Source

static VALUE
rb_str_gsub(int argc, VALUE *argv, VALUE str)
{
    return str_gsub(argc, argv, str, 0);
}

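Returns a copy of self with zero or more substrings replaced.

Argument pattern may be a string or a Regexp; argument replacement may be a string or a Hash. The varying types of the argument values make this method very versatile.

Below are some simple examples; for many more examples, see Substitution Methods.

With arguments pattern and string replacement given, replaces each matching substring with the given replacement string: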

s = 'abracadabra'
s.gsub('ab', 'AB')   # => "ABracadABra"
s.gsub(/[a-c]/, 'X') # => "XXrXXXdXXrX"

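With arguments pattern and hash replacement given, replaces each matching substring with a value from the given replacement hash, or removes it if the hash has no such key: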

h = {'a' => 'A', 'b' => 'B', 'c' => 'C'}
s.gsub(/[a-c]/, h) # => "ABrACAdABrA"  # 'a', 'b', 'c' replaced.
s.gsub(/[a-d]/, h) # => "ABrACAABrA"   # 'd' removed.

With argument pattern and a block given, calls the block with each matching substring; replaces that substring with the block's return value:

s.gsub(/[a-d]/) {|substring| substring.upcase }
# => "ABrACADABrA"

With argument pattern and no block given, returns a new Enumerator.
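One use of the enumerator form is to chain it with Enumerator#with_index, so that each replacement can depend on the position of the match in the sequence of matches. A minimal sketch:

```ruby
s = 'abracadabra'

# The block's return value becomes the replacement for each match.
numbered = s.gsub(/a/).with_index {|match, i| i.to_s }
numbered  # => "0br1c2d3br4"
```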

gsub!(pattern, replacement) → self or nil

gsub!(pattern) {|match| ... } → self or nil

gsub!(pattern) → an_enumerator

Source

static VALUE
rb_str_gsub_bang(int argc, VALUE *argv, VALUE str)
{
    str_modify_keep_cr(str);
    return str_gsub(argc, argv, str, 1);
}

Like String#gsub, except that:

Performs substitutions in self (not in a copy of self).
Returns self if any substitutions were made, nil otherwise.

Related: see Modifying.
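The nil return value matters when the result of gsub! is used directly. A short sketch with illustrative variable names:

```ruby
s = String.new('hello')

r1 = s.gsub!(/l/, 'L')  # => "heLLo"  (self, modified in place)
r2 = s.gsub!(/x/, 'y')  # => nil      (no match; s is unchanged)

s                       # => "heLLo"
```

When the new value is needed regardless of whether anything matched, String#gsub is the safer choice.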

hash → integer

Source

static VALUE
rb_str_hash_m(VALUE str)
{
    st_index_t hval = rb_str_hash(str);
    return ST2FIX(hval);
}

Returns the integer hash value for self.

Two String objects that have identical content and compatible encodings also have the same hash value; see Object#hash and Encodings:

s = 'foo'
h = s.hash       # => -569050784
h == 'foo'.hash  # => true
h == 'food'.hash # => false
h == 'FOO'.hash  # => false

s0 = "äöü"
s1 = s0.encode(Encoding::ISO_8859_1)
s0.encoding        # => #<Encoding:UTF-8>
s1.encoding        # => #<Encoding:ISO-8859-1>
s0.hash == s1.hash # => false

Related: see Querying.
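Because equal strings produce equal hash values (and are eql?), distinct String objects with the same content address the same Hash entry. A small sketch; the word list is arbitrary.

```ruby
counts = Hash.new(0)

# Each word is a separate String object, yet equal content shares one key.
%w[foo bar foo].each {|word| counts[word.dup] += 1 }

counts  # => {"foo"=>2, "bar"=>1}
```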

hex → integer

Source

static VALUE
rb_str_hex(VALUE str)
{
    return rb_str_to_inum(str, 16, FALSE);
}

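Interprets the leading substring of self as hexadecimal, possibly signed; returns its value as an integer.

The leading substring is interpreted as hexadecimal when it begins with:

One or more characters representing hexadecimal digits (each in one of the ranges '0'..'9', 'a'..'f', or 'A'..'F'); the string to be interpreted ends at the first character that does not represent a hexadecimal digit: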

'f'.hex        # => 15
'11'.hex       # => 17
'FFF'.hex      # => 4095
'fffg'.hex     # => 4095
'foo'.hex      # => 15   # 'f' hexadecimal, 'oo' not.
'bar'.hex      # => 186  # 'ba' hexadecimal, 'r' not.
'deadbeef'.hex # => 3735928559

'0x' or '0X', followed by one or more hexadecimal digits:

'0xfff'.hex    # => 4095
'0xfffg'.hex   # => 4095

Any of the above may be prefixed with '-', which negates the interpreted value:

'-fff'.hex      # => -4095
'-0xFFF'.hex    # => -4095

For any substring not described above, returns zero:

'xxx'.hex     # => 0
''.hex        # => 0

Note that, unlike oct, this method interprets only hexadecimal, and not binary, octal, or decimal notations:

'0b111'.hex   # => 45329
'0o777'.hex   # => 0
'0d999'.hex   # => 55705

Related: see Converting to Non-String.

include?(other_string) → true or false

Source

VALUE
rb_str_include(VALUE str, VALUE arg)
{
    long i;

    StringValue(arg);
    i = rb_str_index(str, arg, 0);

    return RBOOL(i != -1);
}

Returns whether self contains other_string:

s = 'bar'
s.include?('ba')  # => true
s.include?('ar')  # => true
s.include?('bar') # => true
s.include?('a')   # => true
s.include?('')    # => true
s.include?('foo') # => false

index(pattern, offset = 0) → integer or nil

Source

static VALUE
rb_str_index_m(int argc, VALUE *argv, VALUE str)
{
    VALUE sub;
    VALUE initpos;
    rb_encoding *enc = STR_ENC_GET(str);
    long pos;

    if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) {
        long slen = str_strlen(str, enc); /* str's enc */
        pos = NUM2LONG(initpos);
        if (pos < 0 ? (pos += slen) < 0 : pos > slen) {
            if (RB_TYPE_P(sub, T_REGEXP)) {
                rb_backref_set(Qnil);
            }
            return Qnil;
        }
    }
    else {
        pos = 0;
    }

    if (RB_TYPE_P(sub, T_REGEXP)) {
        pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos,
                         enc, single_byte_optimizable(str));

        if (rb_reg_search(sub, str, pos, 0) >= 0) {
            VALUE match = rb_backref_get();
            struct re_registers *regs = RMATCH_REGS(match);
            pos = rb_str_sublen(str, BEG(0));
            return LONG2NUM(pos);
        }
    }
    else {
        StringValue(sub);
        pos = rb_str_index(str, sub, pos);
        if (pos >= 0) {
            pos = rb_str_sublen(str, pos);
            return LONG2NUM(pos);
        }
    }
    return Qnil;
}

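Returns the integer position of the first substring that matches the given argument pattern, or nil if none is found.

When pattern is a string, returns the index of the first matching substring in self: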

'foo'.index('f')         # => 0
'foo'.index('o')         # => 1
'foo'.index('oo')        # => 1
'foo'.index('ooo')       # => nil
'тест'.index('с')        # => 2  # Characters, not bytes.
'こんにちは'.index('ち')  # => 3

When pattern is a Regexp, returns the index of the first match in self:

'foo'.index(/o./) # => 1
'foo'.index(/.o/) # => 0

When offset is non-negative, begins the search at position offset; the returned index is relative to the beginning of self:

'bar'.index('r', 0)        # => 2
'bar'.index('r', 1)        # => 2
'bar'.index('r', 2)        # => 2
'bar'.index('r', 3)        # => nil
'bar'.index(/[r-z]/, 0)    # => 2
'тест'.index('с', 1)       # => 2
'тест'.index('с', 2)       # => 2
'тест'.index('с', 3)       # => nil  # Offset in characters, not bytes.
'こんにちは'.index('ち', 2) # => 3

With negative integer argument offset, selects the search position by counting backward from the end of self:

'foo'.index('o', -1)  # => 2
'foo'.index('o', -2)  # => 1
'foo'.index('o', -3)  # => 1
'foo'.index('o', -4)  # => nil
'foo'.index(/o./, -2) # => 1
'foo'.index(/.o/, -2) # => 1

Related: see Querying.
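The offset argument makes it straightforward to collect every match position by restarting the search just past the previous hit. The helper name all_indexes below is hypothetical and is not part of String.

```ruby
# Hypothetical helper: returns the character index of every match of pattern.
def all_indexes(string, pattern)
  found = []
  offset = 0
  # index returns nil once no further match exists at or after offset.
  while (i = string.index(pattern, offset))
    found << i
    offset = i + 1
  end
  found
end

ascii_hits    = all_indexes('abracadabra', 'a')  # => [0, 3, 5, 7, 10]
cyrillic_hits = all_indexes('тест', 'т')         # => [0, 3]
```

Advancing by one character (rather than by the match length) means overlapping matches are reported as well.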

initialize_copy

Alias for: replace

insert(offset, other_string) → self

Source

static VALUE
rb_str_insert(VALUE str, VALUE idx, VALUE str2)
{
    long pos = NUM2LONG(idx);

    if (pos == -1) {
        return rb_str_append(str, str2);
    }
    else if (pos < 0) {
        pos++;
    }
    rb_str_update(str, pos, 0, str2);
    return str;
}

Inserts the given other_string into self; returns self.

If the given offset is non-negative, inserts other_string at that offset:

'foo'.insert(0, 'bar')       # => "barfoo"
'foo'.insert(1, 'bar')       # => "fbaroo"
'foo'.insert(3, 'bar')       # => "foobar"
'тест'.insert(2, 'bar')      # => "теbarст"  # Characters, not bytes.
'こんにちは'.insert(2, 'bar') # => "こんbarにちは"

If offset is negative, counts backward from the end of self and inserts other_string after the selected position:

'foo'.insert(-2, 'bar') # => "fobaro"

Related: see Modifying.
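The negative-offset rule is easier to follow at its boundaries. This sketch uses dup so each call works on a fresh mutable string; the variable names are illustrative.

```ruby
at_end   = 'foo'.dup.insert(-1, 'bar')  # => "foobar"  (after the last character)
at_start = 'foo'.dup.insert(-4, 'bar')  # => "barfoo"  (before the first character)

# An offset beyond either end raises IndexError.
out_of_range = begin
  'foo'.dup.insert(-5, 'bar')
  false
rescue IndexError
  true
end
```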

inspect → string

Source

VALUE
rb_str_inspect(VALUE str)
{
    int encidx = ENCODING_GET(str);
    rb_encoding *enc = rb_enc_from_index(encidx);
    const char *p, *pend, *prev;
    char buf[CHAR_ESC_LEN + 1];
    VALUE result = rb_str_buf_new(0);
    rb_encoding *resenc = rb_default_internal_encoding();
    int unicode_p = rb_enc_unicode_p(enc);
    int asciicompat = rb_enc_asciicompat(enc);

    if (resenc == NULL) resenc = rb_default_external_encoding();
    if (!rb_enc_asciicompat(resenc)) resenc = rb_usascii_encoding();
    rb_enc_associate(result, resenc);
    str_buf_cat2(result, "\"");

    p = RSTRING_PTR(str); pend = RSTRING_END(str);
    prev = p;
    while (p < pend) {
        unsigned int c, cc;
        int n;

        n = rb_enc_precise_mbclen(p, pend, enc);
        if (!MBCLEN_CHARFOUND_P(n)) {
            if (p > prev) str_buf_cat(result, prev, p - prev);
            n = rb_enc_mbminlen(enc);
            if (pend < p + n)
                n = (int)(pend - p);
            while (n--) {
                snprintf(buf, CHAR_ESC_LEN, "\\x%02X", *p & 0377);
                str_buf_cat(result, buf, strlen(buf));
                prev = ++p;
            }
            continue;
        }
        n = MBCLEN_CHARFOUND_LEN(n);
        c = rb_enc_mbc_to_codepoint(p, pend, enc);
        p += n;
        if ((asciicompat || unicode_p) &&
          (c == '"'|| c == '\\' ||
            (c == '#' &&
             p < pend &&
             MBCLEN_CHARFOUND_P(rb_enc_precise_mbclen(p,pend,enc)) &&
             (cc = rb_enc_codepoint(p,pend,enc),
              (cc == '$' || cc == '@' || cc == '{'))))) {
            if (p - n > prev) str_buf_cat(result, prev, p - n - prev);
            str_buf_cat2(result, "\\");
            if (asciicompat || enc == resenc) {
                prev = p - n;
                continue;
            }
        }
        switch (c) {
          case '\n': cc = 'n'; break;
          case '\r': cc = 'r'; break;
          case '\t': cc = 't'; break;
          case '\f': cc = 'f'; break;
          case '\013': cc = 'v'; break;
          case '\010': cc = 'b'; break;
          case '\007': cc = 'a'; break;
          case 033: cc = 'e'; break;
          default: cc = 0; break;
        }
        if (cc) {
            if (p - n > prev) str_buf_cat(result, prev, p - n - prev);
            buf[0] = '\\';
            buf[1] = (char)cc;
            str_buf_cat(result, buf, 2);
            prev = p;
            continue;
        }
        /* The special casing of 0x85 (NEXT_LINE) here is because
         * Oniguruma historically treats it as printable, but it
         * doesn't match the print POSIX bracket class or character
         * property in regexps.
         *
         * See Ruby Bug #16842 for details:
         * https://bugs.ruby-lang.org/issues/16842
         */
        if ((enc == resenc && rb_enc_isprint(c, enc) && c != 0x85) ||
            (asciicompat && rb_enc_isascii(c, enc) && ISPRINT(c))) {
            continue;
        }
        else {
            if (p - n > prev) str_buf_cat(result, prev, p - n - prev);
            rb_str_buf_cat_escaped_char(result, c, unicode_p);
            prev = p;
            continue;
        }
    }
    if (p > prev) str_buf_cat(result, prev, p - prev);
    str_buf_cat2(result, "\"");

    return result;
}

Returns a printable version of self, enclosed in double-quotes.

Most printable characters are rendered simply as themselves:

'abc'.inspect        # => "\"abc\""
'012'.inspect        # => "\"012\""
''.inspect           # => "\"\""
"\u000012".inspect   # => "\"\\u000012\""
'тест'.inspect       # => "\"тест\""
'こんにちは'.inspect  # => "\"こんにちは\""

But the printable characters double-quote ('"') and backslash ('\') are escaped:

'"'.inspect  # => "\"\\\"\""
'\\'.inspect # => "\"\\\\\""

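Unprintable characters are the ASCII characters whose values are in range 0..31, along with the character whose value is 127.

Most of these characters are rendered thus: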

0.chr.inspect # => "\"\\x00\""
1.chr.inspect # => "\"\\x01\""
2.chr.inspect # => "\"\\x02\""
# ...

A few, however, have special renderings:

7.chr.inspect  # => "\"\\a\""  # BEL
8.chr.inspect  # => "\"\\b\""  # BS
9.chr.inspect  # => "\"\\t\""  # TAB
10.chr.inspect # => "\"\\n\""  # LF
11.chr.inspect # => "\"\\v\""  # VT
12.chr.inspect # => "\"\\f\""  # FF
13.chr.inspect # => "\"\\r\""  # CR
27.chr.inspect # => "\"\\e\""  # ESC

Related: see Converting to Non-String.

intern → symbol

Source

VALUE
rb_str_intern(VALUE str)
{
    return sym_find_or_insert_dynamic_symbol(&ruby_global_symbols, str);
}

Returns the Symbol object derived from self, creating it if it does not already exist:

'foo'.intern       # => :foo
'тест'.intern      # => :тест
'こんにちは'.intern # => :こんにちは

Related: see Converting to Non-String.

Also aliased as: to_sym

length → integer

Source

VALUE
rb_str_length(VALUE str)
{
    return LONG2NUM(str_strlen(str, NULL));
}

Returns the count of characters (not bytes) in self:

'foo'.length        # => 3
'тест'.length       # => 4
'こんにちは'.length  # => 5

Contrast with String#bytesize:

'foo'.bytesize        # => 3
'тест'.bytesize       # => 8
'こんにちは'.bytesize  # => 15
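
To make the difference concrete, this small sketch pairs each character with the number of bytes it occupies. It assumes UTF-8 strings; the helper name char_widths is hypothetical.

```ruby
# Hypothetical helper: pairs each character with its size in bytes.
def char_widths(string)
  string.each_char.map { |char| [char, char.bytesize] }
end

char_widths('aтこ') # => [["a", 1], ["т", 2], ["こ", 3]]
```

The byte sizes add up to bytesize, while the number of pairs equals length.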

Related: see Querying.

Also aliased as: size

lines(record_separator = $/, chomp: false) → array_of_strings

Source

static VALUE
rb_str_lines(int argc, VALUE *argv, VALUE str)
{
    VALUE ary = WANTARRAY("lines", 0);
    return rb_str_enumerate_lines(argc, argv, str, ary);
}

Returns substrings ("lines") of self, according to the given arguments:

s = <<~EOT
This is the first line.
This is line two.

This is line four.
This is line five.
EOT

With the default argument values:

$/ # => "\n"
s.lines
# =>
["This is the first line.\n",
 "This is line two.\n",
 "\n",
 "This is line four.\n",
 "This is line five.\n"]

With a different record_separator:

record_separator = ' is '
s.lines(record_separator)
# =>
["This is ",
 "the first line.\nThis is ",
 "line two.\n\nThis is ",
 "line four.\nThis is ",
 "line five.\n"]

With keyword argument chomp given as true, removes the trailing newline from each line:

s.lines(chomp: true)
# =>
["This is the first line.",
 "This is line two.",
 "",
 "This is line four.",
 "This is line five."]

ljust(width, pad_string = ' ') → new_string

Source

static VALUE
rb_str_ljust(int argc, VALUE *argv, VALUE str)
{
    return rb_str_justify(argc, argv, str, 'l');
}

Returns a copy of self, left-justified and, if necessary, right-padded with pad_string:

'hello'.ljust(10)       # => "hello     "
'  hello'.ljust(10)     # => "  hello   "
'hello'.ljust(10, 'ab') # => "helloababa"
'тест'.ljust(10)        # => "тест      "
'こんにちは'.ljust(10)   # => "こんにちは     "

If width <= self.length, returns a copy of self:

'hello'.ljust(5)  # => "hello"
'hello'.ljust(1)  # => "hello"  # Does not truncate to width.
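
A common use of ljust is aligning a text column. The sketch below is one possible way to do it; the helper name format_rows is hypothetical.

```ruby
# Hypothetical helper: formats [name, value] pairs as aligned text rows.
def format_rows(rows)
  width = rows.map { |name, _| name.length }.max
  rows.map { |name, value| "#{name.ljust(width)} | #{value}" }
end

puts format_rows([['id', 1], ['name', 'Ruby']])
# id   | 1
# name | Ruby
```

Note that the padding is counted in characters, so wide characters may still look misaligned in a terminal.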

Related: see Converting to New String.

lstrip(*selectors) → new_string

Source

static VALUE
rb_str_lstrip(int argc, VALUE *argv, VALUE str)
{
    char *start;
    long len, loffset;

    RSTRING_GETMEM(str, start, len);
    if (argc > 0) {
        char table[TR_TABLE_SIZE];
        VALUE del = 0, nodel = 0;

        tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
        loffset = lstrip_offset_table(str, start, start+len, STR_ENC_GET(str), table, del, nodel);
    }
    else {
        loffset = lstrip_offset(str, start, start+len, STR_ENC_GET(str));
    }
    if (loffset <= 0) return str_duplicate(rb_cString, str);
    return rb_str_subseq(str, loffset, len - loffset);
}

Returns a copy of self with leading whitespace removed; see Whitespace in Strings:

whitespace = "\x00\t\n\v\f\r "
s = whitespace + 'abc' + whitespace
# => "\u0000\t\n\v\f\r abc\u0000\t\n\v\f\r "
s.lstrip
# => "abc\u0000\t\n\v\f\r "

If selectors are given, removes the characters specified by selectors from the beginning of self:

s = "---abc+++"
s.lstrip("-") # => "abc+++"

selectors must be valid character selectors (see Character Selectors), and may use any of their valid forms, including negation, ranges, and escapes:

"01234abc56789".lstrip("0-9")         # => "abc56789"
"01234abc56789".lstrip("0-9", "^4-6") # => "4abc56789"

Related: see Converting to New String.

lstrip!(*selectors) → self or nil

Source

static VALUE
rb_str_lstrip_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    char *start, *s;
    long olen, loffset;

    str_modify_keep_cr(str);
    enc = STR_ENC_GET(str);
    RSTRING_GETMEM(str, start, olen);
    if (argc > 0) {
        char table[TR_TABLE_SIZE];
        VALUE del = 0, nodel = 0;

        tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
        loffset = lstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
    }
    else {
        loffset = lstrip_offset(str, start, start+olen, enc);
    }

    if (loffset > 0) {
        long len = olen-loffset;
        s = start + loffset;
        memmove(start, s, len);
        STR_SET_LEN(str, len);
        TERM_FILL(start+len, rb_enc_mbminlen(enc));
        return str;
    }
    return Qnil;
}

Like String#lstrip, except that:

Performs the stripping in self (not in a copy of self).
Returns self if any characters are stripped, nil otherwise.
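
Because the return value is nil when nothing is stripped, chaining on lstrip! can fail unexpectedly. The sketch below shows both outcomes and one way to always get the receiver back; the helper name lstrip_in_place is hypothetical.

```ruby
# Hypothetical helper: strips leading whitespace in place and always
# returns the receiver, even when lstrip! itself returns nil.
def lstrip_in_place(string)
  string.lstrip!
  string
end

padded = +'  abc'
padded.lstrip!          # => "abc" (self, because something was removed)
padded.lstrip!          # => nil   (nothing left to remove)
lstrip_in_place(padded) # => "abc"
```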

match(pattern, offset = 0) {|matchdata| ... } → object

Source

static VALUE
rb_str_match_m(int argc, VALUE *argv, VALUE str)
{
    VALUE re, result;
    if (argc < 1)
        rb_check_arity(argc, 1, 2);
    re = argv[0];
    argv[0] = str;
    result = rb_funcallv(get_pat(re), rb_intern("match"), argc, argv);
    if (!NIL_P(result) && rb_block_given_p()) {
        return rb_yield(result);
    }
    return result;
}

Creates a MatchData object based on self and the given arguments; updates Regexp Global Variables.

Computes regexp by converting pattern (if not already a Regexp):
```
regexp = Regexp.new(pattern)
```
Computes matchdata, which will be either a MatchData object or nil (see Regexp#match):
```
matchdata = regexp.match(self[offset..])
```

With no block given, returns the computed matchdata or nil:

'foo'.match('f')    # => #<MatchData "f">
'foo'.match('o')    # => #<MatchData "o">
'foo'.match('x')    # => nil
'foo'.match('f', 1) # => nil
'foo'.match('o', 1) # => #<MatchData "o">

With a block given and computed matchdata non-nil, calls the block with matchdata; returns the block's return value:

'foo'.match(/o/) {|matchdata| matchdata } # => #<MatchData "o">

With a block given and nil matchdata, does not call the block:

'foo'.match(/x/) {|matchdata| fail 'Cannot happen' } # => nil
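
The block form is convenient when the match should be transformed only if it exists. A minimal sketch, assuming ISO-style dates in the input; the helper name parse_date is hypothetical.

```ruby
# Hypothetical helper: returns [year, month, day] as integers,
# or nil when the text contains no date.
def parse_date(text)
  text.match(/(\d{4})-(\d{2})-(\d{2})/) { |matchdata| matchdata.captures.map(&:to_i) }
end

parse_date('released 2024-05-17') # => [2024, 5, 17]
parse_date('no date here')        # => nil
```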

match?(pattern, offset = 0) → true or false

Source

static VALUE
rb_str_match_m_p(int argc, VALUE *argv, VALUE str)
{
    VALUE re;
    rb_check_arity(argc, 1, 2);
    re = get_pat(argv[0]);
    return rb_reg_match_p(re, str, argc > 1 ? NUM2LONG(argv[1]) : 0);
}

Returns whether a match is found for self and the given arguments; does not update Regexp Global Variables.

Computes regexp by converting pattern (if not already a Regexp):

regexp = Regexp.new(pattern)

Returns true if self[offset..].match(regexp) returns a MatchData object, false otherwise:

'foo'.match?(/o/) # => true
'foo'.match?('o') # => true
'foo'.match?(/x/) # => false
'foo'.match?('f', 1) # => false
'foo'.match?('o', 1) # => true
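
The practical difference from String#match is that the match variables are left alone. The sketch below wraps each call in its own method, because $~ is local to the method in which the match runs; both method names are hypothetical.

```ruby
# Hypothetical probe: String#match sets $~ for the current method.
def last_match_after_match
  'foo'.match(/o/)
  $~
end

# Hypothetical probe: String#match? leaves $~ untouched (nil here).
def last_match_after_match_p
  'foo'.match?(/o/)
  $~
end

last_match_after_match   # => #<MatchData "o">
last_match_after_match_p # => nil
```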

Related: see Querying.

next

Alias for: succ

next!

Alias for: succ!

oct → integer

Source

static VALUE
rb_str_oct(VALUE str)
{
    return rb_str_to_inum(str, -8, FALSE);
}

Interprets the leading substring of self as octal, binary, decimal, or hexadecimal, possibly signed; returns its value as an integer.

In brief:

# Interpreted as octal.
'777'.oct   # => 511
'777x'.oct  # => 511
'0777'.oct  # => 511
'0o777'.oct # => 511
'-777'.oct  # => -511
# Not interpreted as octal.
'0b111'.oct # => 7     # Interpreted as binary.
'0d999'.oct # => 999   # Interpreted as decimal.
'0xfff'.oct # => 4095  # Interpreted as hexadecimal.

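The leading substring is interpreted as octal when it begins with:

One or more characters representing octal digits (each in the range '0'..'7'); the string to be interpreted ends at the first character that does not represent an octal digit: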
```
'7'.oct      # => 7
'11'.oct     # => 9
'777'.oct    # => 511
'0777'.oct   # => 511
'7778'.oct   # => 511
'777x'.oct   # => 511
```

'0o', followed by one or more octal digits:

'0o777'.oct  # => 511
'0o7778'.oct # => 511

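The leading substring is not interpreted as octal when it begins with:

'0b', followed by one or more characters representing binary digits (each in the range '0'..'1'); the string to be interpreted ends at the first character that does not represent a binary digit. The string is interpreted as binary digits (base 2):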
```
'0b111'.oct  # => 7
'0b1112'.oct # => 7
```
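'0d', followed by one or more characters representing decimal digits (each in the range '0'..'9'); the string to be interpreted ends at the first character that does not represent a decimal digit. The string is interpreted as decimal digits (base 10):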
```
'0d999'.oct  # => 999
'0d999x'.oct # => 999
```
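'0x', followed by one or more characters representing hexadecimal digits (each in one of the ranges '0'..'9', 'a'..'f', or 'A'..'F'); the string to be interpreted ends at the first character that does not represent a hexadecimal digit. The string is interpreted as hexadecimal digits (base 16):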
```
'0xfff'.oct  # => 4095
'0xfffg'.oct # => 4095
```

Any of the above may be prefixed with '-', which negates the interpreted value:

'-777'.oct   # => -511
'-0777'.oct  # => -511
'-0b111'.oct # => -7
'-0xfff'.oct # => -4095

For any substring not described above, returns zero:

'foo'.oct      # => 0
''.oct         # => 0
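
A typical use is reading Unix permission strings. The sketch below also contrasts oct with String#to_i(8), which does not switch base on a '0x' prefix; the helper name parse_mode is hypothetical.

```ruby
# Hypothetical helper: converts a permission string such as "644"
# into its integer mode.
def parse_mode(text)
  text.oct
end

parse_mode('644')         # => 420
parse_mode('0755')        # => 493
parse_mode('755').to_s(8) # => "755"

'0x1f'.oct     # => 31 (the prefix switches the base to 16)
'0x1f'.to_i(8) # => 0  (parsing stops at the 'x')
```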

Related: see Converting to Non-String.

ord → integer

Source

static VALUE
rb_str_ord(VALUE s)
{
    unsigned int c;

    c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
    return UINT2NUM(c);
}

Returns the integer ordinal of the first character of self:

'h'.ord         # => 104
'hello'.ord     # => 104
'тест'.ord      # => 1090
'こんにちは'.ord  # => 12371
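
Calling ord on an empty string raises ArgumentError, so callers that may see empty input need a guard. A minimal sketch, with the hypothetical helper name first_codepoint; it also shows the round trip through Integer#chr.

```ruby
# Hypothetical helper: like String#ord, but returns nil for an empty
# string instead of raising ArgumentError.
def first_codepoint(string)
  string.empty? ? nil : string.ord
end

first_codepoint('тест')                      # => 1090
first_codepoint('')                          # => nil
first_codepoint('тест').chr(Encoding::UTF_8) # => "т"
```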

partition(pattern) → [pre_match, first_match, post_match]

Source

static VALUE
rb_str_partition(VALUE str, VALUE sep)
{
    long pos;

    sep = get_pat_quoted(sep, 0);
    if (RB_TYPE_P(sep, T_REGEXP)) {
        if (rb_reg_search(sep, str, 0, 0) < 0) {
            goto failed;
        }
        VALUE match = rb_backref_get();
        struct re_registers *regs = RMATCH_REGS(match);

        pos = BEG(0);
        sep = rb_str_subseq(str, pos, END(0) - pos);
    }
    else {
        pos = rb_str_index(str, sep, 0);
        if (pos < 0) goto failed;
    }
    return rb_ary_new3(3, rb_str_subseq(str, 0, pos),
                          sep,
                          rb_str_subseq(str, pos+RSTRING_LEN(sep),
                                             RSTRING_LEN(str)-pos-RSTRING_LEN(sep)));

  failed:
    return rb_ary_new3(3, str_duplicate(rb_cString, str), str_new_empty_String(str), str_new_empty_String(str));
}

Returns a 3-element array of substrings of self.

If pattern is matched, returns the array:

[pre_match, first_match, post_match]

where:

first_match is the first-found matching substring.
pre_match and post_match are the preceding and following substrings.

If pattern is not matched, returns the array:

[self.dup, "", ""]

Note that in the examples below, a returned string 'hello' is a copy of self, not self itself.

If pattern is a Regexp, performs the equivalent of self.match(pattern) (also setting the matched-data variables):

'hello'.partition(/h/)  # => ["", "h", "ello"]
'hello'.partition(/l/)  # => ["he", "l", "lo"]
'hello'.partition(/l+/) # => ["he", "ll", "o"]
'hello'.partition(/o/)  # => ["hell", "o", ""]
'hello'.partition(/^/)  # => ["", "", "hello"]
'hello'.partition(//)   # => ["", "", "hello"]
'hello'.partition(/$/)  # => ["hello", "", ""]
'hello'.partition(/x/)  # => ["hello", "", ""]

If pattern is not a Regexp, converts it to a string (if it is not already one), then performs the equivalent of self.index(pattern) (and does not set the matched-data global variables):

'hello'.partition('h')     # => ["", "h", "ello"]
'hello'.partition('l')     # => ["he", "l", "lo"]
'hello'.partition('ll')    # => ["he", "ll", "o"]
'hello'.partition('o')     # => ["hell", "o", ""]
'hello'.partition('')      # => ["", "", "hello"]
'hello'.partition('x')     # => ["hello", "", ""]
'тест'.partition('т')      # => ["", "т", "ест"]
'こんにちは'.partition('に') # => ["こん", "に", "ちは"]
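
Since only the first match splits the string, partition suits "key=value" input whose value may itself contain the separator. A minimal sketch; the helper name parse_setting is hypothetical.

```ruby
# Hypothetical helper: splits "key=value" at the first '=' only;
# returns nil when there is no separator.
def parse_setting(line)
  key, separator, value = line.partition('=')
  separator.empty? ? nil : [key, value]
end

parse_setting('url=http://example.com/?a=b') # => ["url", "http://example.com/?a=b"]
parse_setting('no separator')                # => nil
```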

Related: see Converting to Non-String.

prepend(*other_strings) → self

Source

static VALUE
rb_str_prepend_multi(int argc, VALUE *argv, VALUE str)
{
    str_modifiable(str);

    if (argc == 1) {
        rb_str_update(str, 0L, 0L, argv[0]);
    }
    else if (argc > 1) {
        int i;
        VALUE arg_str = rb_str_tmp_new(0);
        rb_enc_copy(arg_str, str);
        for (i = 0; i < argc; i++) {
            rb_str_append(arg_str, argv[i]);
        }
        rb_str_update(str, 0L, 0L, arg_str);
    }

    return str;
}

Prefixes to self the concatenation of the given other_strings; returns self:

'baz'.prepend('foo', 'bar') # => "foobarbaz"
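
The following sketch confirms that the receiver itself is modified and returned; the variable names are illustrative only.

```ruby
greeting = +'world'                      # unary plus yields a mutable string
result = greeting.prepend('hello', ', ')
result                  # => "hello, world"
result.equal?(greeting) # => true (the same object, modified in place)
```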

Related: see Modifying.

replace(other_string) → self

Source

VALUE
rb_str_replace(VALUE str, VALUE str2)
{
    str_modifiable(str);
    if (str == str2) return str;

    StringValue(str2);
    str_discard(str);
    return str_replace(str, str2);
}

Replaces the contents of self with the contents of other_string; returns self:

s = 'foo'        # => "foo"
s.replace('bar') # => "bar"
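
Unlike reassignment, replace changes the object that every reference points to. A short sketch with illustrative variable names:

```ruby
original = +'foo'
other_reference = original

original.replace('bar')
other_reference # => "bar" (same object, new contents)

original = 'baz' # reassignment binds the name to a different object
other_reference  # => "bar"
```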

Related: see Modifying.

Also aliased as: initialize_copy

reverse → new_string

Source

static VALUE
rb_str_reverse(VALUE str)
{
    rb_encoding *enc;
    VALUE rev;
    char *s, *e, *p;
    int cr;

    if (RSTRING_LEN(str) <= 1) return str_duplicate(rb_cString, str);
    enc = STR_ENC_GET(str);
    rev = rb_str_new(0, RSTRING_LEN(str));
    s = RSTRING_PTR(str); e = RSTRING_END(str);
    p = RSTRING_END(rev);
    cr = ENC_CODERANGE(str);

    if (RSTRING_LEN(str) > 1) {
        if (single_byte_optimizable(str)) {
            while (s < e) {
                *--p = *s++;
            }
        }
        else if (cr == ENC_CODERANGE_VALID) {
            while (s < e) {
                int clen = rb_enc_fast_mbclen(s, e, enc);

                p -= clen;
                memcpy(p, s, clen);
                s += clen;
            }
        }
        else {
            cr = rb_enc_asciicompat(enc) ?
                ENC_CODERANGE_7BIT : ENC_CODERANGE_VALID;
            while (s < e) {
                int clen = rb_enc_mbclen(s, e, enc);

                if (clen > 1 || (*s & 0x80)) cr = ENC_CODERANGE_UNKNOWN;
                p -= clen;
                memcpy(p, s, clen);
                s += clen;
            }
        }
    }
    STR_SET_LEN(rev, RSTRING_LEN(str));
    str_enc_copy_direct(rev, str);
    ENC_CODERANGE_SET(rev, cr);

    return rev;
}

Returns a new string with the characters from self in reverse order:

'drawer'.reverse       # => "reward"
'reviled'.reverse      # => "deliver"
'stressed'.reverse     # => "desserts"
'semordnilaps'.reverse # => "spalindromes"
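
Reversal works on characters, not bytes, so multibyte text stays valid. The sketch below uses that for a simple palindrome test; the helper name palindrome? and its normalization rule (lowercase ASCII letters and digits only) are assumptions made for the example.

```ruby
# Hypothetical helper: true when the letters and digits of text read
# the same in both directions.
def palindrome?(text)
  normalized = text.downcase.delete('^a-z0-9')
  normalized == normalized.reverse
end

palindrome?('A man, a plan, a canal: Panama') # => true
palindrome?('drawer')                         # => false
'тест'.reverse                                # => "тсет"
```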

Related: see Converting to New String.

reverse! → self

Source

static VALUE
rb_str_reverse_bang(VALUE str)
{
    if (RSTRING_LEN(str) > 1) {
        if (single_byte_optimizable(str)) {
            char *s, *e, c;

            str_modify_keep_cr(str);
            s = RSTRING_PTR(str);
            e = RSTRING_END(str) - 1;
            while (s < e) {
                c = *s;
                *s++ = *e;
                *e-- = c;
            }
        }
        else {
            str_shared_replace(str, rb_str_reverse(str));
        }
    }
    else {
        str_modify_keep_cr(str);
    }
    return str;
}

Returns self with its characters reversed:

'drawer'.reverse!       # => "reward"
'reviled'.reverse!      # => "deliver"
'stressed'.reverse!     # => "desserts"
'semordnilaps'.reverse! # => "spalindromes"

rindex(pattern, offset = self.length) → integer or nil

Source

static VALUE
rb_str_rindex_m(int argc, VALUE *argv, VALUE str)
{
    VALUE sub;
    VALUE initpos;
    rb_encoding *enc = STR_ENC_GET(str);
    long pos, len = str_strlen(str, enc); /* str's enc */

    if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) {
        pos = NUM2LONG(initpos);
        if (pos < 0 && (pos += len) < 0) {
            if (RB_TYPE_P(sub, T_REGEXP)) {
                rb_backref_set(Qnil);
            }
            return Qnil;
        }
        if (pos > len) pos = len;
    }
    else {
        pos = len;
    }

    if (RB_TYPE_P(sub, T_REGEXP)) {
        /* enc = rb_enc_check(str, sub); */
        pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos,
                         enc, single_byte_optimizable(str));

        if (rb_reg_search(sub, str, pos, 1) >= 0) {
            VALUE match = rb_backref_get();
            struct re_registers *regs = RMATCH_REGS(match);
            pos = rb_str_sublen(str, BEG(0));
            return LONG2NUM(pos);
        }
    }
    else {
        StringValue(sub);
        pos = rb_str_rindex(str, sub, pos);
        if (pos >= 0) {
            pos = rb_str_sublen(str, pos);
            return LONG2NUM(pos);
        }
    }
    return Qnil;
}

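Returns the integer position of the last substring that matches the given argument pattern, or nil if none is found.

When pattern is a string, returns the index of the last matching substring in self: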

'foo'.rindex('f')       # => 0
'foo'.rindex('o')       # => 2
'foo'.rindex('oo')      # => 1
'foo'.rindex('ooo')     # => nil
'тест'.rindex('т')      # => 3
'こんにちは'.rindex('ち') # => 3

When pattern is a Regexp, returns the index of the last match in self:

'foo'.rindex(/f/)   # => 0
'foo'.rindex(/o/)   # => 2
'foo'.rindex(/oo/)  # => 1
'foo'.rindex(/ooo/) # => nil

When offset is non-negative, it specifies the maximum starting position in the string at which a match may begin:

'foo'.rindex('o', 0) # => nil
'foo'.rindex('o', 1) # => 1
'foo'.rindex('o', 2) # => 2
'foo'.rindex('o', 3) # => 2

With negative integer argument offset, selects the search position by counting backward from the end of self:

'foo'.rindex('o', -1) # => 2
'foo'.rindex('o', -2) # => 1
'foo'.rindex('o', -3) # => nil
'foo'.rindex('o', -4) # => nil

The last match means the match starting at the last possible position, not the last of the longest matches:

'foo'.rindex(/o+/) # => 2
$~                 # => #<MatchData "o">

To get the last longest match, combine with negative lookbehind:

'foo'.rindex(/(?<!o)o+/) # => 1
$~                       # => #<MatchData "oo">

Or use String#index with negative lookahead:

'foo'.index(/o+(?!.*o)/) # => 1
$~                       # => #<MatchData "oo">
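
As an everyday illustration, rindex can locate the last path separator. This is only a sketch (File.basename is the usual tool for real paths); the helper name file_name is hypothetical.

```ruby
# Hypothetical helper: returns the part of path after the last '/',
# or the whole path when it contains no '/'.
def file_name(path)
  slash = path.rindex('/')
  slash ? path[(slash + 1)..] : path
end

file_name('usr/local/bin/ruby') # => "ruby"
file_name('ruby')               # => "ruby"
```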

rjust(width, pad_string = ' ') → new_string

Source

static VALUE
rb_str_rjust(int argc, VALUE *argv, VALUE str)
{
    return rb_str_justify(argc, argv, str, 'r');
}

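Returns a right-justified copy of self.

If integer argument width is greater than the size (in characters) of self, returns a new string of length width that is a copy of self, right-justified and padded on the left with pad_string: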

'hello'.rjust(10)       # => "     hello"
'hello  '.rjust(10)     # => "   hello  "
'hello'.rjust(10, 'ab') # => "ababahello"
'тест'.rjust(10)        # => "      тест"
'こんにちは'.rjust(10)    # => "     こんにちは"

If width <= self.size, returns a copy of self:

'hello'.rjust(5, 'ab')  # => "hello"
'hello'.rjust(1, 'ab')  # => "hello"
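
One frequent use is zero-padding numbers for display. A minimal sketch; the helper name zero_pad is hypothetical, and the last line shows a limitation with negative numbers.

```ruby
# Hypothetical helper: left-pads the decimal form of number with zeros.
def zero_pad(number, width)
  number.to_s.rjust(width, '0')
end

zero_pad(42, 5)     # => "00042"
zero_pad(123456, 5) # => "123456" (never truncated)
zero_pad(-7, 3)     # => "0-7"    (padding goes before the sign)
```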

rpartition(pattern) → [pre_match, last_match, post_match]

Source

static VALUE
rb_str_rpartition(VALUE str, VALUE sep)
{
    long pos = RSTRING_LEN(str);

    sep = get_pat_quoted(sep, 0);
    if (RB_TYPE_P(sep, T_REGEXP)) {
        if (rb_reg_search(sep, str, pos, 1) < 0) {
            goto failed;
        }
        VALUE match = rb_backref_get();
        struct re_registers *regs = RMATCH_REGS(match);

        pos = BEG(0);
        sep = rb_str_subseq(str, pos, END(0) - pos);
    }
    else {
        pos = rb_str_sublen(str, pos);
        pos = rb_str_rindex(str, sep, pos);
        if (pos < 0) {
            goto failed;
        }
    }

    return rb_ary_new3(3, rb_str_subseq(str, 0, pos),
                          sep,
                          rb_str_subseq(str, pos+RSTRING_LEN(sep),
                                        RSTRING_LEN(str)-pos-RSTRING_LEN(sep)));
  failed:
    return rb_ary_new3(3, str_new_empty_String(str), str_new_empty_String(str), str_duplicate(rb_cString, str));
}

Returns a 3-element array of substrings of self.

Searches self for a match of pattern, seeking the last match.

If pattern is not matched, returns the array:

["", "", self.dup]

If pattern is matched, returns the array:

[pre_match, last_match, post_match]

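where:

last_match is the last-found matching substring.
pre_match and post_match are the preceding and following substrings.

The pattern used is:

pattern itself, if it is a Regexp.
Regexp.quote(pattern), if pattern is a string.

Note that in the examples below, a returned string 'hello' is a copy of self, not self itself.

If pattern is a Regexp, searches for the last matching substring (also setting the matched-data global variables):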

'hello'.rpartition(/l/)     # => ["hel", "l", "o"]
'hello'.rpartition(/ll/)    # => ["he", "ll", "o"]
'hello'.rpartition(/h/)     # => ["", "h", "ello"]
'hello'.rpartition(/o/)     # => ["hell", "o", ""]
'hello'.rpartition(//)      # => ["hello", "", ""]
'hello'.rpartition(/x/)     # => ["", "", "hello"]
'тест'.rpartition(/т/)      # => ["тес", "т", ""]
'こんにちは'.rpartition(/に/) # => ["こん", "に", "ちは"]

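If pattern is not a Regexp, converts it to a string (if it is not already one), then searches for the last matching substring (and does not set the matched-data global variables):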

'hello'.rpartition('l')     # => ["hel", "l", "o"]
'hello'.rpartition('ll')    # => ["he", "ll", "o"]
'hello'.rpartition('h')     # => ["", "h", "ello"]
'hello'.rpartition('o')     # => ["hell", "o", ""]
'hello'.rpartition('')      # => ["hello", "", ""]
'тест'.rpartition('т')      # => ["тес", "т", ""]
'こんにちは'.rpartition('に') # => ["こん", "に", "ちは"]
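
Because the last match is used, rpartition is handy for splitting off a file extension. A minimal sketch; the helper name split_extension is hypothetical.

```ruby
# Hypothetical helper: returns [base, extension], splitting at the last '.';
# the extension is empty when the name contains no '.'.
def split_extension(file_name)
  base, dot, extension = file_name.rpartition('.')
  dot.empty? ? [file_name, ''] : [base, extension]
end

split_extension('archive.tar.gz') # => ["archive.tar", "gz"]
split_extension('README')         # => ["README", ""]
```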

Related: see Converting to Non-String.

rstrip(*selectors) → new_string

Source

static VALUE
rb_str_rstrip(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    char *start;
    long olen, roffset;

    enc = STR_ENC_GET(str);
    RSTRING_GETMEM(str, start, olen);
    if (argc > 0) {
        char table[TR_TABLE_SIZE];
        VALUE del = 0, nodel = 0;

        tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
        roffset = rstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
    }
    else {
        roffset = rstrip_offset(str, start, start+olen, enc);
    }
    if (roffset <= 0) return str_duplicate(rb_cString, str);
    return rb_str_subseq(str, 0, olen-roffset);
}

Returns a copy of self with trailing whitespace removed; see Whitespace in Strings:

whitespace = "\x00\t\n\v\f\r "
s = whitespace + 'abc' + whitespace
s        # => "\u0000\t\n\v\f\r abc\u0000\t\n\v\f\r "
s.rstrip # => "\u0000\t\n\v\f\r abc"

If selectors are given, removes the characters specified by selectors from the end of self:

s = "---abc+++"
s.rstrip("+") # => "---abc"

selectors must be valid character selectors (see Character Selectors), and may use any of their valid forms, including negation, ranges, and escapes:

"01234abc56789".rstrip("0-9")         # => "01234abc"
"01234abc56789".rstrip("0-9", "^4-6") # => "01234abc56"

Related: see Converting to New String.

rstrip!(*selectors) → self or nil

Source

static VALUE
rb_str_rstrip_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    char *start;
    long olen, roffset;

    str_modify_keep_cr(str);
    enc = STR_ENC_GET(str);
    RSTRING_GETMEM(str, start, olen);
    if (argc > 0) {
        char table[TR_TABLE_SIZE];
        VALUE del = 0, nodel = 0;

        tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
        roffset = rstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
    }
    else {
        roffset = rstrip_offset(str, start, start+olen, enc);
    }
    if (roffset > 0) {
        long len = olen - roffset;

        STR_SET_LEN(str, len);
        TERM_FILL(start+len, rb_enc_mbminlen(enc));
        return str;
    }
    return Qnil;
}

Like String#rstrip, except that:

Performs the stripping in self (not in a copy of self).
Returns self if any characters are stripped, nil otherwise.

Related: see Modifying.

scan(pattern) → array_of_results

scan(pattern) {|result| ... } → self

Source

static VALUE
rb_str_scan(VALUE str, VALUE pat)
{
    VALUE result;
    long start = 0;
    long last = -1, prev = 0;
    char *p = RSTRING_PTR(str); long len = RSTRING_LEN(str);

    pat = get_pat_quoted(pat, 1);
    mustnot_broken(str);
    if (!rb_block_given_p()) {
        VALUE ary = rb_ary_new();

        while (!NIL_P(result = scan_once(str, pat, &start, 0))) {
            last = prev;
            prev = start;
            rb_ary_push(ary, result);
        }
        if (last >= 0) rb_pat_search(pat, str, last, 1);
        else rb_backref_set(Qnil);
        return ary;
    }

    while (!NIL_P(result = scan_once(str, pat, &start, 1))) {
        last = prev;
        prev = start;
        rb_yield(result);
        str_mod_check(str, p, len);
    }
    if (last >= 0) rb_pat_search(pat, str, last, 1);
    return str;
}

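Matches a pattern against self:

If pattern is a Regexp, the pattern used is pattern itself.
If pattern is a string, the pattern used is Regexp.quote(pattern).

Generates a collection of matching results and updates the regexp-related global variables:

If the pattern contains no groups, each result is a matched substring.
If the pattern contains groups, each result is an array containing a matched substring for each group.

With no block given, returns an array of the results: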

'cruel world'.scan(/\w+/)      # => ["cruel", "world"]
'cruel world'.scan(/.../)      # => ["cru", "el ", "wor"]
'cruel world'.scan(/(...)/)    # => [["cru"], ["el "], ["wor"]]
'cruel world'.scan(/(..)(..)/) # => [["cr", "ue"], ["l ", "wo"]]
'тест'.scan(/../)              # => ["те", "ст"]
'こんにちは'.scan(/../)         # => ["こん", "にち"]
'abracadabra'.scan('ab')       # => ["ab", "ab"]
'abracadabra'.scan('nosuch')   # => []

With a block given, calls the block with each result; returns self:

'cruel world'.scan(/\w+/) {|w| p w }
# => "cruel"
# => "world"
'cruel world'.scan(/(.)(.)/) {|x, y| p [x, y] }
# => ["c", "r"]
# => ["u", "e"]
# => ["l", " "]
# => ["w", "o"]
# => ["r", "l"]
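
When the pattern has two groups, each result is a two-element array, which converts naturally into a hash. A minimal sketch, assuming "name=number" pairs in the input; the helper name parse_pairs is hypothetical.

```ruby
# Hypothetical helper: collects name=number pairs into a hash with
# symbol keys and integer values.
def parse_pairs(text)
  text.scan(/(\w+)=(\d+)/).to_h { |key, value| [key.to_sym, value.to_i] }
end

parse_pairs('width=80, height=24') # returns { width: 80, height: 24 }
parse_pairs('nothing here')        # returns {}
```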

scrub{|sequence| ... } → new_string

Source

static VALUE
str_scrub(int argc, VALUE *argv, VALUE str)
{
    VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil;
    VALUE new = rb_str_scrub(str, repl);
    return NIL_P(new) ? str_duplicate(rb_cString, str): new;
}

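Returns a copy of self with each invalid byte sequence replaced by the given replacement_string.

With no block given, replaces each invalid sequence with the given replacement_string, or with default_replacement_string if none is given (by default, "�" for a Unicode encoding, '?' otherwise):

"foo\x81\x81bar".scrub                            # => "foo��bar"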
"foo\x81\x81bar".force_encoding('US-ASCII').scrub # => "foo??bar"
"foo\x81\x81bar".scrub('xyzzy')                   # => "fooxyzzyxyzzybar"

With a block given, calls the block with each invalid sequence, and replaces that sequence with the return value of the block:

"foo\x81\x81bar".scrub {|sequence| p sequence; 'XYZZY' } # => "fooXYZZYXYZZYbar"

Output:

"\x81"
"\x81"

scrub!{|sequence| ... } → self

Source

static VALUE
str_scrub_bang(int argc, VALUE *argv, VALUE str)
{
    VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil;
    VALUE new = rb_str_scrub(str, repl);
    if (!NIL_P(new)) rb_str_replace(str, new);
    return str;
}

Like String#scrub, except that:

Any substitutions are made in self.
Returns self.
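
A typical pattern is to scrub only when the string is actually invalid. The sketch below assumes UTF-8 input and a '?' replacement; the helper name ensure_valid! is hypothetical.

```ruby
# Hypothetical helper: makes string valid in its own encoding, in place,
# and returns it.
def ensure_valid!(string, replacement = '?')
  string.scrub!(replacement) unless string.valid_encoding?
  string
end

raw = +"caf\xC3 au lait" # "\xC3" alone is not a complete UTF-8 sequence
raw.valid_encoding?      # => false
ensure_valid!(raw)       # => "caf? au lait"
raw.valid_encoding?      # => true
```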

Related: see Modifying.

setbyte(index, integer) → integer

Source

VALUE
rb_str_setbyte(VALUE str, VALUE index, VALUE value)
{
    long pos = NUM2LONG(index);
    long len = RSTRING_LEN(str);
    char *ptr, *head, *left = 0;
    rb_encoding *enc;
    int cr = ENC_CODERANGE_UNKNOWN, width, nlen;

    if (pos < -len || len <= pos)
        rb_raise(rb_eIndexError, "index %ld out of string", pos);
    if (pos < 0)
        pos += len;

    VALUE v = rb_to_int(value);
    VALUE w = rb_int_and(v, INT2FIX(0xff));
    char byte = (char)(NUM2INT(w) & 0xFF);

    if (!str_independent(str))
        str_make_independent(str);
    enc = STR_ENC_GET(str);
    head = RSTRING_PTR(str);
    ptr = &head[pos];
    if (!STR_EMBED_P(str)) {
        cr = ENC_CODERANGE(str);
        switch (cr) {
          case ENC_CODERANGE_7BIT:
            left = ptr;
            *ptr = byte;
            if (ISASCII(byte)) goto end;
            nlen = rb_enc_precise_mbclen(left, head+len, enc);
            if (!MBCLEN_CHARFOUND_P(nlen))
                ENC_CODERANGE_SET(str, ENC_CODERANGE_BROKEN);
            else
                ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID);
            goto end;
          case ENC_CODERANGE_VALID:
            left = rb_enc_left_char_head(head, ptr, head+len, enc);
            width = rb_enc_precise_mbclen(left, head+len, enc);
            *ptr = byte;
            nlen = rb_enc_precise_mbclen(left, head+len, enc);
            if (!MBCLEN_CHARFOUND_P(nlen))
                ENC_CODERANGE_SET(str, ENC_CODERANGE_BROKEN);
            else if (MBCLEN_CHARFOUND_LEN(nlen) != width || ISASCII(byte))
                ENC_CODERANGE_CLEAR(str);
            goto end;
        }
    }
    ENC_CODERANGE_CLEAR(str);
    *ptr = byte;

  end:
    return value;
}

Sets the byte at zero-based offset index to the value of the given integer; returns integer:

s = 'xyzzy'
s.setbyte(2, 129) # => 129
s                 # => "xy\x81zy"
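
Judging from the source above, a negative index counts from the end, only the low 8 bits of the value are stored, and an out-of-range index raises IndexError. The sketch below illustrates this on a copy; the helper name overwrite_byte is hypothetical.

```ruby
# Hypothetical helper: returns a copy of string with one byte overwritten.
def overwrite_byte(string, index, value)
  copy = string.dup
  copy.setbyte(index, value)
  copy
end

overwrite_byte('xyz', -1, 65)   # => "xyA"
overwrite_byte('xyz', 0, 0x141) # => "Ayz" (0x141 & 0xff is 0x41, 'A')

buffer = +'xyz'
buffer.setbyte(0, 0x141)        # => 321 (the argument is returned as given)
```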

Related: see Modifying.

shellescape → string

Source

# File lib/shellwords.rb, line 238
def shellescape
  Shellwords.escape(self)
end

Escapes str so that it can be safely used in a Bourne shell command line.

See Shellwords.shellescape for details.
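
A minimal sketch of building a command line from an untrusted file name; the helper name remove_command is hypothetical, and the method is available only after requiring shellwords.

```ruby
require 'shellwords'

# Hypothetical helper: builds an rm command line for one file name.
def remove_command(file_name)
  "rm -- #{file_name.shellescape}"
end

puts remove_command('my file.txt') # prints: rm -- my\ file.txt
```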

shellsplit → array

Source

# File lib/shellwords.rb, line 227
def shellsplit
  Shellwords.split(self)
end

Splits str into an array of tokens in the same way the UNIX Bourne shell does.

See Shellwords.shellsplit for details.
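
The sketch below splits a quoted command line and then checks that escaping and re-splitting gives the same tokens back; the helper name round_trip is hypothetical.

```ruby
require 'shellwords'

# Hypothetical helper: escapes each token, joins with spaces, and splits again.
def round_trip(tokens)
  tokens.map(&:shellescape).join(' ').shellsplit
end

tokens = "grep -r 'hello world' src/".shellsplit
tokens                       # => ["grep", "-r", "hello world", "src/"]
round_trip(tokens) == tokens # => true
```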

size

Alias for: length

slice

Alias for: []

slice!(index) → new_string or nil

slice!(start, length) → new_string or nil

slice!(range) → new_string or nil

slice!(regexp, capture = 0) → new_string or nil

slice!(substring) → new_string or nil

Source

static VALUE
rb_str_slice_bang(int argc, VALUE *argv, VALUE str)
{
    VALUE result = Qnil;
    VALUE indx;
    long beg, len = 1;
    char *p;

    rb_check_arity(argc, 1, 2);
    str_modify_keep_cr(str);
    indx = argv[0];
    if (RB_TYPE_P(indx, T_REGEXP)) {
        if (rb_reg_search(indx, str, 0, 0) < 0) return Qnil;
        VALUE match = rb_backref_get();
        struct re_registers *regs = RMATCH_REGS(match);
        int nth = 0;
        if (argc > 1 && (nth = rb_reg_backref_number(match, argv[1])) < 0) {
            if ((nth += regs->num_regs) <= 0) return Qnil;
        }
        else if (nth >= regs->num_regs) return Qnil;
        beg = BEG(nth);
        len = END(nth) - beg;
        goto subseq;
    }
    else if (argc == 2) {
        beg = NUM2LONG(indx);
        len = NUM2LONG(argv[1]);
        goto num_index;
    }
    else if (FIXNUM_P(indx)) {
        beg = FIX2LONG(indx);
        if (!(p = rb_str_subpos(str, beg, &len))) return Qnil;
        if (!len) return Qnil;
        beg = p - RSTRING_PTR(str);
        goto subseq;
    }
    else if (RB_TYPE_P(indx, T_STRING)) {
        beg = rb_str_index(str, indx, 0);
        if (beg == -1) return Qnil;
        len = RSTRING_LEN(indx);
        result = str_duplicate(rb_cString, indx);
        goto squash;
    }
    else {
        switch (rb_range_beg_len(indx, &beg, &len, str_strlen(str, NULL), 0)) {
          case Qnil:
            return Qnil;
          case Qfalse:
            beg = NUM2LONG(indx);
            if (!(p = rb_str_subpos(str, beg, &len))) return Qnil;
            if (!len) return Qnil;
            beg = p - RSTRING_PTR(str);
            goto subseq;
          default:
            goto num_index;
        }
    }

  num_index:
    if (!(p = rb_str_subpos(str, beg, &len))) return Qnil;
    beg = p - RSTRING_PTR(str);

  subseq:
    result = rb_str_new(RSTRING_PTR(str)+beg, len);
    rb_enc_cr_str_copy_for_substr(result, str);

  squash:
    if (len > 0) {
        if (beg == 0) {
            rb_str_drop_bytes(str, len);
        }
        else {
            char *sptr = RSTRING_PTR(str);
            long slen = RSTRING_LEN(str);
            if (beg + len > slen) /* pathological check */
                len = slen - beg;
            memmove(sptr + beg,
                    sptr + beg + len,
                    slen - (beg + len));
            slen -= len;
            STR_SET_LEN(str, slen);
            TERM_FILL(&sptr[slen], TERM_LEN(str));
        }
    }
    return result;
}

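Like String#[] (and its alias String#slice), except that:

Removes the selected substring from self (rather than working on a copy of self).
Returns the removed substring if any modification was made, nil otherwise.

A few examples: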

s = 'hello'
s.slice!('e') # => "e"
s             # => "hllo"
s.slice!('e') # => nil
s             # => "hllo"
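
The other call forms listed in the signatures behave the same way. A short sketch covering the (start, length), (regexp, capture), range, and index forms, with an illustrative variable name:

```ruby
text = +'hello world'

text.slice!(0, 6)      # => "hello "
text                   # => "world"

text.slice!(/o(r)/, 1) # => "r" (only the first capture group is removed)
text                   # => "wold"

text.slice!(1..2)      # => "ol"
text                   # => "wd"

text.slice!(10)        # => nil (index out of range, nothing removed)
text                   # => "wd"
```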

split(field_sep = $;, limit = 0) {|substring| ... } → self

Source

static VALUE
rb_str_split_m(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    VALUE spat;
    VALUE limit;
    split_type_t split_type;
    long beg, end, i = 0, empty_count = -1;
    int lim = 0;
    VALUE result, tmp;

    result = rb_block_given_p() ? Qfalse : Qnil;
    if (rb_scan_args(argc, argv, "02", &spat, &limit) == 2) {
        lim = NUM2INT(limit);
        if (lim <= 0) limit = Qnil;
        else if (lim == 1) {
            if (RSTRING_LEN(str) == 0)
                return result ? rb_ary_new2(0) : str;
            tmp = str_duplicate(rb_cString, str);
            if (!result) {
                rb_yield(tmp);
                return str;
            }
            return rb_ary_new3(1, tmp);
        }
        i = 1;
    }
    if (NIL_P(limit) && !lim) empty_count = 0;

    enc = STR_ENC_GET(str);
    split_type = SPLIT_TYPE_REGEXP;
    if (!NIL_P(spat)) {
        spat = get_pat_quoted(spat, 0);
    }
    else if (NIL_P(spat = rb_fs)) {
        split_type = SPLIT_TYPE_AWK;
    }
    else if (!(spat = rb_fs_check(spat))) {
        rb_raise(rb_eTypeError, "value of $; must be String or Regexp");
    }
    else {
        rb_category_warn(RB_WARN_CATEGORY_DEPRECATED, "$; is set to non-nil value");
    }
    if (split_type != SPLIT_TYPE_AWK) {
        switch (BUILTIN_TYPE(spat)) {
          case T_REGEXP:
            rb_reg_options(spat); /* check if uninitialized */
            tmp = RREGEXP_SRC(spat);
            split_type = literal_split_pattern(tmp, SPLIT_TYPE_REGEXP);
            if (split_type == SPLIT_TYPE_AWK) {
                spat = tmp;
                split_type = SPLIT_TYPE_STRING;
            }
            break;

          case T_STRING:
            mustnot_broken(spat);
            split_type = literal_split_pattern(spat, SPLIT_TYPE_STRING);
            break;

          default:
            UNREACHABLE_RETURN(Qnil);
        }
    }

#define SPLIT_STR(beg, len) ( \
        empty_count = split_string(result, str, beg, len, empty_count), \
        str_mod_check(str, str_start, str_len))

    beg = 0;
    char *ptr = RSTRING_PTR(str);
    char *const str_start = ptr;
    const long str_len = RSTRING_LEN(str);
    char *const eptr = str_start + str_len;
    if (split_type == SPLIT_TYPE_AWK) {
        char *bptr = ptr;
        int skip = 1;
        unsigned int c;

        if (result) result = rb_ary_new();
        end = beg;
        if (is_ascii_string(str)) {
            while (ptr < eptr) {
                c = (unsigned char)*ptr++;
                if (skip) {
                    if (ascii_isspace(c)) {
                        beg = ptr - bptr;
                    }
                    else {
                        end = ptr - bptr;
                        skip = 0;
                        if (!NIL_P(limit) && lim <= i) break;
                    }
                }
                else if (ascii_isspace(c)) {
                    SPLIT_STR(beg, end-beg);
                    skip = 1;
                    beg = ptr - bptr;
                    if (!NIL_P(limit)) ++i;
                }
                else {
                    end = ptr - bptr;
                }
            }
        }
        else {
            while (ptr < eptr) {
                int n;

                c = rb_enc_codepoint_len(ptr, eptr, &n, enc);
                ptr += n;
                if (skip) {
                    if (rb_isspace(c)) {
                        beg = ptr - bptr;
                    }
                    else {
                        end = ptr - bptr;
                        skip = 0;
                        if (!NIL_P(limit) && lim <= i) break;
                    }
                }
                else if (rb_isspace(c)) {
                    SPLIT_STR(beg, end-beg);
                    skip = 1;
                    beg = ptr - bptr;
                    if (!NIL_P(limit)) ++i;
                }
                else {
                    end = ptr - bptr;
                }
            }
        }
    }
    else if (split_type == SPLIT_TYPE_STRING) {
        char *substr_start = ptr;
        char *sptr = RSTRING_PTR(spat);
        long slen = RSTRING_LEN(spat);

        if (result) result = rb_ary_new();
        mustnot_broken(str);
        enc = rb_enc_check(str, spat);
        while (ptr < eptr &&
               (end = rb_memsearch(sptr, slen, ptr, eptr - ptr, enc)) >= 0) {
            /* Check we are at the start of a char */
            char *t = rb_enc_right_char_head(ptr, ptr + end, eptr, enc);
            if (t != ptr + end) {
                ptr = t;
                continue;
            }
            SPLIT_STR(substr_start - str_start, (ptr+end) - substr_start);
            str_mod_check(spat, sptr, slen);
            ptr += end + slen;
            substr_start = ptr;
            if (!NIL_P(limit) && lim <= ++i) break;
        }
        beg = ptr - str_start;
    }
    else if (split_type == SPLIT_TYPE_CHARS) {
        int n;

        if (result) result = rb_ary_new_capa(RSTRING_LEN(str));
        mustnot_broken(str);
        enc = rb_enc_get(str);
        while (ptr < eptr &&
               (n = rb_enc_precise_mbclen(ptr, eptr, enc)) > 0) {
            SPLIT_STR(ptr - str_start, n);
            ptr += n;
            if (!NIL_P(limit) && lim <= ++i) break;
        }
        beg = ptr - str_start;
    }
    else {
        if (result) result = rb_ary_new();
        long len = RSTRING_LEN(str);
        long start = beg;
        long idx;
        int last_null = 0;
        struct re_registers *regs;
        VALUE match = 0;

        for (; rb_reg_search(spat, str, start, 0) >= 0;
             (match ? (rb_match_unbusy(match), rb_backref_set(match)) : (void)0)) {
            match = rb_backref_get();
            if (!result) rb_match_busy(match);
            regs = RMATCH_REGS(match);
            end = BEG(0);
            if (start == end && BEG(0) == END(0)) {
                if (!ptr) {
                    SPLIT_STR(0, 0);
                    break;
                }
                else if (last_null == 1) {
                    SPLIT_STR(beg, rb_enc_fast_mbclen(ptr+beg, eptr, enc));
                    beg = start;
                }
                else {
                    if (start == len)
                        start++;
                    else
                        start += rb_enc_fast_mbclen(ptr+start,eptr,enc);
                    last_null = 1;
                    continue;
                }
            }
            else {
                SPLIT_STR(beg, end-beg);
                beg = start = END(0);
            }
            last_null = 0;

            for (idx=1; idx < regs->num_regs; idx++) {
                if (BEG(idx) == -1) continue;
                SPLIT_STR(BEG(idx), END(idx)-BEG(idx));
            }
            if (!NIL_P(limit) && lim <= ++i) break;
        }
        if (match) rb_match_unbusy(match);
    }
    if (RSTRING_LEN(str) > 0 && (!NIL_P(limit) || RSTRING_LEN(str) > beg || lim < 0)) {
        SPLIT_STR(beg, RSTRING_LEN(str)-beg);
    }

    return result ? result : str;
}

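Creates an array of substrings by splitting self at each occurrence of the given field separator field_sep.

With no arguments given, splits using the field separator $;, whose default value is nil.

With no block given, returns the array of substrings: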

'abracadabra'.split('a') # => ["", "br", "c", "d", "br"]

When field_sep is nil or ' ' (a single space), splits at each sequence of whitespace:

'foo bar baz'.split(nil)          # => ["foo", "bar", "baz"]
'foo bar baz'.split(' ')          # => ["foo", "bar", "baz"]
"foo \n\tbar\t\n  baz".split(' ') # => ["foo", "bar", "baz"]
'foo  bar   baz'.split(' ')       # => ["foo", "bar", "baz"]
''.split(' ')                     # => []

When field_sep is an empty string, splits at every character:

'abracadabra'.split('') # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
''.split('')            # => []
'тест'.split('')        # => ["т", "е", "с", "т"]
'こんにちは'.split('')   # => ["こ", "ん", "に", "ち", "は"]

When field_sep is a non-empty string other than ' ' (a single space), uses that string as the separator:

'abracadabra'.split('a')  # => ["", "br", "c", "d", "br"]
'abracadabra'.split('ab') # => ["", "racad", "ra"]
''.split('a')             # => []
'тест'.split('т')         # => ["", "ес"]
'こんにちは'.split('に')    # => ["こん", "ちは"]

When field_sep is a Regexp, splits at each occurrence of a matching substring:

'abracadabra'.split(/ab/) # => ["", "racad", "ra"]
'1 + 1 == 2'.split(/\W+/) # => ["1", "1", "2"]
'abracadabra'.split(//)   # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]

If the Regexp contains groups, their matches are included in the returned array:

'1:2:3'.split(/(:)()()/, 2) # => ["1", ":", "", "", "2:3"]

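Argument limit sets a limit on the size of the returned array; it also determines whether trailing empty strings are included in the returned array.

When limit is zero, there is no limit on the size of the array, but trailing empty strings are omitted: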

'abracadabra'.split('', 0)  # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]
'abracadabra'.split('a', 0) # => ["", "br", "c", "d", "br"]  # Empty string after last 'a' omitted.

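When limit is a positive integer, the size of the array is limited to limit elements (at most limit - 1 splits occur), and trailing empty strings are included: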

'abracadabra'.split('', 3)   # => ["a", "b", "racadabra"]
'abracadabra'.split('a', 3)  # => ["", "br", "cadabra"]
'abracadabra'.split('', 30)  # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""]
'abracadabra'.split('a', 30) # => ["", "br", "c", "d", "br", ""]
'abracadabra'.split('', 1)   # => ["abracadabra"]
'abracadabra'.split('a', 1)  # => ["abracadabra"]

When limit is negative, there is no limit on the size of the array, and trailing empty strings are not omitted:

'abracadabra'.split('', -1)  # => ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a", ""]
'abracadabra'.split('a', -1) # => ["", "br", "c", "d", "br", ""]
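
The effect of limit on trailing empty strings is easiest to see on a delimited record whose last fields are empty. The following is a minimal sketch; the sample record and the variable names are made up for illustration.

```ruby
# A comma-separated record whose last two fields are empty.
record = 'a,b,,'

default_fields = record.split(',')     # No limit: trailing empty strings are dropped.
all_fields     = record.split(',', -1) # Negative limit: trailing empty strings are kept.
two_fields     = record.split(',', 2)  # Positive limit: at most 2 elements (1 split).

p default_fields # => ["a", "b"]
p all_fields     # => ["a", "b", "", ""]
p two_fields     # => ["a", "b,,"]
```

A negative limit is therefore the usual choice when the number of fields matters, for example when parsing fixed-width records.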

With a block given, calls the block with each substring and returns self:

'foo bar baz'.split(' ') {|substring| p substring }

Output:

"foo"
"bar"
"baz"

Note that the above example is functionally equivalent to:

'foo bar baz'.split(' ').each {|substring| p substring }

Output:

"foo"
"bar"
"baz"

But the latter:

Has poorer performance because it creates an intermediate array.
Returns an array (instead of self).
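
The difference in return values can be checked directly. This is a small sketch; the variable names are illustrative only.

```ruby
source  = 'foo bar baz'
lengths = []

# With a block, split yields each substring and returns the receiver itself.
returned = source.split(' ') { |word| lengths << word.length }

p lengths                 # => [3, 3, 3]
p returned.equal?(source) # => true

# Without a block, an array of substrings is returned instead.
p source.split(' ').class # => Array
```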

Related: see Converting to Non-String.

squeeze(*selectors) → new_string

Source

static VALUE
rb_str_squeeze(int argc, VALUE *argv, VALUE str)
{
    str = str_duplicate(rb_cString, str);
    rb_str_squeeze_bang(argc, argv, str);
    return str;
}

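Returns a copy of self with each tuple (doubled, tripled, and so on) of the specified characters "squeezed" down to a single character.

The tuples to be squeezed are specified by arguments selectors, each of which is a string; see Character Selectors.

A single argument may be a single character: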

'Noooooo!'.squeeze('o')      # => "No!"
'foo  bar  baz'.squeeze(' ') # => "foo bar baz"
'Mississippi'.squeeze('s')   # => "Misisippi"
'Mississippi'.squeeze('p')   # => "Mississipi"
'Mississippi'.squeeze('x')   # => "Mississippi"  # Unused selector character is ignored.
'бессонница'.squeeze('с')    # => "бесонница"
'бессонница'.squeeze('н')    # => "бессоница"

A single argument may be a string of characters:

'Mississippi'.squeeze('sp')       # => "Misisipi"
'Mississippi'.squeeze('ps')       # => "Misisipi"   # Order doesn't matter.
'Mississippi'.squeeze('nonsense') # => "Misisippi"  # Unused selector characters are ignored.

A single argument may be a range of characters:

'Mississippi'.squeeze('a-p') # => "Mississipi"
'Mississippi'.squeeze('q-z') # => "Misisippi"
'Mississippi'.squeeze('a-z') # => "Misisipi"

Multiple arguments are allowed; see Multiple Character Selectors.
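
As a brief sketch of the multiple-selector case, assuming (as the tr-style selector tables in the C source suggest) that several selectors are combined by intersection, so that only characters matched by every selector are squeezed:

```ruby
s = 'Mississippi'

no_selector  = s.squeeze               # No selector: every run is squeezed.
intersection = s.squeeze('a-s', 'm-z') # Intersection is m-s, which covers 'p' and 's'.
negated      = s.squeeze('a-z', '^s')  # Lowercase letters except 's'.

p no_selector  # => "Misisipi"
p intersection # => "Misisipi"
p negated      # => "Mississipi"
```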

Related: see Converting to New String.

squeeze!(*selectors) → self or nil

Source

static VALUE
rb_str_squeeze_bang(int argc, VALUE *argv, VALUE str)
{
    char squeez[TR_TABLE_SIZE];
    rb_encoding *enc = 0;
    VALUE del = 0, nodel = 0;
    unsigned char *s, *send, *t;
    int i, modify = 0;
    int ascompat, singlebyte = single_byte_optimizable(str);
    unsigned int save;

    if (argc == 0) {
        enc = STR_ENC_GET(str);
    }
    else {
        for (i=0; i<argc; i++) {
            VALUE s = argv[i];

            StringValue(s);
            enc = rb_enc_check(str, s);
            if (singlebyte && !single_byte_optimizable(s))
                singlebyte = 0;
            tr_setup_table(s, squeez, i==0, &del, &nodel, enc);
        }
    }

    str_modify_keep_cr(str);
    s = t = (unsigned char *)RSTRING_PTR(str);
    if (!s || RSTRING_LEN(str) == 0) return Qnil;
    send = (unsigned char *)RSTRING_END(str);
    save = -1;
    ascompat = rb_enc_asciicompat(enc);

    if (singlebyte) {
        while (s < send) {
            unsigned int c = *s++;
            if (c != save || (argc > 0 && !squeez[c])) {
                *t++ = save = c;
            }
        }
    }
    else {
        while (s < send) {
            unsigned int c;
            int clen;

            if (ascompat && (c = *s) < 0x80) {
                if (c != save || (argc > 0 && !squeez[c])) {
                    *t++ = save = c;
                }
                s++;
            }
            else {
                c = rb_enc_codepoint_len((char *)s, (char *)send, &clen, enc);

                if (c != save || (argc > 0 && !tr_find(c, squeez, del, nodel))) {
                    if (t != s) rb_enc_mbcput(c, t, enc);
                    save = c;
                    t += clen;
                }
                s += clen;
            }
        }
    }

    TERM_FILL((char *)t, TERM_LEN(str));
    if ((char *)t - RSTRING_PTR(str) != RSTRING_LEN(str)) {
        STR_SET_LEN(str, (char *)t - RSTRING_PTR(str));
        modify = 1;
    }

    if (modify) return str;
    return Qnil;
}

Like String#squeeze, except that:

Characters are squeezed in self (not in a copy of self).
Returns self if any changes are made, nil otherwise.
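
The nil return value is easy to overlook, so here is a short sketch of both outcomes (the variable names are illustrative):

```ruby
changed   = 'aabbcc'.dup # dup gives a mutable copy of the literal.
unchanged = 'abc'.dup

changed_result   = changed.squeeze!
unchanged_result = unchanged.squeeze!

p changed_result   # => "abc"
p unchanged_result # => nil
p unchanged        # => "abc"  (left as it was)
```

Because of the nil case, chaining further calls onto squeeze! is fragile; squeeze is the safer choice in a method chain.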

Related: see Modifying.

start_with?(*patterns) → true or false

Source

static VALUE
rb_str_start_with(int argc, VALUE *argv, VALUE str)
{
    int i;

    for (i=0; i<argc; i++) {
        VALUE tmp = argv[i];
        if (RB_TYPE_P(tmp, T_REGEXP)) {
            if (rb_reg_start_with_p(tmp, str))
                return Qtrue;
        }
        else {
            const char *p, *s, *e;
            long slen, tlen;
            rb_encoding *enc;

            StringValue(tmp);
            enc = rb_enc_check(str, tmp);
            if ((tlen = RSTRING_LEN(tmp)) == 0) return Qtrue;
            if ((slen = RSTRING_LEN(str)) < tlen) continue;
            p = RSTRING_PTR(str);
            e = p + slen;
            s = p + tlen;
            if (!at_char_right_boundary(p, s, e, enc))
                continue;
            if (memcmp(p, RSTRING_PTR(tmp), tlen) == 0)
                return Qtrue;
        }
    }
    return Qfalse;
}

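Returns whether self starts with any of the given patterns.

For each argument, the pattern used is:

The pattern itself, if it is a Regexp.
Regexp.quote(pattern), if it is a string.

Returns true if any pattern matches the beginning, false otherwise: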

'hello'.start_with?('hell')               # => true
'hello'.start_with?(/H/i)                 # => true
'hello'.start_with?('heaven', 'hell')     # => true
'hello'.start_with?('heaven', 'paradise') # => false
'тест'.start_with?('т')                   # => true
'こんにちは'.start_with?('こ')              # => true
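
The distinction between string and Regexp patterns matters when the pattern contains Regexp metacharacters. A small sketch, with the edge cases read off the C source above (an empty string pattern returns true immediately; with no patterns the loop never runs):

```ruby
literal_dot   = 'a.c'.start_with?('a.') # String pattern: '.' is matched literally.
no_wildcard   = 'abc'.start_with?('a.') # ...so it is not a wildcard here.
regexp_dot    = 'abc'.start_with?(/a./) # Regexp pattern: '.' matches any character.
empty_pattern = 'abc'.start_with?('')   # An empty string pattern always matches.
no_patterns   = 'abc'.start_with?       # No patterns given.

p literal_dot   # => true
p no_wildcard   # => false
p regexp_dot    # => true
p empty_pattern # => true
p no_patterns   # => false
```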

Related: see Querying.

strip(*selectors) → new_string

Source

static VALUE
rb_str_strip(int argc, VALUE *argv, VALUE str)
{
    char *start;
    long olen, loffset, roffset;
    rb_encoding *enc = STR_ENC_GET(str);

    RSTRING_GETMEM(str, start, olen);

    if (argc > 0) {
        char table[TR_TABLE_SIZE];
        VALUE del = 0, nodel = 0;

        tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
        loffset = lstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
        roffset = rstrip_offset_table(str, start+loffset, start+olen, enc, table, del, nodel);
    }
    else {
        loffset = lstrip_offset(str, start, start+olen, enc);
        roffset = rstrip_offset(str, start+loffset, start+olen, enc);
    }

    if (loffset <= 0 && roffset <= 0) return str_duplicate(rb_cString, str);
    return rb_str_subseq(str, loffset, olen-loffset-roffset);
}

Returns a copy of self with leading and trailing whitespace removed; see Whitespace in Strings:

whitespace = "\x00\t\n\v\f\r "
s = whitespace + 'abc' + whitespace
# => "\u0000\t\n\v\f\r abc\u0000\t\n\v\f\r "
s.strip # => "abc"

If selectors are given, removes the characters specified by selectors from both ends of self:

s = "---abc+++"
s.strip("-+") # => "abc"
s.strip("+-") # => "abc"

selectors must be valid character selectors (see Character Selectors), and may use any of their valid forms, including negation, ranges, and escapes:

"01234abc56789".strip("0-9")         # => "abc"
"01234abc56789".strip("0-9", "^4-6") # => "4abc56"

Related: see Converting to New String.

strip!(*selectors) → self or nil

Source

static VALUE
rb_str_strip_bang(int argc, VALUE *argv, VALUE str)
{
    char *start;
    long olen, loffset, roffset;
    rb_encoding *enc;

    str_modify_keep_cr(str);
    enc = STR_ENC_GET(str);
    RSTRING_GETMEM(str, start, olen);

    if (argc > 0) {
        char table[TR_TABLE_SIZE];
        VALUE del = 0, nodel = 0;

        tr_setup_table_multi(table, &del, &nodel, str, argc, argv);
        loffset = lstrip_offset_table(str, start, start+olen, enc, table, del, nodel);
        roffset = rstrip_offset_table(str, start+loffset, start+olen, enc, table, del, nodel);
    }
    else {
        loffset = lstrip_offset(str, start, start+olen, enc);
        roffset = rstrip_offset(str, start+loffset, start+olen, enc);
    }

    if (loffset > 0 || roffset > 0) {
        long len = olen-roffset;
        if (loffset > 0) {
            len -= loffset;
            memmove(start, start + loffset, len);
        }
        STR_SET_LEN(str, len);
        TERM_FILL(start+len, rb_enc_mbminlen(enc));
        return str;
    }
    return Qnil;
}

Like String#strip, except that:

Any modifications are made to self.
Returns self if any modifications are made, nil otherwise.
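
A short sketch of the two return values, using only the no-argument form (the variable names are illustrative):

```ruby
padded = "  hello \n".dup # dup gives a mutable copy of the literal.
clean  = 'hello'.dup

padded_result = padded.strip!
clean_result  = clean.strip!

p padded_result # => "hello"
p padded        # => "hello"  (self was modified)
p clean_result  # => nil      (nothing to strip)

# Because of the nil case, prefer strip when chaining further calls.
p clean.strip.upcase # => "HELLO"
```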

Related: see Modifying.

sub(pattern, replacement) → new_string

sub(pattern) {|match| ... } → new_string

Source

static VALUE
rb_str_sub(int argc, VALUE *argv, VALUE str)
{
    str = str_duplicate(rb_cString, str);
    rb_str_sub_bang(argc, argv, str);
    return str;
}

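Returns a copy of self, possibly with a substring replaced.

Argument pattern may be a string or a Regexp; argument replacement may be a string or a Hash.

The variety of accepted argument types makes this method very versatile.

Below are some simple examples; for many more examples, see Substitution Methods.

With arguments pattern and string replacement given, replaces the first matching substring with the given replacement string: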

s = 'abracadabra'       # => "abracadabra"
s.sub('bra', 'xyzzy')   # => "axyzzycadabra"
s.sub(/bra/, 'xyzzy')   # => "axyzzycadabra"
s.sub('nope', 'xyzzy')  # => "abracadabra"

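With arguments pattern and hash replacement given, replaces the first matching substring with the corresponding value from the given replacement hash, or removes it if the hash has no such key: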

h = {'a' => 'A', 'b' => 'B', 'c' => 'C'}
s.sub('b', h)  # => "aBracadabra"
s.sub(/b/, h)  # => "aBracadabra"
s.sub(/d/, h)  # => "abracaabra"  # 'd' removed.

With argument pattern and a block given, calls the block with the first matching substring; replaces that substring with the block's return value:

s.sub('b') {|match| match.upcase } # => "aBracadabra"
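
With a Regexp pattern, the replacement string may also refer to captured groups, and a block can read them from the match globals. The following sketch uses a made-up sample name; the details of replacement-string escapes are covered under Substitution Methods.

```ruby
name = 'John Smith'

# Numbered groups in the replacement string.
numbered = name.sub(/(\w+) (\w+)/, '\2, \1')

# Named groups in the replacement string.
named = name.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')

# The same swap with a block, reading the groups from $1 and $2.
with_block = name.sub(/(\w+) (\w+)/) { "#{$2}, #{$1}" }

# Only the first match is replaced.
first_only = 'hello world'.sub('o', '0')

p numbered   # => "Smith, John"
p named      # => "Smith, John"
p with_block # => "Smith, John"
p first_only # => "hell0 world"
```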

sub!(pattern, replacement) → self or nil

sub!(pattern) {|match| ... } → self or nil

Source

static VALUE
rb_str_sub_bang(int argc, VALUE *argv, VALUE str)
{
    VALUE pat, repl, hash = Qnil;
    int iter = 0;
    long plen;
    int min_arity = rb_block_given_p() ? 1 : 2;
    long beg;

    rb_check_arity(argc, min_arity, 2);
    if (argc == 1) {
        iter = 1;
    }
    else {
        repl = argv[1];
        hash = rb_check_hash_type(argv[1]);
        if (NIL_P(hash)) {
            StringValue(repl);
        }
    }

    pat = get_pat_quoted(argv[0], 1);

    str_modifiable(str);
    beg = rb_pat_search(pat, str, 0, 1);
    if (beg >= 0) {
        rb_encoding *enc;
        int cr = ENC_CODERANGE(str);
        long beg0, end0;
        VALUE match, match0 = Qnil;
        struct re_registers *regs;
        char *p, *rp;
        long len, rlen;

        match = rb_backref_get();
        regs = RMATCH_REGS(match);
        if (RB_TYPE_P(pat, T_STRING)) {
            beg0 = beg;
            end0 = beg0 + RSTRING_LEN(pat);
            match0 = pat;
        }
        else {
            beg0 = BEG(0);
            end0 = END(0);
            if (iter) match0 = rb_reg_nth_match(0, match);
        }

        if (iter || !NIL_P(hash)) {
            p = RSTRING_PTR(str); len = RSTRING_LEN(str);

            if (iter) {
                repl = rb_obj_as_string(rb_yield(match0));
            }
            else {
                repl = rb_hash_aref(hash, rb_str_subseq(str, beg0, end0 - beg0));
                repl = rb_obj_as_string(repl);
            }
            str_mod_check(str, p, len);
            rb_check_frozen(str);
        }
        else {
            repl = rb_reg_regsub(repl, str, regs, RB_TYPE_P(pat, T_STRING) ? Qnil : pat);
        }

        enc = rb_enc_compatible(str, repl);
        if (!enc) {
            rb_encoding *str_enc = STR_ENC_GET(str);
            p = RSTRING_PTR(str); len = RSTRING_LEN(str);
            if (coderange_scan(p, beg0, str_enc) != ENC_CODERANGE_7BIT ||
                coderange_scan(p+end0, len-end0, str_enc) != ENC_CODERANGE_7BIT) {
                rb_raise(rb_eEncCompatError, "incompatible character encodings: %s and %s",
                         rb_enc_inspect_name(str_enc),
                         rb_enc_inspect_name(STR_ENC_GET(repl)));
            }
            enc = STR_ENC_GET(repl);
        }
        rb_str_modify(str);
        rb_enc_associate(str, enc);
        if (ENC_CODERANGE_UNKNOWN < cr && cr < ENC_CODERANGE_BROKEN) {
            int cr2 = ENC_CODERANGE(repl);
            if (cr2 == ENC_CODERANGE_BROKEN ||
                (cr == ENC_CODERANGE_VALID && cr2 == ENC_CODERANGE_7BIT))
                cr = ENC_CODERANGE_UNKNOWN;
            else
                cr = cr2;
        }
        plen = end0 - beg0;
        rlen = RSTRING_LEN(repl);
        len = RSTRING_LEN(str);
        if (rlen > plen) {
            RESIZE_CAPA(str, len + rlen - plen);
        }
        p = RSTRING_PTR(str);
        if (rlen != plen) {
            memmove(p + beg0 + rlen, p + beg0 + plen, len - beg0 - plen);
        }
        rp = RSTRING_PTR(repl);
        memmove(p + beg0, rp, rlen);
        len += rlen - plen;
        STR_SET_LEN(str, len);
        TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
        ENC_CODERANGE_SET(str, cr);

        RB_GC_GUARD(match);

        return str;
    }
    return Qnil;
}

Like String#sub, except that:

Changes are made to self, not to a copy of self.
Returns self if any changes are made, nil otherwise.
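
The nil return value makes sub! usable as a condition. A minimal sketch with a made-up file name:

```ruby
path = 'report.txt'.dup # dup gives a mutable copy of the literal.

first_try  = path.sub!(/\.txt\z/, '.md') # Substitution made: returns self.
second_try = path.sub!(/\.txt\z/, '.md') # Nothing left to substitute: returns nil.

if first_try
  puts "renamed to #{path}"
end

p path       # => "report.md"
p second_try # => nil
```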

Related: see Modifying.

succ → new_str

Source

VALUE
rb_str_succ(VALUE orig)
{
    VALUE str;
    str = rb_str_new(RSTRING_PTR(orig), RSTRING_LEN(orig));
    rb_enc_cr_str_copy_for_substr(str, orig);
    return str_succ(str);
}

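Returns the successor to self. The successor is calculated by incrementing characters.

The first character to be incremented is the rightmost alphanumeric character or, if there are no alphanumeric characters, the rightmost character: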

'THX1138'.succ   # => "THX1139"
'<<koala>>'.succ # => "<<koalb>>"
'***'.succ       # => '**+'
'тест'.succ      # => "тесу"
'こんにちは'.succ  # => "こんにちば"

The successor to a digit is another digit, "carrying" to the next-left character for a "rollover" from 9 to 0, and prepending another digit if necessary:

'00'.succ # => "01"
'09'.succ # => "10"
'99'.succ # => "100"

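The successor to a letter is another letter of the same case, carrying to the next-left character for a rollover, and prepending another same-case letter if necessary: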

'aa'.succ # => "ab"
'az'.succ # => "ba"
'zz'.succ # => "aaa"
'AA'.succ # => "AB"
'AZ'.succ # => "BA"
'ZZ'.succ # => "AAA"

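The successor to a non-alphanumeric character is the next character in the underlying character set's collating sequence, carrying to the next-left character for a rollover, and prepending another character if necessary: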

s = 0.chr * 3   # => "\x00\x00\x00"
s.succ        # => "\x00\x00\x01"
s = 255.chr * 3 # => "\xFF\xFF\xFF"
s.succ        # => "\x01\x00\x00\x00"

Carrying can occur between and among mixtures of alphanumeric characters:

s = 'zz99zz99' # => "zz99zz99"
s.succ         # => "aaa00aa00"
s = '99zz99zz' # => "99zz99zz"
s.succ         # => "100aa00aa"
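
These carrying rules are what make string ranges and label sequences work. A brief sketch (the label values are made up for illustration):

```ruby
# Mixed letters and digits carry across character classes.
p '1999zzz'.succ # => "2000aaa"
p 'ZZZ9999'.succ # => "AAAA0000"

# Generating a short run of labels by repeated succ.
label  = 'a8'
labels = []
4.times do
  labels << label
  label = label.succ
end
p labels # => ["a8", "a9", "b0", "b1"]

# A Range of strings steps through successors in the same way.
p ('x8'..'y1').to_a # => ["x8", "x9", "y0", "y1"]
```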

The successor to an empty String is a new empty String:

''.succ # => ""

Related: see Converting to New String.

Also aliased as: next

succ! → self

Source

static VALUE
rb_str_succ_bang(VALUE str)
{
    rb_str_modify(str);
    str_succ(str);
    return str;
}

Like String#succ, but modifies self in place; returns self.

Related: see Modifying.

Also aliased as: next!

sum(n = 16) → integer

Source

static VALUE
rb_str_sum(int argc, VALUE *argv, VALUE str)
{
    int bits = 16;
    char *ptr, *p, *pend;
    long len;
    VALUE sum = INT2FIX(0);
    unsigned long sum0 = 0;

    if (rb_check_arity(argc, 0, 1) && (bits = NUM2INT(argv[0])) < 0) {
        bits = 0;
    }
    ptr = p = RSTRING_PTR(str);
    len = RSTRING_LEN(str);
    pend = p + len;

    while (p < pend) {
        if (FIXNUM_MAX - UCHAR_MAX < sum0) {
            sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
            str_mod_check(str, ptr, len);
            sum0 = 0;
        }
        sum0 += (unsigned char)*p;
        p++;
    }

    if (bits == 0) {
        if (sum0) {
            sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
        }
    }
    else {
        if (sum == INT2FIX(0)) {
            if (bits < (int)sizeof(long)*CHAR_BIT) {
                sum0 &= (((unsigned long)1)<<bits)-1;
            }
            sum = LONG2FIX(sum0);
        }
        else {
            VALUE mod;

            if (sum0) {
                sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
            }

            mod = rb_funcall(INT2FIX(1), idLTLT, 1, INT2FIX(bits));
            mod = rb_funcall(mod, '-', 1, INT2FIX(1));
            sum = rb_funcall(sum, '&', 1, mod);
        }
    }
    return sum;
}

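Returns a basic n-bit checksum of the characters in self; the checksum is the sum of the binary value of each byte in self, modulo 2**n (that is, the sum masked with 2**n - 1). If n is zero or negative, the sum is returned without reduction: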

'hello'.sum     # => 532
'hello'.sum(4)  # => 4
'hello'.sum(64) # => 532
'тест'.sum      # => 1405
'こんにちは'.sum  # => 2582
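
Reading the C source above, the reduction step is a bitwise AND with (1 << n) - 1. The following sketch re-implements that in Ruby for non-negative n; byte_checksum is a hypothetical helper written for this illustration, not part of String.

```ruby
# Hypothetical helper: sums the bytes, then keeps only the low `bits` bits.
# A `bits` value of zero means "no reduction", mirroring the C code.
def byte_checksum(string, bits = 16)
  total = string.bytes.sum
  bits.zero? ? total : total & ((1 << bits) - 1)
end

p byte_checksum('hello')    # => 532
p byte_checksum('hello', 4) # => 4    (532 & 0b1111)
p byte_checksum('hello', 0) # => 532  (unreduced)

# The helper agrees with String#sum on these inputs.
p byte_checksum('hello', 4) == 'hello'.sum(4) # => true
```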

This is not a particularly strong checksum.

Related: see Querying.

swapcase(mapping = :ascii) → new_string

Source

static VALUE
rb_str_swapcase(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_DOWNCASE;
    VALUE ret;

    flags = check_case_options(argc, argv, flags);
    enc = str_true_enc(str);
    if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return str_duplicate(rb_cString, str);
    if (flags&ONIGENC_CASE_ASCII_ONLY) {
        ret = rb_str_new(0, RSTRING_LEN(str));
        rb_str_ascii_casemap(str, ret, &flags, enc);
    }
    else {
        ret = rb_str_casemap(str, &flags, enc);
    }
    return ret;
}

Returns a string containing the characters in self, with cases reversed:

Each uppercase character is downcased.
Each lowercase character is upcased.

Examples:

'Hello'.swapcase        # => "hELLO"
'Straße'.swapcase       # => "sTRASSE"
'Привет'.swapcase       # => "пРИВЕТ"
'RubyGems.org'.swapcase # => "rUBYgEMS.ORG"

The sizes of self and the swapcased result may differ:

s = 'Straße'
s.size          # => 6
s.swapcase      # => "sTRASSE"
s.swapcase.size # => 7

Some characters (and some character sets) do not have upcase and downcase versions; see Case Mapping:

s = '1, 2, 3, ...'
s.swapcase == s # => true
s = 'こんにちは'
s.swapcase == s # => true

Casing is affected by the given mapping, which may be :ascii, :fold, or :turkic; see Case Mapping.
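
As a small sketch of how the mapping changes the result, compare the call with no argument against the :ascii mapping, which touches only ASCII letters:

```ruby
s = 'Straße'

full_mapping  = s.swapcase         # No argument: 'ß' is upcased to 'SS'.
ascii_mapping = s.swapcase(:ascii) # Only ASCII letters are swapped; 'ß' is kept.

p full_mapping  # => "sTRASSE"
p ascii_mapping # => "sTRAßE"
```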

Related: see Converting to New String.

swapcase!(mapping) → self or nil

Source

static VALUE
rb_str_swapcase_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_DOWNCASE;

    flags = check_case_options(argc, argv, flags);
    str_modify_keep_cr(str);
    enc = str_true_enc(str);
    if (flags&ONIGENC_CASE_ASCII_ONLY)
        rb_str_ascii_casemap(str, str, &flags, enc);
    else
        str_shared_replace(str, rb_str_casemap(str, &flags, enc));

    if (ONIGENC_CASE_MODIFIED&flags) return str;
    return Qnil;
}

Like String#swapcase, except that:

Changes are made to self, not to a copy of self.
Returns self if any changes are made, nil otherwise.

Related: see Modifying.

to_c → complex

Source

static VALUE
string_to_c(VALUE self)
{
    VALUE num;

    rb_must_asciicompat(self);

    (void)parse_comp(rb_str_fill_terminator(self, 1), FALSE, &num);

    return num;
}

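Returns a Complex object: parses the leading substring of self to extract two numeric values that become the coordinates of the complex object.

The substring is interpreted as containing either rectangular coordinates (real and imaginary parts) or polar coordinates (magnitude and angle parts), depending on an included or implied "separator" character:

'+', '-', or no separator: rectangular coordinates.
'@': polar coordinates.

In Brief

In these examples, we use method Complex#rect to display rectangular coordinates, and method Complex#polar to display polar coordinates.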

# Rectangular coordinates.

# Real-only: no separator; imaginary part is zero.
'9'.to_c.rect         # => [9, 0]         # Integer.
'-9'.to_c.rect        # => [-9, 0]        # Integer (negative).
'2.5'.to_c.rect       # => [2.5, 0]       # Float.
'1.23e-14'.to_c.rect  # => [1.23e-14, 0]  # Float with exponent.
'2.5/1'.to_c.rect     # => [(5/2), 0]     # Rational.

# Some things are ignored.
'foo1'.to_c.rect      # => [0, 0]         # Unparsed entire substring.
'1foo'.to_c.rect      # => [1, 0]         # Unparsed trailing substring.
' 1 '.to_c.rect       # => [1, 0]         # Leading and trailing whitespace.

# Imaginary only: trailing 'i' required; real part is zero.
'9i'.to_c.rect        # => [0, 9]
'-9i'.to_c.rect       # => [0, -9]
'2.5i'.to_c.rect      # => [0, 2.5]
'1.23e-14i'.to_c.rect # => [0, 1.23e-14]
'2.5/1i'.to_c.rect    # => [0, (5/2)]

# Real and imaginary; '+' or '-' separator; trailing 'i' required.
'2+3i'.to_c.rect      # => [2, 3]
'-2-3i'.to_c.rect     # => [-2, -3]
'2.5+3i'.to_c.rect    # => [2.5, 3]
'2.5+3/2i'.to_c.rect  # => [2.5, (3/2)]

# Polar coordinates; '@' separator; magnitude required.
'1.0@0'.to_c.polar             # => [1.0, 0.0]
'1.0@'.to_c.polar              # => [1.0, 0.0]
"1.0@#{Math::PI}".to_c.polar   # => [1.0, 3.141592653589793]
"1.0@#{Math::PI/2}".to_c.polar # => [1.0, 1.5707963267948966]

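Parsed Values

The parsing may be thought of as searching for numeric literals embedded in the substring.

This section shows how the method parses numeric values from leading substrings. The examples show real-only or imaginary-only parsing; the parsing is the same for each part.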

'1foo'.to_c # => (1+0i)      # Ignores trailing unparsed characters.
' 1 '.to_c  # => (1+0i)      # Ignores leading and trailing whitespace.
'x1'.to_c   # => (0+0i)      # Finds no leading numeric.

# Integer literal embedded in the substring.
'1'.to_c       # => (1+0i)
'-1'.to_c      # => (-1+0i)
'1i'.to_c      # => (0+1i)

# Integer literals that don't work.
'0b100'.to_c   # => (0+0i)   # Not parsed as binary.
'0o100'.to_c   # => (0+0i)   # Not parsed as octal.
'0d100'.to_c   # => (0+0i)   # Not parsed as decimal.
'0x100'.to_c   # => (0+0i)   # Not parsed as hexadecimal.
'010'.to_c     # => (10+0i)  # Not parsed as octal.

# Float literals:
'3.14'.to_c    # => (3.14+0i)
'3.14i'.to_c   # => (0+3.14i)
'1.23e4'.to_c  # => (12300.0+0i)
'1.23e+4'.to_c # => (12300.0+0i)
'1.23e-4'.to_c # => (0.000123+0i)

# Rational literals:
'1/2'.to_c     # => ((1/2)+0i)
'-1/2'.to_c    # => ((-1/2)+0i)
'1/2r'.to_c    # => ((1/2)+0i)
'-1/2r'.to_c   # => ((-1/2)+0i)
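
Because to_c never raises on unparsable text, it can hide bad input. The sketch below contrasts it with the stricter Kernel#Complex conversion, which rejects trailing garbage; the variable names are illustrative.

```ruby
lenient = '1foo'.to_c # Parses the leading "1" and ignores the rest.

strict_error =
  begin
    Complex('1foo')   # Strict conversion: the whole string must be valid.
    nil
  rescue ArgumentError => e
    e.class
  end

strict_nil = Complex('1foo', exception: false)

p lenient      # => (1+0i)
p strict_error # => ArgumentError
p strict_nil   # => nil

# A successfully parsed value behaves like any other Complex.
p '3+4i'.to_c.abs # => 5.0
```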

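Rectangular Coordinates

With separator '+' or '-', or with no separator, interprets the values as rectangular coordinates: real and imaginary.

With no separator, assigns a single value to either the real or the imaginary part:

''.to_c   # => (0+0i)  # Defaults to zero.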
'1'.to_c  # => (1+0i)  # Real (no trailing 'i').
'1i'.to_c # => (0+1i)  # Imaginary (trailing 'i').
'i'.to_c  # => (0+1i)  # Special case (imaginary 1).

With separator '+', both parts are positive (or zero):

# Without trailing 'i'.
'+'.to_c    # => (0+0i)  # No values: defaults to zero.
'+1'.to_c   # => (1+0i)  # Value after '+': real only.
'1+'.to_c   # => (1+0i)  # Value before '+': real only.
'2+1'.to_c  # => (2+0i)  # Values before and after '+', but no trailing 'i': real only.
# With trailing 'i'.
'+1i'.to_c  # => (0+1i)  # Value after '+': imaginary only.
'2+i'.to_c  # => (2+1i)  # Value before '+': real and imaginary 1.
'2+1i'.to_c # => (2+1i)  # Values before and after '+': real and imaginary.

With separator '-', the imaginary part is negative:

# Without trailing 'i'.
'-'.to_c    # => (0+0i)   # No values: defaults to zero.
'-1'.to_c   # => (-1+0i)  # Value after '-': negative real, zero imaginary.
'1-'.to_c   # => (1+0i)   # Value before '-': positive real, zero imaginary.
'2-1'.to_c  # => (2+0i)   # Values before and after '-': positive real, zero imaginary.
# With trailing 'i'.
'-1i'.to_c  # => (0-1i)   # Value after '-': zero real, negative imaginary.
'2-i'.to_c  # => (2-1i)   # Value before '-': positive real, negative imaginary.
'2-1i'.to_c # => (2-1i)   # Values before and after '-': positive real, negative imaginary.

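Note that the suffixed character 'i' may instead be one of 'I', 'j', or 'J', with the same effect.

Polar Coordinates

With separator '@', interprets the values as polar coordinates: magnitude and angle: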

'2@'.to_c.polar  # => [2, 0.0]    # Value before '@': magnitude only.
# Values before and after '@': magnitude and angle.
'2@1'.to_c.polar # => [2.0, 1.0]
"1.0@#{Math::PI/2}".to_c # => (0.0+1i)
"1.0@#{Math::PI}".to_c   # => (-1+0.0i)
# Magnitude not given: defaults to zero.
'@'.to_c.polar   # => [0, 0.0]
'@1'.to_c.polar  # => [0, 0.0]

'1.0@0'.to_c             # => (1+0.0i)

Note that in all cases, the suffixed character 'i' may instead be one of 'I', 'j', or 'J', with the same effect.

Related: see Converting to Non-String.

to_f → float

Source

static VALUE
rb_str_to_f(VALUE str)
{
    return DBL2NUM(rb_str_to_dbl(str, FALSE));
}

Returns the result of interpreting leading characters in self as a Float:

'3.14159'.to_f  # => 3.14159
'1.234e-2'.to_f # => 0.01234

Characters past a leading valid number are ignored:

'3.14 (pi to two places)'.to_f # => 3.14

Returns zero if there is no leading valid number:

'abcdef'.to_f # => 0.0
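
Since to_f silently returns 0.0 for unparsable input, a stricter conversion is sometimes wanted. The sketch below contrasts it with Kernel#Float, which accepts only a complete, valid number; the sample strings are illustrative.

```ruby
text = '3.14 (pi to two places)'

lenient      = text.to_f                      # Uses the leading "3.14".
strict       = Float(text, exception: false)  # Whole string must be a number.
strict_clean = Float('3.14')

p lenient      # => 3.14
p strict       # => nil
p strict_clean # => 3.14

# Telling "no number" apart from a genuine zero needs the strict form.
p 'abcdef'.to_f                     # => 0.0
p Float('abcdef', exception: false) # => nil
```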

Related: see Converting to Non-String.

to_i(base = 10) → integer

Source

static VALUE
rb_str_to_i(int argc, VALUE *argv, VALUE str)
{
    int base = 10;

    if (rb_check_arity(argc, 0, 1) && (base = NUM2INT(argv[0])) < 0) {
        rb_raise(rb_eArgError, "invalid radix %d", base);
    }
    return rb_str_to_inum(str, base, FALSE);
}

Returns the result of interpreting leading characters in self as an integer in the given base; base must be either 0 or in range (2..36):

'123456'.to_i     # => 123456
'123def'.to_i(16) # => 1195503

With base zero given, the string may contain leading characters that specify the actual base:

'123def'.to_i(0)   # => 123
'0123def'.to_i(0)  # => 83
'0b123def'.to_i(0) # => 1
'0o123def'.to_i(0) # => 83
'0d123def'.to_i(0) # => 123
'0x123def'.to_i(0) # => 1195503

Characters past a leading valid number (in the given base) are ignored:

'12.345'.to_i   # => 12
'12345'.to_i(2) # => 1

Returns zero if there is no leading valid number:

'abcdef'.to_i # => 0
'2'.to_i(2)   # => 0
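
A few further cases, sketched for illustration, including the contrast with the stricter Kernel#Integer conversion, which rejects trailing characters:

```ruby
hex      = 'ff'.to_i(16)   # Hexadecimal digits.
base36   = 'z'.to_i(36)    # Largest single digit in base 36.
prefixed = '0x1A'.to_i(16) # A prefix that matches the base is accepted.
lenient  = '12abc'.to_i    # Leading "12" is used; "abc" is ignored.
strict   = Integer('12abc', exception: false) # Whole string must be valid.

p hex      # => 255
p base36   # => 35
p prefixed # => 26
p lenient  # => 12
p strict   # => nil
```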

Related: see Converting to Non-String.

to_json_raw(*args)

Source

# File ext/json/lib/json/add/string.rb, line 32
def to_json_raw(...)
  to_json_raw_object.to_json(...)
end

This method creates a JSON text from the result of calling to_json_raw_object on this String.

to_json_raw_object()

Source

# File ext/json/lib/json/add/string.rb, line 21
def to_json_raw_object
  {
    JSON.create_id => self.class.name,
    "raw" => unpack("C*"),
  }
end

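This method creates a raw object hash that can be nested into other data structures and will be generated as a raw string. Use this method if you want to convert raw strings (for example, binary data) to JSON, instead of UTF-8 strings.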
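
A hedged sketch of a round trip through the raw representation follows. The listing above places these methods in json/add/string.rb; the rescue branch is an assumption that older json releases define them when json itself is loaded. The sample bytes are made up.

```ruby
require 'json'
begin
  require 'json/add/string' # Location given in the source listing above.
rescue LoadError
  # Assumption: older json releases provide these methods via `require 'json'`.
end

binary = "\xFF\x00A".b # Three bytes that are not valid UTF-8.

raw_object = binary.to_json_raw_object
p raw_object['raw'] # => [255, 0, 65]

# to_json_raw serializes that hash; the byte list survives parsing.
parsed   = JSON.parse(binary.to_json_raw)
restored = parsed['raw'].pack('C*')
p restored == binary # => true
```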

to_r → rational

Source

static VALUE
string_to_r(VALUE self)
{
    VALUE num;

    rb_must_asciicompat(self);

    num = parse_rat(RSTRING_PTR(self), RSTRING_END(self), 0, TRUE);

    if (RB_FLOAT_TYPE_P(num) && !FLOAT_ZERO_P(num))
        rb_raise(rb_eFloatDomainError, "Infinity");
    return num;
}

Returns the result of interpreting leading characters in self as a rational value:

'123'.to_r       # => (123/1)   # Integer literal.
'300/2'.to_r     # => (150/1)   # Rational literal.
'-9.2'.to_r      # => (-46/5)   # Float literal.
'-9.2e2'.to_r    # => (-920/1)  # Float literal.

Ignores leading and trailing whitespace, and trailing non-numeric characters:

' 2 '.to_r       # => (2/1)
'21-Jun-09'.to_r # => (21/1)

Returns Rational zero if there are no leading numeric characters:

'BWV 1079'.to_r  # => (0/1)

NOTE: '0.3'.to_r is equivalent to 3/10r, but is different from 0.3.to_r:

'0.3'.to_r # => (3/10)
3/10r      # => (3/10)
0.3.to_r   # => (5404319552844595/18014398509481984)
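
The practical consequence is that decimal strings converted with to_r can be added exactly, whereas the corresponding Float arithmetic cannot. A minimal sketch:

```ruby
exact = '0.1'.to_r + '0.2'.to_r

p exact               # => (3/10)
p exact == '0.3'.to_r # => true
p 0.1 + 0.2 == 0.3    # => false  (binary floating-point rounding)
```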

Related: see Converting to Non-String.

to_s → self or new_string

Source

static VALUE
rb_str_to_s(VALUE str)
{
    if (rb_obj_class(str) != rb_cString) {
        return str_duplicate(rb_cString, str);
    }
    return str;
}

Returns self if self is a String, or self converted to a String if self is an instance of a subclass of String.
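
Both branches can be observed with a subclass. In this sketch, Label is a hypothetical subclass defined only for the illustration:

```ruby
# Hypothetical String subclass, used only for this example.
class Label < String; end

plain = 'abc'
label = Label.new('abc')

p plain.to_s.equal?(plain) # => true   (same object returned)
p label.to_s.class         # => String (converted to a plain String)
p label.to_s.equal?(label) # => false  (a new object)
p label.to_s == label      # => true   (same content)
```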

Related: see Converting to New String.

Also aliased as: to_str

to_str

Alias for: to_s

to_sym

Alias for: intern

tr(selector, replacements) → new_string

Source

static VALUE
rb_str_tr(VALUE str, VALUE src, VALUE repl)
{
    str = str_duplicate(rb_cString, str);
    tr_trans(str, src, repl, 0);
    return str;
}

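Returns a copy of self with each character specified by string selector translated to the corresponding character in string replacements. The correspondence is positional:

Each occurrence of the first character specified by selector is translated to the first character in replacements.
Each occurrence of the second character specified by selector is translated to the second character in replacements.
And so on.

Example: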

'hello'.tr('el', 'ip') #=> "hippo"

If replacements is shorter than selector, it is implicitly padded with its own last character:

'hello'.tr('aeiou', '-')   # => "h-ll-"
'hello'.tr('aeiou', 'AA-') # => "hAll-"

Arguments selector and replacements must be valid character selectors (see Character Selectors), and may use any of their valid forms, including negation, ranges, and escapes:

'hello'.tr('^aeiou', '-')       # => "-e--o"     # Negation.
'ibm'.tr('b-z', 'a-z')          # => "hal"       # Range.
'hel^lo'.tr('\^aeiou', '-')     # => "h-l-l-"    # Escaped leading caret.
'i-b-m'.tr('b\-z', 'a-z')       # => "ibabm"     # Escaped embedded hyphen.
'foo\\bar'.tr('ab\\', 'XYZ')    # => "fooZYXr"   # Escaped backslash.
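
Positional correspondence between two ranges is enough to express simple ciphers. The sketch below implements ROT13 with tr; rot13 is a hypothetical helper written for this illustration.

```ruby
# Hypothetical helper: rotates each ASCII letter by 13 places.
# 'A-Z' maps onto 'N-ZA-M' and 'a-z' onto 'n-za-m', position by position.
def rot13(text)
  text.tr('A-Za-z', 'N-ZA-Mn-za-m')
end

encoded = rot13('Hello, World!')
decoded = rot13(encoded)

p encoded # => "Uryyb, Jbeyq!"
p decoded # => "Hello, World!"
```

Characters that the selector does not cover (here the comma, space, and exclamation mark) pass through unchanged.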

tr!(selector, replacements) → self or nil

Source

static VALUE
rb_str_tr_bang(VALUE str, VALUE src, VALUE repl)
{
    return tr_trans(str, src, repl, 0);
}

Like String#tr, except that:

Substitutions are performed in self (not in a copy of self).
Returns self if any modifications are made, nil otherwise.

tr_s!(selector, replacements) → self or nil

Source

static VALUE
rb_str_tr_s_bang(VALUE str, VALUE src, VALUE repl)
{
    return tr_trans(str, src, repl, 1);
}

Like String#tr_s, except that:

Modifies self in place (not a copy of self).
Returns self if any changes are made, nil otherwise.

Related: see Converting to New String.

unicode_normalize(form = :nfc) → string

Source

static VALUE
rb_str_unicode_normalize(int argc, VALUE *argv, VALUE str)
{
    return unicode_normalize_common(argc, argv, str, id_normalize);
}

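Returns a copy of self with Unicode normalization applied.

Argument form must be one of the following symbols (see Unicode normalization forms):

:nfc: Canonical decomposition, followed by canonical composition.
:nfd: Canonical decomposition.
:nfkc: Compatibility decomposition, followed by canonical composition.
:nfkd: Compatibility decomposition.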
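
The forms listed above can be compared on a small sample. In this sketch the strings are UTF-8, and the variable names are illustrative:

```ruby
decomposed = "a\u0300" # 'a' followed by a combining grave accent.
composed   = "\u00E0"  # The precomposed character 'à'.

# The two look alike but are different strings until normalized.
p decomposed == composed                         # => false
p decomposed.unicode_normalize == composed       # => true  (:nfc composes)
p composed.unicode_normalize(:nfd) == decomposed # => true  (:nfd decomposes)

# Compatibility forms also replace characters such as ligatures.
ligature = "\uFB01" # The 'fi' ligature.
p ligature.unicode_normalize(:nfc)  == ligature # => true  (left as it is)
p ligature.unicode_normalize(:nfkc) == 'fi'     # => true
```

Normalizing both sides to the same form before comparing is a common way to make such strings compare equal.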

The encoding of self must be one of:

Encoding::UTF_8.
Encoding::UTF_16BE.
Encoding::UTF_16LE.
Encoding::UTF_32BE.
Encoding::UTF_32LE.
Encoding::GB18030.
Encoding::UCS_2BE.
Encoding::UCS_4BE.

Examples:

"a\u0300".unicode_normalize       # => "à"  # Lowercase 'a' with grave accent.
"a\u0300".unicode_normalize(:nfd) # => "à"  # Same.

Related: see Converting to New String.

unicode_normalize!(form = :nfc) → self

Source

static VALUE
rb_str_unicode_normalize_bang(int argc, VALUE *argv, VALUE str)
{
    return rb_str_replace(str, unicode_normalize_common(argc, argv, str, id_normalize));
}

Like String#unicode_normalize, except that the normalization is performed on self (not on a copy of self).

unicode_normalized?(form = :nfc) → true or false

Source

static VALUE
rb_str_unicode_normalized_p(int argc, VALUE *argv, VALUE str)
{
    return unicode_normalize_common(argc, argv, str, id_normalized_p);
}

Returns whether self is in the given form of Unicode normalization; see String#unicode_normalize.

The form must be one of :nfc, :nfd, :nfkc, or :nfkd.

Examples:

"a\u0300".unicode_normalized?       # => false
"a\u0300".unicode_normalized?(:nfd) # => true
"\u00E0".unicode_normalized?        # => true
"\u00E0".unicode_normalized?(:nfd)  # => false

Raises an exception if self is not in a Unicode encoding:

s = "\xE0".force_encoding(Encoding::ISO_8859_1)
s.unicode_normalized? # Raises Encoding::CompatibilityError

unpack(template, offset: 0) → array

Source

# File pack.rb, line 25
def unpack(fmt, offset: 0)
  Primitive.attr! :use_block
  Primitive.pack_unpack(fmt, offset)
end

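Extracts data from self to form new objects; see Packed Data.

With a block given, calls the block with each unpacked object.

With no block given, returns an array containing the unpacked objects.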
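
The following sketch illustrates both calling styles. The directives used here ('C' for an unsigned 8-bit integer, 'n' for a big-endian unsigned 16-bit integer) are only a small sample of those described in Packed Data, and the variable names are illustrative.

```ruby
s = "\x01\x02\x03\x04"

s.unpack('C*')            # => [1, 2, 3, 4]
s.unpack('C2')            # => [1, 2]
s.unpack('n*')            # => [258, 772]  (0x0102 and 0x0304)
s.unpack('C*', offset: 2) # => [3, 4]      (starts at byte offset 2)

# With a block, each unpacked object is passed to the block.
doubled = []
s.unpack('C*') { |byte| doubled << byte * 2 }
doubled                   # => [2, 4, 6, 8]
```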

Related: see Converting to Non-String.

unpack1(template, offset: 0) → object

Source

# File pack.rb, line 37
def unpack1(fmt, offset: 0)
  Primitive.pack_unpack1(fmt, offset)
end

Like String#unpack with no block given, but unpacks and returns only the first extracted object. See Packed Data.
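
A short comparison with String#unpack, as a sketch; the sample bytes and directives ('n', 'a') are chosen for illustration.

```ruby
s = "\x00\x2A\x00\x2B"

s.unpack('n*')            # => [42, 43]
s.unpack1('n*')           # => 42  (only the first extracted object)
s.unpack1('n', offset: 2) # => 43  (starts at byte offset 2)

'abc'.unpack1('a2')       # => "ab"
```

When only one value is needed, unpack1 avoids building the array that unpack(...).first would allocate.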

Related: see Converting to Non-String.

upcase(mapping = :ascii) → new_string

Source

static VALUE
rb_str_upcase(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;
    VALUE ret;

    flags = check_case_options(argc, argv, flags);
    enc = str_true_enc(str);
    if (case_option_single_p(flags, enc, str)) {
        ret = rb_str_new(RSTRING_PTR(str), RSTRING_LEN(str));
        str_enc_copy_direct(ret, str);
        upcase_single(ret);
    }
    else if (flags&ONIGENC_CASE_ASCII_ONLY) {
        ret = rb_str_new(0, RSTRING_LEN(str));
        rb_str_ascii_casemap(str, ret, &flags, enc);
    }
    else {
        ret = rb_str_casemap(str, &flags, enc);
    }

    return ret;
}

Returns a new string containing the upcased characters in self:

'hello'.upcase        # => "HELLO"
'straße'.upcase       # => "STRASSE"
'привет'.upcase       # => "ПРИВЕТ"
'RubyGems.org'.upcase # => "RUBYGEMS.ORG"

The sizes of self and the upcased result may differ:

s = 'Straße'
s.size        # => 6
s.upcase      # => "STRASSE"
s.upcase.size # => 7

Some characters (and some character sets) do not have upcase and downcase versions; see Case Mapping:

s = '1, 2, 3, ...'
s.upcase == s # => true
s = 'こんにちは'
s.upcase == s # => true

The casing is affected by the given mapping, which may be :ascii, :fold, or :turkic; see Case Mapping.
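
As a sketch of how the mapping argument changes the result, the following compares a call without a mapping against :ascii and :turkic. The sample strings are illustrative; see Case Mapping for the complete rules.

```ruby
s = 'straße'
s.upcase           # => "STRASSE"  (full Unicode case mapping)
s.upcase(:ascii)   # => "STRAßE"   (only a-z are affected)

# With :turkic, lowercase 'i' maps to dotted capital 'İ' (U+0130).
'i'.upcase          # => "I"
'i'.upcase(:turkic) # => "İ"
```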

Related: see Converting to New String.

upcase!(mapping) → self or nil

Source

static VALUE
rb_str_upcase_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;

    flags = check_case_options(argc, argv, flags);
    str_modify_keep_cr(str);
    enc = str_true_enc(str);
    if (case_option_single_p(flags, enc, str)) {
        if (upcase_single(str))
            flags |= ONIGENC_CASE_MODIFIED;
    }
    else if (flags&ONIGENC_CASE_ASCII_ONLY)
        rb_str_ascii_casemap(str, str, &flags, enc);
    else
        str_shared_replace(str, rb_str_casemap(str, &flags, enc));

    if (ONIGENC_CASE_MODIFIED&flags) return str;
    return Qnil;
}

Like String#upcase, except:

Changes the character casing in self (not in a copy of self).
Returns self if any changes are made, nil otherwise.
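
The two points above, as a minimal sketch (variable names are illustrative):

```ruby
s = +'hello'
result = s.upcase!  # => "HELLO"
result.equal?(s)    # => true  (self, modified in place)

# The string is already upcased, so no change is made.
s.upcase!           # => nil
s                   # => "HELLO"
```

Because of the possible nil, an expression such as s.upcase!.size can fail; use String#upcase when the result is chained.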

upto(other_string, exclusive = false) → new_enumerator

Source

static VALUE
rb_str_upto(int argc, VALUE *argv, VALUE beg)
{
    VALUE end, exclusive;

    rb_scan_args(argc, argv, "11", &end, &exclusive);
    RETURN_ENUMERATOR(beg, argc, argv);
    return rb_str_upto_each(beg, end, RTEST(exclusive), str_upto_i, Qnil);
}

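With a block given, calls the block with each String value returned by successive calls to String#succ; the first value is self, the next is self.succ, and so on; the sequence terminates when value other_string is reached; returns self: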

a = []
'a'.upto('f') {|c| a.push(c) }
a # => ["a", "b", "c", "d", "e", "f"]

a = []
'Ж'.upto('П') {|c| a.push(c) }
a # => ["Ж", "З", "И", "Й", "К", "Л", "М", "Н", "О", "П"]

a = []
'よ'.upto('ろ') {|c| a.push(c) }
a # => ["よ", "ら", "り", "る", "れ", "ろ"]

a = []
'a8'.upto('b6') {|c| a.push(c) }
a # => ["a8", "a9", "b0", "b1", "b2", "b3", "b4", "b5", "b6"]

If argument exclusive is given as a truthy object, the last value is omitted:

a = []
'a'.upto('f', true) {|c| a.push(c) }
a # => ["a", "b", "c", "d", "e"]

If other_string cannot be reached, does not call the block:

'25'.upto('5') {|s| fail s }
'aa'.upto('a') {|s| fail s }

With no block given, returns a new Enumerator:

'a8'.upto('b6') # => #<Enumerator: "a8":upto("b6")>
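
The returned Enumerator can be used with the usual Enumerable methods. A small sketch, with illustrative variable names:

```ruby
e = 'a'.upto('e')
e.class                               # => Enumerator
e.to_a                                # => ["a", "b", "c", "d", "e"]
e.select { |c| 'aeiou'.include?(c) }  # => ["a", "e"]

# The exclusive argument applies to the enumerator form as well.
'a'.upto('e', true).to_a              # => ["a", "b", "c", "d"]
```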

Related: see Iterating.

valid_encoding? → true or false

Source

static VALUE
rb_str_valid_encoding_p(VALUE str)
{
    int cr = rb_enc_str_coderange(str);

    return RBOOL(cr != ENC_CODERANGE_BROKEN);
}

Returns whether self is encoded correctly:

s = 'Straße'
s.valid_encoding?                                 # => true
s.encoding                                        # => #<Encoding:UTF-8>
s.force_encoding(Encoding::ASCII).valid_encoding? # => false
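
As a further sketch, the result depends on whether the bytes form a valid sequence in the string's current encoding. The byte sequences below are chosen for illustration: "\xE2\x82\xAC" is the UTF-8 encoding of the euro sign.

```ruby
euro = "\xE2\x82\xAC".dup.force_encoding(Encoding::UTF_8)
euro.valid_encoding?        # => true

# The same sequence with its last byte missing is incomplete in UTF-8.
truncated = "\xE2\x82".dup.force_encoding(Encoding::UTF_8)
truncated.valid_encoding?   # => false

# Any byte sequence is valid when treated as binary data.
truncated.b.valid_encoding? # => true
```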

Related: see Querying.