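Creating extension libraries for Ruby

This document explains how to create extension libraries for Ruby.

Basic knowledge

In C, variables have types and data do not. In Ruby, by contrast, variables have no static type and the data themselves carry a type, so data must be converted between the two languages.

Objects in Ruby are represented by the C type 'VALUE'. Each VALUE has its own data type.

To retrieve C data from a VALUE, you need to

identify the VALUE's data type
convert the VALUE into C data

Converting to the wrong data type may cause serious problems.

Ruby data types

The Ruby interpreter has the following data types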

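T_NIL: nil
T_OBJECT: ordinary object
T_CLASS: class
T_MODULE: module
T_FLOAT: floating point number
T_STRING: string
T_REGEXP: regular expression
T_ARRAY: array
T_HASH: associative array
T_STRUCT: (Ruby) structure
T_BIGNUM: multi-precision integer
T_FIXNUM: Fixnum (31-bit or 63-bit integer)
T_COMPLEX: complex number
T_RATIONAL: rational number
T_FILE: IO
T_TRUE: true
T_FALSE: false
T_DATA: data
T_SYMBOL: symbol

In addition, there are several other types used internally

T_ICLASS: included module
T_MATCH: MatchData object
T_UNDEF: undefined
T_NODE: syntax tree node
T_ZOMBIE: object awaiting finalization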

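Most of these types are represented by C structures.

Checking the type of a VALUE

The TYPE() macro defined in ruby.h reports the data type of a VALUE. TYPE() returns one of the T_XXXX constants described above. Code that handles several data types typically looks like this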

switch (TYPE(obj)) {
  case T_FIXNUM:
    /* process Fixnum */
    break;
  case T_STRING:
    /* process String */
    break;
  case T_ARRAY:
    /* process Array */
    break;
  default:
    /* raise exception */
    rb_raise(rb_eTypeError, "not valid value");
    break;
}

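There is a data type check function

void Check_Type(VALUE value, int type)

which raises an exception if the VALUE does not have the specified type.

There are also faster check macros for fixnums and nil.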

FIXNUM_P(obj)
NIL_P(obj)

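Converting VALUE into C data

The data for T_NIL, T_FALSE and T_TRUE are nil, false and true, respectively. They are singletons for their data type. The equivalent C constants are Qnil, Qfalse and Qtrue. RTEST() returns true if a VALUE is neither Qfalse nor Qnil. If you need to tell Qfalse apart from Qnil, compare against Qfalse explicitly.

T_FIXNUM data is a fixed-size integer, 31 or 63 bits long, depending on the size of long: if long is 32 bits, T_FIXNUM is 31 bits; if long is 64 bits, T_FIXNUM is 63 bits. A T_FIXNUM can be converted to a C integer with the FIX2INT() or FIX2LONG() macro. These macros are fast, but you must check that the data really is a Fixnum before using them. FIX2LONG() never raises an exception, whereas FIX2INT() raises RangeError if the result does not fit in an int. There are also NUM2INT() and NUM2LONG(), which convert any Ruby number into a C integer. These macros include a type check, so an exception is raised if the conversion fails. NUM2DBL() retrieves a double-precision floating point value in the same way.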
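The difference between FIX2LONG() and FIX2INT() comes down to a range check when narrowing from long to int. The following is a minimal standalone sketch of that check, not the Ruby macro itself: model_long_to_int is a hypothetical name, and it reports failure through its return value where Ruby would raise RangeError.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores v in *out and returns true only when v is representable as int.
 * Hypothetical helper; Ruby's FIX2INT() raises RangeError instead. */
bool model_long_to_int(long v, int *out)
{
    assert(out != NULL);
    if (v < INT_MIN || v > INT_MAX)
        return false;           /* out of range for int */
    *out = (int)v;
    return true;
}
```

On platforms where long and int have the same width the check can never fail, which is why the distinction only matters where long is wider than int.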

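You can use the StringValue() and StringValuePtr() macros to get a char* from a VALUE. StringValue(var) replaces the value of var with the result of "var.to_str()". StringValuePtr(var) performs the same replacement and returns the char* representation of var. Both macros skip the replacement if var is already a String. Note that they accept only lvalues as arguments, because they change the value of var in place.

You can also use the macro StringValueCStr(). It is the same as StringValuePtr(), but it always guarantees a NUL character at the end of the result. If the string itself contains a NUL character, this macro raises ArgumentError. StringValuePtr() does not guarantee a NUL at the end of the result, and the result may contain NUL characters.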
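The reason StringValueCStr() rejects embedded NUL characters is that a Ruby string carries an explicit length, while C string functions stop at the first NUL. A minimal standalone sketch of such a check, assuming a pointer/length pair like the one RSTRING_PTR() and RSTRING_LEN() provide (model_has_embedded_nul is a hypothetical name, not a Ruby API):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when the first len bytes of ptr contain a NUL byte,
 * i.e. when treating the buffer as a C string would truncate it. */
bool model_has_embedded_nul(const char *ptr, size_t len)
{
    assert(ptr != NULL);
    return memchr(ptr, '\0', len) != NULL;
}
```

For the three-byte buffer "a\0b", strlen() reports 1 while the real length is 3, so passing it to a C API that expects a NUL-terminated string would silently drop data.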

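Other data types have corresponding C structures, e.g. struct RArray for T_ARRAY. A VALUE of a type that has a corresponding structure can be cast to retrieve a pointer to the struct. The casting macro has the form RXXXX for each data type; for instance, RARRAY(obj). See "ruby.h". However, we do not recommend accessing RXXXX data directly, because these data structures are complex. Use the corresponding rb_xxx() functions to access the internal struct. For example, to access an entry of an array, use rb_ary_entry(ary, offset) and rb_ary_store(ary, offset, obj).

There are some accessor macros for structure members, for example 'RSTRING_LEN(str)' to get the size of a Ruby String object. The allocated region can be accessed via 'RSTRING_PTR(str)'.

Notice: do not change the value of a structure directly unless you are responsible for the result. Doing so tends to cause interesting bugs.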

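Converting C data into VALUE

To convert C data to Ruby values

FIXNUM: shift left by 1 bit and set the least significant bit (LSB) to 1.
Other pointer values: cast to VALUE.

You can determine whether a VALUE is a pointer or not by checking its LSB.

Notice: Ruby does not allow arbitrary pointer values to be a VALUE. They must be pointers to structures that Ruby knows about. The known structures are defined in <ruby.h>.

To convert C numbers to Ruby values, use these macros

INT2FIX(): for integers that fit in a Fixnum (31 bits, or 63 bits where long is 64 bits).
INT2NUM(): for integers of arbitrary size.

INT2NUM() converts an integer into a Bignum if it is out of the FIXNUM range, but it is a bit slower.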
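The tagging rule can be modeled in plain C without ruby.h. The sketch below is an illustration only: model_value, model_int2fix, model_fix2long, model_is_fixnum and the MODEL_FIXNUM_* limits are hypothetical names, not part of the Ruby API, and real extension code should use INT2FIX(), FIX2LONG() and FIXNUM_P(). It assumes only the encoding stated here (shift left by one bit, set the LSB) and covers just the two cases listed, tagged integers and pointers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t model_value;   /* hypothetical stand-in for VALUE */

/* One bit of a long is spent on the tag, so the usable range is halved. */
#define MODEL_FIXNUM_MAX (LONG_MAX / 2)
#define MODEL_FIXNUM_MIN (LONG_MIN / 2)

/* Encode: shift left by one bit and set the least significant bit. */
model_value model_int2fix(long n)
{
    assert(n >= MODEL_FIXNUM_MIN && n <= MODEL_FIXNUM_MAX);
    return ((model_value)n << 1) | 1;
}

/* A set LSB marks a tagged integer; a clear LSB is treated as a pointer here. */
bool model_is_fixnum(model_value v)
{
    return (v & 1) == 1;
}

/* Decode: remove the tag bit and halve. Subtracting 1 makes the value even,
 * so the division is exact for negative numbers as well. */
long model_fix2long(model_value v)
{
    assert(model_is_fixnum(v));
    return (long)(((intptr_t)v - 1) / 2);
}
```

With this encoding 5 becomes 11 (binary 1011), and a value outside MODEL_FIXNUM_MIN..MODEL_FIXNUM_MAX cannot be tagged at all, which is the case INT2NUM() handles by creating a Bignum.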

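Manipulating Ruby objects

As mentioned above, modifying an object's internal structure is not recommended. To manipulate objects, use the functions supplied by the Ruby interpreter. Some (not all) of the useful functions are listed below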

`String` functions

rb_str_new(const char *ptr, long len)

Creates a new Ruby string.

rb_str_new2(const char *ptr)

rb_str_new_cstr(const char *ptr)

Creates a new Ruby string from a C string. This is equivalent to rb_str_new(ptr, strlen(ptr)).

rb_str_new_literal(const char *ptr)

Creates a new Ruby string from a C string literal.

rb_sprintf(const char *format, ...)

rb_vsprintf(const char *format, va_list ap)

Creates a new Ruby string using a printf(3) format.

Note: in the format string, "%"PRIsVALUE can be used for Object#to_s (or Object#inspect if the '+' flag is set) output; the corresponding argument must be a VALUE. Since it conflicts with "%i", use "%d" for integers in format strings.

rb_str_append(VALUE str1, VALUE str2)

Appends Ruby string str2 to Ruby string str1.

rb_str_cat(VALUE str, const char *ptr, long len)

Appends len bytes of data from ptr to the Ruby string.

rb_str_cat2(VALUE str, const char* ptr)

rb_str_cat_cstr(VALUE str, const char* ptr)

Appends the C string ptr to Ruby string str. This function is equivalent to rb_str_cat(str, ptr, strlen(ptr)).

rb_str_catf(VALUE str, const char* format, ...)

rb_str_vcatf(VALUE str, const char* format, va_list ap)

Appends the C string format and the following arguments to Ruby string str according to a printf-like format. These functions are equivalent to rb_str_append(str, rb_sprintf(format, ...)) and rb_str_append(str, rb_vsprintf(format, ap)), respectively.

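rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)

rb_enc_str_new_cstr(const char *ptr, rb_encoding *enc)

Creates a new Ruby string with the specified encoding.

rb_enc_str_new_literal(const char *ptr, rb_encoding *enc)

Creates a new Ruby string from a C string literal with the specified encoding.

rb_usascii_str_new(const char *ptr, long len)

rb_usascii_str_new_cstr(const char *ptr)

Creates a new Ruby string with US-ASCII encoding.

rb_usascii_str_new_literal(const char *ptr)

Creates a new Ruby string from a C string literal with US-ASCII encoding.

rb_utf8_str_new(const char *ptr, long len)

rb_utf8_str_new_cstr(const char *ptr)

Creates a new Ruby string with UTF-8 encoding.

rb_utf8_str_new_literal(const char *ptr)

Creates a new Ruby string from a C string literal with UTF-8 encoding.

rb_str_resize(VALUE str, long len)

Resizes a Ruby string to len bytes. If str is not modifiable, this function raises an exception. The length of str must be set in advance. If len is less than the old length, the content beyond len bytes is discarded; if len is greater than the old length, the bytes beyond the old length are not initialized and contain garbage. Note that RSTRING_PTR(str) may change after calling this function.

rb_str_set_len(VALUE str, long len)

Sets the length of a Ruby string. If str is not modifiable, this function raises an exception. This function preserves the content up to len bytes, regardless of RSTRING_LEN(str). len must not exceed the capacity of str.

rb_str_modify(VALUE str)

Prepares a Ruby string for modification. If str is not modifiable, this function raises an exception; if the buffer of str is shared, this function allocates a new buffer so that it is no longer shared. You must always call this function before modifying the contents through RSTRING_PTR and/or rb_str_set_len.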

`Array` functions

rb_ary_new(): Creates an array with no elements.
rb_ary_new2(long len)
rb_ary_new_capa(long len): Creates an array with no elements, allocating an internal buffer for len elements.
rb_ary_new3(long n, ...)
rb_ary_new_from_args(long n, ...): Creates an n-element array from the arguments.
rb_ary_new4(long n, VALUE *elts)
rb_ary_new_from_values(long n, VALUE *elts): Creates an n-element array from a C array.
rb_ary_to_ary(VALUE obj): Converts the object into an array. Equivalent to Object#to_ary.

There are many functions to operate on an array. They may dump core if given other types.

rb_ary_aref(int argc, const VALUE *argv, VALUE ary): Equivalent to Array#[].
rb_ary_entry(VALUE ary, long offset): ary[offset]
rb_ary_store(VALUE ary, long offset, VALUE obj): ary[offset] = obj
rb_ary_subseq(VALUE ary, long beg, long len): ary[beg, len]
rb_ary_push(VALUE ary, VALUE val)
rb_ary_pop(VALUE ary)
rb_ary_shift(VALUE ary)
rb_ary_unshift(VALUE ary, VALUE val): ary.push, ary.pop, ary.shift and ary.unshift, respectively.
rb_ary_cat(VALUE ary, const VALUE *ptr, long len): Appends len elements of objects from ptr to the array.

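Extending Ruby with C

Adding new features to Ruby

You can add new features (classes, methods, etc.) to the Ruby interpreter. Ruby provides APIs for defining the following things

classes, modules
methods, singleton methods
constants

Class and module definition

To define a class or a module, use the functions below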

VALUE rb_define_class(const char *name, VALUE super)
VALUE rb_define_module(const char *name)

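These functions return the newly created class or module. You may want to save this reference in a variable for later use.

To define nested classes or modules, use the functions below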

VALUE rb_define_class_under(VALUE outer, const char *name, VALUE super)
VALUE rb_define_module_under(VALUE outer, const char *name)

Method and singleton method definition

To define methods or singleton methods, use these functions

void rb_define_method(VALUE klass, const char *name,
                      VALUE (*func)(ANYARGS), int argc)

void rb_define_singleton_method(VALUE object, const char *name,
                                VALUE (*func)(ANYARGS), int argc)

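"argc" is the number of arguments the C function takes; it must be less than 17, though you are unlikely to need that many.

If "argc" is negative, it specifies the calling sequence rather than the number of arguments.

If argc is -1, the function will be called as

VALUE func(int argc, VALUE *argv, VALUE obj)

where argc is the actual number of arguments, argv is the C array of the arguments, and obj is the receiver.

If argc is -2, the arguments are passed in a Ruby array. The function will be called like

VALUE func(VALUE obj, VALUE args)

where obj is the receiver, and args is the Ruby array containing the actual arguments.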
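Where the receiver sits in each calling sequence is easy to get wrong, so here is a standalone model of the dispatch, written without ruby.h. Everything in it is hypothetical (model_value, model_method, model_call and the two sample functions); it only mirrors the two signatures shown above for a fixed arity of 1 and for argc == -1, and leaves out the -2 form because that needs a real Ruby array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t model_value;   /* stand-in for VALUE */

/* Fixed arity: receiver first, then each argument. */
typedef model_value (*model_fn1)(model_value self, model_value arg);
/* argc == -1: argument count, C array, then the receiver last. */
typedef model_value (*model_fnv)(int argc, const model_value *argv,
                                 model_value self);

struct model_method {
    int arity;                              /* 1 or -1 in this sketch */
    union { model_fn1 fixed1; model_fnv variadic; } fn;
};

model_value model_call(const struct model_method *m, model_value self,
                       int argc, const model_value *argv)
{
    assert(m != NULL && argc >= 0);
    if (m->arity == -1)
        return m->fn.variadic(argc, argv, self);
    assert(m->arity == 1 && argc == 1);     /* a real VM raises ArgumentError */
    return m->fn.fixed1(self, argv[0]);
}

/* Sample method bodies: integers stand in for Ruby objects. */
model_value model_add(model_value self, model_value arg)
{
    return self + arg;
}

model_value model_sum(int argc, const model_value *argv, model_value self)
{
    model_value total = self;
    for (int i = 0; i < argc; i++)
        total += argv[i];
    return total;
}
```

The same method table entry can therefore hold functions of different shapes, and the arity value registered with it tells the caller which shape to use.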

There are some more functions to define methods. One takes an ID as the name of the method to be defined. See also ID or Symbol below.

void rb_define_method_id(VALUE klass, ID name,
                         VALUE (*func)(ANYARGS), int argc)

There are two functions to define private/protected methods

void rb_define_private_method(VALUE klass, const char *name,
                              VALUE (*func)(ANYARGS), int argc)
void rb_define_protected_method(VALUE klass, const char *name,
                                VALUE (*func)(ANYARGS), int argc)

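Finally, rb_define_module_function defines a module function, which is both a private method and a singleton method of the module. For example, sqrt is a module function defined in the Math module. It can be called as

Math.sqrt(4)

or

include Math
sqrt(4)

To define module functions, use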

void rb_define_module_function(VALUE module, const char *name,
                               VALUE (*func)(ANYARGS), int argc)

In addition, function-like methods, which are private methods defined in the Kernel module, can be defined using

void rb_define_global_function(const char *name, VALUE (*func)(ANYARGS), int argc)

To define an alias for a method,

void rb_define_alias(VALUE module, const char* new, const char* old);

To define reader/writer methods for an attribute,

void rb_define_attr(VALUE klass, const char *name, int read, int write)

To define and undefine the 'allocate' class method,

void rb_define_alloc_func(VALUE klass, VALUE (*func)(VALUE klass));
void rb_undef_alloc_func(VALUE klass);

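func must take klass as its argument and return a newly allocated instance. This instance should be as empty as possible, without any expensive (including external) resources.

If you are overriding an existing method of any ancestor of your class, you may rely on

VALUE rb_call_super(int argc, const VALUE *argv)

To specify whether keyword arguments are passed when calling super

VALUE rb_call_super_kw(int argc, const VALUE *argv, int kw_splat)

kw_splat can have these possible values (used by all functions that accept a kw_splat argument)

RB_NO_KEYWORDS: do not pass keywords
RB_PASS_KEYWORDS: pass keywords; the final argument should be a hash of keywords
RB_PASS_CALLED_KEYWORDS: pass keywords if the current method was called with keywords; useful for argument delegation

To get the receiver of the current scope (if no other way is available), you can use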

VALUE rb_current_receiver(void)

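Constant definition

We have two functions to define constants

void rb_define_const(VALUE klass, const char *name, VALUE val)
void rb_define_global_const(const char *name, VALUE val)

The former defines a constant under the specified class/module. The latter defines a global constant.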

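Using Ruby features from C

There are several ways to invoke Ruby's features from C code.

Evaluating Ruby programs in a string

The easiest way to use Ruby's functionality from a C program is to evaluate a string as a Ruby program. This function does the job

VALUE rb_eval_string(const char *str)

Evaluation is done in the current context, so the current local variables of the innermost method (defined by Ruby) can be accessed.

Note that the evaluation can raise an exception. There is a safer function

VALUE rb_eval_string_protect(const char *str, int *state)

It returns nil when an error occurred. Moreover, *state is zero if str was successfully evaluated, or nonzero otherwise.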

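ID or `Symbol`

You can invoke methods directly, without parsing a string. First, ID needs explaining. An ID is an integer that represents a Ruby identifier such as a variable name. The Ruby data type corresponding to ID is Symbol. It can be accessed from Ruby in the form

:Identifier

or

:"any kind of string"

You can get the ID value for a string from within C code by using

rb_intern(const char *name)
rb_intern_str(VALUE name)

You can retrieve the ID from a Ruby object (Symbol or String) given as an argument by using

rb_to_id(VALUE symbol)
rb_check_id(volatile VALUE *name)
rb_check_id_cstr(const char *name, long len, rb_encoding *enc)

These functions try to convert the argument to a String if it is neither a Symbol nor a String. The second function stores the converted result into *name and returns 0 if the string is not a known symbol. After this function returns a nonzero value, *name is always a Symbol or a String; if the result is 0, it is a String. The third function takes a NUL-terminated C string, not a Ruby VALUE.

You can retrieve a Symbol from a Ruby object (Symbol or String) given as an argument by using

rb_to_symbol(VALUE name)
rb_check_symbol(volatile VALUE *namep)
rb_check_symbol_cstr(const char *ptr, long len, rb_encoding *enc)

These functions are similar to the functions above, except that they return a Symbol instead of an ID.

You can convert a C ID to a Ruby Symbol by using

VALUE ID2SYM(ID id)

and convert a Ruby Symbol object to an ID by using

ID SYM2ID(VALUE symbol)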

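Invoking Ruby methods from C

To invoke methods directly, you can use the function below

VALUE rb_funcall(VALUE recv, ID mid, int argc, ...)

This function invokes a method on recv; the method name is specified by the ID mid.

Accessing variables and constants

You can access class variables and instance variables using access functions. Global variables can also be shared between both environments. There is no way to access Ruby's local variables.

The functions to access/modify instance variables are below

VALUE rb_ivar_get(VALUE obj, ID id)
VALUE rb_ivar_set(VALUE obj, ID id, VALUE val)

id must be an ID, which can be obtained with rb_intern().

To access the constants of a class/module

VALUE rb_const_get(VALUE obj, ID id)

See also Constant definition above.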

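Ruby constants that can be accessed from C

As stated in "Converting VALUE into C data" above, the following Ruby constants can be referred to from C.

Qtrue
Qfalse: Boolean values. Qfalse is false in C as well (i.e. 0).
Qnil: Ruby nil in C scope.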

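Global variables shared between C and Ruby

Information can be shared between the two environments using shared global variables. To define them, you can use the functions listed below

void rb_define_variable(const char *name, VALUE *var)

This function defines a variable that is shared by both environments. The value of the C global variable pointed to by "var" can be accessed through the Ruby global variable named "name".

You can define read-only (from Ruby, of course) variables using the function below.

void rb_define_readonly_variable(const char *name, VALUE *var)

You can define hooked variables. The accessor functions (getter and setter) are called on access to the hooked variables.

void rb_define_hooked_variable(const char *name, VALUE *var,
                               VALUE (*getter)(), void (*setter)())

If you need to supply only a getter or only a setter, supply 0 for the hook you do not need. If both hooks are 0, rb_define_hooked_variable() works just like rb_define_variable().

The prototypes of the getter and setter functions are as follows

VALUE (*getter)(ID id, VALUE *var);
void (*setter)(VALUE val, ID id, VALUE *var);

You can also define a Ruby global variable without a corresponding C variable. The value of the variable is set and retrieved only by the hooks.

void rb_define_virtual_variable(const char *name,
                                VALUE (*getter)(), void (*setter)())

The prototypes of the getter and setter functions are as follows

VALUE (*getter)(ID id);
void (*setter)(VALUE val, ID id);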
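To make the hook semantics concrete, the following standalone sketch models a hooked variable in plain C. It is not the interpreter's implementation: model_value, model_id, model_gvar and the functions below are hypothetical names that only mirror the getter and setter prototypes for hooked variables, including the rule that a missing hook falls back to direct access to the C variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t model_value;   /* stand-in for VALUE */
typedef unsigned model_id;      /* stand-in for ID */

typedef model_value (*model_getter)(model_id id, model_value *var);
typedef void (*model_setter)(model_value val, model_id id, model_value *var);

struct model_gvar {
    model_id id;
    model_value *var;           /* the C variable behind the global */
    model_getter getter;        /* NULL means "read *var directly" */
    model_setter setter;        /* NULL means "write *var directly" */
};

model_value model_gvar_get(const struct model_gvar *g)
{
    assert(g != NULL && g->var != NULL);
    return g->getter ? g->getter(g->id, g->var) : *g->var;
}

void model_gvar_set(const struct model_gvar *g, model_value val)
{
    assert(g != NULL && g->var != NULL);
    if (g->setter)
        g->setter(val, g->id, g->var);
    else
        *g->var = val;
}

/* Example setter hook: refuse negative values by storing zero instead. */
void model_clamp_setter(model_value val, model_id id, model_value *var)
{
    (void)id;
    *var = val < 0 ? 0 : val;
}
```

A setter hook is the usual place to validate or normalize a value before it reaches the C variable, as model_clamp_setter does here.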

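Encapsulating C data into a Ruby object

Sometimes you need to expose a struct from the C world as a Ruby object. In such a situation, the TypedData_XXX macro family lets the pointer to the struct and the Ruby object be converted into each other.

C struct to Ruby object

You can convert sval, a pointer to your struct, into a Ruby object with the following macro.

TypedData_Wrap_Struct(klass, data_type, sval)

TypedData_Wrap_Struct() returns the created Ruby object as a VALUE.

The klass argument is the class of the object. klass should derive from rb_cObject, and its allocator must be set by calling rb_define_alloc_func or rb_undef_alloc_func.

data_type is a pointer to a const rb_data_type_t which describes how Ruby should manage the struct.

rb_data_type_t is defined as follows. Let's take a look at each member of the struct.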

typedef struct rb_data_type_struct rb_data_type_t;

struct rb_data_type_struct {
    const char *wrap_struct_name;
    struct {
        void (*dmark)(void*);
        void (*dfree)(void*);
        size_t (*dsize)(const void *);
        void (*dcompact)(void*);
        void *reserved[1];
    } function;
    const rb_data_type_t *parent;
    void *data;
    VALUE flags;
};

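wrap_struct_name is an identifier for this kind of struct. It is basically used for collecting and emitting statistics, so the identifier must be unique within the process, but it does not need to be a valid C or Ruby identifier.

The dmark and dfree functions are invoked during GC execution. Object allocation is not allowed at that time, so do not allocate Ruby objects inside them.

dmark is a function that marks the Ruby objects referred to from your struct. If your struct keeps such references, it must mark all of them using rb_gc_mark or its family.

dfree is a function that frees the memory the pointer refers to. If this is RUBY_DEFAULT_FREE, the pointer is simply freed.

dsize calculates the memory consumption of the struct in bytes. Its parameter is a pointer to your struct. You can pass 0 as dsize if it is hard to implement such a function, but it is still recommended to avoid 0.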
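A dsize callback is usually only a few lines long. The sketch below shows one possible shape in standalone C, matching the size_t (*)(const void *) signature of the dsize member. The struct model_buffer and the function name are hypothetical, and counting the struct itself in addition to the buffer it owns is a choice made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapped struct that owns one heap buffer. */
struct model_buffer {
    size_t capa;    /* bytes allocated for data */
    char *data;     /* owned buffer, or NULL when nothing is allocated */
};

/* Same shape as the dsize callback: size_t (*)(const void *). */
size_t model_buffer_memsize(const void *ptr)
{
    const struct model_buffer *buf = ptr;
    assert(buf != NULL);
    return sizeof(*buf) + (buf->data ? buf->capa : 0);
}
```

Memory that the struct merely points to but does not own should be left out, otherwise it would be counted more than once.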

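dcompact is invoked when memory compaction takes place. Referenced Ruby objects that were marked with rb_gc_mark_movable() can be updated here via rb_gc_location().

You have to fill reserved with 0.

parent can point to another C type definition that the Ruby object inherits from. TypedData_Get_Struct() then also accepts derived objects.

You can fill "data" with an arbitrary value for your own use. Ruby does nothing with this member.

flags is a bitwise OR of the following flag values. Since they require a deep understanding of Ruby's garbage collector, you can just set flags to 0 if you are not sure.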

RUBY_TYPED_FREE_IMMEDIATELY

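This flag makes the garbage collector invoke dfree() immediately during GC when it needs to free your struct. You can specify this flag if dfree never unlocks Ruby's internal lock (GVL).

If this flag is not set, Ruby defers the invocation of dfree() and invokes it at the same time as finalizers.

RUBY_TYPED_WB_PROTECTED

It shows that the implementation of the object supports write barriers. If this flag is set, Ruby is better able to garbage collect the object.

When it is set, however, you are responsible for putting write barriers in all implementations of the object's methods as appropriate. Otherwise Ruby might crash while running.

More about write barriers can be found in Generational GC.

RUBY_TYPED_FROZEN_SHAREABLE

This flag indicates that the object is shareable if it is frozen. See Ractor support for more details.

If this flag is not set, the object cannot become shareable through the Ractor.make_shareable() method.

Note that TypedData_Wrap_Struct() can raise an exception. If the sval to be wrapped holds a resource that needs to be released (e.g. allocated memory, a handle from an external library, etc.), you will have to use rb_protect.

The preferred way is to allocate and wrap the struct in one shot.

TypedData_Make_Struct(klass, type, data_type, sval)

This macro returns an allocated T_DATA object that wraps the pointer to the struct, which is also allocated. It works like

(sval = ZALLOC(type), TypedData_Wrap_Struct(klass, data_type, sval))

However, you should use this macro instead of the "allocate then wrap" code above if the struct is simply allocated, because the latter can raise NoMemoryError and sval would be leaked in that case.

The arguments klass and data_type work like their counterparts in TypedData_Wrap_Struct(). A pointer to the allocated struct is assigned to sval, which should be a pointer of the type specified.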

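Declaratively marking/compacting struct references

If your struct refers to Ruby objects that are simple values, not wrapped in conditional logic or complex data structures, an alternative approach to marking and reference updating is available: declare the offsets of the VALUE references within the struct.

Doing this allows the Ruby GC to support marking these references and GC compaction without the need to define the dmark and dcompact callbacks.

You must define a static list of the offsets within your struct where the VALUE references are located, and point the "dmark" member at this list using RUBY_REFS_LIST_PTR, as the example below does. The reference list must end with RUBY_REF_END.

Some macros are provided to make edge referencing easier

RUBY_TYPED_DECL_MARKING - a flag that can be set on the rb_data_type_t to indicate that references are declared as edges.
RUBY_REFERENCES(ref_list_name) - defines ref_list_name as a list of references.
RUBY_REF_END - the end mark of the reference list.
RUBY_REF_EDGE(struct, member) - declares member as a VALUE edge from struct. Use this inside a RUBY_REFERENCES list.
RUBY_REFS_LIST_PTR - coerces the reference list into a format accepted by the existing dmark interface.

The example below is from Dir (defined in dir.c)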

// The struct being wrapped. Notice this contains 3 members of which the second
// is a VALUE reference to another ruby object.
struct dir_data {
    DIR *dir;
    const VALUE path;
    rb_encoding *enc;
};

// Define a reference list `dir_refs` containing a single entry to `path`.
// Needs terminating with RUBY_REF_END
RUBY_REFERENCES(dir_refs) = {
    RUBY_REF_EDGE(struct dir_data, path),
    RUBY_REF_END
};

// Override the "dmark" field with the defined reference list now that we
// no longer need a marking callback and add RUBY_TYPED_DECL_MARKING to the
// flags field
static const rb_data_type_t dir_data_type = {
    "dir",
    {RUBY_REFS_LIST_PTR(dir_refs), dir_free, dir_memsize,},
    0, NULL, RUBY_TYPED_WB_PROTECTED | RUBY_TYPED_FREE_IMMEDIATELY | RUBY_TYPED_DECL_MARKING
};

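Declaring simple references in this manner allows the GC to both mark and move the underlying object, and to update the reference to it automatically during compaction.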
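The idea behind a declarative reference list can be shown without ruby.h: a table of member offsets, closed by a sentinel, that a generic walker uses to find every reference inside a struct. The sketch below is a model under stated assumptions, not Ruby's implementation. All names (model_value, model_dir, MODEL_REF_END, model_mark_refs, model_remember) are hypothetical, and it assumes that an edge is simply an offsetof() value and that the end mark is a value no real offset can take.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t model_value;          /* stand-in for VALUE */
#define MODEL_REF_END SIZE_MAX          /* sentinel closing an offset list */

struct model_dir {
    void *handle;                       /* not a Ruby reference */
    model_value path;                   /* reference the collector must see */
    int flags;
};

/* Offsets of every reference member, terminated by the sentinel. */
const size_t model_dir_refs[] = {
    offsetof(struct model_dir, path),
    MODEL_REF_END
};

/* Walks the offset list and hands each referenced value to visit().
 * Returns the number of references found. */
size_t model_mark_refs(const void *obj, const size_t *refs,
                       void (*visit)(model_value))
{
    size_t n = 0;
    assert(obj != NULL && refs != NULL);
    for (; refs[n] != MODEL_REF_END; n++) {
        const model_value *slot =
            (const model_value *)((const char *)obj + refs[n]);
        if (visit)
            visit(*slot);
    }
    return n;
}

/* Sample visitor that records the last value it was given. */
model_value model_last_marked;
void model_remember(model_value v)
{
    model_last_marked = v;
}
```

Because the walker needs nothing but the offsets, the same list can serve both marking and updating references after objects move, which is why no per-type callback has to be written.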

Ruby object to C struct

To retrieve the C pointer from a T_DATA object, use the macro TypedData_Get_Struct().

TypedData_Get_Struct(obj, type, &data_type, sval)

A pointer to the struct is assigned to the variable sval.

See the example below for details.

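Example - creating the dbm extension

OK, here is an example of making an extension library. This is the extension to access DBMs. The full source is included in the ext/ directory of the Ruby source tree.

Making the directory

% mkdir ext/dbm

Designing the library

You need to design the library's features before making it.

Writing the C code

You need to write C code for your extension library. If your library has only one source file, "LIBRARY.c" is the preferred file name. If your library has multiple source files, on the other hand, avoid "LIBRARY.c" as a file name: on some platforms it may conflict with the intermediate file "LIBRARY.o". Note that some functions in the mkmf library described below generate a file "conftest.c" for checking with compilation. You should not choose "conftest.c" as the name of a source file.

Ruby executes the initializing function named "Init_LIBRARY" in the library. For example, "Init_dbm()" is executed when the library is loaded.

Here is an example of an initializing function.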

#include <ruby.h>
void
Init_dbm(void)
{
    /* define DBM class */
    VALUE cDBM = rb_define_class("DBM", rb_cObject);
    /* Redefine DBM.allocate */
    rb_define_alloc_func(cDBM, fdbm_alloc);
    /* DBM includes Enumerable module */
    rb_include_module(cDBM, rb_mEnumerable);

    /* DBM has class method open(): arguments are received as C array */
    rb_define_singleton_method(cDBM, "open", fdbm_s_open, -1);

    /* DBM instance method close(): no args */
    rb_define_method(cDBM, "close", fdbm_close, 0);
    /* DBM instance method []: 1 argument */
    rb_define_method(cDBM, "[]", fdbm_aref, 1);

    /* ... */

    /* ID for an instance variable to store DBM data */
    id_dbm = rb_intern("dbm");
}

The dbm extension wraps the dbm struct in the C environment using TypedData_Make_Struct.

struct dbmdata {
    int  di_size;
    DBM *di_dbm;
};

static const rb_data_type_t dbm_type = {
    "dbm",
    {0, free_dbm, memsize_dbm,},
    0, 0,
    RUBY_TYPED_FREE_IMMEDIATELY,
};

static VALUE
fdbm_alloc(VALUE klass)
{
    struct dbmdata *dbmp;
    /* Allocate T_DATA object and C struct and fill struct with zero bytes */
    return TypedData_Make_Struct(klass, struct dbmdata, &dbm_type, dbmp);
}

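This code wraps the dbmdata structure into a Ruby object. We avoid wrapping DBM* directly, because we want to cache size information. Since Object.allocate allocates an ordinary T_OBJECT type (instead of T_DATA), it is important to either override it with rb_define_alloc_func() or delete it with rb_undef_alloc_func().

To retrieve the dbmdata structure from a Ruby object, we define the following macro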

#define GetDBM(obj, dbmp) do {\
    TypedData_Get_Struct((obj), struct dbmdata, &dbm_type, (dbmp));\
    if ((dbmp) == 0) closed_dbm();\
    if ((dbmp)->di_dbm == 0) closed_dbm();\
} while (0)

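This sort of complicated macro does the retrieving and the check for a closed DBM.

There are three ways to receive method arguments. First, methods with a fixed number of arguments receive their arguments like this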

static VALUE
fdbm_aref(VALUE obj, VALUE keystr)
{
    struct dbmdata *dbmp;
    GetDBM(obj, dbmp);
    /* Use dbmp to access the key */
    dbm_fetch(dbmp->di_dbm, StringValueCStr(keystr));
    /* ... */
}

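The first argument of the C function is self, and the rest are the arguments to the method.

Second, methods with an arbitrary number of arguments receive their arguments like this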

static VALUE
fdbm_s_open(int argc, VALUE *argv, VALUE klass)
{
    /* ... */
    if (rb_scan_args(argc, argv, "11", &file, &vmode) == 1) {
        mode = 0666;          /* default value */
    }
    /* ... */
}

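The first argument is the number of method arguments, the second argument is the C array of the method arguments, and the third argument is the receiver of the method.

You can use the function rb_scan_args() to check and retrieve the arguments. Its third argument is a string that specifies how to capture the method arguments and assign them to the following VALUE references.

You can check just the number of arguments with rb_check_arity(); this is handy when you want to treat the arguments as a list.

The following is an example of a method that takes its arguments in a Ruby array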

static VALUE
thread_initialize(VALUE thread, VALUE args)
{
    /* ... */
}

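The first argument is the receiver, and the second is the Ruby array that contains the arguments to the method.

Notice: GC should know about global variables that refer to Ruby objects but are not exported to the Ruby world. You need to protect them with

void rb_global_variable(VALUE *var)

or the objects themselves with

void rb_gc_register_mark_object(VALUE object)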

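Preparing extconf.rb

If a file named extconf.rb exists, it is executed to generate the Makefile.

extconf.rb is the file for checking compilation conditions and so on. You need to put

require 'mkmf'

at the top of the file. You can use the functions below to check various conditions.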

append_cppflags(array-of-flags[, opt]): append each flag to $CPPFLAGS if usable
append_cflags(array-of-flags[, opt]): append each flag to $CFLAGS if usable
append_ldflags(array-of-flags[, opt]): append each flag to $LDFLAGS if usable
have_macro(macro[, headers[, opt]]): check whether macro is defined
have_library(lib[, func[, headers[, opt]]]): check whether library containing function exists
find_library(lib[, func, *paths]): find library from paths
have_func(func[, headers[, opt]]): check whether function exists
have_var(var[, headers[, opt]]): check whether variable exists
have_header(header[, preheaders[, opt]]): check whether header file exists
find_header(header, *paths): find header from paths
have_framework(fw): check whether framework exists (for MacOS X)
have_struct_member(type, member[, headers[, opt]]): check whether struct has member
have_type(type[, headers[, opt]]): check whether type exists
find_type(type, opt, *headers): check whether type exists in headers
have_const(const[, headers[, opt]]): check whether constant is defined
check_sizeof(type[, headers[, opts]]): check size of type
check_signedness(type[, headers[, opts]]): check signedness of type
convertible_int(type[, headers[, opts]]): find convertible integer type
find_executable(bin[, path]): find executable file path
create_header(header): generate configured header
create_makefile(target[, target_prefix]): generate Makefile

See MakeMakefile for full documentation of these functions.

The values of the variables below affect the Makefile.

$CFLAGS: included in CFLAGS make variable (such as -O)
$CPPFLAGS: included in CPPFLAGS make variable (such as -I, -D)
$LDFLAGS: included in LDFLAGS make variable (such as -L)
$objs: list of object file names

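Compiler/linker flags are usually not portable; you should use append_cppflags, append_cflags and append_ldflags, respectively, instead of appending to the above variables directly.

The list of object files is normally generated automatically by searching the source files, but it must be defined explicitly if any sources will be generated while building.

If a compilation condition is not fulfilled, you should not call "create_makefile". The Makefile will not be generated and compilation will not be done.

Preparing depend (optional)

If a file named depend exists, the Makefile will include that file to check dependencies. You can make this file by invoking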

% gcc -MM *.c > depend

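It is harmless, so prepare it.

Generating the Makefile

Try generating the Makefile with

ruby extconf.rb

If the library should be installed under the vendor_ruby directory instead of the site_ruby directory, use the --vendor option as follows.

ruby extconf.rb --vendor

If you put the extension library under the ext directory of the ruby source tree, this step is not needed. In that case, compilation of the interpreter will do this step for you.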

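Running make

Type

make

to compile your extension. You do not need this step either if you have put the extension library under the ext directory of the ruby source tree.

Debugging

You may need to debug the extension. Extensions can be linked statically by adding the directory name to the ext/Setup file, so that you can inspect the extension with a debugger.

Done! Now you have the extension library

You can do anything you want with your library. The author of Ruby will not claim any restrictions on your code depending on the Ruby API. Feel free to use, modify, distribute or sell your program.

Everything under $repo_root/include/ruby is installed with make install. It should be included from C extensions via #include <ruby.h>. All symbols are public API, with the exception of symbols prefixed with rbimpl_ or RBIMPL_. Those are implementation details and should not be used by C extensions.

Only the $repo_root/include/ruby/*.h files whose corresponding macros are defined in the $repo_root/include/ruby.h header may be #include-d by C extensions.

Header files under $repo_root/internal/ or directly under the root, $repo_root/*.h, are not installed by make install. They are internal headers with only internal APIs.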

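Ruby language core

class.c: classes and modules
error.c: exception classes and exception mechanism
gc.c: memory management
load.c: library loading
object.c: objects
variable.c: variables and constants

Ruby syntax parser

parse.y: grammar definition
parse.c: automatically generated from parse.y
defs/keywords: reserved keywords
lex.c: automatically generated from keywords

Ruby evaluator (a.k.a. YARV)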

compile.c
eval.c
eval_error.c
eval_jump.c
eval_safe.c
insns.def           : definition of VM instructions
iseq.c              : implementation of VM::ISeq
thread.c            : thread management and context switching
thread_win32.c      : thread implementation
thread_pthread.c    : ditto
vm.c
vm_dump.c
vm_eval.c
vm_exec.c
vm_insnhelper.c
vm_method.c

defs/opt_insns_unif.def  : instruction unification
defs/opt_operand.def     : definitions for optimization

  -> insn*.inc           : automatically generated
  -> opt*.inc            : automatically generated
  -> vm.inc              : automatically generated

Regular expression engine (Onigumo)

regcomp.c
regenc.c
regerror.c
regexec.c
regparse.c
regsyntax.c

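Utility functions

debug.c: debug symbols for the C debugger
dln.c: dynamic loading
st.c: general purpose hash table
strftime.c: formatting times
util.c: misc utilities

Ruby interpreter implementation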

dmyext.c
dmydln.c
dmyencoding.c
id.c
inits.c
main.c
ruby.c
version.c

gem_prelude.rb
prelude.rb

Class library

array.c: Array
bignum.c: Bignum
compar.c: Comparable
complex.c: Complex
cont.c: Fiber, Continuation
dir.c: Dir
enum.c: Enumerable
enumerator.c: Enumerator
file.c: File
hash.c: Hash
io.c: IO
marshal.c: Marshal
math.c: Math
numeric.c: Numeric, Integer, Fixnum, Float
pack.c: Array#pack, String#unpack
proc.c: Binding, Proc
process.c: Process
random.c: random number
range.c: Range
rational.c: Rational
re.c: Regexp, MatchData
signal.c: Signal
sprintf.c: String#sprintf
string.c: String
struct.c: Struct
time.c: Time
defs/known_errors.def: Errno::* exception classes
-> known_errors.inc: automatically generated

Multilingualization

encoding.c: Encoding
transcode.c: Encoding::Converter
enc/*.c: encoding classes
enc/trans/*: codepoint mapping tables

goruby interpreter implementation

goruby.c
golf_prelude.rb     : goruby specific libraries.
  -> golf_prelude.c : automatically generated

Appendix B. Ruby extension API reference

Types

VALUE: The type for the Ruby object. Actual structures are defined in ruby.h, such as struct RString, etc. To refer to the values in structures, use casting macros like RSTRING(obj).

Variables and constants

Qnil: nil object
Qtrue: true object (default true value)
Qfalse: false object

C pointer wrapping

Data_Wrap_Struct(VALUE klass, void (*mark)(), void (*free)(), void *sval): Wrap a C pointer into a Ruby object. If the object has references to other Ruby objects, they should be marked by the mark function during the GC process. Otherwise, mark should be 0. When this object is no longer referenced from anywhere, the pointer is discarded by the free function.
Data_Make_Struct(klass, type, mark, free, sval): This macro allocates memory using malloc(), assigns it to the variable sval, and returns the DATA encapsulating the pointer to memory region.
Data_Get_Struct(data, type, sval): This macro retrieves the pointer value from DATA, and assigns it to the variable sval.

Checking VALUE types

RB_TYPE_P(value, type): Is value an internal type (T_NIL, T_FIXNUM, etc.)?
TYPE(value): Internal type (T_NIL, T_FIXNUM, etc.)
FIXNUM_P(value): Is value a Fixnum?
NIL_P(value): Is value nil?
RB_INTEGER_TYPE_P(value): Is value an Integer?
RB_FLOAT_TYPE_P(value): Is value a Float?
void Check_Type(VALUE value, int type): Ensures value is of the given internal type or raises a TypeError

VALUE type conversion

FIX2INT(value), INT2FIX(i): Fixnum <-> integer
FIX2LONG(value), LONG2FIX(l): Fixnum <-> long
NUM2INT(value), INT2NUM(i): Numeric <-> integer
NUM2UINT(value), UINT2NUM(ui): Numeric <-> unsigned integer
NUM2LONG(value), LONG2NUM(l): Numeric <-> long
NUM2ULONG(value), ULONG2NUM(ul): Numeric <-> unsigned long
NUM2LL(value), LL2NUM(ll): Numeric <-> long long
NUM2ULL(value), ULL2NUM(ull): Numeric <-> unsigned long long
NUM2OFFT(value), OFFT2NUM(off): Numeric <-> off_t
NUM2SIZET(value), SIZET2NUM(size): Numeric <-> size_t
NUM2SSIZET(value), SSIZET2NUM(ssize): Numeric <-> ssize_t
rb_integer_pack(value, words, numwords, wordsize, nails, flags), rb_integer_unpack(words, numwords, wordsize, nails, flags): Numeric <-> Arbitrary size integer buffer
NUM2DBL(value): Numeric -> double
rb_float_new(f): double -> Float
RSTRING_LEN(str): String -> length of String data in bytes
RSTRING_PTR(str): String -> pointer to String data. Note that the result pointer may not be NUL-terminated.
StringValue(value): Object with #to_str -> String
StringValuePtr(value): Object with #to_str -> pointer to String data
StringValueCStr(value): Object with #to_str -> pointer to String data without NUL bytes. The result is guaranteed to be NUL-terminated.
rb_str_new2(s): char * -> String

Defining classes and modules

VALUE rb_define_class(const char *name, VALUE super): Defines a new Ruby class as a subclass of super.
VALUE rb_define_class_under(VALUE module, const char *name, VALUE super): Creates a new Ruby class as a subclass of super, under the module’s namespace.
VALUE rb_define_module(const char *name): Defines a new Ruby module.
VALUE rb_define_module_under(VALUE module, const char *name): Defines a new Ruby module under the module’s namespace.
void rb_include_module(VALUE klass, VALUE module): Includes module into class. If the class already includes it, the call is ignored.
void rb_extend_object(VALUE object, VALUE module): Extend the object with the module’s attributes.

Defining global variables

void rb_define_variable(const char *name, VALUE *var)

Defines a global variable which is shared between C and Ruby. If name contains a character which is not allowed to be part of the symbol, it can’t be seen from Ruby programs.

void rb_define_readonly_variable(const char *name, VALUE *var)

Defines a read-only global variable. Works just like rb_define_variable(), except the defined variable is read-only.

void rb_define_virtual_variable(const char *name, VALUE (*getter)(), void (*setter)())

Defines a virtual variable, whose behavior is defined by a pair of C functions. The getter function is called when the variable is referenced. The setter function is called when the variable is set to a value. The prototypes for the getter/setter functions are

VALUE getter(ID id)
void setter(VALUE val, ID id)

The getter function must return the value for the access.

void rb_define_hooked_variable(const char *name, VALUE *var, VALUE (*getter)(), void (*setter)())

Defines a hooked variable. It is a virtual variable backed by a C variable. The getter is called as

VALUE getter(ID id, VALUE *var)

returning a new value. The setter is called as

void setter(VALUE val, ID id, VALUE *var)

void rb_global_variable(VALUE *var)

Tells GC to protect a C global variable that holds a Ruby value, so that the value is marked.

void rb_gc_register_mark_object(VALUE object)

Tells GC to protect the object, which may not be referenced anywhere.

Constant definition

void rb_define_const(VALUE klass, const char *name, VALUE val)

Defines a new constant under the class/module.

void rb_define_global_const(const char *name, VALUE val)

Defines a global constant. This is just the same as

rb_define_const(rb_cObject, name, val)

Method definition

rb_define_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc)

Defines a method for the class. func is the function pointer. argc is the number of arguments. If argc is -1, the function will receive 3 arguments: argc, argv, and self. If argc is -2, the function will receive 2 arguments, self and args, where args is a Ruby array of the method arguments.

rb_define_private_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc)

Defines a private method for the class. Arguments are same as rb_define_method().

rb_define_singleton_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc)

Defines a singleton method. Arguments are same as rb_define_method().

rb_check_arity(int argc, int min, int max)

Checks that the number of arguments, argc, is in the range min..max. If max is UNLIMITED_ARGUMENTS, the upper bound is not checked. If argc is out of bounds, an ArgumentError is raised.

rb_scan_args(int argc, VALUE *argv, const char *fmt, ...)

Retrieves arguments from argc and argv into the given VALUE references according to the format string. The format can be described in ABNF as follows

scan-arg-spec  := param-arg-spec [keyword-arg-spec] [block-arg-spec]

param-arg-spec := pre-arg-spec [post-arg-spec] / post-arg-spec /
                  pre-opt-post-arg-spec
pre-arg-spec   := num-of-leading-mandatory-args [num-of-optional-args]
post-arg-spec  := sym-for-variable-length-args
                  [num-of-trailing-mandatory-args]
pre-opt-post-arg-spec := num-of-leading-mandatory-args num-of-optional-args
                         num-of-trailing-mandatory-args
keyword-arg-spec := sym-for-keyword-arg
block-arg-spec := sym-for-block-arg

num-of-leading-mandatory-args  := DIGIT ; The number of leading
                                        ; mandatory arguments
num-of-optional-args           := DIGIT ; The number of optional
                                        ; arguments
sym-for-variable-length-args   := "*"   ; Indicates that variable
                                        ; length arguments are
                                        ; captured as a ruby array
num-of-trailing-mandatory-args := DIGIT ; The number of trailing
                                        ; mandatory arguments
sym-for-keyword-arg            := ":"   ; Indicates that keyword
                                        ; argument captured as a hash.
                                        ; If keyword arguments are not
                                        ; provided, returns nil.
sym-for-block-arg              := "&"   ; Indicates that an iterator
                                        ; block should be captured if
                                        ; given

For example, “12” means that the method requires at least one argument, and at most receives three (1+2) arguments. So, the format string must be followed by three variable references, which are to be assigned to captured arguments. For omitted arguments, variables are set to Qnil. NULL can be put in place of a variable reference, which means the corresponding captured argument(s) should be just dropped.

The number of given arguments, excluding an option hash or iterator block, is returned.
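To make the arity rule concrete, here is a standalone sketch that parses only the leading part of a format string and decides whether a given argument count is acceptable. It is a simplified model, not rb_scan_args() itself: model_arity, model_parse_fmt and model_accepts are hypothetical names, and trailing mandatory arguments, the keyword marker and the block marker from the grammar above are deliberately ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

struct model_arity {
    int required;   /* leading mandatory arguments */
    int optional;   /* optional arguments */
    bool rest;      /* true when "*" captures any extra arguments */
};

/* Parses only the leading "<required>[<optional>][*]" part of a format. */
struct model_arity model_parse_fmt(const char *fmt)
{
    struct model_arity a = { 0, 0, false };
    assert(fmt != NULL);
    if (isdigit((unsigned char)*fmt))
        a.required = *fmt++ - '0';
    if (isdigit((unsigned char)*fmt))
        a.optional = *fmt++ - '0';
    if (*fmt == '*')
        a.rest = true;
    return a;
}

/* True when argc arguments satisfy the parsed specification. */
bool model_accepts(struct model_arity a, int argc)
{
    if (argc < a.required)
        return false;
    return a.rest || argc <= a.required + a.optional;
}
```

Under this model "12" accepts one to three arguments, and "11", the format used by the dbm example, accepts one or two.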

rb_scan_args_kw(int kw_splat, int argc, VALUE *argv, const char *fmt, ...)

The same as rb_scan_args, except the kw_splat argument specifies whether keyword arguments are provided (instead of being determined by the call from Ruby to the C function). kw_splat should be one of the following values

RB_SCAN_ARGS_PASS_CALLED_KEYWORDS: Same behavior as rb_scan_args.
RB_SCAN_ARGS_KEYWORDS: The final argument should be a hash treated as keywords.
RB_SCAN_ARGS_LAST_HASH_KEYWORDS: Treat a final argument as keywords if it is a hash, and not as keywords otherwise.

int rb_get_kwargs(VALUE keyword_hash, const ID *table, int required, int optional, VALUE *values)

Retrieves the VALUEs bound to the keywords listed in table and stores them into values, deleting each retrieved entry from keyword_hash along the way. The first required IDs in table are mandatory, and the following optional IDs (or -optional - 1 IDs if optional is negative) are optional. If a mandatory key is not contained in keyword_hash, a “missing keyword” ArgumentError is raised. If an optional key is not present in keyword_hash, the corresponding element in values is set to Qundef. If optional is negative, the remaining entries of keyword_hash are ignored; otherwise any remaining entry raises an “unknown keyword” ArgumentError.

Be warned, handling keyword arguments in the C API is less efficient than handling them in Ruby. Consider using a Ruby wrapper method around a non-keyword C function. ref: bugs.ruby-lang.org/issues/11339

VALUE rb_extract_keywords(VALUE *original_hash)

Extracts pairs whose key is a symbol into a new hash from a hash object referred by original_hash. If the original hash contains non-symbol keys, then they are copied to another hash and the new hash is stored through original_hash, else 0 is stored.

Invoking Ruby method

VALUE rb_funcall(VALUE recv, ID mid, int narg, …): Invokes a method. To retrieve mid from a method name, use rb_intern(). Able to call even private/protected methods.
VALUE rb_funcall2(VALUE recv, ID mid, int argc, VALUE *argv)
VALUE rb_funcallv(VALUE recv, ID mid, int argc, VALUE *argv): Invokes a method, passing arguments as an array of values. Able to call even private/protected methods.
VALUE rb_funcallv_kw(VALUE recv, ID mid, int argc, VALUE *argv, int kw_splat): Same as rb_funcallv, using kw_splat to determine whether keyword arguments are passed.
VALUE rb_funcallv_public(VALUE recv, ID mid, int argc, VALUE *argv): Invokes a method, passing arguments as an array of values. Able to call only public methods.
VALUE rb_funcallv_public_kw(VALUE recv, ID mid, int argc, VALUE *argv, int kw_splat): Same as rb_funcallv_public, using kw_splat to determine whether keyword arguments are passed.
VALUE rb_funcall_passing_block(VALUE recv, ID mid, int argc, const VALUE* argv): Same as rb_funcallv_public, except it passes the currently active block as the block when calling the method.
VALUE rb_funcall_passing_block_kw(VALUE recv, ID mid, int argc, const VALUE* argv, int kw_splat): Same as rb_funcall_passing_block, using kw_splat to determine whether keyword arguments are passed.
VALUE rb_funcall_with_block(VALUE recv, ID mid, int argc, const VALUE *argv, VALUE passed_procval): Same as rb_funcallv_public, except passed_procval specifies the block to pass to the method.
VALUE rb_funcall_with_block_kw(VALUE recv, ID mid, int argc, const VALUE *argv, VALUE passed_procval, int kw_splat): Same as rb_funcall_with_block, using kw_splat to determine whether keyword arguments are passed.
VALUE rb_eval_string(const char *str): Compiles and executes the string as a Ruby program.
ID rb_intern(const char *name): Returns ID corresponding to the name.
char *rb_id2name(ID id): Returns the name corresponding to the ID.
char *rb_class2name(VALUE klass): Returns the name of the class.
int rb_respond_to(VALUE obj, ID id): Returns true if the object responds to the message specified by id.

Instance variables

VALUE rb_iv_get(VALUE obj, const char *name): Retrieve the value of the instance variable. If the name is not prefixed by ‘@’, that variable shall be inaccessible from Ruby.
VALUE rb_iv_set(VALUE obj, const char *name, VALUE val): Sets the value of the instance variable.

Control structure

VALUE rb_block_call(VALUE recv, ID mid, int argc, VALUE * argv, VALUE (*func) (ANYARGS), VALUE data2)

Calls a method on the recv, with the method name specified by the symbol mid, with argc arguments in argv, supplying func as the block. When func is called as the block, it will receive the value from yield as the first argument, and data2 as the second argument. When yielded with multiple values (in C, rb_yield_values(), rb_yield_values2() and rb_yield_splat()), data2 is packed as an Array, whereas yielded values can be gotten via argc/argv of the third/fourth arguments.

VALUE rb_block_call_kw(VALUE recv, ID mid, int argc, VALUE * argv, VALUE (*func) (ANYARGS), VALUE data2, int kw_splat)

Same as rb_block_call, using kw_splat to determine whether keyword arguments are passed.

[OBSOLETE] VALUE rb_iterate(VALUE (*func1)(), VALUE arg1, VALUE (*func2)(), VALUE arg2)

Calls the function func1, supplying func2 as the block. func1 will be called with the argument arg1. func2 receives the value from yield as the first argument, arg2 as the second argument.

When rb_iterate is used in 1.9, func1 has to call some Ruby-level method. This function is obsolete since 1.9; use rb_block_call instead.

VALUE rb_yield(VALUE val)

Yields val as a single argument to the block.

VALUE rb_yield_values(int n, …)

Yields n number of arguments to the block, using one C argument per Ruby argument.

VALUE rb_yield_values2(int n, VALUE *argv)

Yields n number of arguments to the block, with all Ruby arguments in the C argv array.

VALUE rb_yield_values_kw(int n, VALUE *argv, int kw_splat)

Same as rb_yield_values2, using kw_splat to determine whether keyword arguments are passed.

VALUE rb_yield_splat(VALUE args)

Same as rb_yield_values2, except arguments are specified by the Ruby array args.

VALUE rb_yield_splat_kw(VALUE args, int kw_splat)

Same as rb_yield_splat, using kw_splat to determine whether keyword arguments are passed.

VALUE rb_rescue(VALUE (*func1)(ANYARGS), VALUE arg1, VALUE (*func2)(ANYARGS), VALUE arg2)

Calls the function func1, with arg1 as the argument. If an exception occurs during func1, it calls func2 with arg2 as the first argument and the exception object as the second argument. The return value of rb_rescue() is the return value from func1 if no exception occurs, from func2 otherwise.

VALUE rb_ensure(VALUE (*func1)(ANYARGS), VALUE arg1, VALUE (*func2)(ANYARGS), VALUE arg2)

Calls the function func1 with arg1 as the argument, then calls func2 with arg2 if execution terminated. The return value from rb_ensure() is that of func1 when no exception occurred.

VALUE rb_protect(VALUE (*func) (VALUE), VALUE arg, int *state)

Calls the function func with arg as the argument. If no exception occurred during func, it returns the result of func and *state is zero. Otherwise, it returns Qnil and sets *state to nonzero. If state is NULL, it is not set in either case. You have to clear the error info with rb_set_errinfo(Qnil) when ignoring the caught exception.

void rb_jump_tag(int state)

Continues the exception caught by rb_protect() and rb_eval_string_protect(). state must be the returned value from those functions. This function never returns to the caller.

void rb_iter_break()

Exits from the current innermost block. This function never returns to the caller.

void rb_iter_break_value(VALUE value)

Exits from the current innermost block with the value. The block will return the given argument value. This function never returns to the caller.

Exceptions and errors

void rb_warn(const char *fmt, …): Prints a warning message according to a printf-like format.
void rb_warning(const char *fmt, …): Prints a warning message according to a printf-like format, if $VERBOSE is true.
void rb_raise(rb_eRuntimeError, const char *fmt, …): Raises RuntimeError. The fmt is a format string just like printf().
void rb_raise(VALUE exception, const char *fmt, …): Raises a class exception. The fmt is a format string just like printf().
void rb_fatal(const char *fmt, …): Raises a fatal error, terminates the interpreter. No exception handling will be done for fatal errors, but ensure blocks will be executed.
void rb_bug(const char *fmt, …): Terminates the interpreter immediately. This function should be called under the situation caused by the bug in the interpreter. No exception handling nor ensure execution will be done.

Threading

As of Ruby 1.9, Ruby supports native 1:1 threading with one kernel thread per Ruby Thread object. Currently, there is a GVL (Global VM Lock) which prevents simultaneous execution of Ruby code. The GVL may be released by the rb_thread_call_without_gvl and rb_thread_call_without_gvl2 functions. These functions are tricky to use and are documented in thread.c; do not use them before reading the comments in thread.c.

void rb_thread_schedule(void): Give the scheduler a hint to pass execution to another thread.

Input/Output (`IO`) on a single file descriptor

int rb_io_wait_readable(int fd)

Wait indefinitely for the given FD to become readable, allowing other threads to be scheduled. Returns a true value if a read may be performed, false if there is an unrecoverable error.

int rb_io_wait_writable(int fd)

Like rb_io_wait_readable, but for writability.

int rb_wait_for_single_fd(int fd, int events, struct timeval *timeout)

Allows waiting on a single FD for one or multiple events with a specified timeout.

events is a mask of any combination of the following values

RB_WAITFD_IN - wait for readability of normal data
RB_WAITFD_OUT - wait for writability
RB_WAITFD_PRI - wait for readability of urgent data

Use a NULL timeout to wait indefinitely.

I/O multiplexing

Ruby supports I/O multiplexing based on the select(2) system call. The Linux select_tut(2) manpage <man7.org/linux/man-pages/man2/select_tut.2.html> provides a good overview on how to use select(2), and the Ruby API has analogous functions and data structures to the well-known select API. Understanding of select(2) is required to understand this section.

typedef struct rb_fdset_t

The data structure which wraps the fd_set bitmap used by select(2). This allows Ruby to use FD sets larger than that allowed by historic limitations on modern platforms.

void rb_fd_init(rb_fdset_t *)

Initializes the rb_fdset_t. It must be initialized before other rb_fd_* operations. Analogous to calling malloc(3) to allocate an fd_set.

void rb_fd_term(rb_fdset_t *)

Destroys the rb_fdset_t, releasing any memory and resources it used. It must be reinitialized using rb_fd_init before future use. Analogous to calling free(3) to release memory for an fd_set.

void rb_fd_zero(rb_fdset_t *)

Clears all FDs from the rb_fdset_t, analogous to FD_ZERO(3).

void rb_fd_set(int fd, rb_fdset_t *)

Adds a given FD to the rb_fdset_t, analogous to FD_SET(3).

void rb_fd_clr(int fd, rb_fdset_t *)

Removes a given FD from the rb_fdset_t, analogous to FD_CLR(3).

int rb_fd_isset(int fd, const rb_fdset_t *)

Returns true if a given FD is set in the rb_fdset_t, false if not. Analogous to FD_ISSET(3).

int rb_thread_fd_select(int nfds, rb_fdset_t *readfds, rb_fdset_t *writefds, rb_fdset_t *exceptfds, struct timeval *timeout)

Analogous to the select(2) system call, but allows other Ruby threads to be scheduled while waiting.

When only waiting on a single FD, favor rb_io_wait_readable, rb_io_wait_writable, or rb_wait_for_single_fd functions since they can be optimized for specific platforms (currently, only Linux).

Initialize and start the interpreter

The embedding API functions are below (not needed for extension libraries)

void ruby_init(): Initializes the interpreter.
void *ruby_options(int argc, char **argv): Processes command line arguments for the interpreter and compiles the Ruby source to execute. It returns an opaque pointer to the compiled source or an internal special value.
int ruby_run_node(void *n): Runs the given compiled source and exits this process. It returns EXIT_SUCCESS if the source runs successfully. Otherwise, it returns another value.
void ruby_script(char *name): Specifies the name of the script ($0).

Hooks for the interpreter events

void rb_add_event_hook(rb_event_hook_func_t func, rb_event_flag_t events, VALUE data)

Adds a hook function for the specified interpreter events. events should be OR’ed value of

RUBY_EVENT_LINE
RUBY_EVENT_CLASS
RUBY_EVENT_END
RUBY_EVENT_CALL
RUBY_EVENT_RETURN
RUBY_EVENT_C_CALL
RUBY_EVENT_C_RETURN
RUBY_EVENT_RAISE
RUBY_EVENT_ALL

The definition of rb_event_hook_func_t is below

typedef void (*rb_event_hook_func_t)(rb_event_flag_t event, VALUE data,
                                     VALUE self, ID id, VALUE klass)

The third argument ‘data’ to rb_add_event_hook() is passed to the hook function as the second argument, which was the pointer to the current NODE in 1.8. See RB_EVENT_HOOKS_HAVE_CALLBACK_DATA below.

int rb_remove_event_hook(rb_event_hook_func_t func)

Removes the specified hook function.

Memory usage

void rb_gc_adjust_memory_usage(ssize_t diff): Adjusts the amount of registered external memory. You can tell GC how much memory is used by an external library by this function. Calling this function with positive diff means the memory usage is increased; new memory block is allocated or a block is reallocated as larger size. Calling this function with negative diff means the memory usage is decreased; a memory block is freed or a block is reallocated as smaller size. This function may trigger the GC.

Macros for compatibility

Some macros to check API compatibilities are available by default.

NORETURN_STYLE_NEW

Means that NORETURN macro is functional style instead of prefix.

HAVE_RB_DEFINE_ALLOC_FUNC

Means that function rb_define_alloc_func() is provided, that means the allocation framework is used. This is the same as the result of have_func("rb_define_alloc_func", "ruby.h").

HAVE_RB_REG_NEW_STR

Means that function rb_reg_new_str() is provided, that creates Regexp object from String object. This is the same as the result of have_func("rb_reg_new_str", "ruby.h").

HAVE_RB_IO_T

Means that type rb_io_t is provided.

USE_SYMBOL_AS_METHOD_NAME

Means that Symbols will be returned as method names, e.g., Module#methods, #singleton_methods and so on.

HAVE_RUBY_*_H

Defined in ruby.h and means corresponding header is available. For instance, when HAVE_RUBY_ST_H is defined you should use ruby/st.h not mere st.h.

Header files corresponding to these macros may be included directly from extension libraries.

RB_EVENT_HOOKS_HAVE_CALLBACK_DATA

Means that rb_add_event_hook() takes the third argument ‘data’, to be passed to the given event hook function.

Defining backward compatible macros for keyword argument functions

Most ruby C extensions are designed to support multiple Ruby versions. In order to correctly support Ruby 2.7+ in regards to keyword argument separation, C extensions need to use *_kw functions. However, these functions do not exist in Ruby 2.6 and below, so in those cases macros should be defined to allow you to use the same code on multiple Ruby versions. Here are example macros you can use in extensions that support Ruby 2.6 (or below) when using the *_kw functions introduced in Ruby 2.7.

#ifndef RB_PASS_KEYWORDS
/* Only define macros on Ruby <2.7 */
#define rb_funcallv_kw(o, m, c, v, kw) rb_funcallv(o, m, c, v)
#define rb_funcallv_public_kw(o, m, c, v, kw) rb_funcallv_public(o, m, c, v)
#define rb_funcall_passing_block_kw(o, m, c, v, kw) rb_funcall_passing_block(o, m, c, v)
#define rb_funcall_with_block_kw(o, m, c, v, b, kw) rb_funcall_with_block(o, m, c, v, b)
#define rb_scan_args_kw(kw, c, v, s, ...) rb_scan_args(c, v, s, __VA_ARGS__)
#define rb_call_super_kw(c, v, kw) rb_call_super(c, v)
#define rb_yield_values_kw(c, v, kw) rb_yield_values2(c, v)
#define rb_yield_splat_kw(a, kw) rb_yield_splat(a)
#define rb_block_call_kw(o, m, c, v, f, p, kw) rb_block_call(o, m, c, v, f, p)
#define rb_fiber_resume_kw(o, c, v, kw) rb_fiber_resume(o, c, v)
#define rb_fiber_yield_kw(c, v, kw) rb_fiber_yield(c, v)
#define rb_enumeratorize_with_size_kw(o, m, c, v, f, kw) rb_enumeratorize_with_size(o, m, c, v, f)
#define SIZED_ENUMERATOR_KW(obj, argc, argv, size_fn, kw_splat) \
    rb_enumeratorize_with_size((obj), ID2SYM(rb_frame_this_func()), \
                               (argc), (argv), (size_fn))
#define RETURN_SIZED_ENUMERATOR_KW(obj, argc, argv, size_fn, kw_splat) do { \
        if (!rb_block_given_p())                                            \
            return SIZED_ENUMERATOR(obj, argc, argv, size_fn);              \
    } while (0)
#define RETURN_ENUMERATOR_KW(obj, argc, argv, kw_splat) RETURN_SIZED_ENUMERATOR(obj, argc, argv, 0)
#define rb_check_funcall_kw(o, m, c, v, kw) rb_check_funcall(o, m, c, v)
#define rb_obj_call_init_kw(o, c, v, kw) rb_obj_call_init(o, c, v)
#define rb_class_new_instance_kw(c, v, k, kw) rb_class_new_instance(c, v, k)
#define rb_proc_call_kw(p, a, kw) rb_proc_call(p, a)
#define rb_proc_call_with_block_kw(p, c, v, b, kw) rb_proc_call_with_block(p, c, v, b)
#define rb_method_call_kw(c, v, m, kw) rb_method_call(c, v, m)
#define rb_method_call_with_block_kw(c, v, m, b, kw) rb_method_call_with_block(c, v, m, b)
#define rb_eval_cmd_kw(c, a, kw) rb_eval_cmd(c, a, 0)
#endif

Appendix C. Functions available for use in extconf.rb

See documentation for mkmf.

Appendix D. Generational `GC`

Ruby 2.1 introduced a generational garbage collector (called RGenGC). RGenGC (mostly) keeps compatibility.

Generally, generational GC requires extension libraries to use a technique called write barriers (en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29). RGenGC, however, works fine without write barriers in extension libraries.

If your library adheres to the following tips, performance can be further improved. Especially, the “Don’t touch pointers directly” section is important.

Incompatibility

You can’t write to the RBASIC(obj)->klass field directly because it is a const value now.

Basically you should not write this field because MRI expects it to be an immutable field, but if you want to do it in your extension you can use the following functions

VALUE rb_obj_hide(VALUE obj): Clear RBasic::klass field. The object will be an internal object. ObjectSpace::each_object can’t find this object.
VALUE rb_obj_reveal(VALUE obj, VALUE klass): Reset RBasic::klass to be klass. We expect that ‘klass’ is the class hidden by rb_obj_hide().

Write barriers

RGenGC doesn’t require write barriers to support generational GC. However, caring about write barrier can improve the performance of RGenGC. Please check the following tips.

Don’t touch pointers directly

In MRI (include/ruby/ruby.h), some macros to acquire pointers to the internal data structures are supported such as RARRAY_PTR(), RSTRUCT_PTR() and so on.

DO NOT USE THESE MACROS and instead use the corresponding C-APIs such as rb_ary_aref(), rb_ary_store() and so on.

Consider whether to insert write barriers

You don’t need to care about write barriers if you only use built-in types.

If you support T_DATA objects, you may consider using write barriers.

Inserting write barriers into T_DATA objects only works with the following type objects: (a) long-lived objects, (b) when a huge number of objects are generated and (c) container-type objects that have references to other objects. If your extension provides such a type of T_DATA objects, consider inserting write barriers.

(a): short-lived objects don’t become old generation objects. (b): only a few oldgen objects don’t have performance impact. (c): only a few references don’t have performance impact.

Inserting write barriers is a very difficult hack, and it is easy to introduce critical bugs. And inserting write barriers has several areas of overhead. Basically we don’t recommend you insert write barriers. Please carefully consider the risks.

Combine with built-in types

Please consider utilizing built-in types. Most built-in types support write barrier, so you can use them to avoid manually inserting write barriers.

For example, if your T_DATA has references to other objects, then you can move these references to an Array. The T_DATA object then only has a reference to an array object. Or you can also use a Struct object to gather a T_DATA object (without any references) and an Array that contains the references.

With use of such techniques, you don’t need to insert write barriers anymore.

Insert write barriers

[AGAIN] Inserting write barriers is a very difficult hack, and it is easy to introduce critical bugs. And inserting write barriers has several areas of overhead. Basically we don’t recommend you insert write barriers. Please carefully consider the risks.

Before inserting write barriers, you need to know about the RGenGC algorithm (gc.c will help you). Macros and functions to insert write barriers are available in include/ruby/ruby.h. An example is available in iseq.c.

For a complete guide for RGenGC and write barriers, please refer to <bugs.ruby-lang.org/projects/ruby-master/wiki/RGenGC>.

Appendix E. RB_GC_GUARD to protect from premature `GC`

C Ruby currently uses conservative garbage collection, thus VALUE variables must remain visible on the stack or registers to ensure any associated data remains usable. Optimizing C compilers are not designed with conservative garbage collection in mind, so they may optimize away the original VALUE even if the code depends on data associated with that VALUE.

The following example illustrates the use of RB_GC_GUARD to ensure the contents of sptr remain valid while the second invocation of rb_str_new_cstr is running.

VALUE s, w;
const char *sptr;

s = rb_str_new_cstr("hello world!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!");
sptr = RSTRING_PTR(s);
w = rb_str_new_cstr(sptr + 6); /* Possible GC invocation */

RB_GC_GUARD(s); /* ensure s (and thus sptr) do not get GC-ed */

In the above example, RB_GC_GUARD must be placed after the last use of sptr. Placing RB_GC_GUARD before dereferencing sptr would be of no use. RB_GC_GUARD is only effective on the VALUE data type, not converted C data types.

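If a non-inlined function is called on the ‘s’ VALUE after sptr is dereferenced in the above example, then RB_GC_GUARD is completely unnecessary. Thus, in the above example, calling any non-inlined function on ‘s’, such as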

rb_str_modify(s);

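will ensure that ‘s’ stays on the stack or in a register, preventing a GC invocation from freeing it prematurely.

Using the RB_GC_GUARD macro is preferable to using the “volatile” keyword in C. RB_GC_GUARD has the following advantages

The intent of the macro use is clear.
RB_GC_GUARD only affects its call site, whereas “volatile” generates some extra code every time the variable is used, hurting optimization.
“volatile” implementations may be buggy or inconsistent in some compilers and architectures. RB_GC_GUARD is customizable for broken systems/compilers without negatively affecting other systems.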

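Appendix F. `Ractor` support

Ractor is the parallel execution mechanism introduced in Ruby 3.0. All Ractors can run in parallel on different OS threads (using threads provided by the underlying system), so a C extension should be thread-safe. A C extension that can run in multiple Ractors is called “Ractor-safe”.

Ractor safety around C extensions has the following properties

By default, all C extensions are recognized as Ractor-unsafe.
Ractor-unsafe C methods may only be called from the main Ractor. If invoked by a non-main Ractor, a Ractor::UnsafeError is raised.
If an extension wants to be marked as Ractor-safe, it should call rb_ext_ractor_safe(true) in the extension's Init_ function, and all defined methods will be marked as Ractor-safe.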

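To make a C extension “Ractor-safe”, we need to check the following points

Do not share unshareable objects between Ractors

For example, a C global variable can lead to sharing an unshareable object between Ractors.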
```
VALUE g_var;
VALUE set(VALUE self, VALUE v){ return g_var = v; }
VALUE get(VALUE self){ return g_var; }
```
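The set() and get() pair can share an unshareable object through g_var, which is Ractor-unsafe.

Besides direct use of global variables, some indirect data structures such as a global st_table can also share objects, so please take care.

Note that class and module objects are shareable objects, so you can keep code such as “cFoo = rb_define_class(…)” that stores them in C global variables.

Check the thread-safety of the extension

An extension should be thread-safe. For example, the following code is not thread-safe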
```
bool g_called = false;
VALUE call(VALUE self) {
  if (g_called) rb_raise(rb_eRuntimeError, "recursive call is not allowed.");
  g_called = true;
  VALUE ret = do_something();
  g_called = false;
  return ret;
}
```
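because access to the g_called global variable must be synchronized with the threads of other Ractors. To avoid such a data race, some synchronization should be used. See include/ruby/thread_native.h and include/ruby/atomic.h.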

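With Ractors, all objects given as method arguments, as well as the receiver (self), are guaranteed to be either from the current Ractor or shareable. As a consequence, it is easier to make code Ractor-safe than to make code generally thread-safe. For example, we don’t need to lock an array object to access its elements.

Check the thread-safety of any used library

If the extension relies on an external library, such as a function foo() from a library libfoo, the function libfoo foo() should be thread-safe.

Make an object shareable

This is not required to make an extension Ractor-safe.

If an extension provides special objects defined by rb_data_type_t, consider whether these objects can become shareable.

The RUBY_TYPED_FROZEN_SHAREABLE flag indicates that these objects are shareable if they are frozen. This means that if the object is frozen, mutation of the wrapped data is not allowed.

Others

There may be other points or requirements that must be considered when making a Ractor-safe extension. This document will be extended as they are discovered.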
