neovim/runtime/doc/dev_style.txt
Justin M. Keyes 01b6bff7e9 docs: news
Set dev_xx.txt help files to use "flow" layout.
2024-05-15 23:19:26 +02:00

*dev_style.txt* Nvim
NVIM REFERENCE MANUAL
Nvim style guide *dev-style*
Style guidelines for developers working on Nvim's source code.
License: CC-By 3.0 https://creativecommons.org/licenses/by/3.0/
Type |gO| to see the table of contents.
==============================================================================
Background
One way in which we keep the code base manageable is by enforcing consistency.
It is very important that any programmer be able to look at another's code and
quickly understand it.
Maintaining a uniform style and following conventions means that we can more
easily use "pattern-matching" to infer what various symbols are and what
invariants are true about them. Creating common, required idioms and patterns
makes code much easier to understand.
In some cases there might be good arguments for changing certain style rules,
but we nonetheless keep things as they are in order to preserve consistency.
==============================================================================
Header Files *dev-style-header*
Header guard ~
All header files should start with `#pragma once` to prevent multiple inclusion.
In foo/bar.h:
>c
#pragma once
<
Headers system ~
Nvim uses two types of headers. There are "normal" headers and "defs" headers.
Typically, each normal header will have a corresponding defs header, e.g.
`fileio.h` and `fileio_defs.h`. This split minimizes recompilation on change,
because adding a function or modifying a function's signature happens more
frequently than changing a type.
The goal is to achieve the following:
- All headers (defs and normal) must include only defs headers, system
headers, and generated declarations. In other words, headers must not
include normal headers.
- Source (.c) files may include all headers, but should only include normal
headers if they need symbols and not types.
Use the following guideline to determine what to put where:
Symbols:
- regular function declarations
- `extern` variables (including the `EXTERN` macro)
Non-symbols:
- macros, i.e. `#define`.
- static inline functions, but only if the function declaration has a
`REAL_FATTR_ALWAYS_INLINE` attribute.
- typedefs
- structs
- enums
- All symbols must be moved to normal headers.
- Non-symbols used by multiple headers should be moved to defs headers. This
is to ensure headers only include defs headers. Conversely, non-symbols used
by only a single header should be moved to that header.
- EXCEPTION: if the macro calls a function, then it must be moved to a normal
header.
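As a rough sketch of the split, assume a hypothetical `foo` module. The names
`FooState`, `FOO_MAX_LEN`, and `foo_is_full` are invented for illustration,
and the three files are shown in a single block:

```c
// --- foo_defs.h (hypothetical): non-symbols only ---
// Each real header would start with #pragma once.
#include <stdbool.h>
#include <stddef.h>

#define FOO_MAX_LEN 64  // Macro that calls no function: a non-symbol.

typedef struct {
  size_t len;
} FooState;

// --- foo.h (hypothetical): symbols only; includes just foo_defs.h ---
// In Nvim this declaration would come from the generated header rather
// than being written by hand.
bool foo_is_full(const FooState *state);

// --- foo.c (hypothetical): may include both headers ---
bool foo_is_full(const FooState *state)
{
  return state->len >= FOO_MAX_LEN;
}
```

Another header that only needs the `FooState` type can include `foo_defs.h`
and is not recompiled when a function is added to `foo.h`.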
==============================================================================
Scoping *dev-style-scope*
Local Variables ~
Place a function's variables in the narrowest scope possible, and initialize
variables in the declaration.
C99 allows you to declare variables anywhere in a function. Declare them in as
local a scope as possible, and as close to the first use as possible. This
makes it easier for the reader to find the declaration and see what type the
variable is and what it was initialized to. In particular, initialization
should be used instead of declaration and assignment, e.g. >c
int i;
i = f(); // BAD: initialization separate from declaration.
int j = g(); // GOOD: declaration has initialization.
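A slightly larger sketch of the same rule, using a hypothetical helper
`count_non_negative` (not part of Nvim): each variable lives in the narrowest
scope that needs it and is initialized where it is declared.

```c
#include <stddef.h>

// Hypothetical helper: counts the non-negative entries of `values`.
size_t count_non_negative(const int *values, size_t len)
{
  size_t count = 0;  // Needed for the whole function, so declared here.
  for (size_t i = 0; i < len; i++) {  // `i` exists only inside the loop.
    int value = values[i];  // Declared and initialized at first use.
    if (value >= 0) {
      count++;
    }
  }
  return count;
}
```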
Initialization ~
Multiple declarations can be defined in one line if they aren't initialized,
but each initialization should be done on a separate line.
>c
int i;
int j; // GOOD
int i, j; // GOOD: multiple declarations, no initialization.
int i = 0;
int j = 0; // GOOD: one initialization per line.
int i = 0, j; // BAD: multiple declarations with initialization.
int i = 0, j = 0; // BAD: multiple declarations with initialization.
==============================================================================
Nvim-Specific Magic
clint ~
Use `clint.py` to detect style errors.
`src/clint.py` is a Python script that reads a source file and identifies
style errors. It is not perfect, and has both false positives and false
negatives, but it is still a valuable tool. False positives can be ignored by
putting `// NOLINT` at the end of the line.
uncrustify ~
src/uncrustify.cfg is the authority for expected code formatting, for cases
not covered by clint.py. We remove checks in clint.py if they are covered by
uncrustify rules.
==============================================================================
Other C Features *dev-style-features*
Variable-Length Arrays and alloca() ~
We do not allow variable-length arrays or `alloca()`.
Variable-length arrays can cause hard-to-detect stack overflows.
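One way to avoid a variable-length array is to allocate the buffer on the
heap. The sketch below uses a hypothetical helper `reversed_copy` and plain
`malloc()` only to stay self-contained; the project's own allocation helpers
may be the better choice in real code.

```c
#include <stdlib.h>

// Hypothetical helper: returns a heap-allocated, reversed copy of the
// first `len` bytes of `src`, or NULL if allocation fails. The caller
// must free() the result.
char *reversed_copy(const char *src, size_t len)
{
  // Not allowed: `char buf[len + 1];` would be a variable-length array
  // whose stack usage is only known at run time.
  char *buf = malloc(len + 1);
  if (buf == NULL) {
    return NULL;
  }
  for (size_t i = 0; i < len; i++) {
    buf[i] = src[len - 1 - i];
  }
  buf[len] = '\0';
  return buf;
}
```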
Postincrement and Postdecrement ~
Use postfix form (`i++`) in statements. >c
for (int i = 0; i < 3; i++) { } // GOOD: postfix form in a statement.
int j = ++i; // OK: ++i is used as an expression.
for (int i = 0; i < 3; ++i) { } // BAD: prefix form in a statement.
++i; // BAD: ++i is used as a statement.
Use of const ~
Use `const` pointers whenever possible. Avoid `const` on non-pointer parameter definitions.
Where to put the const ~
Some people favor the form `int const *foo` to `const int *foo`. They
argue that this is more readable because it's more consistent: it keeps
the rule that `const` always follows the object it's describing. However,
this consistency argument doesn't apply in codebases with few
deeply-nested pointer expressions since most `const` expressions have only
one `const`, and it applies to the underlying value. In such cases, there's
no consistency to maintain. Putting the `const` first is arguably more
readable, since it follows English in putting the "adjective" (`const`)
before the "noun" (`int`).
That said, while we encourage putting `const` first, we do not require it.
But be consistent with the code around you! >c
void foo(const char *p, int i);
int foo(const int a, const bool b) {
}
int foo(int *const p) {
}
Integer Types ~
Of the built-in integer types only use `char`, `int`, `uint8_t`, `int8_t`,
`uint16_t`, `int16_t`, `uint32_t`, `int32_t`, `uint64_t`, `int64_t`,
`uintmax_t`, `intmax_t`, `size_t`, `ssize_t`, `uintptr_t`, `intptr_t`, and
`ptrdiff_t`.
Use `int` for error codes and local, trivial variables only.
Use care when converting integer types. Integer conversions and promotions can
cause non-intuitive behavior. Note that the signedness of `char` is
implementation defined.
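A small sketch of such non-intuitive behavior, using two hypothetical helpers:
when an `int` is compared with an `unsigned`, the `int` is converted to
`unsigned` first, so a negative value turns into a very large one.

```c
#include <stdbool.h>

// Hypothetical helper: spells out the conversion that the compiler applies
// implicitly when evaluating `a < b` for an int and an unsigned.
bool naive_less(int a, unsigned b)
{
  return (unsigned)a < b;
}

// Handles the negative case explicitly before converting.
bool safe_less(int a, unsigned b)
{
  return a < 0 || (unsigned)a < b;
}
```

With these definitions, `naive_less(-1, 1)` is false although -1 is less than
1, while `safe_less(-1, 1)` is true.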
Public facing types must have fixed width (`uint8_t`, etc.).
There are no convenient `printf` format placeholders for fixed width types.
Cast to `uintmax_t` or `intmax_t` if you have to format fixed width integers.
Type           unsigned  signed
`char`         `%hhu`    `%hhd`
`int`          n/a       `%d`
`(u)intmax_t`  `%ju`     `%jd`
`(s)size_t`    `%zu`     `%zd`
`ptrdiff_t`    `%tu`     `%td`
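The casts described above might look as follows. The helper `format_sizes`
and its parameters are hypothetical; it only shows a fixed-width unsigned
value, a fixed-width signed value, and a `size_t` being formatted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: formats one value of each kind into `buf` and
// returns the result of snprintf().
int format_sizes(char *buf, size_t buflen, uint32_t id, int64_t delta, size_t len)
{
  // Fixed-width types are cast to (u)intmax_t; size_t has its own `%zu`.
  return snprintf(buf, buflen, "id=%ju delta=%jd len=%zu",
                  (uintmax_t)id, (intmax_t)delta, len);
}
```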
Booleans ~
Use `bool` to represent boolean values. >c
int loaded = 1; // BAD: loaded should have type bool.
Conditions ~
Don't use "yoda-conditions". Use at most one assignment per condition. >c
if (1 == x) { // BAD: yoda-condition.
if (x == 1) { // GOOD: use this order.
if ((x = f()) && (y = g())) { // BAD: more than one assignment in the condition.
Function declarations ~
Do not write a separate declaration for any function.
Function declarations are created by the gen_declarations.lua script. >c
static void f(void); // BAD: hand-written declaration.
static void f(void)
{
...
}
General translation unit layout ~
The definitions of public functions precede the definitions of static
functions. >c
<HEADER>
<PUBLIC FUNCTION DEFINITIONS>
<STATIC FUNCTION DEFINITIONS>
Integration with declarations generator ~
Every C file must contain an #include of its generated header file, guarded
by #ifdef INCLUDE_GENERATED_DECLARATIONS.
The include must go after other #includes and typedefs in .c files and after
everything else in header files. The #include may be omitted in a .c file
that does not contain any static functions.
The included file name consists of the .c file name without extension,
preceded by the directory name relative to src/nvim. The name of the file
containing static function declarations ends with `.c.generated.h`;
`*.h.generated.h` files contain only non-static function declarations. >c
// src/nvim/foo.c file
#include <stddef.h>
typedef int FooType;
#ifdef INCLUDE_GENERATED_DECLARATIONS
# include "foo.c.generated.h"
#endif
// src/nvim/foo.h file
#pragma once
#ifdef INCLUDE_GENERATED_DECLARATIONS
# include "foo.h.generated.h"
#endif
64-bit Portability ~
Code should be 64-bit and 32-bit friendly. Bear in mind problems of printing,
comparisons, and structure alignment.
- Remember that `sizeof(void *)` != `sizeof(int)`. Use `intptr_t` if you want
a pointer-sized integer.
- You may need to be careful with structure alignments, particularly for
structures being stored on disk. Any structure with an
`int64_t`/`uint64_t` member will by default end up being 8-byte aligned on a
64-bit system. If you have such structures being shared on disk between
32-bit and 64-bit code, you will need to ensure that they are packed the
same on both architectures. Most compilers offer a way to alter structure
alignment. For gcc, you can use `__attribute__((packed))`. MSVC offers
`#pragma pack()` and `__declspec(align())`.
- Use the `LL` or `ULL` suffixes as needed to create 64-bit constants. For
example: >c
int64_t my_value = 0x123456789LL;
uint64_t my_mask = 3ULL << 48;
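The first two points can be sketched without compiler-specific attributes.
`DiskRecord` and `roundtrip_pointer` are hypothetical names, and the layout
checks use C11 `_Static_assert`, which is an assumption about the available
compiler.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical on-disk record. The explicit `reserved` field fills the gap
// that many 64-bit ABIs would otherwise insert before `offset`, so the
// layout does not depend on the platform's alignment of uint64_t.
typedef struct {
  uint32_t id;
  uint32_t reserved;  // Explicit padding; always written as 0.
  uint64_t offset;
} DiskRecord;

// Fail the build if a compiler lays the struct out differently.
_Static_assert(sizeof(DiskRecord) == 16, "DiskRecord must be 16 bytes");
_Static_assert(offsetof(DiskRecord, offset) == 8, "offset must start at byte 8");

// Round-trips a pointer through a pointer-sized integer.
void *roundtrip_pointer(void *p)
{
  intptr_t as_int = (intptr_t)p;  // An int could be too narrow here.
  return (void *)as_int;
}
```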
sizeof ~
Prefer `sizeof(varname)` to `sizeof(type)`.
Use `sizeof(varname)` when you take the size of a particular variable.
`sizeof(varname)` will update appropriately if someone changes the variable
type either now or later. You may use `sizeof(type)` for code unrelated to any
particular variable, such as code that manages an external or internal data
format where a variable of an appropriate C type is not convenient. >c
Struct data;
memset(&data, 0, sizeof(data)); // GOOD: follows the type of data.
memset(&data, 0, sizeof(Struct)); // BAD: goes stale if the type of data changes.
if (raw_size < sizeof(int)) {
  fprintf(stderr, "compressed record not big enough for count: %ju", (uintmax_t)raw_size);
  return false;
}
==============================================================================
Naming *dev-style-naming*
The most important consistency rules are those that govern naming. The style
of a name immediately informs us what sort of thing the named entity is: a
type, a variable, a function, a constant, a macro, etc., without requiring us
to search for the declaration of that entity. The pattern-matching engine in
our brains relies a great deal on these naming rules.
Naming rules are pretty arbitrary, but we feel that consistency is more
important than individual preferences in this area, so regardless of whether
you find them sensible or not, the rules are the rules.
General Naming Rules ~
Function names, variable names, and filenames should be descriptive; eschew
abbreviation.
Give as descriptive a name as possible, within reason. Do not worry about
saving horizontal space as it is far more important to make your code
immediately understandable by a new reader. Do not use abbreviations that are
ambiguous or unfamiliar to readers outside your project, and do not abbreviate
by deleting letters within a word. >c
int price_count_reader; // No abbreviation.
int num_errors; // "num" is a widespread convention.
int num_dns_connections; // Most people know what "DNS" stands for.
int n; // Meaningless.
int nerr; // Ambiguous abbreviation.
int n_comp_conns; // Ambiguous abbreviation.
int wgc_connections; // Only your group knows what this stands for.
int pc_reader; // Lots of things can be abbreviated "pc".
int cstmr_id; // Deletes internal letters.
File Names ~
Filenames should be all lowercase and can include underscores (`_`).
Use underscores to separate words. Examples of acceptable file names: >
my_useful_file.c
getline_fix.c // OK: getline refers to the glibc function.
C files should end in `.c` and header files should end in `.h`.
Do not use filenames that already exist in `/usr/include`, such as `db.h`.
In general, make your filenames very specific. For example, use
`http_server_logs.h` rather than `logs.h`.
Type Names ~
Typedef-ed structs and enums start with a capital letter and have a capital
letter for each new word, with no underscores: `MyExcitingStruct`.
Non-Typedef-ed structs and enums are all lowercase with underscores between
words: `struct my_exciting_struct`. >c
struct my_struct {
...
};
typedef struct my_struct MyAwesomeStruct;
Variable Names ~
Variable names are all lowercase, with underscores between words. For
instance: `my_exciting_local_variable`.
Common Variable names ~
For example: >c
const char *table_name; // OK: uses underscore.
const char *tablename; // OK: all lowercase.
const char *tableName; // BAD: mixed case.
<
Struct Variables ~
Data members in structs should be named like regular variables. >c
struct url_table_properties {
  const char *name;
  int num_entries;
};
<
Global Variables ~
Don't use global variables unless absolutely necessary. Prefix global
variables with `g_`.
Constant Names ~
Use a `k` followed by mixed case: `kDaysInAWeek`.
All compile-time constants, whether they are declared locally or globally,
follow a slightly different naming convention from other variables. Use a `k`
followed by words with uppercase first letters: >c
const int kDaysInAWeek = 7;
Function Names ~
Function names are all lowercase, with underscores between words. For
instance: `my_exceptional_function()`. All functions in the same header file
should have a common prefix.
In `os_unix.h`: >c
void unix_open(const char *path);
void unix_user_id(void);
If your function crashes upon an error, you should append `or_die` to the
function name. This only applies to functions which could be used by
production code and to errors that are reasonably likely to occur during
normal operation.
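For instance, a hypothetical `parse_port_or_die` (not an Nvim function) makes
it obvious at every call site that invalid input terminates the program:

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: parses a decimal port number and aborts the
// program if `text` is not a valid port in the range 1..65535.
int parse_port_or_die(const char *text)
{
  char *end = NULL;
  const long port = strtol(text, &end, 10);
  if (end == text || *end != '\0' || port < 1 || port > 65535) {
    fprintf(stderr, "invalid port: %s\n", text);
    abort();
  }
  return (int)port;
}
```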
Enumerator Names ~
Enumerators should be named like constants: `kEnumName`. >c
enum url_table_errors {
kOK = 0,
kErrorOutOfMemory,
kErrorMalformedInput,
};
Macro Names ~
They're like this: `MY_MACRO_THAT_SCARES_CPP_DEVELOPERS`. >c
#define ROUND(x) ...
#define PI_ROUNDED 5.0
==============================================================================
Comments *dev-style-comments*
Comments are vital to keeping our code readable. The following rules describe
what you should comment and where. But remember: while comments are very
important, the best code is self-documenting.
When writing your comments, write for your audience: the next contributor who
will need to understand your code. Be generous — the next one may be you!
Nvim uses Doxygen comments.
Comment Style ~
Use the `//`-style syntax only. >c
// This is a comment spanning
// multiple lines
f();
File Comments ~
Start each file with a description of its contents.
Legal Notice ~
We have no such thing. These things are in LICENSE and only there.
File Contents ~
Every file should have a comment at the top describing its contents.
Generally a `.h` file will describe the variables and functions that are
declared in the file with an overview of what they are for and how they
are used. A `.c` file should contain more information about implementation
details or discussions of tricky algorithms. If you feel the
implementation details or a discussion of the algorithms would be useful
for someone reading the `.h`, feel free to put it there instead, but
mention in the `.c` that the documentation is in the `.h` file.
Do not duplicate comments in both the `.h` and the `.c`. Duplicated
comments diverge. >c
/// A brief description of this file.
///
/// A longer description of this file.
/// Be very generous here.
Struct Comments ~
Every struct definition should have accompanying comments that describe what
it is for and how it should be used. >c
/// Window info stored with a buffer.
///
/// Two types of info are kept for a buffer which are associated with a
/// specific window:
/// 1. Each window can have a different line number associated with a
/// buffer.
/// 2. The window-local options for a buffer work in a similar way.
/// The window-info is kept in a list at g_wininfo. It is kept in
/// most-recently-used order.
struct win_info {
/// Next entry or NULL for last entry.
WinInfo *wi_next;
/// Previous entry or NULL for first entry.
WinInfo *wi_prev;
/// Pointer to window that did the wi_fpos.
Win *wi_win;
...
};
If the field comments are short, you can also put them next to the field. But
be consistent within one struct, and follow the necessary doxygen style. >c
struct wininfo_S {
WinInfo *wi_next; ///< Next entry or NULL for last entry.
WinInfo *wi_prev; ///< Previous entry or NULL for first entry.
Win *wi_win; ///< Pointer to window that did the wi_fpos.
...
};
If you have already described a struct in detail in the comments at the top of
your file, feel free to simply state "See comment at top of file for a complete
description", but be sure to have some sort of comment.
Document the synchronization assumptions the struct makes, if any. If an
instance of the struct can be accessed by multiple threads, take extra care to
document the rules and invariants surrounding multithreaded use.
Function Comments ~
Declaration comments describe use of the function; comments at the definition
of a function describe operation.
Function Declarations ~
Every function declaration should have comments immediately preceding it
that describe what the function does and how to use it. These comments
should be descriptive ("Opens the file") rather than imperative ("Open the
file"); the comment describes the function, it does not tell the function
what to do. In general, these comments do not describe how the function
performs its task. Instead, that should be left to comments in the
function definition.
Types of things to mention in comments at the function declaration:
- If the function allocates memory that the caller must free.
- Whether any of the arguments can be a null pointer.
- If there are any performance implications of how a function is used.
- If the function is re-entrant. What are its synchronization assumptions? >c
/// Brief description of the function.
///
/// Detailed description.
/// May span multiple paragraphs.
///
/// @param arg1 Description of arg1
/// @param arg2 Description of arg2. May span
/// multiple lines.
///
/// @return Description of the return value.
Iterator *get_iterator(void *arg1, void *arg2);
<
Function Definitions ~
If there is anything tricky about how a function does its job, the
function definition should have an explanatory comment. For example, in
the definition comment you might describe any coding tricks you use, give
an overview of the steps you go through, or explain why you chose to
implement the function in the way you did rather than using a viable
alternative. For instance, you might mention why it must acquire a lock
for the first half of the function but why it is not needed for the second
half.
Note you should not just repeat the comments given with the function
declaration, in the `.h` file or wherever. It's okay to recapitulate
briefly what the function does, but the focus of the comments should be on
how it does it. >c
// Note that we don't use Doxygen comments here.
Iterator *get_iterator(void *arg1, void *arg2)
{
...
}
Variable Comments ~
In general the actual name of the variable should be descriptive enough to
give a good idea of what the variable is used for. In certain cases, more
comments are required.
Global Variables ~
All global variables should have a comment describing what they are and
what they are used for. For example: >c
/// The total number of test cases that we run
/// through in this regression test.
const int kNumTestCases = 6;
Implementation Comments ~
In your implementation you should have comments in tricky, non-obvious,
interesting, or important parts of your code.
Line Comments ~
Also, lines that are non-obvious should get a comment at the end of the
line. These end-of-line comments should be separated from the code by 2
spaces. Example: >c
// If we have enough memory, mmap the data portion too.
mmap_budget = max<int64>(0, mmap_budget - index_->length());
if (mmap_budget >= data_size_ && !MmapData(mmap_chunk_bytes, mlock)) {
return; // Error already logged.
}
<
Note that there are both comments that describe what the code is doing,
and comments that mention that an error has already been logged when the
function returns.
If you have several comments on subsequent lines, it can often be more
readable to line them up: >c
do_something(); // Comment here so the comments line up.
do_something_else_that_is_longer(); // Comment here so there are two spaces between
// the code and the comment.
{ // One space before comment when opening a new scope is allowed,
// thus the comment lines up with the following comments and code.
do_something_else(); // Two spaces before line comments normally.
}
<
NULL, true/false, 1, 2, 3... ~
When you pass in a null pointer, boolean, or literal integer values to
functions, you should consider adding a comment about what they are, or
make your code self-documenting by using constants. For example, compare:
>c
bool success = calculate_something(interesting_value,
10,
false,
NULL); // What are these arguments??
<
versus: >c
bool success = calculate_something(interesting_value,
10, // Default base value.
false, // Not the first time we're calling this.
NULL); // No callback.
<
Or alternatively, constants or self-describing variables: >c
const int kDefaultBaseValue = 10;
const bool kFirstTimeCalling = false;
Callback *null_callback = NULL;
bool success = calculate_something(interesting_value,
kDefaultBaseValue,
kFirstTimeCalling,
null_callback);
<
Don'ts ~
Note that you should never describe the code itself. Assume that the
person reading the code knows C better than you do, even though he or she
does not know what you are trying to do: >c
// Now go through the b array and make sure that if i occurs,
// the next element is i+1.
... // Geez. What a useless comment.
Punctuation, Spelling and Grammar ~
Pay attention to punctuation, spelling, and grammar; it is easier to read
well-written comments than badly written ones.
Comments should be as readable as narrative text, with proper capitalization
and punctuation. In many cases, complete sentences are more readable than
sentence fragments. Shorter comments, such as comments at the end of a line of
code, can sometimes be less formal, but you should be consistent with your
style.
Although it can be frustrating to have a code reviewer point out that you are
using a comma when you should be using a semicolon, it is very important that
source code maintain a high level of clarity and readability. Proper
punctuation, spelling, and grammar help with that goal.
TODO Comments ~
Use `TODO` comments for code that is temporary, a short-term solution, or
good-enough but not perfect.
`TODO`s should include the string `TODO` in all caps, followed by the name,
email address, or other identifier of the person who can best provide context
about the problem referenced by the `TODO`. The main purpose is to have a
consistent `TODO` format that can be searched to find the person who can
provide more details upon request. A `TODO` is not a commitment that the
person referenced will fix the problem. Thus when you create a `TODO`, it is
almost always your name that is given. >c
// TODO(kl@gmail.com): Use a "*" here for concatenation operator.
// TODO(Zeke): change this to use relations.
If your `TODO` is of the form "At a future date do something" make sure that
you either include a very specific date ("Fix by November 2005") or a very
specific event ("Remove this code when all clients can handle XML
responses.").
Deprecation Comments ~
Mark deprecated interface points with the `@deprecated` docstring token.
You can mark an interface as deprecated by writing a comment containing the
token `@deprecated`. The comment goes either before the declaration of the
interface or on the same line as the declaration.
After `@deprecated`, write your name, email, or other identifier in
parentheses.
A deprecation comment must include simple, clear directions for people to fix
their callsites. In C, you can implement a deprecated function as an inline
function that calls the new interface point.
Marking an interface point `@deprecated` will not magically cause any callsites
to change. If you want people to actually stop using the deprecated facility,
you will have to fix the callsites yourself or recruit a crew to help you.
New code should not contain calls to deprecated interface points. Use the new
interface point instead. If you cannot understand the directions, find the
person who created the deprecation and ask them for help using the new
interface point.
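A minimal sketch of a deprecated function implemented as an inline wrapper
around its replacement. The functions `text_len` and `text_len_capped` and the
identifier `jdoe` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/// Hypothetical replacement interface: length of `s`, capped at `max`.
static inline size_t text_len_capped(const char *s, size_t max)
{
  const size_t len = strlen(s);
  return len < max ? len : max;
}

/// @deprecated (jdoe) Use text_len_capped(s, SIZE_MAX) instead.
static inline size_t text_len(const char *s)
{
  return text_len_capped(s, SIZE_MAX);
}
```

Existing callers of `text_len()` keep working, and the comment tells them
exactly which call to switch to.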
==============================================================================
Formatting *dev-style-format*
Coding style and formatting are pretty arbitrary, but a project is much easier
to follow if everyone uses the same style. Individuals may not agree with
every aspect of the formatting rules, and some of the rules may take some
getting used to, but it is important that all project contributors follow the
style rules so that they can all read and understand everyone's code easily.
Non-ASCII Characters ~
Non-ASCII characters should be rare, and must use UTF-8 formatting.
You shouldn't hard-code user-facing text in source (OR SHOULD YOU?), even
English, so use of non-ASCII characters should be rare. However, in certain
cases it is appropriate to include such words in your code. For example, if
your code parses data files from foreign sources, it may be appropriate to
hard-code the non-ASCII string(s) used in those data files as delimiters. More
commonly, unittest code (which does not need to be localized) might contain
non-ASCII strings. In such cases, you should use UTF-8, since that is an
encoding understood by most tools able to handle more than just ASCII.
Hex encoding is also OK, and encouraged where it enhances readability — for
example, `"\uFEFF"` is the Unicode zero-width no-break space character, which
would be invisible if included in the source as straight UTF-8.
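The same character can also be written as the bytes of its UTF-8 encoding.
This sketch assumes a hypothetical helper `starts_with_bom` and a constant
`kUtf8Bom`, neither of which is taken from Nvim.

```c
#include <stdbool.h>
#include <string.h>

// UTF-8 encoding of U+FEFF, spelled with hex escapes so that it stays
// visible in the source.
static const char kUtf8Bom[] = "\xEF\xBB\xBF";

// Hypothetical helper: reports whether `text` starts with a UTF-8 BOM.
bool starts_with_bom(const char *text)
{
  return strncmp(text, kUtf8Bom, sizeof(kUtf8Bom) - 1) == 0;
}
```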
Braced Initializer Lists ~
Format a braced list exactly like you would format a function call in its
place but with one space after the `{` and one space before the `}`.
If the braced list follows a name (e.g. a type or variable name), format as if
the `{}` were the parentheses of a function call with that name. If there is
no name, assume a zero-length name. >c
struct my_struct m = { // Here, you could also break before {.
superlongvariablename1,
superlongvariablename2,
{ short, interior, list },
{ interiorwrappinglist,
interiorwrappinglist2 } };
Loops and Switch Statements ~
Annotate non-trivial fall-through between cases.
If not conditional on an enumerated value, switch statements should always
have a `default` case (in the case of an enumerated value, the compiler will
warn you if any values are not handled). If the default case should never
execute, simply use `abort()`: >c
switch (var) {
case 0:
...
break;
case 1:
...
break;
default:
abort();
}
Switch statements that are conditional on an enumerated value should not have
a `default` case if it is exhaustive. Explicit case labels are preferred over
`default`, even if it leads to multiple case labels for the same code. For
example, instead of: >c
case A:
...
case B:
...
case C:
...
default:
...
You should use: >c
case A:
...
case B:
...
case C:
...
case D:
case E:
case F:
...
Certain compilers do not recognize an exhaustive enum switch statement as
exhaustive, which causes compiler warnings when there is a return statement in
every case of a switch statement, but no catch-all return statement. To fix
these spurious warnings, you are advised to use `UNREACHABLE` after the switch
statement to explicitly tell the compiler that the switch statement always
returns and any code after it is unreachable. For example: >c
enum { A, B, C } var;
...
switch (var) {
case A:
return 1;
case B:
return 2;
case C:
return 3;
}
UNREACHABLE;
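Annotating fall-through, as required at the start of this section, might look
like the sketch below. The function `days_remaining` is hypothetical, and the
comment form of the annotation is an assumption; the project may provide a
dedicated macro for this purpose.

```c
#include <stdlib.h>

// Hypothetical helper: days of work left when starting at `stage`, where
// each stage also has to do the work of all later stages.
int days_remaining(int stage)
{
  int days = 0;
  switch (stage) {
  case 0:
    days += 3;
    // Stage 0 continues with the work of stage 1.
    // fallthrough
  case 1:
    days += 2;
    // Stage 1 continues with the work of stage 2.
    // fallthrough
  case 2:
    days += 1;
    break;
  default:
    abort();
  }
  return days;
}
```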
Return Values ~
Do not needlessly surround the `return` expression with parentheses.
Use parentheses in `return expr;` only where you would use them in `x =
expr;`. >c
return result;
return (some_long_condition && another_condition);
return (value); // You wouldn't write var = (value);
return(result); // return is not a function!
Horizontal Whitespace ~
Use of horizontal whitespace depends on location.
Variables ~
>c
int long_variable = 0; // Don't align assignments.
int i = 1;
struct my_struct { // Exception: struct arrays.
const char *boy;
const char *girl;
int pos;
} my_variable[] = {
{ "Mia", "Michael", 8 },
{ "Elizabeth", "Aiden", 10 },
{ "Emma", "Mason", 2 },
};
<
==============================================================================
Parting Words
The style guide is intended to make the code more readable. If you think you
must violate its rules for the sake of clarity, do it! But please add a note
to your pull request explaining your reasoning.
vim:tw=78:ts=8:et:ft=help:norl: