lib/microtar/README.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268

# microtar

A lightweight tar library written in ANSI C.

This version is a fork of [rxi's microtar](https://github.com/rxi/microtar)
with bugfixes and API changes aimed at improving usability, but still keeping
with the minimal design of the original library.

## License

This library is free software; you can redistribute it and/or modify it under
the terms of the MIT license. See [LICENSE](LICENSE) for details.


## Supported format variants

No effort has been put into handling every tar format variant. Basically
what is accepted is the "old-style" format, which appears to work well
enough to access basic archives created by GNU `tar`.


## Basic usage

The library consists of two files, `microtar.c` and `microtar.h`, which only
depend on a tiny part of the standard C library & can be easily incorporated
into a host project's build system.

The core library does not include any I/O hooks as these are supposed to be
provided by the host application. If the C library's `fopen` and friends is
good enough, you can use `microtar-stdio.c`.


### Initialization

Initialization is very simple. Everything the library needs is contained in
the `mtar_t` struct; there is no memory allocation and no global state. It is
enough to zero-initialize an `mtar_t` object to put it into a "closed" state.
You can use `mtar_is_open()` to query whether the archive is open or not.

An archive can be opened for reading _or_ writing, but not both. You have to
specify which access mode you're using when you create the archive.

```c
mtar_t tar;
mtar_init(&tar, MTAR_READ, my_io_ops, my_stream);
```

Or if using `microtar-stdio.c`:

```c
int error = mtar_open(&tar, "file.tar", "rb");
if(error) {
    /* do something about it */
}
```

Note that `mtar_init()` is called for you in this case and the access mode is
deduced from the mode flags.


### Iterating and locating files

If you opened an archive for reading, you'll likely want to iterate over
all the files. Here's the long way of doing it:

```c
mtar_t tar;
int err;

/* Go to the start of the archive... Not necessary if you've
 * just opened the archive and are already at the beginning.
 * (And of course you normally want to check the return value.) */
mtar_rewind(&tar);

/* Iterate over the archive members */
while((err = mtar_next(&tar)) == MTAR_ESUCCESS) {
    /* Get a pointer to the current file header. It will
     * remain valid until you move to another record with
     * mtar_next() or call mtar_rewind() */
    const mtar_header_t* header = mtar_get_header(&tar);

    printf("%s (%d bytes)\n", header->name, header->size);
}

if(err != MTAR_ENULLRECORD) {
    /* ENULLRECORD means we hit end of file; any
     * other return value is an actual error. */
}
```

There's a useful shortcut for this type of iteration which removes the
loop boilerplate, replacing it with another kind of boilerplate that may
be more palatable in some cases.

```c
/* Will be called for each archive member visited by mtar_foreach().
 * The member's header is passed in as an argument so you don't need
 * to fetch it manually with mtar_get_header(). You can freely read
 * data (if present) and seek around. There is no special cleanup
 * required and it is not necessary to read to the end of the stream.
 *
 * The callback should return zero (= MTAR_SUCCESS) to continue the
 * iteration or return nonzero to abort. On abort, the value returned
 * by the callback will be returned from mtar_foreach(). Since it may
 * also return normal microtar error codes, it is suggested to use a
 * positive value or pass the result via 'arg'.
 */
int foreach_cb(mtar_t* tar, const mtar_header_t* header, void* arg)
{
    // ...
    return 0;
}

void main()
{
    mtar_t tar;

    // ...

    int ret = mtar_foreach(&tar, foreach_cb, NULL);
    if(ret < 0) {
        /* Microtar error codes are negative and may be returned if
         * there is a problem with the iteration. */
    } else if(ret == MTAR_ESUCCESS) {
        /* If the iteration reaches the end of the archive without
         * errors, the return code is MTAR_ESUCCESS. */
    } else if(ret > 0) {
        /* Positive values might be returned by the callback to
         * signal some condition was met; they'll never be returned
         * by microtar */
    }
}
```

The other thing you're likely to do is look for a specific file:

```c
/* Seek to a specific member in the archive */
int err = mtar_find(&tar, "foo.txt");
if(err == MTAR_ESUCCESS) {
    /* File was found -- read the header with mtar_get_header() */
} else if(err == MTAR_ENOTFOUND) {
    /* File wasn't in the archive */
} else {
    /* Some error occurred */
}
```

Note this isn't terribly efficient since it scans the entire archive
looking for the file.


### Reading file data

Once pointed at a file via `mtar_next()` or `mtar_find()` you can read the
data with a simple POSIX-like API.

- `mtar_read_data(tar, buf, count)` reads up to `count` bytes into `buf`,
  returning the actual number of bytes read, or a negative error value.
  If at EOF, this returns zero.

- `mtar_seek_data(tar, offset, whence)` works exactly like `fseek()` with
  `whence` being one of `SEEK_SET`, `SEEK_CUR`, or `SEEK_END` and `offset`
  indicating a point relative to the beginning, current position, or end
  of the file. Returns zero on success, or a negative error code.

- `mtar_eof_data(tar)` returns nonzero if the end of the file has been
  reached. It is possible to seek backward to clear this condition.


### Writing archives

Microtar has limited support for creating archives. When an archive is opened
for writing, you can add new members using `mtar_write_header()`.

- `mtar_write_header(tar, header)` writes out the header for a new member.
  The amount of data that follows is dictated by `header->size`, though if
  the underlying stream supports seeking and re-writing data, this size can
  be updated later with `mtar_update_header()` or `mtar_update_file_size()`.

- `mtar_update_header(tar, header)` will re-write the previously written
  header. This may be used to change any header field. The underlying stream
  must support seeking. On a successful return the stream will be returned
  to the position it was at before the call.

File data can be written with `mtar_write_data()`, and if the underlying stream
supports seeking, you can seek with `mtar_seek_data()` and read back previously
written data with `mtar_read_data()`. Note that it is not possible to truncate
the file stream by any means.

- `mtar_write_data(tar, buf, count)` will write up to `count` bytes from
  `buf` to the current member's data. Returns the number of bytes actually
  written or a negative error code.

- `mtar_update_file_size(tar)` will update the header size to reflect the
  actual amount of written data. This is intended to be called right before
  `mtar_end_data()` if you are not declaring file sizes in advance.

- `mtar_end_data(tar)` will end the current member. It will complain if you
  did not write the correct amount data provided in the header. This must be
  called before writing the next header.

- `mtar_finalize(tar)` is called after you have written all members to the
  archive. It writes out some null records which mark the end of the archive,
  so you cannot write any more archive members after this.

Note that `mtar_close()` can fail if there was a problem flushing buffered
data to disk, so its return value should always be checked.


## Error handling

Most functions that return `int` return an error code from `enum mtar_error`.
Zero is success and all other error codes are negative. `mtar_strerror()` can
return a string describing the error code.

A couple of functions use a different return value convention:

- `mtar_foreach()` may error codes or an arbitrary nonzero value provided
  by the callback.
- `mtar_read_data()` and `mtar_write_data()` returns the number of bytes read
  or written, or a negative error code. In particular zero means that no bytes
  were read or written.
- `mtar_get_header()` may return `NULL` if there is no valid header.
  It is only possible to see a null pointer if misusing the API or after
  a previous error so checking for this is usually not necessary.

There is essentially no support for error recovery. After an error you can
only do two things reliably: close the archive with `mtar_close()` or try
rewinding to the beginning with `mtar_rewind()`.


## I/O hooks

You can provide your own I/O hooks in a `mtar_ops_t` struct. The same ops
struct can be shared among multiple `mtar_t` objects but each object gets
its own `void* stream` pointer.

Name    | Arguments                                 | Required
--------|-------------------------------------------|------------
`read`  | `void* stream, void* data, unsigned size` | If reading
`write` | `void* stream, void* data, unsigned size` | If writing
`seek`  | `void* stream, unsigned pos`              | If reading
`close` | `void* stream`                            | Always

`read` and `write` should transfer the number of bytes indicated
and return the number of bytes actually read or written, or a negative
`enum mtar_error` code on error.

`seek` must have semantics like `lseek(..., pos, SEEK_SET)`; that is,
the position is an absolute byte offset in the stream. Seeking is not
optional for read support, but the library only performs backward
seeks under two circumstances:

- `mtar_rewind()` seeks to position 0.
- `mtar_seek_data()` may seek backward if the user requests it.

Therefore, you will be able to get away with a limited forward-only
seek function if you're able to read everything in a single pass use
the API carefully. Note `mtar_find()` and `mtar_foreach()` will call
`mtar_rewind()`.

`close` is called by `mtar_close()` to clean up the stream. Note the
library assumes that the stream handle is cleaned up by `close` even
if an error occurs.

`seek` and `close` should return an `enum mtar_error` code, either
`MTAR_SUCCESS`, or a negative value on error.