Module Simple63
Simple63 is a module for compressing and decompressing sequences of integers along the lines described in the 2010 paper by Anh and Moffat. Like the Simple-8b technique described in that paper, Simple63 is word-bounded, and (as the name suggests) is the result of adapting Simple-8b to work with OCaml's 63-bit integers. While using int64 integer types would have been possible, the additional boxing required to manipulate int64 values makes this option unappealing.
val max_value : int
The range of integers that can be encoded is 0 to max_value, where max_value = (1 lsl 59) - 1 = 576460752303423487 ~ 5.8e17.
exception Invalid of int
exception Invalid is raised whenever the value of an integer to be encoded falls outside the valid range of such values. See max_value.
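The range check described above can be sketched as follows. This is a minimal stand-in, not Simple63's own code: the local max_value, the local Invalid exception, and the helper check are hypothetical names that only mirror the documented behavior. The sketch assumes a 64-bit OCaml, where int is 63 bits wide, and assumes that the exception's int payload is the offending value, which the documentation does not state.

```ocaml
(* Local stand-ins mirroring the documented interface; not the library's code. *)
let max_value = (1 lsl 59) - 1

exception Invalid of int

(* [check v] returns [v] if it lies in [0, max_value]; otherwise it raises
   [Invalid v]. Carrying the offending value is an assumption. *)
let check v =
  if v < 0 || v > max_value then raise (Invalid v) else v

let () =
  assert (check 0 = 0);
  assert (check max_value = max_value);
  (* One past the maximum is rejected. *)
  (match check (max_value + 1) with
   | _ -> assert false
   | exception Invalid v -> assert (v = 1 lsl 59))
```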
val encode_to_seq : int Stdlib.Seq.t -> int Stdlib.Seq.t
encode_to_seq seq returns a sequence of encoded words (63-bit integers). Example use:
  let in_lst = [1; 22; 333; 4444] in
  let in_seq = List.to_seq in_lst in
  let out_seq = encode_to_seq in_seq in
  (* confirm that we get out what we've put in: *)
  let in_seq' = decode_from_seq out_seq in
  let in_lst' = List.of_seq in_seq' in
  assert (in_lst = in_lst')
val decode_from_seq : int Stdlib.Seq.t -> int Stdlib.Seq.t
decode_from_seq seq returns a sequence of decoded integers, where seq is a sequence of encoded integers. See encode_to_seq for an example.
val encode_len : int Stdlib.Seq.t -> int
encode_len seq returns the number of words into which input sequence seq would be encoded. The quotient of that count and the length of seq is the compression ratio. encode_len merely encodes seq to determine the length of output.
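As a sketch of the ratio described above, the helper below divides a word count by the input length. compression_ratio and stub_encode_len are hypothetical names; the stub merely stands in for encode_len so that the block runs without the library, and its output is not what Simple63 would produce. The input sequence is traversed twice, so it is assumed to be persistent (for example, built with List.to_seq). Seq.length requires OCaml 4.14 or later.

```ocaml
(* [compression_ratio encode_len seq] is (number of encoded words) divided by
   (number of input integers). [encode_len] is passed in as a parameter so
   the sketch does not depend on the library being installed. *)
let compression_ratio (encode_len : int Seq.t -> int) (seq : int Seq.t) : float =
  let n = Seq.length seq in
  if n = 0 then invalid_arg "compression_ratio: empty input"
  else float_of_int (encode_len seq) /. float_of_int n

(* Stand-in for [encode_len]: pretends every four inputs fit in one word.
   This is NOT how Simple63 packs values. *)
let stub_encode_len seq = (Seq.length seq + 3) / 4

let () =
  let seq = List.to_seq [1; 22; 333; 4444; 5; 6; 7; 8] in
  Printf.printf "ratio = %.2f\n" (compression_ratio stub_encode_len seq)
```

With the library available, one would pass Simple63.encode_len in place of the stub. Under this definition, a smaller ratio means better compression.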
val encode_to_bigarray : int Stdlib.Seq.t -> int -> iba -> int
encode_to_bigarray seq offset a encodes input sequence seq onto bigarray a, starting at offset offset.
val decode_from_bigarray : (int -> unit) -> n:int -> offset:int -> iba -> unit
decode_from_bigarray f ~n ~offset a decodes n integers from bigarray a, starting at array offset offset, and calls function f with each decoded value.
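Because decoded values are delivered through a callback rather than returned, collecting them takes a little plumbing. The sketch below shows one way to do it; collect and fake_decode are hypothetical names, and fake_decode only stands in for a partially applied decode_from_bigarray so that the block runs without the library.

```ocaml
(* [collect decode] runs [decode] with a callback that accumulates every
   value it receives, then returns the values in the order delivered. *)
let collect (decode : (int -> unit) -> unit) : int list =
  let acc = ref [] in
  decode (fun v -> acc := v :: !acc);
  List.rev !acc

(* Stand-in decoder: calls [f] on a fixed list of values. *)
let fake_decode f = List.iter f [1; 22; 333; 4444]

let () = assert (collect fake_decode = [1; 22; 333; 4444])
```

With the library, the call would presumably look like collect (fun f -> Simple63.decode_from_bigarray f ~n ~offset a), for suitable n, offset and a.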