.LH

From NSMBW Modding Database
Revision as of 01:50, 4 February 2021 by Zementblock (talk | contribs) (→‎Programs and Tools)
*.LH (Compressed File)
File Extension(s): .arc.LH & .kpbin.LH & .bin.LH
Developer(s): Nintendo, Lempel-Ziv (Compression), Huffman (Compression), Treeki, Tempus, megazig
Category: Compressed Data File
Description: Compressed archive file used by Newer Super Mario Bros. Wii and others. Any file can be compressed. Used in Newer for .arc/.kpbin/.bin files.
Compression: *.LH: Lempel-Ziv (improved implementation of the LZ77 algorithm) + Huffman (adaptive entropy coding)
Tools to manipulate .arc.LH files:

NTCompress (Nintendo SDK)
LHDecompressor

Nintendo uses this file type in several games to compress files and reduce the disk space they occupy. Newer Super Mario Bros. Wii also uses it to compress Wii U8 Archive files (.arc), non-editable map/overworld files for Newer (.kpbin), and other binary files (.bin).

The compression scheme combines two techniques: Lempel-Ziv coding (named after Abraham Lempel and Jacob Ziv, best known for LZ77/LZ78), which replaces repeated byte sequences with references to earlier occurrences in the data, and Huffman coding, which assigns shorter bit codes to symbols that occur more frequently.

Important Advice:
Typical problems arise when a user tries to save a custom level uncompressed. Please see here: Usage in Newer.


*.LH (File Type)

Files with this file type are compressed and cannot be edited as-is. The extension .LH is appended to the extension of the encapsulated file.

Some examples
  • .arc (Wii U8 Archive) becomes .arc.LH
  • .kpbin (non-editable overworld file) becomes .kpbin.LH
  • .bin (binary file) becomes .bin.LH

.LH-encoded files have to be decoded at runtime or sooner. The game must know a method of decoding such files. This method was added to Newer Super Mario Bros. Wii, meaning that this game mod can interpret *.LH files. This is not the case for the base game New Super Mario Bros. Wii.

Encode and Decode

Files can be encoded/compressed and decoded/decompressed to/from the .LH file format by applying the correct algorithm to the data stream.

Please see Programs and Tools for available methods to decode and encode such files.

Usage in Newer

.LH files take precedence over other files with the same name that do not have the .LH extension.
See the next section (Compressed Assets in Newer) for more info.

In Newer Super Mario Bros. Wii, several files are compressed to the *.LH format. Those are:

  • .arc (Wii U8 Archive) files in
    • BGs (Backgrounds)
    • Layouts
    • SpriteTex (Mod for additional models)
    • Stages (Levels)
    • Tilesets
    • Some exceptions: (message.arc, folders Others/E/J/P) are not compressed
  • .kpbin files in
    • Maps (Overworld)
  • .bin files in
    • Texture (Texture for Maps) (see the Attention note below)
    • TitleReplay

Attention: Textures for maps must be .LH-compressed! The game crashes if any of the map textures is not in its compressed form.
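A quick way to check for this before starting the game is to list every .bin file in the Texture folder that has no .bin.LH counterpart. The following is a minimal sketch; the function name is made up for this example, and it assumes the map textures sit directly in one folder.

```python
from pathlib import Path


def uncompressed_map_textures(texture_dir):
    """Return the names of .bin files that have no .bin.LH sibling.

    Hypothetical helper: the folder layout is an assumption, adjust the
    path to wherever your copy of Newer keeps its map textures.
    """
    texture_dir = Path(texture_dir)
    missing = []
    for path in sorted(texture_dir.glob("*.bin")):
        # "name.bin" is only safe if "name.bin.LH" exists next to it.
        if not path.with_name(path.name + ".LH").exists():
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    for name in uncompressed_map_textures("Texture"):
        print("not LH-compressed:", name)
```

An empty result means every .bin file found has a compressed version next to it.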

Compressed Assets in Newer

As we have learned above, .LH files take precedence over other files with the same name that do not have the .LH extension.

Let's do an example. We'll look at 01-01.arc, which is the first level in the game. The following files with the same name exist in the Stages directory:

  • 01-01.arc
  • 01-01.arc.LH (compressed)

The file with the .LH extension has priority and will be loaded by the game. The other file (01-01.arc) won't be loaded at all.
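The precedence rule can be sketched in a few lines of Python. This only illustrates the rule described above and is not the game's actual loader; the function name is made up for the example, and it assumes that only the plain file and its .LH variant are candidates.

```python
from pathlib import Path


def file_loaded_by_game(directory, name):
    """Return the file that would win under the .LH precedence rule.

    Illustration only (hypothetical helper): checks for name + ".LH"
    first, then falls back to the uncompressed file.
    """
    directory = Path(directory)
    compressed = directory / (name + ".LH")
    plain = directory / name
    if compressed.exists():
        return compressed  # the .LH file always takes precedence
    if plain.exists():
        return plain       # used only when no .LH file is present
    return None


if __name__ == "__main__":
    print(file_loaded_by_game("Stages", "01-01.arc"))
```

With both 01-01.arc and 01-01.arc.LH present, the function returns the .LH path, which is why an edited 01-01.arc appears to have no effect.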


There are two options to play custom levels (or replace any other LH-encoded assets).
  1. Encode the custom level/tileset/etc with a tool from Programs and Tools, and directly replace the *.LH file.
  2. Delete the *.LH file, and save your level/tileset/etc as .arc uncompressed, replacing the existing file. Consider backing up deleted files.
Examples for 2.
  • Delete coin.arc.LH
    • Save your custom model as coin.arc
  • Delete 05-01.arc.LH
    • Save your custom level as 05-01.arc
  • Delete bgA_3FE2.arc.LH
    • Save your custom background as bgA_3FE2.arc
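Option 2 can also be scripted. The sketch below moves the *.LH file into a backup folder instead of deleting it, then copies the custom file into place. All names here (the function, the backup folder, the example paths) are hypothetical and need to be adapted to your own setup.

```python
import shutil
from pathlib import Path


def install_uncompressed(custom_file, game_dir, target_name, backup_dir):
    """Replace an LH-compressed asset with an uncompressed custom file.

    Hypothetical helper: moves target_name + ".LH" to backup_dir (if it
    exists) and copies custom_file to game_dir / target_name.
    """
    game_dir = Path(game_dir)
    backup_dir = Path(backup_dir)
    compressed = game_dir / (target_name + ".LH")
    if compressed.exists():
        backup_dir.mkdir(parents=True, exist_ok=True)
        # Back up rather than delete, so the original can be restored.
        shutil.move(str(compressed), str(backup_dir / compressed.name))
    destination = game_dir / target_name
    shutil.copyfile(custom_file, destination)
    return destination


if __name__ == "__main__":
    install_uncompressed("my_level.arc", "Stages", "05-01.arc", "backup/Stages")
```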

Programs and Tools

Two programs exist for working with LH-compressed data streams. NTCompress can both encode and decode, while LHDecompressor, as its name suggests, only decodes.

NTCompress comes bundled with the official Nintendo SDK (Software Development Kit). It was most likely used by Nintendo developers during the Wii era. It is proprietary software and protected by patents. Due to those circumstances, no download link can be provided.

Usage
ntcompress <-d(4|8)|r|l|lex|h(8|16)|lh|lrc> [-o outputFile] [-A(4|8|16|32)] [-<t|T>[width]] [-s] [-H] [-v] <inputFile>
      ntcompress -x [-o outputFile] [-s] <inputFile>
       -v                   Show version
       -r                   Runlength encode.
       -l                   LZ77 encode(compatible with previous LZ77).
       -lex                 LZ77 encode.
       -h BitSize(4|8)      Huffman   encode.
       -d BitSize(8|16)     Differential filter.
       -lh                  LZ and Huffman encode
       -lrc                 LZ and RangeCoder encode
       -A(4|8|16|32)        Align n byte for compressed filesize
       -o outputFile        Specify the output file name.
       -t[TypeWidth(1|2|4)] output C format text(little endian).
                            We can specify the type(1=u8,2=u16,4=u32).
       -T[TypeWidth(1|2|4)] output C format text(big endian).
                            We can specify the type(1=u8,2=u16,4=u32).
       -s                   Print no message if you've been successful in the conversion.
       -H                   Raw data header
       -x                   Extract compressed file.
Example Encode/Compress
ntcompress -lh -o kuribo.arc.LH kuribo.arc 
Example Decode/Decompress
ntcompress -x -o met.arc met.arc.LH

Please note: The output file comes first and is specified with -o. The input file comes last.
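When many files need converting, the two example commands can be wrapped in a small script. The sketch below uses only the -lh, -x and -o options shown in the usage text above. It assumes an executable named ntcompress is available on the PATH; the function names are made up for this example.

```python
import subprocess


def ntcompress_command(input_file, output_file, decompress=False, exe="ntcompress"):
    """Build the argument list for one ntcompress call.

    Mirrors the examples above: mode flag, then -o <output>, then the
    input file last. `exe` is an assumption about where the tool lives.
    """
    mode = "-x" if decompress else "-lh"
    return [exe, mode, "-o", str(output_file), str(input_file)]


def run_ntcompress(input_file, output_file, decompress=False, exe="ntcompress"):
    """Run ntcompress and raise CalledProcessError if it reports failure."""
    subprocess.run(
        ntcompress_command(input_file, output_file, decompress, exe),
        check=True,
    )


if __name__ == "__main__":
    # Same as: ntcompress -lh -o kuribo.arc.LH kuribo.arc
    run_ntcompress("kuribo.arc", "kuribo.arc.LH")
    # Same as: ntcompress -x -o met.arc met.arc.LH
    run_ntcompress("met.arc.LH", "met.arc", decompress=True)
```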


LHDecompressor

Used in Mario Sports Mix. Not sure what the format is actually called -- it has the header byte 0x40, and specifies the uncompressed size in the same way as all the other CX formats.

NSMB Wii has (disabled) support for it, using the file extension "LH" which is why I've provisionally named it "LH Decompressor". There's another mystery compression format in there too (LRC).

This is pretty messy code, since it's mostly just a direct translation from the assembly -- but it works! (Not too much testing done yet, though.)

g++ -o LH LHDecompressor.cpp
./LH sourceFile.bin destFile.bin

Source: Treeki (GitHub)
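The header described above can be inspected without decompressing anything. The sketch below checks for the 0x40 header byte and reads the uncompressed size. The size layout is an assumption based on the other CX formats and is not spelled out in the text above: a 24-bit little-endian value after the type byte, with a 32-bit little-endian value following when those 24 bits are zero. The function name is hypothetical.

```python
def parse_lh_header(data):
    """Return (uncompressed_size, header_length) for LH-compressed bytes.

    Assumed layout (as in other CX formats, not confirmed by this page):
      byte 0      type byte, 0x40 for LH
      bytes 1-3   uncompressed size, little endian
      bytes 4-7   extended size, only present if bytes 1-3 are zero
    """
    if len(data) < 4:
        raise ValueError("too short for an LH header")
    if data[0] != 0x40:
        raise ValueError("not LH data: header byte is 0x%02X" % data[0])
    size = int.from_bytes(data[1:4], "little")
    header_length = 4
    if size == 0:
        if len(data) < 8:
            raise ValueError("too short for an extended LH header")
        size = int.from_bytes(data[4:8], "little")
        header_length = 8
    return size, header_length


if __name__ == "__main__":
    with open("01-01.arc.LH", "rb") as f:
        print(parse_lh_header(f.read(8)))
```

This is handy as a sanity check that a file really is LH-compressed before passing it to a decoder.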

Algorithm Details

Lempel-Ziv

LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively. These two algorithms form the basis for many variations including Lempel–Ziv–Welch (LZW), Lempel-Ziv-Storer-Szymanski (LZSS), Lempel-Ziv-Markov chain algorithm (LZMA) and others. Besides their academic influence, these algorithms formed the basis of several ubiquitous compression schemes, including GIF and the DEFLATE algorithm used in Portable Network Graphics (PNG) and Zip (file format).

They are both theoretically dictionary coders. LZ77 maintains a sliding window during compression. This was later shown to be equivalent to the explicit dictionary constructed by LZ78—however, they are only equivalent when the entire data is intended to be decompressed.

Since LZ77 encodes and decodes from a sliding window over previously seen characters, decompression must always start at the beginning of the input. Conceptually, LZ78 decompression could allow random access to the input if the entire dictionary were known in advance. However, in practice the dictionary is created during encoding and decoding by creating a new phrase whenever a token is output.

Source: Wikipedia
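To make the sliding-window idea concrete, here is a textbook-style LZ77 toy in Python. It is not the bitstream used by .LH files. The token layout (offset, length, next byte) and the window and length limits are choices made for this example only.

```python
def lz77_encode(data, window=255, max_len=15):
    """Encode bytes as (offset, length, next_byte) tokens (toy LZ77)."""
    tokens = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the sliding window of previously seen bytes.
        for j in range(max(0, i - window), i):
            length = 0
            # Keep one byte in reserve: every token ends with a literal.
            while (length < max_len
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decode(tokens):
    """Rebuild the original bytes; must start from the first token."""
    out = bytearray()
    for offset, length, next_byte in tokens:
        for _ in range(length):
            out.append(out[-offset])  # byte-wise copy allows overlap
        out.append(next_byte)
    return bytes(out)


if __name__ == "__main__":
    sample = b"abracadabra abracadabra"
    tokens = lz77_encode(sample)
    print(len(sample), "bytes ->", len(tokens), "tokens")
    assert lz77_decode(tokens) == sample
```

The decoder only ever looks backwards into its own output, which is why decompression has to begin at the start of the stream.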

Huffman

In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source symbol (such as a character in a file). The algorithm derives this table from the estimated probability or frequency of occurrence (weight) for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman's method can be efficiently implemented, finding a code in time linear to the number of input weights if these weights are sorted. However, although optimal among methods encoding symbols separately, Huffman coding is not always optimal among all compression methods - it is replaced with arithmetic coding or asymmetric numeral systems if better compression ratio is required.

Source: Wikipedia
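The construction of such a code table can be sketched with a priority queue. This is the generic textbook algorithm, shown only to illustrate the idea; it does not reproduce how .LH files store or use their Huffman tables, and the function name is made up for this example.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Return a {symbol: bit string} table built from symbol frequencies."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, partial code table).
    heap = [(w, n, {sym: ""}) for n, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


if __name__ == "__main__":
    table = huffman_code(b"aaaabbc")
    for symbol, bits in sorted(table.items()):
        print(chr(symbol), bits)
```

In the sample input the most frequent symbol receives the shortest code, matching the description above.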