Files
a0_basic_app
a1_vehicle
a2_async_sim
ab_glyph
ab_glyph_rasterizer
adler
adler32
agents
aho_corasick
anyhow
approx
aquamarine
ash
atty
bitflags
bytemuck
byteorder
cache_padded
cfg_if
chrono
color_quant
crc32fast
crossbeam_channel
crossbeam_deque
crossbeam_epoch
crossbeam_utils
deflate
draw2d
either
flexi_logger
generic_array
gif
glfw
glfw_sys
glob
image
indoc
itertools
jpeg_decoder
lazy_static
libc
libloading
log
matrixmultiply
memchr
memoffset
miniz_oxide
nalgebra
base
geometry
linalg
third_party
num_complex
num_cpus
num_integer
num_iter
num_rational
num_traits
owned_ttf_parser
paste
png
proc_macro2
proc_macro_error
proc_macro_error_attr
quote
raw_window_handle
rawpointer
rayon
rayon_core
regex
regex_syntax
scoped_threadpool
scopeguard
semver
semver_parser
serde
serde_derive
simba
smawk
spin_sleep
syn
terminal_size
textwrap
thiserror
thiserror_impl
tiff
time
triple_buffer
ttf_parser
typenum
unicode_width
unicode_xid
unindent
vk_sys
weezl
yansi
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
// Copyright 2016 - 2018 Ulrik Sverdrup "bluss"
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
//!
//! General matrix multiplication for f32, f64 matrices. Operates on matrices
//! with general layout (they can use arbitrary row and column stride).
//!
//! This crate uses the same macro/microkernel approach to matrix multiplication as
//! the [BLIS][bl] project.
//!
//! We presently provide a few good microkernels, portable and for x86-64, and
//! only one operation: the general matrix-matrix multiplication (“gemm”).
//!
//! [bl]: https://github.com/flame/blis
//!
//! ## Matrix Representation
//!
//! **matrixmultiply** supports matrices with general stride, so a matrix
//! is passed using a pointer and four integers:
//!
//! - `a: *const f32`, pointer to the first element in the matrix
//! - `m: usize`, number of rows
//! - `k: usize`, number of columns
//! - `rsa: isize`, row stride
//! - `csa: isize`, column stride
//!
//! In this example, A is a m by k matrix. `a` is a pointer to the element at
//! index *0, 0*.
//!
//! The *row stride* is the pointer offset (in number of elements) to the
//! element on the next row. It’s the distance from element *i, j* to *i + 1,
//! j*.
//!
//! The *column stride* is the pointer offset (in number of elements) to the
//! element in the next column. It’s the distance from element *i, j* to *i,
//! j + 1*.
//!
//! For example for a contiguous matrix, row major strides are *rsa=k,
//! csa=1* and column major strides are *rsa=1, csa=m*.
//!
//! Strides can be negative or even zero, but for a mutable matrix elements
//! may not alias each other.
//!
//! ## Portability and Performance
//!
//! - The default kernels are written in portable Rust and available
//!   on all targets. These may depend on autovectorization to perform well.
//!
//! - *x86* and *x86-64* features can be detected at runtime by default or
//!   compile time (if enabled), and the crate following kernel variants are
//!   implemented:
//!
//!   - `fma`
//!   - `avx`
//!   - `sse2`
//!
//! ## Features
//!
//! ### `std`
//!
//! `std` is enabled by default.
//!
//! This crate can be used without the standard library (`#![no_std]`) by
//! disabling the default `std` feature. To do so, use this in your
//! `Cargo.toml`:
//!
//! ```toml
//! matrixmultiply = { version = "0.2", default-features = false }
//! ```
//!
//! Runtime CPU feature detection is available only when `std` is enabled.
//! Without the `std` feature, the crate uses special CPU features only if they
//! are enabled at compile time. (To enable CPU features at compile time, pass
//! the relevant
//! [`target-cpu`](https://doc.rust-lang.org/rustc/codegen-options/index.html#target-cpu)
//! or
//! [`target-feature`](https://doc.rust-lang.org/rustc/codegen-options/index.html#target-feature)
//! option to `rustc`.)
//!
//! ### `threading`
//!
//! `threading` is an optional crate feature
//!
//! Threading enables multithreading for the operations. The environment variable
//! `MATMUL_NUM_THREADS` decides how many threads are used at maximum. At the moment 1-4 are
//! supported and the default is the number of physical cpus (as detected by `num_cpus`).
//!
//! ## Other Notes
//!
//! The functions in this crate are thread safe, as long as the destination
//! matrix is distinct.
//!
//! ## Rust Version
//!
//! This version requires Rust 1.41.1 or later; the crate follows a carefully
//! considered upgrade policy, where updating the minimum Rust version is not a breaking
//! change.

#![doc(html_root_url = "https://docs.rs/matrixmultiply/0.2/")]
#![cfg_attr(not(feature = "std"), no_std)]

#[cfg(not(feature = "std"))]
extern crate alloc;
#[cfg(feature = "std")]
extern crate core;

#[macro_use]
mod debugmacros;
#[macro_use]
mod loopmacros;
mod archparam;
mod gemm;
mod kernel;
mod ptr;
mod threading;

mod aligned_alloc;
mod util;

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[macro_use]
mod x86;
mod dgemm_kernel;
mod sgemm_kernel;

pub use crate::gemm::dgemm;
pub use crate::gemm::sgemm;