head 1.3; access; symbols; locks; strict; comment @# @; 1.3 date 2001.08.31.11.11.04; author bbeaver; state Exp; branches; next 1.2; 1.2 date 2001.08.30.10.08.46; author bbeaver; state Exp; branches; next 1.1; 1.1 date 2001.08.24.07.21.09; author bbeaver; state Exp; branches; next ; desc @@ 1.3 log @no message @ text @//=========================================================================== // $Id: sha_1_digest.v,v 1.3 2001/08/31 11:13:44 Blue Beaver Exp $ // // Copyright 2001 Blue Beaver. All Rights Reserved. // // Summary: Calculate SHA1 digest of some data. // This algorithm is specified in FIPS PUB 180-1 // See: http://csrc.nist.gov/cryptval/shs.html // See also: http://csrc.nist.gov/cryptval/shs/sha1-vectors.zip // See also: http://www.w3.org/PICS/DSig/SHA1_1_0.html // See also: http://www.altera.com/literature/wp/wp_hcores_sha1.pdf // // This library is free software; you can distribute it and/or modify it // under the terms of the GNU Lesser General Public License as published // by the Free Software Foundation; either version 2.1 of the License, or // (at your option) any later version. // // This library is distributed in the hope that it will be useful, but // WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. // See the GNU Lesser General Public License for more details. // // You should have received a copy of the GNU Lesser General Public License // along with this library. If not, write to // Free Software Foundation, Inc. // 59 Temple Place, Suite 330 // Boston, MA 02111-1307 USA // // Author's note about this license: The intention of the Author and of // the Gnu Lesser General Public License is that users should be able to // use this code for any purpose, including combining it with other source // code, combining it with other logic, translated it into a gate-level // representation, or projected it into gates in a programmable or // hardwired chip, as long as the users of the resulting source, compiled // source, or chip are given the means to get a copy of this source code // with no new restrictions on redistribution of this source. // // If you make changes, even substantial changes, to this code, or use // substantial parts of this code as an inseparable part of another work // of authorship, the users of the resulting IP must be given the means // to get a copy of the modified or combined source code, with no new // restrictions on redistribution of the resulting source. // // Separate parts of the combined source code, compiled code, or chip, // which are NOT derived from this source code do NOT need to be offered // to the final user of the chip merely because they are used in // combination with this code. Other code is not forced to fall under // the GNU Lesser General Public License when it is linked to this code. // The license terms of other source code linked to this code might require // that it NOT be made available to users. The GNU Lesser General Public // License does not prevent this code from being used in such a situation, // as long as the user of the resulting IP is given the means to get a // copy of this component of the IP with no new restrictions on // redistribution of this source. // // This code was developed using VeriLogger Pro, by Synapticad. // Their support is greatly appreciated. // // NOTE: Calculate a Message Digest as specified by the SHA-1 algorithm. // Return a 160-bit Digest. // // NOTE: The SHA-1 algorithm applies to 512-BIT blocks. Operations // are applied to 32-bit words. Each block is 16 words long. // // NOTE: The SHA-1 algorithm uses these following operations: // &, |, ^, ~, +, <<<> (shift left circular a constant distance) // // NOTE: The first step applied to the Data is to pad it to a block boundry: // 1) Take a message expressed as a string of BITS! // 2) Append a 1'b1 to the end of the string // 3) Append enough 1'b0s to fill the block up to // 512 - 64 bits. // 4) Append the number of bits in the ORIGINAL message, // expressed as a 64 bit number, MSB bit first, to // the message. // // NOTE: A sequence of 80ogical Functions are defined: // 1) Functions 0 thru 19: f(t)(B,C,D) = (B & C) | (~B & D); // 2) Functions 20 thru 39: f(t)(B,C,D) = B ^ C ^ D; // 3) Functions 40 thru 59: f(t)(B,C,D) = (B & C) | (B & D) | (C & D); // 4) Functions 60 thru 79: f(t)(B,C,D) = B ^ C ^ D; // // NOTE: A sequence of 80 Constants are defined: // 1) Constants 0 thru 19: K(t) = 32'h5A827999; // 2) Constants 20 thru 39: K(t) = 32'h6ED9EBA1; // 3) Constants 40 thru 59: K(t) = 32'h8F1BBCDC; // 4) Constants 60 thru 79: K(t) = 32'hCA62C1D6; // // NOTE: The calculation of the Digest uses 2 buffers, each with 5 32-bit words. // The entries in the first buffer are called A, B, C, D, and E; // The entries in the second buffer are called H_0, H_1, H_2, H_3, H_4; // In addition, it uses a sequence of 80 32-bit words for state, called // W_0, W_1, ..., W_79; // A single 32-bit TEMP is also used. // // NOTE: At the start of a Digest calculation, the H_* elements are initialized. // a) H_0 = 32'h67452301; // H_1 = 32'hEFCDAB89; // H_2 = 32'h98BADCFE; // H_3 = 32'h10325476; // H_4 = 32'hC3D2E1F0; // // NOTE: Finally, each block (including the last padded block) is worked on: // Consider each block as 16 32-bit words. // b) Copy the block into the first 16 words of W memory, with W_0 // the Most Significant word of the block. // c) For t = 16 to 79, W(t) = Shift Left Circular by 1 of // (W(t - 3) ^ W(t - 8) ^ W(t - 14) ^ W(t - 16)); // d) Assign A = H_0; B = H_1; C = H_2; D = H_3; E = H_4; // e) For t = 0 thru 79, do: // f) TEMP = shift left circular by 5 (A) + f(t)(B,C,D) + E + W(t) + K(t); // g) E = D; D = C; C = shift left circular by 30 of (B); B = A; A = TEMP; // h) After each block, H_0 = H_0 + A, H_1 = H_1 + B; H_2 = H_2 + C; H_3 = H_3 + D, H_4 = H_4 + E; // After the last block is operated on, the Digest is {H_0, H_1, H_2, H_3, H_4}; // // NOTE: The standard gives an alternate method of comultation, in chapter 8, which // uses 16 words of storage instead of 80. This is possible because the // entries in the W array are only written, read 4 times, and discarded. // It works this way: // b) Copy block into W memory as in b) above. // c) Assign A = H_0; B = H_1; C = H_2; D = H_3; E = H_4 as in d) above. // d) For t = 0 thru 15, do: // e) Calculate a modified address s = t & 7'h0F; // f) TEMP = shift left circular by 5 (A) + f(t)(B,C,D) + E + W(s) + K(t); // g) E = D; D = C; C = shift left circular by 30 of (B); B = A; A = TEMP; // h) For t = 16 thru 79, do: // i) Calculate a modified address s = t & 7'h0F; // j) W(s) = shift left circular by 1 of // ( W((s + 13) & 7'h0F) ^ W((s + 8) & 7'h0F) // ^ W((s + 2) & 7'h0F) ^ W((s + 0) & 7'h0F)); // k) TEMP = shift left circular by 5 (A) + f(t)(B,C,D) + E + W(s) + K(t); // l) E = D; D = C; C = shift left circular by 30 of (B); B = A; A = TEMP; // m) After each block, H_0 = H_0 + A, H_1 = H_1 + B; H_2 = H_2 + C; H_3 = H_3 + D, H_4 = H_4 + E; // After the last block is operated on, the Digest is {H_0, H_1, H_2, H_3, H_4}; // // NOTE: There are 4 Memory Reads and 1 Memory Write per iteration. The obvious // way to do these are as references to a single-port memory. It would // be 20% faster to use a dual-port memory, and do the last read and the // write at the same time. It would be even faster if 2 words could be // read out of the Ram per clock, either by implementing the RAM as a // bunch of flops followed by MUXes, as a 2-read SRAM, or as 2 copies of // an SRAM which can read 1 word per clock. // // NOTE: There is one place above where 5 numbers are added together. This might // turn out to be slow. They can be done sequentially. It might also be // possible to use 2 adders in parallel to do this faster. It might finally // possible to use a Wallace Tree adder scheme. This will unfortunately // NOT fit will into an FPGA with a fast adder built-in. // // NOTE: Using a single-port SRAM, it looks like it would take 16 + (5 * 64) = 336 // clocks to handle a 64-byte block. // Using a RAM which can write at the same time it can read, it looks like it // would take 16 + (4 * 64) = 272 clocks. // If 2 reads per clock were possible, followed by a write, it looks like it // would take 16 + (3 * 64) = 208 clocks. But 2 adds would need to be done // in parallel. // If 2 reads per clock were possible, and the write overlapped the second // read phase, it looks like it would take 16 + (2 * 64) = 144 clocks. // Again, 2 reads would need to be done per clock. // If ALL reads could happen at the same time, and the 4 adds happened at // the same time, it looks like this could be done in 80 clocks. // If ALL reads could happen at the same time, and more than 1 write could // happen per clock, I don't see any reason why things couldn't happen // EVEN FASTER. It seems possible to do 2 results per clock trivially. // Using much hardware, it seems possible to do 1 block in 40 clocks. // To achieve this, the user would have to offer 2 data items per clock // during the first 8 clocks. // Those are the choices. An 8-to-1 range in the number of clocks. I imagine // that the faster version would need 8 times as much logic, though! // // NOTE: The standard gives 2 example messages: // {8'h61, 8'h62, 8'h63} results in the digest // {32'hA9993E36, 32'h4706816A, 32'hBA3E2571, 32'h7850C26C, 32'h9CD0D89D} // // The message with ascii representation (no parity bit) // "abcdbcdecdefdefgefghfghighijhijkijkljklmklnmlmnomnopnopq" // results in a 2-block message with digest // {32'h84983E44, 32'h1C3BD26E, 32'hBAAE4AA1, 32'hF95129E5, 32'hE54670F1} // // The message consisting of 1 million "a" characters results in digest // {32'h34AA973C, 32'hD4C4DAA4, 32'hF61EEB2B, 32'hDBAD2731, 32'h6534016F} // // More test messages are available via the reference at the top. // //=========================================================================== `timescale 1ns/1ps module sha_1_digest_336_clocks ( start_new_digest, data_in_32, ready_for_next_block, march_out_digest, digest_out_32 ); input start_new_digest; input [31:0] data_in_32; output ready_for_next_block; input march_out_digest; output [31:0] digest_out_32; // The simplest version. endmodule @ 1.2 log @no message @ text @d2 1 a2 1 // $Id: sha_1_digest.v,v 1.2 2001/08/30 09:54:16 Blue Beaver Exp $ d136 8 d150 21 d189 6 a194 5 module sha_1_digest ( use_F_for_CRC, present_crc, data_in_8, next_crc d196 7 a202 4 input use_F_for_CRC; input [`NUMBER_OF_BITS_IN_CRC - 1 : 0] present_crc; input [7:0] data_in_8; output [`NUMBER_OF_BITS_IN_CRC - 1 : 0] next_crc; @ 1.1 log @no message @ text @d2 1 a2 1 // $Id: sha_1_digest.v,v 1.1 2001/08/24 07:20:27 Blue Beaver Exp $ a6 2 // I don't enough to even write the description (even though // I HAVE read and tuned SHA code! d9 2 a10 1 // Seealso: http://www.w3.org/PICS/DSig/SHA1_1_0.html a11 1 // Want to do 1 or more bytes per clock. d59 96 a154 1 // NOTE: I am greatly confused about the order in which the CRC should be @