Convert string to array of bytes in C

Rip Cord

Administrator
Staff member
Developer
As everyone knows, command-line arguments are passed to a console program as an array of strings.

C:\>program.exe 123456

To use 123456 as a numeric data type, library functions are commonly used to convert the string:
int i;
i = atoi(argv[1]);
or
unsigned long i;
i = strtoul(argv[1], NULL, 10);
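
atoi gives no way to detect bad input, while strtoul also takes an end pointer and a base that can be used for checking. A minimal sketch of a checked conversion follows; parse_ulong is a hypothetical helper name, not a library function, and the checks shown are one reasonable choice rather than the only one.

```c
#include <errno.h>
#include <stdlib.h>

/* parse_ulong is a hypothetical helper, not part of the C library.
   Returns 0 on success and stores the value in *out.
   Returns -1 if s has no digits, has trailing characters, or is out of range. */
int parse_ulong(const char *s, int base, unsigned long *out)
{
	char *end;
	unsigned long value;

	errno = 0;
	value = strtoul(s, &end, base);
	if (end == s) return -1;        /* no digits were consumed */
	if (*end != '\0') return -1;    /* leftover characters after the number */
	if (errno == ERANGE) return -1; /* value does not fit in unsigned long */
	*out = value;
	return 0;
}
```

It could be called as parse_ulong(argv[1], 10, &i), with a nonzero result treated as a usage error.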

I couldn't find a function in the C standard library that converts a string of hex digits to a byte array. Searching the internet for a method mainly turns up the silly answer that C already stores strings as byte arrays, so there is no need to convert them.
Here is a simple way to convert a fixed-length string to a byte array. I chose a 32-character hex string to use as a 16-byte key.
C:
// convert_to_byte.c : Defines the entry point for the console application.
//
//converts a string which is 32 characters long to a byte array
 
#pragma warning(disable : 4996)	//MSVC only: silences deprecation warnings
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
 
 
int main(int argc, char *argv[])
{
   
	char input_string[33];	//a string variable for holding the input string
	char little_strings[16][3];	//split input string into 16 little strings
	int i;	// i and j are counters for the loops
	int j;
	unsigned char byte_array[16];	//for the array of bytes: 32 hex digits make 16 bytes
 
 
 
	printf("\nconvert_to_byte.exe version 0.1.1\n");
 
	//check number of command line arguments
	if(argc !=2) {
		printf("\n\nusage: %s input_string", argv[0]);
		printf("\ninput_string is a string of 32 hex digits...\n\n");
		return 1;
	}
 
	//check if length of input string is 32 digits
	if((strlen(argv[1])) != 32) {
		printf("\n\nthe input string must be 32 hex digits...\n\n");
		return 1;
	}
 
 
	//print argv[1] as string and as characters
	printf("\nargv[1] as\n  string	 %s\n  characters ", argv[1]);
	for(i=0;i<32;i++) printf("%c ", argv[1][i]);
 
 
	//copy string from argv[1] to a string 32 characters plus null to terminate the string
	//not necessary, but provides a level of abstraction from the command line
	memcpy(input_string, argv[1], 33);
 
	printf("\n\ninput_string as\n  string	 %s\n  characters ", input_string);
	for(i=0;i<32;i++) printf("%c ", input_string[i]);
 
	//copy 2 characters at a time from the input string into 16 little strings
	printf("\n\ncopying 32 characters, 2 at a time, from the string into 16 little strings...");
	for(j=0,i=0; j<16; j++,i+=2 ) {
		little_strings[j][0] = input_string[i];	//2 characters to make 2 digits of byte
		little_strings[j][1] = input_string[i+1];
		little_strings[j][2] = '\0';	//null character terminates a string
	}
 
	//convert array of strings to array of byte values
	//by using the library function strtoul to convert a string to unsigned long
	//and using a type cast to convert unsigned long to a byte (unsigned char)
	printf("\n\nconverting each little string to a byte...");
	for(j=0;j<16;j++) byte_array[j] = (unsigned char)strtoul((const char *)little_strings[j], NULL, 16);
 
	//print little strings as strings
	printf("\n\nindex		 ");
	for(j=0;j<16;j++) printf(" %2d",j);
	printf("\nlittle_strings");
	for(j=0;j<16;j++) printf(" %s", &little_strings[j][0]);
 
	//print byte array
	printf("\nbyte_array	");
	for(i=0;i<16;i++) printf(" %.2X", byte_array[i]);
 
	printf("\n\nthe little strings were printed using %%s");
	printf("\nthe bytes were printed using %%X");
 
   
	//for the address of the string
	//which was written: &little_strings[j][0]
	//can also use the short form:  little_strings[j]
 
	printf("\n\n\nfinished...\n\n");
	return 0;
}
output:
C:
C:\>convert_to_byte 0123456789ABCDEF0123456789ABCDEF
 
convert_to_byte.exe version 0.1.1
 
argv[1] as
  string	 0123456789ABCDEF0123456789ABCDEF
  characters 0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
 
input_string as
  string	 0123456789ABCDEF0123456789ABCDEF
  characters 0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
 
copying 32 characters, 2 at a time, from the string into 16 little strings...
 
converting each little string to a byte...
 
index			0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
little_strings 01 23 45 67 89 AB CD EF 01 23 45 67 89 AB CD EF
byte_array	   01 23 45 67 89 AB CD EF 01 23 45 67 89 AB CD EF
 
the little strings were printed using %s
the bytes were printed using %X
 
finished...
 

Rip Cord

It's easy to show why the answer that a string is already stored as a byte array is silly. With a couple of lines of code, display the address of input_string right after argv[1] is copied to it, then pause the program.

Code:
memcpy(input_string, argv[1], 33);
printf("\naddress of input_string is %p", input_string);
printf("\npress enter to continue");
getchar();

console output:
Code:
C:\convert_to_byte 0123456789ABCDEF0123456789ABCDEF

address of input_string is 0013FF50
press enter to continue

While the program is paused, use a hex editor that can open RAM and look at the memory location pointed to by input_string.

Code:
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0013FF50  30 31 32 33 34 35 36 37 38 39 41 42 43 44 45 46  0123456789ABCDEF
0013FF60  30 31 32 33 34 35 36 37 38 39 41 42 43 44 45 46  0123456789ABCDEF

or demonstrate the same with these lines of code:
Code:
	printf("\n\nHere is how c natively stores the input string as hex bytes:\n");
	for(i=0;i<32;i++) printf("%.2X", input_string[i]);


console output:
Code:
Here is how c natively stores the input string as hex bytes:
3031323334353637383941424344454630313233343536373839414243444546
Of course, when a person enters 012345... they want to use the values 01 23 45... in calculations, not 30 31 32 33 34 35... The bytes that C stores are the ASCII codes of the characters. To have the entered digits be the actual hex values, it is necessary to convert the string to a byte array, even in C.
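
The difference between the stored character codes and the decoded values can be sketched for a single pair of characters. raw_byte and decoded_byte are hypothetical names chosen for illustration, and the sketch assumes an ASCII character set and valid hex digits.

```c
#include <stdlib.h>

/* The byte C actually stores for the character at position i. */
unsigned char raw_byte(const char *s, size_t i)
{
	return (unsigned char)s[i];
}

/* The byte value that the two hex digits starting at position i represent.
   Assumes s[i] and s[i + 1] are both valid hex digits. */
unsigned char decoded_byte(const char *s, size_t i)
{
	char pair[3];

	pair[0] = s[i];
	pair[1] = s[i + 1];
	pair[2] = '\0';	/* terminate the two-character string for strtoul */
	return (unsigned char)strtoul(pair, NULL, 16);
}
```

For the string "0123", raw_byte gives 30 31 32 33 (hex) for the four characters, while decoded_byte at positions 0 and 2 gives 01 and 23.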
 

sebastiencs

New member
Hello,

There is no need to use a little_strings array or strtoul.
There is a simpler method to do that:

C:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
 
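/*
**  Convert an ASCII hex digit to its numeric value
**  'A'  = 65, 'B' = 66, [...], 'F' = 70
**  'a'  = 97, 'b' = 98, [...], 'f' = 102
**  '0'  = 48, '1' = 49, [...], '9' = 57
**  The input is not validated: c is assumed to be a hex digit
*/
uint8_t get_num(char c) {
  if (c >= 'a' && c <= 'f')
    return (uint8_t)(c - 'a' + 10);
  return (uint8_t)((c >= 'A' && c <= 'F') ? (c - 'A' + 10) : (c - '0'));
}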
 
/*
**  Convert 2 hex digits to a byte made of 2 half bytes (nibbles)
**
**  binary:  1001 1010
**  hex:        9    A
*/
uint8_t		 to_byte(char c1, char c2) {
  return (get_num(c1) << 4 | get_num(c2));
}
 
int			 main(int argc, char *argv[]) {
  size_t		i, j;
  uint8_t	   byte[16];
 
  if (argc > 1 && strlen(argv[1]) == 32)
  {
	for (i = 0, j = 0; j < sizeof(byte); i += 2, j += 1)
	{
	  byte[j] = to_byte(argv[1][i], argv[1][i + 1]);
	}
	for (i = 0; i < sizeof(byte); i += 1)
	{
	  printf((i + 1 != sizeof(byte)) ? ("%02X|") : ("%02X\n"), byte[i]);
	}
  }
  return (0);
}
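
The shift-and-or approach above trusts its input. A sketch of the same idea with validation added is shown here; hex_value and pair_to_byte are hypothetical names, not part of the posted code, and returning -1 for a bad character is an assumed convention.

```c
#include <stdint.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
int hex_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Combine two hex digits into one byte: c1 is the high nibble, c2 the low nibble.
   Returns 0 on success, -1 if either character is not a hex digit. */
int pair_to_byte(char c1, char c2, uint8_t *out)
{
  int hi = hex_value(c1);
  int lo = hex_value(c2);

  if (hi < 0 || lo < 0) return -1;
  *out = (uint8_t)((hi << 4) | lo);
  return 0;
}
```

In the conversion loop, a nonzero result from pair_to_byte could be used to reject the whole input string instead of storing a garbage byte.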
 

Rip Cord

Minor update: this version works for a byte array of any size.
The method from sebastiencs is the more correct way, though; this one still uses strtoul to do the "heavy lifting" o_O
C:
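#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
//returns 1 if the string contains a non-hex character, 0 if every character is a hex digit
//(note the sense: a nonzero result means "not hex")
int ishex(char* input)
{
	size_t i;
	size_t length = strlen(input);
 
	for(i=0; i<length; i++) {
		if(!isxdigit((unsigned char)input[i])) return 1;
	}
	return 0;
}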
 
uint8_t *to_bytes(char* a_string)
{
	uint8_t *bytes;
	size_t length;
	size_t size;
	char byte_size_string[3];	//2 hex digits plus null terminator, passed to strtoul
	uint32_t i;
 
	bytes = NULL;
	memset(byte_size_string, 0x00, 3);
	length = strlen(a_string);
 
	printf("\n\nhex string:  %s", a_string);
 
	if(length < 2) { printf("\n\nwarning, less than 2 characters not supported\n\n"); return NULL; }
	if(length % 2) { printf("\n\nwarning, %zu characters is not an even number\n\n", length); return NULL; }
	if(ishex(a_string)) { printf("\n\nwarning, string is not all hex characters\n\n"); return NULL; }
 
	size = length / 2;
	bytes = (uint8_t *)malloc(size * sizeof(uint8_t));
	if (bytes == NULL) {
		printf("\n\nerror, failed to allocate 0x%zX [%zu] bytes memory\n\n", size, size);
		return NULL;
	}
 
	//convert string to byte array
	for(i=0;i<size;i++) {
		memcpy(byte_size_string, &a_string[2*i], 2);
		bytes[i] = (uint8_t)strtoul(byte_size_string, NULL, 0x10);
	}
	printf("\nhex bytes:   "); for(i=0;i<size;i++) printf("%02X", bytes[i]);
 
	printf("\n\nstring length:  %2zu", length);
	printf("\narray length:   %2zu", size);
 
	return bytes;	//allocated with malloc: the caller must free() it
}
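
Because to_bytes returns memory from malloc, the caller has to free the result. Where the maximum size is known in advance, one alternative is to decode into a buffer the caller already owns. The following is only a sketch of that variant: hex_decode is a hypothetical name, and it keeps the same strtoul technique and the same checks (even length, all hex digits).

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* hex_decode is a hypothetical name. Decodes the hex string into out.
   Returns the number of bytes written, or -1 if the string is empty,
   has an odd length, contains a non-hex character, or does not fit in out. */
long hex_decode(const char *hex, uint8_t *out, size_t out_size)
{
	size_t length = strlen(hex);
	size_t size = length / 2;
	char pair[3] = { 0, 0, 0 };	/* 2 hex digits plus null terminator */
	size_t i;

	if (length == 0 || length % 2 != 0 || size > out_size) return -1;

	for (i = 0; i < length; i++) {
		if (!isxdigit((unsigned char)hex[i])) return -1;
	}
	for (i = 0; i < size; i++) {
		memcpy(pair, &hex[2 * i], 2);
		out[i] = (uint8_t)strtoul(pair, NULL, 16);
	}
	return (long)size;
}
```

For a 16-byte key this could be called with a uint8_t key[16] array and sizeof key, with no allocation to clean up afterwards.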

source and header
 

Attachments

  • to_bytes.zip
    1.1 KB · Views: 1