Sunday, December 10, 2023

E-Book: Machine Learning Guia Completo em C/C++ com OpenCV e YOLO/Darknet para Windows e Linux

Hello Everyone, I would like to announce my first technical book in the field of Computer Vision and Machine Learning (OpenCV and YOLO/Darknet). With direct information and step-by-step details, you will find:

  1. How to set up the development environment using OpenCV and YOLO/Darknet for both Windows and Linux
  2. Creating, preparing, and augmenting your image dataset for training an intelligent model
  3. Setting up and Training a custom YOLO/Darknet model
  4. Complete development of a real project, including object detection and character recognition
  5. Access to all the code and resources created and used in writing this book

Available in English and Portuguese (Brazil) at the links below:

https://lnkd.in/drQZhcTg - PT-BR

https://lnkd.in/dwZ7MtAK - EN

Greetings,

André

Wednesday, May 27, 2020

Creating an OpenCL demo application using the NXP Demo Framework (GPU SDK)

As mentioned before, in this post I am going to cover the setup and how to create an OpenCL based application, using the NXP Demo Framework.

The Demo Framework is described in its documentation as: “A multi-platform framework for fast and easy demo development. The framework abstracts away all the boilerplate & OS specific code of allocating windows, creating the context, texture loading, shader compilation, render loop, animation ticks, benchmarking graph overlays etc. Thereby allowing the demo/benchmark developer to focus on writing the actual 'demo' code.”

We will be cross-compiling our application, using Linux on the host and Yocto as the target’s system. That requires a Yocto image and SDK (ARM GCC toolchain).

You can get both from the NXP website, or download the source code and build your own image using the latest available release (recommended). So, let’s start from there.


1 - Building Yocto from source

a) Preparing the Host Machine

$ sudo apt-get install gawk wget git-core diffstat unzip texinfo build-essential chrpath libsdl1.2-dev xterm curl ncurses-dev
$ mkdir ~/bin
$ curl http://commondatastorage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
$ chmod +x ~/bin/repo
$ export PATH=~/bin/:$PATH


b) BSP Source Code Download

$ mkdir fsl-arm-yocto-bsp
$ cd fsl-arm-yocto-bsp
$ repo init -u git://source.codeaurora.org/external/imx/imx-manifest.git -b imx-linux-zeus -m imx-5.4.3-2.0.0.xml
$ repo sync


c) Configuring the target (i.MX8QXP as an example)

$ EULA=1 MACHINE=imx8qxpmek DISTRO=fsl-imx-xwayland source ./imx-setup-release.sh -b build-imx8qxpmek-xwayland


d) Building the BSP image

$ bitbake fsl-image-full (full image; we will use this one, since it already includes Qt5)
$ bitbake fsl-image-qt5 (Qt5 image)


e) Flashing the Yocto Image

Extract the image (compressed as xxxxxxxxx.rootfs.sdcard.tar.bz2), located in ../build-imx8qxpmek-xwayland/tmp/deploy/images/imx8qxpmek

$ dd if=imx8qxpmek-xwayland/tmp/deploy/images/imx8qxpmek/imx-image-full-imx8qxpmek-xxxxxxxxxxxxx.rootfs.sdcard of=/dev/sde 

Change /dev/sde to your actual SD card device.


f) Building and installing the SDK package

$ bitbake fsl-image-full -c populate_sdk
$ cd ../build-imx8qxpmek-xwayland/tmp/deploy/sdk
$ ./fsl-imx-xwayland-glibc-x86_64-imx-image-full-aarch64-imx8qxpmek-toolchain-5.4-zeus.sh  

The default install location is /opt/fsl-imx-xwayland/5.4-zeus/

Now with the BSP and Toolchain installed, we can proceed to the Demo Framework installation.


2 - Installing the Demo Framework

a) Download (or clone) the NXP Demo Framework source code

$ git clone https://github.com/NXPmicro/gtec-demo-framework.git


b) Set the toolchain environment variables by sourcing the setup environment script

$ source /opt/fsl-imx-xwayland/5.4-zeus/environment-setup-aarch64-poky-linux


c) Set the Demo framework environment variables by sourcing the prepare.sh script

$ source prepare.sh




3 - Creating the OpenCL demo application

Change directory to DemoApps/OpenCL and create a new demo application from it using the FslBuildNew.py script as follows:

$ FslBuildNew.py OpenCL1_2 HelloWorld  

use OpenCL1_1 if needed




At this point we already have our OpenCL demo skeleton. In HelloWorld.cpp we can add our base OpenCL code (pretty much the same code we used in the last OpenCL post); as this post is already long enough, you can simply download the code here.

For the OCL kernel, as a test we can simply copy the input buffer to the output buffer. The input buffer has a size of 3840 * 2160 * 4 bytes, filled with randomly generated values:


__kernel void hello_world      (__global uchar *input, __global uchar *output)
{
   int id = get_global_id (0);
   output[id] = input[id];
}


The OCL kernel can be saved with the name: hello_world.cl (you can create a folder called “Content” and place it there).

Once the steps above are completed, we can proceed to build our demo application.


4 - Building the OpenCL demo application

a) Change dir to the HelloWorld demo

$ cd HelloWorld


b) Use FslBuild.py to generate the makefile. The FslBuild script accepts some input arguments; the one you need now sets the target window-system backend, which in this sample is Wayland (XWayland, to be precise), but it can also be FB or X11.

$ FslBuild.py --Variants [WindowSystem=Wayland]




The first build can take some time, since the Demo Framework modules must be built as well.


c) If errors regarding 3rd-party libraries occur, add the following argument to the build command line: --Recipes [*]

$ FslBuild.py --Variants [WindowSystem=Wayland] --Recipes [*]


d) The binary (executable) file will be placed at:

../gtec-demo-framework/build/Yocto/DemoApps/OpenCL/HelloWorld/OpenCL.HelloWorld


5 - Running the OpenCL demo application

a) Copy the OpenCL.HelloWorld and the hello_world.cl to the target rootfs




b) $ ./OpenCL.HelloWorld




The process time reported for the GPU is zero because of the timer resolution I am using, and because I am only measuring the clEnqueueNDRangeKernel call, which launches the OCL kernel. If you measure the write and read times to and from memory they will be much larger; in that case it is suggested to use physically allocated memory and map the buffers with OCL, a so-called zero-copy operation. This will be a topic for another post.


EOF !

Wednesday, May 13, 2020

New topics in this blog !

It has been a long time since I posted anything new. From now on, I will start adding content that mixes computer vision and compute (OpenCL), using a very nice tool for designing stable and robust applications: the NXP GPU SDK, also known as the Demo Framework, which will be the topic of the next post.

 EOF!

Monday, April 25, 2016

UPDATE - Building OpenCV-2.4.X for Freescale's i.MX6 BSP (Yocto)

Just updating some information of an old post:  http://imxcv.blogspot.com.br/2014/02/building-opencv-24x-for-freescales-imx6.html

you can base your local.conf on the following one:


 
MACHINE ??= 'imx6qpsabreauto'
DISTRO ?= 'fsl-imx-fb'
PACKAGE_CLASSES ?= "package_rpm"
EXTRA_IMAGE_FEATURES = "debug-tweaks"
USER_CLASSES ?= "buildstats image-mklibs"
PATCHRESOLVE = "noop"
BB_DISKMON_DIRS = "\
    STOPTASKS,${TMPDIR},1G,100K \
    STOPTASKS,${DL_DIR},1G,100K \
    STOPTASKS,${SSTATE_DIR},1G,100K \
    STOPTASKS,/tmp,100M,100K \
    ABORT,${TMPDIR},100M,1K \
    ABORT,${DL_DIR},100M,1K \
    ABORT,${SSTATE_DIR},100M,1K \
    ABORT,/tmp,10M,1K"
PACKAGECONFIG_append_pn-qemu-native = " sdl"
PACKAGECONFIG_append_pn-nativesdk-qemu = " sdl"
ASSUME_PROVIDED += "libsdl-native"
CONF_VERSION = "1"

DL_DIR ?= "${BSPDIR}/downloads/"
ACCEPT_FSL_EULA = "1"

CORE_IMAGE_EXTRA_INSTALL += "libopencv-core-dev libopencv-imgproc-dev libopencv-objdetect-dev libopencv-ml-dev libopencv-calib3d-dev libopencv-highgui-dev"

LICENSE_FLAGS_WHITELIST = "commercial"


That will solve most of your issues.

EOF !

Tuesday, February 23, 2016

OpenCL Embedded Profile (GC2000) - Hello World Application


Based on our last post, let's take a look at a very basic "hello world" application. But before the walkthrough, there is some information you need to be aware of.

OpenCL applications are basically divided into two parts: host code and device code.

The host code is responsible for hardware initialization, creating CL objects (memory, program, kernel, command queues), and sending signals to the device to write/read/execute/synchronize data.

The device code is the OpenCL kernel (written in a C99-based language), which is accelerated by the GPU. This is a very basic description, just to keep things as easy as possible; you can find a lot of good, detailed information on the web that covers it in depth.

Below is the flow for an OpenCL application:




As an example, let's use a very simple CL kernel that only makes a copy from the input  to the output buffer.

Please note that the pieces of code below were modified from the original source (link available at the end of this post) to make them more readable, since they are placed in separate functions.

STEP 1 - Defining the Problem Size


OpenCL is meant to solve specific problems, which means that once you define how your kernel will operate, its argument list and the size of your memory objects can't be modified without releasing all the objects and re-creating them with the new desired values.

Defining the problem size means defining our global work-group size and dimension; it can be the length of an array (1D, our use case) or a 2D/3D matrix (we will see that in another sample on a future post).

In our Hello World application the global work-group size is 512, populated with random data (just for testing). We also need to set our local work-group size. Based on the last post, the preferred local work-group size is 16 (per dimension), and we will use this value for better performance.

        
        cl_platform_id platform_id;
        cl_device_id device_id;
        cl_context context;
        cl_command_queue cq;
        cl_program program;
        cl_kernel kernel;
        cl_mem helloworld_mem_input = NULL;
        cl_mem helloworld_mem_output = NULL;

        // one-dimensional work-items
        int dimension = 1;

        // our problem size
        size_t global = 512;

        // preferred work-group size
        size_t local = 16;

        int size;

        // input data buffer - random values for the helloworld sample
        char *input_data_buffer;

        // output data buffer for results
        char *output_data_buffer;

        cl_int ret;

        // make our size equal to our global work-group size
        size = global;

        input_data_buffer = (char *) malloc (sizeof (char) * size);
        if (! input_data_buffer)
        {
                printf ("\nFailed to allocate input data buffer memory\n");
                return 0;
        }

        output_data_buffer = (char *) malloc (sizeof (char) * size);
        if (! output_data_buffer)
        {
                printf ("\nFailed to allocate output data buffer memory\n");
                return 0;
        }

        // populate the input buffer with random values
        for (int i = 0; i < size; i++)
        {
                input_data_buffer[i] = rand () % 255;
        }


STEP 2 - Hardware Initialization


This is the basic step of an OpenCL application. Hardware initialization consists of:
  • Listing the available platforms (one GC2000)
  • Getting compute device information (Vivante OCL EP device)
  • Creating the CL context
  • Creating the command queue (control information from host to device)

 
       cl_uint  platforms, devices;
 cl_int error;

 //-------------------------------------------
 // cl_int clGetPlatformIDs (cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms)
 //--------------------------------------------
 error = clGetPlatformIDs (1, &platform_id, &platforms);
 if (error != CL_SUCCESS) 
  return CL_ERROR;

 //--------------------------------------------
 // cl_int clGetDeviceIDs (cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, 
 //   cl_device_id *devices, cl_uint *num_devices)
 //--------------------------------------------
 error = clGetDeviceIDs (platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &devices);
 if (error != CL_SUCCESS) 
  return CL_ERROR;
 
 //--------------------------------------------
 // cl_context clCreateContext (cl_context_properties *properties, cl_uint num_devices, 
 //    const cl_device_id *devices, void *pfn_notify (const char *errinfo, 
 //    const void *private_info, size_t cb, void *user_data),  
 //    void *user_data, cl_int *errcode_ret)
 //----------------------------------------------
 cl_context_properties properties[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform_id, 0};
 context = clCreateContext (properties, 1, &device_id, NULL, NULL, &error);
 if (error != CL_SUCCESS) 
  return CL_ERROR;
 
 //----------------------------------------------
 // cl_command_queue clCreateCommandQueue (cl_context context, cl_device_id device, 
 //     cl_command_queue_properties properties, cl_int *errcode_ret)
 //----------------------------------------------- 
 cq = clCreateCommandQueue (context, device_id, 0, &error);
 if (error != CL_SUCCESS) 
  return CL_ERROR;


STEP 3 - Create the OCL Objects (Program, Kernel and Memory)


This step is pretty straightforward; since you have already defined your problem size, you just need to set it on the OpenCL objects:
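One detail the excerpt assumes is that the kernel source is already in memory as kernel_src with its length in kernel_size. A minimal sketch of filling that buffer from the hello_world.cl file; the helper name load_kernel_source is my own, not part of the original sample:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a .cl file into a NUL-terminated buffer.
 * On success returns the buffer (caller frees) and stores the length
 * (excluding the terminator) in *length; returns NULL on error. */
static char *load_kernel_source(const char *path, size_t *length)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    fseek(f, 0, SEEK_END);
    long file_size = ftell(f);
    rewind(f);
    if (file_size < 0) {
        fclose(f);
        return NULL;
    }

    char *src = malloc((size_t)file_size + 1);
    if (src && fread(src, 1, (size_t)file_size, f) == (size_t)file_size) {
        src[file_size] = '\0';
        if (length)
            *length = (size_t)file_size;
    } else {
        free(src);
        src = NULL;
    }
    fclose(f);
    return src;
}
```

With the source loaded this way, the program and kernel objects below can be created from it.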

 cl_int error = CL_SUCCESS;

 //----------------------------------------------
 // cl_program clCreateProgramWithSource (cl_context context, cl_uint count, const char **strings, 
 //          const size_t *lengths, cl_int *errcode_ret)
 //------------------------------------------------
 program = clCreateProgramWithSource (context, 1, (const char **)kernel_src, &kernel_size, &error);
 if (error != CL_SUCCESS)
 {
  return CL_ERROR;
 }
 
 //------------------------------------------------
 // cl_int clBuildProgram (cl_program program, cl_uint num_devices, const cl_device_id *device_list,
 //   const char *options, void (*pfn_notify)(cl_program, void *user_data), void *user_data)
 //-------------------------------------------------
 error = clBuildProgram (program, 1, &device_id, "", NULL, NULL);
 if (error < 0)
 {
  //---------------------------------------------------
  // cl_int clGetProgramBuildInfo ( cl_program  program, cl_device_id  device, cl_program_build_info  
  //   param_name, size_t  param_value_size, void  *param_value, size_t  *param_value_size_ret)
  //---------------------------------------------------
  clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, kernel_size, kernel_src, NULL);
  printf ("\n%s", kernel_src);
 }
 
 //---------------------------
 // cl_kernel clCreateKernel (cl_program  program, const char *kernel_name, cl_int *errcode_ret)
        // "hello_world" is the name of our kernel function inside the external file.
 //---------------------------
        kernel = clCreateKernel (program, "hello_world", &error );
 if (error != CL_SUCCESS)
 {
  return CL_ERROR;
 }

 //---------------------------
 // cl_mem clCreateBuffer (cl_context context, cl_mem_flags flags, size_t size, 
 //    void *host_ptr, cl_int *errcode_ret)
 //----------------------------
 helloworld_mem_input = clCreateBuffer (context, CL_MEM_READ_ONLY, size, NULL, &error); 
 
 if (error!= CL_SUCCESS) 
 {
                 return CL_ERROR;
 }
  
 helloworld_mem_output = clCreateBuffer (context, CL_MEM_WRITE_ONLY, size, NULL, &error); 
 
 if (error!= CL_SUCCESS) 
 {
   return CL_ERROR;
 }


STEP 4 - Set Kernel Arguments

The kernel arguments must be set in the host code, and their order has to match the kernel's parameter list; in our case, the input and output buffers, respectively.

 
 //-----------------------------
 // cl_int clSetKernelArg (cl_kernel kernel, cl_uint arg_index, size_t arg_size, 
 //       const void *arg_value)
 //-------------------------------
 clSetKernelArg (kernel, 0, sizeof(cl_mem), &helloworld_mem_input);
 clSetKernelArg (kernel, 1, sizeof(cl_mem), &helloworld_mem_output);


STEP 5 - Execute the Kernel (Command Queue)


This is the part where all the magic happens: we write the data from the host to the device, send the start signal for the device to execute the kernel, and finally read the data back from the device to the host. Here is how it is done:

  • clEnqueueWriteBuffer: writes data to the device
  • clEnqueueNDRangeKernel: starts the kernel execution
  • clEnqueueReadBuffer: reads data back from the device

 
 //-------------------------------
 // cl_int clEnqueueWriteBuffer (cl_command_queue command_queue, cl_mem buffer, 
 //        cl_bool blocking_write, size_t offset, size_t cb, 
 //        const void *ptr, cl_uint num_events_in_wait_list, 
 //        const cl_event *event_wait_list, cl_event *event)
 //---------------------------------
 error = clEnqueueWriteBuffer(cq, helloworld_mem_input, CL_TRUE, 0, size, input_data_buffer, 0, NULL, NULL);
 if (error != CL_SUCCESS) 
  return CL_ERROR;
 
 //-------------------------------
 // cl_int clEnqueueNDRangeKernel (cl_command_queue command_queue, cl_kernel kernel, 
 //        cl_uint work_dim, const size_t *global_work_offset, 
 //        const size_t *global_work_size, const size_t *local_work_size, 
 //        cl_uint num_events_in_wait_list, const cl_event *event_wait_list, 
 //        cl_event *event)
 //---------------------------------
 error = clEnqueueNDRangeKernel (cq, kernel, dimension, NULL, &global, &local, 0, NULL, NULL);
 if (error == CL_SUCCESS)
 {
  //------------------------------------
  // cl_int clEnqueueReadBuffer (cl_command_queue command_queue, cl_mem buffer, 
  //        cl_bool blocking_read, size_t offset, size_t cb, 
  //        void *ptr, cl_uint num_events_in_wait_list,
  //        const cl_event *event_wait_list, cl_event *event)
  //----------------------------------------
  error = clEnqueueReadBuffer(cq, helloworld_mem_output, CL_TRUE, 0, size, output_data_buffer, 0, NULL, NULL);
 }
 else
  return CL_ERROR;

STEP 6 - Clean Up the OpenCL Objects


To prevent memory leaks or any other issues, the CL objects must be released:

 
 clFlush (cq);
 clFinish (cq);

 // release the context last, after the objects created from it
 clReleaseKernel (kernel);
 clReleaseProgram (program);
 clReleaseMemObject (helloworld_mem_input);
 clReleaseMemObject (helloworld_mem_output);
 clReleaseCommandQueue (cq);
 clReleaseContext (context);


Final Considerations


The sample application presented in this post is meant to demonstrate how to create a basic OpenCL application, and it can get as complicated as you need it to be.

The complete and functional source code can be found here !

For information related to the OpenCL EP API, access the Khronos website here.

I hope the information shared in this post can help and give you some directions in your future projects.


EOF !

Wednesday, September 30, 2015

Introduction to i.MX6Q/D (GC2000) Vivante OpenCL Embedded Profile

The purpose of this post is not to make you an OpenCL expert, but to provide the basic knowledge needed to take advantage of the i.MX6’s GPGPU support and get your code (or part of it) accelerated by its graphics processing unit.


First of all, what are GPGPU and OpenCL?



GPGPU:


       Stands for General Purpose Graphics Processing Unit

       Algorithms well-suited to GPGPU implementation are those that exhibit two properties: they are data parallel and throughput  intensive

       Data parallel: means that a processor can execute the operation on different data elements simultaneously.

       Throughput intensive: means that the algorithm is going to process lots of data elements, so there will be plenty to operate on in parallel.

       Pixel-based applications such as computer vision and video and image processing are very well suited to GPGPU  technology, and for this reason, many of the commercial software packages in these areas now include GPGPU acceleration


OpenCL


       Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors.

       OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus application programming interfaces (APIs) that are used to define and then control the platforms

       OpenCL provides parallel computing using task-based and data-based parallelism.

       OpenCL is an open standard maintained by the non-profit technology consortium Khronos Group.

       Apple, Intel, Qualcomm, Advanced Micro Devices (AMD), Nvidia, Altera, Samsung, Vivante and ARM Holdings have adopted it.

  
There are A LOT of OpenCL tutorials on the web explaining all its concepts and capabilities. Below you will find only the most important ones:


Introduction to OpenCL


       In order to visualize the heterogeneous architecture in terms of the API and restrict memory usage for parallel execution, OpenCL defines multiple cascading layers of virtual hardware definitions

       The basic execution engine that runs the kernels is called a Processing Element (PE)

       A group of Processing Elements is called a Compute Unit (CU)

       Finally, a group of Compute Units is called a Compute Device.


       A host system could interact with multiple Compute Devices on a system (e.g., a GPGPU and a DSP), but data sharing and synchronization is coarsely defined at this level. 




       Each item a kernel works on is called a 'work item'.

       A simple example of this is determining the color of a single pixel (work-item) in an output image.

       Work-items are grouped into 'work-groups' , which are each executed in parallel to speed up calculation performance

       How big a work-group is depends on the algorithm being executed and the dimensions of the data being processed (e.g. one work-item per pixel for a block of pixels in a filter) 





OpenCL uses a 'data parallel' programming model where a kernel runs once for each item in an 'index space'. The index space matches the dimensionality of the data being processed (e.g., 1-, 2-, or 3-dimensional arrays) and is called the NDRange, or N-dimensional range.
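The NDRange idea can be sketched on the CPU: in the loop below, each iteration plays the role of one work-item of a 2D NDRange, (gx, gy) correspond to get_global_id(0)/get_global_id(1), and the row-major flattening shows how global IDs map onto a linear buffer. The function name is illustrative only, not from any OpenCL implementation:

```c
#include <stddef.h>

/* CPU emulation of a 2D NDRange running the hello_world copy kernel:
 * each (gx, gy) pair is one work-item, and idx is the usual row-major
 * flattening used to address linear buffers. */
static void emulate_ndrange_2d(const unsigned char *in, unsigned char *out,
                               size_t width, size_t height)
{
    for (size_t gy = 0; gy < height; gy++) {        /* get_global_id(1) */
        for (size_t gx = 0; gx < width; gx++) {     /* get_global_id(0) */
            size_t idx = gy * width + gx;
            out[idx] = in[idx];  /* body of the hello_world copy kernel */
        }
    }
}
```

On the GPU the two loops disappear: the runtime launches width × height work-items and each one executes the loop body exactly once.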



Freescale’s i.MX6Q/D GPU (GC2000) OpenCL EP features


       Vivante GC2000 GPGPU capable of running OpenCL 1.1 Embedded Profile (EP)

       OpenCL embedded profile capabilities (that means for instance no atomic variables, does not mandate support for 3D Images,  64 bit integers or double precision floating point numbers)

       4xSIMD cores (vec-4) shader units

       Up to 512 general purpose registers 128b each for each of the cores

        Maximum number of instructions for kernels is 512

       1-cycle throughput for all shader instructions

        L1 cache of 4KB

        Uniform registers 168 for vertex shader and 64 for fragment shader

        Single integer pipeline/core

       In OpenCL Embedded Profile, the requirements for samplers are reduced, with the number of samplers decreased from 16 (FP, Full Profile) to 8 (EP), and the math precision (ULP) is slightly relaxed below the IEEE-754 specification for some functions

       Lastly, in OpenCL EP the minimum image size is reduced to 2048 (from 8192 in FP) and the local memory requirement is reduced to 1KB (from 32KB in FP)





Each of the shader cores functions as a CU. The cores use a native Vec4 ISA, thus the preferred vector width for all primitives is 4.





Code Optimization for Freescale’s i.MX6Q/D OpenCL EP


       Vector math inputs in multiples of 4.

     As mentioned previously, the GC2000 in i.MX 6Q is a vec4 floating point SIMD engine, so vector math always prefers 4 inputs (or a multiple of 4) for maximum math throughput.


       Use full 32 bit native registers for math.

     Both integer and floating point math is natively 32-bit. 8- and 16-bit primitives will still use 32-bit registers, so there is no gain (for the math computation) in going with smaller sizes.

       Use floating point instead of integer formats

     1x 32-bit Integer pipeline (supports 32-bit INT formats in hardware, 8-bit/16-bit done in software)
     4x 32-bit Floating Point pipeline (supports 16-bit and 32-bit FP formats in hardware)


       To maximize OpenCL compute efficiency, it is better to convert integer formats to floating point to utilize the four (4) parallel FP math units.


       Use 16-bit integer division and 32-bit for other integer math operations

     For integer math (excluding division), there is one 32-bit integer adder and one 32-bit integer multiplier per core. If integer formats are required, use 32-bit integer formats for addition, multiplication, mask, and sign extension.
      Try to minimize or not use 8-bit or 16-bit integer formats since they will be calculated in software and the 32-bit INT ALU will not be used.
     Integer division: iMX 6Q hardware supports only 16-bit integer division, and software is used to implement 8-bit and 32-bit division.
     It is better to use 16-bit division if possible. There will be a performance penalty if 32-bit division is used.


       Use Round to Zero mode

     Floating point computation supports “round-to-zero” only (round-to-nearest-even is not required for EP, if round-to-zero is supported).


       Data accesses should be 16B

     For the most efficient use of the GPGPU’s L1 cache.
     Work-group size should be a multiple of thread-group size.
     Work-group size should be an integer multiple of the GPU's internal preferred work-group size (16 for GC2000) for optimum hardware usage.


       Keep Global work size at 64K (maximum) per dimension

     Since global IDs for work-items are 16 bits, it is necessary to keep the global work size within 64K (65,536, or 2^16) per dimension.


       Conditional Branching should be avoided if possible

      Branch penalties depend on the percentage of work-items that go down the branch paths.
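The sizing rules above (local size a multiple of the preferred 16, global size evenly divisible by the local size, and at most 64K per dimension) can be captured in a small helper. This is a sketch with names of my own; it is not part of any Vivante API:

```c
#include <stddef.h>

#define GC2000_PREFERRED_WG  16      /* preferred local size per dimension */
#define GC2000_MAX_GLOBAL    65536U  /* 16-bit global IDs: 64K per dimension */

/* Round a global work size up to the next multiple of the local size, as
 * required in OpenCL 1.1 where global must be evenly divisible by local. */
static size_t round_up_global(size_t global, size_t local)
{
    return ((global + local - 1) / local) * local;
}

/* Returns 1 if the chosen sizes respect the GC2000 limits above. */
static int wg_config_ok(size_t global, size_t local)
{
    return local != 0 &&
           global % local == 0 &&
           local % GC2000_PREFERRED_WG == 0 &&
           global <= GC2000_MAX_GLOBAL;
}
```

For example, a problem of 500 items would be padded to a global size of 512 with a local size of 16, and the kernel would guard against the 12 extra work-items touching out-of-range data.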



This post is already long enough for an “introductory” look at i.MX6Q/D OpenCL EP. For more information, including a sample application, take a look at this good white paper provided by Freescale: https://community.freescale.com/docs/DOC-100694


EOF !




Saturday, August 8, 2015

New posts coming soon !

Hello Everybody !!!

Sorry for not posting anything new since last year; unfortunately, I have been running out of time to create applications whose full source code I could share.

Even so, with only old posts, this blog still gets 2K views per month. THANK YOU !

I am preparing a set of posts (at least two per month) that will also involve GPU acceleration in our computer vision topics !! The use of OpenCV is the main purpose of this blog, but not all posts will be based on it.

Stay  Tuned !



EOF !

Wednesday, August 13, 2014

Onboard Camera - V4L wrapper with Yocto Project

Based on my last post, a lot of people are having problems using this wrapper in Yocto, since it uses the kernel header mxcfb.h and that differs a bit from LTIB. This post should help solve the issue.

We have a nice tool on Linux that most people don't use (myself included) and that can handle dependencies like a charm! This tool is called Autotools: yes, the one that creates the practical and easy ./configure, make, and make install system. Below I will demonstrate how we can use it to solve our issue:


STEP#1 - YOCTO BUILD CHANGES

We now need to add the kernel-dev to our local.conf (in build/conf) file and it will look like:

     BB_NUMBER_THREADS ?= "${@oe.utils.cpu_count()}"
     PARALLEL_MAKE ?= "-j ${@oe.utils.cpu_count()}"
     MACHINE ??= 'imx6qsabresd'
     DISTRO ?= 'poky'
     PACKAGE_CLASSES ?= "package_rpm"
     EXTRA_IMAGE_FEATURES = "debug-tweaks"
     USER_CLASSES ?= "buildstats image-mklibs image-prelink"
     PATCHRESOLVE = "noop"
     BB_DISKMON_DIRS = "\
         STOPTASKS,${TMPDIR},1G,100K \
         STOPTASKS,${DL_DIR},1G,100K \
         STOPTASKS,${SSTATE_DIR},1G,100K \
         ABORT,${TMPDIR},100M,1K \
         ABORT,${DL_DIR},100M,1K \
         ABORT,${SSTATE_DIR},100M,1K" 
     PACKAGECONFIG_pn-qemu-native = "sdl"
     ASSUME_PROVIDED += "libsdl-native"
     CONF_VERSION = "1"

     BB_NUMBER_THREADS = '4'
     PARALLEL_MAKE = '-j 4'

     DL_DIR ?= "${BSPDIR}/downloads/"
    ACCEPT_FSL_EULA = ""

     DISTRO_FEATURES_remove = "x11 wayland"

     CORE_IMAGE_EXTRA_INSTALL += "gpu-viv-bin-mx6q gpu-viv-bin-mx6q-dev kernel-dev"


STEP #2 - REBUILDING YOCTO

     $ bitbake core-image-base


STEP#3 - POSSIBLE ISSUE

     If the mxcfb.h file doesn't appear in your sysroot (/opt/poky/1.6.1/sysroots/cortexa9hf-vfp-neon-poky-linux-gnueabi/), you need to rebuild your toolchain and then reinstall it:

     rebuild with -c populate_sdk:

          $ cd fsl-community-bsp/build
          $ bitbake core-image-base -c populate_sdk

     and then install:

          $ cd /fsl-community-bsp/build/tmp/deploy/sdk
          $ ./poky-eglibc-x86_64-core-image-base-cortexa9hf-vfp-neon-toolchain-1.6.1.sh


STEP #4 - MODIFYING THE APP TO SUPPORT AUTOTOOLS

     Now comes the interesting part: we will need two different (and pretty simple) files, Makefile.am and configure.ac.

     Based on my usual project configuration, I like to have bin, include, and src folders, so the sample below is organized that way:


     Project Tree:
    
             Project folder
                        |
a)                     ----- Makefile.am
                        |
b)                     ----- configure.ac
                        |
                        ----- bin /
                        |
                        ----- src /
                        |         |
c)                     |         ------ Makefile.am
                        |         |
                        |         ------ camera_test.c
                        |         |
                        |         ------ v4l_wrapper.c
                        |
                        ----- include /
                                  |
                                  ------ v4l_wrapper.h


    a) Makefile.am

             AUTOMAKE_OPTIONS = foreign
              SUBDIRS = src          # add just the directories that have a Makefile inside

    b) configure.ac
         
             AC_INIT([camera_test], [0.1], [andre.silva@freescale.com])
             AM_INIT_AUTOMAKE([-Wall -Werror foreign])
             AC_PROG_CC
             AC_CONFIG_HEADERS([config.h])
             AC_CHECK_HEADERS([fcntl.h stdint.h stdlib.h string.h sys/ioctl.h unistd.h])

             # Checks for library functions.
             AC_FUNC_MMAP
             AC_CHECK_FUNCS([gettimeofday memset munmap])

             ##########################################################################
             # Checks for programs needed for tests
             ##########################################################################

             AC_CONFIG_FILES([Makefile
             src/Makefile])

             AC_OUTPUT


    c) Makefile.am

             bin_PROGRAMS = camera_test

             camera_test_SOURCES = camera_test.c ../include/v4l_wrapper.h
             nodist_camera_test_SOURCES = ../src/v4l_wrapper.c

             ROOTFS_DIR = $(SDKTARGETSYSROOT)

             TARGET_PATH_LIB = $(ROOTFS_DIR)/usr/lib
             TARGET_PATH_INCLUDE = $(ROOTFS_DIR)/usr/include

             AM_CPPFLAGS = -I $(prefix)/usr/src/kernel/include/uapi -I $(prefix)/usr/src/kernel/include/ -I $(TARGET_PATH_INCLUDE) -I ../include

             AM_LDFLAGS = -Wl,--library-path=$(TARGET_PATH_LIB),-rpath-link=$(TARGET_PATH_LIB) -lm  -L $(prefix)/usr/lib 
     
             # workaround to get the binaries copied to local /bin folder
             all:
                       mv ../src/$(bin_PROGRAMS) ../bin

             clean-local: 
                        rm -rf ../bin/*
                  rm -rf ../src/*.o


     * If you are not going to use any additional sources, you can just set camera_test_SOURCES = camera_test.c


STEP #5 - GENERATING THE BUILD FILES

     In a clean terminal enter the following commands to generate the files:

          $ autoheader
          $ autoreconf --install

     This creates all the files needed to build your application, including dependency handling, even for headers that come from the kernel.


STEP #6 - BUILDING THE APPLICATION

     To build any application with Yocto (unless you have a recipe for it), you must export the toolchain environment variables. Once that is done, the terminal session is polluted by them, so if you want to repeat all the steps above you will need a fresh (clean) terminal.

          $ cd /opt/poky/1.6.1/
          $ .  ./environment-setup-cortexa9hf-vfp-neon-poky-linux-gnueabi
          $ cd /home/usr/v4l_wrapper_yocto/
          $ ./configure --host=arm-poky-linux-gnueabi --prefix=$SDKTARGETSYSROOT
          $ make

     At this point you will have your binary placed in the bin/ folder. If you enter the make install command, the application binary is installed (copied) into your rootfs /bin, where the rootfs is defined by the --prefix option on the ./configure line.


THE RESULT





The sample code can be found here.

Thanks to Rogério Pimentel, who helped me with autotools and with solving the mxcfb.h dependency issue, and to Daiane Angolini for the Yocto help.

EOF !




Monday, May 19, 2014

Onboard Camera - V4L wrapper for use with OpenCV

Many people have been asking how to use the onboard cameras (the CSI and MIPI modules that come with Freescale's development boards) with OpenCV.

Unfortunately, the mxc driver for these cameras is not compatible with OpenCV, so there is no way to use them directly. Instead, we can use wrapper functions to access them. In this post I will share some utility functions I created (based on an IPU demo app by Rogerio Pimentel, a Freescaler) that use V4L to access the camera.

These functions deliver frames in UYVY (YUV422) format; to use them with OpenCV, the frames must be converted to RGB24 (24 bpp). All of this is included in the source code.
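The conversion can be sketched in plain C. The snippet below is a minimal, self-contained illustration (not the wrapper's actual code) using full-range BT.601 integer coefficients; each 4-byte UYVY group (U0 Y0 V0 Y1) expands into two RGB24 pixels that share the same chroma samples:

```c
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

/* Convert a UYVY (YUV422) buffer to packed RGB24.
 * 'pixels' must be even: every 4 input bytes (U0 Y0 V0 Y1)
 * produce two RGB pixels sharing the same U and V samples. */
void uyvy_to_rgb24(const uint8_t *uyvy, uint8_t *rgb, int pixels)
{
    for (int i = 0; i < pixels; i += 2) {
        int u  = uyvy[0] - 128;
        int y0 = uyvy[1];
        int v  = uyvy[2] - 128;
        int y1 = uyvy[3];
        uyvy += 4;

        /* full-range BT.601, scaled by 256 to stay in integer math */
        int r_off = (359 * v) / 256;           /* 1.402 * V          */
        int g_off = (88 * u + 183 * v) / 256;  /* 0.344*U + 0.714*V  */
        int b_off = (454 * u) / 256;           /* 1.772 * U          */

        rgb[0] = clamp_u8(y0 + r_off);
        rgb[1] = clamp_u8(y0 - g_off);
        rgb[2] = clamp_u8(y0 + b_off);
        rgb[3] = clamp_u8(y1 + r_off);
        rgb[4] = clamp_u8(y1 - g_off);
        rgb[5] = clamp_u8(y1 + b_off);
        rgb += 6;
    }
}
```

The resulting buffer can then be wrapped as a 3-channel 8-bit OpenCV image; keep in mind that OpenCV expects BGR order by default, so you may need to swap the R and B channels depending on how you create the image header.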

The code can be downloaded from here; it also comes with a sample application. You just need to export the ROOTFS environment variable and check that the toolchain you are going to use is the same as the one defined in the Makefile; otherwise, you must change it too.

The demo application (tested on an i.MX6 SabreSD board) runs the Canny edge detector and displays the result using cvShowImage. You could also display it directly in the framebuffer using the utility function: just uncomment those functions in the code and add the RGB888-to-YUV422 conversion.

To run this application, make sure these drivers are loaded:

modprobe ov5642_camera
modprobe ov5640_camera_mipi
modprobe mxc_v4l2_capture

Here is the result:




PS: There is a "small" issue in these wrapper functions. As they use multiple buffers for capture and output (if you use the output function instead of cvShowImage), you need to call the V4LWrapper_QueryFrame function at least 3 times in a row to fill all the buffers before displaying a frame. So, the best approach is to use this function like cvQueryFrame: just call it inside a loop and you are good to go. =) When I get a fix for this I will update this post.
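To make the buffering behavior concrete, here is a toy model in plain C (not the wrapper's real code; NBUF, CaptureQueue and query_frame are made up for illustration) of a capture function that only starts returning valid frames once its 3 internal buffers are full:

```c
#define NBUF 3  /* hypothetical number of capture buffers */

/* Toy model of a buffered capture queue: each query enqueues the
 * newest camera frame and only returns one once all buffers are filled. */
typedef struct {
    int buf[NBUF];
    int count;
} CaptureQueue;

/* Returns 0 while the pipeline is still priming, else the oldest frame. */
int query_frame(CaptureQueue *q, int new_frame)
{
    q->buf[q->count++] = new_frame;      /* store the newest capture  */
    if (q->count < NBUF)
        return 0;                        /* buffers not yet full      */
    int oldest = q->buf[0];              /* oldest frame is now valid */
    for (int i = 1; i < NBUF; i++)       /* shift the queue down      */
        q->buf[i - 1] = q->buf[i];
    q->count--;
    return oldest;
}
```

The first two calls return nothing useful; from the third call on, every query yields a frame, which is why calling the query function inside a loop, cvQueryFrame-style, works fine.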

EOF !



Wednesday, February 26, 2014

Building OpenCV-2.4.X for Freescale's i.MX6 BSP (Yocto)

Lately a lot of people have been working with the Yocto Project, and many of them (like me) migrated from LTIB. Yocto uses a different concept for adding new packages/applications to the system: everything is based on RECIPES. As Yocto is widely used, the number of packages (recipes) already included is very large and keeps growing continuously. Luckily for us, the recipe for OpenCV is already there; we just need to configure the system to include it.

To get everything up and running, we will divide the task into steps:


Step #1 - Installing Yocto
--------------------------

As our focus is installing OpenCV, for the Yocto install procedure we can use this very good tutorial created by Daiane: https://community.freescale.com/docs/DOC-94849


Step #2 - Enabling OpenCV
----------------------------

As our Yocto release already includes the OpenCV recipe, we just need to add the packages we want to our local.conf file, located at /yocto/fsl-community-bsp/build/conf. With the OpenCV packages added, it should look like this:

    MACHINE ??= 'imx6qsabresd'
    DISTRO ?= 'poky'
    PACKAGE_CLASSES ?= "package_rpm"
    EXTRA_IMAGE_FEATURES = "debug-tweaks dev-pkgs"
    USER_CLASSES ?= "buildstats image-mklibs image-prelink"
    PATCHRESOLVE = "noop"
    BB_DISKMON_DIRS = "\
    STOPTASKS,${TMPDIR},1G,100K \
    STOPTASKS,${DL_DIR},1G,100K \
    STOPTASKS,${SSTATE_DIR},1G,100K \
    ABORT,${TMPDIR},100M,1K \
    ABORT,${DL_DIR},100M,1K \
    ABORT,${SSTATE_DIR},100M,1K" 
    PACKAGECONFIG_pn-qemu-native = "sdl"
    ASSUME_PROVIDED += "libsdl-native"
    CONF_VERSION = "1"

    BB_NUMBER_THREADS = '4'
    PARALLEL_MAKE = '-j 4'

    DL_DIR ?= "${BSPDIR}/downloads/"
    ACCEPT_FSL_EULA = ""

    CORE_IMAGE_EXTRA_INSTALL += "gpu-viv-bin-mx6q gpu-viv-bin-mx6q-dev"
    CORE_IMAGE_EXTRA_INSTALL += "libopencv-core-dev libopencv-highgui-dev \
    libopencv-imgproc-dev libopencv-objdetect-dev libopencv-ml-dev"

    LICENSE_FLAGS_WHITELIST = "commercial"


Note that we included the "-dev" packages. This is necessary if you always want the OpenCV headers/libraries included in the rootfs: Yocto is smart, and if you don't add a "-dev" package, only the libraries required by applications already in the image get included. As we always want the OpenCV files available to build our applications, we use it this way.


Step #3 - Building OpenCV
----------------------------

Now the easy part:

/yocto/fsl-community-bsp/build$ bitbake core-image-x11

After the build is finished, you can check the images generated by the bitbake command at:

/build/tmp/deploy/images/imx6qsabresd/

After extracting the rootfs (core-image-x11-imx6qsabresd.tar.bz2), you can find the OpenCV libraries in the /usr/lib folder:

andre@b22958:~/bsps/yocto/rootfs$ ls usr/lib/libopen*
usr/lib/libopencv_calib3d.so           usr/lib/libopencv_ml.so
usr/lib/libopencv_calib3d.so.2.4       usr/lib/libopencv_ml.so.2.4
usr/lib/libopencv_calib3d.so.2.4.7     usr/lib/libopencv_ml.so.2.4.7
usr/lib/libopencv_contrib.so           usr/lib/libopencv_nonfree.so
usr/lib/libopencv_contrib.so.2.4       usr/lib/libopencv_nonfree.so.2.4
usr/lib/libopencv_contrib.so.2.4.7     usr/lib/libopencv_nonfree.so.2.4.7
usr/lib/libopencv_core.so              usr/lib/libopencv_objdetect.so
usr/lib/libopencv_core.so.2.4          usr/lib/libopencv_objdetect.so.2.4
usr/lib/libopencv_core.so.2.4.7        usr/lib/libopencv_objdetect.so.2.4.7
usr/lib/libopencv_features2d.so        usr/lib/libopencv_ocl.so
usr/lib/libopencv_features2d.so.2.4    usr/lib/libopencv_ocl.so.2.4
usr/lib/libopencv_features2d.so.2.4.7  usr/lib/libopencv_ocl.so.2.4.7
usr/lib/libopencv_flann.so             usr/lib/libopencv_photo.so
usr/lib/libopencv_flann.so.2.4         usr/lib/libopencv_photo.so.2.4
usr/lib/libopencv_flann.so.2.4.7       usr/lib/libopencv_photo.so.2.4.7
usr/lib/libopencv_gpu.so               usr/lib/libopencv_stitching.so
usr/lib/libopencv_gpu.so.2.4           usr/lib/libopencv_stitching.so.2.4
usr/lib/libopencv_gpu.so.2.4.7         usr/lib/libopencv_stitching.so.2.4.7
usr/lib/libopencv_highgui.so           usr/lib/libopencv_superres.so
usr/lib/libopencv_highgui.so.2.4       usr/lib/libopencv_superres.so.2.4
usr/lib/libopencv_highgui.so.2.4.7     usr/lib/libopencv_superres.so.2.4.7
usr/lib/libopencv_imgproc.so           usr/lib/libopencv_video.so
usr/lib/libopencv_imgproc.so.2.4       usr/lib/libopencv_video.so.2.4
usr/lib/libopencv_imgproc.so.2.4.7     usr/lib/libopencv_video.so.2.4.7
usr/lib/libopencv_legacy.so            usr/lib/libopencv_videostab.so
usr/lib/libopencv_legacy.so.2.4        usr/lib/libopencv_videostab.so.2.4
usr/lib/libopencv_legacy.so.2.4.7      usr/lib/libopencv_videostab.so.2.4.7
andre@b22958:~/bsps/yocto/rootfs$

PS: don't forget to flash the card with the image created at /tmp/deploy/images/imx6qsabresd/core-image-x11-imx6qsabresd.sdcard

$ sudo dd if=/build/tmp/deploy/images/imx6qsabresd/core-image-x11-imx6qsabresd.sdcard of=/dev/sdb
----------------------------------------

After the 3 steps above you should be able to find all the OpenCV headers/libraries needed by most of your applications. In case you need more dev packages, you can look at: /tmp/work/cortexa9hf-vfp-neon-poky-linux-gnueabi/opencv/2.4.6+gitAUTOINC+1253c2101b-r0/packages-split

Now that you have the OpenCV headers/libraries, we need the toolchain to build our sample application. Just re-run the bitbake command, now adding the "-c populate_sdk" option to the command line:

/yocto/fsl-community-bsp/build$ bitbake core-image-x11 -c populate_sdk

and then run the install script created at: /yocto/fsl-community-bsp/build/tmp/deploy/sdk to install it.

With that, you will see the toolchain installed at /opt/poky.

Now we can test our sample code, a simple camera test; the source code can be found here: camera_test_sample

To build this application you need a new terminal window (so that all environment variables are clean); then run the environment setup script:

$ cd /opt/poky/1.5+snapshot/
$ . ./environment-setup-cortexa9hf-vfp-neon-poky-linux-gnueabi

and then go to the camera_test_yocto folder and type make. The binary will be placed in the bin folder.

Once you have flashed your card with the Yocto image (OpenCV included), mount the SD card on your host computer and copy the binary to your rootfs.

To test it, run the application with the following command:

$ DISPLAY=:0 ./camera_test



EOF !