Qt Quick Controls for Embedded

Qt Quick Controls are a set of styled UI controls for conveniently building user interfaces in QML. The first release of Qt Quick Controls was included in Qt 5.1 and targeted desktop platforms. Subsequent releases have vastly improved support for mobile platforms.


Qt Quick Controls are not only offered for convenience; they also act as an example for vendors wanting to create their own UI control sets. Unlike specialized control sets for a single target platform, Qt Quick Controls have a broad scope, from desktop to mobile and embedded. This has led to a flexible but somewhat complex design.


When targeting embedded hardware with limited resources, we would like to offer better performance than the current set of Qt Quick Controls provides. Over the past months we have spent a great deal of time researching, profiling, and discussing alternative approaches. We would like our customers to enjoy the convenience of Qt Quick Controls everywhere, without a significant performance impact.



We now have a promising prototype up and running, so we thought it would be a good time to share the status. We have posted a little sneak preview showing the new controls in action. Please note that the visual appearance is not final.



The following sections highlight some of the ideas that together helped to achieve a remarkable performance boost.


QML vs. C++


In many cases, the internal state of a control can be processed more efficiently in C++. For example, handling input events in C++ makes a difference for controls that would otherwise need to create internal MouseAreas and attached Keys objects. With all the heavy lifting done in C++, the visual QML layer can be implemented with simple and efficient declarative bindings.
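As a sketch of the difference (the snippet below is illustrative only; names like Button, background and pressed stand in for whatever the final API will be):

```qml
// Purely QML-based control: input handling needs internal helper objects.
Rectangle {
    signal clicked()
    color: area.pressed ? "darkgray" : "lightgray"
    MouseArea {
        id: area
        anchors.fill: parent
        onClicked: parent.clicked()
    }
    Keys.onReturnPressed: clicked()
}

// C++-backed control: event handling lives in C++, and the visual layer
// is reduced to simple declarative bindings on properties (here a
// hypothetical 'pressed') that the C++ control exposes.
Button {
    background: Rectangle {
        color: pressed ? "darkgray" : "lightgray"
    }
}
```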


The following charts compare the creation times of various current Qt Quick Controls with those of the new controls we have in the works.


[Chart: control creation times, desktop i7]


[Chart: control creation times, Raspberry Pi]


As you can see, on a device like the Raspberry Pi one still cannot create many controls per frame when aiming for 60 FPS scrolling: at 60 FPS, each frame leaves a total budget of only about 16 ms. We are clearly headed in the right direction, though. :)


Styles


The new controls come with a customizable, lightweight, platform-independent Qt style that performs well on devices with limited resources. The concept of styling is changing: styles no longer provide components that controls instantiate dynamically. Instead, controls themselves consist of delegates that can be replaced without dynamic instantiation, and style objects have become simple sets of styling attributes. To brand an application, changing the color scheme is a matter of setting a few properties that are automatically inherited by the hierarchy of children.
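As a sketch of the idea (the attached Style attributes below are made up for illustration; the final attribute names are undecided):

```qml
// Hypothetical styling attributes, set once near the root of the scene...
ApplicationWindow {
    Style.accentColor: "#5caa15"
    Style.backgroundColor: "#202020"

    // ...and inherited automatically by every descendant control:
    Column {
        Button { text: "Ok" }              // picks up the accent color
        CheckBox { text: "Audible alarm" } // so does this, with no extra code
    }
}
```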


Keep things simple


When it comes to more complex compound controls, it is sometimes better to provide the sub-controls as separate building blocks. As an example, we are considering replacing the complex ScrollView control with simple ScrollBar/Indicator controls that can be attached to any Flickable:



ScrollView {
    horizontalScrollBarPolicy: Qt.ScrollBarAlwaysOff
    Flickable {
        ...
    }
}

// vs.

Flickable {
    ...
    ScrollBar.vertical: ScrollBar { }
}

It may not be entirely fair to compare these two approaches, but here is the gain of the simplification in numbers:


[Chart: ScrollView vs. attached ScrollBar creation times, desktop i7]


[Chart: ScrollView vs. attached ScrollBar creation times, Raspberry Pi]


Wrap up


We have been exploring what it would take to bring Qt Quick Controls to the segment of devices with limited resources. Some ideas leading to significant performance improvements are presented above. The performance comparisons focus on creation time, which directly affects application startup time, the loading time of application views, and the scrolling speed of item views that create and destroy delegate instances while scrolling. Simplifying things and doing all the heavy lifting in C++ reduces memory consumption as well. Currently, a Button using the “Base” style consists of 17 Items (of which 4 are Loaders) and a total of 64 QObjects. In the new controls, a Button is down to 3 Items and 7 QObjects at the moment.


Please bear in mind that the new light-weight controls are still in early development. The numbers, visual appearance, and the whole concept are still subject to change.


The post Qt Quick Controls for Embedded appeared first on Qt Blog.






Source: Qt Blog http://ift.tt/1Dk0J3Y

Introducing video filters in Qt Multimedia

Qt Multimedia makes it very easy to get a video stream from the camera or from a video file rendered as part of your application’s Qt Quick scene. What is more, its modularized backend and video node plugin system make it possible to provide hardware-accelerated, zero-copy solutions on platforms where such an option is available. All this is hidden from applications when using Qt Quick elements like Camera, MediaPlayer and VideoOutput, which is great. But what if you want to do some additional filtering or computation on the video frames before they are presented? For example, because you want to transform the frame, or compute something from it, on the GPU using OpenCL or CUDA. Or because you want to run some algorithms provided by OpenCV. Before Qt 5.5 there was no easy way to do this in combination with the familiar Qt Quick elements. With Qt 5.5 this is going to change: say hello to QAbstractVideoFilter.



QAbstractVideoFilter serves as a base class for classes that are exposed to the QML world and are instantiated from there. They are then associated with a VideoOutput element. From that point on, every video frame the VideoOutput receives is run through the filter first. The filter can provide a new video frame, which is used in place of the original, calculate some results, or both. The results of the computation are exposed to QML as arbitrary data structures and can be utilized from JavaScript. For example, an OpenCV-based object detection algorithm can generate a list of rectangles that is exposed to QML. The corresponding JavaScript code can then position Rectangle elements at the indicated locations.


Let’s see some code



import QtQuick 2.3
import QtMultimedia 5.5
import my.cool.stuff 1.0

Item {
    Camera {
        id: camera
    }
    VideoOutput {
        source: camera
        anchors.fill: parent
        filters: [ faceRecognitionFilter ]
    }
    FaceRecognizer {
        id: faceRecognitionFilter
        property real scaleFactor: 1.1 // Define properties either in QML or in C++. Can be animated too.
        onFinished: {
            console.log("Found " + result.rects.length + " faces");
            ... // do something with the rectangle list
        }
    }
}

The new filters property of VideoOutput makes it possible to associate one or more QAbstractVideoFilter instances with it. These are then invoked, in order, for every incoming video frame.


The outline of the C++ implementation is like this:



QVideoFilterRunnable *FaceRecogFilter::createFilterRunnable()
{
    return new FaceRecogFilterRunnable(this);
}
...
QVideoFrame FaceRecogFilterRunnable::run(QVideoFrame *input, const QVideoSurfaceFormat &surfaceFormat, RunFlags flags)
{
    // Convert the input into a suitable OpenCV image format, then run e.g. cv::CascadeClassifier,
    // and finally store the list of rectangles into a QObject exposing a 'rects' property.
    ...
    emit m_filter->finished(result);
    return *input;
}
...
int main(..)
{
    ...
    qmlRegisterType<FaceRecogFilter>("my.cool.stuff", 1, 0, "FaceRecognizer");
    ...
}

Here our filter implementation simply passes the input video frame through, while generating a list of rectangles. This can then be examined from QML, in the finished signal handler. Simple and flexible.


While the registration of our custom filter happens in the main() function in the example, the filter can also be provided by a QML extension plugin, independently of the application.


The QAbstractVideoFilter – QVideoFilterRunnable split mirrors the approach of QQuickItem – QSGNode. This is essential in order to support threaded rendering: when the Qt Quick scenegraph is using its threaded render loop, all rendering (the OpenGL operations) happens on a dedicated thread. This includes the filtering operations too. Therefore we have to ensure that the graphics and compute resources live and are only accessed on the render thread. A QVideoFilterRunnable always lives on the render thread and all its functions are guaranteed to be invoked on that thread, with the Qt Quick scenegraph’s OpenGL context bound. This makes creating filters relying on GPU compute APIs easy and painless, even when OpenGL interop is involved.


GPU compute and OpenGL interop


All this is very powerful when it comes to avoiding copies of the pixel data and utilizing the GPU as much as possible. The output video frame can be in any supported format and can differ from the input frame: for instance, a GPU-accelerated filter can upload the image data received from the camera into an OpenGL texture, perform operations on it (using OpenCL/OpenGL interop, for example) and provide the resulting OpenGL texture as its output. This means that after the initial texture upload, which happens anyway even when no QAbstractVideoFilter is in use, everything stays on the GPU. When doing video playback, the situation is even better on some platforms: in case the input is already an OpenGL texture, D3D texture, EGLImage or similar, we can potentially perform everything on the GPU without any readbacks or copies.


The OpenCL-based example that comes with Qt Multimedia demonstrates this well. Shown below running on OS X, the input frames of the video already contain OpenGL textures. All we need to do is use OpenCL’s GL interop to get a CL image object. The output image object is also based on a GL texture, allowing us to pass it to VideoOutput and the Qt Quick scenegraph as-is.


OpenCL example on OS X

Real-time image transformation on the GPU with OpenCL, running on OS X



While the emboss effect itself is not particularly interesting, and could also be done with OpenGL shaders using ShaderEffect items, the example proves that integrating OpenCL and similar APIs with Qt Multimedia does not have to be hard; in fact the code is surprisingly simple and yet powerful.


It is worth pointing out that filters that do not result in a modified image and are not interested in staying in sync with the displayed frames do not have to block until the computation is finished: the implementation of run() can queue the necessary operations without waiting for them to finish. A signal indicating the availability of the computation results is then emitted later, for example from the associated event callback in case of OpenCL. All this is made possible by the thread-awareness of Qt’s signals: the signal emission will work equally well regardless of which thread the callback is invoked on.
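The pattern can be sketched in plain C++ (no Qt types here; AsyncAnalyzer is a made-up stand-in, and in a real filter the completion callback would be a signal emission, which Qt delivers safely across threads):

```cpp
#include <functional>
#include <future>

// Sketch of the non-blocking pattern described above: run() queues the
// computation and returns immediately; the "finished" callback fires from
// the worker thread once the result is ready. In a real QVideoFilterRunnable
// the callback would be a queued Qt signal emission.
class AsyncAnalyzer {
public:
    std::future<void> run(int frameData, std::function<void(int)> finished) {
        return std::async(std::launch::async, [frameData, finished] {
            int result = frameData * 2; // stand-in for the real computation
            finished(result);           // report completion asynchronously
        });
    }
};
```

Here the returned future is only used to keep the worker alive; the interesting part is that the caller is never blocked while the analysis runs.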


CPU-based filtering


Not all uses of video filters will rely on the GPU. Today’s PCs and even many embedded devices are powerful enough to perform many algorithms on the CPU. Below is a screenshot from the finished version of the code snippet above. Instead of faces, we recognize something more exciting, namely Qt logos:


Qt logo recognition with OpenCV in a Qt Quick app

Qt logo recognition with OpenCV and a webcam in a Qt Quick application using Qt Multimedia and Qt Quick Controls.



The application’s user interface is fully QML-based; even the rectangles are actual Rectangle elements. It is shown here running on a desktop Linux system, where the video frames from the camera arrive as YUV image data in system memory. It works just as well on Embedded Linux devices supported by Qt Multimedia, for example the i.MX6-based Sabre SD board. Here comes the proof:


Qt logo recognition with OpenCV on the Sabre SD

The same application, using the on-board MIPI camera



And it just works, with the added bonus of the touch-friendly controls from the Flat style.


The post Introducing video filters in Qt Multimedia appeared first on Qt Blog.






Source: Qt Blog http://ift.tt/1C3NEL8

Qt Creator 3.3.2 released

Qt Creator 3.3.2 is a small patch release that only fixes:



Download here






Source: Qt Blog http://ift.tt/1NjTQ6r

Qt Weekly #28: Qt and CUDA on the Jetson TK1

NVIDIA’s Jetson TK1 is a powerful development board based on the Tegra K1 chip. It comes with a GPU capable of OpenGL 4.4, OpenGL ES 3.1 and CUDA 6.5. From Qt’s perspective this is a somewhat unorthodox embedded device, because its customized Linux system is based on Ubuntu 14.04 and runs a regular X11 environment. Therefore the approach typical for low- and medium-end embedded hardware, running OpenGL-accelerated Qt apps directly on the framebuffer using the eglfs platform plugin, is not suitable here.


In addition, the ability to do hardware-accelerated computing using CUDA is very interesting, especially when it comes to interoperating with OpenGL. Let’s take a look at how CUDA code can be integrated with a Qt-based application.



Jetson TK1

The board



Building Qt


This board is powerful enough to build everything on its own, without any cross-compilation. Configuring and building Qt is no different than in any desktop Linux environment. One option that needs special consideration, however, is -opengl es2, because Qt can be built either in a GLX + OpenGL or an EGL + OpenGL ES configuration.


For example, the following configures Qt to use GLX and OpenGL:



configure -release -nomake examples -nomake tests

while adding -opengl es2 requests the usage of EGL and OpenGL ES:



configure -release -opengl es2 -nomake examples -nomake tests

If you are planning to run applications relying on modern, non-ES OpenGL features, or to use CUDA, then go for the first option. If, however, you have existing code from the mobile or embedded world relying on EGL or OpenGL ES, then the second may be more useful.


The default platform plugin will be xcb, so running Qt apps without specifying the platform plugin will work just fine. This is the exact same plugin that is used on any ordinary X11-based Linux desktop system.


Vsync gotchas


Once the build is done, you will most likely run some OpenGL-based Qt apps. And then comes the first surprise: applications are not synchronized to the vertical refresh rate of the screen.


When running, for instance, the example from qtbase/examples/opengl/qopenglwindow, we expect a nice and smooth 60 FPS animation with the rendering thread throttled appropriately. This unfortunately isn’t the case unless the application is fullscreen. Therefore many apps will want to replace calls like show() or showMaximized() with showFullScreen(). This way the thread is throttled as expected.


A further surprise may come in QWidget-based applications when opening a popup or a dialog. Unfortunately this also disables synchronization, even though the main window still covers the entire screen. In general we can conclude that the standard embedded recommendation of sticking to a single fullscreen window is very valid for this board too, even when using xcb, although for completely different reasons.


CUDA


After installing CUDA, the first and in fact the only real challenge is integrating nvcc with our Qt projects.


Unsurprisingly, this has been tackled by others before. Building on this excellent article, the most basic integration in our .pro file could look like this:



... # QT, SOURCES, HEADERS, the usual stuff

CUDA_SOURCES = cuda_stuff.cu

CUDA_DIR = /usr/local/cuda
CUDA_ARCH = sm_32 # as supported by the Tegra K1

INCLUDEPATH += $$CUDA_DIR/include
LIBS += -L $$CUDA_DIR/lib -lcudart -lcuda
osx: LIBS += -F/Library/Frameworks -framework CUDA

cuda.commands = $$CUDA_DIR/bin/nvcc -c -arch=$$CUDA_ARCH -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -M ${QMAKE_FILE_NAME}
cuda.input = CUDA_SOURCES
cuda.output = ${QMAKE_FILE_BASE}_cuda.o
QMAKE_EXTRA_COMPILERS += cuda

In addition to Linux, this also works out of the box on OS X. Adapting it to Windows should be easy. For advanced features, like reformatting nvcc’s error messages to be more to Qt Creator’s liking, see the article mentioned above.


A QOpenGLWindow-based application that updates an image via CUDA on every frame could now look something like the following. The approach is the same regardless of the OpenGL enabler in use: a QOpenGLWidget or a custom Qt Quick item would operate along the same principles: call cudaGLSetGLDevice when the OpenGL context is available, register the OpenGL resources with CUDA, and then, on every frame, map, invoke the CUDA kernel, unmap, and draw.


Note that in this example we are using a single pixel buffer object. There are other ways to do interop, for example we could have registered the GL texture, got a CUDA array out of it and bound that either to a CUDA texture or surface.
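For reference, here is a hedged sketch of that texture-based alternative, using the corresponding CUDA runtime interop calls (error checking omitted; the example below sticks with the pixel buffer object approach):

```cuda
#include <cuda_gl_interop.h>

// Register the GL texture instead of a pixel buffer object. The
// SurfaceLoadStore flag is required if a kernel will write to it.
cudaGraphicsResource *res = 0;
cudaGraphicsGLRegisterImage(&res, texture, GL_TEXTURE_2D,
                            cudaGraphicsRegisterFlagsSurfaceLoadStore);

// Per frame: map, get the backing CUDA array, wrap it in a surface object.
cudaGraphicsMapResources(1, &res);
cudaArray_t array;
cudaGraphicsSubResourceGetMappedArray(&array, res, 0, 0);

cudaResourceDesc desc = {};
desc.resType = cudaResourceTypeArray;
desc.res.array.array = array;
cudaSurfaceObject_t surf;
cudaCreateSurfaceObject(&surf, &desc);

// ... launch a kernel that accesses the pixels via surf2Dread()/surf2Dwrite() ...

cudaDestroySurfaceObject(surf);
cudaGraphicsUnmapResources(1, &res);
```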



...
// functions from cuda_stuff.cu
extern void CUDA_init();
extern void *CUDA_registerBuffer(GLuint buf);
extern void CUDA_unregisterBuffer(void *res);
extern void *CUDA_map(void *res);
extern void CUDA_unmap(void *res);
extern void CUDA_do_something(void *devPtr, int w, int h);

class Window : public QOpenGLWindow, protected QOpenGLFunctions
{
public:
    ...
    void initializeGL();
    void paintGL();

private:
    QSize m_imgSize;
    GLuint m_buf;
    GLuint m_texture;
    void *m_cudaBufHandle;
};

...

void Window::initializeGL()
{
    initializeOpenGLFunctions();

    CUDA_init();

    QImage img("some_image.png");
    m_imgSize = img.size();
    img = img.scaled(m_imgSize).convertToFormat(QImage::Format_RGB32); // BGRA on little endian

    glGenBuffers(1, &m_buf);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_buf);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, m_imgSize.width() * m_imgSize.height() * 4, img.constBits(), GL_DYNAMIC_COPY);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    m_cudaBufHandle = CUDA_registerBuffer(m_buf);

    glGenTextures(1, &m_texture);
    glBindTexture(GL_TEXTURE_2D, m_texture);

    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, m_imgSize.width(), m_imgSize.height(), 0, GL_BGRA, GL_UNSIGNED_BYTE, 0);

    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
}

void Window::paintGL()
{
    glClear(GL_COLOR_BUFFER_BIT);

    void *devPtr = CUDA_map(m_cudaBufHandle);
    CUDA_do_something(devPtr, m_imgSize.width(), m_imgSize.height());
    CUDA_unmap(m_cudaBufHandle);

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_buf);
    glBindTexture(GL_TEXTURE_2D, m_texture);
    // Fast path due to BGRA
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_imgSize.width(), m_imgSize.height(), GL_BGRA, GL_UNSIGNED_BYTE, 0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    ... // do something with the texture

    update(); // request the next frame
}
...

The corresponding cuda_stuff.cu:



#include <stdio.h>
#include <string.h> // for memset()
#ifdef __APPLE__ // Q_OS_MAC comes from Qt headers, which a plain .cu file does not include
#include <OpenGL/gl.h>
#else
#include <GL/gl.h>
#endif
#include <cuda.h>
#include <cuda_gl_interop.h>

void CUDA_init()
{
    cudaDeviceProp prop;
    int dev;
    memset(&prop, 0, sizeof(cudaDeviceProp));
    prop.major = 3;
    prop.minor = 2;
    if (cudaChooseDevice(&dev, &prop) != cudaSuccess)
        puts("failed to choose device");
    if (cudaGLSetGLDevice(dev) != cudaSuccess)
        puts("failed to set gl device");
}

void *CUDA_registerBuffer(GLuint buf)
{
    cudaGraphicsResource *res = 0;
    if (cudaGraphicsGLRegisterBuffer(&res, buf, cudaGraphicsRegisterFlagsNone) != cudaSuccess)
        printf("Failed to register buffer %u\n", buf);
    return res;
}

void CUDA_unregisterBuffer(void *res)
{
    if (cudaGraphicsUnregisterResource((cudaGraphicsResource *) res) != cudaSuccess)
        puts("Failed to unregister resource for buffer");
}

void *CUDA_map(void *res)
{
    if (cudaGraphicsMapResources(1, (cudaGraphicsResource **) &res) != cudaSuccess) {
        puts("Failed to map resource");
        return 0;
    }
    void *devPtr = 0;
    size_t size;
    if (cudaGraphicsResourceGetMappedPointer(&devPtr, &size, (cudaGraphicsResource *) res) != cudaSuccess) {
        puts("Failed to get device pointer");
        return 0;
    }
    return devPtr;
}

void CUDA_unmap(void *res)
{
    if (cudaGraphicsUnmapResources(1, (cudaGraphicsResource **) &res) != cudaSuccess)
        puts("Failed to unmap resource");
}

__global__ void run(uchar4 *ptr)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    ...
}

void CUDA_do_something(void *devPtr, int w, int h)
{
    const int blockSize = 16; // 256 threads per block
    run<<<dim3(w / blockSize, h / blockSize), dim3(blockSize, blockSize)>>>((uchar4 *) devPtr);
}
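One caveat with the launch in CUDA_do_something: the integer division w / blockSize truncates, so edge rows and columns are silently skipped when the image dimensions are not exact multiples of the block size. A common hedge is ceil division for the grid size (the extra threads must then bounds-check against w and h at the top of the kernel). The helper itself is plain C++:

```cpp
// Grid dimension that covers 'size' elements with blocks of 'blockSize'
// threads, rounding up so partial blocks at the edge are not dropped.
int gridSize(int size, int blockSize)
{
    return (size + blockSize - 1) / blockSize;
}
```

The launch then becomes run<<<dim3(gridSize(w, 16), gridSize(h, 16)), dim3(16, 16)>>>(...), with an early `if (x >= w || y >= h) return;` guard inside the kernel.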

This is all that’s needed to integrate the power of Qt, OpenGL and CUDA. Happy hacking!






Source: Qt Blog http://ift.tt/1DO8VYE