MARS - Multicore Application Runtime System


Copyright 2008 Sony Corporation of America

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

DISCLAIMER

THIS DOCUMENT IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, OR TITLE; THAT THE CONTENTS OF THE DOCUMENT ARE SUITABLE FOR ANY PURPOSE; NOR THAT THE IMPLEMENTATION OF SUCH CONTENTS WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS. COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE DOCUMENT OR THE PERFORMANCE OR IMPLEMENTATION OF THE CONTENTS THEREOF.



1 General Concepts

MARS (Multicore Application Runtime System) is a set of libraries that provides an API for creating and managing user programs that are scheduled to run on the various microprocessing units of a multicore environment.

MARS assumes a target multicore architecture with a single host processor (host) that manages or controls the execution of programs or processes on one or more separate microprocessing units (MPUs).

MARS assumes a target audience of developers of applications that run on such multicore architectures.

Fig. 1

mars_multicore.png


1.1 Host Processor (host)

The host processor (host) is the processor on which the host program will be run.

The host program is responsible for the initialization of all sub programs to be run on the various microprocessing units (MPU) available on the target multicore architecture.

The memory area accessible by the host processor will be referred to as the host storage.


1.2 Microprocessing Unit (MPU)

The microprocessing unit (MPU) is any one of the many co-processors or DSPs of a multicore architecture that is responsible for executing a sub program that accomplishes some form of processing or computation.

The MPU program is the sub program that is initialized by the host program and executed on the MPU.

The MPU program should be in ELF format. When the host program initializes the MPU program for execution, it needs to know the address of the MPU program ELF image in host storage. The procedure for loading the MPU program ELF image into host storage is platform dependent and outside the scope of MARS.

The memory area accessible by the MPU will be referred to as the MPU storage.


1.3 Multicore Programming Limitations

When programming for a multicore architecture, the following limitations become apparent.

(1) Memory Size of MPU Storage

First, the memory size of the MPU storage is limited. As application processing becomes more complex and the code size of MPU programs grows, the size of the MPU programs offloaded to the MPUs may exceed the physical memory size of the MPU storage.

If the size exceeds the memory size of the MPU storage, the offloaded MPU processing must be partitioned into smaller pieces of code in order to reduce the code size of each MPU program. As a result of this partitioning, some collaborative processing, such as transferring computation results or waiting for processing completion between the various MPU programs, becomes necessary.

(2) Number of Physical MPUs

Second, the number of physical MPUs is limited. Although many kinds of processing call for multi-MPU parallelization, the number of MPUs available is limited. If application processing is multi-threaded and many different MPU programs run simultaneously, the MPUs available for executing them will quickly run out.

To run more MPU programs in parallel than there are physical MPUs available, we need a mechanism that switches the currently running MPU programs depending on the situation. Also, if programs interact with each other as in (1) above, the program execution order must be considered when switching and running MPU programs.

Thus, a complex mechanism to control execution of MPU programs is required for applications where multi-MPU parallelization is needed.


1.4 Host Centric Programming Model

Many multicore applications use a host processor centric programming model. This means that the host processor is responsible for the loading/switching of MPU programs as well as the sending/receiving of necessary data to those MPU programs.

Fig. 1.4

mars_host_centric.png

As shown in Fig. 1.4, when using such a host processor centric programming model, the host processor becomes heavily loaded with managing all MPU program execution and control operations. Not only does this tie up the host processor and keep it from processing other tasks, but the MPU programs also experience a decrease in performance as they wait for the host processor to finish managing all of the MPU programs.

To gain finer control over the execution of MPU programs under this host-centric approach, the MPU control code in the host program (for loading/switching MPU programs or handling interactions between MPU programs) becomes more complex. This in turn decreases the performance of the MPU programs, since they must wait for the host processing to complete.

For example, while one host-side task is running, other host-side tasks in the application may need to wait their turn to be processed until that task has completed. This delays the loading/switching of MPU programs and the sending/receiving of data between MPU programs; consequently, an MPU must sit idle even though it is free to run other programs during that wait time.

This is the result of controlling the execution of MPU programs through the host program. If this host processing flow can be eliminated, so that the MPUs can directly send/receive data and load/switch MPU programs for execution, the MPUs can be used more efficiently.


1.5 MPU Centric Programming Model

MARS provides an MPU centric programming model. The host processor is not responsible for loading/switching MPU programs or for sending/receiving the necessary data to those MPU programs. The individual MPUs are responsible for loading and executing MPU programs, and for switching out MPU programs as necessary, without involving the host processor.

Fig. 1.5

mars_mpu_centric.png

As shown in Fig. 1.5, loading/switching MPU programs and sending/receiving data is performed independently of the host program. By making the MPUs self-managing, there is no longer a need to wait for the host processor to finish MPU management, which increases MPU utilization and performance.

This also frees the host processor from much of the MPU management. However, the host processor is still responsible for some of the setup necessary for MPU program execution. Operations that cannot be performed by the MPU, such as file input/output, must also be processed by the host program.




2 MARS Concepts

The MARS library provides a runtime environment, the "Multicore Application Runtime System", that allows MPU programs to run in parallel on multiple MPUs.

By using the MARS library, multiple MPU programs can be run cooperatively. This means that applications that run a large number of MPU programs one after another can be created without taking into account the physical number of MPUs available, while leaving the responsibility of efficiently switching MPU program execution up to the MARS library.


2.1 Kernel

The MARS kernel is what gets loaded into each MPU storage and controls program execution on the MPUs. The kernel is responsible for scheduling workloads, loading them and their parameters into MPU storage, and executing them on the MPUs.

The kernel is a relatively simple and small piece of code that stays resident on each MPU's storage area. Each kernel has its own scheduler that determines which workload to process. Based on the scheduled workload, the kernel will load the necessary MPU program to MPU storage and execute it.

Fig. 2.1

mars_kernel.png

As shown in Fig. 2.1, the kernel has 3 basic states of operation. Once loaded and started, the only responsibility of the kernel is to search for a workload to schedule, jump execution to the MPU program of that workload, context switch the workload if necessary, and then return to the scheduling state.

The kernel is non-preemptive, and therefore a workload executed on an MPU will continue to run and use that MPU's resources until it finishes execution or enters a wait state. When a workload enters a wait state, the kernel must handle the context switch. This involves saving the workload context into host storage so that execution can continue when the context is scheduled again at a later stage.


2.2 Workload

The workload is the term used for the generic process or set of processes that can be executed by the MPUs. The actual behavior of how a workload is processed will vary based on the workload model.

One example of a workload model may be a single large process that is executed on a single MPU, while another example of a workload model may define a large number of small processes that are executed on various MPUs. The workload is simply the encapsulation of a single process or multiple processes and is the unit used to schedule and execute processes on the MPUs.

MARS aims to provide various programming models not specific to just one. Therefore an abstract MARS workload is necessary to accommodate various programming models.

Fig. 2.2

mars_workload.png

Note:
Currently MARS only supports the MARS task model. Therefore, when we refer to the workload, it is synonymous with the MARS task. However, in the future the workload may refer to any arbitrary programming model not yet defined. One such example can be a task set model, which extends the current task model to allow for grouped execution of tasks to run concurrently.

2.3 Workload Queue

The MARS workload queue is created and initialized at MARS initialization and resides in host storage. When workloads are initialized by the host program, they are stored in this queue.

The MARS kernel is responsible for searching for a schedulable workload in this queue and when found it loads the workload into MPU storage for processing. When a workload is scheduled by the kernel, the workload's state within the queue is set to a reserved state so no other kernel will attempt to schedule the same workload.

Since this queue is shared by both host and MPU, its access is protected by atomic operations.

Fig. 2.3

mars_workload_queue.png


2.4 Context

The MARS context is what holds all necessary information and data for each MARS instance initialized for the system.

Before any of the MARS functionalities can be utilized, an instance of a MARS context must be initialized. When the system is completely done with MARS functionality, the context must be finalized.

When a context is initialized within a system by the host processor, each MPU (depending on how many MPUs are initialized for the context) is loaded with the MARS kernel that stays resident in MPU storage and continues to run until the host processor finalizes the context.

The context also creates the workload queue in host storage. Each kernel, through the use of atomic synchronization primitives, will reserve and schedule workloads from this queue.

When the context is finalized, all kernels running on the MPUs are terminated and all resources are freed.

In a system, multiple MARS contexts may be initialized and the kernels and workloads of each context will be independent of each other. However, one of the main purposes of MARS is to avoid the high cost of process context switches within MPUs initiated by the host processor. If multiple MARS contexts are initialized, there will be an enormous decrease in performance as each MARS context is context switched in and out. In the ideal scenario, there should be a single MARS context initialized for the whole system.

Fig. 2.4

mars_context.png

Note:
In Fig. 2.4, "other async process" means that the main program can perform other processes asynchronously while the MPU programs are being processed by the MARS kernels on the MPUs.



3 Overview of Usage


3.1 Host Library

The host program needs to make use of the host library provided by MARS.

Depending on the target platform, MARS should install the necessary host headers and libraries to the appropriate host paths.

Fig. 3.1

mars_host_library.png

In order to use any of the host processor library API, the user must include the following header:

        #include <mars/mars.h>  /* include header */

The host program written for the host processor needs to link in the MARS host library.

MARS provides both static and dynamic libraries for the host processor.

        libmars.a               /* static library */
        libmars.so              /* dynamic library */

MARS provides these libraries for both 32-bit and 64-bit runtimes.

The actual procedure to compile a MARS host program and to link the MARS host library may vary depending on the target platform.

        /* Example host 32-bit compile on Cell B.E. platform */
        HOST_CC =       ppu-gcc
        HOST_CFLAGS =   -m32

        $(HOST_CC) $(HOST_CFLAGS) host_prog.c -lspe2 -lmars

        /* Example host 64-bit compile on Cell B.E. platform */
        HOST_CC =       ppu-gcc
        HOST_CFLAGS =   -m64

        $(HOST_CC) $(HOST_CFLAGS) host_prog.c -lspe2 -lmars


3.2 MPU Library

The MPU program needs to make use of the MPU library provided by MARS.

Depending on the target platform, MARS should install the necessary MPU headers and libraries to the appropriate MPU paths.

Fig. 3.2

mars_mpu_library.png

In order to use any of the MPU library API, the user must include the following header:

        #include <mars/mars.h>

The MPU program written for the MPU needs to link in the MARS MPU library.

MARS provides only a static library for the MPU.

        libmars.a               /* static library */

When compiling MPU programs, it is also necessary to specify 'mars_entry' as the entry function and to set the start of the '.init' section to the workload base address (TBD: currently 0x10000).

The actual procedure to compile a MARS MPU program and to link the MARS MPU library may vary depending on the target platform.

        /* Example MPU compile on Cell B.E. platform */
        MPU_CC =        spu-gcc
        MPU_LD_FLAGS =  -Wl,-N -Wl,-gc-sections \
                        -Wl,--section-start,.init=0x10000 \
                        -Wl,--entry,mars_entry \
                        -Wl,-u,mars_entry

        $(MPU_CC) $(MPU_LD_FLAGS) mpu_prog.c -lmars


3.3 General Sequence

The general sequence of usage for MARS is described below:

1. Initialize a MARS context.
2. Initialize a MARS workload instance.
3. Process necessary synchronizations between host and MPU programs.
4. Process other host program tasks asynchronous to MPU processing.
5. Finalize the MARS workload instance (waits until MARS workload completion).
6. Finalize the MARS context.

Fig. 3.3

mars_sequence.png




4 Context Management


4.1 Context Overview

A MARS context must be initialized before any core MARS functionalities can be utilized. The context initialization should be the very first thing done by the host program.

When all processing is completed, the host program must also be responsible for finalizing the initialized MARS context.

See also:
Context Management API

4.2 Context Initialize

Typical MARS context initialization code will look like below.

        /* sample host processor side host_prog.c */

        /* globals */
        static struct mars_context mars;                /* mars context */
        static struct mars_params params;               /* mars context params */
        ...

        params.num_mpus = 6;                            /* num MPUs */

        /* Initialize a MARS context with the parameters you specified */
        int ret = mars_initialize(&mars, &params);
        if (ret != MARS_SUCCESS)                        /* error checking */
                return USER_DEFINED_ERROR;              /* initialize failed */

Note:
You do not need to pass a MARS parameter structure for initialization. If you pass NULL as the second argument of mars_initialize, the MARS context will be initialized with the default parameters.

4.3 Context Finalize

You must finalize any MARS context you initialized in order to properly free up the resources used by MARS.

        /* Finalize the MARS context previously initialized */
        ret = mars_finalize(&mars);
        if (ret != MARS_SUCCESS)                        /* error checking */
                return USER_DEFINED_ERROR;              /* finalize failed */




5 Task Management


5.1 Task Overview

The MARS task is one type of MARS workload. The MARS task is a single execution of an MPU program that is scheduled to be run by the MARS kernel.

Tasks can be used to run a small MPU program many times. However, the primary usage of the task model is for large grained programs that take a long time to process. Since a task may occupy the MPU for a long time and prevent other workloads from being executed on that MPU, tasks have the ability to yield the MPU to other workloads.

The MARS task synchronization API also provides various methods that, when used to wait for certain events, allow a task to enter a wait state. When tasks have yielded or are waiting, the task state is saved into host storage and the MPU is freed up to process other available workloads.

Fig. 5.1

mars_task_overview.png

As shown in Fig. 5.1, the MARS kernel switches which MPU task programs are being executed on the MPUs. The kernel autonomously executes the tasks on the MPUs independently from the host. Whenever an MPU is free, the kernel will load any available task into the MPU storage for execution.

The general flow for using the MARS task is as follows:

1. (host) Prepare the task program ELF image in host storage.
2. (host) Initialize task instances.
3. (host) Schedule tasks for execution.
4. (task) Schedule sub tasks for execution.
5. (task) Wait for sub task completion.
6. (task) Resume execution when all sub tasks have completed.
7. (task) Process and finish task execution.
8. (host) Wait for all tasks to complete.
9. (host) Finalize all task instances.

See also:
Task Management API

5.2 Task Program

The MARS task program is the MPU program written for the MPU and is the actual code that will be executed when the task is run by the MARS kernel. Just as the host program is compiled using the host compiler, these MPU programs will be compiled using the MPU's compiler.

The MARS task program must define the mars_task_main function, as that is the main entry point of the program. This function is what gets called when the kernel is ready to run the task.

A task program finishes execution when it calls mars_task_exit or returns from the mars_task_main function.

The arguments (mars_task_args) passed into the mars_task_main function are specified in the host program when calling mars_task_schedule, which allows the task to be scheduled for execution. If no args are specified when calling mars_task_schedule, the args passed into the mars_task_main function are uninitialized and their state is undefined.

        /* sample MPU side mpu_prog.c */

        #include <stdio.h>
        #include <mars/mars.h>

        int mars_task_main(const struct mars_task_args *task_args)
        {
                (void)task_args;

                printf("Hello World!\n");

                return 0;
        }


5.3 Task Parameters

MARS task parameters are specified at task initialization and determine the behavior of the task.

        /* sample host processor side host_prog.c */

        static struct mars_task_params params;          /* MARS task params */

        /* Initialize the task parameters */
        params.name = "Task";                           /* name string */
        params.elf_image = elf_image;                   /* pointer to ELF image */
        params.context_save_size = 0;                   /* no context save area */

Task Parameter Structure Members (mars_task_params):

name
This specifies a string identifier for the task. The string length must be no longer than MARS_TASK_NAME_LEN_MAX.

elf_image
This specifies the address to the MPU program ELF image loaded into host storage. This MPU program needs to be a MARS task program.

context_save_size
This specifies the task context save size for the task. A memory area of the size specified will be allocated in host storage. If the task enters a wait state, the task context will be saved into this save area in host storage until the task resumes execution. Currently the context_save_size must be either 0 or MARS_TASK_CONTEXT_SAVE_SIZE_MAX.


5.4 Task Initialize

Typical MARS task initialization code will look like below.

        /* sample host processor side host_prog.c */

        /* globals */
        static struct mars_context mars;                /* MARS context */
        static struct mars_task_context task;           /* MARS task */
        static struct mars_task_params params;          /* MARS task params */
        static struct mars_task_id task_id;             /* MARS task id */
        ...

        /* Assume MARS context is initialized as shown above */
        ...

        /* Assume MARS task params are initialized as shown above */
        ...

        /* Task initialization creates the task instance */
        int ret = mars_task_initialize(&mars, &task, &task_id, &params);
        if (ret != MARS_SUCCESS)                        /* error checking */
                return USER_DEFINED_ERROR;              /* initialize failed */

MARS task initialization creates an instance of a task in the MARS context's workload queue.

The initialized task id is returned to the user. The task id needs to be saved for management of the task.

Once a task is initialized, it must be scheduled for execution by calling mars_task_schedule before it will ever be executed.

Any initialized tasks should be properly cleaned up with a call to mars_task_finalize when the task will no longer be scheduled for execution.


5.5 Task Execution

Typical MARS task execution code will look like below.

        /* sample host processor side host_prog.c */

        struct mars_task_args task_args;                /* MARS task args */
        uint8_t task_priority = 0;                      /* priority (0 to 255) */

        /* Sets the task to a schedulable state */
        ret = mars_task_schedule(&task_id, &task_args, task_priority);
        if (ret != MARS_SUCCESS)                        /* error checking */
                return USER_DEFINED_ERROR;              /* schedule failed */

        /* Host processor can process something while the MPUs execute the tasks asynchronously. */
        ...

        /* Blocks until the scheduled task has finished execution */
        ret = mars_task_wait(&task_id);
        if (ret != MARS_SUCCESS)                        /* error checking */
                return USER_DEFINED_ERROR;              /* wait failed */

MARS task execution is done by scheduling an initialized task to be run by the MARS kernel. The MARS kernels running on the MPUs will automatically schedule it and load the task over to the MPU to begin execution.

While the MARS kernels process various workloads on the MPU side, the host is free to do any other processing asynchronous to any workload processing on the MPUs.

When the user chooses to do so, they can wait for a specific scheduled task to finish execution.

After a MARS task is initialized, it may be scheduled for execution any number of times until it is finalized. However, a task can only be scheduled if it is not currently in the process of execution.

A MARS task that has been initialized by the host can be scheduled for execution by both the host and MPU-side APIs. The behavior of scheduling a task from host or MPU is identical in nature. If a task schedules a sub task for execution, and waits for the sub task to finish execution (assuming the use of a blocking wait call), it will yield its own execution until the sub task has completed. This allows for other workloads to be processed on the MPU that was executing the waiting task.

Task Scheduling Parameters:

id
This specifies the MARS task id of the task to be scheduled for execution.

args
This specifies the argument structure that will be passed into the task program's mars_task_main function. If NULL is specified for args, the args passed into the mars_task_main function is uninitialized and its state is undefined. You should specify NULL only if you are certain the task program will not access the args passed into mars_task_main function.

priority
This specifies the priority of the task. Task priorities range from 0 to 255, from lowest to highest priority. Higher priority tasks will be scheduled over lower priority tasks if both are available to be scheduled for execution.


5.6 Task Switching

A MARS task switch occurs when a running task either yields or enters a waiting state due to a call to some blocking synchronization method. The task switch, performed by the MARS kernel, allows the current state of the task to be saved to a pre-allocated context save area on the host storage. The context save area is created during task initialization depending on whether a context save area size is specified in the task parameters.

When the task is no longer in a waiting state and is scheduled by the kernel to run again, the saved task context will be restored from the host storage back into MPU storage for resuming of task execution where it left off.

This task switching allows the kernel to schedule other workloads to be executed on the MPU without wasting valuable processing time while some tasks are left in a waiting state.

It is important to note that a task is only capable of a task switch if it is initialized with a context save area (see mars_task_initialize). If no context save area is specified for the task, yield calls and any blocking calls that may put the task into a waiting state will result in an error.

Fig. 5.6

mars_task_switch.png


5.7 Task Finalize

You must finalize any MARS tasks you initialized in order to properly free up the resources they use.

        /* sample host processor side host_prog.c */

        /* Finalize the task previously initialized */
        ret = mars_task_finalize(&task_id);
        if (ret != MARS_SUCCESS)                        /* error checking */
                return USER_DEFINED_ERROR;              /* finalize failed */

MARS task finalization cleans up the initialized task, removes the task instance from the MARS context's workload queue, and frees the task's resources.

This function should be called when the task will no longer be scheduled for execution by a call to mars_task_schedule. Once a task is finalized, the task and its task id become invalid.




6 Task Synchronization


6.1 Overview

The MARS Task Synchronization API provides various methods of synchronization between the host program running on the host processor and MARS tasks running on the MPUs, as well as between MARS tasks and other MARS tasks running across various MPUs.

As described previously, enabling MARS tasks to send/receive data directly between each other, independently of the host, is an important factor in improving the usability and efficiency of the MPUs. MARS provides various synchronization and communication functions that enable efficient interaction between MARS tasks, and between MARS tasks and host programs.

The MARS Task Synchronization API provides the following types of synchronization objects:

(1) MARS Task Barrier

This is used to make multiple MARS tasks wait at a certain point in a program and to resume the task execution when all tasks are ready.

(2) MARS Task Event Flag

This is used to send event notifications between MARS tasks or between MARS tasks and host programs.

(3) MARS Task Queue

This is used to provide a FIFO queue mechanism for data transfer between MARS tasks or between MARS tasks and host programs.

(4) MARS Task Semaphore

This is used to limit the number of concurrent accesses to shared resources among MARS tasks.

(5) MARS Task Signal

This is used to signal a MARS task in the waiting state to change state so that it can be scheduled to continue execution.

Fig. 6.1

mars_task_sync.png

As shown in Fig. 6.1, task synchronization instances are created in host storage. Both the host program and the MARS tasks running on the MPUs access these instances resident in host storage.

See also:
Task Synchronization API

6.2 Benefits

The MARS Task Synchronization API is specific to MARS tasks. Many of the synchronization methods provided are blocking calls, meaning that when they are called and certain conditions are not met, the calling task enters a waiting state, which may result in a task switch (see 5.6 Task Switching).

Fig. 6.2

mars_task_sync_benefits.png

In Fig. 6.2, the semaphore synchronization method is used as an example to show the benefit of using the MARS task synchronization over a simple synchronization method.

When using simple synchronization methods within a MARS task, if the synchronization method blocks, it will force the task to wait until the synchronization method allows for execution to resume. If a task must wait on some synchronization method for a very long time, the MPU executing the task will be forced to block without being able to process anything else during that time.

The MARS task synchronization methods prevent the wasting of valuable MPU processing time while a task blocks on some synchronization method. When a MARS task blocks on a synchronization method, the task itself enters a waiting state. This allows the MPU executing the task to do a task switch and execute some other task that is not in a waiting state. Once the original task receives the synchronization event it was waiting for, it is returned to a runnable state and is scheduled for resumed execution when an MPU becomes available.


6.3 Task Barrier

The MARS task barrier allows the synchronization of multiple tasks. At barrier initialization, the total number of tasks that need to be synchronized is specified. When each task arrives at the barrier, it will notify the barrier and enter a waiting state until the barrier is released. When the total number of tasks specified at initialization have arrived at the barrier and notified it, the barrier is released and all tasks are returned to the ready state to be scheduled to run once again.

The general flow for using the MARS task barriers is as follows:

1. (host) Allocate memory for task barrier structure.
2. (host) Initialize task barrier.
3. (host) Initialize tasks and schedule for execution.
4. (task) Process until synchronization point.
5. (task) Notify barrier of synchronization point arrival.
6. (task) Wait until all tasks notify barrier and barrier is released.
7. (task) Finish task execution.
8. (host) Wait for task completion and finalize tasks.
9. (host) Finalize task barrier and free allocated memory.

Fig. 6.3

mars_task_barrier.png

In Fig. 6.3, there is a MARS task barrier initialized to wait on notifications from 3 separate tasks.

Task A reaches the synchronization point first and notifies the barrier. Since the barrier has not yet been released, Task A enters a wait state and yields the MPU so that another task, Task X, can execute.

Next, Task B reaches the synchronization point soon after and yields MPU execution to another Task Y after notifying the barrier. Finally, Task C reaches the synchronization point, at which point it notifies the barrier and the barrier is released.

Once the barrier is released, Task C continues with execution while both Tasks A and B are available to be scheduled for execution as soon as there is an available MPU.

See also:
Task Barrier API

6.4 Task Event Flag

The MARS task event flag allows synchronization between multiple tasks and the host program by sending and receiving 32-bit event flags between one another.

The event flags can be sent from the host program to a MARS task or vice versa, as well as between multiple MARS tasks. A task waiting on certain event flags transitions to the waiting state until those event flags are received.

The general flow for using the MARS task event flag is as follows:

1. (host) Allocate memory for task event flag structure.
2. (host) Initialize task event flag.
3. (host) Initialize tasks and schedule for execution.
4. (task) Process until synchronization point.
5. (task) Wait until specified event flag bit is set.
6. (host or task) Set the specified event flag bit.
7. (task) Finish task execution.
8. (host) Wait for task completion and finalize tasks.
9. (host) Finalize task event flag and free allocated memory.
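
The flow above can be sketched conceptually in portable C. The event_flag structure and functions below are hypothetical; they mimic the described behavior (32-bit flags, a blocking wait, automatic clearing on receipt) rather than the real MARS task event flag API.

```c
#include <pthread.h>
#include <stdint.h>

/* Conceptual sketch only: a hypothetical 32-bit event flag mimicking
 * the described MARS task event flag behavior. */
struct event_flag {
        uint32_t bits;          /* currently set event flag bits */
        pthread_mutex_t lock;
        pthread_cond_t cond;
};

/* 1-2. Initialize the event flag with all bits cleared. */
void event_flag_init(struct event_flag *ef)
{
        ef->bits = 0;
        pthread_mutex_init(&ef->lock, NULL);
        pthread_cond_init(&ef->cond, NULL);
}

/* 6. (host or task) Set the specified event flag bits and wake waiters. */
void event_flag_set(struct event_flag *ef, uint32_t mask)
{
        pthread_mutex_lock(&ef->lock);
        ef->bits |= mask;
        pthread_cond_broadcast(&ef->cond);
        pthread_mutex_unlock(&ef->lock);
}

/* 5. (task) Wait until all bits in mask are set, then clear them on
 * receipt (auto-clear behavior). Where this sketch blocks a thread,
 * a MARS task would instead yield its MPU to another task. */
void event_flag_wait_and(struct event_flag *ef, uint32_t mask)
{
        pthread_mutex_lock(&ef->lock);
        while ((ef->bits & mask) != mask)
                pthread_cond_wait(&ef->cond, &ef->lock);
        ef->bits &= ~mask;
        pthread_mutex_unlock(&ef->lock);
}
```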

Fig. 6.4

mars_task_event_flag.png

In Fig. 6.4, there are 2 separate MARS event flags initialized. One event flag is initialized for host to MPU communication, while the other is initialized for MPU to MPU communication.

First, Task A reaches the synchronization point and waits for a specific event flag bit to be set. As it waits for the event, it enters the wait state and yields execution of the MPU so that Task X can run.

Next, Task B reaches its synchronization point and allows for Task Y to run while it waits for the event.

Next, the host program sets the event flag bit Task A is waiting on, at which point Task A becomes available for resumed execution.

Finally, as Task A becomes scheduled and resumes execution, it then sets the event flag bit Task B is waiting on, at which point Task B becomes available for resumed execution.

See also:
Task Event Flag API

6.5 Task Queue

The MARS task queue allows for sending and receiving of data between multiple MARS tasks and the host program.

Either the host program or a MARS task can push data into the queue, and either can also pop data out from the queue as soon as it becomes available.

The advantage of the MARS task queue is that when a MARS task requests to do a pop and no data is available yet to be received from the queue, the MARS task will enter a waiting state. As soon as data is available to be popped from the queue, the MARS task can be scheduled for resumed execution with the received data.

The general flow for using the MARS task queue is as follows:

1. (host) Allocate memory for task queue structure.
2. (host) Initialize task queue.
3. (host) Initialize tasks and schedule for execution.
4. (task) Process until synchronization point.
5. (task) Pop queue and wait until data is available.
6. (host or task) Push queue with data.
7. (task) Receive data and finish task execution.
8. (host) Wait for task completion and finalize tasks.
9. (host) Finalize task queue and free allocated memory.
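
The flow above can be sketched conceptually in portable C. The task_queue structure and functions below are hypothetical; they mimic the blocking-pop behavior described for the MARS task queue, not the real API.

```c
#include <pthread.h>
#include <stdint.h>

#define QUEUE_DEPTH 8

/* Conceptual sketch only: a hypothetical fixed-depth FIFO with a
 * blocking pop, mimicking the described MARS task queue behavior. */
struct task_queue {
        uint32_t data[QUEUE_DEPTH];
        int head;               /* index of the oldest entry */
        int count;              /* number of entries in the queue */
        pthread_mutex_t lock;
        pthread_cond_t nonempty;
};

/* 1-2. Initialize an empty queue. */
void queue_init(struct task_queue *q)
{
        q->head = 0;
        q->count = 0;
        pthread_mutex_init(&q->lock, NULL);
        pthread_cond_init(&q->nonempty, NULL);
}

/* 6. (host or task) Push data; wakes a task blocked in pop. For brevity
 * this sketch fails when full, while a real push could also wait. */
int queue_push(struct task_queue *q, uint32_t v)
{
        pthread_mutex_lock(&q->lock);
        if (q->count == QUEUE_DEPTH) {
                pthread_mutex_unlock(&q->lock);
                return -1;
        }
        q->data[(q->head + q->count) % QUEUE_DEPTH] = v;
        q->count++;
        pthread_cond_signal(&q->nonempty);
        pthread_mutex_unlock(&q->lock);
        return 0;
}

/* 5. (task) Pop data, waiting while the queue is empty. Where this
 * sketch blocks a thread, a MARS task would yield its MPU instead. */
uint32_t queue_pop(struct task_queue *q)
{
        uint32_t v;

        pthread_mutex_lock(&q->lock);
        while (q->count == 0)
                pthread_cond_wait(&q->nonempty, &q->lock);
        v = q->data[q->head];
        q->head = (q->head + 1) % QUEUE_DEPTH;
        q->count--;
        pthread_mutex_unlock(&q->lock);
        return v;
}
```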

Fig. 6.5

mars_task_queue.png

In Fig. 6.5, there is a MARS task queue instance initialized to send and receive data between a host program and MARS tasks.

First, Task A reaches the synchronization point, where it requests to pop data from the queue. At this point in time, nothing has been pushed into the queue, and the queue is empty. This causes Task A to enter a wait state and yield MPU execution to Task X.

Next, Task B reaches its synchronization point and requests to pop data. Since the queue is still empty, it also enters the waiting state and yields MPU execution to another Task Y.

Next, the host program pushes some data into the queue, at which point Task A becomes available for resumed execution with the data received from the host.

Finally, as Task A becomes scheduled and resumes execution, it then pushes some other data into the queue, at which point Task B becomes available for resumed execution with the data from Task A received.

See also:
Task Queue API

6.6 Task Semaphore

The MARS task semaphore allows for synchronization between multiple tasks by preventing simultaneous access of some shared resource. At initialization, the semaphore is specified with the number of tasks that may access it at any given time.

Whenever a task wants to access a semaphore-protected shared resource, it must first acquire access to the semaphore (P operation). When done accessing the shared resource, it must then release access to the semaphore (V operation). If a task requests a semaphore when other tasks have already acquired the total number of allowed accesses, the task will transition to the waiting state until some other task releases the semaphore and access is obtained.

The general flow for using the MARS task semaphore is as follows:

1. (host) Allocate memory for task semaphore structure.
2. (host) Initialize task semaphore.
3. (host) Initialize tasks and schedule for execution.
4. (task) Process until synchronization point.
5. (task) Acquire semaphore and wait until semaphore is obtained.
6. (task) Modify shared resource data.
7. (task) Release semaphore and finish execution.
8. (host) Wait for task completion and finalize tasks.
9. (host) Finalize task semaphore and free allocated memory.
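
The flow above can be sketched conceptually in portable C. The task_sem structure and functions below are hypothetical; they mimic the P/V operations described for the MARS task semaphore.

```c
#include <pthread.h>

/* Conceptual sketch only: a hypothetical counting semaphore mimicking
 * the P/V operations described for the MARS task semaphore. */
struct task_sem {
        int count;              /* remaining simultaneous accesses */
        pthread_mutex_t lock;
        pthread_cond_t avail;
};

/* 1-2. Initialize with the number of allowed simultaneous accesses. */
void sem_init_count(struct task_sem *s, int count)
{
        s->count = count;
        pthread_mutex_init(&s->lock, NULL);
        pthread_cond_init(&s->avail, NULL);
}

/* 5. (task) P operation: wait until an access slot is free, then take
 * it. Where this sketch blocks a thread, a MARS task would yield its
 * MPU to another runnable task instead. */
void sem_acquire(struct task_sem *s)
{
        pthread_mutex_lock(&s->lock);
        while (s->count == 0)
                pthread_cond_wait(&s->avail, &s->lock);
        s->count--;
        pthread_mutex_unlock(&s->lock);
}

/* 7. (task) V operation: release the slot and wake one waiting task. */
void sem_release(struct task_sem *s)
{
        pthread_mutex_lock(&s->lock);
        s->count++;
        pthread_cond_signal(&s->avail);
        pthread_mutex_unlock(&s->lock);
}
```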

Fig. 6.6

mars_task_semaphore.png

In Fig. 6.6, there is a MARS semaphore initialized to be shared between 2 MARS tasks. This semaphore is used to prevent simultaneous access of some shared data in the host storage.

First, Task A reaches the synchronization point, where it requests to acquire the semaphore. Since no other task holds the semaphore, Task A successfully acquires the semaphore without having to wait. It then continues execution to modify some shared data in the host storage.

Next, Task B reaches the synchronization point where it requests to acquire the same semaphore to modify the same shared data in host storage. At the time of the request to acquire the semaphore, Task A still holds the semaphore, causing Task B to enter a waiting state. As Task B is waiting, it yields MPU execution to another Task X.

Next, Task A completes modifying the shared data in host storage and releases the semaphore. This allows Task B to become available for resumed execution.

Finally, as Task B becomes scheduled for resumed execution, it continues to modify the shared data in host storage. Task B then releases the semaphore when access to the shared data is complete.

See also:
Task Semaphore API

6.7 Task Signal

The MARS task signal is the simplest form of synchronization between a host program and multiple MARS tasks.

Either the host program or a MARS task can send a signal to a specified task. A task that waits for a signal transitions to the waiting state until the signal is received.

The general flow for using the MARS task signal is as follows:

1. (host) Initialize tasks and schedule for execution.
2. (task) Process until synchronization point.
3. (task) Wait for signal.
4. (host or task) Send signal to the waiting task.
5. (task) Resume and finish execution.
6. (host) Wait for task completion and finalize tasks.
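
The flow above can be sketched conceptually in portable C. The task_signal structure and functions below are hypothetical; they mimic the send/wait behavior described for the MARS task signal.

```c
#include <pthread.h>

/* Conceptual sketch only: a hypothetical per-task signal flag mimicking
 * the described MARS task signal behavior. */
struct task_signal {
        int pending;            /* 1 if a signal has been sent */
        pthread_mutex_t lock;
        pthread_cond_t cond;
};

/* Initialize with no signal pending. */
void signal_init(struct task_signal *s)
{
        s->pending = 0;
        pthread_mutex_init(&s->lock, NULL);
        pthread_cond_init(&s->cond, NULL);
}

/* 4. (host or task) Send signal to the waiting task. */
void signal_send(struct task_signal *s)
{
        pthread_mutex_lock(&s->lock);
        s->pending = 1;
        pthread_cond_signal(&s->cond);
        pthread_mutex_unlock(&s->lock);
}

/* 3. (task) Wait for signal; the signal is consumed on receipt. Where
 * this sketch blocks a thread, a MARS task would yield its MPU to
 * another runnable task instead. */
void signal_wait(struct task_signal *s)
{
        pthread_mutex_lock(&s->lock);
        while (!s->pending)
                pthread_cond_wait(&s->cond, &s->lock);
        s->pending = 0;
        pthread_mutex_unlock(&s->lock);
}
```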

Fig. 6.7

mars_task_signal.png

In Fig. 6.7, there is a host program using signals to synchronize execution between 2 MARS tasks.

First, Task A reaches the synchronization point, where it waits on a signal. At this point in time, nothing has signalled Task A, so it enters a wait state and yields MPU execution to Task X.

Next, Task B reaches its synchronization point and waits on a signal. Since nothing has signalled Task B, it also enters the waiting state and yields MPU execution to another Task Y.

Next, the host program sends a signal to Task A, at which point Task A becomes available for resumed execution.

Finally, as Task A becomes scheduled and resumes execution, it signals Task B, at which point Task B becomes available for resumed execution.

See also:
Task Signal API



7 Task Tutorials

This section gives various code tutorials on the basic usage of the MARS Task and MARS Task Synchronization API.


7.1 Task Execution from Host

This tutorial will explain how to prepare and schedule a MARS task for execution from the host program.

The sample code initializes and schedules a task that prints "Hello!" to stdout and exits.


(host program)

 1      #include <mars/mars.h>
 2
 3      static void *task_program_elf_image;
 4
 5      static struct mars_context mars_ctx;
 6      static struct mars_task_id task_id;
 7      static struct mars_task_params task_params;
 8
 9      int main(void)
10      {
11              task_params.name = "Task";
12              task_params.elf_image = task_program_elf_image;
13              task_params.context_save_size = 0;
14
15              mars_initialize(&mars_ctx, NULL);
16              mars_task_initialize(&mars_ctx, &task_id, &task_params);
17              mars_task_schedule(&task_id, NULL, 0);
18              mars_task_wait(&task_id);
19              mars_task_finalize(&task_id);
20              mars_finalize(&mars_ctx);
21
22              return 0;
23      }

Line:1Include the header file "mars/mars.h" necessary for utilizing the MARS library.

Line:3Pointer to the task program's ELF image in host storage. The procedure to load the task program into host storage is platform specific. Therefore, the code to do so is not shown anywhere in this sample code.

Line:5Declare the structure for storing the MARS context.

You can also dynamically allocate the MARS context instance:

struct mars_context *mars_ctx = malloc(sizeof(struct mars_context));

Line:6Declare the structure for storing the MARS task id.

Line:7Declare the structure for storing the MARS task params.

Lines:11-13Initialize the parameters for task initialization.

struct mars_task_params {

name: Specify the NULL terminated string name of the task you want to initialize. This parameter will be passed into the task initialization function at Line:16.

elf_image: Specify the address of the task program's ELF image that is loaded into host storage. The task program specified here is what will be loaded into MPU storage for execution when this task is scheduled to run by the MARS kernel.

context_save_size: Specify the context save area size for this task. Since this task will not be yielding execution or entering a wait state, we do not need a context save area, so specify 0. Otherwise, to initialize a context save area, specify MARS_TASK_CONTEXT_SAVE_SIZE_MAX.

}

Line:15Initialize the MARS context instance.

int mars_initialize (

arg1: Pass in the pointer to the MARS context instance declared at Line:5. The instance will be initialized as required.

arg2: Pass in the pointer to MARS parameters structure. Since we want to initialize the MARS context with default behavior, we can pass NULL for the context parameter argument. The default behavior for the MARS context is to use all available MPUs for running the MARS kernel.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

Line:16Initialize the MARS task instance.

int mars_task_initialize (

arg1: Pass in the pointer to the MARS context instance initialized at Line:15.

arg2: Pass in the pointer to MARS task id structure declared at Line:6. Upon successful completion, the task id will be initialized as required.

arg3: Pass in the pointer to the MARS task parameters we want to specify for the task, previously initialized by Lines:11-13.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

Line:17Schedule the task for execution.

int mars_task_schedule (

arg1: Pass in the pointer to the task id initialized at Line:16.

arg2: Pass in the pointer to the task arg structure we want to pass into the task program's mars_task_main function. For this sample we do not need to pass any args into the task program so specify NULL.

arg3: Pass in the value for the scheduling priority of this task. Since we only schedule 1 task for execution, the scheduling priority has no effect in this example.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

Line:18Wait for the completion of the task.

int mars_task_wait (

arg1: Pass in the pointer to the task id we want to wait for.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

This call will block until the task previously scheduled in Line:17 completes execution. If we want to process some other tasks in the host program while waiting for the task to complete, we can do so before calling wait. Similarly, a non-blocking wait function mars_task_try_wait is also provided to poll for task completion.

Line:19Finalize the completed task.

int mars_task_finalize (

arg1: Pass in the pointer to the task id we want to finalize.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

We can only call this function when we are sure the task has finished. In this example we are sure of completion because we properly waited for task completion in Line:18. After the task is finalized, we can no longer schedule this task for execution.

Line:20Finalize the MARS context.

int mars_finalize (

arg1: Pass in the pointer to the MARS context we want to finalize.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

This unloads all running MARS kernels from the MPUs and handles any necessary cleanup for the MARS library. No more MARS API calls can be made after this function until the MARS context is initialized once again.


(task program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      int mars_task_main(const struct mars_task_args *task_args)
 5      {
 6              (void)task_args;
 7
 8              printf("MPU(%d): %s - Hello!\n",
 9                      mars_task_get_kernel_id(), mars_task_get_name());
10
11              return 0;
12      }

Lines:1-2Include the header file "stdio.h" for printf and "mars/mars.h" necessary for utilizing the MARS library.

Line:6Since we specified NULL for the task args in Line:17 of the host program above, the state of task_args is undefined. In this program we do not and should not access the task_args.

Lines:8-9Print out message to stdout. The call to mars_task_get_kernel_id returns the id of the kernel that the current task is running on. The call to mars_task_get_name returns the string name of the current running task specified during task initialization at Line:16 of the host program above.

Line:11Returning from mars_task_main completes execution of the task. This will signal anything waiting for this task's completion to resume execution. In this example, the host program's call to mars_task_wait in Line:18 will return. Calling mars_task_exit is equivalent to returning from mars_task_main.


7.2 Task Execution from MPU

This tutorial will explain how to prepare and schedule a MARS task for execution from another task program.

The sample code initializes 3 separate task instances. One instance of the main task 1 program and 2 instances of the sub task 2 program are initialized.

The first main task is scheduled for execution by the host. The main task then schedules the 2 instances of the sub task for execution using the sub task ids specified in the arguments passed in by the host during scheduling.

Each instance of the sub task will print out "Hello!" and a unique value specified by the arguments passed in by the main task during scheduling.


(host program)

 1      #include <mars/mars.h>
 2
 3      #define NUM_SUB_TASKS   2
 4
 5      static void *task1_program_elf_image;
 6      static void *task2_program_elf_image;
 7
 8      static struct mars_context mars_ctx;
 9      static struct mars_task_id task1_id;
10      static struct mars_task_id task2_id[NUM_SUB_TASKS];
11      static struct mars_task_params task_params;
12      static struct mars_task_args task_args;
13
14      int main(void)
15      {
16              int i;
17
18              mars_initialize(&mars_ctx, NULL);
19
20              task_params.name = "Task 1";
21              task_params.elf_image = task1_program_elf_image;
22              task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
23              mars_task_initialize(&mars_ctx, &task1_id, &task_params);
24
25              for (i = 0; i < NUM_SUB_TASKS; i++) {
26                      char name[16];
27                      sprintf(name, "Task 2.%d", i);
28
29                      task_params.name = name;
30                      task_params.elf_image = task2_program_elf_image;
31                      task_params.context_save_size = 0;
32                      mars_task_initialize(&mars_ctx, &task2_id[i], &task_params);
33              }
34
35              task_args.type.u64[0] = (uint64_t)(uintptr_t)&task2_id[0];
36              task_args.type.u64[1] = (uint64_t)(uintptr_t)&task2_id[1];
37              mars_task_schedule(&task1_id, &task_args, 0);
38
39              mars_task_wait(&task1_id);
40              mars_task_finalize(&task1_id);
41
42              for (i = 0; i < NUM_SUB_TASKS; i++)
43                      mars_task_finalize(&task2_id[i]);
44
45              mars_finalize(&mars_ctx);
46
47              return 0;
48      }

Line:10Declare an instance of the task id structure for each sub task we want to initialize and schedule.

Lines:20-23Initialize the main task instance with the ELF image of task program 1. The main task needs to provide a context save area in order to allow for context switching while waiting for sub task completion. Specify MARS_TASK_CONTEXT_SAVE_SIZE_MAX for the context save area size so a context save area is initialized for the main task context.

Lines:25-33Initialize the 2 sub task instances with the ELF image of task program 2. The sub task does not need to do a context switch so no context save area size needs to be specified.

Lines:35-36The main task needs to know the addresses of the task ids it plans to schedule for execution. Store each sub task id address into the task args passed into the main task's mars_task_main function.

Line:37Schedule the main task for execution. Pass in the task args we initialized with the sub task id addresses at Lines:35-36. Since we only schedule 1 main task for execution, and the main task is waiting while any one of its sub tasks is being executed, the scheduling priority specified has no effect in this example.

Line:39Wait for the completion of the main task.

Line:40Finalize the completed main task.

Lines:42-43Finalize the completed sub tasks also.


(task 1 program)

 1      #include <mars/mars.h>
 2
 3      int mars_task_main(const struct mars_task_args *task_args)
 4      {
 5              struct mars_task_id task2_0_id;
 6              struct mars_task_id task2_1_id;
 7              struct mars_task_args args;
 8
 9              get(&task2_0_id, task_args->type.u64[0], sizeof(task2_0_id));
10              get(&task2_1_id, task_args->type.u64[1], sizeof(task2_1_id));
11
12              args.type.u32[0] = 123;
13              mars_task_schedule(&task2_0_id, &args, 0);
14
15              args.type.u32[0] = 321;
16              mars_task_schedule(&task2_1_id, &args, 0);
17
18              mars_task_wait(&task2_0_id);
19              mars_task_wait(&task2_1_id);
20
21              return 0;
22      }

Line:3Since the task args were passed into mars_task_schedule at Line:37 of the host program, task_args is pointing to an initialized mars_task_args structure.

Line:5Declare an instance to store the task id of the first sub task to execute.

Line:6Declare an instance to store the task id of the second sub task to execute.

Line:7Declare an instance of the task args structure we want to initialize with unique values to pass into the sub tasks.

Lines:9-10Transfer the task id structures of the initialized sub tasks from host storage to MPU storage. The host storage addresses of these task id structures were stored in the task args at Lines:35-36 of the host program.

The function "get" shown here is a generic placeholder for the platform specific function that performs the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from host storage to MPU storage on your specific platform.
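
As an illustration only, on a hypothetical platform where host storage happens to be directly addressable from the MPU, such a "get" could reduce to a plain copy. The function below is a made-up stand-in, not part of any real platform API:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "get": valid only if host storage is
 * directly addressable from the MPU address space. Platforms with
 * separate MPU local storage would use a DMA transfer primitive
 * here instead of memcpy. */
static void get(void *dst, uint64_t host_ea, size_t size)
{
        /* Interpret the 64-bit host effective address as a local pointer
         * and copy the structure into MPU-side memory. */
        memcpy(dst, (const void *)(uintptr_t)host_ea, size);
}
```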

Lines:12-13Initialize the task args structure with a unique value. Schedule the first sub task instance using the task id obtained at Line:9. Pass in the task args and priority of 0.

Lines:15-16Initialize the task args structure with a unique value. Schedule the second sub task instance using the task id obtained at Line:10. Pass in the task args and priority of 0.

Lines:18-19Wait for the completion of both sub tasks. If the first sub task has not finished execution by the time of the call to mars_task_wait at Line:18, this main task will enter a wait state and its context will be switched out. When the first sub task completes execution, this main task will resume execution and continue on to wait for the second sub task to complete. Similarly, at the time of the call to mars_task_wait at Line:19, if the second sub task has not yet completed it will enter a wait state once again until completion of the second sub task.


(task 2 program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      int mars_task_main(const struct mars_task_args *task_args)
 5      {
 6              printf("MPU(%d): %s - Hello! (%d)\n",
 7                      mars_task_get_kernel_id(), mars_task_get_name(),
 8                      task_args->type.u32[0]);
 9
10              return 0;
11      }

Line:4Since the task args were passed into mars_task_schedule at Line:13 and Line:16 of the main task 1 program, task_args is pointing to an initialized mars_task_args structure. This structure contains the unique value specified by the main task 1 program.

Lines:6-8Print out message to stdout. Print out the unique value specified by the main task 1 program. This value should be unique for each sub task program.


7.3 Task Barrier Usage

This tutorial will explain how to use the MARS task barrier to synchronize execution between multiple MARS tasks.

The sample code initializes a task barrier, 10 task instances of task 1 program, and a final task instance of task 2 program. When each of the 10 task 1 instances finishes its own processing, it notifies the barrier. Once all tasks have notified the barrier and the barrier is released, the final task waiting on the barrier is allowed to proceed.


(host program)

 1      #include <mars/mars.h>
 2
 3      #define NUM_TASKS       10
 4
 5      static void *task1_program_elf_image;
 6      static void *task2_program_elf_image;
 7
 8      static struct mars_context mars_ctx;
 9      static struct mars_task_id task1_id[NUM_TASKS];
10      static struct mars_task_id task2_id;
11      static struct mars_task_params task_params;
12      static struct mars_task_args task_args;
13      static struct mars_task_barrier barrier;
14
15      int main(void)
16      {
17              int i;
18
19              mars_initialize(&mars_ctx, NULL);
20
21              mars_task_barrier_initialize(&mars_ctx, &barrier, NUM_TASKS);
22
23              for (i = 0; i < NUM_TASKS; i++) {
24                      char name[16];
25                      sprintf(name, "Task %d", i);
26
27                      task_params.name = name;
28                      task_params.elf_image = task1_program_elf_image;
29                      task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
30                      mars_task_initialize(&mars_ctx, &task1_id[i], &task_params);
31
32                      task_args.type.u64[0] = (uint64_t)(uintptr_t)&barrier;
33                      mars_task_schedule(&task1_id[i], &task_args, 0);
34              }
35
36              task_params.name = "Task Final";
37              task_params.elf_image = task2_program_elf_image;
38              task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
39              mars_task_initialize(&mars_ctx, &task2_id, &task_params);
40
41              task_args.type.u64[0] = (uint64_t)(uintptr_t)&barrier;
42              mars_task_schedule(&task2_id, &task_args, 0);
43
44              for (i = 0; i < NUM_TASKS; i++) {
45                      mars_task_wait(&task1_id[i]);
46                      mars_task_finalize(&task1_id[i]);
47              }
48
49              mars_task_wait(&task2_id);
50              mars_task_finalize(&task2_id);
51
52              mars_finalize(&mars_ctx);
53
54              return 0;
55      }

Line:9Declare an array of 10 task ids for each instance of task 1 program we plan to initialize and schedule.

Line:10Declare one instance of the task id structure for the final task 2 program we plan to initialize and schedule.

Line:13Declare an instance of the task barrier structure we plan to initialize.

Line:21Initialize the task barrier instance.

int mars_task_barrier_initialize (

arg1: Pass in the pointer to the initialized MARS context.

arg2: Pass in the pointer to the barrier instance we declared at Line:13.

arg3: Pass in the total number of task notifications to wait for before the barrier is released.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

Lines:27-30Initialize the task params and initialize each of the 10 task 1 instances. Specify the task 1 program ELF image for these task instances. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow these tasks to context switch.

Lines:32-33Initialize the task args we want passed into task 1 program's mars_task_main function. Store the host storage address of the barrier instance initialized at Line:21. Schedule the task instance for execution, passing in the task args.

Lines:36-39Initialize the task params and initialize a single task 2 instance. Specify the task 2 program ELF image for this task instance. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow this task to context switch.

Lines:41-42Initialize the task args we want passed into task 2 program's mars_task_main function. Store the host storage address of the barrier instance initialized at Line:21. Schedule the task instance for execution, passing in the task args.

Lines:44-52Wait for completion and finalize all 10 task 1 instances. Wait for completion and finalize the final task instance. Finally, finalize the MARS context.


(task 1 program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      int mars_task_main(const struct mars_task_args *task_args)
 5      {
 6              uint64_t barrier_ea = task_args->type.u64[0];
 7
 8              printf("MPU(%d): %s - Hello!\n",
 9                      mars_task_get_kernel_id(), mars_task_get_name());
10
11              mars_task_barrier_notify(barrier_ea);
12
13              return 0;
14      }

Line:4Since the task args were passed into mars_task_schedule at Line:33 of the host program, task_args is pointing to an initialized mars_task_args structure.

Line:6Grab the host storage address of the barrier initialized in the host program from the task arg structure.

Line:11Notify the barrier that we have arrived at the synchronization point.

int mars_task_barrier_notify (

arg1: Pass in the host storage address of the barrier.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)


(task 2 program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      int mars_task_main(const struct mars_task_args *task_args)
 5      {
 6              uint64_t barrier_ea = task_args->type.u64[0];
 7
 8              mars_task_barrier_wait(barrier_ea);
 9
10              printf("MPU(%d): %s - Hello!\n",
11                      mars_task_get_kernel_id(), mars_task_get_name());
12
13              return 0;
14      }

Line:4Since the task args were passed into mars_task_schedule at Line:42 of the host program, task_args is pointing to an initialized mars_task_args structure.

Line:6Grab the host storage address of the barrier initialized in the host program from the task arg structure.

Line:8Wait for the barrier to be released.

int mars_task_barrier_wait (

arg1: Pass in the host storage address of the barrier.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

If the barrier has not been released by the time of this call, this task will enter a wait state and its context will be switched out. When all tasks have notified the barrier and the barrier is released, this task will resume execution and continue.


7.4 Task Event Flag Usage

This tutorial will explain how to use the MARS task event flag to synchronize execution between the host program and MARS tasks.

This sample code initializes 2 task instances for task 1 program and task 2 program and initializes 3 event flags.

The first event flag is used to synchronize between the host program and task 1. Task 1 can only begin processing after the host program has waited 1 second and set the event flag for task 1 to begin.

The second event flag is used to synchronize between the 2 tasks. Task 2 can only begin processing after task 1 has completed its processing and sets the event flag for task 2 to begin.

The third event flag is used to synchronize between task 2 and the host program. The host program waits until task 2 has completed its processing and sets the event flag for the host program to continue and finish execution.


(host program)

 1      #include <unistd.h>
 2      #include <mars/mars.h>
 3
 4      static void *task1_program_elf_image;
 5      static void *task2_program_elf_image;
 6
 7      static struct mars_context mars_ctx;
 8      static struct mars_task_id task1_id;
 9      static struct mars_task_id task2_id;
10      static struct mars_task_params task_params;
11      static struct mars_task_args task_args;
12      static struct mars_task_event_flag host_to_mpu;
13      static struct mars_task_event_flag mpu_to_host;
14      static struct mars_task_event_flag mpu_to_mpu;
15
16      int main(void)
17      {
18              mars_initialize(&mars_ctx, NULL);
19
20              mars_task_event_flag_initialize(&mars_ctx, &host_to_mpu,
21                                              MARS_TASK_EVENT_FLAG_HOST_TO_MPU,
22                                              MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
23
24              mars_task_event_flag_initialize(&mars_ctx, &mpu_to_host,
25                                              MARS_TASK_EVENT_FLAG_MPU_TO_HOST,
26                                              MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
27
28              mars_task_event_flag_initialize(&mars_ctx, &mpu_to_mpu,
29                                              MARS_TASK_EVENT_FLAG_MPU_TO_MPU,
30                                              MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
31
32              task_params.name = "Task 1";
33              task_params.elf_image = task1_program_elf_image;
34              task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
35              mars_task_initialize(&mars_ctx, &task1_id, &task_params);
36
37              task_params.name = "Task 2";
38              task_params.elf_image = task2_program_elf_image;
39              task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
40              mars_task_initialize(&mars_ctx, &task2_id, &task_params);
41
42              task_args.type.u64[0] = (uint64_t)(uintptr_t)&host_to_mpu;
43              task_args.type.u64[1] = (uint64_t)(uintptr_t)&mpu_to_mpu;
44              mars_task_schedule(&task1_id, &task_args, 0);
45
46              task_args.type.u64[0] = (uint64_t)(uintptr_t)&mpu_to_mpu;
47              task_args.type.u64[1] = (uint64_t)(uintptr_t)&mpu_to_host;
48              mars_task_schedule(&task2_id, &task_args, 0);
49
50              sleep(1);
51
52              mars_task_event_flag_set(&host_to_mpu, 0x1);
53
54              mars_task_event_flag_wait(&mpu_to_host, 0x1,
55                                      MARS_TASK_EVENT_FLAG_MASK_AND);
56
57              mars_task_wait(&task1_id);
58              mars_task_wait(&task2_id);
59
60              mars_task_finalize(&task1_id);
61              mars_task_finalize(&task2_id);
62
63              mars_finalize(&mars_ctx);
64
65              return 0;
66      }

Lines:12-14Declare 3 instances of the task event flag structure we plan to initialize.

Lines:20-30Initialize the 3 task event flag instances.

int mars_task_event_flag_initialize (

arg1: Pass in the pointer to the initialized MARS context.

arg2: Pass in the pointer to the event flag instances we declared at Lines:12-14.

arg3: Pass in the direction of events for each instance. The direction must be MARS_TASK_EVENT_FLAG_HOST_TO_MPU, MARS_TASK_EVENT_FLAG_MPU_TO_HOST, or MARS_TASK_EVENT_FLAG_MPU_TO_MPU.

arg4: Pass in the clear mode for each instance. Specify MARS_TASK_EVENT_FLAG_CLEAR_AUTO so that the event flag bits are automatically cleared as soon as the first task waiting on the event receives it. Specify MARS_TASK_EVENT_FLAG_CLEAR_MANUAL if the event flag bits should instead remain set until some task clears them explicitly.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

The first event flag is initialized for host program to task program events. The second event flag is initialized for task program to host program events. The third event flag is initialized for task program to task program events.

Lines:32-40Initialize the task params and initialize a task instance for both the task 1 program and task 2 program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow these tasks to context switch.

Lines:42-44Initialize the task args we want passed into task 1 program's mars_task_main function. Store the host storage address of the event flag instances for both host to mpu and mpu to mpu communication. These event flags will be used to receive events from the host program and also to send events to task 2 program. Schedule the task instance for execution, passing in the task args.

Lines:46-48Initialize the task args we want passed into task 2 program's mars_task_main function. Store the host storage address of the event flag instances for both mpu to mpu and mpu to host communication. These event flags will be used to receive events from the task 1 program and also to send events to the host program. Schedule the task instance for execution, passing in the task args.

Line:50Sleep for 1 second before continuing. This allows enough time for the tasks to be scheduled and begin execution. This is only to demonstrate the task entering the wait state when waiting for a specific event.

Line:52Set the event that task 1 is waiting for to allow task 1 to continue execution.

int mars_task_event_flag_set (

arg1: Pass in the pointer to the event flag instance we initialized for host to MPU communication.

arg2: Pass in the value specifying which bits to set in the event flag. These bits are logically OR'ed with the bits already set in the event flag.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

Lines:54-55Wait for an event from task 2 before continuing execution.

int mars_task_event_flag_wait (

arg1: Pass in the pointer to the event flag instance we initialized for MPU to host communication.

arg2: Pass in the value specifying which bits to check in the event flag.

arg3: Pass in the mask mode. Specify MARS_TASK_EVENT_FLAG_MASK_OR to wait for any of the specified bits to be set. Specify MARS_TASK_EVENT_FLAG_MASK_AND to wait for all of the specified bits to be set.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

If the event flag has not been set by the time of this call, this call will block until the specific event flag bit is set.

Lines:57-63Wait for completion and finalize the 2 task instances and finally finalize the MARS context.


(task 1 program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      int mars_task_main(const struct mars_task_args *task_args)
 5      {
 6              uint64_t host_to_mpu_ea = task_args->type.u64[0];
 7              uint64_t mpu_to_mpu_ea = task_args->type.u64[1];
 8
 9              mars_task_event_flag_wait(host_to_mpu_ea, 0x1,
10                                      MARS_TASK_EVENT_FLAG_MASK_AND);
11
12              printf("MPU(%d): %s - Hello!\n",
13                      mars_task_get_kernel_id(), mars_task_get_name());
14
15              mars_task_event_flag_set(mpu_to_mpu_ea, 0x1);
16
17              return 0;
18      }

Line:6Grab the host storage address of the event flag initialized in the host program for host to MPU communication from the task arg structure.

Line:7Grab the host storage address of the event flag initialized in the host program for MPU to MPU communication from the task arg structure.

Lines:9-10Wait for an event from the host program before continuing execution. Make sure to check for the proper bit set from the host program.

If the event flag has not been set by the time of this call, this task will enter a wait state and its context will be switched out. When the event flag bit this task is checking for is set, this task will resume execution and continue.

Line:15Set the event that task 2 is waiting for to allow task 2 execution to resume.


(task 2 program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      int mars_task_main(const struct mars_task_args *task_args)
 5      {
 6              uint64_t mpu_to_mpu_ea = task_args->type.u64[0];
 7              uint64_t mpu_to_host_ea = task_args->type.u64[1];
 8
 9              mars_task_event_flag_wait(mpu_to_mpu_ea, 0x1,
10                                      MARS_TASK_EVENT_FLAG_MASK_AND);
11
12              printf("MPU(%d): %s - Hello!\n",
13                      mars_task_get_kernel_id(), mars_task_get_name());
14
15              mars_task_event_flag_set(mpu_to_host_ea, 0x1);
16
17              return 0;
18      }

Line:6Grab the host storage address of the event flag initialized in the host program for MPU to MPU communication from the task arg structure.

Line:7Grab the host storage address of the event flag initialized in the host program for MPU to host communication from the task arg structure.

Lines:9-10Wait for an event from the task 1 program before continuing execution. Make sure to check for the proper bit set from the task 1 program.

If the event flag has not been set by the time of this call, this task will enter a wait state and its context will be switched out. When the event flag bit this task is checking for is set, this task will resume execution and continue.

Line:15Set the event that the host program is waiting for to allow the host program execution to resume.


7.5 Task Queue Usage

This tutorial will explain how to use the MARS task queue to synchronize execution between the host program and MARS tasks.

This sample code initializes multiple task instances for task 1 program and task 2 program and initializes 3 queues.

The first queue is initialized for host to MPU communication, so the host program can send data to the task 1 program. The second queue is initialized for MPU to MPU communication, so the task 1 program can send data to the task 2 program. The third queue is initialized for MPU to host communication, so the task 2 program can send data to the host program.

First the host program initializes and schedules all task instances for execution. It then immediately begins pushing data into the host to MPU queue for task 1 program to process.

The task 1 program instances wait for data to arrive from the host program and pop the data as it arrives. After popping an entry, each instance does some processing before pushing the data into the MPU to MPU queue for the task 2 program to process.

The task 2 program instances wait for data to arrive from the task 1 program and pop the data as it arrives. After popping an entry, each instance does some processing before pushing the data into the MPU to host queue so the host program receives the resulting data.

The program is completed when the host pops and receives all result data from the task 2 program.


(host program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      #define NUM_TASKS       3
 5      #define NUM_ENTRIES     10
 6      #define QUEUE_DEPTH     (NUM_TASKS * NUM_ENTRIES)
 7
 8      struct queue_entry {
 9              char text[64];
10      };
11
12      static void *task1_program_elf_image;
13      static void *task2_program_elf_image;
14
15      static struct mars_context mars_ctx;
16      static struct mars_task_id task1_id[NUM_TASKS];
17      static struct mars_task_id task2_id[NUM_TASKS];
18      static struct mars_task_params task_params;
19      static struct mars_task_args task_args;
20      static struct mars_task_queue host_to_mpu;
21      static struct mars_task_queue mpu_to_host;
22      static struct mars_task_queue mpu_to_mpu;
23
24      int main(void)
25      {
26              struct queue_entry buffer_host_to_mpu[QUEUE_DEPTH] __attribute__((aligned(MARS_TASK_QUEUE_BUFFER_ALIGN)));
27              struct queue_entry buffer_mpu_to_host[QUEUE_DEPTH] __attribute__((aligned(MARS_TASK_QUEUE_BUFFER_ALIGN)));
28              struct queue_entry buffer_mpu_to_mpu[QUEUE_DEPTH] __attribute__((aligned(MARS_TASK_QUEUE_BUFFER_ALIGN)));
29              struct queue_entry data __attribute__((aligned(MARS_TASK_QUEUE_ENTRY_ALIGN)));
30              int i;
31
32              printf("Task Queue Sample\n");
33
34              mars_initialize(&mars_ctx, NULL);
35
36              mars_task_queue_initialize(&mars_ctx, &host_to_mpu, &buffer_host_to_mpu,
37                                      sizeof(struct queue_entry), QUEUE_DEPTH,
38                                      MARS_TASK_QUEUE_HOST_TO_MPU);
39
40              mars_task_queue_initialize(&mars_ctx, &mpu_to_host, &buffer_mpu_to_host,
41                                      sizeof(struct queue_entry), QUEUE_DEPTH,
42                                      MARS_TASK_QUEUE_MPU_TO_HOST);
43
44              mars_task_queue_initialize(&mars_ctx, &mpu_to_mpu, &buffer_mpu_to_mpu,
45                                      sizeof(struct queue_entry), QUEUE_DEPTH,
46                                      MARS_TASK_QUEUE_MPU_TO_MPU);
47
48              for (i = 0; i < NUM_TASKS; i++) {
49                      char name[MARS_TASK_NAME_LEN_MAX];
50
51                      snprintf(name, MARS_TASK_NAME_LEN_MAX, "Task 1.%d", i + 1);
52                      task_params.name = name;
53                      task_params.elf_image = task1_program_elf_image;
54                      task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
55                      mars_task_initialize(&mars_ctx, &task1_id[i], &task_params);
56
57                      snprintf(name, MARS_TASK_NAME_LEN_MAX, "Task 2.%d", i + 1);
58                      task_params.name = name;
59                      task_params.elf_image = task2_program_elf_image;
60                      task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
61                      mars_task_initialize(&mars_ctx, &task2_id[i], &task_params);
62
63                      task_args.type.u64[0] = (uint64_t)(uintptr_t)&host_to_mpu;
64                      task_args.type.u64[1] = (uint64_t)(uintptr_t)&mpu_to_mpu;
65                      task_args.type.u32[4] = (uint32_t)NUM_ENTRIES;
66                      mars_task_schedule(&task1_id[i], &task_args, 0);
67
68                      task_args.type.u64[0] = (uint64_t)(uintptr_t)&mpu_to_mpu;
69                      task_args.type.u64[1] = (uint64_t)(uintptr_t)&mpu_to_host;
70                      task_args.type.u32[4] = (uint32_t)NUM_ENTRIES;
71                      mars_task_schedule(&task2_id[i], &task_args, 0);
72              }
73
74              for (i = 0; i < QUEUE_DEPTH; i++) {
75                      sprintf(data.text, "Host Data %d", i + 1);
76                      mars_task_queue_push(&host_to_mpu, &data);
77              }
78
79              for (i = 0; i < QUEUE_DEPTH; i++) {
80                      mars_task_queue_pop(&mpu_to_host, &data);
81                      printf("%s\n", data.text);
82              }
83
84              for (i = 0; i < NUM_TASKS; i++) {
85                      mars_task_wait(&task1_id[i]);
86                      mars_task_wait(&task2_id[i]);
87
88                      mars_task_finalize(&task1_id[i]);
89                      mars_task_finalize(&task2_id[i]);
90              }
91
92              mars_finalize(&mars_ctx);
93
94              return 0;
95      }

Lines:8-10Define the data entry structure. For this sample this is a 64-byte char array.

Lines:20-22Declare 3 instances of the task queue structure we plan to initialize.

Line:29Declare a local instance of the task queue data entry structure.

Lines:26-28Declare 3 instances of the buffer where the queue will store the data entries. Declare a separate buffer for each of the queue instances we will initialize. Make sure the buffer is aligned properly to MARS_TASK_QUEUE_BUFFER_ALIGN. The user is responsible for making sure enough memory is allocated for the buffer to satisfy the depth of the queue for the specified data entry size. These parameters will be specified at queue initialization.

Lines:36-46Initialize the 3 task queue instances.

int mars_task_queue_initialize (

arg1: Pass in the pointer to the initialized MARS context.

arg2: Pass in the pointer to the queue instances we declared at Lines:20-22.

arg3: Pass in the pointer to the buffer instances we declared at Lines:26-28. The user is responsible for making sure the size of this buffer is at least (queue depth * queue data entry size).

arg4: Pass in the size of each queue data entry. The size must be a multiple of 16 and not greater than MARS_TASK_QUEUE_ENTRY_SIZE_MAX.

arg5: Pass in the depth of the queue, which is the maximum number of data entries allowed in the queue at any given time.

arg6: Pass in the direction of the queue for each instance. The direction must be MARS_TASK_QUEUE_HOST_TO_MPU, MARS_TASK_QUEUE_MPU_TO_HOST, or MARS_TASK_QUEUE_MPU_TO_MPU.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

The first queue is initialized for host program to task program data passing. The second queue is initialized for task program to host program data passing. The third queue is initialized for task program to task program data passing.

Lines:51-61Initialize the task params and initialize multiple task instances for both the task 1 program and task 2 program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow these tasks to context switch when there is no data available to be popped from the queues.

Lines:63-66Initialize the task args we want passed into task 1 program's mars_task_main function. Store the host storage address of the queue instances for both host to mpu and mpu to mpu communication. These queues will be used to receive data from the host program and also to send data to task 2 program. Also store the number of data entries task 1 program should expect to process. Schedule the task instance for execution, passing in the task args.

Lines:68-71Initialize the task args we want passed into task 2 program's mars_task_main function. Store the host storage address of the queue instances for both mpu to mpu and mpu to host communication. These queues will be used to receive data from the task 1 program and also to send data to the host program. Also store the number of data entries task 2 program should expect to process. Schedule the task instance for execution, passing in the task args.

Lines:74-77Loop to push data into the queue for task 1 program to receive. The data is a string identifying the data id number.

On Line:75 initialize the data queue entry structure declared at Line:29 with some string identifying the data id.

int mars_task_queue_push (

arg1: Pass in the pointer to the queue instance we initialized for host to MPU communication.

arg2: Pass in the pointer to the data queue entry instance we initialized at Line:75.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

Lines:79-82Loop to pop data from the queue that task 2 program populates with the final result data.

int mars_task_queue_pop (

arg1: Pass in the pointer to the queue instance we initialized for MPU to host communication.

arg2: Pass in the pointer to the data queue entry to store the data from the queue.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

On Line:81 print the resulting data that has been processed by task 1 program and task 2 program. The final data should be a string identifying the processing path of the data from host program, to task 1 program, to task 2 program.

Lines:84-92Wait for task completion and finalize all task instances and finally finalize the MARS context.


(task 1 program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      struct queue_entry {
 5              char text[64];
 6      };
 7
 8      int mars_task_main(const struct mars_task_args *task_args)
 9      {
10              int i;
11              uint64_t host_to_mpu_ea = task_args->type.u64[0];
12              uint64_t mpu_to_mpu_ea = task_args->type.u64[1];
13              uint32_t num_entries = task_args->type.u32[4];
14              struct queue_entry data __attribute__((aligned(MARS_TASK_QUEUE_ENTRY_ALIGN)));
15
16              for (i = 0; i < num_entries; i++) {
17                      mars_task_queue_pop(host_to_mpu_ea, &data);
18
19                      sprintf(&data.text[strlen(data.text)], " -> %s Data %d",
20                              mars_task_get_name(), i + 1);
21
22                      mars_task_queue_push(mpu_to_mpu_ea, &data);
23              }
24
25              return 0;
26      }

Lines:4-6Define the data entry structure. For this sample this is a 64-byte char array. This is a redefinition of host program Lines:8-10.

Line:11Grab the host storage address of the queue initialized in the host program for host to MPU communication from the task arg structure.

Line:12Grab the host storage address of the queue initialized in the host program for MPU to MPU communication from the task arg structure.

Line:13Grab the number of entries this task needs to pop from the queue and process.

Line:14Declare a local instance of the task queue data entry structure.

Line:16Loop over the number of data entries this task needs to process, as specified by the task_args value obtained at Line:13.

Line:17Pop data from the queue being sent from the host program to be processed.

If the queue is empty by the time of this call, this task will enter a wait state and its context will be switched out. When the host program pushes new data into the queue and it becomes available to be popped by this task, this task will resume execution and continue.

Lines:19-20Take the data string received from the host program and append a string identifier for this task.

Line:22Push the processed data into the queue for the task 2 program to receive.


(task 2 program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      struct queue_entry {
 5              char text[64];
 6      };
 7
 8      int mars_task_main(const struct mars_task_args *task_args)
 9      {
10              int i;
11              uint64_t mpu_to_mpu_ea = task_args->type.u64[0];
12              uint64_t mpu_to_host_ea = task_args->type.u64[1];
13              uint32_t num_entries = task_args->type.u32[4];
14              struct queue_entry data __attribute__((aligned(MARS_TASK_QUEUE_ENTRY_ALIGN)));
15
16              for (i = 0; i < num_entries; i++) {
17                      mars_task_queue_pop(mpu_to_mpu_ea, &data);
18
19                      sprintf(&data.text[strlen(data.text)], " -> %s Data %d",
20                              mars_task_get_name(), i + 1);
21
22                      mars_task_queue_push(mpu_to_host_ea, &data);
23              }
24
25              return 0;
26      }

Lines:4-6Define the data entry structure. For this sample this is a 64-byte char array. This is a redefinition of host program Lines:8-10.

Line:11Grab the host storage address of the queue initialized in the host program for MPU to MPU communication from the task arg structure.

Line:12Grab the host storage address of the queue initialized in the host program for MPU to host communication from the task arg structure.

Line:13Grab the number of entries this task needs to pop from the queue and process.

Line:14Declare a local instance of the task queue data entry structure.

Line:16Loop over the number of data entries this task needs to process, as specified by the task_args value obtained at Line:13.

Line:17Pop data from the queue being sent from the task 1 program to be processed.

If the queue is empty by the time of this call, this task will enter a wait state and its context will be switched out. When the task 1 program pushes new data into the queue and it becomes available to be popped by this task, this task will resume execution and continue.

Lines:19-20Take the data string received from the task 1 program and append a string identifier for this task.

Line:22Push the processed data into the queue for the host program to receive.


7.6 Task Semaphore Usage

This tutorial will explain how to use the MARS task semaphore to synchronize modification of a shared resource to avoid having it accessed by multiple MARS tasks at once.

This sample code creates 10 task instances of the same task program and initializes a single semaphore to protect access to a shared resource, an integer counter located in host storage. As each task runs, it acquires the semaphore and increments the shared resource counter before releasing the semaphore. Since the shared resource is protected from concurrent access, the resulting value of the counter should equal the total number of tasks, 10, when the program has completed.


(host program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      #define NUM_TASKS       10
 5
 6      static void *task_program_elf_image;
 7
 8      static struct mars_context mars_ctx;
 9      static struct mars_task_id task_id[NUM_TASKS];
10      static struct mars_task_params task_params;
11      static struct mars_task_args task_args;
12      static struct mars_task_semaphore semaphore;
13
14      int main(void)
15      {
16              uint32_t shared_resource __attribute__((aligned(16)));
17              int i;
18
19              mars_initialize(&mars_ctx, NULL);
20
21              mars_task_semaphore_initialize(&mars_ctx, &semaphore, 1);
22
23              shared_resource = 0;
24
25              printf("HOST  : Main() - Shared Resource Counter = %d\n",
26                      shared_resource);
27
28              for (i = 0; i < NUM_TASKS; i++) {
29                      char name[16];
30                      sprintf(name, "Task %d", i);
31
32                      task_params.name = name;
33                      task_params.elf_image = task_program_elf_image;
34                      task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
35                      mars_task_initialize(&mars_ctx, &task_id[i], &task_params);
36
37                      task_args.type.u64[0] = (uint64_t)(uintptr_t)&semaphore;
38                      task_args.type.u64[1] = (uint64_t)(uintptr_t)&shared_resource;
39                      mars_task_schedule(&task_id[i], &task_args, 0);
40              }
41
42              for (i = 0; i < NUM_TASKS; i++) {
43                      mars_task_wait(&task_id[i]);
44                      mars_task_finalize(&task_id[i]);
45              }
46
47              printf("HOST  : Main() - Shared Resource Counter = %d\n",
48                      shared_resource);
49
50              mars_finalize(&mars_ctx);
51
52              return 0;
53      }

Line:9Declare an array of 10 task ids, one for each instance of the task program we plan to initialize and schedule.

Line:12Declare an instance of the task semaphore structure we plan to initialize.

Line:16Declare an instance of a shared resource counter we plan to modify from various tasks.

Line:21Initialize the task semaphore instance.

int mars_task_semaphore_initialize (

arg1: Pass in the pointer to the initialized MARS context.

arg2: Pass in the pointer to the semaphore instance we declared at Line:12.

arg3: Pass in the total number of simultaneous task accesses allowed.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

Line:23Initialize the shared resource counter to 0.

Lines:25-26Print the current value of the shared resource counter to stdout.

Lines:32-35Initialize the task params and initialize 10 task instances of the task program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow these tasks to context switch.

Lines:37-39Initialize the task args we want passed into the task program's mars_task_main function. Store the host storage address of the semaphore instance. Also store the host storage address of the shared resource instance, so that each task can modify it. Schedule the task instance for execution, passing in the task args.

Lines:42-45Wait for completion and finalize all task instances.

Lines:47-48Print the current value of the shared resource counter to stdout. Since each of the 10 tasks should have incremented the shared resource counter exactly once with no simultaneous access allowed, the resulting value should equal the number of tasks, 10.

Line:50Finalize the MARS context.


(task program)

 1      #include <mars/mars.h>
 2
 3      int mars_task_main(const struct mars_task_args *task_args)
 4      {
 5              uint64_t semaphore_ea = task_args->type.u64[0];
 6              uint64_t shared_resource_ea = task_args->type.u64[1];
 7              uint32_t shared_resource __attribute__((aligned(16)));
 8
 9              mars_task_semaphore_acquire(semaphore_ea);
10
11              get(&shared_resource, shared_resource_ea, sizeof(uint32_t));
12
13              shared_resource++;
14
15              put(&shared_resource, shared_resource_ea, sizeof(uint32_t));
16
17              mars_task_semaphore_release(semaphore_ea);
18
19              return 0;
20      }

Line:5Grab the host storage address of the semaphore initialized in the host program from the task arg structure.

Line:6Grab the host storage address of the shared resource counter declared in the host program.

Line:7Declare a local instance of the shared resource counter.

Line:9Attempt to acquire access to the semaphore.

int mars_task_semaphore_acquire (

arg1: Pass in the host storage address of the semaphore instance, obtained at Line:5.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

If the semaphore cannot be acquired at the time of this call, this task will enter a wait state and its context will be switched out. When the semaphore is released by another task and available for this task to acquire, this task will resume execution and continue.

Line:11Memory transfer from host storage to MPU storage the shared resource counter instance.

The function "get" shown here is a generic place holder for the platform specific function to do the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from host storage to MPU storage on your specific platform.

Line:13Increment the shared resource counter. Since the shared resource is protected by the semaphore, it is guaranteed that no other task has access to the same shared resource while this task holds the semaphore.

Line:15Memory transfer from MPU storage to host storage the modified shared resource counter instance.

The function "put" shown here is a generic place holder for the platform specific function to do the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from MPU storage to host storage on your specific platform.

Line:17Release the access to the semaphore.

int mars_task_semaphore_release (

arg1: Pass in the host storage address of the semaphore instance.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)


7.7 Task Signal Usage

This tutorial will explain how to use the MARS task signal to synchronize between the host program and multiple MARS tasks.

This sample code creates 2 separate task instances. Task 1 can only begin processing after the host program has waited 1 second and signals for task 1 to begin. Task 2 can only begin after task 1 has completed its processing and signals for task 2 to begin. Task 1 must also wait to receive a signal back from task 2 notifying that it has finished processing before it itself can finish execution. The host program waits for completion of both tasks before finishing.


(host program)

 1      #include <unistd.h>
 2      #include <mars/mars.h>
 3
 4      static void *task1_program_elf_image;
 5      static void *task2_program_elf_image;
 6
 7      static struct mars_context mars_ctx;
 8      static struct mars_task_id task1_id;
 9      static struct mars_task_id task2_id;
10      static struct mars_task_params task_params;
11      static struct mars_task_args task_args;
12
13      int main(void)
14      {
15              mars_initialize(&mars_ctx, NULL);
16
17              task_params.name = "Task 1";
18              task_params.elf_image = task1_program_elf_image;
19              task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
20              mars_task_initialize(&mars_ctx, &task1_id, &task_params);
21
22              task_params.name = "Task 2";
23              task_params.elf_image = task2_program_elf_image;
24              task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
25              mars_task_initialize(&mars_ctx, &task2_id, &task_params);
26
27              task_args.type.u64[0] = (uint64_t)(uintptr_t)&task2_id;
28              mars_task_schedule(&task1_id, &task_args, 0);
29
30              task_args.type.u64[0] = (uint64_t)(uintptr_t)&task1_id;
31              mars_task_schedule(&task2_id, &task_args, 0);
32
33              sleep(1);
34              mars_task_signal_send(&task1_id);
35
36              mars_task_wait(&task1_id);
37              mars_task_wait(&task2_id);
38
39              mars_task_finalize(&task1_id);
40              mars_task_finalize(&task2_id);
41
42              mars_finalize(&mars_ctx);
43
44              return 0;
45      }

Lines:17-25Initialize task instances for task 1 program and task 2 program each with context save areas.

Lines:27-28Initialize the task args we want passed into task 1 program's mars_task_main function. Store the host storage address of the task id structure of task 2. Schedule task 1 for execution.

Lines:30-31Initialize the task args we want passed into task 2 program's mars_task_main function. Store the host storage address of the task id structure of task 1. Schedule task 2 for execution.

Line:33Sleep for 1 second before continuing. This allows enough time for the tasks to be scheduled and begin execution. This is only to demonstrate the task entering the wait state when waiting for a signal.

Line:34Send a signal to task 1 that is waiting for a signal to allow it to continue execution.

int mars_task_signal_send (

arg1: Pass in the pointer to the task id instance of the initialized task we want to signal.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

Lines:36-42Wait for completion of the 2 task instances, finalize them, and finally finalize the MARS context.


(task 1 program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      int mars_task_main(const struct mars_task_args *task_args)
 5      {
 6              struct mars_task_id task2_id;
 7
 8              get(&task2_id, task_args->type.u64[0], sizeof(struct mars_task_id));
 9
10              mars_task_signal_wait();
11
12              printf("MPU(%d): %s - Hello!\n",
13                      mars_task_get_kernel_id(), mars_task_get_name());
14
15              mars_task_signal_send(&task2_id);
16
17              mars_task_signal_wait();
18
19              return 0;
20      }

Line:6Declare a local task id instance to store task 2's id.

Line:8Memory transfer from host storage to MPU storage the task id instance of task 2. The host storage address of the task id for task 2 is obtained from the task_args passed in from the host program.

The function "get" shown here is a generic place holder for the platform specific function to do the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from host storage to MPU storage on your specific platform.

Line:10Wait for a signal from the host program before continuing execution.

int mars_task_signal_wait (

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

If a signal has not been set by the time of this call, this task will enter a wait state and its context will be switched out. When the task receives a signal, this task will resume execution and continue.
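The wait/send semantics described above can be modeled as a single pending-signal bit per task. The sketch below is an illustrative single-threaded model, not the MARS implementation (in MARS a blocked task's context is actually saved and switched out); the names sig_state_t, sig_wait and sig_send are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal single-threaded model of MARS task signal semantics
 * (hypothetical helper types, not part of the MARS API): each task
 * holds one pending signal bit; wait consumes it, send sets it. */
typedef struct { bool pending; } sig_state_t;

/* Returns true if the wait completes immediately, false if the
 * task would enter the wait state and be context-switched out. */
bool sig_wait(sig_state_t *s)
{
        if (s->pending) {
                s->pending = false;   /* the signal is consumed by the wait */
                return true;
        }
        return false;                 /* no signal yet: task would block */
}

void sig_send(sig_state_t *s)
{
        s->pending = true;            /* wakes a waiter, or is remembered */
}
```

Note that a send issued before the wait is remembered, so the order of send and wait between the host and the tasks does not cause a lost signal in this model.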

Line:15Send a signal to task 2 that is waiting for a signal to allow task 2 execution to resume.

int mars_task_signal_send (

arg1: Address of the local task id instance of task 2 retrieved at Line:8.

return: MARS_SUCCESS is returned on success and a negative error value otherwise.

)

Line:17Wait for a signal from the task 2 program before continuing execution.


(task 2 program)

 1      #include <stdio.h>
 2      #include <mars/mars.h>
 3
 4      int mars_task_main(const struct mars_task_args *task_args)
 5      {
 6              struct mars_task_id task1_id;
 7
 8              get(&task1_id, task_args->type.u64[0], sizeof(struct mars_task_id));
 9
10              mars_task_signal_wait();
11
12              printf("MPU(%d): %s - Hello!\n",
13                      mars_task_get_kernel_id(), mars_task_get_name());
14
15              mars_task_signal_send(&task1_id);
16
17              return 0;
18      }

Line:6Declare a local task id instance to store task 1's id.

Line:8Memory transfer from host storage to MPU storage the task id instance of task 1. The host storage address of the task id for task 1 is obtained from the task_args passed in from the host program.

The function "get" shown here is a generic place holder for the platform specific function to do the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from host storage to MPU storage on your specific platform.

Line:10Wait for a signal from the host program before continuing execution.

If a signal has not been set by the time of this call, this task will enter a wait state and its context will be switched out. When the task receives a signal, this task will resume execution and continue.

Line:15Send a signal to task 1 that is waiting for a signal to allow task 1 execution to resume.


7.8 Task Grayscale Program

This tutorial is a detailed explanation of a program that uses the MARS tasks to process grayscale conversion of an input image.

In this program, the data partitioning of the input image is handled by the MARS main task 1 program, and the actual grayscale conversion is handled by multiple instances of the MARS sub task 2 program. As a result, the major stages of grayscale conversion processing can all be executed on the MPUs. The following describes the detailed processing of each program.

The host program is executed as follows:

1. Initialize a MARS context.
2. Initialize both the main and sub tasks.
3. Initialize a task queue and task event flag to be used for communication between the tasks.
4. Schedule the main task for execution.
5. Wait for completion of the main task.
6. Finalize the MARS context.

To create the main task, the host program passes the following information to the main task:

1. parameters for grayscale conversion processing (effective addresses and number of pixels of input/output buffers)
2. host addresses of the initialized sub tasks
3. host addresses of synchronization objects (queue and event flag) to be used for communication between the tasks

The main task 1 program is executed as follows:

1. Retrieves data to be passed from host program to sub tasks.
2. Schedule instances of sub tasks for execution.
3. Partition grayscale conversion processing.
4. Insert parameters for partitioned processing to task queues.
5. Wait for completion of sub tasks using task event flag.

Only the host address of the task queue is passed from the main task to the sub tasks through the task arguments. Other information is passed to the sub tasks through the task queue.

The main task and sub tasks pass the following parameters for the partitioned grayscale conversion processing through the task queue:

1. host address of the task event flag
2. host addresses of partitioned input data
3. host addresses of partitioned output data
4. number of pixels of partitioned input/output data
5. identification numbers to be used for sending completion notification of partitioned data

Finally, the sub task 2 program instances are executed as follows.

1. Get parameters for processing partitioned by main task from the task queue.
2. Execute grayscale conversion processing.
3. Send completion notification to main task using task event flag.

By using MARS, the MPUs can perform all of the necessary processing except for the initialization of the MARS execution environment, enabling efficient applications built around MPU-centric program execution and control.

In this tutorial program, MARS instances are processed in the function rgb2y() in the host program so that readers can easily understand the program. However, this method is generally not recommended: if the function is called frequently (such as when multiple images are processed in an application), the MARS instances are initialized every time the function is called, which is very inefficient. Ideally, programs should be designed so that MARS instances need to be initialized only once.
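The recommended design can be sketched generically with a lazy one-time initialization; ensure_context, ctx_ready, init_count and process_image below are illustrative names, not MARS API, and init_count merely stands in for the expensive mars_initialize() call:

```c
#include <stdbool.h>

static bool ctx_ready;
static int init_count;

/* Generic lazy-initialization sketch (names are illustrative, not
 * MARS API): ensure the expensive context setup runs only once,
 * no matter how many times the processing function is called. */
static void ensure_context(void)
{
        if (!ctx_ready) {
                init_count++;        /* stands in for mars_initialize() */
                ctx_ready = true;
        }
}

void process_image(void)
{
        ensure_context();            /* cheap after the first call */
        /* ... initialize/schedule tasks and wait for them here,
         * but do not finalize the context per call ... */
}
```

With this pattern the matching finalization happens once at application shutdown rather than on every call.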

Note:
This tutorial program is written specifically for the Cell B.E. processor. Implementations for other multicore architectures may differ.

(host program)
  1     #include <stdio.h>
  2     #include <stdlib.h>
  3     #include <string.h>
  4     #include <malloc.h>
  5     #include <sys/stat.h>
  6     #include <libspe2.h>
  7     #include <mars/mars.h>
  8
  9     #define IN_FILENAME     "in.ppm"
 10     #define OUT_FILENAME    "out.ppm"
 11     #define PPM_MAGIC       "P6"
 12
 13     #define NUM_TASKS       4
 14     #define QUEUE_DEPTH     4
 15
 16     typedef struct _image_t {
 17             int width;
 18             int height;
 19             unsigned char *src;
 20             unsigned char *dst;
 21     } image_t;
 22
 23     typedef struct {
 24             uint64_t ea_task_id;
 25             uint64_t ea_event;
 26             uint64_t ea_queue;
 27             uint64_t ea_src;
 28             uint64_t ea_dst;
 29             uint32_t num;
 30             uint32_t pad;
 31     } grayscale_params_t;
 32
 33     typedef struct {
 34             uint64_t ea_event;
 35             uint64_t ea_src;
 36             uint64_t ea_dst;
 37             uint32_t num;
 38             uint32_t id;
 39     } grayscale_queue_elem_t;
 40
 41     extern struct spe_program_handle task1_spe_prog;
 42     extern struct spe_program_handle task2_spe_prog;
 43
 44     static struct mars_context mars_ctx;
 45     static struct mars_task_id task1_id;
 46     static struct mars_task_id task2_id[NUM_TASKS];
 47     static struct mars_task_params task_params;
 48     static struct mars_task_args task_args;
 49     static struct mars_task_event_flag event;
 50     static struct mars_task_queue queue;
 51
 52     static grayscale_params_t grayscale_params __attribute__((aligned(16)));
 53     static grayscale_queue_elem_t queue_buffer[QUEUE_DEPTH] __attribute__((aligned(16)));
 54
 55     /* initialize MARS execution environment for rgb2y processing */
 56     void rgb2y(unsigned char *src, unsigned char *dst, int num)
 57     {
 58             int ret, i;
 59
 60             ret = mars_initialize(&mars_ctx, NULL);
 61             if (ret) {
 62                     printf("Could not initialize MARS context! (%d)\n", ret);
 63                     exit(1);
 64             }
 65
 66             ret = mars_task_event_flag_initialize(&mars_ctx, &event,
 67                                             MARS_TASK_EVENT_FLAG_MPU_TO_MPU,
 68                                             MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
 69             if (ret) {
 70                     printf("Could not initialize MARS task event flag! (%d)\n", ret);
 71                     exit(1);
 72             }
 73
 74             ret = mars_task_queue_initialize(&mars_ctx, &queue, &queue_buffer,
 75                                             sizeof(grayscale_queue_elem_t),
 76                                             QUEUE_DEPTH,
 77                                             MARS_TASK_QUEUE_MPU_TO_MPU);
 78             if (ret) {
 79                     printf("Could not initialize MARS task queue! (%d)\n", ret);
 80                     exit(1);
 81             }
 82
 83             task_params.name = "Grayscale Main Task";
 84             task_params.elf_image = task1_spe_prog.elf_image;
 85             task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
 86             ret = mars_task_initialize(&mars_ctx, &task1_id, &task_params);
 87             if (ret) {
 88                     printf("Could not initialize MARS main task! (%d)\n", ret);
 89                     exit(1);
 90             }
 91
 92             task_params.name = "Grayscale Sub Task";
 93             task_params.elf_image = task2_spe_prog.elf_image;
 94             task_params.context_save_size = MARS_TASK_CONTEXT_SAVE_SIZE_MAX;
 95
 96             for (i = 0; i < NUM_TASKS; i++) {
 97                     ret = mars_task_initialize(&mars_ctx, &task2_id[i], &task_params);
 98                     if (ret) {
 99                             printf("Could not initialize MARS sub task! (%d)\n", ret);
100                             exit(1);
101                     }
102             }
103
104             grayscale_params.ea_task_id = (uint64_t)(uintptr_t)&task2_id;
105             grayscale_params.ea_event   = (uint64_t)(uintptr_t)&event;
106             grayscale_params.ea_queue   = (uint64_t)(uintptr_t)&queue;
107             grayscale_params.ea_src     = (uint64_t)(uintptr_t)src;
108             grayscale_params.ea_dst     = (uint64_t)(uintptr_t)dst;
109             grayscale_params.num        = num;
110             task_args.type.u64[0] = (uint64_t)(uintptr_t)&grayscale_params;
111
112             ret = mars_task_schedule(&task1_id, &task_args, 0);
113             if (ret) {
114                     printf("Could not schedule MARS main task! (%d)\n", ret);
115                     exit(1);
116             }
117
118             ret = mars_task_wait(&task1_id);
119             if (ret) {
120                     printf("Could not wait for MARS main task! (%d)\n", ret);
121                     exit(1);
122             }
123
124             ret = mars_task_finalize(&task1_id);
125             if (ret) {
126                     printf("Could not finalize MARS main task! (%d)\n", ret);
127                     exit(1);
128             }
129
130             for (i = 0; i < NUM_TASKS; i++) {
131                     ret = mars_task_finalize(&task2_id[i]);
132                     if (ret) {
133                             printf("Could not finalize MARS sub task! (%d)\n", ret);
134                             exit(1);
135                     }
136             }
137
138             ret = mars_finalize(&mars_ctx);
139             if (ret) {
140                     printf("Could not finalize MARS context! (%d)\n", ret);
141                     exit(1);
142             }
143     }
144
145     /* read ppm data from input file */
146     void read_ppm(image_t *img, char *fname)
147     {
148             char *token, *pc, *buf, *del = " \t\n";
149             int i, w, h, luma, pixs, filesize;
150             struct stat st;
151             unsigned char *dot;
152             FILE *fp;
153
154             /* read raw data */
155             stat(fname, &st);
156             filesize = (int) st.st_size;
157             buf = (char *) malloc(filesize * sizeof(char));
158
159             if ((fp = fopen(fname, "r")) == NULL) {
160                     fprintf(stderr, "error: failed to open file %s\n", fname);
161                     exit(1);
162             }
163
164             fseek(fp, 0, SEEK_SET);
165             fread(buf, filesize * sizeof(char), 1, fp);
166             fclose(fp);
167
168             /* validate file format */
169             token = (char *) (unsigned long) strtok(buf, del);
170             if (strncmp(token, PPM_MAGIC, 2) != 0) {
171                     fprintf(stderr, "error: invalid file format\n");
172                     exit(1);
173             }
174
175             /* skip comments */
176             token = (char *) (unsigned long) strtok(NULL, del);
177             if (token[0] == '#') {
178                     token = (char *) (unsigned long) strtok(NULL, "\n");
179                     token = (char *) (unsigned long) strtok(NULL, del);
180             }
181
182             /* read picture size (and luma) */
183             w = strtoul(token, &pc, 10);
184             token = (char *) (unsigned long) strtok(NULL, del);
185             h = strtoul(token, &pc, 10);
186             token = (char *) (unsigned long) strtok(NULL, del);
187             luma = strtoul(token, &pc, 10);
188
189             img->width = w;
190             img->height = h;
191
192             /* allocate an aligned memory */
193             pixs = w * h;
194             img->src = (unsigned char *)memalign(16, pixs*4);
195             img->dst = (unsigned char *)memalign(16, pixs*4);
196
197             /* read rgb data with 'r,g,b,0' formatted */
198             dot = img->src;
199             pc++;
200             for (i = 0; i < pixs*4; i++) {
201                     if (i % 4 == 3) {
202                             *dot++ = 0;
203                     } else {
204                             *dot++ = *pc++;
205                     }
206             }
207
208             return;
209     }
210
211     /* write ppm data to output file */
212     void write_ppm(image_t *img, char *fname)
213     {
214             int i;
215             int w = img->width;
216             int h = img->height;
217             unsigned char *dot = img->dst;
218             FILE *fp;
219
220             if ((fp = fopen(fname, "wb+")) == NULL) {
221                     fprintf(stderr, "failed to open file %s\n", fname);
222                     exit(1);
223             }
224
225             fprintf(fp, "%s\n", PPM_MAGIC);
226             fprintf(fp, "%d %d\n", w, h);
227             fprintf(fp, "255\n");
228
229             for (i = 0; i < (w * h * 4); i++) {
230                     if (i % 4 == 3) {
231                             dot++;
232                     } else {
233                             putc((int) *dot++, fp);
234                     }
235             }
236
237             fclose(fp);
238
239             return;
240     }
241
242     void delete_image(image_t *img)
243     {
244             free(img->src);
245             free(img->dst);
246
247             return;
248     }
249
250     int main(int argc, char **argv)
251     {
252             image_t image;
253
254             printf(INFO);
255
256             read_ppm(&image, IN_FILENAME);
257
258             rgb2y(image.src, image.dst, image.width * image.height);
259
260             write_ppm(&image, OUT_FILENAME);
261
262             delete_image(&image);
263
264             return 0;
265     }

Line:9Filename of input source image.

Line:10Filename of the output image.

Line:13Define the number of the sub task 2 programs as a constant NUM_TASKS. In this tutorial, 4 instances of the sub task are created to allocate the grayscale conversion processing to each.

Line:14Define the depth of the task queue as a constant QUEUE_DEPTH. In this tutorial, the depth of the task queue is set to 4 in accordance with the number of the sub task instances.

Lines:16-21Define the structure to store the image information.

Lines:23-31Define the structure of the parameter set for storing the information to be passed into the main task 1 program.

Lines:33-39Define the structure for the task queue data element. Each entry in the task queue will be an instance of this structure. The size of this structure must be a multiple of 16 bytes.
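The 16-byte size requirement can be checked at compile time. The sketch below copies the structure definition from Lines:33-39 and uses the portable negative-array-size idiom; elem_size_check is a hypothetical name for illustration:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
        uint64_t ea_event;
        uint64_t ea_src;
        uint64_t ea_dst;
        uint32_t num;
        uint32_t id;
} grayscale_queue_elem_t;

/* Compile-time guard (illustrative, not in the tutorial source):
 * the array size is -1 (a build error) if the element size is not
 * a multiple of 16 bytes. */
typedef char elem_size_check[(sizeof(grayscale_queue_elem_t) % 16 == 0) ? 1 : -1];
```

Here the structure is 3 x 8 + 2 x 4 = 32 bytes, which satisfies the requirement; adding a field may require explicit padding to keep the size a multiple of 16.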

Line:52Declare an instance of the structure we defined at Lines:23-31 for passing parameters to the main task.

Line:53Declare an instance of the buffer to provide to the task queue instance as the data storage area.

Line:56This function handles the grayscale processing of the input image data buffer and outputs the results to the destination buffer.

Lines:60-64Initialize the MARS context.

Lines:66-72Initialize the task event flag instance for MPU to MPU communication. This will be used by the sub task instances to notify the main task that their portion of grayscale processing is completed.

Lines:74-81Initialize the task queue instance for MPU to MPU communication. This will be used by the main task to send grayscale processing requests to the sub tasks.

Lines:83-90Initialize the task params and task for the main task 1 program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow the main task to context switch.

Lines:92-102Initialize the task params and initialize multiple instances for the sub task 2 program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow the sub tasks to context switch.

Lines:104-110Initialize the parameters for grayscale conversion processing in the parameter structure declared at Line:52. The parameters stored in this structure are the host addresses of the task ids of the initialized sub tasks, task event flag and task queue, storage areas for input/output image data, and total number of pixels of image data. The host address of this structure is passed to the main task using the task argument for the main task.

Lines:112-116Schedule the main task for execution.

Lines:118-122Wait for the main task to complete execution.

Lines:124-128Finalize the main task instance.

Lines:130-136Finalize the sub task instances.

Lines:138-143Finalize the MARS context.

Lines:146-209This function reads the input source image file from Line:9 and stores the image data into the structure defined at Lines:16-21.
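The 'r,g,b,0' packing performed at Lines:197-206 of read_ppm can be isolated into a small helper; pack_rgb0 is a hypothetical name for illustration:

```c
#include <assert.h>

/* Sketch of the 'r,g,b,0' packing done by read_ppm: every fourth
 * output byte is a zero pad so each pixel occupies 4 bytes, which
 * keeps pixels word-aligned for the 4-pixel SIMD conversion. */
void pack_rgb0(const unsigned char *rgb, unsigned char *rgb0, int pixs)
{
        int i;

        for (i = 0; i < pixs * 4; i++) {
                if (i % 4 == 3)
                        rgb0[i] = 0;          /* pad byte */
                else
                        rgb0[i] = *rgb++;     /* copy r, g, b */
        }
}
```

write_ppm at Lines:229-235 performs the inverse walk, skipping every fourth byte when emitting the P6 body.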

Lines:212-240This function writes the output grayscaled image data to the output image file from Line:10.

Lines:242-248This function cleans up an instance of the image data structure.

Lines:250-265This is the entry function of the host program that does the following:

1. Read the input image data from input image file.
2. Process the grayscale conversion of input image data.
3. Write the output image data to output image file.
4. Cleanup the image data instance.


(task 1 program)

  1     #include <stdio.h>
  2     #include <stdint.h>
  3     #include <spu_intrinsics.h>
  4     #include <spu_mfcio.h>
  5     #include <mars/mars.h>
  6
  7     #define NUM_TASKS       4
  8
  9     #define ALIGN4_UP(x)    (((x) + 0x3) & ~0x3)
 10
 11     typedef struct {
 12             uint64_t ea_task_id;
 13             uint64_t ea_event;
 14             uint64_t ea_queue;
 15             uint64_t ea_src;
 16             uint64_t ea_dst;
 17             uint32_t num;
 18             uint32_t pad;
 19     } grayscale_params_t;
 20
 21     typedef struct {
 22             uint64_t ea_event;
 23             uint64_t ea_src;
 24             uint64_t ea_dst;
 25             uint32_t num;
 26             uint32_t id;
 27     } grayscale_queue_elem_t;
 28
 29     static struct mars_task_id task2_id[NUM_TASKS];
 30     static struct mars_task_args task2_args;
 31
 32     static grayscale_params_t grayscale_params __attribute__((aligned(16)));
 33     static grayscale_queue_elem_t data __attribute__((aligned(16)));
 34
 35     int mars_task_main(const struct mars_task_args *task_args)
 36     {
 37             int ret, i, tag = 0;
 38             int num, remain, chunk;
 39             uint64_t ea_task_id, ea_event, ea_queue;
 40             uint64_t ea_src, ea_dst;
 41             uint16_t mask = 0;
 42
 43             /* Get application parameters */
 44             mfc_get(&grayscale_params, task_args->type.u64[0], sizeof(grayscale_params_t), tag, 0, 0);
 45             mfc_write_tag_mask(1 << tag);
 46             mfc_read_tag_status_all();
 47
 48             ea_task_id = grayscale_params.ea_task_id;
 49             ea_event   = grayscale_params.ea_event;
 50             ea_queue   = grayscale_params.ea_queue;
 51             ea_src     = grayscale_params.ea_src;
 52             ea_dst     = grayscale_params.ea_dst;
 53             num        = grayscale_params.num;
 54
 55             /* Get sub task ids */
 56             mfc_get(&task2_id, ea_task_id, sizeof(struct mars_task_id) * NUM_TASKS, tag, 0, 0);
 57             mfc_write_tag_mask(1 << tag);
 58             mfc_read_tag_status_all();
 59
 60             /* Pass queue ea to sub task args */
 61             task2_args.type.u64[0] = ea_queue;
 62
 63             /* Schedule sub tasks for execution */
 64             for (i = 0; i < NUM_TASKS; i++) {
 65                     ret = mars_task_schedule(&task2_id[i], &task2_args, 0);
 66                     if (ret) {
 67                             printf("Could not schedule MARS sub task! (%d)\n", ret);
 68                             return 1;
 69                     }
 70             }
 71
 72             remain = num;
 73             chunk = num/NUM_TASKS;
 74             for (i = 0; i < NUM_TASKS; i++) {
 75                     data.ea_event = ea_event;
 76                     data.ea_src   = ea_src;
 77                     data.ea_dst   = ea_dst;
 78                     data.id       = i;
 79                     if (remain > chunk) {
 80                             data.num = ALIGN4_UP(chunk);
 81                     } else {
 82                             data.num = ALIGN4_UP(remain);
 83                     }
 84
 85                     /* Push data to queue */
 86                     ret = mars_task_queue_push_begin(ea_queue, &data, tag);
 87                     if (ret) {
 88                             printf("Could not push data to MARS task queue! (%d)\n", ret);
 89                             return 1;
 90                     }
 91                     ret = mars_task_queue_push_end(ea_queue, tag);
 92                     if (ret) {
 93                             printf("Could not complete data push to MARS task queue! (%d)\n", ret);
 94                             return 1;
 95                     }
 96
 97                     remain -= chunk;
 98                     ea_src += (chunk * 4);
 99                     ea_dst += (chunk * 4);
100
101                     /* Create event mask */
102                     mask |= 1 << i;
103             }
104
105             /* Wait until specified bits are set to event flag */
106             ret = mars_task_event_flag_wait(ea_event, mask, MARS_TASK_EVENT_FLAG_MASK_AND);
107             if (ret) {
108                     printf("Could not wait for MARS task event flag! (%d)\n", ret);
109                     return 1;
110             }
111
112             /* Wait for all scheduled sub tasks to complete */
113             for (i = 0; i < NUM_TASKS; i++) {
114                     ret = mars_task_wait(&task2_id[i]);
115                     if (ret) {
116                             printf("Could not wait for MARS sub task! (%d)\n", ret);
117                             return 1;
118                     }
119             }
120
121             return 0;
122     }

Line:7Define the number of the sub task 2 programs that need to be scheduled for execution. This number should be the same as the one specified in the host program at Line:13.

Lines:11-19Define the structure for the parameters passed in from the host program. This is a redefinition of the same structure defined in the host program at Lines:23-31.

Lines:21-27Define the structure for the task queue entry data. This is a redefinition of the same structure defined in the host program at Lines:33-39.

Lines:29-30Declare an array of task ids and an instance of a task arg structure that will be passed into the sub task.

Lines:32-33Declare an instance of the parameter structure to be passed in from the host program and an instance of the task queue data entry structure.

Lines:44-46Memory transfer the grayscale parameter structure from the host storage address specified in the task args sent from the host program.

Lines:48-53Initialize the local variables with the parameters from the host program.

Lines:56-58Memory transfer the array of sub task ids from the host storage address specified in the parameter structure retrieved at Lines:44-46.

Line:61Initialize the task args to pass into the sub task and give it the host address of the task queue.

Lines:64-70Schedule all the instances of the sub tasks for execution.

Lines:72-103Partition the source image data evenly to each of the multiple sub task instances. Push the partitioned data into the task queue so that each sub task can pop it and begin processing. The parameters for the partitioned data indicate the host addresses and the number of pixels of the partitioned input/output data and the host addresses of task event flag and the identification numbers of each sub task.
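The partitioning arithmetic at Lines:72-103 can be sketched in plain C; partition is a hypothetical helper, not part of the tutorial source, and it reproduces the tutorial's per-chunk pixel counts:

```c
#include <assert.h>

#define NUM_TASKS       4
#define ALIGN4_UP(x)    (((x) + 0x3) & ~0x3)

/* Hypothetical helper mirroring the main task's partitioning loop:
 * writes the per-task pixel count of each queue entry into counts[].
 * Each count is rounded up to a multiple of 4 pixels because the
 * sub task's SIMD loop processes 4 pixels per iteration. */
void partition(int num, int counts[NUM_TASKS])
{
        int i;
        int remain = num;
        int chunk = num / NUM_TASKS;

        for (i = 0; i < NUM_TASKS; i++) {
                counts[i] = (remain > chunk) ? ALIGN4_UP(chunk)
                                             : ALIGN4_UP(remain);
                remain -= chunk;
        }
}
```

For example, 1024 pixels split into 4 chunks of 256 pixels each; when the chunk size is not already a multiple of 4 (e.g. 1000 pixels, chunk 250), ALIGN4_UP rounds each count up to 252 so the SIMD loop covers every pixel of the chunk.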

Lines:106-110Wait for the task event flag event that notifies when all sub tasks have completed their processing. The main task will enter a wait state until the event is received.

Note:
In this tutorial, the task event flag is used only for example purposes: it merely notifies the main task of sub task completion. Since the main task waits for completion of all sub tasks immediately after waiting for the task event flag, waiting on the task event flag is not strictly necessary.

Lines:113-119Wait for completion of all sub tasks.
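The mask built at Line:102 and the AND-mode wait at Line:106 can be modeled in plain C; build_mask and event_flag_test_and below are illustrative helpers, not MARS API functions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Build the wait mask exactly as the main task does: one bit per
 * scheduled sub task (mask |= 1 << i). */
uint16_t build_mask(int num_tasks)
{
        uint16_t mask = 0;
        int i;

        for (i = 0; i < num_tasks; i++)
                mask |= 1 << i;
        return mask;
}

/* Model of the MARS_TASK_EVENT_FLAG_MASK_AND condition: the wait
 * completes only once every bit in `mask` has been set, i.e. once
 * every sub task has reported completion. */
bool event_flag_test_and(uint16_t flag_bits, uint16_t mask)
{
        return (flag_bits & mask) == mask;
}
```

Each sub task sets only its own bit (1 << my_id), so the AND condition becomes true exactly when all sub tasks have set their bits.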


(task 2 program)

  1     #include <stdio.h>
  2     #include <stdint.h>
  3     #include <spu_intrinsics.h>
  4     #include <spu_mfcio.h>
  5     #include <mars/mars.h>
  6
  7     #define MAX_BUFSIZE     (16 << 10)
  8
  9     typedef struct {
 10             uint64_t ea_event;
 11             uint64_t ea_src;
 12             uint64_t ea_dst;
 13             uint32_t num;
 14             uint32_t id;
 15     } grayscale_queue_elem_t;
 16
 17     static unsigned char src_spe[MAX_BUFSIZE] __attribute__((aligned(128)));
 18     static unsigned char dst_spe[MAX_BUFSIZE] __attribute__((aligned(128)));
 19
 20     static grayscale_queue_elem_t data __attribute__((aligned(16)));
 21
 22     void rgb2y(unsigned char *src, unsigned char *dst, int num)
 23     {
 24             int i;
 25
 26             __vector unsigned char *vsrc = (__vector unsigned char *) src;
 27             __vector unsigned char *vdst = (__vector unsigned char *) dst;
 28
 29             __vector unsigned int vr, vg, vb, vy, vpat;
 30             __vector float vfr, vfg, vfb, vfy;
 31
 32             __vector float vrconst = spu_splats(0.29891f);
 33             __vector float vgconst = spu_splats(0.58661f);
 34             __vector float vbconst = spu_splats(0.11448f);
 35             __vector float vfzero = spu_splats(0.0f);
 36             __vector unsigned int vmax = spu_splats((unsigned int) 255);
 37
 38             __vector unsigned char vpatr = (__vector unsigned char) { 0x10, 0x10, 0x10, 0x00,
 39                                                                     0x10, 0x10, 0x10, 0x04,
 40                                                                     0x10, 0x10, 0x10, 0x08,
 41                                                                     0x10, 0x10, 0x10, 0x0c };
 42             __vector unsigned char vpatg = (__vector unsigned char) { 0x10, 0x10, 0x10, 0x01,
 43                                                                     0x10, 0x10, 0x10, 0x05,
 44                                                                     0x10, 0x10, 0x10, 0x09,
 45                                                                     0x10, 0x10, 0x10, 0x0d };
 46             __vector unsigned char vpatb = (__vector unsigned char) { 0x10, 0x10, 0x10, 0x02,
 47                                                                     0x10, 0x10, 0x10, 0x06,
 48                                                                     0x10, 0x10, 0x10, 0x0a,
 49                                                                     0x10, 0x10, 0x10, 0x0e };
 50             __vector unsigned char vpaty = (__vector unsigned char) { 0x03, 0x03, 0x03, 0x10,
 51                                                                     0x07, 0x07, 0x07, 0x10,
 52                                                                     0x0b, 0x0b, 0x0b, 0x10,
 53                                                                     0x0f, 0x0f, 0x0f, 0x10 };
 54             __vector unsigned char vzero = spu_splats((unsigned char) 0);
 55
 56             for (i = 0; i < num/4; i++) {
 57                     vr = (__vector unsigned int) spu_shuffle(vsrc[i], vzero, vpatr);
 58                     vg = (__vector unsigned int) spu_shuffle(vsrc[i], vzero, vpatg);
 59                     vb = (__vector unsigned int) spu_shuffle(vsrc[i], vzero, vpatb);
 60
 61                     vfr = spu_convtf(vr, 0);
 62                     vfg = spu_convtf(vg, 0);
 63                     vfb = spu_convtf(vb, 0);
 64
 65                     vfy = spu_madd(vfr, vrconst, vfzero);
 66                     vfy = spu_madd(vfg, vgconst, vfy);
 67                     vfy = spu_madd(vfb, vbconst, vfy);
 68             
 69                     vy = spu_convtu(vfy, 0);
 70
 71                     vpat = spu_cmpgt(vy, vmax);
 72                     vy = spu_sel(vy, vmax, vpat);
 73
 74                     vdst[i] = (__vector unsigned char) spu_shuffle(vy, (__vector unsigned int) vzero, vpaty);
 75             }
 76
 77             return;
 78     }
 79
 80     int mars_task_main(const struct mars_task_args *task_args)
 81     {
 82             int ret, tag = 0;
 83             int my_id;
 84             uint64_t ea_event, ea_queue;
 85             uint16_t bits;
 86             uint64_t ea_src, ea_dst;
 87             unsigned int remain, num;
 88
 89             ea_queue = task_args->type.u64[0];
 90
 91             /* Pop data from queue */
 92             ret = mars_task_queue_pop_begin(ea_queue, &data, tag);
 93             if (ret) {
 94                     printf("Could not pop data from MARS task queue! (%d)\n", ret);
 95                     return 1;
 96             }
 97             ret = mars_task_queue_pop_end(ea_queue, tag);
 98             if (ret) {
 99                     printf("Could not complete data pop from MARS task queue! (%d)\n", ret);
100                     return 1;
101             }
102
103             my_id    = data.id;
104             ea_event = data.ea_event;
105             ea_src   = data.ea_src;
106             ea_dst   = data.ea_dst;
107             remain   = data.num;
108
109             /* main loop */
110             while (remain > 0) {
111                     if (remain > MAX_BUFSIZE/4) {
112                             num = MAX_BUFSIZE/4;
113                     } else {
114                             num = remain;
115                     }
116
117                     /* DMA Transfer : GET input data */
118                     mfc_get(src_spe, ea_src, num * 4, tag, 0, 0);
119                     mfc_write_tag_mask(1 << tag);
120                     mfc_read_tag_status_all();
121
122                     /* convert to grayscale data */
123                     rgb2y(src_spe, dst_spe, num);
124
125                     /* DMA Transfer : PUT output data */
126                     mfc_put(dst_spe, ea_dst, num * 4, tag, 0, 0);
127                     mfc_write_tag_mask(1 << tag);
128                     mfc_read_tag_status_all();
129
130                     remain -= num;
131                     ea_src += num * 4;
132                     ea_dst += num * 4;
133             }
134
135             /* Set bit in MARS task event flag */
136             bits = 1 << my_id;
137             ret = mars_task_event_flag_set(ea_event, bits);
138             if (ret) {
139                     printf("Could not set MARS task event flag! (%d)\n", ret);
140                     return 1;
141             }
142
143             return 0;
144     }

Lines 9-15: Define the structure for the task queue entry data. This is a redefinition of the same structure defined in the host program at Lines 33-38, as well as in the task 1 program at Lines 11-19.
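
Since the structure definition itself falls outside this excerpt, the layout below is a reconstruction inferred from the member accesses at Lines 103-107 (data.id, data.ea_event, data.ea_src, data.ea_dst, data.num); the real definition in the host and task sources is authoritative, and the member order here is an assumption:

```c
#include <stdint.h>

/* Hypothetical reconstruction of the task queue entry structure.
 * Field names come from the accesses in the task code; types are
 * inferred from how the values are used. MARS task queue entries
 * must be 16-byte-aligned multiples of 16 bytes, which this 32-byte
 * layout satisfies. */
struct task_queue_entry {
        uint32_t id;        /* sub task identification number    */
        uint32_t num;       /* number of pixels to process       */
        uint64_t ea_event;  /* host address of task event flag   */
        uint64_t ea_src;    /* host address of input image data  */
        uint64_t ea_dst;    /* host address of output image data */
} __attribute__((aligned(16)));
```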

Lines 17-18: Declare instances of the source and destination buffers that store the processing input and output data.

Lines 22-78: This function converts the partitioned input image data in the source buffer to grayscale and stores the output in the destination buffer.
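
The vectorized rgb2y processes four pixels per iteration with SPU intrinsics, which can obscure the underlying arithmetic. A scalar equivalent for a single pixel, assuming the luma weights held in vrconst, vgconst and vbconst are the common BT.601 coefficients (an assumption, since their definitions fall outside this excerpt), would look like:

```c
/* Scalar sketch of the per-pixel luma computation performed by rgb2y.
 * The weights are assumed BT.601 values; the actual ones live in
 * vrconst/vgconst/vbconst, defined earlier in the task. Like
 * spu_convtu, the float-to-int conversion truncates toward zero, and
 * the result is clamped to 255 as the spu_cmpgt/spu_sel pair does. */
static unsigned char rgb_to_y(unsigned char r, unsigned char g,
                              unsigned char b)
{
        float y = 0.29891f * r + 0.58661f * g + 0.11448f * b;
        unsigned int v = (unsigned int) y;   /* truncate like spu_convtu */
        return (v > 255) ? 255 : (unsigned char) v;
}
```

The spu_shuffle calls before and after this computation only rearrange bytes: they extract each color channel into its own vector on the way in, and pack the four luma results back into the destination pixel layout on the way out.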

Line 89: Get the host storage address of the task queue passed in from the main task via the task arguments.

Lines 92-101: Pop the data from the task queue. If the main task has not yet pushed data into the task queue at the time of this call, this task enters a wait state and its context is switched out. When data becomes available to pop from the task queue, this task resumes execution and continues.

Lines 103-107: Initialize the local variables with the parameters from the data popped from the task queue.

Line 110: Loop until all the input data has been processed. The amount of data processed in each loop iteration is limited by the size of the local buffers specified at Line 7.
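
The chunk-size computation at the top of the loop can be sketched in isolation. MAX_BUFSIZE is defined at Line 7, outside this excerpt, so the value below is only an assumption chosen to make the example runnable; each pixel occupies 4 bytes, so a MAX_BUFSIZE-byte buffer holds MAX_BUFSIZE/4 pixels per transfer:

```c
#define MAX_BUFSIZE 16384   /* assumed value; the real one is at Line 7 */

/* Per-iteration chunk size, as computed at Lines 111-115: process at
 * most one local buffer's worth of pixels, or whatever remains. */
static unsigned int next_chunk(unsigned int remain)
{
        return (remain > MAX_BUFSIZE / 4) ? MAX_BUFSIZE / 4 : remain;
}
```

Under this assumed buffer size, 5000 remaining pixels would be processed as one chunk of 4096 followed by one of 904.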

Lines 118-120: DMA-transfer the input data from host storage to the source buffer declared at Line 17, and wait for the transfer to complete.

Line 123: Perform the actual grayscale conversion of the image data from the source buffer to the destination buffer.

Lines 126-128: DMA-transfer the output data from the destination buffer declared at Line 18 to host storage, and wait for the transfer to complete.
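
The loop above fully serializes each GET, compute and PUT. A common refinement on this kind of architecture is double buffering: issue the DMA for the next chunk on a second tag while the current chunk is being processed. The sketch below is an illustration of that control flow, not the document's implementation; it substitutes plain-C stubs (memcpy) for mfc_get/mfc_put so the pattern is runnable in isolation:

```c
#include <string.h>

#define CHUNK 4

/* Stand-ins for the DMA transfers. On the SPE these would be mfc_get
 * and mfc_put on two distinct tag IDs, with the tag-status wait
 * deferred until the corresponding buffer is needed again. */
static void dma_get(int *buf, const int *src, unsigned int n)
{ memcpy(buf, src, n * sizeof(int)); }
static void dma_put(int *dst, const int *buf, unsigned int n)
{ memcpy(dst, buf, n * sizeof(int)); }

/* Placeholder for the per-chunk computation (rgb2y in the task). */
static void negate(int *dst, const int *src, unsigned int n)
{ unsigned int i; for (i = 0; i < n; i++) dst[i] = -src[i]; }

/* Double-buffered pipeline: prefetch chunk i+1 while chunk i is
 * processed, alternating between buffer 0 and buffer 1. */
static void process_all(int *dst, const int *src, unsigned int total)
{
        int in[2][CHUNK], out[2][CHUNK];
        unsigned int done = 0, cur = 0;

        if (total == 0)
                return;
        dma_get(in[0], src, (total < CHUNK) ? total : CHUNK);
        while (done < total) {
                unsigned int n = total - done;
                unsigned int next_n;
                if (n > CHUNK)
                        n = CHUNK;
                next_n = total - done - n;
                if (next_n > CHUNK)
                        next_n = CHUNK;
                if (next_n)     /* prefetch the next chunk */
                        dma_get(in[cur ^ 1], src + done + n, next_n);
                negate(out[cur], in[cur], n);
                dma_put(dst + done, out[cur], n);
                done += n;
                cur ^= 1;
        }
}
```

Whether this pays off depends on the ratio of DMA latency to compute time per chunk; for a cheap kernel like grayscale conversion the overlap can hide most of the transfer cost.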

Lines 136-141: Set the task event flag bit corresponding to this sub task's identification number.
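
On the other side, the main task typically waits on this event flag until every sub task's bit is set. The MARS wait call itself is covered in the API reference; the sketch below only models the bit arithmetic, with all names being illustrative:

```c
#include <stdint.h>

/* Each sub task sets the bit (1 << id) on completion, exactly as
 * Lines 136-137 do. The waiter considers the work finished once the
 * bits for all num_tasks sub tasks are set (an AND-style mask wait). */
static int all_tasks_done(uint16_t flags, unsigned int num_tasks)
{
        uint16_t mask = (uint16_t) ((1u << num_tasks) - 1);
        return (flags & mask) == mask;
}
```

For example, with 3 sub tasks the completion mask is 0x7: flags of 0x5 mean sub task 1 is still running, while 0x7 means all three have finished.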




8 API Reference

This section describes the MARS API.



Generated on Wed Jun 25 11:07:27 2008 for MARS by  doxygen 1.5.2