Distributed Python: Celery
Nowadays, service-oriented architectures (SOA) are in fashion. These architectures are built from small, very specific services that interact with each other.
In this post I’ll show how to use Celery to build one.
About Celery
Celery is not a communication system, because it relies on RabbitMQ, Redis, etc. for communication. For the same reason, it is not a message queue system. Nor is it a protocol, because it uses AMQP. Nor is it an abstraction over all of these, because that job is done by Kombu, the messaging library it uses under the hood.
Celery is a set of tools for working easily with several services, something like syntactic sugar. It lets you launch services as tasks.
How to install it
It is very easy to install, either with pip (pip install celery) or with apt.
The problem here is that we need a Broker. The Broker is the way to transport messages from one service to another. In this case, we need a message queue.
We will use two of them as examples: RabbitMQ and Redis. The first one is somewhat complicated, but very interesting if you already know it and install the management plugin. The second one is easier to install, but it is harder to see what is happening internally. To use Redis you will also need its Python library, which can be installed with pip (pip install redis) or with apt.
Example
Creating services
Let’s start by creating a small service that multiplies two numbers:
The overhead added by Celery is quite small: importing the library, connecting (app) and adding a decorator to our service.
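It is already executable: starting a worker for this file with the celery command (for example celery -A tasks worker --loglevel=info, assuming the file is named tasks.py) prints a startup banner.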
There is a lot of information in the worker’s startup output:
- Versions in use
- Queues created
- Broker and result backend used to transport messages and retrieve results
- Number of workers, that is, processes available to handle requests concurrently.
There is more, but it can be ignored for now.
We will leave it running.
Running services
That was the subscriber. Let’s create the publisher:
This is more complicated, but quick to explain:
- The first part is the same as before, because we need to connect to the same server.
- The second part runs the task by name, passing the arguments. That returns a promise. At this point we have left a message in the queue.
- Finally, the promise is resolved with promise.get(). Meanwhile, the service reads and processes the message and leaves the result in another queue; this method waits for that result and reads it.
So it seems complicated, but it could be even more so.
Anyway, I have recorded a video that shows it all:
Everything together is easier
You can put it all together in the same file, and everything becomes easier. This is not always possible and can cause problems (for example, every change means restarting both the client and the service), but it is very educational:
Now we call a method of the task object directly instead of sending its name.
And now we have a very complicated way to multiply two numbers :)
Tools
As I said before, Celery gives you tools and syntactic sugar.
First of all, we’ll see partials. They are functions with some parameters already set.
It is very easy to create the partial duplicate by fixing one parameter of multiply:
Yeah, signatures like this are built with the .s or .si methods. The second one is immutable (that explains the i): it ignores any arguments added later, so a partial that still expects an argument needs .s. It is hard to explain why that matters before introducing another Celery tool: Chains.
Celery lets you chain tasks, so the result of one becomes the first parameter of the next one. Why the first one? That is just how it works. It sucks, but that is how it works.
Sometimes we need chained tasks just to enforce an order, so we can ignore the previous result. This is where immutable signatures come in, because they do not receive the previous task’s result.
I suppose you have noticed I use “functions” and “services” interchangeably.
But enough talk; let’s chain something:
As I said before, Celery gives syntactic sugar, so it can be rewritten as:
And here is where the fun starts: we have a canvas, that is, a task workflow. Believe it or not, if you launch several workers on different hosts and all of them are connected to the same broker, each operation could run on a different host.
But… why use a canvas if all the operations run sequentially? Because we can process them concurrently with Groups:
There is even more syntactic sugar, like chords, maps, starmaps and chunks, but we won’t go through them. This is an introductory post. You can read more about Canvas.
The Beat
Another important tool in Celery is the Beat. It is a heartbeat, that is, a periodic signal. We can start it inside the worker with the --beat option. Since Beat does not enforce exclusiveness yet, it is important to run just one, or we will receive every beat more than once.
This beat allows us to run periodic tasks.
Let’s see a small example: we are going to multiply the hour by 2 every 10 seconds:
And here you have the video:
Again, I’m going to stop here; I can only show a little of it. You can read more about periodic tasks if you are interested.
Warning: pickle
Perhaps you see a warning like warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED)). This is because we are using the pickle serializer, which is deprecated as a default. You can avoid it by stating explicitly in the configuration which serializers are used and accepted.
Complex environments
Obviously, I’ve only covered basic Celery. We’ve used a single queue to communicate. Since RabbitMQ supports richer queue operations, so does Celery.
You can:
- assign different tasks to different queues, and different workers to different queues, so that not every worker listens to every request.
- route messages between queues.
- send a message to several queues at once (fanout and topic exchanges).
- …
I’m not going to explain it in depth, but I will describe the path a message follows. This information is required to understand how routing works (taken from Exchanges, queues and routing keys):
- The message is sent. It is left in an EXCHANGE. If the EXCHANGE doesn’t exist, Celery will create it.
- The EXCHANGE routes the message to one or more queues, depending on its configuration. If the queues don’t exist, Celery will take care of creating them.
- The message waits in the queue until a consumer takes it. At that moment, the message is locked so that no other consumer can take it.
- After processing it, the consumer sends an ACK and the message is finally removed.
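The path above can be pictured with a toy in-memory model. This is plain Python written only for illustration, not the Kombu or RabbitMQ API; the class ToyBroker and all its methods are made up, and routing keys are left out.

```python
from collections import deque


class ToyBroker:
    """Toy in-memory model of the message path (illustration only)."""

    def __init__(self):
        self.bindings = {}  # exchange name -> names of the queues bound to it
        self.queues = {}    # queue name -> messages waiting for a consumer
        self.unacked = {}   # delivery tag -> (queue name, message) being processed
        self._next_tag = 0

    def bind(self, exchange, queue):
        # The exchange and the queue are created on demand.
        self.bindings.setdefault(exchange, []).append(queue)
        self.queues.setdefault(queue, deque())

    def publish(self, exchange, message):
        # The exchange hands the message to every queue bound to it.
        for queue in self.bindings.get(exchange, []):
            self.queues[queue].append(message)

    def consume(self, queue):
        # The message leaves the waiting line and is reserved for this consumer.
        message = self.queues[queue].popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = (queue, message)
        return self._next_tag, message

    def ack(self, tag):
        # The ACK removes the message for good.
        del self.unacked[tag]


broker = ToyBroker()
broker.bind('celery', 'celery')
broker.publish('celery', {'task': 'tasks.multiply', 'args': (4, 5)})
tag, message = broker.consume('celery')
broker.ack(tag)
```

Between consume and ack the message sits in unacked, which is what keeps other consumers from taking it.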
By default, Celery creates a queue with the same name as its exchange and binds it in a direct way, that is, every message that reaches the exchange is routed to that queue.
You can play with routing and build really complex architectures, so that every message reaches its target depending on the operation.
More information
The post AMQP, RabbitMQ and Celery - A Visual Guide For Dummies is beginner level, but I like it a lot because it explains everything with images. How To Use Celery with RabbitMQ to Queue Tasks on an Ubuntu VPS is good too, as are the PyCon slides Celery for Internal API in SOA infrastructure.
Getting Started Scheduling Tasks with Celery is about configuring periodic tasks dynamically with DJCelery, which is quite interesting when working with Django.
To understand what happens under the hood, I recommend AMQP 0-9-1 Model Explained, which explains the AMQP protocol. It is very interesting if you want to try complex routings. AMQP in 10 mins : Part3 – Flexible Routing Model covers that too, but it is easier and explains just the basic concepts.
If you want to participate in a project that uses Celery, I encourage you to help me with DJCron, a wrapper over DJCelery that allows distributed tasks to be configured dynamically and adds some extra features on top of DJCelery.