.. meta:: :description lang=en: xlSlim makes it simple, efficient and performant to use Python to stream all types of data, including numpy arrays and pandas DataFrames, from Kafa into Excel. Streaming from Kafka ==================== .. _Confluent: https://docs.confluent.io/platform/current/quickstart/ce-docker-quickstart.html .. _aiokafka: https://github.com/aio-libs/aiokafka xlSlim can very efficiently stream data from Kafka into Excel. Preparing the data ------------------ This example assumes you have a Kafka broker available. One alternative is to run Kafka locally using Docker, `Confluent`_ package Kafka into Docker containers with good documentation. The example uses the Python package `aiokafka`_ which can be installed using pip. Once a Kafka broker is available we need to publish a sequence of one million doubles to a Kafka topic "doubles". We will use this Python code to publish the data. .. code-block:: python # A running Kafka broker can be setup simply using Docker, the Confluent platform includes a nice management console, see: # https://docs.confluent.io/platform/current/quickstart/ce-docker-quickstart.html # The aiokafka package can be installed using pip, see https://github.com/aio-libs/aiokafka import pickle import asyncio from aiokafka import AIOKafkaProducer async def send(): producer = AIOKafkaProducer( bootstrap_servers="localhost:9092", value_serializer=lambda m: pickle.dumps(m), ) # Get cluster layout and initial topic/partition leadership information await producer.start() try: # Produce messages for i in range(1_000_000): await producer.send("doubles", float(i)) finally: # Wait for all pending messages to be delivered or expire. await producer.stop() if __name__ == "__main__": asyncio.run(send()) .. note:: All the Python code and Excel files shown are available from github in the `xlslim-code-samples`_ repo. .. _xlslim-code-samples: https://github.com/RusselWebber/xlslim-code-samples/archive/refs/heads/main.zip Save the Python code as a new file on your PC. I saved the file in my Documents folder. Run the code directly in Python.:: python kafka_double_producer.py .. image:: ../_static/streaming_kafka_doubles_create_data.png :width: 400 :alt: Create Kafka topic loaded with doubles In the Kafka dashboard I can see one million messages are available (Confluent has a nice dashboard, your Kafka broker may not). .. image:: ../_static/streaming_kafka_doubles_topic_available.png :width: 400 :alt: Kafka data waiting Streaming the data ------------------ Now that the data is available we can stream the sequence of doubles into Excel. As you will see, xlSlim makes this very easy. This Python code defines an asynchronous generator function that yields every message received from a Kafka topic: .. code-block:: python # Shows how a subscription to a Kafka topic can be setup # A running Kafka broker can be setup simply using Docker, the Confluent platform includes a nice management console, see: # https://docs.confluent.io/platform/current/quickstart/ce-docker-quickstart.html # The aiokafka package can be installed using pip, see https://github.com/aio-libs/aiokafka import asyncio import pickle from aiokafka import AIOKafkaConsumer async def kafka_topic_subscription(topic: str, server: str) -> float: """Subscribe to data from a Kafka topic.""" consumer = AIOKafkaConsumer( topic, bootstrap_servers=server, # Using pickle for this demo, more realistic options would be JSON, MessagePack, etc value_deserializer=lambda m: pickle.loads(m), # For this demo we always replay all messages auto_offset_reset="earliest", ) await consumer.start() try: # Consume messages async for msg in consumer: yield msg.value finally: # Will leave consumer group; perform autocommit if enabled. await consumer.stop() Save the Python code as a new file on your PC. I saved the file in my Documents folder. Open Excel and enter this :func:`RegisterPyModule` formula (amending the location to match where you saved the file and the environment where you installed `aiokafka`_): :: =RegisterPyModule("C:\Users\russe\Documents\kafka_double_consumer.py","C:\Users\russe\anaconda3\envs\py37") You should see a message similar to this confirming the module registration: .. image:: ../_static/streaming_kafka_doubles_module_registered.png :width: 400 :alt: Register the Kafka consumer function The `kafka_topic_subscription()` generator function is now available in Excel. Let's call the function with the Kafka broker address and topic: :: =kafka_topic_subscription("doubles","localhost:9092") You will see Excel very rapidly receiving the one million doubles. .. image:: ../_static/streaming_kafka_doubles_receiving_data.png :width: 400 :alt: Receiving Kafka data in Excel .. note:: Set Excel's Calculation Options to Automatic to see the value ticking. The functionality works perfectly in Manual calculation mode too, the value shown updates every time you press F9. Note that the generator function runs in an independent background thread so runs regardless of Excel's calculation mode. The Kafka subscription continues to run once all the messages have been received. If you rerun :: python kafka_double_producer.py you will see that new values are streamed to Excel as soon as they are published to Kafka.