In its most basic and adaptable form, this hardware provides three core functions: **Capture** sound through a high-SNR microphone, **Process** it via onset detection, clustering, and mapping to synthesis parameters, and **Play** it back using a wave sequencing synthesiser engine.

1) Capture -> Microphone
2) Process -> Cluster, map to synthesiser
3) Play -> Synthesiser engine

Quasi-detailed block diagram:

![[Block--Diagram.png]]

## Recording-Chopping-Clustering

### Internal Microphone (TSB-2590A)

![[Screenshot 2025-05-11 at 15.04.32.png]]

With its 25 mm diaphragm, it features a subtle presence boost of a few dB in the 10–15 kHz range and incorporates a low-noise internal FET, delivering a studio-grade signal-to-noise ratio of ≥70 dB.[^1]

#### Future Add-On

The synthesiser has an audio jack and an internal microphone. I am considering upgrading the microphone to an ambisonic microphone to capture ambient sound from multiple directions. However, this might pose processing challenges in the t-SNE dimensionality-reduction step.[^2]

![[Screenshot 2025-05-10 at 11.31.30.png]]

## Initial Processing

I was greatly inspired by the development of [Audiostellar](https://audiostellar.xyz/lang/en/index.html) by the [MUNTREF](https://untref.edu.ar/muntref/es/arte-y-ciencia/) group; please check out their work as well.[^3] Currently the clustering algorithm is theirs. In this step the recording is chopped using onset detection.

![[Screenshot 2025-05-11 at 15.08.19.png]]

## Graphical User Interface

The initial clustering algorithm comes from their open-source project, but I had to develop novel interactions for the field context. The video below demonstrates how the Cicada differentiates itself from other samplers by engaging with a group of samples rather than a single chopped one.

When the field recording is done, it is chopped and clustered automatically. The user can learn about the sonic qualities of the clusters using the browse feature: a circular pointer that plays the samples falling within it.

![[Multisample_Example.mp4]]

Below you can see the [[sound selection process]] from the clustered and analysed slices.

![[Sample-Select.mp4]]

## Sample-to-Synth #2

##### As of 2 June:

- The whole system is operational, with rule-based algorithms guiding the Markov decision processes. I hope to update the system to a fully generative one once I figure out how to generate a dataset for training, as that would expand the input possibilities (ideally the user can input anything they feel, not just choose from existing classes). As this is an edge application, GRUs might be a good option to utilise, but I am still exploring that.

#### Core Concept

The mapping tool operates on the principle that audio samples can be positioned in a multidimensional space based on their timbral similarity. By understanding these relationships, the system can create smooth transitions between similar sounds or dramatic contrasts between distant ones, depending on the desired musical character.

#### Three-Lane Architecture

The synthesis engine operates through three independent but synchronized decision-making processes:

##### Sample Selection Lane

This lane determines which audio samples to use in the sequence. The system first analyzes your sample library using dimensionality reduction techniques to create a "map" of sonic relationships. Samples that sound similar are placed close together in this virtual space, while dramatically different samples are positioned far apart.
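As a rough illustration, here is a minimal sketch of how such a timbre map could be built, assuming MFCC features via librosa and scikit-learn's t-SNE; the actual Cicada/Audiostellar pipeline may use different features and settings:

```python
# Illustrative only: embed chopped slices as points in a 2-D "sonic map".
import numpy as np
import librosa
from sklearn.manifold import TSNE

def timbre_map(slice_paths, sr=44100):
    """Return one 2-D coordinate per audio slice."""
    features = []
    for path in slice_paths:
        y, _ = librosa.load(path, sr=sr, mono=True)
        # Summarise timbre with mean MFCCs: one vector per slice.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        features.append(mfcc.mean(axis=1))
    X = np.vstack(features)
    # t-SNE places timbrally similar slices close together;
    # perplexity must stay below the number of slices.
    perplexity = min(30, len(X) - 1)
    return TSNE(n_components=2, perplexity=perplexity).fit_transform(X)
```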
The tool then builds a network of transition probabilities between samples, allowing it to intelligently choose the next sample based on the current one and the desired musical mood. This creates sequences that can flow smoothly between similar timbres or leap dramatically to contrasting sounds.

##### Pitch Transformation Lane

Working in parallel, this lane decides how to transpose each selected sample. The system considers musical scales and octave ranges to ensure harmonic coherence while allowing for creative pitch exploration.

The pitch lane operates within a defined musical framework, respecting scale relationships while introducing controlled variation. This ensures that even when samples are transposed dramatically, they maintain musical relationships that make sense to the listener.

##### Timing Control Lane

The third lane manages the rhythmic structure of the sequence by determining how long each sample should play. It works with a comprehensive set of note durations, including standard note values, dotted rhythms, triplets, and rests.

This lane creates the rhythmic backbone of the composition, establishing patterns that can be steady and predictable or complex and evolving, depending on the chosen mood setting.

#### Musical Mood System

The mapping tool's intelligence comes from its mood-based parameter system, which fundamentally changes how the three lanes behave:

##### Chaotic Mode

- **Sample Selection**: Favors dramatic jumps between dissimilar sounds
- **Pitch Control**: Embraces extreme transpositions and unexpected harmonic leaps
- **Timing**: Creates rhythmic disruption with sudden changes and strategic silences

##### Ambient Mode

- **Sample Selection**: Emphasizes smooth transitions between similar timbres
- **Pitch Control**: Maintains harmonic stability with minimal transposition
- **Timing**: Focuses on longer, flowing durations that create atmospheric continuity

##### Minimal Mode

- **Sample Selection**: Balances similarity and contrast for controlled variation
- **Pitch Control**: Uses moderate, musical intervals for subtle harmonic movement
- **Timing**: Establishes steady, foundational rhythms with measured complexity

##### Evolving Mode

- **Sample Selection**: Creates progressive movement through the sonic space
- **Pitch Control**: Encourages harmonic development and forward motion
- **Timing**: Builds complexity in cycles, creating natural musical arcs

#### Technical Foundation

The system uses t-SNE (t-Distributed Stochastic Neighbor Embedding) to analyze audio samples and position them in a perceptual space based on their sonic characteristics. This machine learning technique reduces complex audio features into a map that reflects how humans perceive sonic similarity.

Markov chains drive the decision-making process in each lane, using probability matrices that are dynamically adjusted based on the selected mood. This creates sequences that feel both structured and organic, with each decision influenced by the previous state while maintaining overall coherence.
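A minimal, hypothetical sketch of the sample-selection lane is shown below: it builds a mood-weighted transition matrix from distances in the t-SNE map produced by the earlier `timbre_map` sketch. The mood labels and weightings here are illustrative assumptions, not the exact Cicada implementation:

```python
# Illustrative only: a Markov transition matrix over slices, reshaped by mood.
import numpy as np

def transition_matrix(coords, mood="ambient"):
    """coords: (n, 2) t-SNE positions. Returns a row-stochastic matrix."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    if mood == "chaotic":
        weights = d.copy()                           # favour distant jumps
    else:
        weights = np.exp(-d / (d.mean() + 1e-9))     # favour near neighbours
    np.fill_diagonal(weights, 0.0)                   # avoid repeating a slice
    return weights / weights.sum(axis=1, keepdims=True)

def next_sample(current, matrix, rng):
    """Draw the next slice index from the current row of probabilities."""
    return rng.choice(len(matrix), p=matrix[current])

# Example walk: coords would come from timbre_map(...), starting at slice 0.
# rng = np.random.default_rng()
# T = transition_matrix(coords, mood="chaotic")
# sequence = [0]
# for _ in range(15):
#     sequence.append(next_sample(sequence[-1], T, rng))
```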
##### Configuration and Customization

The tool accepts several key parameters that shape the output:

- **Sequence Length**: Controls how many samples will be generated in the final sequence
- **Octave Range**: Defines the maximum pitch transposition range available
- **Musical Scale**: Constrains pitch relationships to specific harmonic frameworks
- **Sample Library**: The collection of audio files to draw from during generation

#### Output and Integration

The synthesis process generates three synchronized sequences of equal length:

- A sequence of sample selections from your library
- A corresponding sequence of pitch transpositions
- A matching sequence of timing values

These three components work together to create a wave sequence synthesis patch to be used with the synth engine.

## Synth Engine

One of the best ways to utilise samples is the wave sequence synthesis technique. By using the sliced samples almost as-is, this method provides a unique way of representing the sonic environment the device is used in. As with Cicada, the underlying artistic vision emerged from the musique concrète movement. Check out [this article](https://midi.org/the-origins-of-wave-sequencing) by Stanley Jungleib.[^4]

## Physical User Interface

While the current embodiment of the instrument in the videos has a polished look, the best way forward is to consider ease of production in the open-source context. (To be updated as changes are made.)

1) **Accessibility & Reproducibility**: Off-the-shelf parts, minimal soldering.
2) **Low skill floor, high skill ceiling**: Easy to build a basic version, but expandable if wanted.

### MIDI Keyboard

For the Minimum Likeable Product, we will A/B test:

- PCB capacitive-touch MIDI (can also be used to modulate)[^5]
  ![[Screenshot 2025-05-11 at 16.20.54.png]]
  ![[Screenshot 2025-05-11 at 16.21.25.png]]
- Keyboard MIDI from off-the-shelf parts.[^6]
  ![[Screenshot 2025-05-11 at 16.46.55.png]]

### Display

The current GUI and sample-selection interaction require a touch display. There might be a way to reimagine the selection process so that a more affordable LCD screen can be used to bring the cost down. On the other hand, the touch screen can be a versatile interface: for example, vector automation, multitouch, gestural recording, region-based modulation, physics-based modulation, etc. This TFT display is $42.[^7]

![[Screenshot 2025-05-11 at 17.13.07.png]]

### Footnotes

[^1]: https://www.jlielectronics.com/microphone-capsules/jli-2590a/
[^2]: https://www.instructables.com/Ambi-Alice-a-First-Order-Ambisonic-Microphone/
[^3]: https://audiostellar.xyz/lang/en/index.html
[^4]: https://www.jungleib.com/
[^5]: https://ww1.microchip.com/downloads/en/AppNotes/Atmel-42479-Capacitive-Touch-Long-Slider-Design-with-PTC_AT11805_ApplicationNote.pdf
[^6]: https://chompiclub.com/?srsltid=AfmBOooVbHW4PqPYAzI4nxh3Gqc1LvPnXfWo7ZYANBY42uKAsBH1oSd1
[^7]: [Aliexpress](https://www.aliexpress.com/item/1005007768229394.html?spm=a2g0o.productlist.main.2.5580Ej3MEj3MU0&algo_pvid=80eb30ed-177a-4b60-afce-9facdec4bcd8&algo_exp_id=80eb30ed-177a-4b60-afce-9facdec4bcd8-1&pdp_ext_f=%7B%22order%22%3A%223%22%2C%22eval%22%3A%221%22%7D&pdp_npi=4%40dis%21GBP%2118.19%2118.19%21%21%2123.36%2123.36%21%402103894417469799092401796e6bd2%2112000047220705233%21sea%21UK%210%21ABX&curPageLogUid=grNDhGTpBLNk&utparam-url=scene%3Asearch%7Cquery_from%3A)